Manual failover of keepalived


During my tests with keepalived as a balancer for a Linux cluster, I was looking for a way to quickly simulate a node failure and check that keepalived was correctly failing over to the other node. Here is a quick and smart way to do it!

Dummy!

Keepalived can track a service or a network connection, and when one of these resources fails, it starts the failover. The problem in a test scenario is quite obvious: you do not really want to crash a service on purpose or disconnect a network connection just to test the failover. You want, for example, to keep the ssh connection open to monitor both nodes while still seeing the failover happen.

Keepalived does not have a “manual” failover command, but I’ve found a way to do it. Kudos to my friend PJ Spagnolatti: one of his posts on the keepalived mailing list (back in 2001!!!) was a great help in achieving this, plus a couple of emails I exchanged with him. The “trick” is really nice: we will load a fake network interface, and by taking it down, we will start the failover. Linux has a network driver called exactly “dummy”, designed for such needs! How cool!

First, you need to load dummy in the kernel:
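A minimal sketch of this step, assuming root privileges and the stock dummy kernel module. The function name load_dummy is just a placeholder for illustration:

```shell
# Load the dummy network driver and make sure the dummy0 device exists.
# Must be run as root.
load_dummy() {
    modprobe dummy
    # The module normally creates dummy0 on load; create it only if it is missing.
    ip link show dummy0 >/dev/null 2>&1 || ip link add dummy0 type dummy
}
```

After calling load_dummy, something like "lsmod | grep dummy" should list the module.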

Then, you configure dummy0 to be up at boot:
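One way this could look on CentOS 7 with the legacy network service is sketched below. The file contents are an assumption based on the usual modules-load.d and ifcfg conventions, and the ROOT prefix and the function name persist_dummy are hypothetical, added only so the files can be previewed in a scratch directory before touching /etc:

```shell
# Make the dummy module load at boot and bring dummy0 up with the network service.
# ROOT is a hypothetical prefix: leave it empty to write under /etc (as root).
persist_dummy() {
    local root="${ROOT:-}"
    mkdir -p "$root/etc/modules-load.d" "$root/etc/sysconfig/network-scripts"
    # systemd-modules-load reads this directory at boot and loads each listed module.
    echo dummy > "$root/etc/modules-load.d/dummy.conf"
    # No IP address is needed on dummy0: only its link state matters to keepalived.
    cat > "$root/etc/sysconfig/network-scripts/ifcfg-dummy0" <<'EOF'
DEVICE=dummy0
ONBOOT=yes
BOOTPROTO=none
NM_CONTROLLED=no
EOF
}
```

To bring the interface up right away without rebooting, "ip link set dev dummy0 up" should be enough.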

Once the device is “up and running” on both keepalived nodes, you add the network interface as a resource to be monitored. I’m posting here my complete keepalived.conf configuration:
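As a rough illustration only, a configuration with the pieces described in this post (sshd check, ens160, the virtual IP 10.2.50.160, the peer 10.2.50.162, nopreempt and dummy0) might be shaped like the sketch below. The instance name, the check script, virtual_router_id, priority and the timing values are placeholders of mine, not the original settings, and the helper write_keepalived_sketch is hypothetical; it writes to a scratch file so nothing in /etc/keepalived is overwritten:

```shell
# Write an illustrative keepalived.conf to the given path (default: a scratch file).
write_keepalived_sketch() {
    cat > "${1:-./keepalived.conf.sketch}" <<'EOF'
vrrp_script chk_sshd {
    script "/usr/sbin/pidof sshd"   # exits non-zero when sshd is not running
    interval 2                      # placeholder value
}

vrrp_instance VI_1 {
    state BACKUP            # nopreempt requires BACKUP as the initial state
    nopreempt
    interface ens160
    virtual_router_id 51    # placeholder value
    priority 101            # placeholder; lower on the second node
    advert_int 1
    unicast_peer {
        10.2.50.162         # real IP of the other node
    }
    virtual_ipaddress {
        10.2.50.160
    }
    track_script {
        chk_sshd
    }
    track_interface {
        ens160
        dummy0              # the fake interface used for manual failover
    }
}
EOF
}
```

The part that matters for the trick is the track_interface block: any interface listed there that goes down puts the instance into a fault state.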

As you can read in this configuration file, the “real” monitoring happens against the sshd service and the ens160 interface (this is how systemd on CentOS 7 names what once was eth0 when it’s a VMware virtual interface…). When anything happens to one of these two resources, the virtual IP 10.2.50.160 is no longer published on this node, and the failover happens towards the other node (10.2.50.162 is the real IP of the second node; the rest of the configuration file is exactly the same).

But, by simply adding dummy0 in the track_interface section, a manual failover is as simple as running in the command line:
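A minimal sketch using iproute2, to be run as root on the node that currently holds the virtual IP. The wrapper name trigger_failover and the IFACE override are hypothetical, and the older "ifdown dummy0" may work just as well where initscripts are in use:

```shell
# Take the tracked dummy interface down; keepalived notices the link fault
# and releases the virtual IP. IFACE defaults to dummy0.
trigger_failover() {
    ip link set dev "${IFACE:-dummy0}" down
}
```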

dummy0 is usually in state UNKNOWN; when we take the dummy interface down, it goes into state DOWN:
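To check the state from a script rather than by eye, a small helper along these lines can pull the state word out of the one-line output of ip. The name link_state is a placeholder, and the parsing assumes the usual "state XXX" field in the "ip -o link" output:

```shell
# Print the operational state (e.g. UNKNOWN, DOWN) of an interface, dummy0 by default.
link_state() {
    # "ip -o link" prints one line per interface; the word after "state" is what we want.
    ip -o link show dev "${1:-dummy0}" |
        awk '{ for (i = 1; i < NF; i++) if ($i == "state") print $(i + 1) }'
}
```

Running link_state before and after taking the interface down should show the change described above.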

At that point the failover starts. Remember to bring the interface back to its initial state after the failover. What happens next depends on the keepalived configuration: in my case I configured “nopreempt”, which disables the failback to the master node, so even if I bring dummy0 back online on the master node, the virtual IP stays on the secondary node.
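Restoring the interface and checking where the virtual IP lives could be sketched like this. Both function names are hypothetical, restore_dummy needs root, and has_vip simply looks for the address from this post in the output of ip:

```shell
# Bring the dummy interface back up after a manual failover (run as root).
restore_dummy() {
    ip link set dev "${IFACE:-dummy0}" up
}

# Succeed if the virtual IP (10.2.50.160 by default) is configured on this node.
has_vip() {
    # -F: match the address literally; -w: do not match a longer address sharing the prefix.
    ip -o -4 addr show | grep -qwF "inet ${1:-10.2.50.160}"
}
```

With nopreempt in place, running restore_dummy on the old master followed by has_vip should still report that the address is not there, while has_vip on the secondary node should succeed.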

Once you’ve finished your tests, you can either decide to remove dummy0 from keepalived configuration, or keep it and use it as a way to run manual failovers when needed!