CARP/HA, SYNC and XMLRPC SYNC explained
What is the expected length of time for a CARP failover, given that both pfsense nodes have a dedicated failover interface with a single patch cable (no switch) between them?
There was a similar (old) thread here https://forum.pfsense.org/index.php?topic=11870.0 where it was suggested, if a switch is between the two, that the switch may prevent fast failover. Are there other factors to consider?
My test env looks like the attached layout.png. Both red and green servers are linux kvm hosts. All of the interfaces are bridged like this ..
opt1 192.168.2.x br2
iface enp1s0f1 inet manual
iface br2 inet manual
wan cable modem dhcp br1
iface enp1s0f2 inet manual
iface br1 inet manual
lan poe 192.168.1.0 br0
iface enp1s0f3 inet manual
iface br0 inet static
.. with the exception of the static address configured for the "admin lan" on br0. The other (green) server is configured similarly, with a single static address (192.168.1.3), except that it has no wan interface (I only have one DHCP IP/cable modem). The two pfSense boxes are VMs, one on each kvm host, with a vNIC assigned to each host bridge interface. Both hosts also have a dedicated interface (the blue line between red and green) used for sync between the pfSense nodes, 192.168.254.1 and 192.168.254.2. These are configured in pfSense only, not on the hosts. CARP interfaces are 192.168.1.254, 192.168.2.254, 192.168.3.254, and 192.168.4.254.
To test failover I'm using two physical hosts on the 192.168.2.0/24 network, pinging 192.168.1.2 (red host static ip in the picture) during a reboot of the primary pfSense. I'm seeing packet loss of anywhere between 1% and 5% during the primary reboot:
--- 192.168.1.2 ping statistics ---
117 packets transmitted, 111 received, 5% packet loss, time 116007ms
rtt min/avg/max/mdev = 0.307/0.718/1.123/0.151 ms
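To put that percentage in terms of time, here is a rough Python sketch that turns the ping summary line into an approximate outage window. The function name `estimate_outage` is made up for illustration, and it assumes the lost packets were contiguous (one gap during the failover), so treat the result as an upper bound rather than a measurement.

```python
import re

def estimate_outage(summary: str) -> dict:
    """Estimate the outage window from a Linux ping summary line.

    Assumes all lost packets fell in one contiguous gap.
    """
    m = re.search(
        r"(\d+) packets transmitted, (\d+) received.*time (\d+)ms", summary
    )
    if m is None:
        raise ValueError("unrecognised ping summary line")
    sent, received, elapsed_ms = map(int, m.groups())
    if sent < 2:
        raise ValueError("need at least two packets to derive the interval")
    lost = sent - received
    # Total elapsed time spans (sent - 1) intervals between packets.
    interval_s = elapsed_ms / 1000 / (sent - 1)
    return {
        "lost": lost,
        "loss_pct": 100 * lost / sent,
        "interval_s": interval_s,
        "outage_s": lost * interval_s,
    }

stats = estimate_outage(
    "117 packets transmitted, 111 received, 5% packet loss, time 116007ms"
)
print(stats)
```

For the run above that is 6 lost packets at roughly one-second spacing, so about 6 seconds without replies at most.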
The sync interface has nothing to do with failover speed. There are generally three elements to an HA pair above and beyond normal pfSense configurations:
XMLRPC Sync - this syncs configuration from primary to secondary so for instance when you add a rule to the primary, it is mirrored on the secondary. This is typically configured to sync over the dedicated sync interface. Some packages have their own sync configuration for this.
pfsync - This syncs firewall states between the node the state was created on and the other node. This flows in both directions constantly and in near real-time. This is normally configured to sync over the dedicated interface for performance and, probably more importantly, security reasons, as there is no authentication in this sync protocol. The reason for this sync is so that, in the event of a failover, the secondary firewall has all the states active so user connections continue to operate. Something like a streaming video or an ssh session will not time out and need to be reconnected by the user (though it might experience a brief period of packet loss).
CARP - This is the protocol that swings addresses from the primary to the secondary and back. This is accomplished not on the sync interface but on every interface configured for CARP. The master (usually the primary node) sends an advertisement/heartbeat about every second by default. If the backup (usually the secondary node) does not receive that heartbeat within the expected interval, or that heartbeat is from a node with a higher advskew than it has on its own interface, it assumes the other node has stopped (or has been demoted) and assumes the CARP MASTER status. This requires good, solid, layer 2 multicast connectivity between the two nodes on all interfaces and has absolutely nothing to do with any dedicated sync interface between the nodes.
For CARP to swing over and traffic to begin to pass through the secondary node, it generally takes a couple of seconds. It can take a bit longer for other things like OpenVPN and IPsec tunnels to come up on the secondary node.
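As a rough illustration of where "a couple of seconds" comes from: to my understanding, the FreeBSD CARP implementation has the backup wait about 3 × advbase + advskew/256 seconds without an advertisement before taking over. The sketch below just evaluates that expression. The function name is hypothetical, and the example values (advbase 1, advskew 0 on the primary and 100 on the secondary) are assumed typical settings, not values taken from this thread.

```python
def master_down_timeout(advbase: int = 1, advskew: int = 0) -> float:
    """Approximate seconds a CARP backup waits before assuming MASTER.

    Based on the 3 * advbase + advskew / 256 rule as I understand it from
    the FreeBSD implementation; treat it as an estimate, not a guarantee.
    """
    if advbase < 1 or not 0 <= advskew <= 254:
        raise ValueError("advbase must be >= 1 and advskew in 0..254")
    return 3 * advbase + advskew / 256

# Assumed example values for a two-node pair.
print(master_down_timeout(advbase=1, advskew=0))    # node with skew 0
print(master_down_timeout(advbase=1, advskew=100))  # node with skew 100
```

With those assumed values the secondary would wait a little over three seconds after the last advertisement, which is in the same range as the loss seen when the primary is simply rebooted.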
Note that this is not intended to solve layer 2 problems. If your switches stop passing CARP traffic but both nodes see all interfaces up, things will not work properly. You need to design your layer2 redundantly if you want proper failover for router issues only. If an interface is perceived as down on the primary, it will automatically demote itself, but it has no idea there is a problem if all interfaces are up and it can send CARP advertisements on all of them. There is no way for it to know the backup node cannot receive them and there is no way for the backup node to know what is happening. All the backup node knows is it is not receiving advertisements from the primary so it takes over and you have split brain master/master.
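The split-brain case can be written down as a toy decision model. This is only a sketch of the reasoning above, not of the real CARP state machine, and the function and parameter names are invented for the example.

```python
def roles(primary_links_up: bool, l2_passes_carp: bool) -> tuple[str, str]:
    """Return (primary_role, secondary_role) under a simplified model."""
    # The primary only demotes itself when it sees one of its own links down.
    primary = "MASTER" if primary_links_up else "BACKUP"
    # The secondary stays BACKUP only while it actually hears a master.
    secondary_hears_master = primary == "MASTER" and l2_passes_carp
    secondary = "BACKUP" if secondary_hears_master else "MASTER"
    return primary, secondary

print(roles(True, True))    # healthy pair
print(roles(False, True))   # primary lost a link and demoted itself
print(roles(True, False))   # switch drops CARP traffic: both claim MASTER
```

The third case is the one neither node can detect on its own, which is why the layer 2 design has to be redundant.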
Test your raw failover speeds by simply placing the primary/master node in persistent maintenance mode. An HA failover should essentially never be caused by manually rebooting a node; the node should be placed in persistent maintenance mode first. That should yield the fastest failover time. It also keeps the node from assuming the CARP MASTER role until it has rebooted and all of its services have started. The network administrator then decides when the node resumes the MASTER role and starts passing traffic again, once everything looks satisfactory.
Another reasonable test is disconnecting a link with a CARP VIP on it from the primary. That should also trigger a pretty quick failover and a recovery when it is placed online. Again, if that were my node and I saw the primary lost a link I would manually place that unit in maintenance mode, fix the problem, then take it out of maintenance mode when I was happy - but it should recover ok if the link comes back up on its own. This is generally more difficult to simulate in a virtual environment. It should also never happen in a virtual environment.
In the real world, you will use the persistent maintenance mode method far more often than anything else because you will be swinging traffic to the secondary for maintenance and upgrades far more often (hopefully) than you are actually experiencing failures.
Thanks for the excellent reply. I've retested as you suggested by entering persistent maintenance mode, and there is no packet loss that way (enter persistent maintenance, reboot, leave persistent maintenance). I am still having a small problem with freeradius xmlrpc sync between the two, but I posted that in a separate topic (see https://forum.pfsense.org/index.php?topic=135864.0).