CARP didn't fail over after WAN went offline

Mr_JinX

Hi,

Does anyone know why my setup didn't fail over to the other node in the HA cluster when the WAN went offline?

I failed it over manually and everything went fine. I then restarted the primary node and the gateway came back online.
I thought there was some algorithm that would fail over the VIPs to another node if the GW was down?

Any ideas?

Derelict

CARP/HA does not fail over on a gateway failure like that. Multi-WAN does.

HA would fail over on a down interface.

You should have the same WANs on both cluster nodes so a failure on one should also be a failure on the other. If that is not the case you need to figure out why the WAN is failing on that node and fix it.

Mr_JinX

Hi,

Thank you for your reply. It turns out I put a /32 on the CARP VIP instead of a /24 (which is what the network is), which is why it didn't fail over to the second node. (I've tested this in a lab and can confirm the same symptoms in that scenario.)
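As a rough illustration of the mask mismatch described above, here is a minimal Python sketch that checks whether a CARP VIP was entered with the same prefix as the interface it sits on. The function name and the 192.0.2.x addresses are made up for the example (documentation range), and this is only a sanity check you could run by hand, not anything pfSense does itself.

```python
import ipaddress


def vip_mask_matches(interface_cidr: str, vip_cidr: str) -> bool:
    """Return True if the VIP uses the same network/prefix as the interface.

    Both arguments are hypothetical inputs in address/prefix form,
    e.g. "192.0.2.2/24" for the interface and "192.0.2.1/24" for the VIP.
    """
    iface = ipaddress.ip_interface(interface_cidr)
    vip = ipaddress.ip_interface(vip_cidr)
    # A /32 VIP yields a single-host network, which never equals the /24.
    return vip.network == iface.network


# The mistake from this thread: VIP entered as /32 on a /24 network.
print(vip_mask_matches("192.0.2.2/24", "192.0.2.1/32"))  # False
# The corrected entry.
print(vip_mask_matches("192.0.2.2/24", "192.0.2.1/24"))  # True
```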

My only concern is this: should one of the interfaces on the vSwitch stop forwarding traffic (so technically not a link-down state), is there a way the process could check whether the other node in the cluster can ping the same gateway and, if it can, cause a complete failover to the secondary node, or even just fail over blindly?

On physical hardware I don't think this would be an issue unless you experience, for example, a UDLD issue on fiber; on virtual instances, however, there's a vSwitch to contend with before the physical switch.

Is there any way to cause a full failover if the gateway goes offline on the primary, rather than just the WAN interface failing over to the secondary and leaving the LAN on the primary?
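To make the idea concrete, the check being asked about could be sketched roughly like this in Python. This is not a pfSense feature; it is a hypothetical external watchdog. The function names are invented, how the node learns the peer's result (`peer_ok`) is left open, and the demotion command assumes the FreeBSD `net.inet.carp.demotion` sysctl described in carp(4) behaves the same way on your pfSense version, so verify that before relying on it.

```python
import subprocess


def gateway_reachable(gateway: str, count: int = 3, runner=subprocess.run) -> bool:
    """Ping the gateway; True if ping exits 0 (at least one reply heard).

    `runner` is injectable so the logic can be exercised without real pings.
    """
    result = runner(
        ["ping", "-c", str(count), gateway],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def should_demote(local_ok: bool, peer_ok: bool) -> bool:
    """Fail over only if this node has lost the gateway and the peer has not.

    If both nodes have lost it, moving the VIPs would not help.
    """
    return (not local_ok) and peer_ok


def demote_command(amount: int = 240) -> list:
    """Build (but do not run) a command that raises CARP demotion.

    Assumption: writing to net.inet.carp.demotion adds `amount` to the
    advertised skew for all VHIDs, so the peer wins the election everywhere.
    """
    return ["sysctl", f"net.inet.carp.demotion={amount}"]
```

Because the demotion applies to every VHID on the node, the LAN VIP would move together with the WAN VIP, which is the "full failover" asked about. The same amount would have to be subtracted again once the gateway is back, otherwise the node stays demoted.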

*You may be asking why a link would stop forwarding traffic. On our setup I *think* there's an issue with the drivers it's using: it was set up to use the Realtek drivers, but it's on a QEMU/KVM host. We could only get about 3 days' worth of uptime before traffic would stop flowing. I changed one interface to VirtIO and the issue hasn't come back on it since, so I've now changed all the interfaces to VirtIO. Fingers crossed we can get more than 3 days of uptime.

Derelict

pfSense HA/CARP is a Layer 3 failover system designed to fail over in most router failure scenarios. Your vSwitch not passing traffic while the link stays up would be a Layer 2 failure. You would need to build in redundancy at Layer 2 for that.