HA setup is flapping between primary and backup devices
-
I set up two Sophos XG210 devices as my primary and backup router/firewall. The devices are identical, and the hardware is working (all ports are functional, etc.). I have set up CARP and HA correctly, and the 1:1 NAT for the WAN addresses (static IPs assigned by my ISP) is working. The LAN side is set up similarly: on each router/firewall, the LAN ports are connected to ports 40 and 41 on a 2-stack Dell Force10 S55T switch. The switch holds all the VLANs and the router handles the inter-VLAN traffic. Pfsync is running between both routers without any issues.
However, the HA setup (see attached screenshot) keeps flapping between the primary and backup routers. When the switch to the backup router happens, I lose all connectivity to the internet and between VLANs. This is becoming a very annoying issue, as everything stops! I looked at the configuration on both routers/firewalls, the switch, and the Internet access, and it all looks good. What I think is happening is that the heartbeat between the two routers is somehow skipping a few beats, and that kicks the switch over to the backup router, but only on the LAN interface. That is about the only thing I noticed that might be a red flag: only the LAN interface switches to the backup router and, when it happens, all connectivity drops.
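To put rough numbers on the "skipped heartbeats" idea, here is a minimal Python sketch. It assumes the CARP status line has the usual FreeBSD `ifconfig` form (`carp: MASTER vhid 1 advbase 1 advskew 0`) and that a backup takes over after roughly three advbase periods plus the skew; that takeover timing is my understanding of the FreeBSD behavior, not something I have confirmed on these boxes. The function names and the sample values are hypothetical and are not taken from my configuration.

```python
import re

# Matches the CARP status line printed by `ifconfig` on FreeBSD-based
# firewalls, e.g. "carp: MASTER vhid 1 advbase 1 advskew 0".
CARP_LINE = re.compile(
    r"carp:\s+(?P<state>\w+)\s+vhid\s+(?P<vhid>\d+)\s+"
    r"advbase\s+(?P<advbase>\d+)\s+advskew\s+(?P<advskew>\d+)"
)


def parse_carp_line(line):
    """Return the CARP fields found in one ifconfig line, or None."""
    match = CARP_LINE.search(line)
    if match is None:
        return None
    return {
        "state": match["state"],
        "vhid": int(match["vhid"]),
        "advbase": int(match["advbase"]),
        "advskew": int(match["advskew"]),
    }


def advert_interval(advbase, advskew):
    """Seconds between advertisements: advbase plus advskew/256."""
    return advbase + advskew / 256


def master_down_timeout(advbase, advskew):
    """Seconds a backup waits without advertisements before taking over.

    Assumption: three advbase periods plus the skew, as I understand the
    FreeBSD implementation; treat the result as an estimate.
    """
    return 3 * advbase + advskew / 256


if __name__ == "__main__":
    # Hypothetical sample lines, one per CARP VIP.
    samples = [
        "carp: MASTER vhid 1 advbase 1 advskew 0",
        "carp: BACKUP vhid 2 advbase 1 advskew 100",
    ]
    for line in samples:
        info = parse_carp_line(line)
        print(
            info["vhid"],
            info["state"],
            advert_interval(info["advbase"], info["advskew"]),
            master_down_timeout(info["advbase"], info["advskew"]),
        )
```

If that timing assumption holds, only a few seconds of lost advertisements on the LAN VLAN would be enough for the backup to promote itself on that interface alone, which would match what I am seeing.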
More on my setup:
Dell Force10 S55T (2 stacked)
VLAN101, VLAN201, VLAN301, VLAN401, VLAN501, VLAN601
Trunk Ports set to interfaces 0/40-43
VLAN101 holds all my servers (AD, DHCP, DNS, FS, Wireless Controller, etc.)
Sophos XG210 Rev. 3 (2 identical devices in HA configuration)
Port igb1 set as WAN
Port igb0 set as LAN (also routing all inter-VLAN traffic)
Port igb5 set as the SYNC interface
I set up both a virtual and a physical lab with the same hardware (but only one Force10 S55T switch), and the configuration works there without any issues: no flapping or unexpected switches between primary and backup. Has anyone encountered this issue before? I would appreciate any assistance.

-
So I disconnected the backup device and my network is back to normal (even though I haven't removed the CARP and HA settings yet). Just for the sake of testing, I configured two identical Steelhead CX770s with OPNsense and got the same results as with pfSense. I get the same results with two sets of completely different hardware! How can this be possible?!
I thought it was the connection to the switch (since both firewalls connect to the same stack) but as soon as I remove the backup unit from the HA setup, all network connectivity is restored.
Has anyone here encountered this problem before?
Martin M. Mune
US Army Combat Veteran
Operation Iraqi Freedom
Volunteer Soldier
International Legion for the Defense of Ukraine
Glory to Ukraine!
Glory to the heroes!