[SOLVED] CARP not failing over all links
-
I have set up two fresh installs of pfSense 2.3.4 as an HA cluster, with CARP handling interface failover. We have a /30 public IP, so I've used dummy private addresses for the interface addresses and configured the /30 address as the VIP. A dedicated interface between the two firewalls, connected with a crossover cable, carries pfsync.
When everything is up and all interfaces are connected, FW01 shows as Master for both LAN and WAN and FW02 shows as Backup for both. If I force a failover by unplugging the power from FW01 or by temporarily disabling CARP on FW01, the failover happens as expected.
However, if I unplug only the WAN or only the LAN, just the unplugged interface fails over. For example, if I unplug WAN on FW01, FW02 becomes Master for WAN but stays in Backup for LAN (FW01 still shows as Master for LAN). This breaks routing, because clients on the LAN side are still trying to route via FW01.
I've confirmed that pfsync is syncing states correctly and that the configuration is being replicated from FW01 to FW02. In the syslog (below) it appears that the demotion happens correctly and that the LAN does fail over, but FW01 immediately resumes the Master role for it.
Can anyone shed any light on what may be going wrong?
Each firewall has the following NIC configuration:
FW01
em0: LAN: 192.168.25.28/27
em1: WAN: 192.168.100.1/29
bge0: PFSYNC: 192.168.101.1/29
FW02
em0: LAN: 192.168.25.29/27
em1: WAN: 192.168.100.2/29
bge0: PFSYNC: 192.168.101.2/29
Both firewalls share the following CARP VIPs:
LAN: 192.168.25.30/27 VHID:100
WAN: publicip/30 VHID:99
The syslog shows the following when the WAN interface is disconnected:
May 12 16:03:36 check_reload_status Carp backup event
May 12 16:03:36 kernel carp: demoted by 240 to 240 (interface down)
May 12 16:03:36 kernel em0: link state changed to DOWN
May 12 16:03:36 kernel carp: VHID 100@em1: MASTER -> BACKUP (more frequent advertisement received)
May 12 16:03:36 check_reload_status Linkup starting em0
May 12 16:03:36 check_reload_status Carp backup event
May 12 16:03:37 php-fpm 4958 /rc.carpbackup: HA cluster member "(116.212.222.66@em0): (WAN)" has resumed CARP state "BACKUP" for vhid 99
May 12 16:03:37 php-fpm 4958 /rc.linkup: DEVD Ethernet detached event for wan
May 12 16:03:37 kernel carp: demoted by -240 to 0 (vhid removed)
May 12 16:03:37 kernel em0: promiscuous mode disabled
May 12 16:03:37 check_reload_status Carp master event
May 12 16:03:37 kernel carp: VHID 100@em1: BACKUP -> MASTER (preempting a slower master)
May 12 16:03:38 php-fpm 42248 /rc.carpbackup: HA cluster member "(192.168.25.30@em1): (LAN)" has resumed CARP state "BACKUP" for vhid 100
May 12 16:03:38 php-fpm 4958 /rc.linkup: Shutting down Router Advertisment daemon cleanly
May 12 16:03:38 check_reload_status Reloading filter
May 12 16:03:38 php-fpm 42248 /rc.carpmaster: HA cluster member "(192.168.25.30@em1): (LAN)" has resumed CARP state "MASTER" for vhid 100
May 12 16:04:09 check_reload_status Linkup starting em0
May 12 16:04:09 kernel em0: link state changed to UP
May 12 16:04:10 php-fpm 42248 /rc.linkup: DEVD Ethernet attached event for wan
May 12 16:04:10 php-fpm 42248 /rc.linkup: HOTPLUG: Configuring interface wan
May 12 16:04:10 php-fpm 42248 /rc.linkup: Accept router advertisements on interface em0
May 12 16:04:10 check_reload_status Carp backup event
May 12 16:04:10 kernel em0: promiscuous mode enabled
May 12 16:04:10 kernel carp: VHID 99@em0: INIT -> BACKUP
May 12 16:04:11 php-fpm 42248 /rc.linkup: waiting for pfsync…
May 12 16:04:11 php-fpm 57358 /rc.carpbackup: HA cluster member "(116.212.222.66@em0): (WAN)" has resumed CARP state "BACKUP" for vhid 99
May 12 16:04:12 check_reload_status Carp master event
May 12 16:04:12 kernel carp: VHID 99@em0: BACKUP -> MASTER (preempting a slower master)
May 12 16:04:13 php-fpm 57358 /rc.carpmaster: HA cluster member "(116.212.222.66@em0): (WAN)" has resumed CARP state "MASTER" for vhid 99
May 12 16:04:42 php-fpm 42248 /rc.linkup: pfsync done in 30 seconds.
May 12 16:04:42 php-fpm 42248 /rc.linkup: Configuring CARP settings finalize...
May 12 16:04:42 php-fpm 42248 /rc.linkup: ROUTING: setting default route to 116.212.222.65
May 12 16:04:42 check_reload_status Restarting ipsec tunnels
May 12 16:04:45 check_reload_status updating dyndns wan
May 12 16:04:45 check_reload_status Reloading filter
-
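For anyone reading a similar log, the CARP state changes are easier to follow once the kernel lines are pulled out on their own. Below is a minimal Python sketch that does this for a few lines copied from the syslog above. The names `carp_transitions` and `TRANSITION` are made up for illustration, and the regular expression only assumes the line layout seen in this excerpt, not every format pfSense may emit.

```python
import re

# A few kernel lines copied verbatim from the syslog excerpt above.
LOG = """\
May 12 16:03:36 kernel carp: VHID 100@em1: MASTER -> BACKUP (more frequent advertisement received)
May 12 16:03:37 kernel carp: demoted by -240 to 0 (vhid removed)
May 12 16:03:37 kernel carp: VHID 100@em1: BACKUP -> MASTER (preempting a slower master)
May 12 16:04:10 kernel carp: VHID 99@em0: INIT -> BACKUP
May 12 16:04:12 kernel carp: VHID 99@em0: BACKUP -> MASTER (preempting a slower master)
"""

# Matches only "VHID <id>@<nic>: <OLD> -> <NEW>" kernel lines (assumed layout).
TRANSITION = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ [\d:]{8}) kernel carp: "
    r"VHID (?P<vhid>\d+)@(?P<nic>\w+): (?P<old>\w+) -> (?P<new>\w+)"
)


def carp_transitions(text):
    """Return (timestamp, vhid, nic, old_state, new_state) per CARP state change."""
    found = []
    for line in text.splitlines():
        m = TRANSITION.match(line)
        if m:
            found.append((m["ts"], int(m["vhid"]), m["nic"], m["old"], m["new"]))
    return found


if __name__ == "__main__":
    for ts, vhid, nic, old, new in carp_transitions(LOG):
        print(f"{ts}  vhid {vhid} on {nic}: {old} -> {new}")
```

On this excerpt the sketch makes the flap obvious: VHID 100 drops to BACKUP at 16:03:36 and is back to MASTER at 16:03:37, right after the demotion counter returns to 0.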
Found the issue: PEBKAC. The LAN interfaces had inconsistent IPv6 settings (one was set to DHCP6 and the other to None). After setting both to None, CARP failover works as expected.
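A mismatch like this is easy to miss in the GUI, so here is a rough Python sketch of how the per-interface settings of the two nodes could be compared from configuration backups. Everything in it is an assumption for illustration: the XML snippets are hypothetical, heavily trimmed stand-ins for real `config.xml` backups, the choice of which node had DHCP6 is arbitrary, "None" is assumed to be stored as a missing `ipaddrv6` element, and `interface_settings` and `mismatches` are invented helper names.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down stand-ins for each node's config.xml backup.
FW01_XML = """<pfsense><interfaces>
  <lan><if>em0</if><ipaddr>192.168.25.28</ipaddr><subnet>27</subnet><ipaddrv6>dhcp6</ipaddrv6></lan>
</interfaces></pfsense>"""
FW02_XML = """<pfsense><interfaces>
  <lan><if>em0</if><ipaddr>192.168.25.29</ipaddr><subnet>27</subnet></lan>
</interfaces></pfsense>"""


def interface_settings(xml_text):
    """Map each interface name to a dict of its direct child settings."""
    root = ET.fromstring(xml_text)
    return {
        iface.tag: {child.tag: (child.text or "") for child in iface}
        for iface in root.find("interfaces")
    }


def mismatches(a, b, ignore=("ipaddr",)):
    """List (interface, setting, value_a, value_b) where the two nodes differ.

    Settings named in `ignore` are expected to differ per node (the
    interface's own address) and are skipped.
    """
    diffs = []
    for name in sorted(set(a) & set(b)):
        for key in sorted(set(a[name]) | set(b[name])):
            if key in ignore:
                continue
            va, vb = a[name].get(key), b[name].get(key)
            if va != vb:
                diffs.append((name, key, va, vb))
    return diffs


if __name__ == "__main__":
    fw01 = interface_settings(FW01_XML)
    fw02 = interface_settings(FW02_XML)
    for name, key, va, vb in mismatches(fw01, fw02):
        print(f"{name}.{key}: FW01={va!r} FW02={vb!r}")
```

With the sample data above, the only difference reported is `lan.ipaddrv6`, which is the kind of inconsistency that caused the problem here.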