Netgate SG-3100 - LAN failure not triggering full failover to secondary appliance
I ran into an interesting problem on the Netgate SG-3100 and would like to share my experience, in the hope that it helps someone else with the same problem.
I'll start by describing our initial network infrastructure, before the fix. We have two Netgate SG-3100 appliances (primary and secondary). WAN is provided by our datacenter through two Ethernet ports at the top of the rack; each of those ports is connected to the first port of one appliance, WAN (mvneta2). We then used the OPT1 port (mvneta0) to connect the two appliances directly with an Ethernet cable for pfSense sync. Finally, we connected LAN port 1 of each appliance to one of two separate switches; the two switches are interconnected with direct Ethernet cables to form a ring. After completing the configuration and getting CARP up and running, everything appeared to work as expected.
Then we started testing failover in various scenarios. When we disconnected the WAN cable on the primary unit, the secondary unit took over as expected (all interfaces in MASTER state); when we reconnected the cable, the secondary returned to the BACKUP role. Problems started when we disconnected the LAN cable on the primary: the secondary promoted only its LAN interface to MASTER, while its WAN interface stayed in BACKUP. Because the WAN VIP did not move to the secondary, we lost internet connectivity completely.
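During tests like these, the split state (some VIPs MASTER, others BACKUP on the same unit) is what you want to spot quickly. Below is a minimal sketch that parses FreeBSD-style `ifconfig` text, where each CARP VIP shows up as a `carp: <STATE> vhid <N> ...` line under its interface, and flags a unit that holds a mix of states. The sample output is hypothetical and trimmed; the addresses are documentation ranges, and using mvneta1 as the LAN NIC is an assumption, not something stated above.

```python
import re

# Interface header line, e.g. "mvneta2: flags=8843<UP,...> metric 0 mtu 1500"
IFACE_RE = re.compile(r"^(\S+): flags=")
# CARP status line, e.g. "\tcarp: MASTER vhid 1 advbase 1 advskew 0"
CARP_RE = re.compile(r"^\s+carp: (\w+) vhid (\d+)")

# Hypothetical, trimmed output from the secondary during the LAN-pull test.
# Addresses are documentation ranges; mvneta1 as the LAN NIC is assumed.
SAMPLE = """\
mvneta2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
\tinet 192.0.2.1 netmask 0xffffff00 broadcast 192.0.2.255 vhid 1
\tcarp: BACKUP vhid 1 advbase 1 advskew 100
mvneta1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
\tinet 198.51.100.1 netmask 0xffffff00 broadcast 198.51.100.255 vhid 2
\tcarp: MASTER vhid 2 advbase 1 advskew 100
"""


def carp_states(ifconfig_output: str) -> dict[tuple[str, int], str]:
    """Map (interface, vhid) to the CARP state reported by ifconfig."""
    states: dict[tuple[str, int], str] = {}
    iface = None
    for line in ifconfig_output.splitlines():
        header = IFACE_RE.match(line)
        if header:
            iface = header.group(1)  # remember which interface we are under
            continue
        carp = CARP_RE.match(line)
        if carp and iface is not None:
            states[(iface, int(carp.group(2)))] = carp.group(1)
    return states


def is_split(states: dict[tuple[str, int], str]) -> bool:
    """True if this unit is MASTER for some VIPs and BACKUP for others."""
    return len(set(states.values())) > 1


states = carp_states(SAMPLE)
print(states)            # {('mvneta2', 1): 'BACKUP', ('mvneta1', 2): 'MASTER'}
print(is_split(states))  # True
```

On a real unit the input would be the text of `ifconfig` run from the pfSense shell or Diagnostics > Command Prompt; the webGUI page Status > CARP (failover) shows the same per-VIP states.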
The solution is really simple, but it required a lot of "debugging". We swapped the LAN and SYNC interface assignments in the pfSense webGUI and also swapped the physical Ethernet ports: SYNC is now connected to LAN port 1 on both appliances with a crossover cable, and LAN is connected to the OPT1 port (mvneta0). With this layout, CARP fails over as expected on ALL interfaces.
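To make the swap concrete, here is a small sketch that encodes the port assignments before and after as data and lists the CARP-carrying interfaces that sit on a LAN switch port. Two things are assumptions rather than facts from the setup description: that SYNC carries no CARP VIP, and that the `behind_switch` flag (LAN ports reached through an internal uplink whose link never drops) is accurate, which is only a hypothesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assignment:
    interface: str       # pfSense interface name
    port: str            # front-panel port the cable is plugged into
    has_carp_vip: bool   # assumption: SYNC carries no CARP VIP
    behind_switch: bool  # assumption: LAN ports sit behind an internal uplink


BEFORE = [
    Assignment("WAN", "WAN (mvneta2)", has_carp_vip=True, behind_switch=False),
    Assignment("SYNC", "OPT1 (mvneta0)", has_carp_vip=False, behind_switch=False),
    Assignment("LAN", "LAN1", has_carp_vip=True, behind_switch=True),
]

AFTER = [
    Assignment("WAN", "WAN (mvneta2)", has_carp_vip=True, behind_switch=False),
    Assignment("LAN", "OPT1 (mvneta0)", has_carp_vip=True, behind_switch=False),
    Assignment("SYNC", "LAN1", has_carp_vip=False, behind_switch=True),
]


def unmonitored_carp_interfaces(assignments: list[Assignment]) -> list[str]:
    """CARP interfaces whose cable could be pulled without a link-down event."""
    return [a.interface for a in assignments if a.has_carp_vip and a.behind_switch]


print(unmonitored_carp_interfaces(BEFORE))  # ['LAN']
print(unmonitored_carp_interfaces(AFTER))   # []
```

The design idea is simply to leave the non-CARP interface (SYNC) on the switch port, where a missed link-down event does no harm.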
The reason? I don't know for sure. My guess is that the four LAN ports are connected to the appliance through an internal uplink, and that uplink is never brought down, even when all external LAN ports are unplugged. If so, pfSense never sees the LAN interface go down, so the primary never demotes its other CARP VIPs (such as WAN) and keeps them as MASTER.
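The guess above fits how CARP demotion works in FreeBSD, on which pfSense is built. In carp(4), when an interface running CARP loses link, `net.inet.carp.ifdown_demotion_factor` (240 by default) is added to the global demotion counter, so every VIP on that unit advertises a worse skew and the peer preempts all of them. Without a link-down event, only the VIP whose advertisements stop arriving moves. The toy model below is a simplification, not pfSense code; it assumes preemption is enabled (`net.inet.carp.preempt=1`), equal `advbase` on both units, and advskew 0 on the primary and 100 on the secondary (typical values, assumed here).

```python
CARP_MAXSKEW = 240     # largest skew carp(4) will advertise
IFDOWN_DEMOTION = 240  # default net.inet.carp.ifdown_demotion_factor


def advertised_skew(advskew: int, demotion: int) -> int:
    """Skew actually advertised: configured advskew plus demotion, capped."""
    return min(advskew + demotion, CARP_MAXSKEW)


def secondary_state(primary_demotion: int, adverts_reach_secondary: bool,
                    primary_advskew: int = 0, secondary_advskew: int = 100) -> str:
    """CARP state of the secondary for one VIP (preemption assumed enabled)."""
    if not adverts_reach_secondary:
        # No advertisements from the primary: the master-down timer expires.
        return "MASTER"
    primary = advertised_skew(primary_advskew, primary_demotion)
    secondary = advertised_skew(secondary_advskew, 0)
    # Lower skew means more frequent advertisements, i.e. higher priority;
    # on a tie the current master (the primary) keeps the VIP.
    return "MASTER" if secondary < primary else "BACKUP"


def pull_cable(vips: list[str], pulled: str, link_down_seen: bool) -> dict[str, str]:
    """Secondary's state per VIP after the primary loses the `pulled` segment."""
    demotion = IFDOWN_DEMOTION if link_down_seen else 0
    return {
        vip: secondary_state(demotion, adverts_reach_secondary=(vip != pulled))
        for vip in vips
    }


# Original layout, assuming LAN1's uplink never reports link down:
print(pull_cable(["WAN", "LAN"], pulled="LAN", link_down_seen=False))
# {'WAN': 'BACKUP', 'LAN': 'MASTER'}
# Swapped layout: LAN on OPT1 (mvneta0), whose link does drop:
print(pull_cable(["WAN", "LAN"], pulled="LAN", link_down_seen=True))
# {'WAN': 'MASTER', 'LAN': 'MASTER'}
```

The first result matches the broken behavior described earlier, and pulling WAN with `link_down_seen=True` gives all MASTER, matching the WAN test that worked.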
The attached documents show the Netgate port layout.