CARP both becoming master on a subnet
I am having a pretty weird issue and would really appreciate any help.
My issue is "similar" to this one:
Here is what happens:
I have a DMZ that is also in HA. Sometimes both boxes become MASTER for the gateway VIP, and the network "goes down". To fix this we have to disconnect the wifi router from the switch; once we do, everything goes back to normal.
Here is my environment:
2 virtual pfSense instances on Hyper-V, one on each host.
Each pfSense has 5 network interfaces:
WAN2 - NOT IN USE YET
I have CARP VIPs for the WAN, LAN and DMZ, but the issue only happens on the DMZ network
The connectivity is as follows:
INTERNET LINK -> dumb 8-port switch -> 2 cables from this switch, each going to the Hyper-V host port dedicated to the WAN interface of one pfSense
LAN -> connected to 2 managed switches that make up the LAN. Each Hyper-V host has a NIC team of 2 network interfaces. The managed switches have no configuration in place (they are "dumb" for now).
SYNC -> also connected to the LAN switches. These are separate network cards, but they also carry Hyper-V replication and Hyper-V host management traffic. I am willing to put the sync on a VLAN, still using the same cable.
DMZ -> one dedicated network card on each Hyper-V host -> each one connected to another dumb 8-port switch. The wifi router is also connected to this switch.
So the other VIPs are not affected, but the DMZ CARP VIP gets lost. Any recommendations or ideas on this scenario are very welcome.
I have network captures (what should I look for?) during the issue as well as images of the system logs (last time only box 2 logged events).
Thanks in advance! This is driving me crazy.
Cases like this are almost certainly the switch/vswitch.
The backup can only assume the master role if it stops seeing the advertisements from the primary, or if they arrive too slowly. If both systems are online then the only thing that can do that is the switch between them on that segment.
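To put a rough number on "stops seeing the advertisements", here is a minimal Python sketch of the backup's takeover timer. It assumes the FreeBSD-style CARP timing, where a backup waits about 3 * advbase + advskew / 256 seconds without hearing the master before it promotes itself. The function names are made up for illustration, and the advbase/advskew values in the example are assumptions; use whatever is configured on your DMZ VIP.

```python
def master_down_timeout(advbase: int, advskew: int) -> float:
    """Seconds a backup waits without hearing the master before taking over.

    Assumption: FreeBSD-style CARP timing, 3 * advbase + advskew / 256.
    """
    if advbase < 1 or not 0 <= advskew <= 254:
        raise ValueError("advbase must be >= 1 and advskew in 0..254")
    return 3 * advbase + advskew / 256


def backup_takes_over(silence_seconds: float, advbase: int, advskew: int) -> bool:
    """True if a gap in received advertisements is long enough to trigger takeover."""
    return silence_seconds > master_down_timeout(advbase, advskew)


if __name__ == "__main__":
    # Assumed values for a secondary node: advbase 1, advskew 100.
    print(master_down_timeout(1, 100))
    print(backup_takes_over(2.0, 1, 100))
    print(backup_takes_over(4.0, 1, 100))
```

With values like these the window is only a few seconds, so a switch that drops or delays multicast for a short burst can be enough to produce two masters.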
So that would be the 8-port switch on the DMZ? I thought CARP would use the sync LAN for this.
Also: why does it only happen when I put the wifi router on it? Is there a way to find the root cause, for example with the network captures?
CARP heartbeats to determine master status happen on the interface where the VIPs reside. CARP has nothing to do with the sync interface, which is only for XMLRPC and pfsync (state sync) traffic.
Beyond that it's hard to tell what it might be; it all depends on your L2 gear and setup. Something on the wifi router might be tripping a security measure in the switch, like broadcast or multicast storm control, which causes the CARP heartbeats to be lost. That's just one possible scenario.
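On the capture question: one thing worth checking is whether the primary's CARP advertisements (IP protocol 112, sent to multicast 224.0.0.18) keep arriving on the secondary's DMZ interface while the wifi router is plugged in. The sketch below is only an illustration in Python. It assumes you have already exported the arrival times of those advertisements from the capture as a list of seconds, and it flags any silence longer than a timeout. The function name, the 3.0 second default and the sample numbers are all made up for the example.

```python
from typing import List, Tuple


def find_advert_gaps(
    timestamps: List[float], timeout: float = 3.0
) -> List[Tuple[float, float, float]]:
    """Return (last_seen, next_seen, gap) for every gap longer than timeout."""
    ordered = sorted(timestamps)
    gaps = []
    # Compare each arrival time with the one that follows it.
    for prev, cur in zip(ordered, ordered[1:]):
        gap = cur - prev
        if gap > timeout:
            gaps.append((prev, cur, gap))
    return gaps


if __name__ == "__main__":
    # Made-up arrival times (seconds) for illustration only, not real capture data.
    sample = [0.0, 1.0, 2.0, 6.5, 7.5]
    for last_seen, next_seen, gap in find_advert_gaps(sample):
        print(f"no advertisement between {last_seen} and {next_seen} ({gap:.1f}s)")
```

If the gaps line up with the times both boxes went MASTER, the advertisements are being lost or delayed somewhere between the two DMZ NICs, which points back at the switch or vswitch on that segment.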
Thanks, I will continue my investigation. If I have any further information or questions I will get back to this topic.