CARP dual master for a short period
I've been running two dedicated Xeon systems in a CARP environment (pfSense 2.4.5-RELEASE-p1).
The systems use Intel X710 10G cards with optical links to our upstream devices, two Nokia 7750s that act as our gateway to the Internet. They run MLAGG. They also run VRRP, which is nasty because VRRP uses the same IP protocol number as CARP (112). Thank you, IANA.
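Because both protocols share protocol number 112, pf and filterlog label CARP traffic as "VRRP". A rough way to tell them apart on IPv6 is the version nibble in the first payload byte: VRRP over IPv6 is version 3 (RFC 5798), while CARP sends version 2 in a fixed 36-byte header. The sketch below assumes the header layout from FreeBSD's `netinet/ip_carp.h`; the function names are hypothetical and this is a heuristic, not how pf itself classifies packets.

```python
import struct

# Assumed from FreeBSD's netinet/ip_carp.h (pfSense 2.4.x is FreeBSD based).
CARP_VERSION = 2
CARP_HEADER_LEN = 36  # 8 fixed bytes + 8-byte counter + 20-byte HMAC-SHA1 digest


def classify_proto112(payload: bytes) -> str:
    """Heuristically label an IPv6 protocol-112 payload as CARP or VRRPv3."""
    if not payload:
        return "unknown"
    version = payload[0] >> 4  # both protocols keep the version in the high nibble
    if version == CARP_VERSION and len(payload) == CARP_HEADER_LEN:
        return "carp"
    if version == 3:
        return "vrrpv3"
    return "unknown"


def parse_carp_header(payload: bytes) -> dict:
    """Decode the fixed first 8 bytes of a CARP advertisement (network byte order)."""
    if len(payload) < CARP_HEADER_LEN:
        raise ValueError("payload too short for a CARP header")
    ver_type, vhid, advskew, _authlen, _pad, advbase, _cksum = struct.unpack(
        "!BBBBBBH", payload[:8]
    )
    return {
        "version": ver_type >> 4,
        "type": ver_type & 0x0F,  # 1 = advertisement
        "vhid": vhid,
        "advskew": advskew,
        "advbase": advbase,
    }
```

The 36-byte length matches the length field (36) in the filterlog entries from the master.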
ixl0 and ixl2 form a LAGG (LACP), with plenty of tagged VLANs on top of it.
The system has been running fine for months, but yesterday there was an incident in which the slave pfSense became master for a few seconds on some of the VLAN interfaces.
While searching for the cause, I found the following on the master (fw2-rx):
Sep 20 13:20:57 fw2-rx filterlog: 53,,,1000000201,lagg0.217,match,block,in,6,0xe0,0x00000,255,VRRP,112,36,fe80::3efd:feff:fee8:fecc,ff02::12,
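To count entries like this one, it helps to split the filterlog CSV into named fields. This sketch assumes the IPv6 field order from pfSense's raw filter log format (rule, sub-rule, anchor, tracker, interface, reason, action, direction, IP version, then class, flow label, hop limit, protocol name, protocol ID, length, source, destination); the class and function names are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class FilterlogV6:
    rule: str
    tracker: str
    interface: str
    reason: str
    action: str
    direction: str
    traffic_class: str
    hop_limit: int
    proto_name: str
    proto_id: int
    length: int
    src: str
    dst: str


# Accepts "filterlog: " or "filterlog[1234]: " and captures the CSV payload.
_PREFIX = re.compile(r"filterlog(?:\[\d+\])?: (?P<csv>.*)$")


def parse_filterlog_v6(line: str) -> FilterlogV6:
    m = _PREFIX.search(line)
    if m is None:
        raise ValueError("not a filterlog line")
    f = m.group("csv").split(",")
    if len(f) < 17 or f[8] != "6":  # field 8 is the IP version
        raise ValueError("not an IPv6 filterlog entry")
    return FilterlogV6(
        rule=f[0], tracker=f[3], interface=f[4], reason=f[5], action=f[6],
        direction=f[7], traffic_class=f[9], hop_limit=int(f[11]),
        proto_name=f[12], proto_id=int(f[13]), length=int(f[14]),
        src=f[15], dst=f[16],
    )
```

Applied to the line above, this yields interface `lagg0.217`, action `block`, direction `in`, protocol 112, and the master's own link-local address as the source.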
Between 13:20:57 and 13:21:40 there were 172513 of these blocks. The funny thing is that the source address fe80::3efd:feff:fee8:fecc belongs to the master itself, so the master was blocking its own IPv6 multicasts.
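For scale, the numbers above can be turned into a rate, and parsed entries can be filtered for packets that arrive inbound but carry one of the firewall's own source addresses. A minimal sketch; the function names and the dict-shaped entries are hypothetical, and a real check should also compare the interface, since fe80:: addresses are link-scoped.

```python
from datetime import datetime


def blocks_per_second(count: int, start: str, end: str) -> float:
    """Average rate between two HH:MM:SS timestamps on the same day."""
    fmt = "%H:%M:%S"
    span = (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()
    if span <= 0:
        raise ValueError("end must be after start")
    return count / span


def self_sourced_inbound(entries, own_addrs) -> int:
    """Count inbound entries whose source address belongs to this firewall."""
    return sum(1 for e in entries if e["direction"] == "in" and e["src"] in own_addrs)


rate = blocks_per_second(172513, "13:20:57", "13:21:40")
print(f"{rate:.0f} blocks/s")
```

Treating the window as 43 seconds, that is roughly 4,000 blocked packets per second, against the roughly one per second seen in normal operation.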
Meanwhile, the slave (fw3-rx) chose to become master on some VLAN interfaces, for example:
Sep 20 13:21:16 fw3-rx check_reload_status: Carp master event
Sep 20 13:21:16 fw3-rx kernel: carp: firstname.lastname@example.org: BACKUP -> MASTER (master timed out)
Sep 20 13:21:17 fw3-rx kernel: carp: email@example.com: MASTER -> BACKUP (more frequent advertisement received)
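To see which VHIDs flapped during such a window, the kernel messages can be grouped by the field before the state change, which FreeBSD's carp(4) prints as vhid@interface. The regex below is an assumption about that format (it also tolerates an optional "VHID " prefix in case it differs between versions); `17@lagg0.217` in the usage example is a made-up identifier, and the function names are hypothetical.

```python
import re
from collections import defaultdict

# Assumed format: "carp: [VHID ]<vhid>@<interface>: OLD -> NEW (reason)"
CARP_RE = re.compile(
    r"carp: (?:VHID )?(?P<vhid_if>\S+): (?P<old>[A-Z]+) -> (?P<new>[A-Z]+) "
    r"\((?P<reason>[^)]*)\)"
)


def carp_transitions(text: str):
    """Return (vhid@interface, old state, new state, reason) tuples in log order."""
    return [(m["vhid_if"], m["old"], m["new"], m["reason"]) for m in CARP_RE.finditer(text)]


def flapping(transitions, min_changes: int = 2):
    """VHIDs that changed state at least min_changes times."""
    changes = defaultdict(int)
    for vhid_if, _old, _new, _reason in transitions:
        changes[vhid_if] += 1
    return sorted(k for k, n in changes.items() if n >= min_changes)


log = (
    "Sep 20 13:21:16 fw3-rx kernel: carp: 17@lagg0.217: BACKUP -> MASTER (master timed out)\n"
    "Sep 20 13:21:17 fw3-rx kernel: carp: 17@lagg0.217: MASTER -> BACKUP "
    "(more frequent advertisement received)\n"
)
print(flapping(carp_transitions(log)))
```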
Everything returned to normal within the same minute, but some questions remain:
- Why were there so many multicasts (172513 in less than a minute)?
- Why were these multicasts being blocked? Right now I see the same multicasts being sent and they are not blocked, but there is only about one packet per second.
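The "about one packet per second" and the two reasons in the fw3-rx log follow from CARP's timers. As far as I understand FreeBSD's carp(4), a master advertises every advbase + advskew/256 seconds, a backup declares the master dead after 3 × advbase + advskew/256 seconds of silence ("master timed out"), and a master steps down when it hears a peer advertising more often ("more frequent advertisement received"). The sketch below models only that; the function names are hypothetical, and ties and demotion counters are deliberately ignored.

```python
def advert_interval(advbase: int, advskew: int) -> float:
    """Seconds between advertisements sent by a CARP master for one VHID."""
    return advbase + advskew / 256


def master_down_interval(advbase: int, advskew: int) -> float:
    """Seconds a backup waits without hearing the master before taking over."""
    return 3 * advbase + advskew / 256


def should_yield(own: tuple, peer: tuple) -> bool:
    """Simplified: a master steps down if the peer advertises strictly more often.

    own and peer are (advbase, advskew) pairs; ties and demotion are ignored.
    """
    return advert_interval(*peer) < advert_interval(*own)


# A master with advbase 1, advskew 0 sends one advertisement per second per VHID.
print(advert_interval(1, 0))
```

Under these assumptions, a backup with advskew 100 only promotes itself after about 3.4 seconds without advertisements from the master, and hands the role back as soon as it hears the master's more frequent advertisements again.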
As it turned out, a loop on one interface caused this behavior. Sad but true.