Debug CARP backup promoting itself?
-
I have two Netgate XG-1541 boxes running 2.4.4-RELEASE-p3 (amd64, the factory-installed image) with several CARP interfaces.
All of the LAN-facing interfaces are on the same 10 Gb port as dot1q VLANs.
The "main" LAN connection, which is the untagged default VLAN, keeps timing out on the backup, so both boxes go to "MASTER" in the CARP status table.
None of the tagged VLANs have this problem.
Checking both boxes and the switch, no errors or drops are listed.
All CARP VIPs use an advertising frequency (base) of 1; the skew is 0 on the main pfSense box and 100 on the backup.
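For reference, here is a minimal Python sketch of the timers these settings produce. It assumes the behaviour of FreeBSD's carp(4), which pfSense is built on, as I understand it: a node in MASTER state advertises every advbase + advskew/256 seconds, and a BACKUP promotes itself after about three advertisement periods without hearing a better advertisement. The demotion counter is ignored, and the function names are just for illustration.

```python
# Assumption: mirrors the carp(4) timers as I understand them (demotion ignored).
# advbase is in whole seconds; advskew is in 1/256ths of a second.


def advert_interval(advbase: int, advskew: int) -> float:
    """Seconds between advertisements sent by a node in MASTER state."""
    return advbase + advskew / 256


def master_down_timeout(advbase: int, advskew: int) -> float:
    """Seconds a BACKUP waits without hearing the master before promoting itself."""
    return 3 * advbase + advskew / 256


if __name__ == "__main__":
    # Values from this setup: base 1 on both, skew 0 on the primary, 100 on the backup.
    for name, base, skew in [("primary", 1, 0), ("backup", 1, 100)]:
        print(f"{name}: advertises every {advert_interval(base, skew):.3f}s as MASTER, "
              f"promotes after {master_down_timeout(base, skew):.3f}s of silence")
```

With these settings the backup should only take over after roughly 3.4 s without hearing the primary, so a backup going MASTER means it is missing several advertisements in a row, not just one dropped packet.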
I'm not sure whether moving the default LAN to a tagged VLAN would change this, so I'm hoping to do some debugging first.
Using tcpdump -i ix1 -ttt -n proto CARP, I can see announcements from both systems on the primary pfSense:
00:00:00.224205 IP 10.1.0.252 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 100, authtype none, intvl 1s, length 36
00:00:00.777324 IP 10.1.0.251 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 0, authtype none, intvl 1s, length 36
00:00:00.623024 IP 10.1.0.252 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 100, authtype none, intvl 1s, length 36
00:00:00.377403 IP 10.1.0.251 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 0, authtype none, intvl 1s, length 36
But on the secondary I only see:
00:00:01.438838 IP 10.1.0.252 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 100, authtype none, intvl 1s, length 36
00:00:01.432539 IP 10.1.0.252 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 100, authtype none, intvl 1s, length 36
00:00:01.395071 IP 10.1.0.252 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 100, authtype none, intvl 1s, length 36
00:00:01.431747 IP 10.1.0.252 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 100, authtype none, intvl 1s, length 36
When I check the VLAN interfaces (ix0.400, for example), I see only the announcements from the primary firewall.
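To compare what each box actually hears, a small Python sketch can parse tcpdump output like the captures above and report, per vhid, which advertisers were seen other than the capturing firewall itself. It assumes tcpdump's VRRPv2 decode of CARP, where "vrid" carries the CARP vhid and "prio" carries the advskew (which matches prio 0 from the primary and prio 100 from the backup here). The script name and function names are hypothetical.

```python
import re
import sys
from collections import defaultdict

# Matches tcpdump's VRRPv2 decode of a CARP advertisement, e.g.
# "00:00:00.224205 IP 10.1.0.252 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 100, ..."
# Assumption: "vrid" is the CARP vhid and "prio" is the advskew.
ADVERT_RE = re.compile(
    r"^(?P<delta>\d+:\d+:\d+\.\d+) IP (?P<src>[\d.]+) > 224\.0\.0\.18: "
    r"VRRPv2, Advertisement, vrid (?P<vhid>\d+), prio (?P<skew>\d+)"
)


def senders_by_vhid(lines):
    """Return {vhid: {source_ip: advskew}} for every CARP advertisement in the capture."""
    seen = defaultdict(dict)
    for line in lines:
        match = ADVERT_RE.match(line.strip())
        if match:  # skip tcpdump banners and unrelated lines
            seen[int(match["vhid"])][match["src"]] = int(match["skew"])
    return dict(seen)


def heard_from_others(lines, own_ip):
    """For each vhid, the advertisers seen other than the capturing firewall itself."""
    return {vhid: set(srcs) - {own_ip} for vhid, srcs in senders_by_vhid(lines).items()}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 carp_heard.py OWN_IP < capture.txt
    own_ip = sys.argv[1]
    for vhid, others in sorted(heard_from_others(sys.stdin, own_ip).items()):
        print(f"vhid {vhid}: heard from {', '.join(sorted(others)) or 'nobody else'}")
```

Run against the secondary's capture with 10.1.0.252 as its own address, it reports that vhid 1 was heard from nobody else, even though the primary's capture shows 10.1.0.251 advertising. Since only a MASTER sends advertisements, the backup appearing in the capture at all means it has already promoted itself, and its roughly 1.4 s spacing is consistent with a base of 1 and a skew of 100 (1 + 100/256 ≈ 1.39 s).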
I've verified on the switch that both of the ports connected to the firewalls are in multicast group 224.0.0.18. I have also noticed that the CLI output of my Extreme switches lists the "Sender" as 10.1.0.252 (the IP of the backup firewall).
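When comparing this against the switch's multicast or IGMP snooping tables, it can help to know which layer-2 address the CARP group maps to, since some switches may show forwarding entries by MAC rather than by group IP. A minimal sketch of the standard RFC 1112 mapping (01:00:5e followed by the low 23 bits of the group address); how the Extreme CLI actually presents its tables is not assumed here, and the function name is illustrative.

```python
import ipaddress


def ipv4_multicast_mac(group: str) -> str:
    """Map an IPv4 multicast group to its Ethernet MAC: 01:00:5e + low 23 bits (RFC 1112)."""
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not an IPv4 multicast address")
    low23 = int(addr) & 0x7FFFFF  # only the low 23 bits survive the mapping
    octets = [0x01, 0x00, 0x5E, (low23 >> 16) & 0xFF, (low23 >> 8) & 0xFF, low23 & 0xFF]
    return ":".join(f"{octet:02x}" for octet in octets)


if __name__ == "__main__":
    print(ipv4_multicast_mac("224.0.0.18"))
```

224.0.0.18 maps to 01:00:5e:00:00:12. Because only 23 bits are kept, other groups such as 239.128.0.18 share that same MAC.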
Any ideas on where to continue debugging this?
[Quick edit] I restarted the secondary box and for the moment things seem fine, but I'll keep an eye on it and update this post as needed.
Many Thanks!
-
This popped up again last night. The BACKUP promoted itself to MASTER while the MASTER remained MASTER.
Same situation: the master sees announcements from both firewalls, and the backup only sees its own.
Any hints?
Thanks!
-
That can pretty much only be a layer 2 issue. Investigate your switch. Especially if it has any "smart" multicast or broadcast features like storm control.
-
@jimp said in Debug CARP backup promoting itself?:
That can pretty much only be a layer 2 issue. Investigate your switch. Especially if it has any "smart" multicast or broadcast features like storm control.
Thanks
I found some debugging steps for my switches (Extreme Networks), ended up turning off IGMP snooping, and CARP started working again.
IGMP snooping was enabled on all VLANs, but only one was having problems. Go figure, but at least we're back in business.
Thanks!