Packet filter fixes CARP echoes?
Given my setup, I was expecting to run into trouble with CARP multicast echoing back onto the master. In fact, when I turn off the packet filter (System->Advanced->Firewall/NAT->Disable all packet filtering) I can see the symptoms appear: the backup router stays in backup mode and the primary router goes nuts switching between master and backup. This is expected, as the multicast packets saying "I'm alive and well" are being echoed back to me. It happens because my VM is hooked into a promiscuous port group that also has redundant physical uplinks to different interconnected switches. The multicast packets go all the way up the tree to the switch crosslinks, after which they come down again through another interface, back into my VM. This causes the master router to receive its own packet saying "I'm alive". Since that packet also has zero skew, the master immediately makes room for, uhm well, itself. Final result: Master -> Backup -> Master -> Backup -> etc., at the precise interval of my set base value.
I was already working my way through the options, deciding whether it would be better to hack my way around with VMware's Net.ReversePathFwdCheckPromisc setting (it tends to get reset if you change other VDS stuff) or to just buy switches capable of teaming the server NICs across the different switches (Distributed Trunking, Distributed EtherChannel, or whatever other vendors call it).
Until I noticed something strange: enabling the firewall filters corrected the problem. Nowhere can I find information on this "fix", only more people with the problem that I now no longer have. After further investigation I can conclude that I have no special rules. Even removing all the rules and placing * * ALLOW ALL doesn't break it. I can ping all the virtual CARP IPs, internal and external. Even the aliases are working.
Now you may ask why I don't just count my blessings and enjoy the system. It's simple: I don't understand why this is working, and that makes me nervous, let alone pushing this into production.
Does anybody have any clue as to why this works when it shouldn't?
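To make the flapping easier to reason about, here is a minimal toy model in Python. It is a sketch under stated assumptions, not the real CARP state machine: the names `CarpNode`, `simulate` and `down_after` are made up for illustration, the master-down timeout is shortened to a single interval to mirror the flapping rate described above (real implementations, as far as I know, wait longer), and a master is assumed to yield to any advertisement at least as good as its own.

```python
from dataclasses import dataclass, field

MASTER, BACKUP = "MASTER", "BACKUP"


@dataclass
class CarpNode:
    """Toy model of one CARP router; not the real state machine."""
    advbase: int = 1          # advertisement base interval (seconds)
    advskew: int = 0          # skew; lower means higher priority
    down_after: int = 1       # ASSUMPTION: silent intervals before taking over
    state: str = MASTER
    silent_for: int = 0       # intervals since the last advertisement was heard
    history: list = field(default_factory=list)

    def on_advert(self, advbase: int, advskew: int) -> None:
        """Handle a received advertisement, which may be our own echo."""
        self.silent_for = 0
        # Simplification: a master yields to any advertisement that is at
        # least as "good" as its own. An echo is exactly as good, so it yields.
        if self.state == MASTER and (advbase, advskew) <= (self.advbase, self.advskew):
            self.state = BACKUP

    def tick(self, echo: bool) -> None:
        """Advance the model by one advertisement interval."""
        if self.state == MASTER:
            if echo:
                # The network reflects our own advertisement back to us.
                self.on_advert(self.advbase, self.advskew)
        else:
            self.silent_for += 1
            if self.silent_for >= self.down_after:
                self.state = MASTER  # nobody else is advertising: take over
        self.history.append(self.state)


def simulate(intervals: int, echo: bool) -> list:
    """Run a single would-be master for a number of intervals."""
    node = CarpNode()
    for _ in range(intervals):
        node.tick(echo)
    return node.history


if __name__ == "__main__":
    print("with echo:   ", simulate(6, echo=True))
    print("without echo:", simulate(6, echo=False))
```

With the echo enabled the node alternates between the two states on every interval; without it, it simply stays master.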
All responses are highly welcome.
2x pfsense 2.0.1-release-p6 (amd64).
2x vSphere 5.0 1G RAM, 10G DISK, 3x NIC (WAN,LAN,SYNC) VMs.
2x ProCurve 2824 switches.
4x Gigabit port crosslink trunk. (Also VLAN Trunk)
Dedicated VDS portgroups for WAN, SYNC and LAN.
Port groups have promiscuous mode enabled.
Port groups have MAC Address changes enabled.
Port groups have Forged Transmits enabled.
Dedicated VLAN for LAN.
Dedicated VLAN for SYNC.
Floating ALLOW ALL quick rule for all ICMP traffic.
A big question mark in my head.
And if you have pf on, our rules filter CARP out from (self) so it can't see the reflections.
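The idea behind that filter can be sketched in a few lines of Python. This is only an illustration of the logic, not the actual pf ruleset: `Packet`, `should_drop` and `filter_inbound` are hypothetical names, and the addresses are example values. The only facts relied on are that CARP uses IP protocol 112 and that a reflected advertisement carries one of the router's own addresses as its source.

```python
from dataclasses import dataclass

CARP_PROTO = 112  # IP protocol number used by CARP (shared with VRRP)


@dataclass(frozen=True)
class Packet:
    proto: int  # IP protocol number
    src: str    # source IP address
    dst: str    # destination IP address


def should_drop(pkt: Packet, self_addrs: set) -> bool:
    """Drop inbound CARP packets that claim to come from ourselves."""
    return pkt.proto == CARP_PROTO and pkt.src in self_addrs


def filter_inbound(packets: list, self_addrs: set) -> list:
    """Return only the packets that survive the self-filter."""
    return [p for p in packets if not should_drop(p, self_addrs)]


if __name__ == "__main__":
    me = {"192.0.2.1"}  # example address of this router
    inbound = [
        Packet(CARP_PROTO, "192.0.2.1", "224.0.0.18"),  # our own echo
        Packet(CARP_PROTO, "192.0.2.2", "224.0.0.18"),  # the peer's advertisement
        Packet(1, "192.0.2.1", "192.0.2.1"),            # ICMP, not CARP
    ]
    for p in filter_inbound(inbound, me):
        print(p)
```

If a check like this runs before any user-defined rules, that would also explain why replacing the user rules with a plain allow-all does not bring the flapping back; that ordering is an assumption here, not something confirmed in this thread.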
Thank you for your quick reply. I am aware of the Net.ReversePathFwdCheckPromisc setting and have implemented it successfully. In the future, though, we are going to buy new switches supporting multi-switch LAGG, as we have experienced that settings like ReversePathFwdCheckPromisc tend to get reset by patches, config changes, or other upgrades.
My question was how, without the above setting in place, my setup could have worked properly. It clearly shouldn't have, but it did. In the meantime, however, our engineers have come up with an answer. Apparently we had IGMP Snooping enabled on our physical switches. This confined the CARP multicast to only those interfaces that had previously been used to send multicast from. Put differently, the "I'm alive" echo was blocked on our physical switches by IGMP Snooping. Only after vMotioning the pfSense boxes around, and thus forcing them to use a different physical NIC on the VDS, did we see problems. What happens is that IGMP Snooping still has the previous NIC in its table, causing the multicast packet to be echoed back through another leg.
In summary, we were experiencing the echo problem as expected. It's just that IGMP Snooping fixed/hid it until ESXi decided to use another real NIC. (We use balancing based on physical NIC load.)
By the way, contrary to my expectation, IGMP Snooping didn't break our CARP setup entirely. Is this because our (stupid) switches add any NIC sending multicasts to the snooping table? Or is there a CARP-aware IGMP client running inside pfSense somewhere?
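The stale-table effect can be sketched with a small Python toy model. Everything here is an assumption made for illustration: `SnoopingSwitch`, `echo_reaches_host` and the port names are invented, and the learning rule (any port that sends to a group is added to that group's table) is only our guess at what the switches do, not standard IGMP Snooping behaviour. The crosslink is modelled as a port that always receives multicast.

```python
CARP_GROUP = "224.0.0.18"  # multicast address used by CARP advertisements


class SnoopingSwitch:
    """Toy switch: forwards a group only to learned ports plus flood ports."""

    def __init__(self, flood_ports):
        self.flood_ports = set(flood_ports)  # e.g. the crosslink trunk
        self.members = {}                    # group -> set of learned ports

    def receive(self, group, ingress):
        """Return the egress ports for a frame, then learn the ingress port."""
        learned = self.members.setdefault(group, set())
        out = (learned | self.flood_ports) - {ingress}
        learned.add(ingress)  # ASSUMPTION: senders are learned as members
        return out


def echo_reaches_host(first, second, group, uplink_in, uplink_other):
    """Send one frame via uplink_in; report whether it returns via uplink_other."""
    if "xlink" not in first.receive(group, uplink_in):
        return False
    return uplink_other in second.receive(group, "xlink")


def run_scenario():
    """Host has uplink hostA on sw1 and hostB on sw2; switches share 'xlink'."""
    sw1, sw2 = SnoopingSwitch({"xlink"}), SnoopingSwitch({"xlink"})
    # Before the move: the VM transmits via hostA, sw2 never learned hostB.
    before = echo_reaches_host(sw1, sw2, CARP_GROUP, "hostA", "hostB")
    # After the move: the VM transmits via hostB, but sw1 still remembers hostA.
    after = echo_reaches_host(sw2, sw1, CARP_GROUP, "hostB", "hostA")
    return before, after


if __name__ == "__main__":
    print(run_scenario())
```

In this model the first transmission produces no echo, while the transmission after the uplink change comes back through the port that is still in the other switch's table.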
There wouldn't have been anything for CARP+IGMP on pfSense; it must have been something the switches were doing.
I've seen some switches do some really funky things with multicast in the past, so that doesn't surprise me at all…
Wouldn't it be prudent for pfSense to have a CARP-aware IGMP client, so it can correctly register its multicast membership with the local switch? This would allow CARP on pfSense to become compatible with IGMP Snooping. It might also lift the requirement on ESXi to set the VDS port group to promiscuous mode. I know Windows 2k8 R2 NLB (Network Load Balancing) uses multicast together with an IGMP client, and that runs just fine without compromising security through promiscuous mode.
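To illustrate what "registering multicast membership" means at the socket level, here is a generic Python sketch. It is not how pfSense or the CARP kernel code works; `build_mreq` and `join_group` are names made up for this example. It uses the standard `IP_ADD_MEMBERSHIP` socket option, which asks the kernel to join a group and leaves any IGMP signalling to the kernel. Whether the OS actually emits a membership report for a link-local group such as 224.0.0.18, and whether a given switch acts on it, are things to verify rather than assume.

```python
import socket

CARP_GROUP = "224.0.0.18"  # multicast address used by CARP advertisements


def build_mreq(group: str, iface_addr: str = "0.0.0.0") -> bytes:
    """Pack an ip_mreq structure: group address followed by interface address."""
    return socket.inet_aton(group) + socket.inet_aton(iface_addr)


def join_group(sock: socket.socket, group: str = CARP_GROUP,
               iface_addr: str = "0.0.0.0") -> None:
    """Ask the kernel to join `group`; the kernel handles any IGMP signalling."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    build_mreq(group, iface_addr))


if __name__ == "__main__":
    # Needs a multicast-capable interface; "0.0.0.0" lets the OS pick one.
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    join_group(s)
    print("joined", CARP_GROUP)
    s.close()
```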
Maybe a feature request?
Thanks for sharing your thoughts!