CARP VIP master not self-pingable, ARP replies don't use VIP lladdr?
-
Test setup:
Two boxes running 2.0 RC3. On the LAN side they are plugged into a hub with just one other PC, and the pfSense boxes use CARP to be the .1 gateway. Native addresses are .2 and .3.
On the WAN side the two pfSense boxes are plugged into a cable modem with 3 static routable IPs, one per pfSense box and one for CARP: native addresses .250 and .251, CARP VIP .252, on a /29 subnet with the gateway at .249.
Separate cards with a crossover cable for pfsync.
Weird Puzzle 1:
Each pfSense box gets ping replies from its own interface addresses on both the WAN and LAN. Good. Each pfSense box gets ping replies from the other box's interface addresses. Good. The LAN CARP VIP is pingable from all devices on the LAN side, but not from the pfSense box that is the VIP master on the LAN interface. The WAN CARP VIP is pingable from everywhere on the internet and from the backup pfSense box, but not from the VIP master on the WAN interface. What am I missing?
Weird Puzzle 2:
Using 'arping': when boxes that are not the VIP master do an 'arping' for the VIP, the reply from the VIP master does not carry the CARP link address (lladdr) 00:00:5E:00:01:xx for VHID xx, but the interface's native address. This makes using CARP for failover impossible when the cable ISP's box requires a different MAC address for each static IP (e.g. Mediacom).
Weird Puzzle 3 (like puzzle 1): arping of a CARP VIP IPv4 address from the box that is the VIP master gets no replies, though all other boxes on the same link get replies (not with the CARP VIP link address, but with the interface address; see puzzle 2 above). No surprise, I suppose, but arping of 00:00:5E:00:01:xx gets no replies at all.
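For anyone comparing arping output against what CARP should be answering with, here is a minimal Python sketch of the expected virtual MAC for a given VHID. The function name carp_virtual_mac is only illustrative; the one fact it encodes is the 00:00:5E:00:01:xx pattern mentioned above, with xx being the VHID in hex.

```python
def carp_virtual_mac(vhid: int) -> str:
    """Return the virtual MAC expected for a CARP VHID.

    Illustrative helper (not part of pfSense): the last octet of
    00:00:5e:00:01:xx is the VHID written as two hex digits.
    """
    if not 1 <= vhid <= 255:
        raise ValueError("VHID must be between 1 and 255")
    return "00:00:5e:00:01:%02x" % vhid


if __name__ == "__main__":
    # Print the expected lladdr for a few example VHIDs.
    for vhid in (1, 2, 10):
        print(vhid, carp_virtual_mac(vhid))
```

If the MAC in the arping reply for the VIP is not the value this prints for the configured VHID, the reply carried some other address, such as the interface's native one.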
-
In the past such problems have been known to be caused by shoddy/low quality switches, such as those found on the back of modems. Using a real switch, even a small one, tends to make things smoother.
On a working VM cluster I have here, I can ping all of the master CARP VIPs from the boxes themselves.
-
Thanks for the reply. The switch involved doesn't seem to matter. The test jig uses a Netgear 1000M-capable 5-port switch on the LAN side, connecting the two routers (192…2 and 192...3), which share 192..1 via CARP. Whichever of them is the CARP master can't ping .1, while all other systems can. The system acting as the CARP master for .1 can ping its own native address, but not .1.
It seems the source MAC address on outbound frames routed via the CARP interface is not the CARP lladdr, but the interface's native address. So the Mediacom cable modem, which must bind to a specific MAC address, won't bind to the CARP lladdr (which is VHID specific). This makes pfSense useless in a failover router setup with that ISP.
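To make the check less error-prone when reading MACs off arping or a packet capture, here is a rough Python sketch that says whether an address is a CARP virtual MAC and, if so, which VHID it maps to. vhid_from_mac is a made-up helper name, and the tolerance for octets printed without leading zeros is an assumption about how some tools format addresses.

```python
from typing import Optional

# First five octets shared by every CARP virtual MAC (00:00:5e:00:01:xx).
CARP_PREFIX = ["00", "00", "5e", "00", "01"]


def vhid_from_mac(mac: str) -> Optional[int]:
    """Return the VHID if mac is a CARP virtual MAC, else None.

    Hypothetical helper for eyeballing captures; accepts ':' or '-'
    separators and octets written without leading zeros.
    """
    octets = mac.strip().lower().replace("-", ":").split(":")
    if len(octets) != 6:
        raise ValueError("not a MAC address: %r" % mac)
    # Validate each octet as hex and pad it to two digits.
    octets = ["%02x" % int(o, 16) for o in octets]
    if octets[:5] != CARP_PREFIX:
        return None  # native or otherwise non-CARP address
    vhid = int(octets[5], 16)
    return vhid if vhid != 0 else None


if __name__ == "__main__":
    # Example addresses only, not taken from the setup described here.
    for mac in ("00:00:5E:00:01:01", "00:1b:21:aa:bb:cc"):
        print(mac, "->", vhid_from_mac(mac))
```

A result of None for frames sourced from the VIP would match what is described above: the frames carry the interface's native address and not the VHID-specific one.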
I've 'worked around' the problem by setting up a third little pfSense box with just two ports, acting as an extension of the cable modem. I can't really use 1:1 NAT, so I just port forward what little I need. It's a single point of failure, but then so is the cable modem, and it only puts that ISP's connection at risk; the others are still protected by the failover router pair.
Still, if CARP could be improved to use the CARP logical link address when transmitting packets sent out the carp interface, and not the interface's native MAC address, then I could avoid maintaining an extra router and dealing with NAT issues on what should have been a native connection to the net.