CARP totally working, except that it's not



  • At wit's end here.

    Sorry sorry, long time reader, first time poster.  Love pfSense.  FreeNAS, too, if that makes me cooler.  Left ClearOS, Zentyal, and/or Ubuntu command lines for it a while back and have never doubted that move.  Great stuff.

    So here's what I've got.  Two pfSense (2.0-release, dated mid September 2011) boxes that I call router1 and router2.  A /29 block from Comcast (.169-.173 usable).

    So I read all (well, obviously not all) the tutorials, howtos, and forum posts that I could find.  I set my WAN IPs to .171 and .172 respectively.  LAN IPs to .4 and .5 respectively.  VIPs are .169 (WAN) and .1 (LAN).  VHIDs are 15 and 20, for no particular reason.  My problem also occurs when they're 1/2 and 1/3, so I don't think it matters much so far.

    When I turn everything on, the VIPs flow over from router1 (master) to router2 (backup) just as expected (+100 on the skew).  Rules and NAT also.  Everything looks peachy.  Only nobody wants to actually take the VIPs.  They seem aware of each other's presence.  If I reboot router1, router2 switches to master status almost immediately.  When router1 comes back up, it shows master status and router2 again shows backup status.  So they definitely appear to be syncing and aware of each other's online-ness.  Only nothing ever anywhere under any circumstances actually grabs the two virtual IPs and starts acting like a router on those IPs.

    Caveats.  I'm using only two NICs.  My sync interface is also my LAN interface.  Hey, all the tutorials say "strongly recommended," not "required."  They weren't built with CARP in mind and space is extremely tight.  Ummm, I have several unused serial ports available on each if there's like a null modem sync interface option somewhere.  How old school would that be?  Ooo, ooo, I hope it cares how many stop bits I'm using!  Yeah, I got your out-of-band right here.

    Both WAN ports are plugged directly into a Comcast-supplied SMC-made modem with a 4-port switch on the back.  I've heard that these (commodity CPE modems in general, not SMCs in particular) sometimes don't move CARP traffic around so well.  I have a spare Motorola modem, also with a switch on the back…I'll swap out if I have to.

    I'm happy to change either of these circumstances if it's obvious to someone else here why that's the problem.  However, I'm not inclined to start taking shots in the dark.  Adding a switch to my rack between the routers and the modem will be inconvenient.  Adding a third NIC to each will also be quite inconvenient.  I bumped up the VHIDs because I heard that occasionally an ISP will run CARP/HSRP/VRRP as well and we'll wreck each other's world.  From what I've read it seems most likely that one of those will have to happen, but if anyone has any other ideas I'd much rather find a software-based solution.  Also tried setting the direct sync to .5 instead of the multicast, no change in behavior.  Think I played with the skew a little, maybe tried turning it up to 254.  Maybe not.  It's late...

    Appreciate any pointers!



  • When you fail over to the secondary box, are you able to ping the LAN VIP?



  • No.  As far as I can tell, neither the master nor the backup is taking either the LAN or WAN VIP, regardless of whether one box or both are turned on.  I can ping any of the four real IPs as long as their associated boxes are on.  That is, .171 and .4 for router1, .172 and .5 for router2.



  • If you cannot ping the LAN VIP while you are failed over, then it probably is not the modem you have in place. What type of NICs do you have in the firewalls? What kind of switch do you have for your LAN?
    Try this: fail (power off) router1. Then unplug the LAN cable on router2 and plug it back in. This will test for ARP table issues. The MAC address of the VIP should remain the same when it fails over. It just changes ports.

    If you don't have a dedicated sync NIC, then you really want to set the "pfsync Synchronize Peer IP" option.
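
    For anyone running the ARP test above, it helps to know which MAC to look for. As far as I know, CARP reuses the VRRP IPv4 MAC prefix 00:00:5e:00:01 and puts the VHID in the last octet. A minimal Python sketch of that rule follows; the helper name `carp_virtual_mac` is made up for illustration, and the VHIDs 15 and 20 are the ones mentioned earlier in the thread.

```python
def carp_virtual_mac(vhid: int) -> str:
    """Return the virtual MAC that a CARP VIP with this VHID is expected to use."""
    if not 1 <= vhid <= 255:
        raise ValueError("VHID must be between 1 and 255")
    # Assumption stated above: fixed prefix 00:00:5e:00:01, last octet is the VHID in hex.
    return f"00:00:5e:00:01:{vhid:02x}"


# VHIDs from the original post.
for vhid in (15, 20):
    print(vhid, carp_virtual_mac(vhid))
```

    If the LAN switch or a client's ARP table shows a different MAC for the VIP, that points at an ARP or switch problem rather than at the firewall rules.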



  • Fixed it.  I was suspecting deep ARP issues, too.  Thanks for the suggestions, but it turned out to be 100% firewall irregularities on my part and I didn't realize that pfSense was quite that granular in its rule-matching.  Nice to know and it could be useful.

    And yes, podilarius, I believe I will go non-multicast on the pfsync.  I can't imagine any reason I'd want all that traffic going everywhere.  Even a few cell phones on that subnet from time to time, sooo…



  • Would you mind posting the rules issue you found and the changes you made, in case someone else runs into a similar situation?



  • Yeah, no problem.  The router had already been in production for a while and had some NAT port forwards configured, with the associated firewall rules autoconfigured.  I assumed those rules would carry right over to the CARP setup because the destination was WAN.  I went to make a new rule for some reason or another and noticed that there was a new destination choice called WAN CARP (what I had named that VIP).  When I realized the firewall was discriminating between real IPs and virtual IPs, I had my answer.  I guess I just assumed that my rules were all per-interface, but they're actually more granular than that.  I changed all my regular stuff to the CARP destination and set ICMP to pass on anything, and everything worked correctly.

    I thought I'd have to do some manual outbound rules as well, but so far that doesn't appear to be necessary.  I'll have to read more about that to know for sure, though.
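
    To illustrate the mismatch described above, here is a toy Python model of destination matching. It is only a sketch and not how pf actually evaluates rules: the 203.0.113.x prefix is a documentation placeholder (the thread only gives last octets), port 80 is a hypothetical forward, and `Rule` and `first_match` are invented names.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address
from typing import Optional, Sequence

# Placeholder prefix; only the last octets come from the thread.
WAN_ADDRESS = IPv4Address("203.0.113.171")  # router1's real WAN IP (.171)
WAN_CARP = IPv4Address("203.0.113.169")     # shared WAN VIP (.169)


@dataclass(frozen=True)
class Rule:
    description: str
    destination: IPv4Address
    port: int


def first_match(rules: Sequence[Rule], dst: IPv4Address, port: int) -> Optional[Rule]:
    """Return the first rule whose destination and port match, else None (default deny)."""
    for rule in rules:
        if rule.destination == dst and rule.port == port:
            return rule
    return None


# Rules created before CARP pointed at the interface's own address.
old_rules = [Rule("web forward (auto-created)", WAN_ADDRESS, 80)]
# After the fix, the same rule targets the CARP VIP instead.
new_rules = [Rule("web forward", WAN_CARP, 80)]

# Traffic arrives addressed to the VIP, so only the second rule set matches it.
print(first_match(old_rules, WAN_CARP, 80))
print(first_match(new_rules, WAN_CARP, 80))
```

    The point is just that a rule keyed to the WAN address never sees traffic addressed to the VIP, even though both live on the same interface.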

