CARP IP failover on WAN/LAN ping fail?



  • I have two pfSense units running, each connected to a separate switch for redundancy. I didn't think about this problem until I ran into it a couple of days ago. I have CARP IPs set up on this HA pair (pfSense 2.3.4) so that if one unit "goes down" (I originally only had the physical unit in mind, e.g. a loss of power or a failed hard drive), the "backup" pfSense picks up the slack.

    The other day one of my switches went down, but the pfSense on that switch didn't fail over to the "backup" pfSense.

    So does pfSense ONLY fail over to the backup when the "physical" machine goes down? Does that mean CARP uses the "peer IP" on the SYNC port to detect a problem, and only fails over when it cannot ping the other unit?

    With that in mind, is there a way to have pfSense fail over if it cannot ping, say, 8.8.8.8 or some other IP?

    I can't find any setting for that purpose, though.

    Thank you!


  • Netgate

    No. pfSense HA is router (layer 3) redundancy, not switch (layer 2) redundancy. If the link stays up, the CARP master has no idea anything is wrong and no way to find out. You can end up with a MASTER/MASTER split brain.
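    To make the split-brain point concrete, here is a simplified Python model of CARP election timers. It assumes the FreeBSD carp(4) behaviour pfSense builds on: each node sends multicast advertisements on every CARP interface (not a ping to the peer over the SYNC link), and a BACKUP promotes itself to MASTER after hearing nothing for roughly 3 × advbase + advskew/256 seconds. The skew values (0 on the primary, 100 on the secondary), the class and the function names are illustrative assumptions; preemption and demotion on link loss are deliberately not modelled.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class CarpNode:
    """One firewall's view of a single CARP VIP (simplified, hypothetical model)."""
    name: str
    advbase: int = 1          # base advertisement interval in seconds
    advskew: int = 0          # 0..254; higher skew = less preferred
    state: str = "BACKUP"
    silent_for: float = 0.0   # seconds since an advertisement was last heard

    def master_down_timeout(self) -> float:
        # Assumed carp(4) rule: 3 * advbase + advskew/256 seconds of silence.
        return 3 * self.advbase + self.advskew / 256

    def tick(self, dt: float, heard_advert: bool) -> None:
        """Advance time by dt; a BACKUP that hears nothing long enough promotes."""
        if heard_advert:
            self.silent_for = 0.0
            return
        self.silent_for += dt
        if self.state == "BACKUP" and self.silent_for >= self.master_down_timeout():
            self.state = "MASTER"


def simulate_hung_switch(seconds: float, step: float = 0.5) -> Tuple[str, str]:
    """Switch hangs with links up: no advertisements get through."""
    primary = CarpNode("primary", advskew=0, state="MASTER")
    secondary = CarpNode("secondary", advskew=100, state="BACKUP")
    elapsed = 0.0
    while elapsed < seconds:
        # The primary's link never drops, so nothing tells it to demote;
        # the secondary simply stops hearing the primary's advertisements.
        primary.tick(step, heard_advert=False)
        secondary.tick(step, heard_advert=False)
        elapsed += step
    return primary.state, secondary.state


print(simulate_hung_switch(2))  # ('MASTER', 'BACKUP')
print(simulate_hung_switch(5))  # ('MASTER', 'MASTER') -> split brain
```

    In this model the primary keeps its link and stays MASTER while the secondary times out and also becomes MASTER, which is the split brain described above.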

    Please see this post for more about the sync port and other relevant information:

    https://forum.pfsense.org/index.php?topic=136085.msg744802#msg744802



  • Thank you very much for the reply. I posted in another section of this forum about using LAGG. For my scenario, I have:

    1. Two pfSense units, each with 4 ports (3 in use: 1 WAN, 1 LAN and 1 SYNC). If I move to LAGG, I would probably use 5 ports (2 teamed ports for WAN, 2 teamed ports for LAN and 1 direct link between the two units).

    2. Two switches, layer 3 I think (Supermicro 48-port switches).

    3. Currently each pfSense goes into a different 48-port switch. I was hoping that when a switch fails (e.g. loses power, or a physical port or cable fails), the pfSense pair would fail over, and in my testing it does.

    The main issue now is when the switch simply "hangs": the interface stays up but the switch's processor is hung, and then my setup no longer fails over.

    Even if I use LAGG and split the WAN/LAN ports across switches (e.g. connect one LAN/WAN pair of each pfSense to one Supermicro and the other pair to the other), will the team fail over if the switch hangs again?
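    On the LAGG question, a sketch of a link-state-based failover rule shows why it would not help with a hung switch. It assumes the FreeBSD lagg(4) "failover" protocol, which sends traffic through the first port whose link is up; the interface names igb0/igb1 and the switch_forwards flag are hypothetical. LACP mode exchanges LACPDUs with the switch and can behave differently; it is not modelled here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Port:
    name: str
    link_up: bool          # what the NIC reports (all this rule looks at)
    switch_forwards: bool  # whether the switch behind it actually passes frames


def failover_active_port(ports: List[Port]) -> Optional[Port]:
    """Simplified failover rule: first port (in priority order) with link up."""
    for port in ports:
        if port.link_up:
            return port
    return None


# Hypothetical LAN lagg: igb0 goes to the hung switch, igb1 to the healthy one.
lan_lagg = [
    Port("igb0", link_up=True, switch_forwards=False),
    Port("igb1", link_up=True, switch_forwards=True),
]
active = failover_active_port(lan_lagg)
print(active.name, active.switch_forwards)  # igb0 False
```

    Because the hung switch still shows link, the rule keeps choosing igb0 even though nothing is forwarded behind it.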

    It seems I really need pfSense to do some sort of "monitoring" (e.g. ping) to trigger the failover, rather than just detecting whether the interface is up or down. Is there such a function in pfSense?

    Thank you very much for your help!


  • Netgate

    No, that is a layer 2 failure: the link is up but the switch doesn't forward traffic. That is a layer 2 problem, not a layer 3 problem, and it is not the router's job to work around layer 2 problems.

    Attached is how you might design around that.

    Hint: Use switches that don't have a tendency to "hang."




    Like Derelict says, checking for general connectivity (like pinging 8.8.8.8) is outside the scope of what CARP sets out to do. CARP is primarily for handling the failure of an entire host rather than the failure of a single link.

    If you really need this, you could probably script something to do it for you, but it's not something pfSense will do out of the box.
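    A minimal sketch of such a script, assuming a Python 3 interpreter is available on the firewall. It pings a target and uses hysteresis (a number of consecutive failures before acting, the same number of consecutive successes before undoing) so one dropped packet doesn't cause a failover. The PingMonitor class, the run() function, the threshold of 3, the 5-second interval and the FreeBSD-style ping -c 1 -t 2 invocation are all assumptions; on Linux, -t sets the TTL rather than a timeout. run() only prints what it would do.

```python
import subprocess
import time
from typing import Optional


class PingMonitor:
    """Hypothetical hysteresis: act only after `threshold` results in a row."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.failures = 0
        self.successes = 0
        self.in_maintenance = False

    def observe(self, ok: bool) -> Optional[str]:
        """Feed one probe result; return 'enter', 'leave' or None."""
        if ok:
            self.successes += 1
            self.failures = 0
            if self.in_maintenance and self.successes >= self.threshold:
                self.in_maintenance = False
                return "leave"
        else:
            self.failures += 1
            self.successes = 0
            if not self.in_maintenance and self.failures >= self.threshold:
                self.in_maintenance = True
                return "enter"
        return None


def ping_once(host: str, timeout_s: int = 2) -> bool:
    """One echo request via the system ping (FreeBSD: -t is a timeout in seconds)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-t", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def run(host: str = "8.8.8.8", interval_s: float = 5.0, threshold: int = 3) -> None:
    """Poll forever, printing a decision whenever the state changes."""
    monitor = PingMonitor(threshold)
    while True:
        action = monitor.observe(ping_once(host))
        if action is not None:
            print(f"would {action} CARP maintenance mode")
        time.sleep(interval_s)
```

    Running this on both nodes against the same external target needs care: an upstream outage would trip both monitors at once.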

    The easiest way to act on a ping failure would probably be to put the node into CARP maintenance mode, which can be done from the shell with pfSsh.php run enablecarpmaint and undone with pfSsh.php run disablecarpmaint. I'd test this in a lab before putting it into production, though.
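    Tying "enter"/"leave" decisions to the pfSsh.php commands could look like the following hypothetical helper. It assumes pfSsh.php is on the shell's PATH, and it defaults to a dry run that only prints the command, in keeping with testing in a lab first.

```python
import subprocess
from typing import List

# Commands taken from the post; the action names are a hypothetical convention.
COMMANDS = {
    "enter": ["pfSsh.php", "run", "enablecarpmaint"],
    "leave": ["pfSsh.php", "run", "disablecarpmaint"],
}


def carp_maintenance(action: str, dry_run: bool = True) -> List[str]:
    """Return (and, unless dry_run, execute) the command for `action`."""
    if action not in COMMANDS:
        raise ValueError(f"unknown action: {action!r}")
    cmd = COMMANDS[action]
    if dry_run:
        print("dry run:", " ".join(cmd))
    else:
        # check=True raises CalledProcessError if the command fails.
        subprocess.run(cmd, check=True)
    return cmd
```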




  • Thank you guys for the clarification!