pfSense 2.0.1 failover on NIC failure



  • Howdy.

    We have been testing scenarios with pfSense for a few weeks now and find it works great with CARP in many ways, but I am having trouble determining how to handle one particular failure.

    Currently we are only testing with two pfSense boxes but plan to layer in an additional two for IPS.

    Current testing environment:

    We have one internet feed. Two routers with CARP for redundancy. Two NICs in each router for the local network, combined using LAGG. The local network is an HA network, meaning every machine is plugged into both LAN1 and LAN2 in case of a switch failure.

    As it stands, I see 50% packet loss whenever I unplug NIC1 on router1.

    I am using "round robin" for the LAGG interface. In this mode the machine receives packets on all NICs in the interface but transmits by rotating "round robin" between the combined NICs.

    Anyways, the machine continues to use the unplugged NIC1 in its attempts to respond, hence my 50% packet loss.
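
    The arithmetic behind the 50% figure can be sketched in a few lines of Python. This is a toy model, not the LAGG driver: the port names `nic1` and `nic2` are hypothetical, and it assumes a dead port stays in the transmit rotation, which is the behaviour described here.

```python
from itertools import cycle


def round_robin_loss(ports, link_up, packets=1000):
    """Fraction of packets lost when frames rotate over every port, dead or not."""
    rotation = cycle(ports)
    lost = sum(1 for _ in range(packets) if not link_up[next(rotation)])
    return lost / packets


# Hypothetical port names; nic1 plays the unplugged NIC.
loss = round_robin_loss(["nic1", "nic2"], {"nic1": False, "nic2": True})
print(f"{loss:.0%} packet loss")
```

    With two ports and one of them down, every other frame goes to the dead port, so half the traffic is lost.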

    Any ideas how to keep the LAGG interface from responding on a dead NIC?

    Seems like I may need to script something together to detect the issue, because the LAGG driver does not seem to, and then maybe communicate between the routers to fail over or take down all the first NICs (emulating a switch failure).
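
    A detection script along these lines might start by reading the `status:` line that `ifconfig` prints for each member NIC. The sketch below is only a starting point under stated assumptions: the interface names `em0` and `em1` are hypothetical, the sample text is illustrative rather than captured from a real box, and what to do once a dead member is found (fail over, take ports down) is left out.

```python
import subprocess


def link_is_up(ifconfig_text):
    """Return True when the `status:` line of ifconfig output reads "active"."""
    for line in ifconfig_text.splitlines():
        key, _, value = line.strip().partition(":")
        if key == "status":
            return value.strip() == "active"
    return False  # no status line found: treat the link as down


def dead_members(members, read_ifconfig=None):
    """List LAGG member NICs whose link is down."""
    if read_ifconfig is None:
        # Default reader: run `ifconfig <nic>` on the local machine.
        def read_ifconfig(nic):
            result = subprocess.run(["ifconfig", nic], capture_output=True, text=True)
            return result.stdout
    return [nic for nic in members if not link_is_up(read_ifconfig(nic))]


# Illustrative text only (assumption), standing in for real ifconfig output.
sample = {
    "em0": "em0: flags=8843<UP>\n\tstatus: no carrier\n",
    "em1": "em1: flags=8843<UP>\n\tstatus: active\n",
}
print(dead_members(["em0", "em1"], sample.get))
```

    Passing a reader function keeps the parsing testable without touching real interfaces; on the router itself, `dead_members(["em0", "em1"])` would call `ifconfig` directly.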

    Actually, a switch failure on round robin would produce the same results: 50% packet loss. Hrm.

    Thank you for your consideration,
    -bo
    ![Test Network.png](/public/imported_attachments/1/Test Network.png)



  • I know it is not the answer you are looking for, but why don't you configure this with just LAN switches + VLANs and do the failover in pfSense instead of using a WAN LAGG interface?

                                                          |--pfsense1(vlan100 + vlan200)
                                                          |
                    ----router1--vlan100--lan_switch1--lan(vlan200)
    internet----|
                    ----router2--vlan100--lan_switch2--lan(vlan200)
                                                          |
                                                          |--pfsense2(vlan100 + vlan200)

    This way:

    • on switch1 failure, pfsense2 will get all traffic, as will router2

    • on master pfsense failure, the backup pfsense will take over

    • on router failure, the load balance rule will forward traffic to the other gateway

    Of course it's just a suggestion.  ;)



  • What type of lagg are you using?



  • Wow. I never received notification of anyone replying to this thread. I am sorry I failed to reply!

    @marcelloc: i will mull over your suggestion some more and talk with my partner. not exactly sure what you are suggesting though; are you saying we should have four pfSense boxes or are pfsense1/2 also router1/2 on your ascii graph?

    our 2.0.1 pfSense machines are our routers. we only have two pfsense boxes. one is active the other is standby or whatever--my brain is fried from being up too late. ;)

    our colo facility has provided HSRP to our two pfsense routers which works great with virtual IP tech and CARP on pfSense, except for one problem i will detail in another thread (with notifications this time).

    we skipped the carp switches and just plugged our two pfsense boxes into each other directly--two NICs combined as lagg active/standby.

    @cmb: we were trying round robin, which kept dropping every other packet, so we have switched to active/standby which is fine because our provider would charge us double if we used two ports at once for bandwidth.

    EDIT: y'all feel free to change the subject to reflect the proper version "2.0.1" to avoid confusion with the upcoming 2.1 release. i cannot seem to figure out how to do this myself. :/

    -bo



  • @muffinman2030:

    are you saying we should have four pfSense boxes or are pfsense1/2 also router1/2 on your ascii graph?

    Only two pfsense boxes.



  • @cmb: we were trying round robin, which kept dropping every other packet, so we have switched to active/standby which is fine because our provider would charge us double if we used two ports at once for bandwidth.

    Some NICs don't like anything but Active/Standby. These are usually Realtek, though there could be others as well. Active/Standby is a better mode IMO for this situation.
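
    For comparison with round robin, active/standby sends everything through one port and only moves to the next when that port's link goes down. A minimal sketch of that selection rule, assuming hypothetical port names and that the link state is actually detected:

```python
def failover_port(ports, link_up):
    """Pick the first port in priority order whose link is up, or None."""
    for port in ports:
        if link_up.get(port, False):
            return port
    return None


# nic1 is the preferred port; with it unplugged, traffic moves to nic2.
active = failover_port(["nic1", "nic2"], {"nic1": False, "nic2": True})
print(active)
```

    Because only one port transmits at a time, a dead standby costs nothing and a dead active port costs only the switchover.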



  • Some NICs don't like anything but Active/Standby.

    Please tell me more about this - I'm having failover issues also, and the NIC involved is a Realtek.


    If you don't fix it the first time, use a bigger hammer.



  • This is all over the internet. Others with more experience can tell you the why; I only know that in 2 instances where Realtek NICs were involved, the bond (Linux in this case) would only do active/passive. It probably has to do with the dynamic MAC address update that active/active type bonds use.

