CARP/HA with keepalived in the network!



  • Hey guys,

    I've been having a strange issue for a while and can't figure out why or where it comes from.

    Sometimes, out of nowhere, CARP fails over to backup and back to master within the same minute, but not all the VHIDs at the same time.

    When it happens, it disrupts network traffic for that short period. After thinking about it a bit more, I remembered that in one particular VLAN we have 2 VMs that fail over an IP using the keepalived Linux package, which uses VRRP.

    Is it possible that the VRRP config on those two VMs can, out of nowhere, interfere with pfSense CARP?

    keepalived never logs anything when this happens on pfSense.


  • Rebel Alliance Developer Netgate

    CARP and VRRP are very similar. They can interfere with each other but usually that interference only happens if your VRRP VRIDs and CARP VHIDs overlap. The first thing to check is to see if you are using the same IDs there. If so, move to different IDs for one or the other.

    It's also possible that the presence of traffic for both is triggering your L2 switch to rate-limit or block multicast (this might be called "storm control").
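
    The ID comparison described above can be scripted. This is only a minimal sketch: it assumes the keepalived config is available as text and that the CARP VHIDs are copied by hand from Firewall > Virtual IPs in pfSense. The function names and the sample IDs (51, 52 and the VHID list) are hypothetical, chosen to show an overlap.

```python
import re


def vrids_from_keepalived(config_text):
    """Return the set of virtual_router_id values found in keepalived config text."""
    pattern = r"^\s*virtual_router_id\s+(\d+)"
    return {int(m) for m in re.findall(pattern, config_text, flags=re.MULTILINE)}


def overlapping_ids(config_text, carp_vhids):
    """Return the sorted IDs used both as a VRRP VRID and as a CARP VHID."""
    return sorted(vrids_from_keepalived(config_text) & set(carp_vhids))


# Hypothetical keepalived fragment; only the virtual_router_id lines matter here.
sample_config = """
vrrp_instance VI_1 {
    interface ens18
    virtual_router_id 51
}
vrrp_instance VI_2 {
    interface ens18
    virtual_router_id 52
}
"""

# Hypothetical CARP VHIDs; replace with the values configured on the firewall.
example_vhids = [1, 2, 51]

print(overlapping_ids(sample_config, example_vhids))  # [51]
```

    An empty list means no VRID/VHID collision, in which case the switch side is the next thing to look at.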



  • thanks @jimp

    Here's a snippet from our keepalived config...

    vrrp_instance VI_1 {
        state MASTER
        interface ens18
        virtual_router_id 101
        priority 101
        advert_int 1
        authentication {
            auth_type PASS
            auth_pass xxxxxxxxxxxx
        }
        virtual_ipaddress {
            xxxxxxxxxxxx
        }
    }
    
    vrrp_instance VI_2 {
        state MASTER
        interface ens18
        virtual_router_id 102
        priority 101
        advert_int 1
        authentication {
            auth_type PASS
            auth_pass xxxxxxxxxxxx
        }
        virtual_ipaddress {
            xxxxxxxxxxxx
        }
    }
    

    And here are our Virtual IPs in pfSense (master):
    [screenshot: pfSense Virtual IPs list]

    On the other pfSense, the skew is 100 with a base of 1.
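
    For context on those numbers: CARP sends advertisements every advbase + advskew/256 seconds, so a higher skew makes a node advertise more slowly and lose the master election. A rough sketch of the arithmetic follows; the function name is hypothetical, and the master's skew of 0 is an assumption, since it is not stated here.

```python
def carp_advert_interval(advbase, advskew):
    """Seconds between CARP advertisements: advbase + advskew/256."""
    return advbase + advskew / 256


# Assumed master settings (base 1, skew 0); the skew is a guess for illustration.
print(round(carp_advert_interval(1, 0), 3))    # 1.0

# Backup settings as described above (base 1, skew 100).
print(round(carp_advert_interval(1, 100), 3))  # 1.391
```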


  • Rebel Alliance Developer Netgate

    At least by ID, I would think that is safe. So you might take a closer look at your switch/L2.



  • We have L3 switches and I can't find anything about that... I'm thinking more and more that our pfSense appliance is having a hard time with the traffic!

