Trouble getting CARP working with ESXi 5.5



  • I have been working on this problem for a few days now and still can't pin down the cause.

    Here is my setup with diagrams:

    I have two ESXi 5.5 hosts running on two HP DL360 G5 servers.
    Each has 2 onboard Broadcom NICs, one 4-port Intel NIC, and one 2-port Intel NIC, for a total of 8 NICs.

    http://imgur.com/a/jM2im

    I have loaded the pfSense 2.2.2 virtual appliance on each host and split the NICs across 4 vSwitches:

    vSwitch0 - Internet/Metro-E - basically my WAN connection - CARP working as intended
    vSwitch1 - Servers - server network - CARP working as intended
    vSwitch2 - Computers/LAN - network for our corporate office LAN, 4 VLANs - CARP not working
    vSwitch3 - Sync/Failover - network for failover sync - directly connected via crossover cable

    http://imgur.com/a/W67SH

    Here are screenshots of the network setup for each vSwitch on each ESXi host:

    Host1:
    http://imgur.com/a/4wvXw

    Host2:
    http://imgur.com/a/quuzP

    Here is the physical wiring layout:

    http://imgur.com/N528M6I

    What I am trying to accomplish is redundancy against a network switch failure: if one switch dies, the devices on the remaining switch keep working. With each vSwitch connected to a port on each switch, I can lose one switch and still reach the CARP virtual IP regardless of which firewall is acting as master. Having the firewalls on separate hosts also gives us redundancy against a server hardware failure.

    The problem is that CARP on the Computers/LAN network only works when I have a single vSwitch2 interface plugged into each switch. When I plug them in redundantly, CARP breaks and both firewalls show as "Backup".

    The odd part is that, as my screenshots show, the Servers and Internet/Metro-E networks are wired the same way, and CARP works properly on both.

    FW1 CARP Status
    http://imgur.com/FDwmtPn

    FW2 CARP Status
    http://imgur.com/W8MnYuD

    CARP settings:

    FW1:
    http://imgur.com/hyIF30Z

    FW2:
    http://imgur.com/OM5mmIS

    I know the recommended setting for CARP is for each vSwitch to use IP hash load balancing instead of MAC hash, but with my current setup I was unable to get CARP working on the Metro-E and Server networks using IP hash.
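
    For context on the two teaming policies, here is a rough Python model of how a vSwitch team picks an uplink. VMware documents the inputs (the source MAC for "Route based on source MAC hash", source and destination IP for "Route based on IP hash") but not a formula, so both hash functions below are hypothetical stand-ins. The real values used are CARP's multicast group 224.0.0.18 and its virtual MAC 00:00:5e:00:01:<VHID>; the address 10.9.1.2 is a hypothetical firewall interface address. Either way, a given sender (or IP pair) sticks to one uplink. VMware also expects IP hash teaming to be paired with a static EtherChannel on the physical switch, which normally means all of a team's uplinks land on one switch or stack; with uplinks split across two independent switches, as here, that may be why IP hash did not work.

```python
import ipaddress

CARP_MULTICAST_GROUP = "224.0.0.18"  # CARP advertisements are sent to this IPv4 group


def carp_virtual_mac(vhid: int) -> str:
    """Virtual MAC used by a CARP VIP: 00:00:5e:00:01:<vhid in hex>."""
    if not 1 <= vhid <= 255:
        raise ValueError("VHID must be between 1 and 255")
    return f"00:00:5e:00:01:{vhid:02x}"


def uplink_by_source_mac(src_mac: str, n_uplinks: int) -> int:
    """HYPOTHETICAL hash: last octet of the source MAC modulo the uplink count."""
    return int(src_mac.split(":")[-1], 16) % n_uplinks


def uplink_by_ip_hash(src_ip: str, dst_ip: str, n_uplinks: int) -> int:
    """HYPOTHETICAL hash: XOR of source and destination IPv4 addresses modulo the uplink count."""
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    return (src ^ dst) % n_uplinks


if __name__ == "__main__":
    vip_mac = carp_virtual_mac(5)  # ComputersVIP is VHID 5 in the log
    print(vip_mac, "-> uplink", uplink_by_source_mac(vip_mac, 2))
    # 10.9.1.2 is a hypothetical interface address for one firewall
    print("10.9.1.2 -> 224.0.0.18 -> uplink",
          uplink_by_ip_hash("10.9.1.2", CARP_MULTICAST_GROUP, 2))
```

    The point of the model is only that uplink choice is deterministic per sender or per IP pair; it says nothing about which physical port ESXi actually picks.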

    What I have tried so far:

    1. Set Net.ReversePathFwdCheckPromisc to 1 (after rebooting each ESXi host, this resolved the problem on the WAN/Metro-E and Servers networks).
    2. Toggled Promiscuous mode to Reject and then back to Accept on each vSwitch, then rebooted each ESXi host.
    3. Replaced the two Computers network switches with the most basic unmanaged switch I could find, to rule out any blocking of multicast traffic.
    4. Verified that the MAC address of each of the VM's network connections matches the MAC address reported in pfSense, to rule out interface mismatches.
    5. Confirmed that I can ping the Computers-network interface addresses on each firewall.
    6. Verified that pfsync and XMLRPC sync are working by adding a firewall rule, which synced properly.
    7. Removed the VLANs bound to my Computers network. I'll add them back later if I get this figured out.
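
    Step 4 can also be scripted. A minimal Python sketch, assuming each adapter's MAC has been copied from the vSphere client and from pfSense (Status > Interfaces) into two dictionaries keyed by the same label; the labels and MAC values in the example are hypothetical. It normalizes the notation first, so differences in case or separators don't show up as false mismatches:

```python
import re
from typing import Dict, Optional, Tuple


def normalize_mac(mac: str) -> str:
    """Return a MAC as lowercase colon-separated pairs.

    Accepts 00:50:56:AB:CD:EF, 00-50-56-ab-cd-ef, 0050.56ab.cdef or 005056abcdef.
    """
    digits = re.sub(r"[:\-.]", "", mac.strip()).lower()
    if not re.fullmatch(r"[0-9a-f]{12}", digits):
        raise ValueError(f"not a MAC address: {mac!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def find_mismatches(
    esxi_macs: Dict[str, str], pfsense_macs: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Map each label whose MAC differs, or is missing on one side, to (esxi, pfsense)."""
    mismatches = {}
    for label in sorted(esxi_macs.keys() | pfsense_macs.keys()):
        esxi = esxi_macs.get(label)
        pf = pfsense_macs.get(label)
        if esxi is None or pf is None or normalize_mac(esxi) != normalize_mac(pf):
            mismatches[label] = (esxi, pf)
    return mismatches


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    esxi = {"vmx2": "00:50:56:AB:CD:EF", "vmx3": "00:50:56:AB:CD:F0"}
    pfsense = {"vmx2": "00:50:56:ab:cd:ef", "vmx3": "00:50:56:ab:cd:f1"}
    print(find_mismatches(esxi, pfsense))
```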

    Any guidance would be greatly appreciated.



  • I also noticed this in the log file this morning, repeating every 3 seconds (newest entries first):


    Jun 29 11:35:42 check_reload_status: Carp backup event
    Jun 29 11:35:42 kernel: carp: VHID 5@vmx2: MASTER -> BACKUP (more frequent advertisement received)
    Jun 29 11:35:42 kernel: carp: VHID 5@vmx2: BACKUP -> MASTER (master down)
    Jun 29 11:35:42 check_reload_status: Carp master event
    Jun 29 11:35:40 php-fpm[67391]: /rc.carpbackup: Carp cluster member "10.9.1.1 - ComputersVIP (5@vmx2)" has resumed the state "BACKUP" for vhid 5@vmx2
    Jun 29 11:35:40 php-fpm[67391]: /rc.carpmaster: Carp cluster member "10.9.1.1 - ComputersVIP (5@vmx2)" has resumed the state "MASTER" for vhid 5@vmx2
    Jun 29 11:35:39 check_reload_status: Carp backup event
    Jun 29 11:35:39 kernel: carp: VHID 5@vmx2: MASTER -> BACKUP (more frequent advertisement received)
    Jun 29 11:35:39 kernel: carp: VHID 5@vmx2: BACKUP -> MASTER (master down)
    Jun 29 11:35:39 check_reload_status: Carp master event
    Jun 29 11:35:37 php-fpm[67391]: /rc.carpbackup: Carp cluster member "10.9.1.1 - ComputersVIP (5@vmx2)" has resumed the state "BACKUP" for vhid 5@vmx2
    Jun 29 11:35:37 php-fpm[67391]: /rc.carpmaster: Carp cluster member "10.9.1.1 - ComputersVIP (5@vmx2)" has resumed the state "MASTER" for vhid 5@vmx2
    Jun 29 11:35:36 check_reload_status: Carp backup event
    Jun 29 11:35:36 kernel: carp: VHID 5@vmx2: MASTER -> BACKUP (more frequent advertisement received)
    Jun 29 11:35:36 kernel: carp: VHID 5@vmx2: BACKUP -> MASTER (master down)
    Jun 29 11:35:36 check_reload_status: Carp master event
    Jun 29 11:35:34 php-fpm[64500]: /rc.carpbackup: Carp cluster member "10.9.1.1 - ComputersVIP (5@vmx2)" has resumed the state "BACKUP" for vhid 5@vmx2
    Jun 29 11:35:34 php-fpm[64500]: /rc.carpmaster: Carp cluster member "10.9.1.1 - ComputersVIP (5@vmx2)" has resumed the state "MASTER" for vhid 5@vmx2
    Jun 29 11:35:33 check_reload_status: Carp backup event
    Jun 29 11:35:33 kernel: carp: VHID 5@vmx2: MASTER -> BACKUP (more frequent advertisement received)
    Jun 29 11:35:33 kernel: carp: VHID 5@vmx2: BACKUP -> MASTER (master down)
    Jun 29 11:35:33 check_reload_status: Carp master event
    Jun 29 11:35:31 php-fpm[64500]: /rc.carpbackup: Carp cluster member "10.9.1.1 - ComputersVIP (5@vmx2)" has resumed the state "BACKUP" for vhid 5@vmx2
    Jun 29 11:35:31 php-fpm[64500]: /rc.carpmaster: Carp cluster member "10.9.1.1 - ComputersVIP (5@vmx2)" has resumed the state "MASTER" for vhid 5@vmx2
    Jun 29 11:35:30 check_reload_status: Carp backup event
    Jun 29 11:35:30 kernel: carp: VHID 5@vmx2: MASTER -> BACKUP (more frequent advertisement received)
    Jun 29 11:35:30 kernel: carp: VHID 5@vmx2: BACKUP -> MASTER (master down)
    Jun 29 11:35:30 check_reload_status: Carp master event
    Jun 29 11:35:28 php-fpm[64500]: /rc.carpbackup: Carp cluster member "10.9.1.1 - ComputersVIP (5@vmx2)" has resumed the state "BACKUP" for vhid 5@vmx2
    Jun 29 11:35:28 php-fpm[64500]: /rc.carpmaster: Carp cluster member "10.9.1.1 - ComputersVIP (5@vmx2)" has resumed the state "MASTER" for vhid 5@vmx2
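
    To confirm the flapping is confined to one VHID and to measure its period, the kernel transitions can be tallied from the system log. A small Python sketch, assuming the log lines look exactly like the ones above (no hostname field). Syslog omits the year, so the year used for parsing is a placeholder that only matters for leap days:

```python
import re
import sys
from collections import Counter
from datetime import datetime

# Matches kernel lines such as:
# Jun 29 11:35:42 kernel: carp: VHID 5@vmx2: MASTER -> BACKUP (more frequent advertisement received)
TRANSITION = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}) kernel: carp: "
    r"VHID (?P<vhid>\d+)@(?P<iface>\S+): "
    r"(?P<old>[A-Z]+) -> (?P<new>[A-Z]+) \((?P<reason>[^)]*)\)"
)

PLACEHOLDER_YEAR = 2000  # syslog has no year; a leap year keeps "Feb 29" parseable


def parse_transitions(lines):
    """Yield (timestamp, vhid@iface, old, new, reason) for each CARP state change."""
    for line in lines:
        m = TRANSITION.match(line.strip())
        if m:
            ts = datetime.strptime(f"{PLACEHOLDER_YEAR} {m['ts']}", "%Y %b %d %H:%M:%S")
            yield ts, f"{m['vhid']}@{m['iface']}", m["old"], m["new"], m["reason"]


def summarize(lines):
    """Count transitions per (vhid@iface, old, new, reason); list gaps between 'master down' events."""
    events = list(parse_transitions(lines))
    counts = Counter((vip, old, new, reason) for _, vip, old, new, reason in events)
    downs = sorted(ts for ts, *_, reason in events if reason == "master down")
    gaps = [(b - a).total_seconds() for a, b in zip(downs, downs[1:])]
    return counts, gaps


if __name__ == "__main__":
    counts, gaps = summarize(sys.stdin)
    for key, n in counts.most_common():
        print(n, *key)
    print("seconds between 'master down' events:", gaps)
```

    Fed the excerpt above on standard input, it should report five of each transition for VHID 5@vmx2 and 3-second gaps between the "master down" events.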



  • When it cycles endlessly like this, something is looping multicast traffic. In the case of VMware, see: https://doc.pfsense.org/index.php/CARP_Configuration_Troubleshooting#Changing_Net.ReversePathFwdCheckPromisc
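
    For background on why the cycle looks the way it does, here is a simplified Python model of the CARP timers, assuming the FreeBSD behaviour pfSense 2.2 is built on: a master advertises every advbase + advskew/256 seconds, and a backup that hears nothing for 3 × advbase + advskew/256 seconds declares the master down and takes over. With advbase = 1 and advskew = 0, that timeout is 3 seconds, consistent with the 3-second cycle in the log. The model also assumes a master steps down when it hears an advertisement at least as frequent as its own; the equal case (for example its own advertisement reflected back by the network) is our reading of the implementation, not something confirmed in this thread. The skew of 100 for the secondary is an assumed, typical value, and demotion counters and preemption are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CarpParams:
    advbase: int  # base advertisement interval in seconds (pfSense "Base")
    advskew: int  # skew added in 1/256 s steps (pfSense "Skew")

    def adv_interval(self) -> float:
        """Seconds between advertisements sent by a master."""
        return self.advbase + self.advskew / 256

    def master_down_timeout(self) -> float:
        """Seconds a backup waits without advertisements before becoming master."""
        return 3 * self.advbase + self.advskew / 256


def master_yields(own: CarpParams, heard: CarpParams, yield_on_equal: bool = True) -> bool:
    """Would a MASTER with `own` timing step down after hearing `heard`?

    ASSUMPTION: yield_on_equal=True models a master also demoting on an
    advertisement with the same interval as its own (e.g. a looped copy).
    """
    if heard.adv_interval() < own.adv_interval():
        return True  # strictly more frequent advertisement
    return yield_on_equal and heard.adv_interval() == own.adv_interval()


if __name__ == "__main__":
    primary = CarpParams(advbase=1, advskew=0)
    secondary = CarpParams(advbase=1, advskew=100)  # assumed typical secondary skew
    print("primary master-down timeout:", primary.master_down_timeout())
    print("secondary master-down timeout:", secondary.master_down_timeout())
    print("secondary yields to primary:", master_yields(secondary, primary))
    print("primary yields to its own looped advert:", master_yields(primary, primary))
```

    Under these assumptions, a backup that cannot hear its peer promotes itself after about 3 seconds, and a looped copy of its own advertisement would knock it straight back to BACKUP, which is the MASTER/BACKUP pair of messages seen every 3 seconds above.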



  • If I set this on each ESXi host, wouldn't it apply to all of the vSwitches?

    As I said, this only seems to be happening on one of my CARP VIPs.



  • Yes, that applies to all vSwitches on the host. Is it affecting all the CARP VIPs on that interface, or just one VIP on an interface where the others work fine?