    CARP on 2.2.1, VMWare 5.5 with dvS

    • H
      hphan082 last edited by

      hi everyone,
      I'm running into this old issue where both firewalls are stuck in Backup status.
      I followed https://doc.pfsense.org/index.php/CARP_Configuration_Troubleshooting#VMware_ESX.2FESXi_Users and this thread: https://forum.pfsense.org/index.php?topic=64022.0

      We use dvS, and we have already enabled the advanced setting. I have promiscuous mode enabled on the LAN and WAN port groups, but not on the sync port group.

      Both firewalls are still stuck in Backup.

      Below is a short excerpt from the log on the master.
      Mar 26 17:17:42 php-fpm[58491]: /rc.carpbackup: Carp cluster member "10.22.2.254 - daas.management.vip (2@em0_vlan2202)" has resumed the state "BACKUP" for vhid 2@em0_vlan2202
      Mar 26 17:17:42 php-fpm[57375]: /rc.carpbackup: Carp cluster member "10.22.5.254 - daas.dmz.vip (3@em0_vlan2205)" has resumed the state "BACKUP" for vhid 3@em0_vlan2205
      Mar 26 17:17:44 check_reload_status: Carp master event
      Mar 26 17:17:44 kernel: carp: VHID 1@em1: BACKUP -> MASTER (master down)
      Mar 26 17:17:44 kernel: carp: VHID 1@em1: MASTER -> BACKUP (more frequent advertisement received)
      Mar 26 17:17:44 check_reload_status: Carp backup event
      Mar 26 17:17:44 check_reload_status: Carp master event
      Mar 26 17:17:44 kernel: carp: VHID 3@em0_vlan2205: BACKUP -> MASTER (master down)
      Mar 26 17:17:44 kernel: carp: VHID 2@em0_vlan2202: BACKUP -> MASTER (master down)
      Mar 26 17:17:44 kernel: carp: VHID 3@em0_vlan2205: MASTER -> BACKUP (more frequent advertisement received)
      Mar 26 17:17:44 kernel: carp: VHID 2@em0_vlan2202: MASTER -> BACKUP (more frequent advertisement received)
      Mar 26 17:17:44 check_reload_status: Carp master event
      Mar 26 17:17:44 check_reload_status: Carp backup event
      Mar 26 17:17:44 check_reload_status: Carp backup event
      Mar 26 17:17:45 php-fpm[58491]: /rc.carpmaster: Carp cluster member "198.51.168.254 - daas.pub.vip (1@em1)" has resumed the state "MASTER" for vhid 1@em1
      Mar 26 17:17:45 php-fpm[58491]: /rc.carpbackup: Carp cluster member "198.51.168.254 - daas.pub.vip (1@em1)" has resumed the state "BACKUP" for vhid 1@em1
      Mar 26 17:17:45 php-fpm[58491]: /rc.carpmaster: Carp cluster member "10.22.5.254 - daas.dmz.vip (3@em0_vlan2205)" has resumed the state "MASTER" for vhid 3@em0_vlan2205
      Mar 26 17:17:45 php-fpm[58491]: /rc.carpmaster: Carp cluster member "10.22.2.254 - daas.management.vip (2@em0_vlan2202)" has resumed the state "MASTER" for vhid 2@em0_vlan2202
      Mar 26 17:17:45 php-fpm[58491]: /rc.carpbackup: Carp cluster member "10.22.5.254 - daas.dmz.vip (3@em0_vlan2205)" has resumed the state "BACKUP" for vhid 3@em0_vlan2205
      Mar 26 17:17:45 php-fpm[58491]: /rc.carpbackup: Carp cluster member "10.22.2.254 - daas.management.vip (2@em0_vlan2202)" has resumed the state "BACKUP" for vhid 2@em0_vlan2202
      Mar 26 17:17:47 check_reload_status: Carp master event
      Mar 26 17:17:47 kernel: carp: VHID 1@em1: BACKUP -> MASTER (master down)
      Mar 26 17:17:47 kernel: carp: VHID 1@em1: MASTER -> BACKUP (more frequent advertisement received)
      Mar 26 17:17:47 check_reload_status: Carp backup event
      Mar 26 17:17:47 check_reload_status: Carp master event
      Mar 26 17:17:47 kernel: carp: VHID 2@em0_vlan2202: BACKUP -> MASTER (master down)
      Mar 26 17:17:47 kernel: carp: VHID 3@em0_vlan2205: BACKUP -> MASTER (master down)
      Mar 26 17:17:47 kernel: carp: VHID 2@em0_vlan2202: MASTER -> BACKUP (more frequent advertisement received)
      Mar 26 17:17:47 kernel: carp: VHID 3@em0_vlan2205: MASTER -> BACKUP (more frequent advertisement received)
      Mar 26 17:17:47 check_reload_status: Carp master event
      Mar 26 17:17:47 check_reload_status: Carp backup event
      Mar 26 17:17:47 check_reload_status: Carp backup event

      • C
        cmb last edited by

        Those are the symptoms of the VMware looping multicast issue.
        https://doc.pfsense.org/index.php/CARP_Configuration_Troubleshooting#Changing_Net.ReversePathFwdCheckPromisc
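The symptom in the log above is a VHID being promoted to MASTER ("master down") and then immediately demoted again ("more frequent advertisement received"). As a rough illustration only, here is a minimal Python sketch that counts those promote-then-demote flaps per VHID. The function name `count_flaps` and the regular expression are assumptions written against the kernel log lines quoted in this thread; they are not part of pfSense.

```python
import re
from collections import Counter

# Assumed format, taken from the kernel lines quoted in this thread, e.g.:
#   kernel: carp: VHID 1@em1: BACKUP -> MASTER (master down)
TRANSITION = re.compile(
    r"carp: VHID (?P<vhid>\S+): (?P<old>\w+) -> (?P<new>\w+) \((?P<reason>[^)]*)\)"
)

def count_flaps(log_lines):
    """Count MASTER promotions that are undone by the very next transition."""
    last = {}          # vhid -> (new_state, reason) of its previous transition
    flaps = Counter()  # vhid -> number of promote-then-demote pairs
    for line in log_lines:
        m = TRANSITION.search(line)
        if not m:
            continue  # not a CARP state transition line
        vhid = m["vhid"]
        demoted = (m["new"] == "BACKUP"
                   and m["reason"] == "more frequent advertisement received")
        if last.get(vhid) == ("MASTER", "master down") and demoted:
            flaps[vhid] += 1
        last[vhid] = (m["new"], m["reason"])
    return flaps

# Lines copied from the log in the first post.
sample = [
    "Mar 26 17:17:44 kernel: carp: VHID 1@em1: BACKUP -> MASTER (master down)",
    "Mar 26 17:17:44 kernel: carp: VHID 1@em1: MASTER -> BACKUP (more frequent advertisement received)",
    "Mar 26 17:17:44 check_reload_status: Carp backup event",
    "Mar 26 17:17:44 kernel: carp: VHID 3@em0_vlan2205: BACKUP -> MASTER (master down)",
    "Mar 26 17:17:44 kernel: carp: VHID 2@em0_vlan2202: BACKUP -> MASTER (master down)",
    "Mar 26 17:17:44 kernel: carp: VHID 3@em0_vlan2205: MASTER -> BACKUP (more frequent advertisement received)",
    "Mar 26 17:17:44 kernel: carp: VHID 2@em0_vlan2202: MASTER -> BACKUP (more frequent advertisement received)",
]

if __name__ == "__main__":
    for vhid, n in sorted(count_flaps(sample).items()):
        print(f"{vhid}: {n} flap(s)")
```

A count that keeps growing every few seconds on every VHID matches the pattern in the posted log. It does not by itself say what is looping the traffic.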

        • R
          rickbaran last edited by

          You might also check which build of ESXi 5.5 you are on. We had some other issues with 5.5 build 1331820 before we upgraded to 1623387.

          • H
            hphan082 last edited by

            Hi cmb,
            I followed that document, but it doesn't work.

            Rick, we are running 5.5 build 1892794. :) I'll talk to our VMware team to see if they have a newer version to upgrade these hosts to.

            • KOM
              KOM last edited by

              The current build for 5.5 is 1993072, I believe.

              • C
                cmb last edited by

                @hphan082:

                hi CMB,
                i followed that document, but it doesn't work.

                That most definitely fixes the problem you're seeing. It has to be set on every host that has a promiscuous port group so that none of them loop multicast. I've done that on many, many ESX hosts across a variety of versions and it has always worked immediately, with one odd exception: one ESX host in particular just wouldn't obey that config setting until ESX was rebooted. Most of the time, though, when that doesn't work it's because it wasn't set on all the hosts and some host is still looping the multicast.

                The other possibility is there is something else on your network that's looping multicast traffic, but that's unlikely.

                • H
                  hphan082 last edited by

                  Hi cmb,
                  I have seriously tried everything I can to get this to work.
                  I will be away for a 10-day bootcamp. I'll ask our virtualization manager to reboot both hosts while I am gone and will try again when I'm back.

                  • H
                    hphan082 last edited by

                    Hi cmb,
                    We did the entire thing one more time and rebooted both hosts. It still doesn't work. I attached a few screenshots here for your review, including Host_Advance_Settings, the dvS port-group setting, and the firewall log.

                    ![Host Advanced Settings.JPG](/public/imported_attachments/1/Host Advanced Settings.JPG)
                    ![firewall log.PNG](/public/imported_attachments/1/firewall log.PNG)

                    • C
                      cmb last edited by

                      That all looks correct. You can verify the looping multicast with a packet capture. From an SSH command prompt:

                      tcpdump -nei em0 vrrp
                      

                      The system sends one advertisement per second. You'll see something similar to the following.

                      22:00:59.909437 00:00:5e:00:01:0a > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 70: 10.0.0.2 > 224.0.0.18: VRRPv2, Advertisement, vrid 10, prio 0, authtype none, intvl 1s, length 36
                      22:01:00.910396 00:00:5e:00:01:0a > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 70: 10.0.0.2 > 224.0.0.18: VRRPv2, Advertisement, vrid 10, prio 0, authtype none, intvl 1s, length 36
                      
                      

                      You want to see only one advertisement per second for each CARP IP on that interface. More than likely you'll see duplicates there.
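If the capture is long, the duplicates can be picked out mechanically. Below is a minimal Python sketch, assuming the `tcpdump -ne` line format shown above. The function name `find_duplicates`, the regular expression, and the 0.5-second window are all assumptions chosen for illustration, and the middle sample line is a made-up looped copy, not real capture data.

```python
import re
from datetime import datetime

# Assumed line format, based on the sample tcpdump output in this post:
#   HH:MM:SS.usec <src mac> > <dst mac>, ... <src ip> > 224.0.0.18: ... vrid N, ...
ADVERT = re.compile(
    r"^(?P<ts>\d{2}:\d{2}:\d{2}\.\d{6}) (?P<src_mac>[0-9a-f:]{17}) > .*?"
    r"(?P<src_ip>\d+\.\d+\.\d+\.\d+) > 224\.0\.0\.18: .*?vrid (?P<vrid>\d+)"
)

def find_duplicates(lines, window=0.5):
    """Return (key, gap_seconds) for adverts repeated within `window` seconds.

    Adverts are expected about once per second per VHID, so a second copy
    from the same sender arriving much sooner suggests a looped packet.
    """
    last_seen = {}  # (src_mac, src_ip, vrid) -> timestamp of previous advert
    dups = []
    for line in lines:
        m = ADVERT.search(line.strip())
        if not m:
            continue  # not a CARP/VRRP advertisement line
        ts = datetime.strptime(m["ts"], "%H:%M:%S.%f")
        key = (m["src_mac"], m["src_ip"], m["vrid"])
        prev = last_seen.get(key)
        if prev is not None:
            gap = (ts - prev).total_seconds()
            # tcpdump prints no date here, so a negative gap (capture
            # crossing midnight) is simply ignored.
            if 0 <= gap < window:
                dups.append((key, gap))
        last_seen[key] = ts
    return dups

TAIL = ("00:00:5e:00:01:0a > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), "
        "length 70: 10.0.0.2 > 224.0.0.18: VRRPv2, Advertisement, vrid 10, "
        "prio 0, authtype none, intvl 1s, length 36")

capture = [
    "22:00:59.909437 " + TAIL,
    "22:00:59.909537 " + TAIL,  # hypothetical looped copy, 0.0001 s later
    "22:01:00.910396 " + TAIL,
]

if __name__ == "__main__":
    for key, gap in find_duplicates(capture):
        print(f"duplicate advert from {key}: {gap:.6f} s after the previous one")
```

Saving the tcpdump output to a file and passing its lines to `find_duplicates` should work the same way. A clean capture should produce no output.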

                      • H
                        hphan082 last edited by

                        Hi cmb,
                        I ran tcpdump on both firewalls, and below is a screenshot of what I see. Every second I see 2 messages from 198.51.168.252 (VRRP) to 224.0.0.18, so it looks like I'm getting 2 packets every second.

                        So this should confirm that we are hitting a bug with VMware again?


                        • C
                          cmb last edited by

                          Yes, look at the timestamps on the colored lines there: that's the same packet only 0.0001 seconds later. Something is looping that system's multicast traffic back to it. VMware is the most likely candidate because this generally doesn't happen on physical switches, but it's possible that you have the ESX hosts configured fine and some other device is looping the traffic. At the very least, that does 100% confirm the issue is looping multicast.

                          • H
                            hphan082 last edited by

                            Thanks, cmb. I will work with the VMware team to look into this.
