Netgate Discussion Forum

    CARP and VMware ESX 3 not working across redundant switches

MattMeyer:

    Just for giggles, I tried the latest build of version 2, and it also had the same problems. I'm still poking at it to see if there is a way to get this working in my network topology.

bobwondernut:

    Would love to know if you've made any progress here - I spent the morning beating my head against this, trying to figure out why my CARP VIPs wouldn't come out of backup, until I saw this post :)

MattMeyer:

    Sorry, no more progress on my end. I finally gave up. It's probably early enough in the development of 2.0 to get this fixed, though, and I will probably start a bounty to get it resolved.

    I have a very high-availability VMware VI (redundant switches, NICs, HBAs, etc.), and it sounds like you do too. Ideally, I'd like to get a hot-standby pfSense box up as well, but for now, VMotion and scheduled downtime will have to do. My plan for unplanned downtime is to back up the config and have a cold router waiting to be powered on and restored.

    I have no idea whether it can even be resolved, or whether it's a pfSense problem or a FreeBSD problem. I know for sure it's not a VMware problem, though, as I've done everything to rule that out. Hopefully a developer can chime in and let me know how I can provide logs to help with a resolution.

bobwondernut:

    Thanks for the update.

    I'm going to attempt to deploy a topology with two vSwitches, each bonded to a single NIC, connecting to two separate upstream switches that are trunked together. First I plan to see what results I can get with a single pfSense instance that has an interface on each vSwitch, and if there's any success there, proceed toward two instances doing the CARP magic, each speaking to either switch.
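
    For concreteness, a rough, untested pyVmomi sketch of that layout (host, credentials, and vmnic names are placeholders; pyVmomi is modern tooling, but the equivalent operations exist in the VI API of that era):

        # Untested sketch; host, credentials, and vmnic names are placeholders.
        import ssl
        from pyVim.connect import SmartConnect, Disconnect
        from pyVmomi import vim

        ctx = ssl._create_unverified_context()
        si = SmartConnect(host="esx01.example.com", user="root",
                          pwd="secret", sslContext=ctx)
        # First host of the first compute resource in the first datacenter.
        host = si.content.rootFolder.childEntity[0].hostFolder.childEntity[0].host[0]
        netsys = host.configManager.networkSystem

        # Two vSwitches, each bonded to exactly one physical uplink.
        for name, pnic in (("vSwitch-A", "vmnic0"), ("vSwitch-B", "vmnic1")):
            spec = vim.host.VirtualSwitch.Specification()
            spec.numPorts = 128
            spec.bridge = vim.host.VirtualSwitch.BondBridge(nicDevice=[pnic])
            netsys.AddVirtualSwitch(vswitchName=name, spec=spec)

        Disconnect(si)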

    Too bad I can't attach an SLA to this configuration as it stands. I guess I can't get rid of the NetScreens just yet.

    I'll happily contribute out of pocket toward that bounty - drop me a PM when you post it.

MattMeyer:

    I don't know about you, but I use 802.1q port groups for my vSwitches. One thing I have not tried yet is creating a dedicated port group for pfSense (using the same VLAN ID), and then modifying the port group to use only a single outbound NIC instead of inheriting the config of the vSwitch. In theory it should work, as long as the pfSense boxes are on separate physical hosts, since you are guaranteeing only one pNIC is being used for multicast traffic.
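
    Something like the following is the idea - a hedged, untested pyVmomi sketch reusing the netsys handle from the snippet earlier in the thread (port group name, VLAN ID, and vmnic are placeholders):

        # Hedged sketch; assumes the `netsys` HostNetworkSystem handle from
        # the earlier snippet. Name, VLAN ID, and vmnic are placeholders.
        spec = vim.host.PortGroup.Specification()
        spec.name = "pfSense-CARP"     # dedicated port group for the firewall VM
        spec.vlanId = 100              # same 802.1q VLAN ID as the existing port group
        spec.vswitchName = "vSwitch0"

        # Override the inherited vSwitch teaming policy: pin this port group
        # to a single active uplink so CARP multicast only uses one pNIC.
        policy = vim.host.NetworkPolicy()
        policy.nicTeaming = vim.host.NetworkPolicy.NicTeamingPolicy()
        policy.nicTeaming.nicOrder = vim.host.NetworkPolicy.NicOrderPolicy()
        policy.nicTeaming.nicOrder.activeNic = ["vmnic0"]
        policy.nicTeaming.nicOrder.standbyNic = []
        spec.policy = policy

        netsys.AddPortGroup(portgrp=spec)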

    I'll give this a try and post my results.

MattMeyer:

    Nope, no dice. The only way I was able to get it to work was to remove the pNIC connection to the virtual switch. Disabling the NIC or placing it into standby did not do it.
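
    For reference, what finally worked corresponds to dropping the uplink bond entirely (same hedged, untested pyVmomi sketch as above, netsys as before; the vSwitch name is a placeholder):

        # Hedged sketch; assumes the `netsys` handle from the earlier snippets.
        for vsw in netsys.networkInfo.vswitch:
            if vsw.name == "vSwitch1":       # placeholder vSwitch name
                spec = vsw.spec
                spec.bridge = None           # detach all physical uplinks
                netsys.UpdateVirtualSwitch(vswitchName=vsw.name, spec=spec)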

bobwondernut:

    Same here (and, yes, we're using VLAN trunking as well) - any assignment of a second NIC into a vSwitch, whether disabled, standby, or active, immediately results in CARP turning pink.

MattMeyer:

    If you can get it working in some crazy workaround way, please post. I beat my head against this for a while, weighing the pros and cons, and in the end decided that a single VM in a DRS/HA cluster is better than a single physical box. I'm taking a risk with the SLA, but a risk I can afford to take. I really don't want to go physical, as it goes against our philosophy. We are on a quest to get 100% virtual (minus the ESX hosts). All our servers, regardless of size or performance needs, are now virtual. We are also working to convert our Cisco routers to Vyatta, moving those devices into VMs. The only things that will remain physical are the layer-2 switches, and there is nothing I can really do about that. Moving pfSense back out to physical just was not an option for us.

cmb:

    Never seen this personally, and I have CARP working fine in ESX, but it appears to be a CARP bug triggered by a VMware bug. VMware loops the multicast back to the system in some circumstances (exactly what those circumstances are is unknown), which should never happen, and CARP sees it as another host sending it multicast. The same looping would happen to VRRP traffic, but VRRP in Linux must ignore traffic from itself.
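
    A toy model of that failure mode (illustrative Python only, not the FreeBSD carp(4) sources; the master's timer comparison is simplified to the advbase/advskew pair):

        # Toy model: a master that hears an advertisement at least as
        # "good" as its own steps down to BACKUP, so a vSwitch that loops
        # our own multicast back demotes us. A VRRP-style fix is to drop
        # packets sourced from our own address.
        MASTER, BACKUP = "MASTER", "BACKUP"

        class CarpHost:
            def __init__(self, ip, advbase=1, advskew=0, drop_own=False):
                self.ip = ip
                self.advbase = advbase    # seconds between advertisements
                self.advskew = advskew    # skew; lower wins the election
                self.state = MASTER
                self.drop_own = drop_own  # the self-source check CARP lacks

            def advertise(self):
                # (src_ip, advbase, advskew) as carried in the advertisement
                return (self.ip, self.advbase, self.advskew)

            def receive(self, adv):
                src, advbase, advskew = adv
                if self.drop_own and src == self.ip:
                    return  # ignore our own looped-back packet
                if self.state == MASTER and (advbase, advskew) <= (self.advbase, self.advskew):
                    self.state = BACKUP

        buggy = CarpHost("10.0.0.2")
        fixed = CarpHost("10.0.0.3", drop_own=True)
        for h in (buggy, fixed):
            h.receive(h.advertise())  # the vSwitch loops multicast back
        print(buggy.state)  # BACKUP - demoted by its own advertisement
        print(fixed.state)  # MASTER - self-sourced traffic ignored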

    Matthew Grooms, a pfSense developer, ran into this recently and found the cause described above:
    http://thread.gmane.org/gmane.os.freebsd.devel.net/26286

    He has a patch, but we're currently unsure of its correctness and what potential ramifications it could have. It's unlikely that you'll see this patch in 1.2.3, but hopefully it will land at some point in the future, if a FreeBSD developer can review it and give their blessing.

quentin:

    Solved, with a workaround. See my other posting with the subject: VMWARE ESX 3.5 / vSwitch w/ 2 Physical NICs / CARP / PFSense 1.2.3
    NIC teaming/failover in vSphere seems to be the problem.

    Best regards,

    Quentin
