CARP seems to work everywhere but on LAN interface
I am not very skilled with pfSense but have been tasked with building a failover pair. I have been working on this for quite some time now and have made good progress, but am now stuck. I have read through quite a few questions on the forum but did not find anything relevant to this issue; any help would be much appreciated.
-HP C7000 Blade Center
-Has 2 built-in WS-CBS3020-HPQ Cisco switches. These are connected via trunk links to
-2 Cisco 2960Gs
I currently have the pfSense firewalls set up on the virtual platform, plus some servers connected to the LAN. The firewalls were set up according to http://mirror.qubenet.net/mirror/pfsense/tutorials/carp/carp-cluster-new.htm . There were a few small differences, as I gather that tutorial is a bit old, but it was still very helpful.
Currently I can ping all the outside IPs from externally, including the CARP IP. From the active firewall I can ping all of the IPs, including the servers' IPs, and it has ARP entries for everything except the CARP IPs. From the standby firewall I cannot ping the CARP IPs, but there are entries for them in its ARP table. I was having an issue before, but I changed Net.ReversePathFwdCheckPromisc as per this document http://doc.pfsense.org/index.php/CARP_Configuration_Troubleshooting , and this all now works as I would expect, which is great.
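For anyone hitting the same duplicate-packet symptom, here is a sketch of how that setting can be checked and changed from the ESXi host shell. This assumes ESXi 5.x esxcli syntax; on ESX(i) 4.x the same option lives under Configuration → Advanced Settings → Net in the vSphere Client.

```shell
# List the current value of the reverse path forwarding check for
# promiscuous-mode port groups. When this is 0 and promiscuous mode is
# enabled on a vSwitch with multiple uplinks, guests can receive
# duplicate copies of each packet.
esxcli system settings advanced list -o /Net/ReversePathFwdCheckPromisc

# Enable the check so CARP traffic is not duplicated back to the VMs.
esxcli system settings advanced set -o /Net/ReversePathFwdCheckPromisc -i 1
```

Note that VMs with promiscuous-mode NICs may need their port toggled (or the VM power-cycled) for the change to take effect.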
The issue is that the servers can ping each other, as well as all the real IP addresses of the firewalls, but cannot ping the CARP IP for the LAN subnet, which is their default gateway. As you can imagine, that is a real problem. Please help.
cmb
Sounds like you're missing promiscuous mode on the vswitch involved there.
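For a standard vSwitch, the security policy can be verified and flipped from the ESXi shell. A sketch, assuming ESXi 5.x esxcli syntax; vSwitch0 is a placeholder name, and a Distributed Switch's policy is edited from vCenter instead, not via esxcli:

```shell
# Show the current security policy; look for AllowPromiscuous.
esxcli network vswitch standard policy security get -v vSwitch0

# Allow promiscuous mode so the guest NIC sees frames addressed to the
# shared CARP MAC/IP rather than only its own MAC.
esxcli network vswitch standard policy security set -v vSwitch0 --allow-promiscuous=true
```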
One other potential issue you could be hitting: there is a bug in VMware's VDS upgrade code. If you use a VDS (Virtual Distributed Switch) in 4.0 or 4.1 and upgrade from 4.0 to 4.1 or 5.0, the VDS will not properly pass traffic for anything other than the real MAC addresses (which breaks CARP). If you create a new VDS on 4.1 or 5.0 it will work, but the upgraded VDS will not. The quickest workaround is to create a new VDS and delete the old one. VMware has said a fix is coming at some point, but that's the only workaround for the time being.
Thanks for the prompt reply cmb! Promiscuous mode is enabled on the switch. I have spoken to our systems administrator and he says that the virtual switch is using version 4.1 and that he was planning on upgrading it soon. He said it is not possible to create a new VDS, though, as any issues could affect the entire platform, which has hundreds of VMs on it, and there would be no recovery other than to revert to standard switches, which would mean a long downtime to get all the VMs back online, VLAN by VLAN :(
I noticed you said that was the quickest workaround; I am hoping you know another way we can get this working? Perhaps a way of upgrading the version that would make it work?
cmb
There is no workaround other than deleting and recreating the VDS. Once you've upgraded it, it's internally broken in a way you cannot fix any other way. One of our customers had to spend hours on multiple calls with VMware (some of those with me on the line) to get them to confirm and find the problem, by which point we were working with very senior engineers at VMware. It may be fixable by getting into the unsupported console and making manual edits to something deep in the system, but VMware refused to disclose any such options, if they do exist. I suspect they would be far, far riskier than just deleting and recreating the VDS anyway.
I have spoken to our systems administrator and he says that the virtual switch is using version 4.1 and that he was planning on upgrading it soon. He said it is not possible to create a new VDS, though, as any issues could affect the entire platform, which has hundreds of VMs on it. There would be no recovery other than to revert to standard switches, which would mean a long downtime to get all the VMs back online, VLAN by VLAN
The setup where we found out the hard way about this new VMware bug has several hundred VMs and a ton of VLANs too. The claim that there's no recovery doesn't really add up for most cases, though it may, depending on exactly what you have set up there. You wouldn't have to revert back to standard switches unless something really, really blew up, which is unlikely. We had no issue recreating the upgrade-broken VDS on the aforementioned setup.
Thanks, I will discuss it with him. Thanks so much for the help!