HP Virtual Connect and CARP with VMware
-
Does anyone have experience setting up a pfSense CARP cluster on VMware running on HP blades, with Virtual Connect as the underlying network layer?
We have created our setup as follows:
All blades run ESXi 5.1, with a vSphere server managing them for HA, DRS etc.
2x pfSense VMs running on different blades, with affinity rules to keep them from running on the same blade if at all possible.
2x uplinks to the data center network with a fully routed setup for CARP (/29 IPs for CARP etc. and /24 IPs behind pfSense for VM-Network).
HP VC-Enet configured to support VLANs, using 2 modules for redundancy. One DC uplink goes to each module.
A vNet is created for each DC uplink with a different VLAN ID, and this is passed to nic0 on each blade.
In VMware networking we then created a separate vSwitch for each vNet on the Virtual Connect modules, with matching VLAN IDs.
This gives us the ability to have 2 pfSense VMs running with two WAN connections, each mapped to a separate DC uplink, in theory giving failover if a DC uplink is lost.
The issue is this: if we pull the DC uplink cable on module 1, the master firewall does not detect a link-down event because of the virtualisation layers; it merely experiences packet loss.
The backup firewall then takes the WAN IP as master, but the other connections, including the VM-Network CARP IP, stay as backup.
The result is that all VMs lose internet connectivity, because their gateway is still on the master firewall but the WAN IP has shifted to the backup firewall.
Plugging the DC uplink back in restores everything in a second or two.
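To make the split state easier to reason about, here is a toy Python model of a per-VIP election. It is only a sketch under stated assumptions, not how pfSense or CARP is implemented: the names `Firewall`, `masters`, `connected` and `demoted` are made up for illustration, the advskew values 0 and 100 are examples, and the jump to 240 on a noticed failure is modelled loosely on the preempt behaviour described in the FreeBSD carp(4) man page.

```python
from dataclasses import dataclass, field

VIPS = ("WAN", "VM-Network")


@dataclass
class Firewall:
    name: str
    advskew: int  # lower value wins the election
    # VIP groups on which this node can still reach the shared segment
    connected: set = field(default_factory=lambda: set(VIPS))
    # True once the node itself has noticed a failure (e.g. a link-down event)
    demoted: bool = False

    def effective_skew(self) -> int:
        # Assumption: a node that notices a failure backs off on every VIP at once
        return 240 if self.demoted else self.advskew


def masters(nodes):
    """Map each VIP group to the node that ends up serving it."""
    result = {}
    for vip in VIPS:
        # A node that is cut off on this segment cannot serve the VIP there
        candidates = [n for n in nodes if vip in n.connected]
        result[vip] = min(candidates, key=lambda n: n.effective_skew()).name
    return result


if __name__ == "__main__":
    fw1 = Firewall("fw1", advskew=0, connected={"VM-Network"})  # WAN uplink pulled
    fw2 = Firewall("fw2", advskew=100)
    print(masters([fw1, fw2]))  # WAN moves to fw2, VM-Network stays on fw1
    fw1.demoted = True          # what a visible link-down event would cause
    print(masters([fw1, fw2]))  # both VIPs move to fw2
```

In this model the only difference between the broken case and the clean failover is whether the master ever learns that something failed, which is the information the virtualisation layers are hiding here.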
Now, if I reboot the master firewall, all CARP IPs shift to the backup firewall as expected, states are synced fully via CARP and the backup takes over without a hitch. When the master comes back up it takes full control back in the same manner, with barely a ping dropped, all states fully synced and no connections dropped. The backup firewall switches all its CARP IPs to a backup state and sits there dutifully with no drama.
So I have a pretty much working CARP setup in one sense, but quite a vulnerable one in another sense.
What I would like to achieve is this:
Master and backup firewalls both as VMs, with enough flexibility to vMotion around on blades, i.e. not tied to a specific NIC or blade. (currently have this)
CARP syncing on a VLAN interface between pfSense VMs (currently have this, using a vNet with its own VLAN ID and no uplink, so it's purely a private link between pfSense VMs with no multicast pollution on the network)
DC uplink redundancy with the ability to lose one uplink or one switch and have the backup firewall take over seamlessly (have this in theory but needs work)
VM-Network switch redundancy (have this with 2x VC modules and the VM-Network vNet running on both)
Is there a better way for me to configure this, or has anyone any experience doing so?
Thanks.
-
Nobody any input on this one?
-
Would it be better to run these as high availability rather than CARP fail-over?
-
How exactly do you mean run them as high availability?
-
I was not thinking. If I were to attempt redundancy the way you want, I'd end up either pulling my hair out or using separate boxes for everything.
-
If that has not been answered yet...
There is a setting in Virtual Connect (can't remember where) that forces all downlink ports (the server-facing ones) offline if your uplinks are disconnected.
This works flawlessly with "hardware" servers (I'm doing it this way: I have 2 firewalls acting as perimeter devices on a dual GigE link).
Now you would need to implement the same trick in VMware in your case, so the chain of events would look like this:
1- Datacenter link goes down
2- VC module reacts by putting all downlinks down. -> my failover happens here because of the NIC going down
--------------------- Your part starts here ---------------------------
3- Uplink of VMware vSwitch goes down (consequence of the NIC being disconnected by the VC module)
4- Have the vSwitch do the same thing the VC module did: no uplink, so kill all the downlinks.
5- This would trigger failover correctly.
And go see this:
http://h30499.www3.hp.com/t5/HP-BladeSystem-Virtual-Connect/Network-failover-not-working-with-ESXi5-hp-Virtual-Connect/td-p/5501209#.UmyhSxDO35w
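The chain of events above can be sketched as a small Python model of link state being passed down through the layers. This is illustrative only: the `Port` class, the `propagate` flag and the port names are hypothetical, and step 4 (a vSwitch dropping its VM-facing ports when its uplink is lost) is the behaviour proposed in this post, not something the thread confirms VMware offers.

```python
from dataclasses import dataclass, field


@dataclass
class Port:
    name: str
    propagate: bool = False  # pass link state on to the ports below this one
    up: bool = True
    downstream: list = field(default_factory=list)

    def set_link(self, up: bool) -> None:
        self.up = up
        if self.propagate:
            for port in self.downstream:
                port.set_link(up)


def build_chain(vc_propagates: bool, vswitch_propagates: bool):
    """DC uplink -> blade NIC (the vSwitch uplink) -> pfSense WAN vNIC."""
    vnic = Port("pfsense-wan-vnic")
    # Hypothetical: the vSwitch drops VM ports when its uplink (the blade NIC) drops
    blade_nic = Port("blade-nic0", propagate=vswitch_propagates, downstream=[vnic])
    # The VC module drops server-facing ports when the DC uplink drops
    dc_uplink = Port("dc-uplink-1", propagate=vc_propagates, downstream=[blade_nic])
    return dc_uplink, blade_nic, vnic


if __name__ == "__main__":
    for vc, vs in [(False, False), (True, False), (True, True)]:
        dc_uplink, blade_nic, vnic = build_chain(vc, vs)
        dc_uplink.set_link(False)  # step 1: datacenter link goes down
        print(f"VC={vc} vSwitch={vs}: blade NIC up={blade_nic.up}, vNIC up={vnic.up}")
```

With only the VC setting enabled, the blade NIC goes down but the firewall VM's vNIC still looks up, which matches the gap between a physical firewall and the virtualised case described above.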
-
I have no blade-related experience, but did you put the vSwitch ports that connect to your pfSense VMs, and which are to take part in CARP, into a port group that has promiscuous mode enabled?
I usually create a duplicate port group (same VLAN, same vSwitch) which has promiscuous mode enabled, and put the pfSense interfaces into that port group, and all VMs that use the pfSense as a gateway into the port group with promiscuous mode disabled.
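As a rough illustration of that layout, the sketch below describes the two port groups as plain Python data and checks that every interface sits in the right one. Nothing here talks to vSphere; the `PortGroup` class, the `check_layout` helper, the port group names, the interface names and VLAN 100 are all made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class PortGroup:
    name: str
    vswitch: str
    vlan_id: int
    promiscuous: bool
    members: list = field(default_factory=list)  # names of attached VM interfaces


def check_layout(groups, carp_interfaces):
    """Return a list of problems; an empty list means the layout matches."""
    problems = []
    for group in groups:
        for member in group.members:
            is_carp = member in carp_interfaces
            if is_carp and not group.promiscuous:
                problems.append(f"{member} needs a promiscuous port group, not {group.name}")
            if not is_carp and group.promiscuous:
                problems.append(f"{member} should not sit in promiscuous group {group.name}")
    return problems


if __name__ == "__main__":
    carp = {"pfsense1-lan", "pfsense2-lan"}
    groups = [
        # Same vSwitch and VLAN, differing only in the promiscuous setting
        PortGroup("VM-Network-CARP", "vSwitch1", 100, True, ["pfsense1-lan", "pfsense2-lan"]),
        PortGroup("VM-Network", "vSwitch1", 100, False, ["web01", "db01"]),
    ]
    print(check_layout(groups, carp))  # no problems expected for this layout
```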