On failover, master stays master and backup also becomes master (ESXi 4.0)

vmwpfsense

I've run into a head scratcher. I've configured dual firewalls which sync OK and NAT OK, with outbound packets sourced from the WAN (CARP) VIP. When I pull the WAN plug from the master firewall, the backup firewall takes over the CARP (VRRP-style) multicasts and becomes master. So far so good.

The problem is that the master still thinks it's master too. Both firewalls are running as virtual machines in ESXi 4.0 (I have promiscuous mode enabled, etc. on the port group). The WAN interface status on the master firewall still indicates it's up, while the vmnic indicates it's down.

I'm maybe naively thinking the link-down indication did not get propagated to the firewall VM? Any ideas from the pros out there?

jimp

See point 4 here:

http://doc.pfsense.org/index.php/CARP_Configuration_Troubleshooting

Supermule

Are you running CARP in 2 virtual machines on the same host?

vmwpfsense

Thanks jimp, I've already set the port group security settings per point 4.

Supermule: I am running CARP in 2 virtual machines on the same host. The WAN and sync ports are all in separate vSwitches (4 total), each with different vmnics (pNICs). I've externally looped back the vmnics. The WAN ports connect to an external switch which connects to my internet connection. The master stays master when I remove the WAN Ethernet cable going to the external switch. (This is a staging environment; I will move everything into a data center soon and migrate one WAN and one sync port group to another machine.)

vmwpfsense

I've narrowed the issue to the pfSense VM not getting notified that the WAN link is down.
After the cable pull, ifconfig on the pfSense VM still shows the interface as active. Only after I
deselect the "Device Status connected" option does pfSense fail over. Selecting the "Device Status
connected" option again brings the link up, and the failback occurs correctly.
I still don't know why ESXi is not notifying the pfSense VM that the link is down ...
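
For anyone who wants to repeat the check, here is a rough Python sketch of what I was looking at: it reads ifconfig-style text and pulls out the link status and CARP state per interface. The sample text, the interface names (em0, carp0), the addresses, and the function names are made up for illustration, and the exact ifconfig layout may differ between pfSense/FreeBSD versions, so treat it as a sketch rather than a tested tool.

```python
import re

# Illustrative ifconfig-style text; names and addresses are assumptions,
# not output captured from the firewalls in this thread.
SAMPLE = """\
em0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
    ether 00:0c:29:aa:bb:cc
    inet 192.0.2.2 netmask 0xffffff00 broadcast 192.0.2.255
    media: Ethernet autoselect (1000baseT <full-duplex>)
    status: active
carp0: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
    inet 192.0.2.1 netmask 0xffffff00
    carp: MASTER vhid 1 advbase 1 advskew 0
"""


def parse_ifconfig(text):
    """Return {interface: {"status": ..., "carp": ...}} from ifconfig-style text."""
    info = {}
    current = None
    for line in text.splitlines():
        # An unindented "name: flags=" line starts a new interface section.
        header = re.match(r"^(\S+): flags=", line)
        if header:
            current = header.group(1)
            info[current] = {"status": None, "carp": None}
            continue
        if current is None:
            continue
        stripped = line.strip()
        if stripped.startswith("status:"):
            # e.g. "status: active" or "status: no carrier"
            info[current]["status"] = stripped.split(":", 1)[1].strip()
        elif stripped.startswith("carp:"):
            # e.g. "carp: MASTER vhid 1 advbase 1 advskew 0"
            info[current]["carp"] = stripped.split()[1]
    return info


def link_is_up(info, ifname):
    """True when the guest reports the interface's link status as active."""
    return info[ifname]["status"] == "active"


if __name__ == "__main__":
    parsed = parse_ifconfig(SAMPLE)
    print("em0 link up:", link_is_up(parsed, "em0"))
    print("carp0 state:", parsed["carp0"]["carp"])
```

If the guest keeps reporting the WAN interface as active while the vmnic is down on the host, that matches what I'm seeing: the link state change never reaches the VM.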

Supermule

Have you configured it correctly under the vSwitch properties?

vmwpfsense

I think the vSwitches are configured correctly: the WAN, WAN1, CARP, and CARP1 vSwitches (and port groups) are all set to Accept for promiscuous mode, MAC address changes, and forged transmits. Network failover detection is link status only, with no standby adapters.

bobwondernut

This has happened to me in the past when a vSwitch is bound to multiple NICs attached to a physical switch that isn't configured correctly for trunking. The multicast CARP announcement goes up one leg of the trunk; the physical switch sees it, doesn't know that the other uplinks are trunk members, and sends the same multicast packet right back up the other links to the ESX server. ESX passes the packet back into pfSense, which thinks someone else is already answering for that CARP IP, and the result is that it is either marked down or both members end up as master.

The solution is either to remove any redundant uplinks from the vSwitch (whether active or not; if they are physically in the config they will break it) or to configure the physical switch properly for trunking.

ESX doesn't support LACP, so the latter gets a little tricky. Juniper gear needs a recent Junos and ae interfaces configured with LACP mode set to none. You can't use beaconing. Cisco switches should be set up with separate EtherChannel configs for the trunks.