Odd problem with CARP w/ multiple firewalls

seanlee

I have an interesting problem. I have an environment where many pfSense firewalls (about 16 instances) share the same WAN switch, which apparently is causing problems with CARP. Regardless of how many different class C blocks I have, it seems I can only use a total of 254 VHIDs combined across all firewalls on a single physical network. For example, firewall-1 can have 30 CARP VIPs, firewall-2 can have 150, and so on, as long as the total does not exceed 254. Allow me to explain the environment a little better…

pfSense 1.2.3-RELEASE on VMware (vSwitches have promiscuous mode enabled)
multiple firewalls on multiple vmware ESX servers
all ESX servers have an uplink to the same physical L3 switch (which goes out to the internet)

For this example, I'll stick with 2 sets of firewalls (primary + HA). Each firewall has a WAN, LAN and FAILOVER interface. CARP is set up properly, using the FAILOVER interface to sync. The WAN network for each set of firewalls is a completely different /24 public IP space.

ESX Server #1
firewall-1a (primary) example WAN is 65.66.67.1/24
firewall-1b (failover)

ESX Server #2
firewall-2a (primary) example WAN is 230.240.250.1/24
firewall-2b (failover)

On firewall-1a, I have 100 CARP VIPs set up (using VHIDs 1-100), which are syncing over to firewall-1b without any problems.

On firewall-2a, I also have 100 CARP VIPs set up. Now... if I start assigning VHIDs again at 1 (even though it is a separate network), firewall-1a and firewall-1b seem to fight with firewall-2a and firewall-2b over the VIPs. The VIPs sometimes stop answering (HTTP/HTTPS), and sometimes I see an ARP entry for a firewall-2a VIP in the ARP table on firewall-1a. Alternatively, if I start assigning VHIDs at 101 on firewall-2a, everything is fine. It's acting as though I can only use a total of 254 VHIDs on a single physical link... even though I am using 2 different class C's on 2 different firewalls...

Any ideas?

Thanks,

-Sean

jimp

That is correct. The CARP protocol only supports 254 unique VHIDs on a single broadcast domain.
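As a rough illustration of why the IP subnet does not matter here: CARP derives the virtual MAC address of a VIP from the VHID alone, using the same 00:00:5e:00:01:xx prefix as VRRP. The helper name below is made up for this sketch and is not part of pfSense.

```python
def carp_virtual_mac(vhid: int) -> str:
    """Return the virtual MAC that CARP associates with a VHID.

    Hypothetical helper for illustration only. The VHID is a single
    byte in the CARP advertisement, so it must fit in 8 bits; the
    usable count quoted in this thread is 254.
    """
    if not 0 < vhid < 256:
        raise ValueError(f"VHID must fit in one byte, got {vhid}")
    # The last octet of the virtual MAC is the VHID itself.
    return f"00:00:5e:00:01:{vhid:02x}"


# Two unrelated firewall pairs that both pick VHID 1 end up with the
# same virtual MAC, regardless of which /24 their VIPs live in.
pair_1_mac = carp_virtual_mac(1)
pair_2_mac = carp_virtual_mac(1)
print(pair_1_mac, pair_2_mac, pair_1_mac == pair_2_mac)
```

So two pairs on the same layer 2 segment that reuse a VHID are advertising for the same virtual host as far as the switch and the other CARP nodes can tell.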

We recently tried to bump that up to 65,536 on 2.0 but it was … problematic.

Instead 2.0 was changed to allow you to make IP aliases on CARP VIP interfaces. So you could have multiple IPs per VHID, circumventing all limits without causing the increased network load that 65,536 CARP heartbeats per second would incur.
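A minimal sketch of what that layout looks like on paper, assuming one CARP VIP per firewall pair and the remaining addresses attached to it as IP aliases. The function and dictionary keys are invented for this example and do not correspond to pfSense configuration fields.

```python
import ipaddress


def plan_vhid_with_aliases(vhid, addresses):
    """Group addresses under one VHID: the first becomes the CARP VIP,
    the rest become IP aliases on top of it (illustrative layout only)."""
    addresses = [str(a) for a in addresses]
    if not addresses:
        raise ValueError("need at least one address")
    return {
        "vhid": vhid,
        "carp_vip": addresses[0],
        "ip_aliases": addresses[1:],
    }


# 100 addresses from the example WAN block in this thread.
hosts = list(ipaddress.ip_network("65.66.67.0/24").hosts())
plan = plan_vhid_with_aliases(1, hosts[1:101])

print(plan["carp_vip"], len(plan["ip_aliases"]))
```

With this layout the 100 addresses consume a single VHID, so only one VHID is sending heartbeats instead of 100.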

seanlee

Thanks for the response… in all the Googling I did, I never found this limit documented. Let me ask you this: if the limit is 254 in a single broadcast domain, how is this limit hit when the FAILOVER interfaces for each pair of redundant firewalls are on a separate VLAN? I also tried putting the firewalls on separate vSwitches in ESX... I assume it doesn't matter once the traffic hits the physical switch? Would VLAN-ing the WAN interfaces (WAN-1, WAN-2 etc.) eliminate this problem?

Thanks,

-Sean

jimp

The CARP heartbeats happen on the interface where the CARP VIPs reside, not the sync/"failover" interface.

You could have, say, a CARP VIP with a VHID of 1 on WAN and another on LAN with a VHID of 1, but no two WANs that end up in the same broadcast domain could have the same VHID.
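To make that rule concrete, here is a small sketch that checks a VHID plan for clashes. It assumes you can label each interface with the broadcast domain it ends up in; the names "wan-switch", "lan-1" and so on are placeholders, and the function is not a pfSense tool.

```python
from collections import defaultdict


def find_vhid_conflicts(assignments):
    """Return {(broadcast_domain, vhid): [pairs...]} for every VHID that
    more than one firewall pair uses in the same broadcast domain.

    assignments: iterable of (firewall_pair, broadcast_domain, vhid).
    """
    owners = defaultdict(set)
    for pair, domain, vhid in assignments:
        owners[(domain, vhid)].add(pair)
    return {key: sorted(pairs) for key, pairs in owners.items() if len(pairs) > 1}


# Both pairs share the WAN switch and both start numbering at 1.
clashing = [("firewall-1", "wan-switch", v) for v in range(1, 101)]
clashing += [("firewall-2", "wan-switch", v) for v in range(1, 101)]

# Same setup, but firewall-2 starts at 101; VHID 1 is also reused on
# two LANs that are separate broadcast domains, which is fine.
clean = [("firewall-1", "wan-switch", v) for v in range(1, 101)]
clean += [("firewall-2", "wan-switch", v) for v in range(101, 201)]
clean += [("firewall-1", "lan-1", 1), ("firewall-2", "lan-2", 1)]

print(len(find_vhid_conflicts(clashing)), len(find_vhid_conflicts(clean)))
```

The first plan reports a conflict on every one of the 100 VHIDs; the second reports none, which matches what was observed when the second pair started at VHID 101.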

seanlee

Great. Thanks for clearing this up for me.

-Sean