Weird possibly CARP-related behavior with single firewall

hefferbub

I recently did a major switchover of core network equipment, replacing some older HP 1G switches with 10G-capable UniFi Pro models and installing a new version of PFSense + on a 10G-capable Supermicro server.

I now run one 10G physical connection each for LAN and WAN, with 5 subnets defined as VLANs on the LAN interface and on the switches.

In PFSense I am planning to implement a CARP/HA setup with a second identical firewall server. However, I won’t have access to enough WAN IP addresses until fall.

To get ready for that, I went ahead and defined each of my LAN-side VLAN interfaces with CARP in mind, so I created the interface IPs as x.x.x.11 and then created a CARP VIP for each as x.x.x.1. This way when I am ready to add the second PFSense box, nothing much will need to change on these subnets. I tell DHCP to give out x.x.x.1 as the gateway, DNS and NTP server.
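For illustration, the addressing plan above can be sketched with Python's standard `ipaddress` module. The subnet `192.168.10.0/24` is a made-up placeholder (my real subnets are elided as x.x.x), `carp_plan` is a hypothetical helper, and giving the future second firewall the next address up (.12) is only an assumption.

```python
import ipaddress


def carp_plan(subnet, vip_host=1, first_node_host=11, nodes=2):
    """Return the CARP VIP and per-firewall interface addresses for one subnet.

    vip_host=1 and first_node_host=11 mirror the x.x.x.1 / x.x.x.11 scheme
    described above; nodes=2 assumes a second firewall is added later.
    """
    net = ipaddress.ip_network(subnet)
    base = net.network_address
    return {
        # Address handed out by DHCP as gateway, DNS and NTP server.
        "vip": str(base + vip_host),
        # One real interface address per firewall node.
        "nodes": [str(base + first_node_host + i) for i in range(nodes)],
    }


# Placeholder subnet, not one of the real VLANs.
plan = carp_plan("192.168.10.0/24")
print(plan)
```

With only one firewall installed, just the first entry in `nodes` is actually in use.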

Anyway, this all works great on 4 of the 5 subnets, but on one I get anomalous behavior that comes and goes on a cycle of roughly 15 minutes. This subnet is physically distributed on my campus through one HP switch that links to 3 other HP switches.

So the weird behavior is that on half of this subnet (2 of the 4 switches) the CARP address becomes unavailable periodically. That is, those machines can ping the actual interface address x.x.x.11, but they cannot ping the CARP address x.x.x.1. 10-15 minutes later, they can ping it and everything comes up fine for a while, then the problem recurs.
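To pin down the timing, something like the following could be run from a host on the affected half of the subnet. It is a rough sketch, not a tested tool: it assumes a Linux host whose `ping` accepts `-c` (count) and `-W` (timeout in seconds), and the addresses in the usage note are placeholders for the real x.x.x.1 and x.x.x.11. The names `ping_once`, `transitions` and `monitor` are hypothetical.

```python
import subprocess
import time
from datetime import datetime


def ping_once(host, timeout_s=1):
    """Send one ICMP echo; True if a reply arrived (assumes Linux ping flags)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def transitions(samples):
    """Reduce (timestamp, reachable) samples to the points where state changes."""
    changes = []
    last = None
    for ts, up in samples:
        if up != last:
            changes.append((ts, up))
            last = up
    return changes


def monitor(hosts, interval_s=10, rounds=360, ping=ping_once):
    """Ping each host every interval_s seconds; return state changes per host."""
    samples = {h: [] for h in hosts}
    for _ in range(rounds):
        now = datetime.now().isoformat(timespec="seconds")
        for h in hosts:
            samples[h].append((now, ping(h)))
        time.sleep(interval_s)
    return {h: transitions(s) for h, s in samples.items()}
```

Calling `monitor(["192.168.10.1", "192.168.10.11"])` with the defaults samples both addresses for about an hour. Comparing the timestamps at which the CARP address flips between reachable and unreachable on both halves of the subnet should show whether the cycle really is around 15 minutes.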

On the other half of the subnet, it works all the time, as it does on several other similar subnets.

Obviously one would suspect the switches that feed the failing part of the network. My remote switches are a variety of HP 1910 and 1920 series small business models (a product line HP acquired from 3Com).

There was one in a key position that was an oddball 8-port model that I thought might behave differently. I swapped it for a newer switch and things started working, until they failed again 15 minutes later.

Finally, to restore service I had to change DHCP to hand out the x.x.x.11 interface address as the gateway, and it then worked consistently. On the other subnets it continues to work without issue on x.x.x.1.

I’m racking my brain to understand how this behavior might happen, and wondered if anyone has thoughts. The switches are all layer 2 and VLAN-aware.

So how could the CARP address intermittently become unable to respond to hosts on half of a subnet?

Am I misguided in thinking a non-paired CARP interface can be used in this manner?

Any clues, guesses or advice would be welcome.

Derelict

@hefferbub said in Weird possibly CARP-related behavior with single firewall:

so I created the interface IPs as x.x.x.11 and then created a CARP VIP for each as x.x.x.1.

The interface IP addresses need to be different. Like x.x.x.11 and x.x.x.12.

hefferbub

@derelict At the moment, I only have one firewall.

I will add the other one later when I have more WAN addresses.