CARP causes NICs to be unresponsive

nicholaswy

Within 12 hours of turning on HA, we can no longer reach the webGUI of the Master, ping the pfSense machine, or ping anything from the pfSense machine. Before we enabled HA, everything was working fine. After hitting this issue once, we did a fresh install and configuration, and we are seeing the same problem with HA again.

Our setup is two pfSense 2.3 firewalls with CARP VIPs on LAN and WAN. pfSync runs on a dedicated interface (allowing all traffic to pass) over a crossover cable, and the configuration sync runs on our primary VLAN. NAT is set up for all the VLANs. Failover itself works fine, but for some reason we lose connectivity to at least one machine. We check things and reboot the machine, and the next day it's down again.

We're at the point where we either get help in solving this issue or do without HA.

Addition: No issues in the logs or on the display.

Derelict

Are all of your interfaces identical on both nodes? It almost sounds like everything runs until someone makes a firewall rule change; then the firewall rules are synced and the rules on the wrong interface on the secondary get clobbered. That in itself should not affect overall connectivity through the primary/master, but it can cause all sorts of uncertainty.

You want to make sure everything on Status > Interfaces is matchy-matchy on both nodes. Physical interface names, optX interface names, etc. They should all be identical and in the same order.
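If eyeballing two browser tabs gets tedious, here is a rough sketch of that comparison in Python. It assumes you copy the interface assignments from each node by hand into ordered lists; the interface and device names below are made up for illustration, and `diff_interfaces` is a hypothetical helper, not anything pfSense ships.

```python
from itertools import zip_longest


def diff_interfaces(primary, secondary):
    """Return (slot, primary_entry, secondary_entry) for every slot that differs.

    Each argument is an ordered list of (logical_name, physical_device)
    tuples, in the order the node lists its interfaces.
    A missing entry on either side is reported as None.
    """
    mismatches = []
    for slot, (a, b) in enumerate(zip_longest(primary, secondary)):
        if a != b:
            mismatches.append((slot, a, b))
    return mismatches


# Hypothetical assignments, hand-copied from each node (not real data).
primary = [("wan", "em0"), ("lan", "em1"), ("opt1", "em2"), ("opt2", "em1_vlan20")]
secondary = [("wan", "em0"), ("lan", "em1"), ("opt1", "em1_vlan20"), ("opt2", "em2")]

for slot, a, b in diff_interfaces(primary, secondary):
    print(f"slot {slot}: primary={a} secondary={b}")
```

In this made-up example opt1 and opt2 are swapped on the secondary, which is exactly the kind of mismatch that would send synced rules to the wrong interface.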

Do you attempt access to the nodes on their interface addresses or are you just using the CARP VIPs?

Anything on the consoles?

If the master stops responding there should absolutely be a CARP failover event and the backup should take over. This should log all sorts of nice things in the system log on both nodes. You said "no issues" in the logs but what is there?
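To pull just the CARP state changes out of a saved copy of the system log, something like the sketch below may help. It assumes a transition line contains the word "carp" followed by something like "MASTER -> BACKUP"; the exact wording can differ between versions, so adjust the pattern to what your logs actually show. The sample lines are invented for illustration only.

```python
import re

# Assumed shape of a transition line: "carp" somewhere, then "<STATE> -> <STATE>".
# This is a guess at the format, not a documented guarantee.
TRANSITION = re.compile(
    r"carp.*?\b(INIT|BACKUP|MASTER)\s*->\s*(INIT|BACKUP|MASTER)\b",
    re.IGNORECASE,
)


def carp_transitions(lines):
    """Return (old_state, new_state, raw_line) for each line that looks like a CARP transition."""
    found = []
    for line in lines:
        match = TRANSITION.search(line)
        if match:
            found.append((match.group(1).upper(), match.group(2).upper(), line.strip()))
    return found


# Invented sample lines (not copied from any real system).
sample = [
    "Apr 16 14:02:11 fw1 kernel: carp: VHID 1@em1: MASTER -> BACKUP (example reason)",
    "Apr 16 14:05:40 fw1 sshd[1234]: some unrelated login message",
]

for old, new, raw in carp_transitions(sample):
    print(f"{old} -> {new}: {raw}")
```

Run it against the log from both nodes; if the master really went deaf and neither log shows a transition, that is worth knowing in itself.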

nicholaswy

Interface Names & Order are identical in Status > Interfaces.

I've tried both the CARP VIPs and the interface addresses. Neither the CARP VIP nor the Master's own address responds to pings or shows up in the ARP cache.

Nothing on the consoles either, other than login events.

Your last paragraph mentions something that puzzles me too. The backup doesn't take over or detect that the Master is down. However, when I test by disabling CARP on the Master, the backup does take over.

After rebooting the Master this morning I see a reconfiguration occurred on Saturday afternoon. That's the only entry between the reboots on Friday morning and this morning.

I'll try disabling the firewall rule sync to see if that helps.