Removing HA member causes switch lockup
-
Hi All,
I'm planning on moving/reconfiguring our pf HA install. It had been working fine for over a year. The firewalls are connected to an Extreme switch on the WAN side and an Aruba CX switch on the LAN side.
Yesterday morning I shut down our secondary pf and unplugged the network cables. Within an hour, our Aruba switch locked up and stopped passing traffic. (I could still reach it through the OOB Mgmt port, but I could not ping out from the switch either.) At this point, we did not know what the problem was.
All day and overnight, the switch kept locking up at intervals of anywhere from about 50 minutes to 2 hours, requiring a reboot each time. As part of troubleshooting, we unplugged all devices other than the single remaining pf and our internal core switch. We even replaced the switch, to no avail.
This morning on a whim, I plugged the secondary pf back in and the switch instantly came back to life without having to reboot!
So... what would cause this? Shouldn't we be able to run with one member of the HA cluster down/offline for more than an hour? Since I didn't see a problem with the WAN-side Extreme switch, could this be a bug in how the Aruba handles the CARP heartbeats?
Hoping to figure this out before I try to break the HA again (or before one member just dies).
Thanks,
J
-
Well,
I'm trying to replicate the issue in a test environment with a single pfSense box using CARP IPs and a spare Aruba CX switch. Of course, I can't.
I guess my next step is to actually set up an HA pair, then remove the secondary and see what happens...
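For the test, it may help to know which MAC address to look for in the switch's MAC table while the secondary is unplugged. Here is a minimal sketch, assuming the CARP VIPs use the usual VRRP-style virtual MAC of 00:00:5e:00:01 followed by the VHID in hex. The function name and the VHID values are only illustrative, not taken from our config.

```python
def carp_virtual_mac(vhid: int) -> str:
    """Return the virtual MAC a CARP VIP would use for the given VHID.

    Assumes the standard 00:00:5e:00:01:XX layout, where XX is the
    VHID (1-255) written as two hex digits.
    """
    if not 1 <= vhid <= 255:
        raise ValueError("VHID must be between 1 and 255")
    return f"00:00:5e:00:01:{vhid:02x}"


if __name__ == "__main__":
    # Hypothetical VHIDs; substitute the ones configured on the VIPs.
    for vhid in (1, 2):
        print(f"VHID {vhid}: {carp_virtual_mac(vhid)}")
```

Comparing these addresses against the switch's MAC table before and after pulling the secondary should at least show whether the switch still learns the virtual MAC on the expected port.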