CARP + failover group
I'm having some trouble with a multi-WAN network.
I have 2 pfSense routers in a CARP config. Each router has 2 uplinks to 2 different routers from different ISPs. The 2 gateways are in a failover gateway group.
Everything works fine in that I can kill either of the ISP connections and pfSense fails over as it should. If I shut down one of the pfSense routers, CARP does fine and switches over to the other one. Even in demoted CARP status everything works exactly as it should. Kudos. So far so good.
However, when one of the ISP routers goes down (it doesn't matter whether it's the tier 1 or tier 2), I get strange interruptions in connectivity. They're very short. In some cases ping testing doesn't even catch them (although from time to time a single echo is dropped), but the problem becomes obvious in RDP sessions (a slight hiccup, then it reconnects almost immediately) or when copying files through VPN (the amazingly robust Windows copy obviously fails when the hiccup happens).
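Since a once-per-second ping can miss drops this short, a faster probe may help pin down when they happen. The following is only a minimal sketch, assuming there is a reachable TCP service on the far side of the firewall; the host, port, interval, and the `probe`, `find_gaps`, and `run` helpers are all hypothetical placeholders, not anything pfSense provides:

```python
import socket
import time


def probe(host, port, timeout=0.2):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and unreachable networks.
        return False


def find_gaps(samples):
    """Collapse (timestamp, ok) samples into (start, end) spans of consecutive failures."""
    gaps = []
    start = None
    last_fail = None
    for ts, ok in samples:
        if not ok:
            if start is None:
                start = ts
            last_fail = ts
        elif start is not None:
            gaps.append((start, last_fail))
            start = None
    if start is not None:
        # The capture ended while the probe was still failing.
        gaps.append((start, last_fail))
    return gaps


def run(host, port, duration=60.0, interval=0.1):
    """Probe every `interval` seconds for `duration` seconds; return the failure spans."""
    samples = []
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        samples.append((time.time(), probe(host, port)))
        time.sleep(interval)
    return find_gaps(samples)
```

Running `run(...)` against a host behind each WAN while pulling the plug on an ISP router would give timestamps to line up against the gateway logs.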
Any ideas what might be causing this? I currently suspect that state resets are the main culprit, but I'm not sure at all.
Further testing shows that, when one of the ISP routers goes down, I can manually disable that connection and things work fine again.
However, the entire idea of a setup like this is obviously that I don't need to make manual changes when connectivity goes down...
Going to need more information about what "when one of the ISP routers goes down" actually means.
If they start flapping, it's going to be painful unless you manually disable it.
If they go down hard, it should be fine after everyone reconnects on the other WAN.
The gateway logs and quality graphs would be good places to start looking.
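The flapping versus hard-down distinction can be sketched as a count of state changes. This is only an illustration, assuming you have already pulled (timestamp, up/down) events out of the gateway log by hand; the event format, the `classify` name, and the thresholds are assumptions, not pfSense's own logic:

```python
def classify(events, window=300.0, flap_threshold=3):
    """Classify a gateway from time-ordered (timestamp_seconds, is_up) events.

    Returns "up", "down", or "flapping". The gateway counts as "flapping"
    when its state changed at least flap_threshold times within the last
    `window` seconds before the final event.
    """
    if not events:
        raise ValueError("no events")
    last_ts, last_up = events[-1]
    changes = 0
    # Compare each event with the one before it and count recent transitions.
    for (_, prev_up), (ts, cur_up) in zip(events, events[1:]):
        if cur_up != prev_up and last_ts - ts <= window:
            changes += 1
    if changes >= flap_threshold:
        return "flapping"
    return "up" if last_up else "down"
```

A single transition to down followed by silence comes out as "down", which is the case that should recover on its own; repeated transitions inside the window come out as "flapping".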
I'm not sure that what you are seeing has anything to do with HA if the MASTER/BACKUP roles aren't failing over. It doesn't sound like they should be.
Is it possible you're using an on-board switch in the ISP router as the layer 2 between the HA nodes?
I can see how that would be tempting, but it would certainly cause a problem if the ISP router were powered off entirely.