Single VIP inappropriately failing over
-
Quick setup:

pf1:
physical int 1: WAN, ..*.253
physical int 2 (untagged): LAN1, 172.16.0.253
physical int 2 (tagged 200): LAN2, 10.0.0.253
physical int 3: pfsync

pf2:
physical int 1: WAN, ..*.254
physical int 2 (untagged): LAN1, 172.16.0.254
physical int 2 (tagged 200): LAN2, 10.0.0.254
physical int 3: pfsync

carp:
WAN ..*.2
LAN1 172.16.0.1
LAN2 10.0.0.1

All interfaces in pf1 have an advertising frequency of 0, pf2 has an advertising frequency of 100.
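For context, CARP derives how often a node advertises from two values, advbase (seconds) and advskew (in 1/256ths of a second), and the node that advertises most often wins the election. The sketch below only works through that arithmetic. It assumes the 0 and 100 quoted above are skew values and that the base is the default of 1 second on both firewalls; the post does not state either, and the function name is made up for illustration.

```python
def carp_advert_interval(advbase: int, advskew: int) -> float:
    """Seconds between CARP advertisements: advbase + advskew / 256."""
    if not 1 <= advbase <= 255:
        raise ValueError("advbase must be between 1 and 255")
    if not 0 <= advskew <= 254:
        raise ValueError("advskew must be between 0 and 254")
    return advbase + advskew / 256


# Assumed values: base 1 on both nodes, skew 0 on pf1 and 100 on pf2.
pf1_interval = carp_advert_interval(advbase=1, advskew=0)
pf2_interval = carp_advert_interval(advbase=1, advskew=100)

print(f"pf1 advertises every {pf1_interval} s")  # 1.0
print(f"pf2 advertises every {pf2_interval} s")  # 1.390625
```

With these assumed values pf1 advertises more often than pf2, which is why pf1 is expected to hold master on every VIP.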
Yesterday we had an issue where pf1 went into backup for LAN2 and pf2 took master. The log on pf1 said it saw a member with a higher broadcast frequency. Nothing else failed over, so this broke LAN2's connection.
I don't believe this was an issue with broadcast traffic: if I disabled CARP on pf2, pf1 would become master for LAN2 again, and if I disabled CARP on pf1, pf2 would take over as master for every other interface as well.
Rebooting pf2 fixed nothing; it took over LAN2 again as soon as it came up. After rebooting pf1, pf1 reclaimed master on LAN2 and seems to be working properly again.
Any idea why this happened and how to prevent it in the future? If all interfaces had failed over, it would have been much less of a problem, but having just a single one do so broke things.
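The log line on pf1 matches the general CARP rule: each VIP is elected separately, and a master that hears an advertisement more frequent than its own steps down for that VIP. Below is a simplified model of that comparison, not the actual kernel logic. The `Advert` class and `master_steps_down` function are hypothetical names, and the base of 1 second and the reading of 0/100 as skew values are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Advert:
    """Advertisement parameters for one CARP VIP (illustrative only)."""
    advbase: int  # seconds
    advskew: int  # 1/256ths of a second

    @property
    def interval(self) -> float:
        return self.advbase + self.advskew / 256


def master_steps_down(own: Advert, heard: Advert) -> bool:
    """A master yields when it hears a strictly more frequent advertiser."""
    return heard.interval < own.interval


pf1 = Advert(advbase=1, advskew=0)    # assumed settings for pf1
pf2 = Advert(advbase=1, advskew=100)  # assumed settings for pf2

print(master_steps_down(own=pf1, heard=pf2))  # False
print(master_steps_down(own=pf2, heard=pf1))  # True
```

In this simplified model, pf2's configured values alone would not make pf1 step down on LAN2, so it is worth checking what values the two nodes were actually advertising on that VLAN when the failover happened.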
-
I don't know if it would be related to what you are seeing, but it is not recommended to mix tagged and untagged traffic on any interface where you use VLANs.
So you really should be tagging both VLANs on interface 2. Change the switch to tag on the port's default VLAN and don't have any untagged VLANs on the port.
-
Just happened again and we have both interfaces tagged.
If one carp interface fails, all interfaces should transfer to the next highest peer, correct?
-
That is correct; if one fails, they all should fail.
If the settings are right, the next thing to look at would be switches and cabling. It's possible that there is an issue there. I talked to someone last week who was getting odd master/slave flips, and it turned out they were using the little switch on the back of their cable/DSL modem, which was not up to the task. Merely putting a different switch in that role fixed all of the issues.
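While chasing the switch or cabling, it can help to flag the split state described earlier in the thread, where one VIP on a node disagrees with the others. Here is a minimal sketch, assuming you collect each VIP's state yourself (by hand or from monitoring). The `split_vips` function and the input dictionary are hypothetical; nothing here parses real command output.

```python
from collections import Counter


def split_vips(states: dict[str, str]) -> list[str]:
    """Return the VIP names whose state differs from the most common one."""
    if not states:
        return []
    majority_state, _count = Counter(states.values()).most_common(1)[0]
    return sorted(name for name, state in states.items() if state != majority_state)


# States on pf1 during the incident, as described in the first post.
pf1_states = {"WAN": "MASTER", "LAN1": "MASTER", "LAN2": "BACKUP"}

print(split_vips(pf1_states))  # ['LAN2']
```

An empty list means every VIP on the node agrees. Any name returned is a VIP that has failed over on its own and points to the segment to inspect first.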