[SOLVED] CARP switching from master to slave for no apparent reason
Since the latest builds of 2.0 RC3 x86, and also on 2.0 RELEASE, the backup machine sometimes starts acting as master for no apparent reason.
My config has 83 CARP VIPs, each configured with these options:
Master Adv Frequency: Base: 1, Skew 0
Slave Adv Frequency: Base: 1, Skew 100
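For reference, CARP elects the master by advertisement frequency: each node advertises every advbase + advskew/256 seconds, and the node that advertises most often should hold the VIP. The sketch below is a simplified model (it ignores preemption and demotion counters) that applies that formula to the Base/Skew values above; the CarpNode class and expected_master helper are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class CarpNode:
    name: str
    advbase: int  # seconds ("Base" in the VIP settings)
    advskew: int  # units of 1/256 s ("Skew" in the VIP settings)

    def adv_interval(self) -> float:
        # Advertisement interval = advbase + advskew / 256 seconds
        return self.advbase + self.advskew / 256


def expected_master(nodes):
    # Simplified model: the node with the shortest interval advertises
    # most often and should win the election.
    return min(nodes, key=lambda n: n.adv_interval())


primary = CarpNode("primary", advbase=1, advskew=0)
secondary = CarpNode("secondary", advbase=1, advskew=100)

for n in (primary, secondary):
    print(f"{n.name}: {n.adv_interval():.3f} s")
print("expected master:", expected_master([primary, secondary]).name)
```

With these settings the primary advertises every 1.000 s and the secondary every 1.391 s, so on paper the primary should always stay master. A backup that takes over anyway usually means it has stopped seeing the primary's advertisements, which points at the network path rather than at the skew values.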
I checked for interface errors: none are reported on any of the 5 interfaces of either firewall. Both machines use the time zone "Europe/Rome", so the logs can be compared directly, and there is really nothing in either system log before the switch from master to slave or from slave to master...
The only way to make the master act as master again is to disable CARP on the secondary machine and then re-enable it. This never happened before the last 2-3 builds of RC3...
Do you have any idea why this is happening?
Maybe it was the NIC having problems... It was a BGE network card (integrated in the motherboard), and I could see random errors both in the "interface status" page and on the switch (the tricky part was getting into the webConfigurator, since the LAN interface was the one having problems).
I replaced it with an FXP (Intel by HP?) NIC I had in my "wondersbox"; let's hope this solves the problem...
Thanks to Jim's help, the problem has been identified. Some of the outbound NAT rules had "source=any", so the CARP packets were being NATed as well, and this led to an "inconsistent" CARP state.
The problem was solved by giving each outbound NAT rule a proper source other than "any".
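To illustrate why "source=any" also catches CARP traffic: the firewall's own CARP advertisements (IP protocol 112, sent to the multicast group 224.0.0.18) leave the interface with the firewall's own address as source, so a rule matching any source matches them too. The toy model below is not pfSense's NAT code; the OutboundNatRule and Packet classes and all addresses are hypothetical, chosen only to show the matching logic.

```python
import ipaddress
from dataclasses import dataclass

CARP_PROTO = 112  # IP protocol number used by CARP (shared with VRRP)


@dataclass
class Packet:
    src: str
    dst: str
    proto: int


@dataclass
class OutboundNatRule:
    # Hypothetical model of an outbound NAT rule: "any" matches every source,
    # otherwise the packet source must fall inside the given network.
    source: str

    def matches(self, pkt: Packet) -> bool:
        if self.source == "any":
            return True
        return ipaddress.ip_address(pkt.src) in ipaddress.ip_network(self.source)


# A CARP advertisement sourced from the firewall's own WAN address, and
# ordinary LAN client traffic (made-up addresses).
carp_adv = Packet(src="203.0.113.2", dst="224.0.0.18", proto=CARP_PROTO)
lan_traffic = Packet(src="192.168.1.50", dst="198.51.100.7", proto=6)

too_broad = OutboundNatRule(source="any")
scoped = OutboundNatRule(source="192.168.1.0/24")

print(too_broad.matches(carp_adv))  # True: the advertisement would be NATed
print(scoped.matches(carp_adv))     # False: the advertisement is left alone
print(scoped.matches(lan_traffic))  # True: LAN clients are still NATed
```

Scoping each rule's source to the internal subnets it is meant for keeps NAT working for clients while leaving the firewall's own CARP advertisements untouched.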
After this, Ermal added some code (to be released with 2.0.1 RELEASE) that prevents this issue regardless of the NAT configuration (http://redmine.pfsense.org/issues/1954).
Thanks to Jim and Ermal for their support!