Something unbelievable
-
Hi,
One of my clusters is being dumb!
It's a four-homed cluster: LAN/WAN/WAN2/DMZ. After 50 days without any problem, the first node stopped being master on its CARP interfaces and handed the role to the slave... I don't know why (perhaps a switch problem). While testing the hardware we changed the assignment of the WAN interface (there are 7 NICs, 2 were free) to check that the NIC wasn't the problem... it wasn't.
We then rolled back to the original configuration. Now, when node 2 is down, node 1 is master and everything is OK. When node 2 comes up, node 2 becomes master. Even after recreating every CARP interface (automatically synchronised to the slave node), the slave node still becomes master!
I've run some tcpdump captures focusing on the CARP protocol and I can see something really weird: the machine no longer broadcasts the WAN CARP/VRRP advertisements! And best of all, it broadcasts the DMZ CARP/VRRP advertisements on the WAN interface! It's all messed up! I've done 345 reboots, destroyed the CARP interfaces, rebuilt them, forced the interface assignment... nothing changes! Even if I do it by hand in a shell (ifconfig carp destroy / ifconfig carp create / ifconfig carp0 etc....)! It looks like the kernel keeps the wrong assignment somewhere...
Unfortunately the "carpdev" keyword isn't supported in FreeBSD's CARP implementation, so I can't force the relationship between CARP interfaces and NICs. I think I will reinstall with the latest snapshot, but before doing that: has anyone seen this before? :-D
It is pfSense 1.0.1.
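For anyone wanting to repeat the tcpdump check described above, here is a rough Python sketch that compares the CARP group ID seen on each interface with the one you expect there. It assumes you have saved one capture per interface as text, and that each advertisement line contains either "vrid N" or "vhid=N" (the exact wording depends on the tcpdump build). The interface names, IDs and sample lines are made up for illustration and are not taken from the cluster above.

```python
import re

# Matches the group ID in either style of tcpdump output:
# "vrid 3" (VRRP-style decoding) or "vhid=3" (CARP-style decoding).
VHID_RE = re.compile(r"\b(?:vrid |vhid=)(\d+)")


def seen_vhids(lines):
    """Return the set of group IDs advertised in the given tcpdump lines."""
    found = set()
    for line in lines:
        match = VHID_RE.search(line)
        if match:
            found.add(int(match.group(1)))
    return found


def misplaced(expected, captures):
    """Compare the expected ID per interface with what was captured there.

    expected: {"wan": 2, "dmz": 4}        (hypothetical names and IDs)
    captures: {"wan": [tcpdump lines]}    (one text capture per interface)
    Returns {interface: (missing_ids, unexpected_ids)} for every interface
    whose capture does not match the expectation.
    """
    report = {}
    for ifname, vhid in expected.items():
        seen = seen_vhids(captures.get(ifname, []))
        missing = {vhid} - seen
        unexpected = seen - {vhid}
        if missing or unexpected:
            report[ifname] = (missing, unexpected)
    return report


if __name__ == "__main__":
    # Illustrative lines only, shaped like the symptom in the post:
    # the DMZ group shows up on WAN and the WAN group is absent.
    captures = {
        "wan": ["IP 192.0.2.1 > 224.0.0.18: VRRPv2, Advertisement, vrid 4, prio 0"],
        "dmz": ["IP 198.51.100.1 > 224.0.0.18: VRRPv2, Advertisement, vrid 4, prio 0"],
    }
    print(misplaced({"wan": 2, "dmz": 4}, captures))
```

Keep in mind that only the current master sends advertisements, so a "missing" ID on a backup node is normal; the check is only meaningful on the node that should be master.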
-
We are having a similar, or the same, issue:
We have 2 pfSense boxes, each with 4 NICs (LAN/WAN/WAN2/CARPFailover), running 1.0.1-SNAPSHOT-03-27-2007. The 2 WAN links are load-balanced with failover on each box. We built the first box and everything worked great. We set up the second box independently, and that worked great too. Then we set up CARP, giving box 1 advfreq 0 and box 2 advfreq 100. Sync worked great and everything appeared to be working fine, except that ping was returning duplicates. After trying to track it down, we realized both machines were acting as master, and box 2 kept resetting its advfreq to 0. We can edit it and set advfreq to anything higher than 0 on either box, but once the rules are applied and the 2 boxes sync, both machines have an advfreq of 0 again.
We have tried changing the advfreq on both boxes. They both automatically and immediately want to become the master again. We have tried removing and re-adding interfaces with no luck.
Let me know if you need any more information for troubleshooting. Thanks in advance for any help.
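To show why the advfreq reset matters, here is a small Python sketch of how CARP picks a preferred master. It assumes the "advfreq" value in the post corresponds to CARP's advskew and that advbase is left at 1 second; both are assumptions, not something confirmed in this thread. Per carp(4), the advertisement interval is advbase plus advskew/256 seconds, and the host that advertises most often is preferred as master. The real election compares received advertisements, so treat this as a simplified model.

```python
def advert_interval(advbase, advskew):
    """Seconds between advertisements: advbase + advskew/256, per carp(4)."""
    return advbase + advskew / 256.0


def preferred_master(nodes):
    """nodes: {name: (advbase, advskew)}.

    Return the name with the shortest advertisement interval, or None when
    the shortest interval is shared, i.e. the settings give no clear winner.
    """
    intervals = {name: advert_interval(*cfg) for name, cfg in nodes.items()}
    best = min(intervals.values())
    winners = [name for name, value in intervals.items() if value == best]
    return winners[0] if len(winners) == 1 else None


if __name__ == "__main__":
    # Intended setup from the post (box names are just labels).
    print(preferred_master({"box1": (1, 0), "box2": (1, 100)}))
    # What the boxes end up with after the sync resets the value to 0.
    print(preferred_master({"box1": (1, 0), "box2": (1, 0)}))
```

With both boxes at skew 0 the settings no longer express any preference between them, which fits the symptom of box 2 not staying in the backup role.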
-
Any time I have seen an issue like this, it has been for one of two reasons: either the interface the CARP VIP is on cannot talk to the other CARP box on the same interface, or there is another CARP/VRRP/HSRP implementation on the subnet using the same CARP ID number (1-255). These numbers cannot be the same within a subnet.
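The second check can be done on paper, but here is a rough Python sketch of it. It assumes you have listed every virtual IP on the network as a (subnet, virtual IP, ID) tuple; the addresses and IDs below are invented for the example. Two nodes of the same cluster sharing an ID for the same virtual IP is expected, so only different virtual IPs sharing an ID on one subnet are reported.

```python
from collections import defaultdict


def vhid_conflicts(entries):
    """Find IDs reused by different virtual IPs on the same subnet.

    entries: iterable of (subnet, virtual_ip, vhid) tuples.
    Returns {(subnet, vhid): [virtual_ips]} for every clash.
    """
    groups = defaultdict(set)
    for subnet, vip, vhid in entries:
        groups[(subnet, vhid)].add(vip)
    return {key: sorted(vips) for key, vips in groups.items() if len(vips) > 1}


if __name__ == "__main__":
    # Hypothetical inventory: two different VIPs on the LAN both use ID 1.
    inventory = [
        ("192.0.2.0/24", "192.0.2.1", 1),
        ("192.0.2.0/24", "192.0.2.254", 1),
        ("198.51.100.0/24", "198.51.100.1", 1),
    ]
    print(vhid_conflicts(inventory))
```

The same ID on a different subnet, like the third entry, is not flagged.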
-
OK, we have a little more information on the problem.
Aldo, thanks for your suggestions, but neither of those is the issue. We're not sure if this is a 'LAN only' issue, or if it just appears that way because the LAN was the first interface we tried to configure. Maybe the bug is with the first VIP created; we're not sure yet.
We have added 2 CARP IPs, a virtual IP for each of the WANs, to make routing and VPN access easier. The carp1 and carp2 interfaces work perfectly. Master <-> Backup <-> Master is working great. We have tried to change the settings on the LAN IP, and ultimately tried to delete the virtual LAN IP to start over, but as soon as we change a setting on the LAN VIP, it immediately reverts. When we delete the IP, it immediately returns with the same settings. We just cannot get rid of or change the VIP on the LAN. This IP is causing duplicate traffic and other issues, so we have the secondary box's LAN interface unplugged for now, but this will be a problem for failover in the future.
Does anyone have some more info on this, or an idea when the next snapshot or beta will be out?
Thanks.
-
So this issue is only on your LAN? Yes, I have seen a similar issue some time ago. Now I just don't sync the LAN with a VIP; I keep it out of the loop. So agreed, there must be a bug introduced somewhere. I think it only happened when the WAN was multi-WAN, though. I can't be sure and don't have time to test it for you.
Could you put the LAN on a VLAN and not use it, then put the subnet in question on an OPT interface and see if the problem goes away?
Maybe try opening a ticket for it, or wait a bit to see if someone can confirm it.