Problem with failover, probably an ARP problem
pfSense 1.0.1 with six interfaces (some of them on the same switch), load balancing in and out, CARP/VIP
First of all, I have a problem using pfSense right after it starts: in the logs I can see ARP request failed for my VIPs. Every time, I need to go to the webGUI and press Save for ARP Suppress; after that everything is OK.
Second problem: when I try to shut down or reboot either of my two machines, it freezes after Uptime. When I didn't have CARP I didn't have that problem.
Third problem: I configured the 2 machines with CARP/VIP. Settings are copied between the machines, from active to passive and vice versa, but when I shut down the active one I lose access to the networks. It seems the IPs are still held somewhere, maybe by the switch.
Sometimes my computers freeze when a machine starts and CARP syncs between the servers.
I did a small test: I pressed Save for ARP Suppress on the first server and then on the other. It caused both machines to freeze. I had to unplug some network cables from the switch, which unblocked the servers.
All these problems seem to be caused by ARP requests; maybe it is a switch problem, or the servers can't release some IP addresses.
Why do you have everything connected to the same switch? This makes 0 sense to me. Either use a properly configured VLAN switch to break it up into several segments that don't see each other, or use more than one switch.
Do I read between the lines that you sync from master to slave and from slave to master? This will cause a sync loop and lead to issues that might result in broken config.xml files after a power loss or other unpredictable shutdown events. You should only sync from master to slave.
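To illustrate the sync loop point, here is a minimal sketch that models each firewall's "sync to" target and reports a loop if one exists. The dictionary layout and the host names are made up for illustration; this is not how pfSense stores its settings.

```python
# Hypothetical model: each firewall maps to the peer it pushes its config to.
# None means the "sync to" field is left empty. Names are illustrative only.
def find_sync_loop(sync_to):
    """Return the list of hosts forming a sync loop, or None if there is none."""
    for start in sync_to:
        seen = []
        host = start
        # Follow the chain of sync targets until it ends or repeats.
        while host is not None and host not in seen:
            seen.append(host)
            host = sync_to.get(host)
        if host is not None:
            # We came back to a host already on the path: that is a loop.
            return seen[seen.index(host):]
    return None


# Master and backup push to each other: loop.
looped = {"master": "backup", "backup": "master"}
# Master pushes to backup, backup pushes nowhere: no loop.
one_way = {"master": "backup", "backup": None}

print(find_sync_loop(looped))
print(find_sync_loop(one_way))
```

With the one-way layout the chain simply ends at the backup, which matches the advice to sync only from master to slave.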
Also have a look at http://doc.pfsense.org/index.php/Setting_up_CARP_with_pfSense and http://pfsense.com/mirror.php?section=tutorials/carp/carp-cluster-new.htm . It sounds like you have something configured incorrectly.
I didn't connect everything to one switch.
WAN and WAN2 (master and backup, together 4 cables + 2 for the internet connections) are connected to the same switch1.
LAN (master and backup) to switch2.
OPT2 (master and backup) to switch3.
According to the docs I can connect the 2 WANs (4 interfaces if you count master and backup, + 2 modem connections) to the same switch. I prefer it that way because of the lack of space in my server rack.
At one point in the flash howto I saw Sync Enable checked for the backup machine. So now on the master I have Sync Enable + sync all NAT, aliases, etc.; on the backup I have Sync Enable. Should I uncheck Sync Enable on the backup machine?
But none of this explains why my servers hang during reboot, just after the Uptime message. It hangs even if I restart a single machine without any connection to the networks.
Make sure you didn't use any VHID twice. On the backup machine the sync checkbox has to be enabled, but you should not enter a sync-to IP at the bottom unless there is a 3rd machine in the cluster that you want to sync to (master->backup1->backup2->…).
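The VHID check can be sketched roughly like this, assuming (as the advice above implies) that every VIP should have its own VHID. The list of dictionaries and the addresses are invented example data, not something exported from pfSense.

```python
from collections import defaultdict


def duplicate_vhids(vips):
    """Group VIPs by VHID and return only the VHIDs used more than once."""
    by_vhid = defaultdict(list)
    for vip in vips:
        by_vhid[vip["vhid"]].append(vip["address"])
    return {vhid: addrs for vhid, addrs in by_vhid.items() if len(addrs) > 1}


# Made-up example data (documentation and private address ranges).
vips = [
    {"interface": "WAN", "address": "192.0.2.10", "vhid": 1},
    {"interface": "WAN2", "address": "198.51.100.10", "vhid": 2},
    {"interface": "LAN", "address": "10.0.0.1", "vhid": 2},  # reuses VHID 2
]

print(duplicate_vhids(vips))
```

An empty result means no VHID is shared between VIPs in the list you fed in.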
Not sure what causes the machine to not reboot. I once had a similar situation with a Nexcom, but after configuring the power management settings in the BIOS correctly it started working.
What's even stranger:
When the main machine is active, I start the backup machine.
On the backup machine I can see CARP settings done during startup. At that moment my switch (for WAN) goes crazy, and on the main machine I can see heavy traffic (the max for my optic connection), 5Mb from my WAN to WAN2. It keeps running at full rate until I confirm ARP Suppress again in the Advanced settings of my main machine.
After this Save, traffic goes back to normal and the backup machine finishes booting.
I'm not sure what I should do. Maybe try changing the switch?
The VHIDs are unique for all VIPs. The interface doesn't allow entering the same settings twice.
I checked once again, and both the main and backup machines have 5 MB of traffic on WAN and WAN2.
Try to separate the WANs to see if that does the trick.
I put in two separate switches, one for each WAN. It helps: I don't see heavy traffic between the WANs.
But I still have a problem with the backup. If I shut down the main machine, the backup takes over all VIPs, but I still don't have access to my WANs or DMZ.
When I start the main machine again, it takes the VIPs back from the backup. But if I want access to the internet I need to Save ARP Suppress.
Maybe if I want the backup machine to work after I shut down the main one, I need to Save ARP Suppress on the backup (I will test it tomorrow).
Is there a workaround for ARP Suppress, so I don't need to Save it again and again after every start?
Additionally, when my main machine starts I can see this message:
ERROR: expected 8 data source readings (got 11) from N:U:U:U:U:U:U:U:U:…
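One way to read this message: an rrdtool-style update string has the form timestamp:value:value:…, where N means "now" and U means "unknown", and the error says more values were supplied than the RRD file has data sources. A minimal sketch of that count follows; the function names and the sample string are made up here, and the real update string above is truncated, so it is not reproduced.

```python
def count_readings(update_arg):
    """Count the values in an rrdtool-style 'timestamp:v1:v2:...' string."""
    _timestamp, *values = update_arg.split(":")
    return len(values)


def check_update(update_arg, expected):
    """Compare the number of supplied readings with the expected count."""
    got = count_readings(update_arg)
    if got != expected:
        return f"expected {expected} data source readings (got {got})"
    return "ok"


# Hypothetical update string: "now" plus 11 unknown ("U") readings.
sample = "N:" + ":".join(["U"] * 11)

print(check_update(sample, expected=8))
```

This only shows what the numbers in the message refer to; whether it has anything to do with the CARP problems is not clear from the thread.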
Something is definitely wrong with your setup. You created a CARP VIP for the LAN and DMZ subnets too and use this IP as the gateway for the clients, right?
I made a few changes and it almost works.
Of course I still have the problem shutting down the machines, but it isn't so important (except maybe when I want to reboot).
Generally CARP works, meaning that when one machine goes down, the other one takes over the VIPs.
But I have two problems:
- First, on one interface I have a DHCP server; when it goes down, the second machine doesn't take over the leases. Yes, I configured the IP of the second machine.
- Second, my backup machine shows MASTER even when the main machine is up. It shows this only for one or 3 interfaces; the others are OK. Most often I see MASTER (on the backup machine) for the interface with the DHCP server. It shows MASTER right after it starts, even though at that moment the main server is up and also shows MASTER for this interface. Maybe that is why I later have problems with taking over DHCP.
I noticed that interfaces and functions have to be in the same order on both machines, e.g. Main: WAN, LAN, OPT1(test), OPT2(test2); Backup: WAN, LAN, OPT1(test), OPT2(test2).
This is because rules are copied according to the OPT names, not the descriptive names like "test". I think this is pretty important and the docs should mention it.
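The interface-order observation can be checked with something like the sketch below. It compares which description is assigned to each slot (WAN, LAN, OPT1, …) on the two machines. The dictionaries are typed in by hand here; reading them from the actual config is left out, and the function name is just illustrative.

```python
def mismatched_slots(main, backup):
    """Return slots whose descriptions differ between the two machines."""
    slots = sorted(set(main) | set(backup))
    return {
        slot: (main.get(slot), backup.get(slot))
        for slot in slots
        if main.get(slot) != backup.get(slot)
    }


# Slot name -> description, as in the example above.
main = {"WAN": "WAN", "LAN": "LAN", "OPT1": "test", "OPT2": "test2"}
# A backup where OPT1 and OPT2 were assigned the other way around.
swapped = {"WAN": "WAN", "LAN": "LAN", "OPT1": "test2", "OPT2": "test"}

print(mismatched_slots(main, dict(main)))
print(mismatched_slots(main, swapped))
```

If rules follow the OPT slot names as described, any slot reported here would receive rules meant for a different network after a sync.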
I wanted to install the same packages on the backup machine as on my main machine. I installed them, but after a manual sync between the firewalls the packages don't show in the webGUI (I didn't notice the moment they disappeared).
After I disable CARP, I can shut down the machine. That means CARP causes my problems with shutting down or rebooting the servers. What could be the reason?
Any help or comments?