CARP and MultiWAN
-
I have a two-node pfSense cluster (pfSense 1.2.2) with one LAN and three WAN connections, each going to a Draytek ADSL modem. Each ADSL connection has two IP addresses.
The LAN default gateway and each of the six external IP addresses has a CARP interface defined.
I also have a separate interface dedicated to CARP synchronization.
I use a load balancer and failover devices for various outbound protocols.
The first pfSense box was set up over a year ago and has worked flawlessly. I set it up from the start to use CARP with a view to putting in a second device.
On creating the second device recently, I set up all the interfaces identically. I then enabled all the synchronization options on the master; on the backup I enabled only synchronization and set the interface name. All the rules etc. passed across OK.
On the backup I then clicked the Enable CARP button on the status page. It almost immediately switched to backup. Great, job done. After a few seconds both boxes locked up. I shut down the backup and the master carried on OK. I then read up on everything I could find on troubleshooting. I reverted the backup to the config from before I started and then ran tcpdump on each interface to ensure I could see the advertisements for each CARP address. Check. Etc etc.
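For anyone repeating the advertisement check, here is a rough Python sketch of how the captured CARP payloads could be decoded to confirm that every expected VHID is being advertised. It assumes the header layout from the BSD `ip_carp.h` as I understand it (version/type nibble, vhid, advskew, authlen, one reserved byte, advbase), and it assumes you already have the raw payload bytes of each IP protocol 112 packet; getting those out of a capture is not shown. The names `parse_carp_header` and `missing_vhids` are made up for illustration.

```python
import struct
from collections import namedtuple

# Fields of interest from the fixed part of a CARP advertisement.
# Layout is an assumption based on the BSD ip_carp.h header definition.
CarpAdvert = namedtuple("CarpAdvert", "version type vhid advskew authlen advbase")


def parse_carp_header(payload):
    """Decode the first 6 bytes of a CARP payload (IP protocol 112)."""
    if len(payload) < 6:
        raise ValueError("payload too short for a CARP header")
    ver_type, vhid, advskew, authlen, _reserved, advbase = struct.unpack(
        "!6B", payload[:6]
    )
    # High nibble is the version, low nibble is the packet type.
    return CarpAdvert(ver_type >> 4, ver_type & 0x0F, vhid, advskew, authlen, advbase)


def missing_vhids(payloads, expected):
    """Return the expected VHIDs for which no advertisement was captured."""
    seen = {parse_carp_header(p).vhid for p in payloads}
    return sorted(set(expected) - seen)


if __name__ == "__main__":
    # Hypothetical payload: version 2, type 1 (advertisement), vhid 3.
    sample = bytes([0x21, 3, 100, 7, 0, 1]) + bytes(30)
    print(parse_carp_header(sample))
    print(missing_vhids([sample], expected=[1, 2, 3]))
```

Run against the payloads captured on one interface, an empty list from `missing_vhids` would mean every CARP address expected on that interface was seen advertising.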
I have tried creating a mock-up with three pfSense boxes in VMware and that appeared to work OK, although I think I'll actually need five to do a better job of it! That seems a bit unnecessary if someone can point out what I have done wrong. At the moment the backup is OK if I just have the CARP (i.e. the synchro interfaces) and LAN plugged in. As soon as I plug in WAN, OPT1 or OPT2 (which I've renamed to WAN2 and WAN3), both boxes lock up.
I don't know what the actual fault is, but I suspect that some sort of storm is going on (perhaps a multicast forwarding loop?).
Hardware: Two PCs (Intel) SP, 1 GB RAM. One has a quad Compaq/Intel (fxp) card and a Realtek (rl0), the other a quad D-Link (ste) and an fxp.
Please help me. I am going nuts. Also I have to test theories out of hours …
-
I know you said you tested, but verify you can ping every real interface of the backup firewall from the master firewall and vice versa. It's fairly easy to screw up the CARP traffic with gateway rules when you're doing multi-WAN.
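If it helps to make that check systematic, the loop can be sketched in Python as below. This is only an illustration: the firewall itself may not have Python installed, the addresses are placeholders from the documentation range, and `unreachable` is a made-up helper name. The same thing can be done by hand with plain `ping` from each box's shell.

```python
import subprocess


def ping_command(address, count=3):
    """Build a ping invocation; -c (packet count) works on FreeBSD and Linux."""
    return ["ping", "-c", str(count), address]


def unreachable(addresses, run=subprocess.run):
    """Ping each address once and return those that did not answer."""
    failed = []
    for addr in addresses:
        result = run(
            ping_command(addr),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        # ping exits non-zero when no replies were received.
        if result.returncode != 0:
            failed.append(addr)
    return failed


if __name__ == "__main__":
    # Placeholder addresses: replace with the real (non-CARP) interface
    # addresses of the *other* firewall, one per LAN/WAN/sync interface.
    peer_interfaces = ["192.0.2.2", "192.0.2.18", "192.0.2.34"]
    print("No reply from:", unreachable(peer_interfaces))
```

Run it once on the master with the backup's addresses and once on the backup with the master's; any address in the output points at the interface whose rules or gateway settings need a closer look.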
-
Good answer! To be honest, it's the sort of thing I tell other people when they have snags with networking. I have stared at the setup until my eyes water, but it's no substitute for proper testing.
I'll kill off the CARP on the backup, bring it back with just the physical interfaces, and do as you suggest.
I'll post back with the results tomorrow when I have a chance. If I could I'd remotely put the WAN connections back in and try it now but my arms are not 20 miles long.
Thanks for the reply and the good down-to-earth advice.
-
After quite a bit of testing and so on I seem to have the same issue as this: http://forum.pfsense.org/index.php?topic=16373
I added the sync components one by one, and as soon as the load balancer config was copied over, things went berserk. By unplugging all WAN connections bar one I was able to run tcpdump and see multicast traffic flooding the systems.
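To show the kind of thing I was looking for in the capture, here is a small Python sketch that tallies multicast traffic per source and destination from tcpdump's text output. It assumes the usual `tcpdump -n` line shape ("IP src[.port] > dst[.port]: ..."), and `multicast_talkers` is just a name I made up. It is a way to spot which pair is flooding, not a fix.

```python
import ipaddress
import re
from collections import Counter

# Matches "IP a.b.c.d[.port] > e.f.g.h[.port]:" as printed by tcpdump -n.
LINE_RE = re.compile(
    r"\bIP (\d{1,3}(?:\.\d{1,3}){3})(?:\.\d+)? > "
    r"(\d{1,3}(?:\.\d{1,3}){3})(?:\.\d+)?:"
)


def multicast_talkers(lines):
    """Count packets per (source, destination) where destination is multicast."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # not an IPv4 line in the expected shape
        src, dst = match.groups()
        try:
            if ipaddress.ip_address(dst).is_multicast:
                counts[(src, dst)] += 1
        except ValueError:
            continue  # digits that do not form a valid address
    return counts


if __name__ == "__main__":
    import sys

    # Example use: tcpdump -n -i <interface> | python this_script.py
    for (src, dst), n in multicast_talkers(sys.stdin).most_common(10):
        print(n, src, "->", dst)
```

CARP advertisements go to 224.0.0.18, so a steady trickle to that address is normal; what stood out in my case was other multicast traffic repeating at a far higher rate.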
It seems that the load balancer forwards multicast traffic from the LAN to the other box, which then repeats it back, forming a loop. CARP needs multicast to work, though. The bug is also seen in 1.2.3-RC1. I also have systems broadcasting in over VPNs connected to my ADSL routers, so it is a bit hard to block all this.
See: http://forum.pfsense.org/index.php/topic,16566.0.html for a potential fix.