pfSense CARP setup
-
I used pfSense for quite some time in the past, until some of the work I do made it helpful to switch to a WatchGuard UTM firewall.
I was recently tasked with setting up a pfSense HA cluster, something that, despite having used pfSense, I've never done before.
I have some former Check Point firewalls that have been repurposed as pfSense firewalls, which I keep around for testing/backups/etc. Early last week, I used a pair of these to put together a 'proof of concept'. I searched around and found some 'how-tos' (such as this one - all of the ones I found were essentially the same) on setting up a CARP cluster, and I thought I had it working. Every permutation I tested (LAN link down, WAN link down, or entire firewall down) seemed to work perfectly, with my continuous ping to an external IP picking up after a failover delay and continuing on happily. The one thing I didn't do in that testing was set up a WAN VIP, but as I recall, everything I tried worked.
This weekend I received some official pfSense firewalls for the implementation, and configured them with the known LAN parameters and the WAN parameters required to make them work for now in my test setup. In my testing, I found that if I pulled the WAN link on the master, the firewalls 'went stupid' - the pattern I generally saw was a dropped ping followed by two successful pings, over and over. This now seems to be the pattern regardless of whether it's the WAN link, the LAN link, or the whole primary firewall that goes down. For whatever reason, it's just not failing over 100%. I reconnected my test firewalls (nothing had changed with them since my last testing other than being unplugged) and saw similar weirdness, so maybe I missed something in my initial testing. I've tried both with a WAN VIP and associated NAT rules and without, and no matter what I do, the behavior is the same: flaky at best, and at worst no failover at all.
If I'm not mistaken, what should happen is this: if any fault occurs, be it the WAN link down (or all of them, if there are multiple WAN links), the LAN link down, or the entire firewall down, the backup should take over all of the roles the master held, and normal operation should resume after the brief period it takes the backup to assume those roles. This is not what I've observed happening with these firewalls.
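To make the difference between the expected and the observed behavior concrete, here is a rough sketch of how a recorded ping run could be classified. It is only an illustration: the function names and the three labels are made up, and it assumes the ping results have already been collected as a list of booleans (True = reply received, False = timed out). It says nothing about why the failover misbehaves.

```python
from itertools import groupby


def summarize_pings(results):
    """Count losses and outages in a sequence of ping results.

    results: iterable of booleans, True = reply received, False = timed out.
    An "outage" is one unbroken run of lost pings.
    """
    runs = [(ok, len(list(group))) for ok, group in groupby(results)]
    loss_runs = [length for ok, length in runs if not ok]
    return {
        "sent": sum(length for _, length in runs),
        "lost": sum(loss_runs),
        "outages": len(loss_runs),
        "longest_outage": max(loss_runs, default=0),
    }


def classify(results):
    """Label a ping run with one of four hypothetical categories."""
    results = list(results)
    summary = summarize_pings(results)
    if summary["lost"] == 0:
        return "no loss"
    if summary["outages"] == 1:
        # One gap followed by replies is what a working failover looks like;
        # one gap that never ends means the backup never took over.
        return "clean failover" if results[-1] else "no failover"
    # Several separate gaps, e.g. one dropped ping followed by two good ones
    # over and over.
    return "flapping"


if __name__ == "__main__":
    expected = [True] * 5 + [False] * 3 + [True] * 10
    observed = [True] * 3 + [False, True, True] * 4
    print(classify(expected), summarize_pings(expected))
    print(classify(observed), summarize_pings(observed))
```

The first list models a single gap followed by recovery; the second models the repeating dropped-ping pattern described above.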
All four firewalls involved were upgraded from previous versions and are currently running the latest release (2.2.6). The official pfSense firewalls came with 2.2.4; the initial setup started with that version and was upgraded mid-stream, but since failover did not work at all after the upgrade, I reset them both to factory defaults and started over, which got me to where I am now.
What am I missing? Is there any better 'How-To' than the one I linked?
Any other thoughts?
-
Hello, good morning!
I have a pair (2.2.6) in HA under VMware ESXi 6 for my production environment. It is working really well for me, apart from one issue: I can't use my WAN CARP IP for outbound traffic. This how-to is simple and has everything you might need:
http://blog.thedarkwinter.com/2015/03/pfsense-ha-hardwaredevice-failover.html
For my sync interface I'm using a crossover cable, NIC to NIC.