Master/Slave setup of two pfSense firewalls
         <Upstream Provider>
         |                 |
   WAN-Switch-1      WAN-Switch-2
    |         |       |         |
  FW-1m --- FW-1s   FW-2m --- FW-2s
    |         |       |         |
----------<IRF-Core-Switch>---------
Our current setup consists of 4 SonicWall firewalls, arranged as two Master (m)/Slave (s) pairs, one pair for each of the two subnets we own. The plan is to replace the 4 SonicWalls with pfSense, for which we have 4 dedicated boxes.
Our core switch, where the firewalls connect with their LAN sides, consists of 2 HPE switches, which are aggregated into one logical switch via IRF. That setup has worked without problems so far.
The LAN side of each firewall is a LAGG across 2 ports of the core switch.
On the WAN side, the firewalls are connected via a WAN switch to the upstream provider. Our provider delivers both subnets on both uplink cables (this cannot be changed by our upstream provider). To cope with this, we have to disable STP on the WAN switch port to which the provider uplink is connected.
Master and Slave are connected via a dedicated cable on dedicated ports (we use it for pfsync).
The WAN and LAN ports of the firewalls are bridged together (via bridge0).
As we need a trigger to disable the bridge on the Slave firewall and to enable it when the Slave becomes Master, we decided to use CARP. CARP is configured between Master and Slave on the LAN interface, with a unique VHID for each Master/Slave pair. Master and Slave sync via the dedicated connection. We use devd to listen for the CARP event (Master goes down) and trigger a shell script which brings bridge0 up on the Slave. Switching between Master and Slave works well.
Problems arise when all 4 pfSense boxes are connected and the SonicWalls are down/disconnected. Then all of a sudden the network gets very slow and unstable, and even internal connections within a subnet are affected. It looks like a loop or a massive packet storm. When we disconnect the WAN side of one Master, the network goes back to normal (except that we lose half of the network connections ;-) ).
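For readers who want to picture the devd trigger described above, here is a minimal sketch of what such a handler could look like. It is not our actual script: the script path, the devd rule in the comment, the VHID value 1 and the variable names are all assumptions made for illustration. It runs in dry-run mode by default and only prints the ifconfig command it would execute.

```shell
#!/bin/sh
# Hypothetical handler for CARP state changes, meant to be called by devd as:
#   carp-bridge.sh <vhid>@<ifname> <MASTER|BACKUP|INIT>
# An assumed devd rule (not taken from the setup above) could be:
#   notify 100 {
#       match "system" "CARP";
#       match "type" "(MASTER|BACKUP|INIT)";
#       action "/root/carp-bridge.sh $subsystem $type";
#   };

BRIDGE_IF="${BRIDGE_IF:-bridge0}"   # bridge named in the post
WATCH_VHID="${WATCH_VHID:-1}"       # assumed VHID of this Master/Slave pair
RUN="${RUN-echo}"                   # dry run by default; set RUN="" to execute

# Map a CARP state to the desired administrative state of the bridge.
carp_to_ifstate() {
    case "$1" in
        MASTER) echo up ;;
        *)      echo down ;;        # BACKUP, INIT, or anything unexpected
    esac
}

# Act only on events for the VHID this box cares about.
handle_carp_event() {
    vhid="${1%%@*}"                 # strip "@ifname" from "vhid@ifname"
    [ "$vhid" = "$WATCH_VHID" ] || return 0
    $RUN ifconfig "$BRIDGE_IF" "$(carp_to_ifstate "$2")"
}

# Only act when called with arguments, e.g. by devd.
if [ "$#" -ge 2 ]; then
    handle_carp_event "$1" "$2"
fi
```

Called as `sh carp-bridge.sh 1@lagg0 BACKUP` it prints `ifconfig bridge0 down`. Treating every state other than MASTER as "down" is a deliberate choice in this sketch, so that a box in an unknown state never forwards traffic across the bridge.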
Currently we suspect a problem between the LAGG (LAN) of the new firewalls and the core switch. As we only need CARP as a trigger (see above), we thought about removing CARP from the LAN interface and moving it to the direct connection between Master and Slave. That way the CARP traffic would stay away from the core switch and could be ruled out as a cause of any loop/packet storm. Another advantage is that we could bring down not just the bridge but also the bridge member interfaces (LAN and WAN) on the current Slaves, so it would be much less likely that the Slaves could interfere with the subnets.
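To make that second idea concrete, the following sketch shows how a handler might take the bridge members down together with the bridge on the standby box. The interface names lagg0 (LAN) and igb0 (WAN) are placeholders and not our real names, and the script only prints the commands unless RUN is set to an empty string.

```shell
#!/bin/sh
# Sketch: print (or run) the ifconfig commands for a CARP state change,
# taking the bridge members down along with the bridge on the standby box.

BRIDGE_IF="${BRIDGE_IF:-bridge0}"   # bridge named in the post
MEMBERS="${MEMBERS:-lagg0 igb0}"    # assumed LAN LAGG and WAN port names
RUN="${RUN-echo}"                   # dry run by default; set RUN="" to execute

apply_carp_state() {
    case "$1" in
        MASTER)
            # Bring the members up first, then start forwarding.
            for ifn in $MEMBERS; do $RUN ifconfig "$ifn" up; done
            $RUN ifconfig "$BRIDGE_IF" up
            ;;
        *)
            # Stop forwarding first, then silence the member ports.
            $RUN ifconfig "$BRIDGE_IF" down
            for ifn in $MEMBERS; do $RUN ifconfig "$ifn" down; done
            ;;
    esac
}

# Only act when called with a state argument, e.g. by devd.
if [ "$#" -ge 1 ]; then
    apply_carp_state "$1"
fi
```

One thing to check with this variant: if the dedicated link itself fails, neither box hears the other's CARP advertisements, so both may become MASTER and both bridges would come up at the same time.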
I'm aware that this is not a very detailed description, but I thought that if I described every step/setting we have, the post would never be read because it would be too long. Our hope is that someone here has built such a setup and knows the "well-known" traps one can step into :-) If more details are needed, I will provide them.
Thanks for reading, and for any ideas on how to fix this.
Would it be possible to buy paid support from Netgate to help us with this setup?
That would be handled by Netgate Professional Services.