Summary of making a redundant bridge using CARP & ifstated
-
Hi, I thought I'd share my experiences of setting up a transparent (bridging) redundant firewall.
HISTORY
I have inherited a system where the servers are on the same network as the clients, all of which are attached to a router I have no control over. I wanted to put our servers into what is effectively a DMZ. But I couldn't configure the router, I didn't want the hassle of changing all the clients' gateways, and many people have things mapped directly to the servers' IPs, all of which meant that the firewall needed to be a drop-in bridge. And because all the servers were going to be behind it, it needed to be very reliable. I also wanted it to go FAST, which rules out all but the most expensive appliances. So after considering various systems, I settled on pfSense.
HARDWARE
Intel 915GHA motherboards, 512MB RAM, 40GB HDD, Intel Pro1000GT 2-port PCIe NICs for LAN/WAN, standard NIC for pfSync.
STEPS
Get the live CD from:
http://snapshots.pfsense.org/FreeBSD6/RELENG_1_2/iso/
The naming convention is confusing, but when this boots it comes up as 1.2RC3, as opposed to 1.2RC2, which is (currently) the one on the download site. 1.2RC2 would crash the boxes when trying to enable CARP.
Boot the CD and install to HDD. The cylinders/heads/sectors weren't detected correctly when I did this, so I had to enter them from the HDD specs. I installed without packet mode or GRUB.
Boot pfSense, configure the interfaces and then log into the webGUI.
Configure the following:
In System->Advanced:
Tick "Enable Filtering Bridge"
In Interfaces->LAN:
Bridge with: WAN
In Interfaces->WAN:
Specify an IP address and a netmask of 24
Specify the gateway
Scroll down and untick "Block private networks"
In Firewall->NAT->Outbound:
Select Manual Outbound NAT rule generation
Save
Delete the rule that appears
Apply Changes
Test by pinging from the LAN side to a machine on the other side. The pings should go through. If you enabled logging on the firewall rule that passes this traffic, you should see it logging the ICMP messages.
MAKING IT REDUNDANT
CARP is a protocol designed to fail over an IP address. I don't know exactly how it works, but I'm guessing that when the backup system (the one with the higher advskew) takes over the IP, it simply starts responding to ARP requests saying it owns the IP. Not much use for a bridge… but read on.
Firstly, pfSense has preempt enabled. This means that if one CARP interface fails, they all fail over. Secondly, you can detect the failover using ifstated, and then run a script to bring bridge0 up or down, which does the actual failing over.
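If you want to confirm the preempt setting from the shell rather than take it on trust, FreeBSD exposes it as the `net.inet.carp.preempt` sysctl. According to the FreeBSD carp(4) manual, when it is 1 and one carp-enabled physical interface goes down, advskew is raised to 240 on all carp interfaces, which is why they fail over as a group. A minimal sh sketch; the `check` argument and the overridable `SYSCTL` variable are my own additions for illustration:

```shell
#!/bin/sh
# Report whether CARP preemption is on. With net.inet.carp.preempt=1, FreeBSD
# raises advskew to 240 on all carp interfaces when one carp-enabled physical
# interface goes down, so the whole group fails over together.
SYSCTL=${SYSCTL:-/sbin/sysctl}   # overridable so the logic can be tested

# Succeed if the sysctl reads exactly 1.
preempt_enabled() {
    [ "$("$SYSCTL" -n net.inet.carp.preempt)" = "1" ]
}

case "$1" in
    check)
        if preempt_enabled; then
            echo "CARP preempt is on"
        else
            echo "CARP preempt is OFF: interfaces will not fail over as a group"
        fi
        ;;
esac
```

Run it as `sh preempt-check.sh check` on each box (the file name is arbitrary).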
There are 3 interfaces:
LAN, WAN & pfSync.
On a bridge, the only interface that NEEDS an IP is the pfSync one, but CARP needs something to work with, so give all the interfaces IPs. (I realise half the advantage of a bridge is its hidden nature, but I don't think there's any way round this step.) The IPs don't need to be public. In fact, given that both sides of the bridge are on the same public subnet, and CARP needs the interfaces to be on different subnets, you will need to put at least one of the sides on a different (private) subnet.
So, for example, configure (LAN1/WAN1 being the first box's interfaces, LAN2/WAN2 the second box's):
LAN1 = 192.168.1.1/24
LAN2 = 192.168.1.2/24
CARP0 (or virtual IP) = 192.168.1.3/24
WAN1 = 192.168.2.1/24
WAN2 = 192.168.2.2/24
CARP1 = 192.168.2.3/24
Configure the advskews so that the same box is the master on all interfaces.
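Since it is easy to put one of six addresses in the wrong subnet, the plan can be sanity-checked before typing it into the webGUI. This is a sketch that only handles /24 netmasks (it compares the first three octets), and the function names are made up for illustration. It checks that each side's two real addresses and its CARP address are distinct and share a /24, and that the LAN and WAN sides are on different subnets, as described above.

```shell
#!/bin/sh
# Sanity-check a CARP addressing plan before entering it in the webGUI.
# Only /24 netmasks are handled: the network is taken to be the first
# three octets of each address.

# net24 IP: print the /24 network part, e.g. 192.168.1.3 -> 192.168.1
net24() {
    echo "${1%.*}"
}

# same_net24 IP1 IP2: succeed if both addresses are in the same /24
same_net24() {
    [ "$(net24 "$1")" = "$(net24 "$2")" ]
}

# check_side BOX1_IP BOX2_IP CARP_IP: three distinct addresses in one /24
check_side() {
    [ "$1" != "$2" ] && [ "$1" != "$3" ] && [ "$2" != "$3" ] &&
        same_net24 "$1" "$2" && same_net24 "$1" "$3"
}

# check_plan LAN1 LAN2 CARP0 WAN1 WAN2 CARP1
check_plan() {
    check_side "$1" "$2" "$3" || { echo "LAN side: need 3 distinct IPs in one /24"; return 1; }
    check_side "$4" "$5" "$6" || { echo "WAN side: need 3 distinct IPs in one /24"; return 1; }
    if same_net24 "$1" "$4"; then
        echo "LAN and WAN must be on different subnets"
        return 1
    fi
    echo "plan OK"
}

# Only run when given exactly six addresses on the command line.
if [ "$#" -eq 6 ]; then
    check_plan "$@"
fi
```

For the example addresses, `sh check-plan.sh 192.168.1.1 192.168.1.2 192.168.1.3 192.168.2.1 192.168.2.2 192.168.2.3` prints `plan OK`.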
On the CARP status screen on each box, one box should show MASTER and the other BACKUP. Unplug the WAN link on the master and you should see them all swap over.
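The same check can be done from the console. On FreeBSD, `ifconfig carpN` prints a line such as `carp: MASTER vhid 1 advbase 1 advskew 0`, so a small sketch can pull the state out of it. The interface names carp0 and carp1 follow the example addressing above and are assumptions about your setup; the `show` argument is illustrative.

```shell
#!/bin/sh
# Print the CARP state (MASTER, BACKUP or INIT) found in `ifconfig carpN`
# output read from stdin, e.g. from the line:
#   carp: MASTER vhid 1 advbase 1 advskew 0
carp_state() {
    awk '$1 == "carp:" { print $2; exit }'
}

# Show the state of each CARP interface on this box.
show_carp_states() {
    for ifc in carp0 carp1; do
        printf '%s: %s\n' "$ifc" "$(ifconfig "$ifc" | carp_state)"
    done
}

case "$1" in
    show) show_carp_states ;;
esac
```

Running `sh carp-status.sh show` on both boxes before and after unplugging the master's WAN cable should show every interface flipping together.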
Next, log into the shell and run the following command on both boxes (the WAN links need valid IPs for this bit, as it connects to the internet):
pkg_get -r ifstated
This will install ifstated which is used to detect the CARP changes and bring the bridge up and down.
Edit the file:
/usr/local/etc/ifstated.conf
and replace its contents with the following:

init-state auto

carp_up = "carp1.link.up"
carp_down = "!carp1.link.up"

state auto {
    if $carp_up {
        set-state primary
    }
    if $carp_down {
        set-state backup
    }
}

state primary {
    init {
        run "echo now primary"
        run "ifconfig bridge0 up"
    }
    if $carp_down {
        set-state backup
    }
}

state backup {
    init {
        run "echo now backup"
        run "ifconfig bridge0 down"
    }
    if $carp_up {
        set-state primary
    }
}

HOW IT WORKS
On boot, the ifstated script brings the master's bridge up and takes the backup's bridge down.
When the master fails (or a network cable gets unplugged), CARP changes the backup's carp interfaces from backup to master.
ifstated detects this change and brings the bridge0 interface up on the backup (and takes it down on the master, just in case).
If the master then comes back up, the whole thing reverses: ifstated takes the backup's bridge down and brings it up on the master.
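The decision ifstated makes in the config above boils down to: carp1 MASTER means bridge0 up, and anything else (BACKUP or INIT) means bridge0 down. Written out as plain sh, it is handy for checking by hand what a box ought to be doing. This is a debugging sketch, not a replacement for ifstated, and the `once` argument and `IFCONFIG` override are illustrative additions.

```shell
#!/bin/sh
# Mirror of the ifstated.conf logic: carp1 MASTER -> bridge0 up,
# any other state (BACKUP, INIT, or unreadable) -> bridge0 down.
IFCONFIG=${IFCONFIG:-/sbin/ifconfig}   # overridable so the logic can be tested

# bridge_action CARP_STATE: print "up" or "down"
bridge_action() {
    if [ "$1" = "MASTER" ]; then
        echo up
    else
        echo down
    fi
}

# Read carp1's state once and set bridge0 to match.
reconcile_once() {
    state=$("$IFCONFIG" carp1 | awk '$1 == "carp:" { print $2; exit }')
    action=$(bridge_action "$state")
    echo "carp1 is ${state:-unknown}, setting bridge0 $action"
    "$IFCONFIG" bridge0 "$action"
}

case "$1" in
    once) reconcile_once ;;
esac
```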
In theory the master's failure should be totally transparent to users; in practice, however, there are significant delays. The pfSense systems fail over within a second or so, but with unmanaged Netgear switches it can take a long time (several minutes) before the switch realises the port has changed.
If you test by sending pings through the system from A to B, e.g.:
A (- pfsense -) B
the pings will start timing out. Running
arp -d *
on A (assuming a Windows system) after failing the master, and then pinging again, will work. I'm guessing that clearing the ARP cache makes A broadcast again, and that this also refreshes the switch's MAC address table? Either way, I need to test it with managed switches. It may also be possible to clear the switch's lookup table using something like macof, or another MAC flooding tool.
-
Hi,
This is really great to see, thanks a lot. I have asked the mods to make this one sticky.
Has anyone else already used this approach?
Matts
-
I've modified this so it sends emails when the backup takes over. If people want, I can supply the details. (Not much point in having things fail over if you don't know they did :) )
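The details weren't posted, so the following is only a guess at one way to do it: a small script that ifstated calls from the `primary` state's `init` block, for example by adding `run "/usr/local/bin/carp-notify.sh primary"` there. The script path, the placeholder address, and the reliance on a working `mail` command and mail relay are all assumptions; a stock pfSense install may not provide them.

```shell
#!/bin/sh
# Hypothetical /usr/local/bin/carp-notify.sh: mail an alert when this box
# changes role. Assumes mail(1) exists and can deliver mail from this host.
MAILTO=${MAILTO:-admin@example.com}   # placeholder address
MAILCMD=${MAILCMD:-mail}              # overridable so the logic can be tested

# failover_subject ROLE HOST: print the one-line subject for the alert
failover_subject() {
    echo "[failover] $2 is now $1"
}

# notify ROLE: send the alert, with the current date as the message body
notify() {
    subject=$(failover_subject "$1" "$(uname -n)")
    date | "$MAILCMD" -s "$subject" "$MAILTO"
}

case "$1" in
    primary|backup) notify "$1" ;;
esac
```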
I also ran into problems while taking it into production: one pair of the CARP interfaces would both be master if the bridge was up, which is the case just after booting.
I now explicitly shut the bridge down in a startup script, THEN run ifstated. This way ifstated doesn't think that both are master and keep the bridge up.
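A minimal version of such a startup script might look like the following. It assumes ifstated was installed as /usr/local/sbin/ifstated and that your install runs `.sh` scripts from /usr/local/etc/rc.d with a `start` argument at boot; the paths and the script name are assumptions to check on your own box.

```shell
#!/bin/sh
# Hypothetical /usr/local/etc/rc.d/bridge-failover.sh
# Take bridge0 down first, so the CARP pair does not come up with both boxes
# as master, then start ifstated, which brings bridge0 back up only on the
# box whose carp1 becomes MASTER.
IFCONFIG=${IFCONFIG:-/sbin/ifconfig}
IFSTATED=${IFSTATED:-/usr/local/sbin/ifstated}
IFSTATED_CONF=${IFSTATED_CONF:-/usr/local/etc/ifstated.conf}

start_failover() {
    # Do not start ifstated if the bridge could not be taken down.
    "$IFCONFIG" bridge0 down || return 1
    # Without -d, ifstated detaches and runs as a daemon.
    "$IFSTATED" -f "$IFSTATED_CONF"
}

case "$1" in
    start) start_failover ;;
esac
```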
If you have managed switches, this solution could also be extended to use some sort of "fencing", whereby the backup SSHes into the switch and disables the master's port, to avoid a split-brain situation. It could perhaps also force the switch to refresh its MAC tables so that everything fails over faster.
Using one Cisco and one 3Com managed switch, the failover time is about 5-10 seconds, without the above setup. I can live with 10 seconds.
-
Hi, I'm interested in the updated script. Can you send it?
Thank you in advance.
-
Can you add these features to the GUI and provide us a patch? This would be helpful to have.
-
Is anyone willing to turn this into a more step-by-step howto?
I would like to see the additional info about mailing too :)
-
Does anyone have any info?
-
Read the first post:
This IS a step-by-step. If you don't understand how to do it, you most probably shouldn't try.
-
Do you see the email changes posted anywhere? ;)