Hardware redundancy

matp

So today, we had a firewall appliance fail. Site offline :(

We use a rack-mounted appliance from applianceshop.eu (not hyperlinked, not intended as promotion) and we've been happy with them. If I remember correctly, there were no appliances available from pfSense when we first purchased one. Anyway, one of them broke.

So we had a spare, loaded a backup of the site config onto it and drove it for three hours to site to replace the failed one. Problem solved. Win.

I want better though. So I'm interested in opinions.
We are thinking of sending a cold spare to each site, preconfigured. Then if it happens again, an operator on site could swap the network and power cables from the broken one into the cold spare and with a little luck it'll work.

There has to be a better way.

I'm struggling with the 'default gateway' issue. Given that all the devices at each site have a default gateway setting, if that gateway goes away, how can auto failover work? Seems like there always has to be a single point of failure.

Derelict

Look at HA and CARP.

With CARP you would give each failover node an interface IP address and they would share a CARP VIP.

Say the LAN is 192.168.10.0/24

CARP 192.168.10.1/24
Primary: 192.168.10.2/24
Secondary: 192.168.10.3/24

When everything is normal, the primary controls the MAC address of the CARP VIP.

When the secondary senses a failure (no heartbeat received from primary) it "takes over" the MAC address of the CARP VIP. So you would have the hosts point to .1 for their default gateway.

Note that to do this with public IP addresses you need at least a /29, or otherwise three addresses out of a larger block (for example, if your colo is on a /24).
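
To make the address arithmetic concrete, here is a minimal Python sketch. The helper name carp_plan and the choice of simply taking the first three usable addresses are assumptions for illustration, not anything pfSense does for you. On a WAN subnet one usable address normally belongs to the upstream gateway as well, which is why a /30 (two usable addresses) is not enough and a /29 (six usable) is.

```python
import ipaddress

def carp_plan(subnet: str) -> dict:
    """Hypothetical helper: take the first three usable addresses as VIP, primary, secondary."""
    net = ipaddress.ip_network(subnet)
    hosts = list(net.hosts())  # usable addresses, excluding network and broadcast
    if len(hosts) < 3:
        raise ValueError(f"{net} has only {len(hosts)} usable addresses; CARP needs 3")
    vip, primary, secondary = hosts[:3]
    return {
        "vip": vip,              # shared address the hosts use as default gateway
        "primary": primary,      # primary node's own interface address
        "secondary": secondary,  # secondary node's own interface address
        "spare": len(hosts) - 3, # what is left over (e.g. for the upstream gateway)
    }

lan = carp_plan("192.168.10.0/24")
print(lan["vip"], lan["primary"], lan["secondary"])

wan = carp_plan("203.0.113.0/29")  # documentation range standing in for a public /29
print(wan["spare"])
```

For the LAN above this gives .1, .2 and .3, matching the example. A /29 leaves three addresses spare after the CARP trio, and a /30 raises the error.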

There is also a sync link between the two nodes: pfsync keeps the firewall states in step, and configuration sync copies rule/config changes from the primary to the secondary.

Harvy66

While you're waiting for someone more knowledgeable to respond, pfSense does support high-availability failover via CARP.

johnpoz

this should help
https://doc.pfsense.org/index.php/Configuring_pfSense_Hardware_Redundancy_%28CARP%29

If you're a Cisco guy, think of HSRP… where you have multiple routers sharing the same VIP, and the active one owns the MAC of the VIP. If the primary fails, another router in the cluster takes over the VIP, and clients that are using it as the gateway are none the wiser that traffic is going through a different router.

There are 3 different protocols that pretty much all do the same thing:
VRRP [Virtual Router Redundancy Protocol], HSRP [Hot Standby Router Protocol] and CARP [Common Address Redundancy Protocol]. Read up on any of the 3 and you will get some understanding of how clients always point to the same gateway.

Hope that helps. If you do not have more than your 1 public IP, you can do it behind NAT… with RFC 1918 space as your WAN for your CARP setup, but then you're back to a single point of failure with the device doing the NAT.
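
If it helps to check whether a given WAN address is in RFC 1918 space (and so sits behind someone else's NAT), a small Python sketch follows. The function name is_rfc1918 is made up for illustration. The three networks are listed explicitly rather than relying on the standard library's is_private, which covers more than RFC 1918.

```python
import ipaddress

# The three private ranges defined by RFC 1918.
RFC1918 = [
    ipaddress.ip_network(n)
    for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]

def is_rfc1918(addr: str) -> bool:
    """Return True if addr falls inside one of the RFC 1918 ranges."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in RFC1918)

print(is_rfc1918("192.168.10.1"))  # LAN address from the example above
print(is_rfc1918("203.0.113.1"))   # documentation range, stands in for a public IP
```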

matp

Well alright!
So the good news is, Yes, this is possible. The bad news is I understand none of it!!!
That's nothing a good read won't fix, so I'll get to studying.

On the surface of it, it seems that although devices 'think' they are routing traffic based on IP, the switch is really doing the routing based on MAC, and these methods leverage that to get the traffic to a different point without the devices needing to be told. I'm not sure where the public IP issue comes into play, but the sites do all have a /29, so I'm good there anyway.

Thanks for the informed tips, I'll break out the books.

Derelict

They are still routing to the IP address.

But on the local segment, the host ARPs for the gateway address, and the traffic is then actually sent to the MAC address it gets back.

All that happens is the backup node starts responding on the MAC address and the CARP VIP. The hosts don't see anything change. The switch moves the MAC address to the new port automatically.
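
A toy Python model of that handover, just to show which table changes and which doesn't. Everything here is a simplification: the Switch class is hypothetical, the port numbers are made up, and VHID 1 is an assumed value (CARP's virtual MAC has the form 00:00:5e:00:01:VHID).

```python
CARP_VIP = "192.168.10.1"
CARP_MAC = "00:00:5e:00:01:01"  # virtual MAC for an assumed VHID of 1

class Switch:
    """Toy learning switch: remembers the port each source MAC was last seen on."""

    def __init__(self):
        self.mac_table = {}

    def frame_from(self, port, src_mac):
        # Seeing a frame from src_mac on a port (re)learns where that MAC lives.
        self.mac_table[src_mac] = port

    def port_for(self, dst_mac):
        return self.mac_table.get(dst_mac)

switch = Switch()
host_arp_cache = {CARP_VIP: CARP_MAC}  # what a LAN host learned via ARP

# Normal operation: the primary (port 1) sends frames from the virtual MAC.
switch.frame_from(port=1, src_mac=CARP_MAC)
before = switch.port_for(host_arp_cache[CARP_VIP])

# Primary dies; the secondary (port 2) takes over and sends from the same MAC.
switch.frame_from(port=2, src_mac=CARP_MAC)
after = switch.port_for(host_arp_cache[CARP_VIP])

print(before, after)    # 1 2
print(host_arp_cache)   # unchanged: the host never notices
```

The host's gateway IP and ARP entry stay exactly the same. Only the switch's MAC table entry moves from one port to the other.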

It all works pretty well. Google for CARP, HSRP, and VRRP as has been mentioned.

Good writeup in the pfSense book. You'll want to read the 2.2 release notes since the book is 2.1 and CARP VIPs changed in 2.2.