Single point of failure - pfsync

goran

Hi guys,

Currently I'm looking for a way to get a fully redundant, clustered router/firewall in our datacenter to serve a clustered, virtualized ESX environment.

Because we have a lot of customers running on these servers, it needs to be fast and reliable. pfSense soon came into the picture. :)

Currently I'm running pfSense on two embedded systems (CF) with two WAN connections in a test environment. GREAT product! One thing seems to be missing though.

There is still one 'single point of failure' in my setup, and that is the pfsync connection. Because it only runs on two NICs (one on each cluster node), a failure of either one would mean an instant 'split-head' and a failure of the entire cluster. Of course I connected both NICs with a crossover cable, no switches there.

Is there a way to configure TWO pfsync connections, using four NICs and two crossover cables, to get rid of this last single point of failure?

Thanks guys!

goran

Hi Guys,

Anyone know? I've been looking in the forum but can't seem to find anything on this topic.

It's important information, as the entire FW / RTR choice depends on it.

Thanks a lot!

GruensFroeschli

You could set up a VLAN on the LAN interface of both nodes and exchange the CARP info over that VLAN.
If one interface/node dies, then the other node has to take over in any case.

morbus

A CARP cluster won't fall apart if the sync connection fails. The current master remains master and the slave stays a slave. You won't end up with two masters.

I think the decision on who is master for a VIP is made on the interface for that VIP, i.e. LAN VIP communication is done on the LAN interface.
The Sync interface does all the state copying and XML-RPC copying of rules etc.

So if your sync interface were to fail, you would lose stateful failover (the continuous state copying), but it would still fail over.

I don't know if you could have 2 pfsync interfaces.

goran

Hi guys,

Thanks for the response.

I'm a bit confused. The whole idea behind pfsync is informing each other of their status, right? If that link dies, no sync communication will take place either way, and both will assume the other is down and initiate failover. In my opinion this would result in a 'split-head', 'master-master' situation. Please correct me (again ;) ) if I'm wrong.

The best option seems to be moving all pfsync traffic onto the NICs which also hold the VIP: if one goes down, the other MUST fail over either way. Though this seems to be the best solution, I'm wondering why a dedicated CARP interface is necessary at all!? During setup I remember reading articles specifically telling me to use a crossover cable in between and not to use switches of any sort, to prevent communication loss.

I have some experience with German-made third-party Windows clustering software, and it assigns TWO NICs on each server (maximum of two servers) to its equivalent of the 'pfsync' link. Should one fail, no 'split-head' will occur.

Perhaps I'm a bit hardheaded on this but please correct me if I'm wrong ;-)

Thanks!

cmb

You seem to have everything mixed up. pfsync only synchronizes the firewall state table, so all your active connections don't drop in case of failover. CARP handles the failover process, and runs on every interface (only way it can detect the failure of a single interface). It is recommended to use a dedicated pfsync interface for security and performance reasons.
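To make the split of responsibilities concrete, here is a toy Python model of the election for a single VIP. It is only a sketch, not how CARP is actually implemented: the `Node` class, the `masters_for_vip` function and the `vip_link_up` flag are made-up names for illustration, and `advskew` is borrowed loosely from CARP's skew setting (lower value wins). The point is that the sync link is not an input to the election at all.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    advskew: int        # hypothetical priority field: the lowest value wins
    alive: bool = True


def masters_for_vip(nodes, vip_link_up):
    """Return the names of the nodes that would act as master for one VIP.

    Nodes hear each other's advertisements only on the VIP's own interface.
    The pfsync link is deliberately not a parameter of this function.
    """
    live = [n for n in nodes if n.alive]
    if not live:
        return []
    if vip_link_up:
        # Both nodes hear each other, so exactly one of them wins.
        return [min(live, key=lambda n: n.advskew).name]
    # Nodes cannot hear each other on this interface: each live node
    # believes it is alone and claims master for the VIP.
    return [n.name for n in live]


fw1 = Node("fw1", advskew=0)
fw2 = Node("fw2", advskew=100)

print(masters_for_vip([fw1, fw2], vip_link_up=True))   # ['fw1']

fw1.alive = False  # primary dies, backup takes over
print(masters_for_vip([fw1, fw2], vip_link_up=True))   # ['fw2']
```

In this model a 'master-master' situation can only come from the nodes losing sight of each other on the VIP's own interface, never from the sync cable.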

goran

Ah, thanks a lot!

Now I understand and after reading some instructions again I see what you mean.

I was a bit off because of other failover software I used to work with. That uses those kinds of 'lines' for failover as well.

Thanks for bringing me up to speed on this.

cmb

Now that I read the previous posts more thoroughly, you do have somewhat of a point but it's not nearly as bad as you described. If you lose your pfsync interface, and later one of your boxes fails and CARP switches your backup to master, then you're going to lose your state table. In some environments that's a major problem, in many nobody will even notice. But the chance of losing your sync interface and later having a complete failure of the primary (without the primary completely dying in the first place) is remote.
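As a rough illustration of that sequence (sync link lost first, failover later), here is a small Python sketch. It is a simplified model under stated assumptions, not pfsync itself: the `replicate` helper and the example addresses are hypothetical, and real states also expire over time, which this ignores.

```python
def replicate(master_states, backup_states, sync_link_up):
    """Copy the master's states to the backup only while the sync link is up."""
    if sync_link_up:
        backup_states |= master_states
    return backup_states


# Each state is (source IP, source port, destination IP, destination port).
master = {("10.0.0.5", 40000, "203.0.113.7", 443)}
backup = set()

# Sync link healthy: the backup learns the existing connection.
replicate(master, backup, sync_link_up=True)

# Sync link fails, then a new connection is opened through the master.
master.add(("10.0.0.6", 40001, "203.0.113.7", 22))
replicate(master, backup, sync_link_up=False)

# If the backup is promoted now, these connections are unknown to it.
lost = master - backup
print(len(lost))  # 1
```

Connections the backup never heard about would have to be re-established after a failover; whether anyone notices depends on the environment.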

goran

Thank you very much for the info.

The state table won't become a problem as far as I can judge now.

Now it really is a GREAT firewall! ;-)