Linking CARP VIPs?

jwelter99

Hello,

We are using CARP VIPs on both the WAN and LAN networks and want them linked, so that if the WAN goes down and its VIPs move to the other system, the LAN side moves as well.

I know VRRP under Linux can handle this, but is there a way to get this behaviour in pfSense?

podilarius

If you set up the cluster by following the guide in the book or online, the default behavior is that if LAN or WAN becomes unavailable on the primary, all CARP VIPs fail over to the secondary firewall.

jwelter99

@podilarius:

If you set up the cluster by following the guide in the book or online, the default behavior is that if LAN or WAN becomes unavailable on the primary, all CARP VIPs fail over to the secondary firewall.

I found an earlier thread with the exact same issue: http://forum.pfsense.org/index.php?topic=46377.0

My settings do appear correct:

$ sysctl net.inet.carp
net.inet.carp.allow: 1
net.inet.carp.preempt: 1
net.inet.carp.log: 1
net.inet.carp.arpbalance: 0
net.inet.carp.suppress_preempt: 0
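
For anyone who wants to check these values from a script rather than by eye, here is a minimal Python sketch. It only parses text in the `name: value` form shown above. The function names `parse_sysctl` and `group_failover_enabled` are made up for illustration and are not part of pfSense. The check assumes, per FreeBSD's carp(4) man page, that `net.inet.carp.preempt=1` is what makes the CARP interfaces fail over as a group.

```python
def parse_sysctl(text):
    """Parse 'name: value' lines (as printed by sysctl) into a dict of ints."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue  # skip blank lines and the echoed command line
        try:
            values[name.strip()] = int(value.strip())
        except ValueError:
            continue  # ignore anything that is not a plain integer
    return values


def group_failover_enabled(values):
    """True when CARP is allowed and preempt (group failover) is switched on."""
    return (values.get("net.inet.carp.allow") == 1
            and values.get("net.inet.carp.preempt") == 1)


# The output posted above, pasted in as sample input.
SAMPLE = """\
$ sysctl net.inet.carp
net.inet.carp.allow: 1
net.inet.carp.preempt: 1
net.inet.carp.log: 1
net.inet.carp.arpbalance: 0
net.inet.carp.suppress_preempt: 0
"""

if __name__ == "__main__":
    settings = parse_sysctl(SAMPLE)
    print("group failover enabled:", group_failover_enabled(settings))
    print("suppress_preempt:", settings["net.inet.carp.suppress_preempt"])
```

Note that carp(4) describes `suppress_preempt` as a read-only counter that is non-zero while a problem (such as a lost link) has been detected, so a value of 0 here means the box itself sees nothing wrong.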

This problem is triggered when the WAN NIC hangs, which is a new issue for us since 2.0.1. We thought the 2.0.3 pre-release had solved it, but it seems to come back when pfBlocker is enabled. Very odd behaviour for hardware that was solid under 2.0 and 2.0.1 but has become flaky under 2.0.2 and later.

Reiner030

@jwelter99:

This problem is triggered when the WAN NIC hangs, which is a new issue for us since 2.0.1. We thought the 2.0.3 pre-release had solved it, but it seems to come back when pfBlocker is enabled. Very odd behaviour for hardware that was solid under 2.0 and 2.0.1 but has become flaky under 2.0.2 and later.

I guess you are running into the same problem I hit when I tested with "ifconfig <dev> down" and with <VLAN removed from interface>.
The system did not recognize the disconnect and performed only a "single failover"…

You can check whether this is the problem by disconnecting the cable from the port (or deactivating the switch port):
http://forum.pfsense.org/index.php/topic,58917.msg316610.html#msg316610

Bests

Reiner

jwelter99

@Reiner030:

@jwelter99:

This problem is triggered when the WAN NIC hangs, which is a new issue for us since 2.0.1. We thought the 2.0.3 pre-release had solved it, but it seems to come back when pfBlocker is enabled. Very odd behaviour for hardware that was solid under 2.0 and 2.0.1 but has become flaky under 2.0.2 and later.

I guess you are running into the same problem I hit when I tested with "ifconfig <dev> down" and with <VLAN removed from interface>.
The system did not recognize the disconnect and performed only a "single failover"…

You can check whether this is the problem by disconnecting the cable from the port (or deactivating the switch port):
http://forum.pfsense.org/index.php/topic,58917.msg316610.html#msg316610

Bests

Reiner

Thanks, that is the issue with one twist…

What seems to occur is that WAN on pfs1 goes into some odd state where it's not passing traffic. It's up but dead. This trips the WAN-side VIPs to fail over to pfs2, but for some reason the LAN side does not move. If I manually take down the WAN switch port on pfs1, then the LAN side fails over as expected.

I can also ssh into pfs1 from pfs2 over the pfsync interface and get to the shell and reboot it.

So I think the real issue is why WAN on pfs1 is going into the zombie state...

Reiner030

@jwelter99:

Thanks, that is the issue with one twist…

What seems to occur is that WAN on pfs1 goes into some odd state where it's not passing traffic. It's up but dead. This trips the WAN-side VIPs to fail over to pfs2, but for some reason the LAN side does not move. If I manually take down the WAN switch port on pfs1, then the LAN side fails over as expected.

Yes, I think that is the same as my current problem, which luckily is only theoretical so far:

@Reiner030:

One question remains:
What happens if the gateway goes down (it's behind a switch, so this is a "virtual" disconnection, like the "interface down" case)?
Is there some mechanism to combine gateway failover and CARP failover as a group between master and slave pfSense boxes when the slave still has a connection to the gateway?

jimp

You can't combine gateway failover with CARP failover. It's not meant to work that way, and there isn't a reliable way to tie them together.

If the port doesn't lose link it won't go down. This can be an issue for VMs, unless you bond the VM NIC to a physical NIC, or you have been smart enough to script the link state of the physical NIC with an action that takes down the associated virtual switch.

Reiner030

@jimp:

You can't combine gateway failover with CARP failover. It's not meant to work that way, and there isn't a reliable way to tie them together.

Gateway failover, as the pfSense web GUI normally defines it, means having several WAN links/gateways and switching between them on the master firewall.
That is not our intention here…

The problem is that the gateway is not reachable from the master firewall but is reachable from the slave.
So even though it's a "gateway down" and not a "link down", it is effectively a "WAN link down" problem on the master firewall...
And such behavior calls for an active gateway/CARP failover ;)

If the port doesn't lose link it won't go down. This can be an issue for VMs, unless you bond the VM NIC to a physical NIC, or you have been smart enough to script the link state of the physical NIC with an action that takes down the associated virtual switch.

It's the same problem if I'm running pfSense on a physical server behind a physical switch…

=> If the CARP device cannot catch this automatically by itself, then perhaps a "master down" script in pfSense's gateway failover mechanism should initiate such a "carp down" event?

EDIT:
Perhaps the reason it's still not implemented is that the master must be sure the slave still has a connection to the gateway and the other networks; if both have no connection, then it would be bad (ping-pong between master and slave, or the slave even dropping to a "more slave" priority)... ^^

Perhaps it would help to define an additional gateway, a "peer gw", which holds the IPs of the CARP partner on all interfaces, in order to check:

which connections are available/lost
whether the peer still has a connection to WAN/LAN/...
perhaps with a configurable parameter for what to do when x/y/z fails

Further problems (not common, I guess):

What should happen if the master has more than one slave partner?
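
The "peer gw" idea above boils down to a small decision rule, sketched here in Python. This is purely hypothetical: pfSense has no such mechanism, the function name `should_demote` and its three inputs are invented for illustration, and the sketch says nothing about how those reachability checks would be made reliable. It also covers only a single peer and leaves the multi-slave question open.

```python
def should_demote(master_reaches_gateway, peer_reachable, peer_reaches_gateway):
    """Decide whether the master should hand over its CARP VIPs.

    All three arguments are booleans that some (hypothetical) monitoring
    would have to supply.
    """
    if master_reaches_gateway:
        return False  # nothing is wrong from the master's point of view
    if not peer_reachable:
        return False  # cannot confirm the peer is healthy, so stay put
    # Demote only if the peer really is better off; this avoids the
    # ping-pong case where both boxes have lost the gateway.
    return peer_reaches_gateway


if __name__ == "__main__":
    # Master lost the gateway, peer is reachable and still sees it.
    print(should_demote(False, True, True))
    # Both lost the gateway: handing over would not help.
    print(should_demote(False, True, False))
```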

jimp

There is much more to it than that. Gateway status means nothing to CARP status. It's not something that you can assume has any relation whatsoever. In certain cases it might be close, but that does not make it a general solution.

If there is a loss of connectivity, the slave will take over, but for the master to self-demote, it must lose link on an interface. (Or you can manually disable CARP of course)
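
To make the difference between "lost link" and "up but dead" concrete, here is a toy Python model of the behaviour described in this thread. It is a sketch under stated assumptions, not the real CARP protocol: the names `Node`, `make_node` and `masters` are invented, the advskew values 0 and 100 are assumed for pfs1 and pfs2, and the only detail taken from FreeBSD's carp(4) is that with `preempt=1` a box that sees a CARP-enabled interface lose link raises advskew to 240 on all its CARP interfaces.

```python
from dataclasses import dataclass

DEMOTED_ADVSKEW = 240  # value carp(4) documents for group failover with preempt=1


@dataclass
class Node:
    name: str
    advskew: int      # configured skew; the lowest skew wins the election
    link_up: dict     # interface -> link state as the node itself sees it
    passing: dict     # interface -> whether traffic actually gets through

    def effective_advskew(self, preempt=True):
        # Only a *link* loss is visible to the node. An interface that is
        # up but not passing traffic does not trigger self-demotion.
        if preempt and not all(self.link_up.values()):
            return DEMOTED_ADVSKEW
        return self.advskew


def make_node(name, advskew, wan_link=True, wan_passing=True):
    """Build a two-interface node; the LAN side is always healthy here."""
    return Node(
        name,
        advskew,
        link_up={"wan": wan_link, "lan": True},
        passing={"wan": wan_link and wan_passing, "lan": True},
    )


def masters(nodes, interfaces=("wan", "lan"), preempt=True):
    """Return the effective master per interface in this simplified model."""
    result = {}
    for iface in interfaces:
        # A node whose adverts cannot get out is not heard by its peer.
        heard = [n for n in nodes if n.passing[iface]]
        winner = min(heard, key=lambda n: n.effective_advskew(preempt), default=None)
        result[iface] = winner.name if winner else None
    return result


if __name__ == "__main__":
    pfs2 = make_node("pfs2", 100)
    print("healthy:  ", masters([make_node("pfs1", 0), pfs2]))
    print("link down:", masters([make_node("pfs1", 0, wan_link=False), pfs2]))
    print("zombie:   ", masters([make_node("pfs1", 0, wan_passing=False), pfs2]))
```

In this model the "link down" case moves both WAN and LAN to pfs2, while the "zombie" case moves only WAN, which matches the split that jwelter99 reported.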