Two VIPs, one inside, one outside. One fails over, the other does not.



  • Hello and happy Friday!

    We have two pfSense boxes with three NICs each: one dedicated to heartbeat, one on the inside, one on the outside. The inside interfaces connect to an internal router, the outside ones to the Internet routers' switches.

    We have a VIP on both the inside and outside interfaces, using CARP between the boxes.

    Today an event caused the outside link of the MASTER node to go down for a few minutes. This was the first unplanned fail-over event since deployment.

    The VIP on the MASTER's outside interface migrated to the SLAVE unit.

    Things started breaking in terms of web browsing: some sites would load while others would not.

    Because only the outside VIP migrated, this created an unexpected asymmetrical routing path, which isn't bad in itself. The issue is that the other firewall wasn't expecting return traffic, so it was dropping everything coming back.

    As far as I can tell, the only difference between the boxes is that the master has the lightsquid reporting, mtr, and DNS server (not configured or used) packages installed.

    The Q's:

    If there is an event which triggers a VIP migration, shouldn't both (all) VIPs have been migrated to prevent the aforementioned situation?

    Is there a setting or config file that could be updated if one desired this type of behavior?

    I wonder if running a routing protocol between them on the heartbeat link would have helped here…

    It was a bit of a bummer this happened, as we just had a hardware failure several weeks ago and I was asked by management what we could do to mitigate the chances of it happening again. My solution was another pfSense box and CARP. Less than three weeks in service and this happens. Oh well....  8D

    thanks and have a great weekend,
    greg



  • Curious. pfSense should have preempt turned on, which, IIRC, lets the backup grab all the VIPs when the master drops one. I would carefully check your CARP config on both boxes. Get one of the tutorials and try to verify every step. If you look at the ifconfig output on both boxes, they should match, but the master should have advskew 0 on the virtuals and the backup should show 100. You can check preempt with sysctl net.inet.carp.



  • @dotdash:

    Curious. pfSense should have preempt turned on, which, IIRC, lets the backup grab all the VIPs when the master drops one. I would carefully check your CARP config on both boxes. Get one of the tutorials and try to verify every step. If you look at the ifconfig output on both boxes, they should match, but the master should have advskew 0 on the virtuals and the backup should show 100. You can check preempt with sysctl net.inet.carp.

    I'll check again, but I did follow the document on the wiki, did our tests pre-production, and it worked as expected. We played with it for a few hours (the fail-over/back). In any event I'll go over the docs again. Thanks for your comments.

    greg



  • Hi,

    I have the same problem, but in the opposite sense :)

    In my office I have two pfSense boxes (Master and Slave). Synchronization works fine if one of the two firewalls fails.
    I want to set up this behaviour: if the WAN link fails on the first firewall, all VIPs (e.g. the LAN VIP) must be moved to the slave firewall.

    It seems it is not possible to "logically connect" two virtual IPs managed by two different CARP groups, so that if something happens to the first CARP IP, all the CARP IPs "connected to the first" are affected as well.

    Let me know if someone has solved this problem.
    Regards
    Marco



  • Go to Diagnostics > Command Prompt and check the output of sysctl net.inet.carp.
    preempt should be 1 and suppress_preempt should be 0.
    Also compare ifconfig on the master and backup and verify they match, with advskew 0 on the master and 100 on the backup.
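
    The advskew comparison above can be scripted against saved ifconfig output. A minimal sketch, assuming the carp lines have been copied into a text file (the file name and sample values below are made up, mirroring the format pfSense prints):

```shell
# Extract vhid/advskew pairs from saved ifconfig output so the master
# and backup can be compared side by side. Sample data is hypothetical.
cat > master_ifconfig.txt <<'EOF'
carp: MASTER vhid 1 advbase 1 advskew 0
carp: MASTER vhid 2 advbase 1 advskew 0
EOF
awk '/carp:/ { print "vhid " $4 " advskew " $8 }' master_ifconfig.txt
```

    Run the same one-liner over the backup's output: every vhid should show advskew 100 there, and any mismatch between the two lists is worth chasing.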



  • These are my results

    MASTER
    $ sysctl net.inet.carp
    net.inet.carp.allow: 1
    net.inet.carp.preempt: 1
    net.inet.carp.log: 1
    net.inet.carp.arpbalance: 0
    net.inet.carp.suppress_preempt: 4

    vip1: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
        inet 81.29.159.2 netmask 0xffffff00
        carp: MASTER vhid 1 advbase 1 advskew 0
    vip2: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
        inet 192.168.32.1 netmask 0xffffff00
        carp: MASTER vhid 2 advbase 1 advskew 0

    SLAVE
    $ sysctl net.inet.carp
    net.inet.carp.allow: 1
    net.inet.carp.preempt: 1
    net.inet.carp.log: 1
    net.inet.carp.arpbalance: 0
    net.inet.carp.suppress_preempt: 4

    vip1: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
        inet 81.29.159.2 netmask 0xffffff00
        carp: BACKUP vhid 1 advbase 1 advskew 100
    vip2: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
        inet 192.168.32.1 netmask 0xffffff00
        carp: BACKUP vhid 2 advbase 1 advskew 100

    If I disconnect the WAN link on the Master firewall, only the WAN VIP is migrated to the Slave firewall.
    The LAN VIP remains on the Master firewall.
    The LAN VIP is used as the default gateway on the LAN.
    As you can understand, in this way it is not possible to get out of the LAN if the WAN link is down on the Master firewall.



  • This is your problem:
    net.inet.carp.suppress_preempt: 4

    From the man page:
    net.inet.carp.suppress_preempt
          A read only value showing the status of preemption suppression.
          Preemption can be suppressed if link on an interface is down or
          when pfsync(4) interface is not synchronized. Value of 0 means
          that preemption is not suppressed, since no problems are
          detected. Every problem increments suppression counter.

    CARP is detecting some issue and not letting all the VIPs fail over. Not sure where to go from here. I would verify everything is good with the pfsync state synchronization for a start.
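
    To make that condition jump out when eyeballing the output, the saved sysctl dump can be run through a small filter. A sketch, with a hypothetical file name and the counter value from the post above:

```shell
# Flag a non-zero suppression counter in saved `sysctl net.inet.carp`
# output. Sample values mirror the ones posted earlier in the thread.
cat > carp_sysctl.txt <<'EOF'
net.inet.carp.preempt: 1
net.inet.carp.suppress_preempt: 4
EOF
awk -F': ' '/suppress_preempt/ {
    if ($2 > 0) print "preemption suppressed: " $2 " problem(s) detected"
    else        print "preemption active"
}' carp_sysctl.txt
```

    Since the man page says every detected problem increments the counter, a value of 4 suggests several simultaneous issues (interfaces down and/or pfsync out of sync) rather than a single one.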

