Avoid auto failback to reduce VPN client interruptions

mjugmans

Relating to how CARP VIPs are maintained in pfSense, I am trying to avoid automatic failback when the primary server comes back online, with the aim of reducing the number of VPN client interruptions.

I have followed this guide to build a highly available OpenVPN service, and it works as the guide describes. I have also noted similar questions to this one (Re: carp master slave - manage master recover).

Setting the server in CARP maintenance mode appears to be a workaround by setting the advertised skews the same on both servers, but it's unpalatable as the LAN and WAN interfaces failover independently, and also the idea of running a production system in Maintenance mode as the normal will be a source of confusion.

I initially started working in OPNsense, where the HA settings include a Disable Preempt checkbox that you would set only on the configured master. The help text is: "When this device is configured as CARP master it will try to switch to master when powering up, this option will keep this one slave if there already is a master on the network. A reboot is required to take effect." With this setting, the advertised skews are set equal, and the LAN and WAN interfaces fail over together. Is this something that can be done in pfSense? Is this function available under the hood, in config.xml or perhaps deeper? Grateful for your thoughts,

Kind regards,
-Matthew

Derelict

@mjugmans said in Avoid auto failback to reduce VPN client interruptions:

Setting the server in CARP maintenance mode appears to be a workaround by setting the advertised skews the same on both servers, but it's unpalatable as the LAN and WAN interfaces failover independently

No idea what you're talking about here. It does not make the advskews the same on both nodes. The default advskew on the secondary node is 100. Maintenance mode sets all advskews to 254. Nor do I understand what you're talking about with the interfaces failing over independently.
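For reference, carp(4) defines the advertisement interval as advbase + advskew/256 seconds, and the node advertising most frequently is preferred as master. A minimal sketch of that arithmetic for the three advskew values mentioned here, assuming the default advbase of 1 (the function name carp_interval is made up for illustration):

```shell
# Advertisement interval in seconds, per carp(4): advbase + advskew/256.
# carp_interval is a hypothetical helper name, not part of pfSense.
carp_interval() {
    awk -v base="$1" -v skew="$2" 'BEGIN { printf "%.4f\n", base + skew / 256 }'
}

carp_interval 1 0     # primary default        -> 1.0000
carp_interval 1 100   # secondary default      -> 1.3906
carp_interval 1 254   # maintenance mode skew  -> 1.9922
```

With the defaults, a node in maintenance mode advertises less often than a healthy secondary, which is why it gives up the master role.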

You can install the shell command package and run this at boot on the primary:

/usr/local/sbin/pfSsh.php playback enablecarpmaint

I can't remember whether that should be an early shell command or regular but that will put the master node into maintenance mode every time it boots. You can then take it out of maintenance mode when you want to and fail back.

the idea of running a production system in Maintenance mode as the normal will be a source of confusion.

Why would it be the normal? Fix the problem and fail back to it when it is less-impactful to do so.

mjugmans

Thanks Derelict. To be clear, when the pair of pfSense nodes is not in CARP maintenance mode, the Primary has advskew set to 0 and the Secondary has it set to 100. When I set each node to enter maintenance mode, the advskews on both nodes (and on both WAN/LAN interfaces) are set equal to 254, exactly as you've stated. One can find the advskew value by running ifconfig on an interface and looking for:

carp: MASTER vhid 1 advbase 1 advskew 254
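As a rough sketch, the same fields can be pulled out of ifconfig output with awk. The helper name carp_lines is my own, and the line format is assumed to match the sample above:

```shell
# Print "vhid state advskew" for each carp line read from stdin.
# carp_lines is a hypothetical helper; it assumes the ifconfig line format
# "carp: STATE vhid N advbase N advskew N" shown above.
carp_lines() {
    awk '$1 == "carp:" {
        vhid = skew = ""
        for (i = 2; i < NF; i++) {
            if ($i == "vhid")    vhid = $(i + 1)
            if ($i == "advskew") skew = $(i + 1)
        }
        print vhid, $2, skew
    }'
}

# Demo on the sample line; on a firewall one would run: ifconfig | carp_lines
printf '\tcarp: MASTER vhid 1 advbase 1 advskew 254\n' | carp_lines
```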

Regardless of maintenance mode state, the sysctl net.inet.carp parameters are equal between the two nodes:
net.inet.carp.ifdown_demotion_factor: 240
net.inet.carp.senderr_demotion_factor: 0
net.inet.carp.demotion: 0
net.inet.carp.log: 1
net.inet.carp.preempt: 1
net.inet.carp.allow: 1
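
To compare a single value between the nodes, a small filter over saved output of sysctl net.inet.carp is enough. This is only a sketch; carp_sysctl_value is a made-up helper name, and the input is assumed to use the "name: value" format shown above:

```shell
# Print the value of one net.inet.carp.* entry from sysctl output on stdin.
# carp_sysctl_value is a hypothetical helper name.
carp_sysctl_value() {
    awk -F': ' -v key="net.inet.carp.$1" '$1 == key { print $2 }'
}

# Demo on two of the lines above; on a node: sysctl net.inet.carp | carp_sysctl_value preempt
printf 'net.inet.carp.log: 1\nnet.inet.carp.preempt: 1\n' | carp_sysctl_value preempt
```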

In summary, I don't care which node is MASTER, but I only want another node to become MASTER when there is a failure, e.g. an OpenVPN service, VM or WAN failure. I want both interfaces (WAN/LAN) on a node to fail over together, rather than having one node be MASTER on the LAN interface and BACKUP on the WAN interface (otherwise I'll have created a major routing problem).
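
The split state I want to avoid can be detected from ifconfig output. A minimal sketch, assuming the carp line format shown earlier (carp_states_agree is a hypothetical helper name): it succeeds when every CARP VIP on the node reports the same state and fails when they are mixed.

```shell
# Exit 0 if all carp lines on stdin report the same state, 1 if they differ.
# carp_states_agree is a hypothetical helper, not a pfSense command.
carp_states_agree() {
    awk '$1 == "carp:" { seen[$2] = 1 }
         END {
             n = 0
             for (s in seen) n++
             exit (n > 1)
         }'
}

# Demo with a made-up split node: MASTER on one VIP, BACKUP on the other.
sample='carp: MASTER vhid 1 advbase 1 advskew 0
carp: BACKUP vhid 2 advbase 1 advskew 0'

if printf '%s\n' "$sample" | carp_states_agree; then
    echo "consistent"
else
    echo "split"
fi
```

On a firewall the input would come from ifconfig instead of the sample text.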

In comparing with OPNsense, it seems a significant difference is how failback is managed. I am investigating this further to see what the configuration differences are, and will post up results when I have run some tests.

Derelict

If the advskew is 254 it is almost certainly in maintenance mode. The unit will not fail over unless it loses an interface on link down. It will not fail over on "OpenVPN service, VM or WAN failure." I am not sure what that means exactly.