OpenVPN interfering with CARP Failover
-
@UserCo We can reproduce this issue with our ha cluster running version 23.09.1
Some process is killing states it should not touch when bringing the openvpn server up on the active node.
We are currently discussing this issue with the Netgate Support. -
Hmm, the OpenVPN tunnel network is shown in the auto outbound NAT rules which means pfSense sees as a LAN. That should mean it doesn't run any of the WAN IP scripts when it comes up.
However, what is logged when it does?
If you restart the OpenVPN server without failing over does that also break existing connections?
-
@stephenw10 It does not break connections consistently when just restarting the ovpns.
It does for example if you fail over to your secondary node and the restart the ovpns on the primary (inactive) node, but only on the first restart of the service.Here are the logs for that specific scenario:
OpenVPN Log Feb 2 17:59:04 openvpn 49241 Initialization Sequence Completed Feb 2 17:59:04 openvpn 49241 UDPv4 link remote: [AF_UNSPEC] Feb 2 17:59:04 openvpn 49241 UDPv4 link local (bound): [AF_INET] x.x.x.181:1201 Feb 2 17:59:04 openvpn 49241 /usr/local/sbin/ovpn-linkup ovpns2 1500 0 10.150.11.1 255.255.255.0 init Feb 2 17:59:04 openvpn 49241 /sbin/ifconfig ovpns2 10.150.11.1/24 mtu 1500 up Feb 2 17:59:04 openvpn 49241 TUN/TAP device /dev/tun2 opened Feb 2 17:59:04 openvpn 49241 TUN/TAP device ovpns2 exists previously, keep at program end Feb 2 17:59:04 openvpn 49241 WARNING: experimental option --capath /var/etc/openvpn/server2/ca Feb 2 17:59:04 openvpn 49241 Note: OpenSSL hardware crypto engine functionality is not available Feb 2 17:59:04 openvpn 49241 NOTE: the current --script-security setting may allow this configuration to call user-defined scripts Feb 2 17:59:04 openvpn 49120 DCO version: FreeBSD 14.0-CURRENT amd64 1400094 #1 plus-RELENG_23_09_1-n256200-3de1e293f3a: Wed Dec 6 21:00:32 UTC 2023 root@freebsd:/var/jenkins/workspace/pfSense-Plus-snapshots-23_09_1-main/obj/amd64/Obhu6gXB/var/jenkins/workspace/pfSense-Plus-snapshots-23_09_1 Feb 2 17:59:04 openvpn 49120 library versions: OpenSSL 3.0.12 24 Oct 2023, LZO 2.10 Feb 2 17:59:04 openvpn 49120 OpenVPN 2.6.8 amd64-portbld-freebsd14.0 [SSL (OpenSSL)] [LZO] [LZ4] [PKCS11] [MH/RECVDA] [AEAD] [DCO] Feb 2 17:59:04 openvpn 13686 SIGTERM[hard,] received, process exiting Feb 2 17:59:04 openvpn 35804 Flushing states on OpenVPN interface ovpns2 (Link Down) Feb 2 17:59:04 openvpn 13686 /usr/local/sbin/ovpn-linkdown ovpns2 1500 0 10.150.11.1 255.255.255.0 init Feb 2 17:59:04 openvpn 13686 /sbin/ifconfig ovpns2 10.150.11.1 -alias Feb 2 17:59:02 openvpn 13686 event_wait : Interrupted system call (fd=-1,code=4) Syslog General Feb 2 17:59:05 php-fpm 35457 /rc.newwanip: Interface is disabled, nothing to do. Feb 2 17:59:05 php-fpm 35457 /rc.newwanip: rc.newwanip: Info: starting on ovpns2. Feb 2 17:59:04 check_reload_status 467 rc.newwanip starting ovpns2 Feb 2 17:59:04 kernel ovpns2: link state changed to UP Feb 2 17:59:04 check_reload_status 467 Reloading filter Feb 2 17:59:04 php-fpm 43717 OpenVPN PID written: 49241 Feb 2 17:59:04 check_reload_status 467 Reloading filter Feb 2 17:59:04 kernel ovpns2: link state changed to DOWN
-
Hmm, I expect the server to be down already if it's running on the CARP VIP since that will be unavailable on the backup node.
Does your tunnel subnet also appear on auto OBN rules?
Is the server interface assigned?
-
@stephenw10 In my scenario described above ovpns is not running on a carp ip but native wan, if we bind it to a carp ip states clear on each failover no matter what.
If ovpns is bound to native wan ip states do not reset with each failover and ovpns server will not stop and start based on carp ip status.
Tunnel subnet is in auto OBN rules.Ovpn server interface is not assigned currently for debugging.
Here are some clarifications:
- If opvns is disabled no states clear and everything works perfectly
- if opvns is active and bound to carp wan ip each failover clears states (ovpns starts and stop depending on carp state)
- if opvns is active and bound to native wan ip carp failover do not usually trigger a state reset, but do in some cases. (if you restarted opvns on the passive node)
- first restart of ovpns often resets states, following restarts do not until you fallback or you do interface changes
- disabling opvns on the primary node and syncing the config to the secondary node will cause state reset everytime carp is active on the secondary node (until you enable and disable ovpns again on the secondary node)
- We do not have any issues with xmlrpc or state sync, they work perfectly fine
To me it looks like this is some wrapper handling bs but i cant find the script or function causing it.
-
Ah, OK that's not the same then.
So do you have the server assigned as an interface?
Do you see the tunnel subnet in auto outbound NAT rules?
Which states are cleared?
If the server is running on the WAN directly I assume it's there only for access to the firewall itself. Rules passing traffic there should be set to not sync states since they would not be valid on the other node.
-
@stephenw10 Not the same as what? It is exactly the same issue @UserCo is experiencing.
If an OpenVPN server is bound to a wan interface, wan states are cleared if the service starts or stops after an interface change.@stephenw10 said in OpenVPN interfering with CARP Failover:
So do you have the server assigned as an interface?
As stated before, there is not interface assigned for the ovpns currently for debugging, but this does not seem to make a difference.
@stephenw10 said in OpenVPN interfering with CARP Failover:
Do you see the tunnel subnet in auto outbound NAT rules?
As stated before, we see the tunnel network on outbound NAT rules.
@stephenw10 said in OpenVPN interfering with CARP Failover:
Which states are cleared?
At least all WAN states are cleared, we have not verified if really all states are cleared since it does not matter that much in our case.
@stephenw10 said in OpenVPN interfering with CARP Failover:
If the server is running on the WAN directly I assume it's there only for access to the firewall itself. Rules passing traffic there should be set to not sync states since they would not be valid on the other node.
We have multiple OpenVPN server for different purposes, the ones that are directly bound to the native WAN interface are only for accessing the firewall itself.
We also have OpenVPN servers for other purposes that need to be on a carp ip. I am not worried about invalid states and that is not the issue here.Support suggested to bind the OpenVPN servers to localhost and then NAT from the carp ip/wan interface to localhost.
This "resolves" the issue that states are cleared during failover, but creates other unwanted sideeffects. -
@dkoruga @stephenw10 Thank you for the inputs. Yes for me it behaves exactly like @dkoruga describes. Is this a known Bug in Pfsense? what can I do about it? when I try the suggested workaround from you @dkoruga with having the OpenVPN server on localhost and doing the port forwarding, the failover does not break the states anymore but also the OpenVPN server does not send an exit notify to the clients so they don't try to reconnect. How do I get the Clients to reconnect? If that would work, I would be satisficed with this workaround as it ticks all the boxes.
What are the mentioned "other unwanted Sid effects"?
Thanks
-
@UserCo Netgate Support confirmed the issue we are seeing here is https://redmine.pfsense.org/issues/13569
First unwanted side effect is the missing exit notify during shutdown of the server as you mentioned.
In result you have to reduce the client ping timeout to a low value to make the client reconnect after some seconds.
Even if you put this as low as 1 or 2 seconds, with exit notification the failover is way more seamless for the client.Second is that ovpns will not see the real client ip without additional magic
Third there could be additional side effects if any packets are received by your inactive firewall node since this node will have the tunnel network in its routing table.
We are considering "commenting out the line /sbin/pfctl -i $1 -Fs in /usr/local/sbin/ovpn-linkdown" as mentioned as a workaround in the bug tracker since i can not imagine an unwanted state on this interface in our configuration, and if there is then i will make sure these states are not synced within our firewall rules in the first place.
@dkoruga said in OpenVPN interfering with CARP Failover:
To me it looks like this is some wrapper handling bs but i cant find the script or function causing it.
It is funny how the line "/sbin/pfctl -i $1 -Fs in /usr/local/sbin/ovpn-linkdown" was my first suspicion 10 minutes into debugging and was then scrapped in my head as the variable is logged and when i tried to execute the command by hand 0 states were cleared as described in the bug tracker conversation.
-
@dkoruga Thanks a lot. This workaround worked for me.
-
@dkoruga said in OpenVPN interfering with CARP Failover:
https://redmine.pfsense.org/issues/13569
Hmm, interesting. Since you're not running it on the CARP VIP I wouldn't expect that to apply to you. I wouldn't have expected exit notification to apply either since the server running on the WAN IP would not shutdown. Unless it loses link entirely.
The server still sees the real source IP when you forward to localhost. There's no source NAT there.
Steve
-
@dkoruga I can confirm it still happens in 2.7.2 and the /usr/local/sbin/ovpn-linkdown fix worked for me.
Normal HA with OpenVPN in WAN CARP. Commenting the line made the trick -
I have a similar problem with carp and vpn, however I use the openvpn interface being a gateway group, being a gateway where carp is running the vpn does not connect, the other one that is outside carp is normal, the same case as comment Could the line solve this?
-
OpenVPN is part of a gateway group? On the gateway group? Unclear exactly how you have that setup.
-
@stephenw10 Good afternoon, I have a VPN in the following configuration.
In the System - Routing part it looks like this
(I don't know if the part where this ASLGW with the carp's IP is working, as it was due to some testing)In the export part, I put the IP of 1 interface as default and then in Additional configuration options (still in export) I put remote ip port udp4
How it works, the VPN tries to connect to the first IP and if it is out, it goes to another IP. This configuration may not be the best and there are better ones, but without the carp part it has always worked for me.
-
Hmm, OK. So what exactly are you seeing happen?
-
@stephenw10 When connecting to the link that operates with CARP, it doesn't work, it fails due to a TLS error, and goes to the next connection that works, I saw in the firewall logs that the connection "Default deny rule IPv4 (1000000103) was being blocked )", because it arrived at the firewall as the carp's IP, but the firewall's dealings were for the interface's IP, I made the change, but there was no result in connecting the VPN, I needed to generate a new VPN with the VPN's interface being the carp, this is the only way to connect to the vpn, but if I keep the 2 vpns, each one on 1 link would I be able to work with just 1 .ovpn file on the computer?
-
Was the primary gateway still up at that point?
This is a VPN server so it isn't connecting out it just listens for incoming connections. There is no reason it can't listen on both WANs all the time, no need to use a failover group there.
See: https://docs.netgate.com/pfsense/en/latest/vpn/openvpn/multi-wan.html#port-forward-method
-
@stephenw10 Good afternoon, would you have an example of what this configuration would look like, I couldn't understand it.
-
Set the the OpenVPN server to listen on localhost.
Then setup port forwards on both WANs to localhost for the port the OpenVPN traffic is arriving on.
Clients will be able to connect to either WAN and replies will go back correctly.