2.1 made site-to-site OpenVPN intermittent. pfsync to blame?
After upgrading the client side of a site-to-site pfSense OpenVPN setup that had been working flawlessly, the connection to the server side became intermittent, though traffic everywhere else is fine. Both the client end and the server end are two pfSense boxes in an HA primary/backup arrangement.
Nothing shows in the firewall logs, and there is no obvious (to me) reason why this should happen. I upgraded the server side thinking that would fix it… and maybe it got a little worse: more down than up. The tunnel comes up, stays up for a bit, goes away for a minute, then comes back. There are no problems with the underlying network, and while the tunnel is up performance is quick. It's all or nothing. It looks like the client is timing out, but I see no place to tweak that in the GUI.
I'm suspicious of leftover invisible bits in the XML carried over by the installer during the upgrade, but that's just a guess. Is it a fight between the HA primary and backup clients? Between the HA primary and backup servers? What broke on the way to 2.1? Is it a mistake to use the OpenVPN HA check box in pfsync for site-to-site OpenVPN?
Here's a bit from the client openvpn log:
Oct 12 00:23:43 openvpn[2959]: Initialization Sequence Completed
Oct 12 00:23:43 openvpn[2959]: Preserving previous TUN/TAP instance: ovpnc3
Oct 12 00:23:41 openvpn[2959]: [MamaBossoTUNVPN] Peer Connection Initiated with [AF_INET]zz.ZZZ.ZZZ.59:922
Oct 12 00:23:40 openvpn[2959]: UDPv4 link remote: [AF_INET]97.64.213.59:922
Oct 12 00:23:40 openvpn[2959]: UDPv4 link local (bound): [AF_INET]ZZ.zzz.ZZZ.195
Oct 12 00:23:40 openvpn[2959]: NOTE: the current --script-security setting may allow this configuration to call user-defined scripts
Oct 12 00:23:38 openvpn[2959]: SIGUSR1[soft,ping-restart] received, process restarting
Oct 12 00:23:38 openvpn[2959]: [MamaBossoTUNVPN] Inactivity timeout (--ping-restart), restarting
Oct 12 00:21:41 openvpn[2959]: Initialization Sequence Completed
Oct 12 00:21:41 openvpn[2959]: Preserving previous TUN/TAP instance: ovpnc3
Oct 12 00:21:39 openvpn[2959]: [MamaBossoTUNVPN] Peer Connection Initiated with [AF_INET]ZZ.ZZ.ZZZ.59:922
Oct 12 00:21:37 openvpn[2959]: UDPv4 link remote: [AF_INET]ZZ.ZZ.ZZZ.59:922
Oct 12 00:21:37 openvpn[2959]: UDPv4 link local (bound): [AF_INET]ZZ.zzz.ZZZ.195
Oct 12 00:21:37 openvpn[2959]: NOTE: the current --script-security setting may allow this configuration to call user-defined scripts
Oct 12 00:21:35 openvpn[2959]: SIGUSR1[soft,ping-restart] received, process restarting
Oct 12 00:21:35 openvpn[2959]: [MamaBossoTUNVPN] Inactivity timeout (--ping-restart), restarting
Oct 12 00:19:41 openvpn[2959]: Initialization Sequence Completed
…repeat…repeat…repeat.
Here's a bit from the server side:
Oct 12 00:28:37 openvpn[99226]: gate1.quietfountain.com/97.64.213.194:57726 send_push_reply(): safe_cap=940
Oct 12 00:28:34 openvpn[99226]: MULTI_sva: pool returned IPv4=192.168.55.6, IPv6=(Not enabled)
Oct 12 00:28:34 openvpn[99226]: 97.64.213.194:57726 [gate1.quietfountain.com] Peer Connection Initiated with [AF_INET]97.64.213.194:57726
Oct 12 00:27:37 openvpn[99226]: gate1.quietfountain.com/97.64.213.195:58424 send_push_reply(): safe_cap=940
Oct 12 00:27:34 openvpn[99226]: MULTI_sva: pool returned IPv4=192.168.55.6, IPv6=(Not enabled)
Oct 12 00:27:34 openvpn[99226]: 97.64.213.195:58424 [gate1.quietfountain.com] Peer Connection Initiated with [AF_INET]97.64.213.195:58424
Oct 12 00:26:41 openvpn[99226]: gate1.quietfountain.com/97.64.213.194:16016 send_push_reply(): safe_cap=940
Oct 12 00:26:39 openvpn[99226]: MULTI_sva: pool returned IPv4=192.168.55.6, IPv6=(Not enabled)
Oct 12 00:26:39 openvpn[99226]: 97.64.213.194:16016 [gate1.quietfountain.com] Peer Connection Initiated with [AF_INET]97.64.213.194:16016
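One thing that stands out in the server excerpt above: the same name (gate1.quietfountain.com) connects from two different source addresses, .194 and .195, and is handed the same tunnel address (192.168.55.6) each time. That would be consistent with the primary and backup client boxes both dialing in and bumping each other off, though that is only a reading of the log, not something confirmed. A rough Python sketch for spotting this pattern in a longer log follows. The regex assumes exactly the line format quoted above, and the function and variable names are made up for illustration.

```python
import re
from collections import defaultdict

# Assumption: lines look like the server excerpt quoted in this post, i.e.
# "... [common-name] Peer Connection Initiated with [AF_INET]a.b.c.d:port"
PEER_RE = re.compile(
    r"\[(?P<cn>[^\]]+)\] Peer Connection Initiated with "
    r"\[AF_INET\](?P<ip>\d+\.\d+\.\d+\.\d+):(?P<port>\d+)"
)

def sources_by_common_name(lines):
    """Map each peer name to the set of source IPs it connected from."""
    seen = defaultdict(set)
    for line in lines:
        m = PEER_RE.search(line)
        if m:
            seen[m.group("cn")].add(m.group("ip"))
    return dict(seen)

# The three "Peer Connection Initiated" lines from the excerpt above.
SAMPLE = [
    "Oct 12 00:28:34 openvpn[99226]: 97.64.213.194:57726 [gate1.quietfountain.com] "
    "Peer Connection Initiated with [AF_INET]97.64.213.194:57726",
    "Oct 12 00:27:34 openvpn[99226]: 97.64.213.195:58424 [gate1.quietfountain.com] "
    "Peer Connection Initiated with [AF_INET]97.64.213.195:58424",
    "Oct 12 00:26:39 openvpn[99226]: 97.64.213.194:16016 [gate1.quietfountain.com] "
    "Peer Connection Initiated with [AF_INET]97.64.213.194:16016",
]

result = sources_by_common_name(SAMPLE)
for cn, ips in result.items():
    # More than one source IP under one name hints at two boxes sharing a config.
    note = "  <-- more than one source" if len(ips) > 1 else ""
    print(cn, sorted(ips), note)
```

Any name that shows up with more than one source address is worth a look. It does not prove the two HA members are fighting, but it narrows down where to check.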
One change I made: when the upgrade to 2.1 came along, I checked the previously unchecked box that asks for OpenVPN to be synced between master and backup via pfsync. So either something about the upgrade broke site-to-site, or checking that box did.
As a test, I unchecked the box and disabled both the client and the server setup on the backup pfSense boxes, leaving them running only on the master box at each end of the site-to-site link. And… after a reboot… it works now.
Site-to-site OpenVPN does not like having the OpenVPN HA pfsync box checked. My suggestion: check it while configuring the master, then, before enabling the setup, save it disabled so the backup pfSense box gets the config. Then uncheck OpenVPN syncing, and enable the config on the master. The good news is that it will work. The bad news is that should the master go down, you'll have to get up at 3 am to enable the config on the backup by hand. I think it is a rule that pfSense boxes only fail at 2:45 am.