OpenVPN dpinger behavior question

whorfin

[edited to add - currently on 2.4.0]
[…and same on 2.4.1]
Greetings

I've been trying for some time to narrow down some problems where OpenVPN seems to lose connection. If it exits, the watchdog I've got running [thanks, Packages!] works great.
If not, dpinger catches the problem but then nothing happens.

In the logs I see, for example:

Oct 22 16:04:33 rc.gateway_alarm 61843 >>> Gateway alarm: VPN_CLIENT_VPNV4 (Addr: redacted Alarm:1 RTT:29709ms RTTsd:11880ms Loss:22%)

which will be quickly followed by

Oct 22 16:04:33 pfSense check_reload_status: updating dyndns VPN_CLIENT_VPNV4
Oct 22 16:04:33 pfSense check_reload_status: Restarting ipsec tunnels
Oct 22 16:04:33 pfSense check_reload_status: Restarting OpenVPN tunnels/interfaces
Oct 22 16:04:33 pfSense check_reload_status: Reloading filter
Oct 22 16:04:35 pfSense php-fpm[71737]: /rc.dyndns.update: phpDynDNS (redacted.nope.com): No change in my IP address and/or 25 days has not passed. Not updating dynamic DNS entry.
Oct 22 16:04:35 pfSense php-fpm[62324]: /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use VPN_CLIENT_VPNV4.

That all looks great, but doesn't actually restart anything, as confirmed by the OpenVPN logs.

I see that the following is executed by the gateway alarm script:

pfSctl
-c "service reload dyndns VPN_CLIENT_VPNV4"
-c "service reload ipsecdns"
-c "service reload openvpn VPN_CLIENT_VPNV4"
-c "filter reload"
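
As a minimal sketch of how that invocation fits together (an illustration only, not the actual gateway alarm script; the helper name gateway_alarm_argv is hypothetical), the same argument list can be built in Python for any gateway name:

```python
from typing import List


def gateway_alarm_argv(gateway: str) -> List[str]:
    """Build the pfSctl argument list quoted above for one gateway name.

    Hypothetical helper for illustration; it only assembles the arguments
    and does not execute anything.
    """
    commands = [
        f"service reload dyndns {gateway}",
        "service reload ipsecdns",
        f"service reload openvpn {gateway}",
        "filter reload",
    ]
    argv = ["pfSctl"]
    for command in commands:
        # Each queued action is passed as its own -c argument.
        argv += ["-c", command]
    return argv


if __name__ == "__main__":
    print(gateway_alarm_argv("VPN_CLIENT_VPNV4"))
```

Note that only the dyndns and openvpn actions carry the gateway name; the other two are global.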

Digging in rc.openvpn shows that openvpn_resync_if_needed() is not called, as the conditions are not met [the client interface is WAN, but $interface is opt8, where VPN_CLIENT_VPNV4/ovpnc2 sits].
Even if openvpn_resync_if_needed() were called, it would never call openvpn_resync(), as it notices that the binding interface didn't change [there's no failover]. There's a comment that says

/* Compare the interface currently used by the VPN with the interface that should be used.
If the VPN should stay on the same interface do not resync */

Is that really all as intended?
Is it further intended that "service restart openvpn VPN_CLIENT_VPNV4" won't start openvpn
if not running? It'd sure be great if the alarm could trigger a restart, as then no watchdog
would be needed and dpinger would be doing the job.

I'm going to experiment with the "interval" argument along with a watchdog, but again, wanted to understand if observed behavior was as intended. It was surprising to me that "service reload openvpn" doesn't actually reload unless a bunch of failover-related conditions are met. No doubt I missed or forgot something along the way.

Cheers

whorfin

Don't all answer at once….

Well, adding the inactive directive and forcing a restart on ping failure did not improve the situation.

inactive 120
remap-usr1 SIGHUP

To recap: I run an OpenVPN client on pfSense which is meant to be always on.
At random intervals, depending on the ISP or whatever, the connection stops working.
OpenVPN itself rarely notices the problem, even with the settings above [in addition to the usual keepalive].
If I am de-authed and it exits, my watchdog catches it, but otherwise the OpenVPN logs don't show a problem.
Attempting to use the network shows packets not flowing.
dpinger always notices the problem, and fires the gateway alarm.
…which does nothing, hence my posting here.
Going back through lengthy logs shows dpinger always noticing, reporting, firing alarm. And then nothing until manual intervention.
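
For combing through lengthy logs like this, a small parser can pull out just the alarm events. This is a minimal sketch, assuming the "Gateway alarm" line format quoted in this thread; other pfSense versions may format it differently. The names GatewayAlarm and parse_alarm are hypothetical, and the RTT values are kept exactly as logged with no unit conversion.

```python
import re
from typing import NamedTuple, Optional

# Pattern modeled on the "Gateway alarm" lines quoted in this thread
# (assumption: the format is stable across the logs being searched).
ALARM_RE = re.compile(
    r"Gateway alarm: (?P<gateway>\S+) \("
    r"Addr:\s*(?P<addr>\S+) "
    r"Alarm:(?P<alarm>\d+) "
    r"RTT:(?P<rtt>\d+)ms "
    r"RTTsd:(?P<rttsd>\d+)ms "
    r"Loss:(?P<loss>\d+)%\)"
)


class GatewayAlarm(NamedTuple):
    gateway: str
    addr: str
    alarm: int      # 1 or 0, as logged
    rtt: int        # number as logged, unit not interpreted
    rttsd: int      # number as logged, unit not interpreted
    loss_pct: int


def parse_alarm(line: str) -> Optional[GatewayAlarm]:
    """Return the alarm fields from one log line, or None if it is not an alarm."""
    m = ALARM_RE.search(line)
    if m is None:
        return None
    return GatewayAlarm(
        gateway=m["gateway"],
        addr=m["addr"],
        alarm=int(m["alarm"]),
        rtt=int(m["rtt"]),
        rttsd=int(m["rttsd"]),
        loss_pct=int(m["loss"]),
    )


if __name__ == "__main__":
    sample = (
        "Oct 22 16:04:33 rc.gateway_alarm 61843 >>> Gateway alarm: "
        "VPN_CLIENT_VPNV4 (Addr: redacted Alarm:1 RTT:29709ms "
        "RTTsd:11880ms Loss:22%)"
    )
    print(parse_alarm(sample))
```

Filtering a system log through parse_alarm and keeping only entries with alarm == 1 gives a list of timestamps to compare against the OpenVPN log, which is where the missing restarts show up.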

So, I made this patch:


--- /etc/rc.openvpn     2017-11-03 19:46:16.254944000 -0700
+++ /tmp/rc.openvpn     2017-11-03 19:49:20.564839000 -0700
@@ -40,17 +40,18 @@
        if (isset($ovpn_settings['disable'])) {
                $resync_needed = false;
        } else {
-               if (!empty($interface)) {
-                       $mode_id = $mode . $ovpn_settings['vpnid'];
-                       $fpath = "{$g['varetc_path']}/openvpn/{$mode_id}.interface";
-                       if (file_exists($fpath)) {
-                               /* Compare the interface currently used by the VPN with the interface that should be used.
-                                  If the VPN should stay on the same interface, do not resync */
-                               if (trim(file_get_contents($fpath), " \t\n") == get_failover_interface($ovpn_settings['interface'])) {
-                                       $resync_needed = false;
-                               }
-                       }
-               }
+# /* We want to resync if called and the interface is not disabled */
+#              if (!empty($interface)) {
+#                      $mode_id = $mode . $ovpn_settings['vpnid'];
+#                      $fpath = "{$g['varetc_path']}/openvpn/{$mode_id}.interface";
+#                      if (file_exists($fpath)) {
+#                              /* Compare the interface currently used by the VPN with the interface that should be used.
+#                                 If the VPN should stay on the same interface, do not resync */
+#                              if (trim(file_get_contents($fpath), " \t\n") == get_failover_interface($ovpn_settings['interface'])) {
+#                                      $resync_needed = false;
+#                              }
+#                      }
+#              }
        }
        if ($resync_needed == true) {
                log_error("OpenVPN: Resync " . $mode_id . " " . $ovpn_settings['description']);
@@ -116,9 +117,10 @@

        if (is_array($config['openvpn']['openvpn-client'])) {
                foreach ($config['openvpn']['openvpn-client'] as &$client) {
-                       if ($client['interface'] == $interface || empty($interface) || (!empty($gwgroups) && in_array($client['interface'], $gwgroups))) {
+#                      /* We want to resync if called; in our case, $client['interface'] != $interface but we still want to resync */
+#                      if ($client['interface'] == $interface || empty($interface) || (!empty($gwgroups) && in_array($client['interface'], $gwgroups))) {
                                openvpn_resync_if_needed('client', $client, $interface);
-                       }
+#                      }
                }
        }
 }

Apply with path strip of 0 and default root dir. And of course do so at your own risk.
I have no idea what the logic in here is for; it tries twice to prevent the client from actually resyncing.
This patch disables the "only resync if VPN interface isn't what is set" logic in openvpn_resync_if_needed() and removes the various interface/gwgroups tests just for the "client" case.
Again, I don't know what they're there for, but for my use-case, this change now means that for the first time, my OpenVPN client has become reliable.
When dpinger sees a problem, it restarts the client.

This will likely cause a momentary connection loss, but that is preferable to the connection being down until manual intervention.

Again, I'd appreciate commentary from those more knowledgeable on why those tests are in there. I think I'm requesting a supported way to disable them…
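
To make the two tests easier to reason about, here is a rough Python model of them. It is a sketch based only on the rc.openvpn lines quoted in the patch above, not the real code: the function names are hypothetical, it assumes $resync_needed starts out true (the quoted hunk does not show its initial value), and the interface names "em0" and "em1" are made up for the example.

```python
from typing import Optional, Sequence


def caller_gate(client_interface: str, interface: str,
                gwgroups: Sequence[str] = ()) -> bool:
    """Model of the test wrapped around openvpn_resync_if_needed() for clients.

    Python truthiness is used in place of PHP's empty(), which treats a few
    more values (such as the string "0") as empty.
    """
    return (
        client_interface == interface
        or not interface
        or (bool(gwgroups) and client_interface in gwgroups)
    )


def resync_needed(disabled: bool, interface: str,
                  recorded_interface: Optional[str],
                  failover_interface: str) -> bool:
    """Model of the checks inside openvpn_resync_if_needed().

    recorded_interface stands in for the contents of the {mode_id}.interface
    file (None if the file does not exist); failover_interface stands in for
    get_failover_interface($ovpn_settings['interface']).
    """
    if disabled:
        return False
    if interface and recorded_interface is not None:
        # Same interface as before: the original code skips the resync.
        if recorded_interface.strip(" \t\n") == failover_interface:
            return False
    return True  # assumption: $resync_needed is initialised to true


if __name__ == "__main__":
    # The case from this thread: client bound to WAN, alarm raised for opt8.
    print(caller_gate("wan", "opt8"))                   # first gate
    print(resync_needed(False, "opt8", "em0\n", "em0"))  # second gate
```

With no failover configured, the recorded interface always equals the failover interface, so the second function can only return true when the interface file is missing or $interface is empty. That matches the observation that the alarm never restarts the client.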

Cheers

whorfin

No change in 2.4.2.

A Former User

I see the same on my pfSense 2.4.2_1:

Something is triggered when check_reload_status runs that causes dpinger issues.

Not a biggie, but I too have noticed this.


Feb 6 20:48:20	rc.gateway_alarm	72290	>>> Gateway alarm: WAN_PPPOE (Addr:X.X.X.X Alarm:0 RTT:3906ms RTTsd:1128ms Loss:0%)
Feb 6 20:48:20	check_reload_status		updating dyndns WAN_PPPOE
Feb 6 20:48:20	check_reload_status		Restarting ipsec tunnels
Feb 6 20:48:20	check_reload_status		Restarting OpenVPN tunnels/interfaces
Feb 6 20:48:20	check_reload_status		Reloading filter
Feb 6 20:48:21	php-fpm		/rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WAN_PPPOE.
Feb 6 21:46:58	php-fpm		/index.php: Successful login for user 'admin' from: 192.168.0.106
Feb 7 01:46:58	rc.gateway_alarm	95168	>>> Gateway alarm: WAN_PPPOE (Addr:X.X.X.X Alarm:1 RTT:134683ms RTTsd:486359ms Loss:0%)
Feb 7 01:46:58	check_reload_status		updating dyndns WAN_PPPOE
Feb 7 01:46:58	check_reload_status		Restarting ipsec tunnels
Feb 7 01:46:58	check_reload_status		Restarting OpenVPN tunnels/interfaces
Feb 7 01:46:58	check_reload_status		Reloading filter
Feb 7 01:46:59	php-fpm		/rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WAN_PPPOE.
Feb 7 01:47:12	rc.gateway_alarm	97132	>>> Gateway alarm: WAN_PPPOE (Addr:X.X.X.X Alarm:1 RTT:219457ms RTTsd:556905ms Loss:11%)
Feb 7 01:47:12	check_reload_status		updating dyndns WAN_PPPOE
Feb 7 01:47:12	check_reload_status		Restarting ipsec tunnels
Feb 7 01:47:12	check_reload_status		Restarting OpenVPN tunnels/interfaces
Feb 7 01:47:12	check_reload_status		Reloading filter

gius3ppe

I know saying "me too!" isn't the biggest help ever. However, I have also run into this issue.

I have my WAN gateway and run OpenVPN for my other gateway. At random, my internet kill switch kicks in because OpenVPN is restarting.

May 15 19:25:09	rc.gateway_alarm	98632	>>> Gateway alarm: VPN_WAN_VPNV4 (Addr:REDACTED Alarm:1 RTT:31347ms RTTsd:5964ms Loss:21%)
May 15 19:25:09	check_reload_status		updating dyndns VPN_WAN_VPNV4
May 15 19:25:09	check_reload_status		Restarting ipsec tunnels
May 15 19:25:09	check_reload_status		Restarting OpenVPN tunnels/interfaces
May 15 19:25:09	check_reload_status		Reloading filter
May 15 19:25:10	php-fpm	243	/rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use VPN_WAN_VPNV4.

I thought it was a memory issue. I had turned up a few things to push my rig and see what I could get away with, but even with things turned down (log retention, number of entries for pfBlocker, things like that) it still keeps cycling itself.

I will say I only started having this issue in the past few weeks. I am only running the pfBlocker package, but I do have some large lists. I initially thought a cron job was causing a memory issue that made this go down; however, I switched the cron jobs to once a day at 2am and still experienced the issue.

Do either of you have similar setups, so we can correlate potential causes and start to work towards a solution? There are definitely smarter minds than my own in this community, so I would love to hear back from someone who can tell me where my dumb mistake lies. I will gladly wear egg on my face if it means my network gains some stability.

Thanks!