OpenVPN Client fatal exit when WAN goes away



  • 2.1-BETA1 (i386) built on Mon Mar 25 15:43:22 EDT 2013 on Alix nanoBSD (but I expect the problem is with OpenVPN and not specific to 2.1 or nanoBSD)
    The site has a TP-Link ADSL router connected to WAN by ordinary ethernet cable. The ADSL router provides the internet connection. We have "load-shedding" - so power is unavailable to the ADSL router for many hours, but we do have the Alix wired directly to a 12V battery, so it stays up. When the ADSL router goes away, the WAN port of the Alix has no electrical connection any more. WAN port has static IP 10.49.105.1
    There are 2 OpenVPN clients, each connecting to one of our main offices. They exit with "fatal error" as per log below:

    Mar 27 20:00:15 	openvpn[37004]: [UNDEF] Inactivity timeout (--ping-restart), restarting
    Mar 27 20:00:15 	openvpn[37004]: SIGUSR1[soft,ping-restart] received, process restarting
    Mar 27 20:00:17 	openvpn[37004]: WARNING: No server certificate verification method has been enabled. See http://openvpn.net/howto.html#mitm for more info.
    Mar 27 20:00:17 	openvpn[37004]: NOTE: the current --script-security setting may allow this configuration to call user-defined scripts
    Mar 27 20:00:18 	openvpn[37004]: UDPv4 link local (bound): [AF_INET]10.49.105.1
    Mar 27 20:00:18 	openvpn[37004]: UDPv4 link remote: [AF_INET]49.244.nnn.nn:nnnn
    Mar 27 20:01:18 	openvpn[37004]: [UNDEF] Inactivity timeout (--ping-restart), restarting
    Mar 27 20:01:18 	openvpn[37004]: SIGUSR1[soft,ping-restart] received, process restarting
    Mar 27 20:01:20 	openvpn[37004]: WARNING: No server certificate verification method has been enabled. See http://openvpn.net/howto.html#mitm for more info.
    Mar 27 20:01:20 	openvpn[37004]: NOTE: the current --script-security setting may allow this configuration to call user-defined scripts
    Mar 27 20:01:20 	openvpn[37004]: TCP/UDP: Socket bind failed on local address [AF_INET]10.49.105.1: Can't assign requested address
    Mar 27 20:01:20 	openvpn[37004]: Exiting due to fatal error
    Mar 27 20:01:20 	openvpn[37004]: /usr/local/sbin/ovpn-linkdown ovpnc2 1500 1557 10.49.252.26 10.49.252.25 init
    Mar 27 20:01:31 	openvpn[19567]: write UDPv4: No route to host (code=65)
    Mar 27 20:01:41 	openvpn[19567]: write UDPv4: No route to host (code=65)
    Mar 27 20:01:42 	openvpn[19567]: [INFN-ICO-S2S-SERVER] Inactivity timeout (--ping-restart), restarting
    Mar 27 20:01:42 	openvpn[19567]: SIGUSR1[soft,ping-restart] received, process restarting
    Mar 27 20:01:44 	openvpn[19567]: WARNING: No server certificate verification method has been enabled. See http://openvpn.net/howto.html#mitm for more info.
    Mar 27 20:01:44 	openvpn[19567]: NOTE: the current --script-security setting may allow this configuration to call user-defined scripts
    Mar 27 20:01:44 	openvpn[19567]: TCP/UDP: Socket bind failed on local address [AF_INET]10.49.105.1: Can't assign requested address
    Mar 27 20:01:44 	openvpn[19567]: Exiting due to fatal error
    Mar 27 20:01:44 	openvpn[19567]: /usr/local/sbin/ovpn-linkdown ovpnc1 1500 1557 10.49.251.26 10.49.251.25 init
    

    When power comes back to the ADSL router and WAN comes alive again the next morning, the internet comes up fine but the OpenVPN client processes do not restart.
    Any ideas about:
    a) How to prevent the clients from crashing; or
    b) How to make them auto-restart when WAN comes up
    are welcome.
    Ideally they should try again every minute for hours and hours without giving up…



  • When you lose your WAN link, you lose your WAN IP, and then OpenVPN dies since it can't bind. That much is unavoidable. But the WAN up process should be restarting it when the WAN comes back. That generally works. What's the system log show when the WAN comes back up?



  • The power seems to have been off from 20:00 to 22:00 - so definitely one of the timed load-shedding events in Nepal. Note that both WAN and LAN go offline and come back together, because the ADSL router on WAN has no power, and the network switch on LAN has no power. Log entries:

    Mar 27 16:16:32 isp-rt-01 check_reload_status: Linkup starting vr0
    Mar 27 22:01:32 isp-rt-01 kernel: vr0: link state changed to UP
    Mar 27 22:01:37 isp-rt-01 php: : DEVD Ethernet attached event for lan
    Mar 27 22:01:37 isp-rt-01 php: : HOTPLUG: Configuring interface lan
    Mar 27 16:16:47 isp-rt-01 check_reload_status: Linkup starting vr1
    Mar 27 22:01:47 isp-rt-01 kernel: vr1: link state changed to UP
    Mar 27 22:01:52 isp-rt-01 php: : DEVD Ethernet attached event for wan
    Mar 27 22:01:52 isp-rt-01 php: : HOTPLUG: Configuring interface wan
    Mar 27 22:02:31 isp-rt-01 filterdns: failed to resolve host ntc-idp-inf.dyndns-ip.com will retry later again.
    Mar 27 22:02:31 isp-rt-01 filterdns: failed to resolve host wlink-ico-inf.dyndns-ip.com will retry later again.
    Mar 27 22:02:31 isp-rt-01 filterdns: failed to resolve host ntc-ijp-inf.dyndns-ip.com will retry later again.
    Mar 27 22:02:31 isp-rt-01 filterdns: failed to resolve host wlink-ibp-inf.dyndns-ip.com will retry later again.
    Mar 27 22:02:31 isp-rt-01 filterdns: failed to resolve host www.fb.com will retry later again.
    Mar 27 22:02:31 isp-rt-01 filterdns: failed to resolve host ntc-isp-inf.dyndns-ip.com will retry later again.
    Mar 27 22:02:31 isp-rt-01 filterdns: failed to resolve host facebook.com will retry later again.
    Mar 27 22:02:31 isp-rt-01 filterdns: failed to resolve host www.facebook.com will retry later again.
    Mar 27 22:02:31 isp-rt-01 filterdns: failed to resolve host wlink-isp-inf.dyndns-ip.com will retry later again.
    Mar 27 22:02:31 isp-rt-01 filterdns: failed to resolve host wlink-ikp-inf.dyndns-ip.com will retry later again.
    Mar 27 22:02:31 isp-rt-01 filterdns: failed to resolve host ntc-kchd-inf.dyndns-ip.com will retry later again.
    Mar 27 22:02:31 isp-rt-01 filterdns: failed to resolve host fb.com will retry later again.
    Mar 27 22:02:31 isp-rt-01 filterdns: failed to resolve host ntc-imp-inf.dyndns-ip.com will retry later again.
    Mar 27 22:02:31 isp-rt-01 filterdns: failed to resolve host ntc-ibp-inf.dyndns-ip.com will retry later again.
    Mar 27 22:02:31 isp-rt-01 filterdns: failed to resolve host wlink-ktm-inf.dyndns-ip.com will retry later again.
    Mar 27 22:07:31 isp-rt-01 filterdns: failed to resolve host wlink-ico-inf.dyndns-ip.com will retry later again.
    Mar 27 22:07:31 isp-rt-01 filterdns: failed to resolve host www.facebook.com will retry later again.
    Mar 27 22:07:31 isp-rt-01 filterdns: failed to resolve host wlink-isp-inf.dyndns-ip.com will retry later again.
    Mar 27 22:07:31 isp-rt-01 filterdns: failed to resolve host ntc-ibp-inf.dyndns-ip.com will retry later again.
    Mar 27 22:07:31 isp-rt-01 filterdns: failed to resolve host wlink-ktm-inf.dyndns-ip.com will retry later again.
    Mar 27 22:07:31 isp-rt-01 filterdns: failed to resolve host ntc-idp-inf.dyndns-ip.com will retry later again.
    Mar 27 22:07:31 isp-rt-01 filterdns: failed to resolve host ntc-ijp-inf.dyndns-ip.com will retry later again.
    Mar 27 22:07:31 isp-rt-01 filterdns: failed to resolve host wlink-ibp-inf.dyndns-ip.com will retry later again.
    Mar 27 22:07:31 isp-rt-01 filterdns: failed to resolve host www.fb.com will retry later again.
    Mar 27 22:07:31 isp-rt-01 filterdns: failed to resolve host ntc-isp-inf.dyndns-ip.com will retry later again.
    Mar 27 22:07:31 isp-rt-01 filterdns: failed to resolve host facebook.com will retry later again.
    Mar 27 22:07:31 isp-rt-01 filterdns: failed to resolve host wlink-ikp-inf.dyndns-ip.com will retry later again.
    Mar 27 22:07:31 isp-rt-01 filterdns: failed to resolve host ntc-kchd-inf.dyndns-ip.com will retry later again.
    Mar 27 22:07:31 isp-rt-01 filterdns: failed to resolve host fb.com will retry later again.
    Mar 27 22:07:31 isp-rt-01 filterdns: failed to resolve host ntc-imp-inf.dyndns-ip.com will retry later again.
    Mar 27 16:25:12 isp-rt-01 check_reload_status: Updating all dyndns
    Mar 27 16:25:12 isp-rt-01 check_reload_status: Restarting ipsec tunnels
    Mar 27 16:25:12 isp-rt-01 check_reload_status: Restarting OpenVPN tunnels/interfaces
    Mar 27 16:25:12 isp-rt-01 check_reload_status: Reloading filter
    Mar 27 22:10:15 isp-rt-01 php: : phpDynDNS: updating cache file /conf/dyndns_wandyndns'ntc-isp-inf.dyndns-ip.com'0.cache: 49.244.238.147
    Mar 27 22:10:15 isp-rt-01 php: : phpDynDNS (ntc-isp-inf.dyndns-ip.com): (Success) IP Address Changed Successfully! (49.244.238.147)
    Mar 27 22:10:22 isp-rt-01 php: : phpDynDNS (ntc-isp-inf.dyndns-ip.com): No change in my IP address and/or 25 days has not passed. Not updating dynamic DNS entry.
    Mar 27 22:10:22 isp-rt-01 kernel: Bump sched buckets to 256 (was 0)
    Mar 27 22:10:22 isp-rt-01 kernel: Bump sched buckets to 256 (was 0)
    Mar 27 22:15:02 isp-rt-01 kernel: Bump sched buckets to 256 (was 0)
    Mar 27 22:15:02 isp-rt-01 kernel: Bump sched buckets to 256 (was 0)
    Mar 27 22:25:14 isp-rt-01 php: : phpDynDNS (ntc-isp-inf.dyndns-ip.com): No change in my IP address and/or 25 days has not passed. Not updating dynamic DNS entry.
    Mar 27 22:30:03 isp-rt-01 kernel: Bump sched buckets to 256 (was 0)
    Mar 27 22:30:03 isp-rt-01 kernel: Bump sched buckets to 256 (was 0)
    Mar 27 22:40:03 isp-rt-01 php: : phpDynDNS (ntc-isp-inf.dyndns-ip.com): No change in my IP address and/or 25 days has not passed. Not updating dynamic DNS entry.
    then repeating DynDNS check every 15 minutes, which is normal as scheduled in a cron job...
    

    The OpenVPN log has nothing between the death at 20:00 last night, and when I manually restarted the OpenVPN clients at 10:31.

    Mar 27 20:01:44 	openvpn[19567]: NOTE: the current --script-security setting may allow this configuration to call user-defined scripts
    Mar 27 20:01:44 	openvpn[19567]: TCP/UDP: Socket bind failed on local address [AF_INET]10.49.105.1: Can't assign requested address
    Mar 27 20:01:44 	openvpn[19567]: Exiting due to fatal error
    Mar 27 20:01:44 	openvpn[19567]: /usr/local/sbin/ovpn-linkdown ovpnc1 1500 1557 10.49.251.26 10.49.251.25 init
    Mar 28 10:31:37 	openvpn[90929]: OpenVPN 2.3.0 i386-portbld-freebsd8.3 [SSL (OpenSSL)] [LZO] [eurephia] [MH] [IPv6] built on Jan 27 2013
    Mar 28 10:31:37 	openvpn[90929]: WARNING: No server certificate verification method has been enabled. See http://openvpn.net/howto.html#mitm for more info.
    Mar 28 10:31:37 	openvpn[90929]: NOTE: the current --script-security setting may allow this configuration to call user-defined scripts
    Mar 28 10:31:37 	openvpn[90929]: Control Channel Authentication: using '/var/etc/openvpn/client1.tls-auth' as a OpenVPN static key file
    Mar 28 10:31:37 	openvpn[91055]: UDPv4 link local (bound): [AF_INET]10.49.105.1
    ...
    
    

    The message "Restarting OpenVPN tunnels/interfaces" looks like a great thing, but actually I can't see any evidence of it really happening.
    I have an Alix at home with the same configuration behind an ADSL router, with 2 OpenVPN clients going out to the same places, so I will try to reproduce the issue tonight.

    Note: This issue is probably not new, it is just that the site now has a reliable ISP connection and so we can finally fault-find other things, rather than constantly fighting to get any sort of internet access :)



  • That "restarting OpenVPN" log from check_reload_status is good, when it logs that it runs /etc/rc.openvpn

    Could you add some debug logging into rc.openvpn, replicate, and see what happens? Maybe a log_error right at the top to see if it is actually running it, and a log_error under the openvpn_resync to see what it's calling resync with.



  • The problem is that rc.openvpn looks for OpenVPN servers and clients that have a gateway group as their interface. For "ordinary" clients and servers that are listening on a real interface (e.g. WAN) it doesn't call openvpn_resync:

    	if(is_array($config['openvpn']['openvpn-client'])) {
    		foreach($config['openvpn']['openvpn-client'] as &$client) {
    			if(is_array($gwgroups[$client['interface']])) {
    				openvpn_resync('client', $client);
    			}
    		}
    	}
    

    I can't see why the gateway groups check is done. Currently this code has no idea which interface may have changed its IP, so it would really have to restart every OpenVPN endpoint, just in case. Ideally it could be told which interface has changed/come up and then just resync any endpoints that use that interface, or a gateway group that contains that interface.
    The workaround is to make a gateway group that just contains WAN, then set the client interface to the gateway group.



  • I added a TestGroup gateway group, with just WAN at Tier 1 in the group. Then made one of the OpenVPN client interfaces "TestGroup". That client survives a WAN cable unplug/replug event. But the client with interface just WAN does not.
    Also, after physically unplugging the WAN cable, waiting a couple of minutes, then plugging it in again, I lost the default route in the routing table - no internet. To get it back in went to Interface->WAN and pressed Save, then the default route came back. I guess that is a separate issue, I'll post about that later in the 2.1 forum.



  • Opened a bug against 2.1: http://redmine.pfsense.org/issues/2915 - this thread could be moved to the 2.1 forum, as I believe it is only a result of some new features in 2.1.


Locked