OpenVPN up but unable to contact daemon



  • Just want to document that this happens, and that it can happen even on an APU system with 2 GB of memory running 2.2. I was connected from the office to my test system at home over the OpenVPN site-to-site link, so the link was definitely up. But the dashboard said "unable to contact daemon".
    There was no PID file:

    [2.2-BETA][root@apu22.localdomain]/var/etc/openvpn: ls -l /var/run
    total 96
    -rw-r--r--  1 root  wheel    6 Oct 26 21:28 apinger.pid
    -rw-r--r--  1 root  wheel   72 Oct 29 09:21 apinger.status
    srw-rw-rw-  1 root  wheel    0 Oct 26 21:28 check_reload_status
    -rw-------  1 root  wheel    5 Oct 28 03:01 cron.pid
    -rw-------  1 root  wheel    3 Oct 26 21:28 devd.pid
    srw-rw-rw-  1 root  wheel    0 Oct 26 21:28 devd.pipe
    srw-rw-rw-  1 root  wheel    0 Oct 26 21:28 devd.seqpacket.pipe
    -rw-------  1 root  wheel    5 Oct 26 21:28 dhclient.re1.pid
    -rw-r--r--  1 root  wheel    6 Oct 26 21:29 dnsmasq.pid
    -rw-r--r--  1 root  wheel    6 Oct 26 21:29 expire_accounts.pid
    -rw-r--r--  1 root  wheel    4 Oct 29 09:21 filter_reload_status
    -rw-r--r--  1 root  wheel    6 Oct 26 21:28 filterlog.pid
    -rw-------  1 root  wheel    5 Oct 26 21:29 inetd.pid
    -r--r--r--  1 root  wheel  173 Oct 26 21:28 ld-elf.so.hints
    -r--r--r--  1 root  wheel  188 Oct 28 03:01 ld-elf.so.hints.sudo-amd64
    -r--r--r--  1 root  wheel  139 Oct 26 21:28 ld-elf32.so.hints
    -r--r--r--  1 root  wheel  174 Oct 28 03:01 ld-elf32.so.hints.sudo-amd64
    -rw-r--r--  1 root  wheel    6 Oct 26 21:29 lighty-webConfigurator.pid
    srw-rw-rw-  1 root  wheel    0 Oct 26 21:28 log
    srw-------  1 root  wheel    0 Oct 26 21:28 logpriv
    -rw-r--r--  1 root  wheel    5 Oct 28 03:02 ntpd.pid
    -rw-r--r--  1 root  wheel    3 Oct 26 21:28 php-fpm.pid
    srw-------  1 root  wheel    0 Oct 26 21:28 php-fpm.socket
    -rw-r--r--  1 root  wheel   58 Oct 26 21:28 php_modules_load_errors.txt
    -rw-r--r--  1 root  wheel    6 Oct 26 21:29 ping_hosts.pid
    -rw-r--r--  1 root  wheel    5 Oct 26 21:28 sshd.pid
    -rw-------  1 root  wheel    4 Oct 26 21:28 syslog.pid
    -rw-r--r--  1 root  wheel    6 Oct 26 21:29 update_alias_url_data.pid
    -rw-r--r--  1 root  wheel    5 Oct 26 21:29 updaterrd.sh.pid
    -rw-r--r--  1 root  wheel    0 Oct 26 21:28 utmp
    -rw-r--r--  1 root  wheel  394 Oct 29 09:11 utx.active
    

    Then I made one to match the running process:

    [2.2-BETA][root@apu22.localdomain]/var/etc/openvpn: ps aux | grep openvpn
    root   66268   0.0  0.3  21592  5452  -  Ss   Tue03AM    0:06.99 /usr/local/sbin/openvpn --config /var/etc/openvpn/client1.conf
    root   50638   0.0  0.1  18816  2304  0  S+    9:22AM    0:00.00 grep openvpn
    [2.2-BETA][root@apu22.localdomain]/var/etc/openvpn: echo 66268 > /var/run/openvpn_client1.pid
    
    

    and then the dashboard widget (and Status:Services…) could find the process and report it up.
    I know this has been reported in the past, somewhere in the forums.
    So there is some way that the PID file can go missing, probably somewhere around the WAN going down and coming back up. One day someone (me?) will work out the timing issue, or whatever else it is, that causes this.
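The manual workaround above (find the running process, then write its PID to the file) can be sketched as a small shell helper. This is only a sketch under assumptions: `find_openvpn_pid` is a hypothetical name, not something pfSense ships, and the paths `/var/etc/openvpn/NAME.conf` and `/var/run/openvpn_NAME.pid` are simply the ones seen in the transcript above.

```shell
#!/bin/sh
# find_openvpn_pid NAME (hypothetical helper, not part of pfSense)
# Reads "PID COMMAND ARGS..." lines on stdin and prints the PID of the
# first openvpn process whose arguments include /var/etc/openvpn/NAME.conf.
# Prints nothing if no such process is listed.
find_openvpn_pid() {
    awk -v conf="/var/etc/openvpn/$1.conf" '
        $2 ~ /openvpn$/ {
            for (i = 3; i <= NF; i++)
                if ($i == conf) { print $1; exit }
        }'
}

# Possible use on the firewall (assumed paths, taken from the post above):
#   pid=$(ps ax -o pid= -o command= | find_openvpn_pid client1)
#   [ -n "$pid" ] && echo "$pid" > /var/run/openvpn_client1.pid
```

Matching on the command name in the second field keeps the `grep openvpn` line from the transcript out of the result, which a plain `ps aux | grep openvpn` does not.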



  • Yep - I have seen this on a VM running under ESXi.
    Quite annoying.


  • Rebel Alliance Developer Netgate

    Is this client using a hostname to find the server, by chance?



  • DNS name.


  • Rebel Alliance Developer Netgate

    I don't see it right now, but I think there is an open Redmine ticket for that. Something happens to OpenVPN when it stops and restarts while using a hostname and infinite resolve. Somehow it ends up with a different process ID, disconnected from the management port.
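A mismatch between the recorded and the actual process ID can be checked from a shell. The following is a minimal sketch, assuming the PID file lives at `/var/run/openvpn_client1.pid` as in the first post; `pidfile_state` is a hypothetical helper name, and the sketch says nothing about how the GUI itself decides the daemon's status.

```shell
#!/bin/sh
# pidfile_state PIDFILE RUNNING_PID (hypothetical helper, not part of pfSense)
# Prints one of:
#   missing - the PID file does not exist or is empty
#   ok      - the PID file content equals RUNNING_PID
#   stale   - the PID file holds something other than RUNNING_PID
pidfile_state() {
    pidfile=$1
    running=$2
    if [ ! -s "$pidfile" ]; then
        echo missing
    elif [ "$(cat "$pidfile")" = "$running" ]; then
        echo ok
    else
        echo stale
    fi
}

# Possible use (assumed paths, taken from the first post):
#   pidfile_state /var/run/openvpn_client1.pid \
#       "$(pgrep -f 'openvpn --config /var/etc/openvpn/client1.conf')"
```

If `pgrep` returns more than one PID, for example because the client was started several times, the comparison fails and the result is `stale`.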



  • Yes - I get that on some installs but not others. 
    Seems it never really stops anything from working for me.



  • My clients all use a public DynDNS name that is kept up-to-date by the server-end pfSense (that has multi-WAN and offers the server end on the highest tier WAN of a gateway group…).
    I haven't tried hard to find a reproducible test case; it just happens from time to time. When I am at home I will try some combinations of failing the client-end link, switching the server end link and name... to see if I can induce it.
    From an end-user perspective it is not a show-stopper - user traffic is still passing through the tunnel.



  • That's probably this.
    https://redmine.pfsense.org/issues/3894

    I haven't had time to dig far enough to find out where and why it's getting started multiple times in that circumstance.



  • Do you believe that running a pfSense instance in ESXi might make this condition more likely to occur?
    It's the only time I ever see it myself, and the settings on physical machines are not really any different.



  • If there's something about your ESX environment or setup in general that makes those VMs' network connectivity take longer to come online, possibly. Generally speaking that wouldn't be the case though. I was testing and replicating it on a VK-T40E4 appliance where I blocked or degraded its network access upstream on its WAN network. Same end result with a VM.



  • Possibly.  Slower HDD access? 
    I think the one in question is spread pretty thin.  The drive is running on a NAS connected with a gigabit switch and maybe 10 VMs sharing access to the NAS on that one switch.
    I could see it running a bit slow at times.  That is the one that consistently has this issue, even after it is wiped and completely reinstalled.



  • I just saw this today, first time I updated snapshots after installing OpenVPN.  Bare hardware, not stretched at all (Atom D2550).  Client connection to Private Internet Access was shown as down, yet it was up & running.  Kill by hand, restart by GUI, and things are fine.

