OpenVPN up but unable to contact daemon

phil.davis

I just want to document that this happens, and that it can happen even on an APU system with 2GB of memory running 2.2. I was connected from the office to my test system at home over the OpenVPN site-to-site link, so the link was definitely up, but the dashboard said "unable to contact daemon".
There was no PID file:

[2.2-BETA][root@apu22.localdomain]/var/etc/openvpn: ls -l /var/run
total 96
-rw-r--r--  1 root  wheel    6 Oct 26 21:28 apinger.pid
-rw-r--r--  1 root  wheel   72 Oct 29 09:21 apinger.status
srw-rw-rw-  1 root  wheel    0 Oct 26 21:28 check_reload_status
-rw-------  1 root  wheel    5 Oct 28 03:01 cron.pid
-rw-------  1 root  wheel    3 Oct 26 21:28 devd.pid
srw-rw-rw-  1 root  wheel    0 Oct 26 21:28 devd.pipe
srw-rw-rw-  1 root  wheel    0 Oct 26 21:28 devd.seqpacket.pipe
-rw-------  1 root  wheel    5 Oct 26 21:28 dhclient.re1.pid
-rw-r--r--  1 root  wheel    6 Oct 26 21:29 dnsmasq.pid
-rw-r--r--  1 root  wheel    6 Oct 26 21:29 expire_accounts.pid
-rw-r--r--  1 root  wheel    4 Oct 29 09:21 filter_reload_status
-rw-r--r--  1 root  wheel    6 Oct 26 21:28 filterlog.pid
-rw-------  1 root  wheel    5 Oct 26 21:29 inetd.pid
-r--r--r--  1 root  wheel  173 Oct 26 21:28 ld-elf.so.hints
-r--r--r--  1 root  wheel  188 Oct 28 03:01 ld-elf.so.hints.sudo-amd64
-r--r--r--  1 root  wheel  139 Oct 26 21:28 ld-elf32.so.hints
-r--r--r--  1 root  wheel  174 Oct 28 03:01 ld-elf32.so.hints.sudo-amd64
-rw-r--r--  1 root  wheel    6 Oct 26 21:29 lighty-webConfigurator.pid
srw-rw-rw-  1 root  wheel    0 Oct 26 21:28 log
srw-------  1 root  wheel    0 Oct 26 21:28 logpriv
-rw-r--r--  1 root  wheel    5 Oct 28 03:02 ntpd.pid
-rw-r--r--  1 root  wheel    3 Oct 26 21:28 php-fpm.pid
srw-------  1 root  wheel    0 Oct 26 21:28 php-fpm.socket
-rw-r--r--  1 root  wheel   58 Oct 26 21:28 php_modules_load_errors.txt
-rw-r--r--  1 root  wheel    6 Oct 26 21:29 ping_hosts.pid
-rw-r--r--  1 root  wheel    5 Oct 26 21:28 sshd.pid
-rw-------  1 root  wheel    4 Oct 26 21:28 syslog.pid
-rw-r--r--  1 root  wheel    6 Oct 26 21:29 update_alias_url_data.pid
-rw-r--r--  1 root  wheel    5 Oct 26 21:29 updaterrd.sh.pid
-rw-r--r--  1 root  wheel    0 Oct 26 21:28 utmp
-rw-r--r--  1 root  wheel  394 Oct 29 09:11 utx.active

Then I made one to match the running process:

[2.2-BETA][root@apu22.localdomain]/var/etc/openvpn: ps aux | grep openvpn
root   66268   0.0  0.3  21592  5452  -  Ss   Tue03AM    0:06.99 /usr/local/sbin/openvpn --config /var/etc/openvpn/client1.conf
root   50638   0.0  0.1  18816  2304  0  S+    9:22AM    0:00.00 grep openvpn
[2.2-BETA][root@apu22.localdomain]/var/etc/openvpn: echo 66268 > /var/run/openvpn_client1.pid

and then the dashboard widget (and Status:Services…) could find the process and report it up.
I know this has been reported in the past, somewhere in the forums.
So there is some way that the PID file can go missing, probably somewhere in the middle of WAN going down and up. One day someone (me?) will work out the timing issue, or whatever else it is, that causes this.
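The manual workaround above (find the running process, then write its PID to the file the GUI expects) could be scripted roughly as follows. This is only a sketch, not something pfSense ships: the function names `pid_for_config` and `restore_pidfile` are made up for illustration, and the config path and PID file name are simply the ones shown in this post.

```shell
#!/bin/sh
# Hypothetical helpers (not part of pfSense) that mirror the manual fix.

# Print the PID of the first openvpn process started with "--config <conf>".
# Expects "PID COMMAND ARGS..." lines on stdin, e.g. from:
#   ps -axww -o pid= -o command=
pid_for_config() {
    awk -v conf="$1" '
        $2 ~ /\/openvpn$/ {
            for (i = 3; i < NF; i++)
                if ($i == "--config" && $(i + 1) == conf) { print $1; exit }
        }'
}

# Write the PID to the PID file, but only if the file is missing
# and the process is still alive. An existing file is left untouched.
restore_pidfile() {
    rp_pid="$1"
    rp_file="$2"
    [ -n "$rp_pid" ] || return 1
    [ -e "$rp_file" ] && return 0
    kill -0 "$rp_pid" 2>/dev/null || return 1
    echo "$rp_pid" > "$rp_file"
}

# Example for the box in this post (left commented out on purpose):
#   pid=$(ps -axww -o pid= -o command= | pid_for_config /var/etc/openvpn/client1.conf)
#   restore_pidfile "$pid" /var/run/openvpn_client1.pid
```

Matching on the exact `--config` argument rather than grepping for "openvpn" avoids picking up the `grep` process itself, as seen in the `ps aux | grep openvpn` output above.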


kejianshi

Yep - I have seen this on a VM running under ESXi.
Quite annoying.

jimp

Is this client using a hostname to find the server, by chance?

kejianshi

DNS name.

jimp

I don't see it right now, but I think there is an open Redmine ticket for that. Something happens to OpenVPN when it stops and restarts while using a hostname and infinite resolve. Somehow it ends up with a different process ID, disconnected from the management port.

kejianshi

Yes - I get that on some installs but not others.
Seems it never really stops anything from working for me.

phil.davis

My clients all use a public DynDNS name that is kept up-to-date by the server-end pfSense (that has multi-WAN and offers the server end on the highest tier WAN of a gateway group…).
I haven't tried hard to find a reproducible test case; it just happens from time to time. When I am at home I will try some combinations of failing the client-end link, switching the server end link and name... to see if I can induce it.
From an end-user perspective it is not a show-stopper - user traffic is still passing through the tunnel.

cmb

That's probably this.
https://redmine.pfsense.org/issues/3894

I haven't had time to dig far enough to find out where and why it's getting started multiple times in that circumstance.
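To tell whether a box is in the state described here (more than one openvpn process for the same config, or a PID file that is missing or points at the wrong process), a check along these lines might help. It is a rough sketch under assumptions: `pids_for_config` and `pidfile_state` are hypothetical names, and the paths in the example are the ones from the first post.

```shell
#!/bin/sh
# Hypothetical diagnostic helpers (not part of pfSense).

# Print every PID whose command line is "openvpn ... --config <conf>".
# Expects "PID COMMAND ARGS..." lines on stdin, e.g. from:
#   ps -axww -o pid= -o command=
pids_for_config() {
    awk -v conf="$1" '
        $2 ~ /\/openvpn$/ {
            for (i = 3; i < NF; i++)
                if ($i == "--config" && $(i + 1) == conf) { print $1; break }
        }'
}

# Classify the PID file against the list of running PIDs.
# Usage: pidfile_state <pidfile> [pid ...]
# Prints one of: duplicate, missing, ok, stale
pidfile_state() {
    ps_file="$1"
    shift
    # More than one running process for the same config.
    [ "$#" -gt 1 ] && { echo duplicate; return; }
    # No PID file at all.
    [ -f "$ps_file" ] || { echo missing; return; }
    ps_recorded=$(cat "$ps_file")
    # Exactly one process and the file records its PID.
    [ "$#" -eq 1 ] && [ "$ps_recorded" = "$1" ] && { echo ok; return; }
    # File exists but does not match the running process (or none is running).
    echo stale
}

# Example (left commented out on purpose):
#   pids=$(ps -axww -o pid= -o command= | pids_for_config /var/etc/openvpn/client1.conf)
#   pidfile_state /var/run/openvpn_client1.pid $pids
```

A result of `duplicate` would be consistent with the process being started more than once; `missing` matches the symptom from the first post.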

kejianshi

Do you believe that running a pfSense instance in ESXi might make this condition more likely to occur?
It's the only time I ever see it myself, and the settings on physical machines are not really any different.

cmb

If there's something about your ESX environment or setup in general that makes those VMs' network connectivity take longer to come online, possibly. Generally speaking that wouldn't be the case though. I was testing and replicating it on a VK-T40E4 appliance where I blocked or degraded its network access upstream on its WAN network. Same end result with a VM.

kejianshi

Possibly. Slower HDD access?
I think the one in question is spread pretty thin. The drive is running on a NAS connected with a gigabit switch and maybe 10 VMs sharing access to the NAS on that one switch.
I could see it running a bit slow at times. That is the one that consistently has this issue, even when it is wiped and completely reinstalled.

charliem

I just saw this today, the first time I updated snapshots after installing OpenVPN. Bare hardware, not stretched at all (Atom D2550). Client connection to Private Internet Access was shown as down, yet it was up & running. Kill by hand, restart by GUI, and things are fine.