OpenVPN up but unable to contact daemon
-
Just want to document that this happens, and that it can happen even on an APU system with 2 GB of memory running 2.2. I was connected from the office to my test system at home over the OpenVPN site-to-site link, so the link was definitely up. But the dashboard said "unable to contact daemon".
There was no PID file:

```
[2.2-BETA][root@apu22.localdomain]/var/etc/openvpn: ls -l /var/run
total 96
-rw-r--r-- 1 root wheel 6 Oct 26 21:28 apinger.pid
-rw-r--r-- 1 root wheel 72 Oct 29 09:21 apinger.status
srw-rw-rw- 1 root wheel 0 Oct 26 21:28 check_reload_status
-rw------- 1 root wheel 5 Oct 28 03:01 cron.pid
-rw------- 1 root wheel 3 Oct 26 21:28 devd.pid
srw-rw-rw- 1 root wheel 0 Oct 26 21:28 devd.pipe
srw-rw-rw- 1 root wheel 0 Oct 26 21:28 devd.seqpacket.pipe
-rw------- 1 root wheel 5 Oct 26 21:28 dhclient.re1.pid
-rw-r--r-- 1 root wheel 6 Oct 26 21:29 dnsmasq.pid
-rw-r--r-- 1 root wheel 6 Oct 26 21:29 expire_accounts.pid
-rw-r--r-- 1 root wheel 4 Oct 29 09:21 filter_reload_status
-rw-r--r-- 1 root wheel 6 Oct 26 21:28 filterlog.pid
-rw------- 1 root wheel 5 Oct 26 21:29 inetd.pid
-r--r--r-- 1 root wheel 173 Oct 26 21:28 ld-elf.so.hints
-r--r--r-- 1 root wheel 188 Oct 28 03:01 ld-elf.so.hints.sudo-amd64
-r--r--r-- 1 root wheel 139 Oct 26 21:28 ld-elf32.so.hints
-r--r--r-- 1 root wheel 174 Oct 28 03:01 ld-elf32.so.hints.sudo-amd64
-rw-r--r-- 1 root wheel 6 Oct 26 21:29 lighty-webConfigurator.pid
srw-rw-rw- 1 root wheel 0 Oct 26 21:28 log
srw------- 1 root wheel 0 Oct 26 21:28 logpriv
-rw-r--r-- 1 root wheel 5 Oct 28 03:02 ntpd.pid
-rw-r--r-- 1 root wheel 3 Oct 26 21:28 php-fpm.pid
srw------- 1 root wheel 0 Oct 26 21:28 php-fpm.socket
-rw-r--r-- 1 root wheel 58 Oct 26 21:28 php_modules_load_errors.txt
-rw-r--r-- 1 root wheel 6 Oct 26 21:29 ping_hosts.pid
-rw-r--r-- 1 root wheel 5 Oct 26 21:28 sshd.pid
-rw------- 1 root wheel 4 Oct 26 21:28 syslog.pid
-rw-r--r-- 1 root wheel 6 Oct 26 21:29 update_alias_url_data.pid
-rw-r--r-- 1 root wheel 5 Oct 26 21:29 updaterrd.sh.pid
-rw-r--r-- 1 root wheel 0 Oct 26 21:28 utmp
-rw-r--r-- 1 root wheel 394 Oct 29 09:11 utx.active
```
Then I made one to match the running process:
```
[2.2-BETA][root@apu22.localdomain]/var/etc/openvpn: ps aux | grep openvpn
root 66268 0.0 0.3 21592 5452 - Ss Tue03AM 0:06.99 /usr/local/sbin/openvpn --config /var/etc/openvpn/client1.conf
root 50638 0.0 0.1 18816 2304 0 S+ 9:22AM 0:00.00 grep openvpn
[2.2-BETA][root@apu22.localdomain]/var/etc/openvpn: echo 66268 > /var/run/openvpn_client1.pid
```
After that, the dashboard widget (and Status:Services…) could find the process and report it as up.
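The manual fix above can be scripted. This is a minimal sketch, not a pfSense tool: it assumes FreeBSD `ps` and generalizes the paths from this post to `/var/etc/openvpn/<name>.conf` and `/var/run/openvpn_<name>.pid`; the script name `fix_pidfile.sh` and its single argument are hypothetical. It only writes the PID file when exactly one matching openvpn process is running, so it will not paper over a duplicate-instance situation.

```shell
#!/bin/sh
# Sketch: recreate a missing OpenVPN PID file for one pfSense instance,
# automating the manual "ps aux | grep openvpn" + "echo PID > ..." steps.
# Paths are assumptions generalized from this post:
#   config:   /var/etc/openvpn/<name>.conf
#   PID file: /var/run/openvpn_<name>.pid

# Read "PID COMMAND..." lines on stdin and print the PID of every process
# whose command is .../openvpn ... --config <conf>.
pids_for_config() {
    conf=$1
    while read -r pid cmd; do
        case "$cmd" in
            */openvpn\ *"--config $conf" | */openvpn\ *"--config $conf "*)
                echo "$pid" ;;
        esac
    done
}

main() {
    name=$1                                   # hypothetical argument, e.g. client1
    conf="/var/etc/openvpn/${name}.conf"
    pidfile="/var/run/openvpn_${name}.pid"

    # Leave a PID file alone if it already names a live process.
    if [ -s "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null; then
        echo "$pidfile already points at a running process"
        return 0
    fi

    # FreeBSD ps: every process, PID and full command line, no header row.
    pids=$(ps -ax -o pid= -o command= | pids_for_config "$conf")
    count=$(printf '%s\n' "$pids" | grep -c .)
    if [ "$count" -ne 1 ]; then
        echo "expected one openvpn process for $conf, found $count" >&2
        return 1
    fi

    echo "$pids" > "$pidfile"
    echo "wrote $pids to $pidfile"
}

# Only act when given an instance name, e.g.: sh fix_pidfile.sh client1
if [ "$#" -gt 0 ]; then
    main "$1"
fi
```

Run as root with the instance name, for example `sh fix_pidfile.sh client1`, then refresh the dashboard.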
I know this has been reported in the past, somewhere in the forums.
So there is some way the PID file can go missing, probably amid the WAN going down and up. One day someone (me?) will work out the timing issue, or whatever it is, that causes this.
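To spot the condition without waiting for the dashboard, a hypothetical check like the one below could compare each config under `/var/etc/openvpn` with its PID file. It is a sketch under path assumptions generalized from this post, uses `pgrep -f` to see whether a process for that config exists, and labels each instance `ok`, `missing` (process up, no PID file, as in this post), `stale`, or `down`. The script name and its `check` argument are made up for illustration.

```shell
#!/bin/sh
# Sketch: for each OpenVPN config pfSense has written, report whether the
# matching PID file exists and names a live process. Paths are assumptions
# generalized from this thread.

# $1 = PID file path, $2 = 1 if an openvpn process for that config is running.
# Prints: ok | missing | stale | down
classify() {
    pidfile=$1 running=$2
    if [ -s "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null; then
        echo ok         # PID file names a live process
    elif [ "$running" = 1 ] && [ ! -s "$pidfile" ]; then
        echo missing    # tunnel process is up but the PID file is gone
    elif [ "$running" = 1 ]; then
        echo stale      # process is up but the PID file names a dead PID
    else
        echo down       # no live PID file and no process for this config
    fi
}

main() {
    for conf in /var/etc/openvpn/*.conf; do
        [ -e "$conf" ] || continue                # glob matched nothing
        name=$(basename "$conf" .conf)            # e.g. client1
        pidfile="/var/run/openvpn_${name}.pid"
        if pgrep -f "openvpn --config $conf" >/dev/null; then
            running=1
        else
            running=0
        fi
        printf '%-12s %s\n' "$name" "$(classify "$pidfile" "$running")"
    done
}

# Hypothetical invocation: sh check_openvpn_pids.sh check
if [ "${1:-}" = check ]; then
    main
fi
```

Something like this could run from cron, but that scheduling choice is left open here.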
-
Yep, I have seen this on a VM running under ESXi.
Quite annoying.
-
Is this client using a hostname to find the server, by chance?
-
DNS name.
-
I don't see it right now, but I think there is an open Redmine ticket for that. Something happens to OpenVPN when it stops and restarts while using a hostname with infinite resolve (resolv-retry infinite). Somehow it ends up with a different process ID, disconnected from the management port.
-
Yes, I get that on some installs but not others.
It never really seems to stop anything from working for me.
-
My clients all use a public DynDNS name that is kept up to date by the server-end pfSense (which has multi-WAN and offers the server end on the highest-tier WAN of a gateway group…).
I haven't tried hard to find a reproducible test case; it just happens from time to time. When I am at home I will try some combinations of failing the client-end link, switching the server-end link and name, and so on, to see if I can induce it.
From an end-user perspective it is not a show-stopper: user traffic still passes through the tunnel.
-
That's probably this.
https://redmine.pfsense.org/issues/3894
I haven't had time to dig far enough to find out where and why it's getting started multiple times in that circumstance.
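If the cause is OpenVPN being started more than once for the same config, one way to confirm it on an affected box is to count openvpn processes per `--config` path. The sketch below is hypothetical: it assumes FreeBSD `ps` output and the pfSense paths seen earlier in this thread, and the script name and `check` argument are made up. It only reports duplicates and the PID that the PID file names; it kills nothing.

```shell
#!/bin/sh
# Sketch: list OpenVPN configs that have more than one openvpn process, and
# show which PID the matching PID file names. Report only; nothing is killed.

# Read "PID COMMAND..." lines on stdin; for every --config path used by more
# than one .../openvpn process, print "<config> <pid> <pid> ...".
dup_configs() {
    awk '
        $2 ~ /\/openvpn$/ {
            for (i = 3; i < NF; i++)
                if ($i == "--config") {
                    conf = $(i + 1)
                    pids[conf] = pids[conf] " " $1
                    n[conf]++
                }
        }
        END { for (c in n) if (n[c] > 1) print c pids[c] }
    '
}

main() {
    # FreeBSD ps: every process, PID and full command line, no header row.
    ps -ax -o pid= -o command= | dup_configs | while read -r conf pids; do
        name=$(basename "$conf" .conf)
        pidfile="/var/run/openvpn_${name}.pid"    # assumed naming, as in this thread
        echo "$conf: openvpn PIDs $pids; PID file: $(cat "$pidfile" 2>/dev/null || echo missing)"
    done
}

# Hypothetical invocation: sh find_dup_openvpn.sh check
if [ "${1:-}" = check ]; then
    main
fi
```

If two PIDs show up, the one not named in the PID file may be the instance that lost its management connection; killing it by hand and restarting from the GUI is the workaround reported later in this thread.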
-
Do you believe that running a pfSense instance in ESXi might make this condition more likely to occur?
It's the only place I ever see it myself, and the settings on physical machines are not really any different.
-
Possibly, if there's something about your ESX environment or setup in general that makes those VMs' network connectivity take longer to come online. Generally speaking, though, that wouldn't be the case. I was testing and replicating it on a VK-T40E4 appliance where I blocked or degraded its network access upstream on its WAN. I got the same end result with a VM.
-
Possibly. Slower HDD access?
I think the one in question is spread pretty thin. Its drive lives on a NAS connected through a gigabit switch, with maybe 10 VMs sharing access to the NAS over that one switch.
I could see it running a bit slow at times. That is the one that consistently has this issue, even after being wiped and completely reinstalled.
-
I just saw this today, the first time I updated snapshots after installing OpenVPN. Bare hardware, not stretched at all (Atom D2550). The client connection to Private Internet Access was shown as down, yet it was up and running. Killing it by hand and restarting it from the GUI fixed things.