OpenVPN client running, but GUI does not find it

phil.davis

I am getting a situation where an OpenVPN site-to-site client is up and running (I am accessing the client site from the server site across the VPN, so it really is working), but the GUI displays "Unable to contact daemon Service not running?" and the services status shows the client process as stopped. This happens for "seemingly random" clients, but it is more frequent for client-server pairs that have less-reliable links (i.e. they are restarting/renegotiating etc. multiple times a day).
Here are the processes on a system with 2 clients (going to 2 main offices):

ps aux | grep open
root    7248  0.0  1.9  6456  4508  ??  Ss    2:04PM   0:01.76 /usr/local/sbin/openvpn --config /var/etc/openvpn/client1.conf
root   79874  0.0  2.2  7480  5208  ??  RNs  Thu04PM   1:14.39 /usr/local/sbin/openvpn --config /var/etc/openvpn/client2.conf

But the PID files for the processes are:

> ls -l /var/run/open*
-rw-r--r--  1 root  wheel  5 Jul 29 14:04 /var/run/openvpn_client1.pid
-rw-r--r--  1 root  wheel  5 Jul 29 14:06 /var/run/openvpn_client2.pid
> cat openvpn_client1.pid
7248
> cat openvpn_client2.pid
2755

Client1 was recently restarted successfully: its PID file is new and matches the actual process ID.
Client2 has an old process that is running fine. At some point a new PID file was written, presumably when a new client2 process attempted to start. But I guess the old process was still there, running happily, and never got stopped. So the new process would have exited, leaving that new PID file behind.
The way to recover the status is:
a) Modify /var/run/openvpn_client2.pid to contain the correct PID of the (old) running process. Now Status-Services shows the process green, but the message "Unable to contact daemon Service not running?" still appears on OpenVPN Status.
b) Restart the OpenVPN client from Status-Services. It stops the running process and cleanly starts a new one. Now all the GUI status is good (and the VPN link is working for users, like it always was).
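
To spot the mismatch described above without comparing the output by eye, something along these lines might help. It is only a sketch: find_pid_in_listing and check_pidfile are hypothetical helper names (not part of pfSense), and it assumes the `ps aux` layout shown earlier, with the PID in column 2 and the config path as the last field.

```sh
#!/bin/sh
# Sketch only: both helpers below are hypothetical, not pfSense functions.

# Read ps-style lines on stdin and print the PID (column 2) of every
# process whose command line ends with "--config <conf>".
find_pid_in_listing() {
    conf="$1"
    awk -v conf="$conf" '$NF == conf && $(NF-1) == "--config" { print $2 }'
}

# Compare the PID recorded in a PID file with the PID actually running.
# Prints "ok", "stale" or "not-running".
check_pidfile() {
    pidfile="$1"
    running_pid="$2"
    if [ -z "$running_pid" ]; then
        echo "not-running"
        return
    fi
    recorded_pid=$(cat "$pidfile" 2>/dev/null)
    if [ "$recorded_pid" = "$running_pid" ]; then
        echo "ok"
    else
        echo "stale"
    fi
}

# Example use (paths taken from the listing above):
#   running=$(ps auxww | find_pid_in_listing /var/etc/openvpn/client2.conf)
#   check_pidfile /var/run/openvpn_client2.pid "$running"
```

With the numbers above, client2 would be reported as "stale" (file says 2755, process is 79874). If more than one process matches the same config, the comparison also comes out as "stale", which is probably worth a manual look anyway.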

So, there is some way for the OpenVPN client PID file to get updated without the old client process being stopped.
If someone has seen this and has an idea of the sequence of events to cause it, please comment. It would be a good thing to fix.

ensnare

I'm having the same problem on 2.1-RC1. Did you ever find a resolution?

phil.davis

No, I haven't really tried hard to determine the full cause. I have lots of slow links with high packet loss to remote places. So the OpenVPN connections often time out and reestablish themselves, gateways go down, ISP ADSL connections get reset by the ISP and acquire a new dynamic public IP… In amongst all this pfSense is trying to restart stuff to reestablish everything. It actually does a good job and my OpenVPN site-to-site links do come back once all the DynDNS name-to-IP changes propagate everywhere...
In amongst all that there is some way for an OpenVPN process to be running fine in FreeBSD underneath, but the pfSense GUI thinks it should be a different PID and so thinks it is down.

ensnare

Are you running 2.1-RC1?

Dirty hack, but I put this into crontab:
```
ps aux | grep client1 | grep -v grep | awk '{print $2}' > /var/run/openvpn_client1.pid
```

Not sure how to get the GUI to automatically refresh w/o restarting the service
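
One caveat with the one-liner above: if no process matches it writes an empty PID file, and if several match it writes several PIDs. A slightly more defensive variant might look like the sketch below. sync_pidfile is a hypothetical helper name, and the `pgrep -f` pattern in the usage comment is an assumption based on the command lines shown earlier in the thread.

```sh
#!/bin/sh
# Sketch only: sync_pidfile is a hypothetical helper, not a pfSense function.

# Rewrite the PID file only when exactly one candidate PID was found,
# and only when the file does not already contain that PID.
# Returns non-zero (and leaves the file alone) for zero or several PIDs.
sync_pidfile() {
    pidfile="$1"
    pids="$2"              # whitespace-separated PIDs of matching processes
    set -- $pids           # split the PIDs into positional parameters
    [ "$#" -eq 1 ] || return 1
    current=$(cat "$pidfile" 2>/dev/null)
    [ "$current" = "$1" ] && return 0
    # Write to a temporary file first so readers never see a partial file.
    echo "$1" > "$pidfile.tmp" && mv "$pidfile.tmp" "$pidfile"
}

# Example use from cron (pattern assumed from the ps output in this thread):
#   sync_pidfile /var/run/openvpn_client1.pid \
#       "$(pgrep -f 'openvpn --config /var/etc/openvpn/client1.conf')"
```

As noted earlier in the thread, correcting the PID file only turns the service green in Status-Services; the OpenVPN status page still showed the error until the client was restarted.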

phil.davis

Yep, running 2.1-RC1. Systems are on versions from 1 Aug or 6 Aug.

ensnare

Are you using multiple OpenVPN clients? I have two tunnels and wonder if this may be the cause of the problem. Is this a confirmed bug? If not, how can we submit it?

phil.davis

Now that someone else is seeing this also, I guess that makes it a confirmed bug?! At the moment I am busy with other non-pfSense stuff. Feel free to add a bug issue on Redmine and put whatever data you can into it. I will add to it when I have time to try and collect some useful information (to reproduce might require a tricky sequence of taking links down and up on a test system, and trying to trace everything that happens in the logs…)
It's not a show-stopper bug, as the OpenVPN is actually working. It's just that the dashboard/services status makes it look like it is down.

jimp

I have several clients and servers running on a test VM and things get beaten up a lot there, lots of loss/latency (artificial), clients and servers restarting, and so on… I have never seen this happen.

Is there anything weird/special about the client that gets lost? Anything different in its advanced options? Is it an assigned OpenVPN interface? Used in any NAT rules or similar?