OpenVPN dropping connectivity
I've configured some OpenVPN clients in pfSense 2.1.4, and they work great. The problem is that they randomly go down; one could be down a day or two later. When I restart the corresponding service, it comes back up and works fine.
I noticed one down today via the gateway monitor
- service was up
- connection was up
- couldn't ping google
- couldn't ping the vpn gateway
The last time this happened, I installed Service Watchdog in the hope that it would restart the service based on the gateway monitoring, but no dice this time around, and I don't see any configuration for it other than notifications.
I managed to figure out the PID and grepped the VPN log against it. I see nothing out of the ordinary after the connection was established except this:
PID_ERR replay-window backtrack occurred
A quick Google search tells me this is due to ISP / line quality / packet loss.
Whatever the reason, it's not often, and restarting the service fixes the issue for me every time.
The only issue is that I don't know how to get pfSense to automatically restart the VPN service for me when this happens. Can anyone help me do this?
If it is the client side, navigate to OpenVPN, pick the client, and look for "Server host name resolution". Check "Infinitely resolve server". You may also need to add a keepalive to the advanced options.
I checked the OpenVPN documentation and my provider's settings as entered in the advanced section. It's currently set to 5; leaving it unspecified would default to infinite. From what I can tell, this option only applies while the client is trying to connect to the server, in which case it retries indefinitely. It does not cover the case where you're already connected, the service is running, and traffic has simply stopped, so the gateway monitor signals offline because the alternative monitoring IP can't be pinged.
I'm actually kind of surprised that neither pfSense nor the watchdog package accounts for this case. I managed to hack together a script, based on my limited PHP knowledge and searching for functions in other scripts. It checks whether any gateways are down and, if a down gateway corresponds to a VPN that has a service, restarts that VPN service.
The problem is, I never got a chance to test the script properly because the failures stopped. The last thing I remember changing was the gateway monitor IP to a different host (I chose an NTP server whose region matches the VPN server's country).
I don't know why changing the monitoring IP would fix it, but so far it has been 2 days with no more stalled traffic. Before, it would be down within 4-8h. At this stage, it could be that an ISP packet loss issue was fixed, or that this IP doesn't affect pfSense or my VPN provider the same way.
I was suspicious of the packet loss explanation when I first reported this problem because I actually have three connections set up to my VPN provider, to three different countries, and it was always the same connection that was failing. If it were truly an ISP packet loss issue, I would expect all three to drop intermittently.
I'll keep monitoring the situation, but so far so good. I'm still wondering if this VPN reset script should be tested and officially incorporated somewhere.
check this out
So after 5 straight days of connectivity, one of the VPN clients died again.
- Service is up
- Last log entry for the openvpn client is sequence completed, many hours before.
- Can't ping monitor IP through the vpn gateway.
- pfSense dashboard / gateway monitor shows the gateway as down.
Now that the issue has happened again, regardless of the cause, I was able to test my watchdog script. It successfully matched the gateway that was down to a VPN client and restarted the corresponding service. If I run it again, it scans the gateways, sees they are up, and does nothing.
I downloaded the cron package, added the script to check the gateways every minute and I'll monitor to make sure it rebounds.
Here is an example of what pfSense reports for a downed gateway when calling a function that gets all gateway statuses:
[126.96.36.199] => Array
(
    [monitorip] => 188.8.131.52
    [srcip] => 184.108.40.206
    [name] => VPN0
    [lastcheck] => Wed, 13 Aug 2014 21:02:31 -0400
    [delay] => 0ms
    [loss] => 100%
    [status] => VPN0down
)
Where 220.127.116.11 is the gateway. I noticed 2 bugs in status. When gateways are up, status shows "none". When a gateway is down, as you can see, the string contains more than just "down", so my script checks whether the status string simply contains "down".
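To illustrate the substring check described above, here is a minimal sketch in Python. It is not the attached script (which is PHP and calls pfSense's own functions); the function name `down_gateways`, the dictionary layout, and the placeholder addresses are assumptions made for illustration, modelled on the status dump shown earlier.

```python
def down_gateways(statuses):
    """Return the names of gateways whose status string contains 'down'."""
    down = []
    for gateway_ip, info in statuses.items():
        status = str(info.get("status", "")).lower()
        # Healthy gateways report "none" and failed ones report something
        # like "VPN0down", so test for the substring, not an exact match.
        if "down" in status:
            down.append(info.get("name", gateway_ip))
    return down


# Hypothetical sample data; the addresses are documentation placeholders.
statuses = {
    "192.0.2.1": {"name": "VPN0", "loss": "100%", "status": "VPN0down"},
    "192.0.2.2": {"name": "VPN1", "loss": "0%", "status": "none"},
}

print(down_gateways(statuses))  # prints ['VPN0']
```

Matching on the substring keeps the check working whether the status is reported as plain "down" or prefixed with the gateway name.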
If it matches this gateway to one used by a vpn, it restarts the corresponding vpn service, and you get an output like so:
VPN Gateway 18.104.22.168 for VPN id 0 is reporting as being down.
Looking for VPN service associated with vpn id 0
Found corresponding service: OpenVPN client: MyVPN. Restarting...
otherwise, if everything is fine, you get:
All vpn client gateways are up.
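For anyone who wants the overall flow without opening the attachment, the matching step could be sketched roughly as follows in Python. This is only an outline of the logic, not the attached PHP script. The function `plan_restarts`, the `gateway`, `vpnid` and `description` fields, and the placeholder addresses are all hypothetical, and the actual service restart is left to the caller.

```python
def plan_restarts(statuses, vpn_clients):
    """Return (messages, services) for VPN clients whose gateway is down."""
    messages, services = [], []
    for gateway_ip, info in statuses.items():
        # A gateway counts as down if its status string contains "down".
        if "down" not in str(info.get("status", "")).lower():
            continue
        for client in vpn_clients:
            # Assumption: each client record names the gateway it uses.
            if client["gateway"] != info.get("name"):
                continue
            messages.append(
                f"VPN Gateway {gateway_ip} for VPN id {client['vpnid']} "
                "is reporting as being down."
            )
            messages.append(
                f"Found corresponding service: {client['description']}. "
                "Restarting..."
            )
            services.append(client["description"])
    if not services:
        messages.append("All vpn client gateways are up.")
    return messages, services


# Hypothetical sample data; the addresses are documentation placeholders.
statuses = {
    "192.0.2.1": {"name": "VPN0", "status": "VPN0down"},
    "192.0.2.2": {"name": "VPN1", "status": "none"},
}
vpn_clients = [
    {"vpnid": 0, "gateway": "VPN0", "description": "OpenVPN client: MyVPN"},
    {"vpnid": 1, "gateway": "VPN1", "description": "OpenVPN client: Other"},
]

messages, services = plan_restarts(statuses, vpn_clients)
for line in messages:
    print(line)
```

Returning the list of services instead of restarting them inside the function makes the logic easy to dry-run from the shell before hooking it up to cron.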
Attached is the script in case it helps someone, gets turned into a package, incorporated into the watchdog package or into pfSense.
PS: It can be optimized. It doesn't have to go the extra mile to find the service object to get the description; that's just for clarity when running from the shell.
Hopefully this will solve the rest of my vpn connectivity issues.