So after 5 straight days of connectivity, one of the VPN clients died again.
The service is up.
The last log entry for the OpenVPN client is "sequence completed", from many hours earlier.
I can't ping the monitor IP through the VPN gateway.
The pfSense dashboard / gateway monitor shows the gateway as down.
Now that the issue has happened again, regardless of the cause, I was able to test my watchdog script. It successfully matched the down gateway to a VPN client and restarted the corresponding service. If I run it again, it scans the gateways, sees they are up, and does nothing.
I installed the cron package and scheduled the script to check the gateways every minute. I'll keep monitoring to make sure the connection recovers on its own.
Here is an example of what pfSense reports for a downed gateway when calling a function that gets all gateway statuses:
[1.2.3.4] => Array
    (
        [monitorip] => 8.8.8.8
        [srcip] => 5.6.7.8
        [name] => VPN0
        [lastcheck] => Wed, 13 Aug 2014 21:02:31 -0400
        [delay] => 0ms
        [loss] => 100%
        [status] => VPN0down
    )
Here 1.2.3.4 is the gateway. I noticed 2 bugs in the status field. When a gateway is up, status shows none. When it's down, as you can see, the string contains more than just down, so my script simply checks whether the status string contains down.
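To make that check concrete, here is a minimal sketch of the idea in Python. It is not the attached PHP script, and the function name gateway_is_down is made up for illustration. It only assumes the two status values described above: none for an up gateway, and the gateway name glued to down for a dead one.

```python
def gateway_is_down(status):
    """Return True when a gateway status string reports the gateway as down."""
    # An up gateway reports "none"; a down one reports the gateway name
    # glued to "down" (e.g. "VPN0down"), so a substring test is used
    # instead of an exact comparison. A missing status counts as not down.
    return "down" in (status or "").lower()


print(gateway_is_down("VPN0down"))  # True
print(gateway_is_down("none"))      # False
```

A substring test would also match a status that contains down for some other reason, but with the two values seen so far it is enough.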
If it matches this gateway to one used by a vpn, it restarts the corresponding vpn service, and you get an output like so:
VPN Gateway 1.2.3.4 for VPN id 0 is reporting as being down.
Looking for VPN service associated with vpn id 0
Found corresponding service: OpenVPN client: MyVPN. Restarting...
Otherwise, if everything is fine, you get:
All vpn client gateways are up.
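For anyone who wants the overall flow without opening the attachment, the loop can be sketched roughly like this. It is written in Python purely as an illustration and is not the attached PHP script. The vpn_gateways mapping and the restart callback are hypothetical stand-ins for however the real script looks up the VPN client and restarts its service on pfSense.

```python
def check_vpn_gateways(gateway_status, vpn_gateways, restart):
    """Restart the VPN client behind every gateway reported as down.

    gateway_status: dict of gateway IP -> status record, shaped like the
                    pfSense dump shown earlier in this post
    vpn_gateways:   dict of gateway IP -> (vpn id, service description);
                    hypothetical lookup table, not a pfSense structure
    restart:        callable taking a vpn id; hypothetical stand-in for
                    the real service restart
    Returns the list of vpn ids that were restarted.
    """
    restarted = []
    for gateway_ip, info in gateway_status.items():
        # Ignore gateways that are not used by a VPN client.
        if gateway_ip not in vpn_gateways:
            continue
        # Up gateways report "none"; down ones contain "down".
        if "down" not in info.get("status", "").lower():
            continue
        vpn_id, description = vpn_gateways[gateway_ip]
        print(f"VPN Gateway {gateway_ip} for VPN id {vpn_id} is reporting as being down.")
        print(f"Looking for VPN service associated with vpn id {vpn_id}")
        print(f"Found corresponding service: {description}. Restarting...")
        restart(vpn_id)
        restarted.append(vpn_id)
    if not restarted:
        print("All vpn client gateways are up.")
    return restarted


# Example run using the values from this post; the restart is a no-op here.
status = {"1.2.3.4": {"name": "VPN0", "loss": "100%", "status": "VPN0down"}}
vpns = {"1.2.3.4": (0, "OpenVPN client: MyVPN")}
check_vpn_gateways(status, vpns, restart=lambda vpn_id: None)
```

Passing the restart action in as a callback keeps the sketch runnable off the firewall. On pfSense that callback would be whatever actually restarts the OpenVPN client service.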
Attached is the script in case it helps someone, gets turned into a package, incorporated into the watchdog package or into pfSense.
PS: It can be optimized. It doesn't have to go the extra mile of finding the service object to get the description; that's just for clarity when running from the shell.
Hopefully this will solve the rest of my vpn connectivity issues.
openvpn_hb.php.txt