Break connection when a primary gateway is restored.
Here's my setup:
Gateway Group (GROUP1)
---> WAN1 (Tier1)
---> WAN2 (Tier2)
OpenVPN (udp) client with GROUP1 as its interface
Here's the sequence tested:
WAN1+WAN2 Online --> OpenVPN connected via WAN1
WAN2 Online --> OpenVPN connected via WAN2
WAN1+WAN2 Online --> OpenVPN still connected via WAN2
I assume that pfSense won't break an existing connection when WAN1 is restored.
Is there any way to break all connections via WAN2 when WAN1 is restored?
In this setup, WAN1 is a cable modem with unlimited data, while WAN2 is a 3G modem, which can get very expensive. Unlike HTTP requests, an OpenVPN connection tends to last a very long time, so it is unlikely to switch back to WAN1 until WAN2 fails.
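The failback behaviour being asked for can be sketched roughly like this. It is a minimal Python illustration only, not pfSense code: the `Gateway` class and the `pick_gateway` and `should_restart_client` functions are hypothetical names made up for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Gateway:
    name: str
    tier: int      # lower number = more preferred (Tier1 before Tier2)
    online: bool


def pick_gateway(group: List[Gateway]) -> Optional[Gateway]:
    """Return the online gateway with the lowest tier, or None if all are down."""
    candidates = [g for g in group if g.online]
    return min(candidates, key=lambda g: g.tier) if candidates else None


def should_restart_client(current: Optional[str], group: List[Gateway]) -> bool:
    """True when the client is bound to something other than the preferred gateway."""
    preferred = pick_gateway(group)
    return preferred is not None and preferred.name != current


# Third step of the tested sequence: WAN1 is back, but the client is still on WAN2.
group = [Gateway("WAN1", 1, True), Gateway("WAN2", 2, True)]
restart_needed = should_restart_client("WAN2", group)
```

With both WANs online and the client still on WAN2, `restart_needed` is true, which is the point at which the long-lived OpenVPN connection would have to be broken to move it back to WAN1.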
This fix should resolve your issue: https://github.com/pfsense/pfsense/commit/4bf23d320bc96eeabf2daf9024583f2cc5a6662a
The code was also optimised a bit by these commits:
These last 2 do not actually fix any bugs. So you can just make the 1-line change to etc/rc.openvpn from the first commit, if you like.
The fix is in the 2.1 code branch after 2.1-RELEASE, so it will appear in 2.1.1 if and when that happens.
Your patch does fix the problem. Thank you!
Hmm... After some tests, I seem to have run into another problem.
Say the WAN1 gateway device is removed, and pfSense cannot obtain a DHCP lease for that interface.
In this case, a Gateway Group depending on it only sees it as "Gathering data", neither up nor down.
The same goes for the gateway status for this WAN. It's listed as "Pending".
Here's the problem:
Say an OpenVPN client is currently on Gateway Group 1 (GG1), consisting of WAN1 and WAN2.
Then the WAN1 cable to the modem is severed, and pfSense has trouble renewing the DHCP lease for the interface.
At this stage, WAN1 should be considered "down" instead of "pending", as it's unusable. However, since it's listed as "pending", OpenVPN is stuck and doesn't fail over to WAN2.
I'm not sure if this has been covered elsewhere. But if it hasn't, and it's something that can be fixed easily, I'd appreciate a pointer to the right file so that I can apply the correct logic to the Gateway Group failover mechanism.
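To make the logic I have in mind concrete, here is a rough Python sketch. It only models the symptom described above, not the actual pfSense code: the status strings, the `select_gateway` function and the `pending_is_down` switch are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def select_gateway(
    tiers: List[Tuple[str, int]],
    status: Dict[str, str],
    pending_is_down: bool = True,
) -> Optional[str]:
    """Walk the group in tier order and return the first usable gateway name.

    status values are assumed to be "online", "down" or "pending".
    With pending_is_down=False, an undecided gateway blocks the decision,
    which mimics the stuck behaviour described in the post.
    """
    for name, _tier in sorted(tiers, key=lambda t: t[1]):
        state = status.get(name, "pending")
        if state == "online":
            return name
        if state == "pending" and not pending_is_down:
            return None  # still waiting on a higher tier, so no failover
        # "down" (or "pending" treated as down): try the next tier
    return None


tiers = [("WAN1", 1), ("WAN2", 2)]
status = {"WAN1": "pending", "WAN2": "online"}
stuck = select_gateway(tiers, status, pending_is_down=False)
failed_over = select_gateway(tiers, status)
```

In this sketch `stuck` ends up with no gateway at all, while `failed_over` picks WAN2, which is the behaviour I would expect once the lease on WAN1 can no longer be renewed.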
When the WAN cable is disconnected, I thought that a hardware link-down event happens and the system goes through processing similar to when a gateway stops responding to ping.
If I get to work early in the morning, I should try that :)
It's a backup WAN, you can try it NOW, no one would notice ;D
On a serious note, yes, testing/debugging a live system can be quite a hassle.
In my case, WAN1/2 are connected to pfSense via a switch (VLAN'd). So, when a modem dies or goes offline and the lease expires, pfSense keeps trying to obtain a DHCP lease forever. Hence the infinite "Gathering data" on the Gateway Group and "Pending" on the Gateway.
One more thing about this infinite "Gathering data" and "Pending" issue.
It seems that services requiring outbound access via such an interface are affected by it as well.
Case in point: I have unbound using both WAN1 and WAN2 as Query Interfaces. With WAN1 stuck in this "Pending" status, unbound is listed as "Stopped" and cannot be restarted. Unchecking WAN1 on the unbound configuration page and saving brings unbound back online.
I couldn't try an actual physical disconnection earlier because I needed to be there in person to do that! Now I have tried it: I unplugged the main WAN and waited. The ordinary internet access using a gateway group failed over to WAN2, and the Dynamic DNS names that are tied to a gateway group changed. But none of the OpenVPN servers (site-to-site and 1 road warrior) switched to listening on WAN2, and the 1 OpenVPN site-to-site client going out to another office did not switch to going out via WAN2. None of the /var/etc/openvpn*.conf files got rewritten, which they should be in response to a gateway group status change.
I will have a look at that code.
I guess it doesn't implement the same failover processing as when a gateway just stops responding to ping.
Added: On a test system, it activates all the processing, but my test hardware only has 1 real WAN, and thus a gateway group with only 1 WAN in it. But it does rewrite the server.conf file. I will have to try a real WAN unplug again and investigate why it didn't seem to work for me early this morning.
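For anyone repeating the unplug test, one way to tell whether a config was rewritten after a gateway group change is to compare its bind address with the address of the WAN that should now be active. Below is a small Python sketch of that check. It assumes the generated file pins the bind address with OpenVPN's `local` directive; the `local_address` and `is_stale` helpers and the sample addresses (documentation ranges) are made up for illustration.

```python
from typing import Optional


def local_address(conf_text: str) -> Optional[str]:
    """Return the argument of the first 'local' directive, or None if absent."""
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":   # skip blank lines and comments
            continue
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "local":
            return parts[1]
    return None


def is_stale(conf_text: str, expected_ip: str) -> bool:
    """True when the config is not bound to the address of the active WAN."""
    return local_address(conf_text) != expected_ip


# Example config text; the addresses are placeholders, not real ones.
sample = """dev ovpns1
proto udp
local 192.0.2.10
port 1194
"""
stale_after_failover = is_stale(sample, "198.51.100.20")
```

Here the sample config is still bound to the first address, so `stale_after_failover` is true when the second address is the one expected after failover. To check a real file, read its contents and pass the text to `is_stale`.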