Gateway not switching back after failover
-
I have multiple VPN gateways set up in a failover gateway group.
It is set to "Member down", so once Tier 1 goes down it switches to Tier 2.
This works, but when Tier 1 comes back up all my clients are still going over the Tier 2 gateway. Is there a way to fix this? I've read several threads with the same problem, but none of them had a fix...
Running pfSense CE v2.6.0
-
When the Tier 1 gateway comes back up, new connections will use it again, but existing connections on the Tier 2 gateway will remain until they are closed.
For some types of traffic that can be indefinitely, SIP connections for example.
Most TCP traffic closes at the end of the session, and new connections will then be on the restored WAN.
Steve
-
Thanks for the reply.
Is there a way to force all connections back to Tier 1 immediately, and not after a certain period?
By the way, in my test case I waited at least 15 minutes and it did not switch back.
-
Not built into pfSense. You could probably script something.
What connections are you seeing still using the Tier 2 gateway?
-
This is the situation.
2 VPN gateways in failover, Tier 1 and Tier 2.
Laptop is in vlan LAN_VPN and policy based routing is applied.
So my laptop uses the failover group as its outgoing gateway.
Once I bring down Tier 1, the failover does its job and switches to Tier 2.
Once Tier 1 comes back up, my laptop is still going out over the Tier 2 gateway.
-
For new connections or existing connections? If existing, what type of traffic?
What should happen there is that the gateway defined for the group in /tmp/rules.debug changes. Make sure it does.
Steve
-
New and existing connections.
I have checked the /tmp/rules.debug file and how it changes.
Normal situation where Tier 1 and Tier 2 are up:
GWVPN_WAN1 = " route-to ( ovpnc1 10.10.10.1 ) "
GWVPN_WAN2 = " route-to ( ovpnc2 10.20.20.1 ) "
GWvpn_gw_group = " route-to { ( ovpnc1 10.10.10.1 ) } "
When Tier 1 goes down I have:
GWVPN_WAN2 = " route-to ( ovpnc2 10.20.20.1 ) "
GWvpn_gw_group = " route-to { ( ovpnc2 10.20.20.1 ) } "
After Tier 1 comes back up and is online (Tier 2 still online as well) I have:
GWVPN_WAN2 = " route-to ( ovpnc2 10.20.20.1 ) "
GWvpn_gw_group = " route-to { ( ovpnc2 10.20.20.1 ) } "
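For anyone who wants to repeat this check without eyeballing the file, here is a minimal Python sketch that pulls the GW* macros out of the rules.debug text and reports where the group points. The function names (gateway_macros, group_uses) are made up for this example, and the regexes only assume the macro layout quoted in this post, so treat it as a starting point rather than a tested tool.

```python
import re

# Matches macro definitions such as:
#   GWvpn_gw_group = " route-to { ( ovpnc2 10.20.20.1 ) } "
MACRO_RE = re.compile(r'(GW\w+)\s*=\s*"([^"]*)"')
# Matches each "( interface gateway_ip )" pair inside a macro value.
TARGET_RE = re.compile(r'\(\s*(\S+)\s+(\S+)\s*\)')


def gateway_macros(rules_text):
    """Map each GW* macro name to its list of (interface, gateway_ip) targets."""
    return {
        name: TARGET_RE.findall(value)
        for name, value in MACRO_RE.findall(rules_text)
    }


def group_uses(rules_text, group, interface):
    """Return True if the named group macro routes via the given interface."""
    targets = gateway_macros(rules_text).get(group, [])
    return any(iface == interface for iface, _ in targets)


# The state quoted above after Tier 1 came back online.
AFTER_RECOVERY = '''
GWVPN_WAN2 = " route-to ( ovpnc2 10.20.20.1 ) "
GWvpn_gw_group = " route-to { ( ovpnc2 10.20.20.1 ) } "
'''

# False here means the group macro was never rewritten back to Tier 1.
print(group_uses(AFTER_RECOVERY, "GWvpn_gw_group", "ovpnc1"))
```

To run it against the live file, read /tmp/rules.debug into a string and pass that in instead of the inline sample.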
-
Hmm, then the only other possibility is that the ruleset is not actually being reloaded for some reason.
Check the gateway used in the running policy rule in the output of:
pfctl -vsr
(that could be large!)
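Since that output can run to many screens, a small filter helps. The following is a rough Python sketch that keeps only the loaded rules carrying a route-to and shows which interface and gateway each one points at. The sample rule line is invented for illustration (the interface igb1.20, the subnet and the label are assumptions, not taken from a real box), and the regex assumes route-to appears as "route-to (interface gateway)" in the pfctl output.

```python
import re
import subprocess

# Assumed shape of a policy rule in the output: ... route-to (ovpnc2 10.20.20.1) ...
ROUTE_TO_RE = re.compile(r'route-to\s*\{?\s*\(\s*(\S+)\s+([^\s)]+)\s*\)')


def route_to_rules(pfctl_output):
    """Return (interface, gateway_ip, rule_line) for every rule using route-to."""
    found = []
    for line in pfctl_output.splitlines():
        match = ROUTE_TO_RE.search(line)
        if match:
            found.append((match.group(1), match.group(2), line.strip()))
    return found


def loaded_rules():
    """Fetch the running ruleset; needs root on the firewall itself."""
    result = subprocess.run(
        ["pfctl", "-vsr"], capture_output=True, text=True, check=True
    )
    return result.stdout


# Invented sample standing in for real output.
SAMPLE = '''pass in quick on igb1.20 route-to (ovpnc2 10.20.20.1) inet from 192.168.20.0/24 to any flags S/SA keep state label "USER_RULE"
  [ Evaluations: 10  Packets: 5  Bytes: 400  States: 1 ]
pass in quick on igb1 inet from any to any flags S/SA keep state'''

for iface, gateway, rule in route_to_rules(SAMPLE):
    print(iface, gateway)
```

If the policy rule still shows the Tier 2 interface here while Tier 1 is online, the running ruleset was not reloaded.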
-
This problem is pretty well documented, but there is still no great solution on the pfSense interface side. I would like to see another setting in the Gateway Group entry that lets you set a time for a forced push back to Tier 1.
In my situation, I don't see new traffic go back to Tier 1 for hours after it's back up. What happens is: my Tier 1 VPN sees a brief packet loss of 5-10 seconds (maybe they reset servers every 24 hours or something?), then all is good again with no loss for another 24 hours or so. When I run a dnsleaktest.com test on a client PC, it's still on the failover Tier 2 VPN for hours at a time before it ever switches back. One way to force it back to Tier 1 right away is to restart the VPN service on Tier 2, which kicks it back to Tier 1 immediately.
So if I had a Gateway Priority setting that said "force traffic back to Tier 1 after x seconds of uptime on Tier 1", life would be good! People could customize it however they needed: ~30 seconds if downtime is always short, as in my case, or 2 hours if that suits others better.
-
Yes, the failback situation could be improved. It exists as it is because, when that code was created, it was not possible to fail back in any useful way without killing all connections. That should now be possible, though it would still be disruptive.
-
As heavy-handed as it might be, I am still curious to know whether a Reset States command would be sufficient to guarantee failback?
-
As long as the main gateway has come back up, any new states would be created via it. So, yes, I would expect it to fail back.
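If someone wants to script that, here is a minimal Python sketch of the idea: wait until Tier 1 has been up for a hold time, then flush the state table. It assumes that flushing all pf states with pfctl -F states is an acceptable stand-in for Reset States, which drops every open connection on the firewall. The function names and the 30 second default are made up for this example, and how you learn that Tier 1 is online is left to the caller.

```python
import subprocess

# Flushes the entire pf state table; every open connection is dropped.
RESET_COMMAND = ["pfctl", "-F", "states"]


def should_reset_states(tier1_online, seconds_online, hold_seconds=30):
    """Reset only once Tier 1 has stayed online for the hold time."""
    return tier1_online and seconds_online >= hold_seconds


def reset_states(dry_run=True):
    """Return the reset command; run it only when dry_run is False (needs root)."""
    if not dry_run:
        subprocess.run(RESET_COMMAND, check=True)
    return list(RESET_COMMAND)


# Hypothetical inputs: Tier 1 has been back for 45 seconds.
if should_reset_states(tier1_online=True, seconds_online=45):
    print(" ".join(reset_states(dry_run=True)))
```

The hold time is there so a gateway that flaps for a few seconds does not trigger a flush on every blip. The dry run is the default because the real command is disruptive.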