Failback from Primary WAN after failover to Secondary WAN

Asulu

Nice script; it works well if you use the second WAN as failover only.
How would this work with policy-based routing?

I use my low-latency WAN for gaming and my other WAN for downloads only.

When the gaming WAN is down, a gateway group (WAN1GW) fails over to WAN2, which works fine. But when the gaming WAN comes back up, traffic stays on WAN2.

Is there a way to force it back to my gaming WAN?

pfpv

So, this is still needed in 2.5.0. I am surprised. So few people use failover?

The /etc/rc.kill_states script was very slightly modified in 2.5.0 (added utf8_encode to IP variables). I recreated the /etc/my_kill_states and left the /root/check_backup_wan untouched (they weren't wiped out after the upgrade) and everything seems to work as in 2.4.5.

I hope the next build will have this or something better built-in.

Asulu

Yes it is:
https://redmine.pfsense.org/issues/855

pfpv

Can anyone confirm if this is still needed in 2.5.2? It seems so.

njacobs

Failback works as expected for me on 21.05 on an SG-3100.

I'm not 100% sure when it started working, but I believe it was at some point in the past 12 months.

I assume this isn't a planned divergence between Plus and CE?

pfpv

@njacobs Failback worked for me in one test on 2.5.2. But this thread is not about whether failback works or not. It was working when this thread was created, but it didn't kill states properly, and it seems it still doesn't.

njacobs

@pfpv My understanding of the issue was that any connections which failover - or are established whilst in failover - don’t failback. This appears to work for me. Have I misunderstood the issue?

mjh_ca

@njacobs If your secondary WAN is truly for backup only, like an LTE connection (expensive/limited bandwidth), then you want your IPSec tunnels to revert to the primary WAN when it is restored. However, with the current behavior as of 2.5.2, established connections remain on the secondary WAN. This is a problem in most scenarios with LTE backup, as it will chew through the data limit and/or incur significant charges when it wasn't necessary to do so. This thread and the referenced ticket are requesting the capability to automatically kill open connection states when reverting to the primary WAN to achieve the desired behavior.

njacobs

@mjh_ca Yes. This is my setup and my experience, however I haven’t tried specifically with an IPSec tunnel. Direct traffic fails back to the primary WAN as soon as it is available again.

JimmyB

I just hit the same states-issue in 2.5.2.

Primary WAN (Tier 1) goes offline --> failover to secondary WAN2 (Tier 2, mobile plan).
WAN connection comes back online --> failover returns to primary WAN as expected.

WAN2 is still online, ready as a backup connection, which seems to not trigger clearing of WAN2 active states. WAN2 states continue to consume data from data plan as described above which is not desired. A "Clear states when returning to higher Tier" would be great for solutions implemented with LTE and limited data plans.
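
To make the requested behavior concrete, here is a minimal sketch of the check such an option would need: act only on the transition from "primary down" to "primary up", then clear states on the backup interface. The function names, the "up"/"down" status strings, and the interface name passed in are all hypothetical placeholders, not anything pfSense provides. The sketch only prints the `pfctl` command instead of running it, and whether `pfctl -i <interface> -Fs` actually clears the right states on a given pfSense version is an assumption I have not verified.

```shell
#!/bin/sh
# Sketch only. The status strings "up"/"down" and the interface name are
# hypothetical inputs that some gateway-monitoring script would supply.

# Succeeds (exit 0) only when the primary gateway went from down to up.
is_failback() {
    [ "$1" = "down" ] && [ "$2" = "up" ]
}

# Usage: failback_action PREV_STATUS CUR_STATUS BACKUP_IF
# Prints the command that would flush states on the backup interface.
# It prints rather than executes, so it is safe to try anywhere.
failback_action() {
    if is_failback "$1" "$2"; then
        echo "pfctl -i $3 -Fs"
    fi
}
```

For example, `failback_action down up igb1` prints `pfctl -i igb1 -Fs`, while `failback_action up up igb1` prints nothing. Printing the command first makes it easy to check what would happen before letting it touch live states.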

ddbnj

@jimmyb said in Failback from Primary WAN after failover to Secondary WAN:

I just hit the same states-issue in 2.5.2.

Primary WAN (Tier 1) goes offline --> failover to secondary WAN2 (Tier 2, mobile plan).

WAN connection comes back online --> failover returns to primary WAN as expected.

WAN2 is still online, ready as a backup connection, which seems to not trigger clearing of WAN2 active states. WAN2 states continue to consume data from data plan as described above which is not desired. A "Clear states when returning to higher Tier" would be great for solutions implemented with LTE and limited data plans.

On 21.05, I unknowingly consumed LTE allowance on old connections even though the Fios WAN was only down for a moment.

njacobs

@njacobs said in Failback from Primary WAN after failover to Secondary WAN:

@mjh_ca Yes. This is my setup and my experience, however I haven’t tried specifically with an IPSec tunnel. Direct traffic fails back to the primary WAN as soon as it is available again.

I stand corrected. On further investigation it appears I was actually seeing traffic from new connections.

manicmoose

I know this thread is quite old, but I wonder if anyone who suffers (or suffered) from this issue has tried the state-killing option under System -> Advanced -> Networking?

mkernalcon

@manicmoose That option is responsible for killing states when a particular WAN interface's IP address changes (via DHCP, for example). The option is to kill ALL states when this happens, instead of just those on the old WAN IP. This has absolutely nothing to do with WAN failover, since in that case, the interface IP addresses don't change, just which one is being used for routing.

Another option which seems helpful is the "Flush all states when a gateway goes down" option on the System->Advanced->Miscellaneous tab. However, this is what enables the failover, but not the failback (i.e. this won't do anything when a down gateway comes up).

This issue remains, and I've been correcting it for a few years now using this script (more or less).

pfpv

The scripts in the OP stopped working for me in 22.05. I found that

pfctl -i mvneta0 -ss

stopped outputting anything. I tried

pfctl -i mvneta0 -s states

and still nothing. I wonder what's up as it is a standard command in FreeBSD.
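
One way to narrow this down might be to list all states without `-i` and filter on the first column, which in `pfctl -ss` output is the interface the state is bound to. This is a minimal sketch; the two helper names are made up for illustration, and both helpers just read text on stdin, so they can be tried on saved output as well.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical. Both functions read
# `pfctl -ss`-style lines on stdin and never call pfctl themselves.

# Print only the state lines whose first field equals the given interface.
states_on_iface() {
    awk -v ifc="$1" '$1 == ifc'
}

# Print "<interface> <count>" for every interface name seen in the
# first field, sorted by interface name.
count_states_by_iface() {
    awk 'NF { c[$1]++ } END { for (i in c) print i, c[i] }' | sort
}
```

Usage would be `pfctl -ss | states_on_iface mvneta0` or `pfctl -ss | count_states_by_iface`. If the count shows the states listed under a name other than mvneta0, that would explain why the `-i mvneta0` filter returns nothing.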

Viper_Rus

Hello. Is there a new version of the script that works on 22.05?

Viper_Rus

@viper_rus

So far I have found this option.
https://github.com/mk-fg/pfsense-scripts

It kills connections when the gateway changes, and it works correctly on 22.05.

But of course I would still like a working version of the script from this discussion.

sensei-two

pfSense 2.6 here. Still the same problem :-(

Viper_Rus

@sensei-two

For me, this is the main problem with pfSense. Because of this bug, I lose expensive 4G traffic.

ddbnj

@viper_rus

This problem is real, and I do two things to limit unnecessary expense.

1. I use Tello for my LTE service and limit it to 1GB ($6). The most I can lose is 6 bucks. If I need the connection to last longer, I can log on to Tello from my phone and add funds or change the plan to unlimited. The gateway then comes back online.
2. I have an alert when LTE traffic exceeds a threshold so I can investigate. The alert system uses telegraf/influxdb/grafana and pushover.