Multi-WAN gateway failover not switching back to Tier 1 gateway after it comes back online
-
Do you have one or two Gateway Groups defined? I'm looking at the one with time stamp 02-25-44.
What you call "WANGROUP" is easier to handle when called "PPPoE 2 UPC".
Now you need an additional "UPC 2 PPPoE" group with the tiers reversed.
Add another firewall rule for that group as well and it should work. And start by setting both "Trigger levels" to "Member Down".
-
Hi again
Is this a temporary workaround you've found to make it work, or is it the normal configuration for failover? I can't understand why we would have to create two Gateway Groups, because we only want failover in one direction, not the opposite. I've been reviewing the documentation and online info, and you should only need to create one gateway group.
When you create the second firewall rule with the inverted tier numbers, if that rule comes after the normal rule, the firewall should never reach it, because rules are evaluated sequentially: when it reaches the first one, it directs the traffic through the main group (the group is online, because WAN2 is online). It should never reach the second rule, so if it is being reached in your case, I think something strange is happening.
I'm probably wrong or missing something, but I just want to clarify whether your configuration works because it's the normal setup, or whether some other problem makes it work even though it's not the right configuration.
I can't test it at my client's site right now, because it's a production system, but I'll try to set up a demo in our office to see if I can verify your configuration. If it works, it could be a good temporary patch for the problem, but I still think something is not going well. The group should restore gateways and their order of preference automatically (that's what the tiers are for).
Thanks for your help
-
Hi. Indeed, a second rule makes no sense to me either, but I will test it as advised; it will actually be the second time I test it. I've even tried with 3 rules, with the same results.
In the meantime I've done more testing and reached a conclusion, but first please let me know: how did you simulate the main WAN failure?
-
A second rule and gateway group is not necessary unless you want some traffic to prefer the second route and fail over in the other direction.
You only need the one Tier 1 to Tier 2 group to fail all traffic over in that direction.
It certainly should recover and "fail back" when the Tier 1 route comes back up.
-
Well, in my case it doesn't. WAN1 is a PPPoE connection, and after I re-plug the Ethernet cable into WAN1, all connections still go through OPT1.
For testing purposes, I added another router in front of pfSense so it wouldn't have to use a PPPoE connection, and I assigned WAN1 a static IP like OPT1 has. In this case it works, to some extent, if I unplug/re-plug the connection on the first router (taking down the ISP interface, the fiber media converter) so that both WAN and OPT1 stay up in pfSense. Still, some sites refuse to load in Chrome with the following error: DNS_PROBE_FINISHED_NXDOMAIN
So, for now, my only conclusion is that there is a problem with pfSense when you unplug and re-plug the cable on an interface using a PPPoE connection. The DNS error is still a mystery to me; I still need to figure it out.
-
Well, you need to fix your DNS. It sounds like it might not be working right on one or both WANs. Are you using the forwarder or the resolver?
It shouldn't matter which WAN the resolver uses, because it should only be talking to authoritative name servers, which should accept queries from everywhere.
The problem lies with forwarders: you usually point the forwarder at ISP caching servers, and they might only accept connections from their own network, so it matters which DNS servers are used out which interface.
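One quick way to narrow this down is to query pfSense and a public resolver side by side from a LAN client. This is only a sketch: 192.168.1.1 is assumed to be the pfSense LAN address, and example.com stands in for one of the names that fails to resolve.

```shell
# Compare answers from pfSense's forwarder/resolver with a public resolver.
nslookup example.com 192.168.1.1   # what pfSense hands out
nslookup example.com 8.8.8.8       # bypassing pfSense entirely
```

If the second query works while the first fails, the problem is in pfSense's DNS service (or the routes its upstream servers use), not on the WAN itself.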
-
I tried both the resolver and the forwarder; some sites are just not resolved. Unfortunately, I don't think I can use pfSense in a production environment; for me, at least, failover is not working with PPPoE :(.
Should "State Killing on Gateway Failure" be on?
Thanks
-
Depends on whether or not you want states killed on a gateway failure.
-
Well, isn't it better to have them reset on a gateway failure? The description of this option is a bit tricky.
-
I only skimmed through this thread, so I apologize if this was already suggested, but are you certain your clients are set to use the pfSense IP as their DNS resolver? If, for example, you have a gateway defined with a custom monitor IP of 8.8.8.8, or the DNS servers on your General settings page are locked to a specific gateway, then static routes are built which force traffic out that specific gateway, even if it's down. This could result in DNS being "dead" when one of the gateways goes down. Is this possibly what's happening?
-
Monitor IPs are currently set to one from each ISP; in General I have a pair of DNS servers set for each gateway (four servers in total). The clients' DNS is manually set to 192.168.1.1 (pfSense).
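As a sketch of how to check luckman212's theory: the static routes pfSense adds for DNS servers and monitor IPs show up in the routing table as host routes. The addresses below are placeholders for your own configured servers.

```shell
# From Diagnostics > Command Prompt (or ssh) on pfSense: list host routes.
# Each DNS/monitor IP that has been pinned to a gateway appears here with
# that gateway as its next hop, even while the gateway is down.
netstat -rn | grep -E '^(8\.8\.8\.8|8\.8\.4\.4)'   # substitute your DNS/monitor IPs
```

If a server is still routed out the failed WAN after recovery, that would explain DNS staying dead.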
-
I tried both the resolver and the forwarder, some sites are just not resolved.
If you do not know how to get more information than that about what's actually happening, you are probably in over your head.
-
Oh, nice. What can I say? Thanks? :) … Thanks.
-
Hi
yanakis, in my case we have a fiber media converter and a router (not PPPoE), and the same thing happens. I switched off or disconnected the router, but never tried switching off the media converter (good idea).
In my latest installations I don't usually use the DNS forwarder/resolver for localhost, but in this case I do (I configured it in the past and never changed it). Have you tried deactivating that option in General Settings? Just to see if something changes.
I understand luckman212's concerns about DNS and the static routes pfSense creates for each DNS server associated with a WAN, but in my case we had two different DNS servers configured and working, and it still failed. And in any case, once the WAN recovers, DNS works again and everything should work again.
By the way, I tried with "State Killing on Gateway Failure" both on and off, and recovery fails in both cases. I keep it unchecked, because with external SIP connections that is mandatory to make failover work (at least in my case). And I personally prefer to reset states if a gateway fails, to avoid problems.
Regards
PS: I don't think you are in over your head… Thanks for all the help.
-
Well, I left the DNS fields in General empty, but failback to WAN still doesn't happen after WAN recovery unless I change something in Firewall or Routing and apply the changes :(
-
…and this looks like a pfsense problem...
I cannot second that!
I have had this working for quite some time now with WAN1 (100Mb cable) and a rather old WAN2 (6Mb DSL).
I have failover to WAN2 if WAN1 is down, and immediately back to WAN1 when it's available. Show us your System > Routing > Gateway Groups page.
Hi Cris. Can you please post your setup? Thanks
-
Well, Derelict wrote that my config seems to be a bit more complicated than necessary.
Since I trust him, I would usually test his suggestion first and post afterwards; I just don't have the time for that in the foreseeable future… I only use DSL as failover (it's 6Mbit) and rely on cable, which is 100Mbit.
You will know why if you have teenage kids...
Just checked: failover to DSL is working, as is fallback to cable when it's available again. I set this up about a year ago using the pfSense docs.
-
I have mine set up exactly like you do (a group with Tier 1 Cable / Tier 2 DSL, and a group with Tier 1 DSL / Tier 2 Cable).
I was just commenting that the second group is only necessary if you want rules that prefer the other circuit while maintaining the ability for those rules to fail over too.
I tested my failover yesterday, since I was putting new splitters on my cable in anticipation of MoCA 2.0. It all worked exactly as configured, and when I was done it brought my Tier 1 back online, just as it has many times before.
-
I didn't criticize! I only mentioned that I might be done with half the work. ;)
-
Hi again
Last week we installed a new machine with 2.2.4 and two WANs with failover, and we saw the same problem. In this case we have two different LANs, and each one has a failover group with a different order (one with wan1->wan2 and the other with wan2->wan1), and neither of them redirects traffic back to the main gateway when it has recovered (we have to make some change and Save, as yanakis says).
It's a new installation without anything strange. We ran several tests in both directions, and I can confirm the problem exists. It never went back automatically to the recovered main WAN.
We didn't find anything new, or any more clues; it just doesn't work.
Regards
-
+1 with arcanos.
In fact, instead of attempting to troubleshoot this and failing, it would be better if someone who has this working would post a complete series of screenshots showing their setup. Then we can all learn from a working environment.
-
Is PPPoE a common factor for those that don't work? Both my WANs are DHCP.
There's really nothing to it: create a gateway group with a Tier 1 and a Tier 2, with "Member Down" as the trigger level, and policy route to it.
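For what it's worth, you can also ask pfSense directly what it thinks each gateway's status is, which helps tell a routing problem from a monitoring problem. This assumes the stock `gatewaystatus` playback script is present on your version:

```shell
# pfSense developer shell playback: dump the current gateway status
# (name, monitor IP, latency, loss, and up/down flag for each gateway).
pfSsh.php playback gatewaystatus
```

If the Tier 1 member shows as up here but traffic still leaves via Tier 2, the states rather than the gateway monitoring are the suspect.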
-
Correct, PPPoE is the default and DHCP is the failover.
-
Not PPPoE in my case. This latest case is two cable connections, with routers doing NAT and DMZ pointing to the WAN interface of pfSense. But I've also seen the problem with DSL and cable in bridge mode.
-
At first view I have the same issue, but looking deeper I can see that my gateway really stays offline until I re-apply the interface config page or reboot.
Short description of the config and behavior:
I have two gateways (1. fiber / 2. cable modem) with a routing group for load balancing (Tier 1 / Tier 1). Both gateways are monitored against external DNS servers. The routing group is set as the gateway in a firewall rule. "Use sticky connections" under System > Advanced > Miscellaneous is on.
At the beginning, after a reboot, all works fine and traffic is distributed to both gateways with a weight of 1:4.
But after some minutes or hours, the second gateway always goes offline (100% packet loss) and keeps that status until I re-apply the interface config or reboot. It's not an apinger problem; the gateway is really broken. A ping from Diagnostics > Ping sourced from that gateway's interface doesn't work (100% loss). The cable modem isn't disconnected, and it works if I plug a notebook in there. So pfSense really stops the gateway and keeps it broken. Even if I disconnect the LAN wire and reconnect, there is no reaction.
The same happens on my backup pfSense, which is running in CARP mode. There is no traffic load, but the GW stops as well.
If I set the routing group to redundant mode (GW1 Tier 1 / GW2 Tier 2, or GW2 Tier 1 / GW1 Tier 2), then all works OK. The gateways stay online. Also, after reconnecting the wire, the interface comes online again.
My assessment is that there must be something wrong with load-balancing gateways. But I need the capacity of both gateways.
-
You should put a working monitor IP on each interface, like a DNS IP.
-
Hello,
I'm facing the same problem, and I've read the threads above (with no solution).
I'm running 2.3.1 with 2 WANs (1 cable/main and 1 DSL-PPPoE/secondary) and 2 gateway groups. Failover is working (the trigger fires), but it does not switch back after the weak connection is back at 100%.
I'm ready to send screenshots on request.
Logs:
Jun 16 13:49:20 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Alarm latency 9274us stddev 5829us loss 21%
Jun 16 13:49:42 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Clear latency 7586us stddev 4056us loss 15%
Jun 16 13:53:58 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Alarm latency 7362us stddev 4941us loss 21%
Jun 16 13:54:15 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Clear latency 6543us stddev 3719us loss 19%
Jun 16 13:54:39 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Alarm latency 6692us stddev 3840us loss 21%
Jun 16 13:54:57 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Clear latency 6644us stddev 3338us loss 15%
Jun 16 13:56:03 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Alarm latency 8839us stddev 5402us loss 21%
Jun 16 13:56:19 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Clear latency 8292us stddev 4864us loss 19%
Jun 16 13:56:43 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Alarm latency 8431us stddev 5556us loss 22%
Jun 16 13:57:02 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Clear latency 7940us stddev 5158us loss 15%
Jun 16 13:58:35 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Alarm latency 12630us stddev 12111us loss 21%
Jun 16 13:58:53 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Clear latency 8282us stddev 4592us loss 15%
Jun 16 13:59:21 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Alarm latency 8983us stddev 5856us loss 21%
Jun 16 13:59:32 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Clear latency 8447us stddev 5473us loss 16%
Jun 16 13:59:58 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Alarm latency 8206us stddev 5630us loss 21%
Jun 16 14:00:11 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Clear latency 7373us stddev 4132us loss 14%
Jun 16 14:01:14 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Alarm latency 8049us stddev 4691us loss 21%
Jun 16 14:01:44 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Clear latency 7842us stddev 3865us loss 18%
Jun 16 14:01:47 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Alarm latency 7944us stddev 3892us loss 21%
Jun 16 14:02:18 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Clear latency 7717us stddev 3673us loss 12%
Jun 16 14:03:51 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Alarm latency 7952us stddev 4608us loss 21%
Jun 16 14:04:16 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Clear latency 7030us stddev 3415us loss 12%
Jun 16 14:04:28 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Alarm latency 7711us stddev 4555us loss 21%
Jun 16 14:04:56 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Clear latency 7538us stddev 4081us loss 14%
Jun 16 14:05:10 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Alarm latency 8504us stddev 5216us loss 21%
Jun 16 14:05:32 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Clear latency 8245us stddev 4794us loss 13%
Jun 16 14:05:51 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Alarm latency 8544us stddev 5200us loss 21%
Jun 16 14:06:14 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Clear latency 7369us stddev 3934us loss 16%
Jun 16 14:06:26 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Alarm latency 7862us stddev 4613us loss 21%
Jun 16 14:06:56 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Clear latency 7151us stddev 3861us loss 13%
Jun 16 14:11:12 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Alarm latency 7910us stddev 4976us loss 21%
Jun 16 14:11:26 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Clear latency 6648us stddev 3553us loss 15%
Jun 16 14:11:49 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Alarm latency 6910us stddev 4027us loss 21%
Jun 16 14:12:11 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Clear latency 6271us stddev 2901us loss 15%
Jun 16 14:12:28 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Alarm latency 6705us stddev 3698us loss 21%
Jun 16 14:12:52 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Clear latency 6371us stddev 2763us loss 11%
Jun 16 14:13:45 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Alarm latency 8346us stddev 5486us loss 21%
Jun 16 14:14:57 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Clear latency 8065us stddev 5624us loss 16%
-
I would change the monitor IP in the WAN2CABLEGW to 8.8.8.8 or anything else that responds reliably and see if things improve. You can't expect any multi-WAN routing solution to perform with any semblance of continuity with flapping like that.
-
Actually, the monitor IP is the default, so it is the gateway of each WAN.
I will give one like Google's (8.8.8.8) a try, but the problem is not the monitor IP or the trigger; the problem is that when the defective WAN is back to normal (ping to the monitor IP is better quality), the system does not switch back. I mean, if I log in to pfSense (hours after the problem) and look at my Gateway Groups status, they are all green (same for the gateways), but the system does not switch back to the preferred gateway.
-
No, the problem is that your gateway is flapping about every minute due to packet loss to your monitor IP. If that were in my multi-WAN group, I would disable it until it was fixed. If that's "just the way it is", you will need to increase your monitoring thresholds and consider it up even when it performs like that.
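As a rough sketch of how to quantify the flapping, you can count the Alarm/Clear events in a plain-text copy of the gateway log. The `count_flaps` helper name and the idea of exporting the log to a text file are illustrative; older pfSense stores `/var/log/gateways.log` in circular clog format, so convert it to plain text first.

```shell
# Summarize how often dpinger raised and cleared the gateway alarm
# in a plain-text gateway log file passed as the first argument.
count_flaps() {
    log=$1
    alarms=$(grep -c ': Alarm ' "$log")
    clears=$(grep -c ': Clear ' "$log")
    echo "alarms=$alarms clears=$clears"
}
# Example: count_flaps /tmp/gateways.txt
```

Dozens of alarms per hour, as in the log above, means the group is failing over and back constantly, which no failover configuration can smooth over.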
-
OK, my gateway has "trouble" for a few minutes a day (not all the time).
That's precisely why I want failover.
And this does not explain why the system does not go back to its first WAN after that WAN is shown as green again (hours later!).
If failover is only meant to work on connections that never have trouble, it makes no sense to me…
-
I am pretty sure this is exactly related to my issue and my most recent detailed post here:
https://forum.pfsense.org/index.php?topic=86851.msg632594#msg632594
-
Hello JM,
Globally, it seems to be the same problem reported by most people in your thread. I've seen a bug report, but the dev team considers it not a bug but a misconfiguration, without explaining where the misconfiguration is… strange.
-
It is not a bug.
A setting that kills all states on a Tier X interface when a Tier < X interface returns to service would be a feature request.
I did not see one for this on redmine.pfsense.org.
-
After much reading on this subject, this is the first time I've seen it stated that this is normal behavior and that changing it would be a feature request. Elsewhere I've read that it was the result of misconfiguration, implying that connections should go back to how they were before failover…
For example, in https://redmine.pfsense.org/issues/5090, Chris Buechler first writes:
…
I went through and re-tested multi-WAN in general on 2.2.5 (which is the same as 2.2.4 in that regard) and it fails over and back as it should just fine every time.
...
There may be some edge case but nothing here to suggest what that might be.
But a few lines later, it goes the other way (Chris Buechler again):
…
that's how it's supposed to work at this point. Sounds like you want state killing on failback, which doesn't exist at this time. Feature #855 covers that: https://redmine.pfsense.org/issues/855
So the final answer is: FAILOVER DOES NOT GO BACK TO THE INITIAL STATE.
This is surprising, but knowing it, I'll stop losing time trying different config options…
-
Right, but if it's not a bug, then how do you get traffic to go back over the original interface when it returns online?
Killing the states does not always work.
I have also been able to confirm that a brand-new device connected to the network will still route the same way (onto the failover interface), even if the primary WAN came back online BEFORE the new device was connected.
I have also been testing this in a virtual environment and can replicate the issue, although it is not always the same: sometimes new states will follow the correct route (back over the primary WAN), and other times they will get stuck on the backup WAN. It is not consistent, which doesn't make sense.
-
Let's be clear: to me it is a bug. But if they say no, I have no choice.
Currently, I reset all states, and sometimes I change the firewall rule (time-consuming!!!). If there is a better proposal, I'm interested.
-
Killing the states does not always work.
Please demonstrate with evidence.
-
@MrD:
I did not see one for this on redmine.pfsense.org.
https://redmine.pfsense.org/issues/855
There. Feature #855. My redmine searching could obviously use a tune-up.
-
Please demonstrate with evidence.
OK, so in very basic terms, since I already have quite a lot of information in this post: https://forum.pfsense.org/index.php?topic=86851.msg632594#msg632594

1. The connection has failed over to the backup WAN when the primary WAN went down (failover worked as expected).
2. The primary WAN has come back up (Status > Gateways confirms it is up/online).
3. The states (VoIP sessions for the phones) are still showing in the state table 12 hours later, going over the backup WAN.
4. No new or refreshed sessions from the phones go over the primary connection.
5. The current state table (filtered by the phone with IP 10.10.30.55) looks like this:
WAN_EFM udp 135.196.xxx.xxx:41809 (10.10.30.55:49679) -> 185.83.xxx.xxx:5060 MULTIPLE:MULTIPLE 201.589 K / 102.513 K 125.60 MiB / 39.52 MiB
30VOICELAN udp 185.83.xxx.xxx:5060 <- 10.10.30.55:49679 MULTIPLE:MULTIPLE 99.293 K / 99.502 K 61.87 MiB / 38.35 MiB
To clarify:
WAN_EFM - the backup WAN connection
30VOICELAN - the LAN network for the phones
135.196.xxx.xxx - the IP of my backup WAN connection
185.83.xxx.xxx - the IP of my externally hosted VoIP platform
6. I have then "Reset the firewall state table".
7. At this point, SOMETIMES the states will clear, obey the correct gateway failover rule, and be sent back over the primary WAN. SOMETIMES they will stay where they are (on the backup WAN).

I can understand the argument that it is a feature request to have the states cleared when the primary WAN connection is re-established. However, why have I seen the following?

1. The primary connection has been down for a length of time and has since come back online.
2. A brand-new device which has never connected to the network (and therefore has no open states) is connected.
3. This new device's states are sent over the backup WAN, even though the primary WAN is available.
4. After a "Reset the firewall state table", the new device has states over the primary WAN (as it should have had when it first connected to the network).
I also ran a test of this in a virtual environment and simulated the primary WAN connection dropping and reconnecting.
I was using a Linux machine as a test client, just running PING and TRACEROUTE to create example states on the firewall (eliminating the VoIP aspect).
Sometimes, when you brought the primary WAN connection back online, a new TRACEROUTE to a different IP address would go over the primary WAN; other times it would remain over the backup WAN.
I have not been able to prove what causes this; it appears random. In my mind, if the primary WAN connection is reconnected and online, then any NEW state that hits the firewall should always follow the gateway group rule and go over the Tier 1 connection.
Also, why does running pfctl and targeting the relevant hosts/network not force clearing of the states just for the VoIP devices (without clearing the whole state table)?
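For reference, pf's state killing can target a single host or network rather than the whole table. A sketch, using the 10.10.30.x phone network from the state-table output above:

```shell
# Kill only the states originating from the VoIP network, so the phones'
# next packets create fresh states that re-evaluate the gateway group.
pfctl -k 10.10.30.0/24
# A second -k filters by destination: kill states from anywhere TO that network.
pfctl -k 0.0.0.0/0 -k 10.10.30.0/24
```

In theory, states re-created after the primary WAN is back up should follow the Tier 1 member; if they sometimes don't, something else (a stale route, or sticky connections) may be interfering.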
Another simple way of putting it…
If your primary connection goes down for an hour and then comes back online, at what point should your traffic start to reuse that connection again? What if your "backup" connection has a very high data-usage charge?
A bit of history for you…
I used to use DrayTek equipment for all my client sites. On their old 2830 series of routers, they had the WAN failover options, but the same applied: if the primary went down, everything would fail over to the backup and then never fail back when the primary connection returned.
On their newer 2860 series, they added one simple check box labelled "Failback", and it moved your sessions/states back to the correct primary connection when it was available again.
However, on the DrayTek I never had the issue where a NEW state/session would still go over the backup WAN when the primary was available. If it was a new session, it always followed the rules correctly.
I hope that makes sense to some of you :)
-