Multi-WAN gateway failover not switching back to tier 1 gw after back online
-
Failback seems to work provided the PC's connection is left idle for 20 seconds or so, but if there's an active connection after your primary connection goes down (VoIP, video/audio streaming, or even a continuous ping), it seems to remain on the redundant connection.
The following script seems to work for my situation (4G modem failover with limited quota). It's nowhere near perfect, but it shuts the 4G interface down long enough for the states to be killed when the primary WAN is up. It would be better if it exited when there are no active states on 4G, but meh.
(Using cron to run every 5 minutes or so: */5 * * * * root /bin/sh /root/routercheck.sh)
#!/bin/sh
check_wan1=8.8.8.8
check_wan2=8.8.4.4

# IPv4 addresses of the primary (rl0) and backup (rl1) interfaces
wan_ipaddress=$(ifconfig rl0 | grep 'inet ' | awk '{ print $2}' | cut -d'/' -f1)
backupwan_ipaddress=$(ifconfig rl1 | grep 'inet ' | awk '{ print $2}' | cut -d'/' -f1)

# If the backup WAN does not answer, stop here
ping -c 2 -S ${backupwan_ipaddress} ${check_wan2} > /dev/null 2>&1
backupwan_resp=$?
if [ ${backupwan_resp} -gt 0 ]; then
    exit 1
fi

# If the primary WAN answers, bounce the backup interface so its states get killed
ping -c 2 -S ${wan_ipaddress} ${check_wan1} > /dev/null 2>&1
wan_resp=$?
if [ ${wan_resp} -eq 0 ]; then
    #service netif restart rl1
    ifconfig rl1 down;sleep 15;ifconfig rl1 up
fi
#end
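The "exit if there are no active states on 4G" improvement mentioned in the post could look roughly like the sketch below. `count_if_states` is a hypothetical helper, and it assumes the first column of `pfctl -s state` output is the interface name, which may not hold on every version.

```shell
#!/bin/sh
# Count state-table lines whose first column is the given interface name.
# Reads `pfctl -s state`-style text on stdin (assumed layout: ifname first).
count_if_states() {
    awk -v ifname="$1" '$1 == ifname { n++ } END { print n + 0 }'
}

# Hypothetical usage on the firewall (rl1 is the backup interface from the post):
#   n=$(pfctl -s state | count_if_states rl1)
#   [ "$n" -eq 0 ] && exit 0    # nothing is using the 4G link, leave it alone
```

Placed before the `ifconfig rl1 down` step, a check like this would avoid bouncing the 4G interface when nothing is using it.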
-
Thank you for this…
I am not a script writer, but it would appear I need to change rl0 and rl1 to my specific interfaces. Any other changes necessary?
Also, I have searched for a couple of hours and still cannot find what directory to install the script to, and what command to run at CLI to test. I see that the "Filer" pkg was the preferred way, but is no longer available on my version, 2.3.4.
-
Yeah, it needs to be changed to the physical interface names, not the names assigned in pfSense. The script location can be anywhere; I just saved mine under /root/failback.sh. You'll need to make it executable after saving (chmod 775 scriptname.sh should do it). As long as the path in your cron entry points to the script, it can go anywhere.
I'm thinking it may be better to just leave the 4G interface down until the WAN stops responding; that may have a better outcome, but the script still seems to do the job.
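For anyone adapting the script to other hardware, here is a sketch that keeps the physical interface names in one place and isolates the address lookup. `WAN_IF`, `BACKUP_IF` and `first_inet` are illustrative names, not part of the original script.

```shell
#!/bin/sh
# Physical interface names (assumption: replace with your own, e.g. em0/em1)
WAN_IF="rl0"
BACKUP_IF="rl1"

# Print the first IPv4 address found in ifconfig-style output on stdin.
first_inet() {
    awk '$1 == "inet" { print $2; exit }' | cut -d'/' -f1
}

# Hypothetical usage on the firewall:
#   wan_ipaddress=$(ifconfig "$WAN_IF" | first_inet)
#   backupwan_ipaddress=$(ifconfig "$BACKUP_IF" | first_inet)
#
# To try a saved script by hand before adding it to cron:
#   sh -n /root/failback.sh    # syntax check only
#   sh -x /root/failback.sh    # run once with command tracing
```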
-
I've changed it so the 4G interface stays down until the primary WAN stops working. This time cron is set to run every minute; the most time you should lose connection is maybe 70 or 80 seconds, as it takes some time for the gateway to register as online again.
#!/bin/sh
check_wan1=8.8.8.8
#check_wan2=8.8.4.4

wan_ipaddress=$(ifconfig rl0 | grep 'inet ' | awk '{ print $2}' | cut -d'/' -f1)
#backupwan_ipaddress=$(ifconfig rl1 | grep 'inet ' | awk '{ print $2}' | cut -d'/' -f1)
#ping -c 2 -S ${backupwan_ipaddress} ${check_wan2} > /dev/null 2>&1
#backupwan_resp=$?
#if [ ${backupwan_resp} -eq 1 ]; then
#    exit 1
#fi

ping -c 2 -S ${wan_ipaddress} ${check_wan1} > /dev/null 2>&1
wan_resp=$?
# Primary WAN answers: keep the 4G interface down
if [ ${wan_resp} -eq 0 ]; then
    ifconfig rl1 down
fi
# Primary WAN does not answer: bring the 4G interface up
if [ ${wan_resp} -gt 0 ]; then
    #service netif restart rl1
    ifconfig rl1 up
fi
#end
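Since this version runs every minute, one possible refinement is to touch the interface only when its state actually needs to change. This is a sketch under assumptions: `wanted_backup_state` and `current_if_state` are hypothetical helpers, and the second one assumes `ifconfig` prints an `UP` flag inside `<...>` on the flags line.

```shell
#!/bin/sh
# Map the ping exit status through the primary WAN to the wanted backup state:
# 0 (primary answers) -> backup should be down, anything else -> backup up.
wanted_backup_state() {
    if [ "$1" -eq 0 ]; then echo down; else echo up; fi
}

# Report "up" or "down" from ifconfig-style output on stdin (checks the flags).
current_if_state() {
    if grep -q '<UP[,>]'; then echo up; else echo down; fi
}

# Hypothetical usage on the firewall:
#   wanted=$(wanted_backup_state "$wan_resp")
#   current=$(ifconfig rl1 | current_if_state)
#   [ "$wanted" != "$current" ] && ifconfig rl1 "$wanted"
```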
-
Hello.
I have a similar problem with failover. I use one openvpn client as tier1 and the second openvpn client as tier2.
After tier1 is online, pfsense does not switch back from tier2 to tier1.
Is the solution from kimkhan suitable for me? Any other solutions?
-
Dear pfSense staff, this is a very important issue. Can we find a solution? :-)
-
I may have found my problem in pfSense 2.3.4.
My setup in System > Routing > Gateway Groups:
OpenVPN Client1 = tier1
OpenVPN Client2 = tier2
I have not set 'Default gateway switching' or anything else.
In System > Routing > Gateways I have set:
Monitor IP of OpenVPN Client1 = 8.8.8.8
Monitor IP of OpenVPN Client2 = 8.8.4.4
If I disable OpenVPN Client1, pfSense switches to OpenVPN Client2 correctly, but only after I click 'Apply Settings' in "System > Routing > Gateways".
If I re-enable OpenVPN Client1, pfSense switches back to OpenVPN Client1, but again only after I click 'Apply Settings' in "System > Routing > Gateways".
Does anybody have an idea which settings I should change?
-
I have the same issue with my failover WAN. If WAN1 goes offline, switching to WAN2 works correctly, but it doesn't switch back to WAN1 when it is available again. I have to deactivate and reactivate WAN1 manually.
Does anyone have a solution?
I've set up multi-WAN as in the screenshots below:
13:37 WAN1 goes offline
14:00 WAN1 is available again but it doesn't switch back to WAN1
System Logs:
Feb 27 13:37:48 rc.gateway_alarm 91471 >>> Gateway alarm: WAN1GW (Addr:8.8.4.4 Alarm:1 RTT:18713ms RTTsd:4408ms Loss:22%)
Feb 27 13:37:48 check_reload_status updating dyndns WAN1GW
Feb 27 13:37:48 check_reload_status Restarting ipsec tunnels
Feb 27 13:37:48 check_reload_status Restarting OpenVPN tunnels/interfaces
Feb 27 13:37:48 check_reload_status Reloading filter
Feb 27 13:37:50 php-fpm 5341 /rc.dyndns.update: Default gateway down setting WAN2_PPPOE as default!
Feb 27 13:37:50 php-fpm 5341 /rc.dyndns.update: MONITOR: WAN1GW is down, omitting from routing group DualWAN 8.8.4.4|192.168.100.2|WAN1GW|18.747ms|4.479ms|25%|down
Feb 27 13:37:50 php-fpm 5341 /rc.dyndns.update: Default gateway down setting WAN2_PPPOE as default!
Feb 27 13:37:50 php-fpm 65471 /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WAN1GW.
Feb 27 13:37:50 php-fpm 65471 /rc.openvpn: Default gateway down setting WAN2_PPPOE as default!
Feb 27 13:37:50 php-fpm 65471 /rc.filter_configure_sync: Default gateway down setting WAN2_PPPOE as default!
Feb 27 13:40:20 php-fpm 11774 /services_dyndns.php: Default gateway down setting WAN2_PPPOE as default!
Feb 27 13:40:24 check_reload_status Syncing firewall
Feb 27 13:40:24 php-fpm 11697 /services_dyndns.php: Default gateway down setting WAN2_PPPOE as default!
Feb 27 14:11:47 check_reload_status Syncing firewall
Feb 27 14:11:47 check_reload_status Reloading filter
Feb 27 14:11:48 php-fpm 23963 /rc.filter_configure_sync: Default gateway down setting WAN2_PPPOE as default!
Feb 27 14:20:59 check_reload_status Syncing firewall
Feb 27 14:20:59 check_reload_status Reloading filter
Feb 27 14:21:00 php-fpm 34668 /rc.filter_configure_sync: Default gateway down setting WAN2_PPPOE as default!
Feb 27 14:24:10 check_reload_status Syncing firewall
Feb 27 14:24:10 check_reload_status Reloading filter
Feb 27 14:24:11 php-fpm 5833 /rc.filter_configure_sync: Default gateway down setting WAN2_PPPOE as default!
Thanks
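When reading a log like the one above, it can help to pull out only the gateway alarm transitions. The sketch below does that with `sed`; `gateway_alarms` is a hypothetical helper. It assumes the `rc.gateway_alarm` line format shown in this post, and that a later line with `Alarm:0` would mark the gateway as recovered (that reading is an assumption, not something stated in the thread).

```shell
#!/bin/sh
# Reduce system-log text on stdin to "<timestamp> <gateway> alarm=<0|1>"
# for every rc.gateway_alarm entry; all other lines are dropped.
gateway_alarms() {
    sed -n 's/^\(.*\) rc\.gateway_alarm.*Gateway alarm: \([^ ]*\) (.*Alarm:\([01]\).*$/\1 \2 alarm=\3/p'
}

# Hypothetical usage with a saved copy of the log:
#   gateway_alarms < system.log
```

If no recovery entry for WAN1GW shows up after the link is restored, the gateway monitor itself may not have noticed the recovery, which would point at monitoring rather than at the gateway group.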
-
I am having a similar issue, except that I cannot even get a simple failover from WAN1 to WAN2 to work. The logs show that when WAN1 goes down, the default gateway appears to switch. However, traffic does not flow and the pfSense UI hangs (actually becomes very slow). Bringing WAN1 back online does not resume traffic flow either. The quickest way to get things going again is to restart the box.
I have configured dual WAN in the simplest way, similar to yours. I have also tried the suggested configurations that do not use automatic gateway switching, building gateway groups in accordance with the configuration suggestions. I have tried different hardware and rebuilding pfSense from scratch. The frustrating part for me is that I can take a commercial firewall that supports multi-WAN, configure things in a similar fashion, and it works perfectly every time.
I apologize for not having a proven solution. Your getting it to work this far is great. The only difference I see is that many of the multi-WAN configuration recommendations have an additional gateway group that handles flipping connections back the other way.
This is one of many configuration examples out there: https://www.cyberciti.biz/faq/howto-configure-dual-wan-load-balance-failover-pfsense-router/. I found this one to be helpful, as most are. I think in my case it is just my limited experience, or perhaps I have a glaringly simple issue preventing mine from working that I am just missing.
-
@markn455 Did you ever find a working configuration for failing back to a primary connection once it comes back up?
-
Goal: Have automatic failover to the 2nd ISP when the 1st ISP is down. When the 1st ISP comes back, re-enable it as primary in routes.
My ISP setups all use static IP config; I have not tested with a dynamic interface.
Here is what works for me at various sites with multiple ISP failovers.
Step 1:
Navigate to: System -> Advanced -> Miscellaneous
Make sure "Default gateway switching" is UNCHECKED.
Step 2:
Configure your gateway group accordingly. Tier1 is highest priority.
I use Member Down as trigger.
Step 3:
Choose the gateway group in your firewall rules setup.
You find this under Advanced Options for each rule that should use the gateway group.
Simple test to see if it is working: unplug the Tier1 cable from the firewall. It should fail over to Tier2.
Plug the Tier1 cable back in; it should become the default route almost instantly.
-
Just for clarification:
- Default gateway switching is unchecked.
- A single gateway group with Tier 1 gateway being highest priority, and Tier 2 being lower priority, and member down is the trigger.
- Firewall rules use that gateway group.
And that works for failover? If you pull the cord for gateway 1 it switches to gateway 2? And if you reconnect gateway 1 it switches back?
-
@satadru
Yes that is correct. The switch back and forth between tiers is fully automatic.
-
Although this is an older thread, I have the same issue happening with the very latest version, as of September 2019. I have three WAN connections, and one of the gateway groups I have configured has two of the gateways in it. My pfSense will fail over to my Tier 2 connection automatically, but when Tier 1 comes back up, it will not go back to Tier 1. I even tried clearing the states, with no change. I tried changing the gateway set as Tier 2, and it just routed all the traffic through that gateway instead of the Tier 1. All gateways are up, and show as up.
What more can I do to debug this? I did not find the "Default Gateway Switching" option where indicated. Indeed, my "default" gateway is the Tier 1 gateway that does not seem to be used by the gateway group.
My config is a bit complex, but I'm happy to try to debug this. Just need direction. Thanks.
Bob
-
I ended up writing a script and running it via cron to achieve the "switch." Yes, it is not elegant, but it gets the job done.
Here's what I have and I run this as a 5-minute cron job.
#!/bin/sh
# get active gateway and current time
CURRENT_TIME="$(date +"%c")"
CURRENT_GW="$(netstat -rn | grep default | awk '{print $4}')"

if [ "$CURRENT_GW" = "em2" ]; then
    #check if WAN1 is up or not
    WAN1_STATUS="$(pfSsh.php playback gatewaystatus brief | grep WANGW | awk '{print $2}')"
    if [ "$WAN1_STATUS" = "none" ]; then
        #WAN1 is back online, stop/start WAN2
        echo "$CURRENT_TIME: Bringing down WAN2"
        ifconfig em2 down
        echo "$CURRENT_TIME: Sleeping for 30s"
        sleep 30
        echo "$CURRENT_TIME: Bringing up WAN2"
        ifconfig em2 up
    else
        echo "$CURRENT_TIME: WAN1 is still down"
    fi
else
    echo "$CURRENT_TIME: Nothing to do!"
fi
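One fragile spot in the script above is `grep WANGW`, which would also match a gateway whose name merely contains that string. A sketch of an exact-match lookup follows. `gateway_status` is a hypothetical helper, and the layout (gateway name in column 1, status in column 2, with "none" meaning no alarm) is inferred from the script rather than confirmed.

```shell
#!/bin/sh
# Print the status column for exactly the named gateway.
# Reads `pfSsh.php playback gatewaystatus brief`-style text on stdin
# (assumed layout: name in column 1, status in column 2).
gateway_status() {
    awk -v gw="$1" '$1 == gw { print $2; exit }'
}

# Hypothetical usage on the firewall:
#   WAN1_STATUS=$(pfSsh.php playback gatewaystatus brief | gateway_status WANGW)
```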
-
Hey. Thanks @ibbetsion for the script.
Here is a slightly modified version that kills firewall states when there are connections remaining on WAN2 and WAN1 is back online.
Works great for my needs ( LTE failover ).
I set it as a cron, every minute:
*/1 * * * * /root/clear_state_back_from_failover_cron.sh >> /root/clear_state_back_from_failover_cron.log
- I also checked "Flush all states when a gateway goes down" in System / Advanced / Miscellaneous.
- The LTE gateway has monitoring disabled "Disable Gateway Monitoring" in System / Routing / Gateways. Otherwise states will be created on the interface and the script becomes wrong. Also, monitoring would consume data and I did not want that.
Code:
#!/bin/sh
# *** kills firewall states on failover WAN when WAN1 is up ***
WAN1_NAME="WAN_DHCP"
WAN2_IF=ue0
WAN2_GW_IP=192.168.3.1

CURRENT_TIME="$(date +"%c")"
WAN1_STATUS=`pfSsh.php playback gatewaystatus brief | grep "$WAN1_NAME" | awk '{print $2}'`

if [ "$WAN1_STATUS" = "none" ]; then
    # the following line may need to be tweaked depending on your needs
    WAN2_NSTATES=`pfctl -s state | grep "$WAN2_IF" | grep -v " -> $WAN2_GW_IP" | wc -l`
    if [ "$WAN2_NSTATES" -gt 0 ]; then
        echo "$CURRENT_TIME: WAN1 is online, but connections remain on $WAN2_IF. Killing states."
        pfctl -F state
    fi
else
    echo "$CURRENT_TIME: WAN1 is down"
fi
-
I'm really surprised pfSense has nothing built in to handle this yet. This has been ongoing since 2017. In my case, my LTE modem (unlimited data) is still in gateway monitoring mode, so I'll be using @ibbetsion's script. Thanks @ibbetsion
-
EDIT2:
Issues not fixed. If I pull the cable and put it right back in, it messes up Multi-WAN and will not switch back correctly.
EDIT:
Resetting to defaults and setting everything up again seems to have fixed my issues.
Old:
I think I found the cause:
Multi-WAN (or maybe dpinger) seems not to work properly if the interface goes down and back up (unplugging and replugging).
In my tests I was just unplugging the cable on the WAN port.
I think the same happens with PPPoE or any time the link goes down and up again (physically).
This should not happen, in my opinion. If modems reboot and so on, Multi-WAN would stop working.
I use "Packet Loss" as the trigger level on the gateway group.
Would love to hear from you, thanks.
-
Hello!
I have several sites using multi wan and gateway groups with a mixture of static, dhcp, and pppoe. They all behave as expected.
Are you policy routing all of your WAN bound traffic?
"Defining gateway groups is only part of the story. Traffic must be assigned to these gateways using the Gateway setting on firewall rules."
https://docs.netgate.com/pfsense/en/latest/routing/multi-wan.html#firewall-rules
My experience is that you can't depend on the system routing table having your "preferred" (tier1) default route.
John
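A rough way to check the policy-routing question from the shell is to count how many loaded pass rules carry pf's `route-to` keyword. This is only a sketch: `count_policy_routed_rules` is a hypothetical helper, and it assumes rules with a gateway or gateway group set show up with `route-to` in `pfctl -sr` output.

```shell
#!/bin/sh
# Count pass rules that contain pf's route-to keyword.
# Reads `pfctl -sr`-style text on stdin.
count_policy_routed_rules() {
    awk '/^pass/ && /route-to/ { n++ } END { print n + 0 }'
}

# Hypothetical usage on the firewall:
#   pfctl -sr | count_policy_routed_rules
```

A result of 0 would suggest that traffic is following the system routing table rather than the gateway group.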
-
@serbus
Thank you.
The strange thing is: if I don't unplug a cable when testing and instead switch off the internet connection without pulling the cable, everything works just as expected, every time. As soon as I unplug and replug, I have to save the interface settings, for example, to get switching back to the default tier working again.
I checked almost every configuration before and nothing really helped.