Netgate Discussion Forum

    Multi-WAN gateway failover not switching back to tier 1 gw after back online

    Routing and Multi WAN
    119 Posts 35 Posters 53.7k Views
      sandrino

      Hi all,

      Same problem here. WAN1 is tier 1 (cable, default GW, 2Mb/2Mb) and WAN2 is also tier 1 (WiMAX PPPoE, 12Mb/3Mb), with weights WAN1 1 : WAN2 4.

      If WAN2 goes down, all traffic switches to WAN1.

      When WAN2 comes back online (the gateway group shows all gateways online), all connections stay on WAN1.

      If I reload the filter, everything goes back to normal, with weights WAN1 1 : WAN2 4.

      I don't use the DNS Forwarder, and for monitoring I use the ISP DNS servers (2 per connection).

      Please help!!!!

      Bye
      Sandro

        sandrino

        Hi

        In the "Miscellaneous" settings, under "Gateway Monitoring", there are these options:

        Gateway Monitoring

        State Killing on Gateway Failure
          Flush all states when a gateway goes down
          The monitoring process will flush all states when a gateway goes down if this box is checked.

        Skip rules when gateway is down
          Do not create rules when gateway is down
          By default, when a rule has a gateway specified and this gateway is down, the rule is created omitting the gateway. This option overrides that behavior by omitting the entire rule instead.

        Could someone explain them?

        Thanks
        Bye
        Sandro
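
The help text for "Skip rules when gateway is down" can be illustrated with a small shell sketch. It only prints text and is not how pfSense generates its ruleset; the interface names (igb1, igb2), the gateway address 203.0.113.1, and the function name render_rule are all made up for the example.

```shell
#!/bin/sh
# Illustration only: print the pf rule that would result for one policy-routed
# LAN rule, depending on the gateway status and the "skip rules" checkbox.
# Usage: render_rule <gateway_status: up|down> <skip_rules: yes|no>
# Interface names and the gateway address are hypothetical.
render_rule() {
    gw_status=$1
    skip_rules=$2
    if [ "$gw_status" = "up" ]; then
        # Gateway is up: the rule forces matching traffic out through that gateway.
        echo "pass in quick on igb1 route-to (igb2 203.0.113.1) inet from any to any"
    elif [ "$skip_rules" = "yes" ]; then
        # Box checked: the entire rule is omitted.
        echo "# (rule omitted)"
    else
        # Default: the rule is created without the gateway.
        echo "pass in quick on igb1 inet from any to any"
    fi
}
```

With the default behavior the rule still passes the traffic, just without the forced gateway; with the box checked, the traffic falls through to whatever rule comes next.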

          devmaybe

          I think I have found a solution.

          I have tested it on the 2.3.2 release. It consists of 2 steps:

          1. Take note of the name you assigned to your PPPoE connection (WAN2 in this example).
          2. Add the following lines at the end of the "/usr/local/sbin/ppp-linkup" script (between the "fi" and "exit 0" lines):

          ------------------------
          fi

          # give the PPPoE link a few seconds to settle before calling rc.newwanip for wan2
          sleep 5
          /etc/rc.newwanip wan2

          exit 0
          ------------------------

          In all my tests traffic switches back correctly.

          Note: without the "sleep" instruction I got mixed results; maybe it is only a timing problem with PPPoE activation?

          Bye

            SecureIS

            +1 that failback would be very valuable. I have a deployment where the Tier 2 connection is pay per GB so it would be nice to be able to automate failover AND failback but I have to keep that WAN disconnected to make sure no connections get stuck on it. It's not a PPPoE link so sadly I can't use an up/down script for this :(

            We need a setting for "Flush all states when a lower tier gateway comes back up. The monitoring process will flush all states when a lower tier gateway comes up if this box is checked"

              luckman212

              I'm working on a script to kill VOIP states when WAN1 (primary) comes back online.  As mentioned elsewhere in this thread, this is a critical feature in real-world scenarios due to (a) costly metered backup connections as well as (b) SIP interop issues when devices behind the same LAN are seen registering from different public IPs.  So I won't rehash all of that. I am trying to automate pfctl from the rc.gateway_alarm script that gets called on WANUP.  I also see that a PR has been recently merged that might help make this even easier and less hacky.  Has anyone hooked into these new functions yet to make this more reliable?

              TL;DR: pfctl is not killing all of the related states. Can someone help me understand something regarding states?

              • Assume vlan100 is dedicated for voice, with subnet 192.168.20.0/24
              • WAN1=primary, WAN2=backup
              • When a "fail back" WAN2 -> WAN1 event happens, I need to kill all states: (any)->WAN2->vlan100 and vlan100->WAN2->(any)
              • I try using a command like:

              pfctl -i igb0_vlan100 -k 0.0.0.0/0

              But this only seems to kill the states originating from inside the LAN. There are still tracked states via WAN2 that are NATed to internal igb0_vlan100 IPs. Do I also need to run commands like these instead?

              pfctl -k 192.168.20.0/24 -k 0.0.0.0/0
              pfctl -k 0.0.0.0/0 -k 192.168.20.0/24

              Or some other command? Is there a better way?
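
One way to experiment with the two-direction idea is to wrap the commands in a function that prints them before running anything. This is a sketch, not a confirmed fix: kill_voice_states, kill_pair and the DRY_RUN variable are hypothetical names, the subnet is the one from the post, and it relies only on the documented `pfctl -k <source> -k <destination>` form. Whether it also clears the NATed states seen on WAN2 is exactly the open question in the post.

```shell
#!/bin/sh
# Kill states between a subnet and everything else, in both directions.
# With DRY_RUN=1 (the default here) the commands are only printed.
DRY_RUN=${DRY_RUN:-1}

# Kill (or print the command for) all states from source $1 to destination $2.
kill_pair() {
    src=$1
    dst=$2
    if [ "$DRY_RUN" = "1" ]; then
        echo "pfctl -k $src -k $dst"
    else
        pfctl -k "$src" -k "$dst"
    fi
}

# Usage: kill_voice_states <subnet>, e.g. kill_voice_states 192.168.20.0/24
kill_voice_states() {
    kill_pair "$1" 0.0.0.0/0
    kill_pair 0.0.0.0/0 "$1"
}
```

Calling `kill_voice_states 192.168.20.0/24` with DRY_RUN=1 shows the two commands; setting DRY_RUN=0 would actually run them, for example from the gateway-up hook.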

                nemanager

                Any news ?  :(

                  kimkhan

                  Failback to default WAN works for me.

                  I have a Gigabit Fiber connection and a Cable modem connection. I put one of them as Tier1 and the other as Tier2.

                  I used 8.8.8.8 for one and 8.8.4.4 for the other.

                  But just following the instructions in the pfSense documentation and the forum posts that suggest creating gateway groups with different tiers will not work unless you have the 'Default gateway switching' box checked. You can find it under System > Advanced > Miscellaneous.

                  http://prntscr.com/evn3ub

                  I tested by disconnecting WAN1 and going to whatismyip.com, then plugging WAN1 back in and going to a different "what is my IP" site. Don't reuse the first site, as the result will be cached and will not show your original/default WAN IP.

                  Or you can just do a ping.

                  Let me know if this helps. I can also post my configuration if you need to see it.

                  KK

                  Netgate SG-2440
                  2.3.3-RELEASE-p1

                    red_cat1930

                    2.3.3-RELEASE-p1 (amd64), MultiWAN, VM on Hyper-V

                    WAN1 ( tier2, monitor ip 8.8.4.4 )
                    WAN2 ( tier1, monitor ip 8.8.8.8 ).

                    Today WAN2 raised a latency alarm, but no "Clear latency" event occurred even though the line became stable again (according to the dashboard).

                    Usual (System logs->Gateways):
                    Apr 12 03:29:32 dpinger WAN2_DHCP 8.8.8.8: Clear latency 39052us stddev 2978us loss 5%
                    Apr 12 03:28:34 dpinger WAN2_DHCP 8.8.8.8: Alarm latency 34409us stddev 429us loss 22%

                    Today (no clear latency event):
                    ---
                    Apr 13 13:19:23 dpinger WAN2_DHCP 8.8.8.8: Alarm latency 34494us stddev 342us loss 21%

                    All clients on the LAN kept using WAN1 until I manually simulated a WAN2 disconnect (set 1.1.1.1 as the monitor IP for a minute, then reverted to 8.8.8.8).
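
A stuck alarm like this can be spotted by checking, per gateway, whether the most recent dpinger event is an Alarm with no later Clear. The sketch below assumes log lines in exactly the format quoted above, fed oldest first (the quoted excerpt lists the newer line first); stuck_alarms is a hypothetical helper name.

```shell
#!/bin/sh
# Read dpinger log lines on stdin (oldest first) and print every gateway
# whose latest event is "Alarm", i.e. an alarm that was never cleared.
# Assumed fields: month day time "dpinger" gateway monitor-ip: Alarm|Clear ...
stuck_alarms() {
    awk '$4 == "dpinger" && ($7 == "Alarm" || $7 == "Clear") { last[$5] = $7 }
         END { for (gw in last) if (last[gw] == "Alarm") print gw }'
}
```

Fed the three lines from this post in chronological order, it reports the gateway whose last event was the uncleared alarm.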

                      carmico

                      same problem here

                      failover is working tier1 to tier2, but when tier1 recovers, monitor says "online" but the traffic doesn't switch back to tier1 , remains on tier2

                      PFsense ver. 2.3.3-RELEASE-p1

                        ronnysa

                        @carmico:

                        same problem here

                        failover is working tier1 to tier2, but when tier1 recovers, monitor says "online" but the traffic doesn't switch back to tier1 , remains on tier2

                        PFsense ver. 2.3.3-RELEASE-p1

                        I am having the exact same problem here.

                        2.3.3-RELEASE-p1 (amd64)
                        built on Thu Mar 09 07:17:41 CST 2017
                        FreeBSD 10.3-RELEASE-p17

                          jono_white

                          The fail back seems to work provided the PC's connection is left idle for 20 seconds or so, but if there's an active connection after your primary connection goes down (VoIP, video/audio streaming or even a continuous ping), it seems to remain on the redundant connection.

                          The following script seems to work for my situation (4G modem failover with limited quota). It's nowhere near perfect, but it shuts the 4G interface down long enough for the states to be killed when the primary WAN is up. It would be better if it exited when there were no active states on 4G, but meh.

                          (Using cron to run it every 5 minutes or so: */5 * * * * root /bin/sh /root/routercheck.sh)

                          #!/bin/sh
                          # Bounce the backup (4G) interface rl1 when the primary WAN (rl0) responds,
                          # so that states stuck on the backup link are dropped.

                          check_wan1=8.8.8.8
                          check_wan2=8.8.4.4

                          # Current IPv4 addresses of the primary (rl0) and backup (rl1) interfaces
                          wan_ipaddress=$(ifconfig rl0 | grep 'inet ' | awk '{ print $2 }' | cut -d'/' -f1)
                          backupwan_ipaddress=$(ifconfig rl1 | grep 'inet ' | awk '{ print $2 }' | cut -d'/' -f1)

                          # Exit early if the backup WAN does not respond
                          ping -c 2 -S "${backupwan_ipaddress}" "${check_wan2}" > /dev/null 2>&1
                          backupwan_resp=$?

                          if [ "${backupwan_resp}" -gt 0 ]; then
                              exit 1
                          fi

                          # Check the primary WAN, sourcing the ping from its own address
                          ping -c 2 -S "${wan_ipaddress}" "${check_wan1}" > /dev/null 2>&1
                          wan_resp=$?

                          if [ "${wan_resp}" -eq 0 ]; then
                              # Primary is up: take the backup down long enough for its states to be killed
                              #service netif restart rl1
                              ifconfig rl1 down; sleep 15; ifconfig rl1 up
                          fi

                          #end

                            eng1tx

                            @jono_white:

                            Thank you for this…

                            I am not a script writer, but it would appear I need to change rl0 and rl1 to my specific interfaces.  Any other changes necessary?

                            Also, I have searched for a couple of hours and still cannot find what directory to install the script to, and what command to run at CLI to test.  I see that the "Filer" pkg was the preferred way, but is no longer available on my version, 2.3.4.

                              jono_white

                              Yeah, it needs to be changed to the physical interface names, not the names assigned in pfSense. The script location can be anywhere; I just saved mine under /root/failback.sh. You'll need to make it executable after saving; chmod 775 scriptname.sh should do it. As long as the path in your cron entry points to the script, it can go anywhere.

                              I'm thinking it may be better to just leave the 4G interface down until the WAN stops responding; it may have a better outcome. But the script still seems to do the job.
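
For anyone unsure how to try the script from the shell, the steps described above amount to roughly the following. It is a sketch: prepare_script is a hypothetical helper, /root/failback.sh is the path mentioned in the post, and `sh -n` only checks syntax without running anything.

```shell
#!/bin/sh
# Make a script executable and syntax-check it without executing it.
# Prints "ok" on success; returns non-zero if the file is missing or invalid.
prepare_script() {
    script=$1
    [ -f "$script" ] || { echo "missing: $script" >&2; return 1; }
    chmod 775 "$script" || return 1
    sh -n "$script" || return 1
    echo "ok"
}

# Example (commented out so that sourcing this file changes nothing):
# prepare_script /root/failback.sh && sh -x /root/failback.sh
```

Running the script once by hand with `sh -x` traces each command, which makes it easy to see which ping failed before handing the script over to cron.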

                                jono_white

                                I've changed it so the 4G interface stays down until the primary WAN stops working. This time cron is set to run every minute. The longest you should lose connection is maybe 70 or 80 seconds, as it takes some time for the gateway to register as online again.

                                #!/bin/sh
                                # Keep the backup (4G) interface rl1 down while the primary WAN (rl0) responds;
                                # bring it up only when the primary stops responding.

                                check_wan1=8.8.8.8
                                #check_wan2=8.8.4.4

                                wan_ipaddress=$(ifconfig rl0 | grep 'inet ' | awk '{ print $2 }' | cut -d'/' -f1)
                                #backupwan_ipaddress=$(ifconfig rl1 | grep 'inet ' | awk '{ print $2 }' | cut -d'/' -f1)

                                #ping -c 2 -S "${backupwan_ipaddress}" "${check_wan2}" > /dev/null 2>&1
                                #backupwan_resp=$?

                                #if [ "${backupwan_resp}" -eq 1 ]; then
                                #    exit 1
                                #fi

                                # Check the primary WAN, sourcing the ping from its own address
                                ping -c 2 -S "${wan_ipaddress}" "${check_wan1}" > /dev/null 2>&1
                                wan_resp=$?

                                if [ "${wan_resp}" -eq 0 ]; then
                                    # Primary responds: keep the backup interface down
                                    ifconfig rl1 down
                                else
                                    # Primary does not respond: bring the backup interface up
                                    #service netif restart rl1
                                    ifconfig rl1 up
                                fi

                                #end

                                  David127

                                  Hello.

                                  I have a similar problem with failover. I use one openvpn client as tier1 and the second openvpn client as tier2.
                                  After tier1 is online, pfsense does not switch back from tier2 to tier1.

                                  Is the solution from kimkhan suitable for me? Any other solutions?

                                    nemanager

                                    Dear pfSense staff, this is a very important issue. Can we find a solution? :-)

                                      David127

                                      I have perhaps found my problem in pfSense 2.3.4.

                                      My setup in System > Routing > Gateway Groups:

                                      OpenVPN Client1 = tier1
                                      OpenVPN Client2 = tier2

                                      I have not set 'Default gateway switching' or anything else.

                                      In System > Routing > Gateways I have set:

                                      Monitor IP of OpenVPN Client1 = 8.8.8.8
                                      Monitor IP of OpenVPN Client2 = 8.8.4.4

                                      If I disable OpenVPN Client1, pfSense switches to OpenVPN Client2 correctly,
                                      but only after I have clicked 'Apply Settings' in "System > Routing > Gateways".

                                      If I enable OpenVPN Client1 again, pfSense switches back to OpenVPN Client1,
                                      but again only after I have clicked 'Apply Settings' in "System > Routing > Gateways".

                                      Does anybody have any idea which settings I should change?

                                        lobi

                                        I have the same issue with my failover WAN. If WAN1 goes offline, switching to WAN2 works correctly, but it doesn't switch back to WAN1 when WAN1 is available again. I have to deactivate and activate WAN1 manually.

                                        Anyone have a solution?

                                        I've set multi WAN as the screenshots below:

                                        13:37 WAN1 goes offline
                                        14:00 WAN1 is available again, but traffic doesn't switch back to WAN1

                                        System Logs:

                                        Feb 27 13:37:48 rc.gateway_alarm 91471 >>> Gateway alarm: WAN1GW (Addr:8.8.4.4 Alarm:1 RTT:18713ms RTTsd:4408ms Loss:22%)
                                        Feb 27 13:37:48 check_reload_status updating dyndns WAN1GW
                                        Feb 27 13:37:48 check_reload_status Restarting ipsec tunnels
                                        Feb 27 13:37:48 check_reload_status Restarting OpenVPN tunnels/interfaces
                                        Feb 27 13:37:48 check_reload_status Reloading filter
                                        Feb 27 13:37:50 php-fpm 5341 /rc.dyndns.update: Default gateway down setting WAN2_PPPOE as default!
                                        Feb 27 13:37:50 php-fpm 5341 /rc.dyndns.update: MONITOR: WAN1GW is down, omitting from routing group DualWAN 8.8.4.4|192.168.100.2|WAN1GW|18.747ms|4.479ms|25%|down
                                        Feb 27 13:37:50 php-fpm 5341 /rc.dyndns.update: Default gateway down setting WAN2_PPPOE as default!
                                        Feb 27 13:37:50 php-fpm 65471 /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WAN1GW.
                                        Feb 27 13:37:50 php-fpm 65471 /rc.openvpn: Default gateway down setting WAN2_PPPOE as default!
                                        Feb 27 13:37:50 php-fpm 65471 /rc.filter_configure_sync: Default gateway down setting WAN2_PPPOE as default!
                                        Feb 27 13:40:20 php-fpm 11774 /services_dyndns.php: Default gateway down setting WAN2_PPPOE as default!
                                        Feb 27 13:40:24 check_reload_status Syncing firewall
                                        Feb 27 13:40:24 php-fpm 11697 /services_dyndns.php: Default gateway down setting WAN2_PPPOE as default!
                                        Feb 27 14:11:47 check_reload_status Syncing firewall
                                        Feb 27 14:11:47 check_reload_status Reloading filter
                                        Feb 27 14:11:48 php-fpm 23963 /rc.filter_configure_sync: Default gateway down setting WAN2_PPPOE as default!
                                        Feb 27 14:20:59 check_reload_status Syncing firewall
                                        Feb 27 14:20:59 check_reload_status Reloading filter
                                        Feb 27 14:21:00 php-fpm 34668 /rc.filter_configure_sync: Default gateway down setting WAN2_PPPOE as default!
                                        Feb 27 14:24:10 check_reload_status Syncing firewall
                                        Feb 27 14:24:10 check_reload_status Reloading filter
                                        Feb 27 14:24:11 php-fpm 5833 /rc.filter_configure_sync: Default gateway down setting WAN2_PPPOE as default!

                                        Thanks

                                          markn455

                                          I am having a similar issue. However, I am unable to get even a simple failover from WAN1 to WAN2 to work. The logs show that when WAN1 goes down, the default gateway appears to switch. However, traffic does not flow and the pfSense UI hangs (actually, it becomes very slow). Bringing WAN1 back online then does not resume traffic flow. The quickest way to get things going again is to restart the box. I have set up dual WAN in the simplest way, similar to yours. I have also tried the suggested configurations that do not use automatic gateway switching and instead build gateway groups in accordance with the configuration suggestions. I have tried using different hardware and rebuilding pfSense from scratch. The frustrating part for me is that I can take a commercial firewall that supports multi-WAN, configure things in a similar fashion, and it works perfectly every time. I apologize for not having a proven solution. You getting it to work this far is great. The only difference I see is that many of the multi-WAN configuration recommendations have an additional gateway group that handles flipping connections back the other way.

                                          This is one of many configuration examples out there: https://www.cyberciti.biz/faq/howto-configure-dual-wan-load-balance-failover-pfsense-router/. I found this one to be helpful, as most are. I think in my case it is just my limited experience, or perhaps I have a glaringly simple issue preventing mine from working that I am just missing.

                                            satadru

                                            @markn455 Did you ever find a working configuration for failing back to a primary connection once it comes back up?

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.