Netgate Discussion Forum

    How to configure failback for WAN1 up

    Routing and Multi WAN
    38 Posts 11 Posters 11.4k Views
    • MikeDPitt

      I have pfSense 2.2 and currently have two different static IPs set up on WAN1 and OPT1. I have two gateway groups set up, one preferring each gateway. I also have two firewall rules: one makes WAN1 fail over to OPT1 on packet loss to Google DNS, and the other makes OPT1 fail back to WAN1, also on packet loss to Google DNS. My question is: how can I configure it so that it fails back not when OPT1 fails, but when WAN1 is back up? Is this possible?

      • burnsl

        I just asked this same question, and I got smacked with a data cap overage when our surveillance system continued to pull video feeds over the failover link even though the main link had come back up.

        • Mr. Jingles

          @MikeDPitt:

          I have pfSense 2.2 and currently have two different static IPs set up on WAN1 and OPT1. I have two gateway groups set up, one preferring each gateway. I also have two firewall rules: one makes WAN1 fail over to OPT1 on packet loss to Google DNS, and the other makes OPT1 fail back to WAN1, also on packet loss to Google DNS. My question is: how can I configure it so that it fails back not when OPT1 fails, but when WAN1 is back up? Is this possible?

          This, as far as I know, is standard functionality (at least, it works that way for me).

          You need only one gateway group. When WAN is down it should fail over to OPT; when WAN is up again it should go back to WAN.

          Of course, there are the existing states to consider. There is a setting for that under the advanced settings (networking), if I recall correctly.

          6 and a half billion people know that they are stupid, aggressive, lower life forms.

          • burnsl

            Well, this is not failing back.

            The "kill states" function works for failing over, but the connections are not failing back.
            Any established connection on the opt-WAN is maintained and not "killed" to fail back.

            Any new connections requested after the failback are made on the primary WAN, but there is no failback for current connections on the opt-WAN.

            This is very problematic, especially if your opt-wan interface has a silly data cap like ours.

            • luckman212

              I think I'm having the same problem with some VoIP phones at one of our offices. They have a dual-WAN setup with one gateway group set so that WAN1 -> WAN2 fails over. For web browsing this works great: when the primary WAN fails, users are seamlessly routed via WAN2, and when WAN1 comes back up, they are put back on WAN1.

              But the VoIP phones get "stuck" on WAN2 and do not "fail back" until I physically down the WAN2 interface. This is problematic for us because this particular VoIP provider only allows one authorized IP at a time to make calls, so we wind up with half the office having dead phones.

              Is there any script that can be run to kill any states on any Tier 2 WAN links when WAN1 returns to service?

              • kujina

                I can also confirm this behavior. I'm on the latest 2.2-RELEASE (i386)

                I have wan1 and wan2 in a gateway group called Failover, with wan1 = Tier 1, wan2 = Tier 2, and the Trigger Level set to Member Down. I confirmed this issue by testing with a newsgroup client downloading from Usenet.
                If I unplug wan1, pfSense fails over to wan2; when I plug wan1 back in, the newsgroup client is still downloading through wan2.

                • luckman212

                  Does anyone know a way to trigger e.g. a pfctl -k when a "WAN UP" event is detected on the primary?
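
One possible way to approximate a "WAN UP" trigger is to poll from cron and act only on a down-to-up transition. The sketch below assumes a state file under /tmp and a probe command supplied by the caller. The function names, the file path, and the placeholder addresses are all hypothetical, and nothing here is a built-in pfSense hook.

```sh
#!/bin/sh
# Sketch only: the state-file path, the probe, and the addresses are
# assumptions, not anything pfSense provides out of the box.
PFCTL="${PFCTL:-pfctl}"                            # overridable for a dry run
STATE_FILE="${STATE_FILE:-/tmp/wan1_last_status}"  # hypothetical location

# Print "up" if the probe command succeeds, otherwise "down".
# The probe is passed in as arguments, e.g. a ping bound to the WAN1 address.
probe_status() {
    if "$@" >/dev/null 2>&1; then
        echo up
    else
        echo down
    fi
}

# Return 0 only on a down -> up transition, and remember the new status
# for the next run.
came_back_up() {
    new="$1"
    old="unknown"
    [ -f "$STATE_FILE" ] && old=$(cat "$STATE_FILE")
    echo "$new" > "$STATE_FILE"
    [ "$old" = "down" ] && [ "$new" = "up" ]
}

# Example cron-driven use (addresses are placeholders; -S is the FreeBSD
# ping flag for choosing the source address):
#   status=$(probe_status ping -c 3 -S 198.51.100.2 8.8.8.8)
#   came_back_up "$status" && $PFCTL -k 10.20.30.0/24
```

Remembering the previous status matters here: killing states on every run while WAN1 is up would drop active calls each time the job fires.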

                  • tofutim

                    Just wanted to increment this thread and say that in 2.2.1 it is still not failing back. :P

                    • benmaca

                      Hi,
                      I just want to let you know that we are experiencing the same issue.
                      We have to force WAN2 down to redefine the routing table and redirect traffic to WAN1.
                      Thanks.

                      • kapara

                        Any resolution to this issue?

                        Skype ID:  Marinhd

                        • kapara

                          I have this issue with VoIP. I just posted a bounty to get this resolved, possibly as a package or script. Join in and let's get this resolved! :-)

                          Skype ID:  Marinhd

                          • kapara

                            It would be nice if, when switching back to the original gateway, a script ran that kills connections on a specific gateway based on IP, range, or even alias.

                            Skype ID:  Marinhd

                            • jmonline

                              I am able to create the same issue, running a clean install of v2.3.1

                              2 WANs set up in a gateway group called "failover"
                              WAN1 - tier 1
                              WAN2 - tier 2

                              LAN Firewall rule specifying the Gateway as the failover gateway group.

                              If WAN1 goes down, all traffic fails over to WAN2 as expected - you can see this in Diag > States and can confirm by doing a trace route from any LAN device.

                              When WAN1 comes back up (Status > Gateways confirms "online"), some states remain on WAN2. Standard HTTP traffic reverts to WAN1 within a few minutes. However, traffic such as VoIP/SIP remains on WAN2, and the Diag > States table confirms this. It can be left for 8+ hours and stays the same.

                              It's a pain having our VoIP traffic sent over the wrong WAN for any length of time.

                              • kapara

                                I think the best way to do this would be to ensure that the traffic that needs to fail over properly is on its own VLAN, so that when we need to kill states it can be done for the entire interface rather than trying to pick and choose IP addresses to do this to. This could possibly include shutting the interface down and bringing it back up again, unless that does not kill the states.

                                I like the idea of an entire subnet or VLAN because I am pretty sure the command would be much simpler to run when it affects an entire subnet.
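
As a rough illustration of the per-subnet idea, the sketch below loops over a list of networks and calls pfctl -k on each. The function name and the example subnet are hypothetical. PFCTL is made overridable only so the loop can be dry-run without touching the firewall.

```sh
#!/bin/sh
# Sketch only: the function name and the subnet are placeholders.
PFCTL="${PFCTL:-pfctl}"    # set PFCTL=echo to print the arguments instead

# Kill all states originating from each host or network passed as an
# argument. Stops at the first failure so a bad entry is noticed.
kill_states_for() {
    for net in "$@"; do
        $PFCTL -k "$net" || return 1
    done
}

# Example (hypothetical VoIP VLAN):
#   kill_states_for 192.0.2.0/24
```

Keeping the VoIP devices on one subnet means the list stays a single entry, which is the simplification suggested above.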

                                Skype ID:  Marinhd

                                • kapara

                                  I posted a bounty for this. If no one steps up, I am looking at engaging someone to create a script of some kind (cron, etc.) to get this going. If anyone is interested, please contact me. I am also looking at Upwork if no one bites here on the pfSense forum.

                                  Skype ID:  Marinhd

                                  • Aerin

                                    @jmonline:

                                    I am able to create the same issue, running a clean install of v2.3.1

                                    2 WANs set up in a gateway group called "failover"
                                    WAN1 - tier 1
                                    WAN2 - tier 2

                                    LAN Firewall rule specifying the Gateway as the failover gateway group.

                                    If WAN1 goes down, all traffic fails over to WAN2 as expected - you can see this in Diag > States and can confirm by doing a trace route from any LAN device.

                                    When WAN1 comes back up (Status > Gateways confirms "online"), some states remain on WAN2. Standard HTTP traffic reverts to WAN1 within a few minutes. However, traffic such as VoIP/SIP remains on WAN2, and the Diag > States table confirms this. It can be left for 8+ hours and stays the same.

                                    It's a pain having our VoIP traffic sent over the wrong WAN for any length of time.

                                    Hello everyone, first post here, glad to be now an active member of the community :)
                                    I've had a similar problem for three days now, and after searching about it, I found this (see post #21 from Chris). Maybe that's what you are running into.

                                    But the similarity ends there: my problem is that none of the traffic is sent back to WAN1 after it recovers :-\ Some details:
                                    My hardware setup: Alix board with 3 Gb Ethernet interfaces: re0 as WAN1, re1 as LAN, re2 as WAN2. I have a 300 Mb fiber connection (from the French ISP SFR) on WAN1, connected to the fiber-to-Ethernet adapter provided by the ISP, with DHCP enabled so that I get my IP from them (so there is no other router between my pfSense appliance and their network). I also borrowed a 10 Mb/s 4G connection (provided by another French ISP, Bouygues) at the office on a D-Link router, connected to my WAN2 interface, also via DHCP (so it gets its IP from the D-Link router).
                                    My software setup: pfSense 2.3.1. WAN1 and WAN2 are grouped in a Gateway Group, with WAN1 as Tier 1, WAN2 as Tier 2, and the trigger level set to Member Down. WAN1 is the default gateway. Both monitoring IPs are external (Google DNS). The DNS servers under "System > General Setup" are a mix of Google DNS and OpenDNS, and I checked "Do not use DNS forwarder or resolver as DNS server for the firewall". I added a rule on the LAN interface to use the Gateway Group as the default gateway.

                                    The problem: under normal operation, all the traffic is routed through WAN1, no problem. If I unplug WAN1, the traffic is routed through WAN2, again no problem. But if I re-plug WAN1, the traffic never goes back to the Tier 1 gateway, even after hours, even after resetting the states under "Diagnostics > States". The only way to get it back to WAN1 is to unplug WAN2. This affects all traffic (HTTP, HTTPS, VoIP, …) on every device (my computer, the smartphones, my media box, my home server).
                                    For information, my WAN1 interface is up, the gateway's external monitoring IP is reachable, and the DNS servers are responding.

                                    Did I miss something? I think I made a mistake somewhere, but after hours of research I cannot pinpoint it... Or does anyone else run into the same problem?
                                    Thanks everyone, and sorry about my English; it is not my native language :-[

                                    • kapara

                                      I have also been doing some research. I am no Linux or FreeBSD admin, but I did find this, and there must be some way to script it: check which interface the states are bound to, and if both interfaces are up and the states are still on the failover interface, run pfctl -k xxx.xxx.xxx.xxx/xxx to force the states to return to their primary (Tier 1) interface.

                                      https://www.freebsd.org/cgi/man.cgi?query=pfctl&sektion=8

                                      From the shell I ran the following commands, and they appear to have killed all the states related to the given IP or subnet.

                                      pfctl -k 10.20.30.115

                                      pfctl -k 10.20.30.0/24

                                      NAME
                                          pfctl – control the packet filter (PF) device

                                      SYNOPSIS
                                          pfctl [-AdeghmNnOPqRrvz] [-a anchor] [-D macro=value] [-F modifier]
                                                [-f file] [-i interface] [-K host | network] [-k host | network |
                                                label | id] [-o level] [-p device] [-s modifier] [-t table -T
                                                command [address …]] [-x level]

                                      -k host | network | label | id
                                          Kill all of the state entries matching the specified host,
                                          network, label, or id.

                                      For example, to kill all of the state entries originating from
                                          ``host'':

                                      # pfctl -k host

                                      A second -k host or -k network option may be specified, which
                                          will kill all the state entries from the first host/network to
                                          the second.  To kill all of the state entries from ``host1'' to
                                          ``host2'':

                                      # pfctl -k host1 -k host2

                                      To kill all states originating from 192.168.1.0/24 to
                                          172.16.0.0/16:

                                      # pfctl -k 192.168.1.0/24 -k 172.16.0.0/16

                                      A network prefix length of 0 can be used as a wildcard.  To kill
                                          all states with the target ``host2'':

                                      # pfctl -k 0.0.0.0/0 -k host2

                                      It is also possible to kill states by rule label or state ID.  In
                                          this mode the first -k argument is used to specify the type of
                                          the second argument.  The following command would kill all states
                                          that have been created from rules carrying the label ``foobar'':

                                      # pfctl -k label -k foobar

                                      To kill one specific state by its unique state ID (as shown by
                                          pfctl -s state -vv), use the id modifier and as a second argument
                                          the state ID and optional creator ID.  To kill a state with ID
                                          4823e84500000003 use:

                                      # pfctl -k id -k 4823e84500000003

                                      To kill a state with ID 4823e84500000018 created from a backup
                                          firewall with hostid 00000002 use:

                                      # pfctl -k id -k 4823e84500000018/2
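
Putting the scripting idea above together with the two-argument form of -k from the man page, a check-then-kill helper might look like the sketch below. It assumes that each line of `pfctl -s state` output begins with the interface name, which is worth verifying on your version. The interface name, the addresses, and the function names are hypothetical.

```sh
#!/bin/sh
# Sketch only: interface names, addresses, and function names are assumptions.
PFCTL="${PFCTL:-pfctl}"    # overridable so the logic can be tried without pf

# Filter state lines read from stdin down to those whose first field is the
# given interface. Assumes the interface name is the first column.
states_on_if() {
    awk -v ifn="$1" '$1 == ifn'
}

# Usage: failback_if_stuck <primary_status> <failover_if> <src_net> <dst_host>
# Kills states from src_net to dst_host only when the primary is reported
# "up" and at least one state is still bound to the failover interface.
failback_if_stuck() {
    [ "$1" = "up" ] || return 0
    if $PFCTL -s state | states_on_if "$2" | grep -q .; then
        $PFCTL -k "$3" -k "$4"
    fi
}

# Example (placeholders for the failover NIC, VoIP subnet, and SIP server):
#   failback_if_stuck up em2 10.20.30.0/24 203.0.113.10
```

Limiting the kill to one source network and one destination keeps unrelated connections alive, at the cost of having to know the provider's address in advance.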

                                      Skype ID:  Marinhd

                                      • kapara

                                        I posted a job on Upwork since I am not getting any takers on the pfSense bounty page…

                                        Skype ID:  Marinhd

                                        • kapara

                                          I had a script created…

                                          I have not tested it yet, though...

                                          https://forum.pfsense.org/index.php?topic=113643.0

                                          Skype ID:  Marinhd

                                          • jmonline

                                            Just to show you what I have posted on Redmine bug #5090:

                                            In simple terms, take a VoIP/SIP phone service: if a connection fails over from the primary WAN1 connection to a secondary WAN2 connection, at what point should that VoIP/SIP connection be expected to fail back onto WAN1 when it becomes available again? Are you saying that with state killing on failback it would move these sessions immediately?
                                            Or how long would/should the state remain open on the WAN2 connection?

                                            We are currently having real problems with this on 2 client sites setup as follows:

                                            WAN1 - ADSL connection just used for VoIP traffic
                                            WAN2 - EFM higher bandwidth connection used for all internet access, VPN etc.

                                            Gateway group named "EFMFirst"
                                            WAN2 EFM - Tier 1
                                            WAN1 ADSL - Tier 2

                                            Gateway group named "DSLFirst"
                                            WAN1 ADSL - Tier 1
                                            WAN2 EFM - Tier 2

                                            Firewall Rules for Voice network:
                                            Traffic set to Gateway: DSLFirst

                                            Firewall Rules for LAN network:
                                            Traffic set to Gateway: EFMFirst

                                            The problem is that if the ADSL line drops, the VoIP traffic goes onto the EFM connection. This is fine for a short period of time, but due to the other traffic on this line the bandwidth is not enough so we can get call quality issues. This is not a problem for a short period of time (better to have some phone service than none at all).

                                            When the ADSL line comes back online (Status>Gateways confirms this), the VoIP traffic stays over the EFM connection. Looking at the State table you can see the TCP & UDP traffic stuck to WAN2.

                                            It can be left for 24hrs and still the VoIP traffic will be on the wrong WAN. It will never move the traffic back onto the ADSL connection where it should be. Therefore the call quality issues remain due to the lack of bandwidth.

                                            What would you suggest? Is this truly not a bug?
                                            Is there not something that can force the states to re-associate with the firewall rule, and therefore the correct WAN gateway, after a specified period of time perhaps?

                                            Also, if you kill the two states for each VoIP phone in the Diagnostics > States section, they reappear straight away on the same ports and interfaces as before.
                                            This is done by filtering the states list by the IP address of the device. You can then see both UDP states (one on the internal network and one on the WAN). Then press the "Kill States" button. This removes the two states very briefly, but then they reappear, still on the wrong WAN interface.
                                            They have definitely been cleared, since the byte count drops to 0 KB and starts counting again.
                                            Surely clearing the state should have forced it to reconnect and follow the current rule and gateway group to the correct gateway?

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.