Netgate Discussion Forum

Multi-WAN gateway failover not switching back to tier 1 gw after back online

Routing and Multi WAN
  • Y
    yanakis
    last edited by Sep 7, 2015, 8:49 PM

    @Derelict:

    Depends on whether or not you want states killed on a gateway failure.

    Well, isn't it better to have them reset on a gateway failure? The description of this option is a bit tricky.

    • L
      luckman212 LAYER 8
      last edited by Sep 7, 2015, 8:58 PM

      I only skimmed through this thread so I apologize if this was already suggested but – are you certain your clients are set to use the pfSense IP as their DNS resolver?  If e.g. you have a gateway defined with a custom monitor IP of 8.8.8.8 or the DNS servers on your General settings page are locked to a specific gateway, then static routes are built which will force traffic out that specific gateway, even if it's down.  So this could result in DNS being "dead" when one of the gateways goes down.  Is this possibly what's happening?

      • Y
        yanakis
        last edited by Sep 7, 2015, 9:12 PM

        @luckman212:

        Monitor IPs are currently set to one from each ISP. In General I have a pair of DNS servers set for each gateway (four servers in total). The clients' DNS is manually set to 192.168.1.1 (pfSense).

        General.PNG
        gateways.PNG

        • D
          Derelict LAYER 8 Netgate
          last edited by Sep 7, 2015, 9:13 PM

          I tried both the resolver and the forwarder, some sites are just not resolved.

          If you do not know how to get more information than that about what's actually happening, you are probably in over your head.

          Chattanooga, Tennessee, USA
          A comprehensive network diagram is worth 10,000 words and 15 conference calls.
          DO NOT set a source address/port in a port forward or firewall rule unless you KNOW you need it!
          Do Not Chat For Help! NO_WAN_EGRESS(TM)

          • Y
            yanakis
            last edited by Sep 7, 2015, 9:20 PM

            @Derelict:

            Oh, nice. What can I say? Thanks? :) … Thanks.

            • A
              arcanos
              last edited by Sep 8, 2015, 11:50 AM

              Hi

              yanakis, in my case we have the fiber media converter and the router (not PPPoE), and the same thing happens. I switched off or disconnected the router, but never tried switching off the media converter (good idea).

              In my recent installations I don't usually use the DNS forwarder/resolver for localhost, but in this case I do (I configured it in the past and never changed it). Have you tried deactivating that option in General Settings? Just to see if something changes.

              I understand luckman212's concerns about DNS and the static routes pfSense creates for each DNS server associated with a WAN, but in my case we had two different DNS servers configured and working, and it still failed. And in any case, once the WAN recovers, DNS works again and everything should work again.

              By the way, I tried with "State Killing on Gateway Failure" on and off, and recovery fails in both cases. I keep it unchecked, because with external SIP connections that is mandatory for failover to work (at least in my case). And I personally prefer to reset states if a gateway fails, to avoid problems.

              Regards

              PS: I don't think you are in over your head… Thanks for everything.

              • Y
                yanakis
                last edited by Sep 12, 2015, 4:04 PM

                @arcanos:

                Well, I left the DNS fields in General empty, but failback to WAN still doesn't work after WAN recovery unless I change something in Firewall or Routing and apply the changes :(

                • Y
                  yanakis
                  last edited by Sep 12, 2015, 5:03 PM

                  @jahonix:

                  @arcanos:

                  …and this looks like a pfsense problem...

                  I cannot second that!
                  I have this working for quite some time now with WAN1 (100Mb cable) and a rather old WAN2 (6Mb DSL).
                  I have failover to W2 if W1 is down and immediately W1 again when available.

                  Show us your System | Routing | Gateway Groups page.

                  Hi Cris. Can you please post your setup? Thanks

                  • J
                    jahonix
                    last edited by Sep 13, 2015, 8:59 PM

                    Well, Derelict wrote that my config seems to be a bit more complicated than necessary.
                    Since I trust him I usually would test his suggestion first and post afterwards. I just don't have the time for that in the foreseeable future…

                    I only use DSL as failover (it's 6Mbit) and rely on cable which is 100Mbit.
                    You will know why if you have teen kids...
                    Just checked and failover to DSL is working as well as fallback to cable when available again.

                    I set this up about a year ago and used the pfsense docs for that.

                    System-Gateways.png
                    System-Gateway_groups.png
                    System-Gateways-Edit_gw_group.png

                    • D
                      Derelict LAYER 8 Netgate
                      last edited by Sep 13, 2015, 9:10 PM

                      I have mine set up exactly like you do (A group with Tier 1 Cable, Tier 2 DSL and a group with Tier 1 DSL, Tier 2 Cable).

                      I was just commenting it's only necessary if you want to have rules that prefer the other circuit while maintaining the ability for those rules to fail over too.

                      I tested my failover yesterday since I was putting new splitters on my cable in anticipation of MoCA 2.0.  It all worked exactly as configured and when I was done it brought my Tier 1 back online just as it has many times before.

                      • J
                        jahonix
                        last edited by Sep 13, 2015, 9:34 PM

                        I didn't criticize! I only mentioned that I might be done with half the work. ;)

                        • A
                          arcanos
                          last edited by Sep 16, 2015, 11:09 AM

                          Hi again

                          Last week we installed a new machine with 2.2.4 and two WANs with failover, and it has the same problem. In this case we have two different LANs, and each one has its own failover group with a different order (one with wan1->wan2 and the other with wan2->wan1), and neither of them redirects traffic to the main WAN once it has recovered (we have to make some change and Save, as yanakis says).

                          It's a new installation with nothing unusual. We ran several tests in both directions, and I can confirm the problem exists. It never went back automatically to the recovered main WAN.

                          We didn't find anything new or any more clues; it just doesn't work.

                          Regards

                          • S
                            superweasel
                            last edited by Sep 16, 2015, 6:17 PM

                            +1 with arcanos.

                            In fact, instead of attempting to troubleshoot this and failing, it would be better if someone that has this working, would post a complete series of screenshots showing their setup. Then we can all learn from a working environment.

                            pfSense rig: pfSense SG-4860/120GB SSD
                            WAN: CenturyLink Gigabit Fiber

                            • D
                              Derelict LAYER 8 Netgate
                              last edited by Sep 16, 2015, 7:04 PM

                              Is PPPoE a common factor for those that don't work?  Both my WANs are DHCP.

                              There's really nothing to it.  Create a gateway group with a Tier 1 and Tier 2 with member down as the trigger level and policy route to it.
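A minimal sketch of the behavior this thread expects from a failover group, assuming the usual reading of tiers: the group routes through the lowest-numbered tier that still has an online member, so a recovered Tier 1 should take over again. This is an illustrative model only, not pfSense code; `active_gateways` and the gateway names `WAN1_CABLE` and `WAN2_DSL` are hypothetical.

```python
from typing import Dict, List


def active_gateways(group: Dict[str, int], status: Dict[str, str]) -> List[str]:
    """Return the online gateways in the lowest-numbered tier that has any.

    group  maps a gateway name to its tier (1 = most preferred).
    status maps a gateway name to "online" or "down" (hypothetical labels).
    """
    online_by_tier: Dict[int, List[str]] = {}
    for gateway, tier in group.items():
        if status.get(gateway) == "online":
            online_by_tier.setdefault(tier, []).append(gateway)
    if not online_by_tier:
        return []  # every member of the group is down
    return sorted(online_by_tier[min(online_by_tier)])


# Hypothetical failover group: cable preferred, DSL as backup.
group = {"WAN1_CABLE": 1, "WAN2_DSL": 2}

print(active_gateways(group, {"WAN1_CABLE": "online", "WAN2_DSL": "online"}))  # ['WAN1_CABLE']
print(active_gateways(group, {"WAN1_CABLE": "down", "WAN2_DSL": "online"}))    # ['WAN2_DSL']
print(active_gateways(group, {"WAN1_CABLE": "online", "WAN2_DSL": "online"}))  # ['WAN1_CABLE']
```

Putting both gateways in the same tier makes the function return both of them, which corresponds to load balancing rather than failover.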

                              • S
                                superweasel
                                last edited by Sep 16, 2015, 7:30 PM

                                Correct, PPPoE is the default and DHCP is the failover.

                                • A
                                  arcanos
                                  last edited by Sep 17, 2015, 8:28 AM

                                  Not PPPoE in my case. This last case has two cable connections, with routers doing NAT and a DMZ pointing to the WAN interface of pfSense. But I've also seen the problem with DSL and cable in bridge mode.

                                  • E
                                    Enrica_CH
                                    last edited by Sep 26, 2015, 2:45 PM

                                    At first glance I have the same issue, but looking deeper I can see that my gateway really stays offline until I reapply the interface config page or reboot.

                                    Short description of config and behavior:

                                    I have two gateways (1. fiber / 2. cable modem) with a routing group for balancing (tier 1 / tier 1). Both gateways are monitored against external DNS servers. The routing group is set as the gateway in the firewall rule. "Use sticky connections" under System | Advanced | Miscellaneous is on.

                                    At the beginning, after a reboot, everything works fine and traffic is distributed to both gateways with a weight of 1:4.

                                    But after some minutes or hours the second gateway always goes offline (100% packet loss) and stays that way until I reapply the interface config or reboot. It's not an apinger problem; the gateway is really broken. A ping from Diagnostics | Ping with the gateway as source doesn't work (100% loss). The cable modem isn't disconnected, and it works if I plug a notebook in there. So pfSense really stops the gateway and keeps it broken. Even if I disconnect the LAN cable and reconnect it, there is no reaction.

                                    The same happens on my backup pfSense, which is running in CARP mode. There is no traffic load, but the gateway stops there too.

                                    If I set the routing group to redundant mode (GW 1 tier 1 / GW 2 tier 2 OR GW 2 tier 1 / GW 1 tier 2), then everything works OK. The gateways stay online. The interface also comes back online after the cable is reconnected.

                                    My guess is that there must be something wrong with balancing gateways. But I need the capacity of both gateways.

                                    • C
                                      cheonne
                                      last edited by Oct 7, 2015, 8:36 AM

                                      You should set a working monitor IP for each interface, such as a DNS server IP.

                                      • M
                                        MrD
                                        last edited by Jun 16, 2016, 9:18 PM

                                        Hello,

                                        I'm facing the same problem. I've read these (neither offers a solution):

                                        • https://forum.pfsense.org/index.php?topic=111143.0
                                        • https://redmine.pfsense.org/issues/5090

                                        I'm running 2.3.1 with 2 WANs (1 cable/main and 1 DSL-PPPoE/secondary) and 2 groups. Failover is working (trigger OK), but it does not switch back after the weak connection is back at 100%.

                                        I'm ready to send screenshots; just ask.

                                        logs:

                                        Jun 16 13:49:20 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Alarm latency 9274us stddev 5829us loss 21%
                                        Jun 16 13:49:42 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Clear latency 7586us stddev 4056us loss 15%
                                        Jun 16 13:53:58 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Alarm latency 7362us stddev 4941us loss 21%
                                        Jun 16 13:54:15 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Clear latency 6543us stddev 3719us loss 19%
                                        Jun 16 13:54:39 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Alarm latency 6692us stddev 3840us loss 21%
                                        Jun 16 13:54:57 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Clear latency 6644us stddev 3338us loss 15%
                                        Jun 16 13:56:03 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Alarm latency 8839us stddev 5402us loss 21%
                                        Jun 16 13:56:19 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Clear latency 8292us stddev 4864us loss 19%
                                        Jun 16 13:56:43 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Alarm latency 8431us stddev 5556us loss 22%
                                        Jun 16 13:57:02 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Clear latency 7940us stddev 5158us loss 15%
                                        Jun 16 13:58:35 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Alarm latency 12630us stddev 12111us loss 21%
                                        Jun 16 13:58:53 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Clear latency 8282us stddev 4592us loss 15%
                                        Jun 16 13:59:21 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Alarm latency 8983us stddev 5856us loss 21%
                                        Jun 16 13:59:32 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Clear latency 8447us stddev 5473us loss 16%
                                        Jun 16 13:59:58 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Alarm latency 8206us stddev 5630us loss 21%
                                        Jun 16 14:00:11 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Clear latency 7373us stddev 4132us loss 14%
                                        Jun 16 14:01:14 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Alarm latency 8049us stddev 4691us loss 21%
                                        Jun 16 14:01:44 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Clear latency 7842us stddev 3865us loss 18%
                                        Jun 16 14:01:47 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Alarm latency 7944us stddev 3892us loss 21%
                                        Jun 16 14:02:18 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Clear latency 7717us stddev 3673us loss 12%
                                        Jun 16 14:03:51 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Alarm latency 7952us stddev 4608us loss 21%
                                        Jun 16 14:04:16 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Clear latency 7030us stddev 3415us loss 12%
                                        Jun 16 14:04:28 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Alarm latency 7711us stddev 4555us loss 21%
                                        Jun 16 14:04:56 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Clear latency 7538us stddev 4081us loss 14%
                                        Jun 16 14:05:10 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Alarm latency 8504us stddev 5216us loss 21%
                                        Jun 16 14:05:32 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Clear latency 8245us stddev 4794us loss 13%
                                        Jun 16 14:05:51 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Alarm latency 8544us stddev 5200us loss 21%
                                        Jun 16 14:06:14 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Clear latency 7369us stddev 3934us loss 16%
                                        Jun 16 14:06:26 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Alarm latency 7862us stddev 4613us loss 21%
                                        Jun 16 14:06:56 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Clear latency 7151us stddev 3861us loss 13%
                                        Jun 16 14:11:12 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Alarm latency 7910us stddev 4976us loss 21%
                                        Jun 16 14:11:26 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Clear latency 6648us stddev 3553us loss 15%
                                        Jun 16 14:11:49 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Alarm latency 6910us stddev 4027us loss 21%
                                        Jun 16 14:12:11 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Clear latency 6271us stddev 2901us loss 15%
                                        Jun 16 14:12:28 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Alarm latency 6705us stddev 3698us loss 21%
                                        Jun 16 14:12:52 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Clear latency 6371us stddev 2763us loss 11%
                                        Jun 16 14:13:45 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Alarm latency 8346us stddev 5486us loss 21%
                                        Jun 16 14:14:57 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Clear latency 8065us stddev 5624us loss 16%
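A minimal sketch, assuming the dpinger line format shown in the log above, of how the flapping can be quantified. It pairs each Alarm with the next Clear and reports the loss values at each transition. The helper names `parse_events` and `summarize` are hypothetical, and the year is an assumption because syslog timestamps omit it.

```python
import re
from datetime import datetime

# Matches lines such as:
# Jun 16 13:49:20 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Alarm latency 9274us stddev 5829us loss 21%
LOG_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+dpinger\s+(?P<gw>\S+)\s+\S+:\s+"
    r"(?P<event>Alarm|Clear)\s+latency\s+(?P<lat>\d+)us\s+"
    r"stddev\s+(?P<sd>\d+)us\s+loss\s+(?P<loss>\d+)%"
)


def parse_events(lines, year=2016):
    """Turn dpinger log lines into dicts; lines that do not match are skipped."""
    events = []
    for line in lines:
        m = LOG_RE.match(line.strip())
        if m is None:
            continue
        events.append({
            "time": datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S"),
            "gateway": m["gw"],
            "event": m["event"],
            "loss": int(m["loss"]),
        })
    return events


def summarize(events):
    """Count transitions and measure how long each alarm lasted, in seconds."""
    alarm_loss = [e["loss"] for e in events if e["event"] == "Alarm"]
    clear_loss = [e["loss"] for e in events if e["event"] == "Clear"]
    durations, pending = [], None
    for e in events:
        if e["event"] == "Alarm":
            pending = e["time"]
        elif pending is not None:
            durations.append((e["time"] - pending).total_seconds())
            pending = None
    return {
        "alarms": len(alarm_loss),
        "clears": len(clear_loss),
        "min_alarm_loss": min(alarm_loss, default=None),
        "max_clear_loss": max(clear_loss, default=None),
        "alarm_seconds": durations,
    }


# First four lines of the log above.
SAMPLE = [
    "Jun 16 13:49:20 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Alarm latency 9274us stddev 5829us loss 21%",
    "Jun 16 13:49:42 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Clear latency 7586us stddev 4056us loss 15%",
    "Jun 16 13:53:58 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Alarm latency 7362us stddev 4941us loss 21%",
    "Jun 16 13:54:15 dpinger WAN2CABLEGW xxx.xxx.xxx.xxx: Clear latency 6543us stddev 3719us loss 19%",
]

print(summarize(parse_events(SAMPLE)))
```

In the log above every Alarm fires at 21% or 22% loss and every Clear arrives below that, which is consistent with a high packet-loss threshold of about 20% (an assumption; the configured thresholds are not shown in the post).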

                                        dash.jpg
                                        gw.jpg
                                        GW-grups.png
                                        WAN1.png
                                        WAN2.jpg

                                        • D
                                          Derelict LAYER 8 Netgate
                                          last edited by Jun 17, 2016, 12:07 AM

                                          I would change the monitor IP in the WAN2CABLEGW to 8.8.8.8 or anything else that responds reliably and see if things improve. You can't expect any multi-WAN routing solution to perform with any semblance of continuity with flapping like that.

                                          Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.