Netgate Discussion Forum

    Dual WAN Fail-over Issue - Tier 1 WAN frequently failing upon activation of the second Tier 2 WAN

    Routing and Multi WAN
      preston

      One thing I haven't tried yet is a factory reset and starting all over without restoring from my backup.

      I did do a reset and restore from backup, but now I wonder if restoring the backup just transferred the bad settings back, making it a no-op.

        jimeez @preston

        @preston

        This is exactly what I was about to do when I saw your first reply to my post. I was literally about to do a fresh install on an old PC I have lying around. I had just installed a dual NIC card and was about to start deploying a "fresh" device. No restore from backup. No other packages. Just a clean start with a dual-WAN failover setup.

        But then I saw your reply, and I thought, "What are the odds that two of us have the exact same problem at roughly the same time?" It's very unlikely. Something changed somewhere, either at CenturyLink's end, Starlink's end, or elsewhere. But I'm 99% positive it can't be our equipment or the configuration.

        I have this other PC ready to go. Maybe I'll give it a shot some night this week just for shits and giggles.

          chpalmer @jimeez

          @jimeez

          If you do a traceroute to something outside what is the first address that answers?

          I would seriously consider monitoring an address other than 8.8.8.8 on that gateway.

          I have had issues using that address in the past.

          Triggering snowflakes one by one..
          Intel(R) Core(TM) i5-4590T CPU @ 2.00GHz on an M400 WG box.

            jimeez @chpalmer

            @chpalmer

            Thank you for the reply. I have tried a few different monitoring addresses, and it doesn't seem to make a difference. The problem ONLY exists when both interfaces are active. I can set the routing to use only one of the gateways (no failover), and it still cycles a member down due to packet loss. As soon as I disable one of the two interfaces, everything works fine again.

              1    <1 ms    <1 ms    <1 ms  fw1.xxxxxx.localdomain [192.168.1.1]
              2    19 ms    20 ms    19 ms  100.64.0.1
              3    16 ms    20 ms    23 ms  172.16.252.90
              4    19 ms    19 ms    23 ms  undefined.hostname.localhost [206.224.65.136]
              5    17 ms    27 ms    16 ms  undefined.hostname.localhost [206.224.64.173]
              6    60 ms    47 ms    20 ms  140.248.126.222
              7    17 ms    21 ms    20 ms  151.101.67.5
            
            
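chpalmer's question about the first address that answers can be checked mechanically. The Python sketch below is illustrative only: it assumes Windows tracert output laid out like the trace above, and the helper names classify and classify_hops are hypothetical. It labels each hop as RFC 1918 private space, RFC 6598 shared address space (the 100.64.0.0/10 block carriers use for CGNAT), or public, which may help when choosing a monitor address other than 8.8.8.8.

```python
import ipaddress
import re

# Blocks that are not routed on the public Internet (RFC 1918 and RFC 6598).
# Listed explicitly so results do not depend on the Python version's
# definition of ipaddress "is_private".
SPECIAL_BLOCKS = {
    "RFC 1918 private": [
        ipaddress.ip_network("10.0.0.0/8"),
        ipaddress.ip_network("172.16.0.0/12"),
        ipaddress.ip_network("192.168.0.0/16"),
    ],
    "RFC 6598 shared (CGNAT)": [ipaddress.ip_network("100.64.0.0/10")],
}

# Last IPv4 address on a tracert line, with or without [brackets].
ADDR_RE = re.compile(r"(\d{1,3}(?:\.\d{1,3}){3})\]?\s*$")


def classify(addr):
    """Return a label for one IPv4 address."""
    ip = ipaddress.ip_address(addr)
    for label, networks in SPECIAL_BLOCKS.items():
        if any(ip in net for net in networks):
            return label
    return "public"


def classify_hops(tracert_text):
    """Return (hop, address, label) for each numbered hop line."""
    hops = []
    for line in tracert_text.splitlines():
        fields = line.split()
        if not fields or not fields[0].isdigit():
            continue  # skip headers, blank lines, "Trace complete."
        match = ADDR_RE.search(line)
        if match:  # hops that only show "Request timed out." are skipped
            addr = match.group(1)
            hops.append((int(fields[0]), addr, classify(addr)))
    return hops


if __name__ == "__main__":
    trace = """\
  1    <1 ms    <1 ms    <1 ms  fw1.xxxxxx.localdomain [192.168.1.1]
  2    19 ms    20 ms    19 ms  100.64.0.1
  3    16 ms    20 ms    23 ms  172.16.252.90
  4    19 ms    19 ms    23 ms  undefined.hostname.localhost [206.224.65.136]
"""
    for hop, addr, label in classify_hops(trace):
        print(hop, addr, label)
```

Run against the first four hops above, hops 1 and 3 fall in RFC 1918 space, hop 2 (100.64.0.1) falls in the RFC 6598 block, and hop 4 (206.224.65.136) is the first public address.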
              preston @jimeez

              My issue is exactly the same as jimeez's. I too have tried different monitoring addresses with no change. Simply enabling the CenturyLink interface causes 100% packet loss on the Starlink WAN after 15 minutes. After a minute or so, the Starlink WAN comes back online, but then fails with 100% packet loss again every 15 minutes.

                chpalmer

                But look at the typical latency of both links. One is satellite, which inherently has higher latency, and the other is DSL, which with interleaving will generally sit around 38 ms (educated guess).

                If you do not have the second, lower-latency interface, then the system will simply stay on the only gateway it sees.

                If your Starlink interface sees a change in latency that is drastic enough, then I can see your system trying to switch to a more stable link. (I have not had the opportunity to play with Starlink yet, although we will soon at work.)

                Try this: from a command prompt, run C:\> ping -t 8.8.8.8 and let it run for an hour or so. Watch the latency and see if it changes much. If it does not, then I am probably barking up the wrong tree, but my SWAG says you will see some latency swings. Of course, take your second link down so that only the Starlink link is up.
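A rough way to summarize an hour of ping -t output instead of eyeballing it: the Python sketch below assumes the English-language Windows reply format ("Reply from ...: bytes=32 time=25ms TTL=117" and "Request timed out.") and is not tied to any pfSense tool; the function name summarize_ping is hypothetical. It reports the loss percentage, min/avg/max round-trip time, and the population standard deviation as a crude jitter figure. A wide spread between min and max would support the latency-swing theory.

```python
import re
import statistics

# Round-trip time in a Windows "Reply from ..." line: "time=25ms" or "time<1ms".
RTT_RE = re.compile(r"time[=<](\d+)ms")


def summarize_ping(lines):
    """Summarize Windows ping output lines (any iterable of strings)."""
    rtts = []
    lost = 0
    for line in lines:
        match = RTT_RE.search(line)
        if match:
            # "time<1ms" is recorded as 1 ms, a slight overestimate.
            rtts.append(int(match.group(1)))
        elif "Request timed out" in line:
            lost += 1
        # Other lines (headers, "Destination host unreachable") are ignored.
    sent = len(rtts) + lost
    summary = {
        "sent": sent,
        "lost": lost,
        "loss_pct": 100.0 * lost / sent if sent else 0.0,
    }
    if rtts:
        summary.update(
            min_ms=min(rtts),
            avg_ms=statistics.mean(rtts),
            max_ms=max(rtts),
            # Population standard deviation as a crude jitter figure.
            stdev_ms=statistics.pstdev(rtts),
        )
    return summary


if __name__ == "__main__":
    # Made-up sample lines for illustration, not real measurements.
    sample = [
        "Reply from 8.8.8.8: bytes=32 time=31ms TTL=117",
        "Request timed out.",
        "Reply from 8.8.8.8: bytes=32 time=95ms TTL=117",
    ]
    print(summarize_ping(sample))
```

If the console output were saved to a text file (a name such as ping-starlink.txt would be hypothetical), the open file object can be passed straight to summarize_ping, since it only iterates over lines.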

                  preston @chpalmer

                  @chpalmer

                  Thanks for the reply.

                  Here are stats from my Starlink for the last 24 hours. The Starlink app statistics also match the pfSense stats.

                  Clipboard01.jpg

                    preston

                    I don't want to get my hopes up, but it's been 62 minutes and I have not lost the Starlink WAN. Here is what I did today:

                    1. Deleted the CenturyLink gateway and CenturyLink interface.

                    2. Reassigned the CenturyLink interface and gateway.

                    3. Power cycled the CenturyLink modem.

                    4. Disabled the kea-dhcp6 service.

                    5. Under System > Routing > Gateways, changed the default IPv6 gateway to None.

                    I haven't added any gateway groups or failover settings yet, but so far the Starlink WAN is staying up. For now (testing), I have "Block private networks and loopback addresses" and "Block bogon networks" both checked. I also haven't set up a monitor IP or DNS server for CenturyLink (one thing at a time).

                    I really think this alone might have done the trick:

                    IPV6.jpg

                      jimeez @preston

                      @preston

                      Oh wow. No kidding? That would be amazing if this solved things. If it does, I wonder what that means in terms of why this started happening. Something with CenturyLink perhaps?

                      Also, a general question regarding the IP address of your CenturyLink WAN. I've seen this in a lot of the how-to videos I watch. Why is the IP address of the CL WAN 192.168.0.1 rather than a CL-assigned IP address?

                        preston

                        That screenshot above is showing 192.168.0.1 as the monitor IP. I was trying to make as few changes as possible to see what would break it, so I did not have a monitor IP or DNS server set.

                        I have now added a DNS server and a monitor address of 8.8.8.8 to the CL connection, and the dashboard now shows the CL IP address correctly. After doing so, I had to disable and re-enable the CL interface to get it to pull a proper IP.

                        !!! After adding the DNS server to the CL connection I lost Starlink at the 15 minute mark! D@mn! !!!

                        Maybe I'm getting closer to the answer since Starlink stayed online for several hours and only went down when I made DNS changes to the CL connection.

                          chpalmer @preston

                          Actually I appreciate you posting those numbers.. It will help me with my day job when we get our setup for a remote site we have.. ;)

                            jimeez @preston

                            @preston

                            Curious how you're making out over these last 24 hours. Planning to tackle this later this afternoon. Was hoping to see that you've maintained solid connections before embarking on a fresh config. ;-)

                              preston @jimeez

                              @jimeez

                              No luck. I thought I had it, but adding the DNS server to the CL connection broke the Starlink connection.

                              How are you running your DNS servers for the dual WAN? I am wondering if that is somehow causing Starlink to drop offline.

                                jimeez @preston

                                @preston said:

                                @jimeez

                                How are you running your DNS servers for the dual wan? I am wondering if that is somehow causing Starlink to drop offline.

                                I wish I knew how to answer that, but sadly I don't. I followed a guide a year and a half ago and it's been working ever since....until recently. I don't recall doing anything specific directly related to DNS. I do recall though thinking how simple it was to set up.

                                One question for you: are you running your CL modem in transparent bridge mode?

                                  preston @jimeez

                                  @jimeez

                                  I am running the CL modem in transparent bridge mode. My modem is the Zyxel C1100Z.

                                  I am using the DNS Forwarder. I have tried different combinations, such as 1.1.1.1 for Starlink and 8.8.8.8 for CenturyLink. I have also tried using the DNS servers supplied by Starlink and CenturyLink. I may be barking up the wrong tree with the DNS thing, but I'm at my wits' end.

                                  I too remember how easy and painless it was to set up the dual WANs, and like you, it ran fine for a long time.

                                    jimeez @preston

                                    @preston

                                    Ok. Thanks. Yep, my setup is identical.

                                      jimeez @preston

                                      @preston said:

                                      !!! After adding the DNS server to the CL connection I lost Starlink at the 15 minute mark! D@mn! !!!

                                      So I think you're on to something here with the 15 minute thing. I never really paid attention to the time intervals before but made sure to time it tonight. It's 15 minutes on the nose! Literally.

                                      What in the world could cause a dual interface setup to kill one of them due to measured/perceived packet loss every 15 minutes AND kill the NUT service? Very weird.
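To check the "15 minutes on the nose" observation over more than one cycle, the gaps between successive Starlink down events can be computed. This Python sketch takes hand-copied HH:MM:SS timestamps and assumes all events fall on the same day; the timestamps in it are hypothetical, not from anyone's logs, and looks_periodic is an illustrative helper that checks whether every gap is within a few seconds of 900 s.

```python
from datetime import datetime


def outage_intervals(down_times, fmt="%H:%M:%S"):
    """Seconds between consecutive down events.

    Assumes the timestamps are in order and all fall on the same day;
    a gap that crosses midnight would come out negative.
    """
    parsed = [datetime.strptime(t, fmt) for t in down_times]
    return [(later - earlier).total_seconds() for earlier, later in zip(parsed, parsed[1:])]


def looks_periodic(intervals, period_s=900, tolerance_s=5):
    """True if there is at least one gap and every gap is within tolerance_s of period_s."""
    return bool(intervals) and all(abs(gap - period_s) <= tolerance_s for gap in intervals)


if __name__ == "__main__":
    # Hypothetical timestamps copied from a gateway log, not real data.
    gaps = outage_intervals(["12:10:30", "12:25:30", "12:40:31"])
    print(gaps, looks_periodic(gaps))
```

With the hypothetical timestamps above, the gaps come out as 900 and 901 seconds, so looks_periodic reports True; a steady cadence like that points toward a timer rather than random satellite dropouts.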

                                        preston

                                        If anyone has any ideas, I am still working this problem.

                                        Here are my DHCP log entries from about the time I enabled the CenturyLink WAN2 (ix2) interface at 11:55 to the time the Starlink WAN1 goes offline with 100% packet loss 15 minutes later. I hope there are some 'log whisperers' out there who can help. Am I barking up the wrong tree thinking it's a DHCP issue?

                                        The correlation I see here is that at 11:55:30 the DHCP client (dhclient) binds to the CenturyLink IP with a 900-second renewal time. Exactly 900 seconds later, Starlink WAN1 goes offline with 100% packet loss. It takes Starlink WAN1 about 1-2 minutes to come back online, and then the 15-minute cycle repeats.
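The correlation described above (a dhclient bind on the CenturyLink WAN at 11:55:30 with a 900-second renewal, followed exactly 900 seconds later by Starlink going down) can be tested against a log with a short script. This Python sketch assumes each dhclient line contains an HH:MM:SS time followed by text of the form "bound to <address> -- renewal in <N> seconds", and that all events fall on the same day. The function names and the 30-second tolerance are hypothetical choices, and a match only shows a coincidence in timing, not a cause.

```python
import re
from datetime import datetime, timedelta

# Assumed dhclient message shape, preceded somewhere by an HH:MM:SS time:
#   "... 11:55:30 ... bound to 203.0.113.5 -- renewal in 900 seconds."
BOUND_RE = re.compile(
    r"(\d{2}:\d{2}:\d{2}).*bound to (\S+) -- renewal in (\d+) seconds"
)


def predicted_renewals(log_lines):
    """Return (address, bound_at, renew_at) for each bind message found."""
    results = []
    for line in log_lines:
        m = BOUND_RE.search(line)
        if m:
            bound_at = datetime.strptime(m.group(1), "%H:%M:%S")
            renew_at = bound_at + timedelta(seconds=int(m.group(3)))
            results.append((m.group(2), bound_at, renew_at))
    return results


def matches_outage(renew_at, outage_times, tolerance_s=30):
    """True if any HH:MM:SS outage time lies within tolerance_s of renew_at.

    Times are compared without dates, so all events must fall on the same day.
    """
    for t in outage_times:
        outage = datetime.strptime(t, "%H:%M:%S")
        if abs((outage - renew_at).total_seconds()) <= tolerance_s:
            return True
    return False


if __name__ == "__main__":
    # Made-up log line using a documentation address (RFC 5737).
    log = ["Mar  5 11:55:30 pfSense dhclient[4242]: bound to 203.0.113.5 -- renewal in 900 seconds."]
    for addr, bound_at, renew_at in predicted_renewals(log):
        print(addr, bound_at.time(), "->", renew_at.time(), matches_outage(renew_at, ["12:10:31"]))
```

For the made-up line above, the predicted renewal lands at 12:10:30, so an outage logged at 12:10:31 would be flagged as coinciding with the CenturyLink lease renewal.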

                                        Thank you.

                                        DHCP Log 2.txt

                                          jimeez @preston

                                          @preston

                                          I haven't given up yet. While I have had zero success getting it to work on pfSense, I figured I'd give OPNsense a try next. Planning to work on it this coming weekend. Will report back with my findings.

                                          Surely we can't be the only two having this issue.

                                            preston @jimeez

                                            @jimeez

                                            Agreed. Two people with working dual WANs that suddenly stopped working.

                                            Some kind of change happened with CenturyLink, Starlink, or pfSense.

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.