Netgate Discussion Forum

    Dual WAN Fail-over Issue - Tier 1 WAN frequently failing upon activation of the second Tier 2 WAN

    Routing and Multi WAN
    87 Posts 5 Posters 7.6k Views
    • knoppolis

      I tried disabling the DSL interface (WAN01 - Tier 2) and, wouldn't you know, the Starlink interface (WAN02 - Tier 1) starts to work without issue. Re-enable the DSL interface (WAN01 - Tier 2) and within an hour I am seeing the same issue, with packet loss shooting up to 100% on the Starlink connection.

      The DSL connection has a static IP address, but for years now I have just left the interface IPv4 Configuration Type as "DHCP" without issue. As a quick test I switched it over to "Static IPv4" along with its assigned IP address. It has now been hours with both the DSL and Starlink interfaces active and no issues. Everything is running like it was a couple of weeks back. Will continue to monitor for the rest of the day.

      For now, while I monitor, I need to sit here and think about why this appears to be the solution for me, and why it is only a recent problem.

      @jimeez or @preston do either of you have a static IP for your respective DSL connections?

      • preston @knoppolis

        @knoppolis said in Dual WAN Fail-over Issue - Tier 1 WAN frequently failing upon activation of the second Tier 2 WAN:

        @jimeez or @preston do either of you have a static IP for your respective DSL connections?

        My CenturyLink DSL connection is not static. This is a good data point though, thanks. Let us know how it does.

        • jimeez @knoppolis

          @knoppolis said in Dual WAN Fail-over Issue - Tier 1 WAN frequently failing upon activation of the second Tier 2 WAN:

          @jimeez or @preston do either of you have a static IP for your respective DSL connections?

          Also a no here.

          I am very curious to see if this holds up for you. Although, if it does, @preston's and my issue will be an even bigger mystery.

          • jimeez

            So @preston mentioned something to me in a private chat that got my wheels turning. He brought up the fact that, prior to this issue, his StarLink connection would drop out around 4 AM most days and then come right back up. Mine did this too. Like clockwork. I always thought the reason was that the StarLink unit was receiving an update and restarting or something. But now I'm wondering if that 24-hour cycle is somehow related to this problem. Only now, instead of every 24 hours, it's happening every 15 minutes.

            I went back and checked my notification logs. This 24-hour dropout was very consistent. Then on August 24th the 15-minute dropout started happening.

            • jimeez

              So I lobbed a support ticket to StarLink and referenced this thread. Their response was as follows:

              Would you be able to confirm how you currently have your health checks set up for a failover to occur? The typical recommendation we provide our enterprise customers is to relax health checks (i.e. pings, etc.) to deal with occasional connection drops from Starlink. Checking every 10 seconds & getting 5 fails in a row would be a good threshold to start with.

              Would anyone be able to tell me where to look in pfSense to find the answer to their question?
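For illustration only, Starlink's suggested rule (probe every 10 seconds, fail over after 5 failures in a row) boils down to a consecutive-failure counter. The helper below is hypothetical and is not how pfSense's gateway monitor works; my understanding is that pfSense judges a gateway on packet loss and latency averaged over a time window rather than on a run of misses. At a 10 s interval, 5 misses in a row means roughly 50 seconds of sustained loss before a switch.

```python
def should_fail_over(probe_results, required_failures=5):
    """Return True once `required_failures` probes in a row have failed.

    probe_results: iterable of booleans in send order (True = reply received).
    Hypothetical helper illustrating Starlink's suggested rule only.
    """
    consecutive = 0
    for ok in probe_results:
        # Any successful probe resets the streak of failures.
        consecutive = 0 if ok else consecutive + 1
        if consecutive >= required_failures:
            return True
    return False


# Numbers taken from Starlink's suggestion: one probe every 10 s, 5 misses.
PROBE_INTERVAL_S = 10
REQUIRED_FAILURES = 5
print(PROBE_INTERVAL_S * REQUIRED_FAILURES)  # 50 (seconds of loss before failover)
print(should_fail_over([True, False, False, True] + [False] * 5))  # True
```

Scattered single drops never trip this rule, which is the point of "relaxing" the check: only an unbroken run of failures counts.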

              • knoppolis @jimeez

                @jimeez

                I assume they are talking about System > Routing > Gateways and editing the settings for your Starlink gateway.

                Here is what I currently have for my Starlink gateway:
                [Screenshot: knoppolis's Starlink gateway settings]

                I have been playing around with the "Packet Loss Thresholds" to keep the failover from happening, currently with a low of 30% and a high of 60%. I also played around with the other intervals, but it really made no difference.

                I used this reddit post as a starting point/reference for these adjustments.
                https://www.reddit.com/r/PFSENSE/comments/1eg0wpk/starlink_monitoring_in_pfsense/
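The low/high packet loss thresholds act as two cut-offs on the averaged loss percentage. The hypothetical helper below models that as I understand it (above low: warning, above high: down); whether pfSense compares with `>` or `>=` is an assumption here, so treat it as a sketch rather than pfSense's actual logic.

```python
def gateway_state(loss_pct, low=30.0, high=60.0):
    """Classify a gateway from its averaged packet loss (percent).

    Assumed semantics: loss above `high` marks the gateway down,
    loss above `low` flags a warning, anything else is online.
    """
    if not low < high:
        raise ValueError("low threshold must be below high threshold")
    if loss_pct > high:
        return "down"
    if loss_pct > low:
        return "warning"
    return "online"


print(gateway_state(45))   # warning
print(gateway_state(100))  # down
```

Under this model, raising the thresholds only hides moderate loss: the 100% spikes reported in this thread would still mark the gateway down whatever the low/high values are.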

                • knoppolis

                  Update on the status/health of my setup. Everything has been running fine for the last 9 hours.

                  Here is what the packet loss situation looked like for the last 48 hours. No issues after setting a static IP for my DSL connection.

                  [Screenshot (2024-09-12): Starlink packet loss over the last 48 hours]

                  • jimeez @knoppolis

                    @knoppolis said in Dual WAN Fail-over Issue - Tier 1 WAN frequently failing upon activation of the second Tier 2 WAN:

                    Update on the status/health of my setup. Everything has been running fine for the last 9 hours.

                    Here is what the packet loss situation looked like for the last 48 hours. No issues after setting a static IP for my DSL connection.

                    Interesting. So how does one set a static IP for their DSL connection? Doesn't the provider set that?

                    • knoppolis @jimeez

                      @jimeez my DSL service came with a static IP address. In pfSense, go to the interface in question and change the IPv4 Configuration Type to "Static IPv4". Below that you will then be able to set the IPv4 Address to said static address.

                      [Screenshot: interface settings with IPv4 Configuration Type set to Static IPv4]
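Switching to "Static IPv4" means typing the address, prefix length, and upstream gateway in by hand, so a quick pre-flight check can catch typos before the WAN drops. The helper below is a hypothetical sketch using Python's standard `ipaddress` module; the addresses are RFC 5737 documentation examples, not anyone's real settings, and passing the check says nothing about whether the ISP actually assigned that address to you.

```python
import ipaddress


def check_static_ipv4(address_cidr, gateway):
    """Sanity-check a static IPv4 WAN configuration before applying it.

    address_cidr: interface address with prefix, e.g. "203.0.113.10/24"
    gateway: upstream gateway address supplied by the ISP
    Returns a list of problems (empty if none were found).
    Raises ValueError if either string is not a valid IP address.
    """
    problems = []
    iface = ipaddress.ip_interface(address_cidr)
    gw = ipaddress.ip_address(gateway)
    if iface.version != 4 or gw.version != 4:
        return ["both values must be IPv4"]
    net = iface.network
    # The gateway must be reachable on-link, i.e. inside the interface subnet.
    if gw not in net:
        problems.append(f"gateway {gw} is outside {net}")
    if iface.ip == gw:
        problems.append("interface address equals the gateway")
    # /31 and /32 have no separate network/broadcast addresses.
    if net.prefixlen < 31 and iface.ip in (net.network_address, net.broadcast_address):
        problems.append("address is the network or broadcast address")
    return problems


print(check_static_ipv4("203.0.113.10/24", "203.0.113.1"))   # []
print(check_static_ipv4("203.0.113.10/24", "198.51.100.1"))  # ['gateway 198.51.100.1 is outside 203.0.113.0/24']
```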

                      • jimeez @knoppolis

                        @knoppolis said in Dual WAN Fail-over Issue - Tier 1 WAN frequently failing upon activation of the second Tier 2 WAN:

                        @jimeez my DSL service came with a static IP address...

                        I guess that's what I'm asking. You can't just go and assign a static IP to an interface if it's not set up that way with your service provider, can you?

                        • knoppolis @jimeez

                          @jimeez Ah, sorry. You can't just give yourself whatever IP address you want, but as a test you could try entering the IP your provider has currently assigned you, to see if you get the same results. Honestly, I'm not sure that would work. The big catch is that as soon as the assigned IP changed, your DSL connection would go down until you switched the interface back to DHCP in pfSense.

                          • jimeez

                            Surprisingly enough, I got a (for now) positive response from StarLink. They are telling me they are going to look into this. Their 1st level support staff asked me some questions which I answered. I then got a reply thanking me for the input and saying that they would dig into it. I was NOT expecting that response. Will see what happens.

                            • preston @jimeez

                              @jimeez

                              That's great news. I hope they get back to you with something.

                              When I contacted them (at the beginning of all this), they thought my original Gen 1 circular dish might be causing the problems. They sent me a new Gen 3 dish and router, but with the same results.

                              • jimeez @preston

                                @preston

                                Turns out their response was one that had already been bounced around here and on Reddit:

                                While I do not have any exact guidance for how to configure this specific router, the probe interval does seem very strict, as it is set to check every 500 milliseconds (0.5 seconds) compared to the general recommendation of checking every 10 seconds (10000 milliseconds). The frequent failovers may be improved if you relax these health checks to deal with the occasional drops in service that come with a satellite internet service.

                                I suspected this would not work, but I did it anyway so I could report back to them with factual info. Unfortunately, it did not fix it. Every 15 minutes, like clockwork, to the second, the StarLink interface fails due to high packet loss and is eventually marked offline, even though it is not. After a bit it comes back up, then fails again exactly 15 minutes later. Turn off the second interface and everything works fine. The weirdest, most frustrating thing.

                                A couple more questions for you regarding your config. There has to be something here that will eventually lead to an answer.

                                • Do you use pfBlocker?
                                • You already confirmed that you don't use NUT, but have you noticed any other services that fail when you activate the DSL interface like NUT does for me?
                                • Assuming you have some port forwarding configured, what do you use for the Dest. Address? Individual interfaces? Any? Perhaps something else?

                                The NUT service failing really has me scratching my head, and I believe it must be a clue to what's going on. Why would that service fail immediately upon activation of the second (DSL) interface? It never used to. Only after August 22nd...
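One way to pin down the "every 15 minutes, to the second" pattern (or the older 24-hour one) is to pull the failure timestamps out of the gateway alarm notifications and look at the gaps between them. The helper names, the timestamp format, and the sample times below are made up for illustration; adjust `fmt` to whatever your log lines actually contain.

```python
from datetime import datetime


def failure_intervals(timestamps, fmt="%Y-%m-%d %H:%M:%S"):
    """Return the seconds between consecutive failure events.

    timestamps: strings copied from alarm/notification lines, parsed with
    `fmt` (an assumed format; change it to match your own logs).
    """
    times = sorted(datetime.strptime(t, fmt) for t in timestamps)
    return [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]


def is_periodic(intervals, period_s=900, tolerance_s=2):
    """True if every gap is within tolerance_s of period_s (900 s = 15 min)."""
    return bool(intervals) and all(abs(gap - period_s) <= tolerance_s for gap in intervals)


# Made-up sample times, not real log data.
sample = ["2024-09-12 10:00:03", "2024-09-12 10:15:03", "2024-09-12 10:30:04"]
gaps = failure_intervals(sample)
print(gaps)               # [900.0, 901.0]
print(is_periodic(gaps))  # True
```

For the older daily dropouts, the same check works with `period_s=86400` and a looser tolerance.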

                                • preston @jimeez

                                  @jimeez

                                  I also tried changing the probe interval. No help.

                                  • I am not running pfBlocker.

                                  • The services that fail (most of the time, but not always) when the Starlink goes offline are EITHER the kea-dhcp4 or the kea-dhcp6 server, which is what sent me down the DHCP rabbit hole.

                                  • The only 'extra' package that I have running is Tailscale.

                                  • I applied the recommended pfSense 24.03 system patches (through the Netgate System Patch package) last week with no help.
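A simplified model may help explain why changing the probe interval alone made little difference here. If loss is reported as the fraction of probes lost over a fixed averaging window (in pfSense I believe this is the gateway's "Time Period" setting, but treat that and the 60000 ms window below as assumptions), then a given outage produces roughly the same loss percentage whether probes go out every 0.5 s or every 10 s; a coarse interval mainly rounds short outages up. This is a hypothetical sketch, not dpinger's actual algorithm.

```python
def window_loss_pct(probe_interval_ms, window_ms, outage_ms):
    """Percentage of probes lost in one averaging window containing one outage.

    Simplified model (an assumption): probes are sent every probe_interval_ms
    starting at the window's beginning, the outage also starts there, every
    probe sent during the outage is lost, and every other probe succeeds.
    """
    if probe_interval_ms <= 0 or window_ms < probe_interval_ms or outage_ms < 0:
        raise ValueError("need a positive interval, a window holding at least one probe, and outage >= 0")
    sent = window_ms // probe_interval_ms
    # Probes sent at t = 0, I, 2I, ... fall inside the outage while t < outage (ceiling division).
    lost = min(sent, -(-min(outage_ms, window_ms) // probe_interval_ms))
    return 100.0 * lost / sent


# 30 s outage inside an assumed 60 s window:
print(window_loss_pct(500, 60000, 30000))    # 50.0 (0.5 s probes)
print(window_loss_pct(10000, 60000, 30000))  # 50.0 (10 s probes)
```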

                                  • preston @jimeez

                                    @jimeez

                                    Do you use kea-dhcp, and if so does it fail for you?

                                    • jimeez @preston

                                      @preston said in Dual WAN Fail-over Issue - Tier 1 WAN frequently failing upon activation of the second Tier 2 WAN:

                                      Do you use kea-dhcp, and if so does it fail for you?

                                      I do not use it. I had it disabled previously and didn't even know it until someone suggested that as a fix.

                                      Last night I got my hands on a 4-port EdgeRouter. I did some reading and think I have enough knowledge to test it out in a dual-WAN scenario. Hope to get to it this weekend and see what happens. This should give us some insight into where the problem lies: CenturyLink, StarLink, or pfSense.

                                      • preston @jimeez

                                        @jimeez

                                        I'm leaning toward it being a pfSense issue. Starlink works fine by itself, CenturyLink works fine by itself. Some setting has to be wrong, or some bug has cropped up.

                                        Side note, my Starlink has been very reliable. I don't really need the CenturyLink anymore, but I want the failover option (it's the whole reason I went with Netgate/pfSense).

                                        • jimeez @preston

                                          @preston said in Dual WAN Fail-over Issue - Tier 1 WAN frequently failing upon activation of the second Tier 2 WAN:

                                          @jimeez

                                          I'm leaning toward it being a pfSense issue. Starlink works fine by itself, CenturyLink works fine by itself. Some setting has to be wrong, or some bug has cropped up.

                                          I do tend to agree, although both of us stated that we made no changes to our hardware or configurations prior to the onset of this issue. I mean, I went about a year and a half with no issues using the same config. So who knows.

                                          Side note, my Starlink has been very reliable. I don't really need the CenturyLink anymore, but I want the failover option (it's the whole reason I went with Netgate/pfSense).

                                          I am in the exact same boat. With the exception of VERY heavy rain and snow storms StarLink has been rock solid. Which was not the case early on. But SL works great now. I also use the DSL line for a Dynamic DNS client.

                                          Hopefully the EdgeRouter setup gives us more insight. Will reply as soon as I get a chance to test it.

                                          • preston @jimeez

                                            @jimeez said in Dual WAN Fail-over Issue - Tier 1 WAN frequently failing upon activation of the second Tier 2 WAN:

                                            I am in the exact same boat. With the exception of VERY heavy rain and snow storms StarLink has been rock solid. Which was not the case early on. But SL works great now. I also use the DSL line for a Dynamic DNS client.

                                            Wow, the similarities continue to amaze me.

                                            Only heavy rain seems to take SL down here, and then just for a minute or so. I use (or at least did, before all this started) the CenturyLink WAN for all the IoT devices around here, a DynDNS client, and an OpenVPN server.

                                            I eliminated all of that for this troubleshooting.

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.