Netgate Discussion Forum

    Gateway drops and never comes back

    Routing and Multi WAN
    42 Posts 8 Posters 10.7k Views
    • S
      scottmsilver
      last edited by

      Hi there -

      I'm running into a new problem with 2.5.x. One of my gateways (Comcast Internet, DHCP-acquired address) seems to disappear momentarily from dpinger's perspective. My Gateway / Gateway Group configuration isn't super interesting, but I can provide it if that helps. The Internet connection does come back, but pfSense keeps the gateway marked down, thinking there is packet loss. It's as if it forgot to check. I checked, and dpinger for that interface was still running. My other gateway is a manually configured line and doesn't have these issues.

      In the Gateway logs I can see a few entries like this at failure time, and then no more:

      WAN2ONOPT1_DHCP 1.1.1.1: sendto error: 50
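For anyone decoding that number: pfSense runs on FreeBSD, where errno 50 is ENETDOWN ("Network is down"). Below is a minimal sketch for turning these log lines into something readable. The table is hand-copied from FreeBSD's errno list, because Python's own `errno` module reflects the OS the script runs on and the numbers differ on Linux. `decode_sendto_error` and the regex are illustrative names of mine, not part of dpinger or pfSense.

```python
import re

# Subset of FreeBSD errno values. Hard-coded on purpose: Python's errno
# module reports the host OS numbering, which is different on Linux.
FREEBSD_ERRNO = {
    49: ("EADDRNOTAVAIL", "Can't assign requested address"),
    50: ("ENETDOWN", "Network is down"),
    51: ("ENETUNREACH", "Network is unreachable"),
    55: ("ENOBUFS", "No buffer space available"),
    64: ("EHOSTDOWN", "Host is down"),
    65: ("EHOSTUNREACH", "No route to host"),
}

# Matches lines shaped like the one quoted above:
#   WAN2ONOPT1_DHCP 1.1.1.1: sendto error: 50
LOG_RE = re.compile(
    r"^(?P<gateway>\S+) (?P<target>\S+): sendto error: (?P<code>\d+)$"
)


def decode_sendto_error(line):
    """Return a dict describing a dpinger sendto error line, or None."""
    match = LOG_RE.match(line.strip())
    if match is None:
        return None
    code = int(match.group("code"))
    name, text = FREEBSD_ERRNO.get(code, ("UNKNOWN", "not in this table"))
    return {
        "gateway": match.group("gateway"),
        "target": match.group("target"),
        "errno": code,
        "name": name,
        "text": text,
    }


if __name__ == "__main__":
    print(decode_sendto_error("WAN2ONOPT1_DHCP 1.1.1.1: sendto error: 50"))
```

ENETDOWN fits the symptom: the kernel refuses the send because the interface the socket was tied to is gone.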
      

      However, if I log in to pfSense and run dpinger manually, I can verify that pfSense can send packets through this network successfully.

      e.g. the following succeeds

      dpinger -f -B 10.1.10.229 1.1.1.1

      Finally, if I force down the Gateway and then bring it back up, that seems to bring things back up to normal.

      Would appreciate any help. Happy to provide more configuration details.

      Thanks,
      Scott

      • S
        SypsG
        last edited by

        I have the same problem with my customer but with a different version of pfSense, and our Comcast is not on DHCP. We have a monitor up to let us know when Comcast has issues, so when it stops having issues we can manually down/up the gateway as you are doing.

        Is this a very active community? The first few posts I've looked at don't have any responses.

        Mainly commenting to boost your post's visibility,
        George

        • S
          scottmsilver @SypsG
          last edited by

          @sypsg I did eventually find what seems to be a workaround, I think. But it hasn't been long enough to know for sure. I think the idea is that some networks don't like certain sized ping (ICMP) payloads.

          It came from this kind of bug.

          I changed mine from non-zero and it hasn't dropped in about 2 weeks.

            • S
              scottmsilver @scottmsilver
              last edited by

              @scottmsilver A more correct version.

              @scottmsilver @SypsG Ok, I think I worked this out. My theories were wrong.

              Basically what is going on is pfSense is forgetting to reset the gateway monitor when the Comcast interface comes back up since it comes back up on the same IP address it was on before.

              Here are the details:

              • The Comcast interface goes away, so pfSense loses one of its WANs.
              • When Comcast comes back pfSense requests a new IP via DHCP.
              • Subsequently there is code that is supposed to run when a WAN interface gets a new IP.
              • This code is guarded by roughly "isSameAddress()" and since Comcast issues the same address, pfSense does not run this code.
              • This code, in particular, resets the gateway monitor. Since pfSense does not reset it, the old instance of the gateway monitor (dpinger) will continue to run. However, it can never send out any new ICMP/ping messages because the socket refers to a dead interface and not the new one so no pings come back.
              • Thus dpinger never thinks the interface comes back.
              • So why does running dpinger from the command line work, even when the gateway monitor instance doesn't? When you run dpinger from the command line, it gets a working socket for the new interface.
              • The "quick but wrong" fix is to make this code on line 204 always run. See that I OR'd in 1 into the conditional below.
              if (/*added*/ 1 || !is_ipaddr($oldip) || ($curwanip != $oldip) ||
                  (!is_ipaddrv4($config['interfaces'][$interface]['ipaddr']) && ($config['interfaces'][$interface]['ipaddr'] != 'dhcp'))) {
              	/*
              	 * Some services (e.g. dyndns, see ticket #4066) depend on
              	 * filter_configure() to be called before, otherwise pass out
              	 * route-to rules have the old ip set in 'from' and connections
              	 * do not go through the correct link
              	 */
              	filter_configure_sync();
              
              	/* reconfigure our gateway monitor, dpinger results need to be 
              	 * available when configuring the default gateway */
              	setup_gateways_monitor();
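To make the guard easier to reason about, here is a rough Python mirror of the PHP conditional above. It is only a sketch: `should_reconfigure` and its `force` flag are names I made up, and the two `is_ipaddr*` helpers are approximated with Python's `ipaddress` module, so they may not match pfSense's own helpers in every edge case.

```python
import ipaddress


def is_ipaddr(value):
    """Approximation of pfSense's is_ipaddr(): any valid IPv4/IPv6 literal."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def is_ipaddrv4(value):
    """Approximation of pfSense's is_ipaddrv4(): a valid IPv4 literal."""
    try:
        return ipaddress.ip_address(value).version == 4
    except ValueError:
        return False


def should_reconfigure(old_ip, cur_ip, configured_ipaddr, force=False):
    """Mirror of the guard around filter_configure_sync() and
    setup_gateways_monitor(). force=True models the "1 ||" workaround."""
    return (
        force
        or not is_ipaddr(old_ip)
        or cur_ip != old_ip
        or (not is_ipaddrv4(configured_ipaddr) and configured_ipaddr != "dhcp")
    )


if __name__ == "__main__":
    # DHCP WAN comes back with the same lease: the guard is False, so the
    # gateway monitor is never restarted.
    print(should_reconfigure("203.0.113.7", "203.0.113.7", "dhcp"))
    # Same situation with the workaround applied.
    print(should_reconfigure("203.0.113.7", "203.0.113.7", "dhcp", force=True))
```

With a DHCP interface and an unchanged address, every clause is false, which is exactly the case where the monitor restart gets skipped.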
              
              • S
                SteveITS Galactic Empire @scottmsilver
                last edited by

                @scottmsilver If you can reproduce you can enter a bug report at redmine.pfsense.org.

                A while ago we had trouble with a client with multi-WAN which wouldn't fail back, and we had to call the gateway page (eventually, via cron but I think we could manually go to the System/Gateways page) and it would realize it was up again. That was resolved I want to say about a year ago?

                An alternate workaround would be to disable the gateway monitoring which assumes the connection is always up. (checkbox when editing a gateway)

                Pre-2.7.2/23.09: Only install packages for your version, or risk breaking it. Select your branch in System/Update/Update Settings.
                When upgrading, allow 10-15 minutes to restart, or more depending on packages and device speed.
                Upvote 👍 helpful posts!

                • S
                  scottmsilver @SteveITS
                  last edited by

                  @steveits I think you were likely running into the bug I found. I agree your workarounds are reasonable (though I'm not excited about turning off gateway monitoring...:-)). The fix I suggested does address this problem, I think at the root cause. There may be other problems with my solution though, so I'll figure out how to file a bug.

                  • S
                    SteveITS Galactic Empire @scottmsilver
                    last edited by SteveITS

                    @scottmsilver said in Gateway drops and never comes back:

                    not excited about turning off gateway monitoring

                    Yeah it's not ideal but if there's only one WAN it kinda doesn't matter. It's not a great workaround with multiple WANs. See if opening the Gateway page lets pfSense rediscover the gateway is up. IIRC I didn't even have to edit anything, just view the page. (which is why we ended up

                    On many DHCP connections the IP isn't going to change for short disconnections so it sounds like the logic is faulty.

                    I see you found the existing redmine I just found. I tried finding my old forum topic but couldn't in a quick search.

                    • S
                      scottmsilver @SteveITS
                      last edited by

                      @steveits Thanks. I also found the bug that they were trying to fix that created this (https://redmine.pfsense.org/issues/11142?tab=history)

                      • ?
                        A Former User @scottmsilver
                        last edited by

                        I might also be running into this problem.

                        I have 4 WANS, with one dynamic IP and the other 3 of them having a fixed IP. Once in a while I notice that pfsense thinks one of them is down ("Offline, Packetloss"). It is not; I also monitor them from the outside with Uptime Robot, so I do know when they have been up/down in case I need to do something about it.

                        So I go to System -> Routing -> Gateways, edit the gateway, remove the monitor IP, save changes, and it comes back up. Edit the gateway again, configure the same monitor IP it had, and now it will stay up.

                        I would say that this does not happen to the WAN that has the dynamic IP, but I am not so sure. I will keep an eye on this.

                        Next time I will go the disable/enable gateway route to see if it also works.

                        • M
                          MindTwist @A Former User
                          last edited by

                          That was me above ^^

                          • M
                            MindTwist
                            last edited by

                            So I had this happen to me again tonight, on two different pfSense installs, both with 2 WANs, where the second WAN has a fixed IP and is on the same ISP. They both went down tonight at the same time, and they both came back 13 minutes later. But on pfSense they remained offline.

                            temp1.jpg
                            temp2.jpg

                            I tried to disable/enable the gateway as @scottmsilver did above, but I was unable to do so, since they are part of a gateway group.

                            So I did as usual: removed the monitor IP, so it would use my own router as the monitor IP. A few seconds later the gateway was back up. Then I reconfigured the same monitor IP I had before.

                            temp3.jpg
                            temp4.jpg

                            • R
                              rune-san
                              last edited by

                              I am having a similar (same?) issue on 22.01 as well as 21.05.1 and 21.05.2 on an SG-2100. I have two gateways, a bog standard configuration composed of a DHCP WAN interface gateway, as well as an OVPN Gateway on a virtual interface. The WAN interface represents a remote cellular connection, and as you might expect, it isn't that stable. I have a gateway monitor applied pinging Cloudflare DNS Servers, and this works until the first time the Gateway goes down. At that point, the Gateway sticks in "Pending, Gathering Data" in the Gateway Group. Just as @scottmsilver points out, in the logs there'll be entries showing sendto errors for a couple of tries, and then nothing more. The Gateway is forever in pending.

                              For me, the fix is simpler. If I go to System -> Routing -> Edit the WAN_DHCP Gateway, then simply scroll to the bottom and click Save without changing anything on the page, and finally Apply Changes, the Gateway immediately comes Online. In the logs, dpinger immediately logs the configuration of the monitor:

                              send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 1 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr 1.1.1.1 bind_addr <ipaddress> identifier "WAN_DHCP "
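In case it helps anyone comparing monitor settings across boxes, here is a small sketch that splits that dpinger startup line into key/value pairs. It assumes the line is always alternating `name value` tokens, as in the sample above; `parse_dpinger_settings` and `to_ms` are hypothetical helper names, not anything shipped with dpinger or pfSense.

```python
import shlex

# The startup line quoted above, with the bind address left as the
# placeholder it was posted with.
SAMPLE = (
    'send_interval 500ms loss_interval 2000ms time_period 60000ms '
    'report_interval 0ms data_len 1 alert_interval 1000ms '
    'latency_alarm 500ms loss_alarm 20% dest_addr 1.1.1.1 '
    'bind_addr <ipaddress> identifier "WAN_DHCP "'
)


def parse_dpinger_settings(line):
    """Split an alternating 'name value' line into a dict.

    shlex keeps the quoted identifier (including its trailing space)
    together as one token.
    """
    tokens = shlex.split(line)
    if len(tokens) % 2 != 0:
        raise ValueError("expected alternating name/value tokens")
    return dict(zip(tokens[0::2], tokens[1::2]))


def to_ms(value):
    """Convert a value such as '500ms' to an integer number of milliseconds."""
    if not value.endswith("ms"):
        raise ValueError("not a millisecond value: %r" % value)
    return int(value[:-2])


if __name__ == "__main__":
    settings = parse_dpinger_settings(SAMPLE)
    for name, value in settings.items():
        print(name, "=", repr(value))
```

With these numbers, `loss_interval` is four times `send_interval`.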

                              From there, the Gateway will stay up again until the cellular link is lost, then it will go back to pending. It just seems like dpinger gives up monitoring the new WAN interface every time DHCP applies a new lease on link restoration.
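Going by @scottmsilver's explanation upthread (the running monitor keeps a socket tied to the interface that went away), the general pattern a long-running prober needs is to throw the socket away and rebuild it when a send fails. The sketch below only illustrates that idea. It is not how dpinger is written (dpinger is a C program and the pfSense fix works by restarting the monitor), UDP stands in for ICMP so it runs without root, and `ReopeningSender` is a made-up name.

```python
import socket


class ReopeningSender:
    """Sends datagrams and rebuilds its socket after a failed send."""

    def __init__(self, bind_addr, dest, make_socket=None):
        self.bind_addr = bind_addr      # local address to bind to
        self.dest = dest                # (host, port) tuple to send to
        self._make_socket = make_socket or self._default_socket
        self._sock = None
        self.reopen_count = 0           # how many sockets were discarded

    def _default_socket(self):
        # Assumption: a plain UDP socket bound to the WAN address.
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.bind((self.bind_addr, 0))
        return sock

    def send(self, payload):
        """Try one send. Returns True on success, False on failure.

        On failure the socket is closed and dropped, so the next call
        creates a fresh one against whatever interface exists by then.
        """
        if self._sock is None:
            self._sock = self._make_socket()
        try:
            self._sock.sendto(payload, self.dest)
            return True
        except OSError:
            self._sock.close()
            self._sock = None
            self.reopen_count += 1
            return False
```

The `make_socket` hook is there so the behaviour can be exercised with a fake socket instead of by pulling a real cable.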

                              • P
                                pete35 @rune-san
                                last edited by

                                I've got the same issue here: unstable cellular connection, gateway goes to pending, saving the gateway again makes it work again. Now I've changed it from DHCP to static, will see if this will be better...

                                • S
                                  scottmsilver @pete35
                                  last edited by scottmsilver

                                  @pete35 @rune-san I am sorry you are having these experiences. I just want to point you up thread to a fix I posted that you can make. It's not an elegant one, but for most people it will probably fix things. See "quick but wrong" fix above.

                                  • R
                                    rune-san @scottmsilver
                                    last edited by

                                    @scottmsilver thanks for that. I had seen it while making my post and I can confirm it does the job on my side too. Was searching to see if you had opened a bug on redmine for this (I couldn't find one). If you had not, I was going to so that there's at least a chance this can get fixed in a future revision.

                                    • S
                                      scottmsilver @rune-san
                                      last edited by

                                      @rune-san Yeah. I did and they are fixing it. It looks like they have it targeted now for 2.7.0.

                                      • P
                                        pete35 @scottmsilver
                                        last edited by

                                        @jimp
                                        I've got some systems with multiple WANs on unstable cellular connections, with GW groups.

                                        Is there a chance to get this fix or changeset

                                        https://redmine.pfsense.org/projects/pfsense/repository/1/revisions/ec73bb89489d830ec21c4e04ffa3ec401791b55d/diff

                                        for 2.7 as a patch for 2.6?

                                        • S
                                          SteveITS Galactic Empire @pete35
                                          last edited by

                                          @pete35 In System Patches, Add New Patch and use the ID on that diff page (ec73bb89489d830ec21c4e04ffa3ec401791b55d). The patches just apply the diff to the files on disk.

                                          • S
                                            scottmsilver @SteveITS
                                            last edited by

                                            @steveits That's pretty cool. I didn't know about that!

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.