Netgate Discussion Forum

    25.07 - no default gateway being set if default route is set to a gateway group and the Tier 1 member interface is down

    Routing and Multi WAN
    • dennypage @luckman212

      @luckman212 said in 25.07 RC - no default gateway being set if default route is set to a gateway group and the Tier 1 member interface is down:

      People (or IoT crap) often use 8.8.8.8, 8.8.4.4 etc as hardcoded DNS servers and so I don't want to statically route those out of either WAN.

      Yeah, I have a lot of those as well. To address this, and prevent devices from bypassing the host overrides in the DNS resolver, I redirect all external DNS requests on my internal subnets to the firewall using port forwarding:

      Screenshot 2025-08-06 at 18.28.17.png
      Screenshot 2025-08-06 at 18.32.26.png

      • luckman212 LAYER 8 @dennypage

        That's a smart trick, but it makes it impossible to use or test any external DNS servers, which is something I need to be able to do for work. It also won't work for DoT/DoH.

        • dennypage @luckman212

          @luckman212 said in 25.07 RC - no default gateway being set if default route is set to a gateway group and the Tier 1 member interface is down:

          That's a smart trick, but it makes it impossible to use or test any external DNS servers, which is something I need to be able to do for work. It also won't work for DoT/DoH.

          The rule above is the device network, which is where the majority of the IoT devices are. The LAN rule looks like this:

          Screenshot 2025-08-07 at 05.18.48.png

          host_admin is an alias list of admin hosts, such as my workstation, that are permitted to make direct DNS enquiries outside the network as needed.

          Yep, can't stop DoT. But not a lot of IoT devices using that yet. 😊
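A rough sketch of the redirect logic described above, for illustration only. The screenshots are not reproduced here, so the addresses, the contents of the host_admin alias, and the exact match conditions are all assumptions rather than the actual rules:

```python
import ipaddress

# Hypothetical values; the real rules live in the screenshots, not here.
FIREWALL_IP = ipaddress.ip_address("192.168.1.1")       # assumed LAN address
HOST_ADMIN = {ipaddress.ip_address("192.168.1.10")}     # assumed alias contents


def dns_destination(src, dst, dport):
    """Return the address a packet is delivered to after the redirect.

    Port 53 traffic to anything other than the firewall is redirected to
    the firewall, unless the source is in the host_admin alias.
    """
    src = ipaddress.ip_address(src)
    dst = ipaddress.ip_address(dst)
    if dport != 53 or dst == FIREWALL_IP or src in HOST_ADMIN:
        return dst            # left alone
    return FIREWALL_IP        # redirected to the local resolver
```

In this sketch a query on port 853 (DoT) passes through untouched, which matches the limitation acknowledged above.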

          • luckman212 LAYER 8 @dennypage

            @dennypage Ah, indeed - that's a nice way to handle it

            • stephenw10 Netgate Administrator

              Mmm, this still seems painful but I think we need to accept it's not going to change at least in the short term. This must be solvable but the number of interacting pieces here makes it non-trivial!

              • stephenw10 Netgate Administrator

                Ok, here's a hacky workaround that works for me and that you might try.

                Add a 3rd dummy gateway that always remains up to provide a default route. Add that to the failover group as some high tier.

                So in my case I added the LAN interface as a gateway on LAN. It's local so always up and doesn't require a static route. It takes a few loops to come back up but does end up with the tier 2 gateway as default.

                So:

                [25.07-RELEASE][root@m470-3.stevew.lan]/root: netstat -rn4
                Routing tables
                
                Internet:
                Destination        Gateway            Flags         Netif Expire
                0.0.0.0            172.21.16.1        UGS            igb0
                10.0.5.1           link#14            UHS             lo0
                10.0.5.128         link#20            UH           pppoe0
                127.0.0.1          link#14            UH              lo0
                172.21.16.0/24     link#5             U              igb0
                172.21.16.1        link#5             UHS            igb0
                172.21.16.182      link#14            UHS             lo0
                192.168.182.0/24   link#6             U              igb1
                192.168.182.1      link#14            UHS             lo0
                

                Before failover:

                [25.07-RELEASE][root@m470-3.stevew.lan]/root: pfSsh.php playback gatewaystatus
                Name             Monitor        Source             Delay   StdDev  Loss  Status  Substatus
                LAN_GW           192.168.182.1  192.168.182.1    0.059ms   0.02ms  0.0%  online       none
                PPPOE_WAN_PPPOE  1.1.1.1        10.0.5.1         5.694ms  0.199ms  0.0%  online       none
                WAN_DHCP         1.0.0.1        172.21.16.182    6.011ms   0.15ms  0.0%  online       none
                

                Immediately after disconnecting igb0, the DHCP WAN:

                [25.07-RELEASE][root@m470-3.stevew.lan]/root: pfSsh.php playback gatewaystatus
                Name             Monitor  Source      Delay  StdDev  Loss  Status  Substatus
                PPPOE_WAN_PPPOE  1.1.1.1  10.0.5.1      0ms     0ms  100%    down   highloss
                

                After a few restart loops:

                [25.07-RELEASE][root@m470-3.stevew.lan]/root: pfSsh.php playback gatewaystatus
                Name             Monitor        Source             Delay   StdDev  Loss  Status  Substatus
                LAN_GW           192.168.182.1  192.168.182.1    0.056ms  0.016ms  0.0%  online       none
                PPPOE_WAN_PPPOE  1.1.1.1        10.0.5.1         7.242ms  0.164ms  0.0%  online       none
                

                Might be able to improve that behaviour....
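To make the effect of the dummy gateway concrete, here is a small sketch that parses the `gatewaystatus` output shown above and picks the online gateway with the lowest tier number. The tier assignments and the selection rule are assumptions for illustration (inferred from the description of the workaround), not pfSense's actual failover code:

```python
# Assumed tiers: DHCP WAN is Tier 1, PPPoE is Tier 2, the dummy LAN gateway is Tier 3.
TIERS = {"WAN_DHCP": 1, "PPPOE_WAN_PPPOE": 2, "LAN_GW": 3}


def parse_gateway_status(text):
    """Parse `pfSsh.php playback gatewaystatus` output into {name: status}."""
    statuses = {}
    for line in text.strip().splitlines()[1:]:      # skip the header row
        fields = line.split()
        if len(fields) >= 2:
            # Columns: Name Monitor Source Delay StdDev Loss Status Substatus
            statuses[fields[0]] = fields[-2]
    return statuses


def pick_default(tiers, statuses):
    """Return the online gateway with the lowest tier number, or None."""
    online = [gw for gw in tiers if statuses.get(gw) == "online"]
    return min(online, key=lambda gw: tiers[gw]) if online else None


after_loops = """
Name             Monitor        Source             Delay   StdDev  Loss  Status  Substatus
LAN_GW           192.168.182.1  192.168.182.1    0.056ms  0.016ms  0.0%  online       none
PPPOE_WAN_PPPOE  1.1.1.1        10.0.5.1         7.242ms  0.164ms  0.0%  online       none
"""
print(pick_default(TIERS, parse_gateway_status(after_loops)))  # PPPOE_WAN_PPPOE
```

Fed the "immediately after disconnecting" output instead, the same function returns None, since the only listed gateway is down.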

                • marcosm Netgate

                  I say "detached" because that's what the system log says when I disconnect the interface on the VM - it results in the interface being "UP" with a status of "no carrier".

                  Let's keep in focus the following: what exactly is the problem that needs to be solved that necessitates avoiding a route? The checkbox in question removes the static route but I don't see much difference in the traffic being routed by the OS or being routed by pf. One way or another the traffic has to go out the intended interface. I'm not convinced that a pf-only routing solution is necessary.

                  • luckman212 LAYER 8 @marcosm

                    @stephenw10 Interesting workaround you posted above, I will try it!

                    @marcosm To answer the question, "what exactly is the problem that needs to be solved that necessitates avoiding a route?", my answer would be:

                    Adding a static route to a monitor IP can (not will) cause 2 main problems:

                    Problem 1

                    Users who try to access a service (DNS, HTTP etc) hosted on that IP will always use that one specific gateway. This gateway might be:

                    1. slow
                    2. expensive
                    3. both 1 & 2
                    4. administratively down or limited (e.g. 4G with data cap)
                    5. blocked at the far side by firewall rules

                    People using the network, who are likely unaware of such a configuration, will not understand why certain things are slow or broken, and simply complain. These users might be business users or, worse, family (wife, kids etc).

                    Problem 2

                    As a network administrator, having such a static route in place makes troubleshooting certain things difficult. For example, using 8.8.8.8 as a monitor IP means that you can't perform DNS lookups to Google DNS without adding more layers of complexity to your setup such as static LAN IPs and firewall rules to redirect DNS queries (as mentioned in the clever solution by Denny above).

                    One simple example:

                    • WAN2 is a backup connection (LTE, metered) with monitor IP 8.8.8.8
                    • A user joins the network and, being a savvy user, has their DNS server hard-coded to 8.8.8.8
                    • Savvy user makes a lot of DNS requests
                    • All of that traffic egresses WAN2
                    • Company receives a $100 mobile data bill for exceeding their data cap for the month

                    This might seem to be an extreme example, but it has happened to me.
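The mechanism behind that example is ordinary longest-prefix matching: a /32 host route for the monitor IP is more specific than the default route, so it wins for any traffic that follows the system routing table. A minimal sketch, with the WAN names and the table contents as assumptions:

```python
import ipaddress


def lookup(routes, dst):
    """Return the gateway of the most specific route that contains dst."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, gw) for net, gw in routes if addr in net]
    if not matches:
        return None
    # The longest prefix (largest prefixlen) is the most specific match.
    return max(matches, key=lambda m: m[0].prefixlen)[1]


# Hypothetical routing table: default via WAN1, monitor-IP host route via WAN2.
routes = [
    (ipaddress.ip_network("0.0.0.0/0"), "WAN1"),
    (ipaddress.ip_network("8.8.8.8/32"), "WAN2"),
]

print(lookup(routes, "8.8.8.8"))   # WAN2
print(lookup(routes, "8.8.4.4"))   # WAN1
```

Only the exact monitor address is pinned to WAN2; everything else still follows the default route.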

                    • stephenw10 Netgate Administrator

                      Mmm, a good solution here would be some anycast ping targets that aren't DNS servers. But using DNS servers there is really convenient! 😉

                      • dennypage

                        Effectively, @luckman212’s request is for a static route that only applies to ICMP echo requests originating from the firewall itself.

                        • marcosm Netgate @dennypage

                          @dennypage FWIW that doesn't happen currently even with pf. The route-to rule is based on the interface's source address with any destination that's not in the interface's subnet. Still, a rule can be created that applies to the correct traffic.

                          Given the feedback, it sounds like the issue isn't that a route should not exist, but rather some route is needed to allow pf to force the traffic. That's effectively the workaround @stephenw10 showed. Any potential undesired behavior from that kind of solution needs to be considered.

                          • dennypage @stephenw10

                            @stephenw10 said in 25.07 RC - no default gateway being set if default route is set to a gateway group and the Tier 1 member interface is down:

                            Mmm, a good solution here would be some anycast ping targets that aren't DNS servers. But using DNS servers there is really convenient! 😉

                            Convenient yes, but from time to time, Google and others get annoyed with everyone using their DNS servers as monitor targets and put temporary blocks in place. I generally recommend that people use regional routers in their ISP instead.

                            • luckman212 LAYER 8 @dennypage

                              @dennypage Exactly! I had written a script called hopfinder that I mentioned farther up, which already does this successfully & automatically for the FIOS connection, where traceroute works properly. On the LTE network, no such luck, so I've resorted to querying the RDAP database (which has nice parseable JSON output) for /32 hosts in T-Mobile's network, and then iterating over a handful of them to find a few with the lowest latency. It works, but the script takes about 45 seconds from start to finish, so it's not something to run every day; once a week seems about right.

                              I'm planning to publish the updated script soon, trying to decide if it's worth making into a full package with a GUI.
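The selection step described above (probe a handful of candidates, keep the few with the lowest latency) could look roughly like this. This is not the hopfinder script; the function name, the input shape, and the choice of median RTT as the score are assumptions:

```python
from statistics import median


def best_targets(samples, count=3):
    """Pick the `count` candidates with the lowest median RTT.

    samples maps an IP to a list of RTTs in ms, with None for a lost probe.
    Candidates with no samples or any lost probe are skipped.
    """
    scored = []
    for ip, rtts in samples.items():
        answered = [r for r in rtts if r is not None]
        if not rtts or len(answered) < len(rtts):
            continue
        scored.append((median(answered), ip))
    return [ip for _, ip in sorted(scored)[:count]]


# Made-up measurements using documentation addresses.
samples = {
    "192.0.2.1": [40, 42, 41],
    "192.0.2.2": [30, None, 31],   # dropped a probe, so it is excluded
    "192.0.2.3": [25, 26, 27],
    "192.0.2.4": [60, 61, 59],
}
print(best_targets(samples, count=2))   # ['192.0.2.3', '192.0.2.1']
```

Excluding any candidate that drops a probe is a deliberately strict choice here, since a monitor target that ignores pings under load would cause false gateway-down events.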

                              • SteveITS Galactic Empire @luckman212

                                Would a workaround for the fees be to block from LAN to 8.8.8.8 with a policy routing rule? Or would the static route override that? (haven't looked, just brainstorming)

                                FWIW since it was mentioned above, pfBlocker can block DoT, which it has tucked under "DNSBL SafeSearch." Though as I've mentioned elsewhere I know that at least the Dish DVR video on demand "app" (though not the DVR software) is hardcoded to use Google DoT, I think it was.

                                • Patch @marcosm

                                  @marcosm said in 25.07 RC - no default gateway being set if default route is set to a gateway group and the Tier 1 member interface is down:

                                  Let's keep in focus the following: what exactly is the problem that needs to be solved

                                  This may just reflect my understanding of how the system works, but from my perspective the problem is the choices required for monitoring:

                                  • The monitoring address must be very reliable, as a failure at that site will result in a failure of the monitored interface. So it probably needs to be a site with extensive redundancy, and probably with more than one option, as even major sites like Microsoft go down and assuming otherwise results in extensive secondary failure. Using ISP routing hardware can also be problematic: when this hardware is heavily loaded, pings may be ignored, which would result in secondary failure of the monitored interface.

                                  • The current approach requires a static route, so all traffic to the monitoring site always uses that specific interface. For a major site, doing so overrides the system's interface loading objectives, which suggests a monitoring site should almost never be used by real user traffic, so some backwater site.

                                  • The above requirements result in diametrically opposite choices. This raises the question of why this choice is actually required. Normal load balancing requires dynamically allocating traffic to one of several interfaces depending on current loading. Surely interface selection for interface monitoring should operate at this level.

                                  • IMO, ideally users would specify a pool of monitoring addresses. For general internet interfaces the monitoring pool would probably be the same for all interfaces; however, when monitoring a particular interface, that particular interface would be used.

                                  • luckman212 LAYER 8 @Patch

                                    @Patch Nice summary, it conjures up memories of this 8-year-old idea (and bounty) of mine: dpinger multiple targets - aka gwmond

                                    • dennypage @Patch

                                      @Patch said in 25.07 RC - no default gateway being set if default route is set to a gateway group and the Tier 1 member interface is down:

                                      ideally users would specify a pool of monitoring addresses.

                                      The idea of monitoring multiple addresses has been discussed at length previously.

                                      • luckman212 LAYER 8 @dennypage

                                        Yes, there's been much discussion about this, and for many many years. That it keeps coming up is a testament to the fact that for many people, a more robust solution is warranted.

                                        In the redmine you linked, the final comment (from @jimp himself) sums it up nicely:

                                        dpinger is only a daemon that pings and reports responses. It doesn't make decisions about what is good or bad for a pfSense gateway as a whole only its specific single target. It isn't up to dpinger to handle multiple targets or different protocols.

                                        What is needed is more like some middleware-ish daemon to sit between pfSense and other gateway monitoring daemons like dpinger (cough cough, gwmond) that would be capable of coordinating multiple monitoring techniques for each gateway and making more informed decisions about their status.

                                        Given the responses on the dpinger github it appears its author agrees that it's out of scope for dpinger itself.

                                        I agree with Jim (and you @dennypage) that dpinger already does its job well, and should stay focused and simple. I do think pfSense needs that yet-to-be-coded "middleware" which could do a better job of orchestrating multiple dpinger instances + possibly other check methods such as curl/wget fetches to test under conditions where ICMP isn't good enough to rule out false positives/negatives.
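As a thought experiment only, the decision step of such a middleware could be as small as a quorum over several per-target results, so that one unreachable target does not take the gateway down. No such daemon exists in the thread; the function name, the loss threshold, and the quorum value below are all hypothetical:

```python
def gateway_status(probe_results, max_loss_pct=20.0, quorum=0.5):
    """Combine per-target loss figures into a single gateway status.

    probe_results maps a monitor target to its packet loss in percent.
    The gateway is 'down' only when more than `quorum` of the targets
    exceed `max_loss_pct`; with no results at all it is 'pending'.
    """
    if not probe_results:
        return "pending"
    failing = sum(1 for loss in probe_results.values() if loss > max_loss_pct)
    return "down" if failing / len(probe_results) > quorum else "online"


# One of three targets is unreachable: the gateway stays online.
print(gateway_status({"1.1.1.1": 100.0, "1.0.0.1": 0.0, "9.9.9.9": 0.0}))    # online
# Two of three are unreachable: the gateway is marked down.
print(gateway_status({"1.1.1.1": 100.0, "1.0.0.1": 100.0, "9.9.9.9": 0.0}))  # down
```

Other check methods (for example an HTTP fetch) would simply contribute further entries to the same result map in this sketch.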

                                        • w0w @stephenw10

                                          @stephenw10 said in 25.07 RC - no default gateway being set if default route is set to a gateway group and the Tier 1 member interface is down:

                                          Add a 3rd dummy gateway that always remains up to provide a default route. Add that to the failover group as some high tier.

                                          Maybe I’m doing something wrong, but when I create a dummy interface, set it to the lowest priority (e.g., Tier 3; we don’t really use it as a gateway, right?), and then configure the other two gateways with the “Do not create static routes” option enabled, after a reboot I get the LANGW status “pending” and no default route. So does another option, "Disable Gateway Monitoring Action", need to be enabled on the dummy gateway?

                                          • stephenw10 Netgate Administrator

                                            If it's showing as pending that implies the gateway is not available yet which should never be true for a local interface/IP address. You set something that actually exists I assume?

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.