Netgate Discussion Forum

    25.07 RC - no default gateway being set if default route is set to a gateway group and the Tier 1 member interface is down

    Plus 25.07 Development Snapshots (Retired)
    • luckman212L Offline
      luckman212 LAYER 8 @Bob.Dig
      last edited by luckman212

      @Bob.Dig I don't think that's what's happening. If you scroll up a few posts to where I have a section called "Some pings (with source address binding) and routes" you can see that the pings are traversing each separate gateway (you can tell from the vastly different latencies).

      I just ran a few tcpdumps to confirm as well, the packets are definitely egressing out the separate correct gateways without the static routes:

      [25.07-RC][root@r1.lan]/root: tcpdump -ni ix0 dst host 8.8.8.8
      tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
      listening on ix0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
      ^C
      0 packets captured        <<–– ✅ no packets to the monitor IP seen on the WAN1 interface
      857 packets received by filter
      0 packets dropped by kernel
      
      [25.07-RC][root@r1.lan]/root: tcpdump -ni ix2 dst host 8.8.8.8
      tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
      listening on ix2, link-type EN10MB (Ethernet), snapshot length 262144 bytes
      06:22:32.463054 IP 192.168.191.2 > 8.8.8.8: ICMP echo request, id 22849, seq 36, length 9
      06:22:37.497085 IP 192.168.191.2 > 8.8.8.8: ICMP echo request, id 22849, seq 37, length 9
      06:22:42.500047 IP 192.168.191.2 > 8.8.8.8: ICMP echo request, id 22849, seq 38, length 9
      ^C
      3 packets captured        <<–– ✅ packets are being sent via WAN2
      166 packets received by filter
      0 packets dropped by kernel
      
      • luckman212L Offline
        luckman212 LAYER 8 @luckman212
        last edited by

        @stephenw10 @marcosm Since you guys seem to be unable to replicate this (?) would you be able to send me the 25.07-RELEASE image to test with? I see on redmine (e.g. here, here, and here) that there's a build you guys are testing on tagged -RELEASE (built on 2025-07-22). Maybe there are some small differences in that build that are affecting my results? I've lost a good portion of my weekend on this and growing more desperate.

        • M Offline
          marcosm Netgate
          last edited by

          I am able to reproduce the issue by checking the option to not add the automatic route and failing over to a DHCP WAN from a static WAN. Arguably this is not a valid setup when you want to monitor multiple WANs, so the fact that it does not work is not in itself necessarily a bug. Note that even if the service is bound to an address or interface, as mentioned, the OS still decides where that traffic will be routed. That's why you see the state with origif for ix0 - pf overrides the OS and sends it over ix2.

          The fact that the system is left without a default gateway does warrant further digging. From looking at the code I see that it's left without a default gateway because at that moment both gateways have been marked as down. I will need to dig further to understand why it's marked as down and if that's an accurate status at that point.

          • luckman212L Offline
            luckman212 LAYER 8 @marcosm
            last edited by luckman212

            @marcosm Thanks very much for looking. I wouldn't really mind leaving the static route, IF I could find any pingable hosts along the nearby path via traceroute on that WAN2 (T-Mobile 4G) connection. I don't want to use 8.8.8.8, 8.8.4.4, 1.1.1.1, 9.9.9.9 etc because then ALL traffic to that host will flow over the backup (slow, expensive) connection.

            I enabled the hidden system/route-debug option, and am still trying to track down the chain of events that leads pfSense to marking WAN2 down and removing the default route. But any help would be most appreciated and let me know if I can provide anything more.

            • P Offline
              Patch @luckman212
              last edited by

              @luckman212 said in 25.07 RC - no default gateway being set if default route is set to a gateway group and the Tier 1 member interface is down:

              I don't want to use 8.8.8.8, 8.8.4.4, 1.1.1.1, 9.9.9.9 etc because then ALL traffic to that host will flow over the backup (slow, expensive) connection.

              Is that really the case?
              Surely both the main and backup internet connection can reach all internet sites but the route taken by each packet does not just depend on which route has reached that site in the past.

              • dennypageD Offline
                dennypage @luckman212
                last edited by

                @luckman212 said in 25.07 RC - no default gateway being set if default route is set to a gateway group and the Tier 1 member interface is down:

                I don't want to use 8.8.8.8, 8.8.4.4, 1.1.1.1, 9.9.9.9 etc because then ALL traffic to that host will flow over the backup (slow, expensive) connection.

                If you want to monitor the backup connection, something has to flow over that connection. No way around that. If you need a public DNS server as a target, just pick an address that you are not using as an active DNS server. There are lots to choose from, even from the common DNS hosts (8.8.8.8, 8.8.4.4, 1.1.1.1, 1.0.0.1, and your ISP's DNS servers). You don't need all of them as DNS servers.

                However, if you absolutely don't want anything going over the backup connection, another option would be to just disable gateway monitoring on the backup connection altogether. Given your setup, I expect that you have disabled the gateway monitoring action on the backup connection, so the monitoring of the backup connection is really only for human consumption.

                • luckman212L Offline
                  luckman212 LAYER 8 @Patch
                  last edited by

                  @Patch said

                  Is that really the case?
                  Surely both the main and backup internet connection can reach all internet sites but the route taken by each packet does not just depend on which route has reached that site in the past.

                  Yes, it is really the case - if you set a monitor IP to e.g. 8.8.8.8, a static route gets created which forces all traffic over that gateway. Even if it's not your active/primary gateway.

                  • P Offline
                    Patch @luckman212
                    last edited by

                    @luckman212 what a weird way of coding the monitoring.

                    I had assumed if monitoring was specified for a particular gateway then the monitoring packets would be sent over the monitored interface without implying any other changes to the routing policy.

                    Similar to when pinging from an interface doesn't imply all routing to that server suddenly also must go through that interface.

                    • dennypageD Offline
                      dennypage @Patch
                      last edited by

                      @Patch said in 25.07 RC - no default gateway being set if default route is set to a gateway group and the Tier 1 member interface is down:

                      I had assumed if monitoring was specified for a particular gateway then the monitoring packets would be sent over the monitored interface without implying any other changes to the routing policy.

                      Similar to when pinging from an interface doesn't imply all routing to that server suddenly also must go through that interface.

                      Routing in Unix is IP destination based rather than source based. The way monitor packets are forced out an interface is with a static routing rule that says "If you are sending a packet to this IP address, the packet must be sent out this interface." This means that all packets destined for that IP address will go out the interface specified.
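
To make the destination-based behaviour described above concrete, here is a minimal sketch of a longest-prefix-match lookup. It is only a model, not pfSense or FreeBSD code: the `lookup` helper, the two-entry table, and the `WAN1`/`WAN2` labels are assumptions chosen for illustration. It shows why a host route added for a monitor IP would capture every packet to that address, regardless of which source address sent it.

```python
import ipaddress


def lookup(routes, dst):
    """Return the (network, gateway) entry with the longest prefix containing dst, or None."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, gw) for net, gw in routes if addr in net]
    if not matches:
        return None
    # The most specific (longest) prefix wins; the source address is never consulted.
    return max(matches, key=lambda entry: entry[0].prefixlen)


# Hypothetical table: default route via WAN1, plus the /32 host route
# that a monitor IP of 8.8.8.8 on WAN2 would add.
routes = [
    (ipaddress.ip_network("0.0.0.0/0"), "WAN1"),
    (ipaddress.ip_network("8.8.8.8/32"), "WAN2"),
]

print(lookup(routes, "8.8.8.8")[1])  # the host route matches, so WAN2
print(lookup(routes, "1.1.1.1")[1])  # only the default matches, so WAN1
```

In this model, any client on the LAN talking to 8.8.8.8 would also be sent via WAN2, which is the side effect discussed earlier in the thread.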

                      • luckman212L Offline
                        luckman212 LAYER 8 @dennypage
                        last edited by luckman212

                        I saw 25.07 release was published. So I guess this is a moot point for now, as the next major release won't be before 25.11 at the earliest. I will keep monkeying around I guess.

                        @dennypage if what you wrote is true, then how can you explain the tcpdumps above, when both WAN1 and WAN2 are "up", and I have the "don't create static routes for monitor IPs" option enabled on WAN2, and I see no packets to 8.8.8.8 leaving ix0—they are 100% going out on ix2, confirmed with tcpdump and the 50+ms latency indicative of the 4G connection, and at the same time my default route being via the WAN1/FIOS... ?

                        • dennypageD Offline
                          dennypage @luckman212
                          last edited by

                          @luckman212 said in 25.07 RC - no default gateway being set if default route is set to a gateway group and the Tier 1 member interface is down:

                          @dennypage if what you wrote is true, then how can you explain the tcpdumps above, when both WAN1 and WAN2 are "up", and I have the "don't create static routes for monitor IPs" option enabled on WAN2, and I see no packets to 8.8.8.8 leaving ix0—they are 100% going out on ix2, confirmed with tcpdump and the 50+ms latency indicative of the 4G connection, and at the same time my default route being via the WAN1/FIOS... ?

                          “If what you wrote is true”? Do you think I am lying to you? Really?

                          Yes, it’s true that Unix uses destination based routing. Yes, it’s true that static routes are required for monitoring Multi-Wan. And monitoring works correctly if you set the static route, yes? QED. I don’t know what else to say.

                          If it’s important to you to understand the reason for the specific results of the test above, it’s your system so you’ll have to figure it out based on the system state at the time of the test. I’d suggest that you start by examining your routing tables:

                          netstat -rn
                          
                          • M Offline
                            marcosm Netgate @luckman212
                            last edited by

                            @luckman212 It works without the option because pf "catches" the traffic before it leaves ix0 - hence my previous comment "pf overrides the OS and sends it over ix2". The reason why pf can't do its job in your case is because the default route goes away; since there's no route for the OS to use for dpinger, you get the sendto error and pf doesn't get the chance to override the path to send it out of ix2.
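
The ordering described above can be sketched as a toy model. Everything here is an assumption for illustration: `NoRouteError`, `os_route`, `send`, and the source-keyed `pf_overrides` mapping are hypothetical names, and real pf rules are far richer than a dictionary. The only point is the sequence: the OS route lookup happens first, and the pf-style override gets a chance only if that lookup succeeds.

```python
import ipaddress


class NoRouteError(Exception):
    """Hypothetical stand-in for the error the OS reports when no route matches."""


def os_route(routes, dst):
    """Destination-based lookup: return the interface of the longest matching prefix."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, ifname) for net, ifname in routes if addr in net]
    if not matches:
        raise NoRouteError(f"no route to {dst}")
    return max(matches, key=lambda entry: entry[0].prefixlen)[1]


def send(routes, pf_overrides, src, dst):
    """Return the interface the packet finally leaves on.

    The OS lookup runs first; the override is consulted only if it succeeds.
    """
    ifname = os_route(routes, dst)        # raises if there is no usable route
    return pf_overrides.get(src, ifname)  # simplified model of pf redirecting the packet


# Assumed setup taken from the thread: ix0 is WAN1, ix2 is WAN2,
# and 192.168.191.2 is the WAN2 address used as the monitor source.
pf_overrides = {"192.168.191.2": "ix2"}
default_via_wan1 = [(ipaddress.ip_network("0.0.0.0/0"), "ix0")]

# With a default route present, the packet is rerouted out ix2.
print(send(default_via_wan1, pf_overrides, "192.168.191.2", "8.8.8.8"))

# With no default route, the send fails before the override is ever reached.
try:
    send([], pf_overrides, "192.168.191.2", "8.8.8.8")
except NoRouteError as err:
    print("send failed:", err)
```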

                            • luckman212L Offline
                              luckman212 LAYER 8 @dennypage
                              last edited by

                              Nobody said anything about lying. I should have phrased it as "Let's assume that FreeBSD routing behaves as you've outlined... in that case, how can I be observing XYZ"

                              I'm sorry this thread is starting to derail. I appreciate all your help. I don't have, nor have I ever claimed to have, all the answers. Just looking for explanations for the new, unwanted and somewhat unexplainable behavior I am seeing here.

                              • luckman212L Offline
                                luckman212 LAYER 8 @marcosm
                                last edited by luckman212

                                @marcosm said:

                                pf can't do its job in your case is because the default route goes away

                                So is that still being considered a bug then? I still can't figure out why WAN1 going down (either by way of physically downing the interface by removing the cable, or by dpinger triggering a down event) should cause pfSense to mark the other gateway down and/or remove the default gateway. Feels wrong.

                                Is the explanation that, WAN1 goes down, and before the system has a chance to set WAN2 as the default gateway, the pings to 8.8.8.8 start failing because "technically" there's no longer or not yet a valid default route to send those packets (pf ignored) - and this causes WAN2 to then go down leaving the box dead as a doornail?

                                If that's loosely what's going on here, then what about adding a simple option to the routing page something like "Do not remove a default gateway if there are no other online gateways in the group"
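
As a rough illustration of what such an option might mean, here is a minimal sketch of tier-based default gateway selection with a "keep the last default" fallback. This is not pfSense's actual selection code; `pick_default`, its `keep_last` flag, and the tier mapping are hypothetical names used only to show the proposed behaviour.

```python
def pick_default(tiers, online, current=None, keep_last=False):
    """Choose a default gateway from a gateway group.

    tiers:     dict mapping gateway name to tier number (lower is preferred)
    online:    set of gateway names currently considered up
    current:   the gateway that is the default right now, if any
    keep_last: hypothetical option; keep `current` when nothing in the group is up
    """
    candidates = sorted((tier, name) for name, tier in tiers.items() if name in online)
    if candidates:
        return candidates[0][1]
    # Nothing in the group is marked up: either drop the default route
    # entirely or hold on to the last one, depending on the option.
    return current if keep_last else None


tiers = {"WAN1": 1, "WAN2": 2}

print(pick_default(tiers, {"WAN1", "WAN2"}))                      # normal case: Tier 1
print(pick_default(tiers, {"WAN2"}))                              # failover to Tier 2
print(pick_default(tiers, set(), current="WAN2"))                 # both down: no default
print(pick_default(tiers, set(), current="WAN2", keep_last=True)) # both down: keep WAN2
```

With `keep_last=True`, a momentary "everything is down" reading would leave the existing default route in place instead of removing it.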

                                • stephenw10S Offline
                                  stephenw10 Netgate Administrator
                                  last edited by

                                  It seems like a bug to me, because the WAN2 gateway would remain marked up for a while, even if dpinger starts to lose pings, and should be set as default.

                                  If there was any default route then dpinger would use it and pf would catch and reroute that via WAN2.

                                  It's an interesting issue. I don't think I've ever seen anyone using it without the static route set. I've seen numerous issues with conflicting routes for DNS and dpinger though 😉 But I have always resolved them by simply using a different target or making sure they both use the same gateway.

                                  • stephenw10S Offline
                                    stephenw10 Netgate Administrator
                                    last edited by

                                    Most of that code is script though so it should be patchable.

                                    • M Offline
                                      marcosm Netgate
                                      last edited by

                                      At least there seem to be improvements to be made. I will dig further.

                                      Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.