Netgate Discussion Forum

    25.07 RC - no default gateway being set if default route is set to a gateway group and the Tier 1 member interface is down

    Plus 25.07 Development Snapshots
    • luckman212L Online
      luckman212 LAYER 8
      last edited by luckman212

      adding here from https://redmine.pfsense.org/issues/16331 for more discussion and eyes:

      On my home 6100 that I factory erased and formatted with a fresh 25.07RC via Netgate installer (25.07.r.20250715.1733) I am having a bad situation occur where the default route (0.0.0.0/0) gets removed if the link drops (no carrier) on my WAN. This includes during operation, or at boot-time. The default route is not replaced by anything, thus breaking just about everything.

      I thought it might have been because I have a S2S Wireguard tunnel that uses Policy Based Routing and has a Peer with "Allowed IPs" set to 0.0.0.0/0 but I tried disabling that peer and the behavior continued. I have tried rebooting a few times to be sure this wasn't a one-off.

      If I manually go to System > Routing and choose a specific V4 gateway (my Tier2) then things start to work again.

      Similarly, if I ssh in and type

      route add default <ip_of_my_tier2_gw>
      

      That gets things working temporarily as well.

      My setup is:

      • FIOS via a 10Gtek SFP+ adapter on ix0 as Tier1 (DHCP+DHCP6)
      • a Teltonika RUTX11 as my Tier2 WAN on ix2 (RJ45)
      • LAN on ix1 (another SFP+ to a Unifi 10G switch)

      Packages installed currently are:

      • acme
      • arping
      • aws-wizard
      • Backup (removed)
      • Cron
      • Filer
      • iperf
      • ipsec-profile-wizard
      • mDNS-Bridge
      • Netgate_Firmware_Upgrade
      • Nexus
      • pfBlockerNG (removed)
      • Shellcmd
      • softflowd
      • sudo
      • System_Patches
      • Tailscale
      • WireGuard

      I have a couple of status_output.tgz debug archives collected before and immediately after a reboot. Happy to send those off to whoever @netgate to help troubleshoot this, or any other sort of troubleshooting. I'm surprised nobody else has hit this during the beta testing.

      • M Offline
        marcosm Netgate
        last edited by

        I use a gateway group as the default gateway for both IPv4 and IPv6 and it works as expected - igb0 is tier 1 and igb1 is tier 2:

        # netstat -rn | grep default
        default            192.168.1.254      UGS            igb1
        default                           fe80::da21:daff:fe19:dbb0%igb1 UG            igb1
        
        # ifconfig igb0 | grep status
        	status: no carrier
        

        You can share the files/logs here for review:
        https://nc.netgate.com/nextcloud/s/Dj3ZbjQstNB52e7

        • stephenw10S Offline
          stephenw10 Netgate Administrator
          last edited by

          Mmm, I'm failing to duplicate that here too.

          What do you see logged when the tier 1 gateway goes down?

          • luckman212L Online
            luckman212 LAYER 8 @marcosm
            last edited by

            @marcosm Thanks, somehow I didn't get the reply notification so I only just saw this.

            I uploaded 2 tgz archives, one from before a reboot and one right after a fresh reboot.

            07b546f4-013f-4980-a7a7-6e7f4d047acf-image.png

            I will run some additional tests now and capture more logs.

            • luckman212L Online
              luckman212 LAYER 8 @luckman212
              last edited by

              This is highly reproducible for me. Any additional debug info I can provide just let me know. Happy to give remote GUI/SSH access to Netgate as well if there's any need for someone to take a look.

              • luckman212L Online
                luckman212 LAYER 8 @stephenw10
                last edited by luckman212

                @stephenw10 I ran the command below and then "pulled the plug" on the ix0 interface (wan1):

                tail -f dhcpd.log filter.log gateways.log resolver.log routing.log system.log
                

                I assembled the output into a single file called logs_198256.txt and uploaded it to the same nextcloud link above. I also culled out the large amount of tailscaled logspew from system.log and moved it to its own section at the bottom.

                I thought this might be related to the new Firewall State Policy options (Floating States, Interface Bound States) but I tried it both ways (rebooting between changes) and the results are the same. Again, happy to provide remote access to this firewall if there's any remote troubleshooting that needs to happen.

                • luckman212L Online
                  luckman212 LAYER 8 @luckman212
                  last edited by luckman212

                  Any further detail I can provide here? I know 25.07 must be very close, and while I assume this must be an edge case, there's nothing too out of the ordinary in my config. I installed "fresh" and configured everything from scratch; I didn't upgrade in place or import any old config.

                  Wondering if the ix0 (10G) interface vs. the igb interfaces is the reason you can't reproduce.

                  I also removed pfBlockerNG for now to rule that out (did not change anything wrt this bug)

                  • stephenw10S Offline
                    stephenw10 Netgate Administrator
                    last edited by

                    Hmm, it looks like you have 'Do not add static route for gateway monitor IP address via the chosen interface' set on both gateways. As a result of that there is no static route for 8.8.8.8 via WAN2 and dpinger is probably starting the pings via WAN1. Hence when WAN1 goes down the pings for WAN2 also fail:

                    <12>1 2025-07-28T09:20:52.492871-04:00 r1.lan dpinger 53649 - - WAN2_RUT 8.8.8.8: sendto error: 65
                    <12>1 2025-07-28T09:20:57.502914-04:00 r1.lan dpinger 53649 - - WAN2_RUT 8.8.8.8: sendto error: 65
                    <12>1 2025-07-28T09:21:02.500997-04:00 r1.lan dpinger 53649 - - WAN2_RUT 8.8.8.8: Alarm latency 0us stddev 0us loss 100%
                    

                    Thus it is also omitted from the gateway group leaving no gateways available to set as default:

                    <27>1 2025-07-28T09:20:54.496304-04:00 r1.lan php-fpm 581 - - /rc.openvpn: MONITOR: WAN2_RUT has packet loss, omitting from routing group GWG_Failover_V4
                    

                    Is there any particular reason you have that set? It's not set by default because you pretty much always want dpinger to use the gateway on the interface it's monitoring.
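The `sendto error: 65` entries quoted in this post can be pulled out of a log and given a name. The following is a minimal sketch, not part of pfSense or dpinger: the function name `explain_sendto_errors` and the regular expression are illustrative assumptions, and the errno table is hardcoded from FreeBSD's `sys/errno.h` because Python's own `errno` module reports the values of whatever host runs the script.

```python
import re

# FreeBSD errno values (sys/errno.h). Hardcoded on purpose: these numbers
# differ on Linux, so Python's errno module is not a safe lookup here.
FREEBSD_ERRNO = {
    50: "ENETDOWN (Network is down)",
    51: "ENETUNREACH (Network is unreachable)",
    64: "EHOSTDOWN (Host is down)",
    65: "EHOSTUNREACH (No route to host)",
}

# Matches the dpinger syslog lines quoted in the thread, e.g.
# "... r1.lan dpinger 53649 - - WAN2_RUT 8.8.8.8: sendto error: 65"
SENDTO_RE = re.compile(
    r"dpinger \d+ - - (?P<gw>\S+) (?P<target>\S+): sendto error: (?P<errno>\d+)"
)

def explain_sendto_errors(lines):
    """Return (gateway, monitor_ip, errno, meaning) for each sendto failure."""
    found = []
    for line in lines:
        m = SENDTO_RE.search(line)
        if m:
            code = int(m.group("errno"))
            meaning = FREEBSD_ERRNO.get(code, "unknown")
            found.append((m.group("gw"), m.group("target"), code, meaning))
    return found

SAMPLE = [
    "<12>1 2025-07-28T09:20:52.492871-04:00 r1.lan dpinger 53649 - - WAN2_RUT 8.8.8.8: sendto error: 65",
    "<12>1 2025-07-28T09:21:02.500997-04:00 r1.lan dpinger 53649 - - WAN2_RUT 8.8.8.8: Alarm latency 0us stddev 0us loss 100%",
]

for entry in explain_sendto_errors(SAMPLE):
    print(entry)
```

On FreeBSD, errno 65 is EHOSTUNREACH, so the failure is reported locally when the packet is sent rather than being a timeout waiting for a reply.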

                    • dennypageD Offline
                      dennypage @stephenw10
                      last edited by

                      @stephenw10 said in 25.07 RC - no default gateway being set if default route is set to a gateway group and the Tier 1 member interface is down:

                      Hmm, it looks like you have 'Do not add static route for gateway monitor IP address via the chosen interface' set on both gateways. As a result of that there is no static route for 8.8.8.8 via WAN2 and dpinger is probably starting the pings via WAN1.

                      Hey @luckman212, I remember the conversation a few years ago about setting this parameter. I didn't have multiple WAN connections at the time, and I don't recall you having them either... is my memory faulty?

                      I've since added a second WAN connection, and turned this parameter back off. FWIW, I don't see how dpinger monitoring multiple WAN connections will work otherwise. YMMV.

                      • stephenw10S Offline
                        stephenw10 Netgate Administrator
                        last edited by

                        Mmm, it does seem slightly odd since I'd expect anything sourced from the WAN2 address directly to have route-to applied via the WAN2 gateway. 🤔

                        • luckman212L Online
                          luckman212 LAYER 8 @dennypage
                          last edited by luckman212

                          @dennypage Thanks for looking at this! and @stephenw10 thank you for testing as well.

                          Yes I do vaguely remember our conversation also, but it's been a few years and much has changed. I used to be more familiar with the pfSense code but haven't been tracking it lately since the switch to pfSense+.

                          Do I misunderstand the significance of dpinger's -B (bind) argument? I assumed that flag directed dpinger to send pings out of (and expect the response on) the correct interface, making setting a global static route unnecessary. Checking with pgrep -lf dpinger I see that pfSense does set this flag:

                          # pgrep -lf dpinger
                          46497 /usr/local/bin/dpinger -S -r 0 -i WAN1_FIOS -B 74.101.221.156 -p /var/run/dpinger_WAN1_FIOS~74.101.221.156~100.41.221.88.pid -u /var/run/dpinger_WAN1_FIOS~74.101.221.156~100.41.221.88.sock -C /etc/rc.gateway_alarm -d 1 -s 500 -l 2000 -t 60000 -A 1000 -D 500 -L 20 100.41.221.88
                          46943 /usr/local/bin/dpinger -S -r 0 -i WAN2_RUT -B 192.168.191.2 -p /var/run/dpinger_WAN2_RUT~192.168.191.2~8.8.8.8.pid -u /var/run/dpinger_WAN2_RUT~192.168.191.2~8.8.8.8.sock -C /etc/rc.gateway_alarm -d 1 -s 5000 -l 2000 -t 120000 -A 10000 -D 500 -L 75 8.8.8.8
                          

                          I also just performed this simple test from the commandline:

                          # dpinger -r 1000 -f -i WAN2_test -B 192.168.191.2 -s 1000 -d 1 -D 500 -L 75 8.8.4.4
                          

                          When I yank the cable out of WAN2, the pings start failing. They are not routed out of the other WAN1 gateway (that is good/expected). So while I do not want to question you as the author of the program 😳 I guess I don't quite understand the statement "I don't see how dpinger monitoring multiple WAN connections will work otherwise"

                          I enable the dpinger_dont_add_static_route option so I can use common anycast IPs like 1.1.1.1 or 8.8.8.8 without breaking access to those hosts (and thus breaking DNS) when a gateway goes down. I've had this config for years- nothing new here, I've had multi-WAN since at least 2016 or thereabouts. This has worked fine on 22.x, 23.x and 24.x releases. Am I missing something new?

                          Also, more generally, I would question any logic that removes the default / 0.0.0.0 route entirely. Even in a case where pfSense believes "all" gateways are down, it should assign the default route to the lowest priority gateway as a failsafe. A "router" without a default route seems odd...
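The failsafe suggested in the last paragraph can be sketched as a selection function. This is a hypothetical illustration only and not how pfSense chooses a default gateway: the `Gateway` type, `pick_default`, and the `failsafe` switch are invented names, and "lowest priority gateway" is read here as the member with the highest tier number, which is only one possible interpretation.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class Gateway:
    name: str
    tier: int     # 1 = most preferred
    online: bool  # as judged by the gateway monitor

def pick_default(group: List[Gateway], failsafe: bool = False) -> Optional[Gateway]:
    """Pick the default gateway from a failover group.

    Without failsafe, an all-down group yields None (no default route).
    With failsafe, the last-tier member is used even though it is marked down.
    """
    online = [g for g in group if g.online]
    if online:
        return min(online, key=lambda g: g.tier)
    if failsafe and group:
        return max(group, key=lambda g: g.tier)
    return None

# The situation from the thread: WAN1 has no carrier, and WAN2 has been
# marked down by the monitor as well.
gwg_failover_v4 = [
    Gateway("WAN1_FIOS", tier=1, online=False),
    Gateway("WAN2_RUT", tier=2, online=False),
]

print(pick_default(gwg_failover_v4))                 # None
print(pick_default(gwg_failover_v4, failsafe=True))  # the WAN2_RUT entry
```

The trade-off is that a failsafe route may point at a gateway that really is dead, but the box keeps a default route for anything that can still use it.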

                          • dennypageD Offline
                            dennypage @luckman212
                            last edited by

                            @luckman212 said in 25.07 RC - no default gateway being set if default route is set to a gateway group and the Tier 1 member interface is down:

                            Yes I do vaguely remember our conversation also, but it's been a few years and much has changed. I used to be more familiar with the pfSense code but haven't been tracking it lately since the switch to pfSense+.

                            Do I misunderstand the significance of dpinger's -B (bind) argument? I assumed that flag directed dpinger to send pings out of (and expect the response on) the correct interface, making setting a global static route unnecessary.

                            There's no magic with the bind argument. The bind argument causes dpinger to bind to a specific address, which will control the source address of the packets sent. However, binding to an address does not actually mean that packets will leave via the interface associated with that address.

                            With the exception of multicast, how packets are routed is always controlled by the OS and its system routing tables. If a static route for the destination has not been set, the outbound packet will be routed via whatever interface the OS deems appropriate at that moment. This is why the static route option is necessary, and on by default. It is only unnecessary if there is a single WAN interface.

                            @stephenw10, it might be nice to add a note in the help text for the static route option to this effect...
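The routing point made in this post can be illustrated with a toy longest-prefix-match lookup. This is a simplified model and not the FreeBSD routing code: `lookup` and the two route lists are illustrative assumptions built from addresses that appear in the thread, and pf's route-to handling is deliberately left out. Note that the source (bind) address is not an input to the lookup at all.

```python
import ipaddress

def lookup(routes, dst):
    """Return the (gateway, interface) of the most specific route covering dst."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, hop) for net, hop in routes if addr in net]
    if not matches:
        return None  # nothing covers dst: the "no route to host" case
    _, hop = max(matches, key=lambda m: m[0].prefixlen)
    return hop

# Routing table with only the default route via WAN1 (ix0) plus the
# directly connected WAN2 subnet.
without_static = [
    (ipaddress.ip_network("0.0.0.0/0"), ("74.101.221.1", "ix0")),
    (ipaddress.ip_network("192.168.191.0/24"), ("link", "ix2")),
]

# The same table with a static host route for the monitor IP via WAN2.
with_static = without_static + [
    (ipaddress.ip_network("8.8.8.8/32"), ("192.168.191.1", "ix2")),
]

print(lookup(without_static, "8.8.8.8"))  # ('74.101.221.1', 'ix0')
print(lookup(with_static, "8.8.8.8"))     # ('192.168.191.1', 'ix2')

# If the default route disappears and there is no static route,
# the lookup finds nothing for the monitor IP.
no_default = [r for r in without_static if r[0].prefixlen != 0]
print(lookup(no_default, "8.8.8.8"))      # None
```

In this model the monitor probe for 8.8.8.8 only follows WAN2 when the host route exists, which is the effect the static route option provides.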

                            • stephenw10S Offline
                              stephenw10 Netgate Administrator
                              last edited by

                              But if dpinger binds to the WAN address then that traffic should always go via the WAN gateway because the rule that passes it is:

                              pass out  route-to ( ix2 192.168.191.1 ) from 192.168.191.2 to !192.168.191.0/24 ridentifier 1000010012 keep state allow-opts label "let out anything from firewall host itself"
                              

                              That could be broken by a user outbound rule since it's not quick but I don't see one here.

                              I also don't see a state for 8.8.8.8 icmp from 192.168.191.2 in either status output?

                              • luckman212L Online
                                luckman212 LAYER 8 @dennypage
                                last edited by luckman212

                                @dennypage I see what you're saying, but has something changed in FreeBSD 15 that could be affecting this? I am fairly sure this was all working as expected before.

                                I just ran another test, pulled my WAN2 cable out and executed this command:

                                # dpinger -r 1000 -f -i WAN2_test -B 192.168.191.2 -s 1000 -d 1 -D 500 -L 75 1.0.0.1
                                send_interval 1000ms  loss_interval 4000ms  time_period 60000ms  report_interval 1000ms  data_len 1  alert_interval 1000ms  latency_alarm 500ms  loss_alarm 75%  alarm_hold 10000ms  dest_addr 1.0.0.1  bind_addr 192.168.191.2  identifier "WAN2_test "
                                WAN2_test 0 0 0
                                WAN2_test 0 0 0
                                WAN2_test 0 0 0
                                WAN2_test 0 0 100
                                WAN2_test 1.0.0.1: Alarm latency 0us stddev 0us loss 100%
                                WAN2_test 0 0 100
                                WAN2_test 0 0 100
                                WAN2_test 0 0 100
                                ...
                                

                                So, 100% packet loss, gateway down. Then, without the -B switch:

                                # dpinger -r 1000 -f -i WAN2_test -s 1000 -d 1 -D 500 -L 75 1.0.0.1
                                send_interval 1000ms  loss_interval 4000ms  time_period 60000ms  report_interval 1000ms  data_len 1  alert_interval 1000ms  latency_alarm 500ms  loss_alarm 75%  alarm_hold 10000ms  dest_addr 1.0.0.1  bind_addr (none)  identifier "WAN2_test "
                                WAN2_test 3309 0 0
                                WAN2_test 3517 208 0
                                WAN2_test 3421 222 0
                                WAN2_test 3464 210 0
                                WAN2_test 3524 230 0
                                WAN2_test 3565 231 0
                                WAN2_test 3636 269 0
                                WAN2_test 3707 314 0
                                WAN2_test 3709 303 0
                                WAN2_test 3706 280 0
                                WAN2_test 3760 327 0
                                ...
                                

                                (pings succeed)

                                # route -n get 1.0.0.1
                                   route to: 1.0.0.1
                                destination: 0.0.0.0
                                       mask: 0.0.0.0
                                    gateway: 74.101.221.1
                                        fib: 0
                                  interface: ix0
                                      flags: <UP,GATEWAY,DONE,STATIC>
                                 recvpipe  sendpipe  ssthresh  rtt,msec    mtu        weight    expire
                                       0         0         0         0      1500         1         0
                                
                                # netstat -rn -f inet | grep -E '1\.0\.0\.|0\.0\.0\.0'
                                0.0.0.0            74.101.221.1       UGS             ix0
                                
                                # ping 1.0.0.1
                                PING 1.0.0.1 (1.0.0.1): 56 data bytes
                                64 bytes from 1.0.0.1: icmp_seq=0 ttl=60 time=3.363 ms
                                64 bytes from 1.0.0.1: icmp_seq=1 ttl=60 time=4.120 ms
                                64 bytes from 1.0.0.1: icmp_seq=2 ttl=60 time=2.914 ms
                                ^C
                                --- 1.0.0.1 ping statistics ---
                                3 packets transmitted, 3 packets received, 0.0% packet loss
                                round-trip min/avg/max/stddev = 2.914/3.466/4.120/0.498 ms
                                
                                • stephenw10S Offline
                                  stephenw10 Netgate Administrator
                                  last edited by stephenw10

                                  With WAN2 up check the state opened by dpinger and the rule that opened it.

                                  It should show a state on ix2 from 192.168.191.2 and being opened by the default allow out rule above.

                                  But I suspect it won't because that should not go down when WAN1 does.

                                  • luckman212L Online
                                    luckman212 LAYER 8 @stephenw10
                                    last edited by luckman212

                                    @stephenw10 said:

                                    I also don't see a state for 8.8.8.8 icmp from 192.168.191.2 in either status output?

                                    State for 8.8.8.8:

                                    [25.07-RC][root@r1.lan]/root: pfctl -vvss | grep -A3 'ix2.*8.8.8.8'
                                    ix2 icmp 192.168.191.2:18153 -> 8.8.8.8:8       0:0
                                       age 00:48:33, expires in 00:00:10, 581:581 pkts, 16849:16849 bytes, rule 107, allow-opts
                                       id: 52c2976800000000 creatorid: 7d506d72 route-to: 192.168.191.1@ix2
                                       origif: ix0
                                    

                                    Rules (anything look wrong here?):

                                    [25.07-RC][root@r1.lan]/root: pfctl -vvsr | grep -A3 '@107'
                                    @107 pass out route-to (ix2 192.168.191.1) inet from 192.168.191.2 to ! 192.168.191.0/24 flags S/SA keep state (if-bound) allow-opts label "let out anything from firewall host itself" ridentifier 1000010023
                                      [ Evaluations: 423034    Packets: 688       Bytes: 40320       States: 0     ]
                                      [ Inserted: uid 0 pid 0 State Creations: 19    ]
                                      [ Last Active Time: Thu Jul 31 20:45:00 2025 ]
                                    
                                    [25.07-RC][root@r1.lan]/root: pfctl -sr | grep 'pass.*192.168.191'
                                    pass out route-to (ix2 192.168.191.1) inet from 192.168.191.2 to ! 192.168.191.0/24 flags S/SA keep state (if-bound) allow-opts label "let out anything from firewall host itself" ridentifier 1000010023
                                    pass in quick on ix2 reply-to (ix2 192.168.191.1) inet from <OPT3__NETWORK> to any flags S/SA keep state (if-bound) label "USER_RULE: allow inet" label "id:1561436253" ridentifier 1561436253
                                    
                                    [25.07-RC][root@r1.lan]/root: cat /tmp/rules.debug | grep 'pass.*192.168.191'
                                    pass out  route-to ( ix2 192.168.191.1 ) from 192.168.191.2 to !192.168.191.0/24 ridentifier 1000010023 keep state allow-opts label "let out anything from firewall host itself"
                                    pass  in  quick  on $WAN2_RUT reply-to ( ix2 192.168.191.1 ) inet from $OPT3__NETWORK to any ridentifier 1561436253 keep state label "USER_RULE: allow inet" label "id:1561436253"
                                    
                                    • stephenw10S Offline
                                      stephenw10 Netgate Administrator
                                      last edited by

                                      No that looks like exactly what I would expect. It's being passed by the route-to rule and forced via the WAN2 gateway. As such I would not expect that to fail if WAN1 is disconnected.

                                      However I assume it does fail?

                                      • luckman212L Online
                                        luckman212 LAYER 8 @stephenw10
                                        last edited by luckman212

                                        @stephenw10 Yes, pings out of WAN2 start to fail as soon as I pull the WAN1 cable...

                                        [25.07-RC][root@r1.lan]/root: dpinger -r 1000 -f -i WAN2_test -B 192.168.191.2 -s 1000 -d 1 -D 500 -L 75 1.0.0.1
                                        send_interval 1000ms  loss_interval 4000ms  time_period 60000ms  report_interval 1000ms  data_len 1  alert_interval 1000ms  latency_alarm 500ms  loss_alarm 75%  alarm_hold 10000ms  dest_addr 1.0.0.1  bind_addr 192.168.191.2  identifier "WAN2_test "
                                        WAN2_test 69570 0 0
                                        WAN2_test 70163 649 0
                                        WAN2_test 68235 2777 0
                                        WAN2_test 65246 5713 0
                                        WAN2_test 60112 11467 0
                                        WAN2_test 63049 12360 0
                                        WAN2_test 62877 11451 0
                                        WAN2_test 60334 14108 0
                                        WAN2_test 60794 13456 0
                                        WAN2_test 59918 13124 0
                                        WAN2_test 60180 12596 0
                                        WAN2_test 59935 12134 0
                                        WAN2_test 59935 12134 0
                                        WAN2_test 59567 11766 0
                                        WAN2_test 59506 11371 0
                                        WAN2_test 58916 11244 0
                                        WAN2_test 58469 11053 0
                                        WAN2_test 57181 11984 0
                                        WAN2_test 57413 11703 0
                                        WAN2_test 57553 11424 0
                                        WAN2_test 58340 11692 0
                                        WAN2_test 58237 11431 0
                                        WAN2_test 58841 11538 0
                                        ( here is where I yank the WAN1 cable... )
                                        WAN2_test 1.0.0.1: sendto error: 65
                                        WAN2_test 58841 11538 0
                                        WAN2_test 1.0.0.1: sendto error: 65
                                        WAN2_test 58841 11538 0
                                        WAN2_test 1.0.0.1: sendto error: 65
                                        WAN2_test 58841 11538 0
                                        WAN2_test 1.0.0.1: sendto error: 65
                                        WAN2_test 58841 11538 0
                                        WAN2_test 1.0.0.1: sendto error: 65
                                        WAN2_test 58841 11538 4
                                        WAN2_test 1.0.0.1: sendto error: 65
                                        WAN2_test 58841 11538 11
                                        WAN2_test 1.0.0.1: sendto error: 65
                                        WAN2_test 58841 11538 14
                                        WAN2_test 1.0.0.1: sendto error: 65
                                        WAN2_test 1.0.0.1: sendto error: 65
                                        WAN2_test 58841 11538 17
                                        WAN2_test 1.0.0.1: sendto error: 65
                                        WAN2_test 58841 11538 20
                                        WAN2_test 1.0.0.1: sendto error: 65
                                        WAN2_test 58841 11538 23
                                        WAN2_test 1.0.0.1: sendto error: 65
                                        WAN2_test 58841 11538 25
                                        WAN2_test 1.0.0.1: sendto error: 65
                                        WAN2_test 58841 11538 28
                                        WAN2_test 1.0.0.1: sendto error: 65
                                        WAN2_test 58841 11538 30
                                        WAN2_test 1.0.0.1: sendto error: 65
                                        WAN2_test 58841 11538 32
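Output like the run above can be summarized with a small parser. This is a rough sketch under stated assumptions: `summarize_dpinger` is a made-up helper, and it assumes each report line is `identifier average_latency stddev loss`, with the latencies in microseconds and the loss in percent, as the `0us` and `100%` in the alarm lines suggest.

```python
def summarize_dpinger(lines, loss_alarm=75):
    """Summarize dpinger text output captured with -i <identifier>.

    Report lines look like "WAN2_test 58841 11538 4"; anything else
    (sendto errors, alarm messages) is not counted as a report.
    """
    reports = []
    send_errors = 0
    for line in lines:
        parts = line.split()
        if len(parts) == 4 and all(p.isdigit() for p in parts[1:]):
            latency_us, stddev_us, loss_pct = (int(p) for p in parts[1:])
            reports.append((latency_us, stddev_us, loss_pct))
        elif "sendto error:" in line:
            send_errors += 1
    max_loss = max((r[2] for r in reports), default=0)
    return {
        "reports": len(reports),
        "send_errors": send_errors,
        "max_loss_pct": max_loss,
        "alarm": max_loss >= loss_alarm,  # mirrors the -L 75 threshold
    }

# A few lines copied from the run above, around the WAN1 cable pull.
SAMPLE_RUN = [
    "WAN2_test 58841 11538 0",
    "WAN2_test 1.0.0.1: sendto error: 65",
    "WAN2_test 58841 11538 0",
    "WAN2_test 1.0.0.1: sendto error: 65",
    "WAN2_test 58841 11538 4",
    "WAN2_test 1.0.0.1: sendto error: 65",
    "WAN2_test 58841 11538 11",
]

print(summarize_dpinger(SAMPLE_RUN))
```

The latency and stddev columns stay frozen at their last values while the loss column climbs, which is consistent with no new replies arriving after the cable pull.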
                                        
                                        • stephenw10S Offline
                                          stephenw10 Netgate Administrator
                                          last edited by

                                          Hmm, and presumably it still logs the WAN2 gateway going down?

                                          Does the state still exist after pulling WAN1? Still on ix2?

                                          To be clear, does it work as expected if you allow it to create the static route?

                                          • dennypageD Offline
                                            dennypage @luckman212
                                            last edited by

                                            @luckman212 said in 25.07 RC - no default gateway being set if default route is set to a gateway group and the Tier 1 member interface is down:

                                            I see what you're saying, but has something changed in FreeBSD 15 that could be affecting this?

                                            Not that I am aware of. The general behavior of routing in Unix systems goes back to System III times.

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.