Netgate Discussion Forum

    2.5.1: missing route to localhost (no joke)

    Routing and Multi WAN
    12 Posts 3 Posters 1.0k Views
    • 612brokeaf

      Here's a "funny" one:

      Upgraded one node to 2.5.1 from 2.5.0 and my DNS resolver at localhost stopped working. In fact, it's IPv4 localhost that stopped working!

      2.5.0:

      [2.5.0-RELEASE][xx]/root: netstat -rn | grep ^127
      127.0.0.1          link#3             UH          lo0
      [2.5.0-RELEASE][xx]/root: ifconfig lo0
      lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
      	options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
      	inet6 ::1 prefixlen 128
      	inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3
      	inet6 xx prefixlen 128
      	inet6 xx prefixlen 128
      	inet 127.0.0.1 netmask 0xff000000
      	inet xx netmask 0xffffffff
      	inet xx netmask 0xffffffff
      	groups: lo
      	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
      [2.5.0-RELEASE][xx]/root: ping 127.0.0.1
      PING 127.0.0.1 (127.0.0.1): 56 data bytes
      64 bytes from 127.0.0.1: icmp_seq=0 ttl=64 time=4.759 ms
      

      2.5.1:

      [2.5.1-RELEASE][xx]/root: netstat -rn | grep ^127 | wc -l
             0
      [2.5.1-RELEASE][xx]/root: ifconfig lo0
      lo0: flags=8149<UP,LOOPBACK,RUNNING,PROMISC,MULTICAST> metric 0 mtu 16384
      	options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
      	inet6 ::1 prefixlen 128
      	inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3
      	inet6 xx prefixlen 128
      	inet6 xx prefixlen 128
      	inet 127.0.0.1 netmask 0xff000000
      	inet xx netmask 0xffffffff
      	inet xx netmask 0xffffffff
      	groups: lo
      	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
      [2.5.1-RELEASE][xx]/root: ping 127.0.0.1
      PING 127.0.0.1 (127.0.0.1): 56 data bytes
      ping: sendto: Can't assign requested address
      

      How is it even possible for a directly connected route not to show in the routing table, unless it is specifically removed... Looks like some odd side effect where something went wrong while applying some specific config. Obviously there was no such issue with 2.5.0. Note that I do have config that touches lo0, and that's multiple secondary IPs (using the VIP functionality) that I use for routing.
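The check done by hand above with `netstat -rn | grep ^127` can be scripted. The following is a minimal sketch, not something posted in this thread: `has_route` is a hypothetical helper name, and it assumes the destination is printed in the first column exactly as given (a route installed as a /32 network would carry the suffix and would need to be passed that way).

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the routing table text on stdin has an
# entry whose destination column (field 1) is exactly the given address.
has_route() {
    awk -v dest="$1" '$1 == dest { found = 1 } END { exit !found }'
}

# Intended use on the firewall itself (not executed here):
#   netstat -rn -f inet | has_route 127.0.0.1 || echo "127.0.0.1 route missing"
```

Matching on the whole first field avoids false positives from gateways or other destinations that merely contain the string 127.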

      Any clues?

      • Gertjan @612brokeaf

        @612brokeaf

        Yeah, a couple of weeks ago, since 2.5.1, commands like dig, ping and others showed 'error' messages that localhost (127.0.0.1) was absent.

        To get it back : reboot.

        I guess there is already a redmine issue for it.

        No "help me" PM's please. Use the forum, the community will thank you.
        Edit : and where are the logs ??

        • viktor_g (Netgate)

          Unable to reproduce, but it may be related to https://redmine.pfsense.org/issues/11806

          You can try this patch: 221.diff

          • 612brokeaf @Gertjan

            @gertjan Rebooted n times, no change.

            • Gertjan @viktor_g

              @viktor_g said in 2.5.1: missing route to localhost (no joke):

              Unable to reproduce

              I was referring to these forum posts.

              The issue should pop up when you declare something like this :

              0eaa6557-4dae-4bb0-8b7c-5bcb548cb578-image.png


              • 612brokeaf @viktor_g

                @viktor_g OK, that patch works, albeit not completely. I now have the route to 127.0.0.1 in the table, but another route for localhost (a secondary 172.16.x.x/32) has disappeared after rebooting, meaning my routing is now completely broken until I manually add that route pointing it to localhost, because I rely on this for BGP etc. Interestingly, there is another /32 on lo0 from the same range that I use as GRE source, and that was unaffected.

                • viktor_g (Netgate) @612brokeaf

                  @612brokeaf said in 2.5.1: missing route to localhost (no joke):

                  @viktor_g OK, that patch works, albeit not completely. I now have the route to 127.0.0.1 in the table, but another route for localhost (a secondary 172.16.x.x/32) has disappeared after rebooting, meaning my routing is now completely broken until I manually add that route pointing it to localhost, because I rely on this for BGP etc. Interestingly, there is another /32 on lo0 from the same range that I use as GRE source, and that was unaffected.

                  Could you show your complete routing config?

                  • 612brokeaf @viktor_g

                    @viktor_g Not unless you have a config sanitisation tool where I could securely paste the XML and hide config details.

                    I can describe what I have though.

                    I have a hub and spoke type setup with multiple pfSense hosts in different regions as the hub(s). Spokes are a mix of pfSense and traditional big name hardware vendors.

                    On each hub:

                    • 10+ pairs of IPSec tunnels over GRE (several spokes + full mesh between hubs), meaning 10+ GRE interfaces, times two - for each location there is v4 and v6 (IPSec tunnel + GRE for each)
                    • 4 x extra secondary IPs on lo0 (Firewall -> VIPs -> type: alias): two IPv4 (172.16.x.x/32) and two IPv6 (fd00:xx::xx/128). For both v4 and v6, one is used as GRE source and this -> remote is what the IPSec tunnels cover, and the other is a general loopback for services/router IDs/BGP peering.
                    • Running FRR with OSPF + OSPF3 to distribute v4 and v6 loopbacks, and BGP via those loopbacks, hubs run a route reflector.
                    • A single WAN interface with a primary static v4 IP and multiple secondary v4 IPs. Secondaries / extra v4 IPs are /32s, gateway is the primary gateway. Also a /56 public IPv6 on each, ND-RA/SLAAC with /64 PDs.

                    I think possibly the issue triggers when setting up aliases on lo0. After the upgrade from 2.5.0 to 2.5.1, the 127.0.0.1 route was gone from routing table, even though the IP was configured correctly. After the patch you suggested, the route for 127.0.0.1 was in, but the route for another v4 alias for lo0 was not, while the third one was in. This broke most of my VPN tunnels, because some spokes have dynamic IPs and DNS is used to resolve the IPSec tunnel endpoints. Dnsmasq listens on 127.0.0.1 and this is what indirectly broke things. Before the patch I changed the local DNS server to 172.16.x.x as a workaround, but with the patch, that IP didn't make it into the routing table, resulting in the same issue.

                    For now I added manual shellcmds to install the missing lo0 routes on boot.
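The thread does not show the shellcmd entries that were used. As a rough sketch of the idea only, the function below prints a `route add` command for every lo0 address that has no entry in the routing table text it is given. Everything here is an assumption for illustration: the function name `missing_lo0_route_cmds` is hypothetical, `172.16.0.1` in the usage line is a placeholder address, and `route add -host ADDR -iface lo0` is just one FreeBSD form for pinning a host route to the loopback. The commands are printed, not executed, so they can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper: reads routing table text on stdin and prints one
# "route add" command per address argument that has no matching entry.
# ASSUMPTION: "-host ADDR -iface lo0" is the desired route form; the thread
# does not say which commands were actually used. Nothing is executed.
missing_lo0_route_cmds() {
    _tbl=$(cat)
    for addr in "$@"; do
        if ! printf '%s\n' "$_tbl" |
            awk -v d="$addr" '$1 == d { f = 1 } END { exit !f }'; then
            printf 'route add -host %s -iface lo0\n' "$addr"
        fi
    done
}

# Intended use on the firewall itself (not executed here):
#   netstat -rn -f inet | missing_lo0_route_cmds 127.0.0.1 172.16.0.1
```

Generating commands only for routes that are actually missing keeps the workaround harmless on boots where the routes are installed normally.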

                    For completeness: I have another manual modification in place, in /etc/inc/config.lib.inc, and that is changing alias_make_table(); to alias_make_table($config);, because otherwise I kept getting crash reports / PHP errors complaining about alias_make_table being called with zero arguments and expecting one. This was being triggered from the ACME cert renewal cron job. There is also another bug in ACME, complaining about the function getarraybyref() not found. Even though all PHP include chains look fine, I can't find another way to fix this than pasting that function into the same scope in ACME. This is for another topic though - this issue looked fixed in 2.5.0, but maybe I fixed it by hand and forgot about it until 2.5.1.

                    • 612brokeaf

                      Instead of shellcmds, I added manual static routes to 127/8 and the two other /32s I have on lo0 in the GUI. The node now survives reboot entirely intact.

                      Side note: when adding static routes, the gateway / interface selection lists lo0 as "null4" and "null6" (for IPv4 and IPv6 respectively). This naming is a little confusing. To a network engineer these look like blackhole routes, and that is probably exactly what they are meant to be, but it hints that there may be additional rules in place that actually drop the traffic rather than just push it to the CPU, much like the dedicated null interface found in other network OSes.

                      • 612brokeaf

                        Correction: setting those missing loopback routes as static routes apparently only fixed it on one node and only temporarily.

                        @viktor_g looks like the patch did not change much - I ran debug on that function and it didn't seem to be touching the v4 loopback, so this may be elsewhere - possibly IPSec scripts, since there were so many fixes in 2.5.1? Probably a good test would be to stop/start IPSec and see if this breaks the loopback again, at least it would narrow this down somewhat.
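The stop/start test suggested above could be narrowed down by saving the routing table before and after the restart and listing what disappeared. This is a minimal sketch under stated assumptions: `lost_routes` is a hypothetical helper, the snapshot file names are placeholders, and how IPsec gets restarted in between is left to the reader.

```shell
#!/bin/sh
# Hypothetical helper: prints destinations (field 1) that appear in the
# "before" snapshot ($1) but not in the "after" snapshot ($2). Both files
# are assumed to hold saved `netstat -rn -f inet` output.
lost_routes() {
    awk 'FILENAME == ARGV[1] { if (NF) after[$1] = 1; next }
         NF && !($1 in after) { print $1 }' "$2" "$1"
}

# Intended use on the firewall itself (not executed here):
#   netstat -rn -f inet > /tmp/routes.before
#   ... stop and start IPsec ...
#   netstat -rn -f inet > /tmp/routes.after
#   lost_routes /tmp/routes.before /tmp/routes.after
```

If 127.0.0.1 or one of the lo0 /32s shows up in the output, the restart is what removes the route; an empty result points elsewhere.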

                        Anyhow, I added shellcmds (regular, not early) adding routes to the various lo0 addresses, and that seems to have worked so far, 10+ reboots. It's an ugly fix but I'm not touching it until some proper resolution comes up. I've run out of downtime credits for now so can't test much for the next few days.

                        • viktor_g (Netgate) @612brokeaf

                          @612brokeaf said in 2.5.1: missing route to localhost (no joke):

                          @viktor_g OK, that patch works, albeit not completely. I now have the route to 127.0.0.1 in the table, but another route for localhost (a secondary 172.16.x.x/32) has disappeared after rebooting, meaning my routing is now completely broken until I manually add that route pointing it to localhost, because I rely on this for BGP etc. Interestingly, there is another /32 on lo0 from the same range that I use as GRE source, and that was unaffected.

                          Unable to reproduce on the latest dev snapshot:
                          Screenshot from 2021-05-09 16-04-01.png

                          all OK after rebooting:

                          # netstat -rn | grep 127
                          5.5.5.0/24         127.0.0.1          UGSB        lo0
                          6.6.6.6/32         127.0.0.1          UGSB        lo0
                          127.0.0.1          link#5             UH          lo0
                          
                          • viktor_g (Netgate) @612brokeaf

                            @612brokeaf said in 2.5.1: missing route to localhost (no joke):

                            For completeness: I have another manual modification in place, in /etc/inc/config.lib.inc, and that is changing alias_make_table(); to alias_make_table($config);, because otherwise I kept getting crash reports / PHP errors complaining about alias_make_table being called with zero arguments and expecting one. This was being triggered from the ACME cert renewal cron job. There is also another bug in ACME, complaining about the function getarraybyref() not found. Even though all PHP include chains look fine, I can't find another way to fix this than pasting that function into the same scope in ACME. This is for another topic though - this issue looked fixed in 2.5.0, but maybe I fixed it by hand and forgot about it until 2.5.1.

                            Please create a bug report about this issue:
                            https://docs.netgate.com/pfsense/en/latest/development/bug-reports.html

                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.