Netgate Discussion Forum

    Updating to pfSense+ 24.3 breaks routing - kernel routes now gone

    Scheduled Pinned Locked Moved FRR
    51 Posts 7 Posters 3.5k Views
    • M
      marcosm Netgate

      To clarify, is the default route missing from both Zebra and the kernel, or just Zebra?

      It crazily thinks that some high-cost OSPF route is a better option than its directly connected default gateway, and it shouldn't.

      Is this happening while the lower-cost route exists in the kernel? If so, is that happening with newly established traffic as well (as in not traffic for which states already exist)?

      • M
        marcosm Netgate

        Please test this patched frr 9.1 version and let us know if the issue persists.

        • G
          Gcon @marcosm

          @marcosm I tested that in my production lab by upgrading the lab pfSense Plus 23.x to 24.x and confirming the breakage (K routes disappearing). I then stopped FRR, applied the patched version, and started it again, and the kernel routes showed up. After a reboot I still have the K routes.

          Is there a bug reference ID you can link to? I'm really curious! I've spent days on this and would love to find out.

          Would you recommend I use this in production? Maybe I'm better off waiting for 24.8, where an updated FRR build will perhaps have had more testing? Then I can skip 24.3 altogether and go straight to 24.8.

          This patched version has this:

          configured with:
              '--enable-user=frr' '--enable-group=frr' '--enable-vty-group=frrvty' '--enable-vtysh' '--disable-doc-html' '--sysconfdir=/var/etc/frr' '--localstatedir=/var/run/frr' '--disable-nhrpd' '--disable-pathd' '--disable-ospfclient' '--disable-pimd' '--disable-pbrd' '--with-vtysh-pager=cat' '--enable-backtrace' '--disable-config-rollbacks' '--disable-datacenter' '--enable-fpm' '--disable-ldpd' '--disable-doc' '--without-libpam' '--enable-rpki' '--disable-sharpd' '--disable-shell-access' '--enable-snmp' '--disable-tcmalloc' '--prefix=/usr/local' '--mandir=/usr/local/man' '--disable-silent-rules' '--infodir=/usr/local/share/info/' '--build=amd64-portbld-freebsd15.0' 'build_alias=amd64-portbld-freebsd15.0' 'PKG_CONFIG=pkgconf' 'PKG_CONFIG_LIBDIR=/wrkdirs/usr/ports/net/frr9/work/.pkgconfig:/usr/local/libdata/pkgconfig:/usr/local/share/pkgconfig:/usr/libdata/pkgconfig' 'CC=cc' 'CFLAGS=-O2 -pipe -fstack-protector-strong -fno-strict-aliasing ' 'LDFLAGS= -L/usr/local/lib -L/usr/local/lib -fstack-protector-strong ' 'LIBS=' 'CPPFLAGS=-I/usr/local/include -I/usr/local/include' 'CPP=cpp' 'CXX=c++' 'CXXFLAGS=-O2 -pipe -fstack-protector-strong -fno-strict-aliasing ' 'PYTHON=/usr/local/bin/python3.11'
          

          I compared it to other builds and nothing stands out. SNMP was off in one of the builds (for CE), and one of the other builds had "--mandir=/usr/local/share/man" instead of "--mandir=/usr/local/man", so I'm thinking the fix was more than just build config.

          In case this info is still required.... even though the root cause seems to have been identified/fixed....

          To clarify, is the default route missing from both Zebra and the kernel, or just Zebra?

          The default route was missing just from Zebra. It was in the kernel.

          Is this happening while the lower-cost route exists in the kernel?

          Yes that's right.

          If so, is that happening with newly established traffic as well (as in not traffic for which states already exist)?

          Yes. Internet web browsing to new websites was broken. Traffic would go from a workstation to the Mikrotik cloud core LAN router. The Mikrotik could see the default route to the local pfSense, and also a default route over the microwave link to the other site's pfSense. The microwave link has a high OSPF cost, so the LAN router would correctly send the Internet traffic to the local pfSense. But the local pfSense had an OSPF-learned route to the remote site over the microwave link and no K route for the locally connected gateway, so it bounced the traffic back to the LAN router, which then sent it back to the local pfSense. You can see that with traceroutes: traffic oscillates between the firewall and the LAN router until the TTL expires.
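
As a rough illustration of the loop described above, here is a toy model in Python. It is only a sketch: the node names and the next-hop tables are made up for the example and are not taken from the actual devices.

```python
# Toy model of the forwarding loop: each node hands Internet-bound
# traffic to whatever its preferred default route points at.
# Node names and next hops are illustrative assumptions, not real config.

def trace(next_hop, start, ttl=8):
    """Follow next hops from `start` until the packet exits or the TTL runs out."""
    path = [start]
    node = start
    while ttl > 0:
        node = next_hop[node]
        path.append(node)
        if node == "internet":
            return path, "delivered"
        ttl -= 1
    return path, "ttl expired"

# Healthy case: the local pfSense uses its directly connected gateway.
healthy = {
    "workstation": "lan-router",
    "lan-router": "local-pfsense",
    "local-pfsense": "internet",
}

# Broken case: the local pfSense prefers the OSPF-learned path,
# which points back at the LAN router.
broken = {**healthy, "local-pfsense": "lan-router"}

print(trace(healthy, "workstation"))
print(trace(broken, "workstation"))
```

In the broken case the path alternates between `lan-router` and `local-pfsense` until the hop budget is used up, which is the same pattern the traceroutes showed.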

          I don't know how it all works, but my experience suggests that if a route exists in Zebra and is subsequently added to the Zebra FIB, then this is the forwarding that gets used. If Zebra has no RIB/FIB entry, then it falls back to the system RIB/FIB (as given by "netstat -rn") before failing. This layering would make sense so that Zebra can start and stop with the least amount of impact. It's a massive danger though when the kernel routes don't get pushed from system/kernel to Zebra, because an incomplete view can lead to extremely poor routing decisions.
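
One way to check for this kind of incomplete view is to compare the destinations the kernel knows about with the ones Zebra shows. A minimal sketch in Python, assuming the two destination lists have been copied by hand from `netstat -rn` and from `show ip route` in vtysh. The sample values are made up, and no attempt is made to parse real command output.

```python
import ipaddress

def normalize(dest):
    """Map a destination string to an ip_network; 'default' means 0.0.0.0/0."""
    if dest == "default":
        return ipaddress.ip_network("0.0.0.0/0")
    return ipaddress.ip_network(dest, strict=False)

def missing_from_zebra(kernel_dests, zebra_dests):
    """Return kernel destinations that have no matching entry in Zebra."""
    zebra = {normalize(d) for d in zebra_dests}
    missing = {normalize(d) for d in kernel_dests} - zebra
    return sorted(missing, key=lambda n: (n.prefixlen, int(n.network_address)))

# Hypothetical destinations, as if read off the two tables by hand.
kernel = ["default", "10.0.0.0/24", "192.168.50.0/24"]
zebra = ["10.0.0.0/24", "192.168.50.0/24"]

print(missing_from_zebra(kernel, zebra))
```

With these sample lists the only entry reported is the default route, which matches the symptom described: present in the kernel, absent from Zebra.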

          • M
            marcosm Netgate

            We found what looks to be the root cause - info has been posted to the Redmine report.

            The route redistribution issue still needs testing with the patched version, any help with that would be appreciated.

            I suggest waiting until we pick back the fix to 24.03 for your production systems.

            • M
              mAineAc @marcosm

              @marcosm said in Updating to pfSense+ 24.3 breaks routing - kernel routes now gone:

              Please test this patched frr 9.1 version and let us know if the issue persists.

              How do you install this? Sorry, I'm pretty new. Can I just scp this to my Netgate 7100 and use some sort of package manager to install it? Is there any particular process that won't break further releases?

              • M
                marcosm Netgate @mAineAc

                @mAineAc See the previous comment.

                • M
                  mAineAc @marcosm

                  @marcosm Yeah, no change after installing, and no change after rebooting. I don't see the default route in FRR and it is not redistributing the default route.

                  • M
                    marcosm Netgate @mAineAc

                    @mAineAc Try to rule out configuration issues by verifying what version it last worked on.

                    @Gcon The updated frr9 package is now available in 24.03. You can pull in the update by running pfSense-upgrade in the CLI. Please let us know if it works on your system(s).

                    • G
                      Gcon @marcosm

                      @marcosm I just tested in my production simulation lab and all looks good. I'll update the actual production firewall this weekend. This is a great result - thanks so much for your efforts - it's really appreciated.

                      • M
                        mAineAc @marcosm

                        @marcosm Will this be coming to 24.08.a.20240702.0600? I am running this and the package listed does not seem to work; I am still having the same issue. I have not seen any updated packages.

                        • M
                          marcosm Netgate @mAineAc

                          @mAineAc No - you'd have to build/install it manually for the public dev build. I'm not aware of any official bug report for the issue you're experiencing. My suggestion is to treat it like any other bug report: provide steps to reproduce it, and determine if it's a regression by finding the version(s) of the related software when it last worked.

                          • K
                            Kevin S Pare

                            Just following up on this. We tried upgrading from pfSense 22.05/FRR 7.5.1 to pfSense 24.11/FRR 9.1.2.

                            We found that traffic was spotty and simply wouldn't route properly. If we turned down one of the 2 peers, traffic would work perfectly, but as long as both peers were up, traffic was spotty and would drop.

                            We would like to stick with a Netgate router, but at this point we are looking to switch over to a Cisco ASR instead.

                            22.05 would be fine for us to stay on, but unfortunately we can't downgrade a router and install the older FRR anymore due to a PHP error.

                            • M
                              michmoor LAYER 8 Rebel Alliance @Kevin S Pare

                              @Kevin-S-Pare Out of curiosity, do you have a high-level diagram of how the pfSense is routing? Is it a single pfSense box with 2x upstream peers terminated on the same firewall? Is this OSPF or BGP?

                              Firewall: NetGate,Palo Alto-VM,Juniper SRX
                              Routing: Juniper, Arista, Cisco
                              Switching: Juniper, Arista, Cisco
                              Wireless: Unifi, Aruba IAP
                              JNCIP,CCNP Enterprise

                              • K
                                Kevin S Pare @michmoor

                                @michmoor

                                You got it. Two peers, advertising 2 /24s with BGP. Nothing fancy and quite basic.

                                • M
                                  michmoor LAYER 8 Rebel Alliance @Kevin S Pare

                                  @Kevin-S-Pare Yeah, pretty basic, I agree.
                                  So when you advertise your routes to both peers, what happens? I take it your upstream imports the routes and sends them out to their peers.
                                  What specifically is happening? Say you have Upstream1 and Upstream2. You are advertising your routes to both upstreams and return traffic comes back on Upstream2 (I don't know how you are steering traffic into your AS). What is spotty?

                                  • K
                                    Kevin S Pare @michmoor

                                    @michmoor What ends up happening is that traffic is either not going out or not getting back. Traceroutes and pings show as OK, but when we try to get out to websites only certain ones work. They will work for a period, and then the route is lost and we are unable to hit the site again.

                                    I was upgrading from an HP server to a Netgate 8200, so we just went back to the old box and all works perfectly fine.

                                    Here's a sanitized version of my config.

                                    ##################### DO NOT EDIT THIS FILE! ######################
                                    ###################################################################
                                    # This file was created by an automatic configuration generator.
                                    # The contents of this file will be overwritten without warning!
                                    ###################################################################
                                    !
                                    frr defaults traditional
                                    hostname hostname
                                    password password
                                    ip nht resolve-via-default
                                    service integrated-vtysh-config
                                    !
                                    router bgp 3
                                     bgp log-neighbor-changes
                                     bgp router-id 192.168.1.2
                                     no bgp network import-check
                                     bgp deterministic-med
                                     bgp always-compare-med
                                     bgp bestpath as-path multipath-relax
                                     neighbor 192.168.1.1 remote-as 1
                                     neighbor 192.168.1.1 description Peer1
                                     neighbor 192.168.1.1 timers 20 60
                                     neighbor 192.168.2.1 remote-as 2
                                     neighbor 192.168.2.1 description Peer2
                                     neighbor 192.168.2.1 timers 20 90
                                     !
                                     address-family ipv4 unicast
                                      network 192.168.10.0/24
                                      network 192.168.11.0/24
                                      neighbor 192.168.1.1 activate
                                      neighbor 192.168.2.1 activate
                                      no neighbor 192.168.1.1 send-community
                                      neighbor 192.168.1.1 next-hop-self
                                      neighbor 192.168.1.1 prefix-list PEER1-IN in
                                      neighbor 192.168.1.1 prefix-list PEER1-OUT out
                                      no neighbor 192.168.2.1 send-community
                                      neighbor 192.168.2.1 next-hop-self
                                      neighbor 192.168.2.1 prefix-list PEER2-IN in
                                      neighbor 192.168.2.1 prefix-list PEER2-OUT out
                                     exit-address-family
                                     !
                                    !
                                    ip prefix-list PEER1-IN seq 10 deny 0.0.0.0/8 le 32
                                    ip prefix-list PEER1-IN seq 20 deny 10.0.0.0/8 le 32
                                    ip prefix-list PEER1-IN seq 30 deny 127.0.0.0/8 le 32
                                    ip prefix-list PEER1-IN seq 40 deny 169.254.0.0/16 le 32
                                    ip prefix-list PEER1-IN seq 50 deny 172.16.0.0/12 le 32
                                    ip prefix-list PEER1-IN seq 60 deny 192.0.0.0/24 le 32
                                    ip prefix-list PEER1-IN seq 70 deny 192.0.2.0/24 le 32
                                    ip prefix-list PEER1-IN seq 80 deny 192.168.0.0/16 le 32
                                    ip prefix-list PEER1-IN seq 90 deny 198.18.0.0/15 le 32
                                    ip prefix-list PEER1-IN seq 100 deny 198.51.100.0/24 le 32
                                    ip prefix-list PEER1-IN seq 110 deny 203.0.113.0/24 le 32
                                    ip prefix-list PEER1-IN seq 120 deny 224.0.0.0/4 le 32
                                    ip prefix-list PEER1-IN seq 130 permit 0.0.0.0/0 le 32
                                    ip prefix-list PEER1-OUT seq 10 permit 192.168.10.0/24
                                    ip prefix-list PEER1-OUT seq 11 permit 192.168.11.0/24
                                    ip prefix-list PEER2-IN seq 10 deny 0.0.0.0/8 le 32
                                    ip prefix-list PEER2-IN seq 20 deny 10.0.0.0/8 le 32
                                    ip prefix-list PEER2-IN seq 30 deny 127.0.0.0/8 le 32
                                    ip prefix-list PEER2-IN seq 40 deny 169.254.0.0/16 le 32
                                    ip prefix-list PEER2-IN seq 50 deny 172.16.0.0/12 le 32
                                    ip prefix-list PEER2-IN seq 60 deny 192.0.0.0/24 le 32
                                    ip prefix-list PEER2-IN seq 70 deny 192.0.2.0/24 le 32
                                    ip prefix-list PEER2-IN seq 80 deny 192.168.0.0/16 le 32
                                    ip prefix-list PEER2-IN seq 90 deny 198.18.0.0/15 le 32
                                    ip prefix-list PEER2-IN seq 100 deny 198.51.100.0/24 le 32
                                    ip prefix-list PEER2-IN seq 110 deny 203.0.113.0/24 le 32
                                    ip prefix-list PEER2-IN seq 120 deny 224.0.0.0/4 le 32
                                    ip prefix-list PEER2-IN seq 130 permit 0.0.0.0/0 le 32
                                    ip prefix-list PEER2-OUT seq 10 permit 192.168.11.0/24
                                    ip prefix-list PEER2-OUT seq 11 permit 192.168.10.0/24
                                    !
                                    route-map ALLOW-ALL permit 100
                                    !
                                    line vty
                                    !
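
For anyone reading along, here is a small Python sketch of how an inbound prefix list of this shape is evaluated: entries are checked in sequence order, the first match decides, and a route that matches nothing is denied. It is a simplified model that only handles the `le` form used in this config, it uses a subset of the PEER1-IN entries, and the function names are made up for the example.

```python
import ipaddress

# (action, prefix, le) tuples in sequence order; a subset of PEER1-IN.
PEER_IN = [
    ("deny", "10.0.0.0/8", 32),
    ("deny", "192.168.0.0/16", 32),
    ("deny", "224.0.0.0/4", 32),
    ("permit", "0.0.0.0/0", 32),
]

def entry_matches(route, prefix, le=None):
    """True if `route` falls inside `prefix` with an allowed prefix length."""
    route = ipaddress.ip_network(route)
    prefix = ipaddress.ip_network(prefix)
    # Without `le`, only the exact prefix length matches.
    max_len = prefix.prefixlen if le is None else le
    return (prefix.prefixlen <= route.prefixlen <= max_len
            and route.subnet_of(prefix))

def evaluate(prefix_list, route):
    """Return the action of the first matching entry, or the implicit deny."""
    for action, prefix, le in prefix_list:
        if entry_matches(route, prefix, le):
            return action
    return "deny"

print(evaluate(PEER_IN, "8.8.8.0/24"))       # permit
print(evaluate(PEER_IN, "192.168.10.0/24"))  # deny
print(evaluate(PEER_IN, "0.0.0.0/0"))        # permit
```

The last line shows that a default route received from a peer passes these inbound filters, since only the final `permit 0.0.0.0/0 le 32` entry covers a /0.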

                                    • M
                                      michmoor LAYER 8 Rebel Alliance @Kevin S Pare

                                      @Kevin-S-Pare

                                      Nothing offensive in the config.
                                      I don't know why you have bgp always-compare-med and bgp deterministic-med configured at the same time. If you are using MED to influence outbound routing then you should pick one option.

                                      Based on the fact that you stated traceroutes and pings work out to the Internet, we know that routing is good.
                                      I do know there were behavioral changes to pfSense after 22.05, namely state policy changes.

                                      https://www.netgate.com/blog/state-policy-default-change#:~:text=State%20Policy%20Options&text=As%20pfSense%20software%20is%20security,the%20system%20default%20State%20Policy

                                      I have a sneaking suspicion you are running into this. I can see it happening if traffic leaves via Upstream1 and comes back on Upstream2.

                                      If I were you, I would change to the Floating state policy and perform your tests. It really seems you are hitting this behavior change.
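
To illustrate why asymmetric return traffic can trip over this, here is a toy Python model of a state lookup under the two policies. It is a deliberate simplification of how pf matches states, not the actual implementation; the interface names, addresses, and helper function are assumptions made up for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    flow: tuple   # (src, dst) of the connection, heavily simplified
    iface: str    # interface the state was created on

def reply_matches(state, reply_flow, reply_iface, policy):
    """Does a reply packet match an existing state under the given policy?"""
    same_flow = reply_flow == (state.flow[1], state.flow[0])
    if policy == "floating":
        return same_flow  # a floating state matches on any interface
    if policy == "if-bound":
        return same_flow and reply_iface == state.iface
    raise ValueError(f"unknown policy: {policy}")

# The outbound connection leaves via Upstream1...
state = State(flow=("192.168.10.5", "203.0.113.7"), iface="upstream1")
# ...but the reply comes back in on Upstream2.
reply = ("203.0.113.7", "192.168.10.5")

print(reply_matches(state, reply, "upstream2", "if-bound"))  # False
print(reply_matches(state, reply, "upstream2", "floating"))  # True
```

In this model the reply arriving on the other upstream finds no state when states are interface-bound, so it would be evaluated against the rules like unsolicited inbound traffic.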

                                      • K
                                        Kevin S Pare @michmoor

                                        @michmoor said in Updating to pfSense+ 24.3 breaks routing - kernel routes now gone:

                                        https://www.netgate.com/blog/state-policy-default-change#:~:text=State%20Policy%20Options&text=As%20pfSense%20software%20is%20security,the%20system%20default%20State%20Policy

                                        Interesting idea. I'll give that a try tonight! There's some pretty solid logic there.

                                        I'll do some more reading on the MED options.

                                        I did my BGP back when I did my CCNA, CCNP and CCDP... so kinda brushing off the cobwebs lol

                                        • K
                                          Kevin S Pare @michmoor

                                          @michmoor said in Updating to pfSense+ 24.3 breaks routing - kernel routes now gone:

                                          @Kevin-S-Pare

                                          Nothing offensive in the config.
                                          I don't know why you have bgp always-compare-med and bgp deterministic-med configured at the same time. If you are using MED to influence outbound routing then you should pick one option.

                                          Based on the fact that you stated traceroutes and pings work out to the Internet, we know that routing is good.
                                          I do know there were behavioral changes to pfSense after 22.05, namely state policy changes.

                                          https://www.netgate.com/blog/state-policy-default-change#:~:text=State%20Policy%20Options&text=As%20pfSense%20software%20is%20security,the%20system%20default%20State%20Policy

                                          I have a sneaking suspicion you are running into this. I can see it happening if traffic leaves via Upstream1 and comes back on Upstream2.

                                          If I were you, I would change to the Floating state policy and perform your tests. It really seems you are hitting this behavior change.

                                          For MED and best-path selection, given that these are both Internet peers, will either of these options really have any impact on outgoing traffic? We only have one router, and it seems these settings are more for setups with multiple local routers connecting to Internet peers?

                                          • M
                                            michmoor LAYER 8 Rebel Alliance @Kevin S Pare

                                            @Kevin-S-Pare
                                            MED is so far down the BGP path selection process that, realistically, I would be surprised if it's used by the firewall to make a path decision. I have seen it used within an enterprise with multiple colocation sites.
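
For reference, a simplified Python sketch of the first few best-path tie-breakers in the commonly documented order (weight, local preference, AS-path length, origin, then MED). It leaves out several steps (locally originated routes, eBGP vs iBGP, IGP metric, router ID) and is not FRR's actual code; the `Path` fields and sample values are made up for the example.

```python
from dataclasses import dataclass

@dataclass
class Path:
    peer: str
    neighbor_as: int
    weight: int = 0
    local_pref: int = 100
    as_path_len: int = 1
    origin: int = 0   # 0 = IGP, 1 = EGP, 2 = incomplete (lower wins)
    med: int = 0

def better(a, b, always_compare_med=False):
    """Return the preferred path of the two, or None if they are still tied."""
    checks = [
        (a.weight, b.weight, max),            # higher weight wins
        (a.local_pref, b.local_pref, max),    # higher local preference wins
        (a.as_path_len, b.as_path_len, min),  # shorter AS path wins
        (a.origin, b.origin, min),            # lower origin code wins
    ]
    for va, vb, pick in checks:
        if va != vb:
            return a if pick(va, vb) == va else b
    # MED is only compared between paths from the same neighbor AS,
    # unless always-compare-med is set.
    if (always_compare_med or a.neighbor_as == b.neighbor_as) and a.med != b.med:
        return a if a.med < b.med else b
    return None

p1 = Path("Peer1", neighbor_as=1, as_path_len=3, med=50)
p2 = Path("Peer2", neighbor_as=2, as_path_len=2, med=10)
print(better(p1, p2).peer)  # Peer2, decided by AS-path length before MED is looked at
```

With two upstreams in different ASes, MED only comes into play here when everything above it ties and `always_compare_med` is set.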

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.