Netgate Discussion Forum

    Upgrade from 23.09.1 to 24.03 Completes Successfully, But NIC Will No Longer Pass Traffic

    General pfSense Questions
    20 Posts 2 Posters 1.4k Views
    • J
      jsylvia007
      last edited by jsylvia007

      Howdy! First - THANK GOD FOR BOOT ENVIRONMENTS!!

      My system is a whitebox (SuperMicro) running pfSense Plus (initially 23.09.1), and it has been rock stable for years (on this and previous releases). I finally got around to upgrading to 24.03 this afternoon, and the update finished just fine. I was using an IPMI connection to monitor the update/reboot, and after it came up everything looked fine, BUT I couldn't ping my LAN address, and none of my devices could access the internet over that interface. Other systems on other interfaces/VLANs were fine. My LAN is a 25G connection between a Unifi switch and the SuperMicro pfSense box, and while the link was up and everything LOOKED ok, no traffic would pass.

      I left it that way for about 30 minutes, figuring it was just all my packages reinstalling, etc. When it still didn't work, I rebooted from the console, and I still couldn't ping the LAN interface, even though the command line says it was UP. pfSense also couldn't ping OUT from that interface either.

      I reverted to 23.09.1, and interestingly enough, it STILL didn't come back. Thinking I was going crazy, I rebooted again... And then everything came up fine... Investigation started...

      I let it stabilize for another 30 min, working perfectly fine. Rebooted into the 24.03 Boot Environment, successful boot, waited for another 10-15 min... Still no LAN interface. Reboot again into 24.03, wait 10-15... No traffic.

      Rebooted into 23.09.1, booted successfully, no traffic on LAN... Wait 5 min, reboot again, into 23.09.1, traffic is back to normal...

      I did this whole iteration twice to see if it was repeatable. It is.

      So... Something is messing with my interface in 24.03 such that it requires 2 reboots in 23.09.1 to resolve the problem.

      So I'm back in 23.09.1, hoping someone here knows what MIGHT be going on.

      Thanks!!

      Device Specifics:

      System: Supermicro SYS-E300-8D
      

      Problem NIC:

      ixl0@pci0:7:0:0:        class=0x020000 rev=0x02 hdr=0x00 vendor=0x8086 device=0x158b subvendor=0x8086 subdevice=0x0002
          vendor     = 'Intel Corporation'
          device     = 'Ethernet Controller XXV710 for 25GbE SFP28'
          class      = network
          subclass   = ethernet
          bar   [10] = type Prefetchable Memory, range 64, base 0xf7000000, size 16777216, enabled
          bar   [1c] = type Prefetchable Memory, range 64, base 0xf8808000, size 32768, enabled
          cap 01[40] = powerspec 3  supports D0 D3  current D0
          cap 05[50] = MSI supports 1 message, 64 bit, vector masks
          cap 11[70] = MSI-X supports 129 messages, enabled
                       Table in map 0x1c[0x0], PBA in map 0x1c[0x1000]
          cap 10[a0] = PCI-Express 2 endpoint max data 256(2048) FLR
                       max read 4096
                       link x8(x8) speed 8.0(8.0) ASPM disabled(L1)
          ecap 0001[100] = AER 2 0 fatal 0 non-fatal 1 corrected
          ecap 0003[140] = Serial 1 103babfffffefd3c
          ecap 000e[150] = ARI 1
          ecap 0010[160] = SR-IOV 1 IOV disabled, Memory Space disabled, ARI disabled
                           0 VFs configured out of 64 supported
                           First VF RID Offset 0x0110, VF RID Stride 0x0001
                           VF Device ID 0x154c
                           Page Sizes: 4096 (enabled), 8192, 65536, 262144, 1048576, 4194304
          ecap 0017[1a0] = TPH Requester 1
          ecap 000d[1b0] = ACS 1 Source Validation unavailable, Translation Blocking unavailable
                           P2P Req Redirect unavailable, P2P Cmpl Redirect unavailable
                           P2P Upstream Forwarding unavailable, P2P Egress Control unavailable
                           P2P Direct Translated unavailable, Enhanced Capability unavailable
          ecap 0019[1d0] = PCIe Sec 1 lane errors 0
        PCI-e errors = Correctable Error Detected
                       Unsupported Request Detected
      

      Edit: Sorry!! I thought I was on the General Forum! Please move post if necessary.

      • stephenw10 moved this topic from Problems Installing or Upgrading pfSense Software
      • stephenw10S
        stephenw10 Netgate Administrator
        last edited by

        Do you see a difference in the output of: ifconfig -vvvm ixl0 between 23.09.1 and 24.03?

        Is it even showing as linked in 24.03?

        Steve
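One way to make that comparison less error-prone is to save the `ifconfig -vvvm ixl0` output from each boot environment to a file and diff the two captures. The following is only a minimal sketch: the two sample captures are made-up placeholders (using a documentation address), not output from the system in this thread, and `diff_ifconfig` is a hypothetical helper name.

```python
import difflib


def diff_ifconfig(old_text: str, new_text: str) -> list[str]:
    """Return only the lines that differ between two ifconfig captures.

    Lines prefixed with '-' exist only in the old capture,
    lines prefixed with '+' exist only in the new one.
    """
    diff = list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile="23.09.1",
            tofile="24.03",
            lineterm="",
            n=0,  # no context lines, so everything left is a real change
        )
    )
    # Skip the two file-header lines and the '@@' hunk markers.
    return [line for line in diff[2:] if not line.startswith("@@")]


# Hypothetical captures for illustration only.
old_capture = (
    "ixl0: flags=8843<UP> mtu 1500\n"
    "\tinet 192.0.2.1 netmask 0xffffff00\n"
    "\tstatus: active\n"
)
new_capture = (
    "ixl0: flags=8843<UP> mtu 1500\n"
    "\tstatus: active\n"
)

for line in diff_ifconfig(old_capture, new_capture):
    print(line)
```

With real captures, any line that shows up only on one side (flags, options, media, or addresses) is a candidate for what changed between releases.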

        • J
          jsylvia007 @stephenw10
          last edited by

          @stephenw10 -- I won't be able to perform the reboot-two-step until later tonight, but I will get outputs of both.

          For now, here is what it says on 23.09.1:

          ixl0: flags=1008b43<UP,BROADCAST,RUNNING,PROMISC,ALLMULTI,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
                  description: LAN
                  options=48100b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWFILTER,HWSTATS,MEXTPG>
                  capabilities=4f507bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWFILTER,VLAN_HWTSO,NETMAP,RXCSUM_IPV6,TXCSUM_IPV6,HWSTATS,MEXTPG>
                  ether 3c:fd:fe:ab:3b:10
                  inet REDACTED
                  inet6 REDACTED
                  media: Ethernet autoselect (25GBase-CR <full-duplex>)
                  status: active
                  supported media:
                          media autoselect
                          media 25GBase-LR
                          media 25GBase-SR
                          media 25GBase-CR
                          media 10GBase-KR
                          media 1000Base-KX
                          media 10Gbase-LR
                          media 10Gbase-SR
                          media 10Gbase-Twinax
                          media 1000baseLX
                          media 1000baseSX
                  nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
          

          Note, I don't THINK the media would matter, but this is a DAC connection, not Fiber or Copper. I did run a similar permutation of the ifconfig command and I do remember it saying status: active.

          I will reboot tonight to get the exact output for 24.03.

          • stephenw10S
            stephenw10 Netgate Administrator
            last edited by

            Hmm, I agree I wouldn't expect the actual link media reported to matter as long as the speed is correct.

            • J
              jsylvia007
              last edited by

              H'okay... Got a chance to reboot and test. Here is the screenshot (the only way I could get the information because I didn't have SSH and had to use IPMI).

              Screenshot 2024-04-30 202837.png

              Interesting here, BUT, the inet line is completely missing... Yet the main page says that the IP is assigned. I also selected the command line option to assign an IP to the interface to re-assign the LAN IP, and it made no difference. The output of the command was identical.

              I then manually tried to add an IP to the interface, and it didn't like that either:

              Screenshot 2024-04-30 204000.png

              Took 2 reboots back into 23.09.1 to get the interface back. Interestingly enough, that FIRST reboot back, the output of the command is identical to what is shown in the above screenshot...

              Weird.

              Another thing I noticed: somehow my ntopng got hosed, and even in the old boot environment it won't work. This is a different issue, so I will wait to fix that one until we get an idea of what might be going on here.

              • stephenw10S
                stephenw10 Netgate Administrator
                last edited by

                Hmm, curious. I assume specifying the subnet using CIDR notation also fails?

                What firmware version does that NIC have? It could be the newer driver trying to use some API update perhaps.

                sysctl dev.ixl.0.fw_version

                • J
                  jsylvia007 @stephenw10
                  last edited by jsylvia007

                  @stephenw10 -- Interesting point... I never upgraded the firmware on this NIC. I've had real bad luck with NIC firmwares on some Intel Atom chips, so I avoided it.

                  I've never updated firmware on BSD... Guess I could crack the case open and see where the NIC came from to get updated firmware.

                  Output from the command:

                  dev.ixl.0.fw_version: fw 5.50.47059 api 1.5 nvm 5.51 etid 80002bca oem 1.262.0
                  

                  Edit: Apparently I got the card on eBay... This is the card:

                  Intel XXV710-DA2 25GbE Dual-Port Ethernet Network Adapter XXV710DA2BLK
                  
                  • stephenw10S
                    stephenw10 Netgate Administrator
                    last edited by

                    Oh that is an old firmware version. Importantly an old API version. I would try upgrading it. You'll probably have to do that from Windows or Linux though.

                    • J
                      jsylvia007 @stephenw10
                      last edited by

                      @stephenw10 - H'okay. Believe it or not they have a BSD & EFI version for the latest firmware. I'm going to try the EFI version tonight... I will let you know how it goes.

                      • J
                        jsylvia007 @stephenw10
                        last edited by

                        @stephenw10 - Ok... So... Bear with me... I just spent about 6 straight hours troubleshooting and didn't really accomplish much.

                        Upgraded the firmware on the card, that was a breeze. Took about 15 min through UEFI. Here is the new firmware information:

                        dev.ixl.0.fw_version: fw 9.140.76856 api 1.15 nvm 9.40 etid 8000ed12 oem 1.269.0
                        

                        Appears to be much newer, and BONUS, it STILL works with 23.09.1.
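For anyone comparing the two `fw_version` lines quoted in this thread, note that the API version has to be compared numerically, not as text: "1.15" is newer than "1.5" even though a plain string comparison says otherwise. A minimal sketch, assuming the field layout seen in the two outputs above (alternating label and value); the helper names are hypothetical.

```python
def parse_fw_version(line: str) -> dict[str, str]:
    """Split a dev.ixl.N.fw_version sysctl line into its labelled fields."""
    _, _, value = line.partition(": ")
    tokens = value.split()
    # Tokens alternate: label, value, label, value, ...
    return dict(zip(tokens[0::2], tokens[1::2]))


def version_tuple(text: str) -> tuple[int, ...]:
    """Turn '1.15' into (1, 15) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


# The two lines as posted in this thread.
old = parse_fw_version(
    "dev.ixl.0.fw_version: fw 5.50.47059 api 1.5 nvm 5.51 etid 80002bca oem 1.262.0"
)
new = parse_fw_version(
    "dev.ixl.0.fw_version: fw 9.140.76856 api 1.15 nvm 9.40 etid 8000ed12 oem 1.269.0"
)

print(old["api"], "->", new["api"])
print("numeric comparison says newer:", version_tuple(new["api"]) > version_tuple(old["api"]))
print("string comparison says newer: ", new["api"] > old["api"])
```

The numeric comparison reports the new API version as newer, while the string comparison gets it backwards, which is why the tuple conversion is there.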

                        Long story short, SAME exact symptoms with 24.03.

                        So, I decided to factory reset the configuration. After the reboot, I manually reassigned the interfaces to be the correct ones for at least my LAN and WAN, manually set the IP address for the LAN and.... NOTHING. I performed a reboot just for giggles, and, wouldn't you know it, it WORKED. And it was repeatable. 3 reboots later and I was confident that it was 'stable'.

                        I took my backup config (downloaded from the working 23.09.1), loaded it in the GUI, and it... kinda worked after a reboot. The LAN interface came back, and all the other settings came back, but I got an error for EVERY package that basically said, "Package ABCDEF does not exist in current Netgate pfSense Plus version and it has been removed.", for all 22 of my packages.

                        I rebooted a couple times, and it again seemed 'stable' on the LAN interface.

                        I started adding my packages, they all came up and worked no problem... Then I rebooted again... LAN interface was dead again. Reboot 3 more times... still dead. Reboot into the 23.09.1 boot environment, everything is hunky dory again.

                        So... Maybe it's related to SOMETHING in my configuration related to the packages I have installed?

                        Here is the list of all 22 packages that I use:

                        mailreport
                        iperf
                        nmap
                        mtr-nox11
                        openvpn-client-export
                        acme
                        bandwidthd
                        Cron
                        Status_Traffic_Totals
                        syslog-ng
                        Service_Watchdog
                        System_Patches
                        avahi-daemon
                        arpwatch
                        pimd
                        pfBlockerNG
                        zabbix-agent64
                        nut
                        WireGuard
                        suricata
                        ntopng
                        

                        If I had to make a guess, I would suspect that MAYBE it has something to do with either bandwidthd, Status_Traffic_Totals, avahi-daemon, or pimd, because I believe those actually have the ability to muck with the interfaces at a more substantial level than the rest of the packages. I'm reasonably certain that I never actually got media casting across VLANs to work successfully, so I think I can ditch avahi-daemon and pimd. Might be able to nix the others too, but I'm not actually sure if that's really needed.

                        I'm willing to share my config privately with support if you think there's something in there that might help.

                        Back on 23.09.1 for now...

                        • stephenw10S
                          stephenw10 Netgate Administrator
                          last edited by

                          Ah, well, some progress at least. And it's always good to prove a theory incorrect. That the new firmware doesn't hurt is also good to know.

                          I'd guess bandwidthd or, more likely, Suricata if it's running in in-line mode which uses the NIC in netmap mode and can break everything!

                          • J
                            jsylvia007 @stephenw10
                            last edited by

                            @stephenw10 said in Upgrade from 23.09.1 to 24.03 Completes Successfully, But NIC Will No Longer Pass Traffic:

                            I'd guess bandwidthd or, more likely, Suricata if it's running in in-line mode which uses the NIC in netmap mode and can break everything!

                            Good to know. My suricata is IDS only, so it shouldn't be mucking with the interface. Tonight I'm hoping to go through this again, reload my config (hoping that it also 'fails' to load the packages), and then I will install one and reboot, rinse and repeat until I find the cranky package.

                            • J
                              jsylvia007 @stephenw10
                              last edited by

                              @stephenw10 - Ok... So, I'm at a loss. It HAS to be something with my config, but it's somewhat complex, and I really don't want to create everything by hand.

                              I reset 24.03 back to factory defaults, configured WAN and LAN, set the IPs, rebooted (working). Rebooted again (working)...

                              I installed the acme, zabbix, and WireGuard packages... Really low impact, right? They should be completely unrelated to the LAN interface. Install works, reboot... Dead. Reboot. Still dead.

                              Back to 23.09 I go...

                              I'm not above getting another NIC with another chipset entirely to try it, BUT this SHOULD work without an issue, and swapping out a NIC is going to kill my Netgate ID, which will kill my paid Plus subscription. To be honest, that whole implementation seems flaky to me, so I don't want to introduce yet another wrinkle.

                              Kinda at a loss... Really want to upgrade, but I now have NO idea what it could be, without manually recreating my config (consisting of almost a dozen interfaces, 6 VLANs, countless rules, and a ton of Suricata & pfBlocker-NG configurations). That would take a SIGNIFICANT amount of time to re-create and the risk of screwing something up in the details is REALLY a possibility.

                              Thoughts? I mean... This should work. So what else can I do?

                              • stephenw10S
                                stephenw10 Netgate Administrator
                                last edited by

                                Hmm, well, of those 3 I'd have to suspect WireGuard. That can at least add an interface. Zabbix and ACME really could not prevent traffic.

                                • J
                                  jsylvia007 @stephenw10
                                  last edited by

                                  @stephenw10 - I will try just WireGuard and see what happens. It worked on one of my previous attempts and reboots, so I figured it was safe.

                                  It still leaves me in a pretty crappy situation. I can't swap hardware, because I lose my Plus (different MAC), I can't actually upgrade because, well, it doesn't work.

                                  Anyone else there at Netgate have any ideas? This one happens to be my main router in my home lab, so it's kinda the linchpin in everything. I DEFINITELY need WireGuard to work.

                                  I guess I can wait until there's another release, but that leaves me on 23.09.1 for a long time without any security enhancements.

                                  I really think it might be something latent in my config. Is there anyone at Netgate who would take a look at the XML? Perhaps there's something I'm not seeing? Maybe you guys have better debug tools?

                                  I'll try to do more testing tomorrow...

                                  • stephenw10S
                                    stephenw10 Netgate Administrator
                                    last edited by

                                    It does seem like something in your config, I agree. If it's not some package putting the NIC in an odd mode, it could be a system tunable you have added.

                                    Are you able to upload the config for us to review here: https://nc.netgate.com/nextcloud/s/fcTw2Dy3FKD7bCK

                                    Steve

                                    • J
                                      jsylvia007 @stephenw10
                                      last edited by

                                      @stephenw10 - Config uploaded.

                                      Note, specifically about tunables. I've never actually added any, and there are likely some in there from considerably different hardware, IF, that stuff carries forward. I'm not sure what should be there from default, or how to "safely" reset them back to "default", but I'd definitely be willing to try that too.

                                      • J
                                        jsylvia007 @stephenw10
                                        last edited by jsylvia007

                                        @stephenw10 - HOLY CRAP I think I figured it out. Performing more testing. Will know in a few more reboots once I get the rest of the packages installed.

                                        It looks like it WAS WireGuard, in a "this should never have worked" type of scenario...

                                        Will edit post shortly...

                                        Edit: YES!

                                        The issue was with a WireGuard Gateway Monitor IP. It just so happens that the LAN IP of my router and the LAN IP of the router on the other side of the WG gateway have two octets flipped (think 192.168.1.1 and 192.1.168.1), and I had entered my own LAN IP as the monitor IP by mistake. Apparently, 23.09.1 didn't care that I had the LAN IP entered in there and was happy to just monitor something that was always up... 24.03 was none-too-happy with that config, however, and broke the LAN interface because of it.

                                        Troubleshooting:
                                        I enabled access to the webConfigurator through another interface so that I could actually see what was going on, and noticed that there was an issue with that ONE WireGuard gateway. When I looked into why, I saw it immediately.

                                        Problem. SOLVED. Awesome news on a Friday night, and dare I say it, this one was kinda fun!
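The misconfiguration described in this post (a gateway monitor IP that is really one of the firewall's own interface addresses) is the kind of thing that can be checked mechanically. This is only a rough sketch under stated assumptions: the interface and gateway names and addresses are hypothetical, modelled on the "192.168.1.1 vs 192.1.168.1" example, and nothing here reads an actual pfSense `config.xml`.

```python
import ipaddress


def self_monitoring_gateways(
    interface_ips: dict[str, str],
    gateway_monitors: dict[str, str],
) -> list[str]:
    """Return names of gateways whose monitor IP is one of the firewall's own addresses."""
    local = {ipaddress.ip_address(ip) for ip in interface_ips.values()}
    return sorted(
        name
        for name, monitor in gateway_monitors.items()
        if ipaddress.ip_address(monitor) in local
    )


# Hypothetical values for illustration only.
interface_ips = {"LAN": "192.168.1.1", "WAN": "203.0.113.10"}
gateway_monitors = {
    "WG_REMOTE_BAD": "192.168.1.1",  # typo: points back at the local LAN IP
    "WG_REMOTE_OK": "192.1.168.1",   # the intended remote router address
}

print(self_monitoring_gateways(interface_ips, gateway_monitors))
```

Using `ipaddress.ip_address` rather than comparing raw strings means equivalent spellings of the same address still match.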

                                        • stephenw10S
                                          stephenw10 Netgate Administrator
                                          last edited by

                                          Wow nice catch! Interesting that worked in 23.09.1. Hmm. 🤔

                                          • J
                                            jsylvia007 @stephenw10
                                            last edited by

                                            @stephenw10 - Right?

                                            Thanks for all the help!!

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.