Netgate Discussion Forum

    Frequent Crashing (Page Fault) After Upgrade to 2.8.0 From Latest 2.7

    General pfSense Questions
    82 Posts 7 Posters 5.2k Views 7 Watching
    • netblues @rfranzke

      @rfranzke It's way too hard to blame a faulty installation for random crashes.
      If something like that were the cause (say, a faulty drive), the crashes would be immediate and repeatable.

      The BSD bug that Stephen found is a better candidate.
      Obviously it's rare; if it weren't, there would be plenty of reports about it here.

      Now you are able to catch full crash dumps. A debug kernel is the next thing.
      These are deep waters, and you know it.

      Give it some time.

        • rfranzke

          So these have been fairly stable. I finally got one of them to panic this morning, but it wasn't the FW I have the swap/debug setup on. I set that up on the backup FW, since it was the one that was mainly having the issue, and of course now it won't panic. I'll add the dump file here, but it's likely not very useful. Stay tuned.

          textdump.tar (5).0
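For anyone following along who wants to peek inside an attachment like this: a textdump is a plain tar archive, and textdump(4) lists its members as panic.txt, msgbuf.txt, ddb.txt, version.txt and, if compiled in, config.txt. The helper below is a minimal sketch under that assumption (show_textdump is a made-up name for illustration, not a pfSense tool). It prints the panic message and the tail of the kernel message buffer without extracting anything to disk.

```sh
#!/bin/sh
# Hypothetical helper: print the interesting parts of a FreeBSD textdump
# archive to stdout. Assumes the member names documented in textdump(4).
show_textdump() {
    archive=$1
    echo "== panic.txt =="
    # -O writes the member to stdout instead of the filesystem
    # (supported by both bsdtar and GNU tar).
    tar -xOf "$archive" panic.txt 2>/dev/null || echo "(no panic.txt in archive)"
    echo "== last 20 lines of msgbuf.txt =="
    tar -xOf "$archive" msgbuf.txt 2>/dev/null | tail -n 20
}
```

Usage from /bin/sh: `show_textdump /var/crash/textdump.tar.0`. A browser-renamed download such as "textdump.tar (5).0" works too, as long as the path is quoted.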

          • stephenw10 Netgate Administrator

            Yup, that's identical to the second crash reported initially. Not much to go on, unfortunately.

            • rfranzke

              Got it! It's quite a bit larger than the others, and I can't upload it here because of the size limit. Is there somewhere else I can upload it?

              To recap, I want to say a bit about my test bed. I'm using a Cisco 3750 switch as my 'inside' switch, carved up into various VLANs to simulate my actual prod setup and to feed OSPF routes into the FWs so I can make sure the routes get populated properly. The inside interfaces of both FWs are in 'VLAN 10'. No tagging is done in the FW configuration: the switch ports just use 'switchport access vlan 10', and no VLANs are configured on the FWs. The WAN ports are plugged into an old Netgear steel-case unmanaged switch. The switch ports on there are FE for WAN, and the LAN ports are gigabit. I don't think any of this matters, but I wanted to include it in case it does.

              Thanks for looking here and for being patient with this.

              • stephenw10 Netgate Administrator

                Aha, nice. Yup, I'd expect it to be large. How big is it?

                You can probably upload it here: https://nc.netgate.com/nextcloud/s/3zcPmr5JE694eDn

                Though I think there is a size limit there.

                • rfranzke @stephenw10

                  @stephenw10 It's about 2.6 GB. I tried uploading it to your link, but the upload never seems to complete, so I uploaded it to Google Drive instead. You should be able to see it with this link:

                  https://drive.google.com/file/d/1ePOeUzoFD911MFNodwCZLY17gZdTpn6k/view?usp=drive_link

                  Let me know if that doesn't work. Thanks for looking.

                  • stephenw10 Netgate Administrator

                    Great I see that. Let's see if it reveals anything...

                    • rfranzke @rfranzke

                      Panic Dump Link

                      So today I got this to happen by restarting both FWs and the switch they connect to all at once. I think it mostly happens when I fire these up in the morning, when everything starts at the same time. After one of them panics, they seem to be pretty stable. It rarely happens randomly throughout the day anymore; it used to, but not much now. Maybe the switch is doing something during its startup. That's a guess, but if it's true, this might never show up in my prod environment. Still, it's good to know what's going on here. Thanks again for looking.

                      • stephenw10 Netgate Administrator @rfranzke

                        @rfranzke said in Frequent Crashing (Page Fault) After Upgrade to 2.8.0 From Latest 2.7:

                        Panic Dump Link

                        Is that the same core? The link looks the same.

                        • rfranzke @stephenw10

                          @stephenw10 It's the same dump. I just made it into an actual link because I couldn't edit the original post.

                          • stephenw10 Netgate Administrator

                            OK, the core dump yielded useful info, but unfortunately not enough to solve it.

                            Are you able to load a debug kernel on this and get a core with that running?

                            If so, grab the pkg here and install it. Then:
                            https://docs.netgate.com/pfsense/en/latest/troubleshooting/debug-kernel.html#booting-the-debug-kernel

                            • rfranzke @stephenw10

                              @stephenw10 Yes, I can install and run this, but I'm not sure of the proper way to install the pkg file. Do I enable SSH, copy the file over, and then install it, or is there some other way to do this via the WebUI?

                              • kprovost @rfranzke

                                @rfranzke pkg install -U pfSense-kernel-debug-pfSense-2.8.0.b.20250814.0928.pkg

                                • rfranzke @kprovost

                                  @kprovost So I think I have this loaded. I enabled SSH on the box, connected to it, and transferred the downloaded debug kernel file. Then I ran the installer:

                                  [2.8.0-RELEASE][admin@fw2.mdaemon.int]/root: pkg install -y pfSense-kernel-debug-pfSense-2.8.0.b.20250814.0928.pkg
                                  Updating pfSense-core repository catalogue...
                                  Fetching meta.conf: 0%
                                  Fetching data.pkg: 0%
                                  pfSense-core repository is up to date.
                                  Updating pfSense repository catalogue...
                                  Fetching meta.conf: 0%
                                  Fetching data.pkg: 0%
                                  pfSense repository is up to date.
                                  All repositories are up to date.
                                  Checking integrity... done (0 conflicting)
                                  The following 1 package(s) will be affected (of 0 checked):

                                  New packages to be INSTALLED:
                                  pfSense-kernel-debug-pfSense: 2.8.0.b.20250814.0928 [unknown-repository]

                                  Number of packages to be installed: 1

                                  The process will require 254 MiB more space.
                                  [1/1] Installing pfSense-kernel-debug-pfSense-2.8.0.b.20250814.0928...
                                  Extracting pfSense-kernel-debug-pfSense-2.8.0.b.20250814.0928: 100% 193 B 0.1kB/s 00:02
                                  [2.8.0-RELEASE][admin@fw2.mdaemon.int]/root: reboot

                                  Then I added this to the /boot/loader.conf.local file I created:

                                  kernel="kernel.debug"

                                  so that it always boots this kernel. I made the file change before the reboot. The box booted up and I can still access it, so I'm assuming it's running the debug kernel now. Is there a slick way to tell whether it's running the new kernel?

                                  Edit: Maybe this tells us what we want to know?

                                  [2.8.0-RELEASE][admin@fw2.mdaemon.int]/root: uname -a
                                  FreeBSD fw2.mdaemon.int 15.0-CURRENT FreeBSD 15.0-CURRENT #1 RELENG_2_8_0-n256082-57273ac5fb19-dirty: Thu Aug 14 09:32:59 CEST 2025 root@nut:/usr/home/kp/netgate/crossbuild-2.8.0/obj/amd64/OdB78hjz/usr/home/kp/netgate/crossbuild-2.8.0/sources/FreeBSD-src-RELENG_2_8_0/amd64.amd64/sys/pfSense-DEBUG amd64
                                  [2.8.0-RELEASE][admin@fw2.mdaemon.int]/root:
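The loader.conf.local change described above can also be scripted so it is safe to repeat. This is a minimal POSIX sh sketch (set_loader_kernel is a hypothetical name, not a pfSense tool): it drops any existing kernel= line and appends the requested one, so the file never ends up with two conflicting entries. Setting the value back to "kernel", which is the loader's usual default, should return to the normal kernel, but check /boot/defaults/loader.conf on your box before relying on that.

```sh
#!/bin/sh
# Hypothetical helper: set the kernel= line in a loader.conf-style file,
# e.g. /boot/loader.conf.local, replacing any previous kernel= entry.
set_loader_kernel() {
    conf=$1      # file to edit
    kname=$2     # value for kernel=, e.g. kernel.debug as used in this thread
    touch "$conf" || return 1
    slk_tmp=$(mktemp) || return 1
    # Keep every line except existing kernel= assignments.
    grep -v '^[[:space:]]*kernel=' "$conf" > "$slk_tmp"
    printf 'kernel="%s"\n' "$kname" >> "$slk_tmp"
    # Copy back with cat so the original file's ownership and mode are kept.
    cat "$slk_tmp" > "$conf"
    rm -f "$slk_tmp"
}
```

Usage: `set_loader_kernel /boot/loader.conf.local kernel.debug`, then reboot. These are sh functions, so run them from /bin/sh rather than a csh-style shell.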

                                  • kprovost @rfranzke

                                    @rfranzke said in Frequent Crashing (Page Fault) After Upgrade to 2.8.0 From Latest 2.7:

                                    [2.8.0-RELEASE][admin@fw2.mdaemon.int]/root: uname -a
                                    FreeBSD fw2.mdaemon.int 15.0-CURRENT FreeBSD 15.0-CURRENT #1 RELENG_2_8_0-n256082-57273ac5fb19-dirty: Thu Aug 14 09:32:59 CEST 2025 root@nut:/usr/home/kp/netgate/crossbuild-2.8.0/obj/amd64/OdB78hjz/usr/home/kp/netgate/crossbuild-2.8.0/sources/FreeBSD-src-RELENG_2_8_0/amd64.amd64/sys/pfSense-DEBUG amd64

                                    Yes, that's the kernel we want to be running now.
                                    It has one extra assertion on top of the default debug kernel. Hopefully that will give us more clues about why the panic happens.
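To answer the "slick way to tell" question in script form: the uname -a output quoted above ends in the build directory of the pfSense-DEBUG kernel config, so a plain string match is enough. A minimal sketch, assuming the debug build keeps that "pfSense-DEBUG" name (is_debug_kernel is a hypothetical helper). FreeBSD's `uname -i`, which prints just the kernel ident, is another way to look at the same thing.

```sh
#!/bin/sh
# Hypothetical helper: decide whether a uname string comes from the debug
# kernel. With no argument it inspects the running system via `uname -a`.
is_debug_kernel() {
    info=${1:-$(uname -a)}
    case $info in
        *pfSense-DEBUG*) echo "debug kernel"; return 0 ;;
        *)               echo "not the debug kernel"; return 1 ;;
    esac
}
```

Run with no argument on the firewall itself: `is_debug_kernel`. The exit status is 0 for the debug kernel, so it can also gate other steps in a script.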

                                    • rfranzke

                                      So I got the box to crash again, but no dump file was created at all this time:

                                      [2.8.0-RELEASE][admin@fw2.mdaemon.int]/root: cd /var/crash
                                      [2.8.0-RELEASE][admin@fw2.mdaemon.int]/var/crash: ls
                                      [2.8.0-RELEASE][admin@fw2.mdaemon.int]/var/crash:

                                      No file. Is there any reason this debug kernel wouldn't create a dump file on panic? Do I need to reconfigure the debug settings after loading the new kernel?
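When /var/crash comes up empty like this, two things are worth checking: whether a dump device is still configured (FreeBSD's `dumpon -l` lists it) and what, if anything, savecore(8) wrote. Below is a minimal sketch of the second check. list_crash_dumps is a hypothetical helper, and it assumes savecore's usual layout of vmcore.N or textdump.tar.N plus an info.N file containing a "Panic String:" line.

```sh
#!/bin/sh
# Hypothetical helper: list saved crash dumps in a directory (default
# /var/crash) together with the panic string recorded in info.N.
list_crash_dumps() {
    dir=${1:-/var/crash}
    found=0
    for core in "$dir"/vmcore.[0-9]* "$dir"/textdump.tar.[0-9]*; do
        [ -e "$core" ] || continue          # glob matched nothing
        found=1
        n=${core##*.}                       # dump number N from the suffix
        panic=$(sed -n 's/^[[:space:]]*Panic String:[[:space:]]*//p' \
            "$dir/info.$n" 2>/dev/null)
        echo "$(basename "$core"): ${panic:-unknown panic string}"
    done
    [ "$found" -eq 1 ] || { echo "no dumps in $dir"; return 1; }
}
```

Usage: `list_crash_dumps` checks /var/crash; pass another directory as the first argument to inspect a copy. The vmcore.last symlink is skipped because only numbered files are matched.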

                                      • stephenw10 Netgate Administrator

                                        Hmm, does sysctl debug.ddb.scripting.scripts still show the modified script?

                                        • rfranzke @stephenw10

                                          @stephenw10 I believe so. Here it is in case I'm missing something:

                                          [2.8.0-RELEASE][admin@fw2.mdaemon.int]/root: sysctl debug.ddb.scripting.scripts
                                          debug.ddb.scripting.scripts: lockinfo=show locks; show alllocks; show lockedvnods
                                          pfs=bt ; show registers ; show pcpu ; run lockinfo ; acttrace ; ps ; alltrace
                                          kdb.enter.default=bt ; show registers ; dump ; reset
                                          kdb.enter.witness=run lockinfo

                                          [2.8.0-RELEASE][admin@fw2.mdaemon.int]/root:

                                          I forced a manual panic to test it again, and it created a dump file as expected, so I'm not sure what happened here:

                                          [2.8.0-RELEASE][admin@fw2.mdaemon.int]/var/crash: ls
                                          bounds info.0 info.last vmcore.0 vmcore.last

                                          Let me know if something is missing from above. Thanks for looking.
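For comparing the ddb scripts across kernels or boxes, here is a minimal sketch that pulls a single script out of the `sysctl debug.ddb.scripting.scripts` output and checks whether it contains a dump command. It reads the sysctl text from stdin so it can be tried on saved output; ddb_script and ddb_script_dumps are hypothetical names. In the output posted above, only kdb.enter.default contains dump, while kdb.enter.witness just runs lockinfo. As I understand ddb(4), a script named kdb.enter.<reason> is tried before kdb.enter.default, so which script actually ran depends on how the debugger was entered, and this sketch cannot tell you that.

```sh
#!/bin/sh
# Hypothetical helpers for the output of:
#   sysctl debug.ddb.scripting.scripts
# which lists one "name=commands" entry per line.

# Print the commands of the named script (sysctl text read from stdin).
ddb_script() {
    sed 's/^debug\.ddb\.scripting\.scripts: //' |
        awk -v n="$1" '{ sub(/^[ \t]+/, "") }
                       index($0, n "=") == 1 { print substr($0, length(n) + 2) }'
}

# Succeed if the named script contains a standalone "dump" command.
ddb_script_dumps() {
    ddb_script "$1" | tr ';' '\n' | grep -qx '[[:space:]]*dump[[:space:]]*'
}
```

Usage: `sysctl debug.ddb.scripting.scripts | ddb_script_dumps kdb.enter.witness || echo "kdb.enter.witness does not dump"`.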

                                          • stephenw10 Netgate Administrator

                                            Huh, that's odd. Did it actually panic before with the ARP crash? Did it reboot automatically afterwards?

                                            We might need a new script line there if it's doing something special, in which case we might need wisdom from @kprovost 😉

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.