Netgate Discussion Forum

    Frequent Crashing (Page Fault) After Upgrade to 2.8.0 From Latest 2.7

    General pfSense Questions
    • stephenw10 (Netgate Administrator)

      Hmm, OK that's identical to the first crash. Which was also in 2.8.0. Was that actually the same device?

      No there's no known issue with NVMe drives. We use them in our hardware.

      • rfranzke @stephenw10

        @stephenw10 said in Frequent Crashing (Page Fault) After Upgrade to 2.8.0 From Latest 2.7:

        Hmm, OK that's identical to the first crash. Which was also in 2.8.0. Was that actually the same device?

        No there's no known issue with NVMe drives. We use them in our hardware.

        Yes, same code, same device as the first dump post. It's the backup device in the CARP HA pair if that makes any difference. Both devices have the same exact hardware (CPU, MB, Disk, Mem, etc.).

        Does the dump tell you anything specific about what is happening, or just that it is the same as before? Should we be able to glean a specific cause from this type of dump, or does it mostly just tell you that a crash occurred, like a marker that something happened in case you are not around to witness it firsthand? These are sitting right next to me, so I am lucky enough to know when panics happen before even seeing the dump file. Are these dumps not capturing enough data to tell WHY the crash happened? Is there something I can tweak in the config to get additional info as to the cause?

        Incidentally I got the one panic this morning and tried to get another today but this thing never panicked again. Go figure.

        • stephenw10 (Netgate Administrator)

          It doesn't mean anything to me. I can see it doesn't have much that's non-generic but I'll run it past some devs tomorrow.

          Next step is either to enable a full core dump to analyse or try running the debug kernel.
          https://docs.netgate.com/pfsense/en/latest/troubleshooting/debug-kernel.html

          Do you have SWAP enabled on those? How big is it?
          To get a full core dump usually requires SWAP at least as large as the RAM to dump it to.
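
As a rough way to check this from a shell, the sketch below compares total swap against physical RAM. It assumes FreeBSD's `swapinfo -k` and `sysctl -n hw.physmem` as the data sources; the helper names `swap_total_kb` and `swap_covers_ram` are made up for illustration and are not part of pfSense. It applies the conservative "swap at least as large as RAM" rule only and does not account for how much RAM is actually in use.

```shell
# Sum the 1K-blocks column of `swapinfo -k` output read from stdin.
# Only lines describing a device (starting with /dev/) are counted,
# so the header and any "Total" line are skipped.
swap_total_kb() {
    awk '$1 ~ /^\/dev\// { total += $2 } END { printf "%d\n", total }'
}

# Report whether swap is at least as large as physical RAM.
# $1 = swap size in KiB, $2 = RAM size in bytes.
swap_covers_ram() {
    swap_kb=$1
    ram_kb=$(( $2 / 1024 ))
    if [ "$swap_kb" -ge "$ram_kb" ]; then
        echo "ok"
    else
        echo "too small"
    fi
}

# On the firewall itself (assumed FreeBSD tools), the inputs would come from:
#   swap_covers_ram "$(swapinfo -k | swap_total_kb)" "$(sysctl -n hw.physmem)"
```

For example, 1 GiB of swap against 64 GiB of RAM (`swap_covers_ram 1048576 68719476736`) prints `too small`.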

          • rfranzke @stephenw10

            @stephenw10 said in Frequent Crashing (Page Fault) After Upgrade to 2.8.0 From Latest 2.7:

            It doesn't mean anything to me. I can see it doesn't have much that's non-generic but I'll run it past some devs tomorrow.

            Next step is either to enable a full core dump to analyse or try running the debug kernel.
            https://docs.netgate.com/pfsense/en/latest/troubleshooting/debug-kernel.html

            Do you have SWAP enabled on those? How big is it?
            To get a full core dump usually requires SWAP at least as large as the RAM to dump it to.

            I'm not sure how one would go about "enabling swap". I didn't do anything to specifically enable it anywhere that I am aware of. Installed using the installer, imported my config from before. If it's not enabled by default, then I likely don't have it enabled. Both boxes have 64GB of RAM installed in them and 2 TB NVMe drives.

            Thanks for the link. I'll see about loading up the debug kernel to see if it reveals anything useful.
            Thanks for checking with the devs here. Again really appreciate the help on this.

            • stephenw10 (Netgate Administrator)

              It would be enabled by default but probably not at >64GB so dumping a full core to it may or may not be possible depending on how much RAM is actually in use. But first check how much SWAP there is. It's shown on the dashboard.

              • rfranzke

                Looks like maybe SWAP is enabled but only to 1024MB.

                42a91513-244b-40d0-a400-a2f362db1baa-image.png
                e9c93729-dca0-4227-900e-8342f3210d45-image.png

                Seem correct? Dark is primary, light is secondary and the one that's crashed the most.

                • stephenw10 (Netgate Administrator)

                  Hmm. Unfortunately I think you'd almost certainly need to reinstall that with more SWAP to be able to dump the full core.

                  • rfranzke @stephenw10

                    @stephenw10 said in Frequent Crashing (Page Fault) After Upgrade to 2.8.0 From Latest 2.7:

                    Hmm. Unfortunately I think you'd almost certainly need to reinstall that with more SWAP to be able to dump the full core.

                    Would running the debug kernel you propose avoid the need to reinstall to change swap, or would I need to change swap to get the dumps even with the debug kernel in place? Is there no way to change the amount of swap without reinstalling?

                    • stephenw10 (Netgate Administrator)

                      If you have any additional backtraces to compare that would be useful. Particularly from 2.8.1 to confirm you get the same thing there repeatedly.

                      • stephenw10 (Netgate Administrator)

                        The debug kernel doesn't require SWAP, so no reinstall, but it may not tell us more. It's worth trying though.

                        • stephenw10 (Netgate Administrator)

                          Mmm, looks like that first crash could be this: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=285813

                          • rfranzke @stephenw10

                            @stephenw10 said in Frequent Crashing (Page Fault) After Upgrade to 2.8.0 From Latest 2.7:

                            The debug kernel doesn't require SWAP, so no reinstall, but it may not tell us more. It's worth trying though.

                            If having proper swap in place to catch this is the sure-fire way to capture the relevant information needed to determine what this is, I'll work on that. I have the process of reinstalling down pretty good now.....not sure I know how to properly make swap adjustments needed to get this right but am willing to give it a go if it reveals something useful here.

                            Any guidance on what the swap size should be here? It doesn't look like I'm using a ton of memory and I think general FreeBSD guidance is twice the amount of memory in the system. Does that sound like a reasonable set up for this test?

                            • rfranzke @stephenw10

                              @stephenw10 said in Frequent Crashing (Page Fault) After Upgrade to 2.8.0 From Latest 2.7:

                              Mmm, looks like that first crash could be this: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=285813

                              Hrmm......would that sort of issue be seen with CARP enabled? And seems that's still an open bug if true. Wonder how soon something like that gets fixed in BSD.

                              Strangely, I cannot get this to crash now. Since the reinstall and the one panic I posted, these have been rock-solid with no panics. Good and bad.

                              • rfranzke

                                OK I think I have the swap configured now:

                                fd55a81e-5e9b-4ea4-ba68-a47827904bb2-image.png

                                Anything else required to get this to get the info we need?

                                • stephenw10 (Netgate Administrator)

                                  Ok cool. So to enable full core dumps you need to edit the file /etc/pfSense-ddb.conf.

                                  Change the script kdb.enter.default line to:
                                  script kdb.enter.default=bt ; show registers ; dump ; reset

                                  Reboot then check the output of: sysctl debug.ddb.scripting.scripts

                                  Make sure it shows the changed line.

                                  Then you can test it by manually triggering a panic by running: sysctl debug.kdb.panic=1
                                  You should see the core file after it reboots.

                                  After that just wait for the next crash or somehow trigger it if you can.
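
The "make sure it shows the changed line" step can be scripted. This is only a sketch: `default_script_dumps` is a hypothetical helper name, and it assumes the output format of `sysctl debug.ddb.scripting.scripts` is one `name=commands` entry per line with the sysctl name prefixed to the first line.

```shell
# Succeeds (exit 0) if the kdb.enter.default script, read from stdin in the
# format printed by `sysctl debug.ddb.scripting.scripts`, runs a full `dump`.
default_script_dumps() {
    awk -F= '
        # Strip the sysctl name that prefixes the first output line.
        { sub(/^debug\.ddb\.scripting\.scripts: */, "") }
        $1 == "kdb.enter.default" {
            n = split($2, cmds, ";")
            for (i = 1; i <= n; i++) {
                gsub(/^[ \t]+|[ \t]+$/, "", cmds[i])   # trim whitespace
                if (cmds[i] == "dump") found = 1       # exact ddb command
            }
        }
        END { exit(found ? 0 : 1) }
    '
}

# On the firewall, after the reboot:
#   sysctl debug.ddb.scripting.scripts | default_script_dumps && echo "full dump enabled"
```

The exact match on `dump` is deliberate: the stock textdump script ends with `textdump dump`, which should not count as a full core dump.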

                                  • rfranzke @stephenw10

                                    @stephenw10 said in Frequent Crashing (Page Fault) After Upgrade to 2.8.0 From Latest 2.7:

                                    Ok cool. So to enable full core dumps you need to edit the file /etc/pfSense-ddb.conf.

                                    Change the script kdb.enter.default line to:
                                    script kdb.enter.default=bt ; show registers ; dump ; reset

                                    Reboot then check the output of: sysctl debug.ddb.scripting.scripts

                                    Make sure it shows the changed line.

                                    Then you can test it by manually triggering a panic by running: sysctl debug.kdb.panic=1
                                    You should see the core file after it reboots.

                                    After that just wait for the next crash or somehow trigger it if you can.

                                    OK I think I have this done:

                                    # $FreeBSD$
                                    #
                                    # This file is read when going to multi-user and its contents piped thru
                                    # ``ddb'' to define debugging scripts.
                                    #
                                    # see ``man 4 ddb'' and ``man 8 ddb'' for details.
                                    #

                                    script lockinfo=show locks; show alllocks; show lockedvnods
                                    script pfs=bt ; show registers ; show pcpu ; run lockinfo ; acttrace ; ps ; alltrace

                                    # kdb.enter.panic panic(9) was called.
                                    # script kdb.enter.default=textdump set; capture on; run pfs ; capture off; textdump dump; reset
                                    script kdb.enter.default=bt ; show registers ; dump ; reset

                                    # kdb.enter.witness witness(4) detected a locking error.
                                    script kdb.enter.witness=run lockinfo

                                    sysctl debug.ddb.scripting.scripts

                                    debug.ddb.scripting.scripts: lockinfo=show locks; show alllocks; show lockedvnods
                                    pfs=bt ; show registers ; show pcpu ; run lockinfo ; acttrace ; ps ; alltrace
                                    kdb.enter.default=bt ; show registers ; dump ; reset
                                    kdb.enter.witness=run lockinfo

                                    I cannot seem to have this thing crash anymore. I'll see if I can mess with it to get it to panic again. Let me know if this setting looks right. Thanks again here for all the help. Really appreciate the time.

                                    • stephenw10 (Netgate Administrator)

                                      Yup that looks good. You can try the forced manual panic just to make sure it creates the core file but I'm pretty confident it will.

                                      Otherwise just wait for the next crash.

                                      • rfranzke @stephenw10

                                        @stephenw10 said in Frequent Crashing (Page Fault) After Upgrade to 2.8.0 From Latest 2.7:

                                        You can try the forced manual panic just to make sure it creates the core file but I'm pretty confident it will.

                                        Yeah, I forgot to do that. Did it just now and it did restart. Created a file called 'VMCore.0' that's like 2.5GB in size. Does that sound about right?
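
For anyone repeating this test, a small sketch for listing saved cores and their sizes follows. It assumes the cores land in `/var/crash` with lowercase `vmcore.N` names, which is the stock FreeBSD `savecore` default; the location and naming on a given pfSense install may differ, and `list_vmcores` is a hypothetical helper name.

```shell
# Print "<size-in-bytes> <path>" for each vmcore.* file in a crash directory.
# $1 = directory to inspect (defaults to /var/crash, an assumption here).
list_vmcores() {
    dir=${1:-/var/crash}
    for f in "$dir"/vmcore.*; do
        [ -f "$f" ] || continue                  # glob matched nothing
        size=$(wc -c < "$f" | tr -d ' ')         # strip padding some wc builds add
        echo "$size $f"
    done
}

# Example on the firewall:
#   list_vmcores /var/crash
```

If nothing is printed after a forced panic, the dump most likely was not written or was saved somewhere else.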

                                        • Mikesco3

                                          I don't know if it helps anyone but I was having a kernel panic issue on the first boot after trying to install 2.8 and in my case it was:

                                          iwm7265Dfw: could not load firmware image, error 6
                                          

                                          I was able to fix it by dropping into the shell of the installer after the installation process and before the final reboot, and adding this line:

                                          hint.iwm.0.disabled="1"
                                          

                                          to the end of /mnt/boot/loader.conf
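
A minimal sketch of that edit as a repeatable shell step, so the hint is not added twice if the procedure is re-run. `append_once` is a hypothetical helper, not an installer command; the path and hint line are the ones from this post.

```shell
# Append a line to a loader.conf-style file unless it is already present.
# $1 = path to the file, $2 = exact line to add.
append_once() {
    file=$1
    line=$2
    # -x: match the whole line, -F: fixed string (no regex), -q: quiet
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# From the installer shell, before the final reboot:
#   append_once /mnt/boot/loader.conf 'hint.iwm.0.disabled="1"'
```

This assumes the existing file ends with a newline; if it does not, the new line would be joined to the last existing one.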

                                          • stephenw10 (Netgate Administrator)

                                            No that's an unrelated bug. This one looks more difficult to fix unfortunately!

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.