Frequent Crashing (Page Fault) After Upgrade to 2.8.0 From Latest 2.7
-
@stephenw10 It's the same dump. I just made it into an actual link. I couldn't edit the original post.
-
OK, the core dump yielded useful info, but unfortunately not enough to solve it.
Are you able to load a debug kernel on this and get a core with that running?
If so, grab the pkg here and install it. Then:
https://docs.netgate.com/pfsense/en/latest/troubleshooting/debug-kernel.html#booting-the-debug-kernel
-
@stephenw10 Yes, I can install this and run it, but I'm not sure of the proper way to actually install the pkg file. Do I enable SSH, copy the file over, and then run it, or is there some other method to do this via the WebUI?
-
@rfranzke
pkg install -U pfSense-kernel-debug-pfSense-2.8.0.b.20250814.0928.pkg
-
@kprovost So I think I have this loaded. I just enabled SSH on the box, connected to it, and transferred the downloaded debug kernel file. Then ran the installer:
[2.8.0-RELEASE][admin@fw2.mdaemon.int]/root: pkg install -y pfSense-kernel-debug-pfSense-2.8.0.b.20250814.0928.pkg
Updating pfSense-core repository catalogue...
Fetching meta.conf: 0%
Fetching data.pkg: 0%
pfSense-core repository is up to date.
Updating pfSense repository catalogue...
Fetching meta.conf: 0%
Fetching data.pkg: 0%
pfSense repository is up to date.
All repositories are up to date.
Checking integrity... done (0 conflicting)
The following 1 package(s) will be affected (of 0 checked):
New packages to be INSTALLED:
pfSense-kernel-debug-pfSense: 2.8.0.b.20250814.0928 [unknown-repository]
Number of packages to be installed: 1
The process will require 254 MiB more space.
[1/1] Installing pfSense-kernel-debug-pfSense-2.8.0.b.20250814.0928...
Extracting pfSense-kernel-debug-pfSense-2.8.0.b.20250814.0928: 100% 193 B 0.1kB/s 00:02
[2.8.0-RELEASE][admin@fw2.mdaemon.int]/root: reboot
Then added this to the /boot/loader.conf.local file I created:
kernel="kernel.debug"
so it boots this kernel all the time. I made the file change before the reboot. The box booted up and I can still access it, so I'm assuming it's running the debug kernel now. Is there a slick way to tell if it's running that new kernel?
Edit: Maybe this tells us what we want to know?
[2.8.0-RELEASE][admin@fw2.mdaemon.int]/root: uname -a
FreeBSD fw2.mdaemon.int 15.0-CURRENT FreeBSD 15.0-CURRENT #1 RELENG_2_8_0-n256082-57273ac5fb19-dirty: Thu Aug 14 09:32:59 CEST 2025 root@nut:/usr/home/kp/netgate/crossbuild-2.8.0/obj/amd64/OdB78hjz/usr/home/kp/netgate/crossbuild-2.8.0/sources/FreeBSD-src-RELENG_2_8_0/amd64.amd64/sys/pfSense-DEBUG amd64
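For anyone wanting to script that check, here is a minimal sketch. It assumes the debug build identifies itself with the string `pfSense-DEBUG` in the `uname -a` output, as it does above; `kernel_flavor` is a made-up helper name, not something pfSense ships.

```shell
# Classify a kernel from its `uname -a` string.
# Assumption: the debug build carries "pfSense-DEBUG" in its ident,
# as seen in the uname output quoted in this thread.
kernel_flavor() {
    case "$1" in
        *pfSense-DEBUG*) echo "debug" ;;
        *)               echo "standard" ;;
    esac
}

# On the firewall this could be used as:
#   kernel_flavor "$(uname -a)"
#   sysctl -n kern.bootfile    # path of the kernel file that was booted
```

With `kernel="kernel.debug"` in `/boot/loader.conf.local`, `kern.bootfile` should also point under `/boot/kernel.debug`, which gives a second way to confirm it.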
-
@rfranzke said in Frequent Crashing (Page Fault) After Upgrade to 2.8.0 From Latest 2.7:
[2.8.0-RELEASE][admin@fw2.mdaemon.int]/root: uname -a
FreeBSD fw2.mdaemon.int 15.0-CURRENT FreeBSD 15.0-CURRENT #1 RELENG_2_8_0-n256082-57273ac5fb19-dirty: Thu Aug 14 09:32:59 CEST 2025 root@nut:/usr/home/kp/netgate/crossbuild-2.8.0/obj/amd64/OdB78hjz/usr/home/kp/netgate/crossbuild-2.8.0/sources/FreeBSD-src-RELENG_2_8_0/amd64.amd64/sys/pfSense-DEBUG amd64
Yes, that's the kernel we want to be running now.
It has one extra assertion on top of the default debug kernel. Hopefully that will give us more clues about why the panic happens.
-
So got the box to crash again but no dump file was created at all this time:
[2.8.0-RELEASE][admin@fw2.mdaemon.int]/root: cd /var/crash
[2.8.0-RELEASE][admin@fw2.mdaemon.int]/var/crash: ls
[2.8.0-RELEASE][admin@fw2.mdaemon.int]/var/crash:
No file. Any reason this debug version wouldn't create a dump file on panic? Do I have to reconfigure the debug settings after loading the new kernel?
-
Hmm, does
sysctl debug.ddb.scripting.scripts
still show the modified script?
-
@stephenw10 I believe so. Here in case I am missing something:
[2.8.0-RELEASE][admin@fw2.mdaemon.int]/root: sysctl debug.ddb.scripting.scripts
debug.ddb.scripting.scripts: lockinfo=show locks; show alllocks; show lockedvnods
pfs=bt ; show registers ; show pcpu ; run lockinfo ; acttrace ; ps ; alltrace
kdb.enter.default=bt ; show registers ; dump ; reset
kdb.enter.witness=run lockinfo
[2.8.0-RELEASE][admin@fw2.mdaemon.int]/root:
I forced a manual panic to test it again and it created a dump file as one would expect, so not sure what happened here:
[2.8.0-RELEASE][admin@fw2.mdaemon.int]/var/crash: ls
bounds info.0 info.last vmcore.0 vmcore.last
Let me know if something is missing from above. Thanks for looking.
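As a quick sanity check on the script listing above, something like the following could confirm that the `kdb.enter.default` script still contains a `dump` step. This is only a sketch: `default_script_dumps` is a hypothetical helper, and it assumes the listing keeps the one-script-per-line `name=cmd ; cmd` layout shown in this post.

```shell
# Read a ddb script listing on stdin and succeed only if the
# kdb.enter.default script includes a standalone "dump" command.
# Assumption: one script per line, commands separated by ";".
default_script_dumps() {
    grep '^kdb\.enter\.default=' \
        | tr ';' '\n' \
        | grep -qx '[[:space:]]*dump[[:space:]]*'
}

# On the firewall this could be used as:
#   sysctl -n debug.ddb.scripting.scripts | default_script_dumps \
#       && echo "default script will dump" \
#       || echo "no dump step in kdb.enter.default"
```

Going by the output posted here, the `dump` step is present, so the script itself does not look like the reason the core went missing.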
-
Huh that's odd. Did it actually panic before with the arp crash? Did it reboot automatically afterwards?
Might need a new script line there if it's doing something special. In which case we might need wisdom from @kprovost
-
@stephenw10 Off the top of my head I don't know of any reason why a panic wouldn't leave a core dump.
What makes you say the box crashed if there's no core dump? Did it reboot? Is there anything in the logs?
If it did crash it really should have left a core dump. The only thing that I can think of that might (and I stress might, I've not confirmed this) cause it to not produce a core dump is if the dump area is too small. How much memory does the machine have, and how big is the swap?
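A rough way to compare those two numbers, as a sketch only. It assumes the worst case where a dump needs as much space as physical RAM (as far as I know the default minidumps are usually much smaller), and `swap_covers_ram` is a hypothetical helper name.

```shell
# Succeed when swap (in KiB) is at least as large as physical memory (in bytes).
# Assumption: a dump never needs more space than the amount of physical RAM.
swap_covers_ram() {
    scr_physmem_bytes=$1
    scr_swap_kib=$2
    [ "$scr_swap_kib" -ge $(( scr_physmem_bytes / 1024 )) ]
}

# On the firewall this could be used as (first swap device only):
#   swap_covers_ram "$(sysctl -n hw.physmem)" \
#                   "$(swapinfo -k | awk 'NR==2 {print $2}')" \
#       && echo "swap is at least as large as RAM"
```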
-
@stephenw10 I heard the fans spin up like they do at boot and checked whether I could access the device on the network. I couldn't, so it definitely seemed to crash. Trying to get this to crash on its own now. The box definitely restarted for some reason. I assumed it was due to the same crash that I've been chasing. I suppose it's possible that it just restarted for some other reason, but I would say that's fairly unlikely. Plenty of swap space available:
[2.8.0-RELEASE][admin@fw2.mdaemon.int]/root: swapinfo -h
Device Size Used Avail Capacity
/dev/gpt/swap1 128G 0B 128G 0%
-
Hmm. I guess try and trigger it again and see if you can confirm it.
-
At that point it might prove helpful to configure serial console output and hook up the serial port to another PC running some terminal software with a big scrollback buffer.
It might give more insight than fans spinning at boot.
P.S. Strange as it is, I do suspect power issues. It's a good way to crash a system without leaving ANY traces apart from the new boot logs.
-
So unfortunately, I have been unable to get this to fail, so I guess it's not as frequent as it was. I am out of time with this, so I am going to have to deploy these.
At this point I am considering getting the supported version and migrating to pfSense Plus. Are there any stability benefits to doing this? Any chance this issue might be fully resolved by simply migrating to the pfSense Plus code? Is the migration process fairly reliable? Anything I need to be mindful of? And do I need a support license for each device in an HA pair? Thanks for all the help with this. Sorry I couldn't get this to happen again.
-
2.8.1 and 25.07.1 are built on the same base so I would expect this to happen identically in Plus. There are 25.11 dev snapshots available in Plus you could test but they are intended for development only.
Yes you would need a Plus subscription for both nodes in the HA pair.
-
So, the backup FW that I have the debug settings on finally 'crashed' yesterday, but again it did not create any sort of crash dump, at least not one visible in the webGUI, and there was no alert like there normally is stating that a crash has occurred and a dump file has been created. I heard the box's fans spin up like they do when it restarts, and after it booted I checked the uptime on the home page of the device:
So not sure.
Additionally, I don't want to shift gears too much here, but with regard to the idea of moving to pfSense Plus, I have a few questions about how this works.
- Can I keep the CE setup I have now and just buy support for that version, or do I need to actually migrate to the Plus version of the software to get support? Can I just buy support and not migrate?
- Are there any configuration settings I risk losing by migrating? Right now, I am using OpenVPN, FreeRADIUS, FRR OSPF, the OpenVPN Client Export utility, pfBlockerNG and obviously the HA setup. I'm assuming that if I have to migrate, the configuration would all come over OK between CE and Plus.
- Once migrated, I assume I would not be able to use my current config backups to restore to the migrated Plus version if needed. Is that right?
I really like where my config is for this deployment next week, so I'm a little skittish about making this sort of change so close to the deployment date. I have a ton of users lined up to begin using OpenVPN, so I really need this to go smoothly. Support availability would be great for this as I am an obvious noob, but breaking this now would be rough to get back from.
Thanks for any help here. I reached out to sales for all this but have not heard back yet.
-
@rfranzke
1. Yes, you can just register CE and it becomes Plus.
2. You will lose nothing. You just get more options, e.g. in OpenVPN.
3. You can use CE backups if needed.
This won't solve your reboot issue, but it doesn't seem to crash either.
Could it be a faulty PDU?
-
You can import a CE config into Plus and won't lose anything. Your current config backups would still be valid.
But I'm not sure support would be able to help much more with an issue like this. They would be doing the same things we are already doing.
Does it still create a core as expected if you force a panic with the sysctl?
Spontaneously rebooting like that without generating a crash report is usually an indication of a hardware issue. Though it can be some hardware issue that's triggered by a software change.
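One way to run that test end to end, as a hedged sketch: force a panic, let the box reboot, then check the crash directory for a core. `crash_dir_has_core` is a made-up helper, and the assumption is that "the sysctl" means `debug.kdb.panic`, which panics the box immediately, so only run it on a node that can go down.

```shell
# Succeed if the given crash directory (default /var/crash) holds
# at least one vmcore.* file, as in the listing earlier in the thread.
crash_dir_has_core() {
    cdc_dir=${1:-/var/crash}
    for cdc_file in "$cdc_dir"/vmcore.*; do
        [ -e "$cdc_file" ] && return 0
    done
    return 1
}

# On the firewall, the test could look like this:
#   sysctl debug.kdb.panic=1      # assumption: the sysctl meant above; panics NOW
#   ...after the reboot...
#   crash_dir_has_core /var/crash && echo "core written" || echo "no core found"
```

If the forced panic leaves a core but the spontaneous reboots do not, that points away from the dump configuration and toward the box resetting without ever reaching the panic path.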
-
Yeah, the support idea is more for my own inexperience with pfSense, not so much for this particular issue. I have to deploy these next week and wanted some backup in case what I have done doesn't work for some reason. It's not an indictment of what we have done for this particular issue. I'm just not convinced I know what I am doing with this product well enough to go to full deployment without some help. I think I have a handle on what I am doing with it, but if I get this in place and some element does not work, I'll need to get it sorted quickly. I'm not sure what to think about this crashing issue. It could be a hardware issue, I guess, and this device is coincidentally restarting for other reasons just after we made changes to the panic capture settings. It just seems suspect that the dumps have disappeared after our changes, and that does not change the fact that it WAS creating dumps each time this thing restarted before we made changes. I am afraid maybe I didn't do something correctly here and somehow broke the dumps.