CRASH Report: My Netgate 7100 crashes every ten to fifteen days.
-
One of our devs is looking at it. We'll probably need to run some debug code to get more info. Are you able to do that?
Did it just start doing this? After an upgrade?
-
Yes, everything began as soon as I updated to the 23.09.1 version. I've been operating pfsense for at least 12 to 15 years without experiencing any crashes. Up until this point, pfsense has been flawless.
I'm willing to give it a shot with some debugging, so I'll wait to hear from a member of your development team. Should I keep the crash report on my system or remove it? I have already downloaded the debugging data files.
-
So, it's not getting any better and it's definitely not going away.
Dumptime: 2024-01-02 16:05:31
Dumptime: 2024-01-07 17:14:39
Dumptime: 2024-01-08 22:11:09
Dumptime: 2024-01-09 07:40:00
What do I need to do to escalate this so that I can get it figured out?
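For anyone following along, the gaps between those dump times can be worked out with a short script. This is only a sketch using the four timestamps quoted in this post; the function name `crash_intervals` is made up for illustration and is not part of any pfSense tooling.

```python
from datetime import datetime

# Dump times copied from the post above.
dump_times = [
    "2024-01-02 16:05:31",
    "2024-01-07 17:14:39",
    "2024-01-08 22:11:09",
    "2024-01-09 07:40:00",
]


def crash_intervals(stamps):
    """Return the timedelta between each consecutive pair of timestamps."""
    parsed = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S") for s in stamps]
    return [later - earlier for earlier, later in zip(parsed, parsed[1:])]


intervals = crash_intervals(dump_times)
for gap in intervals:
    print(gap)
```

On these four dumps the gaps shrink from about five days to under ten hours, which fits the "not getting any better" description.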
-
Are you able to boot the debug kernel?
https://docs.netgate.com/pfsense/en/latest/troubleshooting/debug-kernel.html
-
Yes and standing by.
Just got home and my system had completely crashed. Had to press the power button to get it to reboot.
-
You already have it loaded and running that?
-
Yes, the debug kernel was installed and I just experienced another crash.
-
OK great, do you have a new crash report from that?
-
That isn't the debug kernel, it should show:
---<<BOOT>>---
Copyright (c) 1992-2023 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 14.0-CURRENT amd64 1400094 #1 plus-RELENG_23_09_1-n256200-3de1e293f3a: Wed Dec 6 21:01:42 UTC 2023
root@freebsd:/var/jenkins/workspace/pfSense-Plus-snapshots-23_09_1-main/obj/amd64/Obhu6gXB/var/jenkins/workspace/pfSense-Plus-snapshots-23_09_1-main/sources/FreeBSD-src-plus-RELENG_23_09_1/amd64.amd64/sys/pfSense-DEBUG amd64
Did you add the loader value to make it boot the debug kernel every time?
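If it helps, here is a rough way to check a saved boot log for this. It is a minimal sketch that only looks at the last component of the build path in the banner (the part after `/sys/`) and compares it with `pfSense-DEBUG`, as in the banner quoted in this thread. The helper name `is_debug_kernel` is hypothetical, and the non-debug ident `pfSense` used in the second check is an assumption, not something confirmed here.

```python
def is_debug_kernel(banner: str) -> bool:
    """Return True if the kernel ident after the last '/sys/' is pfSense-DEBUG."""
    for line in banner.splitlines():
        if "/sys/" in line:
            # The build path ends with .../sys/<ident>, followed by the architecture.
            parts = line.rsplit("/sys/", 1)[1].split()
            return bool(parts) and parts[0] == "pfSense-DEBUG"
    return False


# Build-path line taken from the debug-kernel banner quoted in this thread.
debug_line = (
    "root@freebsd:/var/jenkins/workspace/pfSense-Plus-snapshots-23_09_1-main/"
    "obj/amd64/Obhu6gXB/var/jenkins/workspace/pfSense-Plus-snapshots-23_09_1-main/"
    "sources/FreeBSD-src-plus-RELENG_23_09_1/amd64.amd64/sys/pfSense-DEBUG amd64"
)

# Assumption for illustration only: a non-debug build ending in "/sys/pfSense".
assumed_regular_line = "root@freebsd:/example/build/path/sys/pfSense amd64"

print(is_debug_kernel(debug_line))
print(is_debug_kernel(assumed_regular_line))
```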
-
No, I didn't at first, but it is set now, and I haven't experienced a crash since I have been on the debug kernel. It's been running for five days so far without a crash.
-
---<<BOOT>>---
Copyright (c) 1992-2023 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 14.0-CURRENT amd64 1400094 #1 plus-RELENG_23_09_1-n256200-3de1e293f3a: Wed Dec 6 21:01:42 UTC 2023
root@freebsd:/var/jenkins/workspace/pfSense-Plus-snapshots-23_09_1-main/obj/amd64/Obhu6gXB/var/jenkins/workspace/pfSense-Plus-snapshots-23_09_1-main/sources/FreeBSD-src-plus-RELENG_23_09_1/amd64.amd64/sys/pfSense-DEBUG amd64
-
Ok, cool so we're just waiting for a crash then?
-
It's been 31 days, and I have not experienced a crash since I switched to the debug kernel.
-
Hmm, and previously you would have had a crash within that time?
I have seen issues where running in debug mode actually changed the timing sufficiently to avoid it.
-
@smithjt Not that I’m saying it’s the case here, but I had a similar issue on my 6100 where the crashes were actually triggered by the MCE (hardware error) mechanism in the Intel CPU. Support and the Internet told me that error could only come from actually unstable or defective hardware, but that just didn’t sound right to me. My box had been 100% stable up until I upgraded to 23.01, after which it crashed anywhere between 7 and 16 days from boot. Booting the 22.05 snapshot boot environment made it completely stable again.
I never got it stable on 23.01, and when 23.05 came I decided to start over (reflash from the recovery image). Never had the issue again.
My conclusion was that the upgrade to 23.01, or some files on my SSD, were somehow botched and caused the error.
-
Thanks for the info!
It has been forty days and counting since my last crash, so I'm still keeping an eye out for any. A few days ago I reinstalled the "Netgate_Firmware_Upgrade" package in an attempt to return to the configuration I had before all of this. I will wait at least a week or two to see whether that is the reason for the crashes.
-
The Netgate Firmware package does nothing at all until you actually update the BIOS. If you already have the current version, there is no need to have it installed. It's installed by default in Plus, though, so I'd be surprised if it was causing any problems.
-
I’m having a similar issue. It started on my 7100 after I updated all remote sites to 23.09.1 while the 7100 itself was left on 23.05.1. It has crashed twice since January, and I was able to update it this past week.
Today I’ve got a unit at a remote site that appears to be doing something similar. Starting to wonder if I need to look at other firewall options, as they have only been in place for about 2 years.
-
You have a crash report?