pfSense + 22.05 keeps crashing
-
Hmm the backtrace shows almost nothing:
db:0:kdb.enter.default> show pcpu
cpuid = 2
dynamic pcpu = 0xfffffe007e30b1c0
curthread = 0xfffff8000526b000: pid 11 tid 100005 "idle: cpu2"
curpcb = 0xfffff8000526b5a0
fpcurthread = none
idlethread = 0xfffff8000526b000: tid 100005 "idle: cpu2"
curpmap = 0xffffffff8368f728
tssp = 0xffffffff837198f0
commontssp = 0xffffffff837198f0
rsp0 = 0xfffffe0025698cc0
kcr3 = 0x80000000040b1002
ucr3 = 0xffffffffffffffff
scr3 = 0xc763ef6e
gs32p = 0xffffffff83720108
ldt = 0xffffffff83720148
tss = 0xffffffff83720138
tlb gen = 566107
curvnet = 0
db:0:kdb.enter.default> bt
Tracing pid 11 tid 100005 td 0xfffff8000526b000
acpi_cpu_idle_mwait() at acpi_cpu_idle_mwait+0x68/frame 0xfffffe0025698a70
acpi_cpu_idle() at acpi_cpu_idle+0x186/frame 0xfffffe0025698ab0
cpu_idle_acpi() at cpu_idle_acpi+0x3e/frame 0xfffffe0025698ad0
cpu_idle() at cpu_idle+0x9f/frame 0xfffffe0025698af0
sched_idletd() at sched_idletd+0x326/frame 0xfffffe0025698bb0
fork_exit() at fork_exit+0x7e/frame 0xfffffe0025698bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0025698bf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
Something in the message buffer though:
<2>NMI ISA 38, EISA 0
NMI/cpu2 ... going to debugger
NMI errors are usually some hardware failure or incompatibility.
Did you enable anything that seemed to trigger those errors? Upgrade?
It looks like you're still running 22.01. Any reason you're not running 22.05?
Steve
-
@stephenw10 Thanks for reply.
I updated to 22.05 and it ran well for a couple of days. Last night it crashed again.
I'm at the office so I don't have access to the new crash reports.
I have left the system with the minimum possible configuration.
I will upload a fresh report as soon as I can.
Thanks again. I'll keep you posted...
-
@stephenw10 said in pfSense + 22.05 keeps crashing:
still running 22.01. Any reason you're not running 22.05?
Maybe that is the issue.
@geekypr thinks he's running 22.05 but pfSense thinks otherwise.
Or a failed upgrade?
NMIs (non-maskable interrupts) that are not handled or intercepted by the OS will bring the system to a halt.
A couple of days ago there was a post about an NMI where FreeBSD reported it as related to failing RAM, a RAM parity error, but the system wasn't even equipped with that kind of RAM.
Start by entering the BIOS and disabling as much as possible.
No sound, LED fans, or other gadgets unrelated to pfSense should be enabled. Go for the bare minimum.
-
@gertjan I noticed my error in the subject. I did upgrade to 22.05 once I noticed.
Maybe you're right, I will start disabling all the junk in the BIOS.
Thanks for the tip...
-
@geekypr Here's an update:
Following advice from Gertjan, I reviewed the BIOS settings and found two things that are not really needed: virtualization and hyper-threading. Both were enabled, so I disabled them.
Also, the turbo-boost feature is enabled, but I'm leaving it as is for now.
The system has been running OK for 4 days straight with no issues so far, at least no crashes or unexpected reboots.
I appreciate the help and tips and will post any progress in a couple of days.
Thanks!
-
@geekypr I know that motherboard has 3 NICs... 2 good ones and 1 Realtek. If you're utilizing the Realtek NIC for anything you might need to search forums here for that specific fix action.
-
@skogs Thanks, that third NIC is for IPMI. Not using it.
It worked for a week or two, but today it started crashing again.
Attached is the latest dump file:
textdump.tar
Maybe I have a bad memory module. I just removed one (I have 4GB x2) and will monitor the behavior.
I hope there's something in the dump that can be found to resolve this issue.
Again, thank you for the help here...
-
Still shows an NMI error:
<2>NMI ISA 28, EISA 0
NMI/cpu2 ... going to debugger
If it's not an actual hardware issue it's something FreeBSD cannot handle IMO.
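To compare the NMI lines across the attached dumps, something like the following might help. It is only a sketch: the helper name nmi_lines is made up for illustration, and the usage comment assumes the dump was saved as textdump.tar and contains a msgbuf.txt member, as FreeBSD textdumps normally do.

```shell
#!/bin/sh
# Print every NMI-related line from a kernel message buffer read on stdin.
# A leading syslog priority tag such as "<2>" is stripped if present.
# nmi_lines is a hypothetical helper name, not part of pfSense or FreeBSD.
nmi_lines() {
    grep 'NMI' | sed 's/^<[0-9]*>//'
}

# Usage (assumes textdump.tar holds a msgbuf.txt member):
#   tar -xOf textdump.tar msgbuf.txt | nmi_lines
```

Running it against each dump makes it easy to see whether the ISA status value changes from crash to crash (38 in the first dump, 28 in this one).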
Did you test running anything else on it? Some burn-in test maybe?
Steve
-
@stephenw10
Could stress-ng be an option?
-
Sure, whatever you have access to. If you can boot and run some other OS without seeing any issues then it could be something FreeBSD specific.
-
Are you still having crashes?
Both dumps are related to cpu power management. Are C-States enabled in the BIOS?
What is the output of the following executed from shell:
sysctl machdep | grep -i idle
-
@adriftatlas Apologies for my late reply.
This is the output;
machdep.idle: acpi
machdep.idle_available: spin, mwait, hlt, acpi
machdep.idle_apl31: 0
machdep.idle_mwait: 1
It was fine until last night. Attached is the latest dump file.
What I noticed is that last night the environment was very humid, and I also remember the same conditions before. I don't think it's related, but I can't figure out why it crashes all of a sudden.
textdump.tar
I threw in another memory stick from a working server, just to make sure.
It's frustrating...
-
Still hard to ignore the NMI errors for me. But if you can disable power saving features in the BIOS as a test you may as well.
You do have SpeedStep (frequency scaling, i.e. P-states rather than C-states) enabled:
est0: <Enhanced SpeedStep Frequency Control> on cpu0
So you could also just disable powerd in System > Advanced > Misc
Steve
-
https://www.supermicro.com/products/archive/motherboard/x9scm-f
This motherboard is more than a decade old. Unless you updated the BIOS recently you're likely running old CPU microcode.
The BMC also likely has a watchdog that may be throwing NMIs, worth updating that too. There is a jumper on the motherboard for it and a BIOS setting, see page 57 in the manual:
https://www.supermicro.com/manuals/motherboard/C202_C204/MNL-1270.pdf
Latest BIOS:
1/6/2021 2.3a
https://www.supermicro.com/en/support/resources/downloadcenter/firmware/MBD-X9SCM-F/BIOS
Latest BMC:
3.52
https://www.supermicro.com/en/support/resources/downloadcenter/firmware/MBD-X9SCM-F/BMC
Other things to try:
- Disable "Power Technology" in BIOS; see page 76 in manual
- Disable PowerD in pfSense as suggested by @stephenw10
- Set CPU idle to HALT instead of ACPI or MWAIT:
sysctl machdep.idle_mwait=0
sysctl machdep.idle=hlt
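Before switching the idle method it may be worth checking that the target is actually listed in machdep.idle_available. A minimal sketch, assuming a POSIX shell on the firewall; idle_method_available is a hypothetical helper name, and the sysctl calls are left in comments so nothing changes until you run them yourself:

```shell
#!/bin/sh
# Succeed if idle method $1 appears in the comma-separated list $2, which is
# the format printed by "sysctl -n machdep.idle_available".
# idle_method_available is a hypothetical helper, not a pfSense command.
idle_method_available() {
    method=$1
    # Strip spaces so "spin, mwait, hlt, acpi" becomes "spin,mwait,hlt,acpi".
    list=$(printf '%s' "$2" | tr -d ' ')
    case ",$list," in
        *",$method,"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage on the firewall (as root):
#   if idle_method_available hlt "$(sysctl -n machdep.idle_available)"; then
#       sysctl machdep.idle_mwait=0
#       sysctl machdep.idle=hlt
#   fi
#   sysctl machdep.idle    # confirm the new value
```

Values set this way at runtime do not survive a reboot, so they would need to be stored as tunables once they prove to help.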
-
@adriftatlas Thanks!
I will try that over the weekend.
(powerD is disabled)
Keep you posted...