pfSense + 22.05 keeps crashing
-
@gertjan I noticed my error in the subject. I did upgrade to 22.05 once I noticed.
Maybe you're right; I will start by disabling all the junk in the BIOS.
Thanks for the tip...
-
@geekypr Here's an update:
Following advice from Gertjan, I reviewed the BIOS settings and found two things that are not really needed: virtualization and hyper-threading. Those were enabled, so I disabled them.
Also, the turbo-boost feature is enabled, but I am leaving it as is for now.
The system has been running OK for 4 days straight with no issues so far, at least no crashes or unexpected reboots.
I appreciate the help and tips and will post any progress in a couple of days.
Thanks!
-
@geekypr I know that motherboard has 3 NICs... 2 good ones and 1 Realtek. If you're utilizing the Realtek NIC for anything you might need to search forums here for that specific fix action.
-
@skogs Thanks, that third NIC is for IPMI. Not using it.
It worked for a week or two; today it started to crash again.
Attached is the latest dump file:
textdump.tar
Maybe I have a bad memory module. I just removed one (I have 4GB x2) and will monitor the behavior.
I hope there's something in the dump that can be found to resolve this issue.
Again, thank you for the help here...
-
Still shows an NMI error:
<2>NMI ISA 28, EISA 0 NMI/cpu2 ... going to debugger
If it's not an actual hardware issue it's something FreeBSD cannot handle IMO.
Did you test running anything else on it? Some burn-in test maybe?
Steve
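For anyone who wants to look for the same message in their own dump, here is a minimal sketch. It assumes the archive has already been unpacked (for example with `tar -xf textdump.tar`) and that the kernel message buffer ended up in `msgbuf.txt`, which is what textdump(4) describes. The helper names `count_nmi` and `show_nmi` are hypothetical, not part of pfSense or FreeBSD.

```shell
#!/bin/sh
# count_nmi FILE: print how many lines in FILE mention an NMI.
# FILE is assumed to be msgbuf.txt extracted from textdump.tar.
count_nmi() {
    grep -c 'NMI' "$1"
}

# show_nmi FILE: print the matching lines prefixed with their line numbers,
# so the surrounding context can be found in the full message buffer.
show_nmi() {
    grep -n 'NMI' "$1"
}
```

Typical use would be `show_nmi msgbuf.txt` in the directory where the dump was extracted.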
-
@stephenw10
Could stress-ng be an option?
-
Sure, whatever you have access to. If you can boot and run some other OS without seeing any issues then it could be something FreeBSD specific.
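As a rough idea of what a burn-in run with stress-ng could look like, here is a sketch. It assumes stress-ng is installed on whatever OS you boot for the test (a Linux live image, for instance). The helper name `burnin_args`, the worker counts, and the one-hour default are illustrative choices, not a recommendation from this thread.

```shell
#!/bin/sh
# burnin_args [DURATION]: print a stress-ng argument list for a combined
# CPU and memory burn-in. DURATION defaults to 1h.
#   --cpu 0         one CPU worker per online processor
#   --vm 2          two memory workers
#   --vm-bytes 75%  memory workers use about 75% of available RAM in total
#   --verify        check results where the stressor supports it
burnin_args() {
    echo "--cpu 0 --vm 2 --vm-bytes 75% --verify --metrics-brief --timeout ${1:-1h}"
}

# Example (not run here):
#   stress-ng $(burnin_args 2h)
```

If the box throws an NMI or reboots under this kind of load on another OS, that points toward hardware rather than FreeBSD.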
-
Are you still having crashes?
Both dumps are related to cpu power management. Are C-States enabled in the BIOS?
What is the output of the following executed from shell:
sysctl machdep | grep -i idle
-
@adriftatlas Apologies for my late reply.
This is the output;
machdep.idle: acpi
machdep.idle_available: spin, mwait, hlt, acpi
machdep.idle_apl31: 0
machdep.idle_mwait: 1
It was fine until last night. Attached is the latest dump file.
What I noticed is that last night the environment was very humid, and I remember the same conditions before. I don't think it is related, but I can't figure out why it crashes all of a sudden.
textdump.tar
Threw in another memory stick from another working server, just to make sure.
It's frustrating...
-
Still hard to ignore the NMI errors for me. But if you can disable power saving features in the BIOS as a test you may as well.
You do have SpeedStep (P-states) enabled:
est0: <Enhanced SpeedStep Frequency Control> on cpu0
So you could also just disable powerd in System > Advanced > Misc
Steve
-
https://www.supermicro.com/products/archive/motherboard/x9scm-f
This motherboard is more than a decade old. Unless you updated the BIOS recently you're likely running old CPU microcode.
The BMC also likely has a watchdog that may be throwing NMIs, worth updating that too. There is a jumper on the motherboard for it and a BIOS setting, see page 57 in the manual:
https://www.supermicro.com/manuals/motherboard/C202_C204/MNL-1270.pdf
Latest BIOS:
1/6/2021 2.3a
https://www.supermicro.com/en/support/resources/downloadcenter/firmware/MBD-X9SCM-F/BIOS
Latest BMC:
3.52
https://www.supermicro.com/en/support/resources/downloadcenter/firmware/MBD-X9SCM-F/BMC
Other things to try:
- Disable "Power Technology" in BIOS; see page 76 in manual
- Disable PowerD in pfSense as suggested by @stephenw10
- Set CPU idle to HALT instead of ACPI or MWAIT:
sysctl machdep.idle_mwait=0
sysctl machdep.idle=hlt
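If you want to be careful before switching, a small sketch like the one below checks that the method is actually listed in `machdep.idle_available` first. The helper name `idle_supported` is hypothetical; the list format is taken from the output posted earlier in this thread ("spin, mwait, hlt, acpi").

```shell
#!/bin/sh
# idle_supported METHOD LIST: succeed if METHOD appears in LIST, where LIST is
# the comma-and-space separated value of machdep.idle_available.
idle_supported() {
    method=$1
    list=$2
    # Pad both ends so every entry is wrapped as ", name," and "hl" cannot match "hlt".
    case ", $list," in
        *", $method,"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example on the firewall (not run here):
#   avail=$(sysctl -n machdep.idle_available)
#   if idle_supported hlt "$avail"; then
#       sysctl machdep.idle_mwait=0
#       sysctl machdep.idle=hlt
#   fi
```

Values set from the shell like this do not survive a reboot; to keep them, they would need to be added under System > Advanced > System Tunables.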
-
@adriftatlas Thanks!
I will try that over the weekend.
(PowerD is disabled.)
I'll keep you posted...