Unrecoverable machine check exception
-
First-time user of pfSense and the BSDs in general. I have a freshly installed system, updated to the newest version of pfSense. Hardware specs: quad Intel Pro 100/1000 NIC, Dell OptiPlex 790, 4GB RAM, i5 processor with AES-NI support. Other than the PCIe NIC, everything else is factory on the Dell.
The firewall crashes continuously, so it's easy to reproduce. Every time it crashes I get the dump files. I suspected a hardware issue, so I ran the Dell diagnostics from the boot menu. Everything passes, even with the extended test, so that wasn't much help. I have read that the Intel card is widely known to be supported. I can't make heads or tails of the dump files other than "Unrecoverable machine check exception". Is there anything I should look for that might give me a clue to what is failing or incompatible with pfSense? Would it be helpful to post some of those files here?
Any advice would be appreciated.
-
I found some information that may be relevant and that points to hardware. I have had Linux and Windows installed on the PC previously without any issues. A Google search turned up someone else who had the same hardware issues on a Dell OptiPlex with an Intel Pro NIC installed.
I have upgraded the BIOS to the newest version. Could FreeBSD/pfSense have an incompatibility with this hardware, or is this truly only a hardware issue that BSD recognizes?
MCA: Bank 3, Status 0xfe00000000800400
MCA: Global Cap 0x0000000000000c09, Status 0x0000000000000004
MCA: Vendor "GenuineIntel", ID 0x206a7, APIC ID 0
MCA: CPU 0 UNCOR PCC OVER internal timer error
MCA: Address 0x3fff806160dd
MCA: Misc 0x3ffff
panic: Unrecoverable machine check exception
cpuid = 0
KDB: enter: panic
-
MCA/MCE errors are 100% hardware. Nothing to do with the OS or any software.
$ mcelog --no-dmi --ascii --file mce.log
Hardware event. This is not a software error.
CPU 0 BANK 3
MISC 3ffff ADDR 3fff806160dd
MCG status:MCIP
STATUS fe00000000800400 MCGSTATUS 4
MCGCAP c09 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 42
Not much more helpful. Looks like a CPU problem, but maybe a slight chance it's power/heat.
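For anyone curious how the flags in that output follow from the raw register values, here is a minimal Python sketch that decodes the numbers from the panic message. It assumes the architectural bit layout of IA32_MCi_STATUS, IA32_MCG_CAP, IA32_MCG_STATUS and the CPUID signature as described in the Intel SDM; the function names are made up for illustration and are not part of mcelog or FreeBSD.

```python
# Sketch only: decode the raw MCA values printed in the panic message.
# Bit positions are the architectural ones from the Intel SDM; all function
# and variable names here are hypothetical, chosen for this example.

STATUS_FLAGS = [
    (63, "VAL"),    # register holds a valid error
    (62, "OVER"),   # an earlier error was overwritten
    (61, "UC"),     # uncorrected error
    (60, "EN"),     # error reporting was enabled
    (59, "MISCV"),  # MISC register is valid
    (58, "ADDRV"),  # ADDR register is valid
    (57, "PCC"),    # processor context corrupt
]


def decode_mci_status(status):
    """Split a bank status value into flag names and error codes."""
    flags = [name for bit, name in STATUS_FLAGS if (status >> bit) & 1]
    return {
        "flags": flags,
        "mca_error_code": status & 0xFFFF,               # bits 15:0
        "model_specific_code": (status >> 16) & 0xFFFF,  # bits 31:16
    }


def decode_mcg_cap(cap):
    """Return the number of reporting banks (bits 7:0 of the global cap)."""
    return {"bank_count": cap & 0xFF}


def decode_mcg_status(value):
    """Return the names of the set bits in the global status register."""
    names = ["RIPV", "EIPV", "MCIP"]  # bits 0, 1, 2
    return [name for bit, name in enumerate(names) if (value >> bit) & 1]


def decode_cpuid(signature):
    """Return (family, model, stepping) from a CPUID signature."""
    stepping = signature & 0xF
    model = (signature >> 4) & 0xF
    family = (signature >> 8) & 0xF
    ext_model = (signature >> 16) & 0xF
    ext_family = (signature >> 20) & 0xFF
    if family in (6, 15):
        model |= ext_model << 4   # extended model applies to these families
    if family == 15:
        family += ext_family
    return family, model, stepping


if __name__ == "__main__":
    print(decode_mci_status(0xFE00000000800400))
    print(decode_mcg_cap(0x0C09))
    print(decode_mcg_status(0x4))
    print(decode_cpuid(0x206A7))
```

With the values from this thread, the status word has UC, PCC and OVER set with MCA error code 0x0400 (the internal timer error), the global status has only MCIP set, and the CPUID signature works out to family 6, model 42, which matches what the kernel and mcelog printed.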
-
@jimp Thank you for sharing that information. I assume the Dell diagnostic test does not perform a deep enough test to identify the culprit. This leads me down the right path.
-
Edit:
I replaced the Dell OptiPlex 790 completely with a known good one and got the same crashes, with the same error message to the letter. The only piece of hardware that was the same was an Intel Pro 1000 NIC. After replacing the NIC the issue is no longer present. I was incorrect in believing this issue was related to pfSense. pfSense assisted me in discovering bad hardware, as did Jimp.
MCA: Bank 3, Status 0xfe00000000800400
MCA: Global Cap 0x0000000000000c09, Status 0x0000000000000004
MCA: Vendor "GenuineIntel", ID 0x206a7, APIC ID 0
MCA: CPU 0 UNCOR PCC OVER internal timer error
MCA: Address 0x3fff805ea790
MCA: Misc 0x3ffff
panic: Unrecoverable machine check exception
cpuid = 0
KDB: enter: panic
-
Linux and Windows work fine with this PC and the other one that was crashing with pfSense. It must be an issue with this hardware's compatibility with FreeBSD.
-
Then you have something else wrong in your environment, maybe bad power. That error cannot come from software. It is generated in the hardware/BIOS.
-
Bad power would be odd since I have it connected to a surge suppressor with two Ubuntu servers that stay on 24/7 without any crashes. Anyway thanks for the response.
-
@mokfarg said in Unrecoverable machine check exception:
Dell Optiplex 790
If both were the same model, same vintage, they may have the same hardware issue. Bad capacitors hit in waves like that.
If you don't believe me about the errors being hardware, research Machine Check Exceptions: https://en.wikipedia.org/wiki/Machine-check_exception
I know you do not want to believe it's hardware, but there is literally no way for software to trigger those.
-
I appreciate the response, I truly do. I guess that is a possibility. The only piece of hardware that I have had in both PCs was an Intel NIC; I guess it could cause a kernel panic? I'll remove it and test. The crashes usually happen quickly.