Kernel Error
-
I noticed this error message on the monitor, and it was also in the System/General log:
May 19 09:00:00 php 12611 [pfBlockerNG] Starting cron process.
May 19 09:00:26 php 12611 [pfBlockerNG] No changes to Firewall rules, skipping Filter Reload
May 19 09:06:18 kernel MCA: Bank 2, Status 0x8400004040080151
May 19 09:06:18 kernel MCA: Global Cap 0x0000000000000c07, Status 0x0000000000000000
May 19 09:06:18 kernel MCA: Vendor "GenuineIntel", ID 0x706a1, APIC ID 2
May 19 09:06:18 kernel MCA: CPU 1 COR (1) ICACHE L1 IRD error
May 19 09:06:18 kernel MCA: Address 0x80db1530
May 19 09:21:00 sshguard 74510 Exiting on signal.
There were no performance issues as far as I know; everything seems normal.
Is this something I should be concerned about?
Thanks
-
@vcr58 said in Kernel Error:
Is this something I should be concerned about?
Yes. MCA errors like that are almost always a hardware issue.
Usually I would have expected it to panic and reboot after seeing that, but I assume that did not happen? Is this the first time you've seen an error like that?
Have you recently updated?
Steve
-
@stephenw10 Correct, there was no reboot after this error. I only noticed it because I had a monitor attached, then I checked the system/general logs.
This is the first time I saw this error since replacing the SSD that was giving me trouble (see post). Nothing else has changed except I started using pfBlocker a couple days ago.
I have been running the 22.01 release for a couple of months.
-
I have seen this error a couple of times over the years on my FreeNAS/TrueNAS server. In each case, it was bad RAM. Run a few cycles of memtest86 to nail this down, then replace the bad module.
EDIT: It could also be the CPU, but I would run a memory test before blaming the CPU. Running a CPU stress test with Prime95 or AIDA64 would be the next step if the memory checks out.
-
Yeah, it's pretty much always hardware. Sometimes you might start seeing it after an upgrade, for example, which looks like a software issue, but that's usually because some new driver is now hitting the hardware fault.
Steve
-
$ mcelog --no-dmi --ascii --file mce.log
mcelog: Family 6 Model 122 CPU: only decoding architectural errors
mcelog: Family 6 Model 122 CPU: only decoding architectural errors
Hardware event. This is not a software error.
CPU 1 BANK 2
ADDR 80db1530
MCG status:
STATUS 8400004040080151 MCGSTATUS 0
MCGCAP c07 APICID 2 SOCKETID 0
CPUID Vendor Intel Family 6 Model 122 Step 1
Given that it was an L1 cache error, and L1 is the cache on the CPU itself, it's almost certainly a CPU problem and not RAM. It might be overheating as well, but that seems less likely.
If it is a board with a removable CPU you can try re-seating the CPU in the socket. If it has a removable heat sink you could also try removing that, cleaning and redoing the thermal paste/grease/tape/whatever.
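For anyone who wants to see where "ICACHE L1 IRD" comes from, here is a rough Python sketch that splits the Status value from the log into fields. It assumes the architectural IA32_MCi_STATUS layout from Intel's manuals; the function name and dictionary keys are made up for illustration, and the model-specific bits are left undecoded (mcelog says the same about this CPU model).

```python
# Sketch only: bit positions follow the architectural IA32_MCi_STATUS layout.
# Function and key names are hypothetical, chosen for this example.

REQUEST = {0: "ERR", 1: "RD", 2: "WR", 3: "DRD", 4: "DWR",
           5: "IRD", 6: "PREFETCH", 7: "EVICT", 8: "SNOOP"}
CACHE_TYPE = {0: "instruction", 1: "data", 2: "generic"}
LEVEL = {0: "L0", 1: "L1", 2: "L2", 3: "generic"}


def decode_mci_status(status):
    """Split a 64-bit machine-check bank status value into named fields."""
    code = status & 0xFFFF  # bits 15:0, the MCA error code
    fields = {
        "valid": bool((status >> 63) & 1),
        "overflow": bool((status >> 62) & 1),
        "uncorrected": bool((status >> 61) & 1),
        "address_valid": bool((status >> 58) & 1),
        "context_corrupt": bool((status >> 57) & 1),
        # Bits 52:38; only meaningful when Global Cap bit 10 is set,
        # which it is in 0xc07.
        "corrected_count": (status >> 38) & 0x7FFF,
        "model_specific_code": (status >> 16) & 0xFFFF,
        "mca_code": code,
    }
    # Cache hierarchy errors have the bit pattern 000F 0001 RRRR TTLL.
    if (code & 0xEF00) == 0x0100:
        fields["cache_error"] = {
            "request": REQUEST.get((code >> 4) & 0xF, "unknown"),
            "type": CACHE_TYPE.get((code >> 2) & 0x3, "unknown"),
            "level": LEVEL[code & 0x3],
        }
    return fields


if __name__ == "__main__":
    for name, value in decode_mci_status(0x8400004040080151).items():
        print(name, value)
```

For the value in this thread the uncorrected bit is clear and the corrected count is 1, which matches the "COR (1)" in the kernel message and would explain why there was no panic.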
-
@jimp My CPU is soldered on a mini-ITX motherboard, but the heat sink may be removable. However, I have never seen the CPU temp above 40 deg C, so I don't think it's an issue.
I read somewhere in these forums that there was a BIOS setting that fixed a user's errors. I found a "Turbo Mode" setting in the BIOS and disabled it, so maybe that will help. I haven't seen any more errors since my first post.
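To check whether the error comes back without watching the monitor, something like the following could be run over a saved copy of the system log. It is only a sketch: it assumes the lines look exactly like the ones pasted in the first post (real log files may include a hostname or other prefixes), and the names `MCA_SUMMARY` and `find_mca_events` are hypothetical.

```python
import re

# Assumed line format, taken from the log pasted earlier in this thread:
#   May 19 09:06:18 kernel MCA: CPU 1 COR (1) ICACHE L1 IRD error
MCA_SUMMARY = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]{8})\s+kernel\s+"
    r"MCA: CPU (?P<cpu>\d+) (?P<detail>.+)$"
)


def find_mca_events(lines):
    """Return (timestamp, cpu, detail) for each MCA summary line found."""
    events = []
    for line in lines:
        match = MCA_SUMMARY.match(line.strip())
        if match:
            events.append(
                (match["stamp"], int(match["cpu"]), match["detail"])
            )
    return events


if __name__ == "__main__":
    sample = [
        "May 19 09:00:00 php 12611 [pfBlockerNG] Starting cron process.",
        "May 19 09:06:18 kernel MCA: Bank 2, Status 0x8400004040080151",
        "May 19 09:06:18 kernel MCA: CPU 1 COR (1) ICACHE L1 IRD error",
    ]
    for event in find_mca_events(sample):
        print(event)
```

An empty result over a few weeks of logs would suggest the Turbo Mode change helped; repeated entries on the same CPU would point back at the hardware.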