Uncorrectable CPU error
-
Hello everyone,
I want to start off by saying that I'm not very experienced when it comes to networking, but that's why I built a pfSense box: so I can start learning ;D
I put together this box for my home network, a simple setup consisting of the modem, the pfSense box, and a wireless router, with the pfSense box acting as the gateway. It had been up for a little over 7 days with no issues at all until about an hour ago, when none of the devices in the house could resolve hostnames. I couldn't ping the pfSense box, so I figured something was wrong with it: http://s17.postimg.org/syfobbfcf/Untitled.png
Searching online tells me that the "UNCOR PCC GCACHE L2 ERR error" message indicates an uncorrectable error in the CPU's L2 cache. :(
I realize this is not an issue with pfSense itself, but I'd like to hear some thoughts on why this may be occurring and what I could do to narrow down the faulty hardware. Of course the CPU is not overclocked, and the temperature sensors read 33-40 °C. My exact build is as follows:
- Intel Xeon E5-2650L V2 (ES)
- Corsair H60 CPU cooler
- Asrock X79 Extreme4
- Elpida 4GB DDR3-1600 ECC unbuffered RAM
- HP NC360T dual-port NIC (Intel-based)
- AMD Radeon HD 6350
- Samsung 32GB mSATA SSD
- Sparkle 400W 80 Plus Platinum PSU
Thanks for any help.
-
I'd try swapping out the RAM first, then the CPU if swapping the RAM doesn't help.
-
The fact that it is happening across multiple CPUs/cores is at least somewhat encouraging.
MCA events definitely come from hardware, no way around that, but since this seems somewhat random I'd check the cooling and power supply first. MCA events caused by RAM will say they come from RAM and give a bank or other details. Since this one reports an issue in the CPU's L2 cache, the most likely cause is a problem in the CPU itself. There's an outside chance it's the motherboard, power, or cooling, but MCA errors generally mean exactly what they state.
You might also check whether the BIOS has any CPU L2 cache options you could try changing.
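To illustrate the point above that MCA errors name their own source, here is a rough Python sketch (not pfSense's or FreeBSD's actual code) of how the bits of an x86 IA32_MCi_STATUS value could map to text like "UNCOR PCC GCACHE L2 ERR". It assumes the bit layout from the Intel SDM: bit 63 VAL, bit 62 OVER, bit 61 UC, bit 57 PCC, and a 16-bit MCA error code in bits 15:0. Only two compound-code families are handled: cache hierarchy errors (000F 0001 RRRR TTLL) and memory controller errors (000F 0000 1MMM CCCC). The function names and the example status value are hypothetical.

```python
# Hypothetical sketch: decode an x86 MCi_STATUS value (bit layout per the
# Intel SDM). Only cache-hierarchy and memory-controller codes are handled.

LEVELS = ("L0", "L1", "L2", "LG")   # LL field, bits 1:0 (LG = generic level)
TTYPES = ("I", "D", "G")            # TT field, bits 3:2 (0b11 is reserved)
REQUESTS = ("ERR", "RD", "WR", "DRD", "DWR", "IRD", "PREFETCH", "EVICT", "SNOOP")
MEM_OPS = ("generic", "read", "write", "address/command", "scrub")  # MMM, bits 6:4


def bit(value: int, n: int) -> bool:
    """Return True if bit n of value is set."""
    return ((value >> n) & 1) == 1


def describe_code(code: int) -> str:
    """Describe the 16-bit MCA error code (MCi_STATUS bits 15:0)."""
    if (code & 0xEF00) == 0x0100:
        # Cache hierarchy error: 000F 0001 RRRR TTLL (bit 12, F, is ignored)
        rrrr, tt, ll = (code >> 4) & 0xF, (code >> 2) & 0x3, code & 0x3
        ttype = TTYPES[tt] if tt < len(TTYPES) else "?"
        req = REQUESTS[rrrr] if rrrr < len(REQUESTS) else "???"
        return f"{ttype}CACHE {LEVELS[ll]} {req} error"
    if (code & 0xEF80) == 0x0080:
        # Memory controller error: 000F 0000 1MMM CCCC (CCCC = channel)
        mmm, chan = (code >> 4) & 0x7, code & 0xF
        op = MEM_OPS[mmm] if mmm < len(MEM_OPS) else "reserved"
        where = "channel unspecified" if chan == 0xF else f"channel {chan}"
        return f"memory controller {op} error, {where}"
    return f"other MCA error code 0x{code:04x}"


def decode_mca_status(status: int) -> str:
    """Summarize the severity bits plus the error code of an MCi_STATUS value."""
    if not bit(status, 63):                           # VAL clear: nothing logged
        return "no valid error logged"
    parts = ["UNCOR" if bit(status, 61) else "COR"]   # UC: uncorrected error
    if bit(status, 57):                               # PCC: processor context corrupt
        parts.append("PCC")
    if bit(status, 62):                               # OVER: earlier error overwritten
        parts.append("OVER")
    parts.append(describe_code(status & 0xFFFF))
    return " ".join(parts)


if __name__ == "__main__":
    # Made-up status: VAL, UC, EN, PCC set; error code 0x010A (generic L2 cache error).
    example = (1 << 63) | (1 << 61) | (1 << 60) | (1 << 57) | 0x010A
    print(decode_mca_status(example))  # UNCOR PCC GCACHE L2 ERR error
```

A RAM-related fault would instead fall into the memory controller branch and report an operation and channel, which is why an "L2 cache" message points toward the CPU rather than the DIMMs.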
-
Try running some stress tests with Prime95 or Memtest if you can.
The CPU is probably at fault here. It's an engineering sample, and a lot of those are used for extreme overclocking and similar abuse, so I wouldn't be surprised if this one had been badly abused before.
-
Thanks for the responses, everyone. I replaced the CPU and that seems to have fixed the issue: the box is currently at 10 days of uptime without a crash. The old CPU would crash after anywhere from 3 to 7 days of uptime.