v2.4.4 crashing under load with MCA error



  • MCA: CPU 2 UNCOR PCC OVER internal timer error

    Above is the error I'm getting. MCA = Machine Check Architecture.

    The error occurs randomly, but I can usually reproduce it by loading a few web pages or videos at once from the one PC that is connected to it.

    It's not overheating: about 100 °F, give or take.

    This is on a Dell Optiplex 390. Actually, two identical boxes are producing the same exact error.

    CPU: Intel(R) Core(TM) i3-2120 CPU @ 3.30GHz (3292.59-MHz K8-class CPU).
    Origin="GenuineIntel" Id=0x206a7 Family=0x6 Model=0x2a Stepping=7

    I'm not overclocking or anything like that. As far as I know these boxes have never been overclocked. (I got them barely used from a real-estate office).

    I'm also using an Intel 4-port card, but as near as I can tell that isn't the issue.

    igb0: <Intel(R) PRO/1000 Network Connection, Version - 2.5.3-k> port 0x3020-0x303f mem 0xe3420000-0xe343ffff,0xe3000000-0xe33fffff,0xe3450000-0xe3453fff irq 18 at device 0.0 on pci3

    I've googled the error code to death.

    Here are the things I've tried:
    Updated the BIOS (currently A14, the latest).
    Reset the BIOS to defaults.
    Tried two different PCs, identical Optiplex 390s with 4 GB RAM.
    Ran thorough system diagnostics (RAM and processor): no issues.
    Tried a full reinstall on both boxes.
    Checked for bad capacitors on the motherboard: no sign of any issues.

    Also, Win10 and Linux Mint ran just fine on both of these boxes for about a year.

    Help?



  • MCA errors are hardware errors caught by FreeBSD; there is nothing pfSense can do about them. Talk to the FreeBSD devs, but it's likely you need new or different hardware.


  • Netgate Administrator

    MCA errors are almost exclusively hardware, though it could be some hardware issue that only FreeBSD/pfSense tickles.

    Can you test it with FreeBSD 11.2?

    There are usually a lot more lines of MCA errors and probably a kernel panic line. Do you have those?

    Steve



  • Thanks very much for the response... here's the full MCA:

    MCA: Misc 0x3ffff
    MCA: Address 0x3fff80609d46
    MCA: CPU 0 UNCOR PCC OVER internal timer error
    MCA: Vendor "GenuineIntel", ID 0x206a7, APIC ID 0
    MCA: Global Cap 0x0000000000000c07, Status 0x0000000000000004
    MCA: Bank 3, Status 0xfe00000000800400

    What are the odds that both identical boxes have the same hardware issue?
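
    As a rough aid to reading the dump above, the Bank 3 status word can be decoded by hand. The sketch below is not something anyone in this thread ran; it assumes the IA32_MCi_STATUS layout described in Intel's Software Developer's Manual (flag bits 63 down to 57, a model-specific code in bits 31:16, and the MCA error code in bits 15:0). The names `STATUS_FLAGS` and `decode_mci_status` are hypothetical, chosen only for illustration.

```python
# Assumed layout: flag bits of IA32_MCi_STATUS per the Intel SDM
# (Machine-Check Architecture chapter). Only bits 63..57 are decoded here.
STATUS_FLAGS = {
    63: "VAL",    # register contains a valid error
    62: "OVER",   # an earlier error was overwritten (overflow)
    61: "UC",     # uncorrected error
    60: "EN",     # error reporting was enabled
    59: "MISCV",  # MISC register contents are valid
    58: "ADDRV",  # ADDR register contents are valid
    57: "PCC",    # processor context corrupt
}


def decode_mci_status(status):
    """Split a 64-bit bank status value into flags and error codes."""
    flags = [
        name
        for bit, name in sorted(STATUS_FLAGS.items(), reverse=True)
        if (status >> bit) & 1
    ]
    return {
        "flags": flags,
        "model_specific_code": (status >> 16) & 0xFFFF,
        "mca_error_code": status & 0xFFFF,
    }


# Bank 3 status value copied from the MCA dump in this post.
decoded = decode_mci_status(0xFE00000000800400)
print(decoded["flags"])
print(hex(decoded["mca_error_code"]))
print(hex(decoded["model_specific_code"]))
```

    With this value, the UC, PCC and OVER flags are all set and the MCA error code is 0x0400, which the SDM lists as an internal timer error. That is consistent with the kernel's one-line summary, "UNCOR PCC OVER internal timer error".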


  • Netgate Administrator

    There was no panic string shown though? No crash report after rebooting?

    Unless it's some common fault on that board, it's more likely an issue only FreeBSD is hitting, as I say. For example, the BIOS may configure the hardware for Windows or Linux but pass different values to BSD.

    Steve


  • Rebel Alliance Developer Netgate

    $ mcelog --no-dmi --ascii --file mce.log
    Hardware event. This is not a software error.
    CPU 0 BANK 3 
    MISC 0 ADDR 0 
    MCG status:
    STATUS fe00000000800400 MCGSTATUS 0
    APICID 0 SOCKETID 0 
    

    You have a hardware problem.