Multiple random crashes - Crashlog



  • Hi,
    I am experiencing random crashes and reboots, and I have not been able to find the root cause.

    NOTE: I am posting the crashlog below. Please let me know if posting it publicly here is a security risk (I have obfuscated some info).

    Here is the log:
    https://www.dropbox.com/s/7032hf4ccx9ayaw/Crashlog.txt?dl=0

    Can anyone help, or point a finger or a toe in the right direction?

    THANKS :)



    Random crashes like these are almost always hardware issues. Just before many of your crashes, your network cards seem to be bouncing up and down a lot:

    <5>em1: link state changed to UP
    <5>em1: link state changed to DOWN
    <5>em1: link state changed to UP
    <5>em1: link state changed to DOWN
    <5>em1: link state changed to UP
    <5>em1: link state changed to DOWN
    <5>em1: link state changed to UP
    <5>em1: link state changed to DOWN
    <5>em1: link state changed to UP
    <5>em1: link state changed to DOWN
    <5>em1: link state changed to UP
    <5>em1: link state changed to DOWN
    <5>em1: link state changed to UP
    <5>em1: link state changed to DOWN
    <5>em1: link state changed to UP
    <5>em1: link state changed to DOWN
    <5>em1: link state changed to UP
    <5>em1: link state changed to DOWN
    <5>em1: link state changed to UP
    <5>em1: link state changed to DOWN
    <5>em1: link state changed to UP
    <5>em1: link state changed to DOWN
    <5>em1: link state changed to UP
    <5>em1: link state changed to DOWN
    <5>em1: link state changed to UP
    <5>em1: link state changed to DOWN
    <5>em1: link state changed to UP
    <5>em1: link state changed to DOWN
    <5>em1: link state changed to UP
    
    
    Fatal trap 9: general protection fault while in kernel mode
    cpuid = 0; apic id = 00
    

    Notice that the actual process on the CPU is always different for every crash. What is your config? I see references to a couple of different NICs, and a bridge.
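
    To put a number on the flapping, here is a minimal Python sketch that counts link state changes per interface. It assumes the crash log has been saved as plain text with lines in the format quoted above; the function name `count_link_flaps` and the regular expression are illustrative, not part of pfSense.

```python
import re
from collections import Counter

# Matches lines such as "<5>em1: link state changed to DOWN".
# The "<5>" priority prefix is treated as optional.
LINK_RE = re.compile(r"(?:<\d+>)?(\w+): link state changed to (UP|DOWN)")


def count_link_flaps(lines):
    """Return a Counter mapping interface name to number of link state changes."""
    flaps = Counter()
    for line in lines:
        match = LINK_RE.search(line)
        if match:
            flaps[match.group(1)] += 1
    return flaps


if __name__ == "__main__":
    # Sample lines copied from the crash log excerpt above.
    sample = [
        "<5>em1: link state changed to UP",
        "<5>em1: link state changed to DOWN",
        "<5>em1: link state changed to UP",
        "Fatal trap 9: general protection fault while in kernel mode",
    ]
    for interface, changes in count_link_flaps(sample).items():
        print(f"{interface}: {changes} link state changes")
```

    An interface with a far higher count than the others is a reasonable first candidate for a cable, switch port, or card swap.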



  • Hi @KOM ,

    First of all, thanks for taking the time to answer! :)

    A- Network cards:

    • 2 network cards
    • 1 wireless card
    • 1 onboard NIC

    B- Configured like this:

    • em0 as WAN
    • em1 as opt4
    • Bridge0 (one of my wireless networks and opt4) as LAN
    • And I have two different networks with different SSIDs on my wireless card

    C- Also, as you said, “network cards seem to be bouncing up and down a lot”. How can I test my network cards to find out more about this?

    Thanks again for your help!



  • @Wastapi said in Multiple random crashes - Crashlog:

    How can I test my network cards to find out more about this?

    You replace them one by one and see if your problem magically disappears.



  • @KOM said in Multiple random crashes - Crashlog:

    You replace them one by one and see if your problem magically disappears.

    Hehe, yeah, but this is so random. In any case, we'll see if anything else turns up before going down that road.

    But do you really believe a network card can crash the whole system like this?

    As for this part that you mentioned:

    Fatal trap 9: general protection fault while in kernel mode
    cpuid = 0; apic id = 00

    I look forward to your insight on the other aspects of my problem, based on our config.
    Thanks! ;)



  • Also, could this help? I had this on screen this morning when I arrived.

    https://www.dropbox.com/s/hbp69vipyddjkwr/Screenshot-errorpfsense.jpg?dl=0



  • Gah, that's not good at all. CPU error of some sort. No wonder your system keeps rebooting. Others have had similar issues:

    https://forum.netgate.com/topic/44740/hw-experts-kernel-mca-cpu-0-cor-over-gcache-lg-err-error

    Check for BIOS updates for your board and microcode updates for the chip. Try using a benchmarking tool to pound the CPU and see if it fails. It could be a power-supply issue. It could be a heat issue. Is your CPU fan running?
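
    If no benchmarking tool is at hand, a rough stand-in can be sketched in Python. This is only an illustration under assumptions: it assumes Python 3 is available on a machine you can boot the suspect hardware with, and the names `burn` and `stress` are made up for this example. It loads the cores with a deterministic hashing loop and checks that every worker arrives at the same answer; a mismatch, a crash, or an MCA message during the run would point toward hardware. It is not a replacement for a proper stress-testing tool.

```python
import hashlib
import multiprocessing as mp


def burn(rounds):
    """Hash a fixed seed repeatedly; the result is deterministic on healthy hardware."""
    digest = b"seed"
    for _ in range(rounds):
        digest = hashlib.sha256(digest).digest()
    return digest.hex()


def stress(workers, rounds):
    """Run burn() in parallel worker processes and return the set of distinct results.

    On healthy hardware the set has exactly one element.
    """
    with mp.Pool(workers) as pool:
        results = pool.map(burn, [rounds] * workers)
    return set(results)
```

    To use it, call something like `stress(mp.cpu_count(), 2_000_000)` repeatedly from under an `if __name__ == "__main__":` guard, and watch temperatures and the console while it runs.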

    You're going to have lots of fun debugging this.


  • Rebel Alliance Developer Netgate

    MCA/MCE messages are 100% a hardware error and cannot be anything else.


  • LAYER 8

    fried cpu 😂

    1bxqft.jpg



  • @kiokoman, We did not overclock anything.



  • @KOM, thanks. Unfortunately, there is no BIOS update for that machine.



  • I'm sure @kiokoman was just joking with that meme.

    A BIOS update would be a Hail Mary for sure since this seems like a clear case of bad hardware, but I always say to try the simple fixes first before you roll up your sleeves for the harder ones.


  • LAYER 8

    yeah the meme was a joke, the fried cpu was not

