Netgate Discussion Forum

    Recurring crashes in the last weeks

    General pfSense Questions
    5 Posts 2 Posters 514 Views
      inworksit

      Good morning developers,

      In the last few weeks we have had a couple of crashes on our pfSense 2.3 box.
      Yesterday morning it crashed and left the RAID degraded. This morning it crashed again, again with a degraded RAID.
      Both crash reports have been submitted. The most recent one should have been submitted at around 2017-08-16 10:27 CET.

      Could you tell us whether we should take a closer look at the hardware, or whether this is a software bug?

      Regards,
      Juergen Nagel
      Inworks GmbH

        jimp Rebel Alliance Developer Netgate

        I don't see any crashes on the crash reporter server from the IP address in your post, and we can't go by submission time alone. If you could give at least the first two octets of the IPv4 address, or the first 2-3 sections of an IPv6 address, that should help narrow it down along with the time.
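        Narrowing reports down by address prefix plus time window, as described above, can be sketched in Python. This is only an illustration: the `CrashReport` record, the `matches` helper and the sample data are hypothetical (the addresses come from the documentation ranges), and nothing here reflects how Netgate's crash reporter actually stores or searches reports. "The first two octets" become a /16 network; the first 3 sections of an IPv6 address would be a /48 in the same way. The two-hour window is an arbitrary choice, wide enough to also absorb the CET/CEST question (central European local time in August is CEST, UTC+2, not CET, UTC+1).

```python
import ipaddress
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class CrashReport:
    # Hypothetical record shape, not the real crash reporter schema.
    source_ip: str
    submitted: datetime  # timezone-aware submission time


def matches(report, network, target, window):
    """True if the report came from `network` and within `window` of `target`."""
    in_net = ipaddress.ip_address(report.source_ip) in ipaddress.ip_network(network)
    return in_net and abs(report.submitted - target) <= window


CET = timezone(timedelta(hours=1))
target = datetime(2017, 8, 16, 10, 27, tzinfo=CET)  # time given in the first post

reports = [  # made-up sample data
    CrashReport("198.51.100.20", datetime(2017, 8, 16, 8, 30, tzinfo=timezone.utc)),
    CrashReport("198.51.100.20", datetime(2017, 8, 14, 8, 30, tzinfo=timezone.utc)),
    CrashReport("203.0.113.7", datetime(2017, 8, 16, 9, 0, tzinfo=timezone.utc)),
]

# "First two octets" 198.51 -> the /16 network 198.51.0.0/16
hits = [r for r in reports if matches(r, "198.51.0.0/16", target, timedelta(hours=2))]
for r in hits:
    print(r.source_ip, r.submitted.isoformat())
```

        Only the first sample report survives: the second is two days off, and the third is outside the /16.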

        Remember: Upvote with the ๐Ÿ‘ button for any user/post you find to be helpful, informative, or deserving of recognition!

        Need help fast? Netgate Global Support!

        Do not Chat/PM for help!

          inworksit

          Do you see reports from the 92.198.54.104/29 range?
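          For reference, a /29 spans 8 addresses. Python's standard `ipaddress` module can expand the range quoted above and confirm that it lies inside the /16 formed by its first two octets. This only illustrates the subnet arithmetic; it is not something the crash reporter is known to do.

```python
import ipaddress

net = ipaddress.ip_network("92.198.54.104/29")

print(net.num_addresses)                           # 8
print(net.network_address, net.broadcast_address)  # 92.198.54.104 92.198.54.111
print([str(h) for h in net.hosts()])               # .105 through .110 (6 usable hosts)

# The "first two octets" asked for earlier correspond to this /16:
print(net.subnet_of(ipaddress.ip_network("92.198.0.0/16")))  # True
```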

            jimp Rebel Alliance Developer Netgate

            Yes, there are recent ones from yesterday and one from today. Both were the same.

            Fatal double fault:
            eip = 0xc12d2498
            esp = 0xe4767000
            ebp = 0xe4767b70
            cpuid = 1; apic id = 01
            panic: double fault
            cpuid = 1
            KDB: enter: panic
            
            db:0:kdb.enter.default>  bt
            Tracing pid 11 tid 100004 td 0xc8715c80
            kdb_enter(c147cb56,c147cb56,c1643e27,c1fb7994,1,...) at kdb_enter+0x3d/frame 0xc1fb7940
            vpanic(c1643e27,c1fb7994,c1fb7994,c1fb79ac,c12e7f2b,...) at vpanic+0x13b/frame 0xc1fb7974
            panic(c1643e27,1,1,1,e4767b70,...) at panic+0x1b/frame 0xc1fb7988
            dblfault_handler() at dblfault_handler+0xab/frame 0xc1fb7988
            --- trap 0x17, eip = 0xc12d2498, esp = 0xe4767000, ebp = 0xe4767b70 ---
            Xpage(8,28,28,c87db000,0,...) at Xpage/frame 0xe4767b70
            Xinvlrng(e4767c28,c0d3d01e,c1f96f58,103f3,c8715c80,...) at Xinvlrng+0x2d/frame 0xe4767bb8
            acpi_cpu_idle(18199824,0,18199824,e4767c28,c12d671a,...) at acpi_cpu_idle+0x15a/frame 0xe4767bf8
            cpu_idle_acpi(18199824,0,c1f87404,c1f87408,c1f87414,...) at cpu_idle_acpi+0x3f/frame 0xe4767c0c
            cpu_idle(0,e4767c78,c147e4f3,a3d,0,...) at cpu_idle+0x9a/frame 0xe4767c28
            sched_idletd(0,e4767ce8,0,0,0,...) at sched_idletd+0x1dd/frame 0xe4767ca4
            fork_exit(c0d3fd30,0,e4767ce8) at fork_exit+0xa3/frame 0xe4767cd4
            fork_trampoline() at fork_trampoline+0x8/frame 0xe4767cd4
            --- trap 0, eip = 0, esp = 0xe4767d20, ebp = 0 ---
            
            

            A double fault is usually caused by a driver or hardware issue. The backtrace isn't much help, though. The idle process was active at the time; it looks like the box was literally just sitting there idling and crashed somehow. To me, that screams hardware, but it's not definitive.
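            Reading the backtrace from the bottom up shows why the idle process is implicated: the outermost frames are `fork_trampoline`, `fork_exit`, `sched_idletd` and `cpu_idle`, i.e. the scheduler's idle loop, with the trap and panic frames stacked on top of it. The Python sketch below extracts that call chain from the frames quoted above. It assumes every frame line follows the `name(args) at name+0xoffset/frame 0xaddress` layout seen in this particular dump (with the offset optional, as on the `Xpage` line); other ddb output may be formatted differently.

```python
import re

# Frame lines copied from the panic above; "--- trap" lines are skipped by the regex.
BACKTRACE = """\
kdb_enter(c147cb56,c147cb56,c1643e27,c1fb7994,1,...) at kdb_enter+0x3d/frame 0xc1fb7940
vpanic(c1643e27,c1fb7994,c1fb7994,c1fb79ac,c12e7f2b,...) at vpanic+0x13b/frame 0xc1fb7974
panic(c1643e27,1,1,1,e4767b70,...) at panic+0x1b/frame 0xc1fb7988
dblfault_handler() at dblfault_handler+0xab/frame 0xc1fb7988
--- trap 0x17, eip = 0xc12d2498, esp = 0xe4767000, ebp = 0xe4767b70 ---
Xpage(8,28,28,c87db000,0,...) at Xpage/frame 0xe4767b70
Xinvlrng(e4767c28,c0d3d01e,c1f96f58,103f3,c8715c80,...) at Xinvlrng+0x2d/frame 0xe4767bb8
acpi_cpu_idle(18199824,0,18199824,e4767c28,c12d671a,...) at acpi_cpu_idle+0x15a/frame 0xe4767bf8
cpu_idle_acpi(18199824,0,c1f87404,c1f87408,c1f87414,...) at cpu_idle_acpi+0x3f/frame 0xe4767c0c
cpu_idle(0,e4767c78,c147e4f3,a3d,0,...) at cpu_idle+0x9a/frame 0xe4767c28
sched_idletd(0,e4767ce8,0,0,0,...) at sched_idletd+0x1dd/frame 0xe4767ca4
fork_exit(c0d3fd30,0,e4767ce8) at fork_exit+0xa3/frame 0xe4767cd4
fork_trampoline() at fork_trampoline+0x8/frame 0xe4767cd4
"""

# name(args) at name[+0xoffset]/frame 0xaddress
FRAME_RE = re.compile(r"^\s*(\w+)\(.*\) at \w+(?:\+0x[0-9a-f]+)?/frame 0x[0-9a-f]+")


def call_chain(text):
    """Function names ordered from the outermost caller to the innermost frame."""
    names = []
    for line in text.splitlines():
        m = FRAME_RE.match(line)
        if m:
            names.append(m.group(1))
    return names[::-1]  # ddb prints the innermost frame first, so reverse


print(" -> ".join(call_chain(BACKTRACE)))
```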

            The broken RAID is just a side effect of the crash; it isn't directly related to the cause. Any panic or crash would do that to a gmirror mirror.
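            After a crash like this the mirror typically has to resynchronize. If you want to watch that from a script, one option is to parse the output of `gmirror status`. The Python sketch below is illustrative only: the sample text is an assumed layout (Name / Status / Components columns, with per-disk states such as `ACTIVE` or `SYNCHRONIZING, NN%`), the mirror name `gm0` and the helper names are hypothetical, and the parsing should be checked against the real output on your own box before relying on it.

```python
import re

# Assumed `gmirror status` layout; the mirror name "gm0" is hypothetical.
SAMPLE = """\
      Name    Status  Components
mirror/gm0  DEGRADED  ada0 (ACTIVE)
                      ada1 (SYNCHRONIZING, 64%)
"""


def mirror_states(status_text):
    """Map each mirror (e.g. "mirror/gm0") to its overall status column."""
    states = {}
    for line in status_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].startswith("mirror/"):
            states[fields[0]] = fields[1]
    return states


def resync_progress(status_text):
    """Map each component still synchronizing to its completion percentage."""
    return {
        disk: int(pct)
        for disk, pct in re.findall(r"(\w+) \(SYNCHRONIZING, (\d+)%\)", status_text)
    }


print(mirror_states(SAMPLE))    # {'mirror/gm0': 'DEGRADED'}
print(resync_progress(SAMPLE))  # {'ada1': 64}
```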

            It might be worth checking for a BIOS update; there are some other ACPI errors in the crash's message buffer that look out of place:

            ACPI Error: [GPMN] Namespace lookup failure, AE_NOT_FOUND (20150515/psargs-391)
            ACPI Error: Method parse/execution failed [\_SB_.PCI0.LPC0.MBRD._CRS] (Node 0xc887bb80), AE_NOT_FOUND (20150515/psparse-552)
            
            

            That doesn't look especially harmful, but it's still noteworthy.

            If you can take the box out of service for a while, run memtest86+ and any OEM or other hardware diagnostics you have access to. They won't necessarily draw out a problem even if one is there, but if they do find something, that's a good indicator that you have a hardware problem.

            Remember: Upvote with the ๐Ÿ‘ button for any user/post you find to be helpful, informative, or deserving of recognition!

            Need help fast? Netgate Global Support!

            Do not Chat/PM for help!

            1 Reply Last reply Reply Quote 0
            • I
              inworksit

              Thanks for the fast analysis!
              We'll run a memtest on the machine and look into replacing the box with modern hardware in the foreseeable future.

              Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.