Recurring crashes in the last weeks
Good morning developers,
over the last few weeks we have had a couple of crashes on our pfSense 2.3 box.
Yesterday morning the box crashed, leaving the RAID degraded. This morning it crashed again, and the RAID was degraded again.
Both crash reports have been submitted. The time of submission of the last report should be around 2017-08-16 10:27 CET.
Could you tell us whether we should take a closer look at the hardware or whether it is a software bug?
I don't see any crashes in the crash reporter server from the IP address on your post. We can't just go by submitted time. If you could at least give the first two octets of the IPv4 address, or first 2-3 sections of an IPv6 address, that should help narrow it down along with the time.
Do you see reports from the 18.104.22.168/29 range?
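For anyone who wants to check whether a report's source address falls inside a range like that, here is a minimal sketch using Python's standard ipaddress module. The helper name in_range and the two sample host addresses are made up for illustration; this is not how the crash reporter server does its lookup.

```python
import ipaddress


def in_range(addr: str, cidr: str) -> bool:
    """Return True if addr lies inside the network given in CIDR notation."""
    # strict=False tolerates a CIDR string whose host bits are not all zero.
    network = ipaddress.ip_network(cidr, strict=False)
    return ipaddress.ip_address(addr) in network


# A /29 covers eight addresses, here .168 through .175.
print(in_range("18.104.22.170", "18.104.22.168/29"))  # inside the range
print(in_range("18.104.22.176", "18.104.22.168/29"))  # just past the end
```

The same call works for an IPv6 prefix, as long as the address and the network are the same IP version.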
Yes, there are recent ones from yesterday and one from today. Both were the same.
Fatal double fault:
eip = 0xc12d2498
esp = 0xe4767000
ebp = 0xe4767b70
cpuid = 1; apic id = 01
panic: double fault
cpuid = 1
KDB: enter: panic

db:0:kdb.enter.default> bt
Tracing pid 11 tid 100004 td 0xc8715c80
kdb_enter(c147cb56,c147cb56,c1643e27,c1fb7994,1,...) at kdb_enter+0x3d/frame 0xc1fb7940
vpanic(c1643e27,c1fb7994,c1fb7994,c1fb79ac,c12e7f2b,...) at vpanic+0x13b/frame 0xc1fb7974
panic(c1643e27,1,1,1,e4767b70,...) at panic+0x1b/frame 0xc1fb7988
dblfault_handler() at dblfault_handler+0xab/frame 0xc1fb7988
--- trap 0x17, eip = 0xc12d2498, esp = 0xe4767000, ebp = 0xe4767b70 ---
Xpage(8,28,28,c87db000,0,...) at Xpage/frame 0xe4767b70
Xinvlrng(e4767c28,c0d3d01e,c1f96f58,103f3,c8715c80,...) at Xinvlrng+0x2d/frame 0xe4767bb8
acpi_cpu_idle(18199824,0,18199824,e4767c28,c12d671a,...) at acpi_cpu_idle+0x15a/frame 0xe4767bf8
cpu_idle_acpi(18199824,0,c1f87404,c1f87408,c1f87414,...) at cpu_idle_acpi+0x3f/frame 0xe4767c0c
cpu_idle(0,e4767c78,c147e4f3,a3d,0,...) at cpu_idle+0x9a/frame 0xe4767c28
sched_idletd(0,e4767ce8,0,0,0,...) at sched_idletd+0x1dd/frame 0xe4767ca4
fork_exit(c0d3fd30,0,e4767ce8) at fork_exit+0xa3/frame 0xe4767cd4
fork_trampoline() at fork_trampoline+0x8/frame 0xe4767cd4
--- trap 0, eip = 0, esp = 0xe4767d20, ebp = 0 ---
Usually a double fault comes from a driver or hardware issue. There isn't much that's helpful in the backtrace, though. The idle process was active at the time; it looks like the box was literally just sitting there idling and crashed somehow. To me, that screams hardware, but it's not definitive.
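If you want to pull the function names out of a backtrace like the one above to see at a glance which code path was active, a rough sketch along these lines would do it. The regular expression and the helper name frame_functions are my own assumptions based on the frame format shown in this report, not an official parser, and the sample only contains a few of the frames.

```python
import re
from typing import List

# Matches the "at name+0xoff/frame 0xaddr" tail of a DDB backtrace frame.
# The "+0xoff" part is optional because some frames (e.g. Xpage) lack it.
FRAME_RE = re.compile(r"\bat (\w+)(?:\+0x[0-9a-f]+)?/frame 0x[0-9a-f]+")


def frame_functions(backtrace: str) -> List[str]:
    """Return the function name of every frame, innermost first."""
    return FRAME_RE.findall(backtrace)


# A few frames copied from the report; trap marker lines are skipped
# because they contain no "at ... /frame" part.
sample = (
    "dblfault_handler() at dblfault_handler+0xab/frame 0xc1fb7988 "
    "--- trap 0x17, eip = 0xc12d2498, esp = 0xe4767000, ebp = 0xe4767b70 --- "
    "Xpage(8,28,28,c87db000,0,...) at Xpage/frame 0xe4767b70 "
    "acpi_cpu_idle(18199824,0,18199824,e4767c28,c12d671a,...) "
    "at acpi_cpu_idle+0x15a/frame 0xe4767bf8 "
    "sched_idletd(0,e4767ce8,0,0,0,...) at sched_idletd+0x1dd/frame 0xe4767ca4"
)

print(frame_functions(sample))
```

Seeing sched_idletd and acpi_cpu_idle in the list is what indicates the idle thread was running when the fault hit.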
The degraded RAID is just a consequence of the crash; it's not directly related to the cause. That would happen with gmirror after any panic/crash.
It might be worth checking for a BIOS update; there are some other ACPI errors in the message buffer of the crash that look out of place:
ACPI Error: [GPMN] Namespace lookup failure, AE_NOT_FOUND (20150515/psargs-391)
ACPI Error: Method parse/execution failed [\_SB_.PCI0.LPC0.MBRD._CRS] (Node 0xc887bb80), AE_NOT_FOUND (20150515/psparse-552)
That doesn't look especially harmful but it's still noteworthy.
If you can keep it down for a bit, run memtest86+ and any OEM/other hardware diagnostics you have access to. While those may not necessarily draw a problem out if it's there, if they do find something it's a good indicator that you have a hardware problem.
Thanks for the fast analysis!
We'll run a memtest on the machine and look into replacing the box with modern hardware in the foreseeable future.