Help with deciphering 2.7.0 crash dump
-
Hi,
My pfSense instance has been crashing randomly for the past few months. It crashed every 2-3 weeks on 2.6.0, and it has now crashed for the first time since updating to 2.7.0.
Hardware: Topton Mini PC (N5105, 32GB DDR4, 1TB NVMe, I226-V NICs)
Hypervisor: ESXi 8.0 U1a
Internet: AT&T BGW210 1Gb/s Fiber
Is anyone able to assist in deciphering the crash dump below? I've snipped the relevant sections and attached the full crash dump. The anti-spam filter wouldn't let me post the trace.
Filename: /var/crash/info.0
Dump header from device: /dev/da0p2
Architecture: amd64
Architecture Version: 4
Dump Length: 76288
Blocksize: 512
Compression: none
Dumptime: 2023-07-08 06:29:23 -0700
Hostname: pfsense.winata.xyz
Magic: FreeBSD Text Dump
Version String: FreeBSD 14.0-CURRENT #1 RELENG_2_7_0-n255866-686c8d3c1f0: Wed Jun 28 04:21:19 UTC 2023
root@freebsd:/var/jenkins/workspace/pfSense-CE-snapshots-2_7_0-main/obj/amd64/LwYAddCr/var/jenkins/
Panic String: page fault
Dump Parity: 3444675921
Bounds: 0
Dump Status: good

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x22c00000234
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80d65095
stack pointer = 0x28:0xfffffe0096528d50
frame pointer = 0x28:0xfffffe0096528d90
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 59902 (grep)
rdi: fffff8003ac198c0 rsi: fffffe0096528da0 rdx: fffff800090a3d00
rcx: 0 r8: fffffe00971ebac0 r9: 0
rax: 22c0000022c rbx: 1000 rbp: fffffe0096528d90
r10: 1000 r11: fffffe00971ebfe0 r12: fffffe0096528da0
r13: fffff8003ac198c0 r14: 0 r15: fffffe00971ebac0
trap number = 12
panic: page fault
cpuid = 0
time = 1688822963
KDB: enter: panic
Thanks!
-
Backtrace:
db:0:kdb.enter.default> bt
Tracing pid 59902 tid 100339 td 0xfffffe00971ebac0
kdb_enter() at kdb_enter+0x32/frame 0xfffffe0096528b10
vpanic() at vpanic+0x183/frame 0xfffffe0096528b60
panic() at panic+0x43/frame 0xfffffe0096528bc0
trap_fatal() at trap_fatal+0x409/frame 0xfffffe0096528c20
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe0096528c80
calltrap() at calltrap+0x8/frame 0xfffffe0096528c80
--- trap 0xc, rip = 0xffffffff80d65095, rsp = 0xfffffe0096528d50, rbp = 0xfffffe0096528d90 ---
dofilewrite() at dofilewrite+0x85/frame 0xfffffe0096528d90
sys_write() at sys_write+0xbc/frame 0xfffffe0096528e00
amd64_syscall() at amd64_syscall+0x109/frame 0xfffffe0096528f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe0096528f30
--- syscall (4, FreeBSD ELF64, write), rip = 0x340381ff9f9a, rsp = 0x340380204f78, rbp = 0x340380204fa0 ---
But there are a bunch of failed processes in the message buffer:
<118>Bootup complete
<6>pid 58393 (netstat), jid 0, uid 0: exited on signal 10
<6>pid 98867 (awk), jid 0, uid 0: exited on signal 10 (core dumped)
<6>pid 3439 (awk), jid 0, uid 0: exited on signal 10 (core dumped)
<6>pid 90571 (grep), jid 0, uid 0: exited on signal 11 (core dumped)
<6>pid 56919 (awk), jid 0, uid 0: exited on signal 10 (core dumped)
<6>pid 9712 (awk), jid 0, uid 0: exited on signal 11 (core dumped)
<6>pid 60062 (php-cgi), jid 0, uid 0: exited on signal 10 (core dumped)
<6>pid 43746 (awk), jid 0, uid 0: exited on signal 10 (core dumped)
<6>ovpns1: link state changed to DOWN
<6>ovpns1: link state changed to UP
<6>pid 9474 (dhcpd), jid 0, uid 136: exited on signal 6
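If you want to summarize a message buffer like this one, here is a minimal Python sketch that counts the process exits by signal. It is not part of pfSense; the function name `tally_signals` and the regular expression are my own assumptions, based only on the line format shown above. The signal names are the FreeBSD numbering (10 is SIGBUS, 11 is SIGSEGV, 6 is SIGABRT), which differs from Linux.

```python
import re
from collections import Counter

# FreeBSD signal numbers seen in this message buffer.
# Note: on Linux, signal 10 is SIGUSR1, so do not reuse this table there.
FREEBSD_SIGNALS = {6: "SIGABRT", 10: "SIGBUS", 11: "SIGSEGV"}

# Matches kernel log entries such as:
#   pid 98867 (awk), jid 0, uid 0: exited on signal 10 (core dumped)
EXIT_RE = re.compile(
    r"pid (\d+) \(([^)]+)\), jid \d+, uid \d+: exited on signal (\d+)"
)


def tally_signals(msgbuf: str) -> Counter:
    """Count process exits in a message buffer, grouped by signal name."""
    counts = Counter()
    for _pid, _name, signo in EXIT_RE.findall(msgbuf):
        number = int(signo)
        counts[FREEBSD_SIGNALS.get(number, f"signal {number}")] += 1
    return counts


# The process-exit lines from the message buffer in this thread.
SAMPLE = "\n".join([
    "<6>pid 58393 (netstat), jid 0, uid 0: exited on signal 10",
    "<6>pid 98867 (awk), jid 0, uid 0: exited on signal 10 (core dumped)",
    "<6>pid 3439 (awk), jid 0, uid 0: exited on signal 10 (core dumped)",
    "<6>pid 90571 (grep), jid 0, uid 0: exited on signal 11 (core dumped)",
    "<6>pid 56919 (awk), jid 0, uid 0: exited on signal 10 (core dumped)",
    "<6>pid 9712 (awk), jid 0, uid 0: exited on signal 11 (core dumped)",
    "<6>pid 60062 (php-cgi), jid 0, uid 0: exited on signal 10 (core dumped)",
    "<6>pid 43746 (awk), jid 0, uid 0: exited on signal 10 (core dumped)",
    "<6>pid 9474 (dhcpd), jid 0, uid 136: exited on signal 6",
])

if __name__ == "__main__":
    for name, count in tally_signals(SAMPLE).most_common():
        print(f"{name}: {count}")
```

Bus errors and segmentation faults spread across unrelated programs (netstat, awk, grep, php-cgi) tend to suggest a problem underneath the OS, such as hardware or the hypervisor, rather than one faulty binary.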
You are running the hypervisor on a Jasper Lake CPU, though, so the first thing I would try is disabling any power-saving features in the BIOS (EIST, etc.).
Steve
-
@stephenw10 Thanks for pointing me in the right direction. Did some research, and indeed you are correct! The Jasper Lake platform has issues with power saving functions causing VMs to behave incorrectly. Apparently, there's a microcode/BIOS update to resolve it. I'll flash it and report back, hopefully with no more crashes!
-
So far, so good! No crashes in the roughly 10 days since updating the BIOS/microcode. Let's hope it stays that way!