Help with deciphering 2.7.0 crash dump

Frozen Fractals

Hi,

My pfSense instance has been crashing randomly for the past few months. It crashed every 2-3 weeks on 2.6.0, and it has now crashed for the first time since updating to 2.7.0.

Hardware: Topton Mini PC (N5105, 32GB DDR4, 1TB NVMe, I226-V NICs)
Hypervisor: ESXi 8.0 U1a
Internet: AT&T BGW210 1Gb/s Fiber

Is anyone able to help decipher the crash dump below? I've snipped the relevant sections and attached the full crash dump. The anti-spam filter wouldn't let me post the trace.

Filename: /var/crash/info.0
Dump header from device: /dev/da0p2
  Architecture: amd64
  Architecture Version: 4
  Dump Length: 76288
  Blocksize: 512
  Compression: none
  Dumptime: 2023-07-08 06:29:23 -0700
  Hostname: pfsense.winata.xyz
  Magic: FreeBSD Text Dump
  Version String: FreeBSD 14.0-CURRENT #1 RELENG_2_7_0-n255866-686c8d3c1f0: Wed Jun 28 04:21:19 UTC 2023
    root@freebsd:/var/jenkins/workspace/pfSense-CE-snapshots-2_7_0-main/obj/amd64/LwYAddCr/var/jenkins/
  Panic String: page fault
  Dump Parity: 3444675921
  Bounds: 0
  Dump Status: good

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address	= 0x22c00000234
fault code		= supervisor read data, page not present
instruction pointer	= 0x20:0xffffffff80d65095
stack pointer	        = 0x28:0xfffffe0096528d50
frame pointer	        = 0x28:0xfffffe0096528d90
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 59902 (grep)
rdi: fffff8003ac198c0 rsi: fffffe0096528da0 rdx: fffff800090a3d00
rcx:                0  r8: fffffe00971ebac0  r9:                0
rax:      22c0000022c rbx:             1000 rbp: fffffe0096528d90
r10:             1000 r11: fffffe00971ebfe0 r12: fffffe0096528da0
r13: fffff8003ac198c0 r14:                0 r15: fffffe00971ebac0
trap number		= 12
panic: page fault
cpuid = 0
time = 1688822963
KDB: enter: panic
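
A few of the numbers in the trap output can be cross-checked by hand. The short Python sketch below uses only values copied from the dump above. It shows that the fault address sits 8 bytes past the value in rax, that rax is the same 32-bit value (0x22c) repeated in both halves, and that the `time =` epoch matches the Dumptime in the header. Reading this as a dereference through a corrupted pointer is only a guess, since the dump does not include the disassembly at the instruction pointer.

```python
from datetime import datetime, timedelta, timezone

# Values copied from the trap output above.
fault_addr = 0x22c00000234
rax = 0x22c0000022c
panic_epoch = 1688822963

# Distance between the faulting address and rax.
offset = fault_addr - rax

# Split rax into its upper and lower 32-bit halves.
rax_high = rax >> 32
rax_low = rax & 0xFFFFFFFF

# Convert the panic time to the -0700 zone used in the dump header.
panic_local = datetime.fromtimestamp(panic_epoch, tz=timezone.utc).astimezone(
    timezone(timedelta(hours=-7))
)

print(f"fault address - rax = {offset:#x}")
print(f"rax halves: high={rax_high:#x} low={rax_low:#x}")
print("panic time:", panic_local.strftime("%Y-%m-%d %H:%M:%S %z"))
```

A register holding a repeated small integer rather than a kernel address looks more like stray data than a valid pointer, but that is an interpretation, not something the dump proves.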

info.0
textdump.tar.0

Thanks!

stephenw10

Backtrace:

db:0:kdb.enter.default>  bt
Tracing pid 59902 tid 100339 td 0xfffffe00971ebac0
kdb_enter() at kdb_enter+0x32/frame 0xfffffe0096528b10
vpanic() at vpanic+0x183/frame 0xfffffe0096528b60
panic() at panic+0x43/frame 0xfffffe0096528bc0
trap_fatal() at trap_fatal+0x409/frame 0xfffffe0096528c20
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe0096528c80
calltrap() at calltrap+0x8/frame 0xfffffe0096528c80
--- trap 0xc, rip = 0xffffffff80d65095, rsp = 0xfffffe0096528d50, rbp = 0xfffffe0096528d90 ---
dofilewrite() at dofilewrite+0x85/frame 0xfffffe0096528d90
sys_write() at sys_write+0xbc/frame 0xfffffe0096528e00
amd64_syscall() at amd64_syscall+0x109/frame 0xfffffe0096528f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe0096528f30
--- syscall (4, FreeBSD ELF64, write), rip = 0x340381ff9f9a, rsp = 0x340380204f78, rbp = 0x340380204fa0 ---

But there are a bunch of failed processes in the message buffer:

<118>Bootup complete
<6>pid 58393 (netstat), jid 0, uid 0: exited on signal 10
<6>pid 98867 (awk), jid 0, uid 0: exited on signal 10 (core dumped)
<6>pid 3439 (awk), jid 0, uid 0: exited on signal 10 (core dumped)
<6>pid 90571 (grep), jid 0, uid 0: exited on signal 11 (core dumped)
<6>pid 56919 (awk), jid 0, uid 0: exited on signal 10 (core dumped)
<6>pid 9712 (awk), jid 0, uid 0: exited on signal 11 (core dumped)
<6>pid 60062 (php-cgi), jid 0, uid 0: exited on signal 10 (core dumped)
<6>pid 43746 (awk), jid 0, uid 0: exited on signal 10 (core dumped)
<6>ovpns1: link state changed to DOWN
<6>ovpns1: link state changed to UP
<6>pid 9474 (dhcpd), jid 0, uid 136: exited on signal 6

You are running the hypervisor on a Jasper Lake CPU though, so the first thing I would try is disabling any power-saving features in the BIOS (EIST etc.).
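
To make the pattern in the message buffer easier to see, here is a rough Python sketch that tallies the lines quoted above by program and signal. The signal names are the FreeBSD numbering (6 = SIGABRT, 10 = SIGBUS, 11 = SIGSEGV) and are hardcoded, because signal 10 means something different on Linux. The `summarize` helper and the regular expression are just illustrative names, not part of any pfSense tooling.

```python
import re
from collections import Counter

# FreeBSD signal numbers seen in the log (hardcoded; Linux numbering differs).
FREEBSD_SIGNALS = {6: "SIGABRT", 10: "SIGBUS", 11: "SIGSEGV"}

# Lines copied from the message buffer above.
LOG = """\
<6>pid 58393 (netstat), jid 0, uid 0: exited on signal 10
<6>pid 98867 (awk), jid 0, uid 0: exited on signal 10 (core dumped)
<6>pid 3439 (awk), jid 0, uid 0: exited on signal 10 (core dumped)
<6>pid 90571 (grep), jid 0, uid 0: exited on signal 11 (core dumped)
<6>pid 56919 (awk), jid 0, uid 0: exited on signal 10 (core dumped)
<6>pid 9712 (awk), jid 0, uid 0: exited on signal 11 (core dumped)
<6>pid 60062 (php-cgi), jid 0, uid 0: exited on signal 10 (core dumped)
<6>pid 43746 (awk), jid 0, uid 0: exited on signal 10 (core dumped)
<6>ovpns1: link state changed to DOWN
<6>ovpns1: link state changed to UP
<6>pid 9474 (dhcpd), jid 0, uid 136: exited on signal 6
"""

PATTERN = re.compile(
    r"pid \d+ \(([^)]+)\), jid \d+, uid \d+: exited on signal (\d+)"
)


def summarize(log):
    """Count crashes per (program, signal name) pair."""
    counts = Counter()
    for line in log.splitlines():
        match = PATTERN.search(line)
        if match:
            program = match.group(1)
            number = int(match.group(2))
            name = FREEBSD_SIGNALS.get(number, f"signal {number}")
            counts[(program, name)] += 1
    return counts


for (program, name), count in sorted(summarize(LOG).items()):
    print(f"{program:8} {name:8} {count}")
```

Five unrelated programs dying with bus errors and segfaults is the sort of spread that tends to point at the platform (CPU, memory, hypervisor) rather than at one faulty binary, though the log alone cannot confirm that.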

Steve

Frozen Fractals

@stephenw10 Thanks for pointing me in the right direction. I did some research, and indeed you are correct! The Jasper Lake platform has issues with power-saving functions causing VMs to behave incorrectly. Apparently, there's a microcode/BIOS update to resolve it. I'll flash it and report back, hopefully with no more crashes!

Frozen Fractals

So far, so good! No crashes in the roughly 10 days since updating the BIOS/microcode. Let's hope it stays that way!