Crash Report or Programming Bug

kuschi

My brand new pfSense Plus 23.09 install just crashed. The hardware is new, too. Any ideas? Thanks.

stephenw10

Backtrace:

db:1:pfs> bt
Tracing pid 51493 tid 103465 td 0xfffffe0103b023a0
kdb_enter() at kdb_enter+0x32/frame 0xfffffe0101fa5230
vpanic() at vpanic+0x163/frame 0xfffffe0101fa5360
panic() at panic+0x43/frame 0xfffffe0101fa53c0
trap_fatal() at trap_fatal+0x40c/frame 0xfffffe0101fa5420
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe0101fa5480
calltrap() at calltrap+0x8/frame 0xfffffe0101fa5480
--- trap 0xc, rip = 0xffffffff8116bd1b, rsp = 0xfffffe0101fa5550, rbp = 0xfffffe0101fa5550 ---
vm_radix_lookup_ge() at vm_radix_lookup_ge+0x4b/frame 0xfffffe0101fa5550
kern_proc_vmmap_resident() at kern_proc_vmmap_resident+0x12b/frame 0xfffffe0101fa55c0
kern_proc_vmmap_out() at kern_proc_vmmap_out+0x19f/frame 0xfffffe0101fa5740
note_procstat_vmmap() at note_procstat_vmmap+0xfc/frame 0xfffffe0101fa5790
elf64_prepare_notes() at elf64_prepare_notes+0x577/frame 0xfffffe0101fa5820
elf64_coredump() at elf64_coredump+0x8b/frame 0xfffffe0101fa58f0
sigexit() at sigexit+0xbd5/frame 0xfffffe0101fa5d60
postsig() at postsig+0x237/frame 0xfffffe0101fa5e20
ast_sig() at ast_sig+0x1d7/frame 0xfffffe0101fa5ed0
ast_handler() at ast_handler+0x88/frame 0xfffffe0101fa5f10
ast() at ast+0x20/frame 0xfffffe0101fa5f30
doreti_ast() at doreti_ast+0x1c/frame 0x858f9eb30

Panic:

<118>Bootup complete
<6>igc0: promiscuous mode enabled
<6>igc5: promiscuous mode enabled
<6>pid 28585 (suricata), jid 0, uid 0: exited on signal 11 (core dumped)
<6>igc5: promiscuous mode disabled
<6>pid 24905 (unbound-control), jid 0, uid 59: exited on signal 6 (no core dump - other error)


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address	= 0xfffffe400a490c50
fault code		= supervisor read data, page not present
instruction pointer	= 0x20:0xffffffff8116bd1b
stack pointer	        = 0x0:0xfffffe0101fa5550
frame pointer	        = 0x0:0xfffffe0101fa5550
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 51493 (ntopng)
rdi: fffff80222e1b510 rsi: 0000000000005869 rdx: fffffe400a490c29
rcx: 0000000000000009  r8: 000000000000007f  r9: 000000000000009f
rax: fffffe400a490c28 rbx: fffff80222ec8318 rbp: fffffe0101fa5550
r10: 000007fffffff000 r11: 0000000000000020 r12: 0000000000005869
r13: 00003627ec769000 r14: 00000000000005b9 r15: 0000000000000000
trap number		= 12
panic: page fault
cpuid = 3
time = 1711980524
KDB: enter: panic

N100 is not too new.
You have ACPI errors in the BIOS. Make sure the BIOS is on the most recent version:

Firmware Error (ACPI): Could not resolve symbol [\_SB.PC00.TXHC.RHUB.SS01], AE_NOT_FOUND (20221020/dswload2-315)
ACPI Error: AE_NOT_FOUND, During name lookup/catalog (20221020/psobject-372)

Nothing obvious, but I'd guess it's something related to a CPU power-saving mode. Try disabling anything like that in the BIOS.

Is that the first time you've seen that crash?

Steve

kuschi

Thanks, Steve, for the quick response. The N100 is not the newest platform, but I was hoping that it would be stable enough.

Unfortunately, it is not the first crash; I have had this unit for over a week now. It will run for a day or two without a problem and then randomly crash. Since I couldn't find the error myself, I posted it on the forum.

I have now disabled all power-saving and hibernation options in the BIOS. Let's see.

stephenw10

Have all the crashes been identical? If they're all random it could be a hardware issue.

kuschi

I am not sure because there is not always a crash report. I did get another one earlier today:

textdump.tar.0 info.0

Unfortunately, the third crash has not generated any reports yet.

stephenw10

Mmm, no, that's completely different.

Backtrace:

db:1:pfs> bt
Tracing pid 2 tid 100041 td 0xfffffe00205b1560
kdb_enter() at kdb_enter+0x32/frame 0xfffffe00c579a940
vpanic() at vpanic+0x163/frame 0xfffffe00c579aa70
panic() at panic+0x43/frame 0xfffffe00c579aad0
trap_fatal() at trap_fatal+0x40c/frame 0xfffffe00c579ab30
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00c579ab90
calltrap() at calltrap+0x8/frame 0xfffffe00c579ab90
--- trap 0xc, rip = 0xffffffff81298340, rsp = 0xfffffe00c579ac60, rbp = 0xfffffe00c579ac60 ---
memset_erms() at memset_erms+0x30/frame 0xfffffe00c579ac60
uma_zalloc_arg() at uma_zalloc_arg+0x137/frame 0xfffffe00c579aca0
sigqueue_add() at sigqueue_add+0x99/frame 0xfffffe00c579acd0
tdsendsignal() at tdsendsignal+0x368/frame 0xfffffe00c579ad50
kern_psignal() at kern_psignal+0x8f/frame 0xfffffe00c579add0
realitexpire() at realitexpire+0x1a/frame 0xfffffe00c579ae10
softclock_call_cc() at softclock_call_cc+0x134/frame 0xfffffe00c579aec0
softclock_thread() at softclock_thread+0xe9/frame 0xfffffe00c579aef0
fork_exit() at fork_exit+0x7f/frame 0xfffffe00c579af30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00c579af30
--- trap 0x5aa55aa5, rip = 0x5aa55aa55aa55aa5, rsp = 0x5aa55aa55aa55aa5, rbp = 0x5aa55aa55aa55aa5 ---

Panic:

<118>Bootup complete
<6>igc0: promiscuous mode enabled
<6>igc5: promiscuous mode enabled


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address	= 0xfffff810090a78c0
fault code		= supervisor write data, page not present
instruction pointer	= 0x20:0xffffffff81298340
stack pointer	        = 0x0:0xfffffe00c579ac60
frame pointer	        = 0x0:0xfffffe00c579ac60
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 2 (clock (0))
rdi: fffff810090a78c0 rsi: 0000000000000000 rdx: 0000000000000070
rcx: 0000000000000070  r8: 0000000000000000  r9: fffffe00205b1560
rax: fffff810090a78c0 rbx: 0000000000010000 rbp: fffffe00c579ac60
r10: 0000000000000000 r11: 0000000080334b3c r12: fffff810090a78c0
r13: 0000000000000070 r14: 0000000000000101 r15: fffffe00db402800
trap number		= 12
panic: page fault
cpuid = 0
time = 1711984542
KDB: enter: panic
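
For reference, the `time =` values in the two panic messages can be turned into readable timestamps. This is only a rough Python sketch, assuming the field is seconds since the Unix epoch; the names `CRASH_TIMES` and `to_utc` are made up for illustration and are not part of any pfSense tool.

```python
from datetime import datetime, timezone

# "time =" values copied from the two panic messages in this thread.
# Assumption: they are seconds since the Unix epoch (UTC).
CRASH_TIMES = {"first": 1711980524, "second": 1711984542}


def to_utc(epoch_seconds: int) -> datetime:
    """Convert an epoch value to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


# Seconds between the two panics.
gap_seconds = CRASH_TIMES["second"] - CRASH_TIMES["first"]

for label, epoch in CRASH_TIMES.items():
    print(label, to_utc(epoch).isoformat())
print("gap in minutes:", round(gap_seconds / 60))
```

Under that assumption, the two panics were recorded roughly 67 minutes apart on the same day.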

OK, with crashes that different I'd first run a few memtest cycles to be sure it's not bad RAM.
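
To make the "different crashes" comparison concrete, here is a minimal Python sketch that pulls out the function that was running when the trap fired, meaning the first frame listed after the first `--- trap` line of a `bt` listing. The helper name `faulting_function` and the two excerpt constants are hypothetical; the excerpts are shortened copies of the backtraces in this thread, and this is not an official analysis tool.

```python
import re

# Shortened excerpts of the two backtraces posted in this thread.
FIRST_BT = """\
calltrap() at calltrap+0x8/frame 0xfffffe0101fa5480
--- trap 0xc, rip = 0xffffffff8116bd1b, rsp = 0xfffffe0101fa5550, rbp = 0xfffffe0101fa5550 ---
vm_radix_lookup_ge() at vm_radix_lookup_ge+0x4b/frame 0xfffffe0101fa5550
kern_proc_vmmap_resident() at kern_proc_vmmap_resident+0x12b/frame 0xfffffe0101fa55c0
"""

SECOND_BT = """\
calltrap() at calltrap+0x8/frame 0xfffffe00c579ab90
--- trap 0xc, rip = 0xffffffff81298340, rsp = 0xfffffe00c579ac60, rbp = 0xfffffe00c579ac60 ---
memset_erms() at memset_erms+0x30/frame 0xfffffe00c579ac60
uma_zalloc_arg() at uma_zalloc_arg+0x137/frame 0xfffffe00c579aca0
"""

FRAME_RE = re.compile(r"^(\w+)\(\) at ")


def faulting_function(backtrace: str):
    """Return the function on the first frame after the first '--- trap' line,
    or None if the listing has no such line or frame."""
    lines = backtrace.splitlines()
    for index, line in enumerate(lines):
        if line.startswith("--- trap"):
            for frame in lines[index + 1:]:
                match = FRAME_RE.match(frame)
                if match:
                    return match.group(1)
            return None
    return None


first = faulting_function(FIRST_BT)
second = faulting_function(SECOND_BT)
print(first, second, "same" if first == second else "different")
```

Here the first panic faults in `vm_radix_lookup_ge` (reached while writing a core dump) and the second in `memset_erms` (reached from the clock thread), so the two crashes share no common code path.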

kuschi

I already swapped the hard drive. I guess the RAM will be next. Let's see. However, I am afraid that maybe the entire unit is faulty.

stephenw10

It's possible if it's always been unstable.

kuschi

Based on my experience so far, it is possible. Either way, it is a warranty case for the unit or the component. Let's see once the RAM arrives tomorrow.

kuschi

@stephenw10 I don't want to praise the day before sunset, but the new RAM may have done the trick! So far, the router has been running stably for almost a day without crashing!

Thank you for your support and for deciphering the crash report.