pfSense kernel panic

userunix

I have had pfSense installed on Proxmox for a week, and it crashes almost every day, but I have no idea what is causing it. Among the logs I found "panic: spin lock held too long". Can you help me?

I only have the logs from the last crash:

textdump.tar.1
info.1

Tell me if you need something else.

stephenw10

Backtrace:

db:0:kdb.enter.default>  bt
Tracing pid 23 tid 100135 td 0xfffff8000626a740
kdb_enter() at kdb_enter+0x37/frame 0xfffffe004ca2c630
vpanic() at vpanic+0x197/frame 0xfffffe004ca2c680
panic() at panic+0x43/frame 0xfffffe004ca2c6e0
_mtx_lock_indefinite_check() at _mtx_lock_indefinite_check+0x69/frame 0xfffffe004ca2c6f0
_mtx_lock_spin_cookie() at _mtx_lock_spin_cookie+0xd5/frame 0xfffffe004ca2c760
smp_targeted_tlb_shootdown() at smp_targeted_tlb_shootdown+0x484/frame 0xfffffe004ca2c7e0
pmap_ts_referenced() at pmap_ts_referenced+0xa90/frame 0xfffffe004ca2c8b0
vm_pageout_worker() at vm_pageout_worker+0xf88/frame 0xfffffe004ca2cc70
vm_pageout() at vm_pageout+0x193/frame 0xfffffe004ca2ccb0
fork_exit() at fork_exit+0x7e/frame 0xfffffe004ca2ccf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe004ca2ccf0

Panic:

spin lock 0xffffffff836e0f00 (smp rendezvous) held by 0xfffff80006c77740 (tid 100645) too long
panic: spin lock held too long
cpuid = 1
time = 1673039279
KDB: enter: panic
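
The "time =" value in a panic message is a Unix epoch timestamp, which is handy for matching a guest crash against the Proxmox host logs. A minimal sketch of the conversion, assuming a Bourne-compatible shell; the function name epoch_to_utc is illustrative and not part of pfSense or Proxmox:

```shell
#!/bin/sh
# Convert a Unix epoch value (the "time = ..." line of a panic) to UTC.
# BSD date (FreeBSD, i.e. the pfSense guest) takes the seconds with -r,
# GNU date (Linux, i.e. the Proxmox host) takes them with -d @seconds.
epoch_to_utc() {
    fmt='+%Y-%m-%d %H:%M:%S UTC'
    case "$(uname -s)" in
        FreeBSD|Darwin) date -u -r "$1" "$fmt" ;;
        *)              date -u -d "@$1" "$fmt" ;;
    esac
}

epoch_to_utc 1673039279   # the spin lock panic in this post
epoch_to_utc 1672654637   # the page fault panic posted later in the thread
```

The two calls print 2023-01-06 21:07:59 UTC and 2023-01-02 10:17:17 UTC respectively.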

How do you have the VM configured in Proxmox? I have numerous pfSense installs running in Proxmox and never see an issue.

Steve

AdriftAtlas

You're in good company:
VM freezes irregularly

TL;DR: VMs running on a Proxmox host with a Jasper Lake CPU crash. The host normally stays up and LXC containers are not affected. The issue appears to be related to C-states.

Try running the following in the pfSense shell and see if uptime improves. These values will reset to their defaults on the next reboot.

sysctl machdep.idle_mwait=0
sysctl machdep.idle=hlt
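
Since the change only lasts until the next reboot, it may be worth noting the current values first so they can be put back by hand without rebooting. A minimal sketch, assuming a Bourne-compatible shell (start sh first if your login shell is not one); the function name show_idle_settings is illustrative, and the only OIDs used are the two from the commands above:

```shell
# Print the current idle-related settings as name=value lines.
# "sysctl -n" prints only the value of the given OID.
show_idle_settings() {
    for oid in machdep.idle machdep.idle_mwait; do
        printf '%s=%s\n' "$oid" "$(sysctl -n "$oid")"
    done
}
```

Run show_idle_settings once before the two sysctl commands and once after to confirm that the new values took effect.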

So far my latest uptime is 4 days. Previous ones were nearly 3 days and less than a day.

Mine:

kernel trap 12 with interrupts disabled


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address    = 0x1008e
fault code        = supervisor write data, page not present
instruction pointer    = 0x20:0xffffffff80da2d71
stack pointer            = 0x28:0xfffffe0025782b00
frame pointer            = 0x28:0xfffffe0025782b60
code segment        = base 0x0, limit 0xfffff, type 0x1b
            = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags    = resume, IOPL = 0
current process        = 11 (idle: cpu0)
trap number        = 12
panic: page fault
cpuid = 0
time = 1672654637
KDB: enter: panic

db:0:kdb.enter.default>  bt

Tracing pid 11 tid 100003 td 0xfffff8000520d000
kdb_enter() at kdb_enter+0x37/frame 0xfffffe00257828c0
vpanic() at vpanic+0x194/frame 0xfffffe0025782910
panic() at panic+0x43/frame 0xfffffe0025782970
trap_fatal() at trap_fatal+0x38f/frame 0xfffffe00257829d0
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe0025782a30
calltrap() at calltrap+0x8/frame 0xfffffe0025782a30
--- trap 0xc, rip = 0xffffffff80da2d71, rsp = 0xfffffe0025782b00, rbp = 0xfffffe0025782b60 ---
callout_process() at callout_process+0x1b1/frame 0xfffffe0025782b60
handleevents() at handleevents+0x188/frame 0xfffffe0025782ba0
cpu_activeclock() at cpu_activeclock+0x70/frame 0xfffffe0025782bd0
cpu_idle() at cpu_idle+0xa8/frame 0xfffffe0025782bf0
sched_idletd() at sched_idletd+0x326/frame 0xfffffe0025782cb0
fork_exit() at fork_exit+0x7e/frame 0xfffffe0025782cf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0025782cf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---

db:0:kdb.enter.default>  alltrace

Tracing command sleep pid 35878 tid 100632 td 0xfffff80057237740
sched_switch() at sched_switch+0x606/frame 0xfffffe003671b9c0
mi_switch() at mi_switch+0xdb/frame 0xfffffe003671b9f0
sleepq_catch_signals() at sleepq_catch_signals+0x3f3/frame 0xfffffe003671ba40
sleepq_timedwait_sig() at sleepq_timedwait_sig+0x14/frame 0xfffffe003671ba80
_sleep() at _sleep+0x1c6/frame 0xfffffe003671bb00
kern_clock_nanosleep() at kern_clock_nanosleep+0x1c1/frame 0xfffffe003671bb80
sys_nanosleep() at sys_nanosleep+0x3b/frame 0xfffffe003671bbc0
amd64_syscall() at amd64_syscall+0x387/frame 0xfffffe003671bcf0
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe003671bcf0
--- syscall (240, FreeBSD ELF64, sys_nanosleep), rip = 0x80038c9fa, rsp = 0x7fffffffec18, rbp = 0x7fffffffec60 ---

Tracing command sh pid 15762 tid 100600 td 0xfffff80016b8e000
sched_switch() at sched_switch+0x606/frame 0xfffffe00366cb970
mi_switch() at mi_switch+0xdb/frame 0xfffffe00366cb9a0
sleepq_catch_signals() at sleepq_catch_signals+0x3f3/frame 0xfffffe00366cb9f0
sleepq_wait_sig() at sleepq_wait_sig+0xf/frame 0xfffffe00366cba20
_sleep() at _sleep+0x1f1/frame 0xfffffe00366cbaa0
pipe_read() at pipe_read+0x3fe/frame 0xfffffe00366cbb10
dofileread() at dofileread+0x95/frame 0xfffffe00366cbb50
sys_read() at sys_read+0xc0/frame 0xfffffe00366cbbc0
amd64_syscall() at amd64_syscall+0x387/frame 0xfffffe00366cbcf0
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe00366cbcf0
--- syscall (3, FreeBSD ELF64, sys_read), rip = 0x80044f03a, rsp = 0x7fffffffe3d8, rbp = 0x7fffffffe900 ---

Tracing command sh pid 15703 tid 100633 td 0xfffff80057237000
sched_switch() at sched_switch+0x606/frame 0xfffffe0036720800
mi_switch() at mi_switch+0xdb/frame 0xfffffe0036720830
sleepq_catch_signals() at sleepq_catch_signals+0x3f3/frame 0xfffffe0036720880
sleepq_wait_sig() at sleepq_wait_sig+0xf/frame 0xfffffe00367208b0
_sleep() at _sleep+0x1f1/frame 0xfffffe0036720930
kern_wait6() at kern_wait6+0x59e/frame 0xfffffe00367209c0
sys_wait4() at sys_wait4+0x7d/frame 0xfffffe0036720bc0
amd64_sy

userunix

A small update: I updated the Proxmox kernel to version 6.1 and it has now run for 4 days without a crash. Perhaps the problem is (or was) in Proxmox.

AdriftAtlas

The issue is likely in the Linux kernel, QEMU, and/or KVM. My guess is that the VM guest makes a CPU power management call of some sort that is not properly virtualized, which results in the guest panicking.

My uptime is now 6 days and 6 hours. I am running the stock kernel and the newest microcode. It was crashing with this configuration until I ran the two sysctl commands mentioned earlier. I'll let it run for a few more days and then try adding them to System Tunables so they are applied on reboot.

AdriftAtlas

The issue is fixed with microcode revision 0x24000024:
https://forum.proxmox.com/threads/vm-freezes-irregularly.111494/post-538922
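
To see which microcode revision a Proxmox host is running, the "microcode" line in /proc/cpuinfo on the Linux host can be compared with the revision above. A rough sketch, assuming a Bourne-compatible shell on the host; the function names are illustrative, and revisions are only comparable between CPUs of the same model (Jasper Lake in this thread):

```shell
# Print the first "microcode" revision found in a cpuinfo-style file.
# Defaults to /proc/cpuinfo when no file is given.
microcode_rev() {
    awk -F': *' '/^microcode/ { print $2; exit }' "${1:-/proc/cpuinfo}"
}

# Succeed if the revision in the file ($2, optional) is at least $1.
# Both values are hexadecimal strings such as 0x24000024.
microcode_at_least() {
    rev=$(microcode_rev "$2")
    [ -n "$rev" ] && [ $(($rev)) -ge $(($1)) ]
}
```

For example, on the host: microcode_at_least 0x24000024 && echo "microcode is at or above the fixed revision". Run this on the Proxmox host, not inside the pfSense guest.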