pfSense kernel panic
-
I have been running pfSense on Proxmox for a week, and almost every day it crashes, but I have no idea what causes it. In the logs I see "panic: spin lock held too long". Can you help me?
I only have the logs from the last crash:
Tell me if you need something else.
-
Backtrace:
db:0:kdb.enter.default> bt
Tracing pid 23 tid 100135 td 0xfffff8000626a740
kdb_enter() at kdb_enter+0x37/frame 0xfffffe004ca2c630
vpanic() at vpanic+0x197/frame 0xfffffe004ca2c680
panic() at panic+0x43/frame 0xfffffe004ca2c6e0
_mtx_lock_indefinite_check() at _mtx_lock_indefinite_check+0x69/frame 0xfffffe004ca2c6f0
_mtx_lock_spin_cookie() at _mtx_lock_spin_cookie+0xd5/frame 0xfffffe004ca2c760
smp_targeted_tlb_shootdown() at smp_targeted_tlb_shootdown+0x484/frame 0xfffffe004ca2c7e0
pmap_ts_referenced() at pmap_ts_referenced+0xa90/frame 0xfffffe004ca2c8b0
vm_pageout_worker() at vm_pageout_worker+0xf88/frame 0xfffffe004ca2cc70
vm_pageout() at vm_pageout+0x193/frame 0xfffffe004ca2ccb0
fork_exit() at fork_exit+0x7e/frame 0xfffffe004ca2ccf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe004ca2ccf0
Panic:
spin lock 0xffffffff836e0f00 (smp rendezvous) held by 0xfffff80006c77740 (tid 100645) too long
panic: spin lock held too long
cpuid = 1
time = 1673039279
KDB: enter: panic
-
How do you have the VM configured in Proxmox? I have numerous pfSense installs running in Proxmox and never see an issue.
Steve
-
You're in good company:
VM freezes irregularly
TL;DR: VMs running on a Jasper Lake CPU Proxmox host crash. The host normally stays up and LXC containers are not affected. Issue appears to be related to C-States.
Try running the following in pfSense shell and see if uptime improves. These values will reset to defaults on next reboot.
sysctl machdep.idle_mwait=0
sysctl machdep.idle=hlt
So far my latest uptime is 4 days. Previous ones were nearly 3 days and less than a day.
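Since these values reset on reboot, it can help to confirm they are actually in effect before drawing conclusions from the uptime. A minimal sketch, assuming FreeBSD's usual `name: value` sysctl output; `check_idle` is a hypothetical helper name, not something pfSense ships:

```shell
# check_idle: hypothetical helper, not part of pfSense.
# Reads FreeBSD `sysctl` output ("name: value" lines) on stdin and reports
# whether both settings from the workaround above are in effect.
check_idle() {
  awk -F': ' '
    $1 == "machdep.idle_mwait" { mwait = $2 }
    $1 == "machdep.idle"       { idle = $2 }
    END {
      if (mwait == "0" && idle == "hlt") { print "workaround active" }
      else { print "workaround not active"; exit 1 }
    }
  '
}

# On the pfSense shell:
#   sysctl machdep.idle_mwait machdep.idle | check_idle
```

If it reports "workaround not active" after a reboot, the two sysctl commands need to be run again.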
Mine:
kernel trap 12 with interrupts disabled
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x1008e
fault code = supervisor write data, page not present
instruction pointer = 0x20:0xffffffff80da2d71
stack pointer = 0x28:0xfffffe0025782b00
frame pointer = 0x28:0xfffffe0025782b60
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = resume, IOPL = 0
current process = 11 (idle: cpu0)
trap number = 12
panic: page fault
cpuid = 0
time = 1672654637
KDB: enter: panic
db:0:kdb.enter.default> bt
Tracing pid 11 tid 100003 td 0xfffff8000520d000
kdb_enter() at kdb_enter+0x37/frame 0xfffffe00257828c0
vpanic() at vpanic+0x194/frame 0xfffffe0025782910
panic() at panic+0x43/frame 0xfffffe0025782970
trap_fatal() at trap_fatal+0x38f/frame 0xfffffe00257829d0
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe0025782a30
calltrap() at calltrap+0x8/frame 0xfffffe0025782a30
--- trap 0xc, rip = 0xffffffff80da2d71, rsp = 0xfffffe0025782b00, rbp = 0xfffffe0025782b60 ---
callout_process() at callout_process+0x1b1/frame 0xfffffe0025782b60
handleevents() at handleevents+0x188/frame 0xfffffe0025782ba0
cpu_activeclock() at cpu_activeclock+0x70/frame 0xfffffe0025782bd0
cpu_idle() at cpu_idle+0xa8/frame 0xfffffe0025782bf0
sched_idletd() at sched_idletd+0x326/frame 0xfffffe0025782cb0
fork_exit() at fork_exit+0x7e/frame 0xfffffe0025782cf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0025782cf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
db:0:kdb.enter.default> alltrace
Tracing command sleep pid 35878 tid 100632 td 0xfffff80057237740
sched_switch() at sched_switch+0x606/frame 0xfffffe003671b9c0
mi_switch() at mi_switch+0xdb/frame 0xfffffe003671b9f0
sleepq_catch_signals() at sleepq_catch_signals+0x3f3/frame 0xfffffe003671ba40
sleepq_timedwait_sig() at sleepq_timedwait_sig+0x14/frame 0xfffffe003671ba80
_sleep() at _sleep+0x1c6/frame 0xfffffe003671bb00
kern_clock_nanosleep() at kern_clock_nanosleep+0x1c1/frame 0xfffffe003671bb80
sys_nanosleep() at sys_nanosleep+0x3b/frame 0xfffffe003671bbc0
amd64_syscall() at amd64_syscall+0x387/frame 0xfffffe003671bcf0
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe003671bcf0
--- syscall (240, FreeBSD ELF64, sys_nanosleep), rip = 0x80038c9fa, rsp = 0x7fffffffec18, rbp = 0x7fffffffec60 ---
Tracing command sh pid 15762 tid 100600 td 0xfffff80016b8e000
sched_switch() at sched_switch+0x606/frame 0xfffffe00366cb970
mi_switch() at mi_switch+0xdb/frame 0xfffffe00366cb9a0
sleepq_catch_signals() at sleepq_catch_signals+0x3f3/frame 0xfffffe00366cb9f0
sleepq_wait_sig() at sleepq_wait_sig+0xf/frame 0xfffffe00366cba20
_sleep() at _sleep+0x1f1/frame 0xfffffe00366cbaa0
pipe_read() at pipe_read+0x3fe/frame 0xfffffe00366cbb10
dofileread() at dofileread+0x95/frame 0xfffffe00366cbb50
sys_read() at sys_read+0xc0/frame 0xfffffe00366cbbc0
amd64_syscall() at amd64_syscall+0x387/frame 0xfffffe00366cbcf0
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe00366cbcf0
--- syscall (3, FreeBSD ELF64, sys_read), rip = 0x80044f03a, rsp = 0x7fffffffe3d8, rbp = 0x7fffffffe900 ---
Tracing command sh pid 15703 tid 100633 td 0xfffff80057237000
sched_switch() at sched_switch+0x606/frame 0xfffffe0036720800
mi_switch() at mi_switch+0xdb/frame 0xfffffe0036720830
sleepq_catch_signals() at sleepq_catch_signals+0x3f3/frame 0xfffffe0036720880
sleepq_wait_sig() at sleepq_wait_sig+0xf/frame 0xfffffe00367208b0
_sleep() at _sleep+0x1f1/frame 0xfffffe0036720930
kern_wait6() at kern_wait6+0x59e/frame 0xfffffe00367209c0
sys_wait4() at sys_wait4+0x7d/frame 0xfffffe0036720bc0
amd64_sy
-
A small update: I updated the Proxmox kernel to version 6.1, and it has now run for 4 days without a crash. Perhaps the problem is (or was) Proxmox.
-
The issue is likely in the Linux kernel, QEMU, and/or KVM. Most likely the VM guest makes a CPU power management call of some sort that is not properly virtualized, and this results in a guest panic.
My uptime is 6 days and 6 hours. I am running the stock kernel and the newest microcode. It was crashing with this config until I ran the two sysctl commands mentioned earlier. I'll let it run for a few more days and try to put them into system tunables so they activate on reboot.
-
The issue is fixed with 0x24000024 microcode:
https://forum.proxmox.com/threads/vm-freezes-irregularly.111494/post-538922
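
To see which microcode revision the Proxmox host is actually running, the `microcode` field in `/proc/cpuinfo` can be compared against the revision quoted above. A rough sketch: `microcode_ok` is a hypothetical helper name, and treating any revision at or above 0x24000024 as fixed is an assumption that only makes sense for the CPU family discussed in the linked thread:

```shell
# microcode_ok: hypothetical helper for the Proxmox (Linux) host.
# Reads /proc/cpuinfo-style text on stdin and succeeds if the first reported
# microcode revision is at least the one quoted in this thread.
microcode_ok() {
  wanted=0x24000024   # revision from the post above; assumed minimum, same CPU family only
  have=$(awk -F': *' '/^microcode/ { print $2; exit }')
  [ -n "$have" ] && [ $(( $have )) -ge $(( $wanted )) ]
}

# On the Proxmox host:
#   microcode_ok < /proc/cpuinfo && echo "microcode at or above 0x24000024"
```

A reboot of the host is normally needed before a newly installed microcode package shows up here.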