Random crash
-
Hi, Everyone.
May I ask for your expertise and experience in troubleshooting a kernel panic? I experienced this kernel panic while looking at the Suricata alert logs, so I thought Suricata might be the cause. I uninstalled it and replaced it with Snort, but the issue persists, though not as frequently as it did with Suricata.
I have also attached the textdump.tar.0 file. I hope you can help point out what may have caused this crash.
Below is the fatal trap that was captured:
Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 02
fault virtual address = 0x28
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80ec01fe
stack pointer = 0x28:0xfffffe003f946a90
frame pointer = 0x28:0xfffffe003f946ac0
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 12 (swi4: clock (0))
trap number = 12
panic: page fault
cpuid = 1
time = 1628755983
KDB: enter: panic
-
The key parts of that are:
db:0:kdb.enter.default> show pcpu
cpuid = 1
dynamic pcpu = 0xfffffe007f102380
curthread = 0xfffff800043c4000: pid 12 tid 100035 "swi4: clock (0)"
curpcb = 0xfffff800043c45a0
fpcurthread = none
idlethread = 0xfffff8000432a740: tid 100004 "idle: cpu1"
curpmap = 0xffffffff8368d5a8
tssp = 0xffffffff83717688
commontssp = 0xffffffff83717688
rsp0 = 0xfffffe003f946e00
kcr3 = 0x3d0b000
ucr3 = 0xffffffffffffffff
scr3 = 0x223630000
gs32p = 0xffffffff8371dea0
ldt = 0xffffffff8371dee0
tss = 0xffffffff8371ded0
tlb gen = 1389740
curvnet = 0
db:0:kdb.enter.default> bt
Tracing pid 12 tid 100035 td 0xfffff800043c4000
kdb_enter() at kdb_enter+0x37/frame 0xfffffe003f946750
vpanic() at vpanic+0x197/frame 0xfffffe003f9467a0
panic() at panic+0x43/frame 0xfffffe003f946800
trap_fatal() at trap_fatal+0x391/frame 0xfffffe003f946860
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe003f9468b0
trap() at trap+0x286/frame 0xfffffe003f9469c0
calltrap() at calltrap+0x8/frame 0xfffffe003f9469c0
--- trap 0xc, rip = 0xffffffff80ec01fe, rsp = 0xfffffe003f946a90, rbp = 0xfffffe003f946ac0 ---
ether_8021q_frame() at ether_8021q_frame+0x2e/frame 0xfffffe003f946ac0
vlan_transmit() at vlan_transmit+0xc8/frame 0xfffffe003f946b30
vlan_altq_start() at vlan_altq_start+0xb4/frame 0xfffffe003f946b60
cbqrestart() at cbqrestart+0x64/frame 0xfffffe003f946b90
rmc_restart() at rmc_restart+0x6f/frame 0xfffffe003f946bc0
softclock_call_cc() at softclock_call_cc+0x141/frame 0xfffffe003f946c70
softclock() at softclock+0x79/frame 0xfffffe003f946c90
ithread_loop() at ithread_loop+0x23c/frame 0xfffffe003f946cf0
fork_exit() at fork_exit+0x7e/frame 0xfffffe003f946d30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe003f946d30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
This looks like a possible traffic shaping issue, though, because your logs are filled with:
config_aqm Unable to configure flowset, flowset busy!
config_aqm Unable to configure flowset, flowset busy!
config_aqm Unable to configure flowset, flowset busy!
config_aqm Unable to configure flowset, flowset busy!
config_aqm Unable to configure flowset, flowset busy!
Though previously that error has been harmless: https://redmine.pfsense.org/issues/8991
The backtrace shows it's VLAN related. If this has only just started happening, did you perhaps add a VLAN interface?
Steve
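For anyone reading a trace like the one above: the frames listed after the "--- trap 0xc ..." marker are the code that was running when the fault hit, and the first of them is the faulting function. Here is a minimal Python sketch that pulls those frames out of a pasted ddb backtrace. The helper name frames_after_trap and the regular expressions are my own illustration, not part of any pfSense or FreeBSD tool, and the sample is a shortened copy of the trace in this thread.

```python
import re

# Matches one ddb frame such as:
#   vlan_transmit() at vlan_transmit+0xc8/frame 0xfffffe003f946b30
FRAME_RE = re.compile(r"(\w+)\(\) at \w+\+0x[0-9a-f]+/frame 0x[0-9a-f]+")

# Matches the marker ddb prints where the trap interrupted normal execution.
TRAP_RE = re.compile(r"--- trap 0x[0-9a-f]+, rip = 0x[0-9a-f]+")


def frames_after_trap(backtrace):
    """Return function names listed after the first trap marker.

    The first entry is the function that faulted; later entries are its
    callers. Returns an empty list if no trap marker is found.
    """
    match = TRAP_RE.search(backtrace)
    if match is None:
        return []
    return FRAME_RE.findall(backtrace[match.end():])


# Shortened copy of the backtrace posted in this thread.
SAMPLE = """
trap() at trap+0x286/frame 0xfffffe003f9469c0
calltrap() at calltrap+0x8/frame 0xfffffe003f9469c0
--- trap 0xc, rip = 0xffffffff80ec01fe, rsp = 0xfffffe003f946a90, rbp = 0xfffffe003f946ac0 ---
ether_8021q_frame() at ether_8021q_frame+0x2e/frame 0xfffffe003f946ac0
vlan_transmit() at vlan_transmit+0xc8/frame 0xfffffe003f946b30
vlan_altq_start() at vlan_altq_start+0xb4/frame 0xfffffe003f946b60
cbqrestart() at cbqrestart+0x64/frame 0xfffffe003f946b90
"""

if __name__ == "__main__":
    for name in frames_after_trap(SAMPLE):
        print(name)
```

On the sample, the first name printed is ether_8021q_frame, followed by the vlan_* and cbqrestart callers, which is why the trace points at VLAN plus the CBQ shaper.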
-
Thank you so much for your response; you may be correct. I did add a VLAN, but that was a while ago and I had no issues at the time. Come to think of it, this issue started when I recreated the QoS configuration, switching from PRIQ to CBQ, and included this new VLAN in the list.
Can you suggest the best approach here? Should I recreate the configuration, or just edit the existing one and change it back to PRIQ?
-
I would try one thing at a time to isolate it.
So maybe remove the queues from the VLAN first. If that doesn't help, then switch back to PRIQ as a test.
Steve
-
Thanks for the suggestion. I will follow it and see whether that fixes the issue. I'll post an update here after a week.
-
As of this moment I have not seen any crash. Thanks for the help. I will leave that VLAN out of QoS for now.
-
Looks like you're hitting this: https://redmine.pfsense.org/issues/11470
-
Thanks for that info. I have created an account and added the crash dump to the Redmine ticket. I hope I have provided the right information so they can fix the issue.