Server crashes under a large number of requests
-
Hi, my firewall server often crashes when there is a large number of requests. I use Intel NICs (SFP+) and the newest snapshot of pfSense.
I think this is the error:
Fatal trap 12: page fault while in kernel mode
cpuid = 7; apic id = 26
fault virtual address = 0x30
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff808a566e
stack pointer = 0x28:0xfffffe0000521750
frame pointer = 0x28:0xfffffe00005217a0
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 0 (if_io_tqg_7)
trap number = 12
panic: page fault
cpuid = 7
time = 1655409054
KDB: enter: panic
Thanks for any help.
-
We need to see the backtrace from the crash report at least. There is often useful data in the message buffer too.
Do you mean the latest development snapshot (2.7.0)?
Steve
-
@stephenw10 Okay, here it is: https://pastebin.com/B3mJvJUD
And yes, it is 2.7.0. Thanks.
-
Mmm, OK the backtrace shows:
db:0:kdb.enter.default> bt
Tracing pid 0 tid 100045 td 0xfffff802300b7000
kdb_enter() at kdb_enter+0x37/frame 0xfffffe0000521510
vpanic() at vpanic+0x194/frame 0xfffffe0000521560
panic() at panic+0x43/frame 0xfffffe00005215c0
trap_fatal() at trap_fatal+0x38f/frame 0xfffffe0000521620
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe0000521680
calltrap() at calltrap+0x8/frame 0xfffffe0000521680
--- trap 0xc, rip = 0xffffffff808a566e, rsp = 0xfffffe0000521750, rbp = 0xfffffe00005217a0 ---
oce_multiq_transmit() at oce_multiq_transmit+0x1ee/frame 0xfffffe00005217a0
oce_multiq_start() at oce_multiq_start+0x76/frame 0xfffffe00005217d0
bridge_enqueue() at bridge_enqueue+0xaa/frame 0xfffffe0000521810
bridge_forward() at bridge_forward+0x3ef/frame 0xfffffe0000521870
bridge_input() at bridge_input+0x4c2/frame 0xfffffe00005218f0
ether_nh_input() at ether_nh_input+0x1ff/frame 0xfffffe0000521950
netisr_dispatch_src() at netisr_dispatch_src+0xb9/frame 0xfffffe00005219a0
ether_input() at ether_input+0x89/frame 0xfffffe0000521a00
iflib_rxeof() at iflib_rxeof+0xaa6/frame 0xfffffe0000521ae0
_task_fn_rx() at _task_fn_rx+0x72/frame 0xfffffe0000521b20
gtaskqueue_run_locked() at gtaskqueue_run_locked+0x121/frame 0xfffffe0000521b80
gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0xd2/frame 0xfffffe0000521bb0
fork_exit() at fork_exit+0x7e/frame 0xfffffe0000521bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0000521bf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
You have two unusual things there: bridges and oce(4) NICs. And it looks like they are combined: the fault is in oce_multiq_transmit(), which is reached from bridge_enqueue().
oce NICs are known to give trouble. There is quite an extensive thread about it.
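If you want to pull the interesting frames out of a long crash report yourself, something like the following rough Python sketch could help. It only assumes the ddb frame format seen in this backtrace ("name() at name+0xNN/frame 0x..."); the function name frames_after_trap and the regular expressions are made up for illustration and are not part of pfSense or FreeBSD.

```python
import re

# One ddb frame, e.g. "bridge_enqueue() at bridge_enqueue+0xaa/frame 0xfffffe0000521810"
FRAME_RE = re.compile(r"(\w+)\(\) at \w+\+0x[0-9a-f]+/frame 0x[0-9a-f]+")
# The trap marker, e.g. "--- trap 0xc, rip = ..."
TRAP_RE = re.compile(r"--- trap (0x[0-9a-f]+|\d+),")


def frames_after_trap(backtrace: str) -> list[str]:
    """Return function names listed after the first trap marker, innermost first.

    If no trap marker is found, every frame in the text is returned.
    """
    marker = TRAP_RE.search(backtrace)
    start = marker.end() if marker else 0
    return [m.group(1) for m in FRAME_RE.finditer(backtrace, start)]


# Shortened excerpt of the backtrace posted in this thread.
sample = (
    "trap_pfault() at trap_pfault+0x4f/frame 0xfffffe0000521680 "
    "calltrap() at calltrap+0x8/frame 0xfffffe0000521680 "
    "--- trap 0xc, rip = 0xffffffff808a566e, rsp = 0xfffffe0000521750, "
    "rbp = 0xfffffe00005217a0 --- "
    "oce_multiq_transmit() at oce_multiq_transmit+0x1ee/frame 0xfffffe00005217a0 "
    "oce_multiq_start() at oce_multiq_start+0x76/frame 0xfffffe00005217d0 "
    "bridge_enqueue() at bridge_enqueue+0xaa/frame 0xfffffe0000521810 "
)

print(frames_after_trap(sample))
```

The first name it returns is the function that was running when the page fault hit, and the ones after it are its callers, which is how the oce and bridge combination shows up here.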
Has this ever worked reliably? Did it start to fail when you upgraded to 2.7?
Steve
-
@stephenw10 Hm, I have interfaces created on top of the NICs and connected with a bridge, plus LACP on the WAN ports (SFP+). The same problem occurs on 2.6.0.
-
The easiest way to fix that is going to be to swap out the Emulex card for something else. We've spent time trying to troubleshoot these before and never found a solution. For example:
https://forum.netgate.com/topic/168212/panic-string-bpf_mcopy
Steve