Server crashes under a large number of requests

HrustakV

Hi, my firewall server often crashes when there is a large number of requests. I use Intel NICs (SFP+) and the newest snapshot of pfSense.

I think this is the error:

Fatal trap 12: page fault while in kernel mode
cpuid = 7; apic id = 26
fault virtual address    = 0x30
fault code        = supervisor read data, page not present
instruction pointer    = 0x20:0xffffffff808a566e
stack pointer            = 0x28:0xfffffe0000521750
frame pointer            = 0x28:0xfffffe00005217a0
code segment        = base 0x0, limit 0xfffff, type 0x1b
            = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags    = interrupt enabled, resume, IOPL = 0
current process        = 0 (if_io_tqg_7)
trap number        = 12
panic: page fault
cpuid = 7
time = 1655409054
KDB: enter: panic

Thanks for any help.

stephenw10

We need to see the backtrace from the crash report at least. There is often useful data in the message buffer too.
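If the crash report is available as a tar archive, something like the following could pull out the two parts asked for here. This is only a sketch: it assumes the archive follows the FreeBSD textdump layout, with the debugger output in a member named ddb.txt and the message buffer in msgbuf.txt. The function name read_textdump_members is made up for illustration.

```python
import io
import posixpath
import tarfile


def read_textdump_members(fileobj, wanted=("ddb.txt", "msgbuf.txt")):
    """Return {member name: decoded text} for the wanted files in a tar archive.

    Assumption: the archive uses the FreeBSD textdump layout, where the
    backtrace lives in ddb.txt and the kernel message buffer in msgbuf.txt.
    """
    found = {}
    with tarfile.open(fileobj=fileobj, mode="r") as archive:
        for member in archive.getmembers():
            name = posixpath.basename(member.name)
            if member.isfile() and name in wanted:
                data = archive.extractfile(member).read()
                # Crash dumps can contain stray bytes, so decode leniently.
                found[name] = data.decode("utf-8", errors="replace")
    return found
```

Called with the archive opened in binary mode, it returns a dictionary whose ddb.txt entry should contain the bt output.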

Do you mean the latest development snapshot (2.7.0)?

Steve

HrustakV

@stephenw10 Okay, https://pastebin.com/B3mJvJUD. And yes, 2.7.0. Thanks.

stephenw10

Mmm, OK the backtrace shows:

db:0:kdb.enter.default>  bt
Tracing pid 0 tid 100045 td 0xfffff802300b7000
kdb_enter() at kdb_enter+0x37/frame 0xfffffe0000521510
vpanic() at vpanic+0x194/frame 0xfffffe0000521560
panic() at panic+0x43/frame 0xfffffe00005215c0
trap_fatal() at trap_fatal+0x38f/frame 0xfffffe0000521620
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe0000521680
calltrap() at calltrap+0x8/frame 0xfffffe0000521680
--- trap 0xc, rip = 0xffffffff808a566e, rsp = 0xfffffe0000521750, rbp = 0xfffffe00005217a0 ---
oce_multiq_transmit() at oce_multiq_transmit+0x1ee/frame 0xfffffe00005217a0
oce_multiq_start() at oce_multiq_start+0x76/frame 0xfffffe00005217d0
bridge_enqueue() at bridge_enqueue+0xaa/frame 0xfffffe0000521810
bridge_forward() at bridge_forward+0x3ef/frame 0xfffffe0000521870
bridge_input() at bridge_input+0x4c2/frame 0xfffffe00005218f0
ether_nh_input() at ether_nh_input+0x1ff/frame 0xfffffe0000521950
netisr_dispatch_src() at netisr_dispatch_src+0xb9/frame 0xfffffe00005219a0
ether_input() at ether_input+0x89/frame 0xfffffe0000521a00
iflib_rxeof() at iflib_rxeof+0xaa6/frame 0xfffffe0000521ae0
_task_fn_rx() at _task_fn_rx+0x72/frame 0xfffffe0000521b20
gtaskqueue_run_locked() at gtaskqueue_run_locked+0x121/frame 0xfffffe0000521b80
gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0xd2/frame 0xfffffe0000521bb0
fork_exit() at fork_exit+0x7e/frame 0xfffffe0000521bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0000521bf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---

You have two unusual things there: bridges and oce(4) NICs. And it looks like they are combined.
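For anyone reading a trace like this for the first time: the frames listed directly under the first "--- trap" line are the code that was running when the fault hit, innermost first. A rough sketch of picking those out automatically is below. The helper name frames_after_trap and the regular expression are illustrative assumptions based only on the line format shown above, not a general ddb parser.

```python
import re

# Matches lines such as:
#   oce_multiq_transmit() at oce_multiq_transmit+0x1ee/frame 0xfffffe00005217a0
FRAME_RE = re.compile(r"^(\w+)\(\) at \w+\+0x[0-9a-f]+/frame 0x[0-9a-f]+$")


def frames_after_trap(backtrace):
    """Return function names between the first and second '--- trap' markers."""
    names = []
    seen_trap = False
    for line in backtrace.splitlines():
        line = line.strip()
        if line.startswith("--- trap"):
            if seen_trap:
                break  # second marker: end of the faulting thread's frames
            seen_trap = True
            continue
        if seen_trap:
            match = FRAME_RE.match(line)
            if match:
                names.append(match.group(1))
    return names


# Excerpt copied from the backtrace in this thread.
SAMPLE = """\
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe0000521680
calltrap() at calltrap+0x8/frame 0xfffffe0000521680
--- trap 0xc, rip = 0xffffffff808a566e, rsp = 0xfffffe0000521750, rbp = 0xfffffe00005217a0 ---
oce_multiq_transmit() at oce_multiq_transmit+0x1ee/frame 0xfffffe00005217a0
oce_multiq_start() at oce_multiq_start+0x76/frame 0xfffffe00005217d0
bridge_enqueue() at bridge_enqueue+0xaa/frame 0xfffffe0000521810
"""

print(frames_after_trap(SAMPLE))
# ['oce_multiq_transmit', 'oce_multiq_start', 'bridge_enqueue']
```

The first name is the function that faulted (the oce driver's transmit path), and the names after it show it was called from the bridge code.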

oce NICs are known to give trouble. There is quite an extensive thread about it.

Has this ever worked reliably? Did it start to fail when you upgraded to 2.7?

Steve

HrustakV

@stephenw10 Hm, I have interfaces created over the NICs, connected with a bridge, and LACP on the WAN ports (SFP+). The same problem occurs on 2.6.0.

stephenw10

The easiest way to fix that is going to be to swap out the Emulex card for something else. We've spent time trying to troubleshoot these before and never found a solution. For example:
https://forum.netgate.com/topic/168212/panic-string-bpf_mcopy

Steve