pfSense crashes after plugging in cable (Mellanox ConnectX-3 card)
-
Hello,
I have an issue where pfSense panics and crashes when I connect a cable to my Mellanox ConnectX-3 card. I assume this has to do with pfSense wanting to change the mode of the interface to eth.
When I boot the firewall without any connections, it works fine but does not show any of the interfaces. If I then look up the config of the interface with
sysctl -a | grep mlx
I see that the mode is auto (ib)
and when I then try to manually change it to eth with
sysctl sys.device.mlx4_core0.mlx4_port1=eth
it crashes the same way,
as described in my other post: https://forum.netgate.com/topic/185479/mellanox-cards-change-from-infiniband-to-ethernet
Why could this happen? What can I do about it?
See the crash report below.
<118>pfSense 2.7.2-RELEASE amd64 20231206-2010
<118>Bootup complete
<6>mlx4_en mlx4_core0: Activating port:1
Fatal trap 12: page fault while in kernel mode
cpuid = 37; apic id = 53
fault virtual address = 0x0
fault code = supervisor read instruction, page not present
instruction pointer = 0x20:0x0
stack pointer = 0x28:0xfffffe0100768a48
frame pointer = 0x28:0xfffffe0100768a70
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 0 (mlx4)
rdi: fffff806c78c7000 rsi: fffffe0100768ac0 rdx: 00000000c0306938
rcx: 00000000c0306938 r8: 0000000000000000 r9: 0000000000000010
rax: 0000000000000000 rbx: fffffe0100768ac0 rbp: fffffe0100768a70
r10: 0000000000000000 r11: fffffe004783d000 r12: 0000000000008802
r13: fffff806c7307810 r14: fffffe016c00fac8 r15: 0000000000000016
trap number = 12
panic: page fault
cpuid = 37
time = 1704970470
KDB: enter: panic
Interestingly, after the crash it reboots itself and I can see one interface as mlxen0 and can assign it normally in the web GUI.
textdump.txt
Any help is appreciated.
-
Backtrace:
db:0:kdb.enter.default> bt
Tracing pid 0 tid 100690 td 0xfffffe003fba53a0
kdb_enter() at kdb_enter+0x32/frame 0xfffffe0100768720
vpanic() at vpanic+0x163/frame 0xfffffe0100768850
panic() at panic+0x43/frame 0xfffffe01007688b0
trap_fatal() at trap_fatal+0x40c/frame 0xfffffe0100768910
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe0100768970
calltrap() at calltrap+0x8/frame 0xfffffe0100768970
--- trap 0xc, rip = 0, rsp = 0xfffffe0100768a48, rbp = 0xfffffe0100768a70 ---
??() at 0/frame 0xfffffe0100768a70
dump_iface() at dump_iface+0x145/frame 0xfffffe0100768b20
rtnl_handle_ifevent() at rtnl_handle_ifevent+0xa9/frame 0xfffffe0100768ba0
if_attach_internal() at if_attach_internal+0x3cf/frame 0xfffffe0100768bf0
ether_ifattach() at ether_ifattach+0x2c/frame 0xfffffe0100768c30
mlx4_en_init_netdev() at mlx4_en_init_netdev+0xafb/frame 0xfffffe0100768cc0
mlx4_en_activate() at mlx4_en_activate+0x76/frame 0xfffffe0100768cf0
mlx4_add_device() at mlx4_add_device+0xe5/frame 0xfffffe0100768d30
mlx4_register_device() at mlx4_register_device+0xa8/frame 0xfffffe0100768d60
mlx4_change_port_types() at mlx4_change_port_types+0x15f/frame 0xfffffe0100768da0
mlx4_sense_port() at mlx4_sense_port+0x125/frame 0xfffffe0100768de0
linux_work_fn() at linux_work_fn+0xe4/frame 0xfffffe0100768e40
taskqueue_run_locked() at taskqueue_run_locked+0x182/frame 0xfffffe0100768ec0
taskqueue_thread_loop() at taskqueue_thread_loop+0xc2/frame 0xfffffe0100768ef0
fork_exit() at fork_exit+0x7f/frame 0xfffffe0100768f30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0100768f30
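To read the call chain in that trace at a glance, the frames can be reduced to bare function names. This is only a rough sketch: the helper name ddb_frames is made up for illustration, and it assumes each frame has the "func() at func+0x.../frame 0x..." shape seen above. It works whether the trace is pasted as one long line or one frame per line.

```shell
#!/bin/sh
# ddb_frames (hypothetical helper): read a DDB "bt" trace on stdin and
# print one function name per line, innermost frame first.
# Assumes frames look like "func() at func+0x32/frame 0x...".
ddb_frames() {
    # -o prints each match on its own line, so a one-line paste is split too.
    grep -oE '[A-Za-z0-9_?]+\(\) at [^ ]+' | sed 's/() at .*//'
}
```

Saving the trace to a file and running `ddb_frames < backtrace.txt` (file name is just an example) lists the frames in order. For the trace above, the `??` entry at address 0 sits directly under dump_iface, which is reached from rtnl_handle_ifevent during ether_ifattach.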
Do you know that the card itself is good?
Have you tried connecting to it with multiple SFP modules?
-
-
@stephenw10 I have 4 of these cards installed in this server, with 2 ports each. I have not tested each card individually, but none of them work. Could one defective card cause this behavior on all cards? If so, I might need to test them individually. However, if I plug all the ports of all the Mellanox cards into my switch, boot the switch so it is operational, and then boot the pfSense box, it starts with no errors and shows the interfaces even in the web GUI.
When I run
sysctl -a | grep mlx4_core
it shows auto (eth) this time, and I can set it to
sysctl sys.device.mlx4_core0.mlx4_port1=eth
manually without issues for each network interface.
I can even reboot without issues as long as all cables stay connected. But once I disconnect one of the cables and reboot, the disconnected interface is missing, and when I then plug that cable back in, the firewall crashes.
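With 4 cards and 2 ports each, checking the sensed type of every port by eye gets tedious. A minimal sketch of a read-only check follows. The function names mlx4_port_modes and mlx4_all_eth are made up for illustration, and the "name: value" line format is an assumption based on the sysctl output quoted in this thread. It only reads values and never writes the port type, so it should not trigger the mode change itself.

```shell
#!/bin/sh
# mlx4_port_modes (hypothetical helper): read `sysctl -a` style lines on
# stdin and print "<sysctl name> <sensed type>" for every mlx4 port, e.g.
#   sys.device.mlx4_core0.mlx4_port1: auto (ib)  ->  ...mlx4_port1 ib
# The input line format is an assumption, not taken from driver docs.
mlx4_port_modes() {
    awk -F': ' '/mlx4_core[0-9]+\.mlx4_port[0-9]+: / {
        mode = $2
        # "auto (ib)" -> "ib"; a plain "eth" or "ib" is left unchanged.
        if (match(mode, /\([a-z]+\)/))
            mode = substr(mode, RSTART + 1, RLENGTH - 2)
        print $1, mode
    }'
}

# mlx4_all_eth (hypothetical helper): exit status 0 only if no port on
# stdin reports something other than eth. Empty input also returns 0.
mlx4_all_eth() {
    ! mlx4_port_modes | grep -qv ' eth$'
}
```

Typical use would be `sysctl -a | mlx4_port_modes` to list the ports, or `sysctl -a | mlx4_all_eth || echo "some port is not eth"` before plugging a cable back in.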
-
I'm not sure there's anything we can do about that in pfSense unfortunately.
Are you able to test FreeBSD 14 directly?
-
@stephenw10 Okay, that's not such good news. I am currently downloading an ISO to test with FreeBSD directly. I will update this post once I know more.
-