pfsense reboot randomly on vmware
-
Do you have a full crash report from that?
-
@stephenw10 see attachment dump.txt
-
Backtrace:
db:0:kdb.enter.default> bt
Tracing pid 11 tid 100006 td 0xfffffe00c5876e40
kdb_enter() at kdb_enter+0x32/frame 0xfffffe00c57a6880
vpanic() at vpanic+0x163/frame 0xfffffe00c57a69b0
panic() at panic+0x43/frame 0xfffffe00c57a6a10
trap_fatal() at trap_fatal+0x40c/frame 0xfffffe00c57a6a70
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00c57a6ad0
calltrap() at calltrap+0x8/frame 0xfffffe00c57a6ad0
--- trap 0xc, rip = 0xffffffff80af1d90, rsp = 0xfffffe00c57a6ba0, rbp = 0xfffffe00c57a6ba0 ---
vmxnet3_isc_txd_credits_update() at vmxnet3_isc_txd_credits_update+0x20/frame 0xfffffe00c57a6ba0
iflib_fast_intr_rxtx() at iflib_fast_intr_rxtx+0xf7/frame 0xfffffe00c57a6c00
intr_event_handle() at intr_event_handle+0x126/frame 0xfffffe00c57a6c70
intr_execute_handlers() at intr_execute_handlers+0x49/frame 0xfffffe00c57a6ca0
Xapic_isr1() at Xapic_isr1+0xdc/frame 0xfffffe00c57a6ca0
--- interrupt, rip = 0xffffffff81255c76, rsp = 0xfffffe00c57a6d70, rbp = 0xfffffe00c57a6d70 ---
acpi_cpu_c1() at acpi_cpu_c1+0x6/frame 0xfffffe00c57a6d70
acpi_cpu_idle() at acpi_cpu_idle+0x2fe/frame 0xfffffe00c57a6db0
cpu_idle_acpi() at cpu_idle_acpi+0x46/frame 0xfffffe00c57a6dd0
cpu_idle() at cpu_idle+0x9d/frame 0xfffffe00c57a6df0
sched_idletd() at sched_idletd+0x576/frame 0xfffffe00c57a6ef0
fork_exit() at fork_exit+0x7f/frame 0xfffffe00c57a6f30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00c57a6f30
Panic 1:
Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 03
fault virtual address = 0xfffffe00c5e00008
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80af1d90
stack pointer = 0x28:0xfffffe00c57a6ba0
frame pointer = 0x28:0xfffffe00c57a6ba0
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = resume, IOPL = 0
current process = 11 (idle: cpu3)
rdi: fffff80005ebf800 rsi: 0000000000000240 rdx: 0000000000000000
rcx: 0000000000000000 r8: 0000000000002000 r9: fffffe001e98f000
rax: fffffe00c5dfe000 rbx: fffff80005db6800 rbp: fffffe00c57a6ba0
r10: fffffe001e98fa30 r11: 0000000000000001 r12: 0000000000000003
r13: 0000000000001ec0 r14: fffffe00c59bb000 r15: 0000000000000000
trap number = 12
panic: page fault
cpuid = 3
time = 1701779519
KDB: enter: panic
Panic 2:
Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 03
fault virtual address = 0xfffffe00c5e00008
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80af1d90
stack pointer = 0x28:0xfffffe0120f61c90
frame pointer = 0x28:0xfffffe0120f61c90
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = resume, IOPL = 0
current process = 0 (wg_tqg_3)
rdi: fffff80005e80800 rsi: 0000000000000240 rdx: 0000000000000000
rcx: 0000000000000000 r8: 0000000000002000 r9: fffffe0120f62000
rax: fffffe00c5dfe000 rbx: fffff80005db6800 rbp: fffffe0120f61c90
r10: 00000000000001f4 r11: 00000000800b1470 r12: 0000000000000003
r13: 0000000000001ec0 r14: fffffe00c59bb000 r15: 0000000000000000
trap number = 12
panic: page fault
cpuid = 3
time = 1701788100
KDB: enter: panic
Uptime: 2h11m12s
It's the same as this thread: https://forum.netgate.com/topic/182898/crash-report-14-0-current-freebsd-14-0-current-1-releng_2_7_0-n255866-686c8d3c1f0/
Though I'm still not convinced the cause there was bad RAM. Unless maybe if you are also using a Dell server as host.
Steve
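For anyone comparing several of these reports, a minimal Python sketch of how to pull the faulting function out of a backtrace like the one above. It assumes the backtrace has one frame per line, as the DDB `bt` command prints it; `faulting_frame`, the regular expressions and `SAMPLE` are illustrative names, not part of any pfSense or FreeBSD tool.

```python
import re

# Assumed layout: one frame per line, as printed by the DDB "bt" command.
FRAME_RE = re.compile(r"^(\w+)\(\) at \w+\+0x[0-9a-f]+/frame 0x[0-9a-f]+$")
TRAP_RE = re.compile(r"^--- trap (0x[0-9a-f]+), rip = (0x[0-9a-f]+)")


def faulting_frame(backtrace):
    """Return (trap_number, rip, function) for the first trap marker, or None.

    The frame printed directly after the "--- trap" line is the code that
    was executing when the fault happened.
    """
    lines = [ln.strip() for ln in backtrace.splitlines() if ln.strip()]
    for i, line in enumerate(lines):
        trap = TRAP_RE.match(line)
        if trap and i + 1 < len(lines):
            frame = FRAME_RE.match(lines[i + 1])
            if frame:
                return int(trap.group(1), 16), trap.group(2), frame.group(1)
    return None


# Excerpt of the backtrace posted in this thread.
SAMPLE = """\
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00c57a6ad0
calltrap() at calltrap+0x8/frame 0xfffffe00c57a6ad0
--- trap 0xc, rip = 0xffffffff80af1d90, rsp = 0xfffffe00c57a6ba0, rbp = 0xfffffe00c57a6ba0 ---
vmxnet3_isc_txd_credits_update() at vmxnet3_isc_txd_credits_update+0x20/frame 0xfffffe00c57a6ba0
iflib_fast_intr_rxtx() at iflib_fast_intr_rxtx+0xf7/frame 0xfffffe00c57a6c00
"""

print(faulting_frame(SAMPLE))
```

On this excerpt it reports trap 12 (0xc) in `vmxnet3_isc_txd_credits_update`, which is the frame the rest of the thread focuses on.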
-
@stephenw10 The hypervisor is VMware 8.0 running on a PowerEdge R630. DBE and RAM show as OK in iDRAC.
The physical NIC is a Broadcom Corporation NetXtreme BCM5720 Gigabit Ethernet, and the virtual interfaces on the pfSense VM are vmxnet with open-vm-tools installed.
-
The only other thing it looks like it is this: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=239118
But that is old and we have that patch anyway: https://github.com/pfsense/FreeBSD-src/commit/4166913371c9be822cfa34419c397f711de83b49
-
@stephenw10 said in pfsense reboot randomly on vmware:
But that is old and we have that patch anyway
How do I apply the patch? Can you help me?
-
@security_sharezone because my system does not have a sys directory.
I only have dev, and there is no vmware directory in it.
-
@security_sharezone said in pfsense reboot randomly on vmware:
@stephenw10 said in pfsense reboot randomly on vmware:
But that is old and we have that patch anyway
how do i apply the patch? can you help me?
He means the patch is already included in the current pfSense operating system code. It does not need to be applied. That bug is old enough that the fix for it was included in the latest pfSense kernel builds.
And binary patches like that at the OS level are not something you can easily apply anyway.
-
That's not a patch you can apply. It's applied to the source and we already have it in. I only pointed that out because that's the only other thing that's close to that backtrace.
-
@bmeeks said in pfsense reboot randomly on vmware:
He means the patch is already included in the current pfSense operating system code. It does not need to be applied. That bug is old enough that the fix for it was included in the latest pfSense kernel builds.
And binary patches like that at the OS level are not something you can easily apply anyway.
Unfortunately it keeps rebooting. Last time it lasted 55 minutes. Could it be that, because I'm averaging 10mb over the WireGuard VPN, it's the VPN that's crashing?
-
Possibly. One of the panics shows it's in wg at the time.
Do you have any other crash reports? Does it always show the same backtrace?
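To check whether reports share the same fault, a rough Python sketch that extracts a few fields from the panic text and lists the ones that match. The field patterns are assumptions based on the two panics quoted earlier in this thread; `panic_summary` and `FIELDS` are made-up names for illustration.

```python
import re

# Patterns assumed from the "Fatal trap 12" text quoted in this thread.
FIELDS = {
    "fault_address": r"fault virtual address\s*=\s*(0x[0-9a-f]+)",
    "instruction_pointer": r"instruction pointer\s*=\s*0x[0-9a-f]+:(0x[0-9a-f]+)",
    "current_process": r"current process\s*=\s*\d+ \(([^)]*)\)",
}


def panic_summary(text):
    """Pull the comparable fields out of one panic message (None if absent)."""
    summary = {}
    for name, pattern in FIELDS.items():
        match = re.search(pattern, text)
        summary[name] = match.group(1) if match else None
    return summary


# Shortened copies of Panic 1 and Panic 2 from the first crash report.
PANIC_1 = (
    "fault virtual address = 0xfffffe00c5e00008 "
    "instruction pointer = 0x20:0xffffffff80af1d90 "
    "current process = 11 (idle: cpu3)"
)
PANIC_2 = (
    "fault virtual address = 0xfffffe00c5e00008 "
    "instruction pointer = 0x20:0xffffffff80af1d90 "
    "current process = 0 (wg_tqg_3)"
)

first, second = panic_summary(PANIC_1), panic_summary(PANIC_2)
matching = sorted(key for key in first if first[key] == second[key])
print("matching fields:", matching)
print("processes:", first["current_process"], "vs", second["current_process"])
```

For the two panics above, the fault address and instruction pointer match while the current process differs (idle thread in one, a wg thread in the other).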
-
@stephenw10 I have just reset everything and deleted all the files and logs. I'm waiting for another crash.
-
@stephenw10 the crash dump is attached.
I've been waiting to collect some logs and understand the problem, but I can't get to the bottom of it. I'd analyse the WireGuard VPN, but I really can't make heads or tails of it :-). Can you think of any ideas?
crash_dump.txt
-
Only one of those 4 crashes seems to be related to wireguard. Are you able to test just disabling WireGuard?
Another good test here would be swapping the NIC that's on to something other than VMX3 in the hypervisor. That would require quite a few changes though.
-
@stephenw10 said in pfsense reboot randomly on vmware:
Only one of those 4 crashes seems to be related to wireguard. Are you able to test just disabling WireGuard?
Another good test here would be swapping the NIC that's on to something other than VMX3 in the hypervisor. That would require quite a few changes though.
I disabled all the Veeam backup jobs that used WireGuard. Changing from vmxnet3 to e1000 would also mean a drop in performance.
-
Did you actually disable WireGuard before those crashes? One of them is in the wg process shortly before it crashes.
It would still be a good test to switch to e1000.
-
@stephenw10 said in pfsense reboot randomly on vmware:
Did you actually disable WireGuard before those crashes? One of them is in the wg process shortly before it crashes.
It would still be a good test to switch to e1000.
No, it is only disabled now. I'm monitoring the WireGuard tunnels without traffic. Switching to e1000 is a big job; I'll make a clone of the VM and then convert everything.
-
Yes, it's a significant task. But it would prove the issue is in the vmxnet3 driver. Or disprove it.
-
I will definitely do it. My steps are:
- keep WireGuard down for two days and see whether the backup job is the problem
- change to e1000
I will give feedback. If a solution is found in the meantime, I would be happy.
-
@stephenw10 so far, 20 hours without a reboot.
WireGuard is up but no backup is running, so only light traffic is passing through.