Got a system panic this morning

swixo

The system panicked this morning. Would you like the textdump, and where should I send it?

stephenw10

You can put the backtrace up here if you want. It's usually enough to see what's happening, and it doesn't contain any identifying info.

swixo

db:0:kdb.enter.default>  bt
Tracing pid 11 tid 100004 td 0xfffff8000565f740
kdb_enter() at kdb_enter+0x37/frame 0xfffffe0075d5efa0
vpanic() at vpanic+0x194/frame 0xfffffe0075d5eff0
panic() at panic+0x43/frame 0xfffffe0075d5f050
trap_fatal() at trap_fatal+0x38f/frame 0xfffffe0075d5f0b0
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe0075d5f110
calltrap() at calltrap+0x8/frame 0xfffffe0075d5f110
--- trap 0xc, rip = 0xffffffff80d833cc, rsp = 0xfffffe0075d5f1e0, rbp = 0xfffffe0075d5f200 ---
rm_cleanIPI() at rm_cleanIPI+0x5c/frame 0xfffffe0075d5f200
smp_rendezvous_action() at smp_rendezvous_action+0xac/frame 0xfffffe0075d5f230
Xrendezvous() at Xrendezvous+0xae/frame 0xfffffe0075d5f230
--- interrupt, rip = 0xffffffff804c5324, rsp = 0xfffffe0075d5f300, rbp = 0xfffffe0075d5f330 ---
acpi_cpu_idle() at acpi_cpu_idle+0x304/frame 0xfffffe0075d5f330
cpu_idle_acpi() at cpu_idle_acpi+0x3e/frame 0xfffffe0075d5f350
cpu_idle() at cpu_idle+0x9f/frame 0xfffffe0075d5f370
sched_idletd() at sched_idletd+0x326/frame 0xfffffe0075d5f430
fork_exit() at fork_exit+0x7e/frame 0xfffffe0075d5f470
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0075d5f470
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---

stephenw10

The actual backtrace to the panic is the important part there.

I'm not familiar with that one, though.

Is there anything in the message buffer leading up to that? Any errors?

What snapshot was that? What hardware is it running on?

Steve
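
The part Stephen means is the group of frames directly below the "--- trap" marker: that is where the fault actually happened. The frames above it (kdb_enter, vpanic, panic, trap_fatal, ...) are just the panic handling. As a rough sketch, assuming the ddb frame format shown in the trace above, the Python below splits "bt" output at the "---" trap/interrupt boundaries so that segment stands out. split_backtrace is a hypothetical helper, not a pfSense or FreeBSD tool, and the sample is trimmed from the trace above.

```python
import re

# Matches ddb frame lines such as:
#   rm_cleanIPI() at rm_cleanIPI+0x5c/frame 0xfffffe0075d5f200
FRAME_RE = re.compile(r"^(\w+)\(\) at \S+/frame 0x[0-9a-f]+$")


def split_backtrace(text):
    """Split ddb 'bt' output into segments separated by '--- ... ---' lines.

    Each segment is a list of function names, innermost frame first.
    Empty segments (e.g. after the final '--- trap 0 ---') are dropped.
    """
    segments = [[]]
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("---"):
            # A trap or interrupt boundary starts a new segment.
            segments.append([])
            continue
        m = FRAME_RE.match(line)
        if m:
            segments[-1].append(m.group(1))
    return [seg for seg in segments if seg]


# Trimmed copy of the backtrace posted above.
SAMPLE_BT = """\
db:0:kdb.enter.default>  bt
Tracing pid 11 tid 100004 td 0xfffff8000565f740
kdb_enter() at kdb_enter+0x37/frame 0xfffffe0075d5efa0
panic() at panic+0x43/frame 0xfffffe0075d5f050
calltrap() at calltrap+0x8/frame 0xfffffe0075d5f110
--- trap 0xc, rip = 0xffffffff80d833cc, rsp = 0xfffffe0075d5f1e0, rbp = 0xfffffe0075d5f200 ---
rm_cleanIPI() at rm_cleanIPI+0x5c/frame 0xfffffe0075d5f200
smp_rendezvous_action() at smp_rendezvous_action+0xac/frame 0xfffffe0075d5f230
Xrendezvous() at Xrendezvous+0xae/frame 0xfffffe0075d5f230
--- interrupt, rip = 0xffffffff804c5324, rsp = 0xfffffe0075d5f300, rbp = 0xfffffe0075d5f330 ---
acpi_cpu_idle() at acpi_cpu_idle+0x304/frame 0xfffffe0075d5f330
sched_idletd() at sched_idletd+0x326/frame 0xfffffe0075d5f430
"""

segments = split_backtrace(SAMPLE_BT)
# Segment 0 is the panic/trap handling; segment 1 (present when the trace
# contains a trap) starts at the faulting frame. "<-" reads "called from".
print("faulting frames:", " <- ".join(segments[1]))
```

For this trace that prints rm_cleanIPI <- smp_rendezvous_action <- Xrendezvous, i.e. the fault was inside rm_cleanIPI, running from an SMP rendezvous IPI while CPU 1 was idle.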

swixo

@stephenw10
Oh, I found this in the message buffer:

<6>arp: 192.168.1.15 moved from 3c:ec:ef:44:d7:76 to 3c:ec:ef:44:d0:c0 on ixl3.410
<6>arp: 192.168.1.15 moved from 3c:ec:ef:44:d0:c1 to 3c:ec:ef:44:d7:76 on ixl3.410
<6>arp: 192.168.1.15 moved from 3c:ec:ef:44:d7:76 to 3c:ec:ef:44:d0:c0 on ixl3.410
<6>arp: 192.168.1.15 moved from 3c:ec:ef:44:d0:c1 to 3c:ec:ef:44:d7:76 on ixl3.410
<6>arp: 192.168.1.15 moved from 3c:ec:ef:44:d7:76 to 3c:ec:ef:44:d0:c0 on ixl3.410
<6>arp: 192.168.1.15 moved from 3c:ec:ef:44:d0:c1 to 3c:ec:ef:44:d7:76 on ixl3.410
<6>arp: 192.168.1.15 moved from 3c:ec:ef:44:d7:76 to 3c:ec:ef:44:d0:c0 on ixl3.410
<6>arp: 192.168.1.15 moved from 3c:ec:ef:44:d0:c1 to 3c:ec:ef:44:d7:76 on ixl3.410
<6>arp: 192.168.1.15 moved from 3c:ec:ef:44:d7:76 to 3c:ec:ef:44:d0:c0 on ixl3.410
<6>arp: 192.168.1.15 moved from 3c:ec:ef:44:d0:c1 to 3c:ec:ef:44:d7:76 on ixl3.410
<6>ovpn1: changing name to 'ovpns3'
<6>arp: 192.168.1.15 moved from 3c:ec:ef:44:d0:c0 to 3c:ec:ef:44:d0:c1 on ixl3.410
<6>ovpns3: link state changed to UP
kernel trap 12 with interrupts disabled


Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address	= 0x10
fault code		= supervisor read data, page not present
instruction pointer	= 0x20:0xffffffff80d833cc
stack pointer	        = 0x28:0xfffffe0075d5f1e0
frame pointer	        = 0x28:0xfffffe0075d5f200
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= resume, IOPL = 0
current process		= 11 (idle: cpu1)
trap number		= 12
panic: page fault
cpuid = 1
time = 1654268809
KDB: enter: panic
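
The "time =" value in the panic message is a Unix timestamp (seconds since 1970-01-01 UTC). A minimal Python sketch for converting it, which can help line the panic up with other logs such as the OpenVPN log:

```python
from datetime import datetime, timezone

# Value copied from the "time = ..." line of the panic message above.
PANIC_EPOCH = 1654268809

panic_utc = datetime.fromtimestamp(PANIC_EPOCH, tz=timezone.utc)
print(panic_utc.isoformat())  # 2022-06-03T15:06:49+00:00
```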

stephenw10

Hmm, the ARP movements are just log spam. You can stop logging them if you know what those devices are, for example if they are in a lagg. They wouldn't cause a panic.

https://docs.netgate.com/pfsense/en/latest/troubleshooting/logs-arp-moved.html
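
Before turning that logging off, it may help to see which addresses are flapping and between which MACs, so you can tell whether they all belong to one device (a lagg, for example). A rough sketch, assuming the "arp: ... moved from ... to ... on ..." format shown in the message buffer above; summarize_arp_moves is a hypothetical helper, not part of pfSense:

```python
import re
from collections import defaultdict

# Matches FreeBSD "arp: <ip> moved from <mac> to <mac> on <iface>" messages,
# optionally prefixed by a syslog priority such as "<6>".
ARP_MOVED_RE = re.compile(
    r"arp: (?P<ip>[\d.]+) moved from (?P<old>[0-9a-f:]{17}) "
    r"to (?P<new>[0-9a-f:]{17}) on (?P<iface>\S+)"
)


def summarize_arp_moves(lines):
    """Return {(ip, iface): {"moves": count, "macs": set_of_macs}}."""
    summary = defaultdict(lambda: {"moves": 0, "macs": set()})
    for line in lines:
        m = ARP_MOVED_RE.search(line)
        if not m:
            continue  # not an ARP "moved" message
        entry = summary[(m["ip"], m["iface"])]
        entry["moves"] += 1
        entry["macs"].update((m["old"], m["new"]))
    return dict(summary)


# A few lines copied from the message buffer above.
SAMPLE_LOG = [
    "<6>arp: 192.168.1.15 moved from 3c:ec:ef:44:d7:76 to 3c:ec:ef:44:d0:c0 on ixl3.410",
    "<6>arp: 192.168.1.15 moved from 3c:ec:ef:44:d0:c1 to 3c:ec:ef:44:d7:76 on ixl3.410",
    "<6>ovpn1: changing name to 'ovpns3'",
    "<6>arp: 192.168.1.15 moved from 3c:ec:ef:44:d0:c0 to 3c:ec:ef:44:d0:c1 on ixl3.410",
]

for (ip, iface), info in sorted(summarize_arp_moves(SAMPLE_LOG).items()):
    macs = ", ".join(sorted(info["macs"]))
    print(f"{ip} on {iface}: {info['moves']} moves across {len(info['macs'])} MACs: {macs}")
# 192.168.1.15 on ixl3.410: 3 moves across 3 MACs: 3c:ec:ef:44:d0:c0, 3c:ec:ef:44:d0:c1, 3c:ec:ef:44:d7:76
```

One IP bouncing between a small, fixed set of MACs like this is the pattern to check against your own hardware before deciding the messages are safe to silence.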

swixo

@stephenw10 Could it be related to the new OpenVPN DCO feature? I had made a change to the tunnel (trying to debug a latency problem) when it panicked. It looks like the last thing logged was the tunnel coming up.

stephenw10

It was immediately afterwards? And you had DCO enabled?

It could well be related then. What snapshot is that and what hardware?

Steve

swixo

@stephenw10 22.05-BETA (amd64)
built on Tue May 31 06:20:27 UTC 2022
FreeBSD 12.3-STABLE

swixo

@swixo
Oh yes, DCO is on. The tunnel that came up just before the panic is a DCO tunnel.

stephenw10

How many OpenVPN tunnels in total do you have there? How many are using DCO?

swixo

@stephenw10 Only this one is UP.

There are two others, but they were completely idle.

stephenw10

All 3 had DCO enabled though?

swixo

@stephenw10 Yes. They were enabled.

stephenw10

Mmm, OK. Is this the only time you have seen it?

swixo

@stephenw10 Yes - this is the first panic I have observed.

stephenw10

Ok, thanks. I've opened an internal bug for it. Our developers are looking into it.

Steve

swixo

@stephenw10 In case it's helpful:

This system is the 'server' side of a site-to-site OpenVPN tunnel, and it is the side that crashed.

The action that caused it was a change made on the remote client end. After that change the tunnel went down, came back up, and then this system crashed.

swixo

@stephenw10 Will you let us know when a change that should affect this is logged internally? I don't want to try again until we have a resolution.

stephenw10

We can't know for certain since, as far as I know, you are the only person who has hit it. We never replicated it here. But we have applied a fix for what appears to be the bug that caused it. It's in 22.05-RC now.

Steve