Crash report - Fatal trap 12: page fault while in kernel mode (on VMware)
Hi. Since upgrading to 2.4.5, I have had two crashes over the last month.
This is running as a guest vm on a current patched ESXi 6.7U3 host. Any ideas?
Nothing obvious there unfortunately.
Could be something in pfatt.....
Panics seem to immediately follow Avahi crashing out. Could be cause or symptom. I would try running with that disabled though if you can.
Thanks for the reply. I can't run without avahi, because the house's music and AV systems all use Chromecasts...
When I looked at the logs, it looked like some devices changed MAC addresses. That could be a chromecast that dropped off an ethernet adapter and then came back on Wifi. But that appeared to happen after the reboot, or am I not looking at the log entry properly?
Was avahi crashing in both cases? Did avahi get upgraded when going from 2.4.4 to 2.4.5?
Yes Avahi will have been updated.
You are seeing some arp movement logs like:
arp: 10.1.30.150 moved from 54:60:09:c0:d1:4e to 00:e0:4c:36:86:d2 on vmx0.30
Since that's not Apple it could be an actual IP conflict.
Check what those MACs (and the others shown) belong to.
Stephen, they are all Chromecast Audio devices. The 00:e0 MAC addresses are for the USB-connected ethernet adapters. The 54:60 addresses are the built-in wifi adapters. All of them end up on VLAN30, which is the vmx0.30 VLAN interface.
If the switch they are plugged into reboots, they can flip back to wifi, and then back to ethernet when the switch comes back online. The MAC addresses are different, but the Chromecast will request the same IP address via DHCP since it's the same device on the same network.
But I update switches all the time (they are UniFi US-48s), and they don't crash pfSense when I do that. The last time it happened was at 1 AM local time, and no switch upgrade happened then.
I guess if the chromecast did a software update and rebooted, that could cause such a transition as well. Not sure why that should cause avahi trouble, and even if avahi crashed, why would it cause a kernel panic?
It shouldn't, I agree. And that looks like legitimate use of the same IP.
You may want to just stop logging those ARP moves.
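If the moves are expected and benign, FreeBSD-based systems like pfSense control that log message with the `net.link.ether.inet.log_arp_movements` sysctl. A minimal sketch (run from the pfSense shell; to make it persistent, add the tunable under System > Advanced > System Tunables in the GUI):

```shell
# Suppress "arp: x.x.x.x moved from ... to ..." log entries.
# Takes effect immediately; does not survive a reboot on its own.
sysctl net.link.ether.inet.log_arp_movements=0

# Confirm the current value
sysctl net.link.ether.inet.log_arp_movements
```

Note this only silences the log noise; it does not change ARP behavior itself.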
The two crashes shown have different backtraces:
```
db:0:kdb.enter.default> bt
Tracing pid 0 tid 100079 td 0xfffff80006654620
kdb_enter() at kdb_enter+0x3b/frame 0xfffffe02387ef640
vpanic() at vpanic+0x19b/frame 0xfffffe02387ef6a0
panic() at panic+0x43/frame 0xfffffe02387ef700
bpf_buffer_append_mbuf() at bpf_buffer_append_mbuf+0x64/frame 0xfffffe02387ef730
catchpacket() at catchpacket+0x4b9/frame 0xfffffe02387ef7e0
bpf_mtap() at bpf_mtap+0x200/frame 0xfffffe02387ef850
ether_nh_input() at ether_nh_input+0xe9/frame 0xfffffe02387ef8b0
netisr_dispatch_src() at netisr_dispatch_src+0xa2/frame 0xfffffe02387ef900
ether_input() at ether_input+0x26/frame 0xfffffe02387ef920
if_input() at if_input+0xa/frame 0xfffffe02387ef930
em_rxeof() at em_rxeof+0x2e1/frame 0xfffffe02387ef9a0
em_handle_que() at em_handle_que+0x40/frame 0xfffffe02387ef9e0
taskqueue_run_locked() at taskqueue_run_locked+0x185/frame 0xfffffe02387efa40
taskqueue_thread_loop() at taskqueue_thread_loop+0xb8/frame 0xfffffe02387efa70
fork_exit() at fork_exit+0x83/frame 0xfffffe02387efab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe02387efab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
```
```
db:0:kdb.enter.default> bt
Tracing pid 80764 tid 100760 td 0xfffff801e3d52620
kdb_enter() at kdb_enter+0x3b/frame 0xfffffe0238ba3200
vpanic() at vpanic+0x19b/frame 0xfffffe0238ba3260
panic() at panic+0x43/frame 0xfffffe0238ba32c0
trap_pfault() at trap_pfault/frame 0xfffffe0238ba3310
trap_pfault() at trap_pfault+0x49/frame 0xfffffe0238ba3370
trap() at trap+0x29d/frame 0xfffffe0238ba3480
calltrap() at calltrap+0x8/frame 0xfffffe0238ba3480
--- trap 0xc, rip = 0xffffffff80d579c3, rsp = 0xfffffe0238ba3550, rbp = 0xfffffe0238ba3560 ---
m_tag_delete_chain() at m_tag_delete_chain+0x83/frame 0xfffffe0238ba3560
mb_dtor_pack() at mb_dtor_pack+0x11/frame 0xfffffe0238ba3570
uma_zfree_arg() at uma_zfree_arg+0x41/frame 0xfffffe0238ba35d0
mb_free_ext() at mb_free_ext+0x101/frame 0xfffffe0238ba3600
m_freem() at m_freem+0x48/frame 0xfffffe0238ba3620
vmxnet3_stop() at vmxnet3_stop+0x283/frame 0xfffffe0238ba3670
vmxnet3_init_locked() at vmxnet3_init_locked+0x27/frame 0xfffffe0238ba3700
vmxnet3_ioctl() at vmxnet3_ioctl+0x39c/frame 0xfffffe0238ba3740
ifhwioctl() at ifhwioctl+0x5f3/frame 0xfffffe0238ba37a0
ifioctl() at ifioctl+0x475/frame 0xfffffe0238ba3840
kern_ioctl() at kern_ioctl+0x267/frame 0xfffffe0238ba38b0
sys_ioctl() at sys_ioctl+0x15b/frame 0xfffffe0238ba3980
amd64_syscall() at amd64_syscall+0xa86/frame 0xfffffe0238ba3ab0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe0238ba3ab0
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x804e69fca, rsp = 0x7fffdfffbd58, rbp = 0x7fffdfffc5b0 ---
```
The first one seems to be in mbufs. There are no mbuf exhaustion messages, but make sure you have the mbuf limit set to 1M and that the dashboard shows it as such.
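For reference, mbuf usage and the configured cluster limit can also be checked from the pfSense shell rather than the dashboard. A quick sketch (FreeBSD commands; the specific numbers you see will of course differ):

```shell
# Show mbuf/cluster usage, including any "denied" or "delayed"
# allocation counters that would indicate exhaustion
netstat -m

# The cluster limit; should report roughly 1000000 if the
# kern.ipc.nmbclusters tunable has been applied
sysctl kern.ipc.nmbclusters
```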
Looks almost exactly like this: https://forum.netgate.com/topic/147078/pfsense-reboot-kernel-panic-bpf_mcopy-v2-4-4-p3 No specific cause there though.
@stephenw10 Thanks for looking into this. I do have 1M MBUFs set as reflected in the dashboard.
As per the thread, I have increased the frags limit to 10000 (it was set at 5000 which is the default), and will see if that helps anything.
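For anyone following along: the frags limit here is pf's fragment-reassembly pool limit. On pfSense it is set in the GUI (System > Advanced > Firewall & NAT, "Firewall Maximum Fragment Entries") and lands in the generated ruleset as a `set limit` line; a sketch of the pf.conf syntax, not the full generated file:

```
# pf.conf: raise the fragment-reassembly pool limit (default 5000)
set limit frags 10000
```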
I do have a gigabit fiber connection; it's not clear that should require changing the defaults, but if there are settings I need to change, I'm happy to try them.
The system has been running fine for 2 days now. I'll keep an eye on it and see if it stays stable.
I can't ever remember a crash in this configuration under 2.4.4. Were there any changes in 2.4.5 that could have caused a problem?
Also, I did install the latest set of critical VMware patches to 6.7U3 about a week before the first crash. Any chance that could have affected something? The system is using ECC memory and I am not seeing errors, so I don't think hardware is the cause.
Mmm, there aren't any frags-limit log entries in the message buffer, so you probably didn't need to increase that. It won't hurt, though.
There are no specific issues I'm aware of with VMware and 2.4.5/p1, nor with VMware updates.
No, current pfSense versions can import and update older config file versions but not the other way around. It might work OK.
But the other thread showing this was running 2.4.4-p3 anyway, so I would suggest going to a 2.5 snapshot if you're going to do anything.
Ok. It's easy enough to take a snapshot that I can revert to since it's a VMware guest. I will go try a 2.5 version and see if it is better. Do you think there are relevant changes in the 2.5 train that could address this, or is it just trying something newer?
Does this crash have any more helpful data than the other two?
No that crash looks pretty much identical.
There are a lot of changes in pfSense 2.5 due to the FreeBSD 12 base. There are a whole raft of NIC changes that could affect this.
That makes a ton of sense. Will try it out today.