    Crash report - Fatal trap 12: page fault while in kernel mode (on VMWARE)

    • fresnoboy

      Hi. Since doing an upgrade to 2.4.5, I have had 2 crashes over the last month.

      This is running as a guest VM on a currently patched ESXi 6.7U3 host. Any ideas?

      (attachment: textdumps.txt)

      • stephenw10 (Netgate Administrator)

        Nothing obvious there unfortunately.

        Could be something in pfatt.....

        Panics seem to immediately follow Avahi crashing out. Could be cause or symptom. I would try running with that disabled though if you can.
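
        For reference, a quick way to confirm whether avahi-daemon exits right before each panic is to search the system log from a shell (a rough sketch; clog is the circular-log reader on 2.4.x, so adjust if your version stores plain-text logs):

        # Read the circular system log and pull out the Avahi entries.
        clog /var/log/system.log | grep -i avahi

        # Eyeball the messages immediately preceding the last reboot to see
        # whether an Avahi exit lines up with the panic time.
        clog /var/log/system.log | tail -n 200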

        Steve

        • fresnoboy @stephenw10

          @stephenw10

          Thanks for the reply. I can't run without Avahi, because the house's music and AV systems all use Chromecasts...

          When I looked at the logs, it looked like some devices changed MAC addresses. That could be a Chromecast that dropped off an Ethernet adapter and then came back on WiFi. But that appeared to happen after the reboot, or am I not reading the log entries properly?

          Was Avahi crashing in both cases? Did Avahi get upgraded when going from 2.4.4 to 2.4.5?

          • stephenw10

            Yes, Avahi will have been updated.
            You are seeing some ARP movement logs like:

            arp: 10.1.30.150 moved from 54:60:09:c0:d1:4e to 00:e0:4c:36:86:d2 on vmx0.30
            

            Since that's not Apple it could be an actual IP conflict.

            Check what those MACs (and the others shown) belong to.
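
            One quick way to see which IPs and interfaces those MACs are currently showing up on is from a shell (a rough sketch; the vendor itself can be looked up from the first three octets of the MAC in any OUI database):

            # List the current ARP table with MACs and the interface (VLAN)
            # each entry was learned on.
            arp -an

            # Pull the recent "moved from ... to ..." entries out of the system log.
            clog /var/log/system.log | grep "moved from"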

            Steve

            • fresnoboy @stephenw10

              @stephenw10

              Stephen, they are all Chromecast Audio devices. The 00:e0 MAC addresses are for the USB-connected Ethernet adapters; the 54:60 addresses are the built-in WiFi adapters. All of them end up on VLAN 30, which is the vmx0.30 VLAN interface.

              If the switch they are plugged into reboots, they can flip back to WiFi and then back to Ethernet when the switch comes back online. The MAC addresses are different, but the Chromecast will request the same IP address via DHCP since it's the same device on the same network.
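
              The DHCP lease history should confirm the same address simply being reissued to the other MAC (a rough sketch; the chrooted lease path is the usual pfSense 2.4 location, but treat it as an assumption):

              # ISC dhcpd runs chrooted on pfSense; its lease database lives here.
              # 10.1.30.150 is the address from the ARP log above.
              grep -A 3 "lease 10.1.30.150" /var/dhcpd/var/db/dhcpd.leases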

              But I update switches all the time (they are UniFi US-48s), and they don't crash pfSense when I do that. The last crash happened at 1 AM local time, and no switch upgrade was happening then.

              I guess if a Chromecast did a software update and rebooted, that could cause such a transition as well. I'm not sure why that should give Avahi trouble, though, and even if Avahi crashed, why would that cause a kernel panic?

              • stephenw10

                It shouldn't, I agree. And that looks like legitimate use of the same IP.

                You may want to just stop logging those:
                https://docs.netgate.com/pfsense/en/latest/troubleshooting/logs-arp-moved.html
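
                If you go that route, the underlying tunable can also be flipped from a shell for a quick test (a sketch; making it persistent is normally done as a System Tunable in the GUI, which I believe is what that doc describes):

                # Stop logging "arp: ... moved from ... to ..." messages until reboot.
                sysctl net.link.ether.inet.log_arp_movements=0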

                The two crashes shown have different backtraces:

                db:0:kdb.enter.default>  bt
                Tracing pid 0 tid 100079 td 0xfffff80006654620
                kdb_enter() at kdb_enter+0x3b/frame 0xfffffe02387ef640
                vpanic() at vpanic+0x19b/frame 0xfffffe02387ef6a0
                panic() at panic+0x43/frame 0xfffffe02387ef700
                bpf_buffer_append_mbuf() at bpf_buffer_append_mbuf+0x64/frame 0xfffffe02387ef730
                catchpacket() at catchpacket+0x4b9/frame 0xfffffe02387ef7e0
                bpf_mtap() at bpf_mtap+0x200/frame 0xfffffe02387ef850
                ether_nh_input() at ether_nh_input+0xe9/frame 0xfffffe02387ef8b0
                netisr_dispatch_src() at netisr_dispatch_src+0xa2/frame 0xfffffe02387ef900
                ether_input() at ether_input+0x26/frame 0xfffffe02387ef920
                if_input() at if_input+0xa/frame 0xfffffe02387ef930
                em_rxeof() at em_rxeof+0x2e1/frame 0xfffffe02387ef9a0
                em_handle_que() at em_handle_que+0x40/frame 0xfffffe02387ef9e0
                taskqueue_run_locked() at taskqueue_run_locked+0x185/frame 0xfffffe02387efa40
                taskqueue_thread_loop() at taskqueue_thread_loop+0xb8/frame 0xfffffe02387efa70
                fork_exit() at fork_exit+0x83/frame 0xfffffe02387efab0
                fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe02387efab0
                --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
                
                db:0:kdb.enter.default>  bt
                Tracing pid 80764 tid 100760 td 0xfffff801e3d52620
                kdb_enter() at kdb_enter+0x3b/frame 0xfffffe0238ba3200
                vpanic() at vpanic+0x19b/frame 0xfffffe0238ba3260
                panic() at panic+0x43/frame 0xfffffe0238ba32c0
                trap_pfault() at trap_pfault/frame 0xfffffe0238ba3310
                trap_pfault() at trap_pfault+0x49/frame 0xfffffe0238ba3370
                trap() at trap+0x29d/frame 0xfffffe0238ba3480
                calltrap() at calltrap+0x8/frame 0xfffffe0238ba3480
                --- trap 0xc, rip = 0xffffffff80d579c3, rsp = 0xfffffe0238ba3550, rbp = 0xfffffe0238ba3560 ---
                m_tag_delete_chain() at m_tag_delete_chain+0x83/frame 0xfffffe0238ba3560
                mb_dtor_pack() at mb_dtor_pack+0x11/frame 0xfffffe0238ba3570
                uma_zfree_arg() at uma_zfree_arg+0x41/frame 0xfffffe0238ba35d0
                mb_free_ext() at mb_free_ext+0x101/frame 0xfffffe0238ba3600
                m_freem() at m_freem+0x48/frame 0xfffffe0238ba3620
                vmxnet3_stop() at vmxnet3_stop+0x283/frame 0xfffffe0238ba3670
                vmxnet3_init_locked() at vmxnet3_init_locked+0x27/frame 0xfffffe0238ba3700
                vmxnet3_ioctl() at vmxnet3_ioctl+0x39c/frame 0xfffffe0238ba3740
                ifhwioctl() at ifhwioctl+0x5f3/frame 0xfffffe0238ba37a0
                ifioctl() at ifioctl+0x475/frame 0xfffffe0238ba3840
                kern_ioctl() at kern_ioctl+0x267/frame 0xfffffe0238ba38b0
                sys_ioctl() at sys_ioctl+0x15b/frame 0xfffffe0238ba3980
                amd64_syscall() at amd64_syscall+0xa86/frame 0xfffffe0238ba3ab0
                fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe0238ba3ab0
                --- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x804e69fca, rsp = 0x7fffdfffbd58, rbp = 0x7fffdfffc5b0 ---
                

                The first one seems to be in mbufs. There are no mbuf exhaustion messages, but make sure you have that set to 1M and that the dashboard shows it as such.
                It looks almost exactly like this thread: https://forum.netgate.com/topic/147078/pfsense-reboot-kernel-panic-bpf_mcopy-v2-4-4-p3 (no specific cause was found there, though).
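
                To double-check the mbuf ceiling and current usage from a shell (a sketch; I'm assuming the 1M setting maps to kern.ipc.nmbclusters, which is the usual tunable behind it):

                # Current mbuf/cluster usage versus the configured limits.
                netstat -m

                # The cluster limit the 1M recommendation refers to.
                sysctl kern.ipc.nmbclusters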

                Steve

                • fresnoboy @stephenw10

                  @stephenw10 Thanks for looking into this. I do have 1M MBUFs set as reflected in the dashboard.

                  As per that thread, I have increased the frags limit to 10000 (it was set at 5000, which is the default), and will see if that helps.
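
                  The change can be read back from a shell to confirm it took effect (a sketch; I'm assuming the GUI fragment setting maps to pf's frags limit):

                  # Show pf's memory limits; the "frags" line should be the fragment
                  # entries hard limit behind the GUI setting.
                  pfctl -sm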

                  I do have a gigabit fiber connection, though it's not clear that alone should require changing the defaults. If there are settings I should change, I'm happy to try them.

                  The system has been running fine for 2 days now. I'll keep an eye on it and see if it stays stable.

                  I can't ever remember a crash in this configuration under 2.4.4. Were there any changes in 2.4.5 that could have caused a problem?

                  Also, I did install the latest set of critical VMware patches for 6.7U3 about a week before the first crash. Any chance that could have affected something? The system is using ECC memory and I am not seeing errors, so I don't think the hardware is the cause.

                  • stephenw10

                    Mmm, there aren't any frag limit log entries in the message buffer, so you probably don't need to increase that. It won't hurt though.

                    There are no specific issues I'm aware of with VMware and 2.4.5/p1, nor with VMware updates.

                    Steve

                    • fresnoboy @stephenw10

                      @stephenw10 (attachment: textdump.tar.2.zip)

                      Well, I just had another outage. Same mbuf panic, and this time with double the frags limit I had set before.

                      Textdump attached. I would love some ideas, or maybe I should revert back to 2.4.4? The config is backward compatible with 2.4.4, right?

                      thx!

                      • stephenw10

                        No, current pfSense versions can import and upgrade older config files, but not the other way around. It might still work OK.
                        But the other thread showing this was running 2.4.4-p3 anyway, so I would suggest going to a 2.5 snapshot if you're going to change anything.

                        Steve

                        • fresnoboy @stephenw10

                          @stephenw10

                          OK. It's easy enough to take a snapshot that I can revert to, since it's a VMware guest. I will go try a 2.5 version and see if it's better. Do you think there are relevant changes in the 2.5 train that could address this, or is it just a matter of trying something newer?
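
                          If it's useful to anyone else, the snapshot can also be taken from the ESXi host shell rather than the vSphere UI (a sketch; the VM ID below is a placeholder taken from getallvms):

                          # Find the pfSense VM's ID on the ESXi host.
                          vim-cmd vmsvc/getallvms

                          # Create a snapshot: name, description, includeMemory (0/1), quiesce (0/1).
                          vim-cmd vmsvc/snapshot.create 12 "pre-2.5-upgrade" "before trying a pfSense 2.5 snapshot" 0 0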

                          Does this crash have any more helpful data than the other two?

                          • stephenw10

                            No, that crash looks pretty much identical.

                            There are a lot of changes in pfSense 2.5 due to the FreeBSD 12 base, including a whole raft of NIC driver changes that could affect this.
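
                            After moving to a 2.5 snapshot it's worth confirming how the VMware NIC attaches under the new base (a sketch; the dev.vmx.0 node name assumes the first vmxnet3 interface and the iflib-converted driver):

                            # See which driver claimed the adapter and with what features.
                            dmesg | grep -i vmx

                            # List the per-device sysctls the driver exposes.
                            sysctl dev.vmx.0 | head -n 20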

                            Steve

                            • fresnoboy @stephenw10

                              @stephenw10

                              That makes a ton of sense. Will try it out today.
