Netgate Discussion Forum

    Crash report

    General pfSense Questions
    • fireix @stephenw10

      @stephenw10 This happened again now. I can't see anything labeled bt>.

      So strange that this started happening now, been running stable for years.

      Fatal trap 12: page fault while in kernel mode
      cpuid = 0; apic id = 00
      fault virtual address = 0x20
      fault code = supervisor read data, page not present

      • stephenw10 Netgate Administrator

        Can we see the full crash report then?

        • fireix @stephenw10

          @stephenw10 The only issue is that it has a lot of IPs (including public ones) in it, so I didn't want to post it here. But I removed the sensitive stuff; here is the first dump file:

          textdump-2021.txt

          And here is 2nd:

          Dump header from device: /dev/mirror/pfSenseMirrorp3
            Architecture: amd64
            Architecture Version: 4
            Dump Length: 157696
            Blocksize: 512
            Compression: none
            Dumptime: Sun Apr 25 00:14:21 2021
            Hostname: XX
            Magic: FreeBSD Text Dump
            Version String: FreeBSD 12.2-STABLE d48fb226319(devel-12) pfSense
            Panic String: page fault
            Dump Parity: 411090558
            Bounds: 0
            Dump Status: good
          
          
          • stephenw10 Netgate Administrator

            Ok, so we can see the backtrace in that here:

            db:0:kdb.enter.default>  bt
            Tracing pid 0 tid 100046 td 0xfffff8000461d740
            kdb_enter() at kdb_enter+0x37/frame 0xfffffe000055a140
            vpanic() at vpanic+0x197/frame 0xfffffe000055a190
            panic() at panic+0x43/frame 0xfffffe000055a1f0
            trap_fatal() at trap_fatal+0x391/frame 0xfffffe000055a250
            trap_pfault() at trap_pfault+0x4f/frame 0xfffffe000055a2a0
            trap() at trap+0x286/frame 0xfffffe000055a3b0
            calltrap() at calltrap+0x8/frame 0xfffffe000055a3b0
            --- trap 0xc, rip = 0xffffffff80e024b5, rsp = 0xfffffe000055a480, rbp = 0xfffffe000055a490 ---
            turnstile_broadcast() at turnstile_broadcast+0x45/frame 0xfffffe000055a490
            __mtx_unlock_sleep() at __mtx_unlock_sleep+0x7f/frame 0xfffffe000055a4c0
            pf_find_state() at pf_find_state+0x21c/frame 0xfffffe000055a500
            pf_test_state_tcp() at pf_test_state_tcp+0x1b6/frame 0xfffffe000055a620
            pf_test() at pf_test+0x1f64/frame 0xfffffe000055a870
            pf_check_in() at pf_check_in+0x1d/frame 0xfffffe000055a890
            pfil_run_hooks() at pfil_run_hooks+0xa1/frame 0xfffffe000055a930
            ip_input() at ip_input+0x475/frame 0xfffffe000055a9e0
            netisr_dispatch_src() at netisr_dispatch_src+0xca/frame 0xfffffe000055aa30
            ether_demux() at ether_demux+0x16a/frame 0xfffffe000055aa60
            ether_nh_input() at ether_nh_input+0x330/frame 0xfffffe000055aac0
            netisr_dispatch_src() at netisr_dispatch_src+0xca/frame 0xfffffe000055ab10
            ether_input() at ether_input+0x4b/frame 0xfffffe000055ab40
            iflib_rxeof() at iflib_rxeof+0xae6/frame 0xfffffe000055ac20
            _task_fn_rx() at _task_fn_rx+0x72/frame 0xfffffe000055ac60
            gtaskqueue_run_locked() at gtaskqueue_run_locked+0x121/frame 0xfffffe000055acc0
            gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0xb6/frame 0xfffffe000055acf0
            fork_exit() at fork_exit+0x7e/frame 0xfffffe000055ad30
            fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe000055ad30
            --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
            

            Looks like the message buffer has been removed.

            The first thing to do is compare that backtrace with one from another crash report. If they are all identical or very similar it's probably a software issue at least.

            Steve
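One way to compare two backtraces, as suggested above, is to reduce each one to its ordered list of function names and ignore the offsets and frame addresses. The following is only a minimal sketch: the helper names `frame_names` and `same_call_path` are hypothetical, and the regular expression assumes the frame-line layout seen in the `bt` output quoted in this thread.

```python
import re

# Matches ddb frame lines in the layout quoted above, e.g.
#   pf_find_state() at pf_find_state+0x21c/frame 0xfffffe000055a500
FRAME_RE = re.compile(r"^\s*(\w+)\(\) at \w+\+0x[0-9a-f]+/frame 0x[0-9a-f]+\s*$")


def frame_names(backtrace_text):
    """Return the function names from a 'bt' listing, in order."""
    names = []
    for line in backtrace_text.splitlines():
        match = FRAME_RE.match(line)
        if match:
            names.append(match.group(1))
    return names


def same_call_path(bt_a, bt_b):
    """True when both backtraces pass through the same functions in the same order."""
    return frame_names(bt_a) == frame_names(bt_b)


# Three frames copied from the backtrace in this thread.
sample = """\
--- trap 0xc, rip = 0xffffffff80e024b5, rsp = 0xfffffe000055a480, rbp = 0xfffffe000055a490 ---
turnstile_broadcast() at turnstile_broadcast+0x45/frame 0xfffffe000055a490
__mtx_unlock_sleep() at __mtx_unlock_sleep+0x7f/frame 0xfffffe000055a4c0
pf_find_state() at pf_find_state+0x21c/frame 0xfffffe000055a500
"""

print(frame_names(sample))
# ['turnstile_broadcast', '__mtx_unlock_sleep', 'pf_find_state']
```

Comparing only the names keeps the check tolerant of differing frame addresses between crashes; differing offsets within a function would still count as the same call path here, which may or may not be what you want.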

            • fireix @stephenw10

              @stephenw10 textdump-old.txt

              That is dump from two weeks earlier.

              Dump header from device: /dev/mirror/pfSenseMirrorp3
                Architecture: amd64
                Architecture Version: 4
                Dump Length: 157696
                Blocksize: 512
                Compression: none
                Dumptime: Sat Apr 10 04:29:57 2021
                Hostname: 
                Magic: FreeBSD Text Dump
                Version String: FreeBSD 12.2-STABLE d48fb226319(devel-12) pfSense
                Panic String: page fault
                Dump Parity: 2148879230
                Bounds: 0
                Dump Status: good
              
              
              • stephenw10 Netgate Administrator

                Ok, so virtually identical.

                That is 2.5.1 yes? It looks a lot like an old crash that should be fixed in 2.5.1.

                Steve

                • fireix @stephenw10

                  @stephenw10 2.5.1-RELEASE (amd64)
                  built on Mon Apr 12 07:50:14 EDT 2021

                  I was hoping it was just something fixed in 2.5.1, so I upgraded (from 2.5.0) just after the previous report (2 days later). So 2nd crash last night on 2.5.1.

                  • stephenw10 Netgate Administrator

                    Hmm, are you able to test a 2.6 snapshot?

                    Though I'm not aware of anything specific that has gone in to address that.

                    Steve

                    • fireix @stephenw10

                      @stephenw10 It is in production, so a bit scary to upgrade since it seems to work for most usage (except one LAN-network, but not sure if related). I have a 2nd machine with same config offline standing ready for years now, so in theory I can just fire it up and load the backup when I'm onsite, but...

                      In the log, there is weird stuff like the below, many hundreds of entries. It is correct that it is not a host; it is an alias for hosts and ports that are valid. Maybe this causes overload? I haven't changed the aliases for months, and these started appearing just now. It doesn't seem to cause any problems, but it is strange that it suggests that the alias names are hosts.

                      • stephenw10 Netgate Administrator

                        It shouldn't ever cause a crash but you should remove unresolvable entries from aliases and rules.
                        It can cause delays in updating the ruleset that can cause other issues if there are enough.

                        Steve

                        • fireix @stephenw10

                          @stephenw10 There were 6-7 aliases that were no longer in use. Meaning that I had earlier deleted one or more hosts behind the alias (from the GUI), but the alias it was part of had been left behind or had other valid entries. Now there is only one left in the logs and I can't find it.

                          • stephenw10 Netgate Administrator

                            I usually search the config file directly in that situation.
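Searching the config file directly can be as simple as scanning it line by line for the leftover name. This is a rough sketch, not a pfSense tool: the helper `find_references` and the sample XML excerpt are hypothetical, and it assumes you have a copy of the configuration (commonly `/cf/conf/config.xml`, or a backup downloaded from the GUI) that you can read as text.

```python
def find_references(config_text, name):
    """Return (line_number, stripped_line) pairs for lines that mention name."""
    hits = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        if name in line:
            hits.append((number, line.strip()))
    return hits


# Hypothetical excerpt; real alias and rule entries will look different.
sample_config = """\
<aliases>
  <alias>
    <name>OldServers</name>
    <address>gone.example.com</address>
  </alias>
</aliases>
"""

for number, line in find_references(sample_config, "gone.example.com"):
    print(number, line)
# 4 <address>gone.example.com</address>
```

With a real file, the text would come from something like `open(path).read()`, and the search term would be the unresolvable hostname reported in the log.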

                            • dcugy

                              I have had the same problem since I moved to pfSense 2.5. I currently use pfSense 2.6 and have one crash per day.

                              The report shows: fault code = supervisor read instruction, page not present

                              What are the differences in configuration files between pfSense 2.4 and 2.5?

                              Best regards

                              • stephenw10 Netgate Administrator

                                Depends which specific version but there are a lot:
                                https://docs.netgate.com/pfsense/en/latest/releases/versions.html

                                It shouldn't matter though, you can import an older config into the current pfSense version.

                                Steve

                                • fireix

                                  Changing hardware didn't help, nor did removing aliases or IPsec tunnels.

                                  What finally solved it for me, after a year of trouble, was removing the LAN LAG across two switches. I had it for redundancy in case one switch failed. All the switches showed the correct properties for the other end (short/long etc.), so I had no reason to suspect any issues. It all started after a pfSense upgrade.

                                  I assume it must have been some kind of network confusion that caused the crash to happen every month. After this change, no problems have appeared.

                                  • stephenw10 Netgate Administrator

                                    Hmm, that's weird. You never saw any errors relating to the LAGG?

                                    It was LACP I assume. Was the LAN just directly assigned to it? Or VLANs over it?

                                    Steve

                                    • fireix @stephenw10

                                      @stephenw10 LACP, correct. No VLANS at all, LAN directly assigned to it.

                                      Maybe stupid, but the only reason I started suspecting it was this message about one of the servers on the network (from the dump/crash log):

                                      <6>arp: moved from ac:1f:6b:6f:f2:8a to ac:1f:6b:6f:f2:8b on lagg0

                                      I suspected that something wasn't working correctly, as there was no reason for an always-on file server to switch ports. Maybe it is routine, who knows. And not a single crash since.

                                      • stephenw10 Netgate Administrator

                                        Hmm, that's the server's MAC address(es)?

                                        That looks like a log message on pfSense showing that the server moved to a different MAC. I assume you omitted the IP address there.

                                        That wouldn't normally be an issue. It might happen if the server itself was connected with a lagg to the switch stack for example.
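To judge whether such moves are occasional or constant, the "moved from ... to ..." messages can be counted per interface and MAC pair. The sketch below is illustrative only: the helper `count_arp_moves` is hypothetical, the pattern is modeled on the single log line quoted in this thread (with the IP address treated as optional because it was omitted there), and the second sample line uses a documentation address rather than real data.

```python
import re
from collections import Counter

# Modeled on the line quoted above; the IP before "moved" is optional
# because it was removed from the posted log.
ARP_MOVED_RE = re.compile(
    r"arp: (?:(?P<ip>\S+) )?moved from (?P<old>[0-9a-f:]{17}) "
    r"to (?P<new>[0-9a-f:]{17}) on (?P<iface>\S+)"
)


def count_arp_moves(log_text):
    """Count 'arp: ... moved' messages per (interface, old MAC, new MAC)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = ARP_MOVED_RE.search(line)
        if match:
            key = (match.group("iface"), match.group("old"), match.group("new"))
            counts[key] += 1
    return counts


sample_log = """\
<6>arp: moved from ac:1f:6b:6f:f2:8a to ac:1f:6b:6f:f2:8b on lagg0
<6>arp: 192.0.2.10 moved from ac:1f:6b:6f:f2:8b to ac:1f:6b:6f:f2:8a on lagg0
"""

for key, count in count_arp_moves(sample_log).items():
    print(key, count)
```

A steady stream of moves back and forth between the same two MACs would fit a host that answers from both members of its own lagg, which is the benign case described above.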

                                        • fireix @stephenw10

                                          @stephenw10 Yes, the server's MAC address. The server (all servers, not only this one) was connected through a LAGG setup to the switch in the same way. I didn't really think it should be a big problem, just a tiny bit weird that only one had the "problem" (the QNAP server).

                                          • stephenw10 Netgate Administrator

                                            Yeah, it shouldn't be a problem. The server's IP can change MAC and it's usually only an inconvenience in the logging. It's sufficiently common that you can disable those log messages if you know the cause:
                                            https://docs.netgate.com/pfsense/en/latest/troubleshooting/logs-arp-moved.html

                                            So I'd say those log messages are unrelated to whatever was causing that crash.

                                            Steve

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.