Netgate Discussion Forum

    "Fatal trap 12: page fault while in kernel mode" (w/ screenshot)

      jjj

      Had a nice fatal error over the weekend. Any ideas what this is about?

        jimp Rebel Alliance Developer Netgate

        Hard to say without more detail. That doesn't look familiar.

        First, update to a current snapshot. If that alone doesn't help, then when you get the panic again, get a new picture and also type "bt" at that prompt, and get a picture of that output as well.

        The backtrace (bt) is important to help track down what the code was doing at the time.

        Remember: Upvote with the 👍 button for any user/post you find to be helpful, informative, or deserving of recognition!

        Need help fast? Netgate Global Support!

        Do not Chat/PM for help!

          jjj

          We're on 2.0 Release and it's still occurring.

            Cry Havok

            Probably hardware related - whether the hardware is faulty/borderline or just incompatible.

            Start by running diagnostics on the hardware. I'd suggest a memory test (memtest86 et al.) as the first test to run.

              jjj

              Well, it started happening right when we updated to 2.0 (RC3). It had been running flawlessly up until then. Therefore, I really doubt it's faulty hardware related. How do we verify hardware compatibility?

                GoldServe

                I found that if one of my links goes up and down so often that the gateway monitor removes the link while PF is processing a packet, it will crash. I fixed my link so it doesn't go down as often and everything is good again.

                  jjj

                  I don't understand how 1.2.3 was rock solid and never, ever crashed, even once, then we deploy 2.0 and it crashes every other day.

                    jjj

                    This happens only during off-hours. Error screenshot

                    We just switched to new hardware…guess we'll wait and see. :(

                      jimp Rebel Alliance Developer Netgate

                      If you run your NICs out of resources, then you could hit a panic with the driver…

                      Try some of the tweaks here:
                      http://doc.pfsense.org/index.php/Tuning_and_Troubleshooting_Network_Cards

                        wallabybob

                        Your error screenshot shows:

                        sf2: watchdog timeout, 240 Tx descriptors are active
                        sf2: watchdog timeout, 248 Tx descriptors are active
                        sf0: Tx underrun – increasing Tx threshold to 384 bytes

                        The FreeBSD man page for sf (http://www.freebsd.org/cgi/man.cgi?query=sf&apropos=0&sektion=0&manpath=FreeBSD+8.2-RELEASE&arch=default&format=html) says

                        sf%d: watchdog timeout  The device has stopped responding to the network,
                            or there is a problem with the network connection (cable).

                        Do the watchdog timeout messages consistently appear before the panic? Always the same interface (sf2)?

                        Do the Tx underrun messages consistently appear before the panic? Always the same interface (sf0)?

                        Perhaps there is some sf driver related error condition that the software doesn't handle correctly, ultimately resulting in a panic.

                        Did you notice these messages when running pfSense 1.2.3?
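One way to answer the questions above is to tally the messages per interface from a saved console or system log. The following is a minimal sketch, assuming the log lines look like the ones in the screenshot (optionally with a syslog prefix in front); the function name `count_nic_events` is hypothetical and not part of pfSense or FreeBSD.

```python
import re
from collections import Counter

# Matches "<driver><unit>: <message>" anywhere in the line, e.g. "sf2: watchdog timeout, ..."
# Assumption: interface names are lowercase letters followed by a unit number.
LINE_RE = re.compile(r"\b([a-z]+\d+): (.+)$")

def count_nic_events(lines):
    """Count watchdog-timeout and Tx-underrun messages per interface."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line.strip())
        if not match:
            continue
        ifname, msg = match.group(1), match.group(2).lower()
        if "watchdog timeout" in msg:
            counts[(ifname, "watchdog timeout")] += 1
        elif "tx underrun" in msg:
            counts[(ifname, "tx underrun")] += 1
    return counts

# The three lines quoted from the screenshot.
sample = [
    "sf2: watchdog timeout, 240 Tx descriptors are active",
    "sf2: watchdog timeout, 248 Tx descriptors are active",
    "sf0: Tx underrun – increasing Tx threshold to 384 bytes",
]

for (ifname, event), n in sorted(count_nic_events(sample).items()):
    print(f"{ifname}: {event} x{n}")
```

Running this over the log captured before each panic would show whether the same interface and the same message type turn up every time.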

                        @jjj:

                        I don't understand how 1.2.3 was rock solid and never, ever crashed, even once, then we deploy 2.0 and it crashes every other day.

                        Operating system upgrades often include performance enhancements. Those enhancements often drive some part of a system harder than it was driven before, and sometimes other parts of the system can noticeably suffer. For illustration, suppose FreeBSD changed to double the maximum size of a disk transfer. That MIGHT impact sf devices.

                        It's fundamental to the way Ethernet works that once a NIC starts transmission of a frame it can't pause the transmission mid-frame. If the transmission does pause (transmit underrun), the whole frame must be retransmitted. For cost reasons, older NICs had a small transmit buffer which was refilled from main memory during transmission as required. For performance reasons, newer NICs include a buffer large enough to guarantee transmission of at least a maximum standard-sized frame without pauses.

                        That the sf driver reports transmit underruns suggests that its transmit component buffers less than a whole frame, and that the PCI bus gets busy for long enough that an sf device can't refill its transmit buffer in time to avoid mid-frame transmission pauses. An increase in maximum disk transfer size might increase the likelihood of the disk (or the disk plus other very active NICs) starving the sf device of PCI transfers long enough to cause a transmit underrun. Poor handling of transmit underrun (due to a rare combination of circumstances) might result in a panic somewhat later.
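To put rough numbers on the underrun argument: here is a back-of-the-envelope sketch, assuming a 100 Mb/s link and a simplified model in which the NIC starts sending once the Tx threshold is buffered and underruns if the buffer empties before more data arrives over PCI. The 384-byte figure is taken from the quoted message; everything else is an assumption for illustration, not a description of the actual sf hardware.

```python
def drain_time_us(buffered_bytes, link_mbps):
    """Microseconds the wire needs to send `buffered_bytes` at `link_mbps`."""
    # bits divided by (megabits per second) gives microseconds
    return buffered_bytes * 8 / link_mbps

LINK_MBPS = 100      # assumption: Fast Ethernet link
TX_THRESHOLD = 384   # bytes, from "increasing Tx threshold to 384 bytes"
MAX_FRAME = 1518     # bytes, maximum standard Ethernet frame including FCS

# Worst case in this model: no further data arrives after transmission starts,
# so the bus may stall for at most the time it takes to drain the threshold.
stall_budget_us = drain_time_us(TX_THRESHOLD, LINK_MBPS)
frame_time_us = drain_time_us(MAX_FRAME, LINK_MBPS)

print(f"bus stall budget at start of frame: {stall_budget_us:.2f} us")
print(f"time on the wire for a full frame:  {frame_time_us:.2f} us")
```

Under these assumptions the bus has only about 31 microseconds of slack at the start of a frame that takes about 121 microseconds to send, which is why raising the Tx threshold (as the driver does) buys more tolerance for a busy bus.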

                          jjj

                          Thanks for the replies. So far no crash on the new hardware (Dell AMD x64), but we'll see after we get through the weekend.

                          I did notice however TX underrun issues are happening on the new hardware. The interesting part is we're still using an Intel quad-port NIC (dc0-3), but it's slightly different than before. Also, we're using a 3com NIC (xl0), as the onboard Broadcom NIC wouldn't pass traffic at all.

                          Here's a screenshot of the errors the current firewall is getting.

                          @wallabybob: It appears that the watchdog and Tx underrun messages do show up before the panic. We'll keep an eye on that now. If that is indeed the case, then our NICs might need to be replaced. Notice in the screenshot that both the Intel and 3com are having errors. I'm guessing it's because they're both the same age (i.e. old).

                          @jimp: does that link apply since our NICs are dcX?

                            jimp Rebel Alliance Developer Netgate

                            The old dc cards were notoriously crappy, even when new. They were DEC chips, and iirc only rebadged as Intel; they aren't really "proper" Intel cards. The good ones use the fxp/em/igb drivers. The xl cards have been ok, but are showing their age.

                            I'm not sure I'd trust anything that old to a decent workload.

                              wallabybob

                              @jimp:

                              The old dc cards were notoriously crappy, even when new. They were DEC chips, and iirc only rebadged as Intel,

                              For a while Intel sold them as Intel parts after they took over the DEC chip business in the 1990s.

                              @jjj:

                              So far no crash on the new hardware (Dell AMD x64), but we'll see after we get through the weekend.

                              I did notice however TX underrun issues are happening on the new hardware. The interesting part is we're still using an Intel quad-port NIC (dc0-3), but it's slightly different than before. Also, we're using a 3com NIC (xl0), as the onboard Broadcom NIC wouldn't pass traffic at all.

                              So you have five active NICs. Are they mostly pretty busy? If the box you are running this in has another PCI bus (unlikely if it's a desktop, possible if it's a server), you might get fewer Tx underruns if you move one of the cards to the other PCI bus and put the heaviest load on the xl0 interface (to try to somewhat balance the traffic on xl0 against the traffic on the dcX interfaces). Alternatively, if the box has a PCIe slot, you might reduce the Tx underruns by purchasing a PCIe NIC and moving the heaviest traffic to it. My suspicion is that you might be trying to pass enough traffic to saturate the PCI bus at times.
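The saturation suspicion can be sanity-checked with simple arithmetic. This is a rough sketch under stated assumptions: a single shared 32-bit, 33 MHz PCI bus, five 100 Mb/s ports, and every port receiving and transmitting at line rate at the same moment (a worst case, not a measurement of this firewall). The function names are hypothetical.

```python
def pci_peak_mbps(width_bits=32, clock_mhz=33.33):
    """Theoretical peak of a conventional PCI bus in Mb/s (assumed 32-bit, 33 MHz)."""
    return width_bits * clock_mhz

def bus_load_mbps(port_rates_mbps):
    """Worst-case DMA load: each port receives and transmits at its line rate,
    so its traffic crosses the bus twice (once inbound, once outbound)."""
    return sum(2 * rate for rate in port_rates_mbps)

ports = [100] * 5                 # assumption: dc0-3 plus xl0, all at 100 Mb/s
load = bus_load_mbps(ports)       # 1000 Mb/s in this worst case
peak = pci_peak_mbps()            # about 1067 Mb/s theoretical
utilization = load / peak

print(f"worst-case load: {load} Mb/s of {peak:.0f} Mb/s peak ({utilization:.0%})")
```

Sustained PCI throughput is lower than the theoretical peak because of arbitration and transaction overhead, and the disk controller shares the same bus, so a figure this close to the peak leaves little headroom. Moving the busiest traffic to a PCIe NIC takes that load off the shared bus.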

                                jjj

                                All of the NICs are on PCI slots. I just bought the 4-port NIC. It'll plug into the 16x slot on the PC.

                                Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.