Netgate Discussion Forum
    • Categories
    • Recent
    • Tags
    • Popular
    • Users
    • Search
    • Register
    • Login

    Random crashes "Fatal trap 12: page fault while in kernel mode"

    Scheduled Pinned Locked Moved General pfSense Questions
    15 Posts 2 Posters 5.3k Views
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • stephenw10S
      stephenw10 Netgate Administrator
      last edited by

      Hmm, might need a console log to know more.
      Reviewing this I did actually test a PPPoE server on igc when we were testing the 6100 internally. I never saw an issue but I only tested with a few clients so it could well be a condition only created by multiple simultaneous connections.

      Steve

      JeGrJ 1 Reply Last reply Reply Quote 0
      • JeGrJ
        JeGr LAYER 8 Moderator @stephenw10
        last edited by JeGr

        @stephenw10 said in Random crashes "Fatal trap 12: page fault while in kernel mode":

        Hmm, might need a console log to know more.

        Tell me what you need and I'll try to get it :) I'm still on "potential race condition" or bug line but I'll be happy to shed some light on this either way. We checked for a core dump and as I thought - none is written, so that's a no I'm afraid.
        Also strikes me odd, that after the mass-reconnection to the new PPPoE Server interface (ix) that immediatly triggered a trap. That's why I was thinking race cond. with multiple reconnects happening at once but that's hard to come by and is annoying the users so nothing one could "easily reproduce" in production. Still wondering why it doesn' happen more frequent but seemingly random.

        Cheers
        \jens

        Don't forget to upvote ๐Ÿ‘ those who kindly offered their time and brainpower to help you!

        If you're interested, I'm available to discuss details of German-speaking paid support (for companies) if needed.

        1 Reply Last reply Reply Quote 0
        • stephenw10S
          stephenw10 Netgate Administrator
          last edited by

          Mmm, if neither of those devices have an SSD then there is probably no SWAP partition configured and that's required for a crash dump.
          The panic and backtrace will still be shown on the console though so you can get it bu hooking the console up to something and logging that output until it panics.

          I agree it seems likely to be a race condition caused by multiple simultaneous client connections. Unclear what in yet though.
          I re-instated my PPPoE server on a 6100 and used a 4100 as the client, both via igc NICs. So far it's been solid but I'm unlikely to be able to trigger it.

          Steve

          JeGrJ 1 Reply Last reply Reply Quote 0
          • JeGrJ
            JeGr LAYER 8 Moderator @stephenw10
            last edited by JeGr

            @stephenw10 Coming back to this as the problem persists, the customer had time to do, what TAC support told me.

            He re-installed the 4100 that originally should be in place there from scratch. Imported config and switched back from the 6100 to the 4100 on site. Basically the same as above happened again after switching when multiple PPPoE requests came in. Now a few days later and a few reboots, too (sadly it did nothing to fix the problem as assumed), we finally have a coredump to share:

            textdump.tar.0
            info.0

            So at least the reinstall seemed to fix the "no core dump" situation at least

            Don't forget to upvote ๐Ÿ‘ those who kindly offered their time and brainpower to help you!

            If you're interested, I'm available to discuss details of German-speaking paid support (for companies) if needed.

            1 Reply Last reply Reply Quote 0
            • stephenw10S
              stephenw10 Netgate Administrator
              last edited by

              Ok, well it's not familiar to me but it's an unusual setup. I'll see if anyone else here has seen it.

              1 Reply Last reply Reply Quote 1
              • stephenw10S
                stephenw10 Netgate Administrator
                last edited by

                One of our developers is looking at this. I have opened a bug report for it:
                https://redmine.pfsense.org/issues/13210

                Steve

                JeGrJ 1 Reply Last reply Reply Quote 1
                • JeGrJ
                  JeGr LAYER 8 Moderator @stephenw10
                  last edited by

                  @stephenw10 Thanks Steve, will track it there

                  Don't forget to upvote ๐Ÿ‘ those who kindly offered their time and brainpower to help you!

                  If you're interested, I'm available to discuss details of German-speaking paid support (for companies) if needed.

                  1 Reply Last reply Reply Quote 0
                  • stephenw10S
                    stephenw10 Netgate Administrator
                    last edited by

                    This fix is in 2.7 snapshots now, are you able to test that?

                    Steve

                    JeGrJ 1 Reply Last reply Reply Quote 0
                    • JeGrJ
                      JeGr LAYER 8 Moderator @stephenw10
                      last edited by

                      @stephenw10 We're already in touch with both, Netgate TAC and the customer to test the fix. We updated the location in question yesterday evening. Besides a problem after the upgrade to 22.05RC of the device not cleanly restarting/booting after upgrade (had that same problem thrice already) after a slight "powerloss-restart" procedure ;) it rebootet and upgraded fine. As far as I can say up intil now we have ~10h of no lockup, freeze or panic.

                      Not really in the clear yet as we had some big "looking good" windows when testing, too, but it seems to look promising!

                      knocks on wood

                      Don't forget to upvote ๐Ÿ‘ those who kindly offered their time and brainpower to help you!

                      If you're interested, I'm available to discuss details of German-speaking paid support (for companies) if needed.

                      JeGrJ 1 Reply Last reply Reply Quote 1
                      • JeGrJ
                        JeGr LAYER 8 Moderator @JeGr
                        last edited by

                        To add it here: Customer has updated to a newer RC-snapshot as the earlier got him a few report emails of the box for getting packet loss sometimes (not that often) and he wanted to check if that would be fixed, too.
                        On the latest RC snapshot thus far no problems to report. No crash dump, no freeze, no panic. Also the packet loss seems gone too :) So happy on both fronts for now - makes me happy to report that.

                        Great job everyone involved. Shoutout to TAC support for their help and staying on the topic, too!

                        Cheers
                        \jens

                        Don't forget to upvote ๐Ÿ‘ those who kindly offered their time and brainpower to help you!

                        If you're interested, I'm available to discuss details of German-speaking paid support (for companies) if needed.

                        1 Reply Last reply Reply Quote 1
                        • First post
                          Last post
                        Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.