Netgate Discussion Forum

[v2.3 & v2.4] Kernel crash with Fatal trap 12: page fault while in kernel mode

General pfSense Questions
35 Posts 10 Posters 32.4k Views
  • C
    CDuv
    last edited by Nov 13, 2016, 2:11 PM

    OK, I'll try other versions: I have other hardware.

    I guess testing v2.3.3 wouldn't do any better (since I doubt this unknown bug would get fixed).

    I'd rather try v2.4 instead of v2.2 (to avoid losing any feature v2.3 brought): would my v2.3.2-p1 configuration file be accepted on v2.4?

    • C
      CDuv
      last edited by Nov 14, 2016, 10:29 AM

      Should I disable "Flow Control" (as the Wiki says)?

      • W
        w0w
        last edited by Nov 14, 2016, 7:39 PM

        Yes, 2.4 can use a backup config from previous versions. You can also try any other settings you find, not only flow control.

        • C
          CDuv
          last edited by Nov 15, 2016, 9:22 AM

          So, I tried 2.4.0-BETA v20161113-2326 (pfSense-CE-memstick-serial-2.4.0-BETA-amd64-20161113-2326), in 15 hours it failed twice (9h and 15h later).

          When I logged in to the webConfigurator to retrieve the first crash report, I got it fine:

          Crash report begins.  Anonymous machine information:

          amd64
          11.0-RELEASE-p3
          FreeBSD 11.0-RELEASE-p3 #180 8fb831d(RELENG_2_4): Sun Nov 13 23:31:20 CST 2016    root@buildbot2.netgate.com:/builder/ce/tmp/obj/builder/ce/tmp/FreeBSD-src/sys/pfSense

          Crash report details:

          Filename: /var/crash/bounds
          1

          Filename: /var/crash/info.0
          Dump header from device: /dev/ada0s1b
            Architecture: amd64
            Architecture Version: 2
            Dump Length: 580517888
            Blocksize: 512
            Dumptime: Tue Nov 15 04:00:16 2016
            Hostname: pfsensebox.example.com
            Magic: FreeBSD Kernel Dump
            Version String: FreeBSD 11.0-RELEASE-p3 #180 8fb831d(RELENG_2_4): Sun Nov 13 23:31:20 CST 2016
              root@buildbot2.netgate.com:/builder/ce/tmp/obj/builder/ce/tmp/FreeBSD-src/sys/pfSense
            Panic String: page fault
            Dump Parity: 1903556642
            Bounds: 0
            Dump Status: good

          Filename: /var/crash/info.last
          Dump header from device: /dev/ada0s1b
            Architecture: amd64
            Architecture Version: 2
            Dump Length: 580517888
            Blocksize: 512
            Dumptime: Tue Nov 15 04:00:16 2016
            Hostname: pfsensebox.example.com
            Magic: FreeBSD Kernel Dump
            Version String: FreeBSD 11.0-RELEASE-p3 #180 8fb831d(RELENG_2_4): Sun Nov 13 23:31:20 CST 2016
              root@buildbot2.netgate.com:/builder/ce/tmp/obj/builder/ce/tmp/FreeBSD-src/sys/pfSense
            Panic String: page fault
            Dump Parity: 1903556642
            Bounds: 0
            Dump Status: good

          Filename: /var/crash/minfree
          2048

          But while I was sending the report to the developers, the box crashed a second time. I don't want to send that second report (doing so might crash the system again), but I captured this on the serial console:

          Enter an option:
          Message from syslogd@pfsensebox at Nov 15 10:01:15 …
          pfsensebox php-fpm[84587]: /index.php: Successful login for user 'admin' from: 10.0.1.53
          panic: sbsndptr: sockbuf 0xfffff8010d811518 and mbuf 0xfffff8010ddc6000 clashing
          cpuid = 6
          Uptime: 6h0m53s
          Dumping 567 out of 8135 MB: (CTRL-C to abort) ..3%..12%..23%..32%..43%..51%..63%..71%..82%..91%
          Dump complete
                                                                                      99
          TAB Key on Remote Keyboard To Entry Setup Menu
          MB-7551 Ver.AE0 03/28/2014
          Version 2.16.1242. Copyright (C) 2013 American Megatrends, Inc.
          Press <DEL> or <ESC> to enter setup.

          (.. many empty lines ..)

          |oading /boot/defaults/loader.conf serial port                                 
          /IOS drive C: is disk0        /boot/config: -S115200 -D
          BIOS 619kB/2081240kB available memory

          FreeBSD/x86 bootstrap loader, Revision 1.1
          (root@buildbot2.netgate.com, Wed Aug  3 08:04:25 CDT 2016)

          (.. many empty lines ..)

          /boot/entropy size=0x100017b93e]a0 |     
          Booting… _/ _|  ___ _ __  ___  ___   
          Copyright (c) 1992-2016 The FreeBSD Project.
          Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
            | .
          /The Regents of the University of California. All rights reserved.
          FreeBSD is a registered trademark of The FreeBSD Foundation.
          FreeBSD 11.0-RELEASE-p3 #180 8fb831d(RELENG_2_4): Sun Nov 13 23:31:20 CST 2016
              root@buildbot2.netgate.com:/builder/ce/tmp/obj/builder/ce/tmp/FreeBSD-src/sys/pfSense amd64

          I have a 567MB file "/var/crash/vmcore.0".
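For anyone comparing several of these reports, the `info.N` header quoted above is a plain `Key: value` listing and is easy to read programmatically. Below is a minimal Python sketch, assuming the file has been copied off the box and keeps the layout shown in this post. The function name `parse_dump_info` and the `SAMPLE` string are illustrative only; they are not part of pfSense or FreeBSD.

```python
# Sample copied from the crash report above (a subset of the header lines).
SAMPLE = """\
Dump header from device: /dev/ada0s1b
  Architecture: amd64
  Dump Length: 580517888
  Blocksize: 512
  Dumptime: Tue Nov 15 04:00:16 2016
  Version String: FreeBSD 11.0-RELEASE-p3 #180 8fb831d(RELENG_2_4): Sun Nov 13 23:31:20 CST 2016
    root@buildbot2.netgate.com:/builder/ce/tmp/obj/builder/ce/tmp/FreeBSD-src/sys/pfSense
  Panic String: page fault
  Dump Status: good
"""


def parse_dump_info(text):
    """Turn the 'Key: value' lines of a crash info file into a dict."""
    fields = {}
    last_key = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Split on the first ": " only, so values may contain colons.
        key, sep, value = line.partition(": ")
        if sep:
            fields[key] = value
            last_key = key
        elif last_key is not None:
            # No "key: " prefix: treat it as a wrapped continuation line.
            fields[last_key] += " " + line
    return fields


info = parse_dump_info(SAMPLE)
dump_mib = int(info["Dump Length"]) / 2**20  # bytes to MiB
print(info["Panic String"], info["Dump Status"], round(dump_mib, 1))
```

Collecting the `Panic String` and `Dumptime` fields this way across `info.0`, `info.1`, and so on gives a quick timeline of the crashes without opening each vmcore.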

          • C
            CDuv
            last edited by Nov 15, 2016, 1:01 PM

            Another crash:

            Fatal trap 12: page fault while in kernel mode
            cpuid = 5; apic id = 0a
            fault virtual address  = 0x78
            fault code              = supervisor read data, page not present
            instruction pointer    = 0x20:0xffffffff80d6632c
            stack pointer          = 0x28:0xfffffe01ec7d9930
            frame pointer          = 0x28:0xfffffe01ec7d9990
            code segment            = base 0x0, limit 0xfffff, type 0x1b
                                    = DPL 0, pres 1, long 1, def32 0, gran 1
            processor eflags        = interrupt enabled, resume, IOPL = 0
            current process        = 12 (irq289: igb4:que 5)
            trap number            = 12
            panic: page fault
            cpuid = 5
            Uptime: 1h33m1s
            Dumping 568 out of 8135 MB:..3%..12%..23%..31%..43%..51%..62%..71%..82%..91%
            Dump complete
                                                                                        99
            TAB Key on Remote Keyboard To Entry Setup Menu
            MB-7551 Ver.AE0 03/28/2014
            Version 2.16.1242. Copyright (C) 2013 American Megatrends, Inc.
            Press <DEL> or <ESC> to enter setup.

            (.. many empty lines ..)

            |oading /boot/defaults/loader.conf serial port                                 
            /IOS drive C: is disk0                                                          BIOS 619kB/2081240kB available memory

            FreeBSD/x86 bootstrap loader, Revision 1.1
            (root@buildbot2.netgate.com, Wed Aug  3 08:04:25 CDT 2016)
            \

            (.. many empty lines ..)

            syms=[0x8+0x17b620+0x8+0x17b93e]a0 data=0xaad7b8+0x4c60e8
            /boot/entropy size=0x1000 __  ___  ___   
            Booting…|___ \ / _ \ ' / __|/ _ \   
            Copyright (c) 1992-2016 The FreeBSD Project.
            Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
              |_|  The Regents of the University of California. All rights reserved.
            FreeBSD is a registered trademark of The FreeBSD Foundation.             
            FreeBSD 11.0-RELEASE-p3 #180 8fb831d(RELENG_2_4): Sun Nov 13 23:31:20 CST 2016
                root@buildbot2.netgate.com:/builder/ce/tmp/obj/builder/ce/tmp/FreeBSD-src/sys/pfSense amd64

            I will now try v2.3.1 (I think I recall that it was not crashing that much at that time).

            2.4.0-BETA crashed a few seconds after I changed the Virtual IP settings (to disable the box running v2.4.0-BETA and switch production to the box running v2.3.1).
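One detail in the trap report above is worth pulling out: the fault virtual address is 0x78. An address that falls inside the first page of memory commonly points to a NULL pointer being dereferenced at a small structure offset, rather than a random bad address. The following is a rough Python sketch of that check, assuming the trap text is pasted in as shown. The helper names `parse_trap` and `looks_like_null_deref` are made up for illustration, and the heuristic only hints at the class of bug; it does not identify the faulty code.

```python
# Lines copied from the trap report above (a subset of the fields).
TRAP_REPORT = """\
Fatal trap 12: page fault while in kernel mode
cpuid = 5; apic id = 0a
fault virtual address  = 0x78
fault code              = supervisor read data, page not present
instruction pointer    = 0x20:0xffffffff80d6632c
current process        = 12 (irq289: igb4:que 5)
"""

PAGE_SIZE = 4096  # base page size on amd64


def parse_trap(text):
    """Collect the 'name = value' lines of a trap report into a dict."""
    fields = {}
    for line in text.splitlines():
        if ";" in line:
            # The combined "cpuid = ...; apic id = ..." line is not needed here.
            continue
        key, sep, value = line.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def looks_like_null_deref(fields):
    """Heuristic: a fault address below one page suggests NULL + offset."""
    addr = int(fields["fault virtual address"], 16)
    return addr < PAGE_SIZE


trap = parse_trap(TRAP_REPORT)
print(trap["current process"], looks_like_null_deref(trap))
```

The `current process` field is also useful when comparing reports: here the fault happened in an interrupt thread of the igb4 NIC queue, which fits with the crashes showing up under network load.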

            • W
              w0w
              last edited by Nov 15, 2016, 7:32 PM

              I could be wrong, but this looks mostly like an ECC memory failure. Please replace your memory and test again.

              • C
                CDuv
                last edited by Nov 16, 2016, 2:08 AM Nov 15, 2016, 9:47 PM

                It occurs on 2 identical brand-new boxes, so I doubt faulty hardware is the cause (it is still possible, but unlikely).

                • W
                  w0w
                  last edited by Nov 16, 2016, 6:06 PM

                  Two boxes? Does that mean you don't share any pieces of hardware between the two machines when testing? For example, a hard drive/CF or USB stick?

                  • C
                    CDuv
                    last edited by Nov 16, 2016, 10:30 PM

                    It means I have 2 identical servers with the exact same model of components (1x SSD, 1x RAM stick) in each one.

                    v2.3.2 was tested on both servers.
                    v2.4.0-BETA (which performs a bit better: only 2-3 crashes per day) was only tested on box 2.
                    v2.3.1 was only tested on box 1.

                    I never swapped any part (be it RAM or SSD).
                    I used a couple of USB memory sticks for the installations: they are actually the only pieces of hardware that were shared between the servers.

                    • C
                      CDuv
                      last edited by Nov 17, 2016, 2:29 PM

                      Fun fact: when I unplug network cables (to swap production from one server to the other for the tests), the server crashes…

                      • W
                        w0w
                        last edited by Nov 17, 2016, 3:58 PM

                        Are there some BIOS/UEFI options regarding OS installation compatibility?
                        Did you try to install 2.4 on ZFS with GPT-UEFI (it should work on the latest builds)? I am not sure, but maybe it's related to power saving or something else you can find in the BIOS or UEFI settings.
                        It would be good to disable all CPU power saving modes except common C1 mode for testing purposes.

                        • W
                          W4RH34D
                          last edited by Nov 17, 2016, 4:11 PM

                          Bad cable? Faulty plug on the other end with out-of-spec voltages?

                          Did you really check your cables?

                          • C
                            CDuv
                            last edited by Nov 17, 2016, 5:31 PM Nov 17, 2016, 4:48 PM

                            @w0w:

                            Are there some BIOS/UEFI options regarding OS installation compatibility?
                            Did you try to install 2.4 on ZFS with GPT-UEFI (it should work on the latest builds)? I am not sure, but maybe it's related to power saving or something else you can find in the BIOS or UEFI settings.
                            It would be good to disable all CPU power saving modes except common C1 mode for testing purposes.

                            pfSense 2.4 was installed on MBR: I'll try GPT…

                            I have no access to the BIOS on this appliance, which comes pre-prepared for pfSense (bought from a local appliance reseller): the serial console does not allow me to enter the BIOS (I see the "Press to enter…" prompt, but pressing the [Del] key does nothing). Update: got it working.
                            The BIOS says the following about CPU:

                            • EIST (GV3) : Disable
                            • P-state Coordination : Package (cannot be modified)
                            • TM1 : Enable
                            • TM2 Mode : Adaptive Throttling (cannot be modified)
                            • CPU C State : Disable
                            • Enhanced Halt State : Disable (cannot be modified)
                            • ACP C2 : Disable (cannot be modified)
                            • Monitor/Mwait : Enable (cannot be modified)
                            • L1 Prefetcher : Enable
                            • L2 Prefetcher : Enable
                            • Max CPUID Value Limit : Disable
                            • Execute Disable Bit : Enable
                            • AES-NI : Enable
                            • Turbo : Enable (cannot be modified)
                            • Active Processor Core : All

                            However, in the pfSense configuration, PowerD is disabled, and the AC Power, Battery Power and Unknown Power settings are all set to "Hiadaptive" (I did not touch these after the 2.4.0-BETA installation).

                            Is it advised to disable power saving modes in the BIOS/UEFI?

                            • B
                              beppo
                              last edited by Nov 17, 2016, 7:42 PM

                              Fatal trap 12: page fault while in kernel mode
                              cpuid = 3; apic id = 06
                              fault virtual address	= 0x5e00000000
                              fault code		= supervisor read data, page not present
                              instruction pointer	= 0x20:0xffffffff80d80b00
                              stack pointer	        = 0x28:0xfffffe00a1644b60
                              frame pointer	        = 0x28:0xfffffe00a1644b80
                              code segment		= base 0x0, limit 0xfffff, type 0x1b
                              			= DPL 0, pres 1, long 1, def32 0, gran 1
                              processor eflags	= interrupt enabled, resume, IOPL = 0
                              current process		= 75463 (pfctl)
                              

                              Got the same error once a day with the same motherboard on 2.3.x. The BIOS settings haven't been changed since the first install of 2.x. I don't think it is a hardware issue, but I also have no clue how to solve it other than installing 2.2 again.

                              • C
                                CDuv
                                last edited by Nov 18, 2016, 12:07 AM Nov 17, 2016, 10:00 PM

                                Yeah! I am not alone!

                                Is your network architecture partially similar to mine? Do you have a lot of users?

                                If downgrading to 2.2 works around it, that is a lead for debugging and a possible fix…

                                I see bug #4689 (Panic/Crash "sbflush_internal: cc 4294967166 || mb 0 || mbcnt 0") is similar but is marked as resolved for 2.3…
                                The original bug report on the FreeBSD bug tracker is still open, and someone reported running into the issue a month ago.

                                • W
                                  w0w
                                  last edited by Nov 18, 2016, 4:22 AM Nov 18, 2016, 4:17 AM

                                  I don't see any options you must change in the BIOS. All the options you posted are OK. Just for testing purposes, enable PowerD and set it to maximum performance. Make sure you do not have polling enabled, and enable all the settings shown below (see picture).
                                  If that does not help, then install a 2.2.x version.

                                  offload.jpg

                                  • C
                                    CDuv
                                    last edited by Nov 18, 2016, 12:00 PM

                                    I had "Hardware Checksum Offloading" unchecked and both "Hardware TCP Segmentation Offloading" & "Hardware Large Receive Offloading" checked.

                                    I'll check "Hardware Checksum Offloading" and set PowerD to maximum…

                                    • C
                                      CDuv
                                      last edited by Nov 21, 2016, 2:35 PM Nov 21, 2016, 12:33 PM

                                      Update:
                                      I tested OPNsense (v16.7.8) under the same load and configuration and it works (3 days now)…
                                      At the same time, the other server (on pfSense 2.4.0), which is up but not used (no traffic directed at it), did not crash either, which indicates the crashes are load/traffic related.

                                      Hope this helps to pinpoint the exact cause of the issue.

                                      • W
                                        w0w
                                        last edited by Nov 21, 2016, 7:44 PM

                                        I think you should create a bug report on Redmine. The crashes I have had on different hardware also happened under heavy traffic. It could be driver related or down to the NIC hardware revision/firmware.

                                        • W
                                          w0w
                                          last edited by Nov 23, 2016, 6:42 PM

                                          Any news?
