Netgate Discussion Forum

"Page fault while in kernel mode" on APU2 after bios/coreboot upgrade

41 Posts 8 Posters 3.3k Views
  • C
    CS
    last edited by Sep 11, 2020, 4:50 AM

    Hey all,

    I recently upgraded my BIOS on an APU2 board from coreboot v4.0.x to v4.12.0.4. That's the latest version, downloaded from https://pcengines.github.io. Since then, pfSense 2.4.5-RELEASE-p1 has crashed once a day at a random time and rebooted automatically.

    According to the last two crash reports, there is a "page fault while in kernel mode".

    Let me know if you have experienced something similar and if you have any ideas about how to troubleshoot and fix this issue.

    Dump info:

    Architecture: amd64
    Architecture Version: 1
    Dump Length: 72704
    Blocksize: 512
    Magic: FreeBSD Text Dump
    Version String: FreeBSD 11.3-STABLE #243 abf8cba50ce(RELENG_2_4_5): Tue Jun  2 17:53:37 EDT 2020
      root@buildbot1-nyi.netgate.com:/build/ce-crossbuild-245/obj/amd64/YNx4Qq3j/build/ce-crossbuild-245/source
    Panic String: page fault
    Dump Parity: 447612467
    Bounds: 0
    Dump Status: good
    

    Crash 1:

    Fatal trap 12: page fault while in kernel mode
    cpuid = 0; apic id = 00
    fault virtual address	= 0x0
    fault code		= supervisor read instruction, page not present
    instruction pointer	= 0x20:0x0
    stack pointer	        = 0x28:0xfffffe011f4ee800
    frame pointer	        = 0x28:0xfffffe011f4ee8d0
    code segment		= base 0x0, limit 0xfffff, type 0x1b
    			= DPL 0, pres 1, long 1, def32 0, gran 1
    processor eflags	= interrupt enabled, resume, IOPL = 0
    current process		= 54266 (sh)
    trap number		= 12
    panic: page fault
    cpuid = 0
    KDB: enter: panic
    

    Crash 2:

    Fatal trap 12: page fault while in kernel mode
    cpuid = 0; apic id = 00
    fault virtual address	= 0xffffffffffffffff
    fault code		= supervisor write data, page not present
    instruction pointer	= 0x20:0xffffffff812504ca
    stack pointer	        = 0x28:0xfffffe011f5c0800
    frame pointer	        = 0x28:0xfffffe011f5c08d0
    code segment		= base 0x0, limit 0xfffff, type 0x1b
    			= DPL 0, pres 1, long 1, def32 0, gran 1
    processor eflags	= interrupt enabled, resume, IOPL = 0
    current process		= 86485 (ls)
    trap number		= 12
    panic: page fault
    cpuid = 1
    KDB: enter: panic
    
    • S
      stephenw10 Netgate Administrator
      last edited by Sep 11, 2020, 2:03 PM

      I'd need to see the backtrace to compare, but since those faults are in different processes, the backtraces will be different. That implies a high likelihood of a hardware issue, probably RAM. Did that BIOS update change the memory handling at all?
      Does going back to an earlier version correct it?

      Steve
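One way to make this comparison concrete is sketched below in Python. It is only an illustration, not a pfSense or FreeBSD tool: `parse_trap` and `differing_fields` are hypothetical helper names, and the input is a few lines copied from the two crash reports in the first post. It pulls out the `name = value` fields and lists which ones differ between the crashes.

```python
import re

# A few lines copied from the two trap reports in the first post.
CRASH_1 = """\
fault virtual address = 0x0
fault code = supervisor read instruction, page not present
instruction pointer = 0x20:0x0
current process = 54266 (sh)
"""

CRASH_2 = """\
fault virtual address = 0xffffffffffffffff
fault code = supervisor write data, page not present
instruction pointer = 0x20:0xffffffff812504ca
current process = 86485 (ls)
"""

# Matches "some field name = value" on a single line.
FIELD_RE = re.compile(r"^\s*([a-z ]+?)\s*=\s*(.+?)\s*$", re.MULTILINE)


def parse_trap(text):
    """Return the 'name = value' lines of a trap report as a dict."""
    return {name: value for name, value in FIELD_RE.findall(text)}


def differing_fields(a, b):
    """Return the sorted names of fields present in both dicts with different values."""
    return sorted(key for key in a.keys() & b.keys() if a[key] != b[key])


trap_1 = parse_trap(CRASH_1)
trap_2 = parse_trap(CRASH_2)
for field in differing_fields(trap_1, trap_2):
    print(f"{field}: {trap_1[field]!r} vs {trap_2[field]!r}")
```

For these two reports, every extracted field differs (process, fault code, fault address, instruction pointer). That fits the "different every time" pattern, but it does not replace the backtraces, which would still be needed to say more.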

      • F
        fireodo @CS
        last edited by Sep 11, 2020, 2:48 PM

        @CS said in "Page fault while in kernel mode" on APU2 after bios/coreboot upgrade:

        I recently upgraded my BIOS on an APU2 board from coreboot v4.0.x to v4.12.0.4.

        Besides the many other changes between the legacy BIOS line (4.0.x) and mainline (4.12.0.x), mainline has Core Performance Boost enabled by default. This COULD be what makes slightly faulty RAM act up.
        You could disable it in the BIOS and see what happens.
        coreboot-apuspare.png

        Regards,
        fireodo

        Kettop Mi4300YL CPU: i5-4300Y @ 1.60GHz RAM: 8GB Ethernet Ports: 4
        SSD: SanDisk pSSD-S2 16GB (ZFS) WiFi: WLE200NX
        pfsense 2.7.2 CE
        Packages: Apcupsd Cron Iftop Iperf LCDproc Nmap pfBlockerNG RRD_Summary Shellcmd Snort Speedtest System_Patches.

        • C
          CS
          last edited by Sep 12, 2020, 4:43 AM

          Thank you @stephenw10 and @fireodo !

          I deactivated the "Core Performance Boost" option and I'm waiting to see what happens.

          Today it crashed several times before I changed the BIOS setting, and the faults were in a different process every time. I also got the following "spin lock held too long" error twice:

          MCA: Bank 1, Status 0x9400000000000151
          MCA: Global Cap 0x0000000000000106, Status 0x0000000000000000
          MCA: Vendor "AuthenticAMD", ID 0x730f01, APIC ID 1
          MCA: CPU 1 COR ICACHE L1 IRD error
          MCA: Address 0x4eb660
          spin lock 0xffffffff83517de8 (smp rendezvous) held by 0xfffff80008b65620 (tid 100132) too long
          timeout stopping cpus
          panic: spin lock held too long
          cpuid = 1
          KDB: enter: panic
          

          and

          spin lock 0xffffffff83517de8 (smp rendezvous) held by 0xfffff80004acd000 (tid 100059) too long
          timeout stopping cpus
          panic: spin lock held too long
          cpuid = 1
          KDB: enter: panic
          
          • C
            CS
            last edited by Sep 12, 2020, 4:45 AM

            If it fails again I'll run a memtest and possibly downgrade to an older version of coreboot. By the way, my pfSense config is an old one that I have kept while upgrading to newer versions.

            • F
              fireodo @CS
              last edited by Sep 12, 2020, 7:13 AM

              @CS said in "Page fault while in kernel mode" on APU2 after bios/coreboot upgrade:

              If it fails again I'll run a memtest and possibly downgrade to an older version of coreboot. By the way, my pfSense config is an old one that I have kept while upgrading to newer versions.

              Memtest is a good idea, and maybe a disk check too!

              Good Weekend,
              fireodo

              • S
                stephenw10 Netgate Administrator
                last edited by Sep 12, 2020, 4:16 PM

                Yup, definitely try memtest if you can. That MCA error can only be hardware related, so I would guess it has something to do with the core boost if it doesn't happen on legacy BIOS versions. I haven't dug deep enough here to find out whether that changes the RAM clock. I don't have an APU new enough to support that.

                Steve

                • D
                  DaddyGo @CS
                  last edited by DaddyGo Sep 12, 2020, 6:05 PM (posted Sep 12, 2020, 6:04 PM)

                  @CS said in "Page fault while in kernel mode" on APU2 after bios/coreboot upgrade:

                  possibly downgrade to an older version of coreboot.

                  Hi,

                  A step back (downgrade) is unnecessary; APU2-based boxes work perfectly with the new BIOS.

                  b4930277-e015-43a5-a655-4a8ff61a6fcc-image.png

                  The problem may be that you stayed on a "legacy BIOS" version for a long time (I don't understand why) and have now taken a big step forward on top of an old pfSense install.

                  My suggestion is a full backup followed by a fresh pfSense installation with the latest BIOS😉 (v4.12.0.4)

                  Important:
                  After flashing the BIOS, APU boards require a complete power outage (60-120 sec); a warm or cold reboot is not enough!

                  Cats bury it so they can't see it!
                  (You know what I mean if you have a cat)

                  • C
                    CS
                    last edited by Sep 12, 2020, 8:24 PM

                    Uptime: 16 hours with no crash yet, fingers-crossed. :)

                    Thanks @DaddyGo , I had done the complete power outage so that shouldn't be an issue here.

                    I agree that a fresh pfSense install with the latest BIOS would be ideal, but I'm keeping that as my last option for now. Ideally I wouldn't even restore my config and would do everything from scratch, but I'm not sure I'll have the time and patience for that.

                    Regarding your comment about the legacy BIOS version, honestly I didn't have a good reason to keep upgrading the BIOS while the device worked flawlessly with the latest pfSense releases. BIOS upgrades can sometimes cause issues, and I didn't have time to deal with those. I upgraded now because the device was relocated, and that's always a good opportunity to start fresh with the latest versions.

                    • C
                      CS @DaddyGo
                      last edited by Sep 12, 2020, 8:27 PM

                      @DaddyGo said in "Page fault while in kernel mode" on APU2 after bios/coreboot upgrade:

                      A step back (downgrade) is unnecessary; APU2-based boxes work perfectly with the new BIOS.

                      @DaddyGo can you please confirm if "Core Performance Boost" is currently enabled or disabled in your BIOS? For the record, I have Coreboot v4.12.0.4, not v4.12.0.3. Let me know how it goes when you upgrade.

                      • D
                        DaddyGo @CS
                        last edited by Sep 13, 2020, 11:56 AM

                        @CS said in "Page fault while in kernel mode" on APU2 after bios/coreboot upgrade:

                        can you please confirm if "Core Performance Boost" is currently enabled or disabled in your BIOS?

                        We have been using APU boards for many years, so we have a lot of experience with these MOBOs.
                        We’ve been through a lot of BIOS versions already. 😉

                        We moved away from the legacy BIOS line long ago, at the suggestion of pcEngines and 3mdeb.

                        We have used CPB for a long time, as it lets the first CPU core run at 1,400 MHz, which is good for OpenVPN.

                        CPB has been enabled since v4.9.0.2.

                        67a74a7a-39af-4767-b573-3acfe18f4ea5-image.png

                        You can check with: sysctl dev.cpu.0.freq_levels

                        Updating the BIOS is quite difficult because of known USB flash drive problems; the Kingston DT100 G3 is almost the only drive that works. I quickly bought the 16 GB and 32 GB models of it, as they are no longer available.

                        The sequence of operations is well described here; if you need help, I am happy to assist.
                        https://pcengines.ch/howto.htm#TinyCoreLinux

                        register for BIOS information here:
                        https://pcengines.github.io/
                        (you will receive a first-hand update via email)

                        493043bc-6dcc-42f9-acce-bd2c7f5f2509-image.png

                        btw:

                        Also, don't forget the Intel tweaks and the correct NIC configuration in loader.conf.local, for example:

                        legal.intel_ipw.license_ack=1
                        legal.intel_iwi.license_ack=1
                        hw.igb.rx_process_limit=-1
                        hw.igb.tx_process_limit=-1
                        hw.igb.rxd=1024
                        hw.igb.txd=1024
                        hw.igb.max_interrupt_rate=64000

                        and so on.

                        System tunables:
                        • disable EEE
                        • disable flow control
                        • kern.ipc.nmbclusters
                        • set net.inet.ip.redirect (enable the tryforward routing path for IPv4)

                        and similar things.

                        • C
                          CS @DaddyGo
                          last edited by Sep 17, 2020, 9:55 PM

                          @DaddyGo thanks a lot for your response.

                          For the record, the device has been working smoothly without any crashes for about 6 days after I disabled CPB. So that was definitely what caused the issue. I'll try to re-enable it and do some tuning in case this can be solved without having to keep CPB disabled or re-install pfSense from scratch. I'll provide updates about my progress on this thread for future reference.

                          • D
                            DaddyGo @CS
                            last edited by DaddyGo Sep 18, 2020, 9:11 AM (posted Sep 18, 2020, 9:02 AM)

                            @CS said in "Page fault while in kernel mode" on APU2 after bios/coreboot upgrade:

                            the device has been working smoothly without any crashes for about 6 days after I disabled CPB

                            This means that your problem is CPB dependent, but I really have not heard of anyone else having this problem in the long run.

                            CPB is not a required feature, but if it exists and can be enabled, why not use it?
                            For us, it brought a significant improvement in ExpVPN connections.

                            These links can also be useful:

                            https://teklager.se/en/knowledge-base/apu2-vpn-performance/
                            https://teklager.se/en/knowledge-base/apu2-1-gigabit-throughput-pfsense/
                            https://teklager.se/en/knowledge-base/

                            btw:
                            99% of pcEngines users use CPB, the forum is full of APU board descriptions, I think it's a good thing

                            • F
                              fireodo @DaddyGo
                              last edited by Sep 18, 2020, 9:18 AM

                              @DaddyGo said in "Page fault while in kernel mode" on APU2 after bios/coreboot upgrade:

                              btw:
                              99% of pcEngines users use CPB, the forum is full of APU board descriptions, I think it's a good thing

                              I have CPB too, and I have tested with and without it. There was no difference in pfSense behavior (besides the speed increase), but I think that the original poster's APU has RAM that is on the "limit", and the increased speed makes that RAM produce errors.
                              That's what I suppose.

                              Fine Weekend,
                              fireodo

                              • D
                                DaddyGo @fireodo
                                last edited by Sep 18, 2020, 9:33 AM

                                @fireodo said in "Page fault while in kernel mode" on APU2 after bios/coreboot upgrade:

                                but I think that the original poster's APU has RAM that is on the "limit", and the increased speed makes that RAM produce errors.

                                This is very possible....exhausted RAM

                                no matter how good the APU stuff is, 4GB of RAM was often on the "verge" for me

                                Don't forget @fireodo that 3mdeb (BIOS developers) has been activating RAM ECC for some time

                                so this should help with RAM errors

                                • F
                                  fireodo @DaddyGo
                                  last edited by Sep 18, 2020, 2:57 PM

                                  @DaddyGo said in "Page fault while in kernel mode" on APU2 after bios/coreboot upgrade:

                                  Don't forget @fireodo that 3mdeb (BIOS developers) has been activating RAM ECC for some time

                                  so this should help with RAM errors

                                  I know, but if the hardware is not OK (the RAM chips), then even ECC cannot compensate for that!

                                  • D
                                    DaddyGo @fireodo
                                    last edited by Sep 18, 2020, 3:33 PM

                                    @fireodo said in "Page fault while in kernel mode" on APU2 after bios/coreboot upgrade:

                                    Hardware is not OK

                                    That’s really true, and then we’ll see what the OP gets

                                    • C
                                      CS
                                      last edited by Oct 12, 2020, 9:23 PM

                                      @DaddyGo, @fireodo , @stephenw10

                                      Hey folks, let me provide an update here:

                                      • Memtest was completed without errors but pfSense kept crashing.
                                      • I upgraded coreboot to v4.12.0.5 but it kept crashing.
                                      • I reinstalled pfSense 2.4.5-RELEASE-p1 and restored my config but it kept crashing, which is something I was not expecting.
                                      • I kept the CPU Boost config option in my loader.conf.local and again disabled the "Core Performance Boost" option in the BIOS. It stopped crashing, and CPU Boost is still active:
                                      dev.cpu.0.temperature: 62.7C
                                      dev.cpu.0.cx_method: C1/hlt C2/io
                                      dev.cpu.0.cx_usage_counters: 24303377 0
                                      dev.cpu.0.cx_usage: 100.00% 0.00% last 1981us
                                      dev.cpu.0.cx_lowest: C1
                                      dev.cpu.0.cx_supported: C1/1/0 C2/2/400
                                      dev.cpu.0.freq_levels: 1400/-1 1200/-1 1000/-1
                                      dev.cpu.0.freq: 1400
                                      dev.cpu.0.%parent: acpi0
                                      dev.cpu.0.%pnpinfo: _HID=none _UID=0
                                      dev.cpu.0.%location: handle=\_PR_.P000
                                      dev.cpu.0.%driver: cpu
                                      dev.cpu.0.%desc: ACPI CPU
                                      

                                      Core Performance Boost is triggering this for some reason; it was crashing randomly, not when it was under load.
                                      Could anyone share their APU2 loader.conf.local file for reference? I'm wondering if I'm missing something obvious. I haven't done any tuning for years because it has been running smoothly with no issues.
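For anyone comparing their own sysctl output against the one above, here is a minimal Python sketch that parses the `dev.cpu.0.freq_levels` line. It assumes the usual cpufreq format of space-separated `MHz/milliwatts` pairs, with `-1` meaning the power figure is unknown; `parse_freq_levels` is a hypothetical helper name, and treating a 1400 MHz top level as the sign of boost is this thread's reading rather than something the code can verify.

```python
def parse_freq_levels(value):
    """Parse 'MHz/mW' pairs such as '1400/-1 1200/-1 1000/-1' into tuples."""
    levels = []
    for pair in value.split():
        mhz, milliwatts = pair.split("/")
        levels.append((int(mhz), int(milliwatts)))
    return levels


# Line copied from the sysctl output above.
SYSCTL_LINE = "dev.cpu.0.freq_levels: 1400/-1 1200/-1 1000/-1"

name, _, value = SYSCTL_LINE.partition(": ")
levels = parse_freq_levels(value)
top_mhz = max(mhz for mhz, _ in levels)
print(f"{name}: highest advertised level is {top_mhz} MHz")
```

The same function can be fed the output of `sysctl -n dev.cpu.0.freq_levels` from another APU2 to see whether it advertises the same levels.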

                                      • S
                                        stephenw10 Netgate Administrator
                                        last edited by Oct 12, 2020, 9:45 PM

                                        The fact it threw an MCA error implies it was hitting some hardware issue and it looked to be in the RAM.

                                        I'm not entirely sure what the Core Performance Boost setting does, but I could well believe it pushes the RAM or bus speed up along with the CPU. Your RAM appears to be incapable of running stably at that new rate. Or something similar to that.

                                        Steve

                                        • K
                                          kiokoman LAYER 8
                                          last edited by kiokoman Oct 12, 2020, 11:11 PM (posted Oct 12, 2020, 9:51 PM)

                                          Are you sure it's RAM?

                                          To me it could be an overclocked CPU or a burned-out CPU.

                                          MCA: Vendor "AuthenticAMD", ID 0x730f01, APIC ID 1
                                          MCA: CPU 1 COR ICACHE L1 IRD error
                                          

                                          Machine Check Architecture

                                          CPU 1
                                          COR = Corrected
                                          ICACHE = Instruction Cache
                                          L1 = L1 Cache (On Chip)
                                          IRD = Instruction Fetch
                                          error is self explanatory.
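The same reading can be reproduced from the raw status word in the earlier log (`Status 0x9400000000000151`). The Python sketch below is a rough illustration only: the flag bit positions and the `000F 0001 RRRR TTLL` layout of a memory-hierarchy error code are assumptions based on the general x86 machine-check status format, and `decode_mca_status` is a hypothetical helper. The AMD documentation for this CPU family is the authoritative source.

```python
# Status value copied from the "MCA: Bank 1, Status ..." line in the thread.
STATUS = 0x9400000000000151

# Assumed positions of the architectural status flag bits.
FLAG_BITS = {63: "VAL", 62: "OVER", 61: "UC", 60: "EN",
             59: "MISCV", 58: "ADDRV", 57: "PCC"}

# Assumed sub-fields of a memory-hierarchy error code: 000F 0001 RRRR TTLL.
TRANSACTION = {0: "I", 1: "D", 2: "G"}
LEVEL = {0: "L0", 1: "L1", 2: "L2", 3: "LG"}
REQUEST = {0: "ERR", 1: "RD", 2: "WR", 3: "DRD", 4: "DWR",
           5: "IRD", 6: "PREFETCH", 7: "EVICT", 8: "SNOOP"}


def decode_mca_status(status):
    """Return (severity, list of set flags, cache-error summary or None)."""
    flags = [name for bit, name in FLAG_BITS.items() if (status >> bit) & 1]
    severity = "UNCOR" if "UC" in flags else "COR"
    code = status & 0xFFFF  # the low 16 bits hold the MCA error code
    summary = None
    if (code & 0xEF00) == 0x0100:  # matches the memory-hierarchy pattern
        tt = TRANSACTION.get((code >> 2) & 0x3, "?")
        ll = LEVEL[code & 0x3]
        rrrr = REQUEST.get((code >> 4) & 0xF, "?")
        summary = f"{tt}CACHE {ll} {rrrr}"
    return severity, flags, summary


print(decode_mca_status(STATUS))
```

Under those assumptions the status decodes to a corrected (`COR`) `ICACHE L1 IRD` error with the address-valid flag set, which matches the two `MCA:` lines quoted above and the separate `MCA: Address` line in the original log.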

                                          ̿' ̿'\̵͇̿̿\з=(◕_◕)=ε/̵͇̿̿/'̿'̿ ̿
                                          Please do not use chat/PM to ask for help
                                          we must focus on silencing this @guest character. we must make up lies and alter the copyrights !
                                          Don't forget to Upvote with the 👍 button for any post you find to be helpful.

                                          Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.