Netgate Discussion Forum

    "Page fault while in kernel mode" on APU2 after bios/coreboot upgrade

    General pfSense Questions
    41 Posts 8 Posters 3.9k Views 9 Watching
    • CS @DaddyGo

      @DaddyGo said in "Page fault while in kernel mode" on APU2 after bios/coreboot upgrade:

      Unnecessary step back (downgrade) APU2 based boxes work perfectly with the new BIOS

      @DaddyGo can you please confirm if "Core Performance Boost" is currently enabled or disabled in your BIOS? For the record, I have Coreboot v4.12.0.4, not v4.12.0.3. Let me know how it goes when you upgrade.

      • DaddyGo @CS

        @CS said in "Page fault while in kernel mode" on APU2 after bios/coreboot upgrade:

        can you please confirm if "Core Performance Boost" is currently enabled or disabled in your BIOS?

        We have been using APU boards for many years, so we have a lot of experience with these MOBOs.
        We’ve been through a lot of BIOS versions already. 😉

        We moved away from the legacy BIOS line long ago, at the suggestion of pcEngines and 3mdeb.

        We have been using CPB for a long time: with it, the first CPU core runs at 1,400 MHz, which is good for OpenVPN workloads.

        CPB has been enabled since v4.9.0.2.


        You can check this with: sysctl dev.cpu.0.freq_levels
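        As a rough illustration of reading that output: FreeBSD prints dev.cpu.N.freq_levels as space-separated "MHz/mW" pairs, with -1 when the power draw is unknown. The sketch below parses such a string and checks whether any level sits above a base clock. The 1000 MHz base for the GX-412TC is an assumption for illustration, and parse_freq_levels / boost_level_advertised are hypothetical helper names; the sample string is the value CS posts further down in this thread.

```python
from typing import List, Tuple

# Assumption for illustration only: the GX-412TC's nominal (non-boost) clock.
BASE_MHZ = 1000


def parse_freq_levels(raw: str) -> List[Tuple[int, int]]:
    """Parse the space-separated 'MHz/mW' pairs from dev.cpu.N.freq_levels.

    A power value of -1 means the driver does not know the power draw.
    """
    levels = []
    for token in raw.split():
        mhz, mw = token.split("/")
        levels.append((int(mhz), int(mw)))
    return levels


def boost_level_advertised(raw: str, base_mhz: int = BASE_MHZ) -> bool:
    """True if any advertised level is above the assumed base clock."""
    return any(mhz > base_mhz for mhz, _ in parse_freq_levels(raw))


if __name__ == "__main__":
    # Value CS posted from his APU2 further down in this thread.
    sample = "1400/-1 1200/-1 1000/-1"
    print(parse_freq_levels(sample))       # [(1400, -1), (1200, -1), (1000, -1)]
    print(boost_level_advertised(sample))  # True
```

        This only reports what the OS advertises; as fireodo notes later, loader hints can affect what shows up here, so it is a hint rather than proof that CPB is active in the BIOS.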

        Updating the BIOS is quite difficult because of known USB flash drive problems; almost the only stick that can reliably update the BIOS is the Kingston DT100 G3. I quickly bought some 16 GB and 32 GB models of it, as they are no longer available.

        The sequence of steps is well described here; if you need help, I am happy to assist.
        https://pcengines.ch/howto.htm#TinyCoreLinux

        register for BIOS information here:
        https://pcengines.github.io/
        (you will receive a first-hand update via email)


        btw:

        Also, don't forget the Intel tweaks and the correct NIC configuration in loader.conf.local, for example:

        legal.intel_ipw.license_ack=1
        legal.intel_iwi.license_ack=1
        hw.igb.rx_process_limit=-1
        hw.igb.tx_process_limit=-1
        hw.igb.rxd=1024
        hw.igb.txd=1024
        hw.igb.max_interrupt_rate=64000

        and so on.

        System tunables, for example:
        • disable EEE
        • disable flow control
        • kern.ipc.nmbclusters
        • set net.inet.ip.redirect (to enable the ip_tryforward IPv4 routing path)

        and similar things.

        Cats bury it so they can't see it!
        (You know what I mean if you have a cat)

        • CS @DaddyGo

          @DaddyGo thanks a lot for your response.

          For the record, the device has been working smoothly without any crashes for about 6 days after I disabled CPB. So that was definitely what caused the issue. I'll try to re-enable it and do some tuning in case this can be solved without having to keep CPB disabled or re-install pfSense from scratch. I'll provide updates about my progress on this thread for future reference.

          • DaddyGo @CS

            @CS said in "Page fault while in kernel mode" on APU2 after bios/coreboot upgrade:

            the device has been working smoothly without any crashes for about 6 days after I disabled CPB

            This means your problem is CPB-dependent, although I really have not heard of anyone else having this problem over the long run.

            CPB is not a required feature, but if it is available and can be enabled, why not use it?
            For us, it brought a significant improvement in ExpVPN connections.

            These links can also be useful:

            https://teklager.se/en/knowledge-base/apu2-vpn-performance/
            https://teklager.se/en/knowledge-base/apu2-1-gigabit-throughput-pfsense/
            https://teklager.se/en/knowledge-base/

            btw:
            99% of pcEngines users use CPB; the forum is full of APU board write-ups. I think it's a good thing.

            • fireodo @DaddyGo

              @DaddyGo said in "Page fault while in kernel mode" on APU2 after bios/coreboot upgrade:

              btw:
              99% of pcEngines users use CPB; the forum is full of APU board write-ups. I think it's a good thing.

              I have CPB too, and I have tested with and without it; there was no difference in pfSense behavior (besides the speed increase). But I think the original poster's APU has RAM that is at its "limit", and the speed increase makes that RAM produce errors.
              That's what I suspect.

              Have a fine weekend,
              fireodo

              Kettop Mi4300YL CPU: i5-4300Y @ 1.60GHz RAM: 8GB Ethernet Ports: 4
              SSD: SanDisk pSSD-S2 16GB (ZFS) WiFi: WLE200NX
              pfsense 2.8.1 CE
              Packages: Apcupsd, Cron, Iftop, Iperf, LCDproc, Nmap, pfBlockerNG, RRD_Summary, Shellcmd, Snort, Speedtest, System_Patches.

              • DaddyGo @fireodo

                @fireodo said in "Page fault while in kernel mode" on APU2 after bios/coreboot upgrade:

                but I think the original poster's APU has RAM that is at its "limit", and the speed increase makes that RAM produce errors.

                This is very possible... exhausted RAM.

                No matter how good the APU hardware is, 4 GB of RAM was often on the "verge" for me.

                Don't forget, @fireodo, that 3mdeb (the BIOS developers) has been enabling RAM ECC for some time,

                so this should help with RAM errors

                • fireodo @DaddyGo

                  @DaddyGo said in "Page fault while in kernel mode" on APU2 after bios/coreboot upgrade:

                  Don't forget, @fireodo, that 3mdeb (the BIOS developers) has been enabling RAM ECC for some time,

                  so this should help with RAM errors

                  I know, but if the hardware is not OK (the RAM chips), then even ECC cannot compensate for that!

                  • DaddyGo @fireodo

                    @fireodo said in "Page fault while in kernel mode" on APU2 after bios/coreboot upgrade:

                    Hardware is not OK

                    That's very true; let's see what the OP finds.

                    • CS

                      @DaddyGo, @fireodo , @stephenw10

                      Hey folks, let me provide an update here:

                      • Memtest was completed without errors but pfSense kept crashing.
                      • I upgraded coreboot to v4.12.0.5 but it kept crashing.
                      • I reinstalled pfSense 2.4.5-RELEASE-p1 and restored my config but it kept crashing, which is something I was not expecting.
                      • I kept the CPU boost option in my loader.conf.local and disabled the "Core Performance Boost" option in the BIOS again. It stopped crashing, and CPU boost is still active:
                      dev.cpu.0.temperature: 62.7C
                      dev.cpu.0.cx_method: C1/hlt C2/io
                      dev.cpu.0.cx_usage_counters: 24303377 0
                      dev.cpu.0.cx_usage: 100.00% 0.00% last 1981us
                      dev.cpu.0.cx_lowest: C1
                      dev.cpu.0.cx_supported: C1/1/0 C2/2/400
                      dev.cpu.0.freq_levels: 1400/-1 1200/-1 1000/-1
                      dev.cpu.0.freq: 1400
                      dev.cpu.0.%parent: acpi0
                      dev.cpu.0.%pnpinfo: _HID=none _UID=0
                      dev.cpu.0.%location: handle=\_PR_.P000
                      dev.cpu.0.%driver: cpu
                      dev.cpu.0.%desc: ACPI CPU
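                      To compare dumps like this one between BIOS settings, a small parser can help. A minimal sketch, assuming the text was captured from sysctl dev.cpu.0 in the "name: value" form shown above; parse_sysctl and at_top_level are hypothetical helper names, and the check only reflects what the OS reports, not the BIOS setting itself.

```python
from typing import Dict


def parse_sysctl(dump: str) -> Dict[str, str]:
    """Turn 'name: value' lines from `sysctl dev.cpu.0` into a dict."""
    values = {}
    for line in dump.splitlines():
        line = line.strip()
        if ": " not in line:
            continue  # skip blank or malformed lines
        name, value = line.split(": ", 1)
        values[name] = value
    return values


def at_top_level(values: Dict[str, str], cpu: int = 0) -> bool:
    """True if the current frequency equals the highest advertised level."""
    prefix = f"dev.cpu.{cpu}."
    current = int(values[prefix + "freq"])
    levels = [int(tok.split("/")[0]) for tok in values[prefix + "freq_levels"].split()]
    return current == max(levels)


if __name__ == "__main__":
    # A few lines copied from the output above.
    sample = """
    dev.cpu.0.temperature: 62.7C
    dev.cpu.0.cx_lowest: C1
    dev.cpu.0.freq_levels: 1400/-1 1200/-1 1000/-1
    dev.cpu.0.freq: 1400
    """
    values = parse_sysctl(sample)
    print(values["dev.cpu.0.temperature"])  # 62.7C
    print(at_top_level(values))             # True
```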
                      

                      Core Performance Boost is triggering this for some reason; it was crashing randomly, not when it was under load.
                      Could anyone share their APU2 loader.conf.local file for reference? I'm wondering if I'm missing something obvious; I haven't done any tuning for years because it has been running smoothly with no issues.

                      • stephenw10 Netgate Administrator

                        The fact that it threw an MCA error implies it was hitting some hardware issue, and it looked to be in the RAM.

                        I'm not entirely sure what the Core Performance Boost setting does, but I could well believe it pushes the RAM or bus speed up with the CPU. Your RAM appears to be incapable of running stably at that new rate, or something similar to that.

                        Steve

                        • kiokoman LAYER 8

                          Are you sure it's RAM?

                          To me it looks like it could be an overclocked CPU or a burned-out CPU:

                          MCA: Vendor "AuthenticAMD", ID 0x730f01, APIC ID 1
                          MCA: CPU 1 COR ICACHE L1 IRD error
                          

                          MCA = Machine Check Architecture

                          CPU 1 = the core that reported the error
                          COR = Corrected
                          ICACHE = Instruction Cache
                          L1 = L1 Cache (On Chip)
                          IRD = Instruction Fetch
                          error = self-explanatory.
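                          The "ICACHE L1 IRD" part comes from the low 16 bits of the bank's status register (the MCA error code). For memory-hierarchy errors, the x86 MCA architecture encodes that field as 0000 0001 RRRR TTLL: RRRR is the request type, TT the transaction type, and LL the cache level. A minimal Python sketch, assuming that encoding (as described in the Intel SDM's compound error code tables) applies to this CPU; decode_cache_error is a hypothetical helper, and the labels mirror the ones in the log line. The status value is the one Steve quotes in the next post.

```python
from typing import Optional

# Labels matching the short names used in the MCA log line.
TRANSACTION = {0b00: "I", 0b01: "D", 0b10: "G"}
LEVEL = {0b00: "L0", 0b01: "L1", 0b10: "L2", 0b11: "LG"}
REQUEST = {
    0b0000: "ERR", 0b0001: "RD", 0b0010: "WR", 0b0011: "DRD",
    0b0100: "DWR", 0b0101: "IRD", 0b0110: "PREFETCH",
    0b0111: "EVICT", 0b1000: "SNOOP",
}


def decode_cache_error(status: int) -> Optional[str]:
    """Decode a memory-hierarchy MCA error code (0000 0001 RRRR TTLL).

    Returns None if the low 16 bits of `status` use some other format,
    e.g. a bus/interconnect error code.
    """
    code = status & 0xFFFF
    if (code & 0xFF00) != 0x0100:
        return None
    rrrr = (code >> 4) & 0xF   # request type
    tt = (code >> 2) & 0x3     # transaction type
    ll = code & 0x3            # cache level
    return f"{TRANSACTION.get(tt, '?')}CACHE {LEVEL[ll]} {REQUEST.get(rrrr, '???')}"


if __name__ == "__main__":
    print(decode_cache_error(0x9400000000000151))  # ICACHE L1 IRD
```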

                          ̿' ̿'\̵͇̿̿\з=(◕_◕)=ε/̵͇̿̿/'̿'̿ ̿
                          Please do not use chat/PM to ask for help
                          we must focus on silencing this @guest character. we must make up lies and alter the copyrights !
                          Don't forget to Upvote with the 👍 button for any post you find to be helpful.

                          • stephenw10 Netgate Administrator

                            Nope, I'm not sure, and your explanation looks better!

                            Pretty much the only thing that made me think it might be RAM was:

                            MCA: Bank 1, Status 0x9400000000000151
                            

                            which I assumed referred to a RAM bank, but it could mean cache or some other terminology.

                            Steve

                            • CS

                              @kiokoman thanks, that's a good point. I have seen crashes with CPU ID 0 and CPU ID 1.

                              Last three dumps:

                              Fatal trap 12: page fault while in kernel mode
                              cpuid = 0; apic id = 00
                              fault virtual address	= 0x1af
                              fault code		= supervisor read instruction, page not present
                              instruction pointer	= 0x20:0x1af
                              stack pointer	        = 0x28:0xfffffe0118ce1890
                              frame pointer	        = 0x28:0xfffffe0118ce18f0
                              code segment		= base 0x0, limit 0xfffff, type 0x1b
                              			= DPL 0, pres 1, long 1, def32 0, gran 1
                              processor eflags	= resume, IOPL = 0
                              current process		= 11 (idle: cpu0)
                              trap number		= 12
                              panic: page fault
                              cpuid = 0
                              KDB: enter: panic
                              
                              spin lock 0xffffffff83517de8 (smp rendezvous) held by 0xfffff8009ddbf000 (tid 100206) too long
                              timeout stopping cpus
                              panic: spin lock held too long
                              cpuid = 1
                              KDB: enter: panic
                              
                              spin lock 0xffffffff83517de8 (smp rendezvous) held by 0xfffff8008b216620 (tid 100197) too long
                              timeout stopping cpus
                              panic: spin lock held too long
                              cpuid = 1
                              KDB: enter: panic
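                              When comparing several dumps like these, it can help to tally which CPU reported each panic. A minimal sketch, assuming the panic text is saved as plain text and that each "panic:" line is followed by a "cpuid = N" line, as in the three dumps above; panics_by_cpu is a hypothetical helper. Keep in mind that the CPU reporting a "spin lock held too long" panic is the one that gave up waiting, not necessarily the one at fault.

```python
import re
from collections import Counter


def panics_by_cpu(dumps: str) -> Counter:
    """Count panics by the 'cpuid = N' line that follows each 'panic:' line."""
    counts = Counter()
    after_panic = False
    for line in dumps.splitlines():
        line = line.strip()
        if line.startswith("panic:"):
            after_panic = True
            continue
        # Only the bare 'cpuid = N' line counts, not 'cpuid = 0; apic id = 00'.
        match = re.fullmatch(r"cpuid = (\d+)", line)
        if after_panic and match:
            counts[int(match.group(1))] += 1
            after_panic = False
    return counts


if __name__ == "__main__":
    # Abbreviated from the three dumps above.
    sample = """
    cpuid = 0; apic id = 00
    panic: page fault
    cpuid = 0
    panic: spin lock held too long
    cpuid = 1
    panic: spin lock held too long
    cpuid = 1
    """
    print(dict(panics_by_cpu(sample)))  # {0: 1, 1: 2}
```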
                              
                              • kiokoman LAYER 8

                                This may be useful to others with this kind of error:

                                the value is the MCi_STATUS register of that MCA bank, and "Bank 1" is an MCA register bank, not a RAM bank. Known status values:

                                ECC error (ADDR valid) 0x9426c0010b000813
                                ECC error overflow (ADDR valid) 0xd426c0010b000813
                                ECC error (ADDR invalid) 0x9026c0010b000813
                                ECC error overflow (ADDR invalid) 0xd026c0010b000813
                                L1 Cache Data Store error (UE) 0xb600200000000145
                                **L1 Instruction Cache (Instruction Fetch) error (ADDR valid) 0x9400000000000151**
                                L1 Instruction Cache (Instruction Fetch) error overflow (ADDR valid) 0xd400000000000151
                                Bus Unit (L2 Cache) error (UE) 0xb600000000020136
                                L2 Data Cache (Line Fill) error (ADDR valid) 0x9400400000000136
                                L2 Data Cache (Line Fill) error overflow (ADDR valid) 0xd400400000000136
                                

                                This bank layout is specific to this CPU:

                                The error-reporting machine check register banks supported in this processor are:
                                • MC0: Data cache (DC).
                                • MC1: Instruction cache (IC). <- "MCA bank 1"
                                • MC2: Bus unit (BU), including L2 cache.
                                • MC3: Reserved.
                                • MC4: Northbridge (NB), including the IO link. These MSRs are also accessible from configuration space. There is only one NB error-reporting bank, independent of the number of cores.
                                • MC5: Fixed-issue reorder buffer (FR) machine check registers.
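                                To make the bank/status distinction concrete: "Bank 1" selects an entry in the list above, and the status value carries flag bits in its top byte. A minimal Python sketch, assuming the architectural MCi_STATUS bit positions (VAL=63, OVER=62, UC=61, EN=60, MISCV=59, ADDRV=58, PCC=57) as documented in the Intel SDM and AMD's BKDGs; decode_status is a hypothetical helper. It reproduces the "ADDR valid", "overflow", and "UE" distinctions in the table above.

```python
# Bank layout quoted above for this CPU family.
BANKS = {
    0: "Data cache (DC)",
    1: "Instruction cache (IC)",
    2: "Bus unit (BU), including L2 cache",
    3: "Reserved",
    4: "Northbridge (NB)",
    5: "Fixed-issue reorder buffer (FR)",
}

# Architectural MCi_STATUS flag bits, highest first.
FLAGS = [
    (63, "VAL"),    # register holds a valid error
    (62, "OVER"),   # an earlier error was lost (overflow)
    (61, "UC"),     # uncorrected error
    (60, "EN"),     # error reporting enabled
    (59, "MISCV"),  # MCi_MISC holds extra information
    (58, "ADDRV"),  # MCi_ADDR holds the error address
    (57, "PCC"),    # processor context corrupt
]


def decode_status(bank: int, status: int) -> dict:
    """Split an MCA bank number and status value into readable fields."""
    flags = [name for bit, name in FLAGS if (status >> bit) & 1]
    return {
        "bank": BANKS.get(bank, "unknown"),
        "flags": flags,
        "corrected": "UC" not in flags,
        "error_code": status & 0xFFFF,
    }


if __name__ == "__main__":
    info = decode_status(1, 0x9400000000000151)
    print(info["bank"])             # Instruction cache (IC)
    print(info["flags"])            # ['VAL', 'EN', 'ADDRV']
    print(info["corrected"])        # True
    print(hex(info["error_code"]))  # 0x151
```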
                                

                                • kiokoman LAYER 8

                                  @CS
                                  CPU ID 0 and CPU ID 1: it's probably a dual-core CPU?
                                  "timeout stopping cpus" means it was unable to communicate with the other CPUs.
                                  With "spin lock held too long", it's basically telling you: "I can't wait here forever, so I guess I'll stop and panic."
                                  Based on what you had before, I would check CPU settings such as overclocking / voltage / frequency, overheating, and dust on the fan if there is one.
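                                  As a rough analogy for "spin lock held too long" (not FreeBSD's actual kernel code): a waiter busy-polls a lock and, once a limit is exceeded, gives up and raises an error instead of spinning forever. A minimal Python sketch under that assumption; SpinLockTimeout and spin_acquire are hypothetical names, and a wall-clock deadline stands in for whatever limit the kernel really uses.

```python
import threading
import time


class SpinLockTimeout(RuntimeError):
    """Stands in for the kernel's 'spin lock held too long' panic."""


def spin_acquire(lock: threading.Lock, limit_s: float) -> None:
    """Busy-poll `lock` without blocking; give up after `limit_s` seconds."""
    deadline = time.monotonic() + limit_s
    while not lock.acquire(blocking=False):
        if time.monotonic() > deadline:
            raise SpinLockTimeout("spin lock held too long")
        # A real spin loop would issue a CPU pause hint here and retry.


if __name__ == "__main__":
    lock = threading.Lock()
    lock.acquire()  # simulate another CPU that never releases the lock
    try:
        spin_acquire(lock, limit_s=0.1)
    except SpinLockTimeout as exc:
        print(f"panic: {exc}")  # panic: spin lock held too long
```

                                  The point of the analogy: if the lock holder is stuck (for example on a CPU that stopped responding), every waiter eventually hits the limit, which is why the panic can be reported by a healthy CPU.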

                                  Does this seem to be a common problem on the APU2? https://forum.netgate.com/topic/156830/could-you-help-me-analyze-these-crashdumps?_=1602587866619

                                  • CS @kiokoman

                                    @kiokoman APU2 has a single AMD Embedded G series GX-412TC, 4 CPUs: 1 package x 4 cores.
                                    No overclocking and no active cooling in place for these boards.

                                    Reference: https://pcengines.ch/apu2.htm

                                    • kiokoman LAYER 8

                                      Ah, I didn't realize that the problem was solved.
                                      So it was Core Performance Boost;
                                      it was probably overclocking the CPU.

                                      • CS

                                        @kiokoman correct, "Core Performance Boost" was causing it and we were trying to find out why considering that other folks have it enabled on APU2 without experiencing any issues.

                                        • kiokoman LAYER 8

                                          We have a saying in Italy, literally translated as 'not all donuts come out with a hole', meaning 'not everything turns out as planned' 😂
                                          It's called the "silicon lottery": not all CPUs are the same, and there is ample opportunity for some microscopic part of a CPU that works fine at a certain speed/voltage combination to stop working when the speed or voltage is increased.

                                          • fireodo @CS

                                            @CS said in "Page fault while in kernel mode" on APU2 after bios/coreboot upgrade:

                                            @DaddyGo, @fireodo , @stephenw10

                                            Could anyone share their APU2 loader.conf.local file for reference? I'm wondering if I'm missing something obvious; I haven't done any tuning for years because it has been running smoothly with no issues.

                                            Hi, here is the content of my loader.conf.local:

                                            legal.intel_ipw.license_ack=1
                                            legal.intel_iwi.license_ack=1
                                            debug.acpi.avoid="_SB_.PCI0.GPIO"   # necessary for loading apuled.ko

                                            If you still have "hint.acpi_perf.0.disabled=1" in your loader.conf.local, you will see those increased frequencies in sysctl dev.cpu even when CPB is disabled in the BIOS.
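                                            To check a saved copy of loader.conf.local for that hint, a simplified parser is enough. A minimal sketch, assuming one name=value setting per line with optional double quotes and "#" comments; it is not the boot loader's own parser, and parse_loader_conf / acpi_perf_disabled are hypothetical helper names.

```python
from typing import Dict


def parse_loader_conf(text: str) -> Dict[str, str]:
    """Parse loader.conf-style 'name=value' lines (simplified).

    Full-line and trailing '#' comments are skipped and surrounding
    double quotes are removed.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        value = value.strip()
        if value.startswith('"'):
            # Keep only the quoted part, so a trailing '# comment' is dropped.
            value = value[1:].split('"', 1)[0]
        else:
            value = value.split("#", 1)[0].strip()
        settings[name.strip()] = value
    return settings


def acpi_perf_disabled(settings: Dict[str, str]) -> bool:
    """True if the hint.acpi_perf.0.disabled=1 line is present."""
    return settings.get("hint.acpi_perf.0.disabled") == "1"


if __name__ == "__main__":
    conf = '''
    legal.intel_ipw.license_ack=1
    legal.intel_iwi.license_ack=1
    debug.acpi.avoid="_SB_.PCI0.GPIO"   # necessary for loading apuled.ko
    '''
    settings = parse_loader_conf(conf)
    print(settings["debug.acpi.avoid"])    # _SB_.PCI0.GPIO
    print(acpi_perf_disabled(settings))    # False
```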

                                            Regards,
                                            fireodo

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.