Netgate Discussion Forum

    "Page fault while in kernel mode" on APU2 after bios/coreboot upgrade

    General pfSense Questions
    41 Posts 8 Posters 3.3k Views
    • DaddyGo @fireodo

      @fireodo said in "Page fault while in kernel mode" on APU2 after bios/coreboot upgrade:

      but I think that the original poster's APU has RAM that is on the "limit", and the increase in speed makes that RAM produce errors.

      This is very possible... exhausted RAM.

      No matter how good the APU stuff is, 4GB of RAM was often on the "verge" for me.

      Don't forget, @fireodo, that 3mdeb (the BIOS developers) have had RAM ECC enabled for some time,

      so this should help with RAM errors.

      Cats bury it so they can't see it!
      (You know what I mean if you have a cat)

      • fireodo @DaddyGo

        @DaddyGo said in "Page fault while in kernel mode" on APU2 after bios/coreboot upgrade:

        Don't forget, @fireodo, that 3mdeb (the BIOS developers) have had RAM ECC enabled for some time,

        so this should help with RAM errors

        I know, but if the hardware (the RAM chips) is not OK, then even ECC cannot compensate for that!

        Kettop Mi4300YL CPU: i5-4300Y @ 1.60GHz RAM: 8GB Ethernet Ports: 4
        SSD: SanDisk pSSD-S2 16GB (ZFS) WiFi: WLE200NX
        pfsense 2.7.2 CE
        Packages: Apcupsd Cron Iftop Iperf LCDproc Nmap pfBlockerNG RRD_Summary Shellcmd Snort Speedtest System_Patches.

        • DaddyGo @fireodo

          @fireodo said in "Page fault while in kernel mode" on APU2 after bios/coreboot upgrade:

          Hardware is not OK

          That’s really true, and then we’ll see what the OP gets

          • CS

            @DaddyGo, @fireodo , @stephenw10

            Hey folks, let me provide an update here:

            • Memtest was completed without errors but pfSense kept crashing.
            • I upgraded coreboot to v4.12.0.5 but it kept crashing.
            • I reinstalled pfSense 2.4.5-RELEASE-p1 and restored my config but it kept crashing, which is something I was not expecting.
            • I kept the CPU Boost config option in my loader.conf.local and disabled the "Core Performance Boost" option in the BIOS again. It stopped crashing, and CPU Boost is still active:
            dev.cpu.0.temperature: 62.7C
            dev.cpu.0.cx_method: C1/hlt C2/io
            dev.cpu.0.cx_usage_counters: 24303377 0
            dev.cpu.0.cx_usage: 100.00% 0.00% last 1981us
            dev.cpu.0.cx_lowest: C1
            dev.cpu.0.cx_supported: C1/1/0 C2/2/400
            dev.cpu.0.freq_levels: 1400/-1 1200/-1 1000/-1
            dev.cpu.0.freq: 1400
            dev.cpu.0.%parent: acpi0
            dev.cpu.0.%pnpinfo: _HID=none _UID=0
            dev.cpu.0.%location: handle=\_PR_.P000
            dev.cpu.0.%driver: cpu
            dev.cpu.0.%desc: ACPI CPU
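
For anyone who wants to run the same check on their own box, here is a minimal Python sketch (not part of the original post) that reads `sysctl dev.cpu.0` style output and reports whether a frequency level above the nominal clock is advertised. The 1000 MHz nominal value is an assumption based on the figures mentioned in this thread, and the function names are hypothetical.

```python
# Sample lines copied from the sysctl output quoted above.
SYSCTL_TEXT = """\
dev.cpu.0.freq_levels: 1400/-1 1200/-1 1000/-1
dev.cpu.0.freq: 1400
"""

NOMINAL_MHZ = 1000  # assumption: non-boosted clock, as discussed in the thread


def parse_sysctl(text):
    """Turn 'name: value' lines into a dict of strings."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        if sep:
            values[name.strip()] = value.strip()
    return values


def freq_levels(values, cpu=0):
    """Return the advertised frequency levels in MHz, highest first."""
    raw = values.get(f"dev.cpu.{cpu}.freq_levels", "")
    # Each entry looks like '1400/-1'; keep only the part before the slash.
    levels = [int(entry.split("/")[0]) for entry in raw.split()]
    return sorted(levels, reverse=True)


def boost_visible(values, cpu=0, nominal=NOMINAL_MHZ):
    """True if any advertised level is above the nominal clock."""
    levels = freq_levels(values, cpu)
    return bool(levels) and levels[0] > nominal


if __name__ == "__main__":
    vals = parse_sysctl(SYSCTL_TEXT)
    print(freq_levels(vals))
    print(boost_visible(vals))
```

To use it on a live system, paste the output of `sysctl dev.cpu.0` into `SYSCTL_TEXT`. The sketch only inspects what the OS advertises; it says nothing about whether the boosted level is stable.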
            

            Core Performance Boost is triggering this for some reason; it was crashing randomly, not when it was under load.
            Could anyone share their APU2 loader.conf.local file for reference? I'm wondering if I'm missing something obvious; I haven't done any tuning for years because it has been running smoothly with no issues.

            • stephenw10 Netgate Administrator

              The fact it threw an MCA error implies it was hitting some hardware issue and it looked to be in the RAM.

              I'm not entirely sure what the Core Performance Boost setting does, but I could well believe it pushes the RAM or bus speed up along with the CPU. Your RAM appears to be incapable of running stably at that new rate. Or something similar to that.

              Steve

              • kiokoman LAYER 8

                are you sure it's RAM?

                to me it could be an overclocked CPU or a burned CPU

                MCA: Vendor "AuthenticAMD", ID 0x730f01, APIC ID 1
                MCA: CPU 1 COR ICACHE L1 IRD error
                

                Machine Check Architecture

                CPU 1
                COR = Corrected
                ICACHE = Instruction Cache
                L1 = L1 Cache (On Chip)
                IRD = Instruction Fetch
                error is self explanatory.
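
The same mnemonics can be recovered from the low 16 bits of the status value reported for this crash (0x9400000000000151, quoted later in the thread). A minimal Python sketch, assuming the standard x86 machine-check "memory hierarchy" error-code layout `0000 0001 RRRR TTLL`; the table and function names are made up for illustration:

```python
# Field encodings for a memory-hierarchy error code (0000 0001 RRRR TTLL).
REQUEST = {0: "ERR", 1: "RD", 2: "WR", 3: "DRD", 4: "DWR",
           5: "IRD", 6: "PREFETCH", 7: "EVICT", 8: "SNOOP"}
TRANSACTION = {0: "ICACHE", 1: "DCACHE", 2: "GCACHE"}
LEVEL = {0: "L0", 1: "L1", 2: "L2", 3: "LG"}


def decode_cache_error(status):
    """Return e.g. 'ICACHE L1 IRD', or None if this is not a cache error code."""
    code = status & 0xFFFF            # low 16 bits hold the error code
    if (code & 0xFF00) != 0x0100:     # not the memory-hierarchy format
        return None
    request = (code >> 4) & 0xF       # RRRR
    ttype = (code >> 2) & 0x3         # TT
    level = code & 0x3                # LL
    return "{} {} {}".format(
        TRANSACTION.get(ttype, "?"), LEVEL[level], REQUEST.get(request, "?")
    )


if __name__ == "__main__":
    print(decode_cache_error(0x9400000000000151))
```

For the value from this crash the sketch yields `ICACHE L1 IRD`, matching the log line above. Other error-code formats (bus, TLB, vendor-specific) are deliberately not handled and return `None`.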

                ̿' ̿'\̵͇̿̿\з=(◕_◕)=ε/̵͇̿̿/'̿'̿ ̿
                Please do not use chat/PM to ask for help
                we must focus on silencing this @guest character. we must make up lies and alter the copyrights !
                Don't forget to Upvote with the 👍 button for any post you find to be helpful.

                • stephenw10 Netgate Administrator

                  Nope I'm not sure. And your explanation looks better!

                  Pretty much the only thing that made me think it might be RAM was:

                  MCA: Bank 1, Status 0x9400000000000151
                  

                  Which I assumed to be a RAM bank, but it could be cache or some other terminology.

                  Steve

                  • CS

                    @kiokoman thanks, that's a good point. I have seen crashes with CPU ID 0 and CPU ID 1.

                    Last three dumps:

                    Fatal trap 12: page fault while in kernel mode
                    cpuid = 0; apic id = 00
                    fault virtual address	= 0x1af
                    fault code		= supervisor read instruction, page not present
                    instruction pointer	= 0x20:0x1af
                    stack pointer	        = 0x28:0xfffffe0118ce1890
                    frame pointer	        = 0x28:0xfffffe0118ce18f0
                    code segment		= base 0x0, limit 0xfffff, type 0x1b
                    			= DPL 0, pres 1, long 1, def32 0, gran 1
                    processor eflags	= resume, IOPL = 0
                    current process		= 11 (idle: cpu0)
                    trap number		= 12
                    panic: page fault
                    cpuid = 0
                    KDB: enter: panic
                    
                    spin lock 0xffffffff83517de8 (smp rendezvous) held by 0xfffff8009ddbf000 (tid 100206) too long
                    timeout stopping cpus
                    panic: spin lock held too long
                    cpuid = 1
                    KDB: enter: panic
                    
                    spin lock 0xffffffff83517de8 (smp rendezvous) held by 0xfffff8008b216620 (tid 100197) too long
                    timeout stopping cpus
                    panic: spin lock held too long
                    cpuid = 1
                    KDB: enter: panic
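
If more dumps pile up, a small script can tally which CPU each panic was reported on. This is only a sketch (not something used in the thread): the regular expression assumes the `panic:` line is immediately followed by a `cpuid = N` line, as in the three dumps above, and the names are hypothetical.

```python
import re
from collections import Counter

# Tail of each of the three dumps quoted above.
DUMPS = """\
panic: page fault
cpuid = 0
KDB: enter: panic

panic: spin lock held too long
cpuid = 1
KDB: enter: panic

panic: spin lock held too long
cpuid = 1
KDB: enter: panic
"""

# A 'panic: <reason>' line directly followed by a 'cpuid = <n>' line.
PANIC_RE = re.compile(
    r"^[ \t]*panic: (?P<reason>.+)\n[ \t]*cpuid = (?P<cpu>\d+)[ \t]*$",
    re.MULTILINE,
)


def summarize(text):
    """Count panics per (cpuid, reason) pair."""
    return Counter(
        (int(m["cpu"]), m["reason"]) for m in PANIC_RE.finditer(text)
    )


if __name__ == "__main__":
    for (cpu, reason), count in summarize(DUMPS).most_common():
        print(f"cpu {cpu}: {reason} x{count}")
```

A tally spread across several cores would fit with the later observation that the board has four cores, but three samples are far too few to conclude anything.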
                    
                    • kiokoman LAYER 8

                      this may be useful for others with this kind of error:

                      it's the MCi status register, not a RAM bank

                      Status              Error
                      0x9426c0010b000813  ECC error (ADDR valid)
                      0xd426c0010b000813  ECC error overflow (ADDR valid)
                      0x9026c0010b000813  ECC error (ADDR invalid)
                      0xd026c0010b000813  ECC error overflow (ADDR invalid)
                      0xb600200000000145  L1 Cache Data Store error (UE)
                      0x9400000000000151  L1 Instruction Cache (Instruction Fetch) error (ADDR valid) <- this crash
                      0xd400000000000151  L1 Instruction Cache (Instruction Fetch) error overflow (ADDR valid)
                      0xb600000000020136  Bus Unit (L2 Cache) error (UE)
                      0x9400400000000136  L2 Data Cache (Line Fill) error (ADDR valid)
                      0xd400400000000136  L2 Data Cache (Line Fill) error overflow (ADDR valid)
                      

                      this is specific for this CPU:

                      The error-reporting machine check register banks supported in this processor are:
                      • MC0: Data cache (DC).
                      • MC1: Instruction cache (IC). <- "MCA bank 1"
                      • MC2: Bus unit (BU), including L2 cache.
                      • MC3: Reserved.
                      • MC4: Northbridge (NB), including the IO link. These MSRs are also accessible from configuration space. There is only one NB error-reporting bank, independent of the number of cores.
                      • MC5: Fixed-issue reorder buffer (FR) machine check registers.
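
Putting the two lists above together, here is a hedged Python sketch that turns a "Bank N, Status 0x..." pair into a short description. The bank names are copied from the list above. The flag bit positions (VAL 63, OVER 62, UC 61, EN 60, MISCV 59, ADDRV 58, PCC 57) are the architectural x86 MCi_STATUS layout and are assumed to apply to this CPU; they are at least consistent with the status values in the table above. The function names are hypothetical.

```python
# Bank names as listed above for this processor.
MCA_BANKS = {
    0: "Data cache (DC)",
    1: "Instruction cache (IC)",
    2: "Bus unit (BU), including L2 cache",
    3: "Reserved",
    4: "Northbridge (NB)",
    5: "Fixed-issue reorder buffer (FR)",
}

# Assumption: architectural MCi_STATUS flag positions.
FLAG_BITS = {"VAL": 63, "OVER": 62, "UC": 61, "EN": 60,
             "MISCV": 59, "ADDRV": 58, "PCC": 57}


def status_flags(status):
    """Map each flag name to True/False for the given 64-bit status."""
    return {name: bool((status >> bit) & 1) for name, bit in FLAG_BITS.items()}


def describe(bank, status):
    """Build a one-line summary such as 'Instruction cache (IC), corrected, ADDR valid'."""
    flags = status_flags(status)
    parts = [MCA_BANKS.get(bank, "unknown bank")]
    parts.append("uncorrected" if flags["UC"] else "corrected")
    if flags["OVER"]:
        parts.append("overflow")
    parts.append("ADDR valid" if flags["ADDRV"] else "ADDR invalid")
    return ", ".join(parts)


if __name__ == "__main__":
    # "MCA: Bank 1, Status 0x9400000000000151" from this thread.
    print(describe(1, 0x9400000000000151))
```

For the crash in this thread the sketch reports the instruction cache bank with a corrected error and a valid address, which is why bank 1 points at the CPU's L1 instruction cache and not at a RAM bank.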
                      

                      • kiokoman LAYER 8

                        @CS
                        CPU ID 0 and CPU ID 1: is it a dual-core CPU?
                        "timeout stopping cpus" means it was unable to speak with the CPU
                        with "spin lock held too long", it's basically telling you: "I can't wait forever here, so I guess I'll stop and panic"
                        based on what you had before, I would check CPU settings like overclock / voltage / frequency, overheating, and dust on the fan if there is one

                        Does this seem to be a common problem for the APU2? https://forum.netgate.com/topic/156830/could-you-help-me-analyze-these-crashdumps?_=1602587866619

                        • CS @kiokoman

                          @kiokoman APU2 has a single AMD Embedded G series GX-412TC, 4 CPUs: 1 package x 4 cores.
                          No overclocking and no active cooling in place for these boards.

                          Reference: https://pcengines.ch/apu2.htm

                          • kiokoman LAYER 8

                            ah i didn't understand that the problem was solved
                            so it was Core Performance Boost
                            it was probably overclocking the cpu

                            • CS

                              @kiokoman correct, "Core Performance Boost" was causing it and we were trying to find out why considering that other folks have it enabled on APU2 without experiencing any issues.

                              • kiokoman LAYER 8

                                we have a saying in Italy, literally translated as ‘not all donuts come out with a hole’ meaning ‘not everything turns out as planned’ 😂
                                it's called the "silicon lottery": not all CPUs are the same, and there is ample opportunity for some microscopic part of a CPU, which works fine at a certain speed/voltage combination, to not work if the speed or voltage is increased.

                                • fireodo @CS

                                  @CS said in "Page fault while in kernel mode" on APU2 after bios/coreboot upgrade:

                                  @DaddyGo, @fireodo , @stephenw10

                                  Could anyone share their APU2 loader.conf.local file for reference? I'm wondering if I'm missing something obvious, I haven't done any tuning for years because it has been running smoothly with no issues.

                                  Hi, here is the content of my loader.conf.local:

                                  legal.intel_ipw.license_ack=1
                                  legal.intel_iwi.license_ack=1
                                  # necessary for loading apuled.ko
                                  debug.acpi.avoid="_SB_.PCI0.GPIO"

                                  If you still have "hint.acpi_perf.0.disabled=1" in your loader.conf.local, you will see those increased frequencies in sysctl dev.cpu even when you have disabled CPB in the BIOS.
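
A quick way to check a loader.conf.local for that hint is sketched below in Python. This is an illustration only: the sample text combines the three settings posted above with the `hint.acpi_perf.0.disabled=1` line from the remark, which is not part of the posted file, and the helper names are hypothetical.

```python
# Hypothetical sample: the posted settings plus the hint discussed above.
LOADER_CONF = """\
legal.intel_ipw.license_ack=1
legal.intel_iwi.license_ack=1
# necessary for loading apuled.ko
debug.acpi.avoid="_SB_.PCI0.GPIO"
hint.acpi_perf.0.disabled=1
"""


def parse_loader_conf(text):
    """Parse simple name=value lines, ignoring blanks and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        # Simplification: treats any '#' as a comment start, so values
        # containing '#' are not supported by this sketch.
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        name, value = line.split("=", 1)
        settings[name.strip()] = value.strip().strip('"')
    return settings


def acpi_perf_disabled(settings, cpu=0):
    """True if hint.acpi_perf.<cpu>.disabled is set to 1."""
    return settings.get(f"hint.acpi_perf.{cpu}.disabled") == "1"


if __name__ == "__main__":
    conf = parse_loader_conf(LOADER_CONF)
    print(acpi_perf_disabled(conf))
```

To check a real system, read the contents of the file into `LOADER_CONF` instead of using the sample text.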

                                  Regards,
                                  fireodo

                                  • DaddyGo @CS

                                    @CS said in "Page fault while in kernel mode" on APU2 after bios/coreboot upgrade:

                                    other folks have it enabled on APU2 without experiencing any issues.

                                    I confirm this 😉

                                    we have a lot of such units at end users, and they run with CPB without any problems
                                    we basically configure these "routers / NGFWs" + pfSense with CPB

                                    CPB, as I wrote above, has been enabled in the coreboot BIOS, but it only takes effect on 1 core, with a frequency of 1,400 instead of 1,000. This is good for OpenVPN stuff, for example...

                                    @CS I think you shouldn't look for the rabbit in the bush...
                                    this is not an issue which is caused by CPB or pfSense

                                    I think the APU2 MOBO is damaged somewhere, a cold solder joint or something like that

                                    which causes a malfunction in the BUS or RAM operation due to the elevated clock....???

                                    maybe try a CPU shock test under Linux and insulate the APU2 housing so it warms up..... Voilà, maybe there will be results

                                    @kiokoman anyway, this is an AMD embedded-series CPU that cannot really be overdriven; it is designed for low-power devices
                                    either it works or it doesn't; there is no overclocking it, only CPB allows for a small tuning...

                                    • CS

                                      @DaddyGo @fireodo I won't continue troubleshooting this honestly, the board works fine for me with CPB disabled and I still get the boosted CPU frequency by having the right settings in my loader.conf.local. I'm not even sure if my performance would get any better! Actually, I'm now wondering, just out of curiosity, if this happens when you have both, the CPU boost settings in loader.conf.local and the BIOS setting enabled.

                                      • AKEGEC

                                        @CS, every piece of hardware has a different outcome in Q.C., even with the same parts but a different batch.
                                        Rule of thumb: 4 years lifespan (4 is death in Chinese). Nowadays you should be happy if your electronics work for more than 4 years.

                                        I am not sure how handy you are but you could try heating up the cpu (without thermal paste) with the heat gun on a flat surface. Keep around 10 cm distance, with a circular motion, for around 10-15 mins. But be warned, you could burn the CPU.

                                        • DaddyGo @AKEGEC

                                          @AKEGEC said in "Page fault while in kernel mode" on APU2 after bios/coreboot upgrade:

                                          but you could try heating up the cpu (without thermal paste) with the heat gun on a flat surface.

                                          It's a very bad idea.
                                          This AMD CPU reaches its maximum TDP in about 40 seconds without a cooling surface (heat sink) and dies...
                                          (moreover, as I wrote, it is an embedded CPU, soldered to the PCB)
                                          (earlier than the said Chinese 4-year death)

                                          the pcEngines stuff is stable and we have several pieces of it that have been working for 6 years (from the ALIX and APU series)

                                          The ALIXes work as WISP PtP and AP radios and are constantly exposed to the weather.
                                          So these are not subject to your Chinese rule 😉

                                          • AKEGEC @DaddyGo

                                            @DaddyGo said in "Page fault while in kernel mode" on APU2 after bios/coreboot upgrade:

                                            It's a very bad idea.
                                            This AMD CPU reaches its maximum TDP in about 40 seconds without a cooling surface (heat sink) and dies...
                                            (moreover, as I wrote, it is an embedded CPU, soldered to the PCB)
                                            (earlier than the said Chinese 4-year death)

                                            the pcEngines stuff is stable and we have several pieces of it that have been working for 6 years (from the ALIX and APU series)

                                            The ALIXes work as WISP PtP and AP radios and are constantly exposed to the weather.
                                            So these are not subject to your Chinese rule 😉

                                            Well, to solder an embedded CPU you need a temperature between 200 and 400 °C.
                                            Anyway, I was talking about heating it up a bit. As long as you are not reaching 90 °C you will be fine. But if it has already passed the 4-year mark, then I would leave it as it is. 😏
                                            I don't know why manufacturers are shortening their products' lifespan. It used to be a 15-30 year quality guarantee.

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.