Netgate Discussion Forum

Kernel Panics on Intel Apollo Lake Processors

Plus 23.05 Development Snapshots (Retired)
This topic has been deleted. Only users with topic management privileges can see it.
    SiDegr
    last edited by May 15, 2023, 9:26 AM

    Hello,

    I am opening this topic regarding a major issue I am facing with the 23.05 RC betas.

    I started experiencing multiple kernel panics per day on my system for no apparent reason. After studying the panic logs, comparing several of them and doing some research online, I concluded that the issue is related to a well-known "bug" in Intel Apollo Lake CPUs (Intel erratum APL30).


    I came to this conclusion because I noticed that every time a panic happened, CPU core 0 was in "acpi_cpu_idle_mwait()" and core 1 was always in "cpu_idle_wakeup()".

    The workaround proposed by Intel is to use interrupts to wake the processor from an MWAIT sleep state.

    For now I have found two workarounds, but I have not tested them yet: disabling the MONITOR/MWAIT functions at the BIOS level, if such a setting exists, or setting the boot parameter "set machdep.idle_mwait=0".

    It would be good to see this issue resolved in future releases.

    Best Regards,
    SiDe

      stephenw10 Netgate Administrator
      last edited by May 15, 2023, 12:07 PM

      Have you tried either of those, and did it prevent the panics you were seeing?

      Do you have any of the crash reports for review?

        SiDegr @stephenw10
        last edited by SiDegr May 15, 2023, 12:36 PM May 15, 2023, 12:22 PM

        @stephenw10

        Hello Stephen,

        Unfortunately I have not had time to test yet. I will first check whether my BIOS has this setting; if it does, I prefer that workaround in order to avoid the issue once and for all.

        I will check the BIOS, and if the setting exists I will apply it, test for 1-2 days and come back to you with the results.

        As far as the dumps are concerned, I have the running config as well as the ddb output and the message buffer.

        I am attaching them to this post

        textdump.tar.0
        info.0

        P.S. Below you can find the hardware specs I am running this instance on:

        Computer: ZimaBoard 832
        Processor: Intel Celeron N3450
        RAM: 8GB LPDDR4
        Storage: Internal eMMC 32GB
        NIC: Dual GbE Realtek RTL8111H

          stephenw10 Netgate Administrator
          last edited by May 15, 2023, 12:33 PM

          Are all the crashes the same?

          I see nothing that looks identical to that in anything we have logged. It does seem very similar to the known issue with Jasper Lake when running virtualised, though.

            SiDegr @stephenw10
            last edited by May 15, 2023, 12:40 PM

            @stephenw10

            Indeed, I can confirm that all the logs are identical: one of the cores is always in "acpi_cpu_idle_mwait()" and the second one is trying to issue a wakeup to the idle core.

            I can also confirm that the instance is not virtualized. I have edited the post above to include my hardware setup as well.

              stephenw10 Netgate Administrator
              last edited by May 15, 2023, 12:56 PM

              Hmm, surprising we haven't seen more reports of that if it affects all Apollo Lake CPUs. I'd guess it doesn't (Edit: looks like only some steppings). We'll await your test results.

                SiDegr @stephenw10
                last edited by SiDegr May 15, 2023, 2:33 PM May 15, 2023, 2:16 PM

                @stephenw10

                Indeed, I will perform the tests I mentioned and come back with the results. (Edit: I can confirm that my BIOS did have the "monitor mwait" option, which I have now disabled.)

                  SiDegr @stephenw10
                  last edited by SiDegr May 15, 2023, 3:28 PM May 15, 2023, 3:15 PM

                  @stephenw10

                  It panicked again, this time on a seemingly different issue, as you will be able to see from the logs (apparently "binuptime()"). However, I can also see that two of the cores are in the "acpi_cpu_c1()" state. If the issue I am facing is similar to the one in the article you shared, that article also mentions issues with CPU C-states, so I am wondering whether I should disable those in the BIOS as well.

                  For now I will keep testing without disabling C-states, and if it keeps crashing in that state I will disable them as well.

                    stephenw10 Netgate Administrator
                    last edited by May 15, 2023, 4:21 PM

                    Yeah, I could imagine any power states being affected.
                    The closest thing I have to test this with is an N3160 and that is absolutely stable so I'm guessing it's not affected.

                      SiDegr @stephenw10
                      last edited by May 15, 2023, 4:51 PM

                      @stephenw10

                      Yeah, that is my conclusion as well. Since I am not that familiar with power states and those areas of the kernel, could you tell me, if you know, what I lose if I turn them off completely? Is it just a matter of power consumption, or will I have issues with temperatures, CPU usage, etc.?

                      Thank you in advance

                        stephenw10 Netgate Administrator @SiDegr
                        last edited by May 15, 2023, 4:58 PM

                        @sidegr said in Kernel Panics on Intel Apollo Lake Processors:

                        Is it just a power consumption matter or will I have issues with temperatures and cpu usage etc?

                        Depends how good the cooling is. 😉
                        It will run hotter, probably not by much though.

                          SiDegr @stephenw10
                          last edited by May 15, 2023, 5:26 PM

                          @stephenw10

                          Well, let's see how it goes without disabling the C-states first, as the ZimaBoard is a passively cooled SBC and I don't know how well it will handle that :P At the moment it is running at approx. 36 °C.

                            stephenw10 Netgate Administrator
                            last edited by May 15, 2023, 7:04 PM

                            Check the sysctls for dev.cpu.0. It will show you what the lowest C state available is and what percentage of time the CPU has spent in each state:

                            [2.7.0-DEVELOPMENT][admin@t70.stevew.lan]/root: sysctl -a | grep cx_
                            hw.acpi.cpu.cx_lowest: C1
                            dev.cpu.3.cx_method: C1/mwait/hwc C2/mwait/hwc C3/mwait/hwc
                            dev.cpu.3.cx_usage_counters: 10875352 2826431 46137948
                            dev.cpu.3.cx_usage: 18.17% 4.72% 77.10% last 229us
                            dev.cpu.3.cx_lowest: C3
                            dev.cpu.3.cx_supported: C1/1/1 C2/2/500 C3/3/1000
                            dev.cpu.2.cx_method: C1/mwait/hwc C2/mwait/hwc C3/mwait/hwc
                            dev.cpu.2.cx_usage_counters: 10937005 2854991 46338850
                            dev.cpu.2.cx_usage: 18.18% 4.74% 77.06% last 13us
                            dev.cpu.2.cx_lowest: C3
                            dev.cpu.2.cx_supported: C1/1/1 C2/2/500 C3/3/1000
                            dev.cpu.1.cx_method: C1/mwait/hwc C2/mwait/hwc C3/mwait/hwc
                            dev.cpu.1.cx_usage_counters: 9982605 2917088 45710826
                            dev.cpu.1.cx_usage: 17.03% 4.97% 77.99% last 16us
                            dev.cpu.1.cx_lowest: C3
                            dev.cpu.1.cx_supported: C1/1/1 C2/2/500 C3/3/1000
                            dev.cpu.0.cx_method: C1/mwait/hwc C2/mwait/hwc C3/mwait/hwc
                            dev.cpu.0.cx_usage_counters: 897566084 392 576
                            dev.cpu.0.cx_usage: 99.99% 0.00% 0.00% last 127us
                            dev.cpu.0.cx_lowest: C3
                            dev.cpu.0.cx_supported: C1/1/1 C2/2/500 C3/3/1000
                            

                            I've never tried but it looks like you might be able to set the method used there.
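                            Comparing these counters by eye across four cores is tedious, so here is a minimal Python sketch (an illustration only, not a pfSense or FreeBSD tool) that parses the `dev.cpu.N.cx_usage_counters` lines from `sysctl -a | grep cx_` output and recomputes the per-state percentages. It assumes the counters are listed in the same order as `cx_supported` (C1, C2, C3, ...). For the output above, the recomputed values match the posted `cx_usage` lines (for example 18.17% 4.72% 77.10% for CPU 3), which suggests `cx_usage` is each state's share of the entry counts. The function names are hypothetical.

```python
import re

# Matches e.g. "dev.cpu.3.cx_usage_counters: 10875352 2826431 46137948"
COUNTER_RE = re.compile(r"dev\.cpu\.(\d+)\.cx_usage_counters:\s*([\d ]+)$")


def parse_counters(sysctl_output: str) -> dict[int, list[int]]:
    """Map CPU index -> entry counts per C-state (assumed order: C1, C2, C3, ...)."""
    counters: dict[int, list[int]] = {}
    for line in sysctl_output.splitlines():
        m = COUNTER_RE.match(line.strip())
        if m:
            counters[int(m.group(1))] = [int(x) for x in m.group(2).split()]
    return counters


def usage_percentages(counts: list[int]) -> list[str]:
    """Each state's share of all entries, truncated to two decimals using integer math."""
    total = sum(counts)
    result = []
    for c in counts:
        if total == 0:
            result.append("0.00%")
            continue
        whole = c * 100
        result.append(f"{whole // total}.{(whole % total) * 100 // total:02d}%")
    return result


def deepest_state_used(counts: list[int]) -> str:
    """Deepest C-state with a nonzero counter, e.g. 'C3', or 'none' if all are zero."""
    deepest = 0
    for i, c in enumerate(counts):
        if c > 0:
            deepest = i + 1
    return f"C{deepest}" if deepest else "none"


if __name__ == "__main__":
    sample = (
        "dev.cpu.3.cx_usage_counters: 10875352 2826431 46137948\n"
        "dev.cpu.0.cx_usage_counters: 897566084 392 576"
    )
    for cpu, counts in sorted(parse_counters(sample).items()):
        print(cpu, usage_percentages(counts), deepest_state_used(counts))
```

Feeding it the output from a system where only C1/hlt is in use would report 100.00% for C1 and `C1` as the deepest state reached.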

                              SiDegr @stephenw10
                              last edited by May 15, 2023, 9:28 PM

                              @stephenw10

                              Well, it seems that at least the setting I changed in the BIOS did take effect, as you can see that the method for the states is C1/hlt instead of the mwait you get on your end:

                              hw.acpi.cpu.cx_lowest: C1
                              dev.cpu.3.cx_method: C1/hlt C2/io C3/io
                              dev.cpu.3.cx_usage_counters: 9348450 0 0
                              dev.cpu.3.cx_usage: 100.00% 0.00% 0.00% last 93us
                              dev.cpu.3.cx_lowest: C1
                              dev.cpu.3.cx_supported: C1/1/1 C2/2/50 C3/3/150
                              dev.cpu.2.cx_method: C1/hlt C2/io C3/io
                              dev.cpu.2.cx_usage_counters: 27785106 0 0
                              dev.cpu.2.cx_usage: 100.00% 0.00% 0.00% last 491us
                              dev.cpu.2.cx_lowest: C1
                              dev.cpu.2.cx_supported: C1/1/1 C2/2/50 C3/3/150
                              dev.cpu.1.cx_method: C1/hlt C2/io C3/io
                              dev.cpu.1.cx_usage_counters: 26451744 0 0
                              dev.cpu.1.cx_usage: 100.00% 0.00% 0.00% last 236us
                              dev.cpu.1.cx_lowest: C1
                              dev.cpu.1.cx_supported: C1/1/1 C2/2/50 C3/3/150
                              dev.cpu.0.cx_method: C1/hlt C2/io C3/io
                              dev.cpu.0.cx_usage_counters: 13222448 0 0
                              dev.cpu.0.cx_usage: 100.00% 0.00% 0.00% last 12us
                              dev.cpu.0.cx_lowest: C1
                              dev.cpu.0.cx_supported: C1/1/1 C2/2/50 C3/3/150
                              
                              

                              So from what I can see, my cores stay in C1, which is the halt state, the whole time while idle, as the output shows 0% usage for the other states.

                              I don't exactly know what that means in terms of thermals, though, as I am not that familiar with the states.

                                SiDegr @stephenw10
                                last edited by May 16, 2023, 9:06 AM

                                @stephenw10

                                I have also updated the CPU microcode, as suggested in the article about the VM issues on Jasper Lake.

                                It has panicked twice since yesterday, and both times it hung and did not even reboot after the crash, but again it is related to the C1 state.

                                  SiDegr
                                  last edited by SiDegr May 17, 2023, 9:22 AM May 17, 2023, 9:21 AM

                                  It has been a little more than 24 hours since I updated the CPU microcode and I haven't had any crashes yet, which never happened before, as I used to get 1-2 crashes per day.

                                  I will test it like this for a couple more days and then try re-enabling the monitor mwait functionality in the BIOS to check whether it works that way too.

                                    thebear @SiDegr
                                    last edited by thebear May 18, 2023, 6:23 AM May 18, 2023, 6:22 AM

                                    @sidegr You got my attention: there are a lot of stepping issues with the N5105 and Proxmox virtualization. Just letting you know that, from what I read, microcode #24 fixes a lot.

                                    https://forum.proxmox.com/threads/vm-freezes-irregularly.111494/page-33

                                    Thinking about testing 23.05 RC on my N5105 tonight.

                                    PS: to get C3 states, I added some tunables.

                                      SiDegr @thebear
                                      last edited by May 18, 2023, 7:42 AM

                                      @thebear

                                      I can confirm that it crashed again even with the updated microcode, and the crashes are consistent again, this time almost always in sched_pickcpu() or in cpu_search_highest(). Again, while the system makes those calls the cores are in the acpi_cpu_c1() state, so the issue seems to be with CPU states in general.

                                      From the data I collected, it mostly happens when some or most of the cores are in a sleep state and the scheduler tries to wake them up to assign a task to them; this is when the panic occurs.

                                      I will also try your suggestion of adding those flags to my tunables, since from what I have gathered so far my cores never go into C3; as it is now, they only reach C1.

                                        SiDegr @thebear
                                        last edited by May 18, 2023, 4:27 PM

                                        @thebear

                                        Enabling the C3 state made the issue much worse than before, so I guess I will stay with C1 for now, although I am getting slightly higher temperatures when only allowing it to reach C1.

                                        It seems there is a big issue with the kernel and this generation of processors.

                                        Let's hope it gets fixed in the final release of 23.05.

                                          Dobby_ @SiDegr
                                          last edited by May 18, 2023, 6:06 PM

                                          @sidegr said in Kernel Panics on Intel Apollo Lake Processors:

                                          Let's hope it will get fixed on the final release of 23.05

                                          Perhaps I am wrong, but could it also be Intel SpeedStep technology in this case? There was also an interesting thread here in the forum, which I sadly was not able to find again, where someone set something related to SpeedStep in the tunables and that solved the problem really well.

                                          #~. @Dobby

                                          Turris Omnia - 4 Ports - 2 GB RAM / TurrisOS 7 Release (Btrfs)
                                          PC Engines APU4D4 - 4 Ports - 4 GB RAM / pfSense CE 2.7.2 Release (ZFS)
                                          PC Engines APU6B4 - 4 Ports - 4 GB RAM / pfSense+ (Plus) 24.03_1 Release (ZFS)

                                          Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.