Netgate Discussion Forum

    pfSense + 22.05 keeps crashing

    • stephenw10S
      stephenw10 Netgate Administrator

      Hmm the backtrace shows almost nothing:

      db:0:kdb.enter.default>  show pcpu
      cpuid        = 2
      dynamic pcpu = 0xfffffe007e30b1c0
      curthread    = 0xfffff8000526b000: pid 11 tid 100005 "idle: cpu2"
      curpcb       = 0xfffff8000526b5a0
      fpcurthread  = none
      idlethread   = 0xfffff8000526b000: tid 100005 "idle: cpu2"
      curpmap      = 0xffffffff8368f728
      tssp         = 0xffffffff837198f0
      commontssp   = 0xffffffff837198f0
      rsp0         = 0xfffffe0025698cc0
      kcr3         = 0x80000000040b1002
      ucr3         = 0xffffffffffffffff
      scr3         = 0xc763ef6e
      gs32p        = 0xffffffff83720108
      ldt          = 0xffffffff83720148
      tss          = 0xffffffff83720138
      tlb gen      = 566107
      curvnet      = 0
      db:0:kdb.enter.default>  bt
      Tracing pid 11 tid 100005 td 0xfffff8000526b000
      acpi_cpu_idle_mwait() at acpi_cpu_idle_mwait+0x68/frame 0xfffffe0025698a70
      acpi_cpu_idle() at acpi_cpu_idle+0x186/frame 0xfffffe0025698ab0
      cpu_idle_acpi() at cpu_idle_acpi+0x3e/frame 0xfffffe0025698ad0
      cpu_idle() at cpu_idle+0x9f/frame 0xfffffe0025698af0
      sched_idletd() at sched_idletd+0x326/frame 0xfffffe0025698bb0
      fork_exit() at fork_exit+0x7e/frame 0xfffffe0025698bf0
      fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0025698bf0
      --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
      

      Something in the message buffer though:

      <2>NMI ISA 38, EISA 0
      NMI/cpu2 ... going to debugger
      

      NMI errors are usually some hardware failure or incompatibility.
      Did you enable anything that seemed to trigger those errors? Upgrade?
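To find these entries quickly in a saved message buffer, a filter along the following lines may help. This is a minimal sketch: `nmi_lines` is a hypothetical helper name, and it assumes the message buffer text (for example `msgbuf.txt` from a textdump) is fed on standard input. The leading `<2>` looks like a log priority prefix and is stripped only for readability.

```shell
#!/bin/sh
# Hypothetical helper: print every NMI-related line from kernel message
# buffer text read on stdin, dropping a leading "<N>" priority prefix.
nmi_lines() {
    grep 'NMI' | sed 's/^<[0-9]*>//'
}

# Example (assumes msgbuf.txt was extracted from textdump.tar):
#   nmi_lines < msgbuf.txt
```

If the filter prints nothing, the buffer contained no line mentioning NMI, which by itself does not rule out a hardware problem.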

      It looks like you're still running 22.01. Any reason you're not running 22.05?

      Steve

      • G
        geekypr @stephenw10

        @stephenw10 Thanks for the reply.
        I updated to 22.05 and it ran well for a couple of days. Last night it crashed again.
        I'm at the office, so I don't have access to the new crash reports.
        I have left the system with the minimum configuration possible.

        I will upload a fresh report as soon as I can.

        Thanks again. I'll keep you posted...

        • GertjanG
          Gertjan @stephenw10

          @stephenw10 said in pFsense + 22.05 keeps crashing:

          still running 22.01. Any reason you're not running 22.05?

          Maybe that is the issue.
          @geekypr thinks he's running 22.05 but pfSense thinks otherwise.
          Or a failed upgrade?

          NMIs (non-maskable interrupts) that are not handled or intercepted by the OS will bring the system to a halt.
          A couple of days ago there was a post about an NMI where FreeBSD reported it as related to failing RAM (a RAM parity error), but the system wasn't even equipped with that kind of RAM.

          Start by entering the BIOS and disabling as much as possible.
          No sound, LED fans, or other non-pfSense-related gadgets should be enabled. Go for the bare minimum.

          No "help me" PM's please. Use the forum, the community will thank you.
          Edit: and where are the logs?

          • G
            geekypr @Gertjan

            @gertjan I noticed my error in the subject line. I upgraded to 22.05 once I noticed.
            Maybe you're right; I will start disabling all the junk in the BIOS.

            Thanks for the tip...

            • G
              geekypr @geekypr

              @geekypr Here's an update:

              Following advice from Gertjan, I reviewed the BIOS settings and found two things that are not really needed: virtualization and hyper-threading. Both were enabled, so I disabled them.
              Turbo boost is also enabled, but I'm leaving it as is for now.

              The system has been running OK for 4 days straight with no issues so far, at least no crashes or unexpected reboots.

              I appreciate the help and tips, and will post any progress in a couple of days.

              Thanks!

              • S
                skogs

                @geekypr I know that motherboard has 3 NICs: 2 good ones and 1 Realtek. If you're using the Realtek NIC for anything, you might need to search the forums here for the fix specific to that NIC.

                • G
                  geekypr @skogs

                  @skogs Thanks, that third NIC is for IPMI. I'm not using it.
                  It worked for a week or two; today it started to crash again.

                  Attached is the latest dump file:
                  textdump.tar

                  Maybe I have a bad memory module. I just removed one (I have 2 x 4GB) and will monitor the behavior.

                  I hope there's something in the dump that can be found to resolve this issue.
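For anyone who wants to look inside such a dump themselves, here is a minimal sketch for unpacking it. `unpack_textdump` is a hypothetical helper name, and the sketch assumes the archive follows the usual FreeBSD textdump layout (files such as `ddb.txt`, `msgbuf.txt` and `panic.txt`), which may not hold for every dump.

```shell
#!/bin/sh
# Hypothetical helper: extract a textdump archive into a scratch directory.
#   $1 = path to textdump.tar, $2 = destination directory (created if missing)
unpack_textdump() {
    archive="$1"
    dest="$2"
    mkdir -p "$dest" && tar -xf "$archive" -C "$dest"
}

# Example:
#   unpack_textdump textdump.tar ./dump
#   ls ./dump            # expect ddb.txt, msgbuf.txt, ... if the layout is standard
#   less ./dump/msgbuf.txt
```

Extracting into a separate directory keeps successive dumps from overwriting each other, which makes it easier to compare them later.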

                  Again, thank you for the help here...

                  • stephenw10S
                    stephenw10 Netgate Administrator

                    Still shows an NMI error:

                    <2>NMI ISA 28, EISA 0
                    NMI/cpu2 ... going to debugger
                    

                    If it's not an actual hardware issue it's something FreeBSD cannot handle IMO.

                    Did you test running anything else on it? Some burn-in test maybe?

                    Steve

                    • G
                      geekypr @stephenw10

                      @stephenw10
                      Could stress-ng be an option?

                      • stephenw10S
                        stephenw10 Netgate Administrator

                        Sure, whatever you have access to. If you can boot and run some other OS without seeing any issues then it could be something FreeBSD specific.
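As one possible burn-in, a stress-ng run on another OS could look roughly like the sketch below. `burn_in_cmd` is a hypothetical helper that only prints the command line so it can be reviewed first; the worker counts, the 75% memory figure and the one-hour default are illustrative assumptions, not values recommended anywhere in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print a stress-ng command line for a CPU + memory burn-in.
#   $1 = run time (stress-ng accepts suffixes such as 30m or 1h); defaults to 1h
burn_in_cmd() {
    duration="${1:-1h}"
    # --cpu 0       : one CPU worker per online processor
    # --vm 2        : two memory workers sharing --vm-bytes of RAM
    # --verify      : check results where a stressor supports it
    echo "stress-ng --cpu 0 --vm 2 --vm-bytes 75% --verify --metrics-brief --timeout ${duration}"
}

# Example: review the command, then run it.
#   burn_in_cmd 2h
#   eval "$(burn_in_cmd 2h)"
```

A clean run does not prove the hardware is healthy, but a crash or verification failure under another OS would point away from FreeBSD.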

                        • A
                          AdriftAtlas

                          Are you still having crashes?

                          Both dumps are related to CPU power management. Are C-States enabled in the BIOS?

                          What is the output of the following, run from a shell:

                          sysctl machdep | grep -i idle
                          
                          • G
                            geekypr @AdriftAtlas

                            @adriftatlas Apologies for my late reply.

                            This is the output:

                            machdep.idle: acpi
                            machdep.idle_available: spin, mwait, hlt, acpi
                            machdep.idle_apl31: 0
                            machdep.idle_mwait: 1
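Output like the above can be condensed with a small parser. This is only a sketch: `oid_value` and `idle_summary` are hypothetical helper names, and the input is assumed to be `name: value` lines as printed by `sysctl machdep`.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one sysctl OID from
# "name: value" lines read on stdin.
oid_value() {
    awk -F': ' -v oid="$1" '$1 == oid { print $2; exit }'
}

# Hypothetical helper: summarise the active idle method and the
# machdep.idle_mwait flag from the same kind of input.
idle_summary() {
    input="$(cat)"
    method="$(printf '%s\n' "$input" | oid_value machdep.idle)"
    mwait="$(printf '%s\n' "$input" | oid_value machdep.idle_mwait)"
    echo "idle=${method} mwait=${mwait}"
}

# Example on the firewall itself:
#   sysctl machdep | idle_summary
```

For the output posted here, the summary would show the ACPI idle method with `machdep.idle_mwait` set to 1, which fits the `acpi_cpu_idle_mwait()` frame in the first backtrace.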

                            It was fine until last night. Attached is the latest dump file.
                            One thing I noticed: humidity was high last night, and I remember similar conditions before. I don't think it's related, but I can't figure out why it crashes all of a sudden.
                            textdump.tar

                            I threw in another memory stick from another working server, just to make sure.
                            It's frustrating...

                            • stephenw10S
                              stephenw10 Netgate Administrator

                              It's still hard for me to ignore the NMI errors. But if you can disable the power-saving features in the BIOS as a test, you may as well.

                              You do have SpeedStep enabled (strictly speaking that is frequency scaling, i.e. P-states, rather than C-states):

                              est0: <Enhanced SpeedStep Frequency Control> on cpu0
                              

                              So you could also just disable powerd in System > Advanced > Misc

                              Steve

                              • A
                                AdriftAtlas

                                https://www.supermicro.com/products/archive/motherboard/x9scm-f

                                This motherboard is more than a decade old. Unless you updated the BIOS recently, you're likely running old CPU microcode.

                                The BMC also likely has a watchdog that may be throwing NMIs; it's worth updating the BMC firmware too. There is a jumper on the motherboard for the watchdog and a BIOS setting, see page 57 in the manual:
                                https://www.supermicro.com/manuals/motherboard/C202_C204/MNL-1270.pdf

                                Latest BIOS:
                                1/6/2021 2.3a
                                https://www.supermicro.com/en/support/resources/downloadcenter/firmware/MBD-X9SCM-F/BIOS
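To compare the installed BIOS against that release without rebooting into setup, something like the following sketch could be used. `bios_status` is a hypothetical helper; reading the version with `kenv -q smbios.bios.version` is an assumption about what this board reports through SMBIOS, and the string it returns may be formatted differently from the download page.

```shell
#!/bin/sh
# Hypothetical helper: compare a reported BIOS version string with the
# latest release listed above (2.3a unless a second argument is given).
bios_status() {
    reported="$1"
    latest="${2:-2.3a}"
    if [ "$reported" = "$latest" ]; then
        echo "BIOS ${reported} matches the latest listed release"
    else
        echo "BIOS ${reported} differs from the latest listed release ${latest}"
    fi
}

# Example on FreeBSD/pfSense (assumes SMBIOS exposes the version string):
#   bios_status "$(kenv -q smbios.bios.version)"
```

This is a plain string comparison, so a mismatch only means the two strings differ, not necessarily that the installed BIOS is older.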

                                Latest BMC:
                                3.52
                                https://www.supermicro.com/en/support/resources/downloadcenter/firmware/MBD-X9SCM-F/BMC

                                Other things to try:

                                • Disable "Power Technology" in BIOS; see page 76 in manual
                                • Disable PowerD in pfSense as suggested by @stephenw10
                                • Set CPU idle to HALT instead of ACPI or MWAIT:
                                sysctl machdep.idle_mwait=0
                                sysctl machdep.idle=hlt
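Before switching the idle method, it may be worth confirming that `hlt` is actually offered by the kernel. A minimal sketch, where `idle_supports` is a hypothetical helper and the list format is assumed to match the comma-separated `machdep.idle_available` value seen earlier in this thread:

```shell
#!/bin/sh
# Hypothetical helper: succeed if an idle method appears in the
# comma-separated list reported by machdep.idle_available.
#   $1 = list such as "spin, mwait, hlt, acpi", $2 = method name
idle_supports() {
    case ", $1, " in
        *", $2, "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example on the firewall (run as root):
#   avail="$(sysctl -n machdep.idle_available)"
#   if idle_supports "$avail" hlt; then
#       sysctl machdep.idle_mwait=0
#       sysctl machdep.idle=hlt
#   fi
```

Values set this way are runtime-only and are lost on reboot; on pfSense they would presumably need to be added under System > Advanced > System Tunables to persist.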
                                
                                • G
                                  geekypr @AdriftAtlas

                                  @adriftatlas Thanks!
                                  I will try that over the weekend.
                                  (powerD is disabled)

                                  Keep you posted...

                                  Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.