Netgate Discussion Forum

    pfSense+ 22.05 keeps crashing

    General pfSense Questions
    16 Posts 5 Posters 1.9k Views
    This topic has been deleted. Only users with topic management privileges can see it.
    • geekypr

      Hi,
      I'm new to this and need some help pointing me in the right direction.
      I've been having issues lately with my pfSense box, which keeps crashing. Sometimes it reboots, sometimes not.
      I have attached the latest crash file to review.
      textdump.tar

      I don't know if this is hardware related or some bad configuration.

      pfSense running on:
      Supermicro X9SCM-F
      Intel(R) Xeon(R) CPU E31230 @ 3.20GHz
      4GB RAM
      120GB SSD

      Any help will be highly appreciated...

      • stephenw10 Netgate Administrator

        Hmm, the backtrace shows almost nothing:

        db:0:kdb.enter.default>  show pcpu
        cpuid        = 2
        dynamic pcpu = 0xfffffe007e30b1c0
        curthread    = 0xfffff8000526b000: pid 11 tid 100005 "idle: cpu2"
        curpcb       = 0xfffff8000526b5a0
        fpcurthread  = none
        idlethread   = 0xfffff8000526b000: tid 100005 "idle: cpu2"
        curpmap      = 0xffffffff8368f728
        tssp         = 0xffffffff837198f0
        commontssp   = 0xffffffff837198f0
        rsp0         = 0xfffffe0025698cc0
        kcr3         = 0x80000000040b1002
        ucr3         = 0xffffffffffffffff
        scr3         = 0xc763ef6e
        gs32p        = 0xffffffff83720108
        ldt          = 0xffffffff83720148
        tss          = 0xffffffff83720138
        tlb gen      = 566107
        curvnet      = 0
        db:0:kdb.enter.default>  bt
        Tracing pid 11 tid 100005 td 0xfffff8000526b000
        acpi_cpu_idle_mwait() at acpi_cpu_idle_mwait+0x68/frame 0xfffffe0025698a70
        acpi_cpu_idle() at acpi_cpu_idle+0x186/frame 0xfffffe0025698ab0
        cpu_idle_acpi() at cpu_idle_acpi+0x3e/frame 0xfffffe0025698ad0
        cpu_idle() at cpu_idle+0x9f/frame 0xfffffe0025698af0
        sched_idletd() at sched_idletd+0x326/frame 0xfffffe0025698bb0
        fork_exit() at fork_exit+0x7e/frame 0xfffffe0025698bf0
        fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0025698bf0
        --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
        

        Something in the message buffer though:

        <2>NMI ISA 38, EISA 0
        NMI/cpu2 ... going to debugger
        

        NMI errors are usually some hardware failure or incompatibility.
        Did you enable anything that seemed to trigger those errors? Upgrade?
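For reference, here is a minimal sketch of pulling those message-buffer lines out of a textdump archive yourself. It assumes the archive follows the FreeBSD textdump(4) layout, in which the kernel message buffer is stored as msgbuf.txt; the function name nmi_lines is hypothetical.

```shell
#!/bin/sh
# nmi_lines: print NMI-related lines from the kernel message buffer stored
# in a FreeBSD textdump archive (assumed member name: msgbuf.txt).
# Hypothetical helper for illustration only.
nmi_lines() {
    archive="$1"
    tmp=$(mktemp -d) || return 1
    # Extract into a scratch directory so nothing in the cwd is overwritten.
    if ! tar -xf "$archive" -C "$tmp"; then
        rm -rf "$tmp"
        return 1
    fi
    # The member may sit at the top level or under ./ depending on the tool
    # that wrote it, so locate it with find instead of hard-coding a path.
    find "$tmp" -name msgbuf.txt -exec grep -h 'NMI' {} +
    status=$?
    rm -rf "$tmp"
    return "$status"
}

# Usage, e.g. on a copy of the archive attached to the thread:
#   nmi_lines textdump.tar
```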

        It looks like you're still running 22.01. Any reason you're not running 22.05?

        Steve

        • geekypr @stephenw10

          @stephenw10 Thanks for the reply.
          I updated to 22.05 and it ran well for a couple of days. Last night it crashed again.
          I'm at the office, so I don't have access to the new crash reports.
          I have left the system with the minimum configuration possible.

          I will upload a fresh report as soon as I can.

          Thanks again. I'll keep you posted...

            • Gertjan @stephenw10

            @stephenw10 said in pfSense+ 22.05 keeps crashing:

            still running 22.01. Any reason you're not running 22.05?

            Maybe that is the issue.
            @geekypr thinks he's running 22.05, but pfSense thinks otherwise.
            Or was it a failed upgrade?

            NMIs (non-maskable interrupts) that are not handled or intercepted by the OS will bring the system to a halt.
            A couple of days ago there was a post about NMIs where FreeBSD reported failing RAM (a RAM parity error), but the system wasn't even equipped with that kind of RAM.

            Start by entering the BIOS and disabling as much as possible.
            Sound, LED fans, and any other gadgets unrelated to pfSense should be turned off. Go for the bare minimum.

            No "help me" PM's please. Use the forum, the community will thank you.
            Edit: and where are the logs?

              • geekypr @Gertjan

              @gertjan I noticed my error in the subject. I did upgrade to 22.05 once I noticed.
              Maybe you're right; I will start disabling all the junk in the BIOS.

              Thanks for the tip...

                • geekypr @geekypr

                @geekypr Here's an update:

                Following advice from Gertjan, I reviewed the BIOS settings and found two things that weren't really needed: virtualization and Hyper-Threading. Both were enabled (I changed them to disabled).
                Also, the Turbo Boost feature is enabled, but I'm leaving it as is for now.

                The system has been running OK for 4 days straight with no issues so far, at least no crashes or unexpected reboots.

                I appreciate the help and tips and will post any progress in a couple of days.

                Thanks!

                  • skogs

                  @geekypr I know that motherboard has 3 NICs: 2 good ones and 1 Realtek. If you're using the Realtek NIC for anything, you might need to search the forums here for the specific fix.

                    • geekypr @skogs

                    @skogs Thanks, that third NIC is for IPMI; I'm not using it.
                    It worked for a week or two, but today it started crashing again.

                    Attached is the latest dump file:
                    textdump.tar

                    Maybe I have a bad memory module. I just removed one (I have 2 x 4GB) and will monitor the behavior.

                    I hope there's something in the dump that can help resolve this issue.

                    Again, thank you for the help here...

                      • stephenw10 Netgate Administrator

                      It still shows an NMI error:

                      <2>NMI ISA 28, EISA 0
                      NMI/cpu2 ... going to debugger
                      

                      If it's not an actual hardware issue, it's something FreeBSD cannot handle, IMO.
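The two dumps report different ISA bytes (38 in the first, 28 in this one). A minimal sketch for reading that byte, assuming (as in FreeBSD's isa_nmi()) that it is the raw hex value read from ISA port 0x61, where bit 7 flags a memory parity / PCI SERR# error and bit 6 an I/O channel check; decode_nmi_isa is a hypothetical helper:

```shell
#!/bin/sh
# decode_nmi_isa: interpret the "NMI ISA xx" value from the message buffer.
# Assumption: xx is the hex byte read from ISA port 0x61 (System Control
# Port B): bit 7 = memory parity / PCI SERR#, bit 6 = I/O channel check.
decode_nmi_isa() {
    val=$((0x$1))
    if [ $((val & 0x80)) -ne 0 ]; then
        echo "bit 7 set: memory parity / PCI SERR# error reported"
    fi
    if [ $((val & 0x40)) -ne 0 ]; then
        echo "bit 6 set: I/O channel check reported"
    fi
    if [ $((val & 0xC0)) -eq 0 ]; then
        echo "no parity or I/O channel check bit set"
    fi
}

# Usage with the values from the two dumps:
#   decode_nmi_isa 38
#   decode_nmi_isa 28
```

For both 0x38 and 0x28 neither bit 7 nor bit 6 is set, so under that assumption the byte itself does not point at a RAM parity or I/O channel error. That narrows things down a little; it is not a diagnosis.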

                      Did you test running anything else on it? Some burn-in test maybe?

                      Steve

                        • geekypr @stephenw10

                        @stephenw10
                        Could stress-ng be an option?

                          • stephenw10 Netgate Administrator

                          Sure, whatever you have access to. If you can boot and run some other OS without seeing any issues then it could be something FreeBSD specific.
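As one option for such a burn-in, here is a minimal stress-ng sketch. It assumes stress-ng is installed on whichever OS is booted for the test. The wrapper name burn_in, the STRESS_NG override (only there to swap in a different binary), and the load mix are illustrative assumptions, not a tuned test profile.

```shell
#!/bin/sh
# burn_in: hypothetical wrapper that runs one CPU stressor per online CPU
# (--cpu 0) plus one memory worker over roughly 75% of available memory
# (--vm 1 --vm-bytes 75%) for a fixed duration, then prints brief metrics.
burn_in() {
    duration="${1:-1h}"              # stress-ng accepts suffixes such as s, m, h
    "${STRESS_NG:-stress-ng}" --cpu 0 --vm 1 --vm-bytes 75% \
        --timeout "$duration" --metrics-brief
}

# Usage:
#   burn_in 12h
```

Since the crashes here take days to show up, a long run tells you more than a short one.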

                            • AdriftAtlas

                            Are you still having crashes?

                            Both dumps are related to CPU power management. Are C-states enabled in the BIOS?

                            What is the output of the following, executed from the shell:

                            sysctl machdep | grep -i idle
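A slightly broader look at the idle and C-state configuration could use the sketch below. It assumes FreeBSD's sysctl(8), whose -i flag skips OIDs that do not exist on a given system, and the dev.cpu.0.cx_* OIDs exported by acpi_cpu(4); both function names are hypothetical.

```shell
#!/bin/sh
# idle_report: hypothetical helper printing the CPU idle method and the
# ACPI C-state information FreeBSD exposes for CPU 0.
idle_report() {
    # Idle method in use, the methods the kernel offers, and the MWAIT knob.
    sysctl -i machdep.idle machdep.idle_available machdep.idle_mwait
    # C-states advertised via ACPI, the deepest one allowed, and usage stats.
    sysctl -i dev.cpu.0.cx_supported dev.cpu.0.cx_lowest dev.cpu.0.cx_usage
}

# idle_method: read "sysctl machdep" output on stdin and print only the
# value of machdep.idle (e.g. "acpi").
idle_method() {
    sed -n 's/^machdep\.idle: //p'
}

# Usage on the firewall's shell:
#   idle_report
#   sysctl machdep | idle_method
```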
                            
                              • geekypr @AdriftAtlas

                              @adriftatlas Apologies for my late reply.

                              This is the output:

                              machdep.idle: acpi
                              machdep.idle_available: spin, mwait, hlt, acpi
                              machdep.idle_apl31: 0
                              machdep.idle_mwait: 1

                              It was fine until last night. Attached is the latest dump file.
                              What I noticed is that last night the environment was very humid, and I remember the same conditions the previous time. I don't think it's related, but I can't figure out why it suddenly crashes.
                              textdump.tar

                              I threw in another memory stick from a working server, just to make sure.
                              It's frustrating...

                                • stephenw10 Netgate Administrator

                                The NMI errors are still hard for me to ignore. But if you can disable power-saving features in the BIOS as a test, you may as well.

                                You do have SpeedStep (CPU frequency scaling) enabled:

                                est0: <Enhanced SpeedStep Frequency Control> on cpu0
                                

                                So you could also just disable powerd in System > Advanced > Misc
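To confirm whether powerd is actually running after changing that setting, and what clock the CPU is at, here is a minimal sketch. It assumes FreeBSD's pgrep(1) and the dev.cpu.0.freq / dev.cpu.0.freq_levels OIDs provided by cpufreq(4) drivers such as est(4); the function name is hypothetical.

```shell
#!/bin/sh
# powerd_status: hypothetical helper reporting whether powerd is running,
# plus the current frequency and available frequency levels for CPU 0.
powerd_status() {
    if pgrep -x powerd >/dev/null 2>&1; then
        echo "powerd: running"
    else
        echo "powerd: not running"
    fi
    # Provided by cpufreq(4); -i silently skips the OIDs if they are absent.
    sysctl -i dev.cpu.0.freq dev.cpu.0.freq_levels
}
```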

                                Steve

                                  • AdriftAtlas

                                  https://www.supermicro.com/products/archive/motherboard/x9scm-f

                                  This motherboard is more than a decade old. Unless you updated the BIOS recently, you're likely running old CPU microcode.

                                  The BMC also likely has a watchdog that may be throwing NMIs; it's worth updating that too. There is a jumper on the motherboard for it and a BIOS setting; see page 57 in the manual:
                                  https://www.supermicro.com/manuals/motherboard/C202_C204/MNL-1270.pdf

                                  Latest BIOS:
                                  1/6/2021 2.3a
                                  https://www.supermicro.com/en/support/resources/downloadcenter/firmware/MBD-X9SCM-F/BIOS

                                  Latest BMC:
                                  3.52
                                  https://www.supermicro.com/en/support/resources/downloadcenter/firmware/MBD-X9SCM-F/BMC
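Before flashing, it helps to know what the board is running now. A minimal sketch, assuming the FreeBSD loader has populated the smbios.* kernel environment variables from the SMBIOS tables and that kenv(1) supports -q; bios_info is a hypothetical name.

```shell
#!/bin/sh
# bios_info: hypothetical helper printing the board model and the BIOS
# version and release date from the smbios.* kernel environment.
bios_info() {
    for var in smbios.planar.product smbios.bios.version smbios.bios.reldate; do
        # -q: stay quiet if the variable is not set
        printf '%s=%s\n' "$var" "$(kenv -q "$var")"
    done
}
```

If smbios.bios.version reports something older than 2.3a, the BIOS update above applies. This sketch does not cover the BMC firmware version.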

                                  Other things to try:

                                  • Disable "Power Technology" in BIOS; see page 76 in manual
                                  • Disable PowerD in pfSense as suggested by @stephenw10
                                  • Set CPU idle to HALT instead of ACPI or MWAIT:
                                  sysctl machdep.idle_mwait=0
                                  sysctl machdep.idle=hlt
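The two commands above can be wrapped so the change is also checked. A minimal sketch, assuming FreeBSD's sysctl(8), where -n prints only the value; set_idle_hlt is a hypothetical name. Values set this way are lost at reboot; in pfSense, persistent sysctl values are normally entered under System > Advanced > System Tunables.

```shell
#!/bin/sh
# set_idle_hlt: hypothetical helper that switches the idle method to HLT
# and succeeds only if the kernel reports the new value afterwards.
set_idle_hlt() {
    sysctl machdep.idle_mwait=0 || return 1
    sysctl machdep.idle=hlt || return 1
    [ "$(sysctl -n machdep.idle)" = "hlt" ]
}

# Usage (as root on the firewall):
#   set_idle_hlt && echo "idle method is now hlt"
```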
                                  
                                    • geekypr @AdriftAtlas

                                    @adriftatlas Thanks!
                                    I will try that over the weekend.
                                    (powerd is disabled)

                                    I'll keep you posted...

                                    Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.