Netgate Discussion Forum

    Terribly slow boot times and frequent boot freezes

      rmeskill

      Hey there! I run pfSense on a Topton mini PC with 4x 2.5Gb Ethernet ports from AliExpress, and it has been stable for a long time, apart from some early issues with setup and interfaces. I was away for a few days when a power outage outlasted my UPS, so the router went offline, and when power was restored the boot failed. Once I got home and rebooted it, I saw it hang most times on a series of issues, most frequently:

      da0 at umass-sim0 bus 0 scbus1 target 0 lun 0
      da0: <Generic STORAGE DEVICE 1404> Removable Direct Access SPC-4 SCSI device
      da0: 40.000MB/s transfers
      da0: Attempt to query device size failed: NOT READY, Medium not present
      

      or what looks like a SCSI/USB error. I'm running my boot/pfSense disk on an NVMe drive which has passed its S.M.A.R.T. tests, and I've disabled USB booting in the BIOS.

      Sometimes (after rebooting multiple times) I can get it past this point, but then it takes AGES to boot, something like 12-15 minutes before I can reach the web page. It hangs on 'Synchronizing user settings' and at several other points, such as 'Starting Web Configurator' and then each of the installed packages. I've removed as many unused packages as I can, and it seems to be running stably now, but clearly it can't be trusted to survive a reboot. I'd like to understand what's causing the boot failure and how to avoid it in future.

        Gertjan @rmeskill

        @rmeskill

        You use a UPS. Is it connected by a serial or USB cable, and is pfSense set up to use the UPS?
        If so, when the battery runs low, pfSense will shut itself down cleanly,
        exactly as if you had shut it down from the GUI with:

        27520d24-4c71-4894-876f-b30401d21717-image.png

        This way of shutting down is more or less mandatory.
        You (or your power company) can't just cut the power to shut it down. Every device with a complex OS and file system (PC, phone, etc.) needs a controlled shutdown. Otherwise, chances are that your device won't boot again because the file system structure got damaged.

        Connect to the console (HDMI, or serial USB - not SSH !) and do this :
        How to Run a pfSense Software File System Check (5/2020).

        That said, this message

        da0: Attempt to query device size failed: NOT READY, Medium not present

        might also be an indication that the drive is near end of life (EOL).
        The next time it does boot: take (export) a backup of your pfSense settings !!
        Then get a screwdriver, open up the AliExpress device, locate the drive, and order a replacement: same form factor but bigger capacity ^^, as they cost close to nothing these days.

        Don't worry, it's 2025, drives still do die all the time 😊 (😠 )

        What is your pfSense version ?

        No "help me" PM's please. Use the forum, the community will thank you.
        Edit : and where are the logs ??

          rmeskill @Gertjan

          @Gertjan pfSense version is 24.11. I am running a UPS, but it's not plugged into the router. It's connected to my unRAID server instead, as that offers better management, with the router just set to auto-start after power failure.

          I am willing to believe it could be a slowly failing/bad drive, but it's not showing any SMART issues and is barely 2 years old:

          === START OF SMART DATA SECTION ===
          SMART overall-health self-assessment test result: PASSED

          SMART/Health Information (NVMe Log 0x02)
          Critical Warning: 0x00
          Temperature: 40 Celsius
          Available Spare: 100%
          Available Spare Threshold: 10%
          Percentage Used: 94%
          Data Units Read: 3,947,408 [2.02 TB]
          Data Units Written: 619,185,433 [317 TB]
          Host Read Commands: 52,825,492
          Host Write Commands: 7,555,637,804
          Controller Busy Time: 31,838
          Power Cycles: 64
          Power On Hours: 19,611
          Unsafe Shutdowns: 53
          Media and Data Integrity Errors: 0
          Error Information Log Entries: 0
          Warning Comp. Temperature Time: 341
          Critical Comp. Temperature Time: 0

          Error Information (NVMe Log 0x01, 16 of 64 entries)
          No Errors Logged

          Self-test Log (NVMe Log 0x06)
          Self-test status: No self-test in progress
          Num  Test_Description  Status                      Power_on_Hours  Failing_LBA  NSID  Seg  SCT  Code
          0    Short             Completed without error     19602           -            -     -    -    -
          1    Extended          Aborted: Self-test command  19602           -            -     -    -    -
          2    Extended          Completed without error     19602           -            -     -    -    -

            Gertjan @rmeskill

            @rmeskill said in Terribly slow boot times and frequent boot freezes:

            rather into my unRAID server as it offers better management

            In that case, I'm pretty sure you could use the pfSense UPS package so that it connects to another UPS server available on your LAN: the unRAID UPS server.
            The pfSense UPS software will then be a client and have indirect access to the state of the UPS connected to the unRAID box, so it can do a clean power down when needed.

            I insist on using some UPS protection, even though a sudden power loss on pfSense isn't the end of the world: in a worst-case scenario, you reinstall pfSense 'clean' with the installer, or from a USB drive, import the config, and you're back online again.
            On the other hand, it's always a nasty situation when your main Internet access goes down ...

            @rmeskill said in Terribly slow boot times and frequent boot freezes:

            but it's not showing any SMART issues and is barely 2 years old

            Power On Hours: 19,611 / 24 hours / 365 days = 2.2 years old.
            A drive of that age should not show this a few seconds into boot:

            @rmeskill said in Terribly slow boot times and frequent boot freezes:

            da0 at umass-sim0 bus 0 scbus1 target 0 lun 0
            da0: <Generic STORAGE DEVICE 1404> Removable Direct Access SPC-4 SCSI device
            da0: 40.000MB/s transfers
            da0: Attempt to query device size failed: NOT READY, Medium not present

            The drive / device (this is the boot drive, right, not some other drive?) is detected all right.
            But when the OS asks it a question, it doesn't answer "wait a moment, not ready yet" but the, IMHO, scarier "NOT READY, Medium not present". AFAIK, the 'medium' can't be removed ^^

            I see this when I boot mine :

            nvme0: Allocated 16MB host memory buffer
            mmcsd0: 16GB <MMCHC TB2916 9.0 SN 51891D3E MFG 11/2021 by 112 0x0000> at mmc0 50.0MHz/8bit/65535-block
            mmcsd0boot0: 4MB partition 1 at mmcsd0
            mmcsd0boot1: 4MB partition 2 at mmcsd0
            mmcsd0rpmb: 4MB partition 3 at mmcsd0
            Trying to mount root from zfs:pfSense/ROOT/24.11-Relase []...
            Root mount waiting for: CAM
            Root mount waiting for: CAM
            Root mount waiting for: CAM
            nda0 at nvme0 bus 0 scbus0 target 0 lun 1
            nda0: <M.2 (P80) 3TE6 V20B09 YCA12111250120759>
            nda0: Serial Number YCA12111250120759
            nda0: nvme version 1.3
            nda0: 114473MB (234441648 512 byte sectors)
            

            Note: I see a 16 GB eMMC device (mmcsd0) ... and I don't use it, as my "4100 MAX" has a 114 GB NVMe SSD (nda0), which handles writes much better over time.

            You use the ZFS file system ?

            @rmeskill said in Terribly slow boot times and frequent boot freezes:

            Data Units Written: 619,185,433 [317 TB]

            oh .... 317 T !
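
            For scale, here is a rough sketch of what those SMART numbers imply, using only the figures quoted above. It assumes the NVMe convention that one data unit is 1,000 x 512 bytes (which matches the 317 TB that smartctl prints) and that the writes were spread evenly over the power-on hours, which is a simplification. The `summarize` helper is purely illustrative.

```python
# Figures copied from the SMART output posted earlier in this thread.
DATA_UNITS_WRITTEN = 619_185_433
POWER_ON_HOURS = 19_611
PERCENTAGE_USED = 94

# Assumption: NVMe reports data units in thousands of 512-byte blocks.
BYTES_PER_DATA_UNIT = 512 * 1000


def summarize(units_written: int, hours: int) -> dict:
    """Return terabytes written, drive age in years, and average write rates."""
    total_bytes = units_written * BYTES_PER_DATA_UNIT
    return {
        "tb_written": total_bytes / 1e12,
        "age_years": hours / 24 / 365,
        "gb_per_day": total_bytes / 1e9 / (hours / 24),
        "mb_per_second": total_bytes / 1e6 / (hours * 3600),
    }


stats = summarize(DATA_UNITS_WRITTEN, POWER_ON_HOURS)
for name, value in stats.items():
    print(f"{name}: {value:.1f}")
print(f"rated endurance consumed (Percentage Used): {PERCENTAGE_USED}%")
```

            That works out to roughly 388 GB written per day, or about 4.5 MB/s sustained around the clock. Percentage Used is the drive's own estimate of how much of its rated endurance has been consumed, so 94% suggests it is close to its rated write life even though the overall health test says PASSED.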

              rmeskill @Gertjan

              @Gertjan well this just raises a bunch of questions for me:

              1. Is ZFS bad? I didn't explicitly choose it; it just seems it was installed that way.
              2. Is running NVMe bad? My Topton box is fairly small and, I think, doesn't have internal room for anything other than an NVMe drive. I might be able to put an SSD in instead if that would be better?
              3. Is 317TB a lot? If so, I don't know what's writing so much...
              4. Do we think there's a chance this drive is failing or got corrupted? I can look into UPS integration, but the router clearly lost power uncleanly this time, so something could have broken then...

                Gertjan @rmeskill

                @rmeskill said in Terribly slow boot times and frequent boot freezes:

                Is ZFS bad? I didn't explicitly choose it, it just seems it was installed that way

                Nope, on the contrary. It handles our 'new' disks far better, the ones that are not spinning platters but 'sophisticated silicon gates' (SSD, NVMe, etc.).
                Still, and this is me rambling again: don't think hardware or software will protect you against power failure.
                Power failures == bad.

                If this :

                da0: Attempt to query device size failed: NOT READY, Medium not present

                wasn't caused by the power failure, you have another issue. Most probably : drive not ok.
                Get a new drive, and I'm pretty sure your issue

                Terribly slow boot times and frequent boot freezes

                will be gone.

                  rmeskill @Gertjan

                  @Gertjan said in Terribly slow boot times and frequent boot freezes:

                  If this :

                  da0: Attempt to query device size failed: NOT READY, Medium not present

                  wasn't caused by the power failure

                  This error isn't specifically from the power failure; it's coming up on every boot now. But yeah, I don't know if it's from a failure of the drive or of some other hardware interface. I do, however, have 2x NVMe slots, and when I moved the drive to the other slot the same error still came up, so that should rule out the physical NVMe slot. I've opened a case with Kingston to see if they'll honor an RMA.

                    stephenw10 Netgate Administrator

                    That error from da0 is probably unrelated. It's not the NVMe drive. I'd guess that device has an SD card slot or something similar. It has no card in it so reports that media error.
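
                    One way to see this from the boot log: on FreeBSD, the `at ...` part of a device's attach line names the bus it hangs off. `da` devices behind `umass-sim` are USB mass storage (card readers included), while the NVMe drive shows up as `nda0 at nvme0`, as in Gertjan's log. A minimal sketch that sorts attach lines this way follows; the regex and the `classify` helper are a hypothetical illustration rather than a pfSense tool, and the digits in the `da0` sample line are assumed.

```python
import re

# Matches FreeBSD CAM attach lines such as
# "nda0 at nvme0 bus 0 scbus0 target 0 lun 1".
ATTACH_RE = re.compile(r"^(?P<dev>[a-z]+\d+) at (?P<parent>[\w-]+?\d+) bus \d+")


def classify(line: str):
    """Return (device, parent, kind) for an attach line, or None otherwise."""
    match = ATTACH_RE.match(line.strip())
    if match is None:
        return None
    dev, parent = match["dev"], match["parent"]
    if parent.startswith("umass-sim"):
        kind = "USB mass storage (e.g. a card reader)"
    elif parent.startswith("nvme"):
        kind = "NVMe"
    else:
        kind = "other"
    return dev, parent, kind


sample_lines = [
    "da0 at umass-sim0 bus 0 scbus1 target 0 lun 0",  # digits assumed
    "nda0 at nvme0 bus 0 scbus0 target 0 lun 1",
    "da0: 40.000MB/s transfers",  # not an attach line, so it is skipped
]
for line in sample_lines:
    result = classify(line)
    if result is not None:
        print("{}: attached to {} -> {}".format(*result))
```

                    Read this way, the "Medium not present" message belongs to an empty USB-attached slot and says nothing about the health of the NVMe boot drive.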

                      rmeskill @stephenw10

                      @stephenw10 if so, then any ideas why my boots are taking 20-30 minutes? And is there some way to test/confirm an issue with the NVMe drive?

                        stephenw10 Netgate Administrator

                        Where in the boot is it stalling?

                        Try pressing Ctrl+T when it's stalled. That should show you what process it's waiting for.

                        You could also try booting verbose. Interrupt the boot at the loader menu to reach the loader prompt (OK>) and enter: boot -v
                        That may give you additional details about what it's doing before the delay.

                          rmeskill @stephenw10

                          @stephenw10

                          Either really early doors here:

                          b2ba77c6-f177-4e55-9938-c097397b0dab-E4D0756E-4F8E-4538-8FA9-0D4AE3519DE7_1_105_c.jpeg

                          or here:

                          91f6f66f-7502-485e-a2b4-ed0edb5c4506-image.png

                            stephenw10 Netgate Administrator

                            Hmm, so in both of those situations it does eventually boot?

                            We've seen some other devices hit those, but AFAIK they never boot from there.

                            Try booting verbose to get more output from the 2nd scenario.

                              rmeskill @stephenw10

                              @stephenw10 that's actually a good question, but I think not always. Sometimes it freezes there and sometimes it boots. But I just had another power cut and ended up with this screen; it looks pretty damning for the NVMe:

                              2abdb9ad-46d7-401b-a6a4-a18a7dfad024-image.png

                              Anyone have any suggestions for a good value/quality NVMe replacement?

                                stephenw10 Netgate Administrator

                                Urgh, yeah that's not good. It's difficult to break ZFS just by removing the power. So, yes, could be a bad drive.

                                Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.