Netgate Discussion Forum

    PFsense eats hard drives

    Hardware
    • C
      charliem

      @radrmr:

      I have been using PFsense for several years now at three different locations. At those locations, I have gone through 11 different motherboards as upgrades
      …..
      Over the years I have gone through 28 HDDs.

      That is an extraordinarily high failure rate (assuming all 28 are failures, and not capacity or speed upgrades).  My experience is that FreeBSD is extremely stable, capable of uptimes measured in years.  I cannot even think of how pfSense could induce hardware failures …

      How is the environment of these three systems?  Dirty, hot, bad power?  Perhaps a good UPS/filter would be good to have.

      • R
        radrmr

        @charliem,

        They are not in commercial server rooms or anything. They have all been in office environments, so there shouldn't be too much dust.
        As far as power goes, two of the three (the one that just died) have UPSs. Just cheap consumer-grade APC units for home use, nothing fancy.

        Yes, the failure rate is high, but a number of those drives were pulled from desktops and whatever I had on the bench at the time.

        • ?
          Guest

          So you're using second-hand hardware, and probably inexpensive Mini-ITX boards.

          I know of Alix boards that have run for years non-stop.

          We attempted to sell a Jetway dual-core Atom board, but the failure/return rate was extremely high, so we stopped.  (Yet I have employees who use them as their main firewall at home with no issue.)

          • S
            stephenw10 Netgate Administrator

            @radrmr:

            Over the years I have gone through 28 HDDs.

            Ouch! Something's not right there. Are there any common elements between the systems? Bad power supplies seem like a likely suspect. It would be interesting to get some drive stats from any of them, the total writes figure for example. Perhaps they came from systems that previously ran under high load, though it's hard to believe all 28 did.

            If you're only running OpenVPN then you could run the Nano (NanoBSD) version, where there are very few writes to the boot drive.

            Steve

            • J
              jimp Rebel Alliance Developer Netgate

              I've lost a few HDDs over the years in my main pfSense box at home. I suspect the primary causes are heat and running a tiny laptop drive 24/7 that wasn't meant for that workload.

              On 2.1 and later you can also move /var/ and /tmp/ to RAM disks if you have enough spare RAM, so that the constant writing of log files and RRD databases doesn't hit your disk (spinning disk or SSD).

              • W
                work_permit

                Have you monitored the SMART status of the drives? I agree that heat, coupled with second-hand hard drives being spun 24/7, could be the culprit.

                • R
                  radrmr

                  Interesting news: not even the BIOS was recognizing the HDD at boot. The POST screen was hanging and I could hear the drive seeking loudly. Out of frustration I banged the drive with the handle of a screwdriver.

                  It booted…?
                  I backed everything up again. I did order a new SSD and will be reinstalling soon.

                  @stephenw10
                  The power supplies in there now have been in use for some time, but not the entire time. Almost a year or so.

                  @jimp
                  Thanks for the tip, I will try moving those partitions.
                  I have used laptop-sized drives in the past, maybe 3 or 4 of them. And yes, they all died and I wasn't too surprised.

                  • D
                    divsys

                    One occasional (and rather wonky) workaround for failing drives I've used in the past:

                    IF the failure appears in one area of the drive (rather than everywhere and always), I roughly calculate the relative position of the "BAD" zone on the disk and create a partition that fully encompasses it (perhaps with some spare space on either side).

                    In a manual install, force a mount point for something you won't use, e.g. /bmnt, and assign it to the bad partition. Depending on what space you have left, you may need to assign other mount points around the "bad partition".

                    When pfSense boots, you can remove the reference to /bmnt from fstab and the bad spot remains isolated. Not a permanent fix by any means (more like bubblegum and baling wire  ::)  ) but it often stops the drive from stuttering over a failing area in normal usage and generating errors, etc. It's given me time to find a suitable replacement drive/system while keeping the unit operational for a while longer.

                    In general though, I find my pfSense installs to be mostly harmless to hard drives. YMMV  :)

                    -jfp
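
The "fence off the bad zone" arithmetic described above can be sketched roughly as follows. This is only an illustration of the calculation, not a tool from the thread: the function names, the default margin, and the 2048-sector alignment are all assumptions, and the bad sector numbers would have to come from your own error logs.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SectorRange:
    start: int  # first sector of the range (inclusive)
    end: int    # last sector of the range (inclusive)

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def quarantine_range(first_bad: int, last_bad: int, total_sectors: int,
                     margin: int = 204800, align: int = 2048) -> SectorRange:
    """Return an aligned sector range covering the bad zone plus a margin.

    margin and align are hypothetical defaults chosen for illustration.
    """
    if not 0 <= first_bad <= last_bad < total_sectors:
        raise ValueError("bad zone must lie inside the disk")
    # Pad the bad zone on both sides, clamped to the disk boundaries.
    start = max(0, first_bad - margin)
    end = min(total_sectors - 1, last_bad + margin)
    # Round the start down and the end up to alignment boundaries.
    start -= start % align
    end = min(total_sectors - 1, (end // align + 1) * align - 1)
    return SectorRange(start, end)


def usable_ranges(quarantine: SectorRange, total_sectors: int) -> list[SectorRange]:
    """Return the space left over on either side of the quarantined zone."""
    ranges = []
    if quarantine.start > 0:
        ranges.append(SectorRange(0, quarantine.start - 1))
    if quarantine.end < total_sectors - 1:
        ranges.append(SectorRange(quarantine.end + 1, total_sectors - 1))
    return ranges


if __name__ == "__main__":
    # Made-up numbers: a 1,000,000-sector disk with errors around sector 500,000.
    bad = quarantine_range(500_000, 500_100, 1_000_000, margin=10_000)
    print("dummy partition (e.g. /bmnt):", bad)
    print("usable space:", usable_ranges(bad, 1_000_000))
```

The dummy partition gets the quarantined range and the real mount points go in the remaining ranges. A generous margin is probably wise, since a failing area tends to grow.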

                    • R
                      radrmr

                      @work_permit
                      I just checked the SMART self-test log on the drive and it says passed. Weird though, it only shows 410 LifeTime hours. That is only 17 days, and that drive is much older than that. Am I reading that wrong?

                      Num  Test_Description  Status                   Remaining  LifeTime(hours)  LBA_of_first_error
                      1    Short offline     Completed without error  00%        410              -

                      • S
                        stephenw10 Netgate Administrator

                        Try to read the raw SMART data from the drive and then check the manufacturer's datasheet. Some values can be a little non-standard. Does 410 days seem more likely?

                        Steve
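
To make the unit question above concrete, here is a small sketch that reads the same raw counter under a few candidate units. The list of units is an assumption for illustration only. Nothing in the thread establishes which one this particular drive uses, so the datasheet remains the real answer.

```python
from __future__ import annotations

HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365.25


def as_powered_on_days(raw: int) -> dict[str, float]:
    """Interpret a raw lifetime counter under several candidate units.

    Returns the equivalent powered-on time in days for each assumed unit.
    """
    return {
        "minutes": raw / 60 / HOURS_PER_DAY,
        "hours": raw / HOURS_PER_DAY,
        "days": float(raw),
    }


if __name__ == "__main__":
    for unit, days in as_powered_on_days(410).items():
        years = days / DAYS_PER_YEAR
        print(f"410 {unit:<7} = {days:8.2f} days = {years:5.2f} years")
```

Read as hours, 410 comes to roughly 17 days, which matches the figure quoted above. Read as days, it comes to a little over a year.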

                        • W
                          work_permit

                          What does SMART report for your drive temperature? In my experience running consumer drives 24/7, you're asking for trouble at sustained temperatures above 40 °C. Above 45 °C, you're asking for trouble and are going to get it.

                          I'm not familiar with using SMART in pfSense, since I use CF. FYI, the FreeBSD command is smartctl -a /dev/da0.
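
As a rough sketch of checking a drive against the 40 and 45 degree rules of thumb above, the following pulls the temperature out of saved smartctl output. It assumes the usual ATA attribute table layout, with the raw value in the tenth column, and the attribute names Temperature_Celsius or Airflow_Temperature_Cel. Not every drive reports these, and the sample text is made up for illustration rather than taken from the drive in this thread.

```python
from __future__ import annotations

import re

# Rule-of-thumb thresholds from the post above, in degrees Celsius.
WARN_C = 40
CRITICAL_C = 45

# Attribute names that commonly carry the temperature (assumed, drive-dependent).
TEMP_ATTRS = ("Temperature_Celsius", "Airflow_Temperature_Cel")

# Illustrative sample only; real output comes from e.g. smartctl -a /dev/da0.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       410
194 Temperature_Celsius     0x0022   057   045   000    Old_age   Always       -       43 (Min/Max 20/55)
"""


def drive_temperature(smartctl_output: str) -> int | None:
    """Return the first temperature found in the attribute table, or None."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least ten columns.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in TEMP_ATTRS:
            match = re.match(r"\d+", fields[9])
            if match:
                return int(match.group())
    return None


def classify(temp_c: int) -> str:
    """Map a temperature onto the thresholds quoted in the thread."""
    if temp_c > CRITICAL_C:
        return "critical"
    if temp_c > WARN_C:
        return "warning"
    return "ok"


if __name__ == "__main__":
    temp = drive_temperature(SAMPLE)
    if temp is None:
        print("no temperature attribute found")
    else:
        print(f"{temp} C: {classify(temp)}")
```

Since the concern is sustained temperature, a single reading matters less than running this periodically and watching the trend.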

                          • H
                            Harvy66

                            It has been my experience that HDs tend to outlive their usefulness, no matter how hard you run them, as long as they stay cool. If you have a lot of HDs dying, that's not a software issue, that's a hardware issue.

                            Spinning down an HD is horribly hard on the motor, so make sure you don't somehow have any idle-drive power saving going on.

                            If you get SSDs, that can pretty much remove all mechanical issues, but if you have an HTTP proxy or anything, you may need to be concerned with the amount of data being written to the drive. Making sure TRIM support is enabled can be very useful in these situations. I personally haven't gone through and manually enabled TRIM or checked whether it was auto-detected. My SSDs rarely get written to and I have no services that want to write, for the most part.

                            • W
                              work_permit

                              I agree with everything you say, Harvy. Spinning up a drive IS stressful. And heat IS the path to death.

                              There is a tradeoff. Keeping a drive running 24/7 avoids the stress of spin-up but, without good ventilation, ensures that drives run hot and stay hot.

                              I spin down my archival storage arrays, but keep frequently used drives up 24/7.  The challenge is to keep them cool as well.

                              • R
                                robi

                                This is why I never trust HDDs in appliances like pfSense. Thank God there exists such a great thing as NanoBSD! The whole thing runs from RAM.

                                Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.