Netgate Discussion Forum

    Dying hard drive, replacing with SSD? How to perform this as quickly as possible

    • P
      pftdm007
      last edited by pftdm007

      Hello guys,

      I recently had some issues with hard drives on a FreeNAS storage server, and that made me look more closely at the condition of my pfSense HDD. I think the hard drive currently in this pfSense box is dying:

      SMART Attributes Data Structure revision number: 10
      Vendor Specific SMART Attributes with Thresholds:
      ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
        1 Raw_Read_Error_Rate     0x000f   118   099   006    Pre-fail  Always       -       190793353
        3 Spin_Up_Time            0x0003   099   099   000    Pre-fail  Always       -       0
        4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       210
        5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       20
        7 Seek_Error_Rate         0x000f   091   060   030    Pre-fail  Always       -       1346666632
        9 Power_On_Hours          0x0032   014   014   000    Old_age   Always       -       75727
       10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
       12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       210
      184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
      187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
      188 Command_Timeout         0x0032   100   019   000    Old_age   Always       -       25770262658
      189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
      190 Airflow_Temperature_Cel 0x0022   068   057   045    Old_age   Always       -       32 (Min/Max 30/33)
      191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       34
      192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       1
      193 Load_Cycle_Count        0x0032   001   001   000    Old_age   Always       -       445142
      194 Temperature_Celsius     0x0022   032   043   000    Old_age   Always       -       32 (0 15 0 0 0)
      195 Hardware_ECC_Recovered  0x001a   058   051   000    Old_age   Always       -       190793353
      197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
      198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
      199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
      254 Free_Fall_Sensor        0x0032   100   100   000    Old_age   Always       -       0
      

      The "Reallocated sector count" is WAY higher than what I'm willing to deal with, and the "Command Timeout" is INSANELY high (I almost wonder if this is some kind of misreporting)...
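On the misreporting question, one possible explanation (an assumption, not something confirmed in this thread) is that this attribute's raw value packs several 16-bit counters into one 48-bit number, a layout commonly described for Command_Timeout on Seagate drives. A minimal sketch that splits the value that way; `decode_raw16` is a hypothetical helper name:

```shell
#!/bin/sh
# Split a 48-bit SMART raw value into three 16-bit fields (high, mid, low).
# Assumption: the attribute is stored as packed 16-bit counters; check the
# smartmontools drive database before relying on this for a given drive.
decode_raw16() {
    raw=$1
    high=$(( (raw >> 32) & 0xFFFF ))
    mid=$(( (raw >> 16) & 0xFFFF ))
    low=$(( raw & 0xFFFF ))
    echo "$high $mid $low"
}

decode_raw16 25770262658   # Command_Timeout raw value from the table above
# prints: 6 7 130
```

If that layout applies here, the figure is three small counters rather than 25 billion timeouts. What each field counts is not stated in this thread.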

      The drive is a SEAGATE Momentus (Yeah I know..)

      === START OF INFORMATION SECTION ===
      Model Family:     Seagate Momentus 5400.6
      Device Model:     ST9160314AS
      Serial Number:    5VCLVMT1
      LU WWN Device Id: 5 000c50 02e9f697a
      Firmware Version: 0001SDM1
      User Capacity:    160,041,885,696 bytes [160 GB]
      Sector Size:      512 bytes logical/physical
      Rotation Rate:    5400 rpm
      Device is:        In smartctl database [for details use: -P show]
      ATA Version is:   ATA8-ACS T13/1699-D revision 4
      SATA Version is:  SATA 2.6, 3.0 Gb/s
      Local Time is:    Thu Jul 15 08:06:15 2021 EDT
      SMART support is: Available - device has SMART capability.
      SMART support is: Enabled
      

      I'm thinking of replacing this drive ASAP, but I'd like to avoid rebuilding the entire OS and reinstalling / reconfiguring everything.

      Some questions:

      1. Would you guys recommend an SSD? If so, I'd personally be willing to spend a bit more and get an enterprise-grade SSD.
      2. Is it possible to "clone" the drive to a new one and avoid reinstalling everything? Would pfSense be able to deal with this? The idea is to clone only the data to a new drive and swap out the dying one, without reinstalling everything and having a lot of downtime.
      3. With such BAD SMART values, why is pfSense not issuing notifications (via email or in the GUI)? Some sort of cron job that checks the most relevant SMART attributes and warns the admin that the drive is "dying" would be nice. It's both a question and a suggestion for the pfSense devs.

      I made a similar suggestion a few years ago (notify via email) if the hardware temps are going > threshold but this was never implemented.

      Thanks to all!

      • KOMK
        KOM @pftdm007
        last edited by

        @pftdm007 If you have a backup of your config you can reinstall and restore in under 10 minutes.

        1. SSD is better from a power use perspective
        2. Any disk clone software can do that. Acronis TrueImage, Clonezilla etc
        3. No idea
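For the cloning route in point 2, the same idea can also be sketched with plain `dd`, which is a swapped-in tool and not one of the programs named above. The helper below only prints the command so nothing is overwritten by accident. `build_clone_cmd` and the device names `/dev/ada0` and `/dev/ada1` are assumptions for illustration, and the copy would normally be run from a live environment with both disks attached, not from the running firewall.

```shell
#!/bin/sh
# Print (do not run) a dd command that copies one whole disk onto another.
# The device names are hypothetical; confirm which disk is which first,
# since swapping source and target would overwrite the good copy.
build_clone_cmd() {
    src=$1
    dst=$2
    # conv=noerror,sync keeps going past unreadable sectors and pads them
    # with zeros so the offsets on the target stay aligned.
    echo "dd if=$src of=$dst bs=64k conv=noerror,sync"
}

build_clone_cmd /dev/ada0 /dev/ada1
# prints: dd if=/dev/ada0 of=/dev/ada1 bs=64k conv=noerror,sync
```

On a drive that is already failing, a clone may carry damaged sectors over to the new disk, which is one reason a fresh install plus config restore can be the safer path.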
        • fireodoF
          fireodo @pftdm007
          last edited by

          @pftdm007 said in Dying hard drive, replacing with SSD? How to perform this as quickly as possible:

          Would you guys recommend a SSD?

          I agree with @KOM but I also had a sudden SSD death without any S.M.A.R.T warnings! (Transcend 32GB mSATA) Maybe I had bad luck, I don't know ...

          Just my 2 cents ...

          Regards,
          fireodo

          Kettop Mi4300YL CPU: i5-4300Y @ 1.60GHz RAM: 8GB Ethernet Ports: 4
          SSD: SanDisk pSSD-S2 16GB (ZFS) WiFi: WLE200NX
          pfsense 2.8.0 CE
          Packages: Apcupsd, Cron, Iftop, Iperf, LCDproc, Nmap, pfBlockerNG, RRD_Summary, Shellcmd, Snort, Speedtest, System_Patches.

          • AndyRHA
            AndyRH
            last edited by

            Any SSD with TRIM support (they may all have it by now) should be fine, although I would use a known brand. pfSense does not write enough to wear out an SSD in a reasonable amount of time.
            IMO a fresh install with a restore is safer; failing drives are tricky things.

            o||||o
            7100-1u

            • GertjanG
              Gertjan @AndyRH
              last edited by

              Several options / ideas..

              First solution : why bother ? This excellent tool makes a backup of your pfSense config.
              The "install USB" image is small and downloads fast, so you'll be back online 10 minutes after you start reinstalling.

              Next : Is your pfSense essential ? Use a new drive every 3 to 4 years, and after that period, move the old disk to a less essential place.
              Related : Use a UPS, and all risks are divided by a positive number N, where N is bigger than 1.
              Keep a spare drive on the shelves.

              Next : You have a "server" somewhere running on the Internet (for your own sites, mails, games, private DDOS attacks and such). Use a data collector tool like Munin - see here - and as soon as one of the values reaches a critical point, you get a mail.
              Btw : I never received a mail from Munin. The drive was always fine one moment and dead 10 minutes later, taking pfSense with it (so - see first point). My Munin example is from my dedicated server, which uses a "Raid 1" with two identical drives. For such a setup, smartctl makes more sense. If one drive fails, the system will continue to run on a single drive, and I will have some time to prepare the swap and resync.

              Next : With the new ZFS filesystem and a pool set up as Raid 1 (or bigger), a manual, monthly smartctl check will do.

              As you said yourself, a basic cron job with some grep and mail isn't that hard.

              /usr/local/sbin/smartctl -H -c -l error -l selftest -l selective -a /dev/ada0
              

              (because my drive's driver name is "ada")
              This will show a boatload of info.
              Just 'grep' the values that indicate trouble, and mail them to yourself.
              Your mini scripts / cron will be update proof.
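The grep-and-mail idea can be sketched as a small filter. This is a minimal sketch, not a tested pfSense script: `check_smart` is a hypothetical helper, and the watch list (attributes 5, 187, 197 and 198, flagged when the raw value is above zero) is an assumption rather than an official set of thresholds.

```shell
#!/bin/sh
# Read `smartctl -A` output on stdin and print the watched attributes whose
# raw value is non-zero. The watch list is an assumption for illustration.
check_smart() {
    awk '
        # Attribute rows start with a numeric ID; the raw value is field 10.
        $1 ~ /^[0-9]+$/ && ($1 == 5 || $1 == 187 || $1 == 197 || $1 == 198) {
            if ($10 + 0 > 0) print "WARN " $2 " raw=" $10
        }
    '
}

# Two rows copied from the SMART table earlier in this thread:
check_smart <<'EOF'
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       20
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
EOF
# prints: WARN Reallocated_Sector_Ct raw=20
```

On the firewall itself the input would come from something like `/usr/local/sbin/smartctl -A /dev/ada0 | check_smart`, run from cron, with any non-empty output piped to whatever mailer is available. The mail step is left out because no mailer is named in this thread.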

              No "help me" PM's please. Use the forum, the community will thank you.
              Edit : and where are the logs ??

              Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.