Netgate Discussion Forum

    SMART did not report failing drive (Worthless feature Needs Fixed)

    General pfSense Questions
    • Visseroth

      So I just posted in the WebGUI portion of the forum because I was getting a weird error when I tried to edit rules, only to find that the error was caused by corrupt data from a failing drive, which the firewall NEVER reported to me.
      I have mail set up and SMART working. It should have mailed me or, at the very LEAST, reported "SMART WARNING" in the dashboard of the GUI, but there was NOTHING!!!

      Yeah, I'm a bit aggravated, because instead of this being an "OH CRAP, I HAVE TO SWAP IT NOW" this could have been an "eh, I'll do it this evening".

      So, SMART is not reporting correctly. How can I fix it, or do I need to post a bug?

      Attached is a screenshot of the SMART data from the drive I pulled.
      Failing.JPG

      • kpa

        SMART has never been a reliable indicator that nothing is wrong. It only works as an indicator that something might be wrong if the reported values deviate from the set thresholds. There are plenty of electrical and mechanical faults that never manifest themselves in the SMART values before they actually happen.

        One example is when the controller board of the drive starts to fail electrically in a manner that affects the transfers between the drive and the system. You might see lots of errors relating to the device in your system log, but the SMART values still won't show any problems, because SMART is not actually designed to monitor the system<->drive interface.
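
Since interface faults like this show up in the system log rather than in the SMART values, a small log filter can catch them. The following is only a minimal sketch: the function name, the match phrases, and the sample lines are assumptions for illustration, and the pattern should be adjusted to whatever your own system log actually contains.

```python
import re

# Assumption: the kernel's CAM layer reports transport problems with phrases
# like these. Extend the pattern to match what your own system log shows.
ERROR_PATTERN = re.compile(r"CAM status|Retrying command", re.IGNORECASE)


def find_disk_errors(log_lines, device="ada0"):
    """Return the log lines that mention the device and look like CAM errors."""
    hits = []
    for line in log_lines:
        if device in line and ERROR_PATTERN.search(line):
            hits.append(line.rstrip("\n"))
    return hits


if __name__ == "__main__":
    # Hypothetical sample lines, for illustration only.
    sample = [
        "kernel: (ada0:ahcich0:0:0:0): CAM status: ATA Status Error\n",
        "kernel: em0: link state changed to UP\n",
        "kernel: (ada0:ahcich0:0:0:0): Retrying command\n",
    ]
    for hit in find_disk_errors(sample):
        print(hit)
```

Run against the real system log from a scheduled job, any non-empty result would be a reason to look at the drive, its cable, or its controller, even when SMART itself reports nothing.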

        • Gertjan

          @kpa: true.

          It's also a fact that 'smartd' and 'smartmontools' are both included with pfSense. But they aren't well integrated.
          Note: I'm using 2.3.4; I saw that development concerning SMART is in progress for 2.4.

          I never really used the SMART functionality of pfSense. The huge advantage of pfSense is that everything is contained in ONE file: config.xml. I'm a fan of backing up info that I care about, so the drive can do what devices do: they die Friday at 5:00 PM, no matter what, and they will warn you at 4:59 PM at best.

          This is what I found:
          https://github.com/pfsense/pfsense/blob/RELENG_2_3_4/src/usr/local/www/diag_smart.php#L78
          The file referenced there doesn't exist. The actual file is "/usr/local/etc/rc.d/smartd", without the ".sh". So the "smartd" daemon is not started when it is asked to start …

          See https://github.com/pfsense/pfsense/blob/RELENG_2_3_4/src/usr/local/www/diag_smart.php#L258
          See the "FIXME"?
          Someone already figured out that something isn't OK there.
          The daemon $smartd (/usr/local/sbin/smartd) cannot be executed like that.
          The "-M test -m mail@you.tld" or "-M test" (yep, it must be lowercase, not "-M TEST") must be set in /usr/local/etc/smartd.conf FIRST.

          By default, 'smartd' presumes the presence of a mail server or at least a 'mail' command (from mailutils), but these do not exist on pfSense.
          Some re-scripting is needed to use the mail (Notification) facilities built into pfSense. The script to adapt is already present, by the way: /usr/local/etc/smartd_warning.sh.

          This menu, Diagnostics => S.M.A.R.T. Status => Config (I didn't even know it existed), seems totally unnecessary to me. The pfSense SMART implementation should use the Notification settings already operational in pfSense. One thing is sure: it doesn't work, and the procedure for testing it isn't functioning at all (as stated above).

          Also: the widget just calls "smartctl /dev/ada0 -H" (/dev/ada0 is my drive device right now). smartctl is just querying the "database" stored in the drive.
          If I understood well enough how this all works, this database (SMART LOG) is filled when "smartd" (the daemon), running in the background, or "smartctl", used by the GUI (Diagnostics => S.M.A.R.T. Status => Information & Tests), is asked to do so.
          So the widget shows the result that you obtained the last time you ran a short or long "self test" (S.M.A.R.T. Status => Information & Tests: Perform self-tests and select Self-test) by hand.
          Conclusion: the widget is pretty useless.
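
A check that goes a little further than the widget could look at smartctl's exit status, which is a bitmask that also flags things like logged errors. This is a rough sketch, not what pfSense does: the bit meanings are paraphrased from the smartctl(8) man page, the function names are made up for illustration, and which bits can actually be set depends on the options passed (the error and self-test log bits need "-l error" and "-l selftest").

```python
import subprocess

# Meaning of each bit in smartctl's exit status, paraphrased from the
# smartctl(8) man page.
SMARTCTL_STATUS_BITS = {
    0: "command line did not parse",
    1: "device open failed",
    2: "a SMART or ATA command to the disk failed",
    3: "SMART status check returned DISK FAILING",
    4: "prefail attributes found at or below threshold",
    5: "attributes were at or below threshold in the past",
    6: "device error log contains records of errors",
    7: "self-test log contains records of errors",
}


def decode_smartctl_status(status):
    """Translate a smartctl exit status into a list of human-readable findings."""
    return [
        message
        for bit, message in sorted(SMARTCTL_STATUS_BITS.items())
        if status & (1 << bit)
    ]


def run_health_check(device, options=("-H",)):
    """Run smartctl on the device and decode its exit status.

    Assumes smartctl is installed and on the PATH; the default option mirrors
    the widget's "-H" call.
    """
    result = subprocess.run(
        ["smartctl", *options, device], capture_output=True, text=True
    )
    return decode_smartctl_status(result.returncode)


if __name__ == "__main__":
    # Decode an example status value without touching any real drive.
    for finding in decode_smartctl_status(8 | 64):
        print(finding)
```

Calling run_health_check("/dev/ada0", options=("-H", "-l", "error", "-l", "selftest")) would return an empty list only when none of the listed conditions were flagged.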

          Let's wait for 2.4.0 ;)

          No "help me" PM's please. Use the forum, the community will thank you.
          Edit : and where are the logs ??

          • Visseroth

            Well ARGah!

            Good to know. Here I was thinking all is well, my dashboard will tell me when something is wrong, but nope, that's not the case.

            I certainly hope it gets fixed, but now I know: take a peek at the logs, the dashboard is not reliable!

            You guys are a wealth of knowledge. Thanks for the replies; that does explain why a failing drive was never reported.

            • Visseroth

              I put in a feature request, though I don't know if it'll be heard or not…
              https://forum.pfsense.org/index.php?topic=131141.0

              • Gertjan

                @Visseroth:

                I put in a feature request, though I don't know if it'll be heard or not…
                https://forum.pfsense.org/index.php?topic=131141.0

                Well … I had some time this afternoon, and I now have "smartd", the daemon, running on boot. On top of that, it will do a short test every day and a long test every week.
                Even better: the "mail" part uses the mail-out settings already present within pfSense. When I instruct "smartd" to "test" its notification capabilities, I do receive the mail.
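
For anyone wanting to try the same, here is a hedged sketch of what such a smartd.conf entry could look like, generated by a small helper. The helper name, the device, the address, and the test times are assumptions, not the exact configuration described above; the "-a", "-s", "-m" and "-M test" directives and the schedule syntax come from the smartd.conf documentation.

```python
def smartd_conf_line(device, mail_to, short_hour=2, long_weekday=6,
                     long_hour=3, test_mail=False):
    """Build one smartd.conf entry for a device.

    The schedule regex uses smartd's T/MM/DD/d/HH form:
    a short self-test (S) every day at short_hour, and a long self-test (L)
    on long_weekday (1 = Monday ... 7 = Sunday) at long_hour.
    """
    schedule = (
        f"(S/../.././{short_hour:02d}"
        f"|L/../../{long_weekday}/{long_hour:02d})"
    )
    parts = [device, "-a", "-s", schedule, "-m", mail_to]
    if test_mail:
        # "-M test" makes smartd send a single test mail at startup;
        # the argument must be lowercase.
        parts += ["-M", "test"]
    return " ".join(parts)


if __name__ == "__main__":
    # Hypothetical values: adjust the device and the address to your system.
    print(smartd_conf_line("/dev/ada0", "mail@you.tld", test_mail=True))
```

On pfSense the mail itself still only goes out if the warning script (/usr/local/etc/smartd_warning.sh) has been adapted to the built-in notification settings, since there is no 'mail' command to fall back on.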

                It wasn't really rocket science, since smartd and smartmontools are very well documented ( https://www.smartmontools.org/ ) and the FreeBSD implementation is pretty much the same as the version I use on a Debian 8 (Jessie) server, where I'm using smartd to check my server disks.
                I had to change several config files, so I wouldn't be able to shrink-wrap it all up as a 'patch' right now. Maybe there are systems (diskless, or SSD, or whatever) that don't need it anyway and would not accept smartd running on their disks. The best solution might be to take it all out of pfSense and build a package for it, for those who need it.

                Think about it: is a "SMART" solution built into Macs or (desktop) Windows? I guess not ... not as far as I know.

                Btw: I ditched the "Config" page in the GUI where a mail address can be entered and tested, because I didn't need it anymore.

                But all this will probably never be included in 2.3.4, and it has already been taken care of in "2.4.0" (or is a work in progress). Maybe it will be back-ported to 2.3.x when 2.4 comes out (2.3.x will be the last 32-bit version of pfSense).

                Right now I advise you to run the short SMART test once in a while, check the results after a minute or two, and stop using the widget because ... useless.
                You have a new disk, right ?  ;-)

                • Visseroth

                  Good to know. Another option, if it isn't made into a package, would be to put an enable/disable option in System -> Advanced -> Misc., because you are right, not all SSDs support it. I'd probably set it to Disabled by default to err on the side of caution and leave it up to the user to enable it. An option to schedule tests and email notification on warning or error would be good as well.

                  And yes, I definitely pulled that disk and put a "newer" one in that didn't have much run time on it. I don't have any brand new "0" hour disks lying around, so one that's still young should do for a year or two at least. I couldn't really afford to wait because, with Squid running, God only knows what else was being corrupted.

                  Edit: Oh, and I'm actually running 2.3.4. The GUI was just reporting 2.3.3, likely because of the corruption. I don't really know, but it's reporting correctly now.

                  • Harvy66

                    While I understand that SMART is not very reliable in itself, a tool that claims to report the SMART status but does not, and gives a false negative instead, is a dangerous tool to have. It's better to have no data than bad data.

                    • Visseroth

                      Agreed.
                      What's the point of implementation if it does not do what it's supposed to do?
                      Implementation of SMART is supposed to report prior to failure.

                      Netgate/pfSense guys, this should be fixed or removed from the GUI. I'd personally like to see it fixed.

                      I'd also like to see some kind of email reporting if a rule has been triggered. Say, in this case, if a CAM error had been seen in the system logs, the system would send an email.

                      Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.