SATA Errors every few days?



  • I don't know if this is hardware or software so I'm posting it here.

    Every few days my pf box dies. A quick power cycle and it's up again for a few more days.
    It's an Intel Atom D525, 2GB memory, 4x Intel Gigabit interfaces, and a 32GB Intel SSD.

    When it crashes, the disk activity light stays on solid and this is on the screen.
    Any ideas?




  • My guess is that you have one or more bad spots on the disk.



  • Time to toss the disk and install a good one.


  • Netgate Administrator

    Out of interest, how old is that disk?

    Steve



  • It's about 3 months old. I've run Intel's SSD utility and it checks out. The SMART status also says it's fine. I've never seen an SSD have a "bad spot". I'm wondering if it's a controller issue.



  • Another possibility occurred to me. I understand many SSDs perform some sort of wear levelling: they reorganise the mapping of disk logical blocks onto memory "pages", with the aim of evening out the number of writes across all the memory pages.

    The disk reported a timeout. Maybe the disk is "busy" for a "long" time when it is wear levelling. I think I read of some changes in the Linux ATA driver to accommodate this behaviour. I don't know of any such changes in FreeBSD. It could be worth looking through the FreeBSD problem reports to see if anyone else has reported similar behaviour.
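
    To make the wear-levelling idea above concrete, here is a toy model in Python. It is only a sketch under loud assumptions: the class name `ToyWearLeveller` and its policy (always pick the least-worn free page) are hypothetical and are not how any particular SSD firmware works. It just shows how repeated writes to one logical block can end up spread across several physical pages.

```python
class ToyWearLeveller:
    """Toy flash translation layer (hypothetical, for illustration only)."""

    def __init__(self, num_pages):
        self.write_counts = [0] * num_pages      # writes seen by each physical page
        self.mapping = {}                        # logical block -> physical page
        self.free_pages = set(range(num_pages))  # pages not holding live data

    def write(self, logical_block):
        # Release the page that currently holds this block, if any.
        old_page = self.mapping.pop(logical_block, None)
        if old_page is not None:
            self.free_pages.add(old_page)
        if not self.free_pages:
            raise RuntimeError("no free pages left")
        # Pick the least-worn free page (ties broken by page number).
        page = min(self.free_pages, key=lambda p: (self.write_counts[p], p))
        self.free_pages.remove(page)
        self.write_counts[page] += 1
        self.mapping[logical_block] = page
        return page


ftl = ToyWearLeveller(num_pages=4)
for _ in range(8):
    ftl.write(logical_block=0)  # hammer a single logical block
print(ftl.write_counts)         # prints [2, 2, 2, 2]
```

    A real drive also has to copy and erase data in the background to do this, which is the kind of housekeeping that could plausibly keep it "busy" for a while.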



  • I had a bunch of SSDs do that at my work when we had a storm surge. Out of 400, 12 went bad from a power outage while they were all on.


  • Rebel Alliance Developer Netgate

    Given that the LBAs mentioned in the error are all close to one another, I'd also be inclined to say the disk is about to bite it.
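
    One rough way to check whether failing LBAs are "close to one another" is to compare the span they cover with the size of the disk. The sketch below assumes 512-byte sectors, and the 0.1% threshold is an arbitrary choice; the helper names `lba_spread` and `looks_clustered` and the sample LBA values are made up for illustration and are not taken from the error in this thread.

```python
def lba_spread(lbas, sector_size=512):
    """Return (span in sectors, span in bytes) covered by the given LBAs."""
    if not lbas:
        raise ValueError("need at least one LBA")
    span_sectors = max(lbas) - min(lbas) + 1
    return span_sectors, span_sectors * sector_size


def looks_clustered(lbas, disk_sectors, max_fraction=0.001):
    """True if the LBAs cover at most max_fraction of the disk (assumed threshold)."""
    span_sectors, _ = lba_spread(lbas)
    return span_sectors / disk_sectors <= max_fraction


# Hypothetical values: a 32 GB disk has roughly 62,500,000 sectors of 512 bytes.
DISK_SECTORS = 62_500_000
sample_lbas = [1_000_000, 1_000_008, 1_000_128]

print(lba_spread(sample_lbas))                     # prints (129, 66048)
print(looks_clustered(sample_lbas, DISK_SECTORS))  # prints True
```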

    When in doubt, swap it out.

    If it's an SSD, there may be something you can do (full-disk read, firmware update, etc.) to nudge it back to being usable. If it's a spinning disk, sometimes a full-disk read will help, but generally by the time you're seeing errors it's too late.

    If you're on 2.0 check out Diagnostics > SMART Status.
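
    The full-disk read mentioned above can be sketched in Python as a plain sequential read that notes which offsets fail. This is a minimal sketch, not a tested recovery tool: the function name `read_whole_device`, the 1 MiB chunk size, and the `max_errors` cap are all assumptions, and it only reads, never writes.

```python
import os


def read_whole_device(path, chunk_size=1024 * 1024, max_errors=100):
    """Read path from start to end; return (bytes_read, failing_offsets)."""
    bad_offsets = []
    bytes_read = 0
    offset = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while len(bad_offsets) < max_errors:
            try:
                data = os.pread(fd, chunk_size, offset)
            except OSError:
                # Remember the failing chunk and skip past it.
                bad_offsets.append(offset)
                offset += chunk_size
                continue
            if not data:
                break  # end of device or file
            bytes_read += len(data)
            offset += len(data)
    finally:
        os.close(fd)
    return bytes_read, bad_offsets
```

    Run as root, it would be called with the raw disk device, for example `read_whole_device("/dev/ada0")`. That device name is an assumption; check which name your disk actually has first.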

