HDD dying although SMART says ok?
-
As of this morning, my system log is cluttered with the following:
Sep 7 07:00:48 kernel vnode_pager_putpages: residual I/O 4096 at 24
Sep 7 07:00:48 kernel vnode_pager_putpages: I/O error 5
Sep 7 07:00:48 kernel g_vfs_done():ufsid/5799c06539f8a71c[READ(offset=359244398592, length=32768)]error = 5
Sep 7 07:00:48 kernel (ada0:ata0:0:0:0): Error 5, Retries exhausted
Sep 7 07:00:48 kernel (ada0:ata0:0:0:0): RES: 51 40 9a 51 d2 29 29 00 00 00 00
Sep 7 07:00:48 kernel (ada0:ata0:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 40 (UNC )
Sep 7 07:00:48 kernel (ada0:ata0:0:0:0): CAM status: ATA Status Error
Sep 7 07:00:48 kernel (ada0:ata0:0:0:0): READ_DMA48. ACB: 25 00 8f 51 d2 40 29 00 00 00 40 00
Sep 7 07:00:43 kernel (ada0:ata0:0:0:0): Retrying command
Sep 7 07:00:43 kernel (ada0:ata0:0:0:0): RES: 51 40 99 51 d2 29 29 00 00 00 00
Sep 7 07:00:43 kernel (ada0:ata0:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 40 (UNC )
Sep 7 07:00:43 kernel (ada0:ata0:0:0:0): CAM status: ATA Status Error
Sep 7 07:00:43 kernel (ada0:ata0:0:0:0): READ_DMA48. ACB: 25 00 8f 51 d2 40 29 00 00 00 40 00
Sep 7 07:00:39 kernel (ada0:ata0:0:0:0): Retrying command
Sep 7 07:00:39 kernel (ada0:ata0:0:0:0): RES: 51 40 98 51 d2 29 29 00 00 00 00
Sep 7 07:00:39 kernel (ada0:ata0:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 40 (UNC )
Sep 7 07:00:39 kernel (ada0:ata0:0:0:0): CAM status: ATA Status Error
Sep 7 07:00:39 kernel (ada0:ata0:0:0:0): READ_DMA48. ACB: 25 00 8f 51 d2 40 29 00 00 00 40 00
Sep 7 07:00:35 kernel (ada0:ata0:0:0:0): Retrying command
Sep 7 07:00:35 kernel (ada0:ata0:0:0:0): RES: 51 40 98 51 d2 29 29 00 00 00 00
Sep 7 07:00:35 kernel (ada0:ata0:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 40 (UNC )
Sep 7 07:00:35 kernel (ada0:ata0:0:0:0): CAM status: ATA Status Error
Sep 7 07:00:35 kernel (ada0:ata0:0:0:0): READ_DMA48. ACB: 25 00 8f 51 d2 40 29 00 00 00 40 00
Sep 7 07:00:32 kernel (ada0:ata0:0:0:0): Retrying command
Sep 7 07:00:32 kernel (ada0:ata0:0:0:0): RES: 51 40 98 51 d2 29 29 00 00 00 00
Sep 7 07:00:32 kernel (ada0:ata0:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 40 (UNC )
Sep 7 07:00:32 kernel (ada0:ata0:0:0:0): CAM status: ATA Status Error
Sep 7 07:00:32 kernel (ada0:ata0:0:0:0): READ_DMA48. ACB: 25 00 8f 51 d2 40 29 00 00 00 40 00
Sep 7 07:00:27 kernel (ada0:ata0:0:0:0): Retrying command
Sep 7 07:00:27 kernel (ada0:ata0:0:0:0): RES: 51 40 96 51 d2 29 29 00 00 00 00
Sep 7 07:00:27 kernel (ada0:ata0:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 40 (UNC )
Sep 7 07:00:27 kernel (ada0:ata0:0:0:0): CAM status: ATA Status Error
Sep 7 07:00:27 kernel (ada0:ata0:0:0:0): READ_DMA48. ACB: 25 00 8f 51 d2 40 29 00 00 00 08 00
This repeats in the first minute of every hour, until it reports 'Retries exhausted'. The system has been running for over 7 years without any trouble; the HDD (a Seagate Momentus 5400.6) has been in there for about 3 years.
SMART still says the drive is healthy, but I can't see any other reason for these entries. The RAM is OK and I've swapped the cables. Nothing else would cause this, right?
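The RES and ACB byte dumps in the log can be decoded to see which sector the drive is complaining about. The following is only a rough sketch: it assumes the bytes are printed in the order FreeBSD's CAM layer uses for ATA commands and results (listed in the comments), and the function names are made up for illustration.

```python
# Decode the ACB/RES byte dumps that accompany a failed ATA command.
# ASSUMPTION: byte order as printed by FreeBSD CAM:
#   ACB: command, features, lba_low, lba_mid, lba_high, device,
#        lba_low_exp, lba_mid_exp, lba_high_exp, features_exp,
#        sector_count, sector_count_exp
#   RES: status, error, lba_low, lba_mid, lba_high, device,
#        lba_low_exp, lba_mid_exp, lba_high_exp,
#        sector_count, sector_count_exp

def _parse(text):
    """Turn '25 00 8f ...' into a list of integers."""
    return [int(tok, 16) for tok in text.split()]

def _lba48(b):
    """Assemble the 48-bit LBA from bytes 2..8 (byte 5 is the device register)."""
    low, mid, high, _device, low_exp, mid_exp, high_exp = b[2:9]
    return ((high_exp << 40) | (mid_exp << 32) | (low_exp << 24)
            | (high << 16) | (mid << 8) | low)

def decode_acb(text):
    """Hypothetical helper: decode the command block of a READ_DMA48."""
    b = _parse(text)
    sectors = (b[11] << 8) | b[10]
    return {"command": b[0], "lba": _lba48(b), "sectors": sectors}

def decode_res(text):
    """Hypothetical helper: decode the result registers."""
    b = _parse(text)
    return {"status": b[0], "error": b[1], "lba": _lba48(b)}

# Values copied from the 07:00:48 entries in the log.
acb = decode_acb("25 00 8f 51 d2 40 29 00 00 00 40 00")
res = decode_res("51 40 9a 51 d2 29 29 00 00 00 00")

print("request starts at LBA", acb["lba"], "for", acb["sectors"], "sectors")
print("drive reports failure at LBA", res["lba"])
print("UNC bit set:", bool(res["error"] & 0x40))
```

If that layout assumption holds, the request is 64 sectors (32768 bytes, which matches the length in the g_vfs_done line) and the reported failing LBA sits 11 sectors into it. An uncorrectable read at a sector address the drive itself reports points at the media rather than at RAM or cabling.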
-
In my experience, SMART is hit or miss. I'd trust the syslog messages first. Make a config backup ASAP. Then swap that drive for a known good one (an SSD if you can swing it), do a fresh install, and restore your config.
-
What is the output of
```
smartctl -a /dev/ada0
```
-
Given the errors you're seeing, odds are high that there is actually a problem with the drive.
In all the years I've been dealing with SMART, two things have been evident:
1. SMART is prone to false negatives: just because SMART says a drive is OK doesn't mean it is, especially when it comes to physical defects of various kinds or serious controller problems.
2. If SMART says a drive has a problem, it has a problem.
So if SMART finds a problem, you can trust that it's definitely a problem, but if SMART says it's OK, you have more work to do.
Same with software RAM tests like memtest86.
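One piece of that extra work is to look past the overall health verdict at the raw attribute values. Below is a minimal sketch, assuming the usual ten-column attribute table that `smartctl -A` prints and the attribute names smartmontools commonly uses; names vary between vendors, and the function names here are hypothetical.

```python
import subprocess

# ASSUMPTION: these attribute names appear as smartmontools usually prints them.
WATCHED = {
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
}

def suspicious_attributes(smartctl_output):
    """Return {name: raw_value} for watched attributes with a non-zero raw value."""
    found = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have the columns:
        # ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name, raw = fields[1], fields[9]
        if name in WATCHED and raw.isdigit() and int(raw) > 0:
            found[name] = int(raw)
    return found

def read_attributes(device="/dev/ada0"):
    """Hypothetical helper: run `smartctl -A` on the device and return its output."""
    result = subprocess.run(
        ["smartctl", "-A", device], capture_output=True, text=True
    )
    return result.stdout
```

Calling `suspicious_attributes(read_attributes())` as root would return an empty dict when none of the watched counters has moved. A non-zero pending or reallocated count alongside a passing health verdict would be consistent with the false negatives described above, but an empty result does not clear the drive either.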