HDD dying although SMART says ok?

D0X

As of this morning, my system log is cluttered with the following:

Sep 7 07:00:48	kernel		vnode_pager_putpages: residual I/O 4096 at 24
Sep 7 07:00:48	kernel		vnode_pager_putpages: I/O error 5
Sep 7 07:00:48	kernel		g_vfs_done():ufsid/5799c06539f8a71c[READ(offset=359244398592, length=32768)]error = 5
Sep 7 07:00:48	kernel		(ada0:ata0:0:0:0): Error 5, Retries exhausted
Sep 7 07:00:48	kernel		(ada0:ata0:0:0:0): RES: 51 40 9a 51 d2 29 29 00 00 00 00
Sep 7 07:00:48	kernel		(ada0:ata0:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 40 (UNC )
Sep 7 07:00:48	kernel		(ada0:ata0:0:0:0): CAM status: ATA Status Error
Sep 7 07:00:48	kernel		(ada0:ata0:0:0:0): READ_DMA48. ACB: 25 00 8f 51 d2 40 29 00 00 00 40 00
Sep 7 07:00:43	kernel		(ada0:ata0:0:0:0): Retrying command
Sep 7 07:00:43	kernel		(ada0:ata0:0:0:0): RES: 51 40 99 51 d2 29 29 00 00 00 00
Sep 7 07:00:43	kernel		(ada0:ata0:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 40 (UNC )
Sep 7 07:00:43	kernel		(ada0:ata0:0:0:0): CAM status: ATA Status Error
Sep 7 07:00:43	kernel		(ada0:ata0:0:0:0): READ_DMA48. ACB: 25 00 8f 51 d2 40 29 00 00 00 40 00
Sep 7 07:00:39	kernel		(ada0:ata0:0:0:0): Retrying command
Sep 7 07:00:39	kernel		(ada0:ata0:0:0:0): RES: 51 40 98 51 d2 29 29 00 00 00 00
Sep 7 07:00:39	kernel		(ada0:ata0:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 40 (UNC )
Sep 7 07:00:39	kernel		(ada0:ata0:0:0:0): CAM status: ATA Status Error
Sep 7 07:00:39	kernel		(ada0:ata0:0:0:0): READ_DMA48. ACB: 25 00 8f 51 d2 40 29 00 00 00 40 00
Sep 7 07:00:35	kernel		(ada0:ata0:0:0:0): Retrying command
Sep 7 07:00:35	kernel		(ada0:ata0:0:0:0): RES: 51 40 98 51 d2 29 29 00 00 00 00
Sep 7 07:00:35	kernel		(ada0:ata0:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 40 (UNC )
Sep 7 07:00:35	kernel		(ada0:ata0:0:0:0): CAM status: ATA Status Error
Sep 7 07:00:35	kernel		(ada0:ata0:0:0:0): READ_DMA48. ACB: 25 00 8f 51 d2 40 29 00 00 00 40 00
Sep 7 07:00:32	kernel		(ada0:ata0:0:0:0): Retrying command
Sep 7 07:00:32	kernel		(ada0:ata0:0:0:0): RES: 51 40 98 51 d2 29 29 00 00 00 00
Sep 7 07:00:32	kernel		(ada0:ata0:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 40 (UNC )
Sep 7 07:00:32	kernel		(ada0:ata0:0:0:0): CAM status: ATA Status Error
Sep 7 07:00:32	kernel		(ada0:ata0:0:0:0): READ_DMA48. ACB: 25 00 8f 51 d2 40 29 00 00 00 40 00
Sep 7 07:00:27	kernel		(ada0:ata0:0:0:0): Retrying command
Sep 7 07:00:27	kernel		(ada0:ata0:0:0:0): RES: 51 40 96 51 d2 29 29 00 00 00 00
Sep 7 07:00:27	kernel		(ada0:ata0:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 40 (UNC )
Sep 7 07:00:27	kernel		(ada0:ata0:0:0:0): CAM status: ATA Status Error
Sep 7 07:00:27	kernel		(ada0:ata0:0:0:0): READ_DMA48. ACB: 25 00 8f 51 d2 40 29 00 00 00 08 00

This repeats in the first minute of every hour, until the log reports 'Retries exhausted'. The system has been running for over 7 years without any trouble; the HDD (a Seagate Momentus 5400.6) has been in there for about 3 years.

SMART still says the drive is healthy, but I can't see any other reason for these entries. The RAM is OK and I have swapped the cables. Nothing else would cause this, right?

whosmatt

In my experience, SMART is hit or miss. I'd trust the syslog messages first. Make a config backup ASAP. Then swap that drive for a known good one (an SSD if you can swing it), do a fresh install, and restore your config.
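
To show what those syslog messages are saying, here is a rough Python sketch that decodes the ACB and RES lines from the log above. It assumes the bytes are printed in the order FreeBSD's CAM layer normally uses (command, features, three LBA bytes, device, three extended LBA bytes, then the count fields) and that the drive uses 512-byte logical sectors. The function names are made up for illustration.

```python
SECTOR_SIZE = 512  # assumption: 512-byte logical sectors


def parse_hex_bytes(field):
    """Turn a string like '25 00 8f' into a list of integers."""
    return [int(b, 16) for b in field.split()]


def lba48(low, mid, high, low_exp, mid_exp, high_exp):
    """Combine the six LBA register bytes into one 48-bit sector number."""
    return (low | (mid << 8) | (high << 16)
            | (low_exp << 24) | (mid_exp << 32) | (high_exp << 40))


def decode_acb(field):
    # Assumed order: command, features, lba_low, lba_mid, lba_high, device,
    # lba_low_exp, lba_mid_exp, lba_high_exp, features_exp,
    # sector_count, sector_count_exp
    b = parse_hex_bytes(field)
    return {
        "command": b[0],
        "start_lba": lba48(b[2], b[3], b[4], b[6], b[7], b[8]),
        "sectors": b[10] | (b[11] << 8),
    }


def decode_res(field):
    # Assumed order: status, error, lba_low, lba_mid, lba_high, device,
    # lba_low_exp, lba_mid_exp, lba_high_exp, sector_count, sector_count_exp
    b = parse_hex_bytes(field)
    return {
        "status": b[0],
        "error": b[1],
        "failed_lba": lba48(b[2], b[3], b[4], b[6], b[7], b[8]),
    }


# Values copied from the 07:00:48 entries in the log above.
acb = decode_acb("25 00 8f 51 d2 40 29 00 00 00 40 00")
res = decode_res("51 40 9a 51 d2 29 29 00 00 00 00")

print("read starts at LBA", acb["start_lba"])
print("read length in bytes", acb["sectors"] * SECTOR_SIZE)
print("drive reports failure at LBA", res["failed_lba"])
print("sectors into the request", res["failed_lba"] - acb["start_lba"])
print("UNC bit set", bool(res["error"] & 0x40))
```

If those assumptions hold, the last attempt is a 64-sector (32768-byte) read, which matches the length in the g_vfs_done line, and the drive reports an uncorrectable sector 11 sectors into it. The earlier retries fail at nearby sectors (low bytes 96, 98, 99), which would be consistent with a small damaged area on the platter rather than a cabling problem.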

Jailer

What is the output of
```
smartctl -a /dev/ada0
```

jimp

Given the errors you're seeing, odds are high that there is actually a problem with the drive.

In all the years I've been dealing with SMART, two things have been evident:

1. SMART is prone to false negatives: just because SMART says a drive is OK doesn't mean it is, especially when it comes to physical defects of various kinds or serious controller problems.

2. If SMART says a drive has a problem, it has a problem.

So you can trust that if SMART finds a problem, it's definitely a problem, but if SMART says it's OK, you have more work to do.

The same goes for software RAM tests like memtest86.
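
As one example of that extra work on the SMART side: the overall health verdict can still read as passed while individual attribute counters are already non-zero. Below is a minimal Python sketch that scans the attribute table printed by `smartctl -A` for three sector-related counters. The sample text is invented for illustration and is not output from the drive in this thread; the helper name and the choice of attributes to watch are assumptions as well.

```python
# Attributes whose raw value is usually expected to stay at 0 on a healthy disk.
WATCHED = {
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
}


def suspicious_attributes(smartctl_output):
    """Return {attribute_name: raw_value} for watched attributes above zero."""
    found = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns and start with a numeric ID.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            raw = fields[9]
            if raw.isdigit() and int(raw) > 0:
                found[fields[1]] = int(raw)
    return found


# Hypothetical sample, NOT taken from the drive discussed here.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   070   070   000    Old_age   Always       -       26500
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       8
"""

for name, raw in suspicious_attributes(SAMPLE).items():
    print(name, raw)
```

In a case like the made-up sample, none of the normalized values has crossed its threshold, so the drive would still be reported as healthy even though it has sectors it cannot read.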