Ad4: TIMEOUT - WRITE_DMA retrying (1 retry left)
After setting up Squid on a SATA 300MB/s 2.5" hard disk on a D945GSEJT, I keep getting these errors, which eventually lead to a kernel panic. I have tried switching off AHCI and using IDE mode, but the errors continue. One strange thing is that pfSense detects the SATA controller as SATA 150, even though the Intel spec page claims it is 300:
atapci1: <Intel ICH7M SATA150 controller> port 0xf0e0-0xf0e7,0xf0d0-0xf0d3,0xf0c0-0xf0c7,0xf0b0-0xf0b3,0xf0a0-0xf0af mem 0xdff40000-0xdff403ff irq 19 at device 31.2 on pci0
There appears to be a fix here: but it looks like it means patching the kernel. Is that even possible with the files shipped with pfSense?
What version of pfSense are you using? You might get better results using a pfSense with a more up to date version of FreeBSD.
I was using the 2.0.1 release, but I tested with 2.1 dev and it's the same. Apparently the ATA subsystem has been updated for FreeBSD 9, but that's a while off for pfSense, I think :(
You could try disabling DMA (check the wiki), but those sorts of timeouts are usually a sign of a disk or controller problem. A driver problem is possible but less likely, especially if it happens on multiple versions. Check Diag > SMART Status, run a report on the drive, and see if it shows any errors (post the output here and we can look it over).
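For anyone reading such a report, here is a minimal sketch of how the attribute table could be scanned for counters that usually stay at zero. The watch list and the names `WATCHED`, `ROW` and `flag_attributes` are assumptions made for illustration, not an official threshold table; the sample rows are copied from the report posted further down this thread.

```python
import re

# Attributes whose raw value normally stays at zero on a healthy drive and link.
# This watch list is an assumption for illustration only.
WATCHED = {
    "Reallocated_Sector_Ct": "media",
    "Reallocated_Event_Count": "media",
    "Current_Pending_Sector": "media",
    "Offline_Uncorrectable": "media",
    "UDMA_CRC_Error_Count": "interface (cable, connector, controller)",
}

# One attribute row as printed by `smartctl -A`:
# ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+(0x[0-9a-fA-F]{4})\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"(\S+)\s+(\S+)\s+(\S+)\s+(\d+)"
)


def flag_attributes(report):
    """Return (name, raw, area) for watched attributes with a nonzero raw value."""
    hits = []
    for line in report.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # banner, header, or blank line
        name, raw = m.group(2), int(m.group(10))
        if name in WATCHED and raw > 0:
            hits.append((name, raw, WATCHED[name]))
    return hits


sample = """\
  5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 0
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 9
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always - 7
"""

for name, raw, area in flag_attributes(sample):
    print(f"{name}: raw={raw} -> check {area}")
```

A nonzero UDMA_CRC_Error_Count counts CRC failures on the link between drive and controller, so it is usually worth checking the cable and connectors before blaming the platters.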
Unfortunately I don't have the disk in my system anymore. I did check the SMART logs and ran a full SMART test; none of the logged errors matched the times of the write DMA errors. The disk was working fine in a laptop, and the motherboard was working correctly under Windows 2008 with another disk. I may give it another go sometime. I was using it as a Squid disk; I will update if I do.
I'm using this disk again, and the errors are popping up again on a fresh install. I have disabled DMA and am getting this at boot; I will see if it continues:
Jul 1 00:25:26 kernel: ad4: TIMEOUT - READ_MUL48 retrying (1 retry left) LBA=430935279
Jul 1 00:25:26 kernel: ad4: TIMEOUT - READ_MUL48 retrying (1 retry left) LBA=430940303

smartctl 5.42 2011-10-20 r3458 [FreeBSD 8.3-RELEASE-p3 i386] (local build)
Copyright (C) 2002-11 by Bruce Allen
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
1 Extended offline Completed without error 00% 799 -
2 Extended offline Completed without error 00% 559 -
3 Extended offline Aborted by host 70% 557 -
4 Short offline Completed without error 00% 552 -
5 Short offline Completed without error 00% 386 -
smartctl 5.42 2011-10-20 r3458 [FreeBSD 8.3-RELEASE-p3 i386] (local build)
Copyright (C) 2002-11 by Bruce Allen
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
1 Raw_Read_Error_Rate 0x000b 100 100 062 Pre-fail Always - 0
2 Throughput_Performance 0x0005 100 100 040 Pre-fail Offline - 0
3 Spin_Up_Time 0x0007 147 147 033 Pre-fail Always - 2
4 Start_Stop_Count 0x0012 100 100 000 Old_age Always - 849
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 0
7 Seek_Error_Rate 0x000b 100 100 067 Pre-fail Always - 0
8 Seek_Time_Performance 0x0005 100 100 040 Pre-fail Offline - 0
9 Power_On_Hours 0x0012 099 099 000 Old_age Always - 801
10 Spin_Retry_Count 0x0013 100 100 060 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 458
191 G-Sense_Error_Rate 0x000a 100 100 000 Old_age Always - 0
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 37
193 Load_Cycle_Count 0x0012 095 095 000 Old_age Always - 54937
194 Temperature_Celsius 0x0002 148 148 000 Old_age Always - 37 (Min/Max 11/45)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 9
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always - 7
223 Load_Retry_Count 0x000a 100 100 000 Old_age Always - 0

smartctl 5.42 2011-10-20 r3458 [FreeBSD 8.3-RELEASE-p3 i386] (local build)
Copyright (C) 2002-11 by Bruce Allen
=== START OF READ SMART DATA SECTION ===
SMART Error Log Version: 1
ATA Error Count: 171 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 171 occurred at disk power-on lifetime: 748 hours (31 days + 4 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 80 7f 24 64 ea Error: ICRC, ABRT at LBA = 0x0a64247f = 174335103

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ca 00 00 ff 23 64 ea 00 7d+22:31:00.000 WRITE DMA
ca 00 00 ff 22 64 ea 00 7d+22:31:00.000 WRITE DMA
ca 00 00 ff 21 64 ea 00 7d+22:31:00.000 WRITE DMA
ca 00 00 ff 20 64 ea 00 7d+22:31:00.000 WRITE DMA
ca 00 00 ff 1f 64 ea 00 7d+22:31:00.000 WRITE DMA

Error 170 occurred at disk power-on lifetime: 744 hours (31 days + 0 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 10 0f e6 0c ea Error: ICRC, ABRT 16 sectors at LBA = 0x0a0ce60f = 168617487

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 20 ff e5 0c ea 00 7d+18:14:48.500 READ DMA
ca 00 0c ff 10 63 ea 00 7d+18:14:48.400 WRITE DMA
ca 00 20 bf d6 0c ea 00 7d+18:14:48.400 WRITE DMA
ca 00 20 bf d6 0c ea 00 7d+18:14:48.400 WRITE DMA
ca 00 0c df f8 67 ea 00 7d+18:14:48.400 WRITE DMA

Error 169 occurred at disk power-on lifetime: 610 hours (25 days + 10 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 00 5a e8 0c ea Error: ICRC, ABRT at LBA = 0x0a0ce85a = 168618074

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ca 00 04 57 e8 0c ea 00 2d+04:30:10.300 WRITE DMA
ca 00 20 3f 7f 12 ea 00 2d+04:30:10.300 WRITE DMA
ca 00 04 57 e8 0c ea 00 2d+04:30:10.300 WRITE DMA
ca 00 20 3f 7f 12 ea 00 2d+04:30:10.300 WRITE DMA
ca 00 04 57 e8 0c ea 00 2d+04:30:10.300 WRITE DMA

Error 168 occurred at disk power-on lifetime: 533 hours (22 days + 5 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 10 4f d9 0c ea Error: ICRC, ABRT at LBA = 0x0a0cd94f = 168614223

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ca 00 20 3f d9 0c ea 00 2d+04:59:56.000 WRITE DMA
ca 00 08 df 92 77 eb 00 2d+04:59:56.000 WRITE DMA
ca 00 20 1f d9 0c ea 00 2d+04:59:56.000 WRITE DMA
ca 00 08 df 92 77 eb 00 2d+04:59:56.000 WRITE DMA
ca 00 20 ff d8 0c ea 00 2d+04:59:56.000 WRITE DMA

Error 167 occurred at disk power-on lifetime: 532 hours (22 days + 4 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 00 3e 9c 66 ea Error: ICRC, ABRT at LBA = 0x0a669c3e = 174496830

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ca 00 04 3b 9c 66 ea 00 2d+04:05:37.400 WRITE DMA
ca 00 04 37 9c 66 ea 00 2d+04:05:37.400 WRITE DMA
ca 00 04 33 9c 66 ea 00 2d+04:05:37.400 WRITE DMA
ca 00 14 bf a0 6b ea 00 2d+04:05:37.400 WRITE DMA
ca 00 04 2f 9c 66 ea 00 2d+04:05:37.400 WRITE DMA
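
The register rows in the error log above can be decoded by hand, which helps when checking what smartctl reports. The following is a rough sketch, assuming 28-bit LBA addressing and the usual ATA bit positions for the error and status registers; the names `ER_BITS`, `ST_BITS`, `lba28` and `decode_error_row` are made up for this example and are not part of smartmontools.

```python
# Error register (ER) bits relevant to DMA commands (assumed ATA bit layout).
ER_BITS = {7: "ICRC", 6: "UNC", 4: "IDNF", 2: "ABRT"}
# Status register (ST) bits (assumed ATA bit layout; DSC is obsolete in newer specs).
ST_BITS = {7: "BSY", 6: "DRDY", 5: "DF", 4: "DSC", 3: "DRQ", 0: "ERR"}


def decode_bits(value, names):
    """List the names of the bits set in value, highest bit first."""
    return [name for bit, name in sorted(names.items(), reverse=True)
            if value & (1 << bit)]


def lba28(sn, cl, ch, dh):
    """Build a 28-bit LBA from the low nibble of DH, then CH, CL and SN."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


def decode_error_row(row):
    """Decode one 'ER ST SC SN CL CH DH' row of hex bytes from the error log."""
    er, st, sc, sn, cl, ch, dh = (int(x, 16) for x in row.split())
    return {
        "error": decode_bits(er, ER_BITS),
        "status": decode_bits(st, ST_BITS),
        "lba": lba28(sn, cl, ch, dh),
    }


# Register row logged for error 171 above.
info = decode_error_row("84 51 80 7f 24 64 ea")
print(info["error"], info["status"], hex(info["lba"]), info["lba"])
```

For that row the sketch yields ICRC and ABRT with LBA 0x0a64247f (174335103), matching the text smartctl printed. ICRC denotes an interface CRC error, that is, a transfer corrupted between the drive and the controller rather than a sector that could not be read from the media.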