Netgate Discussion Forum

    Ad4: TIMEOUT - WRITE_DMA retrying (1 retry left)

    Hardware
    7 Posts 3 Posters 7.1k Views
      Roots0

      After setting up Squid on a 2.5" SATA 300 MB/s hard disk attached to a D945GSEJT, I am getting these errors, which eventually lead to a kernel panic. I have tried switching off AHCI and using IDE mode, but the errors continue. One strange thing is that pfSense detects the SATA controller as SATA 150, even though the Intel spec page claims it's 300:

      atapci1: <Intel ICH7M SATA150 controller> port 0xf0e0-0xf0e7,0xf0d0-0xf0d3,0xf0c0-0xf0c7,0xf0b0-0xf0b3,0xf0a0-0xf0af mem 0xdff40000-0xdff403ff irq 19 at device 31.2 on pci0

      Mobile Computer & Network Support Stockport, UK
      www.timotten.co.uk

        Roots0

        There appears to be a fix here: http://linux-bsd-sharing.blogspot.co.uk/2009/03/howto-fix-sata-dma-timeout-issues-on.html but it looks like this means patching the kernel. Is this even possible with the files shipped with pfSense?

          wallabybob

          What version of pfSense are you using? You might get better results using a pfSense with a more up to date version of FreeBSD.

            Roots0

            I was using the 2.0.1 release, but I tested with 2.1 dev and it's the same. Apparently the ATA subsystem has been updated for FreeBSD 9, but that's a while off for pfSense, I think :(

              jimp Rebel Alliance Developer Netgate

              You could try disabling DMA (check the wiki), but those sorts of timeouts are usually a sign of a disk or controller problem. A driver problem is possible but less likely, especially if it happens on multiple versions. Check Diag > SMART Status, run a report on the drive, and see if it shows any errors (post the output here and we can look it over).

                Roots0

                Unfortunately I don't have the disk in my system anymore. I did check the SMART logs and ran a full SMART test; none of the logged errors matched the times of the WRITE_DMA errors. The disk was working fine in a laptop, and the motherboard was working correctly under Windows 2008 with another disk. I may give it another go sometime, since I was using it as a Squid disk. I will update if I do.

                  Roots0

                  I am using this disk again, and the errors are popping up again on a fresh install. I have disabled DMA and am now getting this at boot; I will see if it continues:
                  Jul 1 00:25:26 kernel: ad4: TIMEOUT - READ_MUL48 retrying (1 retry left) LBA=430935279
                  Jul 1 00:25:26 kernel: ad4: TIMEOUT - READ_MUL48 retrying (1 retry left) LBA=430940303
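
For anyone collecting these messages from a longer system log, here is a minimal Python sketch that pulls the device, command, and LBA out of lines shaped like the two above. The regular expression and the `parse_timeouts` helper are illustrative assumptions, not part of pfSense or FreeBSD. As far as I know, READ_MUL48 is a PIO multi-sector read, which would be consistent with DMA having been disabled.

```python
import re

# Matches ata(4) timeout messages in the form shown in the log lines above.
# The pattern is an assumption based only on those lines.
TIMEOUT_RE = re.compile(
    r"(?P<dev>ad\d+): TIMEOUT - (?P<cmd>\w+) retrying "
    r"\((?P<retries>\d+) retr(?:y|ies) left\)(?: LBA=(?P<lba>\d+))?"
)


def parse_timeouts(lines):
    """Return a list of (device, command, retries_left, lba) tuples.

    lba is None when the message carries no LBA field.
    """
    events = []
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match is None:
            continue  # not a timeout message
        lba = int(match.group("lba")) if match.group("lba") else None
        events.append(
            (match.group("dev"), match.group("cmd"), int(match.group("retries")), lba)
        )
    return events


log = [
    "Jul 1 00:25:26 kernel: ad4: TIMEOUT - READ_MUL48 retrying (1 retry left) LBA=430935279",
    "Jul 1 00:25:26 kernel: ad4: TIMEOUT - READ_MUL48 retrying (1 retry left) LBA=430940303",
]
events = parse_timeouts(log)
for dev, cmd, retries, lba in events:
    print(dev, cmd, retries, lba)
```

Grouping the results by command or by LBA range can show whether the timeouts cluster in one region of the disk or are spread across it.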

                  smartctl 5.42 2011-10-20 r3458 [FreeBSD 8.3-RELEASE-p3 i386] (local build)
                  Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

                  === START OF READ SMART DATA SECTION ===
                  SMART Self-test log structure revision number 1
                  Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
                  # 1  Extended offline    Completed without error       00%  799              -
                  # 2  Extended offline    Completed without error       00%  559              -
                  # 3  Extended offline    Aborted by host               70%  557              -
                  # 4  Short offline       Completed without error       00%  552              -
                  # 5  Short offline       Completed without error       00%  386              -

                  smartctl 5.42 2011-10-20 r3458 [FreeBSD 8.3-RELEASE-p3 i386] (local build)
                  Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

                  === START OF READ SMART DATA SECTION ===
                  SMART Attributes Data Structure revision number: 16
                  Vendor Specific SMART Attributes with Thresholds:
                  ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
                    1 Raw_Read_Error_Rate     0x000b   100   100   062    Pre-fail  Always       -       0
                    2 Throughput_Performance  0x0005   100   100   040    Pre-fail  Offline      -       0
                    3 Spin_Up_Time            0x0007   147   147   033    Pre-fail  Always       -       2
                    4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       849
                    5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
                    7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
                    8 Seek_Time_Performance   0x0005   100   100   040    Pre-fail  Offline      -       0
                    9 Power_On_Hours          0x0012   099   099   000    Old_age   Always       -       801
                   10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
                   12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       458
                  191 G-Sense_Error_Rate      0x000a   100   100   000    Old_age   Always       -       0
                  192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       37
                  193 Load_Cycle_Count        0x0012   095   095   000    Old_age   Always       -       54937
                  194 Temperature_Celsius     0x0002   148   148   000    Old_age   Always       -       37 (Min/Max 11/45)
                  196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       9
                  197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
                  198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
                  199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       7
                  223 Load_Retry_Count        0x000a   100   100   000    Old_age   Always       -       0
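
A table like this is easier to scan with a small script. The following is a rough Python sketch that reads attribute rows in the column layout shown above and reports the health-related counters with a non-zero raw value. The `parse_attributes` helper and the `WATCH` list are my own assumptions for illustration, not something smartctl provides. UDMA_CRC_Error_Count is generally understood to count CRC errors on the link between drive and controller, so a non-zero value there is often treated as a hint to check the cable and connectors.

```python
def parse_attributes(text):
    """Map attribute name -> raw value (first number in the RAW_VALUE column)."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a numeric ID and have at least ten columns:
        # ID, NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE
        if len(fields) >= 10 and fields[0].isdigit():
            attrs[fields[1]] = int(fields[9])
    return attrs


# Counters chosen for this example only; adjust to taste.
WATCH = (
    "Reallocated_Sector_Ct",
    "Reallocated_Event_Count",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
)

# A few rows copied from the report above.
sample = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0002   148   148   000    Old_age   Always       -       37 (Min/Max 11/45)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       9
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       7
"""

attrs = parse_attributes(sample)
nonzero = {name: raw for name, raw in attrs.items() if name in WATCH and raw > 0}
print(nonzero)
```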

                  smartctl 5.42 2011-10-20 r3458 [FreeBSD 8.3-RELEASE-p3 i386] (local build)
                  Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

                  === START OF READ SMART DATA SECTION ===
                  SMART Error Log Version: 1
                  ATA Error Count: 171 (device log contains only the most recent five errors)
                  CR = Command Register [HEX]
                  FR = Features Register [HEX]
                  SC = Sector Count Register [HEX]
                  SN = Sector Number Register [HEX]
                  CL = Cylinder Low Register [HEX]
                  CH = Cylinder High Register [HEX]
                  DH = Device/Head Register [HEX]
                  DC = Device Command Register [HEX]
                  ER = Error register [HEX]
                  ST = Status register [HEX]
                  Powered_Up_Time is measured from power on, and printed as
                  DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
                  SS=sec, and sss=millisec. It "wraps" after 49.710 days.

                  Error 171 occurred at disk power-on lifetime: 748 hours (31 days + 4 hours)
                    When the command that caused the error occurred, the device was active or idle.

                  After command completion occurred, registers were:
                    ER ST SC SN CL CH DH
                    -- -- -- -- -- -- --
                    84 51 80 7f 24 64 ea  Error: ICRC, ABRT at LBA = 0x0a64247f = 174335103

                  Commands leading to the command that caused the error were:
                    CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
                    -- -- -- -- -- -- -- --  ----------------  --------------------
                    ca 00 00 ff 23 64 ea 00  7d+22:31:00.000  WRITE DMA
                    ca 00 00 ff 22 64 ea 00  7d+22:31:00.000  WRITE DMA
                    ca 00 00 ff 21 64 ea 00  7d+22:31:00.000  WRITE DMA
                    ca 00 00 ff 20 64 ea 00  7d+22:31:00.000  WRITE DMA
                    ca 00 00 ff 1f 64 ea 00  7d+22:31:00.000  WRITE DMA

                  Error 170 occurred at disk power-on lifetime: 744 hours (31 days + 0 hours)
                    When the command that caused the error occurred, the device was active or idle.

                  After command completion occurred, registers were:
                    ER ST SC SN CL CH DH
                    -- -- -- -- -- -- --
                    84 51 10 0f e6 0c ea  Error: ICRC, ABRT 16 sectors at LBA = 0x0a0ce60f = 168617487

                  Commands leading to the command that caused the error were:
                    CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
                    -- -- -- -- -- -- -- --  ----------------  --------------------
                    c8 00 20 ff e5 0c ea 00  7d+18:14:48.500  READ DMA
                    ca 00 0c ff 10 63 ea 00  7d+18:14:48.400  WRITE DMA
                    ca 00 20 bf d6 0c ea 00  7d+18:14:48.400  WRITE DMA
                    ca 00 20 bf d6 0c ea 00  7d+18:14:48.400  WRITE DMA
                    ca 00 0c df f8 67 ea 00  7d+18:14:48.400  WRITE DMA

                  Error 169 occurred at disk power-on lifetime: 610 hours (25 days + 10 hours)
                    When the command that caused the error occurred, the device was active or idle.

                  After command completion occurred, registers were:
                    ER ST SC SN CL CH DH
                    -- -- -- -- -- -- --
                    84 51 00 5a e8 0c ea  Error: ICRC, ABRT at LBA = 0x0a0ce85a = 168618074

                  Commands leading to the command that caused the error were:
                    CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
                    -- -- -- -- -- -- -- --  ----------------  --------------------
                    ca 00 04 57 e8 0c ea 00  2d+04:30:10.300  WRITE DMA
                    ca 00 20 3f 7f 12 ea 00  2d+04:30:10.300  WRITE DMA
                    ca 00 04 57 e8 0c ea 00  2d+04:30:10.300  WRITE DMA
                    ca 00 20 3f 7f 12 ea 00  2d+04:30:10.300  WRITE DMA
                    ca 00 04 57 e8 0c ea 00  2d+04:30:10.300  WRITE DMA

                  Error 168 occurred at disk power-on lifetime: 533 hours (22 days + 5 hours)
                    When the command that caused the error occurred, the device was active or idle.

                  After command completion occurred, registers were:
                    ER ST SC SN CL CH DH
                    -- -- -- -- -- -- --
                    84 51 10 4f d9 0c ea  Error: ICRC, ABRT at LBA = 0x0a0cd94f = 168614223

                  Commands leading to the command that caused the error were:
                    CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
                    -- -- -- -- -- -- -- --  ----------------  --------------------
                    ca 00 20 3f d9 0c ea 00  2d+04:59:56.000  WRITE DMA
                    ca 00 08 df 92 77 eb 00  2d+04:59:56.000  WRITE DMA
                    ca 00 20 1f d9 0c ea 00  2d+04:59:56.000  WRITE DMA
                    ca 00 08 df 92 77 eb 00  2d+04:59:56.000  WRITE DMA
                    ca 00 20 ff d8 0c ea 00  2d+04:59:56.000  WRITE DMA

                  Error 167 occurred at disk power-on lifetime: 532 hours (22 days + 4 hours)
                    When the command that caused the error occurred, the device was active or idle.

                  After command completion occurred, registers were:
                    ER ST SC SN CL CH DH
                    -- -- -- -- -- -- --
                    84 51 00 3e 9c 66 ea  Error: ICRC, ABRT at LBA = 0x0a669c3e = 174496830

                  Commands leading to the command that caused the error were:
                    CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
                    -- -- -- -- -- -- -- --  ----------------  --------------------
                    ca 00 04 3b 9c 66 ea 00  2d+04:05:37.400  WRITE DMA
                    ca 00 04 37 9c 66 ea 00  2d+04:05:37.400  WRITE DMA
                    ca 00 04 33 9c 66 ea 00  2d+04:05:37.400  WRITE DMA
                    ca 00 14 bf a0 6b ea 00  2d+04:05:37.400  WRITE DMA
                    ca 00 04 2f 9c 66 ea 00  2d+04:05:37.400  WRITE DMA
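
The register dumps in this error log can be decoded by hand. Below is a small Python sketch that shows how the "Error: ICRC, ABRT at LBA = ..." text follows from the seven register bytes. It assumes 28-bit LBA addressing, which fits the WRITE DMA (0xca) and READ DMA (0xc8) commands listed, where the low four bits of DH hold LBA bits 27:24. The `decode_registers` name and the `ERROR_BITS` table are mine, and the table covers only a subset of the ATA error register flags.

```python
# Subset of ATA error register flags, highest bit first.
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x10: "IDNF", 0x04: "ABRT"}


def decode_registers(line):
    """Decode an 'ER ST SC SN CL CH DH' hex dump into (flags, lba).

    Assumes 28-bit LBA: DH[3:0] = bits 27:24, CH = 23:16, CL = 15:8, SN = 7:0.
    """
    er, st, sc, sn, cl, ch, dh = (int(byte, 16) for byte in line.split()[:7])
    flags = [name for bit, name in ERROR_BITS.items() if er & bit]
    lba = ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn
    return flags, lba


# Register lines copied from errors 171 and 170 above.
flags_171, lba_171 = decode_registers("84 51 80 7f 24 64 ea")
flags_170, lba_170 = decode_registers("84 51 10 0f e6 0c ea")
print(flags_171, hex(lba_171), lba_171)
print(flags_170, hex(lba_170), lba_170)
```

ICRC is the interface CRC flag, meaning the data was corrupted in transit between controller and drive rather than on the platters. All five logged errors carry it.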
