Netgate Discussion Forum
    PfSense v2 through 2.3 - Hard drive Drops

      webdawg

      I ran pfSense 1.x on Dell E5520 laptops for years without any issues.  I have tested all hardware.

      Ever since the upgrade to pfSense 2+ (FreeBSD 8.1?), about once a month my hard drive just ejects itself/drops out of the system.  I can never get a good log of the messages because once the drive is gone, pfSense has nowhere to write them.

      I have finally gotten a picture of the log at the moment the drive is removed.  This is the main drive in the system, and this happens with the BIOS set to either AHCI or legacy ATA.  It is a Western Digital Blue drive and it tests fine.  I know how hard drives work, I get that timeouts are timeouts, and I also know about things like RAID, TLER, SCSI, and SATA.

      I get that enterprise and SAS pass errors to the OS while consumer SATA does not.

      I have tried many drives and many different systems of the exact same model number.

      What I came across today was:  https://wiki.freebsd.org/JeremyChadwick/ATA_issues_and_troubleshooting

      It talks more about a DMA timeout, but I am willing to bet we have the same thing going on here.  The drive finds something, takes its time to fix it, and FreeBSD removes the drive because it does not wait any longer for it.

      Looks like there may be options to fix DMA timeouts, but I do not know about my messages (all from the linked wiki.freebsd.org):
      *PATA only: Set hw.ata.ata_dma=0 in /boot/loader.conf. This will disable use of ATA DMA. NOTE: This workaround greatly decreases I/O performance. You have been warned…
      **How slow are we talking about here?  I have also read some things about turning DMA ON instead with nanobsd.

      *Volker Theile of the FreeNAS project informs me that they have solved most of the DMA problems by increasing a hard-coded arbitrary timeout value of 5 (seconds) in the ATA code to 10 or 15, while simultaneously making the timeout value adjustable via sysctl. Volker submitted patches to sos@ over a year ago, but never received a response.
      **So some patches that would help me but it looks like no one in the FreeBSD community cares?

      *As of 2008/02/27, Scott Long has offered to help track this problem down. Those who are able to reproduce the problem reliably should get in contact with Scott; serial console access will very likely be mandatory.
      **That was over 8 years ago that someone was trying to work on this.

      My next resort is USB and most likely nanobsd because I do not need a lot anymore for these devices.

      Next though, I am going to get these logs typed out.
      bad_pfsense_bsd.jpg

        webdawg

        Typed out the log:

        ada0: - bla - detached
        Device bla went missing before all of the data could be written to it:  expect data loss
        NOP.  ACB: 00 00 00 00 00 00 00 00 00 00 00 00
        CAM status: ATA Status Error
        ATA status: d1 (BSY DRDY SERV ERR), error: 04 (ABRT )
        RES: d1 04 ff ff ff ff ff ff ff ff ff
        Error 5, Retries exhausted
        NOP.  ACB: 00 00 00 00 00 00 00 00 00 00 00 00
        CAM status: ATA Status Error
        ATA status: d1 (BSY DRDY SERV ERR), error: 04 (ABRT )
        RES: d1 04 ff ff ff ff ff ff ff ff ff
        Error 5, Retries exhausted
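
For anyone reading the register dump above, the status byte (d1) and error byte (04) can be decoded bit by bit. Here is a minimal sketch in POSIX sh, assuming the usual ATA status/error register bit layout. The function names are made up for illustration and are not part of any FreeBSD tool.

```shell
#!/bin/sh
# Hypothetical helpers: decode ATA register bytes (given as hex) into flag names.

# decode_bits HEX MASK:NAME... prints the names whose mask bit is set.
decode_bits() {
    val=$((0x$1))
    shift
    out=""
    for pair in "$@"; do
        mask=${pair%%:*}   # decimal bit mask before the colon
        name=${pair#*:}    # flag name after the colon
        if [ $((val & mask)) -ne 0 ]; then
            out="${out:+$out }$name"
        fi
    done
    echo "$out"
}

# Bit names assumed from the ATA status register layout.
decode_ata_status() {
    decode_bits "$1" 128:BSY 64:DRDY 32:DF 16:SERV 8:DRQ 4:CORR 2:IDX 1:ERR
}

# Bit names assumed from the ATA error register layout.
decode_ata_error() {
    decode_bits "$1" 128:ICRC 64:UNC 32:MC 16:IDNF 8:MCR 4:ABRT 2:NM 1:ILI
}

# Usage:
#   decode_ata_status d1
#   decode_ata_error 04
```

Run against the values in the log, this should give back the same flag names the kernel printed in parentheses, which is a quick sanity check that the typed-out bytes match the photo.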
        
          heper

          Or if you know that there are issues with hardware_x FOR YEARS, you switch to hardware_y that is known to not have these issues ?
          In a way i understand the itch in the back of the head that says: "whatever it takes, i'll find a fix for this problem"

          the longer i'm in IT, the more i realize that some things are better/easier/more efficient to work around, instead of fixing them.

          now for the somewhat constructive part: have you tried updating the bios to the latest version ?

            webdawg

            Can't switch hardware, no one will buy it.

            Bios is latest.

            Edit:

            and I am not going to until I get some facts.

              webdawg

              What are you guys talking about?

              It's not the hardware, it is FreeBSD.  I did not post this to rant about FreeBSD though.

              I hate to say it like this but:  How the hell do you guys respond to a question like this with "get different hardware"?

              I mean, are you kidding me?

              You can say, this software was not designed to work on this hardware, but it is.  FreeBSD was designed to run on an array of different systems.  Laptops, magic boxes, enterprise servers, etc.

              This is common hardware, not even top of the line/new.

              It is a SATA hard drive, something you would find inside almost any computer.

              I suppose, a better answer to my question would be something like:

              "You may need to try a solid state drive because it will not time out, and FreeBSD has known issues, proven by something other than the information that OP posted, which is years and years old and comes from FreeBSD 8.0."

              I mean, why do you guys even post anything if you cannot back it with any facts?  How do you know there is not a kernel tunable or something like that?  You guys are making arbitrary statements that, while possibly decent recommendations, plague internet forums with non-answers to millions of forum posts.

              I mean, how many times do I have to read forum posts that go like this:

              OP:  How do I do this/Why is this not working?

              Response:  Why would you ever want to try that, you should do this!

              …

              I mean these are forums, not story books.  I do not need you to contribute any amount with regard to an informal imagination.  There is obviously some situation that you cannot comprehend or refuse to that requires someone to do something.  Either help them out with informative responses or say nothing at all.

                webdawg

                It looks like this is what I might be looking for:

                https://www.freebsd.org/cgi/man.cgi?query=ada&sektion=4

                
                kern.cam.ada.retry_count

                    This variable determines how many times the ada driver will retry a
                    READ or WRITE command.  This does not affect the number of retries
                    used during probe time or for the ada driver dump routine.  This
                    value currently defaults to 4.

                kern.cam.ada.default_timeout

                    This variable determines how long the ada driver will wait before
                    timing out an outstanding command.  The units for this value are
                    seconds, and the default is currently 30 seconds.
                
                
                  webdawg

                  So I was going to go this route:

                  sysctl kern.cam.ada.default_timeout=60
                  sysctl kern.cam.ada.retry_count=20
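
Values set with sysctl on the command line do not survive a reboot, so if they help they would need to be stored somewhere. Below is a rough sketch of a helper that adds or updates a name=value line in a sysctl.conf-style file. `set_tunable` is a hypothetical name, and the target file is an assumption: pfSense normally manages these values through System > Advanced > System Tunables in the GUI and may overwrite hand-edited files.

```shell
#!/bin/sh
# Hypothetical helper: add or update "name=value" in a sysctl.conf-style file.
set_tunable() {
    file=$1
    name=$2
    value=$3
    touch "$file"
    # Keep every line whose key differs from $name, then append the new setting.
    awk -F= -v n="$name" '$1 != n' "$file" > "${file}.tmp"
    echo "${name}=${value}" >> "${file}.tmp"
    mv "${file}.tmp" "$file"
}

# Usage (file path is an assumption; try it on a scratch copy first):
#   set_tunable ./sysctl.conf.test kern.cam.ada.default_timeout 60
#   set_tunable ./sysctl.conf.test kern.cam.ada.retry_count 20
#
# To read the values currently in effect on the running system:
#   sysctl kern.cam.ada.default_timeout kern.cam.ada.retry_count
```

Calling the helper twice for the same name replaces the old line rather than stacking duplicates, which makes it easier to try a few different timeout values over the weeks it takes for the error to show up.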
                  

                  I ended up finding this:  https://forums.freenas.org/index.php?threads/hacking-wd-greens-and-reds-with-wdidle3-exe.18171/

                  I guess I never understood how these consumer wd drives auto park.  I really wonder how the other brands of hard drives handle this.

                  I mean, I guess I want to 'save' power but the WD blue that I am working with (2.5 inch laptop drive) was set to park every 4 seconds.

                  Ended up getting the recommended wdidle3 from:  http://support.wdc.com/downloads.aspx?p=113

                  I disabled the auto park with:

                  wdidle3.exe /D

                  It takes 3 weeks to a month for the 'random' error to happen so I will report.  My next report should be success or fail and then I will do the sys tunables with sysctl and then report again.
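
Rather than waiting a month, one way to check sooner whether parking really stopped is to watch SMART attribute 193 (Load_Cycle_Count) over a day or two. This is a small sketch, assuming smartmontools is installed and that the drive shows up as /dev/ada0. `load_cycle_count` is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: read `smartctl -A` output on stdin and print the raw
# value (10th column) of attribute 193, Load_Cycle_Count.
load_cycle_count() {
    awk '$1 == 193 { print $10 }'
}

# Usage (device name is an assumption):
#   smartctl -A /dev/ada0 | load_cycle_count
# Record the number, check again a day later, and compare.
```

If the count keeps climbing by thousands per day, the idle timer is probably still active. If it barely moves, the wdidle3 change most likely took effect.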

                    edwardwong

                    This is not a BSD-specific issue; forums about storage/NAS have more discussion of this (since it will kick the disk out of a RAID group).

                    Disabling parking is the only way (which you already did), but an enterprise-level HDD (or an SSD) should really be used for long-term use.

                      webdawg

                      I just wonder if those system tunables will help.  Right now I have disabled parking and we will see what happens next.  It looks like Linux has some different default settings.  If these two things do not work, I am going to throw in an SSD.

                      Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.