I believe PFsense is killing my drives..



  • Of course I have no way to empirically prove this. But I am now on my 4th drive. None of them have been brand new, although I did use one that was only 2 years old.

    I get a lot of errors, like those mentioned here:
    http://forum.pfsense.org/index.php/topic,43651.0.html

    The first couple of drives, being older, I wondered about but chalked it up to maybe it being their time. I have seen many HDDs fail, new and old. But when the 3rd one went, the newer one, and then the latest, I seriously began to wonder.

    The odd part of it all is that once the drive fails (that is, pfSense no longer stays running because of the drive), I can put it in any other machine, and the machine sees it but will not format it. I used dozens of tools, from wipes/killdisks to full-blown drive suites, run from Linux live CDs, different Windows methods, UBCD, and more. Even SpinRite and MHDD refuse to do anything to the drives. All of them, not just one.

    My experience with drives has been that when a drive develops bad spots, software can detect this. Some software is supposed to remap it, some just verify where the bad sectors are, etc. But in this case, all four drives refuse to allow any tool to even scan the surface.

    Maybe I am off the mark here, but in all my years working on computers and having to RMA many drives, or had many drives die, this is more than just coincidence.

    Anyway, I have seen a few other people asking the same type of question: "does pfSense harm my drive?" Hard to say, but to me the evidence is pointing to yes. That being said, perhaps it is the machine. The machine was used for about 4 years in a business, with the same drive the whole time. While it could be the machine, I have never seen this type of thing, ever.

    Now, I bought a CF card, 4gb, and an adapter for IDE. I "think" I am reading correctly that by using an embedded version (vga v123) it basically holds the install, loads it into a ramdisk, and minimizes writes to the CF, due to a very low read/write lifespan on CF. But I also "think" I read correctly that some packages can work with embedded versions. The spreadsheet located here
    https://docs.google.com/spreadsheet/ccc?key=0AojFUXcbH0ROdFdHTTlKNWNSMG5rRjQwZE1fYVgySGc&hl=en#gid=0
    seems to indicate that embedded v123 will allow the use of squid and squid-guard. However, a post here
    http://forum.pfsense.org/index.php?topic=16463.0;prev_next=next
    seems to indicate that it was disabled perhaps.

    So what is the skinny on squid and squid guard in the embedded versions? I realize that there should be no logging, which is fine, I just want to use it to filter sites for my kids using shallas blacklist.

    Finally, could the topic of this thread
    http://forum.pfsense.org/index.php/topic,26626.0.html
    have anything to do with things, given that I have only used 3.5" HDDs, not laptop drives?

    Sorry for such a long post, btw.



  • iirc people were talking about SSD and the "intellipark" Load Cycle Count issue with certain drives (search these forums for "ataidle").

    In your case, it's impossible to tell without knowing your exact pfsense configuration and without running some low-level diagnostics on your failed HDDs, but it could be that Squid is indeed shortening your HDD's lifetime. However, if it's all happening on the same computer, then maybe it's a power-supply issue.
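One low-level check that is easy to run is reading the SMART attribute table with `smartctl -A` and looking at the raw Load_Cycle_Count and Power_On_Hours values. The following is only a minimal sketch of pulling those numbers out of the usual ten-column attribute table; the sample text and its values are made up for illustration and are not from any drive in this thread.

```python
# Hypothetical sample of `smartctl -A` output; the values are invented.
SAMPLE_OUTPUT = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   090   090   000    Old_age   Always       -       1000
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       12
"""


def smart_raw_values(smartctl_output):
    """Return {attribute_name: raw_value} for rows with a purely numeric raw value."""
    values = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least ten columns.
        if len(fields) >= 10 and fields[0].isdigit():
            raw = fields[9]
            if raw.isdigit():
                values[fields[1]] = int(raw)
    return values


if __name__ == "__main__":
    for name, raw in smart_raw_values(SAMPLE_OUTPUT).items():
        print(name, raw)
```

A Load_Cycle_Count that climbs by thousands per day would point at the head-parking issue; a value that barely moves would suggest looking elsewhere.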



  • @dhatz:

    iirc people were talking about SSD and the "intellipark" Load Cycle Count issue with certain drives (search these forums for "ataidle").

    In your case, it's impossible to tell without knowing your exact pfsense configuration and without running some low-level diagnostics on your failed HDDs, but it could be that Squid is indeed shortening your HDD's lifetime. However, if it's all happening on the same computer, then maybe it's a power-supply issue.

    I haven't looked at ataidle yet.

    As for the PSU, well, I simply have neither the tools nor the knowledge to figure that one out. It is a Dell machine, so swapping the PSU is out, as they have proprietary pinouts.

    Which leaves me with but one option: install it on a different machine, with a different HDD, and wait the month or two it takes for the problem to develop.

    As for configuration and how that affects things, it simply should make no difference. I have a normal install of pfSense v1.2.3, with packages for squid and squidGuard with Shalla's blacklist. There should be nothing of note to make it any different, as the logging configuration for squid is pretty much all at default; I don't care about the logs, only that it filters addresses for certain computers.

    I exported the config and installed pfsense to a vmware box, so that I have all of the settings available to look at for future reference.

    Thanks for replying.



  • None of the things people claimed killed drives would apply to a normal 3.5" drive (aside from the fact that what people claimed about those mostly isn't credible); it's just not possible for software to kill those drives. Can't say it surprises me you've been through a few. It's not typical, but older drives generally aren't very reliable.

    Most of our servers are 4-7 years old and we lose about 15% of our drives annually across them. In a Windows file server I've had for about 8 years, every drive in the array has been replaced at least once, and most of them 2 or 3 times. I believe VMware ESX and Windows Server must be killing our hard drives, OMG!!!11! That's even with much higher quality server grade SCSI and SAS drives. Old hard drives die, software doesn't kill them.

    I have seen some hardware that will eat drives like crazy (a couple Dell Windows PCs and servers) for whatever reason, couldn't keep a drive alive in them for more than 2-3 months at most. Just trashed that hardware, wasn't worth the effort to figure out why on out of warranty machines, and they were chewing up more in drives and effort than they were worth. That was Windows, not FreeBSD, but it's not related to software regardless. I suspect you may be seeing a similar case. The only alternative is you have really bad luck with drives.
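To put the 15% annual loss figure quoted above in perspective, here is a rough sketch of how likely a single drive is to survive a given number of years. It assumes a constant, independent failure rate each year, which is a simplification (real drives fail more often when very new or very old).

```python
def survival_probability(annual_failure_rate, years):
    """Chance one drive is still alive after `years`, assuming a constant yearly failure rate."""
    return (1.0 - annual_failure_rate) ** years


if __name__ == "__main__":
    for years in (1, 4, 8):
        p = survival_probability(0.15, years)
        print(f"{years} years: {p:.0%} chance the original drive is still running")
```

Under that assumption, fewer than one in three drives would make it through eight years, which fits an array where every drive has been replaced at least once.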


  • Rebel Alliance Global Moderator

    Just to throw in the other end of the story: you say you have lost 3 drives and are now on your 4th.

    I am on the same drive - one that should have really died years ago ;)

    It's an old
    === START OF INFORMATION SECTION ===
    Device Model:     ST36424A
    Serial Number:    7CN05SH2
    Firmware Version: 3.10
    User Capacity:    6,448,619,520 bytes [6.44 GB]

    Yeah it's OLD, still ticking – it's been in my pfsense box for years.  Which is an old P3 800MHz - runs great!  I just did a short SMART test on the drive and it shows lifetime hours of 51621, so it's been on for going on 6 years..

    If pfsense is killing drives, it's sure taking its sweet ass time doing it - at least in my case ;)
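As a quick sanity check on the "going on 6 years" figure, the SMART power-on hours reading can be converted to years of continuous running. A minimal sketch, assuming the drive was powered 24/7 and ignoring leap days:

```python
HOURS_PER_YEAR = 24 * 365  # 8760, leap days ignored


def power_on_years(hours):
    """Convert a SMART power-on-hours reading to years of continuous running."""
    return hours / HOURS_PER_YEAR


if __name__ == "__main__":
    print(f"{power_on_years(51621):.1f} years")
```

For 51621 hours this works out to a little under six years, consistent with the estimate in the post.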



  • @johnpoz:

    I am on same drive - that should of really died years ago ;)

    Its a old
    === START OF INFORMATION SECTION ===
    Device Model:     ST36424A
    Serial Number:    7CN05SH2
    Firmware Version: 3.10
    User Capacity:    6,448,619,520 bytes [6.44 GB]

    Yeah its OLD, still ticking – its been in my pfsense box for years.  Which is an old p3 800mhz - runs great!  I juts did a short smart test on the drive and shows lifetime hours of 51621, so its been on for going on 6 years..

    If pfsense is killing drives, its sure taking its sweet ass time doing it - atleast in my case ;)

    Glad you mentioned that, I actually have you beat by a long shot! :) Didn't think about it when I originally replied. My primary firewall in my home office has been running on this drive (actually drive swapped to various hardware platforms over the years as I've upgraded and experimented with various hardware) since the inception of the project in 2004.

    === START OF INFORMATION SECTION ===
    Model Family:    IBM Travelstar 4GN
    Device Model:    IBM-DKLA-24320
    User Capacity:    4,327,464,960 bytes [4.32 GB]

    It came in a new laptop purchased in 1999. Going on 13 years old. Used in the laptop for about 4 years, sat on a shelf for a year, been in my firewalls for the last 8 years. Never thrown a single read or write error, and SMART checks out healthy, though it has "old age" warnings across a number of the values. The hours value isn't right, not sure why, but only shows 3.3 years of run time when it's been on 24/7/365 for much longer than that.

    Another older one I'm running in my secondary firewall.

    === START OF INFORMATION SECTION ===
    Model Family:    Seagate U Series 5
    Device Model:    ST320413A
    User Capacity:    20,020,396,032 bytes [20.0 GB]

    That was an old, but lightly used drive when I put it in that box a couple years ago. Its run time is around 3.5 years which is about right I'd guess, near 2 years of that in the firewall, not very long.


  • Rebel Alliance Global Moderator

    yeah I am not sure about the run hours on mine either..  I think it's been in that box a lot longer, but not sure exactly when I made that box my router??  And it too sat on the shelf for a while, I do believe – I know it used to run ipcop back in the day, but once I went pfsense - never looked back ;)

    But clearly pfsense is not killing it ;)

    edit: Kind of wish the box would die, I would like to use some quiet little low power box with much more oomph, etc.  But I really just can not justify going to a low power box just for power savings.  Since it would prob cost a good $200 to get what I want, and this box only uses 50w or so.  Had a Kill A Watt on it for a few months.  And it just doesn't really use all that much power.  Even switching to say a 5w box would take years to pay back $200 -- so just going to let this thing run til it dies.  Or until it can no longer handle my internet traffic/firewall needs.
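The payback reasoning above can be worked through with a few lines of arithmetic. This is only a sketch: the 50 W, 5 W and $200 figures come from the post, while the electricity price of $0.12 per kWh is an assumption chosen for illustration and will differ by location.

```python
HOURS_PER_YEAR = 24 * 365  # box runs around the clock


def payback_years(old_watts, new_watts, hardware_cost, price_per_kwh):
    """Years of power savings needed to recover the cost of the new hardware."""
    if new_watts >= old_watts:
        raise ValueError("the new box must draw less power to ever pay back")
    saved_kwh_per_year = (old_watts - new_watts) * HOURS_PER_YEAR / 1000
    return hardware_cost / (saved_kwh_per_year * price_per_kwh)


if __name__ == "__main__":
    # price_per_kwh=0.12 is an assumed rate, not a figure from the thread.
    years = payback_years(old_watts=50, new_watts=5, hardware_cost=200.0, price_per_kwh=0.12)
    print(f"payback after roughly {years:.1f} years")
```

At that assumed rate the saving is under $50 a year, so the new box would take about four years to pay for itself.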



  • I do realize that drives die, and that enterprise grade drives have a higher mean hour lifespan.

    I could be unlucky, I suppose. I have had to replace many drives, but my experience in the last 18 years of computing has been that drives usually give a warning. Granted, I don't routinely check SMART nor run software that checks it. In Windows boxes, they start having issues, then eventually you figure out they are dying. Once in a while a drive gets a bad head that clicks once and it is done, dead drive.

    I realize software is not likely to kill a drive. I wasn't really 100% serious that pfSense killed it because it was a drive killer ;) More like, why is it that these drives, pulled from Windows boxes which had no issues, don't seem to last very long in this pfSense box?

    As you say, it could just be the hardware. I am reluctant to even attempt to use a CF drive on that machine. While I don't want to waste any more drives on it, I am just curious enough to really want to know why.

    I suppose at this point my only option is to put a different machine in place, and then pay close attention to it for a while.

    I did not think that ataidle issue would apply to standard 3.5" hdds, but one never knows, which is why I brought it up.

    I would not be surprised, though, if the software could kill a drive. What I mean by that is that if a drive is older or just has a flaw to begin with, it could well be possible that software could read/write very heavily, thus stressing the drive more than in a "typical" setting. I have no idea if pfSense with squid and squidGuard would fit that description or not. It isn't that I am rallying against the idea that software doesn't kill drives; rather, I wonder if the nature of what the software is doing could aggravate conditions already present. Since I am not a Unix geek, I really don't know the answer to that question. In the Windows world, I have investigated applications that do a lot of reads/writes both to disk and to places like the registry, and stopped using them for various reasons. But like I said, in Unix I simply don't know enough to make such calls myself.

    So, granted that I have older drives and even a newer drive can go bad, and granted that the hardware itself might be the culprit, is there any consensus on how pfSense with squid and squidGuard might increase this load disproportionately?

    Thanks for the replies BTW.


  • Rebel Alliance Developer Netgate

    I have lost several HDDs over the years, but nearly all of them were laptop HDDs that failed due to load cycle counts. The same issue applied whether they ran pfSense, FreeBSD, Ubuntu, etc.

    I made some changes recently in 2.1 so ataidle will run on every boot, since my original analysis was incomplete (the value is sticky across power cycles, reboots without a power cycle will keep the APM value) so now it runs every boot to be safe.

    I've lost my share of IDE and SCSI drives over the years as well, mostly due to old age or random chance.

    I still have a bunch of these in service that are at least 10 years old, probably more.

    da0 at ahc1 bus 0 scbus1 target 1 lun 0
    da0: <QUANTUM ATLAS10K2-TY092J DDD6> Fixed Direct Access SCSI-3 device
    da0: 40.000MB/s transfers (20.000MHz, offset 127, 16bit)
    da0: Command Queueing enabled
    da0: 8759MB (17938985 512 byte sectors: 255H 63S/T 1116C)
    

    RAID helps, either in hardware or gmirror with two IDE drives.



  • I cannot verify whether the box alone killed the drives or not. I put another drive in the box, installed XP, let it run for awhile. No issues. I then installed pfsense on the same box/drive, and within a few weeks, drive was getting same errors, and is no longer usable. Like the other drives, it is no longer seen properly by any program/utility. Is it due to pfsense doing more read/writes than XP just sitting there? I turned off logging and such to see, but the drive still died.

    Since starting this thread, I pulled out an old 600mhz celeron machine with 256mb ram and an ata66 7.4gb samsung drive. It has been running without issue since. I have no issues with this machine for generic purposes, however it is a bit slow for much squid filtering/logging. When I find a newer machine that is super quiet like the current one, I will try ver. 2.1 and see what is going on.



  • And I thought the 13.6GB HD in my pfSense box was small. :)

    I swapped it out of a PC I bought back in '98 into a Dell with a 2.66GHz P4 with 2GB DDR RAM running the current version of pfSense with the pfBlocker package installed.

    It passed a SMART check I ran on it when I first installed pfSense and still seems to be doing fine after a few months.



  • I've had a few Dell desktops that "eat" drives from time to time, sometimes after just a few months.

    Modern Dell machines (some time towards the end of the PIII era) have a standard ATX pinout, so the power supply can be electrically replaced.  Now, I say electrically since it may not physically fit well due to plug position or some other oddity that Dell physically designed around their chosen power supply, that also includes cable length.

    (Of course, I also have another white box machine that started to eat drives after a few years, so it's not just a Dell thing, I just have a lot of Dells sitting around.)



  • My BIL gave me the Dell Dimension 2400 I'm using for my pfSense box and 4600 I use for my everyday computer running FreeBSD 9.0. Both were originally running XP but he had taken the HD out of each and had them sitting in his basement.

    I had taken the 13.6GB HD out of a Gateway PC and replaced it with an 80GB Seagate around 2000, and used them both for the Dells. I don't want to jinx myself, but neither has given me any problems over the past 3 months I've been using them.

