Another Netgate with storage failure, 6 in total so far
-
@stephenw10 said in Another Netgate with storage failure, 6 in total so far:
Some time ago there was an alarming thread about SSDs failing that turned out to be almost entirely due to an early controller version in a small/cheap SSD that was popular.
I remember that - I think it was the SandForce SF2000 controller? I had several OCZ drives die because of that controller!
-
I think there was a specific Kingston drive that Newegg had for a very good price. Its wear levelling was basically completely broken. #funtimes!
-
@andrew_cb said in Another Netgate with storage failure, 6 in total so far:
Your eMMC is definitely dead, and that can cause lots of strange behavior.
@andrew_cb @stephenw10 Agree that impending drive failure causes weird symptoms prior to the actual death.
I often (maybe 5 or 6 times over a 2 month period) experienced a complete loss of the ability to interact with the software (browser, SSH, serial console) until reboot (sometimes a hard one, by pulling the power from the 4200), yet internet traffic seemed unaffected. There were no obvious errors or entries in the log files.
The eMMC drive failure in my case (confirmed and RMAed by Netgate) was NOT immediate and catastrophic, but (a) the problems I experienced were intermittent (yet occurring more often) over a period of months and thus not really reproducible, and (b) not documented on the forums other than maybe caused by the WAN link dropping.
Never did it occur to me that the eMMC was failing, and nowhere on the forums did what I experienced ever hint at possible eMMC failure. Had @andrew_cb's post I excerpted from been written two months earlier (or had there been some official warning about eMMC failures and what to look for), then I might have reacted more quickly and with much less consternation and frustration. As it was, the drive finally died an unceremonious death, and only then did I put two and two together.
Alas, for me, it worked out because I was one of the "lucky" ones whose drive failed with about a month of warranty left.
-
How old was your 6100? 3 years 4 months (22nd September 2021)
What troubleshooting did TAC perform? Not much, they just asked for diagnostic files, blamed my local network, and told me DHCP addresses were being issued when they were not. I've opened about three tickets regarding the lack of forwarding. There was a correlation with a pfblocker update, so TAC asked me to increase the PHP memory limit to 2MB. It's happened again since that change, but I wasn't prepared to raise a fourth ticket and be told no fault found.
Did TAC acknowledge that the onboard eMMC storage had failed? No, they never discussed the eMMC.
Did TAC indicate any knowledge of past or recent discussions about onboard storage failure? No, eMMC was never raised.
Did TAC express that onboard storage failure is normal and to be expected? No, eMMC was never raised.
Did TAC ask you what packages you were running or otherwise imply that you were at fault for using the device incorrectly? No, eMMC was never raised.
Did TAC provide any suggestions for installing an SSD or which SSD model to purchase? No, eMMC was never raised.
-
@toroloco I have a few questions for you:
- How old was your 4200?
- What troubleshooting did TAC perform?
- Did TAC acknowledge that the onboard eMMC storage had failed?
- Did TAC indicate any knowledge of past or recent discussions about onboard storage failure?
- Did TAC express that onboard storage failure is normal and to be expected?
- Did TAC ask you what packages you were running or otherwise imply that you were at fault for using the device incorrectly?
- Did TAC provide any suggestions for installing an SSD or which SSD model to purchase?
-
I have a Netgate 6100 showing this:
eMMC Life Time Estimation A [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A]: 0x0b
eMMC Life Time Estimation B [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B]: 0x0b
eMMC Pre EOL information [EXT_CSD_PRE_EOL_INFO]: 0x01
Not sure what to make of these values. The EOL value is OK, but the Life value is really bad?
-
@ltctech said in Another Netgate with storage failure, 6 in total so far:
[EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A]
I believe it's been said before that these are estimates and not definitive. However, I had the same eMMC figures and have been experiencing issues with forwarding stopping, so I've installed an Intel Optane drive, just in case.
https://docs.netgate.com/pfsense/en/latest/troubleshooting/disk-lifetime.html
0x0b = The disk has used 100%-110% of its estimated life time
0x01 = Normal - The disk has consumed less than 80% of its reserved blocks
-
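For anyone who wants to decode these fields without reading the table by hand, here is a minimal Python sketch. It assumes the output looks like the lines quoted earlier in this thread, and it assumes the usual eMMC encoding: life time codes 0x01 to 0x0A each cover a 10% band, 0x0b means the estimate has been reached or exceeded, and Pre EOL is 0x01 normal, 0x02 warning, 0x03 urgent. The function names are hypothetical, not part of any Netgate or pfSense tool, so check the results against the Netgate page linked above.

```python
import re

# Matches "[EXT_CSD_SOME_FIELD]: 0x0b" anywhere in the pasted output.
FIELD_RE = re.compile(r"\[(EXT_CSD_[A-Z_]+)\]:\s*(0x[0-9a-fA-F]+)")

# Assumed Pre EOL encoding (see lead-in); verify against the official docs.
PRE_EOL = {
    0x00: "not defined",
    0x01: "normal",
    0x02: "warning (80% of reserved blocks consumed)",
    0x03: "urgent (90% of reserved blocks consumed)",
}


def describe_life_time(code):
    """Translate a life time estimation code into a readable band."""
    if code == 0x00:
        return "not defined"
    if 0x01 <= code <= 0x0A:
        return f"{(code - 1) * 10}%-{code * 10}% of estimated life used"
    if code == 0x0B:
        return "estimated life exceeded (100% or more used)"
    return "reserved/unknown"


def parse_extcsd(text):
    """Return {field name: integer value} for every EXT_CSD field found."""
    return {name: int(value, 16) for name, value in FIELD_RE.findall(text)}


def summarize(text):
    """Map each recognised field to a human-readable description."""
    summary = {}
    for name, code in parse_extcsd(text).items():
        if name.startswith("EXT_CSD_DEVICE_LIFE_TIME_EST"):
            summary[name] = describe_life_time(code)
        elif name == "EXT_CSD_PRE_EOL_INFO":
            summary[name] = PRE_EOL.get(code, "reserved/unknown")
    return summary


if __name__ == "__main__":
    sample = (
        "eMMC Life Time Estimation A [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A]: 0x0b\n"
        "eMMC Life Time Estimation B [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B]: 0x0b\n"
        "eMMC Pre EOL information [EXT_CSD_PRE_EOL_INFO]: 0x01\n"
    )
    for field, meaning in summarize(sample).items():
        print(f"{field}: {meaning}")
```

Paste your own output into `sample` to check a different unit. The regex does not care whether the three fields are on one line or three.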
@ltctech Those values indicate imminent failure of your onboard storage.
From what I have seen, the Pre EOL value always says 0x01, so it seems to be a useless indicator - the onboard eMMC might not support it. I would suggest installing an SSD ASAP before your device stops working, and exporting a backup of your configuration (Diagnostics > Backup & Restore).
Netgate has chosen not to publish instructions for installing an SSD in the 4100 and 6100.
Make sure you get an M.2 NVMe that has B+M key (2 notches). Be sure it is NVMe as they are hard to find and most B+M key drives are SATA.
This video shows how to disassemble the device.
-
@andrew_cb I'm thankful I stumbled onto this thread. My 6100 is not a MAX model, and it's over 3 years old. Purchased right after they came out.
I'm just going to replace the storage immediately as it's clear it's very probably EOL. I'm comfortable with the process you've outlined, but I'm not clear on exactly which NVMe M.2 2280 SSD to get. Can you please link one from Amazon that would work?
Would this KingSpec work? I saw higher up that this is what someone else bought, although there are several.
-
That drive would not work because it's M.2 SATA. It must be an NVMe drive for the 4100 or 6100.
-
@dane_h I used an Intel Optane 16GB drive https://www.ebay.co.uk/itm/395684843954
-
@punting_packets That appears to be the same as this one on Amaz.
-
@dane_h Yep, I'd agree.
-
These eMMC topics have been here for a while. I’ve had two SG-1100 units fail on eMMC, one of which I fixed using an old SSD in a USB enclosure.
It was enough for me to order a Max model (SG-2100 with 128GB SSD preinstalled by Netgate), just to stay on the safe side of things.
-
@dane_h I ordered and installed this one, up and running. Price about the same.
https://www.amazon.com/dp/B08TTDQ5WH?ref=fed_asin_title&th=1
-