Another Netgate with storage failure, 6 in total so far
-
@toroloco I have a few questions for you:
- How old was your 4200?
- What troubleshooting did TAC perform?
- Did TAC acknowledge that the onboard eMMC storage had failed?
- Did TAC indicate any knowledge of past or recent discussions about onboard storage failure?
- Did TAC express that onboard storage failure is normal and to be expected?
- Did TAC ask you what packages you were running or otherwise imply that you were at fault for using the device incorrectly?
- Did TAC provide any suggestions for installing an SSD or which SSD model to purchase?
-
I have a Netgate 6100 showing this:
eMMC Life Time Estimation A [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A]: 0x0b
eMMC Life Time Estimation B [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B]: 0x0b
eMMC Pre EOL information [EXT_CSD_PRE_EOL_INFO]: 0x01
Not sure what to make of these values. The Pre EOL value looks OK, but the Life Time values look really bad?
-
@ltctech said in Another Netgate with storage failure, 6 in total so far:
[EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A]
I believe it's been said before that these are estimates and not definitive. However, I had the same eMMC figures and have been experiencing issues with forwarding stopping, so I've installed an Intel Optane drive, just in case.
https://docs.netgate.com/pfsense/en/latest/troubleshooting/disk-lifetime.html
0x0b = The disk has used 100%-110% of its estimated life time
0x01 = Normal - The disk has consumed less than 80% of its reserved blocks
-
@ltctech Those values indicate imminent failure of your onboard storage.
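As a rough sketch of how these fields can be decoded (not an official Netgate tool), the Python below pulls the three EXT_CSD values out of pasted output and maps them to plain descriptions. The life-time bands assume 10% steps per value, with 0x0b meaning the estimate has been exceeded (the Netgate doc words that as 100%-110%). The Pre EOL meanings for 0x02 and 0x03 are my reading of the eMMC spec, so treat them as an assumption. The names `describe_life` and `decode` are made up for illustration.

```python
import re

# Matches e.g. "[EXT_CSD_PRE_EOL_INFO]: 0x01"; exactly two hex digits,
# so values fused with following text are still read correctly.
FIELD_RE = re.compile(r"\[(EXT_CSD_[A-Z_]+)\]:\s*(0x[0-9a-fA-F]{2})")

# Assumed meanings for the Pre EOL field (0x01 matches the Netgate doc).
PRE_EOL = {
    0x01: "normal",
    0x02: "warning (80% of reserved blocks consumed)",
    0x03: "urgent (90% of reserved blocks consumed)",
}

def describe_life(value):
    """Map a Life Time Estimation value to a used-life band."""
    if 0x01 <= value <= 0x0A:
        return f"{(value - 1) * 10}%-{value * 10}% of estimated life used"
    if value == 0x0B:
        return "estimated life exceeded"
    return "undefined"

def decode(report):
    """Return {field name: description} for every EXT_CSD field found."""
    result = {}
    for name, raw in FIELD_RE.findall(report):
        value = int(raw, 16)
        if "LIFE_TIME" in name:
            result[name] = describe_life(value)
        elif name == "EXT_CSD_PRE_EOL_INFO":
            result[name] = PRE_EOL.get(value, "undefined")
    return result

if __name__ == "__main__":
    sample = (
        "eMMC Life Time Estimation A [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A]: 0x0b "
        "eMMC Life Time Estimation B [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B]: 0x0b "
        "eMMC Pre EOL information [EXT_CSD_PRE_EOL_INFO]: 0x01"
    )
    for field, meaning in decode(sample).items():
        print(field, "->", meaning)
```

With the values quoted above, both Life Time fields come back as "estimated life exceeded" while Pre EOL still reads "normal".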
From what I have seen, the Pre EOL value always says 0x01, so it seems to be a useless indicator - the onboard eMMC might not support it.
I would suggest installing an SSD ASAP before your device stops working, and exporting a backup of your configuration (Diagnostics > Backup & Restore).
Netgate has chosen not to publish instructions for installing an SSD in the 4100 and 6100.
Make sure you get an M.2 NVMe that has B+M key (2 notches). Be sure it is NVMe as they are hard to find and most B+M key drives are SATA.
This video shows how to disassemble the device.
-
@andrew_cb I'm thankful I stumbled onto this thread. My 6100 is not a MAX model, and it's over 3 years old. Purchased right after they came out.
I'm just going to replace the storage immediately as it's clear it's very probably EOL. I'm comfortable with the process you've outlined, but I'm not clear on exactly which NVMe M.2 2280 SSD to get. Can you please link one from Amazon that would work?
Would this KingSpec work? I saw higher up that this is what someone else bought, although there are several.
-
That drive would not work because it's M.2 SATA. It must be an NVMe drive for the 4100 or 6100.
-
@dane_h I used an Intel Optane 16GB drive https://www.ebay.co.uk/itm/395684843954
-
@punting_packets That appears to be the same as this one on Amaz.
-
@dane_h Yep, I'd agree.
-
These eMMC topics have been here for a while. I’ve had two SG-1100 units fail on eMMC, one of which I fixed using an old SSD in a USB enclosure.
It was enough for me to order a Max model (SG-2100 with 128GB SSD preinstalled by Netgate), just to stay on the safe side of things.
-
@dane_h I ordered and installed this one, up and running. Price about the same.
https://www.amazon.com/dp/B08TTDQ5WH?ref=fed_asin_title&th=1
-
@dstaylor FWIW, I use the same one.
-
@andrew_cb
I am beyond frustrated with Netgate. The whole point of buying Netgate as opposed to using cheap Mini PCs and installing pfSense was to avoid these kinds of surprises.
This particular unit is installed at an office with no IT staff on the other side of the world. We might have to send them a new unit, and getting it swapped out may be a challenge.
We'll have to audit the other units that we have deployed (thankfully stateside) and see which ones are eMMC and which are SSD.
-
@andrew_cb and @stephenw10 Some questions:
#1 Is every eMMC-equipped Netgate prone to be affected? Or are there just a limited number of occurrences?
#2 Does the eMMC production series have any influence, or is it simply more writes = issue?
A good friend of mine is running a 4100 base. He believes he’s fine regarding the eMMC issue because he doesn’t do much logging. I don’t believe he even checks his eMMC health periodically; he’s not concerned about it.
-
@Cabledude said in Another Netgate with storage failure, 6 in total so far:
doesn’t do much logging
It's very relative hence the (my) list of mitigating settings above. The default deny rules log. Is pfSense behind an ISP router that blocks incoming? Suricata logs HTTP requests. Some people leave the dashboard open which logs every web request for each widget update. pfBlocker DNSBL logs DNS requests, and a few feeds like UT1 are gigantic.
My 2100 at home is from October 2020 and it shows 10% used:
eMMC Life Time Estimation A [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A]: 0x01
eMMC Life Time Estimation B [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B]: 0x01
eMMC Pre EOL information [EXT_CSD_PRE_EOL_INFO]: 0x01
eMMC (as a technology) supports fewer "disk writes per day" than an SSD. It is also usually much smaller. So writing (completely making up a number here) 5 GB per day has way more impact on an 8 GB eMMC than on a 128 GB SSD. Which, overall, is the point of this thread.
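To put rough numbers on that, here is a minimal Python sketch using the same made-up 5 GB per day figure. It only computes how many times per day the whole device gets overwritten on average. It assumes ideal wear leveling and ignores write amplification, so real wear will be worse, and `drive_writes_per_day` is a hypothetical helper, not anything from pfSense.

```python
def drive_writes_per_day(daily_writes_gb, capacity_gb):
    """Average number of full-capacity overwrites per day.

    Assumes writes are spread evenly over the whole device
    (ideal wear leveling, no write amplification).
    """
    if capacity_gb <= 0:
        raise ValueError("capacity_gb must be positive")
    return daily_writes_gb / capacity_gb

if __name__ == "__main__":
    daily_gb = 5  # made-up figure, as in the post above
    emmc = drive_writes_per_day(daily_gb, 8)    # 8 GB eMMC
    ssd = drive_writes_per_day(daily_gb, 128)   # 128 GB SSD
    print(f"eMMC: {emmc} drive writes per day")
    print(f"SSD:  {ssd} drive writes per day")
    print(f"Each eMMC cell sees {emmc / ssd:.0f}x the writes")
```

Under those assumptions the 8 GB eMMC is overwritten 0.625 times per day versus about 0.04 times for the 128 GB SSD, a 16x difference before even considering that eMMC cells typically tolerate fewer write cycles.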
-
Yup, the larger the drive, the fewer writes each individual 'bit' sees for a fixed total of drive writes. So larger drives are less affected.
-
Wildly speculating around the 4100, it appears enough damage to the eMMC can brick the boxes too!
One of mine won't POST now, precluding my ability to install NVMe at all. The LEDs on the board indicate activity on one flash drive after the reset indicator flickers, without any console output or getting past the orange circle of death. This is even after pulling the CMOS battery, NVMe etc.
-
Of the confirmed eMMC failures we've seen, most do not fail like that. In fact I don't think we've seen a single failure that presented like that in person. There was one user here on the forum who reported removing the eMMC chip and that doing so allowed it to boot from NVMe. So far unconfirmed though. So it could be some other failure.
-
@stephenw10 Figures I'd be a unicorn. To be fair, I suspect once units are deemed bricks, they probably seldom make it back to your bench from customers unless they're in the short window.