Another Netgate with storage failure, 6 in total so far
-
The emmc-utils package is only available in Plus... so users of CE have absolutely no way to monitor their eMMC health. Apparently, monitoring your eMMC health is a special privilege? Maybe a way of discouraging the use of CE?
https://docs.netgate.com/pfsense/en/latest/troubleshooting/disk-lifetime.html
This package is currently only available on pfSense Plus software and does not have a GUI component. It must be run from an SSH or console shell prompt.
-
@andrew_cb said in Another Netgate with storage failure, 6 in total so far:
The emmc-utils package is only available in Plus... so users of CE have absolutely no way to monitor their eMMC health. Apparently, monitoring your eMMC health is a special privilege? Maybe a way of discouraging the use of CE?
https://docs.netgate.com/pfsense/en/latest/troubleshooting/disk-lifetime.html
This package is currently only available on pfSense Plus software and does not have a GUI component. It must be run from an SSH or console shell prompt.
Well, in Netgate's defense, I suspect the number of pfSense CE users running on eMMC is minuscule. Most whitebox hardware is most likely going to have either an SSD or a spinning disk. I believe eMMC is much more prevalent in the Netgate appliances, and since anyone purchasing a Netgate appliance gets pfSense Plus, it's more logical to include the utility there. Maybe I missed it, but I don't recall seeing a single post from a CE user who has experienced failed eMMC. It would be trivial to add the utility to the CE package repo, but I suspect it would not be widely used there.
-
Some more recent threads about storage failure.
Overall, storage failures seem to be the most common on the 4100, possibly because it is the most popular model?
https://www.reddit.com/r/PFSENSE/comments/1ilhit2/my_netgate_4100_is_defect/
https://www.reddit.com/r/PFSENSE/comments/1ikprzt/4100_disassembly/
https://www.reddit.com/r/PFSENSE/comments/1ie17xz/ideas_for_an_eol_4100/
https://forum.netgate.com/topic/196253/sg-1100-storage-health-questions
-
Hmm, not sure why the pkg isn't in the CE repo. I guess there wasn't much call for it at the time. Seems like we could add that pretty easily. Let me see....
-
@stephenw10 It would be great if you can get mmc-utils added to the CE repo!
-
@w0w I share your frustration. One minute their Netgate is working, then just dies. Then they try to reinstall pfSense and the installer says no disks were found...
Those are great suggestions on how to spread awareness. This issue has been brought up many times before but it never goes anywhere, so hopefully we can bring about some change and prevent this from happening to others.
-
@bmeeks It's possible that not many are using CE on a whitebox with eMMC, but I have seen threads about it on Reddit. I think Protectli, Firewalla, and Topton also use eMMC in some of their models, but I am not positive. Several models list 16 or 32GB storage, which is often eMMC.
-
I also want to mention the repair options. I'm not sure if it's possible to replace the eMMC chip with a larger one without modifying the BIOS, but I'm almost certain that you can replace it with the same model or a full equivalent.
Of course, this depends on the country and the price charged for the work. Again, whether the technician is truly a professional or just incompetent remains a question... But this option definitely exists.
-
A thread from 2022 has resurfaced and it is eerily similar to the discussion happening now in 2025:
- The expected lifetime of 16 and 32GB eMMC storage at various average write rates.
- The increased wear from running popular IDS and IPS packages.
- Request for adding mmc-utils to the base pfSense image (including a Redmine).
- Users already experiencing storage wearout.
- Suggestions to use ramdisks and disable logging of default rules.
- The effects of ZFS vs UFS on storage wear.
- TRIM appears to be disabled.
- Requests/suggestion to include storage considerations on the product pages.
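As a rough illustration of the first point in the list above, expected lifetime can be sketched as total writable data divided by the daily write volume. The figures below (3000 program/erase cycles, a write amplification factor of 4, and the sample write rates) are assumptions chosen for illustration, not values from the 2022 thread or from any Netgate datasheet.

```python
def emmc_lifetime_years(capacity_gb, pe_cycles, write_gb_per_day, write_amplification=1.0):
    """Very rough endurance estimate in years.

    All inputs are assumptions supplied by the caller:
    capacity_gb          nominal size of the eMMC
    pe_cycles            program/erase cycles the flash is rated for
    write_gb_per_day     average host writes per day
    write_amplification  extra flash writes caused per host write
    """
    if write_gb_per_day <= 0:
        raise ValueError("write_gb_per_day must be positive")
    # Total host data that can be written before the rated cycles are used up.
    total_writable_gb = capacity_gb * pe_cycles / write_amplification
    return total_writable_gb / write_gb_per_day / 365.0


if __name__ == "__main__":
    # Hypothetical figures for illustration only.
    for capacity in (16, 32):
        for rate in (5, 20, 50):
            years = emmc_lifetime_years(capacity, 3000, rate, write_amplification=4.0)
            print(f"{capacity} GB at {rate} GB/day: about {years:.1f} years")
```

With everything else held equal, doubling the capacity doubles the estimate, which is why the 16GB versus 32GB distinction matters so much in these discussions.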
I cannot understand why Netgate did not investigate or take any action on these issues in 2022, 2023, or 2024.
@dugeem checked 3 devices and noted:
eMMC drives generally support TRIM, but in all cases it was disabled.
@jwt said
TRIM (or an equivalent such as DISCARD) are required by JEDEC standards as far back as 2010.
So there seems to be a discrepancy in whether TRIM support is actually enabled and working or not.
Further, the JEDEC eMMC v5.0 standard which enables eMMC health reporting is from 2013 and is supported by many Netgate devices, so it is confusing why it is not supported by the 4200 that was released in 2024.
@Cabledude asked in 2024:
Would the 128GB SSD benefit (have extended life) if RAM disk is used?
@stephenw10 responded:
Yes. But the write cycle life on any recent SSD is likely to outlive the usefulness of the device anyway. So I'd question the value in doing so.
If a 128GB SSD "is likely to outlive the usefulness of the device", then what is the implication for the lifespan of 16GB eMMC storage?
I am not sure what conclusion to draw other than beginning in 2022 Netgate knew or should have known that 16GB of eMMC storage was insufficient for running anything other than the most basic of configurations (and even then, it is necessary to disable most of the default logging and possibly use ramdisks).
@keyser's words from 2022 seem tragically prophetic:
This is going to become a netgate scandal
I think it officially has now.
-
I've been having some issues with my 6100 locking up and becoming unresponsive. I reported the issue to Netgate TAC, who didn't provide any useful feedback. While searching Reddit for support, I read about the eMMC failures on the 6100, so I went and checked mine:
eMMC Life Time Estimation A [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A]: 0x0b
eMMC Life Time Estimation B [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B]: 0x0b
eMMC Pre EOL information [EXT_CSD_PRE_EOL_INFO]: 0x01
Yikes! Got an Intel Optane 16GB in there now after a lot of pain with the installer not working with my PPPoE internet service.
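For readers unfamiliar with the hex codes in the output above, here is a minimal sketch of how they are commonly decoded. The mapping reflects my reading of the JEDEC eMMC 5.x extended CSD fields (each life time step is 10% of estimated life, 0x0b means the estimate has been exceeded), so treat it as an assumption and check it against the standard or your device's datasheet. The function names are hypothetical.

```python
def decode_life_time(value):
    """Decode a DEVICE_LIFE_TIME_EST_TYP_A/B byte (assumed JEDEC mapping)."""
    if value == 0x00:
        return "not defined"
    if 0x01 <= value <= 0x0A:
        # Each step covers a 10% band of the estimated device life.
        return f"{(value - 1) * 10}-{value * 10}% of estimated life used"
    if value == 0x0B:
        return "exceeded maximum estimated life"
    return "reserved"


# Assumed meaning of the PRE_EOL_INFO byte, which tracks reserved block usage.
PRE_EOL_STATES = {
    0x00: "not defined",
    0x01: "normal",
    0x02: "warning",
    0x03: "urgent",
}


def decode_pre_eol(value):
    """Decode a PRE_EOL_INFO byte (assumed JEDEC mapping)."""
    return PRE_EOL_STATES.get(value, "reserved")


if __name__ == "__main__":
    # Values taken from the output quoted in this post.
    print("Life time A:", decode_life_time(0x0B))
    print("Life time B:", decode_life_time(0x0B))
    print("Pre EOL:", decode_pre_eol(0x01))
```

Under this mapping the two fields measure different things, which would explain how the life time estimates can be past their maximum while the pre-EOL field still reads normal.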
I have to say this really seems like planned obsolescence on the part of Netgate. Why sell a device with hardware which cannot support its operation beyond a couple of years of normal use?
Why doesn't your TAC team identify this as an issue?
Why are you trying to dissuade customers from implementing a fix?
When are you going to compensate customers for the damages?
@punting_packets said in Another Netgate with storage failure, 6 in total so far:
my 6100 locking up and becoming unresponsive
It's usually pretty obvious if the boot drive fails. Just becoming unresponsive but rebooting back to normal operation is not what I would expect to see. So you may not be seeing a failing drive there even though the estimated wear levels are high.
Drive failures usually throw a lot of drive/controller errors. Even if the logging stops, the console will be filled with errors. If you can, checking the console in the hung situation should confirm that.
@stephenw10 Thanks for the response. The 6100 simply stopped forwarding traffic but the console was still responsive. There was nothing in the logs other than a lot of failed PPPoE sessions, and the only way to restore service was a reboot. I might be conflating the wear on the eMMC with other issues, only time will tell :-)
-
@stephenw10 Drive errors are common when the storage is failing, but not always.
I have been troubleshooting a 7100 for the past few days where the internet connection was basically unusable despite being on 1Gb fibre and 300Mbps cable connections. I disabled nearly all of the port forwards and services, but the CPU load was constantly high and the GUI was very sluggish to navigate, even though there was less than 5Mbps of network traffic. The gateway monitors were constantly in warning status with latency over 100ms and 30% packet loss. There were no storage-related errors anywhere. I was still able to make configuration changes, and the unit was able to reboot.
I checked the eMMC health and found it was at 0x0a (100%).
I went onsite yesterday and installed a 250GB WD Blue SSD, and the device is working great now.
To be fair, it is a 7100 and 5-6 years old, so the storage failure is less surprising, but the lack of any alerting is the biggest problem.
Before becoming aware of storage wear out, we had other devices that appeared to be working but stopped responding when a config change was made. During troubleshooting we would find that the device was no longer detecting the onboard eMMC storage.
-
Hmm, I'd be surprised if low throughput like that was caused by a drive issue. I assume you had to reinstall to the new SSD, had you tried reinstalling to the eMMC?
But it could have been indirectly caused by high CPU usage that itself was caused by some access issue. Though I would expect to be able to see that in the usage or errors logged.
-
In hindsight, this has been how I have always discovered failed eMMCs as well. A config change or update is the last straw for an underlying condition that I had not realized I needed to install separate software for and check manually at some interval. Unfortunately, mine are always located in a different time zone without qualified local support.
-
@punting_packets Your eMMC is definitely dead, and that can cause lots of strange behavior. The wear values reported are just estimates, and the exact wear is unique for each case.
Based on reading this forum and Reddit, the symptoms of eMMC failure can include:
- Numerous error messages in the system log, dmesg, and the console.
- No errors are reported at all in some cases.
- The device continues to work but has strange behaviour issues (such as locking up when making configuration changes).
- The device suddenly stops responding/locks up.
- Device attempts to boot but fails with mmcsd0 timeout errors.
- The onboard eMMC storage is not detected at all, but the device still powers on (but works when an SSD is installed).
- The onboard eMMC storage fails and the device will not power on at all (some have de-soldered the eMMC chip, which allowed the device to power on).
- Most failures seem to occur when devices are 2 to 3 years old, but failures are also being reported in devices less than 1 year old.
@punting_packets I have a few questions for you:
- How old was your 6100?
- What troubleshooting did TAC perform?
- Did TAC acknowledge that the onboard eMMC storage had failed?
- Did TAC indicate any knowledge of past or recent discussions about onboard storage failure?
- Did TAC express that onboard storage failure is normal and to be expected?
- Did TAC ask you what packages you were running or otherwise imply that you were at fault for using the device incorrectly?
- Did TAC provide any suggestions for installing an SSD or which SSD model to purchase?
-
@stephenw10 My focus was on installing the SSD with minimal downtime, so unfortunately I was not able to investigate the eMMC any further.
I wonder if the eMMC was not fully dead but possibly taking extra time to erase blocks in real-time rather than in the background due to a reduced pool of usable blocks, or the drive's internal error handling was causing delays as it dealt with write failures?
All I did was install the SSD, wipe the eMMC, reinstall pfSense Plus, and restore the configuration. So whether it needed a new drive or a fresh install of pfSense, neither is a great thing to have happen unexpectedly.
-
@arri If you logged into the GUI and there was a big warning message that the storage device's health was critical, that would certainly get your attention and give you a chance to address the situation before the device dies.
Unfortunately, we are in a similar situation and some of our devices are located far away as well.
How many Netgate devices do you have deployed?
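Until something like that warning exists in the GUI, a small script could approximate it. The sketch below only parses text in the format of the health output quoted earlier in this thread and flags worrying values. How that text is collected is left out, and the function names and the alert threshold (0x09 and above) are my own assumptions, not anything pfSense provides.

```python
import re

# Matches lines such as:
#   eMMC Pre EOL information [EXT_CSD_PRE_EOL_INFO]: 0x01
FIELD_RE = re.compile(r"\[(EXT_CSD_[A-Z_]+)\]:\s*(0x[0-9a-fA-F]+)")

LIFE_KEYS = (
    "EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A",
    "EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B",
)


def parse_health(report):
    """Return a dict of field name -> integer value found in the report text."""
    return {name: int(value, 16) for name, value in FIELD_RE.findall(report)}


def health_warnings(fields, life_threshold=0x09):
    """Return human-readable warnings; life_threshold is an arbitrary choice."""
    warnings = []
    for key in LIFE_KEYS:
        value = fields.get(key)
        if value is not None and value >= life_threshold:
            warnings.append(f"{key} is {value:#04x}, at or above the alert threshold")
    pre_eol = fields.get("EXT_CSD_PRE_EOL_INFO")
    if pre_eol is not None and pre_eol >= 0x02:
        warnings.append(f"EXT_CSD_PRE_EOL_INFO is {pre_eol:#04x}, reserved blocks are running low")
    return warnings


# Sample text copied from the 6100 report earlier in this thread.
SAMPLE = """\
eMMC Life Time Estimation A [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A]: 0x0b
eMMC Life Time Estimation B [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B]: 0x0b
eMMC Pre EOL information [EXT_CSD_PRE_EOL_INFO]: 0x01
"""

if __name__ == "__main__":
    for line in health_warnings(parse_health(SAMPLE)):
        print("WARNING:", line)
```

Run from cron and wired to an email or syslog notification, a check like this would at least give remote sites some notice before a config change or update becomes the last straw.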
-
@stephenw10 It looks like you are quite active in helping users on this forum. How many threads about devices with failed storage do you think you've replied to over the past few years?
-
@andrew_cb Not enough that changing to a different vendor or platform is daunting. My needs are simple, I just need reliability combined with longevity and the ability to both preconfigure devices or remote configure them through a tunnel.