Another Netgate with storage failure, 6 in total so far
-
@jwt Thanks for jumping in here.
I started writing a response to the points you raised, but I realized that we could get into a deep technical debate that would take us off-track and miss the real issue:
"I don't want to quote Steve Jobs, but... you're holding it wrong." (I thought of this quote too, while writing my previous posts)
So, how am I and so many others using it wrong? According to what criteria, and most importantly, how would we know?
For the sake of argument, let us stipulate that the technical differences between eMMC and SSD storage are not materially significant.
Suppose I am looking for a new firewall and am unfamiliar with Netgate and pfSense. I look through the product pages and decide the 6100 looks like it meets my network throughput and performance requirements. I am not planning to store much data on the firewall, so the 8GB of onboard storage should provide adequate capacity. The system advertises many impressive capabilities, particularly:
LOW TOTAL COST OF OWNERSHIP
- No artificial limits or add-ons required to make your system fully functional.
- This system is designed for a long deployment lifetime.
GROWS WITH YOU
- Add optional packages such as Snort or Suricata for IDS/IPS and network security monitoring.
Firewall, Attack Prevention, & Content Filtering:
- IP/DNS-based filtering and blacklisting [w/pkg]
- IDS/IPS with Snort-based packet analyzer [w/pkg]
- Layer 7 application detection and blocking [w/pkg]
- Reverse proxy [w/pkg]
- Geo/country blocking, IP block lists [w/pkg]
Monitoring & Reporting:
- Customizable dashboard with widgets
- Local monitoring graphs
- Network usage monitoring [w/pkg]
- Network diagnostics [w/pkg]
In this scenario, what would be wrong with using the features advertised on the BASE version?
There are no suggestions, recommendations, warnings, disclaimers, footnotes, or links to further information about storage concerns or scenarios where getting the MAX version (or upgrading the storage) is recommended/required.
In a simplified scenario, suppose I do not use any of the packages that are noted as requiring an SSD/HDD: HAProxy, ntopng, Snort, and Suricata (as is the case for nearly all of our devices). Based on the advertising information, is it reasonable to use all other firewall packages and functionality without concern and to get 5 years of life, even with a BASE version?
From my perspective, the root cause is simple: Incomplete or missing information about what Netgate considers to be the correct way to use pfSense on their hardware. If so many people are using it wrong, then perhaps Netgate needs to consider WHY that is the case and implement better messaging and safeguards to reduce the likelihood of inadvertent misuse.
Whatever the technical reason, if there are limitations, additional considerations, or unsupported or not-recommended usage/configurations, they need to be clearly identified on the product page:
- Information/Warning blurbs on the page relating to storage considerations
- Superscripts or links beside the advertised features where there are storage considerations
- Explanation of the differences between BASE and MAX and recommended usage scenarios/packages (a comparison table would be great for this)
An equally critical issue is the lack of built-in health monitoring for eMMC storage (and it is completely unsupported on the 4200 and possibly other models).
Even if someone is using their device wrong and is accelerating the wearout of the storage, this information should be prominently visible. The pfSense GUI gives warning messages about ISC DHCP deprecation, certificate expiry, PHP errors, program crashes, and failure to check for updates, but no information or warnings about eMMC health. There is support for SMART monitoring, but it still needs to be manually added as a widget or manually checked by the user. Yet despite the Redmine request from 3 years ago and many posts of users experiencing storage failure (eMMC or otherwise), it appears that Netgate has done nothing to improve storage health monitoring.
-
In response to your points:
When I search for "enterprise-ready" on store.netgate.com, the only two devices that come up are the 8300 Base and 8300 Max. Neither has an eMMC.
The only device in the Netgate catalog that does not have a SSD/NVMe option is the 1100. Every other device (2100, 4200, 6100) has an option for NVMe storage or only comes with NVMe storage (8200, 8300).
I said "enterprise-ready" as a description, not as a literal search term. Do you prefer "business-ready" instead? If you search for "enterprise," I think you will find it under the heading "Best For" on the 4100, 6100, and 7100. All models are Best For "Managed Service Provider / Managed Security Service Provider (MSP/MSSP) On-Premises Appliance."
What's the problem here, other than at one point, they said "42%" when the figure is 48%.
There have been four posts about storage failure in the last two weeks, and that's just in this subforum. This shows that storage failure is a frequent and ongoing issue. Particularly concerning is the storage failure on the 4200, a model that has only been out for nine months!
I am unaware of any package which is marked as religious rather than secular.
I referred to the package list as "sacred" since in most of the threads about storage failure, someone nearly always replies, "Check the package list and see that package XYZ requires an SSD." The package list and package documentation are used to explain and justify most storage failures. If the package documentation is the source of truth for storage requirements, then it needs to be much more prominent and up-to-date with all packages.
In a word: No. There is no statement about safety on that page. In fact, neither the word "safety", or "safe" occurs on that page.
If that is the case, then there should clearly be a warning, even for packages maintained by Netgate.
Even more useful would be redmine bug reports.
Can you clarify what sort of bug reports would be useful (storage requirements, package documentation updates, package list updates, etc.)?
Not if the trailer hitch you install isn't rated for 10 tons.
That's a good point. Hopefully, the truck, hitch, and/or trailer manual would clearly mention this so I would be aware of it before purchasing.
I don't want to quote Steve Jobs, but... you're holding it wrong.
Addressed in my previous post.
Used within its limitations, eMMC is a good solution.
What are the limitations? I have been unable to locate any information on store.netgate.com, and docs.netgate.com only has "Troubleshooting Disk Lifetime."
Please show me the Netgate device that came (only) with eMMC that cost(s) "thousands of dollars".
"Thousands of dollars" was not the main point of that sentence. But for reference, the 7100 was $999 USD in this announcement.
I suggest you tone down rhetoric such as this. "White-labeled" does disservice to our level of effort and engagement with Silicom. You suggest that all we do is slap a label on the box. Are you aware that Silicom also builds (quite similar) devices for Dell and others? Are you suggesting that these are also "white label"? Does Apple using Foxconn to build iPhones mean that iPhones are also "white label"?
My point was that, concerning the storage, unless Netgate is speccing something different, Silicom appears to use eMMC storage in many of its devices (such as the unit the 4200 is based on). Also, I understand that the products Foxconn builds for Apple are designed by Apple, manufactured exclusively for Apple, and are not resold under other brand names.
TRIM (or an equivalent such as DISCARD) are required by JEDEC standards as far back as 2010.
I may be wrong about TRIM support on eMMC devices. I saw it mentioned in a few posts. TRIM support should help increase eMMC life (or, inversely, it would be even worse without it).
Yet you do not show these calculations, such that we may assess their accuracy. Specifically, I'm curious what you used for a Write Amplification Factor and how you determined same.
I used the calculations from here but increased the device size to 8GB, which doubled the result as expected. The WAF used is 4.5. If you can access better information, I would like to see how it changes the calculations. I also gave an example of the values for a 10x life increase.
An 8GB device with 3000 P/E cycles per cell works out to 24TB of total writes, assuming that every P/E cycle allows user data to be stored. To last for 3 years, that works out to ~22GB per day, ~900MB per hour, ~15MB per minute, or ~254KB per second.
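A quick sketch of the arithmetic above, so the assumptions can be swapped out. It uses decimal units (1GB = 10^9 bytes), 365-day years, and a write amplification factor (WAF) of 1.0 by default, matching the best case stated above; the function names are illustrative only.

```python
# Best-case eMMC write budget, using decimal units (1GB = 1e9 bytes)
# and 365-day years. Function names are illustrative.

SECONDS_PER_DAY = 86_400


def write_budget_bytes(capacity_gb: float, pe_cycles: int, waf: float = 1.0) -> float:
    """Host bytes that can be written before the rated P/E cycles are used up.

    waf=1.0 is the best case (every P/E cycle stores user data); a larger
    write amplification factor divides the budget.
    """
    return capacity_gb * 1e9 * pe_cycles / waf


def sustainable_rates(budget_bytes: float, years: float) -> dict:
    """Constant write rates that would use up the budget in exactly `years`."""
    per_second = budget_bytes / (years * 365 * SECONDS_PER_DAY)
    return {
        "GB/day": per_second * SECONDS_PER_DAY / 1e9,
        "MB/hour": per_second * 3600 / 1e6,
        "MB/minute": per_second * 60 / 1e6,
        "KB/second": per_second / 1e3,
    }


if __name__ == "__main__":
    budget = write_budget_bytes(8, 3000)  # 8GB eMMC, 3000 P/E cycles per cell
    print(f"budget: {budget / 1e12:.1f} TB")
    for unit, value in sustainable_rates(budget, 3).items():
        print(f"{value:8.1f} {unit}")
```

Passing the WAF of 4.5 mentioned above (`write_budget_bytes(8, 3000, waf=4.5)`) shrinks the budget to roughly 5.3TB, which shows how much the choice of WAF affects the result.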
If pfSense could use the disk stats in FreeBSD to track the data written (per second/minute and cumulatively) and display a warning when it exceeds a certain threshold, that would be very helpful.
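A hypothetical sketch of that warning logic. It assumes something already samples a cumulative bytes-written counter for the boot device (FreeBSD keeps such per-device statistics via devstat) and persists a lifetime total across reboots. Nothing here is an existing pfSense API, and the 80% alert level is an arbitrary example.

```python
from dataclasses import dataclass

SECONDS_PER_YEAR = 365 * 86_400


@dataclass
class Sample:
    timestamp: float    # seconds
    bytes_written: int  # cumulative counter; assumed not to reset between samples


def write_warnings(prev: Sample, cur: Sample, lifetime_written: int,
                   budget_bytes: float, target_years: float = 3.0,
                   alert_fraction: float = 0.8) -> list[str]:
    """Return warning messages for the current write rate and total wear."""
    messages = []
    elapsed = cur.timestamp - prev.timestamp
    if elapsed > 0:
        rate = (cur.bytes_written - prev.bytes_written) / elapsed  # bytes/second
        # Constant rate that would consume the whole budget in target_years.
        allowed = budget_bytes / (target_years * SECONDS_PER_YEAR)
        if rate > allowed:
            messages.append(
                f"write rate {rate / 1e3:.0f} KB/s exceeds the "
                f"{allowed / 1e3:.0f} KB/s that would last {target_years:g} years")
    used = lifetime_written / budget_bytes  # fraction of budget consumed so far
    if used >= alert_fraction:
        messages.append(f"{used:.0%} of estimated write budget consumed")
    return messages
```

For example, 30MB written in one minute (500KB/s) against a 24TB, 3-year budget produces the rate warning, because the sustainable rate works out to roughly 250KB/s.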
I found it interesting that you did not acknowledge my frustration or express concern that a Netgate customer is experiencing such a high failure rate, regardless of the cause. I noticed you did not even address my assertion that we are experiencing a 30-40% failure rate of devices under 3 years old. Your focus seems to be on debating specific technicalities and semantics rather than the elephant in the room of why I and many others experience storage failures and what Netgate is doing to address this, whether through technical means or improved documentation and awareness.
-
@jwt said in Another Netgate with storage failure, 6 in total so far:
I actually came back to this thread to see if you had responded, but apparently not.
Unfortunately, I have been busy planning the storage replacement of the 10 Netgate devices that are showing 100% and 110% storage wear. Hopefully, we can complete the replacements before they suddenly die and leave our customers without internet service.
-
Another idea: Could pfSense display a warning when a user attempts to install certain packages, either always or just when eMMC storage is detected?
Warning: Package xyz is write-intensive and can cause accelerated wear and/or premature storage failure. It is recommended to use mmc-utils or the SMART tool and widget to monitor storage health. Further information is available here, here and here.
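A rough sketch of how such an install-time check could combine the package list with eMMC health data. The wear fields are the JEDEC eMMC 5.0 EXT_CSD values DEVICE_LIFE_TIME_EST_TYP_A/B (0x01 to 0x0A in 10% steps, 0x0B once the estimated lifetime is exceeded) and PRE_EOL_INFO (0x01 normal, 0x02 warning, 0x03 urgent), which tools like mmc-utils can read out. The package set is the four packages named earlier as requiring an SSD/HDD; the "emmc" storage label, the thresholds, and the function names are assumptions, not pfSense behavior.

```python
# Hypothetical install-time check: warn when a write-intensive package is
# installed on eMMC, or when the eMMC already reports heavy wear.
# Field encodings follow JEDEC eMMC 5.0 EXT_CSD; thresholds are assumptions.

# Packages listed in this thread as requiring an SSD/HDD.
WRITE_INTENSIVE = {"haproxy", "ntopng", "snort", "suricata"}

# PRE_EOL_INFO values.
PRE_EOL = {0x01: "normal", 0x02: "warning", 0x03: "urgent"}


def life_used(est: int) -> str:
    """Decode a DEVICE_LIFE_TIME_EST_TYP_A/B value into a wear range."""
    if 0x01 <= est <= 0x0A:
        low = (est - 1) * 10
        return f"{low}-{low + 10}% used"
    if est == 0x0B:
        return "exceeded estimated lifetime"
    return "not defined"


def install_warning(package: str, storage: str, est_a: int, est_b: int,
                    pre_eol: int):
    """Return a warning message, or None if no warning is needed."""
    reasons = []
    if storage == "emmc" and package.lower() in WRITE_INTENSIVE:
        reasons.append(f"{package} is write-intensive")
    worst = max(est_a, est_b)
    if worst >= 0x09:  # 0x09 means 80-90% of estimated life used
        reasons.append(f"eMMC wear: {life_used(worst)}")
    if pre_eol in (0x02, 0x03):
        reasons.append(f"pre-EOL status: {PRE_EOL[pre_eol]}")
    return "Warning: " + "; ".join(reasons) if reasons else None
```

For example, `install_warning("Suricata", "emmc", 0x0A, 0x03, 0x01)` returns `"Warning: Suricata is write-intensive; eMMC wear: 90-100% used"`.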
Maybe something like this on the Packages menu in pfSense?
Some packages are resource-intensive and/or require additional storage considerations. Review the supported packages list for any specific requirements.
I noticed that the documentation for the SMART status page is not in the Diagnostics menu documentation category; I had to search for it and found it under "System Monitoring". It should probably be linked in both places.
-
@jwt I came back to this thread to see if you had responded, but apparently not.
-
It seems that @stbellcom is in a similar situation, with the eMMC storage on his Netgate 6100s dying, some in as little as 6 months.
One device refused to power on after the eMMC failed, which I have also experienced. After he desoldered the eMMC chips, the unit started working.
https://forum.netgate.com/topic/196028/6100-failed-emmc-replaced-with-nvme-but-now-no-longer-reboots
-
@andrew_cb thread went silent...
-
Well it is the weekend. Even I try to clock out occasionally.
-
The problem with eMMC would hardly ever have occurred if the eMMC size had been chosen appropriately. After all, TBW is proportional to capacity. Essentially, in this sense, the difference from SSDs is minimal, except for the type of connection and the fact that SSDs can be replaced without hot-air soldering skills. What seems most odd to me is the choice of such small eMMC sizes, especially considering that larger eMMC parts in those years wouldn’t have significantly increased the final cost of the product.
-
I was directed to this post from 2021 about pfSense with ZFS writing 14-20GB per day.
This got me curious, so I checked our fleet and found that the range is 3-28GB per day, with an average of 18GB. On 8GB of storage, that's about 2-3 DWPD (drive writes per day).
This works out to an average of about 6.6TB per year, and 19.7TB after 3 years.
Assuming each cell in the eMMC storage can be written 3000 times, 8GB can handle 24TB of writes before the storage is worn out.
This is the best-case scenario. Given the background filesystem behavior plus internal drive processing, writing 20TB of data of a 24TB maximum seems "good".
A 32GB eMMC could handle 96TB of writes (TBW), which would be about 52GB per day for 5 years, and probably about 42GB (80%) per day in real-world use.
Comparatively, a 250GB Samsung 860 EVO has a TBW value of 150TB, which works out to 82GB per day for 5 years. Halving that to approximate a 120GB drive gives a TBW of 75TB and 41GB per day for 5 years. This fits with the typical 0.3 DWPD rating of most SSDs.
Clearly, a larger storage drive, whether eMMC or SSD, greatly increases the lifespan of the storage.
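Turning the same arithmetic around shows how strongly capacity drives lifespan. This sketch assumes 3000 P/E cycles per cell, no write amplification, and a constant 18GB/day (the fleet average reported above), so the results are best-case upper bounds rather than predictions. At 18GB/day it gives roughly 3.7 years for 8GB, 7.3 years for 16GB, and 14.6 years for 32GB.

```python
# Best-case eMMC lifetime at a constant daily write volume: assumes
# 3000 P/E cycles per cell and no write amplification.

PE_CYCLES = 3000


def lifetime_years(capacity_gb: float, gb_per_day: float,
                   pe_cycles: int = PE_CYCLES) -> float:
    """Years until capacity * P/E cycles worth of writes is consumed."""
    return capacity_gb * pe_cycles / gb_per_day / 365


def dwpd(capacity_gb: float, gb_per_day: float) -> float:
    """Drive writes per day: how many times the full capacity is written daily."""
    return gb_per_day / capacity_gb


if __name__ == "__main__":
    fleet_average = 18  # GB/day, the fleet average reported above
    for size in (8, 16, 32):
        print(f"{size:>2}GB: {dwpd(size, fleet_average):.2f} DWPD, "
              f"~{lifetime_years(size, fleet_average):.1f} years")
```

Because lifetime is linear in capacity, quadrupling the eMMC from 8GB to 32GB quadruples the best-case lifetime at the same workload.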
If eMMC itself is not the problem, then the small 8GB size appears to be a significant factor. Increasing the onboard eMMC storage to 16 or 32GB would greatly increase its lifespan and significantly reduce the number of premature device failures.
-
Good morning @jwt . Just checking to see if Netgate is still looking into this and if you have any further comments to add.
-
My 2 1/2-year-old 6100 is showing over 100% storage wear, so I've ordered an NVMe drive to install. I did not have any logging-intensive applications running, so it's frustrating. The cheaper mini PCs I bought for other family members' houses are just chugging along with their SSDs, more memory, and faster CPUs.
Is there a guide out there on installing an NVMe drive and reinstalling pfSense on it? I think my drive arrives on Sunday, so that will be one of my weekend tasks.
-
@dstaylor There isn't a guide for the 6100. Try to find some forum threads on that since I seem to recall discussions. It's possible it's not an NVMe drive, too, I don't know, but some other models use SATA. Just in case you want to check before opening it.
-
@SteveITS I did see mention of NVMe for the 6100, and even a couple of pictures others had posted showing the drive installed. I just have not found any guide on doing an install and making sure it selects the new drive.
Guess I'll have my VM of 2.7.2 standing by in case the whole thing dies.
-
It is NVMe in the 4100/6100/8200. It's not hard to fit if you have any experience assembling PCs.
-
@dstaylor I feel your pain. The storage on your 6100 should not wear out so quickly!
Netgate staff say that it is not possible/supported/recommended for an end-user to install an SSD in the 4100 and 6100, so there is no official documentation.
"It's a time of purchase upgrade option."
Why does Netgate misleadingly advertise the slots when they have no use and the user is not supposed to touch them? Well, that is a real puzzler!
So if your Netgate device is 31 to 365 days old, you are SOL. Fortunately, this limitation does not apply to you since your device is out of warranty.
The actual SSD installation process is easy:
- Remove the rubber feet, then remove all 8 Torx screws (2 front, 2 back, 4 under the feet).
- Remove the plastic filler panel that was held in by the screws.
- Gently separate the top and bottom half of your 6100.
- Install the M.2 NVMe drive into the slot.
- Carefully put the top and bottom halves together - pay attention to the LED lights and the 3 plastic "shrouds" that bend around the circuit board.
- Reinstall the screws and attach the feet.
The instructions for the 4200 are pretty similar and cover the software reinstallation process.
This video of opening a 6100 should be helpful.
-
@jimp @stephenw10 @kphillips @marcosm @cmcdonald Would any of you care to comment on this thread? With 4.3k views here and over 60k views on Reddit, I hope that @jwt's comments do not represent Netgate's official and only response to the issues that have been raised.
-
@andrew_cb Some good points have been raised along with actionable suggestions to mitigate the issue. Thanks for the constructive feedback - the issue has our attention.