Another Netgate with storage failure, 6 in total so far
-
Yesterday, I had ANOTHER 4100 die, just 2 years old. I tried to salvage the situation by remotely walking the customer through installing pfSense onto a USB drive, but unfortunately, after I logged into the fresh installation and restored the configuration, the firewall refused to boot or even power on. There is no console activity and it just sits with a flashing orange LED. This is the second unit to die completely and refuse to power on.
I have tried to defend other pfSense issues as anomalies, but these storage and device failures are now a complete disaster.
No other firewall brand stipulates that storage must be treated gently for it to last.
This limitation might not be so bad if it was clearly visible and disclosed before purchase, but I challenge anyone to find any mention of the limitations/dangers of eMMC storage on the product pages or anywhere other than the 2 troubleshooting articles.
Let's take a look at the 6100 product page:
https://shop.netgate.com/products/6100-base-pfsense
LOW TOTAL COST OF OWNERSHIP
-No artificial limits or add-ons required to make your system fully functional.
-This system is designed for a long deployment lifetime.
GROWS WITH YOU
-From firewall to Unified Threat Management, get all the security features you need to protect your home or business.
-Flexible configuration and support for multi-WAN, high availability, VPN, load balancing, reporting and monitoring, etc.
-Add optional packages such as Snort or Suricata for IDS/IPS and network security monitoring.
EASY GUI MANAGEMENT
-Manage pfSense Plus software settings through our web-based GUI.
-No fumbling with a command-line interface or typing arcane commands.
The only noticeable difference between the BASE and MAX versions is the addition of a 128GB SSD for $100. The product pages mention all the great things that can be done natively and with packages, but they do not mention any storage concerns.
The product pages need a section that describes the limitations of the onboard eMMC storage with links to the "troubleshooting" documents, and advises getting the MAX version for anything other than basic, out-of-the-box usage.
Further, packages and any other system features that write to disk, such as RRD graphs and logs, should include text warning of the storage issues and links to the documentation. The current situation leaves a high risk that a user will make a few simple changes and unknowingly turn their Netgate appliance into a ticking bomb.
To make things worse, the default configuration does not automatically perform disk health monitoring nor does it place the SMART widget on the dashboard. Monitoring eMMC storage requires using the CLI to install a package and run commands manually. Even worse, the storage of the 4200 cannot be monitored at all!
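For anyone who wants to automate the manual check, here is a minimal sketch (not an official Netgate tool) that interprets the eMMC wear registers. The sample text is an assumed example of the life-time lines printed by `mmc extcsd read` from the mmc-utils package, so the exact format on your unit may differ. The value meanings follow the JEDEC eMMC convention: each life-time step is 10% of rated life, and Pre-EOL is 0x01 normal, 0x02 warning, 0x03 urgent.

```python
import re

# ASSUMPTION: example of the wear-related lines from `mmc extcsd read`;
# the values here are made up for illustration, not taken from a real unit.
SAMPLE = """\
eMMC Life Time Estimation A [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A]: 0x01
eMMC Life Time Estimation B [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B]: 0x03
eMMC Pre EOL information [EXT_CSD_PRE_EOL_INFO]: 0x01
"""

# JEDEC Pre-EOL states (reserved-block consumption).
PRE_EOL = {0x01: "normal", 0x02: "warning", 0x03: "urgent"}


def parse_emmc_health(text):
    """Return {register_name: int_value} for every EXT_CSD line found."""
    fields = {}
    for name, value in re.findall(r"\[EXT_CSD_(\w+)\]:\s*(0x[0-9a-fA-F]+)", text):
        fields[name] = int(value, 16)
    return fields


def describe_life(value):
    """Translate a life-time estimation value into a human-readable range."""
    if 1 <= value <= 10:
        return f"{(value - 1) * 10}-{value * 10}% of rated life used"
    if value == 11:
        return "rated life exceeded"
    return "undefined"


if __name__ == "__main__":
    health = parse_emmc_health(SAMPLE)
    for key in ("DEVICE_LIFE_TIME_EST_TYP_A", "DEVICE_LIFE_TIME_EST_TYP_B"):
        print(key, "->", describe_life(health[key]))
    print("PRE_EOL_INFO ->", PRE_EOL.get(health["PRE_EOL_INFO"], "undefined"))
```

Something like this could be fed from a scheduled job and pushed to whatever monitoring you already run, but the collection side is left out because it depends on the device.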
I have read recommendations from Netgate staff to disable logging the default firewall ruleset to reduce storage wear.
Enabling RAM disks could help alleviate these issues, but then all logging is lost on reboot. This would be a possible solution if the general system log was kept on disk for troubleshooting and security monitoring, but some things (like ARP watch, gateway issues etc) can still flood the log and cause disk writes.
We like pfSense, and the Netgate hardware is fine, but the Achilles heel is the eMMC storage, which is simply unfit for purpose. There are many posts online and here in the forums of people with similar issues.
For a business-class device, the onboard storage device and limitations do not make sense.
My management team is concerned and we are looking at solutions for our entire fleet of 45 Netgates before more fail and cause disruption to our customers.
If there is something we are overlooking, I would be happy to hear any suggestions.
-
-
Including the eMMC monitoring package with pfSense was requested 3 years ago but so far still has not been done:
-
@andrew_cb I agree with you. I make it a point to only deploy MAX versions of the Netgate due to the storage. The lowest spec I would go is the 4200, and it must be MAX.
On top of the issues you mentioned, you have to take care about the amount of logging that you do. In my case, every single rule created is logged. That's the policy. My 6100s would've died a year into service, but because they are running SSDs, I am 2 years in without any issues. If you are logging heavily or just have lots of I/O to your drive for whatever reason, selecting a Netgate with eMMC is going to cause you lots of pain. The only exception I make is the 1100. That's a very cheap device you throw into a closet somewhere in a cafe, not in a datacenter, so the risks I take with it are worth taking.
I think it's a huge flaw to advertise devices with eMMC storage. The standard should be SSD drives, full stop.
-
@andrew_cb said in Another Netgate with storage failure, 6 in total so far:
Enabling RAM disks could help alleviate these issues, but then all logging is lost on reboot. This would be a possible solution if the general system log was kept on disk for troubleshooting and security monitoring, but some things (like ARP watch, gateway issues etc) can still flood the log and cause disk writes.
You should be sending logs to a remote syslog server to be fair.
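To give an idea of how little is needed on the receiving end, here is a minimal sketch of a UDP syslog collector. It is only an illustration, not a replacement for a real syslog daemon: the `LOG_PATH` file name and port 5514 are hypothetical choices (the standard syslog port is 514, which needs elevated privileges), and the firewall's remote logging settings would have to point at whatever host and port you pick.

```python
import socketserver

LOG_PATH = "firewall.log"  # hypothetical output file for this sketch


def parse_pri(message):
    """Split a syslog line's leading <PRI> into (facility, severity, rest).

    PRI encodes facility * 8 + severity. Returns None if no valid PRI is found.
    """
    if not message.startswith("<"):
        return None
    end = message.find(">")
    if end == -1 or not message[1:end].isdigit():
        return None
    pri = int(message[1:end])
    return pri // 8, pri % 8, message[end + 1:]


class SyslogHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # For UDP servers, self.request is a (data, socket) pair.
        data, _sock = self.request
        line = data.decode("utf-8", errors="replace").strip()
        with open(LOG_PATH, "a", encoding="utf-8") as fh:
            fh.write(f"{self.client_address[0]} {line}\n")


def serve(host="0.0.0.0", port=5514):
    """Listen for syslog datagrams and append them to LOG_PATH until interrupted."""
    with socketserver.UDPServer((host, port), SyslogHandler) as server:
        server.serve_forever()
```

Calling `serve()` starts the listener. With logs landing off-box, the firewall itself can run with RAM disks or reduced local logging without losing history on reboot.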
@andrew_cb said in Another Netgate with storage failure, 6 in total so far:
Even worse, the storage of the 4200 cannot be monitored at all!
If this is for a business deployment then not selecting this model is a no brainer. If there are no tools to monitor the health of the device including SNMP related monitoring for it, then don't deploy it. You are putting clients at risk putting an unreliable and un-monitorable solution in their environment.
-
You should be sending logs to a remote syslog server to be fair.
I definitely agree, but again, there is no indication that just the regular device logs are a threat to longevity. It would also require more infrastructure setup than comparable devices. For example, SonicWall, Sophos, Fortinet, Meraki, etc. devices can do the same kind of logging for 10 years without an issue. Filling up the storage space is one thing, but having it outright die in 2-3 years due to logging is ridiculous.
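For a rough feel of why write volume matters so much, here is a back-of-the-envelope sketch. Every number in it is a hypothetical placeholder, not a measured or published figure for any Netgate model: the capacity, the rated program/erase cycles, the write amplification factor, and the daily write volume are all assumptions you would need to replace with real values.

```python
def estimated_lifetime_years(capacity_gb, pe_cycles, daily_writes_gb,
                             write_amplification=1.0):
    """Estimate flash lifetime in years from a simple endurance model.

    Total flash writes available = capacity * rated P/E cycles.
    Host writes are multiplied by the write amplification factor
    before they count against that budget.
    """
    if daily_writes_gb <= 0 or write_amplification <= 0:
        raise ValueError("daily_writes_gb and write_amplification must be positive")
    total_flash_writes_gb = capacity_gb * pe_cycles
    flash_writes_per_day_gb = daily_writes_gb * write_amplification
    return total_flash_writes_gb / flash_writes_per_day_gb / 365


if __name__ == "__main__":
    # HYPOTHETICAL inputs: 16 GB part, 3000 P/E cycles, write amplification of 4.
    for daily_gb in (1, 5, 10, 20):
        years = estimated_lifetime_years(16, 3000, daily_gb, 4)
        print(f"{daily_gb:>2} GB/day -> about {years:.1f} years")
```

The model is linear, so doubling the daily writes halves the estimate. Small, frequent log writes can also push write amplification well above what bulk writes would cause, which is one reason logging workloads are harder on small flash than the raw volume suggests.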
If this is for a business deployment then not selecting this model is a no brainer. If there are no tools to monitor the health of the device including SNMP related monitoring for it, then don't deploy it. You are putting clients at risk putting an unreliable and un-monitorable solution in their environment.
The product page has no mention of the inability to monitor the 4200's eMMC storage, which is a loss of functionality compared to the 4100 and 6100 it replaces. Unfortunately, we've already purchased and deployed several 4200s (including 3 to replace failed 4100s). I completely agree about not deploying devices that cannot be monitored. We have Zabbix on all pfSense firewalls and it works great.
I agree with you. I make it a point to only deploy MAX versions of the Netgate due to the storage. The lowest spec I would go is the 4200, and it must be MAX.
On top of the issues you mentioned, you have to take care about the amount of logging that you do... My 6100s would've died a year into service, but because they are running SSDs, I am 2 years in without any issues. If you are logging heavily or just have lots of I/O to your drive for whatever reason, selecting a Netgate with eMMC is going to cause you lots of pain.
Would it be correct to assume that you learned these issues the hard way and have experienced storage failures in the past before switching to only MAX versions?
My main gripe is the complete lack of information, warnings, or disclaimers prior to purchasing and during general usage. There is no way for a reasonable person to know about the risks with the onboard eMMC storage until it is too late.
-
@andrew_cb said in Another Netgate with storage failure, 6 in total so far:
Would it be correct to assume that you learned these issues the hard way and have experienced storage failures in the past before switching to only MAX versions?
I have been a pfSense user for quite some time. I am on the forums here and on Reddit. Unreliable eMMC storage is a tale as old as time, so once I was going the MSP route I already knew what not to do based on other users' experiences.
Should there be a warning in the marketing? I don't know... eMMC may work really well depending on the deployment. Arista 7050CX3 switches have eMMC storage. Enterprise-grade vendor putting in crappy storage. Then again, there isn't heavy writing to the storage on a switch, but I am just trying to illustrate that putting these parts in a networking device isn't uncommon. As I mentioned, I have shoved 1100s in a corner at a cafe and had no issues for years. I also tune the logging down significantly.
-
@andrew_cb it’s not a product page but I think you’re asking for https://www.netgate.com/supported-pfsense-plus-packages
FWIW I don’t recall that we’ve ever had storage failure at any of our clients. Obviously, situations/setups can differ.
Also maybe useful for readers:
https://docs.netgate.com/pfsense/en/latest/troubleshooting/disk-lifetime.html