eMMC Write endurance
-
Hi.
I cannot find any specifics from Netgate about the write endurance of the eMMC in the smaller appliances (SG-1100, SG-2100, SG-3100 and SG-6100).
Since a package like NtopNG, for example, easily writes a sustained 1 MB/s even for a "smallish" home network, the eMMC will VERY likely be the device killer for all these appliances. Even if we are talking "high quality 3D MLC" flash in endurance mode, it will still not last more than 3-4 years at that intensity on so little flash space. If it's just regular MLC flash, it will be dead well within a year at that level of writes.
Can we get some official figures for write endurance on the different models?
-
@keyser As a follow-up, I would like to express my worry about going ZFS on the small eMMC appliances. ZFS adds quite a heavy write overhead on my 6100, and with pfBlockerNG now again seeming to write heavily in "python mode", the eMMC will not last more than a couple of years in my device. And that's with a very trimmed, log-less pfBlockerNG config.
Just for fun: I have 25 active devices on my network - most of them do nothing, and my internet connection is rarely above 800KB/s unless I'm doing direct downloads or streaming is active.
With ZFS, pfBlockerNG in default config with only 4 feeds loaded and NTopNG running, my box averages about 1 MB/s sustained write to the SSD. That is actually more data than is being fetched on the internet connection.
At that rate I calculate my eMMC will last far less than a year. This is going to become a Netgate scandal unless the write endurance of those eMMCs is on an entirely different level than regular SSDs.
Seems I need to go get some SSDs for my boxes to remove this threat of prematurely killing them.
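For anyone wanting to redo the arithmetic behind these estimates, here is a minimal sketch. The TBW ratings in the loop are placeholder assumptions (no official figures are given in this post), and the function name is made up for illustration; only the 1 MB/s write rate comes from the observations above.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def years_until_worn(tbw_rating_tb: float, write_rate_mb_s: float) -> float:
    """Years until `tbw_rating_tb` terabytes have been written at a constant rate.

    Uses decimal units (1 TB = 1,000,000 MB) and ignores write amplification.
    """
    tb_per_year = write_rate_mb_s * SECONDS_PER_YEAR / 1_000_000
    return tbw_rating_tb / tb_per_year


# Hypothetical endurance ratings, NOT official Netgate numbers.
for tbw in (10, 20, 100):
    print(f"{tbw:>4} TBW at 1 MB/s: {years_until_worn(tbw, 1.0):.2f} years")
```

A sustained 1 MB/s adds up to roughly 31.5 TB per year, so any part rated below that figure would be worn out in under a year by this simple model.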
-
@keyser Wondering... would running with the RAM disk option be of any help here? I have a 6100 too and popped an SSD into it. Never worried too much about it but reading your post has got me slightly concerned.
-
-
@luckman212 said in eMMC Write endurance:
@keyser Wondering... would running with the RAM disk option be of any help here? I have a 6100 too and popped an SSD into it. Never worried to much about it but reading your post has got me slightly concerned.
I assume it will help, since that causes /var and /tmp to be memory-based filesystems.
But since it is an "all or nothing" approach that will lose you all your logs and all monitoring info at reboots, it's not really a nice solution either.
If it could just be used for pfBlockerNG's sustained temporary writes to disk (data not destined for logs) it would be great :-)
Likewise with NtopNG. I wish you could enable a memory option to keep all the RRDs in memory and only flush data to disk at shutdown/reboot.
-
Yes it is disappointing that Netgate chooses not to provide detailed endurance specifications for the eMMC components used in their hardware. Although to be fair this is not uncommon.
Anyway I have access to 3 different models of Netgate hardware containing eMMC. Based on FreeBSD dmesg output (dmesg | grep mmc) and a bit of research:
- SG-1000 - Kingston M627 4GB MLC eMMC
  Endurance - not published
- SG-1100 - Sandisk iNAND 7250 (DG4008) 8GB MLC/SLC eMMC
  Listed endurance 20TBW. P/E cycles - MLC 3k ; SLC 30k
- XG-7100 - Kingston M525 32GB MLC eMMC
  P/E cycles - MLC 3k ; pSLC 30k
  (Later variants of this component may be TLC based with similar P/E cycles)
NB these are only samples - it is entirely possible that eMMC components may change across different production runs.
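As a rough sanity check on how P/E cycles relate to a TBW figure, here is a back-of-the-envelope sketch. The formula (capacity times cycles, divided by write amplification) is a common rule of thumb, not something taken from the vendor datasheets, and the write amplification values are pure assumptions since the real ones are unknown.

```python
def theoretical_tbw(capacity_gb: float, pe_cycles: int,
                    write_amplification: float = 1.0) -> float:
    """Rule-of-thumb terabytes written before the rated P/E cycles are used up.

    write_amplification is an assumed factor (host writes vs. NAND writes).
    """
    return capacity_gb * pe_cycles / write_amplification / 1000


# Capacities and MLC cycle counts from the list above; WA values are assumed.
for name, capacity_gb in (("SG-1000", 4), ("SG-1100", 8), ("XG-7100", 32)):
    ideal = theoretical_tbw(capacity_gb, 3000)
    derated = theoretical_tbw(capacity_gb, 3000, write_amplification=2.0)
    print(f"{name}: {ideal:.0f} TB ideal, {derated:.0f} TB at assumed WA of 2")
```

For the 8GB part this gives 24 TB in the ideal case, which is at least in the same ballpark as the listed 20TBW.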
Checking eMMC Wear
- Install mmc-utils package - pkg install mmc-utils
  (package only available in pfSense Plus 21.05 / CE 2.5 & later)
- Check eMMC life time estimation
  mmc extcsd read /dev/mmcsd0rpmb | egrep -i ^emmc
- Life Time Estimation A is SLC NAND (or pseudo SLC) - multiply by 10 to get upper bound of % life used - eg 0x08 is 80%
- Life Time Estimation B is MLC NAND - multiply by 10 to get upper bound of % life used
- Pre EOL status - 0x01 = Normal, 0x02 = Warning (80%+), 0x03 = Urgent (90%+)
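The decoding rules above can be sketched in a few lines of Python. This is only an illustration: the sample text is a made-up example in the format mmc-utils prints, the function names are hypothetical, and treating 0x00 as "not reported" and 0x0B as "exceeded" is my understanding of the JEDEC definition rather than something stated in this post.

```python
import re

PRE_EOL = {0x01: "Normal", 0x02: "Warning (80%+)", 0x03: "Urgent (90%+)"}

# Made-up sample in the format printed by `mmc extcsd read`.
SAMPLE = """\
eMMC Life Time Estimation A [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A]: 0x05
eMMC Life Time Estimation B [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B]: 0x00
eMMC Pre EOL information [EXT_CSD_PRE_EOL_INFO]: 0x01
"""


def parse_extcsd(text: str) -> dict:
    """Map each bracketed EXT_CSD field name to its integer value."""
    pattern = r"\[(EXT_CSD_[A-Z_]+)\]:\s*(0x[0-9a-fA-F]+)"
    return {m.group(1): int(m.group(2), 16) for m in re.finditer(pattern, text)}


def life_used_range(value: int):
    """Return (low, high) percent of life used for 0x01..0x0A, else None."""
    if 0x01 <= value <= 0x0A:
        return ((value - 1) * 10, value * 10)
    return None  # assumed: 0x00 = not reported, 0x0B = estimate exceeded


fields = parse_extcsd(SAMPLE)
for key in ("EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A",
            "EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B"):
    print(key, life_used_range(fields[key]))
print("Pre EOL:", PRE_EOL.get(fields["EXT_CSD_PRE_EOL_INFO"], "unknown"))
```

Each estimate value covers a 10% band, so 0x05 means somewhere between 40% and 50% used, with 50% as the upper bound.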
An example is my lab SG-1000 which has SLC wear @ 100% and MLC wear @ 80%. Bit of a surprise as it is only used as a VPN client test router with no extra packages installed. Recently I enabled RAM disk for /var & /tmp to try to squeeze a few more months of life out of it.
My SG-1100 fleet is still running 2.4.5p1 so I'll start finding out in the next few months (as they are upgraded) how they're going.
Other issues
Generally partition alignment is considered important for SSD/eMMC devices. Based on the 3 samples above:
- SG-1000 - aligned to 4MB - good
- SG-1100 - unaligned. Maybe a quirk of the EspressoBin board? Or perhaps later pfSense Plus recovery images correct this?
- XG-7100 - aligned to 32kB - good
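For reference, an alignment check like the one behind this list can be sketched as below. The start sectors in the example are hypothetical values, not the ones read from these devices, and a 512 byte sector size is assumed; on FreeBSD the real start sectors can be read from, for example, gpart show.

```python
def is_aligned(start_sector: int, alignment_bytes: int,
               sector_size: int = 512) -> bool:
    """True if the partition's byte offset is a multiple of alignment_bytes."""
    return (start_sector * sector_size) % alignment_bytes == 0


FOUR_MIB = 4 * 1024 * 1024
THIRTY_TWO_KIB = 32 * 1024

# Hypothetical start sectors, for illustration only.
for start in (8192, 64, 63):
    print(f"sector {start}: 4MB aligned={is_aligned(start, FOUR_MIB)}, "
          f"32kB aligned={is_aligned(start, THIRTY_TWO_KIB)}")
```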
Also to be noted is that eMMC drives generally support TRIM, but in all cases it was disabled. Again there may be reasons for this (eg TRIM on some older drives is problematic). Having said that TRIM is generally considered useful and so perhaps Netgate could revisit this?
Finally the ZFS elephant is now in the room :-) Only for 64 bit systems but since the SG-1100 only has 1GB RAM that may be optimistic. XG-7100 has plenty of RAM so we will evaluate it. In theory the possibility of ZFS integrity & snapshots is very interesting but not if it wears the eMMC faster than UFS. Hopefully Netgate may publish some more technical information on how their ZFS implementation is tuned for pfSense.
-
-
@keyser said in eMMC Write endurance:
will loose you all your logs, and all monitoring info at reboots
Not all, 1-24 hours as specified. Screen cap:
-
@steveits said in eMMC Write endurance:
@keyser said in eMMC Write endurance:
will loose you all your logs, and all monitoring info at reboots
Not all, 1-24 hours as specified. Screen cap:
Yes, I know, but that still loses the important data (pfBlockerNG and NtopNG).
-
@dugeem said in eMMC Write endurance:
Yes it is disappointing that Netgate chooses not to provide detailed endurance specifications for the eMMC components used in their hardware. Although to be fair this is not uncommon.
Thank you for the absolutely excellent post :-)
Since we are VERY likely at the limit of the write endurance on these eMMCs, and I'd rather be safe than sorry, I have ordered a 512 GB SSD for both my SG-2100 and 6100.
That way I can allow myself to enable all the logging and monitoring I want (i.e. what the box can handle when it comes to the SG-2100). So this post is now a reminder to people looking - and perhaps for Netgate to answer officially - in the future :-)
-
-
@dugeem said in eMMC Write endurance:
Check eMMC life time estimation
mmc extcsd read /dev/mmcsd0rpmb | egrep -i ^emmc
This tool came up in a previous thread talking about writes to eMMC and SSDs, etc., and ZFS writing all the time.
I have an SG-4860, and I don't see any such mmcs* in my /dev dir. I see da0 and 0p1, 0p2 and 0p3, but I don't see how to use this tool to check the eMMC wear. I do believe my SG-4860 is supposed to have a 4GB eMMC; I sure didn't put any SSD into it.
I see this in my dmesg
da0 at umass-sim0 bus 0 scbus6 target 0 lun 0
da0: <Generic Ultra HS-COMBO 1.98> Removable Direct Access SCSI device
da0: Serial Number 000000225001
da0: 40.000MB/s transfers
da0: 29184MB (59768832 512 byte sectors)
da0: quirks=0x2<NO_6_BYTE>
I should probably open this up, because clearly I have more than a 4GB eMMC since it shows a 23GB disk size. That also means I could probably replace it, which seems good. Now if I could just check how much wear it has on it ;)
-
@rcoleman-netgate said in updated to 22.01 - SG1100 high CPU usage '/sbin/pfctl -vvsr':
Most users that are [using Snort] on an eMMC have stated they were unaware of the recommendation to use an SSD/HDD for that task (and since the 1100 lacks that ability) they had a configuration that destroyed their eMMC in weeks, or months.
So that's disappointing. As pointed out in that thread, https://www.netgate.com/supported-pfsense-plus-packages lists several as "requires SSD" including NtopNG, and others as SSD recommended. I was unaware of that list.
I checked one of our older (Oct 2017) 3100s that isn't using IDS and I get:
eMMC Life Time Estimation A [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A]: 0x05
eMMC Life Time Estimation B [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B]: 0x00
eMMC Pre EOL information [EXT_CSD_PRE_EOL_INFO]: 0x01
So that's 50% life used and 0% life used?
A more recent (Oct 2020) 2100 using Snort but with very few alerts shows:
eMMC Life Time Estimation A [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A]: 0x01
eMMC Life Time Estimation B [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B]: 0x01
A client's 3100 from Nov 2017 that is using Suricata and OpenVPN shows:
eMMC Life Time Estimation A [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A]: 0x04
eMMC Life Time Estimation B [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B]: 0x00
For reference, per https://forum.netgate.com/topic/170081/gui-services-in-the-system-log-are-filled-with-nginx-messages/8 the web server logging can be turned off between pfSense updates by editing /etc/inc/system.inc.
After installing the mmc-utils package, if the path hasn't been rescanned, use /usr/local/sbin/mmc.
Edit: we also normally turn off a lot of logging like the four "Log firewall default blocks" options.
-
@johnpoz said in eMMC Write endurance:
I have a sg4860, and I don't see any such mmcs* in my /dev dir - I see da0 and 0p1, 0p2 and 0p3 but I don't see how to use this tool to check the eMMC wear, which I do believe my sg4860 is suppose to have a 4GB eMMC, I sure didn't put any ssd into it, etc.
Weird - that looks like a USB-attached drive. According to the Netgate SG-4860 specs, they started fitting larger 32GB eMMC drives from the end of 2015.
Certainly older eMMC components (pre eMMC v5.0 IIRC) may not support the lifetime estimates.
-
FYI I've opened a Redmine feature request to add mmc-utils to base images:
https://redmine.pfsense.org/issues/12860
Hopefully also a simple GUI wrapper for lifetime estimates & EOL info.
-
@dugeem said in eMMC Write endurance:
Weird - that looks like an USB attached drive
Exactly that. In the RCC-VE platform devices the eMMC is USB attached, and mmc-utils cannot read it directly.
Some of the 1100 drives cannot be read either due to the eMMC version.
Steve
-
-
-
-
@keyser said in New Netgate Appliance for IPS/IDS:
@steveits said in New Netgate Appliance for IPS/IDS:
The other 3100 (40%) is 3 days 7 hours uptime and:
device     r/s   w/s   kr/s   kw/s  ms/r  ms/w  ms/o  ms/t  qlen  %b
flash/sp     0     0    0.0    0.0     7     0     0     7     0   0
mmcsd0       0     0    0.5   29.1     2     7     0     7     0   0
mmcsd0bo     0     0    0.0    0.0     0     0     0     0     0   0
mmcsd0bo     0     0    0.0    0.0     0     0     0     0     0   0
md0          0     0    0.0    0.0     0     0     0     0     0   0
Probably would be better to wait a few weeks and do the math. :)
Yes, a long uptime would be much better. The numbers posted for this box are more in line with the 11-12 TB write endurance I guesstimated for the 8GB eMMC.
Also, I forgot we recently enabled the RAM disk feature on that router so the “iostat -x” numbers I quoted here are with the RAM disk active. I have been doing that when upgrading routers to 22.01.
I'll try to remember to check our 3100 in a few weeks.
-
-
-
-
@steveits said in eMMC Write endurance:
For reference, https://docs.netgate.com/pfsense/en/latest/troubleshooting/disk-lifetime.html
Minor note: the above instructions include a step with the csh builtin command rehash:
pkg install -y mmc-utils; rehash
but the GUI Diagnostics->Command Prompt runs sh, not csh, hence:
Shell Output - rehash
sh: rehash: not found
Nevertheless, for others perhaps with a similar setup, the first device I checked, an 18 month old SG-1100 (pfblockerng-dev the only package), reports:
eMMC Life Time Estimation A [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A]: 0x01
eMMC Life Time Estimation B [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B]: 0x04
eMMC Pre EOL information [EXT_CSD_PRE_EOL_INFO]: 0x01
Would appear it has maybe 3 years of life remaining.
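That estimate can be reproduced with a simple linear extrapolation. The sketch below assumes wear keeps accumulating at the same average rate as so far, which is a simplification, and it takes the worse of the two estimates (0x04, i.e. 30% to 40% used) at an age of 18 months; the function name is made up for illustration.

```python
def remaining_months(age_months: float, estimate_value: int):
    """Return (pessimistic, optimistic) months left, assuming linear wear.

    estimate_value is the raw life time estimate, 0x01..0x0A, where each
    step covers a 10% band of life used.
    """
    if not 0x01 <= estimate_value <= 0x0A:
        raise ValueError("estimate must be between 0x01 and 0x0A")
    high = estimate_value * 10   # upper bound of % life used
    low = high - 10              # lower bound of % life used
    pessimistic = age_months * (100 - high) / high
    optimistic = age_months * (100 - low) / low if low > 0 else float("inf")
    return pessimistic, optimistic


worst, best = remaining_months(18, 0x04)
print(f"between {worst:.0f} and {best:.0f} months remaining")
```

With these inputs the range works out to 27 to 42 months, so "maybe 3 years" sits inside it.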
-
Also per the above posts and https://docs.netgate.com/pfsense/en/latest/troubleshooting/disk-writes.html "...if there is enough RAM to spare, using RAM disks will drastically reduce disk writes over time."
One note on the RAM Disk feature, that doc page says it will preallocate the RAM. However with the RAM disk now using tmpfs, it only allocates RAM as files are written to it.
-
@steveits said in eMMC Write endurance:
Also per the above posts and https://docs.netgate.com/pfsense/en/latest/troubleshooting/disk-writes.html "...if there is enough RAM to spare, using RAM disks will drastically reduce disk writes over time."
One note on the RAM Disk feature, that doc page says it will preallocate the RAM. However with the RAM disk now using tmpfs, it only allocates RAM as files are written to it.
Hi Steve
I must admit I have not tested the ramdisk feature thoroughly - probably mostly because I lack detailed understanding of the “collateral dataloss” it will cause.
I get that you can have RRD data, DHCP leases and system logs flushed to disk periodically, but since I use pfBlockerNG and NTopNG for historical logs and trend analysis, I have never bothered really testing RamDisk.
I assume it is still true that they will lose all logs and trend data at every reboot if you use RamDisk?
-
@keyser said in eMMC Write endurance:
loose all logs and trend data at every reboot if you use RamDisk
I missed your question, sorry. Yes and no... on the System/Advanced/Miscellaneous page the "Periodic RAM Disk Data Backups" section covers how often that info is written to disk. Per the Netgate doc on RAM disks, "Data for both is saved during a proper shutdown or reboot, and also periodically if configured." By "both" I think it means RRD and DHCP (mentioned in the previous sentence)? Possibly /tmp and /var but I suspect it would be up to a package to copy their own files...? Not really sure, there. I just logged into a backup router to generate a system log entry, rebooted, and the log entry for my login was still there, along with a few others for Suricata and pfBlocker processes stopping.
So an unexpected power off is the main risk. Also, RAM disks should be easier on UFS drives in terms of file system corruption during power loss.
@steveits said in eMMC Write endurance:
remember to check our 3100 in a few weeks
After 14 Days 10 Hours uptime, the 3100 with the RAM disk active and without IDS:
iostat -x
extended device statistics
device     r/s   w/s   kr/s   kw/s  ms/r  ms/w  ms/o  ms/t  qlen  %b
flash/sp     0     0    0.0    0.0     7     0     0     7     0   0
mmcsd0       0     0    0.1    4.8     1     4     0     4     0   0
mmcsd0bo     0     0    0.0    0.0     0     0     0     0     0   0
mmcsd0bo     0     0    0.0    0.0     0     0     0     0     0   0
I had also found the "Ignore denied clients" option in DHCP server which reduced log writing somewhat.
-
@steveits Thanks Steve.
Well your 3100 will last a lifetime with that - almost non-existent - write intensity to the eMMC. No doubt the RAM disk has a profound impact on this issue.
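To put numbers on that, here is a small sketch converting the iostat kw/s column into writes per year. It assumes kw/s is kilobytes of 1024 bytes averaged over the uptime, and it uses the 11 TB guesstimate from earlier in the thread as the endurance figure, which is not a published rating.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def tb_written_per_year(kw_per_s: float, bytes_per_kb: int = 1024) -> float:
    """Project an average iostat kw/s value to decimal terabytes per year."""
    return kw_per_s * bytes_per_kb * SECONDS_PER_YEAR / 1e12


ASSUMED_ENDURANCE_TB = 11  # guesstimate from this thread, not a datasheet value

# 29.1 = earlier 3100 sample, 4.8 = the 3100 after 14 days with RAM disk.
for label, kw in (("earlier sample", 29.1), ("RAM disk, 14 days", 4.8)):
    per_year = tb_written_per_year(kw)
    years = ASSUMED_ENDURANCE_TB / per_year
    print(f"{label}: {per_year:.2f} TB/year, about {years:.0f} years to 11 TB")
```

By this estimate 4.8 kw/s is about 0.16 TB per year, which is decades of headroom even against a pessimistic endurance figure.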
I’ll see if I can find the time to investigate and test RAMdisk further.