Another Netgate with storage failure, 6 in total so far
-
It looks like the ZFS record size (the maximum block size) can be changed with the command
sudo zfs set recordsize=[size] [dataset name]
However, this will not change the block size of existing files; it only takes effect for files created after the change.
In theory, this should help with log files since a new file is created at rotation.
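As a rough sketch (the dataset name is an assumption; check zfs list for the actual layout and pool name on your install), that could look like this for the log dataset:

```
# Find the dataset that holds /var/log (layout and pool name vary by install).
zfs list -o name,mountpoint,recordsize

# Hypothetical example: use a smaller record size for the log dataset so that
# small, frequent log appends allocate smaller blocks. 16K is illustrative.
zfs set recordsize=16K zroot/var/log

# Existing files keep their old block size; only files written after the
# change (e.g. logs created at the next rotation) use the new record size.
zfs get recordsize zroot/var/log
```
-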
@andrew_cb said in Another Netgate with storage failure, 6 in total so far:
For example, if a log file is written every 1 second, then in the span of 5 seconds there could be 640 KB (5 x 128 KB blocks) of data to write, just for that single log file.
Maybe, but it could also be one block if all of the updates fit into a single block (there is also compression).
(Edit: 5 updates to one log file, or to 5 log files.) Note that the amount of metadata changes with recordsize (smaller records mean more blocks per file), as do things like write speed.
It stands to reason, though, that most of the writing on pfSense is log files or periodic file updates like pfBlocker feeds.
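For what it's worth, it is easy to check what a dataset is actually doing on a given box (a sketch; substitute the real dataset and file names from your system):

```
# Current record size, compression setting, and achieved compression ratio
# for the log dataset (dataset name is an assumption).
zfs get recordsize,compression,compressratio zroot/var/log

# Compare a log file's apparent size against the space actually allocated.
du -A /var/log/system.log   # apparent size
du /var/log/system.log      # blocks allocated on disk
```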
-
I was able to do some testing of different zfs_txg_timeout values. I am using Zabbix to call a PHP script every 60 seconds and store the value in an item. Attached to the item is a problem trigger that fires whenever the value changes. The data is visualized using Grafana, with an annotation query that overlays the "problems" (each time the value changes) as vertical red lines. There is a slight delay before Zabbix detects the change, so the red lines do not always align exactly with changes on the graph.
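For anyone who wants to reproduce this without Zabbix, roughly the same values can be sampled from the shell (a sketch, not the exact script; I am assuming the pool is named pfSense, older installs use zroot):

```
# Current transaction group timeout.
sysctl -n vfs.zfs.txg.timeout

# Average write bandwidth to the pool over a 60-second window, in bytes/s.
# -H = script-friendly output, -p = exact parseable numbers; the second
# sample covers the 60s interval (the first is the since-boot average).
zpool iostat -Hp pfSense 60 2 | tail -n 1 | awk '{print $7}'
```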
The testing was performed on 3 active firewalls handling basic functions, so they are representative of real-world performance in our environment and likely many others. Notice how the graphs for all 3 firewalls are nearly identical, reinforcing that this behavior is driven by the underlying OS.
Changing from 5s to 120s results in a 90% reduction in write activity. The change percentages shown on the graph are a bit low because the baseline includes some data from the earlier 10s setting, from before it was changed to 5s at the start of the test period. Interestingly, values of 45 and 90 seem to produce more variation in the write rate than the other values.
Looking at just the change from 5s to 10s, the average write rate has been reduced by 35%.
Changing from 10s to 15s gives a further 30% average reduction in write rate.
There are diminishing returns for values above 30.
The gain from 15s to 120s is 32%: specifically, 15s to 30s is 16%, and 30s to 120s is an additional 16%.
Calculating for a 16GB storage device with a TBW of 48 gives the following:
| timeout (s) | write rate (KB/s) | change | lifespan (days) | lifespan (years) |
| ----------- | ----------------- | -------- | --------------- | ---------------- |
| 5           | 309               | baseline | 656             | 1.8              |
| 10          | 175               | -43%     | 1159            | 3.2              |
| 15          | 129               | -58%     | 1572            | 4.3              |
| 20          | 104               | -66%     | 1950            | 5.3              |
| 30          | 81                | -74%     | 2503            | 6.9              |
| 45          | 66                | -79%     | 3072            | 8.4              |
| 60          | 59                | -81%     | 3437            | 9.4              |
| 90          | 39                | -87%     | 5199            | 14.2             |
| 120         | 31                | -90%     | 6541            | 17.9             |
Based on this data, changing the default value of zfs_txg_timeout to 10 or 15 will double the lifespan of devices with eMMC storage without causing too much data loss in the event of an unclean shutdown.
This change, combined with improved eMMC monitoring, would nearly eliminate unexpected eMMC failures due to wearout.
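For anyone who wants to try this on their own units now, a sketch of how to change it (15 is just an example value):

```
# Apply immediately on a running system; takes effect for the next txg.
sysctl vfs.zfs.txg.timeout=15

# Persist across reboots: either add the tunable under
# System > Advanced > System Tunables in the GUI, or set it as a
# loader tunable from the shell:
echo 'vfs.zfs.txg.timeout=15' >> /boot/loader.conf.local

# Verify the active value.
sysctl -n vfs.zfs.txg.timeout
```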
-
Leaving this here for reference: https://openzfs.github.io/openzfs-docs/Performance%20and%20Tuning/Module%20Parameters.html#zfs-txg-timeout
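The docs describe the Linux module parameter; on FreeBSD (and therefore pfSense) the same knob is exposed as a sysctl. For comparison:

```
# FreeBSD / pfSense: OpenZFS module parameters appear under vfs.zfs.*
sysctl vfs.zfs.txg.timeout

# Linux, for reference: the same parameter as a kernel module parameter.
cat /sys/module/zfs/parameters/zfs_txg_timeout
```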
-
@andrew_cb said in Another Netgate with storage failure, 6 in total so far:
Based on this data, changing the default value of zfs_txg_timeout to 10 or 15 will double the lifespan of devices with eMMC storage without causing too much data loss in the event of an unclean shutdown.
This is a great write-up. Thanks
-
@fireodo said in Another Netgate with storage failure, 6 in total so far:
Edit 23.04.2025: Today I had a power failure that exhausted the UPS battery. The timeout was set at 2400. pfSense shut down as configured in apcupsd, and after power came back, pfSense came up without any issues.
Such a high timeout (40 minutes) doesn't make you start to worry about filesystem resilience/consistency?
I found some interesting remarks on the vfs.zfs.txg.timeout parameter in Calomel's FreeBSD Tuning and Optimization guide (CTRL+F'ing for it).
-
@tinfoilmatt said in Another Netgate with storage failure, 6 in total so far:
high timeout (40 minutes) doesn't make you start to worry about filesystem resilience/consistency?
I'm experimenting ... I don't know what happens if the power suddenly disappears without a clean shutdown ...
-
Is this considered a maintenance/revision item with respect to software development, or a regression? I can hardly imagine the amount of brute-force debugging that must have gone into this. YOU ARE AMAZING!!! Thank you for sharing how you fixed this.
-
We had a 6100 die on Sunday. It was installed in May 2023, which means it lasted 1.9 years, so my calculation of 1.8 years was pretty close! I had recently set zfs.txg.timeout to 60 seconds, but it was clearly too little, too late. I have now set all high-wear devices to 3600s and am hoping they will survive a bit longer until we can upgrade or replace them.
Getting B+M key NVMe drives has been a major challenge - the KingSpec NE 2242 is pretty much the only option and it's hard to get in Canada. Combined with the fact that many devices are at remote sites, upgrading all devices to SSDs is taking much longer than desired.
When I look at storage wear by model, nearly all of the devices at 110% wear are 2-3 year old 4100s and 6100s. The 6-7 year old 3100s and 7100s are mostly at 50% or less. I suspect that the 3100 being limited to UFS and the 7100 having 32GB of eMMC (and having been on ZFS for less than 3 years) explain why they are outlasting the 4100 and 6100.
For some good news, we were able to revive a non-booting 4100 by desoldering the eMMC chip with a heat gun. The eMMC had failed and I had gotten it running from a USB flash drive, but after a few reboots it stopped showing anything on the console and the orange light was pulsing instead of solid. With the eMMC chip removed, it boots to the NVMe drive and works great.
@marcosm is working on a patch to reduce write rates while ensuring configuration changes are immediately committed to disk.
-
@andrew_cb
There are some dumb adapters from 2242 to 2230. It looks like 2230 NVMe drives are a common thing; you can buy them almost everywhere.
Of course, there's no certainty that it will work, but if I had enough devices and no better options available, I would give this approach a try.
-
I've had no issues with the old Western Digital SN520 NVMe drives in my 4100s. The key to finding the older B+M key with a non-SATA interface (SATA seems to be more common) is that SN520 line. Here's my verbatim search, which does pretty well at locating them without limiting it to length or capacity:
"sn520" "nvme" "m.2" "x2"
I usually just go for the full-length 80mm ones by adding "2280" to the search so I don't have to use any of those adapters, which always seem so flimsy.
-
@andrew_cb recordsize is an upper limit; ashift is the lower limit (it should be 4K on modern storage devices, which is a value of 12, is set at pool creation time, and is viewable in the pool properties).
A record will range from one ashift-sized allocation up to recordsize, in multiples of ashift. The bigger the difference between ashift and recordsize, the higher the potential for compression gains, since compression can only work in multiples of ashift within the recordsize.
So if, for example, a write is 7KB, it becomes an 8KB write, not 128KB.
The one change should just be to boost zfs_txg_timeout to something like 60-120 seconds as the default.
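For anyone wanting to check these values on their own pool, a quick sketch (pool name pfSense assumed; older installs may use zroot):

```
# ashift is fixed at pool creation; 12 means 2^12 = 4K minimum allocation.
zpool get ashift pfSense
# If the ashift pool property is not available on your ZFS version,
# the cached pool config shows it per vdev:
# zdb -C pfSense | grep ashift

# recordsize (the upper limit) and compression, per dataset.
zfs get -r recordsize,compression pfSense
```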