Another Netgate with storage failure, 6 in total so far
-
Well you can lose /var and /tmp entirely (if they are ramdisks) and the system will still boot back and replace them.
-
@stephenw10 said in Another Netgate with storage failure, 6 in total so far:
Well you can lose /var and /tmp entirely (if they are ramdisks) and the system will still boot back and replace them.
So, you agree with me, letting /tmp AND /var "out" of sync would do no harm?
The fact is, since 2021 I have had reboots but no problems with these settings. I have not had a power failure, because my pfSense is on a UPS.
-
Side note: I keep thinking about a smart Optane NVMe drive that leverages RAM for the constant rewrites and uses a capacitor/battery to write to SSD on power failure. Some Symantec and Veritas systems, as well as Tintri systems, have such a capacitor. 4200 systems need the NVMe drive. The needs of zfs are burning up onboard components. Put the specific items that burn up drives with ZFS in this type of environment in the RAM of the NVMe.
-
@stephenw10 said in Another Netgate with storage failure, 6 in total so far:
Well you can lose /var and /tmp entirely (if they are ramdisks) and the system will still boot back and replace them.
@fireodo said in Another Netgate with storage failure, 6 in total so far:
So, you agree with me, letting /tmp AND /var "out" of sync would do no harm?
That's not the same thing. Use of ramdisks is a configuration driven thing, and the system knows that /var must be recreated at boot. There is a lot of data stored in /var, and if it's corrupted you could encounter any number of problems.
IMO, you're fine letting /tmp go to pot, but not /var.
-
Yup when ramdisks are enabled it triggers a bunch of things at boot and shutdown. But it might be possible....
-
@JonathanLee said in Another Netgate with storage failure, 6 in total so far:
needs of zfs are burning up onboard components
I'm not trying to say people aren't having problems, I'm trying to understand "why" they are.
What needs of ZFS are causing this?
Burning up these kinds of components is related to the number of erase cycles, which circles back around to writes.
What parts of pfSense are doing a lot of writes? I can see persistent logging, maybe "check for updates", maybe the atime property being enabled? Updating the persistent store for block lists?
The way I look at it, once the system is configured and running, it should be doing mostly reads from the filesystem and running from memory.
-
@stephenw10 said in Another Netgate with storage failure, 6 in total so far:
Be good if you could set it for /tmp only.....
It would be really good if you could simply do ramdisk for /tmp only. No need to save/restore.
-
@mer said in Another Netgate with storage failure, 6 in total so far:
What parts of pfSense are doing a lot of writes?
Some packages (https://www.netgate.com/supported-pfsense-plus-packages), logging of default block rules, IGMP block logging, logging set in packages, updates of block lists and country lists, nginx access log (dashboard widgets), and similar.
-
@SteveITS What files can use a linker file to point to a USB drive? That model can use a USB drive, right?
-
@SteveITS
Hmm, so enabling compression "on the fly" in the case of logs can significantly reduce writes, yes?
zfs set compression=lz4 pfsense/var/log
Some log compression options can be enabled via the GUI, but I don’t think they use "on-the-fly" compression.
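Before changing anything, it may be worth checking whether compression is already active on that dataset. A minimal sketch, assuming the dataset really is named pfsense/var/log as in the command above (the function name is made up for illustration):

```shell
#!/bin/sh
# show_log_compression [DATASET]
# Prints the compression setting and the achieved compression ratio.
# The default dataset name is taken from the command quoted above and is
# an assumption; adjust it to match the output of `zfs list`.
show_log_compression() {
    dataset="${1:-pfsense/var/log}"
    if ! command -v zfs >/dev/null 2>&1; then
        echo "zfs not found; skipping"
        return 0
    fi
    # `compression` is the configured algorithm, `compressratio` is the
    # ratio actually achieved on the data stored so far.
    zfs get compression,compressratio "$dataset"
}

# Usage (on the firewall's shell):
#   show_log_compression
#   show_log_compression pfsense/var
```

If compression already shows as enabled, setting it again on the log dataset is unlikely to reduce writes further.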
-
@SteveITS said in Another Netgate with storage failure, 6 in total so far:
logging of default block rules, IGMP block logging
These two can be quite voluminous, but are easy to address:
- Add a rule on Local to pass IPv4+IPv6 IGMP with IP options set. I think this should actually be a default rule in pfSense.
- Disable logging of packets blocked by the default rule in Firewall Logs. There are often thousands of these every day, and the individual log entries really aren't of much value.
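To get a feel for how many block entries are actually being written before and after making those two changes, something like the following could be used. It is only a sketch: it assumes the firewall log is plain text and that blocked packets appear with a comma-separated `block` field, and the function name is made up for illustration.

```shell
#!/bin/sh
# count_blocked FILE
# Prints how many lines in FILE contain a ",block," field.
# The field layout is an assumption about the firewall log format;
# check a few lines of your own log before trusting the number.
count_blocked() {
    awk '/,block,/ { n++ } END { print n + 0 }' "$1"
}

# Usage (the path is an assumption):
#   count_blocked /var/log/filter.log
```

Only the current log file is counted; rotated logs are not included.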
-
@JonathanLee Oh I have no idea. :)
@w0w said in Another Netgate with storage failure, 6 in total so far:
Some log compression options can be enabled via the GUI, but I don’t think they use "on-the-fly" compression.
Yes it does: https://docs.netgate.com/pfsense/en/latest/monitoring/logs/index.html#log-format
"ZFS already compresses this data"
@dennypage said in Another Netgate with storage failure, 6 in total so far:
easy to address
Yep, mentioned above. In a link maybe, it's been a long thread. We actually don't pass the IGMP, since it's "supposed" to be blocked (always has been) we add a block rule that is set to not log. Otherwise IGMP is logged even if the logging for the default block rule is off.
-
@SteveITS said in Another Netgate with storage failure, 6 in total so far:
We actually don't pass the IGMP, since it's "supposed" to be blocked (always has been) we add a block rule that is set to not log.
I would not say IGMP is supposed to be blocked, and it's rather inefficient to do. Multicast flooding is not desirable, even if it's only mDNS.
Of course, if IGMP is completely disabled in your switches, it doesn't matter. But if it is disabled in your switches, you won't see the IGMP messages to begin with.
-
@dennypage rephrasing, pfSense blocks them by default.
https://docs.netgate.com/pfsense/en/latest/troubleshooting/log-filter-blocked.html#packets-with-ip-options
-
@SteveITS said in Another Netgate with storage failure, 6 in total so far:
rephrasing, pfSense blocks them by default.
Yea, that's why I called out that pfSense should add a default pass rule for IGMP.
Blocking packets with IP options that are to be forwarded is a good default, however IGMP isn't forwarded. Blocking IGMP by default makes little sense.
-
@dennypage @SteveITS I had commented on redmine 15400 but since it was closed I guess that my message went unnoticed.
I have created a new redmine 16068 for adding options to disable logging of packets with IP options.
-
Thanks to @andrew_cb and others for bringing awareness to this. I had no idea my 6100 has limitations due to the eMMC. I went out of my way to buy a 6100 over my own router build because I just wanted to set up my router and forget about it. As someone who is fully remote, the router is the last thing I can have fail.
I saw the threads on Reddit and did a quick check. Just over 1.5 years of having my 6100, it’s already at 70% wear.
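For a rough sense of what that implies, here is a back-of-the-envelope projection. It assumes wear accumulates linearly, which is a simplification (real wear depends on the write pattern), and the function name is made up for illustration:

```shell
#!/bin/sh
# months_remaining USED_PERCENT MONTHS_IN_SERVICE
# Linear extrapolation: if USED_PERCENT of the rated life was consumed in
# MONTHS_IN_SERVICE, print how many months the remaining life would last
# at the same rate. Linear wear is an assumption.
months_remaining() {
    awk -v used="$1" -v months="$2" 'BEGIN {
        if (used <= 0 || months <= 0) { print "n/a"; exit }
        rate = used / months              # percent of life used per month
        printf "%.1f\n", (100 - used) / rate
    }'
}

months_remaining 70 18   # 70% after about 1.5 years: prints 7.7
```

So at the same write rate, the remaining 30% would last well under a year.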
I bought 2x16GB Intel Optane drives, which you can get for less than 5 euro apiece, and managed to get them installed and set up in a mirror for redundancy (the drives are so cheap, I think it’s silly not to). I also zeroed out my eMMC drive to ensure it doesn’t cause any conflicts.
I’m not thrilled that I had to do this, but I’m thankful there were M.2 ports on the 6100 that I could use. My biggest takeaway is that it is unacceptable that installing your own SSD is not “supported” and could void your warranty. I think there should be a well-documented SSD upgrade for any device that has an available slot; it should not void the warranty, and it most definitely shouldn’t be discouraged.
As a side note, I really wish the installer was offline. I was sweating bullets attempting to configure the WAN in the installer with PPPoE and VLAN tagging (don’t get me wrong, it was easy, but if it didn’t work I’d be SOL).
-
@kingsleyadam I am glad you discovered the storage wear on your 6100 and installed an SSD before you experienced a sudden failure!
I had no idea my 6100 has limitations due to the eMMC. I went out of my way to buy a 6100 over my own router build because I just wanted to set up my router and forget about it. As someone who is fully remote, the router is the last thing I can have fail.
Your comment is exactly what this thread is about.
There have been many good suggestions in this thread on ways to reduce the wear of the onboard eMMC, but they do not address the main point of this thread:
If any usage assumptions or limitations are not clearly stated upfront or in the documentation, then it does not matter what the technical reasons are, how valid they are, or what workarounds are available!
You cannot advertise a ladder as great for construction work and then not disclose that it has a 100-pound weight limit, just like you cannot sell a manual transmission vehicle without an instrument panel and then say it is the user's fault when the engine blows up.
If Netgate sold ladders like they sell firewalls, what kind of chaos would result from people using the Netgate Ladder-4100 BASE version?
If there are limitations, recommendations, or "best practices" regarding firewalls with eMMC storage, then state them clearly on the product page and conspicuously in the documentation! That would significantly reduce or even eliminate this entire problem.
It has been nearly two months since Netgate acknowledged the issue, and there have been no changes. I do not understand why Netgate refuses to spend an hour copying and pasting an informational blurb to the store product pages.
-
It has been nearly two months since Netgate acknowledged the issue, and there have been no changes.
[literally responding from a Starbucks in So Colorado at 5:30 on a Friday]
You are wrong. There are a lot of changes in progress, but I’m not getting into this with you, here, right now.
-
@andrew_cb said in Another Netgate with storage failure, 6 in total so far:
just like you cannot sell a manual transmission vehicle without an instrument panel and then say it is the user's fault when the engine blows up.
I have several old Toyota trucks with manual transmissions that do not have tachometers.