Netgate Discussion Forum

Another Netgate with storage failure, 6 in total so far

Official Netgate® Hardware
  • S
    SteveITS Galactic Empire @michmoor
    last edited by Feb 8, 2025, 10:23 PM

    @michmoor A RAM disk seems unlikely to help a low memory problem. You could limit writes as noted above though.

    Pre-2.7.2/23.09: Only install packages for your version, or risk breaking it. Select your branch in System/Update/Update Settings.
    When upgrading, allow 10-15 minutes to restart, or more depending on packages and device speed.
    Upvote 👍 helpful posts!

    • A
      arri
      last edited by arri Feb 8, 2025, 10:30 PM Feb 8, 2025, 10:27 PM

      Fortunately, all of my 4100s are using only a small portion of their RAM (especially since I can't really run any additional packages that cause eMMC wear, now that I know better), whereas they exhausted 100% of their eMMC writes in just a couple of years. So the RAM disk is a helpful stopgap until I am able to physically access the devices again.

      Unfortunately, it looks like the reboot necessary to engage the ram disk probably took out another one today!

      • W
        w0w @stephenw10
        last edited by Feb 9, 2025, 5:49 AM

        @stephenw10 said in Another Netgate with storage failure, 6 in total so far:

        My edge device here is a 3100

        This one does not use ZFS, does it?
        And I also noticed that pfSense very often incorrectly displays the actual size of /tmp and /var.

        @SteveITS said in Another Netgate with storage failure, 6 in total so far:

        Recent versions of pfSense don’t allocate the RAM disk space until it’s used, so it’s more flexible.

        Yep, but for some reason (like a huge syslog file), I have run out of space several times.

        • W
          w0w
          last edited by Feb 9, 2025, 6:56 AM

          And I want to repeat once again: the problem is not whether the RAM disk is enabled, whether to enable it, or how to do it. The problem is that disk wear goes unnoticed by the user, and they only start paying attention when the device has already died or is in a critical "almost dead" state.

          So maybe, I don’t know, it's worth updating the documentation and, through some kind of newsletter, news post, or blog, recommending that users perform checks and follow the recommendations in the updated documentation?

          • A
            arri @w0w
            last edited by arri Feb 9, 2025, 7:30 PM Feb 9, 2025, 7:29 PM

            @w0w said in Another Netgate with storage failure, 6 in total so far:

            or is in a critical "almost dead" state

            If only this were true. Unfortunately, there is no system in place for tracking the wear state that I'm aware of. On a stock appliance, the only warning is failure. The only tools I'm aware of for checking the state require proactive installation by the user from the command line.

            Since this appears to be a common problem, it's strange to me mmc-utils isn't included on at least the base appliances. I would have appreciated bars in the System Information dashboard showing the eMMC Life Time Estimations and Pre EOL states. Once in place, a selectable threshold value to trigger a notification would be nice too 😀
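
A rough sketch of the threshold check suggested above, in Python. It assumes the two Life Time Estimation values and the Pre EOL value have already been read (for example with mmc-utils) and relies on the JEDEC eMMC 5.0 encoding, where 0x01 through 0x0A denote 10% wear bands and 0x0B means the estimated life has been exceeded. The function names and the 70% default threshold are hypothetical; nothing in this thread indicates that pfSense ships such a check.

```python
from typing import Optional


def wear_percent(est: int) -> Optional[int]:
    """Upper bound of the wear band for a DEVICE_LIFE_TIME_EST value.

    0x01..0x0A map to 10..100 (percent of estimated life used).
    0x0B maps to 110, used here as a marker for "estimated life exceeded".
    0x00 (not defined) and reserved values return None.
    """
    if 0x01 <= est <= 0x0B:
        return est * 10
    return None


def should_alert(est_a: int, est_b: int, pre_eol: int,
                 threshold_percent: int = 70) -> bool:
    """Return True when wear or reserved-block usage crosses the threshold.

    threshold_percent is a hypothetical, user-selectable value.
    """
    # PRE_EOL_INFO: 0x01 normal, 0x02 warning, 0x03 urgent.
    if pre_eol in (0x02, 0x03):
        return True
    known = [p for p in (wear_percent(est_a), wear_percent(est_b))
             if p is not None]
    # With no usable estimate there is nothing to compare against.
    return bool(known) and max(known) >= threshold_percent


if __name__ == "__main__":
    # Example inputs only; substitute the values read from your own device.
    print(should_alert(0x03, 0x02, 0x01))
    print(should_alert(0x0B, 0x0B, 0x01))
```

Taking the worse of the two estimates is a deliberately conservative choice, since either memory type wearing out is enough to lose the device.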

            • A
              arri
              last edited by arri Feb 9, 2025, 7:50 PM Feb 9, 2025, 7:47 PM

              Working backward from an eMMC failure, which forced me to research "Troubleshooting Disk Writes" further, it's obvious in hindsight why my base model 4100s are dying.

              That article clearly warns against installing write-heavy packages such as pfBlockerNG, Snort, Suricata, HAProxy, nmap, darkstat, and other monitoring packages. It also says "the package list at Package List also notes when specific packages require or work better with an SSD or HDD." Recognizing the difference between eMMC, SSD and HDD is all well and good; however, warning that a package will potentially harm eMMC might be more effective at discouraging idiots like me from buying base models in the first place and/or installing such packages inappropriately.

              Finally, if such a warning, or the existing verbiage from the web-based package list, were also included in the actual package manager, where most people decide to install these packages, it might be considerably more effective at preventing accelerated eMMC wear.

              • A
                andrew_cb @arri
                last edited by Feb 9, 2025, 8:32 PM

                @arri Sorry to hear that your 4100 died.

                I have already made the same suggestions as you. Just some warnings and links in a few places (like the package manager and log settings) would help users avoid getting into situations that can cause excess writing.

                Storage failures are a frequent occurrence, and including mmc-utils was requested over 3 years ago. In all the new daily threads about storage failure, the user is blamed, yet they are not provided with any tools for monitoring the storage.

                It is puzzling why mmc-utils has not been included in the base install and why SMART and eMMC monitoring are not running by default.

                • M
                  michmoor LAYER 8 Rebel Alliance @andrew_cb
                  last edited by Feb 9, 2025, 8:34 PM

                  @andrew_cb
                  It’s interesting how the thread went silent from the Netgate team. Maybe they’re still looking into it?

                  Firewall: NetGate,Palo Alto-VM,Juniper SRX
                  Routing: Juniper, Arista, Cisco
                  Switching: Juniper, Arista, Cisco
                  Wireless: Unifi, Aruba IAP
                  JNCIP,CCNP Enterprise

                  • A
                    andrew_cb
                    last edited by Feb 9, 2025, 8:43 PM

                    The mmc-utils package is only available in Plus... so users of CE have absolutely no way to monitor their eMMC health. Apparently, monitoring your eMMC health is a special privilege? Maybe a way of discouraging the use of CE?

                    https://docs.netgate.com/pfsense/en/latest/troubleshooting/disk-lifetime.html

                    This package is currently only available on pfSense® Plus software and does not have a GUI component. It must be run from an SSH or console shell prompt.

                    • B
                      bmeeks @andrew_cb
                      last edited by bmeeks Feb 9, 2025, 9:14 PM Feb 9, 2025, 9:11 PM

                      @andrew_cb said in Another Netgate with storage failure, 6 in total so far:

                      The mmc-utils package is only available in Plus... so users of CE have absolutely no way to monitor their eMMC health. Apparently, monitoring your eMMC health is a special privilege? Maybe a way of discouraging the use of CE?

                      https://docs.netgate.com/pfsense/en/latest/troubleshooting/disk-lifetime.html

                      This package is currently only available on pfSense® Plus software and does not have a GUI component. It must be run from an SSH or console shell prompt.

                      Well, in Netgate's defense, I suspect the number of pfSense CE users running on eMMC is minuscule. Most whitebox hardware is most likely going to have either an SSD or a spinning disk. I believe eMMC is much more prevalent in the Netgate appliances, and since anyone purchasing a Netgate appliance gets pfSense Plus, it's more logical to include the utility there. Maybe I missed it, but I don't recall seeing a single post from a CE user who has experienced failed eMMC. It would be trivial to add the utility to the CE package repo, but I suspect it would not be widely used there.

                      • A
                        andrew_cb
                        last edited by Feb 9, 2025, 9:17 PM

                        Some more recent threads about storage failure.
                        Overall, storage failures seem to be most common on the 4100, possibly because it is the most popular model.

                        https://www.reddit.com/r/PFSENSE/comments/1ilhit2/my_netgate_4100_is_defect/
                        https://www.reddit.com/r/PFSENSE/comments/1ikprzt/4100_disassembly/
                        https://www.reddit.com/r/PFSENSE/comments/1ie17xz/ideas_for_an_eol_4100/
                        https://forum.netgate.com/topic/196253/sg-1100-storage-health-questions

                        • A andrew_cb referenced this topic on Feb 9, 2025, 9:45 PM
                        • S
                          stephenw10 Netgate Administrator
                          last edited by Feb 9, 2025, 11:04 PM

                          Hmm, not sure why the pkg isn't in the CE repo. I guess there wasn't much call for it at the time. Seems like we could add that pretty easily. Let me see....

                          • A
                            andrew_cb @stephenw10
                            last edited by Feb 10, 2025, 12:09 AM

                            @stephenw10 It would be great if you can get mmc-utils added to the CE repo!

                            • A
                              andrew_cb @w0w
                              last edited by Feb 10, 2025, 12:18 AM

                              @w0w I share your frustration. One minute a user's Netgate is working, then it just dies. Then they try to reinstall pfSense and the installer says no disks were found...

                              Those are great suggestions on how to spread awareness. This issue has been brought up many times before but it never goes anywhere, so hopefully we can bring about some change and prevent this from happening to others.

                              • A
                                andrew_cb @bmeeks
                                last edited by Feb 10, 2025, 12:25 AM

                                @bmeeks It's possible that not many are using CE on a whitebox with eMMC, but I have seen threads about it on Reddit. I think Protectli, Firewalla, and Topton also use eMMC in some of their models, but I am not positive. Several models list 16 or 32GB storage, which is often eMMC.

                                • W
                                  w0w
                                  last edited by Feb 10, 2025, 8:50 AM

                                  I also want to mention the repair options. I'm not sure if it's possible to replace the eMMC chip with a larger one without modifying the BIOS, but I'm almost certain that you can replace it with the same model or a full equivalent.

                                  Of course, this depends on the country and the price charged for the work. Again, whether the technician is truly a professional or just incompetent remains a question... But this option definitely exists.

                                  • J jared.silva referenced this topic on Feb 10, 2025, 11:47 AM
                                  • A
                                    andrew_cb
                                    last edited by andrew_cb Feb 11, 2025, 3:14 AM Feb 11, 2025, 3:13 AM

                                    A thread from 2022 has resurfaced and it is eerily similar to the discussion happening now in 2025:

                                    • The expected lifetime of 16 and 32GB eMMC storage at various average write rates.
                                    • The increased wear from running popular IDS and IPS packages.
                                    • Request for adding mmc-utils to the base pfSense image (including a Redmine).
                                    • Users already experiencing storage wearout.
                                    • Suggestions to use ramdisks and disable logging of default rules.
                                    • The effects of ZFS vs UFS on storage wear.
                                    • TRIM appears to be disabled.
                                    • Requests/suggestion to include storage considerations on the product pages.
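
The first bullet, expected lifetime at a given average write rate, is simple arithmetic that can be sketched in Python. The figures below are illustrative assumptions, not vendor or Netgate specifications: the program/erase cycle rating (3000) and the write amplification factor (3.0) are hypothetical and vary by part and workload, and the function name is made up for this example.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def years_of_life(capacity_gb: float, pe_cycles: int,
                  write_amplification: float,
                  write_rate_kb_s: float) -> float:
    """Estimate years until rated wear-out at a constant host write rate.

    Host-writable bytes = capacity * P/E cycles / write amplification.
    Lifetime = host-writable bytes / average host write rate.
    """
    endurance_bytes = capacity_gb * 1e9 * pe_cycles / write_amplification
    rate_bytes_s = write_rate_kb_s * 1e3
    return endurance_bytes / rate_bytes_s / SECONDS_PER_YEAR


if __name__ == "__main__":
    # Assumed values for illustration only.
    for capacity in (16, 32):
        for rate in (100, 500, 1000):  # average KB/s written by the host
            years = years_of_life(capacity, pe_cycles=3000,
                                  write_amplification=3.0,
                                  write_rate_kb_s=rate)
            print(f"{capacity} GB at {rate} KB/s: {years:.1f} years")
```

Under this model lifetime scales linearly with capacity and inversely with write rate, which is why halving the average write rate, or doubling the storage, doubles the estimate.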

                                    I cannot understand why Netgate did not investigate or take any action on these issues in 2022, 2023, or 2024.

                                    @dugeem checked 3 devices and noted:

                                    eMMC drives generally support TRIM, but in all cases it was disabled.

                                    @jwt said

                                    TRIM (or an equivalent such as DISCARD) are required by JEDEC standards as far back as 2010.

                                    So there seems to be a discrepancy in whether TRIM support is actually enabled and working or not.

                                    Further, the JEDEC eMMC v5.0 standard which enables eMMC health reporting is from 2013 and is supported by many Netgate devices, so it is confusing why it is not supported by the 4200 that was released in 2024.

                                    @Cabledude asked in 2024:

                                    Would the 128GB SSD benefit (have extended life) if RAM disk is used?

                                    @stephenw10 responded:

                                    Yes. But the write cycle life on any recent SSD is likely to outlive the usefulness of the device anyway. So I'd question the value in doing so.

                                    If a 128GB SSD "is likely to outlive the usefulness of the device", then what is the implication for the lifespan of 16GB eMMC storage?

                                    I am not sure what conclusion to draw other than that, beginning in 2022, Netgate knew or should have known that 16GB of eMMC storage was insufficient for running anything other than the most basic of configurations (and even then, it is necessary to disable most of the default logging and possibly use ramdisks).

                                    @keyser's words from 2022 seem tragically prophetic:

                                    This is going to become a netgate scandal

                                    I think it officially has now.

                                    • P
                                      punting_packets
                                      last edited by Feb 12, 2025, 2:51 PM

                                      I've been having some issues with my 6100 locking up and becoming unresponsive. I reported the issue to Netgate TAC, who didn't provide any useful feedback. While searching Reddit for support, I read about the eMMC failures on the 6100, so I went and checked mine:

                                      eMMC Life Time Estimation A [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A]: 0x0b
                                      eMMC Life Time Estimation B [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B]: 0x0b
                                      eMMC Pre EOL information [EXT_CSD_PRE_EOL_INFO]: 0x01
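
For readers unsure how to read those three lines, here is a minimal Python sketch that parses output in this format and translates the values using the JEDEC eMMC 5.0 encoding (life time estimate: 0x01 to 0x0A are 10% bands, 0x0B means exceeded; Pre EOL: 0x01 normal, 0x02 warning, 0x03 urgent). The function names are hypothetical, and the regular expression assumes the bracketed field layout shown in this post.

```python
import re

# 0x01..0x0A: 10% bands of estimated device life used; 0x0B: exceeded.
LIFE_BANDS = {n: f"{(n - 1) * 10}-{n * 10}% of estimated life used"
              for n in range(0x01, 0x0B)}
LIFE_BANDS[0x0B] = "exceeded maximum estimated life"

PRE_EOL = {
    0x01: "normal",
    0x02: "warning (80% of reserved blocks consumed)",
    0x03: "urgent (90% of reserved blocks consumed)",
}

# Matches e.g. "[EXT_CSD_PRE_EOL_INFO]: 0x01"
FIELD_RE = re.compile(r"\[(EXT_CSD_[A-Z_]+)\]:\s*(0x[0-9a-fA-F]+)")


def parse_extcsd(text: str) -> dict:
    """Extract {field name: integer value} from the report text."""
    return {name: int(value, 16) for name, value in FIELD_RE.findall(text)}


def describe(fields: dict) -> dict:
    """Translate raw values into human-readable descriptions."""
    out = {}
    for name, value in fields.items():
        table = PRE_EOL if name.endswith("PRE_EOL_INFO") else LIFE_BANDS
        out[name] = table.get(value, f"undefined or reserved ({value:#04x})")
    return out


if __name__ == "__main__":
    sample = """\
eMMC Life Time Estimation A [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A]: 0x0b
eMMC Life Time Estimation B [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B]: 0x0b
eMMC Pre EOL information [EXT_CSD_PRE_EOL_INFO]: 0x01
"""
    for field, meaning in describe(parse_extcsd(sample)).items():
        print(f"{field}: {meaning}")
```

By this encoding, the values posted here mean both life time estimates have passed their rated maximum while the reserved-block indicator still reads normal.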

                                      Yikes! I've got an Intel Optane 16GB in there now, after a lot of pain with the installer not working with my PPPoE internet service.

                                      I have to say this really seems like planned obsolescence on the part of Netgate. Why sell a device with hardware that cannot support its operation beyond a couple of years of normal use?

                                      Why doesn't your TAC team identify this as an issue?
                                      Why are you trying to dissuade customers from implementing a fix?
                                      When are you going to compensate customers for the damages?

                                      • S
                                        stephenw10 Netgate Administrator @punting_packets
                                        last edited by Feb 12, 2025, 4:58 PM

                                        @punting_packets said in Another Netgate with storage failure, 6 in total so far:

                                        my 6100 locking up and becoming unresponsive

                                        It's usually pretty obvious if the boot drive fails. Just becoming unresponsive but rebooting back to normal operation is not what I would expect to see. So you may not be seeing a failing drive there, even though the estimated wear levels are high.
                                        Drive failures usually throw a lot of drive/controller errors. Even if the logging stops the console will be filled with errors. If you can, checking the console in the hung situation should confirm that.

                                        • P
                                          punting_packets @stephenw10
                                          last edited by Feb 12, 2025, 5:08 PM

                                          @stephenw10 Thanks for the response. The 6100 simply stopped forwarding traffic, but the console was still responsive. There was nothing in the logs other than a lot of failed PPPoE sessions, and the only way to restore service was a reboot. I might be conflating the wear on the eMMC with other issues; only time will tell :-)

                                          Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.