Netgate Discussion Forum

    Another Netgate with storage failure, 6 in total so far

    Official Netgate® Hardware
    284 Posts 36 Posters 43.5k Views
    • D
      dstaylor @dane_h
      last edited by

      @dane_h I ordered and installed this one, up and running. Price about the same.

      https://www.amazon.com/dp/B08TTDQ5WH?ref=fed_asin_title&th=1

      • A
        andrew_cb @dane_h
        last edited by andrew_cb

        @dane_h I have not used this drive, but it should work: https://www.amazon.com/KingSpec-256GB-Performance-Internal-Ultrabook

        edit: I checked and it's the same one @dstaylor linked lol

        • dennypageD
          dennypage @dstaylor
          last edited by

          @dstaylor FWIW, I use the same one.

          • L
            ltctech @andrew_cb
            last edited by

            @andrew_cb
            I am beyond frustrated with Netgate. The whole point of buying Netgate as opposed to using cheap Mini PCs and installing pfSense was to avoid these kinds of surprises.

            This particular unit is installed at an office with no IT staff on the other side of the world. We might have to send them a new unit and getting it swapped out may be a challenge.

            We'll have to audit the other units that we have deployed (thankfully stateside) and see which ones are eMMC and which are SSD.

            • C
              Cabledude
              last edited by

              @andrew_cb and @stephenw10 Some questions:
              #1 Is every eMMC-equipped Netgate prone to be affected, or are there just a limited number of occurrences? #2 Does the eMMC production series have any influence, or is it simply more writes = issue?
              A good friend of mine is running a 4100 base. He believes he’s fine regarding the eMMC issue because he doesn’t do much logging. I don’t believe he even checks his eMMC health periodically, he’s not concerned about it.

              Pete
              Home: SG-2100 + UniFi + Synology. SG-1100 retired
              Parents: SG-1100 + UniFi + Synology
              Testing: SG-1100 w/ 120GB SSD via ext USB (eMMC dead). Works great

              • S
                SteveITS Galactic Empire @Cabledude
                last edited by

                @Cabledude said in Another Netgate with storage failure, 6 in total so far:

                doesn’t do much logging

                It's very relative hence the (my) list of mitigating settings above. The default deny rules log. Is pfSense behind an ISP router that blocks incoming? Suricata logs HTTP requests. Some people leave the dashboard open which logs every web request for each widget update. pfBlocker DNSBL logs DNS requests, and a few feeds like UT1 are gigantic.

                My 2100 at home is from October 2020 and it shows 10% used:
                eMMC Life Time Estimation A [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A]: 0x01
                eMMC Life Time Estimation B [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B]: 0x01
                eMMC Pre EOL information [EXT_CSD_PRE_EOL_INFO]: 0x01
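For readers unsure how to interpret those three values, here is a minimal Python sketch that decodes them. It assumes the report lines look like the ones quoted above and that the fields follow the usual JEDEC eMMC 5.x meaning (each life-time step covers 10% of rated life; the pre-EOL field is normal/warning/urgent). The function names `life_time_band` and `decode_health` are made up for this illustration and are not part of pfSense or any Netgate tool.

```python
import re

# Pre-EOL states as commonly documented for eMMC 5.x (assumed here).
PRE_EOL = {0x00: "not defined", 0x01: "normal", 0x02: "warning", 0x03: "urgent"}


def life_time_band(value: int) -> str:
    """Map a life-time estimate byte to a usage band (0x01 = 0-10% used)."""
    if value == 0x00:
        return "not defined"
    if 0x01 <= value <= 0x0A:
        return f"{(value - 1) * 10}-{value * 10}% of rated life used"
    if value == 0x0B:
        return "exceeded maximum estimated life"
    return "reserved"


def decode_health(report: str) -> dict[str, str]:
    """Pull '[FIELD]: 0xNN' pairs out of a report and decode the known ones."""
    decoded = {}
    for name, raw in re.findall(r"\[(EXT_CSD_\w+)\]:\s*(0x[0-9a-fA-F]+)", report):
        value = int(raw, 16)
        if name == "EXT_CSD_PRE_EOL_INFO":
            decoded[name] = PRE_EOL.get(value, "reserved")
        elif name.startswith("EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_"):
            decoded[name] = life_time_band(value)
    return decoded


sample = """
eMMC Life Time Estimation A [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A]: 0x01
eMMC Life Time Estimation B [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B]: 0x01
eMMC Pre EOL information [EXT_CSD_PRE_EOL_INFO]: 0x01
"""

for field, meaning in decode_health(sample).items():
    print(f"{field}: {meaning}")
```

Read this way, 0x01 means "somewhere between 0% and 10% used" rather than exactly 10%, and values of 0x05 or 0x06 would correspond to the 40-50% and 50-60% bands.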

                eMMC (as a technology) has fewer "disk writes per day" than SSD. It is also usually much smaller. So writing (completely making up a number here) 5 GB per day has way more impact on an 8 GB eMMC than a 128 GB SSD. Which, overall, is the point of this thread.
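The size argument can be put into numbers with a small Python sketch. It uses the same made-up 5 GB/day figure and the 8 GB and 128 GB capacities from the post, and it deliberately ignores write amplification, wear leveling details, and any difference in rated program/erase cycles, so treat the result as a rough comparison only. The helper name `full_device_writes_per_day` is hypothetical.

```python
def full_device_writes_per_day(daily_writes_gb: float, capacity_gb: float) -> float:
    """How many times per day the whole device is (on average) overwritten."""
    return daily_writes_gb / capacity_gb


DAILY_WRITES_GB = 5.0  # made-up figure, as in the post

emmc = full_device_writes_per_day(DAILY_WRITES_GB, 8.0)    # 8 GB eMMC
ssd = full_device_writes_per_day(DAILY_WRITES_GB, 128.0)   # 128 GB SSD

print(f"8 GB eMMC:  {emmc:.4f} full-device writes per day")
print(f"128 GB SSD: {ssd:.4f} full-device writes per day")
print(f"The eMMC cells see {emmc / ssd:.0f}x the average write load")
```

With identical daily writes, each cell on the smaller device is rewritten 16 times as often on average, before any difference in flash quality is even considered.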

                Pre-2.7.2/23.09: Only install packages for your version, or risk breaking it. Select your branch in System/Update/Update Settings.
                When upgrading, allow 10-15 minutes to restart, or more depending on packages and device speed.
                Upvote 👍 helpful posts!

                • stephenw10S
                  stephenw10 Netgate Administrator
                  last edited by

                  Yup, the larger the drive, the fewer writes each individual 'bit' sees for a fixed total of drive writes. So larger drives are less affected.

                  • arriA
                    arri
                    last edited by

                    Wildly speculating around the 4100, it appears enough damage to the eMMC can brick the boxes too!
                    One of mine won't POST now, precluding my ability to install NVMe at all. The LEDs on the board indicate activity on one flash drive after the reset indicator flickers, without any console output or getting past the orange circle of death. This even after pulling the CMOS battery, NVMe etc.

                    • stephenw10S
                      stephenw10 Netgate Administrator
                      last edited by

                      Of the confirmed eMMC failures we've seen, most do not fail like that. In fact I don't think we've seen a single failure that presented like that in person. There was one user here on the forum who reported removing the eMMC chip and that that allowed it to boot from NVMe. So far unconfirmed though. So it could be some other failure.

                      • arriA
                        arri @stephenw10
                        last edited by

                        @stephenw10 Figures I'd be a unicorn. To be fair, I suspect once a unit is deemed a brick, they probably seldom make it back to your bench from customers unless they're in the short window.

                        • stephenw10S
                          stephenw10 Netgate Administrator
                          last edited by

                          Indeed if it fails to POST entirely it's difficult to confirm any sort of cause.

                          • arriA
                            arri @stephenw10
                            last edited by

                            @stephenw10 But hey, thanks for the reminder this box is now into the "can't hurt to try" zone where I get to play with solder! Now to go find my magnifier and a certain flash chip ;)

                            • A
                              andrew_cb
                              last edited by andrew_cb

                              This morning, I was surprised to find that my threads in /r/pfsense and /r/netgate have been deleted. Fortunately, I still have screenshots of an interesting post from kphillips-netgate (@kphillips), in which he says:

                              ...I think it's important to err on the side of letting people discuss things without overbearing moderation unless it becomes necessary...

                              Interesting. Can you find out why both Reddit threads were deleted and who made that decision?

                              ...[I] process RMA support tickets for devices every day...
                              ...Beware of confirmation bias...

                              These are good points to keep in mind as they tie into his next statement:

                              I haven't seen any particularly unusual numbers of RMAs for any particular product in our lineup.

                              What specifically is considered to be an RMA? Is this all claims for RMA, or only claims that have been accepted as warranty? What is the ratio of approved RMAs versus RMAs denied because they are past the 1-year warranty? It appears that a significant number of posts are about devices that failed after the 1-year warranty, and many users have had their warranty claim rejected or did not bother contacting Netgate since they were out of warranty. This would significantly alter the number and ratio of RMA claims.

                              ...I'm not trying to admonish or belittle anybody here...
                              ...sometimes it's totally by accident and I'm not trying to "blame shift" here...

                              I am glad to hear that you personally are not trying to admonish, belittle, or blame shift, but...

                              ...You will only see people who ran into issues...

                              It seems that others do not share your view of giving the user the benefit of the doubt. In all the posts made by users who encountered storage failure, what we do see is Netgate consistently blaming the user for causing the storage failure, and never apologizing or showing empathy.

                              We have a page outlining many of the packages that need an SSD here (linked to the page).

                              I have repeatedly mentioned that this page is NOT linked anywhere. The only way you can be aware of its existence is by searching or following a link from someone. It is impossible for a purchaser to be aware of a) storage wear caused by high writes, b) what packages and settings can cause high writes, and c) the decision criteria for choosing a MAX model. If anyone can show me a direct link to the "Supported pfSense Plus Packages" page on the Netgate website, I will be forever humbled.

                              I run a 6100 as my edge for work at Netgate on eMMC (no NVMe SSD installed) and it gets worked.....HARD. It's the only 6100 I have and I use it for new release building, bug testing, package testing, and much more. It has been in continuous operation for about 3 years with little to no downtime. [screenshots showing eMMC health at 0x05, 0x06, 0x01, and a manufacturing date of 06/2022]

                              These VERY interesting data points are provided to us by a Netgate employee. Remember what another Netgate employee @jwt said in post #34?

                              ...the principal difference between eMMC and NVMe or SSD device is the amount of flash present on a typical eMMC vs. SSD or NVMe drive...

                              Let us analyze these statements further.

                              jwt asserts that there is no significant difference between eMMC, SSD, or NVMe other than the total amount of flash. Okay, so then eMMC is not a technologically inferior storage medium.

                              Now, kphillips helpfully provides some evidence of how he has a 6100 with eMMC storage and he has worked it "HARD" for 3 years and yet the storage wear is only at 50-60%.

                              This is where things start getting confusing...

                              If the statements made by jwt are correct and the data supplied by kphillips is true and representative of the durability of Netgate devices with eMMC storage, then why are so many users experiencing eMMC failure in under 3 years? And why is there a presumption that the user is at fault for causing the failure? And why does Netgate never express concern about these "rare" incidents? If kphillips were to create a post about his 6100 dying under these conditions, I wonder what kind of response he would receive? Is kphillips trying to prove that the 16GB of onboard eMMC storage actually is fine when "worked HARD"?

                              On the other hand, if kphillips' data is simply anecdotal and does not represent what a user can expect from the onboard eMMC storage, then it is irrelevant to this discussion.

                              So, which is it?
                              Is eMMC endurance similar to SSD/NVMe, so that Netgate devices with eMMC should be durable enough to handle package usage as kphillips demonstrated, and the failed devices were simply equipped with faulty eMMC? In that case the users did nothing wrong, and it is simply tough luck that the failure occurred after the 1-year warranty expired;
                              --OR--
                              Is the 16GB of onboard eMMC storage a known weak point that needs to be clearly identified and the user provided with abundant warnings on the product page, on the package manager page, whenever installing a package, and on the settings page of each package?

                              • A
                                andrew_cb @stephenw10
                                last edited by

                                @stephenw10 There have been a few posts now on here and Reddit about successfully reviving a dead Netgate by removing the eMMC. I have a 2-year-old 4100 here that had the eMMC die and I got it working on a USB flash drive, only for it to go completely dead after restoring the config.

                                I have not had time to dig into it further, but if it truly is dead then I will try removing the eMMC chips to see if that revives it.

                                • A
                                  andrew_cb
                                  last edited by andrew_cb

                                  Here are screenshots of @kphillips's post for reference:

                                  kphillips1.jpg kphillips2.jpg kphillips3.jpg kphillips4.jpg kphillips5.jpg

                                  • A
                                    andrew_cb @andrew_cb
                                    last edited by

                                    @andrew_cb I reached out to kphillips-netgate last night on Reddit and suggested we have a call to discuss this situation further, clarify the issues, and hopefully identify solutions. He asked for clarification of what threads I was referring to, and I sent him links to this thread and two others.

                                    He has not gotten back to me yet, so I will update here when/if I receive any further responses from him.

                                    • stephenw10S
                                      stephenw10 Netgate Administrator
                                      last edited by

                                      Mmm, the problem with escalating things in this way is that it suppresses actual useful posts. It moves from a technical discussion to a marketing/legal matter where I (and others) can no longer comment. 😞

                                      • arriA
                                        arri @andrew_cb
                                        last edited by arri

                                        @andrew_cb Holy crap, it worked for me! Yanked (read: carefully removed using appropriate rework methodology) the Kingston eMMC out of my bricked 4100 that wouldn't POST and lo and behold I've got a console back and have booted the USB installer!

                                        • A
                                          andrew_cb
                                          last edited by andrew_cb

                                          I'll continue to monitor and report internally about any situations I see crop up that might be trends or patterns.

                                          Are all the posts about eMMC failure over the last few years, and the explicit requests/suggestions for improved messaging, not enough to indicate a trend or pattern with regard to eMMC failure? If the issue truly is misuse by the user, then why has nothing been done to better educate purchasers and users before they do things that could result in accelerated eMMC wear? Better education and messaging would likely eliminate or significantly reduce the frequency of eMMC failure.

                                          Similarly, @stephenw10 and others have posted hundreds of responses in which they advise users to reduce logging (including disabling the DEFAULT logging rules) and use ramdisks.

                                          Why have these common suggested changes not been incorporated into the default settings for pfSense or at least recommended (such as in the setup wizard)? Just what does Netgate actually consider to be a trend or pattern that needs to be actioned?

                                          Despite being incomplete and not linked anywhere, the "Supported pfSense Plus Packages" page seems to be a "gotcha" shield to deflect any and all failures onto the user.

                                          • A
                                            andrew_cb @arri
                                            last edited by

                                            @arri Wow that is cool! I am glad to hear that it worked for you!
                                            I will report back when I get around to trying this on the dead 4100 I have here.

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.