Netgate Discussion Forum

    Another Netgate with storage failure, 6 in total so far

    Scheduled Pinned Locked Moved Official Netgate® Hardware
    305 Posts 38 Posters 81.8k Views
    • w0wW
      w0w @andrew_cb
      last edited by

      @andrew_cb said in Another Netgate with storage failure, 6 in total so far:

      I disagree that it is a reasonable assumption to make. I have been working with firewalls for 20 years and have never had to consider the type of storage medium used. I also believe the purchaser's knowledge of storage types should be irrelevant in this matter.

      I don't have extensive experience with various firewalls, but I've come across cases on Reddit where Sophos internal storage failed, and even on forums, there were reports of failures with Cisco's FTD. I don't know the failure rate of such devices, but their price range is significantly higher. I'm not justifying anyone, but shit happens.

      It also probably depends on usage conditions, settings, and many other factors.

      Larger devices have more sectors and as a direct result, can engage "wear leveling" algorithms in the controller to spread the erase cycles across more sectors.

      I would also note that if the minimum eMMC size were 16GB, we probably wouldn't be having this discussion right now.
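The capacity argument above can be made concrete with a rough back-of-the-envelope sketch (in Python, purely for illustration). Every number here is a hypothetical assumption, not a Netgate or eMMC vendor figure: the rated program/erase cycles, the write amplification factor, and the daily write volume are placeholders. The only point is that, with ideal wear leveling, the estimated lifetime scales linearly with capacity.

```python
# All figures below are illustrative assumptions, not measurements or vendor specifications.
ASSUMED_PE_CYCLES = 3000           # hypothetical rated program/erase cycles per cell
ASSUMED_WRITE_AMPLIFICATION = 4.0  # hypothetical NAND writes per host write
ASSUMED_HOST_GB_PER_DAY = 20.0     # hypothetical daily log/database writes


def estimated_lifetime_years(capacity_gb,
                             pe_cycles=ASSUMED_PE_CYCLES,
                             host_gb_per_day=ASSUMED_HOST_GB_PER_DAY,
                             write_amplification=ASSUMED_WRITE_AMPLIFICATION):
    """Years until the flash reaches its rated erase cycles, assuming ideal wear leveling."""
    total_nand_writes_gb = capacity_gb * pe_cycles
    nand_gb_per_day = host_gb_per_day * write_amplification
    return total_nand_writes_gb / nand_gb_per_day / 365.0


if __name__ == "__main__":
    # Doubling the capacity doubles the estimate; the absolute values depend
    # entirely on the assumed inputs above.
    for capacity_gb in (8, 16, 32, 64):
        print(f"{capacity_gb:>2} GB: {estimated_lifetime_years(capacity_gb):.1f} years")
```

Real devices will deviate from this, since wear leveling is never ideal and write amplification depends on the workload, but the linear relationship is why a larger minimum eMMC size would help.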

      @andrew_cb said in Another Netgate with storage failure, 6 in total so far:

      Used within its limitations, eMMC is a good solution. Your phone likely has eMMC inside it.

      Actually, eMMC is on its way out of phones; UFS 3.1 is the next step. But this is a bit off topic.

      @andrew_cb said in Another Netgate with storage failure, 6 in total so far:

      The product page of the 1100 describes it as

      the ideal microdevice for the home and small office network
      It does not sound like the target market for the 1100 is people with many years of storage technology and Unix filesystem knowledge.
      Yet the 1100, which is only available with eMMC storage and cannot be upgraded to an SSD, lists all the exact same pfSense features as the 8300 MAX.

      But how can that be? Is it possible that there are some inaccuracies or that important information has been forgotten on the product pages?

      You can include it in the product description, but that falls under marketing.

      And today's marketing trend is: never tell the customer something they didn't ask about.

      Documentation, however, should probably contain footnotes and explanations. Or, as I already mentioned, perhaps every setting or checkbox that could potentially generate a large number of logs should have a footnote or a note for users explaining the consequences.

      • A
        andrew_cb @w0w
        last edited by

        @w0w said in

        I would also note that if the minimum eMMC size were 16GB, we probably wouldn't be having this discussion right now.

        I think you meant to say "if the minimum eMMC size were NOT 16GB, we probably wouldn't be having this discussion right now."
        And I agree: our 7100s, which come with 32GB of eMMC, seem to last about twice as long as our 4100s and 6100s. Silicom offers larger eMMC sizes on several models, so just increasing the minimum eMMC to 32 or 64GB would likely reduce this problem significantly.

        Actually, eMMC is on its way out of phones; UFS 3.1 is the next step. But this is a bit off topic.

        That is interesting to know!

        You can include it in the product description, but that falls under marketing.

        And today's marketing trend is: never tell the customer something they didn't ask about.

        This is the #1 issue that is causing this whole problem. A lack of any useful information, but when the storage fails, everyone is quick to blame the user for not knowing.

        Documentation, however, should probably contain footnotes and explanations. Or, as I already mentioned, perhaps every setting or checkbox that could potentially generate a large number of logs should have a footnote or a note for users explaining the consequences.

        I completely agree. I think both you and I have mentioned this several times.

        • S
          SteveITS Galactic Empire @andrew_cb
          last edited by

          @andrew_cb said in Another Netgate with storage failure, 6 in total so far:

          I think you meant to say "if the minimum eMMC size were NOT 16GB

          The 1100 and 2100 base units have 8 GB.

          Pre-2.7.2/23.09: Only install packages for your version, or risk breaking it. Select your branch in System/Update/Update Settings.
          When upgrading, allow 10-15 minutes to restart, or more depending on packages and device speed.
          Upvote 👍 helpful posts!

          • w0wW
            w0w @andrew_cb
            last edited by w0w

            @andrew_cb said in Another Netgate with storage failure, 6 in total so far:

            I think you meant to say "if the minimum eMMC size were NOT 16GB, we probably wouldn't be having this discussion right now."

            Exactly!
            I would even go further and say that 32GB is probably the minimum at which something else, such as the power supply, would be likely to fail first.

            • w0wW
              w0w
              last edited by

              emmc_health.widget.php

              <?php
              require_once("functions.inc");
              require_once("guiconfig.inc");
              
              // Function to retrieve eMMC health data
              function get_emmc_health() {
                  $cmd = "/usr/local/bin/mmc extcsd read /dev/mmcsd0rpmb | egrep 'LIFE|EOL'";
                  $output = shell_exec($cmd);
                  
                  if (!$output) {
                      return ["status" => "error", "message" => "Failed to retrieve eMMC health data."];
                  }
                  
                  preg_match('/LIFE_A\s+:\s+(0x[0-9A-F]+)/i', $output, $matchA);
                  preg_match('/LIFE_B\s+:\s+(0x[0-9A-F]+)/i', $output, $matchB);
                  
                  $lifeA = isset($matchA[1]) ? hexdec($matchA[1]) * 10 : null;
                  $lifeB = isset($matchB[1]) ? hexdec($matchB[1]) * 10 : null;
                  
                  if (is_null($lifeA) || is_null($lifeB)) {
                      return ["status" => "error", "message" => "Invalid eMMC health data."];
                  }
                  
                  return ["status" => "ok", "lifeA" => $lifeA, "lifeB" => $lifeB];
              }
              
              $data = get_emmc_health();
              
              // Determine color class based on wear level
              function get_color_class($value) {
                  if ($value < 70) {
                      return "success"; // Green
                  } elseif ($value < 90) {
                      return "warning"; // Yellow
                  } else {
                      return "danger"; // Red
                  }
              }
              
              // Send email notification if wear level is critical
              function send_emmc_alert($lifeA, $lifeB) {
                  global $config;
                  
                  $subject = "[pfSense] eMMC Wear Level Warning";
                  $message = "Warning: eMMC wear level is high!\n\n" .
                             "Life A: {$lifeA}%\nLife B: {$lifeB}%\n\n" .
                             "Consider replacing the storage device.";
                  
                  if ($lifeA >= 90 || $lifeB >= 90) {
                      notify_via_smtp($subject, $message);
                  }
              }
              
              if ($data["status"] === "ok") {
                  send_emmc_alert($data["lifeA"], $data["lifeB"]);
              }
              ?><div class="panel panel-default">
                  <div class="panel-heading">
                      <h3 class="panel-title">eMMC Disk Health</h3>
                  </div>
                  <div class="panel-body">
                      <?php if ($data["status"] === "error"): ?>
                          <div class="alert alert-danger"><?php echo $data["message"]; ?></div>
                      <?php else: ?>
                          <table class="table">
                              <tr>
                                  <th>Life A</th>
                                  <td class="bg-<?php echo get_color_class($data['lifeA']); ?>"> <?php echo $data['lifeA']; ?>%</td>
                              </tr>
                              <tr>
                                  <th>Life B</th>
                                  <td class="bg-<?php echo get_color_class($data['lifeB']); ?>"> <?php echo $data['lifeB']; ?>%</td>
                              </tr>
                          </table>
                      <?php endif; ?>
                  </div>
              </div>
              
              1. Place the Widget File

              Make sure your widget file (e.g., emmc_health.widget.php) is located in:

              /usr/local/www/widgets/widgets/

              2. Register the Widget in widgets/widgets.inc

              Edit the file:

              /usr/local/www/widgets/widgets.inc

              Add the following line to register the widget:

              $widgets["emmc_health"] = "eMMC Disk Health";

              This ensures the widget appears in the dashboard widget selection menu.

              3. Ensure Permissions

              Run the following command to set the correct permissions:

              chmod 644 /usr/local/www/widgets/widgets/emmc_health.widget.php

              4. Reload the Dashboard

              Go to Status → Dashboard in the pfSense web UI.

              Click on "+" (Add Widget) at the top-right.

              Find "eMMC Disk Health" in the list and add it.

              5. Verify the Widget

              Ensure that the widget loads correctly and displays the expected values.

              I don't know if this will work, but this is the code that ChatGPT put together with me in 15 minutes.

              • A
                andrew_cb @w0w
                last edited by andrew_cb

                @w0w Thanks for doing this!

                I tried out the script and it needed a few modifications to make it work for me. I also added a function to automatically install mmc-utils if needed.
                The widgets.inc file does not need to be modified; the widget is picked up automatically as long as the file name ends with '.widget.php'.

                Here are the revised instructions:

                Code for emmc_health.widget.php:

                <?php
                require_once("functions.inc");
                require_once("guiconfig.inc");
                
                // Function to retrieve eMMC health data
                function get_emmc_health() {
                
                    $cmd = "/usr/local/sbin/mmc extcsd read /dev/mmcsd0rpmb | egrep 'LIFE|EOL'";
                    $output = shell_exec($cmd);
                    
                    if (!$output) {
                        return ["status" => "error", "message" => "Failed to retrieve eMMC health data."];
                    }
                
                    // Explode the output into separate lines
                    $outputArray = explode("\n", $output);
                   
                    // Get the value of 'TYP_A' (SLC) wear
                    preg_match('/.*TYP_A]:\s+(0x[0-9A-F]+)/i', $outputArray[0], $matchA);
                    // Get the value of 'TYP_B' (MLC) wear
                    preg_match('/.*TYP_B]:\s+(0x[0-9A-F]+)/i', $outputArray[1], $matchB);
                    
                    // Convert the wear values from hex to decimal
                    $lifeA = isset($matchA[1]) ? hexdec($matchA[1]) * 10 : null;
                    $lifeB = isset($matchB[1]) ? hexdec($matchB[1]) * 10 : null;
                    
                    if (is_null($lifeA) || is_null($lifeB)) {
                        return ["status" => "error", "message" => "Invalid eMMC health data."];
                    }
                    
                    return ["status" => "ok", "lifeA" => $lifeA, "lifeB" => $lifeB];
                }
                
                // Determine color class based on wear level
                function get_color_class($value) {
                    if ($value < 70) {
                        return "success"; // Green
                    } elseif ($value < 90) {
                        return "warning"; // Yellow
                    } else {
                        return "danger"; // Red
                    }
                }
                
                // Send email notification if wear level is critical
                function send_emmc_alert($lifeA, $lifeB) {
                    global $config;
                    
                    $subject = "[pfSense] eMMC Wear Level Warning";
                    $message = "Warning: eMMC wear level is high!\n\n" .
                               "Life A: {$lifeA}%\nLife B: {$lifeB}%\n\n" .
                               "Consider replacing the storage device.";
                    
                    if ($lifeA >= 90 || $lifeB >= 90) {
                        notify_via_smtp($subject, $message);
                    }
                }
                
                // Check for the mmc-utils binary and install it if missing
                function install_mmc_utils() {
                    if (file_exists("/usr/local/sbin/mmc")) {
                        return ["status" => "ok"];
                    }
                    // exec() returns the exit code through its third argument
                    exec("pkg install -y mmc-utils", $output, $code);
                    if ($code !== 0) {
                        return ["status" => "error", "message" => "Failed to install mmc-utils."];
                    }
                    return ["status" => "ok"];
                }
                
                // Main program logic
                // Make sure mmc-utils is available, then get the eMMC health data
                $data = install_mmc_utils();
                if ($data["status"] === "ok") {
                    $data = get_emmc_health();
                }
                
                // If the health data was read successfully, send an email when wear is critical
                if ($data["status"] === "ok") {
                    send_emmc_alert($data["lifeA"], $data["lifeB"]);
                }
                
                // Format the data into HTML for display in the widget
                ?><div class="panel panel-default">
                    <div class="panel-heading">
                        <h3 class="panel-title">eMMC Disk Health</h3>
                    </div>
                    <div class="panel-body">
                        <?php if ($data["status"] === "error"): ?>
                            <div class="alert alert-danger"><?php echo $data["message"]; ?></div>
                        <?php else: ?>
                            <table class="table">
                                <tr>
                                    <th>Type A Wear (Lower is better)</th>
                                    <td class="bg-<?php echo get_color_class($data['lifeA']); ?>"> <?php echo $data['lifeA']; ?>%</td>
                                </tr>
                                <tr>
                                    <th>Type B Wear (Lower is better)</th>
                                    <td class="bg-<?php echo get_color_class($data['lifeB']); ?>"> <?php echo $data['lifeB']; ?>%</td>
                                </tr>
                            </table>
                        <?php endif; ?>
                    </div>
                </div>
                
                
                1. Navigate to Diagnostics > File Editor.
                  Paste the code for emmc_health.widget.php (above) into the editor.
                  Paste the following path into the Path to file to be edited box and select Save (the file will automatically be created):
                /usr/local/www/widgets/widgets/emmc_health.widget.php
                
                2. Navigate to Diagnostics > Command Prompt and run the following command to set the file permissions:
                chmod 644 /usr/local/www/widgets/widgets/emmc_health.widget.php
                
                3. Navigate to Status > Dashboard.
                  Select the "+" button from the top-right.
                  Select Emmc Health from the list.

                4. The Emmc Health widget will be added to the bottom of the page. Move it to the top so it is easily visible.
                  Select the Save button at the top-right to save the dashboard layout.
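For anyone wondering what the widget's "hex value times 10" conversion means, here is a minimal sketch in Python (for illustration only, not the widget's PHP). It assumes the filtered `mmc extcsd read` output ends each field with a `...TYP_A]: 0x..` style tail, as the regex in the widget code implies; the sample text and the function names are made up for this example. In the eMMC specification each life time estimate is a 10% band (0x01 is 0-10% used, 0x0A is 90-100%), and 0x0B means the rated life has been exceeded, which is why the widget can show 110%.

```python
import re

# Assumed sample of the filtered output; the exact wording may differ between
# mmc-utils versions. Only the "...TYP_A]: 0x.." tail of each line is relied on.
SAMPLE_OUTPUT = """\
eMMC Life Time Estimation A [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A]: 0x03
eMMC Life Time Estimation B [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B]: 0x0b
eMMC Pre EOL information [EXT_CSD_PRE_EOL_INFO]: 0x01
"""

FIELD_RE = re.compile(r"(TYP_A|TYP_B|PRE_EOL_INFO)\]:\s+0x([0-9a-f]+)", re.IGNORECASE)


def parse_emmc_health(text):
    """Return a dict such as {'TYP_A': 3, ...} for every field found in the text."""
    return {name.upper(): int(value, 16) for name, value in FIELD_RE.findall(text)}


def describe_life(code):
    """Translate a life time estimate code into the 10% band it stands for."""
    if 0x01 <= code <= 0x0A:
        return f"{(code - 1) * 10}-{code * 10}% of rated life used"
    if code == 0x0B:
        return "rated life exceeded"
    return "not reported"


if __name__ == "__main__":
    health = parse_emmc_health(SAMPLE_OUTPUT)
    for field in ("TYP_A", "TYP_B"):
        print(field, describe_life(health.get(field, 0)))
```

Because the register only reports 10% bands, the percentage shown in the widget is the upper end of the band, not an exact wear figure.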

                • stephenw10S
                  stephenw10 Netgate Administrator
                  last edited by

                  Probably want some way to limit or suppress the number of alerts/emails. Those values never go back so you could end up with.... a lot!

                  You might also argue that since it only does it when opening the dashboard an alert shown there might be better. Or maybe both.

                  • A
                    andrew_cb @stephenw10
                    last edited by andrew_cb

                    @stephenw10 said in Another Netgate with storage failure, 6 in total so far:

                    Probably want some way to limit or suppress the number of alerts/emails. Those values never go back so you could end up with.... a lot!

                    You might also argue that since it only does it when opening the dashboard an alert shown there might be better. Or maybe both.

                    Good suggestions!
                    I was already thinking of using a temp file to store the health data and only updating it when it is older than a certain age. A similar thing could be done to set a flag/rate limiter for alerting.

                    Ideally, the health check would run as a cron job and store the latest data in a file so that it works in the background, and then the dashboard would read the file instead of having to run the check every time the dashboard is loaded.
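The cached-reading and rate-limited alert idea could look roughly like the sketch below (Python for illustration, not pfSense code). The state file layout, the function names, and the intervals are all hypothetical; `read_wear` and `send_alert` stand in for whatever actually reads the eMMC and sends the email. The state file is rewritten only when something changed, so the check itself adds very few writes.

```python
import json
import time

CACHE_MAX_AGE = 24 * 3600        # assumed: refresh the reading at most once a day
ALERT_INTERVAL = 30 * 24 * 3600  # assumed: at most one alert every 30 days
ALERT_THRESHOLD = 90             # percent, matching the widget's "danger" level


def load_state(path):
    """Return the saved state dict, or an empty dict if the file is missing or unreadable."""
    try:
        with open(path) as handle:
            return json.load(handle)
    except (OSError, ValueError):
        return {}


def update(path, read_wear, send_alert, now=None):
    """Refresh the cached reading if it is stale and send at most one alert per interval."""
    now = time.time() if now is None else now
    state = load_state(path)
    changed = False

    # Re-read the hardware only when the cached value is older than CACHE_MAX_AGE.
    if now - state.get("read_at", 0) >= CACHE_MAX_AGE:
        state["wear"] = read_wear()
        state["read_at"] = now
        changed = True

    # Alert only above the threshold and only if the last alert is old enough.
    if (state.get("wear", 0) >= ALERT_THRESHOLD
            and now - state.get("alerted_at", 0) >= ALERT_INTERVAL):
        send_alert(state["wear"])
        state["alerted_at"] = now
        changed = True

    if changed:  # write only when something changed, to avoid adding wear
        with open(path, "w") as handle:
            json.dump(state, handle)
    return state
```

A cron job would call `update()` periodically, and the dashboard would only call `load_state()` to display the last stored value.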

                    • dennypageD
                      dennypage @stephenw10
                      last edited by

                      @stephenw10 said in Another Netgate with storage failure, 6 in total so far:

                      Probably want some way to limit or suppress the number of alerts/emails. Those values never go back so you could end up with.... a lot!

                      Each of which will trigger a write...

                      🤕

                      • w0wW
                        w0w @dennypage
                        last edited by

                        @dennypage

                        Yes, you are right 👍
                        This was just a sample to start with.
                        Here is another idea:

                        <?php
                        require_once("functions.inc");
                        require_once("guiconfig.inc");
                        
                        // Path for the timestamp file to limit email notifications
                        const NOTIFY_TIMESTAMP_FILE = "/var/db/emmc_health_notify_time";
                        const NOTIFY_INTERVAL = 2592000; // 30 days in seconds
                        
                        // Function to retrieve eMMC health data
                        function get_emmc_health() {
                            // Binary path and field names follow andrew_cb's tested version above
                            $cmd = "/usr/local/sbin/mmc extcsd read /dev/mmcsd0rpmb | egrep 'LIFE|EOL'";
                            $output = shell_exec($cmd);
                            
                            if (!$output) {
                                return ["status" => "error", "message" => "Failed to retrieve eMMC health data."];
                            }
                            
                            preg_match('/TYP_A\]:\s+(0x[0-9A-F]+)/i', $output, $matchA);
                            preg_match('/TYP_B\]:\s+(0x[0-9A-F]+)/i', $output, $matchB);
                            
                            $lifeA = isset($matchA[1]) ? hexdec($matchA[1]) * 10 : null;
                            $lifeB = isset($matchB[1]) ? hexdec($matchB[1]) * 10 : null;
                            
                            if (is_null($lifeA) || is_null($lifeB)) {
                                return ["status" => "error", "message" => "Invalid eMMC health data."];
                            }
                            
                            return ["status" => "ok", "lifeA" => $lifeA, "lifeB" => $lifeB];
                        }
                        
                        $data = get_emmc_health();
                        
                        // Determine color class based on wear level
                        function get_color_class($value) {
                            if ($value < 70) {
                                return "success"; // Green
                            } elseif ($value < 90) {
                                return "warning"; // Yellow
                            } else {
                                return "danger"; // Red
                            }
                        }
                        
                        // Check if email notification should be sent
                        function should_send_email() {
                            if (!file_exists(NOTIFY_TIMESTAMP_FILE)) {
                                return true;
                            }
                            $last_sent = file_get_contents(NOTIFY_TIMESTAMP_FILE);
                            return (time() - (int)$last_sent) > NOTIFY_INTERVAL;
                        }
                        
                        // Send email notification if wear level is critical
                        function send_emmc_alert($lifeA, $lifeB) {
                            global $config;
                            
                            if (!should_send_email()) {
                                return;
                            }
                            
                            $subject = "[pfSense] eMMC Wear Level Warning";
                            $message = "Warning: eMMC wear level is high!\n\n" .
                                       "Life A: {$lifeA}%\nLife B: {$lifeB}%\n\n" .
                                       "Consider replacing the storage device.";
                            
                            if ($lifeA >= 90 || $lifeB >= 90) {
                                notify_via_smtp($subject, $message);
                                file_put_contents(NOTIFY_TIMESTAMP_FILE, time()); // Update last sent time
                            }
                        }
                        
                        // Ensure that email is sent only when eMMC is the boot disk and no RAM disk is used
                        function is_valid_environment() {
                            if (file_exists("/etc/rc.ramdisk")) {
                                return false; // RAM disk is enabled
                            }
                            $boot_disk = trim(shell_exec("mount | grep 'on / ' | awk '{print $1}'"));
                            return strpos($boot_disk, "mmcsd") !== false; // Ensure eMMC is the boot device
                        }
                        
                        if ($data["status"] === "ok" && is_valid_environment()) {
                            send_emmc_alert($data["lifeA"], $data["lifeB"]);
                        }
                        ?><div class="panel panel-default">
                            <div class="panel-heading">
                                <h3 class="panel-title">eMMC Disk Health</h3>
                            </div>
                            <div class="panel-body">
                                <?php if ($data["status"] === "error"): ?>
                                    <div class="alert alert-danger"><?php echo $data["message"]; ?></div>
                                <?php else: ?>
                                    <table class="table">
                                        <tr>
                                            <th>Life A</th>
                                            <td class="bg-<?php echo get_color_class($data['lifeA']); ?>"> <?php echo $data['lifeA']; ?>%</td>
                                        </tr>
                                        <tr>
                                            <th>Life B</th>
                                            <td class="bg-<?php echo get_color_class($data['lifeB']); ?>"> <?php echo $data['lifeB']; ?>%</td>
                                        </tr>
                                    </table>
                                <?php endif; ?>
                            </div>
                        </div>
                        

                        You can send it once a month. You can skip sending if eMMC is no longer the primary storage or if RAM disks are being used… Well, I don't need to explain to an experienced programmer how such issues can be handled. You could even store this data and the lock file for sending alerts on your own RAM disk.

                        <?php
                        require_once("functions.inc");
                        require_once("guiconfig.inc");
                        
                        // Define RAM disk path and ensure it exists
                        const RAMDISK_PATH = "/mnt/health/emmc_health_notify_time";
                        const RAMDISK_MOUNT_POINT = "/mnt/health";
                        const NOTIFY_INTERVAL = 2592000; // 30 days in seconds
                        
                        // Function to set up RAM disk if not already mounted
                        function setup_ramdisk() {
                            if (!is_dir(RAMDISK_MOUNT_POINT)) {
                                mkdir(RAMDISK_MOUNT_POINT, 0777, true);
                            }
                            
                            $mounted = trim(shell_exec("mount | grep ' " . RAMDISK_MOUNT_POINT . " '"));
                            
                            if (!$mounted) {
                                shell_exec("mdmfs -s 100M md " . RAMDISK_MOUNT_POINT);
                            }
                        }
                        
                        // Function to retrieve eMMC health data
                        function get_emmc_health() {
                            // Binary path and field names follow andrew_cb's tested version above
                            $cmd = "/usr/local/sbin/mmc extcsd read /dev/mmcsd0rpmb | egrep 'LIFE|EOL'";
                            $output = shell_exec($cmd);
                            
                            if (!$output) {
                                return ["status" => "error", "message" => "Failed to retrieve eMMC health data."];
                            }
                            
                            preg_match('/TYP_A\]:\s+(0x[0-9A-F]+)/i', $output, $matchA);
                            preg_match('/TYP_B\]:\s+(0x[0-9A-F]+)/i', $output, $matchB);
                            
                            $lifeA = isset($matchA[1]) ? hexdec($matchA[1]) * 10 : null;
                            $lifeB = isset($matchB[1]) ? hexdec($matchB[1]) * 10 : null;
                            
                            if (is_null($lifeA) || is_null($lifeB)) {
                                return ["status" => "error", "message" => "Invalid eMMC health data."];
                            }
                            
                            return ["status" => "ok", "lifeA" => $lifeA, "lifeB" => $lifeB];
                        }
                        
                        $data = get_emmc_health();
                        
                        // Determine color class based on wear level
                        function get_color_class($value) {
                            if ($value < 70) {
                                return "success"; // Green
                            } elseif ($value < 90) {
                                return "warning"; // Yellow
                            } else {
                                return "danger"; // Red
                            }
                        }
                        
                        // Check if email notification should be sent
                        function should_send_email() {
                            if (!file_exists(RAMDISK_PATH)) {
                                return true;
                            }
                            $last_sent = file_get_contents(RAMDISK_PATH);
                            return (time() - (int)$last_sent) > NOTIFY_INTERVAL;
                        }
                        
                        // Send email notification if wear level is critical
                        function send_emmc_alert($lifeA, $lifeB) {
                            global $config;
                            
                            if (!should_send_email()) {
                                return;
                            }
                            
                            $subject = "[pfSense] eMMC Wear Level Warning";
                            $message = "Warning: eMMC wear level is high!\n\n" .
                                       "Life A: {$lifeA}%\nLife B: {$lifeB}%\n\n" .
                                       "Consider replacing the storage device.";
                            
                            if ($lifeA >= 90 || $lifeB >= 90) {
                                notify_via_smtp($subject, $message);
                                file_put_contents(RAMDISK_PATH, time()); // Update last sent time on RAM disk
                            }
                        }
                        
                        // Ensure that email is sent only when eMMC is the boot disk and no RAM disk is used
                        function is_valid_environment() {
                            if (file_exists("/etc/rc.ramdisk")) {
                                return false; // RAM disk is enabled
                            }
                            $boot_disk = trim(shell_exec("mount | grep 'on / ' | awk '{print $1}'"));
                            return strpos($boot_disk, "mmcsd") !== false; // Ensure eMMC is the boot device
                        }
                        
                        // Set up RAM disk if necessary
                        setup_ramdisk();
                        
                        if ($data["status"] === "ok" && is_valid_environment()) {
                            send_emmc_alert($data["lifeA"], $data["lifeB"]);
                        }
                        ?><div class="panel panel-default">
                            <div class="panel-heading">
                                <h3 class="panel-title">eMMC Disk Health</h3>
                            </div>
                            <div class="panel-body">
                                <?php if ($data["status"] === "error"): ?>
                                    <div class="alert alert-danger"><?php echo $data["message"]; ?></div>
                                <?php else: ?>
                                    <table class="table">
                                        <tr>
                                            <th>Life A</th>
                                            <td class="bg-<?php echo get_color_class($data['lifeA']); ?>"> <?php echo $data['lifeA']; ?>%</td>
                                        </tr>
                                        <tr>
                                            <th>Life B</th>
                                            <td class="bg-<?php echo get_color_class($data['lifeB']); ?>"> <?php echo $data['lifeB']; ?>%</td>
                                        </tr>
                                    </table>
                                <?php endif; ?>
                            </div>
                        </div>
                        
                        • A
                          andrew_cb @w0w
                          last edited by andrew_cb

                          Someone reported a dead 4200 today, killed by ntopng in 10 months. The user was unaware of any risks from running ntopng on 16GB of eMMC, and there is no way to monitor the eMMC on the 4200. Luckily the device is still under warranty, so it's being replaced under RMA.

                          https://www.reddit.com/r/PFSENSE/s/fzeuC0icCQ

                          • M
                            Mission-Ghost
                            last edited by

                            Based on what I've learned from this thread, I added a 256GB Samsung SSD to my 4200 today, replacing the built-in drive, and it's working fine. Netgate's instructions had me hopping around from place to place in the documentation, but they did the job.

                            I don't want foreseeable future problems, so thank you to everyone who contributed here. Hopefully this will lead to a longer life than this box might have otherwise had.

                            • A
                              andrew_cb @Mission-Ghost
                              last edited by

                              @Mission-Ghost I am glad you found this thread useful. A 256GB SSD should last a long time!

                              • A
                                andrew_cb
                                last edited by andrew_cb

                                One thing that has always stood out to me about my data is the 8 devices with average write rates below 50KBps.

                                msedge_vwmqIilPr6.png

                                Today I checked our devices and confirmed that those 8 outliers are all running UFS and everything else is using ZFS.
                                Compared to the highest UFS rate, the ZFS rate is from 2.5x to 7.5x higher.

                                I also looked at some of the devices that have high storage wear. They are in smallish offices and are just doing basic functions. The only packages installed are Zabbix Agent and Zabbix Proxy. A few had the logging enabled for the default rules so I turned those off.

                                I tried to find a reason why all the devices using ZFS have such high average writes compared to the devices using UFS, but could find no explanation. We use a standardized configuration and nearly all devices are low-load, and just have the Zabbix packages. On most, the log entries for each category fit within the default 500 events shown. I copied a day's worth of general system log events into a text file - it was 38KB.
                                I went so far as to raise the update interval from 1 minute to 5 minutes for nearly all items in the Zabbix template, but that made no difference.

                                300KB/sec is 18MB/min, 1.1GB/hour, about 26GB/day, 9.5TB/year, 18.9TB/2 years, 28.4TB/3 years. This is in the ballpark for the maximum write life of the storage. No wonder we are seeing so many failures at the 2-3 year mark!

                                Comparatively, a device doing 50KB/sec would be at 4.7TB after 3 years and 9.5TB after 6 years.
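
The arithmetic above can be reproduced for any write rate with a small shell helper. This is only a sketch: the function name `tbw` is made up for illustration, and it assumes decimal units (1 TB = 10^9 KB) and a constant write rate over the whole period.

```shell
#!/bin/sh
# tbw RATE_KBPS YEARS -> total terabytes written (decimal units, 1 TB = 10^9 KB).
# Hypothetical helper; assumes the write rate stays constant for the whole period.
tbw() {
    LC_ALL=C awk -v rate="$1" -v years="$2" 'BEGIN {
        kb = rate * 86400 * 365 * years   # KB/s * seconds per day * days per year
        printf "%.2f\n", kb / 1e9
    }'
}

tbw 300 1   # ZFS-like rate for one year   -> 9.46
tbw 300 3   # ZFS-like rate for three years -> 28.38
tbw 50 6    # UFS-like rate for six years   -> 9.46
```

Comparing the result against the endurance figure on the eMMC vendor's datasheet, where one is available, gives a rough estimate of expected lifetime.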

                                This could explain why our older 3100 and 7100 units on UFS have lasted 6-7 years and the eMMC is still in good health, meanwhile we have many 4100 that have failed or are near death in only 2 years.

                                In his thread eMMC Write endurance, @keyser noted:

                                With ZFS, pfBlockerNG in default config with only 4 feeds loaded and NTopNG running, my box averages about 1 MB/s sustained write to the SSD.

                                My rate is only 700KBps lower (300KBps vs 1000KBps), yet I am not running pfBlockerNG or ntopng.

                                I will need to dig in deeper with iostat, top, and systat to try and find the cause of the writes. At this point it would appear that ZFS itself is the major cause of the increased write activity compared to UFS.
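
A possible starting point for that investigation, run from a pfSense (FreeBSD) shell. This is a sketch, not a tested procedure: the sampling intervals are arbitrary, the pool name `pfSense` is the one mentioned elsewhere in this thread, and `zpool iostat` is an addition to the three tools named above.

```shell
# Per-device extended statistics: 6 samples, 10 seconds apart.
# The kw/s column is kilobytes written per second; the first sample
# is the average since boot.
iostat -x -w 10 -c 6

# Per-process I/O view, sorted by writes, including system processes.
top -S -m io -o write

# Live device throughput, refreshed every 5 seconds.
systat -iostat 5

# Pool-level write bandwidth as ZFS sees it, every 10 seconds
# (pool name "pfSense" is an assumption; older installs may use "zroot").
zpool iostat pfSense 10
```

Comparing the `iostat` since-boot average on a UFS unit and a ZFS unit with the same configuration would show whether the difference is visible at the device level.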

                                • A
                                  andrew_cb @stephenw10
                                  last edited by

                                  @stephenw10 said in Another Netgate with storage failure, 6 in total so far:

                                  Hmm, not sure why the pkg isn't in the CE repo. I guess there wasn't much call for it at the time. Seems like we could add that pretty easily. Let me see....

                                  Did you have any luck getting mmc-utils added to the CE repo?

                                  • fireodoF
                                    fireodo @andrew_cb
                                    last edited by fireodo

                                    @andrew_cb said in Another Netgate with storage failure, 6 in total so far:

                                    I will need to dig in deeper with iostat, top, and systat to try and find the cause of the writes.

                                    Hi,

                                    I got a reduction from ~19 GB written/day to 1.8 GB written/day by using these settings:

                                    zfs set sync=disabled zroot/tmp    # pfSense/tmp on current installs
                                    zfs set sync=disabled zroot/var    # pfSense/var on current installs; on reviewing my settings I saw that I had this set to disabled as well
                                    

                                    and fine tuning:

                                    vfs.zfs.txg.timeout=120
                                    

                                    (The ZFS pool in my case is "zroot"; current installs use "pfSense".)

                                    Remark: this is a private system for private use.
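
To check what a given box is currently using before changing anything, something like the following should work. It is a sketch that assumes the pool is named `pfSense` as on current installs; substitute `zroot` on older ones. For reference, the OpenZFS default for `vfs.zfs.txg.timeout` is 5 seconds.

```shell
# Show the sync property for every dataset in the pool
# (pool name "pfSense" is an assumption; use "zroot" on older installs).
zfs get -r sync pfSense

# Show the current transaction group timeout in seconds (default: 5).
sysctl vfs.zfs.txg.timeout

# Apply the settings from this post at runtime.
zfs set sync=disabled pfSense/tmp
zfs set sync=disabled pfSense/var
sysctl vfs.zfs.txg.timeout=120
```

The `zfs set` properties are stored in the pool and survive a reboot. A sysctl changed from the shell does not, so it would also need to be added as a system tunable to persist.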
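
For comparison with the KBps figures quoted earlier in the thread, the daily totals above can be converted to an average write rate. This is a rough sketch with a made-up function name, assuming decimal units (1 GB = 10^6 KB) and writes spread evenly over the day.

```shell
#!/bin/sh
# gbday_to_kbps GB_PER_DAY -> average write rate in KB/s (1 GB = 10^6 KB).
# Hypothetical helper for illustration only.
gbday_to_kbps() {
    LC_ALL=C awk -v gb="$1" 'BEGIN { printf "%.1f\n", gb * 1e6 / 86400 }'
}

gbday_to_kbps 19    # before tuning -> 219.9
gbday_to_kbps 1.8   # after tuning  -> 20.8
```

On these numbers the tuned ZFS system writes less than the 50KBps that @andrew_cb measured on UFS devices, while the untuned rate is close to the 300KBps seen on ZFS devices.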

                                    Kettop Mi4300YL CPU: i5-4300Y @ 1.60GHz RAM: 8GB Ethernet Ports: 4
                                    SSD: SanDisk pSSD-S2 16GB (ZFS) WiFi: WLE200NX
                                    pfsense 2.8.0 CE
                                    Packages: Apcupsd, Cron, Iftop, Iperf, LCDproc, Nmap, pfBlockerNG, RRD_Summary, Shellcmd, Snort, Speedtest, System_Patches.

                                    • w0wW
                                      w0w @fireodo
                                      last edited by

                                      @fireodo
                                      A wonderful idea and discovery! It seems quite reasonable to disable sync on the tmp folder and use a 2-minute delay for transaction writes. This is a good alternative to RAM disks if they cannot be used for some reason.

                                      • fireodoF
                                        fireodo @w0w
                                        last edited by

                                        @w0w said in Another Netgate with storage failure, 6 in total so far:

                                        2 minutes delay

                                        PS. If you test, you can set the delay to greater values. The write rate will decrease, but you have a greater risk of losing data when a power failure occurs ... (it reduces the robustness of the ZFS filesystem)

                                        • w0wW
                                          w0w @fireodo
                                          last edited by

                                          @fireodo

                                          In the case of a firewall, I think it is acceptable.
                                          Most critical logs should be sent to an external syslog server, and I don't see any risks that could compromise the system. I can't think of any scenarios where this would be critical for pfSense, but I might be wrong. I don't know; some major updates are also managed by boot environments (BE) and shouldn't be affected.

                                          • P
                                            Patch @andrew_cb
                                            last edited by Patch

                                            @andrew_cb said in Another Netgate with storage failure, 6 in total so far:

                                            it would appear that ZFS itself is the major cause of the increased write activity

                                            That is my understanding. ZFS results in significant write amplification but as a result is more robust on power failure.

                                            But I thought later installs of pfsense did not use ZFS for temporary files.

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.