Another Netgate with storage failure, 6 in total so far
-
@w0w
I appreciate your input. My comments below are not targeted at you specifically, but I believe they are helpful for illustrating why I disagree with any "should have known" arguments.
it would be reasonable to assume that people buying it have some understanding of what they're purchasing. However, it seems that the topic of storage has somehow passed by a significant portion of users.
I disagree that it is a reasonable assumption to make. I have been working with firewalls for 20 years and have never had to consider the type of storage medium used. I also believe the purchaser's knowledge of storage types should be irrelevant in this matter.
Looking at the product page for the 6100, the two choices are as follows:
BASE 8GB Memory 16GB Storage
MAX 8GB Memory 128GB Storage
Further down, the storage options are clarified:
Storage: 16 GB eMMC (or optional 128 GB NVMe M.2 SSD)
and
Storage 16 GB eMMC (onboard - soldered) upgradeable to 128 GB NVMe M.2 SSD with 6100 Max
That is all the store page says with regard to storage.
The rest of the page is filled with performance ratings and all the great things that pfSense can do when using various packages.
Not including the header and footer, there are 1333 words on the page.
411 words, or about 31%, are about all the capabilities and benefits of pfSense. A mere 32 words, or 2%, are in the sentences related to storage.
There is absolutely nothing on the page that:
- Indicates that there are any differences between eMMC and regular SSD storage
- Indicates that some features/packages require an SSD and are not recommended for use with eMMC storage
- Gives endurance ratings for the eMMC and SSD storage to highlight the difference between them.
- Provides the purchaser with additional information to help inform and guide their purchasing decision.
Would you agree that if the choice of which type of storage to get is so critical, it should be significantly more prominent on the page?
We're talking about a complex network device
A major reason for purchasing a pre-built firewall from a vendor is to avoid the hassle and deep knowledge involved with building a custom device. Firewalls are a commodity item nowadays, and other vendors' firewalls can run IDS and IPS for years without storage failures. I have seen many 10+ year old SonicWall and Sophos firewalls do this without any issues.
If we revisit jwt's statements regarding storage media:
- The principal difference between eMMC and NVMe or SSD device is the amount of flash present on a typical eMMC vs. SSD or NVMe drive.
- Larger devices have more sectors and as a direct result, can engage "wear leveling" algorithms in the controller to spread the erase cycles across more sectors.
- Larger devices also cost more, due largely to market dynamics.
- Used within its limitations, eMMC is a good solution. Your phone likely has eMMC inside it. Many network devices, even from companies such as Cisco and HP/Juniper have eMMC inside them for storage.
- our [high] level of effort and engagement with Silicom
Which we can reduce down to:
- No major difference between eMMC and NVMe storage other than capacity
- Larger storage devices can wear-level better
- Larger storage devices cost more
- Netgate works closely with Silicom on the hardware that is used in their devices
Taking the above into consideration, in the absence of any stated warnings, cautions, limitations, recommendations, or disclaimers, a purchaser should be able to trust that what the vendor is offering is capable of performing the advertised functions.
Why should a purchaser or user be concerned about the difference when Netgate itself is arguing that eMMC storage is just as good as NVMe storage and makes no effort to distinguish the two other than by capacity?
The product page of the 1100 describes it as
the ideal microdevice for the home and small office network
It does not sound like the target market for the 1100 is people with many years of storage technology and Unix filesystem knowledge.
Yet the 1100, which is only available with eMMC storage and cannot be upgraded to an SSD, lists all the exact same pfSense features as the 8300 MAX.
But how can that be? Is it possible that there are some inaccuracies or that important information has been forgotten on the product pages?
-
@andrew_cb said in Another Netgate with storage failure, 6 in total so far:
I disagree that it is a reasonable assumption to make. I have been working with firewalls for 20 years and have never had to consider the type of storage medium used. I also believe the purchaser's knowledge of storage types should be irrelevant in this matter.
I don't have extensive experience with various firewalls, but I've come across cases on Reddit where Sophos internal storage failed, and even on forums, there were reports of failures with Cisco's FTD. I don't know the failure rate of such devices, but their price range is significantly higher. I'm not justifying anyone, but shit happens.
It also probably depends on usage conditions, settings, and many other factors.
Larger devices have more sectors and as a direct result, can engage "wear leveling" algorithms in the controller to spread the erase cycles across more sectors.
I would also note that if the minimum eMMC size were 16GB, we probably wouldn't be having this discussion right now.
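As a rough illustration of the wear-leveling point quoted above, here is a back-of-the-envelope sketch in PHP. It assumes ideal wear leveling across the whole device, and every number in it (3000 program/erase cycles, 20 GB of host writes per day, a write amplification factor of 4) is a hypothetical placeholder, not a datasheet value for any Netgate or Silicom part. The function name estimate_lifetime_years is made up for this example.

```php
<?php
// Rough endurance estimate. All inputs used below are hypothetical
// placeholders, not measured or datasheet values.

/**
 * Estimate the years until the rated program/erase cycles are used up,
 * assuming ideal wear leveling across the whole device.
 */
function estimate_lifetime_years(
    float $capacityGb,
    int $peCycles,
    float $hostWritesGbPerDay,
    float $writeAmplification
): float {
    // Total data the flash can absorb before reaching its rated cycles.
    $totalFlashWritesGb = $capacityGb * $peCycles;
    // Data physically written per day, after controller overhead.
    $flashWritesGbPerDay = $hostWritesGbPerDay * $writeAmplification;
    return $totalFlashWritesGb / $flashWritesGbPerDay / 365.0;
}

// Compare three capacities under the same (assumed) workload.
foreach ([16, 32, 128] as $capacityGb) {
    $years = estimate_lifetime_years($capacityGb, 3000, 20.0, 4.0);
    printf("%3d GB: about %.1f years\n", $capacityGb, $years);
}
```

Under this simple model the estimate scales linearly with capacity, so doubling the eMMC size doubles the expected life for the same workload. Real devices will deviate from this, since wear leveling is never ideal and workloads vary.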
@andrew_cb said in Another Netgate with storage failure, 6 in total so far:
Used within its limitations, eMMC is a good solution. Your phone likely has eMMC inside it.
Actually eMMC is going away from phones. UFS3.1 is a next level. But this is a bit off topic.
@andrew_cb said in Another Netgate with storage failure, 6 in total so far:
The product page of the 1100 describes it as
the ideal microdevice for the home and small office network
It does not sound like the target market for the 1100 is people with many years of storage technology and Unix filesystem knowledge.
Yet the 1100, which is only available with eMMC storage and cannot be upgraded to an SSD, lists all the exact same pfSense features as the 8300 MAX.
But how can that be? Is it possible that there are some inaccuracies or that important information has been forgotten on the product pages?
You can include it in the product description, but that falls under marketing.
And today's marketing trend is: never tell the customer something they didn't ask about.
Documentation, however, should probably contain footnotes and explanations. Or, as I already mentioned, perhaps every setting or checkbox that could potentially generate a large number of logs should have a footnote or a note for users explaining the consequences.
-
@w0w said in
I would also note that if the minimum eMMC size were 16GB, we probably wouldn't be having this discussion right now.
I think you meant to say "if the minimum eMMC size were NOT 16GB, we probably wouldn't be having this discussion right now."
And I agree: our 7100s, which come with 32GB of eMMC, seem to last about twice as long as our 4100s and 6100s. Silicom offers larger eMMC sizes on several models, so just increasing the minimum eMMC to 32 or 64GB would likely reduce this problem significantly.
Actually eMMC is going away from phones. UFS3.1 is a next level. But this is a bit off topic.
That is interesting to know!
You can include it in the product description, but that falls under marketing.
And today's marketing trend is: never tell the customer something they didn't ask about.
This is the #1 issue that is causing this whole problem. There is a lack of any useful information, yet when the storage fails, everyone is quick to blame the user for not knowing.
Documentation, however, should probably contain footnotes and explanations. Or, as I already mentioned, perhaps every setting or checkbox that could potentially generate a large number of logs should have a footnote or a note for users explaining the consequences.
I completely agree. I think both you and I have mentioned this several times.
-
@andrew_cb said in Another Netgate with storage failure, 6 in total so far:
I think you meant to say "if the minimum eMMC size were NOT 16GB
The 1100 and 2100 base units have 8 GB.
-
@andrew_cb said in Another Netgate with storage failure, 6 in total so far:
I think you meant to say "if the minimum eMMC size were NOT 16GB, we probably wouldn't be having this discussion right now."
Exactly!
I would even rephrase it to say that 32GB would likely be the minimum sufficient for something else to fail first, such as the power supply.
-
emmc_health.widget.php
<?php
require_once("functions.inc");
require_once("guiconfig.inc");

// Function to retrieve eMMC health data
function get_emmc_health() {
    $cmd = "/usr/local/bin/mmc extcsd read /dev/mmcsd0rpmb | egrep 'LIFE|EOL'";
    $output = shell_exec($cmd);
    if (!$output) {
        return ["status" => "error", "message" => "Failed to retrieve eMMC health data."];
    }
    preg_match('/LIFE_A\s+:\s+(0x[0-9A-F]+)/i', $output, $matchA);
    preg_match('/LIFE_B\s+:\s+(0x[0-9A-F]+)/i', $output, $matchB);
    // Each step of the reported value corresponds to 10% of rated life used
    $lifeA = isset($matchA[1]) ? hexdec($matchA[1]) * 10 : null;
    $lifeB = isset($matchB[1]) ? hexdec($matchB[1]) * 10 : null;
    if (is_null($lifeA) || is_null($lifeB)) {
        return ["status" => "error", "message" => "Invalid eMMC health data."];
    }
    return ["status" => "ok", "lifeA" => $lifeA, "lifeB" => $lifeB];
}

$data = get_emmc_health();

// Determine color class based on wear level
function get_color_class($value) {
    if ($value < 70) {
        return "success"; // Green
    } elseif ($value < 90) {
        return "warning"; // Yellow
    } else {
        return "danger"; // Red
    }
}

// Send email notification if wear level is critical
function send_emmc_alert($lifeA, $lifeB) {
    global $config;
    $subject = "[pfSense] eMMC Wear Level Warning";
    $message = "Warning: eMMC wear level is high!\n\n" .
        "Life A: {$lifeA}%\nLife B: {$lifeB}%\n\n" .
        "Consider replacing the storage device.";
    if ($lifeA >= 90 || $lifeB >= 90) {
        notify_via_smtp($subject, $message);
    }
}

if ($data["status"] === "ok") {
    send_emmc_alert($data["lifeA"], $data["lifeB"]);
}
?>
<div class="panel panel-default">
    <div class="panel-heading">
        <h3 class="panel-title">eMMC Disk Health</h3>
    </div>
    <div class="panel-body">
        <?php if ($data["status"] === "error"): ?>
            <div class="alert alert-danger"><?php echo $data["message"]; ?></div>
        <?php else: ?>
            <table class="table">
                <tr>
                    <th>Life A</th>
                    <td class="bg-<?php echo get_color_class($data['lifeA']); ?>"> <?php echo $data['lifeA']; ?>%</td>
                </tr>
                <tr>
                    <th>Life B</th>
                    <td class="bg-<?php echo get_color_class($data['lifeB']); ?>"> <?php echo $data['lifeB']; ?>%</td>
                </tr>
            </table>
        <?php endif; ?>
    </div>
</div>
- Place the Widget File
Make sure your widget file (e.g., emmc_health.widget.php) is located in:
/usr/local/www/widgets/widgets/
- Register the Widget in widgets/widgets.inc
Edit the file:
/usr/local/www/widgets/widgets.inc
Add the following line to register the widget:
$widgets["emmc_health"] = "eMMC Disk Health";
This ensures the widget appears in the dashboard widget selection menu.
- Ensure Permissions
Run the following command to set the correct permissions:
chmod 644 /usr/local/www/widgets/widgets/emmc_health.widget.php
- Reload the Dashboard
Go to Status → Dashboard in the pfSense web UI.
Click on "+" (Add Widget) at the top-right.
Find "eMMC Disk Health" in the list and add it.
- Verify the Widget
Ensure that the widget loads correctly and displays the expected values.
I don't know if this will work, but this is the code that ChatGPT put together with me in 15 minutes.
-
@w0w Thanks for doing this!
I tried out the script and it needed a few modifications to make it work for me. I also added a function to automatically install mmc-utils if needed.
The widgets.inc file does not need to be modified; the widget is picked up automatically as long as the file name ends with '.widget.php'.
Here are the revised instructions:
Code for emmc_health.widget.php:
<?php
require_once("functions.inc");
require_once("guiconfig.inc");

// Function to retrieve eMMC health data
function get_emmc_health() {
    $cmd = "/usr/local/sbin/mmc extcsd read /dev/mmcsd0rpmb | egrep 'LIFE|EOL'";
    $output = shell_exec($cmd);
    if (!$output) {
        return ["status" => "error", "message" => "Failed to retrieve eMMC health data."];
    }

    // Explode the output into separate lines
    $outputArray = explode("\n", $output);

    // Get the value of 'TYP_A' (SLC) wear
    preg_match('/.*TYP_A]:\s+(0x[0-9A-F]+)/i', $outputArray[0], $matchA);
    // Get the value of 'TYP_B' (MLC) wear
    preg_match('/.*TYP_B]:\s+(0x[0-9A-F]+)/i', $outputArray[1] ?? "", $matchB);

    // Convert the wear values from hex to decimal
    $lifeA = isset($matchA[1]) ? hexdec($matchA[1]) * 10 : null;
    $lifeB = isset($matchB[1]) ? hexdec($matchB[1]) * 10 : null;
    if (is_null($lifeA) || is_null($lifeB)) {
        return ["status" => "error", "message" => "Invalid eMMC health data."];
    }
    return ["status" => "ok", "lifeA" => $lifeA, "lifeB" => $lifeB];
}

// Determine color class based on wear level
function get_color_class($value) {
    if ($value < 70) {
        return "success"; // Green
    } elseif ($value < 90) {
        return "warning"; // Yellow
    } else {
        return "danger"; // Red
    }
}

// Send email notification if wear level is critical
function send_emmc_alert($lifeA, $lifeB) {
    global $config;
    $subject = "[pfSense] eMMC Wear Level Warning";
    $message = "Warning: eMMC wear level is high!\n\n" .
        "Life A: {$lifeA}%\nLife B: {$lifeB}%\n\n" .
        "Consider replacing the storage device.";
    if ($lifeA >= 90 || $lifeB >= 90) {
        notify_via_smtp($subject, $message);
    }
}

// Check for the mmc-utils binary and install it if missing
function install_mmc_utils() {
    if (file_exists("/usr/local/sbin/mmc")) {
        return ["status" => "ok"];
    }
    // exec() reports the command's exit code through its third argument
    exec("pkg install -y mmc-utils", $pkgOutput, $code);
    if ($code != 0) {
        return ["status" => "error", "message" => "Failed to install mmc-utils."];
    }
    return ["status" => "ok"];
}

// Main program logic
// Make sure mmc-utils is available, then get the eMMC health data
$install = install_mmc_utils();
$data = ($install["status"] === "ok") ? get_emmc_health() : $install;

// If the health data was read successfully, send an email when wear is critical
if ($data["status"] === "ok") {
    send_emmc_alert($data["lifeA"], $data["lifeB"]);
}

// Format the data into HTML for display in the widget
?>
<div class="panel panel-default">
    <div class="panel-heading">
        <h3 class="panel-title">eMMC Disk Health</h3>
    </div>
    <div class="panel-body">
        <?php if ($data["status"] === "error"): ?>
            <div class="alert alert-danger"><?php echo $data["message"]; ?></div>
        <?php else: ?>
            <table class="table">
                <tr>
                    <th>Type A Wear (Lower is better)</th>
                    <td class="bg-<?php echo get_color_class($data['lifeA']); ?>"> <?php echo $data['lifeA']; ?>%</td>
                </tr>
                <tr>
                    <th>Type B Wear (Lower is better)</th>
                    <td class="bg-<?php echo get_color_class($data['lifeB']); ?>"> <?php echo $data['lifeB']; ?>%</td>
                </tr>
            </table>
        <?php endif; ?>
    </div>
</div>
- Navigate to Diagnostics > File Editor.
Paste the code for emmc_health.widget.php (above) into the editor.
Paste the following path into the Path to file to be edited box and select Save (the file will automatically be created):
/usr/local/www/widgets/widgets/emmc_health.widget.php
- Navigate to Diagnostics > Command Prompt and run the following command to set the file permissions:
chmod 644 /usr/local/www/widgets/widgets/emmc_health.widget.php
- Navigate to Status > Dashboard.
Select the "+" button from the top-right.
Select Emmc Health from the list.
- The Emmc Health widget will be added to the bottom of the page. Move it up top so it is easily visible.
Select the Save button at the top-right to save the dashboard layout.
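The widget's egrep also selects the EOL line, but the code only uses the two wear values. A minimal sketch of parsing all three fields is below. The sample text is an assumed output format for mmc extcsd read, not captured from a real device, and the names parse_emmc_health and describe_pre_eol are made up for this example. Matching on the field tags rather than on line positions means the order of the lines does not matter.

```php
<?php
// Sketch: parse the three health fields out of `mmc extcsd read` output.
// The sample text further down is an ASSUMED format, not real device output.

function parse_emmc_health(string $output): ?array
{
    $fields = [
        'lifeA'  => 'LIFE_TIME_EST_TYP_A',
        'lifeB'  => 'LIFE_TIME_EST_TYP_B',
        'preEol' => 'PRE_EOL_INFO',
    ];
    $result = [];
    foreach ($fields as $key => $tag) {
        // Match "...TAG]: 0xNN" anywhere in the output, regardless of line order.
        if (!preg_match('/' . $tag . '\]:\s+0x([0-9A-F]+)/i', $output, $m)) {
            return null; // A field is missing: treat the whole reading as invalid.
        }
        $result[$key] = hexdec($m[1]);
    }
    return $result;
}

function describe_pre_eol(int $value): string
{
    // PRE_EOL_INFO values as defined for eMMC 5.x: 1, 2 and 3.
    $labels = [1 => 'normal', 2 => 'warning', 3 => 'urgent'];
    return $labels[$value] ?? 'unknown';
}

$sample = "eMMC Life Time Estimation A [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A]: 0x03\n"
    . "eMMC Life Time Estimation B [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B]: 0x0a\n"
    . "eMMC Pre EOL information [EXT_CSD_PRE_EOL_INFO]: 0x02\n";

$health = parse_emmc_health($sample);
if ($health !== null) {
    printf(
        "A: up to %d%% used, B: up to %d%% used, pre-EOL: %s\n",
        $health['lifeA'] * 10,
        $health['lifeB'] * 10,
        describe_pre_eol($health['preEol'])
    );
}
```

Showing the pre-EOL state next to the two wear values would give the dashboard a second, independent signal that the device is running out of spare blocks.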
-
Probably want some way to limit or suppress the number of alerts/emails. Those values never go back so you could end up with.... a lot!
You might also argue that since it only does it when opening the dashboard an alert shown there might be better. Or maybe both.
-
@stephenw10 said in Another Netgate with storage failure, 6 in total so far:
Probably want some way to limit or suppress the number of alerts/emails. Those values never go back so you could end up with.... a lot!
You might also argue that since it only does it when opening the dashboard an alert shown there might be better. Or maybe both.
Good suggestions!
I was already thinking of using a temp file to store the health data and only updating it when it is older than a certain age. A similar thing could be done to set a flag/rate limiter for alerting.
Ideally, the health check would run as a cron job and store the latest data in a file so that it works in the background, and then the dashboard would read the file instead of having to run the check every time the dashboard is loaded.
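The cache-file idea could look roughly like the sketch below. It is only an illustration: get_cached_health, the cache path, and the one-hour max age are hypothetical choices, and the refresh callback here returns stand-in data instead of calling mmc. On a real box the cache file would ideally live on a RAM-backed filesystem so the cache itself does not add eMMC writes.

```php
<?php
// Sketch of a file-backed cache; path and max age are hypothetical choices.

/**
 * Return cached data if the cache file is younger than $maxAgeSeconds,
 * otherwise call $refresh(), store its result, and return it.
 */
function get_cached_health(string $cacheFile, int $maxAgeSeconds, callable $refresh): array
{
    clearstatcache(true, $cacheFile); // Make sure we see the file's current mtime.
    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $maxAgeSeconds) {
        $cached = json_decode((string) file_get_contents($cacheFile), true);
        if (is_array($cached)) {
            return $cached; // Fresh enough: no device query, no write.
        }
    }
    $data = $refresh();
    file_put_contents($cacheFile, json_encode($data), LOCK_EX);
    return $data;
}

// Usage with a stand-in refresh function instead of a real mmc call.
$cacheFile = sys_get_temp_dir() . '/emmc_health_example.json';
@unlink($cacheFile);
$calls = 0;
$refresh = function () use (&$calls): array {
    $calls++;
    return ['status' => 'ok', 'lifeA' => 30, 'lifeB' => 20];
};

$first  = get_cached_health($cacheFile, 3600, $refresh); // runs $refresh
$second = get_cached_health($cacheFile, 3600, $refresh); // served from the file
printf("refresh ran %d time(s)\n", $calls);
```

The same function could be called from both a cron script and the widget, so whichever runs first after the cache expires does the actual check.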
-
@stephenw10 said in Another Netgate with storage failure, 6 in total so far:
Probably want some way to limit or suppress the number of alerts/emails. Those values never go back so you could end up with.... a lot!
Each of which will trigger a write...
-
Yes, you are right.
This was just a sample to start with.
Here is another idea:
<?php
require_once("functions.inc");
require_once("guiconfig.inc");

// Path for the timestamp file to limit email notifications
const NOTIFY_TIMESTAMP_FILE = "/var/db/emmc_health_notify_time";
const NOTIFY_INTERVAL = 2592000; // 30 days in seconds

// Function to retrieve eMMC health data
function get_emmc_health() {
    // Binary path and patterns follow the revised widget posted earlier in the thread
    $cmd = "/usr/local/sbin/mmc extcsd read /dev/mmcsd0rpmb | egrep 'LIFE|EOL'";
    $output = shell_exec($cmd);
    if (!$output) {
        return ["status" => "error", "message" => "Failed to retrieve eMMC health data."];
    }
    preg_match('/TYP_A\]:\s+(0x[0-9A-F]+)/i', $output, $matchA);
    preg_match('/TYP_B\]:\s+(0x[0-9A-F]+)/i', $output, $matchB);
    $lifeA = isset($matchA[1]) ? hexdec($matchA[1]) * 10 : null;
    $lifeB = isset($matchB[1]) ? hexdec($matchB[1]) * 10 : null;
    if (is_null($lifeA) || is_null($lifeB)) {
        return ["status" => "error", "message" => "Invalid eMMC health data."];
    }
    return ["status" => "ok", "lifeA" => $lifeA, "lifeB" => $lifeB];
}

$data = get_emmc_health();

// Determine color class based on wear level
function get_color_class($value) {
    if ($value < 70) {
        return "success"; // Green
    } elseif ($value < 90) {
        return "warning"; // Yellow
    } else {
        return "danger"; // Red
    }
}

// Check if email notification should be sent
function should_send_email() {
    if (!file_exists(NOTIFY_TIMESTAMP_FILE)) {
        return true;
    }
    $last_sent = file_get_contents(NOTIFY_TIMESTAMP_FILE);
    return (time() - (int)$last_sent) > NOTIFY_INTERVAL;
}

// Send email notification if wear level is critical
function send_emmc_alert($lifeA, $lifeB) {
    global $config;
    if (!should_send_email()) {
        return;
    }
    $subject = "[pfSense] eMMC Wear Level Warning";
    $message = "Warning: eMMC wear level is high!\n\n" .
        "Life A: {$lifeA}%\nLife B: {$lifeB}%\n\n" .
        "Consider replacing the storage device.";
    if ($lifeA >= 90 || $lifeB >= 90) {
        notify_via_smtp($subject, $message);
        file_put_contents(NOTIFY_TIMESTAMP_FILE, time()); // Update last sent time
    }
}

// Ensure that email is sent only when eMMC is the boot disk and no RAM disk is used
function is_valid_environment() {
    if (file_exists("/etc/rc.ramdisk")) {
        return false; // RAM disk is enabled
    }
    $boot_disk = trim(shell_exec("mount | grep 'on / ' | awk '{print $1}'"));
    return strpos($boot_disk, "mmcsd") !== false; // Ensure eMMC is the boot device
}

if ($data["status"] === "ok" && is_valid_environment()) {
    send_emmc_alert($data["lifeA"], $data["lifeB"]);
}
?>
<div class="panel panel-default">
    <div class="panel-heading">
        <h3 class="panel-title">eMMC Disk Health</h3>
    </div>
    <div class="panel-body">
        <?php if ($data["status"] === "error"): ?>
            <div class="alert alert-danger"><?php echo $data["message"]; ?></div>
        <?php else: ?>
            <table class="table">
                <tr>
                    <th>Life A</th>
                    <td class="bg-<?php echo get_color_class($data['lifeA']); ?>"> <?php echo $data['lifeA']; ?>%</td>
                </tr>
                <tr>
                    <th>Life B</th>
                    <td class="bg-<?php echo get_color_class($data['lifeB']); ?>"> <?php echo $data['lifeB']; ?>%</td>
                </tr>
            </table>
        <?php endif; ?>
    </div>
</div>
You can send it once a month. You can skip sending if eMMC is no longer the primary storage or if RAM disks are being used… Well, I don't need to explain to an experienced programmer how such issues can be handled. You could even store this data and the lock file for sending alerts on your own RAM disk.
<?php
require_once("functions.inc");
require_once("guiconfig.inc");

// Define RAM disk path and ensure it exists
const RAMDISK_PATH = "/mnt/health/emmc_health_notify_time";
const RAMDISK_MOUNT_POINT = "/mnt/health";
const NOTIFY_INTERVAL = 2592000; // 30 days in seconds

// Function to set up RAM disk if not already mounted
function setup_ramdisk() {
    if (!is_dir(RAMDISK_MOUNT_POINT)) {
        mkdir(RAMDISK_MOUNT_POINT, 0777, true);
    }
    $mounted = trim(shell_exec("mount | grep ' " . RAMDISK_MOUNT_POINT . " '"));
    if (!$mounted) {
        shell_exec("mdmfs -s 100M md " . RAMDISK_MOUNT_POINT);
    }
}

// Function to retrieve eMMC health data
function get_emmc_health() {
    // Binary path and patterns follow the revised widget posted earlier in the thread
    $cmd = "/usr/local/sbin/mmc extcsd read /dev/mmcsd0rpmb | egrep 'LIFE|EOL'";
    $output = shell_exec($cmd);
    if (!$output) {
        return ["status" => "error", "message" => "Failed to retrieve eMMC health data."];
    }
    preg_match('/TYP_A\]:\s+(0x[0-9A-F]+)/i', $output, $matchA);
    preg_match('/TYP_B\]:\s+(0x[0-9A-F]+)/i', $output, $matchB);
    $lifeA = isset($matchA[1]) ? hexdec($matchA[1]) * 10 : null;
    $lifeB = isset($matchB[1]) ? hexdec($matchB[1]) * 10 : null;
    if (is_null($lifeA) || is_null($lifeB)) {
        return ["status" => "error", "message" => "Invalid eMMC health data."];
    }
    return ["status" => "ok", "lifeA" => $lifeA, "lifeB" => $lifeB];
}

$data = get_emmc_health();

// Determine color class based on wear level
function get_color_class($value) {
    if ($value < 70) {
        return "success"; // Green
    } elseif ($value < 90) {
        return "warning"; // Yellow
    } else {
        return "danger"; // Red
    }
}

// Check if email notification should be sent
function should_send_email() {
    if (!file_exists(RAMDISK_PATH)) {
        return true;
    }
    $last_sent = file_get_contents(RAMDISK_PATH);
    return (time() - (int)$last_sent) > NOTIFY_INTERVAL;
}

// Send email notification if wear level is critical
function send_emmc_alert($lifeA, $lifeB) {
    global $config;
    if (!should_send_email()) {
        return;
    }
    $subject = "[pfSense] eMMC Wear Level Warning";
    $message = "Warning: eMMC wear level is high!\n\n" .
        "Life A: {$lifeA}%\nLife B: {$lifeB}%\n\n" .
        "Consider replacing the storage device.";
    if ($lifeA >= 90 || $lifeB >= 90) {
        notify_via_smtp($subject, $message);
        file_put_contents(RAMDISK_PATH, time()); // Update last sent time on RAM disk
    }
}

// Ensure that email is sent only when eMMC is the boot disk and no RAM disk is used
function is_valid_environment() {
    if (file_exists("/etc/rc.ramdisk")) {
        return false; // RAM disk is enabled
    }
    $boot_disk = trim(shell_exec("mount | grep 'on / ' | awk '{print $1}'"));
    return strpos($boot_disk, "mmcsd") !== false; // Ensure eMMC is the boot device
}

// Set up RAM disk if necessary
setup_ramdisk();

if ($data["status"] === "ok" && is_valid_environment()) {
    send_emmc_alert($data["lifeA"], $data["lifeB"]);
}
?>
<div class="panel panel-default">
    <div class="panel-heading">
        <h3 class="panel-title">eMMC Disk Health</h3>
    </div>
    <div class="panel-body">
        <?php if ($data["status"] === "error"): ?>
            <div class="alert alert-danger"><?php echo $data["message"]; ?></div>
        <?php else: ?>
            <table class="table">
                <tr>
                    <th>Life A</th>
                    <td class="bg-<?php echo get_color_class($data['lifeA']); ?>"> <?php echo $data['lifeA']; ?>%</td>
                </tr>
                <tr>
                    <th>Life B</th>
                    <td class="bg-<?php echo get_color_class($data['lifeB']); ?>"> <?php echo $data['lifeB']; ?>%</td>
                </tr>
            </table>
        <?php endif; ?>
    </div>
</div>
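A different way to limit alerts, instead of the time-based interval, would be to alert only when the wear value reaches a level higher than the one already reported. Since the values never go back down, that gives at most one email per step. The sketch below is only an illustration of that idea: should_alert, check_and_record, and the state-file location are hypothetical names, and the 90% threshold is taken from the widget code.

```php
<?php
// Sketch: alert only when wear reaches a higher level than the last alert.
// Function names and the state-file location are hypothetical.

/**
 * Decide whether to alert, given the current wear percentage and the
 * highest percentage already alerted on (null if none yet).
 */
function should_alert(int $wearPercent, ?int $lastAlerted, int $threshold = 90): bool
{
    if ($wearPercent < $threshold) {
        return false; // Below the critical level: never alert.
    }
    return $lastAlerted === null || $wearPercent > $lastAlerted;
}

/**
 * Check wear against the stored state and write the state file only
 * when an alert is actually due, so repeated checks cause no writes.
 */
function check_and_record(string $stateFile, int $wearPercent): bool
{
    $lastAlerted = is_file($stateFile) ? (int) file_get_contents($stateFile) : null;
    if (!should_alert($wearPercent, $lastAlerted)) {
        return false;
    }
    file_put_contents($stateFile, (string) $wearPercent, LOCK_EX);
    return true; // The caller would send the notification at this point.
}

// Usage: simulate four successive checks as wear increases.
$stateFile = sys_get_temp_dir() . '/emmc_alert_state_example';
@unlink($stateFile);
$results = [];
foreach ([80, 90, 90, 100] as $wear) {
    $results[] = check_and_record($stateFile, $wear);
}
var_export($results);
```

With this approach the state file is written only when an alert fires, which also addresses the point that every alert triggers a write.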
-
Someone reported a dead 4200 today, killed by ntopng in 10 months. The user was unaware of any risks from running ntopng on 16GB of eMMC, and there is no way to monitor the eMMC on the 4200. Luckily the device is still under warranty, so it's being replaced under RMA.