Another Netgate with storage failure, 6 in total so far
-
@w0w said in
I would also note that if the minimum eMMC size were 16GB, we probably wouldn't be having this discussion right now.
I think you meant to say "if the minimum eMMC size were NOT 16GB, we probably wouldn't be having this discussion right now."
And I agree: our 7100s, which come with 32GB of eMMC, seem to last about twice as long as our 4100s and 6100s. Silicom offers larger eMMC sizes on several models, so just increasing the minimum eMMC to 32 or 64GB would likely reduce this problem significantly.
Actually, eMMC is going away from phones. UFS 3.1 is the next level. But this is a bit off topic.
That is interesting to know!
You can include it in the product description, but that falls under marketing.
And today's marketing trend is: never tell the customer something they didn't ask about.
This is the #1 issue that is causing this whole problem. A lack of any useful information, but when the storage fails, everyone is quick to blame the user for not knowing.
Documentation, however, should probably contain footnotes and explanations. Or, as I already mentioned, perhaps every setting or checkbox that could potentially generate a large number of logs should have a footnote or a note for users explaining the consequences.
I completely agree. I think both you and I have mentioned this several times.
-
@andrew_cb said in Another Netgate with storage failure, 6 in total so far:
I think you meant to say "if the minimum eMMC size were NOT 16GB
The 1100 and 2100 base units have 8 GB.
-
@andrew_cb said in Another Netgate with storage failure, 6 in total so far:
I think you meant to say "if the minimum eMMC size were NOT 16GB, we probably wouldn't be having this discussion right now.
Exactly!
I would even rephrase it to say that 32GB would likely be the minimum sufficient for something else to fail first, such as the power supply.
emmc_health.widget.php
<?php
require_once("functions.inc");
require_once("guiconfig.inc");

// Function to retrieve eMMC health data
function get_emmc_health() {
    $cmd = "/usr/local/bin/mmc extcsd read /dev/mmcsd0rpmb | egrep 'LIFE|EOL'";
    $output = shell_exec($cmd);
    if (!$output) {
        return ["status" => "error", "message" => "Failed to retrieve eMMC health data."];
    }
    preg_match('/LIFE_A\s+:\s+(0x[0-9A-F]+)/i', $output, $matchA);
    preg_match('/LIFE_B\s+:\s+(0x[0-9A-F]+)/i', $output, $matchB);
    // Convert the hex wear values to a percentage (each step is 10%)
    $lifeA = isset($matchA[1]) ? hexdec($matchA[1]) * 10 : null;
    $lifeB = isset($matchB[1]) ? hexdec($matchB[1]) * 10 : null;
    if (is_null($lifeA) || is_null($lifeB)) {
        return ["status" => "error", "message" => "Invalid eMMC health data."];
    }
    return ["status" => "ok", "lifeA" => $lifeA, "lifeB" => $lifeB];
}

$data = get_emmc_health();

// Determine color class based on wear level
function get_color_class($value) {
    if ($value < 70) {
        return "success"; // Green
    } elseif ($value < 90) {
        return "warning"; // Yellow
    } else {
        return "danger"; // Red
    }
}

// Send email notification if wear level is critical
function send_emmc_alert($lifeA, $lifeB) {
    global $config;
    $subject = "[pfSense] eMMC Wear Level Warning";
    $message = "Warning: eMMC wear level is high!\n\n" .
        "Life A: {$lifeA}%\nLife B: {$lifeB}%\n\n" .
        "Consider replacing the storage device.";
    if ($lifeA >= 90 || $lifeB >= 90) {
        notify_via_smtp($subject, $message);
    }
}

if ($data["status"] === "ok") {
    send_emmc_alert($data["lifeA"], $data["lifeB"]);
}
?>
<div class="panel panel-default">
    <div class="panel-heading">
        <h3 class="panel-title">eMMC Disk Health</h3>
    </div>
    <div class="panel-body">
        <?php if ($data["status"] === "error"): ?>
            <div class="alert alert-danger"><?php echo $data["message"]; ?></div>
        <?php else: ?>
            <table class="table">
                <tr>
                    <th>Life A</th>
                    <td class="bg-<?php echo get_color_class($data['lifeA']); ?>"><?php echo $data['lifeA']; ?>%</td>
                </tr>
                <tr>
                    <th>Life B</th>
                    <td class="bg-<?php echo get_color_class($data['lifeB']); ?>"><?php echo $data['lifeB']; ?>%</td>
                </tr>
            </table>
        <?php endif; ?>
    </div>
</div>
- Place the Widget File
Make sure your widget file (e.g., emmc_health.widget.php) is located in:
/usr/local/www/widgets/widgets/
- Register the Widget in widgets/widgets.inc
Edit the file:
/usr/local/www/widgets/widgets.inc
Add the following line to register the widget:
$widgets["emmc_health"] = "eMMC Disk Health";
This ensures the widget appears in the dashboard widget selection menu.
- Ensure Permissions
Run the following command to set the correct permissions:
chmod 644 /usr/local/www/widgets/widgets/emmc_health.widget.php
- Reload the Dashboard
Go to Status → Dashboard in the pfSense web UI.
Click on "+" (Add Widget) at the top-right.
Find "eMMC Disk Health" in the list and add it.
- Verify the Widget
Ensure that the widget loads correctly and displays the expected values.
I don't know if this will work, but this is the code that ChatGPT put together with me in 15 minutes.
-
@w0w Thanks for doing this!
I tried out the script and it needed a few modifications to make it work for me. I also added a function to automatically install mmc-utils if needed.
The widgets.inc file does not need to be modified; the dashboard will automatically pick up the widget as long as the file name ends with '.widget.php'.
Here are the revised instructions:
Code for emmc_health.widget.php:
<?php
require_once("functions.inc");
require_once("guiconfig.inc");

// Function to retrieve eMMC health data
function get_emmc_health() {
    $cmd = "/usr/local/sbin/mmc extcsd read /dev/mmcsd0rpmb | egrep 'LIFE|EOL'";
    $output = shell_exec($cmd);
    if (!$output) {
        return ["status" => "error", "message" => "Failed to retrieve eMMC health data."];
    }

    // Explode the output into separate lines
    $outputArray = explode("\n", $output);

    // Get the value of 'TYP_A' (SLC) wear
    preg_match('/.*TYP_A]:\s+(0x[0-9A-F]+)/i', $outputArray[0], $matchA);

    // Get the value of 'TYP_B' (MLC) wear
    preg_match('/.*TYP_B]:\s+(0x[0-9A-F]+)/i', $outputArray[1], $matchB);

    // Convert the wear values from hex to decimal (each step is 10%)
    $lifeA = isset($matchA[1]) ? hexdec($matchA[1]) * 10 : null;
    $lifeB = isset($matchB[1]) ? hexdec($matchB[1]) * 10 : null;
    if (is_null($lifeA) || is_null($lifeB)) {
        return ["status" => "error", "message" => "Invalid eMMC health data."];
    }
    return ["status" => "ok", "lifeA" => $lifeA, "lifeB" => $lifeB];
}

// Determine color class based on wear level
function get_color_class($value) {
    if ($value < 70) {
        return "success"; // Green
    } elseif ($value < 90) {
        return "warning"; // Yellow
    } else {
        return "danger"; // Red
    }
}

// Send email notification if wear level is critical
function send_emmc_alert($lifeA, $lifeB) {
    global $config;
    $subject = "[pfSense] eMMC Wear Level Warning";
    $message = "Warning: eMMC wear level is high!\n\n" .
        "Life A: {$lifeA}%\nLife B: {$lifeB}%\n\n" .
        "Consider replacing the storage device.";
    if ($lifeA >= 90 || $lifeB >= 90) {
        notify_via_smtp($subject, $message);
    }
}

// Check for the mmc-utils binary and install it if missing
// Note: the main program logic below does not call this function.
function install_mmc_utils() {
    if (file_exists("/usr/local/sbin/mmc")) {
        return ["status" => "ok"];
    }
    // exec() fills $out with the command output and $code with the exit status
    exec("pkg install -y mmc-utils", $out, $code);
    if ($code !== 0) {
        return ["status" => "error", "message" => "Failed to install mmc-utils."];
    }
    return ["status" => "ok"];
}

// Main program logic

// Get the eMMC health data
$data = get_emmc_health();

// If the health data was read successfully, send an email when the wear level is critical
if ($data["status"] === "ok") {
    send_emmc_alert($data["lifeA"], $data["lifeB"]);
}

// Format the data into HTML for display in the widget
?>
<div class="panel panel-default">
    <div class="panel-heading">
        <h3 class="panel-title">eMMC Disk Health</h3>
    </div>
    <div class="panel-body">
        <?php if ($data["status"] === "error"): ?>
            <div class="alert alert-danger"><?php echo $data["message"]; ?></div>
        <?php else: ?>
            <table class="table">
                <tr>
                    <th>Type A Wear (Lower is better)</th>
                    <td class="bg-<?php echo get_color_class($data['lifeA']); ?>"><?php echo $data['lifeA']; ?>%</td>
                </tr>
                <tr>
                    <th>Type B Wear (Lower is better)</th>
                    <td class="bg-<?php echo get_color_class($data['lifeB']); ?>"><?php echo $data['lifeB']; ?>%</td>
                </tr>
            </table>
        <?php endif; ?>
    </div>
</div>
- Navigate to Diagnostics > File Editor.
Paste the code for emmc_health.widget.php (above) into the editor.
Paste the following path into the Path to file to be edited box and select Save (the file will automatically be created):
/usr/local/www/widgets/widgets/emmc_health.widget.php
- Navigate to Diagnostics > Command Prompt and run the following command to set the file permissions:
chmod 644 /usr/local/www/widgets/widgets/emmc_health.widget.php
- Navigate to Status > Dashboard.
Select the "+" button from the top-right.
Select Emmc Health from the list.
- The Emmc Health widget will be added to the bottom of the page. Move it to the top so it is easily visible.
Select the Save button at the top-right to save the dashboard layout.
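The egrep in the widget also matches the EOL line, but the widget only displays the two wear values. Here is a rough sketch of how all three registers could be parsed and labelled. The function names (parse_emmc_report, describe_wear, describe_pre_eol) and the sample text are made up for illustration, and the sample assumes the output format that the regex above matches. The value meanings follow the JEDEC eMMC definitions: 0x01 to 0x0A are 10% wear steps, 0x0B means the estimated lifetime is exceeded, and the pre-EOL values 1, 2 and 3 mean normal, warning and urgent.

```php
<?php
// Sketch only: helper names and the sample text are illustrative assumptions.

// Pull the two wear registers and the pre-EOL register out of the mmc output.
// Any register that is not found is returned as null.
function parse_emmc_report(string $output): array
{
    $fields = [
        'lifeA'  => 'DEVICE_LIFE_TIME_EST_TYP_A',
        'lifeB'  => 'DEVICE_LIFE_TIME_EST_TYP_B',
        'preEol' => 'PRE_EOL_INFO',
    ];
    $result = [];
    foreach ($fields as $key => $register) {
        // Capture only the hex digits that follow "0x"
        $pattern = '/' . $register . '\]:\s*0x([0-9A-F]+)/i';
        $result[$key] = preg_match($pattern, $output, $m) ? (int) hexdec($m[1]) : null;
    }
    return $result;
}

// Turn a wear register value into the range it stands for.
function describe_wear(?int $level): string
{
    if ($level === null) {
        return 'not reported';
    }
    if ($level === 0) {
        return 'not defined';
    }
    if ($level <= 10) {
        return sprintf('%d-%d%% used', ($level - 1) * 10, $level * 10);
    }
    return $level === 11 ? 'estimated lifetime exceeded' : 'reserved value';
}

// Turn the pre-EOL register value into a label.
function describe_pre_eol(?int $value): string
{
    $labels = [1 => 'normal', 2 => 'warning', 3 => 'urgent'];
    if ($value === null || !isset($labels[$value])) {
        return 'unknown';
    }
    return $labels[$value];
}

// Hypothetical sample text, not captured from a real device
$sample = "eMMC Life Time Estimation A [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A]: 0x03\n"
    . "eMMC Life Time Estimation B [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B]: 0x01\n"
    . "eMMC Pre EOL information [EXT_CSD_PRE_EOL_INFO]: 0x01\n";

$report = parse_emmc_report($sample);
echo "Type A: " . describe_wear($report['lifeA']) . "\n";
echo "Type B: " . describe_wear($report['lifeB']) . "\n";
echo "Pre-EOL: " . describe_pre_eol($report['preEol']) . "\n";
```

Matching on the register name instead of the line position means the parser does not depend on the order of the lines in the output.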
-
Probably want some way to limit or suppress the number of alerts/emails. Those values never go back so you could end up with.... a lot!
You might also argue that since it only does it when opening the dashboard an alert shown there might be better. Or maybe both.
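Since the wear values only ever go up, one way to keep the number of alerts small is to remember the last level that was reported and alert again only when the level rises. A minimal sketch of that idea follows. The function names and the state file are hypothetical, and the actual sending (for example with notify_via_smtp, as in the widget code) is left to the caller.

```php
<?php
// Sketch only: function names and the state file location are hypothetical.

// An alert is due when the level has reached the threshold and is higher
// than the level that was last reported.
function alert_is_due(int $currentLevel, ?int $lastNotifiedLevel, int $threshold = 90): bool
{
    if ($currentLevel < $threshold) {
        return false;
    }
    return $lastNotifiedLevel === null || $currentLevel > $lastNotifiedLevel;
}

// Read the last reported level, or null if nothing valid has been stored yet.
function read_last_level(string $stateFile): ?int
{
    if (!is_file($stateFile)) {
        return null;
    }
    $raw = trim((string) file_get_contents($stateFile));
    return ctype_digit($raw) ? (int) $raw : null;
}

// Returns true when the caller should send an alert. The state file is
// written only in that case, so it costs one write per new wear level.
function check_and_record(int $currentLevel, string $stateFile): bool
{
    if (!alert_is_due($currentLevel, read_last_level($stateFile))) {
        return false;
    }
    file_put_contents($stateFile, (string) $currentLevel);
    return true;
}

// Demo with a throwaway file in the system temp directory
$demoFile = sys_get_temp_dir() . '/emmc_alert_demo_' . uniqid();
foreach ([80, 90, 90, 100] as $level) {
    $due = check_and_record($level, $demoFile);
    echo "level {$level}%: " . ($due ? "send alert" : "no alert") . "\n";
}
if (is_file($demoFile)) {
    unlink($demoFile);
}
```

With a 90% threshold and 10% steps this gives at most a couple of emails over the life of the device, and nothing is written to disk on an ordinary dashboard load.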
-
@stephenw10 said in Another Netgate with storage failure, 6 in total so far:
Probably want some way to limit or suppress the number of alerts/emails. Those values never go back so you could end up with.... a lot!
You might also argue that since it only does it when opening the dashboard an alert shown there might be better. Or maybe both.
Good suggestions!
I was already thinking of using a temp file to store the health data and only updating it when it is older than a certain age. A similar thing could be done to set a flag/rate limiter for alerting.
Ideally, the health check would run as a cron job and store the latest data in a file so that it works in the background, and then the dashboard would read the file instead of having to run the check every time the dashboard is loaded.
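The temp file idea could look roughly like this. It is only a sketch: cached_health, the cache file name and the fake collector are made up for illustration. In the widget the collector would be something like get_emmc_health, and a cron job could call the same function so that the dashboard normally just reads the file.

```php
<?php
// Sketch only: cached_health() and the cache file name are hypothetical.

// Return the cached health data if the cache file is younger than $maxAge
// seconds. Otherwise run $collect, store its result as JSON and return it.
function cached_health(string $cacheFile, int $maxAge, callable $collect): array
{
    clearstatcache(true, $cacheFile);
    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $maxAge) {
        $cached = json_decode((string) file_get_contents($cacheFile), true);
        if (is_array($cached)) {
            return $cached;
        }
    }
    $fresh = $collect();
    file_put_contents($cacheFile, json_encode($fresh), LOCK_EX);
    return $fresh;
}

// Demo: a fake collector that counts how often it is run
$runs = 0;
$fakeCollector = function () use (&$runs) {
    $runs++;
    return ["status" => "ok", "lifeA" => 30, "lifeB" => 10];
};

$demoCache = sys_get_temp_dir() . '/emmc_health_demo_' . uniqid() . '.json';
cached_health($demoCache, 3600, $fakeCollector); // no cache yet, runs the collector
cached_health($demoCache, 3600, $fakeCollector); // cache is fresh, collector is skipped
echo "collector runs: {$runs}\n";
unlink($demoCache);
```

On a real unit the cache file would ideally live somewhere that does not add writes to the eMMC, for example on a RAM disk.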
-
@stephenw10 said in Another Netgate with storage failure, 6 in total so far:
Probably want some way to limit or suppress the number of alerts/emails. Those values never go back so you could end up with.... a lot!
Each of which will trigger a write...
-
Yes, you are right.
This was just a sample to start with.
Here is another idea:
<?php
require_once("functions.inc");
require_once("guiconfig.inc");

// Path for the timestamp file to limit email notifications
const NOTIFY_TIMESTAMP_FILE = "/var/db/emmc_health_notify_time";
const NOTIFY_INTERVAL = 2592000; // 30 days in seconds

// Function to retrieve eMMC health data
function get_emmc_health() {
    $cmd = "/usr/local/bin/mmc extcsd read /dev/mmcsd0rpmb | egrep 'LIFE|EOL'";
    $output = shell_exec($cmd);
    if (!$output) {
        return ["status" => "error", "message" => "Failed to retrieve eMMC health data."];
    }
    preg_match('/LIFE_A\s+:\s+(0x[0-9A-F]+)/i', $output, $matchA);
    preg_match('/LIFE_B\s+:\s+(0x[0-9A-F]+)/i', $output, $matchB);
    $lifeA = isset($matchA[1]) ? hexdec($matchA[1]) * 10 : null;
    $lifeB = isset($matchB[1]) ? hexdec($matchB[1]) * 10 : null;
    if (is_null($lifeA) || is_null($lifeB)) {
        return ["status" => "error", "message" => "Invalid eMMC health data."];
    }
    return ["status" => "ok", "lifeA" => $lifeA, "lifeB" => $lifeB];
}

$data = get_emmc_health();

// Determine color class based on wear level
function get_color_class($value) {
    if ($value < 70) {
        return "success"; // Green
    } elseif ($value < 90) {
        return "warning"; // Yellow
    } else {
        return "danger"; // Red
    }
}

// Check if email notification should be sent
function should_send_email() {
    if (!file_exists(NOTIFY_TIMESTAMP_FILE)) {
        return true;
    }
    $last_sent = file_get_contents(NOTIFY_TIMESTAMP_FILE);
    return (time() - (int)$last_sent) > NOTIFY_INTERVAL;
}

// Send email notification if wear level is critical
function send_emmc_alert($lifeA, $lifeB) {
    global $config;
    if (!should_send_email()) {
        return;
    }
    $subject = "[pfSense] eMMC Wear Level Warning";
    $message = "Warning: eMMC wear level is high!\n\n" .
        "Life A: {$lifeA}%\nLife B: {$lifeB}%\n\n" .
        "Consider replacing the storage device.";
    if ($lifeA >= 90 || $lifeB >= 90) {
        notify_via_smtp($subject, $message);
        file_put_contents(NOTIFY_TIMESTAMP_FILE, time()); // Update last sent time
    }
}

// Ensure that email is sent only when eMMC is the boot disk and no RAM disk is used
function is_valid_environment() {
    if (file_exists("/etc/rc.ramdisk")) {
        return false; // RAM disk is enabled
    }
    $boot_disk = trim(shell_exec("mount | grep 'on / ' | awk '{print $1}'"));
    return strpos($boot_disk, "mmcsd") !== false; // Ensure eMMC is the boot device
}

if ($data["status"] === "ok" && is_valid_environment()) {
    send_emmc_alert($data["lifeA"], $data["lifeB"]);
}
?>
<div class="panel panel-default">
    <div class="panel-heading">
        <h3 class="panel-title">eMMC Disk Health</h3>
    </div>
    <div class="panel-body">
        <?php if ($data["status"] === "error"): ?>
            <div class="alert alert-danger"><?php echo $data["message"]; ?></div>
        <?php else: ?>
            <table class="table">
                <tr>
                    <th>Life A</th>
                    <td class="bg-<?php echo get_color_class($data['lifeA']); ?>"><?php echo $data['lifeA']; ?>%</td>
                </tr>
                <tr>
                    <th>Life B</th>
                    <td class="bg-<?php echo get_color_class($data['lifeB']); ?>"><?php echo $data['lifeB']; ?>%</td>
                </tr>
            </table>
        <?php endif; ?>
    </div>
</div>
You can send it once a month. You can skip sending if eMMC is no longer the primary storage or if RAM disks are being used… Well, I don't need to explain to an experienced programmer how such issues can be handled. You could even store this data and the lock file for sending alerts on your own RAM disk.
<?php
require_once("functions.inc");
require_once("guiconfig.inc");

// Define RAM disk path and ensure it exists
const RAMDISK_PATH = "/mnt/health/emmc_health_notify_time";
const RAMDISK_MOUNT_POINT = "/mnt/health";
const NOTIFY_INTERVAL = 2592000; // 30 days in seconds

// Function to set up RAM disk if not already mounted
function setup_ramdisk() {
    if (!is_dir(RAMDISK_MOUNT_POINT)) {
        mkdir(RAMDISK_MOUNT_POINT, 0777, true);
    }
    $mounted = trim(shell_exec("mount | grep ' " . RAMDISK_MOUNT_POINT . " '"));
    if (!$mounted) {
        shell_exec("mdmfs -s 100M md " . RAMDISK_MOUNT_POINT);
    }
}

// Function to retrieve eMMC health data
function get_emmc_health() {
    $cmd = "/usr/local/bin/mmc extcsd read /dev/mmcsd0rpmb | egrep 'LIFE|EOL'";
    $output = shell_exec($cmd);
    if (!$output) {
        return ["status" => "error", "message" => "Failed to retrieve eMMC health data."];
    }
    preg_match('/LIFE_A\s+:\s+(0x[0-9A-F]+)/i', $output, $matchA);
    preg_match('/LIFE_B\s+:\s+(0x[0-9A-F]+)/i', $output, $matchB);
    $lifeA = isset($matchA[1]) ? hexdec($matchA[1]) * 10 : null;
    $lifeB = isset($matchB[1]) ? hexdec($matchB[1]) * 10 : null;
    if (is_null($lifeA) || is_null($lifeB)) {
        return ["status" => "error", "message" => "Invalid eMMC health data."];
    }
    return ["status" => "ok", "lifeA" => $lifeA, "lifeB" => $lifeB];
}

$data = get_emmc_health();

// Determine color class based on wear level
function get_color_class($value) {
    if ($value < 70) {
        return "success"; // Green
    } elseif ($value < 90) {
        return "warning"; // Yellow
    } else {
        return "danger"; // Red
    }
}

// Check if email notification should be sent
function should_send_email() {
    if (!file_exists(RAMDISK_PATH)) {
        return true;
    }
    $last_sent = file_get_contents(RAMDISK_PATH);
    return (time() - (int)$last_sent) > NOTIFY_INTERVAL;
}

// Send email notification if wear level is critical
function send_emmc_alert($lifeA, $lifeB) {
    global $config;
    if (!should_send_email()) {
        return;
    }
    $subject = "[pfSense] eMMC Wear Level Warning";
    $message = "Warning: eMMC wear level is high!\n\n" .
        "Life A: {$lifeA}%\nLife B: {$lifeB}%\n\n" .
        "Consider replacing the storage device.";
    if ($lifeA >= 90 || $lifeB >= 90) {
        notify_via_smtp($subject, $message);
        file_put_contents(RAMDISK_PATH, time()); // Update last sent time on RAM disk
    }
}

// Ensure that email is sent only when eMMC is the boot disk and no RAM disk is used
function is_valid_environment() {
    if (file_exists("/etc/rc.ramdisk")) {
        return false; // RAM disk is enabled
    }
    $boot_disk = trim(shell_exec("mount | grep 'on / ' | awk '{print $1}'"));
    return strpos($boot_disk, "mmcsd") !== false; // Ensure eMMC is the boot device
}

// Set up RAM disk if necessary
setup_ramdisk();

if ($data["status"] === "ok" && is_valid_environment()) {
    send_emmc_alert($data["lifeA"], $data["lifeB"]);
}
?>
<div class="panel panel-default">
    <div class="panel-heading">
        <h3 class="panel-title">eMMC Disk Health</h3>
    </div>
    <div class="panel-body">
        <?php if ($data["status"] === "error"): ?>
            <div class="alert alert-danger"><?php echo $data["message"]; ?></div>
        <?php else: ?>
            <table class="table">
                <tr>
                    <th>Life A</th>
                    <td class="bg-<?php echo get_color_class($data['lifeA']); ?>"><?php echo $data['lifeA']; ?>%</td>
                </tr>
                <tr>
                    <th>Life B</th>
                    <td class="bg-<?php echo get_color_class($data['lifeB']); ?>"><?php echo $data['lifeB']; ?>%</td>
                </tr>
            </table>
        <?php endif; ?>
    </div>
</div>
-
Someone reported a dead 4200 today, killed by ntopng in 10 months. The user was unaware of any risks from running ntopng on 16GB of eMMC, and there is no way to monitor the eMMC on the 4200. Luckily the device is still under warranty, so it's being replaced under RMA.