Multiple network failures after dirty upgrade to 23.01
-
Yeah that's a drive error. It looks like the eMMC.
You could fit an m.2 SSD and reinstall to it instead:
https://docs.netgate.com/pfsense/en/latest/solutions/sg-5100/m-2-sata-installation.htmlSteve
-
@stephenw10 said in Multiple network failures after dirty upgrade to 23.01:
Yeah that's a drive error. It looks like the eMMC.
SteveThank you Steve for the quick response!
One thing I forgot to mention; after the upgrade, I was troubleshooting the recent NUT package problem where it would not connect to the UPS. I plugged a USB hub into one of the USB ports on the 5100 which completely took down the 5100; all LEDs went dark. I had to disconnect the power supply to recover. Thanks to running a zfs disk structure, everything recovered... That, to me, almost seems like a power supply issue.
Is that expected behavior?
I didn't think a USB 3.0 hub would draw that much power...
-
Hmm, yeah I would not expect to see that ever. Are you sure that USB hub is good?
-
@stephenw10 said in Multiple network failures after dirty upgrade to 23.01:
Hmm, yeah I would not expect to see that ever. Are you sure that USB hub is good?
Yeah, it's brand new. I also verified that it works on my Windows machine.
Do you think I should submit a trouble ticket?
-
Well if it works fine without that hub attached it's hard to say its a problem with the 5100. I assume it works with the UPS connected directly to the 5100?
Is the hub powered? It might be exceeding the current rating of the port without external power connected.Steve
-
Yeah the 5100 worked fine with the UPS plugged into the USB port.
Yes this USB hub is powered from the port power. I don't need to use it, I was just trying that to see if it helped the NUT pkg connectivity error.
I went ahead and tested the eMMC with the command:
mmc extcsd read /dev/mmcsd0rpmbThe results were:
============================================= Extended CSD rev 1.7 (MMC 5.0) ============================================= Card Supported Command sets [S_CMD_SET: 0x01] HPI Features [HPI_FEATURE: 0x01]: implementation based on CMD13 Background operations support [BKOPS_SUPPORT: 0x01] Max Packet Read Cmd [MAX_PACKED_READS: 0x00] Max Packet Write Cmd [MAX_PACKED_WRITES: 0x3c] Data TAG support [DATA_TAG_SUPPORT: 0x01] Data TAG Unit Size [TAG_UNIT_SIZE: 0x03] Tag Resources Size [TAG_RES_SIZE: 0x00] Context Management Capabilities [CONTEXT_CAPABILITIES: 0x05] Large Unit Size [LARGE_UNIT_SIZE_M1: 0x07] Extended partition attribute support [EXT_SUPPORT: 0x03] Generic CMD6 Timer [GENERIC_CMD6_TIME: 0x19] Power off notification [POWER_OFF_LONG_TIME: 0xff] Cache Size [CACHE_SIZE] is 128 KiB Background operations status [BKOPS_STATUS: 0x00] 1st Initialisation Time after programmed sector [INI_TIMEOUT_AP: 0x64] Power class for 52MHz, DDR at 3.6V [PWR_CL_DDR_52_360: 0x00] Power class for 52MHz, DDR at 1.95V [PWR_CL_DDR_52_195: 0x00] Power class for 200MHz at 3.6V [PWR_CL_200_360: 0x00] Power class for 200MHz, at 1.95V [PWR_CL_200_195: 0x00] Minimum Performance for 8bit at 52MHz in DDR mode: [MIN_PERF_DDR_W_8_52: 0x00] [MIN_PERF_DDR_R_8_52: 0x00] TRIM Multiplier [TRIM_MULT: 0x11] Secure Feature support [SEC_FEATURE_SUPPORT: 0x55] Boot Information [BOOT_INFO: 0x07] Device supports alternative boot method Device supports dual data rate during boot Device supports high speed timing during boot Boot partition size [BOOT_SIZE_MULTI: 0x20] Access size [ACC_SIZE: 0x07] High-capacity erase unit size [HC_ERASE_GRP_SIZE: 0x01] i.e. 512 KiB High-capacity erase timeout [ERASE_TIMEOUT_MULT: 0x11] Reliable write sector count [REL_WR_SEC_C: 0x01] High-capacity W protect group size [HC_WP_GRP_SIZE: 0x10] i.e. 8192 KiB Sleep current (VCC) [S_C_VCC: 0x08] Sleep current (VCCQ) [S_C_VCCQ: 0x08] Sleep/awake timeout [S_A_TIMEOUT: 0x13] Sector Count [SEC_COUNT: 0x00e90000] Device is block-addressed Minimum Write Performance for 8bit: [MIN_PERF_W_8_52: 0x08] [MIN_PERF_R_8_52: 0x08] [MIN_PERF_W_8_26_4_52: 0x08] [MIN_PERF_R_8_26_4_52: 0x08] Minimum Write Performance for 4bit: [MIN_PERF_W_4_26: 0x08] [MIN_PERF_R_4_26: 0x08] Power classes registers: [PWR_CL_26_360: 0x00] [PWR_CL_52_360: 0x00] [PWR_CL_26_195: 0x00] [PWR_CL_52_195: 0x00] Partition switching timing [PARTITION_SWITCH_TIME: 0x03] Out-of-interrupt busy timing [OUT_OF_INTERRUPT_TIME: 0x04] I/O Driver Strength [DRIVER_STRENGTH: 0x1f] Enhanced Strobe mode [STROBE_SUPPORT: 0x00] Card Type [CARD_TYPE: 0x57] HS400 Dual Data Rate eMMC @200MHz 1.8VI/O HS200 Single Data Rate eMMC @200MHz 1.8VI/O HS Dual Data Rate eMMC @52MHz 1.8V or 3VI/O HS eMMC @52MHz - at rated device voltage(s) HS eMMC @26MHz - at rated device voltage(s) CSD structure version [CSD_STRUCTURE: 0x02] Command set [CMD_SET: 0x00] Command set revision [CMD_SET_REV: 0x00] Power class [POWER_CLASS: 0x00] High-speed interface timing [HS_TIMING: 0x01] Erased memory content [ERASED_MEM_CONT: 0x00] Boot configuration bytes [PARTITION_CONFIG: 0x03] Not boot enable R/W Replay Protected Memory Block (RPMB) Boot config protection [BOOT_CONFIG_PROT: 0x00] Boot bus Conditions [BOOT_BUS_CONDITIONS: 0x00] High-density erase group definition [ERASE_GROUP_DEF: 0x01] Boot write protection status registers [BOOT_WP_STATUS]: 0x00 Boot Area Write protection [BOOT_WP]: 0x00 Power ro locking: possible Permanent ro locking: possible partition 0 ro lock status: not locked partition 1 ro lock status: not locked User area write protection register [USER_WP]: 0x00 FW configuration [FW_CONFIG]: 0x00 RPMB Size [RPMB_SIZE_MULT]: 0x20 Write reliability setting register [WR_REL_SET]: 0x1f user area: the device protects existing data if a power failure occurs during a write operation partition 1: the device protects existing data if a power failure occurs during a write operation partition 2: the device protects existing data if a power failure occurs during a write operation partition 3: the device protects existing data if a power failure occurs during a write operation partition 4: the device protects existing data if a power failure occurs during a write operation Write reliability parameter register [WR_REL_PARAM]: 0x04 Device supports the enhanced def. of reliable write Enable background operations handshake [BKOPS_EN]: 0x00 H/W reset function [RST_N_FUNCTION]: 0x00 HPI management [HPI_MGMT]: 0x00 Partitioning Support [PARTITIONING_SUPPORT]: 0x07 Device support partitioning feature Device can have enhanced tech. Max Enhanced Area Size [MAX_ENH_SIZE_MULT]: 0x0001d2 i.e. 3817472 KiB Partitions attribute [PARTITIONS_ATTRIBUTE]: 0x00 Partitioning Setting [PARTITION_SETTING_COMPLETED]: 0x00 Device partition setting NOT complete General Purpose Partition Size [GP_SIZE_MULT_4]: 0x000000 [GP_SIZE_MULT_3]: 0x000000 [GP_SIZE_MULT_2]: 0x000000 [GP_SIZE_MULT_1]: 0x000000 Enhanced User Data Area Size [ENH_SIZE_MULT]: 0x000000 i.e. 0 KiB Enhanced User Data Start Address [ENH_START_ADDR]: 0x00000000 i.e. 0 bytes offset Bad Block Management mode [SEC_BAD_BLK_MGMNT]: 0x00 Periodic Wake-up [PERIODIC_WAKEUP]: 0x00 Program CID/CSD in DDR mode support [PROGRAM_CID_CSD_DDR_SUPPORT]: 0x01 Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[127]]: 0x00 Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[126]]: 0x00 Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[125]]: 0x20 Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[124]]: 0x00 Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[123]]: 0x00 Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[122]]: 0x20 Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[121]]: 0x00 Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[120]]: 0x00 Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[119]]: 0x00 Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[118]]: 0x00 Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[117]]: 0x00 Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[116]]: 0x00 Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[115]]: 0x00 Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[114]]: 0x00 Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[113]]: 0x00 Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[112]]: 0x00 Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[111]]: 0x00 Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[110]]: 0x00 Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[109]]: 0x00 Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[108]]: 0x00 Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[107]]: 0x00 Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[106]]: 0x00 Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[105]]: 0x00 Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[104]]: 0x00 Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[103]]: 0x00 Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[102]]: 0x00 Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[101]]: 0x00 Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[100]]: 0x00 Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[99]]: 0x00 Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[98]]: 0x00 Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[97]]: 0x00 Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[96]]: 0x00 Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[95]]: 0x00 Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[94]]: 0x00 Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[93]]: 0x00 Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[92]]: 0x00 Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[91]]: 0x00 Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[90]]: 0x00 Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[89]]: 0x00 Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[88]]: 0x00 Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[87]]: 0x00 Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[86]]: 0x00 Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[85]]: 0x00 Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[84]]: 0x00 Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[83]]: 0x00 Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[82]]: 0x00 Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[81]]: 0x00 Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[80]]: 0x00 Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[79]]: 0x00 Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[78]]: 0x00 Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[77]]: 0x00 Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[76]]: 0x00 Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[75]]: 0x00 Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[74]]: 0x00 Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[73]]: 0x00 Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[72]]: 0x00 Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[71]]: 0x00 Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[70]]: 0x00 Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[69]]: 0x00 Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[68]]: 0x01 Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[67]]: 0x00 Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[66]]: 0x07 Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[65]]: 0xa9 Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[64]]: 0x03 Native sector size [NATIVE_SECTOR_SIZE]: 0x00 Sector size emulation [USE_NATIVE_SECTOR]: 0x00 Sector size [DATA_SECTOR_SIZE]: 0x00 1st initialization after disabling sector size emulation [INI_TIMEOUT_EMU]: 0x00 Class 6 commands control [CLASS_6_CTRL]: 0x00 Number of addressed group to be Released[DYNCAP_NEEDED]: 0x00 Exception events control [EXCEPTION_EVENTS_CTRL]: 0x0000 Exception events status[EXCEPTION_EVENTS_STATUS]: 0x0000 Extended Partitions Attribute [EXT_PARTITIONS_ATTRIBUTE]: 0x0000 Context configuration [CONTEXT_CONF[51]]: 0x00 Context configuration [CONTEXT_CONF[50]]: 0x00 Context configuration [CONTEXT_CONF[49]]: 0x00 Context configuration [CONTEXT_CONF[48]]: 0x00 Context configuration [CONTEXT_CONF[47]]: 0x00 Context configuration [CONTEXT_CONF[46]]: 0x00 Context configuration [CONTEXT_CONF[45]]: 0x00 Context configuration [CONTEXT_CONF[44]]: 0x00 Context configuration [CONTEXT_CONF[43]]: 0x00 Context configuration [CONTEXT_CONF[42]]: 0x00 Context configuration [CONTEXT_CONF[41]]: 0x00 Context configuration [CONTEXT_CONF[40]]: 0x00 Context configuration [CONTEXT_CONF[39]]: 0x00 Context configuration [CONTEXT_CONF[38]]: 0x00 Context configuration [CONTEXT_CONF[37]]: 0x00 Packed command status [PACKED_COMMAND_STATUS]: 0x00 Packed command failure index [PACKED_FAILURE_INDEX]: 0x00 Power Off Notification [POWER_OFF_NOTIFICATION]: 0x00 Control to turn the Cache ON/OFF [CACHE_CTRL]: 0x01 eMMC Firmware Version: R eMMC Life Time Estimation A [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A]: 0x0b eMMC Life Time Estimation B [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B]: 0x0b eMMC Pre EOL information [EXT_CSD_PRE_EOL_INFO]: 0x01 Secure Removal Type [SECURE_REMOVAL_TYPE]: 0x01 information is configured to be removed by an erase of the physical memory Supported Secure Removal Type: information removed by an erase of the physical memory
So yeah, this eMMC is EOL...
Ironically, running the memory test caused it to crash. -
@azdeltawye said in Multiple network failures after dirty upgrade to 23.01:
eMMC Life Time Estimation B [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B]: 0x0b
Yes, I would get an SSD in there ASAP.
-
@azdeltawye I have a Netgate 2100 and I'm experiencing similar issues as yourself. DHCP seems to fail, can't log into the firewall, and console is unresponsive. After a very long time, console spit out the following "Solaris: WARNING: Pool 'pfSense' has encountered an uncorrectable I/O failure and has been suspended.", which is how I found your post. This all happened after the 23.01 update. For my unit, a reboot doesn't fix the issue, a reflash of the firmware and reload of the config is what gets the firewall going for an additional 2-3 days. I've opened a ticket with Netgate to troubleshoot this issue and they basically said that internal eMMC is dying. I provided these logs when even flashing the firmware was giving me a hard time :
(100s of these lines)
*mmcsd0: failed to flush cache
mmcsd0: failed to flush cache
mmcsd0: failed to flush cache
mmcsd0: failed to flush cache
mmcsd0: failed to flush cache
mmcsd0: failed to flush cache
mmcsd0: failed to flush cacheLoader variables:
vfs.root.mountfrom=zfs:pfSense/ROOT/defaultManual root filesystem specification:
<fstype>:<device> [options]
Mount <device> using filesystem <fstype>
and with the specified (optional) option list.eg. ufs:/dev/da0s1a
zfs:zroot/ROOT/default
cd9660:/dev/cd0 ro
(which is equivalent to: mount -t cd9660 -o ro /dev/cd0 /)? List valid disk boot devices
. Yield 1 second (for background tasks)
<empty line> Abort manual inputmountroot>*
So yeah, I guess this is confirmation that I need to get a new SSD in there. Not sure why this only happened after 23.01 tho.
-
@packetsniffer
Yeah, my Netgate appliance eventually became unreachable via RJ45 ports and console serial port. I replaced, or actually added, the approved SATA SSD in an attempt to recover the unit but that did not help. The unit was toast...Takeaway lesson: Next Netgate appliance purchase - pony up the extra $$ for the MAX version. Preferably something without eMMC!
-
@packetsniffer Most likely because an update writes a lot to disk. For the record there is this:
https://docs.netgate.com/pfsense/en/latest/troubleshooting/disk-lifetime.html
and RAM disks can save writing: https://docs.netgate.com/pfsense/en/latest/troubleshooting/disk-writes.htmlhttps://www.netgate.com/supported-pfsense-plus-packages has remarks on whether an SSD is recommended for certain packages that can have intense logging depending on configuration.
-
@SteveITS Thank you.
I followed a few links to test the onboard memory, and it turns out mine was pretty dead.
I threw a new Transcend 512GB (TS512GMTS430S) in the Netgate 2100, flashed 23.01 on there, restored my config, and I've been solid for 2+ weeks.
I will be looking into cleaning up my logging to reduce wear and tear.
Thanks everyone!