Multiple network failures after dirty upgrade to 23.01
-
Hello,
About two weeks ago, I upgraded my SG-5100 to 23.01. Since then, I have had several instances where I completely lose network connectivity. I cannot log into the pfSense web UI, and multiple devices on my network have fallen back to link-local IP addresses (169.254.x.x). It almost seems like the DHCP server in pfSense has failed. During these outages, I also cannot access the serial console via PuTTY on my Windows PC. The 5100, from the outside, looks fine: green LEDs and flashing link lights on the network jacks... The only way to recover from this state is to cycle power to the 5100. After a power cycle, everything appears to work correctly: I can access the web UI and console interface, and all the network clients get assigned IP addresses.
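As a quick way to confirm the "fell back to 169.254.x.x" symptom on a client, here is a minimal Python sketch using the standard `ipaddress` module. The function name `is_dhcp_fallback` and the sample addresses are made up for illustration; it only tells you an address is IPv4 link-local, not why the DHCP lease failed.

```python
import ipaddress


def is_dhcp_fallback(addr: str) -> bool:
    """Return True if addr is an IPv4 link-local (169.254.0.0/16) address.

    Hypothetical helper: clients typically self-assign such an address
    when no DHCP server answers.
    """
    ip = ipaddress.ip_address(addr)
    return ip.version == 4 and ip.is_link_local


# Sample addresses are illustrative, not taken from the affected network.
for addr in ["169.254.17.42", "192.168.1.50"]:
    state = "link-local (no DHCP lease?)" if is_dhcp_fallback(addr) else "normal"
    print(addr, state)
```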
What I meant by 'dirty upgrade' in the subject title was simply clicking on the upgrade button in the web UI without un-installing any packages.
I did happen to have the console window up during one of these meltdowns and was able to capture the following error message:
 0) Logout (SSH only)                  9) pfTop
 1) Assign Interfaces                 10) Filter Logs
 2) Set interface(s) IP address       11) Restart webConfigurator
 3) Reset webConfigurator password    12) PHP shell + Netgate pfSense Plus tools
 4) Reset to factory defaults         13) Update from console
 5) Reboot system                     14) Disable Secure Shell (sshd)
 6) Halt system                       15) Restore recent configuration
 7) Ping host                         16) Restart PHP-FPM
 8) Shell

Enter an option: sdhci_pci0-slot0: Controller timeout
sdhci_pci0-slot0: ============== REGISTER DUMP ==============
sdhci_pci0-slot0: Sys addr: 0x7f381000 | Version: 0x00001002
sdhci_pci0-slot0: Blk size: 0x00007200 | Blk cnt: 0x00000000
sdhci_pci0-slot0: Argument: 0x00dace80 | Trn mode: 0x00000027
sdhci_pci0-slot0: Present: 0x1fef0106 | Host ctl: 0x00000025
sdhci_pci0-slot0: Power: 0x0000000b | Blk gap: 0x00000080
sdhci_pci0-slot0: Wake-up: 0x00000000 | Clock: 0x00000207
sdhci_pci0-slot0: Timeout: 0x0000000d | Int stat: 0x00000001
sdhci_pci0-slot0: Int enab: 0x01ff003b | Sig enab: 0x01ff003a
sdhci_pci0-slot0: AC12 err: 0x00000000 | Host ctl2:0x0000000c
sdhci_pci0-slot0: Caps: 0x546ec8b2 | Caps2: 0x80000007
sdhci_pci0-slot0: Max curr: 0x00000000 | ADMA err: 0x00000000
sdhci_pci0-slot0: ADMA addr:0x00000000 | Slot int: 0x00000000
sdhci_pci0-slot0: ===========================================
mmcsd0: Error indicated: 1 Timeout
mmcsd0: Error indicated: 1 Timeout
mmcsd0: Error indicated: 1 Timeout
mmcsd0: Error indicated: 1 Timeout
mmcsd0: Error indicated: 1 Timeout
sdhci_pci0-slot0: Got data interrupt 0x00600000, but there is no active command.
sdhci_pci0-slot0: ============== REGISTER DUMP ==============
sdhci_pci0-slot0: Sys addr: 0x7f380000 | Version: 0x00001002
sdhci_pci0-slot0: Blk size: 0x00007200 | Blk cnt: 0x00000010
sdhci_pci0-slot0: Argument: 0x00e8f410 | Trn mode: 0x00000037
sdhci_pci0-slot0: Present: 0x1fef0000 | Host ctl: 0x00000025
sdhci_pci0-slot0: Power: 0x0000000b | Blk gap: 0x00000080
sdhci_pci0-slot0: Wake-up: 0x00000000 | Clock: 0x00000207
sdhci_pci0-slot0: Timeout: 0x0000000d | Int stat: 0x01008000
sdhci_pci0-slot0: Int enab: 0x01ff003b | Sig enab: 0x01ff003b
sdhci_pci0-slot0: AC12 err: 0x00000001 | Host ctl2:0x0000000c
sdhci_pci0-slot0: Caps: 0x546ec8b2 | Caps2: 0x80000007
sdhci_pci0-slot0: Max curr: 0x00000000 | ADMA err: 0x00000000
sdhci_pci0-slot0: ADMA addr:0x00000000 | Slot int: 0x00000001
sdhci_pci0-slot0: ===========================================
sdhci_pci0-slot0: Got AutoCMD12 error 0x0001, but there is no active command.
sdhci_pci0-slot0: ============== REGISTER DUMP ==============
sdhci_pci0-slot0: Sys addr: 0x7f380000 | Version: 0x00001002
sdhci_pci0-slot0: Blk size: 0x00007200 | Blk cnt: 0x00000010
sdhci_pci0-slot0: Argument: 0x00e8f410 | Trn mode: 0x00000037
sdhci_pci0-slot0: Present: 0x1fef0000 | Host ctl: 0x00000025
sdhci_pci0-slot0: Power: 0x0000000b | Blk gap: 0x00000080
sdhci_pci0-slot0: Wake-up: 0x00000000 | Clock: 0x00000207
sdhci_pci0-slot0: Timeout: 0x0000000d | Int stat: 0x00000000
sdhci_pci0-slot0: Int enab: 0x01ff003b | Sig enab: 0x01ff003b
sdhci_pci0-slot0: AC12 err: 0x00000000 | Host ctl2:0x0000000c
sdhci_pci0-slot0: Caps: 0x546ec8b2 | Caps2: 0x80000007
sdhci_pci0-slot0: Max curr: 0x00000000 | ADMA err: 0x00000000
sdhci_pci0-slot0: ADMA addr:0x00000000 | Slot int: 0x00000000
sdhci_pci0-slot0: ===========================================
sdhci_pci0-slot0: Got AutoCMD12 error 0x0001, but there is no active command.
sdhci_pci0-slot0: ============== REGISTER DUMP ==============
sdhci_pci0-slot0: Sys addr: 0x7f380000 | Version: 0x00001002
sdhci_pci0-slot0: Blk size: 0x00007200 | Blk cnt: 0x00000010
sdhci_pci0-slot0: Argument: 0x00e8f610 | Trn mode: 0x00000037
sdhci_pci0-slot0: Present: 0x1fef0000 | Host ctl: 0x00000025
sdhci_pci0-slot0: Power: 0x0000000b | Blk gap: 0x00000080
sdhci_pci0-slot0: Wake-up: 0x00000000 | Clock: 0x00000207
sdhci_pci0-slot0: Timeout: 0x0000000d | Int stat: 0x00000000
sdhci_pci0-slot0: Int enab: 0x01ff003b | Sig enab: 0x01ff003b
sdhci_pci0-slot0: AC12 err: 0x00000000 | Host ctl2:0x0000000c
sdhci_pci0-slot0: Caps: 0x546ec8b2 | Caps2: 0x80000007
sdhci_pci0-slot0: Max curr: 0x00000000 | ADMA err: 0x00000000
sdhci_pci0-slot0: ADMA addr:0x00000000 | Slot int: 0x00000000
sdhci_pci0-slot0: ===========================================
sdhci_pci0-slot0: Got AutoCMD12 error 0x0001, but there is no active command.
sdhci_pci0-slot0: ============== REGISTER DUMP ==============
sdhci_pci0-slot0: Sys addr: 0x7f380000 | Version: 0x00001002
sdhci_pci0-slot0: Blk size: 0x00007200 | Blk cnt: 0x00000010
sdhci_pci0-slot0: Argument: 0x00200a10 | Trn mode: 0x00000037
sdhci_pci0-slot0: Present: 0x1fef0000 | Host ctl: 0x00000025
sdhci_pci0-slot0: Power: 0x0000000b | Blk gap: 0x00000080
sdhci_pci0-slot0: Wake-up: 0x00000000 | Clock: 0x00000207
sdhci_pci0-slot0: Timeout: 0x0000000d | Int stat: 0x00000000
sdhci_pci0-slot0: Int enab: 0x01ff003b | Sig enab: 0x01ff003b
sdhci_pci0-slot0: AC12 err: 0x00000000 | Host ctl2:0x0000000c
sdhci_pci0-slot0: Caps: 0x546ec8b2 | Caps2: 0x80000007
sdhci_pci0-slot0: Max curr: 0x00000000 | ADMA err: 0x00000000
sdhci_pci0-slot0: ADMA addr:0x00000000 | Slot int: 0x00000000
sdhci_pci0-slot0: ===========================================
Solaris: WARNING: Pool 'pfSense' has encountered an uncorrectable I/O failure and has been suspended.
^CSolaris: WARNING: Pool 'pfSense' has encountered an uncorrectable I/O failure and has been suspended.
Solaris: WARNING: Pool 'pfSense' has encountered an uncorrectable I/O failure and has been suspended.
Solaris: WARNING: Pool 'pfSense' has encountered an uncorrectable I/O failure and has been suspended.
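For anyone triaging a similar console capture, here is a rough Python sketch that counts the storage-related messages seen above. The regular expressions are derived only from the lines in this capture, and the category names (`controller_timeout` and so on) are invented for illustration; other FreeBSD or pfSense versions may word these messages differently.

```python
import re
from collections import Counter

# Patterns based on the console capture in this post.
# The dictionary keys are made-up labels, not official error names.
PATTERNS = {
    "controller_timeout": re.compile(r"sdhci_pci\d+-slot\d+: Controller timeout"),
    "mmc_timeout": re.compile(r"mmcsd\d+: Error indicated: \d+ Timeout"),
    "stray_interrupt": re.compile(r"but there is no active command"),
    "zfs_pool_suspended": re.compile(
        r"Pool '[^']+' has encountered an uncorrectable I/O failure"
    ),
}


def triage(log_text: str) -> Counter:
    """Count how often each known storage error pattern appears in the log."""
    counts = Counter()
    for name, pattern in PATTERNS.items():
        counts[name] = len(pattern.findall(log_text))
    return counts


# A shortened excerpt of the capture, used as sample input.
sample = """\
Enter an option: sdhci_pci0-slot0: Controller timeout
mmcsd0: Error indicated: 1 Timeout
mmcsd0: Error indicated: 1 Timeout
sdhci_pci0-slot0: Got data interrupt 0x00600000, but there is no active command.
sdhci_pci0-slot0: Got AutoCMD12 error 0x0001, but there is no active command.
Solaris: WARNING: Pool 'pfSense' has encountered an uncorrectable I/O failure and has been suspended.
"""

for name, count in triage(sample).items():
    print(f"{name}: {count}")
```

Controller and mmcsd timeouts followed by a suspended ZFS pool point at the storage device rather than at DHCP or the network stack, which is why the whole box appears to hang.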
Since the last outage, I have performed a clean upgrade via USB memstick with a fresh image from Netgate TAC support.
Any idea on what is going on? Do I need to start thinking about getting a new Netgate appliance? Did the fresh install of 23.01 do the trick?
I work from home so these outages are very disruptive...
Thanks in advance!
-
Yeah that's a drive error. It looks like the eMMC.
You could fit an m.2 SSD and reinstall to it instead:
https://docs.netgate.com/pfsense/en/latest/solutions/sg-5100/m-2-sata-installation.html
Steve
-
@stephenw10 said in Multiple network failures after dirty upgrade to 23.01:
Yeah that's a drive error. It looks like the eMMC.
Steve
Thank you, Steve, for the quick response!
One thing I forgot to mention: after the upgrade, I was troubleshooting the recent NUT package problem where it would not connect to the UPS. I plugged a USB hub into one of the USB ports on the 5100, which completely took down the 5100; all LEDs went dark. I had to disconnect the power supply to recover. Thanks to running a ZFS disk structure, everything recovered... That, to me, almost seems like a power supply issue.
Is that expected behavior?
I didn't think a USB 3.0 hub would draw that much power...
-
Hmm, yeah I would not expect to see that ever. Are you sure that USB hub is good?
-
@stephenw10 said in Multiple network failures after dirty upgrade to 23.01:
Hmm, yeah I would not expect to see that ever. Are you sure that USB hub is good?
Yeah, it's brand new. I also verified that it works on my Windows machine.
Do you think I should submit a trouble ticket?
-
Well, if it works fine without that hub attached, it's hard to say it's a problem with the 5100. I assume it works with the UPS connected directly to the 5100?
Is the hub powered? It might be exceeding the current rating of the port without external power connected.
Steve
-
Yeah the 5100 worked fine with the UPS plugged into the USB port.
Yes, this USB hub is powered from the port itself. I don't need to use it; I was just trying it to see if it helped with the NUT package connectivity error.
I went ahead and tested the eMMC with the command:
mmc extcsd read /dev/mmcsd0rpmb
The results were:
=============================================
Extended CSD rev 1.7 (MMC 5.0)
=============================================

Card Supported Command sets [S_CMD_SET: 0x01]
HPI Features [HPI_FEATURE: 0x01]: implementation based on CMD13
Background operations support [BKOPS_SUPPORT: 0x01]
Max Packet Read Cmd [MAX_PACKED_READS: 0x00]
Max Packet Write Cmd [MAX_PACKED_WRITES: 0x3c]
Data TAG support [DATA_TAG_SUPPORT: 0x01]
Data TAG Unit Size [TAG_UNIT_SIZE: 0x03]
Tag Resources Size [TAG_RES_SIZE: 0x00]
Context Management Capabilities [CONTEXT_CAPABILITIES: 0x05]
Large Unit Size [LARGE_UNIT_SIZE_M1: 0x07]
Extended partition attribute support [EXT_SUPPORT: 0x03]
Generic CMD6 Timer [GENERIC_CMD6_TIME: 0x19]
Power off notification [POWER_OFF_LONG_TIME: 0xff]
Cache Size [CACHE_SIZE] is 128 KiB
Background operations status [BKOPS_STATUS: 0x00]
1st Initialisation Time after programmed sector [INI_TIMEOUT_AP: 0x64]
Power class for 52MHz, DDR at 3.6V [PWR_CL_DDR_52_360: 0x00]
Power class for 52MHz, DDR at 1.95V [PWR_CL_DDR_52_195: 0x00]
Power class for 200MHz at 3.6V [PWR_CL_200_360: 0x00]
Power class for 200MHz, at 1.95V [PWR_CL_200_195: 0x00]
Minimum Performance for 8bit at 52MHz in DDR mode:
 [MIN_PERF_DDR_W_8_52: 0x00]
 [MIN_PERF_DDR_R_8_52: 0x00]
TRIM Multiplier [TRIM_MULT: 0x11]
Secure Feature support [SEC_FEATURE_SUPPORT: 0x55]
Boot Information [BOOT_INFO: 0x07]
 Device supports alternative boot method
 Device supports dual data rate during boot
 Device supports high speed timing during boot
Boot partition size [BOOT_SIZE_MULTI: 0x20]
Access size [ACC_SIZE: 0x07]
High-capacity erase unit size [HC_ERASE_GRP_SIZE: 0x01]
 i.e. 512 KiB
High-capacity erase timeout [ERASE_TIMEOUT_MULT: 0x11]
Reliable write sector count [REL_WR_SEC_C: 0x01]
High-capacity W protect group size [HC_WP_GRP_SIZE: 0x10]
 i.e. 8192 KiB
Sleep current (VCC) [S_C_VCC: 0x08]
Sleep current (VCCQ) [S_C_VCCQ: 0x08]
Sleep/awake timeout [S_A_TIMEOUT: 0x13]
Sector Count [SEC_COUNT: 0x00e90000]
 Device is block-addressed
Minimum Write Performance for 8bit:
 [MIN_PERF_W_8_52: 0x08]
 [MIN_PERF_R_8_52: 0x08]
 [MIN_PERF_W_8_26_4_52: 0x08]
 [MIN_PERF_R_8_26_4_52: 0x08]
Minimum Write Performance for 4bit:
 [MIN_PERF_W_4_26: 0x08]
 [MIN_PERF_R_4_26: 0x08]
Power classes registers:
 [PWR_CL_26_360: 0x00]
 [PWR_CL_52_360: 0x00]
 [PWR_CL_26_195: 0x00]
 [PWR_CL_52_195: 0x00]
Partition switching timing [PARTITION_SWITCH_TIME: 0x03]
Out-of-interrupt busy timing [OUT_OF_INTERRUPT_TIME: 0x04]
I/O Driver Strength [DRIVER_STRENGTH: 0x1f]
Enhanced Strobe mode [STROBE_SUPPORT: 0x00]
Card Type [CARD_TYPE: 0x57]
 HS400 Dual Data Rate eMMC @200MHz 1.8VI/O
 HS200 Single Data Rate eMMC @200MHz 1.8VI/O
 HS Dual Data Rate eMMC @52MHz 1.8V or 3VI/O
 HS eMMC @52MHz - at rated device voltage(s)
 HS eMMC @26MHz - at rated device voltage(s)
CSD structure version [CSD_STRUCTURE: 0x02]
Command set [CMD_SET: 0x00]
Command set revision [CMD_SET_REV: 0x00]
Power class [POWER_CLASS: 0x00]
High-speed interface timing [HS_TIMING: 0x01]
Erased memory content [ERASED_MEM_CONT: 0x00]
Boot configuration bytes [PARTITION_CONFIG: 0x03]
 Not boot enable
 R/W Replay Protected Memory Block (RPMB)
Boot config protection [BOOT_CONFIG_PROT: 0x00]
Boot bus Conditions [BOOT_BUS_CONDITIONS: 0x00]
High-density erase group definition [ERASE_GROUP_DEF: 0x01]
Boot write protection status registers [BOOT_WP_STATUS]: 0x00
Boot Area Write protection [BOOT_WP]: 0x00
 Power ro locking: possible
 Permanent ro locking: possible
 partition 0 ro lock status: not locked
 partition 1 ro lock status: not locked
User area write protection register [USER_WP]: 0x00
FW configuration [FW_CONFIG]: 0x00
RPMB Size [RPMB_SIZE_MULT]: 0x20
Write reliability setting register [WR_REL_SET]: 0x1f
 user area: the device protects existing data if a power failure occurs during a write operation
 partition 1: the device protects existing data if a power failure occurs during a write operation
 partition 2: the device protects existing data if a power failure occurs during a write operation
 partition 3: the device protects existing data if a power failure occurs during a write operation
 partition 4: the device protects existing data if a power failure occurs during a write operation
Write reliability parameter register [WR_REL_PARAM]: 0x04
 Device supports the enhanced def. of reliable write
Enable background operations handshake [BKOPS_EN]: 0x00
H/W reset function [RST_N_FUNCTION]: 0x00
HPI management [HPI_MGMT]: 0x00
Partitioning Support [PARTITIONING_SUPPORT]: 0x07
 Device support partitioning feature
 Device can have enhanced tech.
Max Enhanced Area Size [MAX_ENH_SIZE_MULT]: 0x0001d2
 i.e. 3817472 KiB
Partitions attribute [PARTITIONS_ATTRIBUTE]: 0x00
Partitioning Setting [PARTITION_SETTING_COMPLETED]: 0x00
 Device partition setting NOT complete
General Purpose Partition Size
 [GP_SIZE_MULT_4]: 0x000000
 [GP_SIZE_MULT_3]: 0x000000
 [GP_SIZE_MULT_2]: 0x000000
 [GP_SIZE_MULT_1]: 0x000000
Enhanced User Data Area Size [ENH_SIZE_MULT]: 0x000000
 i.e. 0 KiB
Enhanced User Data Start Address [ENH_START_ADDR]: 0x00000000
 i.e. 0 bytes offset
Bad Block Management mode [SEC_BAD_BLK_MGMNT]: 0x00
Periodic Wake-up [PERIODIC_WAKEUP]: 0x00
Program CID/CSD in DDR mode support [PROGRAM_CID_CSD_DDR_SUPPORT]: 0x01
Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[127]]: 0x00
Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[126]]: 0x00
Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[125]]: 0x20
Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[124]]: 0x00
Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[123]]: 0x00
Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[122]]: 0x20
Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[121]]: 0x00
Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[120]]: 0x00
Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[119]]: 0x00
Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[118]]: 0x00
Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[117]]: 0x00
Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[116]]: 0x00
Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[115]]: 0x00
Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[114]]: 0x00
Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[113]]: 0x00
Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[112]]: 0x00
Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[111]]: 0x00
Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[110]]: 0x00
Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[109]]: 0x00
Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[108]]: 0x00
Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[107]]: 0x00
Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[106]]: 0x00
Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[105]]: 0x00
Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[104]]: 0x00
Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[103]]: 0x00
Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[102]]: 0x00
Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[101]]: 0x00
Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[100]]: 0x00
Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[99]]: 0x00
Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[98]]: 0x00
Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[97]]: 0x00
Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[96]]: 0x00
Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[95]]: 0x00
Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[94]]: 0x00
Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[93]]: 0x00
Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[92]]: 0x00
Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[91]]: 0x00
Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[90]]: 0x00
Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[89]]: 0x00
Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[88]]: 0x00
Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[87]]: 0x00
Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[86]]: 0x00
Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[85]]: 0x00
Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[84]]: 0x00
Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[83]]: 0x00
Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[82]]: 0x00
Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[81]]: 0x00
Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[80]]: 0x00
Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[79]]: 0x00
Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[78]]: 0x00
Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[77]]: 0x00
Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[76]]: 0x00
Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[75]]: 0x00
Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[74]]: 0x00
Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[73]]: 0x00
Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[72]]: 0x00
Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[71]]: 0x00
Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[70]]: 0x00
Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[69]]: 0x00
Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[68]]: 0x01
Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[67]]: 0x00
Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[66]]: 0x07
Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[65]]: 0xa9
Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[64]]: 0x03
Native sector size [NATIVE_SECTOR_SIZE]: 0x00
Sector size emulation [USE_NATIVE_SECTOR]: 0x00
Sector size [DATA_SECTOR_SIZE]: 0x00
1st initialization after disabling sector size emulation [INI_TIMEOUT_EMU]: 0x00
Class 6 commands control [CLASS_6_CTRL]: 0x00
Number of addressed group to be Released[DYNCAP_NEEDED]: 0x00
Exception events control [EXCEPTION_EVENTS_CTRL]: 0x0000
Exception events status[EXCEPTION_EVENTS_STATUS]: 0x0000
Extended Partitions Attribute [EXT_PARTITIONS_ATTRIBUTE]: 0x0000
Context configuration [CONTEXT_CONF[51]]: 0x00
Context configuration [CONTEXT_CONF[50]]: 0x00
Context configuration [CONTEXT_CONF[49]]: 0x00
Context configuration [CONTEXT_CONF[48]]: 0x00
Context configuration [CONTEXT_CONF[47]]: 0x00
Context configuration [CONTEXT_CONF[46]]: 0x00
Context configuration [CONTEXT_CONF[45]]: 0x00
Context configuration [CONTEXT_CONF[44]]: 0x00
Context configuration [CONTEXT_CONF[43]]: 0x00
Context configuration [CONTEXT_CONF[42]]: 0x00
Context configuration [CONTEXT_CONF[41]]: 0x00
Context configuration [CONTEXT_CONF[40]]: 0x00
Context configuration [CONTEXT_CONF[39]]: 0x00
Context configuration [CONTEXT_CONF[38]]: 0x00
Context configuration [CONTEXT_CONF[37]]: 0x00
Packed command status [PACKED_COMMAND_STATUS]: 0x00
Packed command failure index [PACKED_FAILURE_INDEX]: 0x00
Power Off Notification [POWER_OFF_NOTIFICATION]: 0x00
Control to turn the Cache ON/OFF [CACHE_CTRL]: 0x01
eMMC Firmware Version: R
eMMC Life Time Estimation A [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A]: 0x0b
eMMC Life Time Estimation B [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B]: 0x0b
eMMC Pre EOL information [EXT_CSD_PRE_EOL_INFO]: 0x01
Secure Removal Type [SECURE_REMOVAL_TYPE]: 0x01
 information is configured to be removed by an erase of the physical memory
 Supported Secure Removal Type:
  information removed by an erase of the physical memory
So yeah, this eMMC is EOL...
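The three lines that matter in that dump are the two life time estimates and the pre-EOL field. Below is a small Python sketch that pulls them out of saved `mmc extcsd read` output and translates the values. The function names are hypothetical, and the value meanings follow my reading of the JEDEC eMMC 5.x EXT_CSD definitions (0x01 to 0x0a are 10% steps of estimated life used, 0x0b means the estimate has been exceeded), so treat it as a rough aid rather than a definitive decoder.

```python
import re

# Field names exactly as printed by `mmc extcsd read` in the post above.
FIELDS = (
    "EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A",
    "EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B",
    "EXT_CSD_PRE_EOL_INFO",
)

# Assumed meanings of the pre-EOL values, per the JEDEC eMMC 5.x spec.
PRE_EOL = {
    0x01: "normal",
    0x02: "warning (about 80% of reserved blocks consumed)",
    0x03: "urgent",
}


def parse_extcsd(text: str) -> dict:
    """Extract the wear-related fields from extcsd output as integers."""
    found = {}
    for name in FIELDS:
        match = re.search(r"\[" + name + r"\]:\s*0x([0-9a-fA-F]+)", text)
        if match:
            found[name] = int(match.group(1), 16)
    return found


def describe_life(value: int) -> str:
    """Translate a life time estimate value into a human-readable range."""
    if 0x01 <= value <= 0x0A:
        return f"{(value - 1) * 10}-{value * 10}% of estimated life used"
    if value == 0x0B:
        return "exceeded maximum estimated life"
    return "not defined"


# Sample input: the three relevant lines from the dump above.
sample = (
    "eMMC Life Time Estimation A [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A]: 0x0b\n"
    "eMMC Life Time Estimation B [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B]: 0x0b\n"
    "eMMC Pre EOL information [EXT_CSD_PRE_EOL_INFO]: 0x01\n"
)

values = parse_extcsd(sample)
for name in FIELDS[:2]:
    print(name, "->", describe_life(values[name]))
print(FIELDS[2], "->", PRE_EOL.get(values[FIELDS[2]], "not defined"))
```

Note that the pre-EOL field can still read "normal" while both life time estimates are already past their maximum, as in this dump, so all three values are worth checking.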
Ironically, running the memory test caused it to crash.
-
@azdeltawye said in Multiple network failures after dirty upgrade to 23.01:
eMMC Life Time Estimation B [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B]: 0x0b
Yes, I would get an SSD in there ASAP.
-
@azdeltawye I have a Netgate 2100 and I'm experiencing issues similar to yours. DHCP seems to fail, I can't log into the firewall, and the console is unresponsive. After a very long time, the console spat out the following: "Solaris: WARNING: Pool 'pfSense' has encountered an uncorrectable I/O failure and has been suspended.", which is how I found your post. This all happened after the 23.01 update. For my unit, a reboot doesn't fix the issue; a reflash of the firmware and a reload of the config is what gets the firewall going for another 2-3 days. I've opened a ticket with Netgate to troubleshoot this issue, and they basically said that the internal eMMC is dying. I provided these logs when even flashing the firmware was giving me a hard time:
(100s of these lines)
mmcsd0: failed to flush cache
mmcsd0: failed to flush cache
mmcsd0: failed to flush cache
mmcsd0: failed to flush cache
mmcsd0: failed to flush cache
mmcsd0: failed to flush cache
mmcsd0: failed to flush cache

Loader variables:
  vfs.root.mountfrom=zfs:pfSense/ROOT/default

Manual root filesystem specification:
  <fstype>:<device> [options]
      Mount <device> using filesystem <fstype>
      and with the specified (optional) option list.

    eg. ufs:/dev/da0s1a
        zfs:zroot/ROOT/default
        cd9660:/dev/cd0 ro
          (which is equivalent to: mount -t cd9660 -o ro /dev/cd0 /)

  ?               List valid disk boot devices
  .               Yield 1 second (for background tasks)
  <empty line>    Abort manual input

mountroot>
So yeah, I guess this is confirmation that I need to get a new SSD in there. Not sure why this only happened after 23.01 tho.
-
@packetsniffer
Yeah, my Netgate appliance eventually became unreachable via both the RJ45 ports and the serial console port. I replaced, or actually added, the approved SATA SSD in an attempt to recover the unit, but that did not help. The unit was toast...
Takeaway lesson: for the next Netgate appliance purchase, pony up the extra $$ for the MAX version. Preferably something without eMMC!
-
@packetsniffer Most likely because an update writes a lot to disk. For the record there is this:
https://docs.netgate.com/pfsense/en/latest/troubleshooting/disk-lifetime.html
and RAM disks can save writing: https://docs.netgate.com/pfsense/en/latest/troubleshooting/disk-writes.html
https://www.netgate.com/supported-pfsense-plus-packages has remarks on whether an SSD is recommended for certain packages that can have intense logging depending on configuration.
-
@SteveITS Thank you.
I followed a few links to test the onboard memory, and it turns out mine was pretty dead.
I threw a new Transcend 512GB (TS512GMTS430S) in the Netgate 2100, flashed 23.01 on there, restored my config, and I've been solid for 2+ weeks.
I will be looking into cleaning up my logging to reduce wear and tear.
Thanks everyone!