SG-1100 boot fails after power outage and can't reach network for reinstall
-
Tried reinstalling to see if that would fix the trouble but the installation fails early on when it appears to be partitioning/committing changes to the filesystem. It says:
mmcsd0: Error indicated: 1 Timeout
The end of the dmesg output is below:
pcib0: <Marvell Armada 3700 PCIe Bus Controller> mem 0xd0070000-0xd008ffff irq 5 on simplebus0
pcib0: link never came up
pci0: <OFW PCI bus> on pcib0
gpioled0: <GPIO LEDs> on ofwbus0
armv8crypto0: <AES-CBC,AES-XTS,AES-GCM>
Timecounters tick every 1.000 msec
mvneta0: link state changed to UP
spibus0: <OFW SPI bus> on spi0
mx25l0: <M25Pxx Flash Family> at cs 0 mode 0 on spibus0
mx25l0: device type mx25u3235f, size 4096K in 64 sectors of 64K, erase size 4K
usbus0: 5.0Gbps Super Speed USB v3.0
usbus1: 480Mbps High Speed USB v2.0
ugen0.1: <Generic XHCI root HUB> at usbus0
uhub0 on usbus0
uhub0: <Generic XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0
ugen1.1: <Marvell EHCI root HUB> at usbus1
uhub1 on usbus1
uhub1: <Marvell EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus1
mmc0: Failed to set VCCQ for card at relative address 2
mmcsd0: 8GB <MMCHC SEM08G 3.10 SN B1C929AA MFG 04/2017 by 69 0x0000> at mmc0 50.0MHz/8bit/65535-block
mmcsd0boot0: 2MB partition 1 at mmcsd0
mmcsd0boot1: 2MB partition 2 at mmcsd0
mmcsd0rpmb: 2MB partition 3 at mmcsd0
Trying to mount root from ufs:/dev/ufs/pfSense_Install [ro,noatime]...
CPU 0: ARM Cortex-A53 r0p4 affinity: 0
Cache Type = <64 byte D-cacheline,64 byte I-cacheline,VIPT ICache,64 byte ERG,64 byte CWG>
Instruction Set Attributes 0 = <CRC32,SHA2,SHA1,AES+PMULL>
Instruction Set Attributes 1 = <>
Instruction Set Attributes 2 = <>
Processor Features 0 = <GIC,AdvSIMD,FP,EL3 32,EL2 32,EL1 32,EL0 32>
Processor Features 1 = <>
Processor Features 2 = <>
Memory Model Features 0 = <TGran4,TGran64,SNSMem,BigEnd,16bit ASID,1TB PA>
Memory Model Features 1 = <8bit VMID>
Memory Model Features 2 = <32bit CCIDX,48bit VA>
Memory Model Features 3 = <>
Memory Model Features 4 = <>
Debug Features 0 = <DoubleLock,2 CTX BKPTs,4 Watchpoints,6 Breakpoints,PMUv3,Debugv8>
Debug Features 1 = <>
Auxiliary Features 0 = <>
Auxiliary Features 1 = <>
AArch32 Instruction Set Attributes 5 = <CRC32,SHA2,SHA1,AES+VMULL,SEVL>
AArch32 Media and VFP Features 0 = <FPRound,FPSqrt,FPDivide,DP VFPv3+v4,SP VFPv3+v4,AdvSIMD>
AArch32 Media and VFP Features 1 = <SIMDFMAC,FPHP DP Conv,SIMDHP SP Conv,SIMDSP,SIMDInt,SIMDLS,FPDNaN,FPFtZ>
CPU 1: ARM Cortex-A53 r0p4 affinity: 1
gic0: using for IPIs
Release APs...done
TCP_ratelimit: Is now initialized
uhub0: 2 ports with 2 removable, self powered
uhub1: 1 port with 1 removable, self powered
e6000sw0port1: link state changed to DOWN
e6000sw0port2: link state changed to DOWN
e6000sw0port3: link state changed to UP
Root mount waiting for: usbus1
Root mount waiting for: usbus1
Root mount waiting for: usbus1
Root mount waiting for: usbus1
Root mount waiting for: usbus1
ugen1.2: <General UDisk> at usbus1
umass0 on uhub1
umass0: <General UDisk, class 0/0, rev 2.00/1.00, addr 2> on usbus1
mountroot: waiting for device /dev/ufs/pfSense_Install...
da0 at umass-sim0 bus 0 scbus0 target 0 lun 0
da0: <General UDisk 5.00> Removable Direct Access SCSI-2 device
da0: Serial Number 2312131935211302379518
da0: 40.000MB/s transfers
da0: 15000MB (30720000 512 byte sectors)
da0: quirks=0x2<NO_6_BYTE>
Warning: no time-of-day clock registered, system time will not be set accurately
Dual Console: Video Primary, Serial Secondary
random: unblocking device.
lo0: link state changed to UP
e6000sw0port3: link state changed to DOWN
e6000sw0port3: link state changed to UP
ZFS filesystem version: 5
ZFS storage pool version: features support (5000)
sdhci_xenon1-slot0: Got AutoCMD12 error 0x0001, but there is no active command.
mmcsd0: Error indicated: 1 Timeout
sdhci_xenon1-slot0: ============== REGISTER DUMP ==============
sdhci_xenon1-slot0: Sys addr: 0x08080000 | Version: 0x00000002
sdhci_xenon1-slot0: Blk size: 0x00007200 | Blk cnt: 0x00000008
sdhci_xenon1-slot0: Argument: 0x007f32c9 | Trn mode: 0x00000037
sdhci_xenon1-slot0: Present: 0x01f20000 | Host ctl: 0x00000025
sdhci_xenon1-slot0: Power: 0x0000000f | Blk gap: 0x00000000
sdhci_xenon1-slot0: Wake-up: 0x00000000 | Clock: 0x00000407
sdhci_xenon1-slot0: Timeout: 0x0000000c | Int stat: 0x00000000
sdhci_xenon1-slot0: Int enab: 0x01ff003b | Sig enab: 0x01ff003b
sdhci_xenon1-slot0: AC12 err: 0x00000000 | Host ctl2:0x00000000
sdhci_xenon1-slot0: Caps: 0x25ec0099 | Caps2: 0x0000af77
sdhci_xenon1-slot0: Max curr: 0x00000000 | ADMA err: 0x00000000
sdhci_xenon1-slot0: ADMA addr:0x00000000 | Slot int: 0x00000000
sdhci_xenon1-slot0: ===========================================
mmcsd0: Error indicated: 1 Timeout
mmcsd0: Error indicated: 1 Timeout
mmcsd0: Error indicated: 1 Timeout
mmcsd0: Error indicated: 1 Timeout
mmcsd0: Error indicated: 1 Timeout
mmcsd0: Error indicated: 1 Timeout
mmcsd0: Error indicated: 1 Timeout
mmcsd0: Error indicated: 1 Timeout
mmcsd0: Error indicated: 1 Timeout
mmcsd0: Error indicated: 1 Timeout
mmcsd0: Error indicated: 1 Timeout
mmcsd0: Error indicated: 1 Timeout
mmcsd0: Error indicated: 1 Timeout
mmcsd0: Error indicated: 1 Timeout
e6000sw0port3: link state changed to DOWN
root@pfSense-install:~ #
Is there anything to be done about this or is something broken beyond repair?
-
Yep .. Your internal storage device has died.
It has been argued that the storage used in your device was a poor design choice: it is very hard to replace (soldered to the board) and prone to wearing out over time.
More modern designs use an ordinary SSD in its place, replaceable in a minute if ever need be.
-
I believe error 6 is a general IO issue. Given that ZFS makes it very unlikely that a power outage would cause filesystem corruption and given the timeout message, it does seem to be a hardware issue with storage. I suggest checking with TAC since you have support with your Netgate appliance.
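If it helps to see at a glance which device is throwing the errors, here is a rough Python sketch that tallies the "Error indicated" lines per device from a saved copy of the dmesg text. It assumes one kernel message per line, and the function name count_storage_errors and the regular expression are my own, not part of pfSense or FreeBSD. The sample is just a few lines taken from the output posted above.

```python
import re
from collections import Counter

# Matches lines such as "mmcsd0: Error indicated: 1 Timeout".
# The pattern is an assumption based on the messages quoted in this thread.
ERROR_RE = re.compile(r"^(?P<dev>\w+): Error indicated: (?P<code>\d+) (?P<reason>\w+)")


def count_storage_errors(dmesg_text):
    """Return a Counter keyed by (device, reason) for each matching error line."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = ERROR_RE.match(line.strip())
        if match:
            counts[(match.group("dev"), match.group("reason"))] += 1
    return counts


# A few lines copied from the dmesg output in the first post.
sample = """\
da0: 40.000MB/s transfers
sdhci_xenon1-slot0: Got AutoCMD12 error 0x0001, but there is no active command.
mmcsd0: Error indicated: 1 Timeout
mmcsd0: Error indicated: 1 Timeout
e6000sw0port3: link state changed to DOWN
"""

if __name__ == "__main__":
    for (dev, reason), n in count_storage_errors(sample).items():
        print(f"{dev}: {n} x {reason}")
```

Errors that show up only against mmcsd0 (the internal eMMC) and never against da0 (the USB stick) would be consistent with the internal storage being the problem, though a count like this is only a hint and not a diagnosis.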
-
@m-d-frederiksen I'd be interested in your 3100 but isn't it end-of-life?
-
It is indeed EOL, hence the fair price.
It is, however, a very low-mileage specimen .. It only really filtered my old mom's browsing for crosswords on Sundays :-)
BUT .. as Marcosm suggests above, you might be eligible for a more hardcore kind of support, i.e. hardware replacement under warranty or something ..
I suggest you follow that path first. Get back to me later if you so desire!
-
@marcosm Yeah, they come with support now but mine didn't. 5 yrs ago (purchased 1/2019) they sold them with 'community support' rather than TAC Lite. Thanks though.
-
@Jeffx123 If you're referring to the dashboard showing community support instead, it's not a problem - you still have TAC Lite support bundled with the appliance. Though given the date you've provided it's out of warranty and TAC will likely come to the same conclusion.
-
@Jeffx123 The Netgate Installer apparently lets you install onto USB on the NG1100. This is a change from the old installer which would only use the internal storage.
Maybe not ideal compared to eMMC, but it would get you back up and running as a stopgap if necessary.
-
@bigsy Thanks, that's an interesting idea, I'll dig into it.
-
@bigsy Took your suggestion and installed onto a USB drive. The device boots nicely. Interestingly, the internal storage came back to life for a couple of boots before failing again. I'm able to use the USB 3.0 port and the performance seems fine (e.g. time to boot), or at least as good as it was before. Apparently there are questions as to how long a USB drive will survive the write load; we'll see. Thanks again for the suggestion. If it works out, it seems a nice way to give the hardware a new lease on life.