SG-1100 boot fails after power outage and can't reach network for reinstall
-
@m-d-frederiksen Yes, I'm trying to use the latest image. I did get a new copy of the installer from Netgate and that's what I'm booting. The part that's failing is where it then connects to the network to download the new image.
I did some more digging and checked the network with ifconfig. It shows the WAN port as mvneta0.4091 with an address of 192.168.1.1. There are notes in messages.log that 192.168.1.1 is in use (by the current gateway):
Sep 19 17:10:37 pfSense-install kernel: arp: ac:91:9b:64:8f:52 is using my IP address 192.168.1.1 on mvneta0.4091!
Sep 19 17:11:01 pfSense-install syslogd: last message repeated 3 times
I set the address with ifconfig to 192.168.1.2/24. With this I can ping other machines on my network, although they cannot ping the SG1100. The SG1100 cannot ping machines outside of my network (e.g. 8.8.8.8).
Unfortunately the install does not proceed which isn't all that surprising as I couldn't reach outside of my network from the command line either. Also, when I exit back to the installer, this setting is lost.
Original config before I forced the ipaddr to 192.168.1.2:
mvneta0.4091: flags=1008843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
        options=3<RXCSUM,TXCSUM>
        ether f0:ad:4e:08:6c:1c
        inet 192.168.1.1 netmask 0xffffff00 broadcast 192.168.1.255
        inet6 fe80::f2ad:4eff:fe08:6c1c%mvneta0.4091 prefixlen 64 scopeid 0xa
        groups: vlan
        vlan: 4091 vlanproto: 802.1q vlanpcp: 0 parent interface: mvneta0
        media: Ethernet 1000baseT <full-duplex>
        status: active
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
mvneta0.4090: flags=1008843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
        options=3<RXCSUM,TXCSUM>
        ether f0:ad:4e:08:6c:1c
        inet 192.168.1.161 netmask 0xffffff00 broadcast 192.168.1.255
        inet6 fe80::f2ad:4eff:fe08:6c1c%mvneta0.4090 prefixlen 64 scopeid 0xb
        groups: vlan
        vlan: 4090 vlanproto: 802.1q vlanpcp: 0 parent interface: mvneta0
        media: Ethernet 1000baseT <full-duplex>
        status: active
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
Any thoughts about how to restore network connectivity?
Also, it seems somewhat odd that VLAN 4090 has a local address, as if it were the LAN port, and 4091 started with the .1.1 address (WAN port), yet the installer seems to think the LAN is on 4091:
Detected: Netgate 1100
Please confirm the interface assignment to continue with the installation.
    LAN   mvneta0 (active)   vlan 4091
    WAN   mvneta0 (active)   vlan 4090
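
One possible reading of the output above, offered as a guess rather than a diagnosis: 4091 (LAN) carries the installer's default 192.168.1.1, while 4090 (WAN) has 192.168.1.161, presumably handed out by the existing router. That would put both sides in the same /24, with the LAN address colliding with the current gateway. A minimal Python sketch of that check, using only the addresses and hex netmasks shown above; `iface_network` is a hypothetical helper name, not part of any pfSense tooling:

```python
import ipaddress


def iface_network(addr: str, hex_netmask: str) -> ipaddress.IPv4Network:
    """Return the network an interface sits in, given ifconfig-style fields."""
    # FreeBSD ifconfig prints the netmask in hex, e.g. 0xffffff00.
    mask = ipaddress.IPv4Address(int(hex_netmask, 16))
    return ipaddress.IPv4Interface(f"{addr}/{mask}").network


# Values copied from the ifconfig output in this post.
lan = iface_network("192.168.1.1", "0xffffff00")    # mvneta0.4091
wan = iface_network("192.168.1.161", "0xffffff00")  # mvneta0.4090

# True means both interfaces claim the same address range.
print(lan, wan, lan.overlaps(wan))
```

If the two networks overlap, the box has no unambiguous path to the upstream gateway, which would be consistent with pings to 8.8.8.8 failing.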
-
Warning again - I am no expert !
But .. it would be rather helpful for diagnosis if you were to capture a log of some sort from the entire recovery process. You see, the installer is designed to leave behind a basic (but WORKING) setup of both a LAN and a WAN .. across all platforms.
Either what you describe is caused by a real hardware error induced by your original "happening", or the log mentioned above should allow us to pinpoint where things go wrong.
If you can't log the entire recovery process, consider filming it on your phone in decent quality, so that it can be studied in detail .. frame by frame if need be :-)
Making the video available to this audience in here should be possible with a simple youtube-link or some such, perhaps ?
Anyway - best of luck whatever you end up doing.
-
@m-d-frederiksen Thanks. I ended up replacing the working router with the SG1100 so that the SG1100 could have 192.168.1.1. Now it can contact the Netgate server and the installation process proceeded past that point. Unfortunately it looks like I may have some hardware issue. After asking a few questions it said it was "Partitioning" and "Committing changes" and then a series of
mmcsd0: Error indicated: 1 Timeout
before it reported the installation was aborted.
-
The recovery process offered to erase and format the storage you intended to install on .. but failed ?
Yep - hardware error - move to trashcan :-)
I am selling my SG-3100 cheap, as I got my hands on a Netgate 4200 MAX
Say when :-)
-
Tried reinstalling to see if that would fix the trouble but the installation fails early on when it appears to be partitioning/committing changes to the filesystem. It says:
mmcsd0: Error indicated: 1 Timeout
The end of the dmesg output is below:
pcib0: <Marvell Armada 3700 PCIe Bus Controller> mem 0xd0070000-0xd008ffff irq 5 on simplebus0
pcib0: link never came up
pci0: <OFW PCI bus> on pcib0
gpioled0: <GPIO LEDs> on ofwbus0
armv8crypto0: <AES-CBC,AES-XTS,AES-GCM>
Timecounters tick every 1.000 msec
mvneta0: link state changed to UP
spibus0: <OFW SPI bus> on spi0
mx25l0: <M25Pxx Flash Family> at cs 0 mode 0 on spibus0
mx25l0: device type mx25u3235f, size 4096K in 64 sectors of 64K, erase size 4K
usbus0: 5.0Gbps Super Speed USB v3.0
usbus1: 480Mbps High Speed USB v2.0
ugen0.1: <Generic XHCI root HUB> at usbus0
uhub0 on usbus0
uhub0: <Generic XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0
ugen1.1: <Marvell EHCI root HUB> at usbus1
uhub1 on usbus1
uhub1: <Marvell EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus1
mmc0: Failed to set VCCQ for card at relative address 2
mmcsd0: 8GB <MMCHC SEM08G 3.10 SN B1C929AA MFG 04/2017 by 69 0x0000> at mmc0 50.0MHz/8bit/65535-block
mmcsd0boot0: 2MB partition 1 at mmcsd0
mmcsd0boot1: 2MB partition 2 at mmcsd0
mmcsd0rpmb: 2MB partition 3 at mmcsd0
Trying to mount root from ufs:/dev/ufs/pfSense_Install [ro,noatime]...
CPU 0: ARM Cortex-A53 r0p4 affinity: 0
Cache Type = <64 byte D-cacheline,64 byte I-cacheline,VIPT ICache,64 byte ERG,64 byte CWG>
Instruction Set Attributes 0 = <CRC32,SHA2,SHA1,AES+PMULL>
Instruction Set Attributes 1 = <>
Instruction Set Attributes 2 = <>
Processor Features 0 = <GIC,AdvSIMD,FP,EL3 32,EL2 32,EL1 32,EL0 32>
Processor Features 1 = <>
Processor Features 2 = <>
Memory Model Features 0 = <TGran4,TGran64,SNSMem,BigEnd,16bit ASID,1TB PA>
Memory Model Features 1 = <8bit VMID>
Memory Model Features 2 = <32bit CCIDX,48bit VA>
Memory Model Features 3 = <>
Memory Model Features 4 = <>
Debug Features 0 = <DoubleLock,2 CTX BKPTs,4 Watchpoints,6 Breakpoints,PMUv3,Debugv8>
Debug Features 1 = <>
Auxiliary Features 0 = <>
Auxiliary Features 1 = <>
AArch32 Instruction Set Attributes 5 = <CRC32,SHA2,SHA1,AES+VMULL,SEVL>
AArch32 Media and VFP Features 0 = <FPRound,FPSqrt,FPDivide,DP VFPv3+v4,SP VFPv3+v4,AdvSIMD>
AArch32 Media and VFP Features 1 = <SIMDFMAC,FPHP DP Conv,SIMDHP SP Conv,SIMDSP,SIMDInt,SIMDLS,FPDNaN,FPFtZ>
CPU 1: ARM Cortex-A53 r0p4 affinity: 1
gic0: using for IPIs
Release APs...done
TCP_ratelimit: Is now initialized
uhub0: 2 ports with 2 removable, self powered
uhub1: 1 port with 1 removable, self powered
e6000sw0port1: link state changed to DOWN
e6000sw0port2: link state changed to DOWN
e6000sw0port3: link state changed to UP
Root mount waiting for: usbus1
Root mount waiting for: usbus1
Root mount waiting for: usbus1
Root mount waiting for: usbus1
Root mount waiting for: usbus1
ugen1.2: <General UDisk> at usbus1
umass0 on uhub1
umass0: <General UDisk, class 0/0, rev 2.00/1.00, addr 2> on usbus1
mountroot: waiting for device /dev/ufs/pfSense_Install...
da0 at umass-sim0 bus 0 scbus0 target 0 lun 0
da0: <General UDisk 5.00> Removable Direct Access SCSI-2 device
da0: Serial Number 2312131935211302379518
da0: 40.000MB/s transfers
da0: 15000MB (30720000 512 byte sectors)
da0: quirks=0x2<NO_6_BYTE>
Warning: no time-of-day clock registered, system time will not be set accurately
Dual Console: Video Primary, Serial Secondary
random: unblocking device.
lo0: link state changed to UP
e6000sw0port3: link state changed to DOWN
e6000sw0port3: link state changed to UP
ZFS filesystem version: 5
ZFS storage pool version: features support (5000)
sdhci_xenon1-slot0: Got AutoCMD12 error 0x0001, but there is no active command.
mmcsd0: Error indicated: 1 Timeout
sdhci_xenon1-slot0: ============== REGISTER DUMP ==============
sdhci_xenon1-slot0: Sys addr: 0x08080000 | Version: 0x00000002
sdhci_xenon1-slot0: Blk size: 0x00007200 | Blk cnt: 0x00000008
sdhci_xenon1-slot0: Argument: 0x007f32c9 | Trn mode: 0x00000037
sdhci_xenon1-slot0: Present: 0x01f20000 | Host ctl: 0x00000025
sdhci_xenon1-slot0: Power: 0x0000000f | Blk gap: 0x00000000
sdhci_xenon1-slot0: Wake-up: 0x00000000 | Clock: 0x00000407
sdhci_xenon1-slot0: Timeout: 0x0000000c | Int stat: 0x00000000
sdhci_xenon1-slot0: Int enab: 0x01ff003b | Sig enab: 0x01ff003b
sdhci_xenon1-slot0: AC12 err: 0x00000000 | Host ctl2:0x00000000
sdhci_xenon1-slot0: Caps: 0x25ec0099 | Caps2: 0x0000af77
sdhci_xenon1-slot0: Max curr: 0x00000000 | ADMA err: 0x00000000
sdhci_xenon1-slot0: ADMA addr:0x00000000 | Slot int: 0x00000000
sdhci_xenon1-slot0: ===========================================
mmcsd0: Error indicated: 1 Timeout
mmcsd0: Error indicated: 1 Timeout
mmcsd0: Error indicated: 1 Timeout
mmcsd0: Error indicated: 1 Timeout
mmcsd0: Error indicated: 1 Timeout
mmcsd0: Error indicated: 1 Timeout
mmcsd0: Error indicated: 1 Timeout
mmcsd0: Error indicated: 1 Timeout
mmcsd0: Error indicated: 1 Timeout
mmcsd0: Error indicated: 1 Timeout
mmcsd0: Error indicated: 1 Timeout
mmcsd0: Error indicated: 1 Timeout
mmcsd0: Error indicated: 1 Timeout
mmcsd0: Error indicated: 1 Timeout
mmcsd0: Error indicated: 1 Timeout
e6000sw0port3: link state changed to DOWN
root@pfSense-install:~ #
Is there anything to be done about this or is something broken beyond repair?
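
For anyone comparing console captures between install attempts, the repeated storage error can be tallied from saved text. This is a minimal sketch, assuming the console output was saved as plain text; `count_mmc_errors` and the `sample` string are illustrative only and not part of the installer:

```python
import re
from collections import Counter

# Matches kernel lines such as "mmcsd0: Error indicated: 1 Timeout".
# No line anchor is used, so it also works on captures where newlines were lost.
MMC_ERROR = re.compile(r"\b(mmcsd\d+): Error indicated: (\d+) (\w+)")


def count_mmc_errors(log_text: str) -> Counter:
    """Count eMMC error reports per device and error kind."""
    return Counter(
        f"{dev} {kind}" for dev, _code, kind in MMC_ERROR.findall(log_text)
    )


# Shortened, hypothetical sample in the same shape as the log above.
sample = (
    "ZFS storage pool version: features support (5000)\n"
    "mmcsd0: Error indicated: 1 Timeout\n"
    "mmcsd0: Error indicated: 1 Timeout\n"
    "mmcsd0: Error indicated: 1 Timeout\n"
    "e6000sw0port3: link state changed to DOWN\n"
)

print(count_mmc_errors(sample))
```

A count that stays high across several attempts, with different USB sticks and images, points at the storage itself rather than at the installer media.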
-
Yep .. Your internal storage device has died.
It has been debated that the kind of storage used in your device was a poor design choice, as it's very hard to replace (soldered to the board) and is prone to wearing out over time.
More modern designs use an ordinary SSD in its place, replaceable in a minute if ever need be.
-
I believe error 6 is a general IO issue. Given that ZFS makes it very unlikely that a power outage would cause filesystem corruption and given the timeout message, it does seem to be a hardware issue with storage. I suggest checking with TAC since you have support with your Netgate appliance.
-
@m-d-frederiksen I'd be interested in your 3100 but isn't it end-of-life?
-
It is indeed EOL, hence the fair price.
It's however a very low mileage specimen .. It only really filtered my old mom's browsing for crosswords on Sundays :-)
BUT .. as Marcosm suggests above, you might be eligible for a more hardcore kind of support, i.e. hardware replacement on warranty or some such ..
Suggest you follow that path first. Get back to me, if you so desire at a later point !
-
@marcosm Yeah, they come with support now but mine didn't. 5 yrs ago (purchased 1/2019) they sold them with 'community support' rather than TAC Lite. Thanks though.
-
@Jeffx123 If you're referring to the dashboard showing community support instead, it's not a problem - you still have TAC Lite support bundled with the appliance. Though given the date you've provided it's out of warranty and TAC will likely come to the same conclusion.
-
@Jeffx123 The Netgate Installer apparently lets you install onto USB on the NG1100. This is a change from the old installer which would only use the internal storage.
Maybe not ideal compared to eMMC, but it would get you back up and running as a stopgap if necessary.
-
@bigsy Thanks, that's an interesting idea, I'll dig into it.