SG-5100 lost all ix ports after power outage
-
@stephenw10
Here is a section of the boot log pertaining to the ix0 interface. I don't see any reference to ix1, ix2, or ix3; maybe it gives up if it doesn't see the first one.
AHCI v1.31 with 1 6Gbps ports, Port Multiplier supported
ahcich8: <AHCI channel> at channel 7 on ahci1
ahciem1: <AHCI enclosure management bridge> on ahci1
xhci0: <Intel Denverton USB 3.0 controller> mem 0xdff80000-0xdff8ffff irq 19 at device 21.0 on pci0
xhci0: 32 bytes context size, 64-bit DMA
usbus0 on xhci0
usbus0: 5.0Gbps Super Speed USB v3.0
pcib6: <ACPI PCI-PCI bridge> irq 16 at device 22.0 on pci0
pci6: <ACPI PCI bus> on pcib6
pci6: <network, ethernet> at device 0.0 (no driver attached)
pci6: <network, ethernet> at device 0.1 (no driver attached)
pcib7: <ACPI PCI-PCI bridge> at device 23.0 on pci0
pci7: <ACPI PCI bus> on pcib7
ix0: <Intel(R) X553 L (1GbE)> mem 0xdf400000-0xdf5fffff,0xdf604000-0xdf607fff irq 16 at device 0.0 on pci7
ix0: Hardware initialization failed
ix0: IFDI_ATTACH_PRE failed 5
device_attach: ix0 attach returned 5
ix0: <Intel(R) X553 L (1GbE)> mem 0xdf200000-0xdf3fffff,0xdf600000-0xdf603fff irq 17 at device 0.1 on pci7
ix0: Hardware initialization failed
ix0: IFDI_ATTACH_PRE failed 5
device_attach: ix0 attach returned 5
pci0: <simple comms> at device 24.0 (no driver attached)
uart2: <Intel Denverton UART> port 0xe080-0xe087 mem 0xdff9d000-0xdff9d0ff irq 16 at device 26.0 on pci0
uart2: Using 1 MSI message
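The failed attaches in a boot log like the one above can be pulled out mechanically. The following is a minimal sketch in Python. It assumes only that failures appear as `device_attach: <dev> attach returned <n>` lines, as they do in this log. The function name `find_attach_failures` is hypothetical. Error code 5 maps to EIO on both FreeBSD and Linux.

```python
import errno
import re

# Matches FreeBSD newbus attach failures such as
# "device_attach: ix0 attach returned 5". No line anchors are used,
# so it also works on a log that was flattened onto one line.
ATTACH_FAIL = re.compile(r"device_attach: (\S+) attach returned (\d+)")


def find_attach_failures(boot_log: str) -> list[tuple[str, int, str]]:
    """Return (device, error code, errno name) for every failed attach."""
    failures = []
    for match in ATTACH_FAIL.finditer(boot_log):
        device, code = match.group(1), int(match.group(2))
        # errno.errorcode maps numeric codes to symbolic names (5 -> "EIO").
        name = errno.errorcode.get(code, "unknown")
        failures.append((device, code, name))
    return failures


if __name__ == "__main__":
    sample = """\
ix0: Hardware initialization failed
ix0: IFDI_ATTACH_PRE failed 5
device_attach: ix0 attach returned 5
ix0: Hardware initialization failed
ix0: IFDI_ATTACH_PRE failed 5
device_attach: ix0 attach returned 5
"""
    for device, code, name in find_attach_failures(sample):
        print(f"{device}: attach returned {code} ({name})")
```

Both failures report `ix0`. This is likely because the unit number is released after a failed attach and reused for the next port, which would also explain why ix1 never shows up in the log.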
Here is the output of pciconf -lv. Near the bottom I notice entries none6, none7, none8, and none9, which appear to be the X553 1GbE ports.
igb0@pci0:3:0:0: class=0x020000 card=0x0000ffff chip=0x15338086 rev=0x03 hdr=0x00
    vendor = 'Intel Corporation'
    device = 'I210 Gigabit Network Connection'
    class = network
    subclass = ethernet
igb1@pci0:4:0:0: class=0x020000 card=0x0000ffff chip=0x15338086 rev=0x03 hdr=0x00
    vendor = 'Intel Corporation'
    device = 'I210 Gigabit Network Connection'
    class = network
    subclass = ethernet
none6@pci0:6:0:0: class=0x020000 card=0x00008086 chip=0x13068086 rev=0x11 hdr=0x00
    vendor = 'Intel Corporation'
    class = network
    subclass = ethernet
none7@pci0:6:0:1: class=0x020000 card=0x00008086 chip=0x13068086 rev=0x11 hdr=0x00
    vendor = 'Intel Corporation'
    class = network
    subclass = ethernet
none8@pci0:8:0:0: class=0x020000 card=0x00008086 chip=0x15e58086 rev=0x11 hdr=0x00
    vendor = 'Intel Corporation'
    device = 'Ethernet Connection X553 1GbE'
    class = network
    subclass = ethernet
none9@pci0:8:0:1: class=0x020000 card=0x00008086 chip=0x15e58086 rev=0x11 hdr=0x00
    vendor = 'Intel Corporation'
    device = 'Ethernet Connection X553 1GbE'
    class = network
    subclass = ethernet
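Spotting unclaimed NICs in `pciconf -lv` output can be scripted. Below is a minimal Python sketch. It assumes driver names are lowercase letters followed by a unit number, and that `chip=` packs the device ID in its upper 16 bits and the vendor ID in its lower 16 bits. For example, `chip=0x15338086` is device 0x1533 from vendor 0x8086. The names `parse_pciconf` and `unattached_nics` are hypothetical.

```python
import re
from dataclasses import dataclass

# Header line of each `pciconf -lv` entry, e.g.
# "none6@pci0:6:0:0: class=0x020000 card=0x00008086 chip=0x13068086 rev=0x11 hdr=0x00"
# Assumption: driver names are lowercase letters followed by a unit number.
ENTRY = re.compile(
    r"\b([a-z_]+)(\d+)@(pci\d+:\d+:\d+:\d+):\s+"
    r"class=(0x[0-9a-f]+)\s+card=(0x[0-9a-f]+)\s+chip=(0x[0-9a-f]+)"
)


@dataclass
class PciEntry:
    driver: str      # "none" means no driver claimed the device
    unit: int
    selector: str    # pciN:bus:slot:function
    class_code: int
    device_id: int   # upper 16 bits of chip=
    vendor_id: int   # lower 16 bits of chip=


def parse_pciconf(text: str) -> list[PciEntry]:
    entries = []
    for m in ENTRY.finditer(text):
        chip = int(m.group(6), 16)
        entries.append(PciEntry(
            driver=m.group(1),
            unit=int(m.group(2)),
            selector=m.group(3),
            class_code=int(m.group(4), 16),
            device_id=chip >> 16,
            vendor_id=chip & 0xFFFF,
        ))
    return entries


def unattached_nics(entries: list[PciEntry]) -> list[PciEntry]:
    # Base class 0x02 is "network controller"; driver "none" means nothing attached.
    return [e for e in entries if e.driver == "none" and e.class_code >> 16 == 0x02]


if __name__ == "__main__":
    sample = (
        "igb0@pci0:3:0:0: class=0x020000 card=0x0000ffff chip=0x15338086 rev=0x03 hdr=0x00\n"
        "none6@pci0:6:0:0: class=0x020000 card=0x00008086 chip=0x13068086 rev=0x11 hdr=0x00\n"
    )
    for e in unattached_nics(parse_pciconf(sample)):
        print(f"{e.selector}: vendor 0x{e.vendor_id:04x} device 0x{e.device_id:04x}, no driver")
```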
-
Hmm, well that's... odd! Two of them appear to have somehow changed device ID (chip 0x1306 where 0x15e4 is expected). And the others are failing to respond to the driver...
They should appear as:
ix0@pci0:6:0:0: class=0x020000 card=0x00008086 chip=0x15e48086 rev=0x11 hdr=0x00
    vendor = 'Intel Corporation'
    device = 'Ethernet Connection X553 1GbE'
    class = network
    subclass = ethernet
ix1@pci0:6:0:1: class=0x020000 card=0x00008086 chip=0x15e48086 rev=0x11 hdr=0x00
    vendor = 'Intel Corporation'
    device = 'Ethernet Connection X553 1GbE'
    class = network
    subclass = ethernet
ix2@pci0:8:0:0: class=0x020000 card=0x00008086 chip=0x15e58086 rev=0x11 hdr=0x00
    vendor = 'Intel Corporation'
    device = 'Ethernet Connection X553 1GbE'
    class = network
    subclass = ethernet
ix3@pci0:8:0:1: class=0x020000 card=0x00008086 chip=0x15e58086 rev=0x11 hdr=0x00
    vendor = 'Intel Corporation'
    device = 'Ethernet Connection X553 1GbE'
    class = network
    subclass = ethernet
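Comparing the expected listing above against what the failing unit reported makes the ID change explicit. This is a minimal Python sketch. The EXPECTED values are copied from the listing above. The OBSERVED values are taken from the pciconf -lv output posted earlier in the thread. The helper names `split_chip` and `id_mismatches` are hypothetical.

```python
from typing import Optional

# Expected X553 IDs on an SG-5100, from the listing above (chip=0x<device><vendor>).
EXPECTED = {
    "pci0:6:0:0": 0x15E48086,
    "pci0:6:0:1": 0x15E48086,
    "pci0:8:0:0": 0x15E58086,
    "pci0:8:0:1": 0x15E58086,
}

# What the failing unit reported in its pciconf -lv output.
OBSERVED = {
    "pci0:6:0:0": 0x13068086,
    "pci0:6:0:1": 0x13068086,
    "pci0:8:0:0": 0x15E58086,
    "pci0:8:0:1": 0x15E58086,
}


def split_chip(chip: int) -> tuple[int, int]:
    """Split a pciconf chip= value into (device ID, vendor ID)."""
    return chip >> 16, chip & 0xFFFF


def id_mismatches(expected: dict[str, int],
                  observed: dict[str, int]) -> dict[str, tuple[int, Optional[int]]]:
    """Map each selector whose device ID differs to (expected, observed); None = not enumerated."""
    result = {}
    for selector, want in expected.items():
        want_dev, _ = split_chip(want)
        have = observed.get(selector)
        if have is None:
            result[selector] = (want_dev, None)  # device missing from the bus entirely
            continue
        have_dev, _ = split_chip(have)
        if have_dev != want_dev:
            result[selector] = (want_dev, have_dev)
    return result


if __name__ == "__main__":
    for sel, (want, have) in id_mismatches(EXPECTED, OBSERVED).items():
        shown = "missing" if have is None else f"0x{have:04x}"
        print(f"{sel}: expected device 0x{want:04x}, found {shown}")
```

This checks IDs only. The ports at pci0:8:0:x report the correct ID 0x15e5 but still fail with "Hardware initialization failed", so a matching ID does not by itself show that a port is healthy.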
-
@stephenw10 OK, so it's probably some hardware issue, I gather.
-
It looks like it. I'm just trying to see if we've ever seen anything like that previously.
I assume it has been through a complete power cycle since the outage?
-
@stephenw10 Yes, several power cycles.
When we get the 6100 next week, I will be able to do more experiments on it.
-
Was your extended power outage due to weather, perhaps? If so, I would suspect hardware damaged by a transient surge. Ethernet cabling can make a dandy antenna for picking up EMF surges caused by nearby lightning strikes. I've had switch ports destroyed that way in the past.
-
@bmeeks said in SG-5100 lost all ix ports after power outage:
It was a scheduled outage due to building maintenance.
-
When you are able, there is something we might try: rewriting the NIC EEPROM contents in case they have somehow been corrupted. That's about the only thing I can imagine that would change the PCI device IDs like that. The driver is not complaining about the EEPROM checksum, which it usually would, but it may not get that far.
-
@stephenw10 I just received the 6100 today and restored my configuration to it, so the SG-5100 is now free for whatever you want to try regarding the EEPROM.
-
You can try to reset the NIC EEPROM on the 5100 as follows:
- Connect to the serial console
- Power on the device
- Hit Esc or F12 to get the boot menu and choose Enter Setup
- Go into the BIOS settings under Advanced > CSM Configuration
- Change Network to a different value such as UEFI
- Save/exit and reboot
That will nudge it to rewrite the NIC EEPROM, and then it should boot (though perhaps more slowly).
If the NICs work again after that, then go back into the BIOS and change that same setting back to Legacy or whatever it was on yours to start with.
If that does not help, then it probably is a hardware failure.
-
@jimp I tried it, and it did not fix the problem. Thanks for your assistance.
I can continue to use as a test/backup router since it has two working ports still.
-
@brians said in SG-5100 lost all ix ports after power outage:
@jimp I tried and did not fix. Thanks for your assistance.
I can continue to use as a test/backup router since it has two working ports still.
Oh well, it was worth a shot.
Out of curiosity, was the error in the system log still the same as before?
-
@jimp Yes the error was same.
-
In case somebody finds this thread in the future: there do appear to be issues with some Netgate hardware and hard power-downs.
As part of our final testing for a new datacenter deployment, we perform a hard failover of all equipment by removing power from each of the redundant pairs. After we removed power from our 1537, the unit booted back up, and ix0 and ix1 were no longer registered. The two onboard copper ports and the expansion slot ports all worked.
pciconf -lv showed the PCI device, but, as with the OP's unit, it was listed as none@pci. We tried the EEPROM rewrite, and that didn't help. Netgate support collected a full status dump of the device and then issued us an RMA for a replacement.
Hopefully this saves somebody some time; this hardware appears to be sensitive to unclean shutdowns.
-
Hmm, that's weird. I've seen SFP ports behave oddly across a power cycle, especially depending on whether the module is connected before or after boot. But the ix driver was patched to allow it to attach anyway.