SFP+ Problem after 22.02 upgrade on 1537
-
Good evening,
I am having an issue with one of my XG-1537 units after upgrading to 22.02. It seems the LAN-side SFP+ is no longer being recognized, and the unit shows the following in its boot sequence:
ix1: Unsupported SFP+ module detected!
ifconfig -v ix1 shows:
ix1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
    description: OPT9
    options=e138bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER,RXCSUM_IPV6,TXCSUM_IPV6>
    ether 00:00:00:00:00:00
    inet6 fe80::3eec:efff:fe30:b24e%ix1 prefixlen 64 scopeid 0x2
    media: Ethernet autoselect (Unknown <rxpause,txpause>)
    status: active
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
    plugged: SFP/SFP+/SFP28 10G Base-LRM (LC)
    vendor: FS PN: SFP-10GLRM-31 SN: F2031327657 DATE: 2021-05-11
    module temperature: 35.88 C Voltage: 3.31 Volts
    RX: 0.69 mW (-1.61 dBm) TX: 0.52 mW (-2.79 dBm)
This is one unit of an HA pair in a co-lo facility, so, naturally, fixing this with a site visit is going to be a nightmare of paperwork if I need to swap out SFP+ modules. Our second unit is still working fine on 21.05.2.
The working unit shows the following when running ifconfig -v ix1:
ix1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=e138bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER,RXCSUM_IPV6,TXCSUM_IPV6>
    ether 3c:ec:ef:3d:46:05
    inet6 fe80::3eec:efff:fe3d:4605%ix1 prefixlen 64 scopeid 0x2
    media: Ethernet autoselect (Unknown <rxpause,txpause>)
    status: active
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
    plugged: SFP/SFP+/SFP28 10G Base-LRM (LC)
    vendor: FS PN: SFP-10GLRM-31 SN: F2031327658 DATE: 2021-05-11
    module temperature: 36.15 C Voltage: 3.28 Volts
    RX: 0.77 mW (-1.08 dBm) TX: 0.55 mW (-2.53 dBm)
I have another 2 units in a different co-lo facility that ran fine. When I checked their interfaces, they are using a different module model, as that facility uses SMF as opposed to MMF.
Since the message shows in the boot logs, I am thinking this might be a FreeBSD issue between 12.2 and 12.3. The same unit having problems on its LAN side is still working on the WAN side with 1G SMF optics from the same vendor. If this is a FreeBSD change, it's strange that this SFP module is no longer supported, as it's a VERY common module.
Am I just shooting out into left field here, or is my hunch somewhat valid? If it is just the SFP module, I can probably ship our provider a replacement module and have them swap them out for a 'nominal' charge.
Thanks - Marc
-
I'm not actually seeing a significant difference there. Both show as active and linked.
Is it not passing any traffic?
You can try adding the loader variable:
hw.ix.unsupported_sfp=1
Create the file /boot/loader.conf.local and put that in it.
That shouldn't be required though.
Steve
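For anyone following along, the loader variable step could be scripted roughly like this. It is only a sketch: the helper name add_unsupported_sfp_tunable is made up for illustration, and it assumes a POSIX shell with root access to the path mentioned above.

```shell
#!/bin/sh
# Sketch: add the tunable from this thread to a loader config file, once.
# add_unsupported_sfp_tunable is a hypothetical helper name, not a pfSense tool.
add_unsupported_sfp_tunable() {
    conf="${1:-/boot/loader.conf.local}"   # default is the path named in the post
    line='hw.ix.unsupported_sfp=1'
    touch "$conf" || return 1              # create the file if it is missing
    # -x matches the whole line, -F treats the pattern as a fixed string,
    # so running this twice does not add a duplicate entry.
    grep -qxF "$line" "$conf" || printf '%s\n' "$line" >> "$conf"
}

# Example (as root, followed by a reboot so the loader reads the file):
#   add_unsupported_sfp_tunable
# To rehearse on a scratch file first:
#   add_unsupported_sfp_tunable /tmp/loader.conf.local.test
```

Passing a scratch path first is a cheap way to confirm the result before touching /boot on a remote production unit.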
-
Are there SFP modules in both ports? If so, are you using them both or just that one?
I can only think of one thing that might be relevant. In previous versions, if there was a completely unsupported SFP (e.g. a media type that was invalid for the card) the entire device might not have initialized or shown up, but that is different in 22.02. Now it initializes but won't link. That could maybe change the probe order of the interfaces depending on how you had things plugged in, but I'd expect it to be similar otherwise.
Check the dmesg output on the old vs the new, and the ifconfig -vvvv output for each ix<x> interface on the old and new version, and compare.
-
@jimp Yes, both ix0 and ix1 have SFP modules. WAN port is on ix0 using an SFP, LAN is ix1 using the SFP that no longer works. ifconfig results are below, I added -v for additional info.
@stephenw10 sadly that option did not work. All LAN side networks are VLANs. They all show as 'up' on the interfaces list, but the MAC is now shown as 00:00:00:00:00:00, while the other HA unit shows the MAC properly.
I diffed the ix1 ifconfig output from the two units, and the only consequential difference between them now is the MAC. The diff on the dmesg is a different story: the driver now reports as "Intel(R) X552 (SFP+)" on the non-functioning unit, while the functioning unit shows "Intel(R) PRO/10GbE PCI-Express Network Driver".
ifconfig -v ix0 on the non-functioning unit (this interface does work; it is SFP based, but using a 1G module as opposed to 10G):
ix0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
    description: WAN_BID
    options=e138bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER,RXCSUM_IPV6,TXCSUM_IPV6>
    ether 3c:ec:ef:30:b2:4e
    inet6 fe80::3eec:efff:fe30:b24e%ix0 prefixlen 64 scopeid 0x1
    inet6 IPv6_WAN::1b prefixlen 125
    inet6 IPv6_WAN::1a prefixlen 125 vhid 101
    inet6 IPv6_WAN2::140 prefixlen 64 vhid 101
    inet6 IPv6_WAN2::141 prefixlen 64 vhid 101
    inet6 IPv6_WAN2::142 prefixlen 64 vhid 101
    inet6 IPv6_WAN2::143 prefixlen 64 vhid 101
    inet6 IPv6_WAN2::144 prefixlen 64 vhid 101
    inet6 IPv6_WAN2::145 prefixlen 64 vhid 101
    inet6 IPv6_WAN2::146 prefixlen 64 vhid 101
    inet6 IPv6_WAN2::155 prefixlen 64 vhid 101
    inet6 IPv6_WAN2::139 prefixlen 64 vhid 101
    inet6 IPv6_WAN2::158 prefixlen 64 vhid 101
    inet IPv4_WAN.131 netmask 0xfffffff8 broadcast IPv4_WAN.135
    inet IPv4_WAN.130 netmask 0xfffffff8 broadcast IPv4_WAN.135 vhid 201
    inet IPv4_WAN.133 netmask 0xfffffff8 broadcast IPv4_WAN.135 vhid 201
    inet IPv4_WAN.134 netmask 0xfffffff8 broadcast IPv4_WAN.135 vhid 201
    inet IPv4_WAN2.131 netmask 0xffffffe0 broadcast IPv4_WAN2.159 vhid 201
    inet IPv4_WAN2.140 netmask 0xffffffe0 broadcast IPv4_WAN2.159 vhid 201
    inet IPv4_WAN2.141 netmask 0xffffffe0 broadcast IPv4_WAN2.159 vhid 201
    inet IPv4_WAN2.142 netmask 0xffffffe0 broadcast IPv4_WAN2.159 vhid 201
    inet IPv4_WAN2.143 netmask 0xffffffe0 broadcast IPv4_WAN2.159 vhid 201
    inet IPv4_WAN2.144 netmask 0xffffffe0 broadcast IPv4_WAN2.159 vhid 201
    inet IPv4_WAN2.145 netmask 0xffffffe0 broadcast IPv4_WAN2.159 vhid 201
    inet IPv4_WAN2.146 netmask 0xffffffe0 broadcast IPv4_WAN2.159 vhid 201
    inet IPv4_WAN2.155 netmask 0xffffffe0 broadcast IPv4_WAN2.159 vhid 201
    inet IPv4_WAN2.132 netmask 0xffffffe0 broadcast IPv4_WAN2.159 vhid 201
    inet IPv4_WAN2.139 netmask 0xffffffe0 broadcast IPv4_WAN2.159 vhid 201
    inet IPv4_WAN2.158 netmask 0xffffffe0 broadcast IPv4_WAN2.159 vhid 201
    carp: BACKUP vhid 201 advbase 1 advskew 254
    carp: BACKUP vhid 101 advbase 1 advskew 254
    media: Ethernet autoselect (1000baseSX <full-duplex,rxpause,txpause>)
    status: active
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
    plugged: SFP/SFP+/SFP28 1000BASE-SX (LC)
    vendor: FS PN: SFP1G-SX-85 SN: F2031086594 DATE: 2021-09-29
    module temperature: 34.09 C Voltage: 3.22 Volts
    RX: 0.36 mW (-4.41 dBm) TX: 0.30 mW (-5.13 dBm)
ifconfig -v ix1 on the non-functioning unit (this is the broken interface):
ix1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=e138bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER,RXCSUM_IPV6,TXCSUM_IPV6>
    ether 00:00:00:00:00:00
    inet6 fe80::3eec:efff:fe30:b24e%ix1 prefixlen 64 scopeid 0x2
    media: Ethernet autoselect (Unknown <rxpause,txpause>)
    status: active
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
    plugged: SFP/SFP+/SFP28 10G Base-LRM (LC)
    vendor: FS PN: SFP-10GLRM-31 SN: F2031327657 DATE: 2021-05-11
    module temperature: 35.72 C Voltage: 3.31 Volts
    RX: 0.69 mW (-1.61 dBm) TX: 0.52 mW (-2.79 dBm)
dmesg interface portion of the non-functioning unit:
ix0: <Intel(R) X552 (SFP+)> mem 0xf9a00000-0xf9bfffff,0xf9c04000-0xf9c07fff irq 11 at device 0.0 on pci4
ix0: Using 2048 TX descriptors and 2048 RX descriptors
ix0: Using 8 RX queues 8 TX queues
ix0: Using MSI-X interrupts with 9 vectors
ix0: allocated for 8 queues
ix0: allocated for 8 rx queues
ix0: Ethernet address: 3c:ec:ef:30:b2:4e
ix0: eTrack 0x800005b9
ix0: netmap queues/slots: TX 8/2048, RX 8/2048
ix1: <Intel(R) X552 (SFP+)> mem 0xf9800000-0xf99fffff,0xf9c00000-0xf9c03fff irq 10 at device 0.1 on pci4
ix1: Unsupported SFP+ module detected!
ix1: Using 2048 TX descriptors and 2048 RX descriptors
ix1: Using 8 RX queues 8 TX queues
ix1: Using MSI-X interrupts with 9 vectors
ix1: allocated for 8 queues
ix1: allocated for 8 rx queues
ix1: eTrack 0x800005b9
ix1: netmap queues/slots: TX 8/2048, RX 8/2048
ifconfig -v ix0 on the functioning unit:
ix0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
    description: WAN_BID
    options=e138bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER,RXCSUM_IPV6,TXCSUM_IPV6>
    ether 3c:ec:ef:3d:46:04
    inet6 fe80::3eec:efff:fe3d:4604%ix0 prefixlen 64 scopeid 0x1
    inet6 IPv6_WAN::1c prefixlen 125
    inet6 IPv6_WAN::1a prefixlen 125 vhid 101
    inet6 IPv6_WAN2::140 prefixlen 64 vhid 101
    inet6 IPv6_WAN2::141 prefixlen 64 vhid 101
    inet6 IPv6_WAN2::142 prefixlen 64 vhid 101
    inet6 IPv6_WAN2::143 prefixlen 64 vhid 101
    inet6 IPv6_WAN2::144 prefixlen 64 vhid 101
    inet6 IPv6_WAN2::145 prefixlen 64 vhid 101
    inet6 IPv6_WAN2::146 prefixlen 64 vhid 101
    inet6 IPv6_WAN2::155 prefixlen 64 vhid 101
    inet6 IPv6_WAN2::139 prefixlen 64 vhid 101
    inet6 IPv6_WAN2::158 prefixlen 64 vhid 101
    inet IPv4_WAN.132 netmask 0xfffffff8 broadcast IPv4_WAN.135
    inet IPv4_WAN.130 netmask 0xfffffff8 broadcast IPv4_WAN.135 vhid 201
    inet IPv4_WAN.133 netmask 0xfffffff8 broadcast IPv4_WAN.135 vhid 201
    inet IPv4_WAN.134 netmask 0xfffffff8 broadcast IPv4_WAN.135 vhid 201
    inet IPv4_WAN2.131 netmask 0xffffffe0 broadcast IPv4_WAN2.159 vhid 201
    inet IPv4_WAN2.140 netmask 0xffffffe0 broadcast IPv4_WAN2.159 vhid 201
    inet IPv4_WAN2.141 netmask 0xffffffe0 broadcast IPv4_WAN2.159 vhid 201
    inet IPv4_WAN2.142 netmask 0xffffffe0 broadcast IPv4_WAN2.159 vhid 201
    inet IPv4_WAN2.143 netmask 0xffffffe0 broadcast IPv4_WAN2.159 vhid 201
    inet IPv4_WAN2.144 netmask 0xffffffe0 broadcast IPv4_WAN2.159 vhid 201
    inet IPv4_WAN2.145 netmask 0xffffffe0 broadcast IPv4_WAN2.159 vhid 201
    inet IPv4_WAN2.146 netmask 0xffffffe0 broadcast IPv4_WAN2.159 vhid 201
    inet IPv4_WAN2.155 netmask 0xffffffe0 broadcast IPv4_WAN2.159 vhid 201
    inet IPv4_WAN2.132 netmask 0xffffffe0 broadcast IPv4_WAN2.159 vhid 201
    inet IPv4_WAN2.139 netmask 0xffffffe0 broadcast IPv4_WAN2.159 vhid 201
    inet IPv4_WAN2.158 netmask 0xffffffe0 broadcast IPv4_WAN2.159 vhid 201
    carp: MASTER vhid 201 advbase 1 advskew 100
    carp: MASTER vhid 101 advbase 1 advskew 100
    media: Ethernet autoselect (1000baseSX <full-duplex,rxpause,txpause>)
    status: active
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
    plugged: SFP/SFP+/SFP28 1000BASE-SX (LC)
    vendor: FS PN: SFP1G-SX-85 SN: F2031086595 DATE: 2021-09-29
    module temperature: 30.05 C Voltage: 3.26 Volts
    RX: 0.44 mW (-3.53 dBm) TX: 0.29 mW (-5.26 dBm)
ifconfig -v ix1 on the functioning unit:
ix1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=e138bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER,RXCSUM_IPV6,TXCSUM_IPV6>
    ether 3c:ec:ef:3d:46:05
    inet6 fe80::3eec:efff:fe3d:4605%ix1 prefixlen 64 scopeid 0x2
    media: Ethernet autoselect (Unknown <rxpause,txpause>)
    status: active
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
    plugged: SFP/SFP+/SFP28 10G Base-LRM (LC)
    vendor: FS PN: SFP-10GLRM-31 SN: F2031327658 DATE: 2021-05-11
    module temperature: 35.52 C Voltage: 3.28 Volts
    RX: 0.78 mW (-1.05 dBm) TX: 0.55 mW (-2.53 dBm)
dmesg interface portion of the functioning unit:
ix0: <Intel(R) PRO/10GbE PCI-Express Network Driver> mem 0xf9a00000-0xf9bfffff,0xf9c04000-0xf9c07fff irq 11 at device 0.0 on pci4
ix0: Using 2048 TX descriptors and 2048 RX descriptors
ix0: Using 8 RX queues 8 TX queues
ix0: Using MSI-X interrupts with 9 vectors
ix0: allocated for 8 queues
ix0: allocated for 8 rx queues
ix0: Ethernet address: 3c:ec:ef:3d:46:04
ix0: netmap queues/slots: TX 8/2048, RX 8/2048
ix1: <Intel(R) PRO/10GbE PCI-Express Network Driver> mem 0xf9800000-0xf99fffff,0xf9c00000-0xf9c03fff irq 10 at device 0.1 on pci4
ix1: Using 2048 TX descriptors and 2048 RX descriptors
ix1: Using 8 RX queues 8 TX queues
ix1: Using MSI-X interrupts with 9 vectors
ix1: allocated for 8 queues
ix1: allocated for 8 rx queues
ix1: Ethernet address: 3c:ec:ef:3d:46:05
ix1: netmap queues/slots: TX 8/2048, RX 8/2048
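The dmesg comparison above can be narrowed to just the ix driver lines, which makes a missing "Ethernet address" line or an extra "Unsupported SFP+" line stand out. A minimal sketch, assuming each unit's dmesg was saved to a file with one message per line. The file names and the helper names ix_lines and compare_ix_dmesg are made up for illustration.

```shell
#!/bin/sh
# Sketch: compare only the ix driver lines of two saved dmesg captures.
# Captures could be created with e.g. `dmesg > dmesg-unit-a.txt` on each unit.
ix_lines() {
    # Keep lines such as "ix1: Unsupported SFP+ module detected!"
    grep -E '^ix[0-9]+:' "$1"
}

compare_ix_dmesg() {
    old="$1"; new="$2"
    a=$(mktemp) && b=$(mktemp) || return 2
    ix_lines "$old" > "$a"
    ix_lines "$new" > "$b"
    diff -u "$a" "$b"          # exit status 0 = identical, 1 = differences
    rc=$?
    rm -f "$a" "$b"
    return "$rc"
}

# Example:
#   compare_ix_dmesg dmesg-unit-a.txt dmesg-unit-b.txt
```

Per-unit values such as MAC addresses will always show up as differences, so the interesting part is lines that exist on only one side.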
-
Ah, sorry I completely missed the invalid MAC address.
Hmm, are you able to test swapping the modules?
You could try spoofing the MAC address on the NIC back to the real one.
I'm not sure that will help though with a low level issue like that.
Steve
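A quick way to check whether an interface is in this state before and after any spoofing attempt is to look for the all-zero address in the ifconfig output. This is a rough sketch that reads ifconfig text on stdin; has_zero_mac is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: succeed if the ifconfig text on stdin contains an all-zero MAC.
has_zero_mac() {
    grep -Eq '(^|[[:space:]])ether 00:00:00:00:00:00([[:space:]]|$)'
}

# Example:
#   ifconfig ix1 | has_zero_mac && echo "ix1 has no valid MAC"
#
# A manual override on FreeBSD would look something like
#   ifconfig ix1 ether <real MAC>
# The real address of the affected port is not shown in this thread,
# so it is left as a placeholder here.
```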
-
@stephenw10 I'll see about spoofing the MAC later tonight when I get home, but as you mentioned, this sounds lower level than that.
As for swapping the module, this is a production unit in a UTI Tier IV co-lo facility, so it's not an easy task. When the hardware was upgraded from 7100s to the 1537s back in December, I almost wasn't permitted to go perform the work at all due to a change in ownership of the facility.
-
Hmm, the fact it only did it on the LAN side with seemingly identical modules sure seems like a module issue.
One thing I have seen on other hardware is that once a NIC gets into this mode it can be stuck there until you fully power cycle the unit. A reboot may not be sufficient to reset it. So if you can do that and haven't tried yet I would do that first.
Steve
-
Ok, full power cycle was a no-go. I powered the unit down via IPMI, and powered it back on, no go.
Spoofing the MAC was a no-go as well. That, however, was an interesting experience. Because all of our LAN side networks are VLANs, there was no untagged ix1 interface to set the MAC on, so I had to add a new ix1 interface in order to spoof it.
I think at this point, I am going to assume it has something to do with SFP+ compatibility changing in the update. The working WAN SFP is an SFP1G-SX-85 (MMF) and seems to be working fine. My second datacenter site has another pair of HA 1537 units, both of which updated flawlessly, and they each have 2x SFP-10GSR-85 (MMF) modules. The unit that has lost LAN connectivity has an SFP-10GLRM-31 (SMF) module. I have ordered some new E10GSFPLR Intel coded SFP-10GSR-85 (SMF), and I'm hoping my co-lo provider will be a little more helpful this time around and do an SFP+ swap with minimal whining. If all else fails, I'll see if they'd be willing to switch the run to their core switch from SMF to MMF and use SFP-10GSR-85 modules that I know work. In all honesty, using SMF inside the DC is just a waste of money, as the transceivers are more costly.
Thanks again for the help,
- Marc
On a side note, I checked pciconf -lv for good measure, and all 4 1537 units show the exact same information (3 on 22.01 and one still on 21.05.2):
ix0@pci0:3:0:0: class=0x020000 card=0x15ac15d9 chip=0x15ac8086 rev=0x00 hdr=0x00
    vendor = 'Intel Corporation'
    device = 'Ethernet Connection X552 10 GbE SFP+'
    class = network
    subclass = ethernet
ix1@pci0:3:0:1: class=0x020000 card=0x15ac15d9 chip=0x15ac8086 rev=0x00 hdr=0x00
    vendor = 'Intel Corporation'
    device = 'Ethernet Connection X552 10 GbE SFP+'
    class = network
    subclass = ethernet
Once I get this solved, all that's left is my VTI IPSec problem, and I'm all set!
-
@mmapplebeck said in SFP+ Problem after 22.02 upgrade on 1537:
I powered the unit down via IPMI, and powered it back on, no go.
Mmm, that may not be sufficient. Because the NIC remains powered at some level when the unit is shut down, it may not clear that state. We have seen devices that required removing power entirely to reset. Obviously that would only be possible if you have remote access to switched power of some kind, or with on-site hands.
It seems more like the module itself, but I can't see why it wouldn't be supported in 22.01 when it was in 21.05.
Steve