Netgate 6100 SFP+ connection error rate of 0.0055%. Should I be worried?
-
I have recently deployed a 10 GbE SFP+ uplink between my Netgate 6100 and the rest of my network. The other end of the link is a NetGear XS512EM 10 GbE switch. The connection uses an SFP+ active optical cable.
Everything seems to be working fine, but I have noticed that netstat -i (or -I) shows the following:
Name Mtu Network Address Ipkts Ierrs Idrop Opkts Oerrs Coll
ix1 9000 <Link#6> 90:ec:77:7f:c9:d5 30121206 1437 0 70844120 0 0
ix1 - fe80::%ix1/64 fe80::92ec:77ff:fe7f:c9d5%ix1 705 - - 67 - -
The Ierrs value is slowly increasing and, on average, equals around 0.0055% of the Ipkts value, with some small variation. The switch on the other end of the connection does not show any errors on this connection.
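As a rough check of that percentage, here is a minimal sketch that computes Ierrs as a share of Ipkts from the `<Link#N>` row shown above. The helper names `parse_link_row` and `ierrs_rate_percent` are made up for illustration, and the column positions are assumed to match the header in the output above.

```python
def ierrs_rate_percent(ipkts: int, ierrs: int) -> float:
    """Return input errors as a percentage of input packets."""
    if ipkts <= 0:
        raise ValueError("ipkts must be positive")
    return 100.0 * ierrs / ipkts


def parse_link_row(row: str) -> dict:
    """Parse the <Link#N> row of netstat output into named counters.

    Assumed column order: Name Mtu Network Address Ipkts Ierrs Idrop Opkts Oerrs Coll
    """
    fields = row.split()
    names = ["Ipkts", "Ierrs", "Idrop", "Opkts", "Oerrs", "Coll"]
    return dict(zip(names, (int(v) for v in fields[4:10])))


# Row copied from the netstat output quoted in this post.
row = "ix1 9000 <Link#6> 90:ec:77:7f:c9:d5 30121206 1437 0 70844120 0 0"
counters = parse_link_row(row)
rate = ierrs_rate_percent(counters["Ipkts"], counters["Ierrs"])
print(f"Ierrs / Ipkts = {rate:.4f}%")
```

For this particular snapshot the result is a little under 0.005%, in line with the average quoted above.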
Is this normal? If not, does it represent an issue with the SFP+ cable/hardware? If not then what else might be causing this? Any way I can get more detailed info on what kind of errors cause this counter to increment?
Thanks for any insights...
-
Yup check:
sysctl dev.ix.1.mac_stats
That shows the errors broken down by type.
-
@stephenw10 Thanks. So here is what I see:
$ netstat -I ix1
Name Mtu Network Address Ipkts Ierrs Idrop Opkts Oerrs Coll
ix1 9000 <Link#6> 90:ec:77:7f:c9:d5 31488071 1452 0 74590982 0 0
ix1 - fe80::%ix1/64 fe80::92ec:77ff:fe7f:c9d5%ix1 1060 - - 98 - -
$ sysctl dev.ix.1.mac_stats | grep err
dev.ix.1.mac_stats.checksum_errs: 95
dev.ix.1.mac_stats.rec_len_errs: 1452
dev.ix.1.mac_stats.byte_errs: 0
dev.ix.1.mac_stats.ill_errs: 0
dev.ix.1.mac_stats.crc_errs: 0
dev.ix.1.mac_stats.rx_errs: 1452
So the errors pretty much all seem to be 'rec_len_errs', whatever they are... It is still unclear whether this is a genuine issue or a side effect of the complex configuration:
bridge0 includes ix0 (SFP+ not currently connected), ix1 (10 GbE SFP+ link to main switch), igc0 (2.5 GbE connection to a macOS system), igc1 (not currently connected). All connections are set to 'autoselect' and work as expected when connected. I also have 3 VLANs (100, 200 and 1003) defined across the bridged interfaces. Again, everything works and there are no obvious issues other than this very small error count.
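For anyone wanting to break down the sysctl counters the same way, here is a small sketch that parses the `dev.ix.1.mac_stats` lines quoted above and lists the non-zero error types. The function name `parse_mac_stats` is hypothetical, and the sample text is simply the output pasted in this post rather than a live query.

```python
# Sample text copied from the sysctl output quoted in this post.
SAMPLE = """\
dev.ix.1.mac_stats.checksum_errs: 95
dev.ix.1.mac_stats.rec_len_errs: 1452
dev.ix.1.mac_stats.byte_errs: 0
dev.ix.1.mac_stats.ill_errs: 0
dev.ix.1.mac_stats.crc_errs: 0
dev.ix.1.mac_stats.rx_errs: 1452
"""


def parse_mac_stats(text: str) -> dict:
    """Map the last component of each sysctl name to its integer value."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines that are not "name: value"
        stats[name.strip().rsplit(".", 1)[-1]] = int(value)
    return stats


stats = parse_mac_stats(SAMPLE)
# rx_errs is left out here because it looks like a total, not a separate type.
nonzero = {k: v for k, v in stats.items() if v and k != "rx_errs"}
for name, count in sorted(nonzero.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {count}")
```

In this sample, rx_errs (1452) equals rec_len_errs (1452) and also matches the Ierrs column from netstat, which is why the receive length errors look like the whole story.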
-
I'd have to assume those are packets too large to be received. Anything on that bridge set for jumbo frames?
It's likely nothing to worry about though as you say.
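To put numbers on "too large", here is a sketch of the standard Ethernet frame-size arithmetic for a given MTU. It only shows how big a frame can legitimately be on the wire; it makes no claim about exactly how the ix driver decides to increment rec_len_errs. The function name is made up for the example.

```python
ETH_HEADER = 14  # destination MAC + source MAC + EtherType
VLAN_TAG = 4     # one 802.1Q tag
FCS = 4          # frame check sequence (CRC)


def max_frame_bytes(mtu: int, vlan_tagged: bool = False) -> int:
    """Largest Ethernet frame, in bytes, that carries a payload of `mtu` bytes."""
    return mtu + ETH_HEADER + (VLAN_TAG if vlan_tagged else 0) + FCS


for mtu in (1500, 9000):
    untagged = max_frame_bytes(mtu)
    tagged = max_frame_bytes(mtu, vlan_tagged=True)
    print(f"mtu {mtu}: {untagged} bytes untagged, {tagged} bytes with one VLAN tag")
```

So an interface at MTU 1500 would see anything from a jumbo VLAN as far larger than its 1518/1522 byte limit, whereas at MTU 9000 the limit is 9018/9022 bytes.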
-
@stephenw10 Yes, VLAN 200 (my storage network) uses jumbo frames (MTU 9000). To accommodate this, the bridge and the various associated interfaces are set to MTU 9000. That VLAN is the only one which has any traffic with an MTU larger than the default. This setup is working fine, and IPv[46] traffic on VLAN 200 which traverses the Netgate 6100 bridge has the expected MSS value.
Here's the ifconfig output for all the relevant interfaces:
igc0: flags=1008943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 9000
        description: LAN1
        options=4e020bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,WOL_MAGIC,RXCSUM_IPV6,TXCSUM_IPV6,HWSTATS,MEXTPG>
        ether 90:ec:77:7f:c9:d6
        inet6 fe80::92ec:77ff:fe7f:c9d6%igc0 prefixlen 64 scopeid 0x1
        media: Ethernet autoselect (2500Base-T <full-duplex>)
        status: active
        nd6 options=1<PERFORMNUD>
igc1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 9000
        description: LAN2
        options=4e020bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,WOL_MAGIC,RXCSUM_IPV6,TXCSUM_IPV6,HWSTATS,MEXTPG>
        ether 90:ec:77:7f:c9:d7
        inet6 fe80::92ec:77ff:fe7f:c9d7%igc1 prefixlen 64 scopeid 0x2
        media: Ethernet autoselect
        status: no carrier
        nd6 options=1<PERFORMNUD>
ix0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 9000
        description: LAN5
        options=4e138bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER,RXCSUM_IPV6,TXCSUM_IPV6,HWSTATS,MEXTPG>
        ether 90:ec:77:7f:c9:d4
        inet6 fe80::92ec:77ff:fe7f:c9d4%ix0 prefixlen 64 scopeid 0x5
        media: Ethernet autoselect
        status: no carrier
        nd6 options=1<PERFORMNUD>
ix1: flags=1008943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 9000
        description: LAN6
        options=4e138bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER,RXCSUM_IPV6,TXCSUM_IPV6,HWSTATS,MEXTPG>
        ether 90:ec:77:7f:c9:d5
        inet6 fe80::92ec:77ff:fe7f:c9d5%ix1 prefixlen 64 scopeid 0x6
        media: Ethernet autoselect (10Gbase-Twinax <full-duplex,rxpause,txpause>)
        status: active
        nd6 options=1<PERFORMNUD>
ix1.1003: flags=1008843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
        description: GUEST
        options=4600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6,MEXTPG>
        ether 90:ec:77:7f:c9:d5
        inet 172.16.200.1 netmask 0xffffff00 broadcast 172.16.200.255
        inet6 fe80::92ec:77ff:fe7f:c9d5%ix1.1003 prefixlen 64 scopeid 0xe
        inet6 fdff::1 prefixlen 64
        inet6 2001:470:6ac9:ffff::1 prefixlen 64
        groups: vlan
        vlan: 1003 vlanproto: 802.1q vlanpcp: 0 parent interface: ix1
        media: Ethernet autoselect (10Gbase-Twinax <full-duplex,rxpause,txpause>)
        status: active
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
igc0.100: flags=1008843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
        options=4600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6,MEXTPG>
        ether 90:ec:77:7f:c9:d6
        inet6 fe80::92ec:77ff:fe7f:c9d6%igc0.100 prefixlen 64 scopeid 0xf
        groups: vlan
        vlan: 100 vlanproto: 802.1q vlanpcp: 0 parent interface: igc0
        media: Ethernet autoselect (2500Base-T <full-duplex>)
        status: active
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
igc1.100: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=4600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6,MEXTPG>
        ether 90:ec:77:7f:c9:d7
        inet6 fe80::92ec:77ff:fe7f:c9d7%igc1.100 prefixlen 64 scopeid 0x10
        groups: vlan
        vlan: 100 vlanproto: 802.1q vlanpcp: 0 parent interface: igc1
        media: Ethernet autoselect
        status: no carrier
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
ix0.100: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=4600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6,MEXTPG>
        ether 90:ec:77:7f:c9:d4
        inet6 fe80::92ec:77ff:fe7f:c9d4%ix0.100 prefixlen 64 scopeid 0x11
        groups: vlan
        vlan: 100 vlanproto: 802.1q vlanpcp: 0 parent interface: ix0
        media: Ethernet autoselect
        status: no carrier
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
ix1.100: flags=1008843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
        options=4600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6,MEXTPG>
        ether 90:ec:77:7f:c9:d5
        inet6 fe80::92ec:77ff:fe7f:c9d5%ix1.100 prefixlen 64 scopeid 0x12
        groups: vlan
        vlan: 100 vlanproto: 802.1q vlanpcp: 0 parent interface: ix1
        media: Ethernet autoselect (10Gbase-Twinax <full-duplex,rxpause,txpause>)
        status: active
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
ix0.1003: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=4600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6,MEXTPG>
        ether 90:ec:77:7f:c9:d4
        inet6 fe80::92ec:77ff:fe7f:c9d4%ix0.1003 prefixlen 64 scopeid 0x13
        groups: vlan
        vlan: 1003 vlanproto: 802.1q vlanpcp: 0 parent interface: ix0
        media: Ethernet autoselect
        status: no carrier
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
ix1.200: flags=1008843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 9000
        options=4600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6,MEXTPG>
        ether 90:ec:77:7f:c9:d5
        inet6 fe80::92ec:77ff:fe7f:c9d5%ix1.200 prefixlen 64 scopeid 0x14
        groups: vlan
        vlan: 200 vlanproto: 802.1q vlanpcp: 0 parent interface: ix1
        media: Ethernet autoselect (10Gbase-Twinax <full-duplex,rxpause,txpause>)
        status: active
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
ix0.200: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
        options=4600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6,MEXTPG>
        ether 90:ec:77:7f:c9:d4
        inet6 fe80::92ec:77ff:fe7f:c9d4%ix0.200 prefixlen 64 scopeid 0x15
        groups: vlan
        vlan: 200 vlanproto: 802.1q vlanpcp: 0 parent interface: ix0
        media: Ethernet autoselect
        status: no carrier
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
igc0.200: flags=1008843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 9000
        options=4600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6,MEXTPG>
        ether 90:ec:77:7f:c9:d6
        inet6 fe80::92ec:77ff:fe7f:c9d6%igc0.200 prefixlen 64 scopeid 0x16
        groups: vlan
        vlan: 200 vlanproto: 802.1q vlanpcp: 0 parent interface: igc0
        media: Ethernet autoselect (2500Base-T <full-duplex>)
        status: active
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
igc1.200: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
        options=4600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6,MEXTPG>
        ether 90:ec:77:7f:c9:d7
        inet6 fe80::92ec:77ff:fe7f:c9d7%igc1.200 prefixlen 64 scopeid 0x17
        groups: vlan
        vlan: 200 vlanproto: 802.1q vlanpcp: 0 parent interface: igc1
        media: Ethernet autoselect
        status: no carrier
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
bridge0: flags=1008843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 9000
        description: LAN
        options=0
        ether 58:9c:fc:10:ff:9a
        inet 10.0.200.1 netmask 0xffffff00 broadcast 10.0.200.255
        inet6 fd00::1 prefixlen 64
        inet6 2001:470:1f09:2df::1 prefixlen 64
        inet6 2001:470:6ac9::1 prefixlen 64
        inet6 fe80::5a9c:fcff:fe10:ff9a%bridge0 prefixlen 64 scopeid 0x19
        id 90:ec:77:7f:c9:d4 priority 32768 hellotime 2 fwddelay 15
        maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
        root id 18:0f:76:5f:ca:6e priority 28672 ifcost 2000 port 6
        member: igc0 flags=1c7<LEARNING,DISCOVER,STP,AUTOEDGE,PTP,AUTOPTP>
                ifmaxaddr 0 port 1 priority 128 path cost 200000 proto rstp
                role designated state forwarding
        member: igc1 flags=147<LEARNING,DISCOVER,STP,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 2 priority 128 path cost 2000000 proto rstp
                role disabled state discarding
        member: ix1 flags=1c7<LEARNING,DISCOVER,STP,AUTOEDGE,PTP,AUTOPTP>
                ifmaxaddr 0 port 6 priority 128 path cost 2000 proto rstp
                role root state forwarding
        member: ix0 flags=147<LEARNING,DISCOVER,STP,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 5 priority 128 path cost 2000 proto rstp
                role disabled state discarding
        groups: bridge
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
-
Hmm, do those VLANs function? I would have expected having their parent interfaces as bridge members to break the replies there. Though I don't think I've tested that in 23.09.1.
Historically bridges and VLANs have not combined well. I could absolutely imagine that causing those errors.
-
@stephenw10 Yup, all of the VLANs (100, 200 and 1003) are working just fine. For the one with jumbo frames (200), connections show the expected (large) MSS and throughput is consistent with the increased frame size.
Apart from the weird receive length errors on the SFP+ uplink, I have only noticed one other anomaly, which I have detailed here:
Again, though annoying, this is not a dealbreaker.
Seems like FreeBSD and/or pfSense have a few potential bugettes to squash (or at least areas to enhance)...
-
Yeah when you combine VLANs and bridges you are in a grey area! I would want to confirm that mtu issue without the bridge first.
-
@stephenw10 So, I have now arrived at a point where I no longer have a bridge configured. The SFP+ interface in question is the primary (only) LAN interface. It does have one VLAN defined on it. Interface and VLAN MTU is 1500. I am still seeing a very low, but gradually increasing receive error count (tending towards ~0.005%) and all of the errors are 'rec_len' errors. This is very bemusing...
-
What is that NIC connected to? Any errors logged at the other end?
-
@stephenw10 It's connected, via an active optical cable, to a 10 Gb SFP+ port on a TP-Link TL-SG3452X switch. The switch isn't reporting any errors at all for this link (or any others).
-
If it's actual fiber it can be worth cleaning it. Though I'd expect far more errors if it really was a dirt issue.
Other than that I'm not sure what else can be done.
You could try switching it to ix0.
-
@stephenw10 Active optical cables have integrated transceivers so there isn't really anything to clean. The error rate is very low so I will live with it for now. In a few weeks I will be rearranging some things so I will then try a DAC cable and/or ix0 instead of ix1 to see if that changes anything.
-
@ChrisJenk I just wanted to come back to this as it is still troubling me.
I have rearranged things and now the Netgate 6100 is connected to my primary switch via a 1m SFP+ DAC cable rather than a 15m SFP+ AOC cable. I'm also using different ports on the Netgate (ix0 now instead of ix1) and on the switch. However, I am still seeing moderate / sporadic bursts of receive errors reported on the Netgate (netstat -i). Using sysctl again, most of them seem to be rec_len errors. Again, the switch end does not report any errors at all.
This is very perplexing... I am wondering if there may be a bug in the SFP+ driver, or some such?
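One way to tell bursts from a steady trickle is to sample the counters twice and look at the error rate over just that interval. The sketch below uses the two ix1 snapshots posted earlier in this thread as stand-in data, since no paired ix0 samples are available; the `Sample` type and `interval_error_rate` function are hypothetical names.

```python
from typing import NamedTuple


class Sample(NamedTuple):
    ipkts: int
    ierrs: int


def interval_error_rate(before: Sample, after: Sample) -> float:
    """Percentage of packets received between two samples that were errors."""
    d_pkts = after.ipkts - before.ipkts
    d_errs = after.ierrs - before.ierrs
    if d_pkts <= 0:
        raise ValueError("no new packets between samples (or counters were reset)")
    return 100.0 * d_errs / d_pkts


# Ipkts / Ierrs from the two ix1 netstat outputs quoted earlier in the thread.
first = Sample(ipkts=30121206, ierrs=1437)
second = Sample(ipkts=31488071, ierrs=1452)

interval_pct = interval_error_rate(first, second)
print(f"errors in interval: {second.ierrs - first.ierrs}")
print(f"interval error rate: {interval_pct:.4f}%")
```

Between those two snapshots the interval rate works out noticeably lower than the cumulative average, which would be consistent with errors arriving in bursts rather than at a constant rate, though two samples are far too few to be sure.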
-
@ChrisJenk said in Netgate 6100 SFP+ connection error rate of 0.0055%. Should I be worried?:
Name Mtu Network Address Ipkts Ierrs Idrop Opkts Oerrs Coll
ix1 9000 <Link#6> 90:ec:77:7f:c9:d5 31488071 1452 0 74590982 0 0
ix1 - fe80::%ix1/64 fe80::92ec:77ff:fe7f:c9d5%ix1 1060 - - 98 - -
Is that your WAN or LAN? While there's no problem with jumbo frames on the LAN, assuming other devices can handle them, you shouldn't be sending them to the WAN. pfSense should be sending ICMP "too big" messages when a jumbo frame tries to leave your LAN. You shouldn't be using jumbo frames on the WAN side, with the possible exception of if you're on Internet2.
-
@ChrisJenk said in Netgate 6100 SFP+ connection error rate of 0.0055%. Should I be worried?:
I am wondering if there may be a bug in the SFP+ driver, or some such?
That's always possible but it seems more likely it's actually packets that cannot be received correctly since the vast majority of ix installs do not see that.
@JKnott said in Netgate 6100 SFP+ connection error rate of 0.0055%. Should I be worried?:
You shouldn't be using jumbo frames on the WAN side
Yup that's true. Though I wouldn't expect to see receive errors generated by that.
-
@JKnott said in Netgate 6100 SFP+ connection error rate of 0.0055%. Should I be worried?:
@ChrisJenk said in Netgate 6100 SFP+ connection error rate of 0.0055%. Should I be worried?:
Name Mtu Network Address Ipkts Ierrs Idrop Opkts Oerrs Coll
ix1 9000 <Link#6> 90:ec:77:7f:c9:d5 31488071 1452 0 74590982 0 0
ix1 - fe80::%ix1/64 fe80::92ec:77ff:fe7f:c9d5%ix1 1060 - - 98 - -
Is that your WAN or LAN? While there's no problem with jumbo frames on the LAN, assuming other devices can handle them, you shouldn't be sending them to the WAN. pfSense should be sending ICMP "too big" messages when a jumbo frame tries to leave your LAN. You shouldn't be using jumbo frames on the WAN side, with the possible exception of if you're on Internet2.
It's my LAN and I do use jumbo frames on that (carefully). However, the jumbo frames are restricted to two VLANs, neither of which is configured on the Netgate, so those frames should never actually reach the unit. I think that MTU is a hangover from an older config; I will set it back to the default.
However, in my current setup the LAN is now ix0 and it has an MTU of 1500.
Name Mtu Network Address Ipkts Ierrs Idrop Opkts Oerrs Coll
ix0 1500 <Link#5> 90:ec:77:7f:c9:d4 17034814 387 0 20616901 0 0
ix0 - fe80::%ix0/64 fe80::92ec:77ff:fe7f:c9d4%ix0 4078 - - 18875 - -
ix0 - 10.0.200.0/24 router 8993 - - 22104 - -
ix0 - fd00::/64 router 8936 - - 9272 - -
ix0 - xxxxxxxxx::/64 router.xxxxxxxxxxxx 0 - - 25628 - -
ix0 - yyyyyyyyy::/64 yyyyyyyyyyyyyy::1 0 - - 42 - -