XG-2758 / 22.05: Packet Errors on 10G-SFP+
-
OK, if you run
ifconfig -vm ix0
does it show hardware checksum offloading actually disabled? If it does, these could actually be bad checksums, of course.
-
ix1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=8138b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER>
    capabilities=f53fbb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,NETMAP,RXCSUM_IPV6,TXCSUM_IPV6>
VLAN_HWCSUM - so I guess the answer is "no"? It's checked in the settings though and I did reboot - both machines actually and still it's the same on all four interfaces ...
-
Are you using VLANs there though?
RXCSUM and TXCSUM are disabled as expected at least.
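To make that check less error-prone, here is a minimal sh sketch that reads ifconfig output on stdin and tests for an exact flag name in the interface's options=<...> list, so that VLAN_HWCSUM is not mistaken for RXCSUM or TXCSUM. The function names enabled_options and has_option are hypothetical, and the sketch assumes the usual multi-line ifconfig layout in which the options= line begins with whitespace.

```sh
# Print the flags listed in the interface's options=<...> line, one per line.
# The capabilities= line and the "nd6 options=" line are deliberately not matched.
enabled_options() {
    sed -n 's/^[[:space:]]*options=[0-9a-f]*<\([^>]*\)>.*/\1/p' | tr ',' '\n'
}

# Succeed only if the flag named in $1 is currently enabled (exact match).
has_option() {
    enabled_options | grep -x "$1" >/dev/null
}
```

Example use on the firewall: ifconfig -m ix1 | has_option RXCSUM && echo "RX checksum offload is still on"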
-
Well, not sure how BSD handles that, to be honest. We have about 25 VLANs on the lagg interface, and I would expect that the underlying interfaces see them too?
-
Check what shows as enabled on lagg0.
Try disabling the vlan hardware checksum as a test.
[22.09-DEVELOPMENT][admin@6100.stevew.lan]/root: ifconfig -m ix1
ix1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    description: IX1
    options=8138b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER>
    capabilities=f53fbb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,NETMAP,RXCSUM_IPV6,TXCSUM_IPV6>
    ether 00:08:a2:12:17:7f
    inet6 fe80::208:a2ff:fe12:177f%ix1 prefixlen 64 scopeid 0x6
    inet 192.168.79.2 netmask 0xffffff00 broadcast 192.168.79.255
    media: Ethernet autoselect (1000baseSX <full-duplex,rxpause,txpause>)
    status: active
    supported media:
        media autoselect
        media 1000baseSX
        media 10Gbase-SR
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
[22.09-DEVELOPMENT][admin@6100.stevew.lan]/root: ifconfig ix1 -vlanhwcsum
[22.09-DEVELOPMENT][admin@6100.stevew.lan]/root: ifconfig -m ix1
ix1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    description: IX1
    options=813838<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER>
    capabilities=f53fbb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,NETMAP,RXCSUM_IPV6,TXCSUM_IPV6>
    ether 00:08:a2:12:17:7f
    inet6 fe80::208:a2ff:fe12:177f%ix1 prefixlen 64 scopeid 0x6
    inet 192.168.79.2 netmask 0xffffff00 broadcast 192.168.79.255
    media: Ethernet autoselect (1000baseSX <full-duplex,rxpause,txpause>)
    status: active
    supported media:
        media autoselect
        media 1000baseSX
        media 10Gbase-SR
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
-
lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=8138b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER>
    capabilities=f53fbb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,NETMAP,RXCSUM_IPV6,TXCSUM_IPV6>
No improvement from removing VLAN_HWCSUM. Plus, it comes back automatically after a restart.
-
Yeah you would need to add a command to remove it at each boot if it helped.
Did you try removing it from the ix0/1 as well as lagg0?
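As a rough sketch of what such a per-boot command could look like: the function below runs the same ifconfig ... -vlanhwcsum call shown earlier in the thread against each interface it is given. The function name disable_vlanhwcsum and the IFCONFIG override variable are hypothetical and only there to allow a dry run.

```sh
# Command used to apply the change; set IFCONFIG=echo for a dry run
# that only prints what would be executed.
IFCONFIG=${IFCONFIG:-ifconfig}

# Clear VLAN hardware checksum offload on every interface passed as an argument.
# Stops at the first interface where the command fails.
disable_vlanhwcsum() {
    for ifc in "$@"; do
        "$IFCONFIG" "$ifc" -vlanhwcsum || return 1
    done
}
```

It would be called as disable_vlanhwcsum ix0 ix1 lagg0. Since the flag comes back after a restart, this would have to be hooked into whatever mechanism you use to run commands at boot (the Shellcmd package, for instance); that choice is left open here.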
It might not be a problem in pfSense, of course. Those might be real checksum errors arriving.
I assume the switch isn't showing errors?
-
Tried that but to no avail. All three removed but the error count still increased.
The switch showed some errors before I switched to the SR modules but is currently not showing any.
Where could the errors come from if not from pfSense or the hardware stack? I ask this as a serious question and don't mean to blame pfSense, but the switch works fine for all other devices and it happens for both pfSenses at roughly the same rate, on all four ports ...
Thanks
Sebastian
-
Mmm, that's a good question!
Are you actually seeing dropped packets across the link? Throughput issues etc?
Either something is incorrectly calculating checksums or the errors really exist. Does the switch specifically show checksum errors or just general errors?
The error rate looks to be ~1% from the netstat output. You could try grabbing a pcap on one of the interfaces and seeing if Wireshark reports checksum errors. You would need to disable hardware checksum offloads for that, since with offloading enabled a pcap usually shows the checksums as incorrect.
Steve
-
No dropped packets; at least, the switch claims that while it found errors it didn't drop any packets. I can't say for sure if I have performance issues since the network load is very low in general at the moment. And while we had some issues in the past under high load, that was primarily due to the ISP's router (it just locked up and didn't respond at all), so I cannot say for sure if there are any actual performance issues with the pfSense itself.
The switch also only shows a very low number of FCS errors at the moment on a single port (it probably reset during the weekend):
Totals (Since boot or last clear) :
   Bytes Rx        : 43,788,201,529      Bytes Tx        : 37,277,703,386
   Unicast Rx      : 108,192,845         Unicast Tx      : 41,951,493
   Bcast/Mcast Rx  : 55,424              Bcast/Mcast Tx  : 2,652,035
Errors (Since boot or last clear) :
   FCS Rx          : 2201                Drops Tx        : 0
   Alignment Rx    : 0                   Collisions Tx   : 0
   Runts Rx        : 0                   Late Colln Tx   : 0
   Giants Rx       : 25                  Excessive Colln : 0
   Total Rx Errors : 2226                Deferred Tx     : 0
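To put those counters in proportion, here is a small sh sketch that turns an error count and a frame count into a percentage. The function name rx_error_percent is hypothetical, and using Unicast Rx plus Bcast/Mcast Rx as the denominator is an assumption; whether the switch includes errored frames in those totals is not known here.

```sh
# Print errors as a percentage of received frames, to four decimal places.
# Thousands separators as printed by the switch (e.g. 108,192,845) are accepted.
rx_error_percent() {
    errors=$(printf '%s' "$1" | tr -d ',')
    frames=$(printf '%s' "$2" | tr -d ',')
    awk -v e="$errors" -v t="$frames" 'BEGIN {
        if (t + 0 == 0) { print "n/a"; exit }   # avoid dividing by zero
        printf "%.4f\n", 100 * e / t
    }'
}
```

With the figures above, 108,192,845 + 55,424 = 108,248,269 frames received, and rx_error_percent 2226 108,248,269 prints 0.0021, i.e. roughly 0.002% of frames on that port since the counters were last cleared.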
I recorded network traffic on both ix0 and ix1, and both dumps showed a 0% error rate according to Wireshark (with checksum validation enabled).
-
Hmm, well it could be an incorrect error report. Or it could be correcting the errors maybe? I wouldn't expect that though. Ethernet frames failing the CRC check should just be dropped. Though I very rarely dig that deep!
I suspect that these errors have always been present and are not actually causing a problem. The previous driver versions simply did not report them and you weren't logging it anyway.
Steve
-
@stephenw10 Okay, I will monitor this again when we have higher loads in order to ensure that this is not causing issues. If not, I probably just have to ignore it.
Thank you for your support!