Netgate Discussion Forum
    XG-2758 / 22.05: Packet Errors on 10G-SFP+

      XRM

      Hi *,

      we're running a pair of XG-2758 with two 10G SFP+ modules each. We recently started monitoring our network equipment in depth and noticed packet errors on all four SFP+ modules:
      pfsense_errors.png

      The firewalls are connected to an Aruba 5406 switch and that one also reports errors on the ports that are connected to the XGs, but only on these (there are about 20 systems connected and all others work fine).

      Both XGs used AXS13-192-M2 SFP-modules which were sold together with the XGs but the issue did not improve after switching to J9150 modules (we also switched the transceivers on the 5406 of course). The XGs are about 1.5 m away from the switch (i.e. it's not an issue with the cabling).

      Is there anything we can do about this, or is this not a real issue (i.e. is the value reported through SNMP correct at all)? We thought about upgrading to two Netgate 1537 units in the near future, but I wonder whether we would have the same issue there, too ...

      Thanks
      Sebastian

        stephenw10 Netgate Administrator

        Check the output of netstat -i to see if the error count there shows the same thing.

        How are those interfaces configured? Are ix0 and ix1 members of lagg0?
        It seems odd only one shows a significant error rate.

        Note that since 22.01/2.6 the ix driver now reports a number of additional error types on that counter: https://redmine.pfsense.org/issues/12904

        Steve

          XRM @stephenw10

          @stephenw10 Yes, netstat -i shows the same:

          Name    Mtu Network       Address              Ipkts Ierrs Idrop    Opkts Oerrs  Coll
          ix0    1500 <Link#1>      00:08:a2:0b:13:cf 440743346 4481231     0 63627567     0     0
          ix1    1500 <Link#2>      00:08:a2:0b:13:cf 60468648 471692     0 142420471     0     0
          ...
          lagg0  1500 <Link#11>     00:08:a2:0b:13:cf 501211256 4952922     0 206048038  5601     0
          lagg0     - fe80::%lagg0/ fe80::208:a2ff:fe        0     -     -        1     -     -
          ...
          

          All other interfaces (even the active coppers) show 0 errors.
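The raw counters are easier to judge as a ratio. A minimal sketch, assuming the 10-column FreeBSD `netstat -i` layout shown above; the function name `ierr_rate` is made up for illustration:

```shell
# Print the input error rate (Ierrs / Ipkts, in percent) for each
# link-layer row of `netstat -i` output read from stdin.
# Assumption: the 10-column layout shown in this thread. Rows with a
# different column count (for example, no Address field) are skipped,
# as are per-address rows such as the fe80:: line.
ierr_rate() {
    awk 'NF == 10 && $3 ~ /^<Link#/ && $5 > 0 {
        printf "%s %.2f\n", $1, ($6 / $5) * 100
    }'
}

# Usage on the firewall (not run here):
#   netstat -i | ierr_rate
```

Fed the three link rows above, this should print roughly 1.02 for ix0, 0.78 for ix1 and 0.99 for lagg0.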

          Yes, ix0 and ix1 are both members of lagg0, sorry for not mentioning this earlier. We currently have one continuous connection that requires a bit of bandwidth, so I assume that one is constantly routed over one of the two interfaces. There's no significant traffic apart from that since we're currently undergoing maintenance and everyone is on holiday.

          I can give you the full output of ifconfig -vm and netstat -i if that helps.

          Regarding the link: actually it was only discovered since we started using SNMP. And while we did have some issues with high loads in the past, we didn't expect the pfSense to be the reason for them so we have not looked at that value in the past.

            stephenw10 Netgate Administrator

            Ok check:

            [22.09-DEVELOPMENT][admin@6100.stevew.lan]/root: sysctl dev.ix.0 | grep errs
            dev.ix.0.mac_stats.checksum_errs: 0
            dev.ix.0.mac_stats.rec_len_errs: 0
            dev.ix.0.mac_stats.byte_errs: 0
            dev.ix.0.mac_stats.ill_errs: 0
            dev.ix.0.mac_stats.crc_errs: 0
            dev.ix.0.mac_stats.rx_errs: 0
            
              XRM @stephenw10

              @stephenw10

              [22.05-RELEASE][root@router1]/root: sysctl dev.ix.0 | grep errs
              dev.ix.0.mac_stats.checksum_errs: 4481020
              dev.ix.0.mac_stats.rec_len_errs: 0
              dev.ix.0.mac_stats.byte_errs: 0
              dev.ix.0.mac_stats.ill_errs: 0
              dev.ix.0.mac_stats.crc_errs: 0
              dev.ix.0.mac_stats.rx_errs: 4481376
              
                stephenw10 Netgate Administrator

                Huh, all checksum errors. In that case the first thing to try is disabling hardware checksum offloading in System > Advanced > Networking. You will need to reboot or down/up the interfaces to apply that.

                Steve
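To see at a glance which counters account for the errors, the sysctl output can be filtered down to the nonzero entries. This is only a sketch; the helper name `nonzero_errs` is hypothetical and it assumes the `name: value` format shown above:

```shell
# Read `sysctl dev.ix.N | grep errs` output from stdin and print every
# nonzero counter as "<short name> <value>".
nonzero_errs() {
    awk -F': ' '$2 + 0 > 0 {
        n = split($1, part, "[.]")   # keep only the last OID component
        print part[n], $2
    }'
}

# Usage on the firewall (not run here):
#   sysctl dev.ix.0 | grep errs | nonzero_errs
```

With the ix0 output posted above, only checksum_errs and rx_errs remain, which is what points at checksums rather than CRC or length errors.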

                  XRM @stephenw10

                  [22.05-RELEASE][root@router1]/root: sysctl dev.ix.0 | grep errs
                  dev.ix.0.mac_stats.checksum_errs: 14
                  dev.ix.0.mac_stats.rec_len_errs: 0
                  dev.ix.0.mac_stats.byte_errs: 0
                  dev.ix.0.mac_stats.ill_errs: 0
                  dev.ix.0.mac_stats.crc_errs: 0
                  dev.ix.0.mac_stats.rx_errs: 14
                  

                  Doesn't look like it helped. :/

                    XRM @XRM

                    Actually, I may have spoken too soon. I expected this would further increase but 10 minutes after the reboot we're still at just 80 errors ... So this probably improved things to an acceptable level. How severe is the drawback of this option?

                    And is this an issue with the internal network cards or with the transceivers? I.e. would I experience the same issues with, for example, a more modern device such as the 1537?
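Since these counters are cumulative and reset on reboot, growth over a known interval says more than a single reading. A rough sketch, assuming two snapshots of the sysctl output have been saved to files; the function name and file paths are made up for illustration:

```shell
# Print how much each counter grew between two saved snapshots of
# `sysctl dev.ix.N | grep errs` output. Unchanged counters are omitted.
#   $1 = earlier snapshot file, $2 = later snapshot file
errs_delta() {
    awk -F': ' '
        NR == FNR { before[$1] = $2; next }      # first file: remember values
        ($1 in before) && $2 - before[$1] > 0 {  # second file: report growth
            print $1 ": +" ($2 - before[$1])
        }
    ' "$1" "$2"
}

# Usage on the firewall (not run here; paths are examples):
#   sysctl dev.ix.0 | grep errs > /tmp/errs.before
#   sleep 600
#   sysctl dev.ix.0 | grep errs > /tmp/errs.after
#   errs_delta /tmp/errs.before /tmp/errs.after
```

Running it against both ix0 and ix1 avoids the trap of watching only the lagg member that happens to be idle.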

                      XRM @XRM

                      Scratch that, it only switched over to ix1:

                      [22.05-RELEASE][root@router1]/root: sysctl dev.ix.1 | grep errs
                      dev.ix.1.mac_stats.checksum_errs: 25308
                      dev.ix.1.mac_stats.rec_len_errs: 0
                      dev.ix.1.mac_stats.byte_errs: 0
                      dev.ix.1.mac_stats.ill_errs: 0
                      dev.ix.1.mac_stats.crc_errs: 0
                      dev.ix.1.mac_stats.rx_errs: 25308
                      
                        stephenw10 Netgate Administrator

                        OK, if you run ifconfig -vm ix0 does it show hardware checksum actually disabled?

                        If it does it could actually be bad checksums of course.

                          XRM @stephenw10

                          ix1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
                          options=8138b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER>
                          capabilities=f53fbb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,NETMAP,RXCSUM_IPV6,TXCSUM_IPV6>
                          

                          VLAN_HWCSUM - so I guess the answer is "no"? It's checked in the settings though and I did reboot - both machines actually and still it's the same on all four interfaces ...

                            stephenw10 Netgate Administrator

                            Are you using VLANs there though?

                            RXCSUM and TXCSUM are disabled as expected at least.
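The distinction here is that `options=` lists what is currently enabled, while `capabilities=` lists what the NIC could do. A small sketch that pulls only the enabled checksum flags out of `ifconfig -m` output; the helper name `enabled_csum` is hypothetical:

```shell
# Read `ifconfig -m <if>` output from stdin and print the checksum-related
# flags found on the options= line (the currently enabled ones).
# The capabilities= and "nd6 options=" lines are ignored.
enabled_csum() {
    awk '$1 ~ /^options=/ {
        s = $1
        sub(/^[^<]*</, "", s)    # drop everything up to and including "<"
        sub(/>.*$/, "", s)       # drop the closing ">"
        n = split(s, flag, ",")
        for (i = 1; i <= n; i++)
            if (flag[i] ~ /CSUM/) print flag[i]
    }'
}

# Usage on the firewall (not run here):
#   ifconfig -m ix1 | enabled_csum
```

For the ix1 output posted above this leaves only VLAN_HWCSUM, since RXCSUM and TXCSUM appear under capabilities but not under options.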

                              XRM @stephenw10

                              Well, not sure how BSD handles that to be honest. We have about 25 VLANs on the lagg interface and I would expect that the underlying interfaces see them, too?

                                stephenw10 Netgate Administrator

                                Check what shows as enabled on lagg0.

                                Try disabling the vlan hardware checksum as a test.

                                [22.09-DEVELOPMENT][admin@6100.stevew.lan]/root: ifconfig -m ix1
                                ix1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
                                	description: IX1
                                	options=8138b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER>
                                	capabilities=f53fbb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,NETMAP,RXCSUM_IPV6,TXCSUM_IPV6>
                                	ether 00:08:a2:12:17:7f
                                	inet6 fe80::208:a2ff:fe12:177f%ix1 prefixlen 64 scopeid 0x6
                                	inet 192.168.79.2 netmask 0xffffff00 broadcast 192.168.79.255
                                	media: Ethernet autoselect (1000baseSX <full-duplex,rxpause,txpause>)
                                	status: active
                                	supported media:
                                		media autoselect
                                		media 1000baseSX
                                		media 10Gbase-SR
                                	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
                                [22.09-DEVELOPMENT][admin@6100.stevew.lan]/root: ifconfig ix1 -vlanhwcsum
                                [22.09-DEVELOPMENT][admin@6100.stevew.lan]/root: ifconfig -m ix1
                                ix1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
                                	description: IX1
                                	options=813838<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER>
                                	capabilities=f53fbb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,NETMAP,RXCSUM_IPV6,TXCSUM_IPV6>
                                	ether 00:08:a2:12:17:7f
                                	inet6 fe80::208:a2ff:fe12:177f%ix1 prefixlen 64 scopeid 0x6
                                	inet 192.168.79.2 netmask 0xffffff00 broadcast 192.168.79.255
                                	media: Ethernet autoselect (1000baseSX <full-duplex,rxpause,txpause>)
                                	status: active
                                	supported media:
                                		media autoselect
                                		media 1000baseSX
                                		media 10Gbase-SR
                                	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
                                
                                  XRM @stephenw10

                                  lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
                                          options=8138b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER>
                                          capabilities=f53fbb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,NETMAP,RXCSUM_IPV6,TXCSUM_IPV6>
                                  

                                  No improvement from removing VLAN_HWCSUM. Plus, it comes back automatically after restarting.

                                    stephenw10 Netgate Administrator

                                    Yeah you would need to add a command to remove it at each boot if it helped.

                                    Did you try removing it from the ix0/1 as well as lagg0?
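For reference, clearing the flag on all three interfaces could be wrapped as below. This is a sketch only: the function name and the `IFCONFIG` override are made up for illustration, and how it gets run at boot (for example through the Shellcmd package) is an assumption, not something confirmed in this thread.

```shell
# Clear the VLAN hardware checksum flag on each named interface using
# `ifconfig <if> -vlanhwcsum`, as in the example session above.
# IFCONFIG can be overridden (e.g. IFCONFIG=echo) to preview the
# commands instead of running them.
disable_vlanhwcsum() {
    for ifname in "$@"; do
        "${IFCONFIG:-ifconfig}" "$ifname" -vlanhwcsum || return 1
    done
}

# Example (needs root on the firewall, not run here):
#   disable_vlanhwcsum ix0 ix1 lagg0
# Preview only:
#   IFCONFIG=echo disable_vlanhwcsum ix0 ix1 lagg0
```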

                                    It might not be a problem in pfSense of course. Those might be real checksum errors arriving.

                                    I assume the switch isn't showing errors?

                                      XRM @stephenw10

                                      Tried that but to no avail. All three removed but the error count still increased.

                                      The switch showed some errors before I switched to the SR modules but is currently not showing any.

                                      Where could the errors come from if not from pfSense or the hardware stack? I ask this as a serious question and don't mean to blame pfSense, but the switch works fine for all other devices and it happens for both pfSenses at roughly the same rate, on all four ports ...

                                      Thanks
                                      Sebastian

                                        stephenw10 Netgate Administrator

                                        Mmm, that's a good question!

                                        Are you actually seeing dropped packets across the link? Throughput issues etc?

                                        Either something is incorrectly calculating checksums or the errors really exist. Does the switch specifically show checksum errors or just general errors?

                                        The error rate looks to be ~1% from the netstat output. You could try grabbing a pcap on one of the interfaces and seeing if Wireshark reports checksum errors. You would need to disable hardware checksum offloads for that, since offloading usually causes captures to show every checksum as incorrect.

                                        Steve
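If a GUI is not at hand, a capture can also be checked from the command line. The sketch below counts lines in `tcpdump -vv` text output that flag a bad checksum. It assumes tcpdump's usual verbose markers ("bad cksum", "bad udp cksum", "(incorrect -> ...)"), which may differ between versions; the function name and pcap path are hypothetical.

```shell
# Count lines of `tcpdump -vv` text output (read from stdin) that flag a
# bad IP, TCP, or UDP checksum. This counts output lines, not packets:
# one packet can produce more than one flagged line.
count_bad_cksum() {
    grep -c -E 'bad (udp )?cksum|\(incorrect' || true
}

# Usage (not run here; the capture path is an example):
#   tcpdump -nn -vv -r /tmp/ix0.pcap | count_bad_cksum
```

As noted above, the result is only meaningful with hardware checksum offloading disabled during the capture.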

                                          XRM

                                          No dropped packets; at least, the switch claims that although it found errors it didn't drop any packets. I can't say for sure if I have performance issues since the network load is very low in general at the moment. And while we had some issues in the past under high load, that was primarily due to the ISP's router (it just locked up and didn't respond at all), so I cannot say for sure if there are any actual performance issues with the pfSense itself.

                                          The switch also only shows a very low number of FCS errors at the moment on a single port (it probably reset during the weekend):

                                            Totals (Since boot or last clear) :
                                             Bytes Rx        : 43,788,201,529       Bytes Tx        : 37,277,703,386
                                             Unicast Rx      : 108,192,845          Unicast Tx      : 41,951,493
                                             Bcast/Mcast Rx  : 55,424               Bcast/Mcast Tx  : 2,652,035
                                            Errors (Since boot or last clear) :
                                             FCS Rx          : 2201                 Drops Tx        : 0
                                             Alignment Rx    : 0                    Collisions Tx   : 0
                                             Runts Rx        : 0                    Late Colln Tx   : 0
                                             Giants Rx       : 25                   Excessive Colln : 0
                                             Total Rx Errors : 2226                 Deferred Tx     : 0
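For comparison with the roughly 1% seen on the pfSense side, the switch counters can be turned into a rate as well. A minimal sketch, assuming total received frames are approximately Unicast Rx plus Bcast/Mcast Rx (108,192,845 + 55,424 = 108,248,269); the function name `rate_pct` is made up:

```shell
# Print errors as a percentage of total frames, to four decimals.
#   $1 = error count, $2 = total frames (thousands separators allowed)
rate_pct() {
    errs=$(printf '%s' "$1" | tr -d ',')
    total=$(printf '%s' "$2" | tr -d ',')
    awk -v e="$errs" -v t="$total" \
        'BEGIN { if (t > 0) printf "%.4f\n", e / t * 100 }'
}

# Switch port, from the table above:
#   rate_pct 2,226 108,248,269
# pfSense ix0, from the earlier netstat -i output:
#   rate_pct 4481231 440743346
```

Under that assumption the switch port comes out near 0.002%, several hundred times lower than the ix0 figure.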
                                          

                                          I recorded network traffic on both ix0 and ix1, and both dumps showed a 0% error rate according to Wireshark (with checksum validation enabled).

                                            stephenw10 Netgate Administrator

                                            Hmm, well it could be an incorrect error report. Or it could be correcting the errors maybe? I wouldn't expect that though. Ethernet frames failing the CRC check should just be dropped. Though I very rarely dig that deep!
                                            I suspect that these errors have always been present and are not actually causing a problem. The previous driver versions simply did not report them and you weren't logging it anyway.

                                            Steve

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.