Netgate Discussion Forum

    Packet error on physical interface

    Hardware
    • stephenw10 Netgate Administrator

      Hmm, well it looks like they are all checksum errors on the ix NICs. However, that was updated in the driver for 2.6, and further changes were made more recently since some of those counters were shown to be spurious.

      Looks like it's buffer exhaustion on the bce NIC, which normally means the NIC is not being serviced by the CPU fast enough for the incoming traffic.
      I would check that value matches the current input error count from netstat though.
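
      Something like this at the CLI would cross-check them (just a rough sketch; the grep pattern is deliberately loose since the exact counter names depend on the driver):

        # input error totals per interface (Ierrs column)
        netstat -i
        # any driver counters mentioning missing buffers or checksum errors
        sysctl -a | grep -iE 'no_buffers|checksum_errs'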

      Steve

      • william_vn @stephenw10

        @stephenw10
        here is the full output from sysctl and netstat:
        output.txt

        • stephenw10 Netgate Administrator

          Hmm, OK so those errors are shown as InputDiscards for bge, as com_no_buffers for bce and as checksum_errs in ix.

          Do you have flow control active on any of those links?
          My first thought here is that the NICs are just dropping packets because the CPU queues are maxed out.
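
          For the ix ports you can check the flow control setting directly (0 = off, 1 = rx pause, 2 = tx pause, 3 = full); the per-port dev.ix.<unit>.fc sysctl here is an assumption based on the stock ixgbe driver:

            # driver-wide default
            sysctl hw.ix.flow_control
            # per-port value, e.g. the first ix port
            sysctl dev.ix.0.fc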

          Steve

          • william_vn @stephenw10

            @stephenw10
            Do you have flow control active on any of those links?
            --> No, I don't

            My first thought here is that the NICs are just dropping packets because the CPU queues are maxed out.
            --> So how can I check the CPU queues? And is there a solution for this?

            • stephenw10 Netgate Administrator

              Check the boot logs to make sure each NIC is coming up with the expected number of queues initially. Check the CPU usage per core on the firewall using top -HaSP at the CLI. Do you have cores that are maxed out?
              Try enabling flow control. That should prevent packet loss in the event the NIC cannot be serviced fast enough, though it may introduce other issues.
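
              For example, roughly (just a sketch; /var/run/dmesg.boot is the stock FreeBSD boot log location):

                # how many RX/TX queues and MSI-X vectors each NIC came up with
                grep -iE 'queues|msi-x' /var/run/dmesg.boot
                # per-core usage; look for any single core pegged in system or interrupt time
                top -HaSP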

              • william_vn @stephenw10

                @stephenw10 ,
                top -HaSP

                last pid: 81614;  load averages:  0.49,  0.57,  0.52                                                                                                  up 42+17:32:16  08:22:31
                453 threads:   18 running, 384 sleeping, 51 waiting
                CPU 0:   0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
                CPU 1:   0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
                CPU 2:   0.0% user,  0.0% nice, 16.7% system,  0.0% interrupt, 83.3% idle
                CPU 3:   0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
                CPU 4:   0.0% user,  0.0% nice, 16.7% system,  0.0% interrupt, 83.3% idle
                CPU 5:   0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
                CPU 6:   0.0% user,  0.0% nice,  8.3% system,  0.0% interrupt, 91.7% idle
                CPU 7:   0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
                CPU 8:   0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
                CPU 9:   0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
                CPU 10:  0.0% user,  0.0% nice,  7.7% system,  0.0% interrupt, 92.3% idle
                CPU 11:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
                CPU 12:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
                CPU 13:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
                CPU 14:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
                CPU 15:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
                Mem: 106M Active, 6357M Inact, 1549M Wired, 763M Buf, 23G Free
                Swap: 4096M Total, 4096M Free
                
                  PID USERNAME    PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
                   11 root        155 ki31     0B   256K CPU9     9 1022.9 100.00% [idle{idle: cpu9}]
                   11 root        155 ki31     0B   256K CPU11   11 1022.8 100.00% [idle{idle: cpu11}]
                   11 root        155 ki31     0B   256K CPU15   15 1022.6 100.00% [idle{idle: cpu15}]
                   11 root        155 ki31     0B   256K CPU7     7 1022.4 100.00% [idle{idle: cpu7}]
                   11 root        155 ki31     0B   256K CPU1     1 1021.9 100.00% [idle{idle: cpu1}]
                   11 root        155 ki31     0B   256K CPU8     8 1016.7  96.53% [idle{idle: cpu8}]
                   11 root        155 ki31     0B   256K CPU10   10 1016.9  96.35% [idle{idle: cpu10}]
                   11 root        155 ki31     0B   256K CPU12   12 1016.5  96.32% [idle{idle: cpu12}]
                   11 root        155 ki31     0B   256K CPU14   14 1016.7  95.39% [idle{idle: cpu14}]
                   11 root        155 ki31     0B   256K CPU0     0 1000.3  94.90% [idle{idle: cpu0}]
                   11 root        155 ki31     0B   256K CPU3     3 1021.4  91.30% [idle{idle: cpu3}]
                   11 root        155 ki31     0B   256K CPU4     4 1004.4  91.23% [idle{idle: cpu4}]
                   11 root        155 ki31     0B   256K RUN      6 958.7H  88.47% [idle{idle: cpu6}]
                   11 root        155 ki31     0B   256K CPU2     2 1010.2  85.78% [idle{idle: cpu2}]
                   11 root        155 ki31     0B   256K CPU13   13 1022.7  85.24% [idle{idle: cpu13}]
                   11 root        155 ki31     0B   256K CPU5     5 1021.5  63.86% [idle{idle: cpu5}]
                    0 root        -92    -     0B  1616K -        2 420:57   8.32% [kernel{bge2 taskq}]
                    0 root        -92    -     0B  1616K -        1  61.3H   8.04% [kernel{bge0 taskq}]
                    0 root        -76    -     0B  1616K -        2 381:58   6.41% [kernel{if_io_tqg_2}]
                    0 root        -92    -     0B  1616K -        4 357:50   4.62% [kernel{bge3 taskq}]
                    0 root        -76    -     0B  1616K -       12 450:05   3.25% [kernel{if_io_tqg_12}]
                    0 root        -76    -     0B  1616K -        0 541:11   2.73% [kernel{if_io_tqg_0}]
                    0 root        -92    -     0B  1616K CPU0     0 906:30   2.41% [kernel{bge1 taskq}]
                   12 root        -92    -     0B   816K WAIT     4 502:59   2.20% [intr{irq258: bce2}]
                    0 root        -76    -     0B  1616K -       10 426:32   2.18% [kernel{if_io_tqg_10}]
                    0 root        -76    -     0B  1616K -        4 384:42   2.01% [kernel{if_io_tqg_4}]
                    0 root        -76    -     0B  1616K -        6 374:11   1.83% [kernel{if_io_tqg_6}]
                81614 root         20    0    14M  4512K CPU6     6   0:00   1.74% top -HaSP
                    0 root        -76    -     0B  1616K -       14 430:22   1.57% [kernel{if_io_tqg_14}]
                    0 root        -76    -     0B  1616K -        8 440:59   1.49% [kernel{if_io_tqg_8}]
                
                
                

                cat dmesg.boot

                ix0: <Intel(R) X520 82599ES (SFI/SFP+)> port 0xecc0-0xecdf mem 0xdf300000-0xdf37ffff,0xdf2f8000-0xdf2fbfff irq 35 at device 0.0 on pci5
                ix0: Using 2048 TX descriptors and 2048 RX descriptors
                ix0: Using 8 RX queues 8 TX queues
                ix0: Using MSI-X interrupts with 9 vectors
                ix0: allocated for 8 queues
                ix0: allocated for 8 rx queues
                ix0: Ethernet address: 00:1b:21:bc:77:c0
                ix0: PCI Express Bus: Speed 5.0GT/s Width x4
                ix0: eTrack 0x00012b2c PHY FW V65535
                ix0: netmap queues/slots: TX 8/2048, RX 8/2048
                ix1: <Intel(R) X520 82599ES (SFI/SFP+)> port 0xece0-0xecff mem 0xdf380000-0xdf3fffff,0xdf2fc000-0xdf2fffff irq 46 at device 0.1 on pci5
                ix1: Using 2048 TX descriptors and 2048 RX descriptors
                ix1: Using 8 RX queues 8 TX queues
                ix1: Using MSI-X interrupts with 9 vectors
                ix1: allocated for 8 queues
                ix1: allocated for 8 rx queues
                ix1: Ethernet address: 00:1b:21:bc:77:c1
                ix1: PCI Express Bus: Speed 5.0GT/s Width x4
                ix1: eTrack 0x00012b2c PHY FW V65535
                ix1: netmap queues/slots: TX 8/2048, RX 8/2048
                pcib6: <ACPI PCI-PCI bridge> at device 7.0 on pci0
                pci6: <ACPI PCI bus> on pcib6
                pcib7: <ACPI PCI-PCI bridge> at device 9.0 on pci0
                pci7: <ACPI PCI bus> on pcib7
                
                

                I see that the ix0 and ix1 interfaces are using 8 queues.

                Additionally, I've re-checked and found that pfSense already has flow control enabled:

                sysctl hw.ix

                hw.ix.enable_rss: 1
                hw.ix.enable_fdir: 0
                hw.ix.unsupported_sfp: 0
                hw.ix.enable_msix: 1
                hw.ix.advertise_speed: 0
                hw.ix.flow_control: 3
                hw.ix.max_interrupt_rate: 31250
                

                Thanks!!!
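
                (A related check, in case it's useful: whether pause frames are actually being exchanged on the ix ports. The exact counter names vary by driver version, so this just greps for them:)

                  # xon/xoff (pause) frame counters for the first ix port
                  sysctl dev.ix.0 | grep -iE 'xon|xoff'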

                • stephenw10 Netgate Administrator

                  Flow control is negotiated though; both ends have to support it.

                  Nothing much happening in that top output. Was there any traffic flowing at that point? Were the errors increasing?
                  What I expect to see is that when traffic through the NIC peaks, the CPU core(s) servicing it max out and it drops packets. Assuming that is the cause.

                  Are you able to see the number of queues on the bce/bge NICs? vmstat -i may show it.
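
                  For example:

                    # interrupt lines per NIC; multi-queue NICs show one line per queue (e.g. ix0:rxq0..rxq7),
                    # single-queue NICs show just one irq line
                    vmstat -i | grep -iE 'bce|bge|ix'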

                  • william_vn @stephenw10

                    @stephenw10
                    Yes, there was a lot of traffic at that point, and the errors were increasing as well.
                    CPU usage is always less than 10%.

                    Screen Shot 2023-02-27 at 07.52.14.png

                    [2.6.0-RELEASE][admin@pfs2]/root: vmstat -i
                    interrupt                          total       rate
                    irq17: uhci0                          24          0
                    irq19: ehci0                     2947641          1
                    irq23: atapci0                   2616920          1
                    cpu0:timer                    1228218976        311
                    cpu1:timer                      43088269         11
                    cpu2:timer                     515777307        131
                    cpu3:timer                      44479430         11
                    cpu4:timer                     665350682        168
                    cpu5:timer                      43841493         11
                    cpu6:timer                    1046412756        265
                    cpu7:timer                      41014489         10
                    cpu8:timer                     163311089         41
                    cpu9:timer                      42267453         11
                    cpu10:timer                    143573306         36
                    cpu11:timer                     43289875         11
                    cpu12:timer                    162167544         41
                    cpu13:timer                     44236981         11
                    cpu14:timer                    152160861         39
                    cpu15:timer                     45152052         11
                    irq257: bce1                     6913402          2
                    irq258: bce2                  2158281404        547
                    irq259: bce3                           1          0
                    irq260: mfi0                     7352305          2
                    irq261: ix0:rxq0              7415677330       1878
                    irq262: ix0:rxq1               709426817        180
                    irq263: ix0:rxq2               756092990        191
                    irq264: ix0:rxq3               701942814        178
                    irq265: ix0:rxq4               771849533        195
                    irq266: ix0:rxq5               709479026        180
                    irq267: ix0:rxq6               813804607        206
                    irq268: ix0:rxq7               701544714        178
                    irq269: ix0:aq                         6          0
                    irq270: ix1:rxq0              7808112326       1977
                    irq271: ix1:rxq1              1041604395        264
                    irq272: ix1:rxq2              1028264841        260
                    irq273: ix1:rxq3              1045890761        265
                    irq274: ix1:rxq4              1094897663        277
                    irq275: ix1:rxq5              1009812406        256
                    irq276: ix1:rxq6              1165305943        295
                    irq277: ix1:rxq7              1047942990        265
                    irq278: ix1:aq                         7          0
                    irq279: bge0                  6303565889       1596
                    irq280: bge1                  3249572863        823
                    irq281: bge2                  1628365893        412
                    irq282: bge3                  1149357028        291
                    Total                        46754965102      11839
                    [2.6.0-RELEASE][admin@pfs2]/root: netstat -i
                    Name    Mtu Network       Address              Ipkts Ierrs Idrop    Opkts Oerrs  Coll
                    bce0*  1500 <Link#1>      b8:ac:6f:97:06:ea        0     0     0        0     0     0
                    bce1   1500 <Link#2>      b8:ac:6f:97:06:ec  7117685     0     0    70405     0     0
                    bce1      - fe80::%bce1/6 fe80::baac:6fff:f        0     -     -        1     -     -
                    bce1      - 192.168.103.0 pfs2               2343399     -     -      515     -     -
                    bce2   1500 <Link#3>      b8:ac:6f:97:06:ee 1868223693 2155425     0 906688134     0     0
                    bce2      - fe80::%bce2/6 fe80::baac:6fff:f        0     -     -        1     -     -
                    bce3   1500 <Link#4>      b8:ac:6f:97:06:f0        0     0     0        0     0     0
                    bce3      - fe80::%bce3/6 fe80::baac:6fff:f        0     -     -        1     -     -
                    bce3      - xxx.xxx.xx.xx xxx.xxx.xx.xx           0     -     -        0     -     -
                    ix0    1500 <Link#5>      00:1b:21:bc:77:c0 6590339737 7248161     0 16393394382     0     0
                    ix1    1500 <Link#6>      00:1b:21:bc:77:c0 10209454971 22546676     0 16548545943     0     0
                    bge0   1500 <Link#7>      00:0a:f7:90:e6:ec 21040184018 25001666     0 8758689519     0     0
                    bge0      - fe80::%bge0/6 fe80::20a:f7ff:fe        0     -     -        1     -     -
                    bge1   1500 <Link#8>      00:0a:f7:90:e6:ed 3801913176 6100037     0 2248351113     0     0
                    bge1      - fe80::%bge1/6 fe80::20a:f7ff:fe        0     -     -        0     -     -
                    bge2   1500 <Link#9>      00:0a:f7:90:e6:ee 1722390896 1717684     0 825525005     0     0
                    bge2      - fe80::%bge2/6 fe80::20a:f7ff:fe        0     -     -        1     -     -
                    bge3   1500 <Link#10>     00:0a:f7:90:e6:ef 1665701913 2291555     0 914404232     0     0
                    bge3      - fe80::%bge3/6 fe80::20a:f7ff:fe        0     -     -        2     -     -
                    
                    
                    • stephenw10 Netgate Administrator

                      Hmm, looks like 1 irq per NIC for bce/bge. However, I'd expect to see one CPU core at 100% if it was failing to service the NIC fast enough.

                      What are those NICs connected to? Are you seeing errors on the connected devices?
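
                      You could also watch one of the problem NICs live while traffic is flowing, to see when the errors actually increment. For example, using bge0 from your netstat output:

                        # per-second input/output packet and error counts for bge0
                        netstat -I bge0 -w 1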

                      • william_vn @stephenw10

                        @stephenw10 ,
                        here is the output of top:

                        [2.6.0-RELEASE][admin@pfs2]/root: top -HaSP
                        
                        last pid: 11585;  load averages:  0.58,  0.76,  0.67                                                                                                    up 45+17:40:30  08:30:45
                        450 threads:   17 running, 382 sleeping, 51 waiting
                        CPU 0:   0.0% user,  0.0% nice, 12.8% system,  0.0% interrupt, 87.2% idle
                        CPU 1:   0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
                        CPU 2:   0.0% user,  0.0% nice,  4.3% system,  0.0% interrupt, 95.7% idle
                        CPU 3:   0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
                        CPU 4:   0.0% user,  0.0% nice,  8.0% system,  3.7% interrupt, 88.2% idle
                        CPU 5:   0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
                        CPU 6:   0.0% user,  0.0% nice, 10.7% system,  0.0% interrupt, 89.3% idle
                        CPU 7:   0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
                        CPU 8:   0.0% user,  0.0% nice,  2.7% system,  0.0% interrupt, 97.3% idle
                        CPU 9:   0.5% user,  0.0% nice,  0.0% system,  0.0% interrupt, 99.5% idle
                        CPU 10:  0.5% user,  0.0% nice,  2.7% system,  0.0% interrupt, 96.8% idle
                        CPU 11:  0.0% user,  0.0% nice,  0.5% system,  0.0% interrupt, 99.5% idle
                        CPU 12:  0.0% user,  0.0% nice,  2.1% system,  0.0% interrupt, 97.9% idle
                        CPU 13:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
                        CPU 14:  0.0% user,  0.0% nice,  2.7% system,  0.0% interrupt, 97.3% idle
                        CPU 15:  0.5% user,  0.0% nice,  0.5% system,  0.0% interrupt, 98.9% idle
                        Mem: 104M Active, 6370M Inact, 1570M Wired, 782M Buf, 23G Free
                        Swap: 4096M Total, 4096M Free
                        
                          PID USERNAME    PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
                           11 root        155 ki31     0B   256K CPU11   11 1094.8 100.00% [idle{idle: cpu11}]
                           11 root        155 ki31     0B   256K CPU13   13 1094.7 100.00% [idle{idle: cpu13}]
                           11 root        155 ki31     0B   256K CPU3     3 1093.3  99.84% [idle{idle: cpu3}]
                           11 root        155 ki31     0B   256K CPU1     1 1093.8  99.77% [idle{idle: cpu1}]
                           11 root        155 ki31     0B   256K CPU15   15 1094.6  99.46% [idle{idle: cpu15}]
                           11 root        155 ki31     0B   256K CPU7     7 1094.3  98.81% [idle{idle: cpu7}]
                           11 root        155 ki31     0B   256K CPU5     5 1093.5  98.73% [idle{idle: cpu5}]
                           11 root        155 ki31     0B   256K RUN      8 1088.5  98.21% [idle{idle: cpu8}]
                           11 root        155 ki31     0B   256K CPU14   14 1088.5  98.10% [idle{idle: cpu14}]
                           11 root        155 ki31     0B   256K CPU10   10 1088.6  97.99% [idle{idle: cpu10}]
                           11 root        155 ki31     0B   256K CPU12   12 1088.2  97.60% [idle{idle: cpu12}]
                           11 root        155 ki31     0B   256K CPU9     9 1094.8  97.09% [idle{idle: cpu9}]
                           11 root        155 ki31     0B   256K CPU2     2 1081.5  94.77% [idle{idle: cpu2}]
                           11 root        155 ki31     0B   256K CPU4     4 1075.0  92.37% [idle{idle: cpu4}]
                           11 root        155 ki31     0B   256K CPU0     0 1071.4  88.04% [idle{idle: cpu0}]
                           11 root        155 ki31     0B   256K CPU6     6 1028.6  85.34% [idle{idle: cpu6}]
                            0 root        -92    -     0B  1616K -        6  63.1H  12.78% [kernel{bge0 taskq}]
                            0 root        -92    -     0B  1616K -        0 938:32   8.50% [kernel{bge1 taskq}]
                            0 root        -76    -     0B  1616K -        0 564:06   3.74% [kernel{if_io_tqg_0}]
                           12 root        -92    -     0B   816K WAIT     4 533:51   3.09% [intr{irq258: bce2}]
                            0 root        -92    -     0B  1616K -        2 446:02   2.95% [kernel{bge2 taskq}]
                            0 root        -92    -     0B  1616K -        4 395:52   2.95% [kernel{bge3 taskq}]
                            0 root        -76    -     0B  1616K -       12 467:41   2.23% [kernel{if_io_tqg_12}]
                            0 root        -76    -     0B  1616K -        6 389:13   2.03% [kernel{if_io_tqg_6}]
                            0 root        -76    -     0B  1616K -        2 398:09   2.03% [kernel{if_io_tqg_2}]
                            0 root        -76    -     0B  1616K -       10 442:43   1.90% [kernel{if_io_tqg_10}]
                            0 root        -76    -     0B  1616K -       14 447:08   1.84% [kernel{if_io_tqg_14}]
                            0 root        -76    -     0B  1616K -        8 457:22   1.69% [kernel{if_io_tqg_8}]
                            0 root        -76    -     0B  1616K -        4 403:09   1.64% [kernel{if_io_tqg_4}]
                        54075 netdata      40   19   350M   213M nanslp  13  69:02   0.48% /usr/local/sbin/netdata -u netdata -P /var/db/netdata/netdata.pid{PLUGIN[freebsd]}
                        
                        

                        bce/bge are the internet lines, so they connect to converters (ISP devices). I can't check the ISP devices.
                        ix is the LAN interface. It connects to a Cisco Meraki switch, but I checked that switch and I don't see any errors on the interfaces connecting to pfSense.

                        • stephenw10 Netgate Administrator

                          Hmm, hard to say.

                          You might try swapping the ix and bce/bge NICs to see if the errors follow them. That might not be possible with the media types you have.
