Netgate Discussion Forum

Intel I210 low throughput w/ VLANs

Hardware
  • T
    thetrevster
    last edited by Nov 1, 2023, 8:29 PM

    I've spent the last 3 days trying to figure this one out... Just installed a Protectli FW4B with a Celeron J3160 and 8GB RAM running 2.7 CE. No packages except ACME for certs, no IPS/IDS or any of that. Two interfaces in use: igb0 for WAN, igb1 for LAN. Running 21 VLANs under igb1 with the parent/untagged interface unused. Static IP directly on WAN (no PPPoE or any of that). No routing between VLANs; all traffic from each VLAN hits the VLAN interface and then gets 1:1 NAT'd to a public VIP on the /27 my ISP has assigned me.

    I'm having trouble routing over 750Mb/s out of the VLANs (igb1.x) towards the Internet (igb0). Running iPerf traffic through the appliance on one of the VLANs to a public iPerf server (using -R), I get around 562Mb/s with 141 retries. CPU stays around 34% utilization, which leads me to believe this isn't a CPU-bound issue (I think). Some output from the boot log is below in case it's helpful. I've tried tweaking the offload settings and currently have everything unchecked (enabled) except ALTQ support (I know it's advised to leave TSO and LRO checked, but I'm experimenting at this point to resolve this).
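
    In case it's useful context, one quick way to confirm which offloads are actually active on an interface is to read the flags straight out of ifconfig (a read-only check; igb1 here just stands in for whichever NIC you're inspecting):

    # Compare the enabled options against the supported capabilities for the LAN NIC
    ifconfig -m igb1 | grep -iE 'options|capabilities'
    # TSO4, LRO, VLAN_HWTAGGING and VLAN_HWFILTER show up in the options= line when enabled;
    # the capabilities= lines list everything the I210 could support.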

    I read somewhere that increasing the RX and TX descriptors below from 1024 to 4096 may help, so that may be the next thing I'll try; thoughts? This thing is already deployed and running (stable), just not getting the expected speed, so I have to make tweaks during a maintenance window. I did notice while checking system activity that [kernel{if_io_tqg_3}] is only eating up a single core during these iPerf tests, so maybe something there (a quick per-queue check is sketched after the boot log below)? Appreciate any insight or tweaks I can try to squeeze out more performance! Besides the large number of VLANs, I'm not doing anything special: no inter-VLAN routing, just basic 1:1 NAT for each IP in the /30 VLAN sub-interface and minimal FW rules (like 5 floating rules and that's it).

    cat /var/log/dmesg.boot | grep igb
    
    igb0: <Intel(R) I210 Flashless (Copper)> port 0xe000-0xe01f mem 0xb1500000-0xb151ffff,0xb1520000-0xb1523fff at device 0.0 on pci1
    igb0: NVM V0.6 imgtype6
    igb0: Using 1024 TX descriptors and 1024 RX descriptors
    igb0: Using 4 RX queues 4 TX queues
    igb0: Using MSI-X interrupts with 5 vectors
    igb0: Ethernet address: 00:e0:67:30:6e:b0
    igb0: netmap queues/slots: TX 4/1024, RX 4/1024
    igb1: <Intel(R) I210 Flashless (Copper)> port 0xd000-0xd01f mem 0xb1400000-0xb141ffff,0xb1420000-0xb1423fff at device 0.0 on pci2
    igb1: NVM V0.6 imgtype6
    igb1: Using 1024 TX descriptors and 1024 RX descriptors
    igb1: Using 4 RX queues 4 TX queues
    igb1: Using MSI-X interrupts with 5 vectors
    igb1: Ethernet address: 00:e0:67:30:6e:b1
    igb1: netmap queues/slots: TX 4/1024, RX 4/1024
    igb2: <Intel(R) I210 Flashless (Copper)> port 0xc000-0xc01f mem 0xb1300000-0xb131ffff,0xb1320000-0xb1323fff at device 0.0 on pci3
    igb2: NVM V0.6 imgtype6
    igb2: Using 1024 TX descriptors and 1024 RX descriptors
    igb2: Using 4 RX queues 4 TX queues
    igb2: Using MSI-X interrupts with 5 vectors
    igb2: Ethernet address: 00:e0:67:30:6e:b2
    igb2: netmap queues/slots: TX 4/1024, RX 4/1024
    igb3: <Intel(R) I210 Flashless (Copper)> port 0xb000-0xb01f mem 0xb1200000-0xb121ffff,0xb1220000-0xb1223fff at device 0.0 on pci4
    igb3: NVM V0.6 imgtype6
    igb3: Using 1024 TX descriptors and 1024 RX descriptors
    igb3: Using 4 RX queues 4 TX queues
    igb3: Using MSI-X interrupts with 5 vectors
    igb3: Ethernet address: 00:e0:67:30:6e:b3
    igb3: netmap queues/slots: TX 4/1024, RX 4/1024
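
    Related to the single-core observation above: a simple way to see whether the I210's four queues are all actually getting work is to look at the per-vector interrupt counts (read-only; the irq numbers and queue names will differ per box):

    # Per-queue MSI-X interrupt counts for the LAN NIC
    vmstat -i | grep igb1
    # If one rxq counter dwarfs the others, most flows are hashing to a single queue
    # (and therefore a single if_io_tqg thread/core).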
    
    • T
      thetrevster @thetrevster
      last edited by Nov 2, 2023, 12:02 PM

      @thetrevster I should also mention: I've noticed throughput will hit over 850Mb/s, then fade down to around 600, ramp right back up to 850, then slowly decline to around 600 again, and that cycle repeats over and over throughout a test, if that's indicative of anything. Possibly the TX and RX descriptor adjustments mentioned above should be made?

      • S
        stephenw10 Netgate Administrator @thetrevster
        last edited by Nov 3, 2023, 12:48 AM

        @thetrevster said in Intel I210 low throughput w/ VLANs:

        I did notice while checking system activity that [kernel{if_io_tqg_3}] is only eating up a single core during these iPerf tests,

        Right, 34% total CPU use could still be 100% of one core on a 4-core CPU. Try checking the per-core usage, either at the command line using top -HaSP or in Diag > System Activity in the GUI.

        Are you sure the WAN will pass 1G up and down? You can try running iperf on pfSense directly and testing against it to confirm the LAN side is passing 1G.
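
        For example, something along these lines (a rough sketch; the addresses are placeholders, and it assumes iperf3 is available on the firewall, e.g. via the iperf package):

        # On pfSense (SSH shell or console), start an iperf3 server:
        iperf3 -s
        # From a client in one of the VLANs, test against that VLAN's pfSense gateway address:
        iperf3 -c 10.0.10.1 -P 8 -t 30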

        Steve

        • T
          thetrevster @stephenw10
          last edited by Nov 3, 2023, 4:03 AM

          @stephenw10 It does appear a single core is getting eaten, so I suppose the real question is: is VLAN tagging offloaded to the NICs, or is each VLAN processed single-threaded within pfSense? I am sure the WAN does 1Gb/s; when I plug directly into the ISP-provided fiber switch, I see a consistent 930Mb/s symmetrical. I did perform the local iPerf test. I ran two tests simultaneously to get better CPU utilization, and the combined throughput for both tests was 677Mb/s (332 + 347). CPU spiked to around 72%. Running a single iPerf test yielded an average of 674Mb/s with CPU at 61%. I would have thought that two iPerf tests running on separate cores could have hit over 900Mb/s combined. All iPerf tests mentioned were run with 8 streams (-P8).
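
          For reference, the two simultaneous runs were done roughly like this (the server name and ports are placeholders for whichever public iPerf server is used):

          # Two parallel iperf3 clients against different server ports; results added up afterwards
          iperf3 -c iperf.example.net -p 5201 -P 8 -R -t 30 &
          iperf3 -c iperf.example.net -p 5202 -P 8 -R -t 30 &
          wait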

          As for the LAN-to-pfSense test you mentioned, I performed that as well and saw results similar to the above. So this makes me think it's not a specific "side" but possibly something more like NIC queuing? VLANs would be out of the equation for the pfSense-to-Internet iPerf tests, since the WAN port is a straight L3 physical interface with no sub-interfaces/tagging. Is it possible I need to tweak something with the I210 NICs?
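
          For what it's worth, the current iflib queue/descriptor settings for a given igb interface can be listed with sysctl (read-only; zero values in the override_* entries just mean the driver defaults are in effect):

          # Show the iflib knobs for the LAN NIC
          sysctl dev.igb.1.iflib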

          • T
            thetrevster @thetrevster
            last edited by thetrevster Nov 3, 2023, 4:48 AM

            @thetrevster I have stumbled upon dev.igb.0.iflib.override_ntxds and dev.igb.0.iflib.override_nrxds. I'm wondering if those values need to be adjusted for each applicable igb interface in question. I was mainly looking at the content in post 15 here: https://hardforum.com/threads/pfsense-2-5-0-upgrade-results.2008073/
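
            If I do try it, my rough plan is something like the following in /boot/loader.conf.local followed by a reboot (the 4096 value and applying it to both igb0 and igb1 are just my assumptions based on that thread, not anything from the official docs):

            # Bump the TX/RX descriptor rings from the default 1024 to 4096 on WAN and LAN
            dev.igb.0.iflib.override_ntxds="4096"
            dev.igb.0.iflib.override_nrxds="4096"
            dev.igb.1.iflib.override_ntxds="4096"
            dev.igb.1.iflib.override_nrxds="4096"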

            • S
              stephenw10 Netgate Administrator
              last edited by Nov 3, 2023, 12:44 PM

              I wouldn't have expected VLANs to make any significant difference there.

              You can certainly try setting a different value for the queue descriptors. That's not normally required for igb though.

              • T
                thetrevster @stephenw10
                last edited by thetrevster Nov 16, 2023, 3:40 AM

                @stephenw10 So I'm definitely still having issues, but I've searched around and found the following forum post. It seems to line up almost exactly with what I'm running into, but it looks like the OP never found a solution…

                https://forum.netgate.com/topic/148800/throughput-expectations-on-celeron-igb-driver-system

                • S
                  stephenw10 Netgate Administrator
                  last edited by Nov 16, 2023, 12:47 PM

                  Ah, I was just about to suggest the same thing I did there. Is your CPU stuck at some low frequency mode?

                  Check: sysctl dev.cpu.0

                  What does the top -HaSP output actually look like when you are testing?
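
                  If it's easier you can also just watch the live clock while a test is running, for example with a quick loop like this (any equivalent works):

                  # Print the current CPU 0 frequency once per second during an iperf run
                  while true; do sysctl -n dev.cpu.0.freq; sleep 1; done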

                  • T
                    thetrevster @stephenw10
                    last edited by thetrevster Nov 17, 2023, 1:42 AM

                    @stephenw10 We might be getting somewhere... I'm not positive what I'm looking at, but the output of sysctl dev.cpu.0 is below. If I remember right, a reported frequency of 1601 indicates that Turbo Boost (or whatever it's called on this CPU) is allowed, with the upper limit apparently being 2GHz according to the freq_levels below.

                    dev.cpu.0.temperature: 34.0C
                    dev.cpu.0.coretemp.throttle_log: 0
                    dev.cpu.0.coretemp.tjmax: 90.0C
                    dev.cpu.0.coretemp.resolution: 1
                    dev.cpu.0.coretemp.delta: 56
                    dev.cpu.0.cx_method: C1/mwait/hwc C2/mwait/hwc C3/mwait/hwc
                    dev.cpu.0.cx_usage_counters: 1802411824 0 0
                    dev.cpu.0.cx_usage: 100.00% 0.00% 0.00% last 70us
                    dev.cpu.0.cx_lowest: C1
                    dev.cpu.0.cx_supported: C1/1/1 C2/2/500 C3/3/1000
                    dev.cpu.0.freq_levels: 1601/2000 1600/2000 1520/1900 1440/1800 1360/1700 1280/1600 1200/1500 1120/1400 1040/1300 960/1200 880/1100 800/1000 720/900 640/800 560/700 480/600
                    dev.cpu.0.freq: 1601
                    dev.cpu.0.%parent: acpi0
                    dev.cpu.0.%pnpinfo: _HID=none _UID=0 _CID=none
                    dev.cpu.0.%location: handle=\_PR_.CPU0
                    dev.cpu.0.%driver: cpu
                    dev.cpu.0.%desc: ACPI CPU
                    

                    I did some further iPerf testing (iperf3 -c speedtest.sea11.us.leaseweb.net -p 5201-5210 -P4 -R -t 60) against a local server here in Seattle, directly from the pfSense CLI, and was able to see 940Mb/s, which is great! During that test the dashboard showed ~64% CPU utilization, and your top command showed the output below:

                    last pid: 36502;  load averages:  1.21,  0.71,  0.55                                                         up 14+13:49:52  17:35:04
                    307 threads:   8 running, 281 sleeping, 18 waiting
                    CPU 0:  0.8% user,  0.0% nice, 27.6% system, 15.4% interrupt, 56.3% idle
                    CPU 1:  1.6% user,  0.0% nice, 40.9% system,  9.4% interrupt, 48.0% idle
                    CPU 2:  0.8% user,  0.0% nice, 49.6% system, 29.9% interrupt, 19.7% idle
                    CPU 3:  3.9% user,  0.0% nice, 31.9% system, 13.8% interrupt, 50.4% idle
                    Mem: 40M Active, 269M Inact, 490M Wired, 56K Buf, 6991M Free
                    ARC: 130M Total, 19M MFU, 104M MRU, 324K Anon, 717K Header, 6000K Other
                         100M Compressed, 259M Uncompressed, 2.58:1 Ratio
                    Swap: 1024M Total, 1024M Free
                    
                      PID USERNAME    PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
                    35513 tstrotz     109    0    17M  7300K CPU1     1   0:07  74.43% iperf3 -c speedtest.sea11.us.leaseweb.net -p 5201-5210 -P4 -R -t 6
                       11 root        187 ki31     0B    64K CPU0     0 330.4H  54.34% [idle{idle: cpu0}]
                       11 root        187 ki31     0B    64K RUN      3 329.2H  51.07% [idle{idle: cpu3}]
                       11 root        187 ki31     0B    64K RUN      1 330.6H  48.84% [idle{idle: cpu1}]
                       12 root        -56    -     0B   240K RUN      2 106:00  46.99% [intr{swi1: netisr 1}]
                        0 root        -60    -     0B  1488K CPU2     2 592:29  43.96% [kernel{if_io_tqg_2}]
                        0 root        -60    -     0B  1488K -        1 563:46  30.88% [kernel{if_io_tqg_1}]
                       11 root        187 ki31     0B    64K RUN      2 331.6H  20.12% [idle{idle: cpu2}]
                       12 root        -60    -     0B   240K WAIT     3 118:31  15.05% [intr{swi1: netisr 2}]
                        0 root        -60    -     0B  1488K -        3 751:32   4.99% [kernel{if_io_tqg_3}]
                        0 root        -60    -     0B  1488K -        0 568:48   3.88% [kernel{if_io_tqg_0}]
                       12 root        -60    -     0B   240K WAIT     0 401:39   2.31% [intr{swi1: netisr 3}]
                       12 root        -60    -     0B   240K WAIT     1 413:23   2.22% [intr{swi1: netisr 0}]
                        0 root        -64    -     0B  1488K -        0  60:28   0.40% [kernel{dummynet}]
                    75759 tstrotz      20    0    14M  4384K CPU3     3   0:01   0.16% top -HaSP
                        7 root        -16    -     0B    16K pftm     3  12:06   0.12% [pf purge]
                        0 root        -60    -     0B  1488K -        1  14:18   0.06% [kernel{if_config_tqg_0}]
                        8 root        -16    -     0B    16K -        1   8:27   0.05% [rand_harvestq]
                    18647 dhcpd        20    0    25M    12M select   0   5:54   0.03% /usr/local/sbin/dhcpd -user dhcpd -group _dhcp -chroot /var/dhcpd
                    

                    Testing through a tagged VLAN via the MikroTik, on the other hand, with the exact same iPerf command, I get the output below and a max of ~650Mb/s. Maybe it's the MikroTik switch? I can schedule time to go to the site, test a tagged VLAN directly out of the pfSense box itself, and specify the tag on my laptop NIC if needed (a rough sketch of that laptop-side setup is after the output).

                    last pid:  1926;  load averages:  1.15,  0.97,  0.72                                                         up 14+13:54:27  17:39:39
                    306 threads:   7 running, 280 sleeping, 19 waiting
                    CPU 0:  2.8% user,  0.0% nice,  5.9% system, 22.8% interrupt, 68.5% idle
                    CPU 1:  0.8% user,  0.0% nice, 33.5% system,  8.7% interrupt, 57.1% idle
                    CPU 2:  2.0% user,  0.0% nice, 14.2% system,  6.3% interrupt, 77.6% idle
                    CPU 3:  0.8% user,  0.0% nice, 64.2% system,  1.2% interrupt, 33.9% idle
                    Mem: 39M Active, 268M Inact, 490M Wired, 56K Buf, 6992M Free
                    ARC: 129M Total, 20M MFU, 102M MRU, 464K Anon, 717K Header, 5961K Other
                         100M Compressed, 259M Uncompressed, 2.59:1 Ratio
                    Swap: 1024M Total, 1024M Free
                    
                      PID USERNAME    PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
                       11 root        187 ki31     0B    64K RUN      2 331.7H  75.74% [idle{idle: cpu2}]
                       11 root        187 ki31     0B    64K CPU0     0 330.5H  67.53% [idle{idle: cpu0}]
                        0 root        -60    -     0B  1488K CPU3     3 752:18  62.08% [kernel{if_io_tqg_3}]
                       11 root        187 ki31     0B    64K CPU1     1 330.7H  59.62% [idle{idle: cpu1}]
                       11 root        187 ki31     0B    64K RUN      3 329.3H  34.97% [idle{idle: cpu3}]
                        0 root        -60    -     0B  1488K CPU1     1 564:02  29.40% [kernel{if_io_tqg_1}]
                       12 root        -60    -     0B   240K WAIT     0 401:56  26.67% [intr{swi1: netisr 3}]
                        0 root        -60    -     0B  1488K -        2 593:03  17.13% [kernel{if_io_tqg_2}]
                       12 root        -60    -     0B   240K WAIT     1 106:14   6.13% [intr{swi1: netisr 1}]
                        0 root        -60    -     0B  1488K -        0 569:12   5.26% [kernel{if_io_tqg_0}]
                    28137 root         20    0    32M    11M kqread   3  42:21   4.38% nginx: worker process (nginx)
                       12 root        -60    -     0B   240K WAIT     0 413:36   3.34% [intr{swi1: netisr 0}]
                       12 root        -60    -     0B   240K WAIT     0 118:39   2.47% [intr{swi1: netisr 2}]
                    20020 root         20    0   145M    56M accept   2   1:39   1.85% php-fpm: pool nginx (php-fpm)
                    32572 root         20    0   149M    52M accept   2   2:00   1.76% php-fpm: pool nginx (php-fpm){php-fpm}
                        0 root        -64    -     0B  1488K -        0  60:29   0.33% [kernel{dummynet}]
                    75759 tstrotz      20    0    14M  4384K CPU2     2   0:01   0.15% top -HaSP
                    74879 root         20    0    13M  3000K select   1   6:39   0.10% /usr/sbin/syslogd -s -c -c -l /var/dhcpd/var/run/log -P /var/run/s
                    96522 root         20    0   107M    25M uwait    1   0:14   0.07% /usr/local/libexec/ipsec/charon --use-syslog{charon}
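
                    For that on-site test, what I have in mind on the laptop is roughly the following (assuming a Linux laptop; the interface name, VLAN ID and addresses are placeholders for whatever the /30 actually uses):

                    # Tag the VLAN on the laptop NIC and run iperf3 straight against the pfSense VLAN interface
                    ip link add link eth0 name eth0.100 type vlan id 100
                    ip addr add 203.0.113.2/30 dev eth0.100
                    ip link set eth0.100 up
                    iperf3 -c 203.0.113.1 -P 8 -R -t 30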
                    
                    • S
                      stephenw10 Netgate Administrator
                      last edited by Nov 17, 2023, 11:46 AM

                      Mmm, that does seem suspicious. I would normally expect a higher result when testing from a client behind the firewall. Unless that client itself is restricted.

                      You can see that in both cases no single core is maxed out. But when testing from the firewall directly, the load created by iperf itself is larger than anything else.
