Netgate Discussion Forum

    Performance tuning for 10Gb connection

    Hardware | 20 Posts, 5 Posters, 4.7k Views
    • stephenw10 (Netgate Administrator):

      If you hit q during the test you can copy/paste the output here.
      Really depends what those processes are.
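
      For reference, the invocation in question (it's the one visible in the output below) is:

          top -aSH

      (-a shows full command lines, -S includes system processes, -H lists individual threads.)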

      Steve

      • dustintuft:

        last pid: 37455;  load averages:  0.46,  0.22,  0.12                                                                                                                            up 5+01:50:00  19:02:45
        225 processes: 9 running, 162 sleeping, 54 waiting
        CPU:  0.1% user,  0.0% nice,  0.0% system,  5.1% interrupt, 94.8% idle
        Mem: 39M Active, 126M Inact, 453M Wired, 113M Buf, 15G Free
        Swap: 4096M Total, 4096M Free
        
          PID USERNAME   PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
           11 root       155 ki31     0K   128K RUN     7 121.4H  99.51% [idle{idle: cpu7}]
           11 root       155 ki31     0K   128K CPU5    5 121.4H  99.27% [idle{idle: cpu5}]
           11 root       155 ki31     0K   128K CPU3    3 121.4H  97.10% [idle{idle: cpu3}]
           11 root       155 ki31     0K   128K CPU1    1 121.4H  96.79% [idle{idle: cpu1}]
           11 root       155 ki31     0K   128K CPU4    4 121.2H  93.31% [idle{idle: cpu4}]
           11 root       155 ki31     0K   128K CPU0    0 121.3H  92.33% [idle{idle: cpu0}]
           11 root       155 ki31     0K   128K CPU6    6 121.3H  88.53% [idle{idle: cpu6}]
           11 root       155 ki31     0K   128K CPU2    2 121.2H  87.83% [idle{idle: cpu2}]
           12 root       -92    -     0K   864K WAIT    6   2:49  10.92% [intr{irq268: t5nex0:0a2}]
           12 root       -92    -     0K   864K WAIT    5   6:20   5.94% [intr{irq266: t5nex0:0a0}]
           12 root       -92    -     0K   864K WAIT    0   2:31   5.81% [intr{irq269: t5nex0:0a3}]
           12 root       -92    -     0K   864K WAIT    4   3:34   5.51% [intr{irq267: t5nex0:0a1}]
           12 root       -92    -     0K   864K WAIT    2   2:58   5.49% [intr{irq270: t5nex0:0a4}]
           12 root       -92    -     0K   864K WAIT    0   1:20   1.57% [intr{irq277: t5nex0:1a3}]
           12 root       -92    -     0K   864K WAIT    5   1:30   1.53% [intr{irq278: t5nex0:1a4}]
           12 root       -92    -     0K   864K WAIT    6   1:28   0.81% [intr{irq276: t5nex0:1a2}]
           12 root       -92    -     0K   864K WAIT    4   1:38   0.76% [intr{irq279: t5nex0:1a5}]
        95341 root        52    0 92832K 35520K accept  0   0:21   0.29% php-fpm: pool nginx (php-fpm)
           12 root       -92    -     0K   864K WAIT    0   3:16   0.16% [intr{irq273: t5nex0:0a7}]
        85715 unbound     20    0 87952K 45620K kqread  2   0:14   0.06% /usr/local/sbin/unbound -c /var/unbound/unbound.conf{unbound}
         4510 dustin      20    0  9860K  4660K CPU7    7   0:11   0.04% top -aSH
           12 root       -92    -     0K   864K WAIT    0   1:27   0.03% [intr{irq281: t5nex0:1a7}]
           12 root       -92    -     0K   864K WAIT    4   3:04   0.03% [intr{irq271: t5nex0:0a5}]
        85715 unbound     20    0 87952K 45620K kqread  3   0:08   0.03% /usr/local/sbin/unbound -c /var/unbound/unbound.conf{unbound}
        85715 unbound     20    0 87952K 45620K kqread  5   0:10   0.02% /usr/local/sbin/unbound -c /var/unbound/unbound.conf{unbound}
            0 root       -92    -     0K   656K -       5   0:35   0.02% [kernel{t5nex0 tq1}]
        85715 unbound     20    0 87952K 45620K kqread  1   0:06   0.02% /usr/local/sbin/unbound -c /var/unbound/unbound.conf{unbound}
           12 root       -92    -     0K   864K WAIT    2   1:35   0.02% [intr{irq280: t5nex0:1a6}]
           12 root       -92    -     0K   864K WAIT    2   2:03   0.02% [intr{irq274: t5nex0:1a0}]
           12 root       -92    -     0K   864K WAIT    4   1:54   0.02% [intr{irq275: t5nex0:1a1}]
           12 root       -92    -     0K   864K WAIT    6   2:50   0.02% [intr{irq272: t5nex0:0a6}]
           12 root       -60    -     0K   864K WAIT    6   4:26   0.02% [intr{swi4: clock (0)}]
        85715 unbound     20    0 87952K 45620K kqread  6   0:08   0.01% /usr/local/sbin/unbound -c /var/unbound/unbound.conf{unbound}
            0 root       -92    -     0K   656K -       7   0:31   0.01% [kernel{t5nex0 tq0}]
        85715 unbound     20    0 87952K 45620K kqread  2   0:07   0.01% /usr/local/sbin/unbound -c /var/unbound/unbound.conf{unbound}
           23 root       -16    -     0K    16K pftm    0   1:25   0.01% [pf purge]
        50367 root        20    0 23592K  9432K kqread  4   1:08   0.01% nginx: worker process (nginx)
        85715 unbound     20    0 87952K 45620K kqread  7   0:06   0.01% /usr/local/sbin/unbound -c /var/unbound/unbound.conf{unbound}
        98386 root        20    0  9464K  5788K select  6   0:10   0.01% /usr/local/sbin/miniupnpd -f /var/etc/miniupnpd.conf -P /var/run/miniupnpd.pid
        51067 root        20    0 12396K 12500K select  2   0:19   0.01% /usr/local/sbin/ntpd -g -c /var/etc/ntpd.conf -p /var/run/ntpd.pid{ntpd}
        85715 unbound     20    0 87952K 45620K kqread  4   0:03   0.01% /usr/local/sbin/unbound -c /var/unbound/unbound.conf{unbound}
        98478 root        20    0  6288K  2108K select  5   1:25   0.00% /usr/sbin/powerd -b hadp -a hadp -n hadp
           24 root       -16    -     0K    16K -       0   0:42   0.00% [rand_harvestq]
        35774 dustin      20    0 12672K  7936K select  6   0:01   0.00% sshd: dustin@pts/0 (sshd)
        25388 root        20    0 10996K  2408K nanslp  7   0:18   0.00% [dpinger{dpinger}]
        91815 dhcpd       20    0 12580K  8616K select  3   0:08   0.00% /usr/local/sbin/dhcpd -user dhcpd -group _dhcp -chroot /var/dhcpd -cf /etc/dhcpd.conf -pf /var/run/dhcpd.pid cxl1
           25 root       -16    -     0K    16K tzpoll  5   0:06   0.00% [acpi_thermal]
        11268 root        20    0  6400K  2560K select  1   0:24   0.00% /usr/sbin/syslogd -s -c -c -l /var/dhcpd/var/run/log -P /var/run/syslog.pid -f /etc/syslog.conf
        25388 root        20    0 10996K  2408K sbwait  1   0:09   0.00% [dpinger{dpinger}]
           12 root       -72    -     0K   864K WAIT    4   0:01   0.00% [intr{swi1: netisr 0}]
          356 root        20    0 88604K 22628K kqread  5   0:07   0.00% php-fpm: master process (/usr/local/lib/php-fpm.conf) (php-fpm)
           12 root       -88    -     0K   864K WAIT    4   0:04   0.00% [intr{irq283: xhci0}]
        
        • stephenw10 (Netgate Administrator):

          Hmm, that was whilst passing 4Gbps?

          Definitely not a CPU issue then, hardly having to work at all there! In fact suspiciously low.

          Steve

          • dustintuft:

            Yes, that was about mid-test on the download leg, but it stays pretty steady through both tests.

            • tman222:

              Hi @dustintuft - I have also done some testing at 10Gbit recently; here is a thread on that topic:

              https://forum.netgate.com/topic/132394/10gbit-performance-testing

              Are you getting 10Gbit locally on your network? The hardware you have running pfSense should be powerful enough to get 10Gbit speeds across the firewall. It might be interesting to run some iperf3 tests and drop the MTU to decrease the packet size, to get an idea of how many PPS your hardware can handle (this will be the limiting factor).
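
              A rough sketch of that kind of test, with placeholder addresses (start iperf3 -s on the receiving host first):

                  iperf3 -c 10.0.0.2 -P 4 -t 30        # baseline TCP throughput, 4 parallel streams
                  iperf3 -c 10.0.0.2 -u -b 0 -l 512    # UDP with small 512-byte payloads to stress PPS rather than raw bandwidth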

              Also, here is a helpful link for 10Gbit tuning:
              https://fasterdata.es.net/host-tuning/

              Finally, do you have another host (or hosts) you can test the throughput with, or just the Mac?

              Hope this helps.

              1 Reply Last reply Reply Quote 0
              • stephenw10S
                stephenw10 Netgate Administrator
                last edited by

                The problem with that, or with this one, say: https://calomel.org/freebsd_network_tuning.html

                is that those are settings meant for hosts. Most of that tuning only applies to TCP endpoints, not directly to a router/firewall.

                Some of what's here is more directly applicable:
                https://calomel.org/network_performance.html
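
                For a router-side view while a test is running, one quick check is the per-interface packet rate (cxl0 here assumes the Chelsio port naming seen elsewhere in this thread):

                    netstat -I cxl0 -w 1    # per-second in/out packets, errors, and drops on cxl0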

                Steve

                • tman222 @stephenw10:

                  @stephenw10 - good point and you are right. I mainly pointed that out in case the hosts themselves needed some tuning. At this point it's still unclear to me whether the issue is with the firewall (i.e. it can't process enough PPS) or one of the hosts.

                  Having said that, there is some tuning that can be done on the Chelsio itself. Please see this page:

                  https://www.freebsd.org/cgi/man.cgi?query=cxgbe&sektion=4&manpath=freebsd-release-ports

                  IMHO, parameters worth tweaking to start with include the sizes of the RX and TX queues and disabling flow control (Pause Settings).
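
                  For instance, a minimal sketch of what that might look like in /boot/loader.conf.local, assuming the tunable names from that man page (verify names and defaults against your FreeBSD version):

                      hw.cxgbe.qsize_rxq=2048      # descriptors per RX queue (default 1024)
                      hw.cxgbe.qsize_txq=2048      # descriptors per TX queue (default 1024)
                      hw.cxgbe.pause_settings=0    # disable PAUSE-frame flow control in both directions

                  These are loader tunables, so a reboot is needed for them to take effect.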

                  Hope this helps.

                  • stephenw10 (Netgate Administrator):

                    Yes, let's see the output of netstat -m and sysctl hw.cxgbe.

                    We might also need dev.cxl and dev.t5nex if they have individual settings applied, or to check for dropped packets etc.

                    Steve

                    • dustintuft:

                      netstat -m

                      Before running speed test:
                      [2.4.4-RELEASE][dustin@tuftfw.--.net]/home/dustin: netstat -m
                      4100/6535/10635 mbufs in use (current/cache/total)
                      0/3056/3056/2000000 mbuf clusters in use (current/cache/total/max)
                      0/3036 mbuf+clusters out of packet secondary zone in use (current/cache)
                      16256/453/16709/505392 4k (page size) jumbo clusters in use (current/cache/total/max)
                      0/0/0/149746 9k jumbo clusters in use (current/cache/total/max)
                      0/0/0/84232 16k jumbo clusters in use (current/cache/total/max)
                      66049K/9557K/75606K bytes allocated to network (current/cache/total)
                      0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
                      0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
                      0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
                      0/0/0 requests for jumbo clusters denied (4k/9k/16k)
                      0 sendfile syscalls
                      0 sendfile syscalls completed without I/O request
                      0 requests for I/O initiated by sendfile
                      0 pages read by sendfile as part of a request
                      0 pages were valid at time of a sendfile request
                      0 pages were requested for read ahead by applications
                      0 pages were read ahead by sendfile
                      0 times sendfile encountered an already busy page
                      0 requests for sfbufs denied
                      0 requests for sfbufs delayed
                      
                      During Speed Test
                      [2.4.4-RELEASE][dustin@tuftfw.--.net]/home/dustin: netstat -m
                      4745/5890/10635 mbufs in use (current/cache/total)
                      0/3056/3056/2000000 mbuf clusters in use (current/cache/total/max)
                      0/3036 mbuf+clusters out of packet secondary zone in use (current/cache)
                      16525/464/16989/505392 4k (page size) jumbo clusters in use (current/cache/total/max)
                      0/0/0/149746 9k jumbo clusters in use (current/cache/total/max)
                      0/0/0/84232 16k jumbo clusters in use (current/cache/total/max)
                      67286K/9440K/76726K bytes allocated to network (current/cache/total)
                      0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
                      0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
                      0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
                      0/0/0 requests for jumbo clusters denied (4k/9k/16k)
                      0 sendfile syscalls
                      0 sendfile syscalls completed without I/O request
                      0 requests for I/O initiated by sendfile
                      0 pages read by sendfile as part of a request
                      0 pages were valid at time of a sendfile request
                      0 pages were requested for read ahead by applications
                      0 pages were read ahead by sendfile
                      0 times sendfile encountered an already busy page
                      0 requests for sfbufs denied
                      0 requests for sfbufs delayed
                      [2.4.4-RELEASE][dustin@tuftfw.--.net]/home/dustin: netstat -m
                      4111/6524/10635 mbufs in use (current/cache/total)
                      0/3056/3056/2000000 mbuf clusters in use (current/cache/total/max)
                      0/3036 mbuf+clusters out of packet secondary zone in use (current/cache)
                      16256/761/17017/505392 4k (page size) jumbo clusters in use (current/cache/total/max)
                      0/0/0/149746 9k jumbo clusters in use (current/cache/total/max)
                      0/0/0/84232 16k jumbo clusters in use (current/cache/total/max)
                      66051K/10787K/76838K bytes allocated to network (current/cache/total)
                      0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
                      0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
                      0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
                      0/0/0 requests for jumbo clusters denied (4k/9k/16k)
                      0 sendfile syscalls
                      0 sendfile syscalls completed without I/O request
                      0 requests for I/O initiated by sendfile
                      0 pages read by sendfile as part of a request
                      0 pages were valid at time of a sendfile request
                      0 pages were requested for read ahead by applications
                      0 pages were read ahead by sendfile
                      0 times sendfile encountered an already busy page
                      0 requests for sfbufs denied
                      0 requests for sfbufs delayed
                      [2.4.4-RELEASE][dustin@tuftfw.--.net]/home/dustin: netstat -m
                      4151/6484/10635 mbufs in use (current/cache/total)
                      0/3056/3056/2000000 mbuf clusters in use (current/cache/total/max)
                      0/3036 mbuf+clusters out of packet secondary zone in use (current/cache)
                      16264/753/17017/505392 4k (page size) jumbo clusters in use (current/cache/total/max)
                      0/0/0/149746 9k jumbo clusters in use (current/cache/total/max)
                      0/0/0/84232 16k jumbo clusters in use (current/cache/total/max)
                      66093K/10745K/76838K bytes allocated to network (current/cache/total)
                      0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
                      0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
                      0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
                      0/0/0 requests for jumbo clusters denied (4k/9k/16k)
                      0 sendfile syscalls
                      0 sendfile syscalls completed without I/O request
                      0 requests for I/O initiated by sendfile
                      0 pages read by sendfile as part of a request
                      0 pages were valid at time of a sendfile request
                      0 pages were requested for read ahead by applications
                      0 pages were read ahead by sendfile
                      0 times sendfile encountered an already busy page
                      0 requests for sfbufs denied
                      0 requests for sfbufs delayed
                      
                      [2.4.4-RELEASE][dustin@tuftfw.--.net]/home/dustin: sysctl hw.cxgbe
                      hw.cxgbe.nm_holdoff_tmr_idx: 2
                      hw.cxgbe.nm_rx_ndesc: 256
                      hw.cxgbe.nm_black_hole: 0
                      
                      [2.4.4-RELEASE][dustin@tuftfw.--.net]/home/dustin: sysctl dev.t5nex
                      dev.t5nex.0.mgmtq.tx_wrs_sspace: 0
                      dev.t5nex.0.mgmtq.tx_wrs_copied: 0
                      dev.t5nex.0.mgmtq.tx_wrs_direct: 0
                      dev.t5nex.0.mgmtq.sidx: 127
                      dev.t5nex.0.mgmtq.pidx: 0
                      dev.t5nex.0.mgmtq.cidx: 0
                      dev.t5nex.0.mgmtq.cntxt_id: 32
                      dev.t5nex.0.mgmtq.dmalen: 8192
                      dev.t5nex.0.mgmtq.ba: 205512704
                      dev.t5nex.0.fwq.cidx: 4
                      dev.t5nex.0.fwq.cntxt_id: 16
                      dev.t5nex.0.fwq.abs_id: 16
                      dev.t5nex.0.fwq.dmalen: 16384
                      dev.t5nex.0.fwq.ba: 1095041024
                      dev.t5nex.0.core_vdd: 1014
                      dev.t5nex.0.temperature: 47
                      dev.t5nex.0.nfilters: 1008
                      dev.t5nex.0.fcoecaps: 0
                      dev.t5nex.0.cryptocaps: 0
                      dev.t5nex.0.iscsicaps: 0
                      dev.t5nex.0.rdmacaps: 0
                      dev.t5nex.0.toecaps: 0
                      dev.t5nex.0.niccaps: 1<NIC>
                      dev.t5nex.0.switchcaps: 3<INGRESS,EGRESS>
                      dev.t5nex.0.linkcaps: 0
                      dev.t5nex.0.nbmcaps: 0
                      dev.t5nex.0.cfcsum: 2311601560
                      dev.t5nex.0.cf: default
                      dev.t5nex.0.vpd_version: 1
                      dev.t5nex.0.scfg_version: 16814080
                      dev.t5nex.0.bs_version: 1.1.0.0
                      dev.t5nex.0.er_version: 1.0.0.90
                      dev.t5nex.0.na: 0007434A1920
                      dev.t5nex.0.md_version: t4d-0.0.0
                      dev.t5nex.0.ec: 0000000000000000
                      dev.t5nex.0.pn: 110118850A0
                      dev.t5nex.0.sn: PT31180920
                      dev.t5nex.0.hw_revision: 1
                      dev.t5nex.0.firmware_version: 1.19.1.0
                      dev.t5nex.0.tp_version: 0.1.4.9
                      dev.t5nex.0.dflags: 0
                      dev.t5nex.0.lro_timeout: 100
                      dev.t5nex.0.fl_pack: 128
                      dev.t5nex.0.cong_drop: 0
                      dev.t5nex.0.spg_len: 64
                      dev.t5nex.0.fl_pad: 32
                      dev.t5nex.0.fl_pktshift: 2
                      dev.t5nex.0.buffer_sizes: 2048* 4096* 3968* 3456* 9216* 16384* 1664* 9088* 16256* 0 0 0 0 0 0 0
                      dev.t5nex.0.holdoff_pkt_counts: 1 8 16 32
                      dev.t5nex.0.holdoff_timers: 1 5 10 50 100 200
                      dev.t5nex.0.core_clock: 250000
                      dev.t5nex.0.doorbells: 9<UDB,KDB>
                      dev.t5nex.0.nports: 2
                      dev.t5nex.0.do_rx_copy: 1
                      dev.t5nex.0.%parent: pci1
                      dev.t5nex.0.%pnpinfo: vendor=0x1425 device=0x5407 subvendor=0x1425 subdevice=0x0000 class=0x020000
                      dev.t5nex.0.%location: slot=0 function=4 dbsf=pci0:1:0:4
                      dev.t5nex.0.%driver: t5nex
                      dev.t5nex.0.%desc: Chelsio T520-SO
                      dev.t5nex.%parent:
                      

                      The dev.cxl output was really long and scrolled out of my buffer. How do I save it to a file?

                      Thanks!
                      -Dustin T

                      • stephenw10 (Netgate Administrator):

                        sysctl dev.cxl > /root/cxlsysctls.txt will do it.
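
                        If you then want the file off the firewall, something like this run from another machine will fetch it (user and host are placeholders):

                            scp dustin@tuftfw:/root/cxlsysctls.txt .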

                        • dustintuft:

                          Thank you!
                          [attachment: cxlsysctls.txt]
                          -Dustin T

                          • tman222:

                            Hi @dustintuft - before trying to tweak parameters on the firewall, can you confirm that you can move 10Gbit between hosts locally on your network (e.g. by running something like an iperf3 test)? It would be good to know this info before troubleshooting further.

                            • mountainlion:

                              Doing a POC for a Solutions Provider biz. @dustintuft, is your lack of response because you fixed the issue and forgot to update the thread (happens to me), or because you figured out the testing gear wasn't getting to 10G either? Please keep this party going...

                              • dustintuft @mountainlion:

                                @mountainlion I ran into a wall. I have run iperf tests that show my switch and cards can achieve 9.8Gb, but that is server to server; these are Dell R710s with Intel 10Gb dual-port SFP+ cards. I can't get those kinds of speeds through the firewall, or through the firewall to a speed test server local to the public side of the firewall. I am still trying to tweak the card drivers, but I am very unfamiliar with Linux drivers, so I am afraid I am not making much progress.

                                • mountainlion:

                                  So, just to drill down a bit more: are you saying you ran iperf from the pfSense OS via the CLI to another server and got 9.8G? But when you ran through the FW, and presumably from LAN to WAN, you got the 4-6G?

                                  Or are you saying you had separate servers set up to validate that the network underlay can support 10G, but all things pfSense come up short?
                                  If the latter, then perhaps a feature enhancement request?

                                  • dustintuft:

                                    Yeah, the latter is correct. I have been messing with this some more this weekend, and I think I have at least got to a state where I am happy with the results.

                                    Tweaking the BIOS and disabling hyper-threading seems to have given me the performance I was looking for. Here are my current speed test results. Keep in mind the speed test server is local to my public side, and it's 10Gb, so I am not expecting to ever see a solid 10 result.
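
                                    As an aside, a possible software-side alternative to the BIOS toggle, assuming FreeBSD's machdep hyper-threading loader tunable is present on your release:

                                        machdep.hyperthreading_allowed=0    # /boot/loader.conf.local: don't schedule on HT logical CPUs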

                                    [image: speed test result]

                                    If I get more time I might dive into some of the driver tweaks suggested, but I am not sure that will end well since I am a novice at best.

                                    • q54e3w:

                                      I'd be interested to see a more comprehensive breakdown of the tweaks you made and before/after results. I tweaked my old C2758 setup and am about to embark on the same on my new system. I'm specifically interested in the difference between hyper-threading on/off.
