Performance tuning for a 10Gb connection



  • Hi All,

    I am having a hard time finding a good guide to help me get the most performance out of my setup. At best I am getting about 4Gbps of throughput. So is it my hardware? Is there a way to tweak the config in pfSense so I can achieve above 5Gbps?

    My current hardware is an ASUS mini-ITX motherboard with an Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz, 16GB of RAM, and a 250GB M.2 NVMe drive. I am running a Chelsio T520-SO-CR dual-port card in the only PCIe x16 slot the board has.

    I am on a fiber network for my internet, running 10Gb service, with an Extreme Networks 670x 48-port SFP+ switch connecting everything. I get a multimode dual-LC fiber handoff from the Juniper CPE into the switch; that goes into the pfSense box and then back out on the second port into my switch. The switch is configured with four or so VLANs, one being my public side, where I connect the gear from my ISP, the pfSense box, and various servers. Then I have a private VLAN for the 192. subnet; my desktop sits on that network with a Mellanox 10Gb card. So in theory I should be able to get 10Gbps from my desktop out to my ISP's speed test server, or at least 7Gbps, maybe 8Gbps.

    I am fairly new to pfSense, so please don't assume I have a solid grasp of proper setup.

    Thanks!
    -dustin


  • Netgate Administrator

    How are you testing that?

    Try running top -aSH at the command line during the test to see how the load is distributing across the cores.

    I would expect to see more than that with that hardware if you don't have a huge number of rules or packages running.

    You might try this initially:
    https://www.netgate.com/docs/pfsense/hardware/tuning-and-troubleshooting-network-cards.html?highlight=tuning#ip-input-queue-intr-queue
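
    For example, a quick check from the shell (a sketch; the value to use depends on your traffic, and the setting can be persisted under System > Advanced > System Tunables):

    sysctl net.inet.ip.intr_queue_maxlen        # current IP input queue length
    sysctl net.inet.ip.intr_queue_drops         # non-zero here means the queue is overflowing
    sysctl net.inet.ip.intr_queue_maxlen=2000   # try a larger queue on the running system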

    Steve



  • Hi Steve,

    I am using Speedtest.net; I have set up an Ookla server on the public side of my network. From work I have been able to get speeds up to 6Gbps directly to the speed test server from an iMac Pro (the one with the 10GbT port). However, that traffic is pushing over a really busy NNI in our head room, and since the port is T-based it has to convert from copper to fiber, so that might be the best I can get coming from that direction.

    I am not sure I am reading this correctly, but the top eight line items that have CPUx in the STATE column all stay in the high 90s, with two dropping to the low 70s.

    I tested bumping my IP Input Queue from 1000 to 2000, then to 3000; is there a limit here? Also, I noticed that I didn't have an entry for mbufs, so I put one in at 1000000 and then tested with 2000000.
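
    For reference, the values can be checked from the shell (assuming the mbuf entry maps to kern.ipc.nmbclusters, which is the "max" figure that netstat -m reports):

    sysctl kern.ipc.nmbclusters             # should show the configured limit, e.g. 2000000
    netstat -m | grep "mbuf clusters"       # current/cache/total/max cluster usage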

    Between all the changes I saw some small gains in speed on the speed tests, but it's still about 4.1 to 4.3Gbps.

    -dustin


  • Netgate Administrator

    If you hit q during the test you can copy/paste the output here.
    It really depends on what those processes are.

    Steve



  • last pid: 37455;  load averages:  0.46,  0.22,  0.12                                                                                                                            up 5+01:50:00  19:02:45
    225 processes: 9 running, 162 sleeping, 54 waiting
    CPU:  0.1% user,  0.0% nice,  0.0% system,  5.1% interrupt, 94.8% idle
    Mem: 39M Active, 126M Inact, 453M Wired, 113M Buf, 15G Free
    Swap: 4096M Total, 4096M Free
    
      PID USERNAME   PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
       11 root       155 ki31     0K   128K RUN     7 121.4H  99.51% [idle{idle: cpu7}]
       11 root       155 ki31     0K   128K CPU5    5 121.4H  99.27% [idle{idle: cpu5}]
       11 root       155 ki31     0K   128K CPU3    3 121.4H  97.10% [idle{idle: cpu3}]
       11 root       155 ki31     0K   128K CPU1    1 121.4H  96.79% [idle{idle: cpu1}]
       11 root       155 ki31     0K   128K CPU4    4 121.2H  93.31% [idle{idle: cpu4}]
       11 root       155 ki31     0K   128K CPU0    0 121.3H  92.33% [idle{idle: cpu0}]
       11 root       155 ki31     0K   128K CPU6    6 121.3H  88.53% [idle{idle: cpu6}]
       11 root       155 ki31     0K   128K CPU2    2 121.2H  87.83% [idle{idle: cpu2}]
       12 root       -92    -     0K   864K WAIT    6   2:49  10.92% [intr{irq268: t5nex0:0a2}]
       12 root       -92    -     0K   864K WAIT    5   6:20   5.94% [intr{irq266: t5nex0:0a0}]
       12 root       -92    -     0K   864K WAIT    0   2:31   5.81% [intr{irq269: t5nex0:0a3}]
       12 root       -92    -     0K   864K WAIT    4   3:34   5.51% [intr{irq267: t5nex0:0a1}]
       12 root       -92    -     0K   864K WAIT    2   2:58   5.49% [intr{irq270: t5nex0:0a4}]
       12 root       -92    -     0K   864K WAIT    0   1:20   1.57% [intr{irq277: t5nex0:1a3}]
       12 root       -92    -     0K   864K WAIT    5   1:30   1.53% [intr{irq278: t5nex0:1a4}]
       12 root       -92    -     0K   864K WAIT    6   1:28   0.81% [intr{irq276: t5nex0:1a2}]
       12 root       -92    -     0K   864K WAIT    4   1:38   0.76% [intr{irq279: t5nex0:1a5}]
    95341 root        52    0 92832K 35520K accept  0   0:21   0.29% php-fpm: pool nginx (php-fpm)
       12 root       -92    -     0K   864K WAIT    0   3:16   0.16% [intr{irq273: t5nex0:0a7}]
    85715 unbound     20    0 87952K 45620K kqread  2   0:14   0.06% /usr/local/sbin/unbound -c /var/unbound/unbound.conf{unbound}
     4510 dustin      20    0  9860K  4660K CPU7    7   0:11   0.04% top -aSH
       12 root       -92    -     0K   864K WAIT    0   1:27   0.03% [intr{irq281: t5nex0:1a7}]
       12 root       -92    -     0K   864K WAIT    4   3:04   0.03% [intr{irq271: t5nex0:0a5}]
    85715 unbound     20    0 87952K 45620K kqread  3   0:08   0.03% /usr/local/sbin/unbound -c /var/unbound/unbound.conf{unbound}
    85715 unbound     20    0 87952K 45620K kqread  5   0:10   0.02% /usr/local/sbin/unbound -c /var/unbound/unbound.conf{unbound}
        0 root       -92    -     0K   656K -       5   0:35   0.02% [kernel{t5nex0 tq1}]
    85715 unbound     20    0 87952K 45620K kqread  1   0:06   0.02% /usr/local/sbin/unbound -c /var/unbound/unbound.conf{unbound}
       12 root       -92    -     0K   864K WAIT    2   1:35   0.02% [intr{irq280: t5nex0:1a6}]
       12 root       -92    -     0K   864K WAIT    2   2:03   0.02% [intr{irq274: t5nex0:1a0}]
       12 root       -92    -     0K   864K WAIT    4   1:54   0.02% [intr{irq275: t5nex0:1a1}]
       12 root       -92    -     0K   864K WAIT    6   2:50   0.02% [intr{irq272: t5nex0:0a6}]
       12 root       -60    -     0K   864K WAIT    6   4:26   0.02% [intr{swi4: clock (0)}]
    85715 unbound     20    0 87952K 45620K kqread  6   0:08   0.01% /usr/local/sbin/unbound -c /var/unbound/unbound.conf{unbound}
        0 root       -92    -     0K   656K -       7   0:31   0.01% [kernel{t5nex0 tq0}]
    85715 unbound     20    0 87952K 45620K kqread  2   0:07   0.01% /usr/local/sbin/unbound -c /var/unbound/unbound.conf{unbound}
       23 root       -16    -     0K    16K pftm    0   1:25   0.01% [pf purge]
    50367 root        20    0 23592K  9432K kqread  4   1:08   0.01% nginx: worker process (nginx)
    85715 unbound     20    0 87952K 45620K kqread  7   0:06   0.01% /usr/local/sbin/unbound -c /var/unbound/unbound.conf{unbound}
    98386 root        20    0  9464K  5788K select  6   0:10   0.01% /usr/local/sbin/miniupnpd -f /var/etc/miniupnpd.conf -P /var/run/miniupnpd.pid
    51067 root        20    0 12396K 12500K select  2   0:19   0.01% /usr/local/sbin/ntpd -g -c /var/etc/ntpd.conf -p /var/run/ntpd.pid{ntpd}
    85715 unbound     20    0 87952K 45620K kqread  4   0:03   0.01% /usr/local/sbin/unbound -c /var/unbound/unbound.conf{unbound}
    98478 root        20    0  6288K  2108K select  5   1:25   0.00% /usr/sbin/powerd -b hadp -a hadp -n hadp
       24 root       -16    -     0K    16K -       0   0:42   0.00% [rand_harvestq]
    35774 dustin      20    0 12672K  7936K select  6   0:01   0.00% sshd: dustin@pts/0 (sshd)
    25388 root        20    0 10996K  2408K nanslp  7   0:18   0.00% [dpinger{dpinger}]
    91815 dhcpd       20    0 12580K  8616K select  3   0:08   0.00% /usr/local/sbin/dhcpd -user dhcpd -group _dhcp -chroot /var/dhcpd -cf /etc/dhcpd.conf -pf /var/run/dhcpd.pid cxl1
       25 root       -16    -     0K    16K tzpoll  5   0:06   0.00% [acpi_thermal]
    11268 root        20    0  6400K  2560K select  1   0:24   0.00% /usr/sbin/syslogd -s -c -c -l /var/dhcpd/var/run/log -P /var/run/syslog.pid -f /etc/syslog.conf
    25388 root        20    0 10996K  2408K sbwait  1   0:09   0.00% [dpinger{dpinger}]
       12 root       -72    -     0K   864K WAIT    4   0:01   0.00% [intr{swi1: netisr 0}]
      356 root        20    0 88604K 22628K kqread  5   0:07   0.00% php-fpm: master process (/usr/local/lib/php-fpm.conf) (php-fpm)
       12 root       -88    -     0K   864K WAIT    4   0:04   0.00% [intr{irq283: xhci0}]
    

  • Netgate Administrator

    Hmm, that was whilst passing 4Gbps?

    Definitely not a CPU issue then; it's hardly having to work at all there! In fact, that's suspiciously low.

    Steve



    Yes, that was about mid-test on the download leg, but it remains pretty steady through both legs of the test.



    Hi @dustintuft - I have also done some testing at 10Gbit recently; here is a thread on that topic:

    https://forum.netgate.com/topic/132394/10gbit-performance-testing

    Are you getting 10Gbit locally on your network? The hardware you have running pfSense should be powerful enough for you to get 10Gbit speeds across the firewall. It might be interesting to run some iperf3 tests and drop the MTU down to decrease the packet size, to get an idea of how many PPS your hardware can handle (this will be a limiting factor); see the sketch below.
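
    Something like this, for example (a sketch; <server-ip> stands in for whatever host you run the server side on):

    iperf3 -s                                # on the host on the far side of the firewall
    iperf3 -c <server-ip> -P 4 -t 30         # TCP, several parallel streams through the firewall
    iperf3 -c <server-ip> -u -b 10G -l 200   # small UDP datagrams to probe the PPS ceiling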

    Also, here is a helpful link for 10Gbit tuning:
    https://fasterdata.es.net/host-tuning/

    Finally, do you have another host (or hosts) you can test the throughput with, or just the Mac?

    Hope this helps.


  • Netgate Administrator

    The problem with that, or with, say, this: https://calomel.org/freebsd_network_tuning.html

    is that those are settings meant for hosts. Most of that tuning only applies to TCP endpoints, not directly to a router/firewall.

    Some of what's here is more directly applicable:
    https://calomel.org/network_performance.html

    Steve



    @stephenw10 - good point, and you are right. I mainly pointed that out in case the hosts themselves needed some tuning. At this point it's still unclear to me whether the issue is with the firewall (i.e. it can't process enough PPS) or with one of the hosts.

    Having said that, there is some tuning that can be done on the Chelsio itself. Please see this page:

    https://www.freebsd.org/cgi/man.cgi?query=cxgbe&sektion=4&manpath=freebsd-release-ports

    IMHO, parameters worth tweaking to start with include the RX and TX queue sizes and disabling flow control (pause settings).
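
    For example, in /boot/loader.conf.local (values are illustrative only; see the man page above for defaults and exact semantics):

    hw.cxgbe.qsize_rxq="2048"      # RX descriptor ring size (default 1024)
    hw.cxgbe.qsize_txq="2048"      # TX descriptor ring size (default 1024)
    hw.cxgbe.pause_settings="0"    # disable RX and TX PAUSE frames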

    Hope this helps.


  • Netgate Administrator

    Yes, let's see the output of netstat -m and sysctl hw.cxgbe.

    We might also need the dev.cxl and dev.t5nex values if they have individual settings applied, or to check for dropped packets etc.
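
    Something like this should surface any drop counters (assuming they show up under the per-port stats):

    sysctl dev.cxl | egrep -i 'drop|ovflow|full'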

    Steve



  • netstat -m

    Before running speed test:
    [2.4.4-RELEASE][dustin@tuftfw.--.net]/home/dustin: netstat -m
    4100/6535/10635 mbufs in use (current/cache/total)
    0/3056/3056/2000000 mbuf clusters in use (current/cache/total/max)
    0/3036 mbuf+clusters out of packet secondary zone in use (current/cache)
    16256/453/16709/505392 4k (page size) jumbo clusters in use (current/cache/total/max)
    0/0/0/149746 9k jumbo clusters in use (current/cache/total/max)
    0/0/0/84232 16k jumbo clusters in use (current/cache/total/max)
    66049K/9557K/75606K bytes allocated to network (current/cache/total)
    0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
    0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
    0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
    0/0/0 requests for jumbo clusters denied (4k/9k/16k)
    0 sendfile syscalls
    0 sendfile syscalls completed without I/O request
    0 requests for I/O initiated by sendfile
    0 pages read by sendfile as part of a request
    0 pages were valid at time of a sendfile request
    0 pages were requested for read ahead by applications
    0 pages were read ahead by sendfile
    0 times sendfile encountered an already busy page
    0 requests for sfbufs denied
    0 requests for sfbufs delayed
    
    During Speed Test
    [2.4.4-RELEASE][dustin@tuftfw.--.net]/home/dustin: netstat -m
    4745/5890/10635 mbufs in use (current/cache/total)
    0/3056/3056/2000000 mbuf clusters in use (current/cache/total/max)
    0/3036 mbuf+clusters out of packet secondary zone in use (current/cache)
    16525/464/16989/505392 4k (page size) jumbo clusters in use (current/cache/total/max)
    0/0/0/149746 9k jumbo clusters in use (current/cache/total/max)
    0/0/0/84232 16k jumbo clusters in use (current/cache/total/max)
    67286K/9440K/76726K bytes allocated to network (current/cache/total)
    0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
    0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
    0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
    0/0/0 requests for jumbo clusters denied (4k/9k/16k)
    0 sendfile syscalls
    0 sendfile syscalls completed without I/O request
    0 requests for I/O initiated by sendfile
    0 pages read by sendfile as part of a request
    0 pages were valid at time of a sendfile request
    0 pages were requested for read ahead by applications
    0 pages were read ahead by sendfile
    0 times sendfile encountered an already busy page
    0 requests for sfbufs denied
    0 requests for sfbufs delayed
    [2.4.4-RELEASE][dustin@tuftfw.--.net]/home/dustin: netstat -m
    4111/6524/10635 mbufs in use (current/cache/total)
    0/3056/3056/2000000 mbuf clusters in use (current/cache/total/max)
    0/3036 mbuf+clusters out of packet secondary zone in use (current/cache)
    16256/761/17017/505392 4k (page size) jumbo clusters in use (current/cache/total/max)
    0/0/0/149746 9k jumbo clusters in use (current/cache/total/max)
    0/0/0/84232 16k jumbo clusters in use (current/cache/total/max)
    66051K/10787K/76838K bytes allocated to network (current/cache/total)
    0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
    0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
    0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
    0/0/0 requests for jumbo clusters denied (4k/9k/16k)
    0 sendfile syscalls
    0 sendfile syscalls completed without I/O request
    0 requests for I/O initiated by sendfile
    0 pages read by sendfile as part of a request
    0 pages were valid at time of a sendfile request
    0 pages were requested for read ahead by applications
    0 pages were read ahead by sendfile
    0 times sendfile encountered an already busy page
    0 requests for sfbufs denied
    0 requests for sfbufs delayed
    [2.4.4-RELEASE][dustin@tuftfw.--.net]/home/dustin: netstat -m
    4151/6484/10635 mbufs in use (current/cache/total)
    0/3056/3056/2000000 mbuf clusters in use (current/cache/total/max)
    0/3036 mbuf+clusters out of packet secondary zone in use (current/cache)
    16264/753/17017/505392 4k (page size) jumbo clusters in use (current/cache/total/max)
    0/0/0/149746 9k jumbo clusters in use (current/cache/total/max)
    0/0/0/84232 16k jumbo clusters in use (current/cache/total/max)
    66093K/10745K/76838K bytes allocated to network (current/cache/total)
    0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
    0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
    0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
    0/0/0 requests for jumbo clusters denied (4k/9k/16k)
    0 sendfile syscalls
    0 sendfile syscalls completed without I/O request
    0 requests for I/O initiated by sendfile
    0 pages read by sendfile as part of a request
    0 pages were valid at time of a sendfile request
    0 pages were requested for read ahead by applications
    0 pages were read ahead by sendfile
    0 times sendfile encountered an already busy page
    0 requests for sfbufs denied
    0 requests for sfbufs delayed
    
    [2.4.4-RELEASE][dustin@tuftfw.--.net]/home/dustin: sysctl hw.cxgbe
    hw.cxgbe.nm_holdoff_tmr_idx: 2
    hw.cxgbe.nm_rx_ndesc: 256
    hw.cxgbe.nm_black_hole: 0
    
    [2.4.4-RELEASE][dustin@tuftfw.--.net]/home/dustin: sysctl dev.t5nex
    dev.t5nex.0.mgmtq.tx_wrs_sspace: 0
    dev.t5nex.0.mgmtq.tx_wrs_copied: 0
    dev.t5nex.0.mgmtq.tx_wrs_direct: 0
    dev.t5nex.0.mgmtq.sidx: 127
    dev.t5nex.0.mgmtq.pidx: 0
    dev.t5nex.0.mgmtq.cidx: 0
    dev.t5nex.0.mgmtq.cntxt_id: 32
    dev.t5nex.0.mgmtq.dmalen: 8192
    dev.t5nex.0.mgmtq.ba: 205512704
    dev.t5nex.0.fwq.cidx: 4
    dev.t5nex.0.fwq.cntxt_id: 16
    dev.t5nex.0.fwq.abs_id: 16
    dev.t5nex.0.fwq.dmalen: 16384
    dev.t5nex.0.fwq.ba: 1095041024
    dev.t5nex.0.core_vdd: 1014
    dev.t5nex.0.temperature: 47
    dev.t5nex.0.nfilters: 1008
    dev.t5nex.0.fcoecaps: 0
    dev.t5nex.0.cryptocaps: 0
    dev.t5nex.0.iscsicaps: 0
    dev.t5nex.0.rdmacaps: 0
    dev.t5nex.0.toecaps: 0
    dev.t5nex.0.niccaps: 1<NIC>
    dev.t5nex.0.switchcaps: 3<INGRESS,EGRESS>
    dev.t5nex.0.linkcaps: 0
    dev.t5nex.0.nbmcaps: 0
    dev.t5nex.0.cfcsum: 2311601560
    dev.t5nex.0.cf: default
    dev.t5nex.0.vpd_version: 1
    dev.t5nex.0.scfg_version: 16814080
    dev.t5nex.0.bs_version: 1.1.0.0
    dev.t5nex.0.er_version: 1.0.0.90
    dev.t5nex.0.na: 0007434A1920
    dev.t5nex.0.md_version: t4d-0.0.0
    dev.t5nex.0.ec: 0000000000000000
    dev.t5nex.0.pn: 110118850A0
    dev.t5nex.0.sn: PT31180920
    dev.t5nex.0.hw_revision: 1
    dev.t5nex.0.firmware_version: 1.19.1.0
    dev.t5nex.0.tp_version: 0.1.4.9
    dev.t5nex.0.dflags: 0
    dev.t5nex.0.lro_timeout: 100
    dev.t5nex.0.fl_pack: 128
    dev.t5nex.0.cong_drop: 0
    dev.t5nex.0.spg_len: 64
    dev.t5nex.0.fl_pad: 32
    dev.t5nex.0.fl_pktshift: 2
    dev.t5nex.0.buffer_sizes: 2048* 4096* 3968* 3456* 9216* 16384* 1664* 9088* 16256* 0 0 0 0 0 0 0
    dev.t5nex.0.holdoff_pkt_counts: 1 8 16 32
    dev.t5nex.0.holdoff_timers: 1 5 10 50 100 200
    dev.t5nex.0.core_clock: 250000
    dev.t5nex.0.doorbells: 9<UDB,KDB>
    dev.t5nex.0.nports: 2
    dev.t5nex.0.do_rx_copy: 1
    dev.t5nex.0.%parent: pci1
    dev.t5nex.0.%pnpinfo: vendor=0x1425 device=0x5407 subvendor=0x1425 subdevice=0x0000 class=0x020000
    dev.t5nex.0.%location: slot=0 function=4 dbsf=pci0:1:0:4
    dev.t5nex.0.%driver: t5nex
    dev.t5nex.0.%desc: Chelsio T520-SO
    dev.t5nex.%parent:
    

    The dev.cxl output was really long and scrolled out of my terminal buffer; how do I save it to a file?

    Thanks!
    -Dustin T


  • Netgate Administrator

    sysctl dev.cxl > /root/cxlsysctls.txt will do it.
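
    You can then pull it off the box with scp (assuming SSH is enabled; <firewall-ip> is your firewall's address):

    scp dustin@<firewall-ip>:/root/cxlsysctls.txt .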



  • Thank you!
    0_1539034203435_cxlsysctls.txt
    -Dustin T



  • Hi @dustintuft - before trying to tweak parameters on the firewall, can you confirm that you can move 10Gbit between hosts locally on your network (e.g. by running an iperf3 test)? It would be good to know this before troubleshooting further.