Performance tuning for 10gb connection
-
The problem with that guide, or with this one, say: https://calomel.org/freebsd_network_tuning.html
is that those settings are meant for hosts. Most of that tuning only applies to TCP endpoints, not directly to a router/firewall.
Some of what's here is more directly applicable:
https://calomel.org/network_performance.html
Steve
-
@stephenw10 - good point and you are right. I mainly pointed that out in case the hosts themselves needed some tuning. At this point it's still unclear to me whether the issue is with the firewall (i.e. it can't process enough PPS) or one of the hosts.
Having said that, there is some tuning that can be done on the Chelsio itself. Please see this page:
https://www.freebsd.org/cgi/man.cgi?query=cxgbe&sektion=4&manpath=freebsd-release-ports
IMHO, parameters worth tweaking to start include the size of RX and TX queues and disabling flow control (Pause Settings).
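As a rough sketch of what that could look like: the fragment below assumes the tunable names documented in the cxgbe(4) man page linked above (hw.cxgbe.qsize_rxq, hw.cxgbe.qsize_txq and hw.cxgbe.pause_settings) are present in the driver version your build ships. The values are illustrative starting points, not tested recommendations. On pfSense, loader tunables normally go in /boot/loader.conf.local and take effect after a reboot.

```
# /boot/loader.conf.local (assumed location; values are examples only)

# Entries in each RX/TX queue's descriptor ring (man page default is 1024)
hw.cxgbe.qsize_rxq="2048"
hw.cxgbe.qsize_txq="2048"

# PAUSE frame settings: bit 0 = rx_pause, bit 1 = tx_pause.
# 0 turns flow control off in both directions.
hw.cxgbe.pause_settings="0"
```

Change one setting at a time and re-run the same test, otherwise it is hard to tell which change made a difference.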
Hope this helps.
-
Yes, let's see the output of
netstat -m
and
sysctl hw.cxgbe
Might also need the dev.cxl and dev.t5nex sysctls if they have individual settings applied, or to check for dropped packets etc.
Steve
-
netstat -m

Before running speed test:

[2.4.4-RELEASE][dustin@tuftfw.--.net]/home/dustin: netstat -m
4100/6535/10635 mbufs in use (current/cache/total)
0/3056/3056/2000000 mbuf clusters in use (current/cache/total/max)
0/3036 mbuf+clusters out of packet secondary zone in use (current/cache)
16256/453/16709/505392 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/149746 9k jumbo clusters in use (current/cache/total/max)
0/0/0/84232 16k jumbo clusters in use (current/cache/total/max)
66049K/9557K/75606K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0 sendfile syscalls
0 sendfile syscalls completed without I/O request
0 requests for I/O initiated by sendfile
0 pages read by sendfile as part of a request
0 pages were valid at time of a sendfile request
0 pages were requested for read ahead by applications
0 pages were read ahead by sendfile
0 times sendfile encountered an already busy page
0 requests for sfbufs denied
0 requests for sfbufs delayed

During speed test:

[2.4.4-RELEASE][dustin@tuftfw.--.net]/home/dustin: netstat -m
4745/5890/10635 mbufs in use (current/cache/total)
0/3056/3056/2000000 mbuf clusters in use (current/cache/total/max)
0/3036 mbuf+clusters out of packet secondary zone in use (current/cache)
16525/464/16989/505392 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/149746 9k jumbo clusters in use (current/cache/total/max)
0/0/0/84232 16k jumbo clusters in use (current/cache/total/max)
67286K/9440K/76726K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0 sendfile syscalls
0 sendfile syscalls completed without I/O request
0 requests for I/O initiated by sendfile
0 pages read by sendfile as part of a request
0 pages were valid at time of a sendfile request
0 pages were requested for read ahead by applications
0 pages were read ahead by sendfile
0 times sendfile encountered an already busy page
0 requests for sfbufs denied
0 requests for sfbufs delayed

[2.4.4-RELEASE][dustin@tuftfw.--.net]/home/dustin: netstat -m
4111/6524/10635 mbufs in use (current/cache/total)
0/3056/3056/2000000 mbuf clusters in use (current/cache/total/max)
0/3036 mbuf+clusters out of packet secondary zone in use (current/cache)
16256/761/17017/505392 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/149746 9k jumbo clusters in use (current/cache/total/max)
0/0/0/84232 16k jumbo clusters in use (current/cache/total/max)
66051K/10787K/76838K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0 sendfile syscalls
0 sendfile syscalls completed without I/O request
0 requests for I/O initiated by sendfile
0 pages read by sendfile as part of a request
0 pages were valid at time of a sendfile request
0 pages were requested for read ahead by applications
0 pages were read ahead by sendfile
0 times sendfile encountered an already busy page
0 requests for sfbufs denied
0 requests for sfbufs delayed

[2.4.4-RELEASE][dustin@tuftfw.--.net]/home/dustin: netstat -m
4151/6484/10635 mbufs in use (current/cache/total)
0/3056/3056/2000000 mbuf clusters in use (current/cache/total/max)
0/3036 mbuf+clusters out of packet secondary zone in use (current/cache)
16264/753/17017/505392 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/149746 9k jumbo clusters in use (current/cache/total/max)
0/0/0/84232 16k jumbo clusters in use (current/cache/total/max)
66093K/10745K/76838K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0 sendfile syscalls
0 sendfile syscalls completed without I/O request
0 requests for I/O initiated by sendfile
0 pages read by sendfile as part of a request
0 pages were valid at time of a sendfile request
0 pages were requested for read ahead by applications
0 pages were read ahead by sendfile
0 times sendfile encountered an already busy page
0 requests for sfbufs denied
0 requests for sfbufs delayed
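For anyone reading along, the counters that matter most in this output are the "denied" and "delayed" request lines: all zeros there means the box is not running out of mbufs or clusters. A small sketch of a filter that flags any non-zero value follows. The function name mbuf_pressure is made up for illustration, and it assumes the netstat -m line format shown in this post.

```shell
# Hypothetical helper: read `netstat -m` output on stdin and print any
# "requests ... denied/delayed" line with a non-zero counter.
# Exit status is 1 if at least one such line was found, 0 otherwise.
mbuf_pressure() {
    awk '
        /requests for .* (denied|delayed)/ {
            # First field looks like "0/0/0" or a single number
            n = split($1, c, "/")
            for (i = 1; i <= n; i++)
                if (c[i] + 0 > 0) { print; bad = 1; break }
        }
        END { exit bad ? 1 : 0 }
    '
}

# Usage on the firewall:
#   netstat -m | mbuf_pressure && echo "no mbuf denials or delays"
```

Running it before, during and after a speed test gives a quick yes/no answer without having to read the full dump each time.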
[2.4.4-RELEASE][dustin@tuftfw.--.net]/home/dustin: sysctl hw.cxgbe
hw.cxgbe.nm_holdoff_tmr_idx: 2
hw.cxgbe.nm_rx_ndesc: 256
hw.cxgbe.nm_black_hole: 0

[2.4.4-RELEASE][dustin@tuftfw.--.net]/home/dustin: sysctl dev.t5nex
dev.t5nex.0.mgmtq.tx_wrs_sspace: 0
dev.t5nex.0.mgmtq.tx_wrs_copied: 0
dev.t5nex.0.mgmtq.tx_wrs_direct: 0
dev.t5nex.0.mgmtq.sidx: 127
dev.t5nex.0.mgmtq.pidx: 0
dev.t5nex.0.mgmtq.cidx: 0
dev.t5nex.0.mgmtq.cntxt_id: 32
dev.t5nex.0.mgmtq.dmalen: 8192
dev.t5nex.0.mgmtq.ba: 205512704
dev.t5nex.0.fwq.cidx: 4
dev.t5nex.0.fwq.cntxt_id: 16
dev.t5nex.0.fwq.abs_id: 16
dev.t5nex.0.fwq.dmalen: 16384
dev.t5nex.0.fwq.ba: 1095041024
dev.t5nex.0.core_vdd: 1014
dev.t5nex.0.temperature: 47
dev.t5nex.0.nfilters: 1008
dev.t5nex.0.fcoecaps: 0
dev.t5nex.0.cryptocaps: 0
dev.t5nex.0.iscsicaps: 0
dev.t5nex.0.rdmacaps: 0
dev.t5nex.0.toecaps: 0
dev.t5nex.0.niccaps: 1<NIC>
dev.t5nex.0.switchcaps: 3<INGRESS,EGRESS>
dev.t5nex.0.linkcaps: 0
dev.t5nex.0.nbmcaps: 0
dev.t5nex.0.cfcsum: 2311601560
dev.t5nex.0.cf: default
dev.t5nex.0.vpd_version: 1
dev.t5nex.0.scfg_version: 16814080
dev.t5nex.0.bs_version: 1.1.0.0
dev.t5nex.0.er_version: 1.0.0.90
dev.t5nex.0.na: 0007434A1920
dev.t5nex.0.md_version: t4d-0.0.0
dev.t5nex.0.ec: 0000000000000000
dev.t5nex.0.pn: 110118850A0
dev.t5nex.0.sn: PT31180920
dev.t5nex.0.hw_revision: 1
dev.t5nex.0.firmware_version: 1.19.1.0
dev.t5nex.0.tp_version: 0.1.4.9
dev.t5nex.0.dflags: 0
dev.t5nex.0.lro_timeout: 100
dev.t5nex.0.fl_pack: 128
dev.t5nex.0.cong_drop: 0
dev.t5nex.0.spg_len: 64
dev.t5nex.0.fl_pad: 32
dev.t5nex.0.fl_pktshift: 2
dev.t5nex.0.buffer_sizes: 2048* 4096* 3968* 3456* 9216* 16384* 1664* 9088* 16256* 0 0 0 0 0 0 0
dev.t5nex.0.holdoff_pkt_counts: 1 8 16 32
dev.t5nex.0.holdoff_timers: 1 5 10 50 100 200
dev.t5nex.0.core_clock: 250000
dev.t5nex.0.doorbells: 9<UDB,KDB>
dev.t5nex.0.nports: 2
dev.t5nex.0.do_rx_copy: 1
dev.t5nex.0.%parent: pci1
dev.t5nex.0.%pnpinfo: vendor=0x1425 device=0x5407 subvendor=0x1425 subdevice=0x0000 class=0x020000
dev.t5nex.0.%location: slot=0 function=4 dbsf=pci0:1:0:4
dev.t5nex.0.%driver: t5nex
dev.t5nex.0.%desc: Chelsio T520-SO
dev.t5nex.%parent:
The dev.cxl output was really long and scrolled out of my buffer. How do I save it to a file?
Thanks!
-Dustin T
-
sysctl dev.cxl > /root/cxlsysctls.txt
will do it.
-
Thank you!
0_1539034203435_cxlsysctls.txt
-Dustin T
-
Hi @dustintuft - before trying to tweak parameters on the firewall, can you confirm that you can move 10Gbit between hosts locally on your network (e.g. by running something like an iperf3 test)? It would be good to know this info before troubleshooting further.
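A minimal sketch of such a test, assuming iperf3 is installed on both hosts. The wrapper name run_iperf, the address 192.0.2.10 and the stream count and duration are placeholders for illustration, not values from this thread. Start `iperf3 -s` on one host first, then run the client from the other.

```shell
# Hypothetical wrapper around the iperf3 client.
#   $1 = server address, $2 = parallel streams (default 4), $3 = seconds (default 30)
# Set IPERF3 to override the binary path if it is not on PATH.
run_iperf() {
    host=$1
    streams=${2:-4}
    secs=${3:-30}
    # -c: client mode, -P: number of parallel streams, -t: test duration in seconds
    ${IPERF3:-iperf3} -c "$host" -P "$streams" -t "$secs"
}

# Usage (server side runs: iperf3 -s):
#   run_iperf 192.0.2.10          # 4 streams, 30 seconds
#   run_iperf 192.0.2.10 8 60     # 8 streams, 60 seconds
```

A single TCP stream often cannot fill a 10G link on its own, so comparing one stream against several helps separate a per-flow limit from a limit on the path as a whole.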
-
Doing a POC for a Solutions Provider biz. @dustintuft, does your lack of response mean you fixed the issue and forgot to update the thread (happens to me), or that you figured out the testing gear wasn't getting to 10G either? Please keep this party going...
-
@mountainlion I ran into a wall. I have run iperf tests that show my switch and cards can achieve 9.8Gb, but that is server to server. These are Dell R710s with Intel 10Gb dual-port SFP+ cards. I can't get those kinds of speeds through the firewall, or through the firewall to a speed test server local to the public side of the firewall. I am still trying to tweak the card drivers, but I am very unfamiliar with Linux drivers, so I am afraid I am not making much progress.
-
So just to drill down a bit more... are you saying you ran iperf from the pfSense OS via CLI to another server and got 9.8G?
But when you ran through the FW, presumably from LAN to WAN, you get 4-6G? Or are you saying you had separate servers set up to validate that the network underlay can support 10G, but everything through pfSense comes up short?
If the latter, then perhaps a feature enhancement request?
-
Yeah, the latter is correct. I have been messing with this some more this weekend, and I think I have at least gotten to a state where I am happy with the results.
Tweaking the BIOS and disabling hyper-threading seems to have given me the performance I was looking for. Here are my current speed test results. Keep in mind the speed test server is local to my public side and it's 10Gb, so I am not expecting to ever see a solid 10 result.
If I get more time I might dive into some of the driver tweaks suggested, but I am not sure that will end well since I am a novice at best.
-
I'd be interested to see a more comprehensive breakdown of the tweaks you made and the before/after results. I tweaked my old C2758 setup and am about to embark on the same on my new system. I'm specifically interested in the difference between hyper-threading on and off.