hoping for 10Gbps, getting sub 1Gbps speed Xeon E3-1270 v5 3.6GHz
-
@spacebass It’s your 10GbE NIC that’s causing issues.
I have worked with HPE hardware for 20 years, and the NC523 card is a dud. I can’t quite remember the details, but the card took more than a year and a half of Windows driver updates for its Qlogic controller before it could reliably deliver about 2 Gbit/s. Until then it fluctuated wildly and stalled to zero any time you tried to push it above about 1 Gbit/s.
I can only guess how bad the driver state is on FreeBSD, but I’m quite sure the card is your culprit. Ditch it and get an Intel-based 10GbE NIC that has good driver support in pfSense.
-
@stephenw10 said in hoping for 10Gbps, getting sub 1Gbps speed Xeon E3-1270 v5 3.6GHz:
What NIC chipset/driver is that? Those numbers seem really low.
Thanks @stephenw10 for replying!
It is a Qlogic 3200 which uses the qlxgb drivers. I applied the following tunables (but no real change):
kern.ipc.nmbjumbo9=262144
net.inet.tcp.recvbuf_max=262144
net.inet.tcp.recvbuf_inc=16384
kern.ipc.nmbclusters=1000000
kern.ipc.maxsockbuf=2097152
net.inet.tcp.recvspace=131072
net.inet.tcp.sendbuf_max=262144
net.inet.tcp.sendspace=65536
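A rough way to reason about the TCP buffer tunables listed here: a single TCP stream cannot exceed its window size divided by the round-trip time. The sketch below is only that arithmetic. The RTT values are assumptions for illustration, not measurements from this thread, and `max_stream_gbps` is a hypothetical helper name. Note also that the `net.inet.tcp.*` settings apply to TCP connections that terminate on the firewall itself, not to traffic it merely forwards.

```python
# Upper bound on single-stream TCP throughput: window / RTT.
# The RTT values below are illustrative assumptions, not measured values.

def max_stream_gbps(window_bytes: int, rtt_seconds: float) -> float:
    """Return the window-limited throughput ceiling in Gbit/s."""
    return window_bytes * 8 / rtt_seconds / 1e9

RECVBUF_MAX = 262144  # net.inet.tcp.recvbuf_max from the tunables listed here

for rtt_ms in (0.2, 1.0, 5.0):  # assumed LAN / metro / WAN round-trip times
    gbps = max_stream_gbps(RECVBUF_MAX, rtt_ms / 1000)
    print(f"RTT {rtt_ms} ms -> at most {gbps:.2f} Gbit/s per stream")
```

If the numbers matter for a given path, measuring the real RTT first (for example with ping) would replace the assumed values.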
Make sure you have multiple queues attached for each NIC.
Can you elaborate? I'm not using traffic shaping and that's the only context I have for queues.
-
@keyser the good news is that I have Intel cards on order...
That said, I have a ton of these HP cards and have no problem getting 10Gbps on Linux-based boxes...it could be drivers, but the qlxgb in FreeBSD is pretty tried and true.
-
Ah, I wasn't sure if ql1 there was that. OK, then at the very least I'd start by disabling all the hardware off-loading options. But if you know exactly which driver it is you can start looking for known bugs/workarounds.
But, yeah, if you can use an Intel NIC, you should.
-
@stephenw10 thanks - if I want to add the full (supposed) 8 queues, how would I go about it?
-
How many queues are you getting?
It's probably a sysctl or loader tunable for that driver. Without having one to test it's difficult to say.
Try
sysctl hw.ql
or maybe
sysctl hw.qlxgb
and see what values exist. Also check
sysctl dev.ql.0
-
@stephenw10 said in hoping for 10Gbps, getting sub 1Gbps speed Xeon E3-1270 v5 3.6GHz:
sysctl dev.ql.0
that's the key ... no mq options there though, and I don't see any listed in the readme.txt in the driver's source.
I'll wait for the Intel NIC and see what I can get out of that
-
How many queues is it using by default? If it's just 1 that would explain the single threaded performance.
-
@stephenw10 I'll fire the box up and check ... just curious, what am I looking for in the output of sysctl? I didn't see anything with 'mq' or 'queues' in the output when I first checked.
Perhaps related: what does it tell us that I get closer to 4Gbps with the -R flag on iperf3 (i.e. inbound) vs 2Gbps without the flag?
I'm not overly determined to make this Qlogic card work, but this is a really good learning opportunity and I'm enjoying the process.
-
You might have more Rx queues than Tx queues for example. Most drivers show that in the boot logs but I'm not sure qlxgb does.
vmstat -i
may also show it.
-
@stephenw10
Thanks for the continued help... Here's what I see:
dev.ql.0.wake: 0
dev.ql.0.num_sds_rings: 4
dev.ql.0.num_rds_rings: 2
dev.ql.0.free_pkt_thres: 1024
dev.ql.0.snd_pkt_thres: 16
dev.ql.0.rcv_pkt_thres_d: 32
dev.ql.0.rcv_pkt_thres: 128
dev.ql.0.jumbo_replenish: 2
dev.ql.0.std_replenish: 8
dev.ql.0.debug: 0
dev.ql.0.fw_version: 4.16.50.1401759177
dev.ql.0.stats: 0
dev.ql.0.%parent: pci2
dev.ql.0.%pnpinfo: vendor=0x1077 device=0x8020 subvendor=0x103c subdevice=0x3733 class=0x020000
dev.ql.0.%location: slot=0 function=0 dbsf=pci0:2:0:0 handle=\_SB_.PCI0.PEG1.PEGP
dev.ql.0.%driver: ql
dev.ql.0.%desc: Qlogic ISP 80xx PCI CNA Adapter-Ethernet Function v1.1.36
I'm having trouble finding much documentation online for this driver... would snd_pkt_thres be the number of threads it is able to use, or is currently using, for the outbound queues?
Here's the output from a TrueNAS box with the same card which has no trouble moving 10Gbps traffic:
dev.ql.0.%desc: Qlogic ISP 80xx PCI CNA Adapter-Ethernet Function v1.1.36
root@matterhorn[~]# sysctl sysctl dev.ql.1
dev.ql.1.num_sds_rings: 4
dev.ql.1.num_rds_rings: 2
dev.ql.1.free_pkt_thres: 1024
dev.ql.1.snd_pkt_thres: 16
dev.ql.1.rcv_pkt_thres_d: 32
dev.ql.1.rcv_pkt_thres: 128
dev.ql.1.jumbo_replenish: 2
dev.ql.1.std_replenish: 8
dev.ql.1.debug: 0
dev.ql.1.fw_version: 4.20.1.1429931003
dev.ql.1.stats: 0
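Comparing two dumps like these by eye is error-prone, so here is a minimal sketch that diffs them while ignoring the unit number (ql.0 vs ql.1). Only a subset of the values from the two outputs is pasted in, and `parse_sysctl` is a hypothetical helper, not part of any tool.

```python
# Compare two `sysctl dev.ql.N` dumps, ignoring the unit number.
# The values are a subset copied from the pfSense and TrueNAS outputs.

PFSENSE = """\
dev.ql.0.num_sds_rings: 4
dev.ql.0.num_rds_rings: 2
dev.ql.0.snd_pkt_thres: 16
dev.ql.0.rcv_pkt_thres: 128
dev.ql.0.std_replenish: 8
dev.ql.0.fw_version: 4.16.50.1401759177
"""

TRUENAS = """\
dev.ql.1.num_sds_rings: 4
dev.ql.1.num_rds_rings: 2
dev.ql.1.snd_pkt_thres: 16
dev.ql.1.rcv_pkt_thres: 128
dev.ql.1.std_replenish: 8
dev.ql.1.fw_version: 4.20.1.1429931003
"""

def parse_sysctl(text: str) -> dict:
    """Map 'num_sds_rings' -> '4' from lines like 'dev.ql.0.num_sds_rings: 4'."""
    result = {}
    for line in text.strip().splitlines():
        if not line.startswith("dev."):
            continue  # skip prompts or anything that is not a sysctl line
        name, _, value = line.partition(": ")
        # Drop the 'dev.ql.<unit>.' prefix so different units compare equal.
        key = name.split(".", 3)[3]
        result[key] = value.strip()
    return result

a, b = parse_sysctl(PFSENSE), parse_sysctl(TRUENAS)
differences = {k: (a[k], b[k]) for k in a if k in b and a[k] != b[k]}
print(differences)
```

On the values pasted here the only difference is `fw_version`. That does not by itself show the firmware is the cause, but it is the one setting that is not identical between the two boxes.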
-
Mmm, so likely those are the default values. There should be a description of each tunable if you run:
sysctl -d dev.ql.0
-
@stephenw10 said in hoping for 10Gbps, getting sub 1Gbps speed Xeon E3-1270 v5 3.6GHz:
sysctl -d dev.ql.0
well that's a helpful command! thanks
still not seeing anything about queues though :/
dev.ql.0:
dev.ql.0.wake: Device set to wake the system
dev.ql.0.num_sds_rings: Number of Status Descriptor Rings
dev.ql.0.num_rds_rings: Number of Rcv Descriptor Rings
dev.ql.0.free_pkt_thres: Threshold for # of packets to free at a time
dev.ql.0.snd_pkt_thres: Threshold for # of snd packets
dev.ql.0.rcv_pkt_thres_d: Threshold for # of rcv pkts to trigger indication defered
dev.ql.0.rcv_pkt_thres: Threshold for # of rcv pkts to trigger indication isr
dev.ql.0.jumbo_replenish: Threshold for Replenishing Jumbo Frames
dev.ql.0.std_replenish: Threshold for Replenishing Standard Frames
dev.ql.0.debug: Debug Level
dev.ql.0.fw_version: firmware version
dev.ql.0.stats: Statistics
dev.ql.0.%parent: parent device
dev.ql.0.%pnpinfo: device identification
dev.ql.0.%location: device location relative to parent
dev.ql.0.%driver: device driver name
dev.ql.0.%desc: device description
Also, interesting, when I do an iperf3 with -R here's the top output on the pfSense box...
CPU 0: 0.0% user, 0.0% nice, 1.9% system, 86.5% interrupt, 11.6% idle
CPU 1: 0.4% user, 0.0% nice, 8.1% system, 2.7% interrupt, 88.8% idle
CPU 2: 0.0% user, 0.0% nice, 5.8% system, 54.8% interrupt, 39.4% idle
CPU 3: 0.4% user, 0.0% nice, 8.5% system, 6.6% interrupt, 84.6% idle
CPU 4: 0.8% user, 0.0% nice, 25.9% system, 6.2% interrupt, 67.2% idle
CPU 5: 0.4% user, 0.0% nice, 21.6% system, 7.3% interrupt, 70.7% idle
CPU 6: 0.4% user, 0.0% nice, 0.8% system, 51.4% interrupt, 47.5% idle
CPU 7: 0.4% user, 0.0% nice, 5.4% system, 8.9% interrupt, 85.3% idle
Mem: 305M Active, 201M Inact, 872M Wired, 14G Free
ARC: 404M Total, 63M MFU, 336M MRU, 32K Anon, 1218K Header, 3853K Other
122M Compressed, 283M Uncompressed, 2.31:1 Ratio
Swap: 1024M Total, 1024M Free
PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
12 root -92 - 0B 528K CPU0 0 0:32 90.84% [intr{irq265: ql0}]
11 root 155 ki31 0B 128K CPU1 1 45:28 87.99% [idle{idle: cpu1}]
11 root 155 ki31 0B 128K CPU7 7 45:32 86.71% [idle{idle: cpu7}]
11 root 155 ki31 0B 128K CPU3 3 45:28 83.73% [idle{idle: cpu3}]
11 root 155 ki31 0B 128K RUN 5 45:17 71.47% [idle{idle: cpu5}]
11 root 155 ki31 0B 128K CPU4 4 45:15 70.76% [idle{idle: cpu4}]
12 root -92 - 0B 528K CPU2 2 0:24 60.84% [intr{irq266: ql0}]
0 root -92 - 0B 1376K - 6 0:31 54.00% [kernel{ql1 txq}]
11 root 155 ki31 0B 128K RUN 6 44:26 48.31% [idle{idle: cpu6}]
12 root -92 - 0B 528K WAIT 6 0:49 43.46% [intr{irq268: ql1}]
11 root 155 ki31 0B 128K RUN 2 44:57 37.84% [idle{idle: cpu2}]
12 root -92 - 0B 528K WAIT 6 0:49 23.37% [intr{irq264: ql0}]
11 root 155 ki31 0B 128K RUN 0 45:02 12.88% [idle{idle: cpu0}]
0 root -92 - 0B 1376K - 7 0:07 12.82% [kernel{ql0 rcvq}]
0 root -92 - 0B 1376K - 1 0:12 5.35% [kernel{ql1 rcvq}]
99487 avahi 20 0 13M 4500K select 5 0:06 2.82% avahi-daemon: running [washington.local]
12 root -92 - 0B 528K WAIT 4 0:20 2.34% [intr{irq267: ql0}]
0 root -92 - 0B 1376K - 1 0:11 1.83% [kernel{ql0 txq}]
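To make the per-CPU picture easier to read, here is a small sketch that pulls the interrupt percentage out of the `CPU n:` lines from this top output and flags the busy cores. The 50% cutoff is an arbitrary assumption and `busy_interrupt_cpus` is a hypothetical helper name.

```python
import re

# Per-CPU lines copied from the top output in this post.
TOP_CPU_LINES = """\
CPU 0: 0.0% user, 0.0% nice, 1.9% system, 86.5% interrupt, 11.6% idle
CPU 1: 0.4% user, 0.0% nice, 8.1% system, 2.7% interrupt, 88.8% idle
CPU 2: 0.0% user, 0.0% nice, 5.8% system, 54.8% interrupt, 39.4% idle
CPU 3: 0.4% user, 0.0% nice, 8.5% system, 6.6% interrupt, 84.6% idle
CPU 4: 0.8% user, 0.0% nice, 25.9% system, 6.2% interrupt, 67.2% idle
CPU 5: 0.4% user, 0.0% nice, 21.6% system, 7.3% interrupt, 70.7% idle
CPU 6: 0.4% user, 0.0% nice, 0.8% system, 51.4% interrupt, 47.5% idle
CPU 7: 0.4% user, 0.0% nice, 5.4% system, 8.9% interrupt, 85.3% idle
"""

# Capture the CPU index and the number directly before "% interrupt".
CPU_RE = re.compile(r"CPU\s+(\d+):.*?([\d.]+)% interrupt")

def busy_interrupt_cpus(text: str, threshold: float = 50.0) -> list:
    """Return (cpu, interrupt_percent) pairs at or above the threshold."""
    hits = []
    for match in CPU_RE.finditer(text):
        cpu, intr = int(match.group(1)), float(match.group(2))
        if intr >= threshold:
            hits.append((cpu, intr))
    return hits

print(busy_interrupt_cpus(TOP_CPU_LINES))
```

With these figures it flags CPUs 0, 2 and 6, with CPU 0 close to saturated on interrupt handling. That would be consistent with the receive path being spread over only a few cores, though it does not prove where the limit is.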
-
Well at least 4 IRQs for ql0 there. Does
vmstat -i
show those?
Nothing about queues there I agree. That's the sort of setting that would usually be a loader value though. Those are usually shown in hw.ql but only values that are set are shown.
-
Intel X520-da2 update
tl;dr better performance for sure, but still not 10Gbps. 8 CPU cores, each NIC using 4 queues.
I'm increasingly of the opinion that even with a beefy CPU, pfSense just doesn't like doing 10Gbps.
iperf3
iperf3 -c ISP's server -P 10
[SUM] 0.00-10.00 sec 5.64 GBytes 4.84 Gbits/sec 1455 sender
[SUM] 0.00-10.03 sec 5.63 GBytes 4.82 Gbits/sec receiver
iperf3 -c ISP's server -P 10 -R
[SUM] 0.00-10.03 sec 5.18 GBytes 4.43 Gbits/sec 4033 sender
[SUM] 0.00-10.00 sec 5.14 GBytes 4.42 Gbits/sec receiver
iperf3 -c local server on other vLAN -P 18
[SUM] 0.00-10.00 sec 5.40 GBytes 4.64 Gbits/sec 11944 sender
[SUM] 0.00-10.01 sec 5.39 GBytes 4.62 Gbits/sec receiver
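As a sanity check on the units in these iperf3 summaries: the transfer column uses binary units (1 GByte = 2**30 bytes) while the bitrate column uses decimal units (1 Gbit = 10**9 bits). The sketch below recomputes the bitrate from the sender rows and expresses it as a share of a nominal 10 Gbit/s link. `gbits_per_sec` is a hypothetical helper, and small mismatches are expected because the printed GBytes figures are already rounded.

```python
# Recompute iperf3 bitrates from the transfer size and duration.
# Transfer is reported in binary GBytes (2**30 bytes), bitrate in decimal Gbits.

def gbits_per_sec(gbytes: float, seconds: float) -> float:
    """Convert an iperf3 transfer figure and duration to Gbit/s."""
    return gbytes * 2**30 * 8 / seconds / 1e9

LINK_GBPS = 10.0  # nominal line rate of the NIC

# (label, GBytes, seconds, Gbit/s as printed) from the sender rows
runs = [
    ("to ISP server", 5.64, 10.00, 4.84),
    ("from ISP server (-R)", 5.18, 10.03, 4.43),
    ("local server, other VLAN", 5.40, 10.00, 4.64),
]

for label, gbytes, seconds, reported in runs:
    computed = gbits_per_sec(gbytes, seconds)
    share = computed / LINK_GBPS * 100
    print(f"{label}: computed {computed:.2f}, reported {reported:.2f} Gbit/s "
          f"({share:.0f}% of line rate)")
```

All three runs land a little under half of line rate, regardless of direction or whether the far end is local, which is the pattern the tl;dr describes.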
top
last pid: 52809; load averages: 0.71, 0.41, 0.36 up 0+00:28:16 16:09:59
742 threads: 10 running, 699 sleeping, 33 waiting
CPU 0: 0.0% user, 0.0% nice, 31.1% system, 0.0% interrupt, 68.9% idle
CPU 1: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
CPU 2: 0.4% user, 0.0% nice, 3.9% system, 0.0% interrupt, 95.7% idle
CPU 3: 0.4% user, 0.0% nice, 0.4% system, 0.0% interrupt, 99.2% idle
CPU 4: 0.0% user, 0.0% nice, 75.2% system, 0.0% interrupt, 24.8% idle
CPU 5: 0.4% user, 0.0% nice, 0.0% system, 0.0% interrupt, 99.6% idle
CPU 6: 0.4% user, 0.0% nice, 75.6% system, 0.0% interrupt, 24.0% idle
CPU 7: 0.0% user, 0.0% nice, 0.4% system, 0.0% interrupt, 99.6% idle
Mem: 297M Active, 178M Inact, 920M Wired, 14G Free
ARC: 384M Total, 53M MFU, 326M MRU, 32K Anon, 1086K Header, 3216K Other
118M Compressed, 268M Uncompressed, 2.27:1 Ratio
Swap: 1024M Total, 1024M Free
PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
11 root 155 ki31 0B 128K CPU7 7 27:56 99.89% [idle{idle: cpu7}]
11 root 155 ki31 0B 128K CPU5 5 27:55 99.73% [idle{idle: cpu5}]
11 root 155 ki31 0B 128K CPU3 3 27:56 99.16% [idle{idle: cpu3}]
11 root 155 ki31 0B 128K CPU1 1 27:55 99.14% [idle{idle: cpu1}]
11 root 155 ki31 0B 128K RUN 2 26:31 95.65% [idle{idle: cpu2}]
0 root -76 - 0B 1376K - 6 1:13 78.85% [kernel{if_io_tqg_6}]
0 root -76 - 0B 1376K CPU4 4 1:16 72.63% [kernel{if_io_tqg_4}]
11 root 155 ki31 0B 128K RUN 0 26:34 69.36% [idle{idle: cpu0}]
0 root -76 - 0B 1376K - 0 1:27 30.30% [kernel{if_io_tqg_0}]
11 root 155 ki31 0B 128K RUN 4 26:34 27.30% [idle{idle: cpu4}]
11 root 155 ki31 0B 128K CPU6 6 26:57 21.08% [idle{idle: cpu6}]
0 root -76 - 0B 1376K - 2 1:40 3.32% [kernel{if_io_tqg_2}]
67535 unbound 20 0 107M 54M kqread 4 0:02 0.40% /usr/local/sbin/unbound -c /var/unbox
dmesg
root: dmesg | grep queues
ix0: Using 4 RX queues 4 TX queues
ix0: allocated for 4 queues
ix0: allocated for 4 rx queues
ix0: netmap queues/slots: TX 4/2048, RX 4/2048
ix1: Using 4 RX queues 4 TX queues
ix1: allocated for 4 queues
ix1: allocated for 4 rx queues
ix1: netmap queues/slots: TX 4/2048, RX 4/2048
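For anyone wanting to check the same thing on their own box, this is a minimal sketch that extracts the per-interface queue counts from the `Using N RX queues N TX queues` lines and compares them with the core count. The 8-core figure comes from the tl;dr in this post, and `queue_counts` is a hypothetical helper name.

```python
import re

# The two "Using ..." lines from the dmesg output in this post.
DMESG = """\
ix0: Using 4 RX queues 4 TX queues
ix1: Using 4 RX queues 4 TX queues
"""

QUEUE_RE = re.compile(r"^(\w+): Using (\d+) RX queues (\d+) TX queues$", re.M)

def queue_counts(text: str) -> dict:
    """Return {interface: (rx_queues, tx_queues)} parsed from dmesg lines."""
    return {
        m.group(1): (int(m.group(2)), int(m.group(3)))
        for m in QUEUE_RE.finditer(text)
    }

CPU_CORES = 8  # from the tl;dr in this post

for nic, (rx, tx) in queue_counts(DMESG).items():
    print(f"{nic}: {rx} RX / {tx} TX queues on {CPU_CORES} cores")
```

Here each NIC reports 4 RX and 4 TX queues on 8 cores, and in the top output in this post three of the four `if_io_tqg_*` threads carry nearly all of the load.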
-
@spacebass
Pretty much exactly the same on my end. I will now try to get in touch with our ISP again to make sure it's not their core router being the culprit here!
https://www.reddit.com/r/PFSENSE/comments/137iv07/comment/jj6oqw4/?utm_source=share&utm_medium=web2x&context=3
-
Are you guys using SATA on your hardware?
Remember there is a 6 Gbit/s limit on that when writing to the disk subsystem.
And I bet that is what you see.
In short... your NIC is pushing the limits of the disk subsystem.
-
It is, but pfSense should not write data to disk while transferring?!
Or better, not the data it is routing through!
-
@ogghi But you're downloading a file to test iperf. Guess where that is written?
-
@cool_corona if that were the case, hosts on the same network would also be bottlenecked.