hoping for 10Gbps, getting sub 1Gbps speed Xeon E3-1270 v5 3.6GHz
-
@keyser the good news is that I have Intel cards on order...
That said, I have a ton of these HP cards and have no problem getting 10Gbps on Linux-based boxes. It could be the driver, but qlxgb in FreeBSD is pretty tried and true.
-
Ah, I wasn't sure if the ql1 there was that card. OK, then at the very least I'd start by disabling all the hardware offloading options. But if you know exactly which driver it is, you can start looking for known bugs/workarounds. But yeah, if you can use an Intel NIC, you should.
-
@stephenw10 thanks - if I want to add the full (supposed) 8 queues, how would I go about it?
-
How many queues are you getting?
It's probably a sysctl or loader tunable for that driver. Without having one to test it's difficult to say.
Try
sysctl hw.ql
or maybe
sysctl hw.qlxgb
and see what values exist. Also check:
sysctl dev.ql.0
-
@stephenw10 said in hoping for 10Gbps, getting sub 1Gbps speed Xeon E3-1270 v5 3.6GHz:
sysctl dev.ql.0
That's the key... no mq options there, though, and I don't see any listed in the readme.txt in the driver's source.
I'll wait for the Intel NIC and see what I can get out of that.
-
How many queues is it using by default? If it's just 1 that would explain the single threaded performance.
-
@stephenw10 I'll fire the box up and check. Just curious, what am I looking for in the output of sysctl? I didn't see anything with 'mq' or 'queues' in the output when I first checked.
Perhaps related: what does it tell us that I get closer to 4Gbps with the -R flag on iperf3 (i.e., inbound) vs. 2Gbps without the flag?
I'm not overly determined to make this Qlogic card work, but this is a really good learning opportunity and I'm enjoying the process.
-
You might have more Rx queues than Tx queues, for example. Most drivers show that in the boot logs, but I'm not sure qlxgb does.
vmstat -i
may also show it.
-
@stephenw10
Thanks for the continued help...Here's what I see
dev.ql.0.wake: 0
dev.ql.0.num_sds_rings: 4
dev.ql.0.num_rds_rings: 2
dev.ql.0.free_pkt_thres: 1024
dev.ql.0.snd_pkt_thres: 16
dev.ql.0.rcv_pkt_thres_d: 32
dev.ql.0.rcv_pkt_thres: 128
dev.ql.0.jumbo_replenish: 2
dev.ql.0.std_replenish: 8
dev.ql.0.debug: 0
dev.ql.0.fw_version: 4.16.50.1401759177
dev.ql.0.stats: 0
dev.ql.0.%parent: pci2
dev.ql.0.%pnpinfo: vendor=0x1077 device=0x8020 subvendor=0x103c subdevice=0x3733 class=0x020000
dev.ql.0.%location: slot=0 function=0 dbsf=pci0:2:0:0 handle=\_SB_.PCI0.PEG1.PEGP
dev.ql.0.%driver: ql
dev.ql.0.%desc: Qlogic ISP 80xx PCI CNA Adapter-Ethernet Function v1.1.36
I'm having trouble finding much documentation online for this driver. Would snd_pkt_thres be the number of threads it is able to use, or is currently using, for the outbound queues?
Here's the output from a TrueNAS box with the same card, which has no trouble moving 10Gbps traffic:
dev.ql.0.%desc: Qlogic ISP 80xx PCI CNA Adapter-Ethernet Function v1.1.36
root@matterhorn[~]# sysctl sysctl dev.ql.1
dev.ql.1.num_sds_rings: 4
dev.ql.1.num_rds_rings: 2
dev.ql.1.free_pkt_thres: 1024
dev.ql.1.snd_pkt_thres: 16
dev.ql.1.rcv_pkt_thres_d: 32
dev.ql.1.rcv_pkt_thres: 128
dev.ql.1.jumbo_replenish: 2
dev.ql.1.std_replenish: 8
dev.ql.1.debug: 0
dev.ql.1.fw_version: 4.20.1.1429931003
dev.ql.1.stats: 0
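Comparing the two listings by eye is easy to get wrong. The Python sketch below lines up the OIDs both boxes report and prints the ones whose values differ. The helper names `parse_sysctl` and `diff_common` are hypothetical, and only the tunable lines from the two posts are pasted in. On this data only the firmware version differs, which by itself doesn't show that the firmware is the cause.

```python
def parse_sysctl(text: str) -> dict[str, str]:
    """Parse `sysctl` output ("oid: value" per line) into {leaf_name: value}."""
    values = {}
    for line in text.strip().splitlines():
        oid, sep, value = line.partition(": ")
        if not sep:
            continue  # ignore anything that is not an "oid: value" line
        # Strip the "dev.ql.N." prefix so different units compare by leaf name.
        values[oid.split(".", 3)[-1]] = value.strip()
    return values


def diff_common(a: dict[str, str], b: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Return {leaf: (a_value, b_value)} for OIDs present in both but different."""
    return {k: (a[k], b[k]) for k in sorted(a.keys() & b.keys()) if a[k] != b[k]}


# Tunable lines copied from the pfSense (dev.ql.0) and TrueNAS (dev.ql.1) posts.
PFSENSE = """\
dev.ql.0.num_sds_rings: 4
dev.ql.0.num_rds_rings: 2
dev.ql.0.free_pkt_thres: 1024
dev.ql.0.snd_pkt_thres: 16
dev.ql.0.rcv_pkt_thres_d: 32
dev.ql.0.rcv_pkt_thres: 128
dev.ql.0.jumbo_replenish: 2
dev.ql.0.std_replenish: 8
dev.ql.0.debug: 0
dev.ql.0.fw_version: 4.16.50.1401759177
dev.ql.0.stats: 0
"""

TRUENAS = """\
dev.ql.1.num_sds_rings: 4
dev.ql.1.num_rds_rings: 2
dev.ql.1.free_pkt_thres: 1024
dev.ql.1.snd_pkt_thres: 16
dev.ql.1.rcv_pkt_thres_d: 32
dev.ql.1.rcv_pkt_thres: 128
dev.ql.1.jumbo_replenish: 2
dev.ql.1.std_replenish: 8
dev.ql.1.debug: 0
dev.ql.1.fw_version: 4.20.1.1429931003
dev.ql.1.stats: 0
"""

if __name__ == "__main__":
    diffs = diff_common(parse_sysctl(PFSENSE), parse_sysctl(TRUENAS))
    for leaf, (pf, tn) in diffs.items():
        print(f"{leaf}: pfSense={pf} TrueNAS={tn}")
```

The same two functions work on any pair of `sysctl dev.<driver>.<unit>` dumps, since only the leaf name after the unit number is compared.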
-
Mmm, so likely those are the default values. There should be a description of each tunable if you run:
sysctl -d dev.ql.0
-
@stephenw10 said in hoping for 10Gbps, getting sub 1Gbps speed Xeon E3-1270 v5 3.6GHz:
sysctl -d dev.ql.0
Well, that's a helpful command! Thanks.
Still not seeing anything about queues, though :/
dev.ql.0:
dev.ql.0.wake: Device set to wake the system
dev.ql.0.num_sds_rings: Number of Status Descriptor Rings
dev.ql.0.num_rds_rings: Number of Rcv Descriptor Rings
dev.ql.0.free_pkt_thres: Threshold for # of packets to free at a time
dev.ql.0.snd_pkt_thres: Threshold for # of snd packets
dev.ql.0.rcv_pkt_thres_d: Threshold for # of rcv pkts to trigger indication defered
dev.ql.0.rcv_pkt_thres: Threshold for # of rcv pkts to trigger indication isr
dev.ql.0.jumbo_replenish: Threshold for Replenishing Jumbo Frames
dev.ql.0.std_replenish: Threshold for Replenishing Standard Frames
dev.ql.0.debug: Debug Level
dev.ql.0.fw_version: firmware version
dev.ql.0.stats: Statistics
dev.ql.0.%parent: parent device
dev.ql.0.%pnpinfo: device identification
dev.ql.0.%location: device location relative to parent
dev.ql.0.%driver: device driver name
dev.ql.0.%desc: device description
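Searching the descriptions rather than just the OID names makes the "nothing about queues" observation easy to check. This Python sketch (the `find_oids` helper is hypothetical; the tunable lines are copied from the `sysctl -d` output above) finds no OID mentioning "queue", and only the two descriptor-ring counts mentioning "ring". Whether those rings correspond to what other drivers call queues isn't stated in the descriptions.

```python
import re

# Tunable descriptions as printed by `sysctl -d dev.ql.0` in the post above.
SYSCTL_D = """\
dev.ql.0.wake: Device set to wake the system
dev.ql.0.num_sds_rings: Number of Status Descriptor Rings
dev.ql.0.num_rds_rings: Number of Rcv Descriptor Rings
dev.ql.0.free_pkt_thres: Threshold for # of packets to free at a time
dev.ql.0.snd_pkt_thres: Threshold for # of snd packets
dev.ql.0.rcv_pkt_thres_d: Threshold for # of rcv pkts to trigger indication defered
dev.ql.0.rcv_pkt_thres: Threshold for # of rcv pkts to trigger indication isr
dev.ql.0.jumbo_replenish: Threshold for Replenishing Jumbo Frames
dev.ql.0.std_replenish: Threshold for Replenishing Standard Frames
dev.ql.0.debug: Debug Level
dev.ql.0.fw_version: firmware version
dev.ql.0.stats: Statistics
"""


def find_oids(text: str, pattern: str) -> list[str]:
    """Return OIDs whose name or description matches `pattern` (case-insensitive)."""
    rx = re.compile(pattern, re.IGNORECASE)
    hits = []
    for line in text.splitlines():
        oid, sep, desc = line.partition(": ")
        if sep and (rx.search(oid) or rx.search(desc)):
            hits.append(oid)
    return hits


if __name__ == "__main__":
    print("queue:", find_oids(SYSCTL_D, r"queue"))
    print("ring: ", find_oids(SYSCTL_D, r"ring"))
```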
Also, interestingly, when I run iperf3 with -R, here's the top output on the pfSense box:
CPU 0: 0.0% user, 0.0% nice, 1.9% system, 86.5% interrupt, 11.6% idle
CPU 1: 0.4% user, 0.0% nice, 8.1% system, 2.7% interrupt, 88.8% idle
CPU 2: 0.0% user, 0.0% nice, 5.8% system, 54.8% interrupt, 39.4% idle
CPU 3: 0.4% user, 0.0% nice, 8.5% system, 6.6% interrupt, 84.6% idle
CPU 4: 0.8% user, 0.0% nice, 25.9% system, 6.2% interrupt, 67.2% idle
CPU 5: 0.4% user, 0.0% nice, 21.6% system, 7.3% interrupt, 70.7% idle
CPU 6: 0.4% user, 0.0% nice, 0.8% system, 51.4% interrupt, 47.5% idle
CPU 7: 0.4% user, 0.0% nice, 5.4% system, 8.9% interrupt, 85.3% idle
Mem: 305M Active, 201M Inact, 872M Wired, 14G Free
ARC: 404M Total, 63M MFU, 336M MRU, 32K Anon, 1218K Header, 3853K Other
122M Compressed, 283M Uncompressed, 2.31:1 Ratio
Swap: 1024M Total, 1024M Free
PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
12 root -92 - 0B 528K CPU0 0 0:32 90.84% [intr{irq265: ql0}]
11 root 155 ki31 0B 128K CPU1 1 45:28 87.99% [idle{idle: cpu1}]
11 root 155 ki31 0B 128K CPU7 7 45:32 86.71% [idle{idle: cpu7}]
11 root 155 ki31 0B 128K CPU3 3 45:28 83.73% [idle{idle: cpu3}]
11 root 155 ki31 0B 128K RUN 5 45:17 71.47% [idle{idle: cpu5}]
11 root 155 ki31 0B 128K CPU4 4 45:15 70.76% [idle{idle: cpu4}]
12 root -92 - 0B 528K CPU2 2 0:24 60.84% [intr{irq266: ql0}]
0 root -92 - 0B 1376K - 6 0:31 54.00% [kernel{ql1 txq}]
11 root 155 ki31 0B 128K RUN 6 44:26 48.31% [idle{idle: cpu6}]
12 root -92 - 0B 528K WAIT 6 0:49 43.46% [intr{irq268: ql1}]
11 root 155 ki31 0B 128K RUN 2 44:57 37.84% [idle{idle: cpu2}]
12 root -92 - 0B 528K WAIT 6 0:49 23.37% [intr{irq264: ql0}]
11 root 155 ki31 0B 128K RUN 0 45:02 12.88% [idle{idle: cpu0}]
0 root -92 - 0B 1376K - 7 0:07 12.82% [kernel{ql0 rcvq}]
0 root -92 - 0B 1376K - 1 0:12 5.35% [kernel{ql1 rcvq}]
99487 avahi 20 0 13M 4500K select 5 0:06 2.82% avahi-daemon: running [washington.local]
12 root -92 - 0B 528K WAIT 4 0:20 2.34% [intr{irq267: ql0}]
0 root -92 - 0B 1376K - 1 0:11 1.83% [kernel{ql0 txq}]
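A quick way to see where the interrupt load lands is to pull the "% interrupt" column out of top's per-CPU summary lines. In this Python sketch (the `interrupt_load` helper and the 50% cut-off are arbitrary choices for illustration), only CPUs 0, 2 and 6 are above 50% interrupt time, while the rest sit below 10%. That fits the thread listing, where the busiest interrupt thread (irq265: ql0) is on CPU0 at about 91%, and would be consistent with a few vectors doing most of the work, though one snapshot can't prove it.

```python
import re

# Per-CPU summary lines from the `top` output above (iperf3 -R running).
TOP_CPU = """\
CPU 0: 0.0% user, 0.0% nice, 1.9% system, 86.5% interrupt, 11.6% idle
CPU 1: 0.4% user, 0.0% nice, 8.1% system, 2.7% interrupt, 88.8% idle
CPU 2: 0.0% user, 0.0% nice, 5.8% system, 54.8% interrupt, 39.4% idle
CPU 3: 0.4% user, 0.0% nice, 8.5% system, 6.6% interrupt, 84.6% idle
CPU 4: 0.8% user, 0.0% nice, 25.9% system, 6.2% interrupt, 67.2% idle
CPU 5: 0.4% user, 0.0% nice, 21.6% system, 7.3% interrupt, 70.7% idle
CPU 6: 0.4% user, 0.0% nice, 0.8% system, 51.4% interrupt, 47.5% idle
CPU 7: 0.4% user, 0.0% nice, 5.4% system, 8.9% interrupt, 85.3% idle
"""


def interrupt_load(text: str) -> dict[int, float]:
    """Map CPU number -> "% interrupt" from top's per-CPU summary lines."""
    loads = {}
    for line in text.splitlines():
        head, sep, rest = line.partition(":")
        if not sep or not head.startswith("CPU "):
            continue
        # Each field looks like "86.5% interrupt"; build {"interrupt": 86.5, ...}.
        fields = {name: float(pct) for pct, name in re.findall(r"([\d.]+)% (\w+)", rest)}
        loads[int(head.split()[1])] = fields["interrupt"]
    return loads


if __name__ == "__main__":
    loads = interrupt_load(TOP_CPU)
    busy = [cpu for cpu, pct in loads.items() if pct >= 50.0]
    print("CPUs with >= 50% interrupt time:", busy)
```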
-
Well, at least 4 IRQs for ql0 there. Does
vmstat -i
show those? Nothing about queues there, I agree. That's the sort of setting that would usually be a loader tunable, though. Those usually appear under hw.ql, but only values that have been set are shown.
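The "at least 4 IRQs" count can be read straight off the interrupt thread names in the earlier top listing. This Python sketch (the `irqs_per_device` helper is hypothetical; the lines are copied from that listing) groups the `irqNNN: device` labels by device and gives ql0 four vectors (264 to 267) and ql1 one visible vector (268). The same pattern should also pick out `vmstat -i` lines that begin with `irqNNN: ql0`, but that assumes the driver labels its vectors the same way there.

```python
import re
from collections import defaultdict

# Thread lines copied from the `top` listing above; the kernel{ql1 txq} line
# is included to show that non-interrupt threads are ignored.
TOP_THREADS = """\
12 root -92 - 0B 528K CPU0 0 0:32 90.84% [intr{irq265: ql0}]
12 root -92 - 0B 528K CPU2 2 0:24 60.84% [intr{irq266: ql0}]
0 root -92 - 0B 1376K - 6 0:31 54.00% [kernel{ql1 txq}]
12 root -92 - 0B 528K WAIT 6 0:49 43.46% [intr{irq268: ql1}]
12 root -92 - 0B 528K WAIT 6 0:49 23.37% [intr{irq264: ql0}]
12 root -92 - 0B 528K WAIT 4 0:20 2.34% [intr{irq267: ql0}]
"""

# "irq265: ql0" -> ("265", "ql0"); the device is letters followed by a unit number.
IRQ = re.compile(r"irq(\d+): ([a-z]+\d+)")


def irqs_per_device(text: str) -> dict[str, list[int]]:
    """Group IRQ numbers by the device name that follows them."""
    seen = defaultdict(set)
    for irq, dev in IRQ.findall(text):
        seen[dev].add(int(irq))
    return {dev: sorted(irqs) for dev, irqs in sorted(seen.items())}


if __name__ == "__main__":
    for dev, irqs in irqs_per_device(TOP_THREADS).items():
        print(f"{dev}: {len(irqs)} vector(s) {irqs}")
```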
-
Intel X520-DA2 update
tl;dr: better performance for sure, but still not 10Gbps. 8 CPU cores, each NIC using 4 queues.
I'm increasingly of the opinion that even with a beefy CPU, pfSense just doesn't like doing 10Gbps.
iperf3
iperf3 -c ISP's server -P 10
[SUM] 0.00-10.00 sec 5.64 GBytes 4.84 Gbits/sec 1455 sender
[SUM] 0.00-10.03 sec 5.63 GBytes 4.82 Gbits/sec receiver
iperf3 -c ISP's server -P 10 -R
[SUM] 0.00-10.03 sec 5.18 GBytes 4.43 Gbits/sec 4033 sender
[SUM] 0.00-10.00 sec 5.14 GBytes 4.42 Gbits/sec receiver
iperf3 -c local server on other vLAN -P 18
[SUM] 0.00-10.00 sec 5.40 GBytes 4.64 Gbits/sec 11944 sender
[SUM] 0.00-10.01 sec 5.39 GBytes 4.62 Gbits/sec receiver
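As a sanity check on the [SUM] lines, the transfer and interval can be turned back into a bitrate. The reported figures are consistent with iperf3 counting GBytes in 1024-based units and Gbits/sec in 1000-based units; this Python sketch (the `gbits_per_sec` helper is hypothetical) reproduces every reported rate to within rounding. All three runs land between about 4.4 and 4.8 Gbit/s regardless of direction or stream count.

```python
GIB = 2 ** 30  # iperf3 prints transfer sizes ("GBytes") in 1024-based units


def gbits_per_sec(gbytes: float, seconds: float) -> float:
    """Convert an iperf3 Transfer (GBytes) over an interval into decimal Gbit/s."""
    return gbytes * GIB * 8 / seconds / 1e9


# (test, side, GBytes, seconds, reported Gbits/sec) copied from the [SUM] lines above
RESULTS = [
    ("-P 10", "sender", 5.64, 10.00, 4.84),
    ("-P 10", "receiver", 5.63, 10.03, 4.82),
    ("-P 10 -R", "sender", 5.18, 10.03, 4.43),
    ("-P 10 -R", "receiver", 5.14, 10.00, 4.42),
    ("-P 18 (vLAN)", "sender", 5.40, 10.00, 4.64),
    ("-P 18 (vLAN)", "receiver", 5.39, 10.01, 4.62),
]

if __name__ == "__main__":
    for test, side, gb, secs, reported in RESULTS:
        computed = gbits_per_sec(gb, secs)
        print(f"{test:<13} {side:<8} computed {computed:.2f}  reported {reported:.2f}")
```

Small mismatches in the second decimal come from the GBytes figure itself being rounded to two decimals in iperf3's output.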
top
last pid: 52809; load averages: 0.71, 0.41, 0.36 up 0+00:28:16 16:09:59
742 threads: 10 running, 699 sleeping, 33 waiting
CPU 0: 0.0% user, 0.0% nice, 31.1% system, 0.0% interrupt, 68.9% idle
CPU 1: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
CPU 2: 0.4% user, 0.0% nice, 3.9% system, 0.0% interrupt, 95.7% idle
CPU 3: 0.4% user, 0.0% nice, 0.4% system, 0.0% interrupt, 99.2% idle
CPU 4: 0.0% user, 0.0% nice, 75.2% system, 0.0% interrupt, 24.8% idle
CPU 5: 0.4% user, 0.0% nice, 0.0% system, 0.0% interrupt, 99.6% idle
CPU 6: 0.4% user, 0.0% nice, 75.6% system, 0.0% interrupt, 24.0% idle
CPU 7: 0.0% user, 0.0% nice, 0.4% system, 0.0% interrupt, 99.6% idle
Mem: 297M Active, 178M Inact, 920M Wired, 14G Free
ARC: 384M Total, 53M MFU, 326M MRU, 32K Anon, 1086K Header, 3216K Other
118M Compressed, 268M Uncompressed, 2.27:1 Ratio
Swap: 1024M Total, 1024M Free
PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
11 root 155 ki31 0B 128K CPU7 7 27:56 99.89% [idle{idle: cpu7}]
11 root 155 ki31 0B 128K CPU5 5 27:55 99.73% [idle{idle: cpu5}]
11 root 155 ki31 0B 128K CPU3 3 27:56 99.16% [idle{idle: cpu3}]
11 root 155 ki31 0B 128K CPU1 1 27:55 99.14% [idle{idle: cpu1}]
11 root 155 ki31 0B 128K RUN 2 26:31 95.65% [idle{idle: cpu2}]
0 root -76 - 0B 1376K - 6 1:13 78.85% [kernel{if_io_tqg_6}]
0 root -76 - 0B 1376K CPU4 4 1:16 72.63% [kernel{if_io_tqg_4}]
11 root 155 ki31 0B 128K RUN 0 26:34 69.36% [idle{idle: cpu0}]
0 root -76 - 0B 1376K - 0 1:27 30.30% [kernel{if_io_tqg_0}]
11 root 155 ki31 0B 128K RUN 4 26:34 27.30% [idle{idle: cpu4}]
11 root 155 ki31 0B 128K CPU6 6 26:57 21.08% [idle{idle: cpu6}]
0 root -76 - 0B 1376K - 2 1:40 3.32% [kernel{if_io_tqg_2}]
67535 unbound 20 0 107M 54M kqread 4 0:02 0.40% /usr/local/sbin/unbound -c /var/unbox
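The if_io_tqg_N threads appear to be iflib's per-CPU I/O taskqueue threads, which is where the ix driver does its packet processing on recent FreeBSD. This Python sketch (the `taskqueue_load` helper and the 50% threshold are hypothetical choices; the lines are copied from the listing above) pulls out their WCPU: two of the four are at roughly 73 to 79%, while the others are at 30% and 3%. So the work is not spread evenly across the 4 queues, though why (for example, how the flows hash onto queues) can't be told from one top snapshot.

```python
import re

# Taskqueue and idle thread lines copied from the `top` listing above.
TOP_THREADS = """\
0 root -76 - 0B 1376K - 6 1:13 78.85% [kernel{if_io_tqg_6}]
0 root -76 - 0B 1376K CPU4 4 1:16 72.63% [kernel{if_io_tqg_4}]
0 root -76 - 0B 1376K - 0 1:27 30.30% [kernel{if_io_tqg_0}]
11 root 155 ki31 0B 128K CPU6 6 26:57 21.08% [idle{idle: cpu6}]
0 root -76 - 0B 1376K - 2 1:40 3.32% [kernel{if_io_tqg_2}]
"""

# "78.85% [kernel{if_io_tqg_6}]" -> ("78.85", "if_io_tqg_6")
TQG = re.compile(r"([\d.]+)% \[kernel\{(if_io_tqg_\d+)\}\]")


def taskqueue_load(text: str, busy_threshold: float = 50.0):
    """Return ({thread: WCPU%}, [threads at or above busy_threshold])."""
    loads = {name: float(pct) for pct, name in TQG.findall(text)}
    busy = sorted(name for name, pct in loads.items() if pct >= busy_threshold)
    return loads, busy


if __name__ == "__main__":
    loads, busy = taskqueue_load(TOP_THREADS)
    for name, pct in loads.items():
        print(f"{name}: {pct:.2f}%")
    print("busy:", busy)
```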
dmesg
root: dmesg | grep queues
ix0: Using 4 RX queues 4 TX queues
ix0: allocated for 4 queues
ix0: allocated for 4 rx queues
ix0: netmap queues/slots: TX 4/2048, RX 4/2048
ix1: Using 4 RX queues 4 TX queues
ix1: allocated for 4 queues
ix1: allocated for 4 rx queues
ix1: netmap queues/slots: TX 4/2048, RX 4/2048
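When comparing boxes or NICs, it can help to pull the queue counts out of the boot messages rather than reading them by eye. This Python sketch (the `queue_counts` helper is hypothetical) matches the "Using N RX queues N TX queues" wording seen in the dmesg output above; other drivers word this differently, and as noted earlier in the thread, qlxgb may not print it at all.

```python
import re

# Boot messages copied from the `dmesg | grep queues` output above.
DMESG = """\
ix0: Using 4 RX queues 4 TX queues
ix0: allocated for 4 queues
ix0: allocated for 4 rx queues
ix0: netmap queues/slots: TX 4/2048, RX 4/2048
ix1: Using 4 RX queues 4 TX queues
ix1: allocated for 4 queues
ix1: allocated for 4 rx queues
ix1: netmap queues/slots: TX 4/2048, RX 4/2048
"""

# Only the "Using N RX queues N TX queues" lines carry both counts.
USING = re.compile(r"^(\w+): Using (\d+) RX queues (\d+) TX queues", re.MULTILINE)


def queue_counts(text: str) -> dict[str, tuple[int, int]]:
    """Map interface -> (rx_queues, tx_queues)."""
    return {dev: (int(rx), int(tx)) for dev, rx, tx in USING.findall(text)}


if __name__ == "__main__":
    for dev, (rx, tx) in queue_counts(DMESG).items():
        print(f"{dev}: {rx} RX / {tx} TX queues")
```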
-
@spacebass
Pretty much exactly the same on my end. I will now try to get in touch with our ISP again to make sure their core router isn't the culprit here!
https://www.reddit.com/r/PFSENSE/comments/137iv07/comment/jj6oqw4/?utm_source=share&utm_medium=web2x&context=3
-
Are you guys using SATA on your hardware??
Remember there is a 6Gbit/s limit to that when writing to the disk subsystem.
And I bet that is what you see.
In short... your NIC is pushing the limits of the disk subsystem.
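For reference, the 6Gbit/s figure is SATA III's line rate. With 8b/10b encoding, 8 of every 10 bits on the wire are data, so the usable payload works out as in this plain arithmetic sketch (the `sata3_payload_gbps` helper is hypothetical, and command/framing overhead beyond the encoding is ignored). Whether a disk is involved at all depends on how the test is run: iperf3 only reads from or writes to a file when given -F/--file, and otherwise sends generated data and discards what it receives.

```python
SATA3_LINE_RATE_GBPS = 6.0  # SATA revision 3.x signalling rate on the wire


def sata3_payload_gbps(line_rate_gbps: float = SATA3_LINE_RATE_GBPS) -> float:
    """Usable data rate after 8b/10b encoding (8 data bits per 10 line bits)."""
    return line_rate_gbps * 8 / 10


if __name__ == "__main__":
    gbps = sata3_payload_gbps()
    print(f"{gbps:.1f} Gbit/s of payload = {gbps * 1000 / 8:.0f} MB/s")
```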
-
It is, but pfSense should not write data to disk while transferring?!
Or rather, not the data it is routing through!
-
@ogghi But you're downloading a file to test iperf. Guess where that is written?
-
@cool_corona if that were the case, hosts on the same network would also be bottlenecked.
-
@cool_corona Don’t run speed tests on pfSense if at all possible; use a host behind it. Then it (also) isn’t using CPU cycles on the test.
-
@spacebass Why?? Doesn't it pass through pfSense?