hoping for 10Gbps, getting sub 1Gbps speed Xeon E3-1270 v5 3.6GHz
-
Mmm, so likely those are the default values. There should be a description of each tunable if you run:
sysctl -d dev.ql.0
-
@stephenw10 said in hoping for 10Gbps, getting sub 1Gbps speed Xeon E3-1270 v5 3.6GHz:
sysctl -d dev.ql.0
well that's a helpful command! thanks
still not seeing anything about queues though :/
dev.ql.0:
dev.ql.0.wake: Device set to wake the system
dev.ql.0.num_sds_rings: Number of Status Descriptor Rings
dev.ql.0.num_rds_rings: Number of Rcv Descriptor Rings
dev.ql.0.free_pkt_thres: Threshold for # of packets to free at a time
dev.ql.0.snd_pkt_thres: Threshold for # of snd packets
dev.ql.0.rcv_pkt_thres_d: Threshold for # of rcv pkts to trigger indication defered
dev.ql.0.rcv_pkt_thres: Threshold for # of rcv pkts to trigger indication isr
dev.ql.0.jumbo_replenish: Threshold for Replenishing Jumbo Frames
dev.ql.0.std_replenish: Threshold for Replenishing Standard Frames
dev.ql.0.debug: Debug Level
dev.ql.0.fw_version: firmware version
dev.ql.0.stats: Statistics
dev.ql.0.%parent: parent device
dev.ql.0.%pnpinfo: device identification
dev.ql.0.%location: device location relative to parent
dev.ql.0.%driver: device driver name
dev.ql.0.%desc: device description
Also, interesting, when I do an iperf3 with -R here's the top output on the pfSense box...
CPU 0: 0.0% user, 0.0% nice, 1.9% system, 86.5% interrupt, 11.6% idle
CPU 1: 0.4% user, 0.0% nice, 8.1% system, 2.7% interrupt, 88.8% idle
CPU 2: 0.0% user, 0.0% nice, 5.8% system, 54.8% interrupt, 39.4% idle
CPU 3: 0.4% user, 0.0% nice, 8.5% system, 6.6% interrupt, 84.6% idle
CPU 4: 0.8% user, 0.0% nice, 25.9% system, 6.2% interrupt, 67.2% idle
CPU 5: 0.4% user, 0.0% nice, 21.6% system, 7.3% interrupt, 70.7% idle
CPU 6: 0.4% user, 0.0% nice, 0.8% system, 51.4% interrupt, 47.5% idle
CPU 7: 0.4% user, 0.0% nice, 5.4% system, 8.9% interrupt, 85.3% idle
Mem: 305M Active, 201M Inact, 872M Wired, 14G Free
ARC: 404M Total, 63M MFU, 336M MRU, 32K Anon, 1218K Header, 3853K Other
122M Compressed, 283M Uncompressed, 2.31:1 Ratio
Swap: 1024M Total, 1024M Free
PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
12 root -92 - 0B 528K CPU0 0 0:32 90.84% [intr{irq265: ql0}]
11 root 155 ki31 0B 128K CPU1 1 45:28 87.99% [idle{idle: cpu1}]
11 root 155 ki31 0B 128K CPU7 7 45:32 86.71% [idle{idle: cpu7}]
11 root 155 ki31 0B 128K CPU3 3 45:28 83.73% [idle{idle: cpu3}]
11 root 155 ki31 0B 128K RUN 5 45:17 71.47% [idle{idle: cpu5}]
11 root 155 ki31 0B 128K CPU4 4 45:15 70.76% [idle{idle: cpu4}]
12 root -92 - 0B 528K CPU2 2 0:24 60.84% [intr{irq266: ql0}]
0 root -92 - 0B 1376K - 6 0:31 54.00% [kernel{ql1 txq}]
11 root 155 ki31 0B 128K RUN 6 44:26 48.31% [idle{idle: cpu6}]
12 root -92 - 0B 528K WAIT 6 0:49 43.46% [intr{irq268: ql1}]
11 root 155 ki31 0B 128K RUN 2 44:57 37.84% [idle{idle: cpu2}]
12 root -92 - 0B 528K WAIT 6 0:49 23.37% [intr{irq264: ql0}]
11 root 155 ki31 0B 128K RUN 0 45:02 12.88% [idle{idle: cpu0}]
0 root -92 - 0B 1376K - 7 0:07 12.82% [kernel{ql0 rcvq}]
0 root -92 - 0B 1376K - 1 0:12 5.35% [kernel{ql1 rcvq}]
99487 avahi 20 0 13M 4500K select 5 0:06 2.82% avahi-daemon: running [washington.local]
12 root -92 - 0B 528K WAIT 4 0:20 2.34% [intr{irq267: ql0}]
0 root -92 - 0B 1376K - 1 0:11 1.83% [kernel{ql0 txq}]
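If you want to compare runs without eyeballing the per-CPU lines each time, something like the following rough Python sketch could pull the interrupt and busy percentages out of saved top output. It assumes the per-CPU line layout shown in this post; the function name and the regex are my own and not part of any pfSense tooling.

```python
import re

# Matches the per-CPU summary lines as pasted in this thread, e.g.
# "CPU 0: 0.0% user, 0.0% nice, 1.9% system, 86.5% interrupt, 11.6% idle"
CPU_LINE = re.compile(
    r"CPU\s+(\d+):\s+([\d.]+)% user,\s+([\d.]+)% nice,\s+([\d.]+)% system,"
    r"\s+([\d.]+)% interrupt,\s+([\d.]+)% idle"
)


def per_cpu_load(top_text):
    """Return {cpu: {"interrupt": pct, "busy": pct}} for each CPU line found."""
    loads = {}
    for m in CPU_LINE.finditer(top_text):
        cpu = int(m.group(1))
        interrupt = float(m.group(5))
        idle = float(m.group(6))
        # "busy" is simply everything that is not idle time.
        loads[cpu] = {"interrupt": interrupt, "busy": round(100.0 - idle, 1)}
    return loads


# First three CPU lines from the output above.
sample = """
CPU 0: 0.0% user, 0.0% nice, 1.9% system, 86.5% interrupt, 11.6% idle
CPU 1: 0.4% user, 0.0% nice, 8.1% system, 2.7% interrupt, 88.8% idle
CPU 2: 0.0% user, 0.0% nice, 5.8% system, 54.8% interrupt, 39.4% idle
"""

loads = per_cpu_load(sample)
busiest = max(loads, key=lambda c: loads[c]["busy"])
print("busiest core:", busiest, loads[busiest])
```

A single core carrying most of the interrupt load while the others sit idle is the pattern to look for here.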
-
Well, at least 4 IRQs for ql0 there. Does
vmstat -i
show those?
Nothing about queues there, I agree. That's the sort of setting that would usually be a loader value though. Those are usually shown in hw.ql, but only values that are set are shown.
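For anyone who wants to count the IRQs per NIC from vmstat -i rather than reading them off by hand, here is a minimal Python sketch. It assumes the usual FreeBSD layout of "irqN: device  total  rate" per line; the IRQ numbers in the sample are the ones visible in the top output in this thread, but the totals and rates are invented placeholders, not real measurements.

```python
import re
from collections import defaultdict

# Matches interrupt lines such as "irq265: ql0    123456    789".
# The device name is read up to the first whitespace or ":" so that a
# per-queue suffix (for example "ix0:rxq0") still groups under "ix0".
IRQ_LINE = re.compile(r"^irq(\d+):\s+([^\s:]+)\S*\s+(\d+)\s+(\d+)\s*$")


def irqs_per_device(vmstat_text):
    """Group IRQ numbers by device name from `vmstat -i` style output."""
    by_dev = defaultdict(list)
    for line in vmstat_text.splitlines():
        m = IRQ_LINE.match(line.strip())
        if m:
            by_dev[m.group(2)].append(int(m.group(1)))
    return dict(by_dev)


# Illustrative input only: totals and rates are made up.
sample = """
interrupt                          total       rate
irq264: ql0                         1000         10
irq265: ql0                         2000         20
irq266: ql0                         3000         30
irq267: ql0                         4000         40
irq268: ql1                         5000         50
Total                              15000        150
"""

for dev, irqs in irqs_per_device(sample).items():
    print(dev, len(irqs), "IRQ(s):", irqs)
```

More than one IRQ per port suggests the driver has set up several interrupt vectors, though that alone does not prove traffic is spread evenly across them.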
-
Intel X520-da2 update
tl;dr better performance for sure, but still not 10Gbps. 8 CPU cores, each NIC using 4 queues.
I'm increasingly of the opinion that, even with a beefy CPU, pfSense just doesn't like doing 10Gbps
iperf3
iperf3 -c ISP's server -P 10
[SUM] 0.00-10.00 sec 5.64 GBytes 4.84 Gbits/sec 1455 sender
[SUM] 0.00-10.03 sec 5.63 GBytes 4.82 Gbits/sec receiver
iperf3 -c ISP's server -P 10 -R
[SUM] 0.00-10.03 sec 5.18 GBytes 4.43 Gbits/sec 4033 sender
[SUM] 0.00-10.00 sec 5.14 GBytes 4.42 Gbits/sec receiver
iperf3 -c local server on other vLAN -P 18
[SUM] 0.00-10.00 sec 5.40 GBytes 4.64 Gbits/sec 11944 sender
[SUM] 0.00-10.01 sec 5.39 GBytes 4.62 Gbits/sec receiver
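As a sanity check on the [SUM] lines, the transfer and bitrate columns can be reconciled with a few lines of Python. This is only a sketch: it relies on iperf3 reporting GBytes in units of 2^30 bytes and Gbits/sec in units of 10^9 bits, and the labels are my own shorthand for the runs above.

```python
def gbits_per_sec(gbytes, seconds):
    """Convert an iperf3 transfer size (GBytes, 2**30 bytes) to Gbit/s (10**9 bits)."""
    bits = gbytes * 2**30 * 8
    return bits / seconds / 1e9


# (label, transfer in GBytes, interval in seconds) taken from the sender lines above
runs = [
    ("WAN sender", 5.64, 10.00),
    ("WAN -R sender", 5.18, 10.03),
    ("inter-VLAN sender", 5.40, 10.00),
]

for label, gbytes, secs in runs:
    print(f"{label}: {gbits_per_sec(gbytes, secs):.2f} Gbit/s")
```

The recomputed figures should agree with the reported bitrates to within rounding of the displayed GBytes value, which confirms all three tests land at roughly half of line rate.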
top
last pid: 52809; load averages: 0.71, 0.41, 0.36 up 0+00:28:16 16:09:59
742 threads: 10 running, 699 sleeping, 33 waiting
CPU 0: 0.0% user, 0.0% nice, 31.1% system, 0.0% interrupt, 68.9% idle
CPU 1: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
CPU 2: 0.4% user, 0.0% nice, 3.9% system, 0.0% interrupt, 95.7% idle
CPU 3: 0.4% user, 0.0% nice, 0.4% system, 0.0% interrupt, 99.2% idle
CPU 4: 0.0% user, 0.0% nice, 75.2% system, 0.0% interrupt, 24.8% idle
CPU 5: 0.4% user, 0.0% nice, 0.0% system, 0.0% interrupt, 99.6% idle
CPU 6: 0.4% user, 0.0% nice, 75.6% system, 0.0% interrupt, 24.0% idle
CPU 7: 0.0% user, 0.0% nice, 0.4% system, 0.0% interrupt, 99.6% idle
Mem: 297M Active, 178M Inact, 920M Wired, 14G Free
ARC: 384M Total, 53M MFU, 326M MRU, 32K Anon, 1086K Header, 3216K Other
118M Compressed, 268M Uncompressed, 2.27:1 Ratio
Swap: 1024M Total, 1024M Free
PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
11 root 155 ki31 0B 128K CPU7 7 27:56 99.89% [idle{idle: cpu7}]
11 root 155 ki31 0B 128K CPU5 5 27:55 99.73% [idle{idle: cpu5}]
11 root 155 ki31 0B 128K CPU3 3 27:56 99.16% [idle{idle: cpu3}]
11 root 155 ki31 0B 128K CPU1 1 27:55 99.14% [idle{idle: cpu1}]
11 root 155 ki31 0B 128K RUN 2 26:31 95.65% [idle{idle: cpu2}]
0 root -76 - 0B 1376K - 6 1:13 78.85% [kernel{if_io_tqg_6}]
0 root -76 - 0B 1376K CPU4 4 1:16 72.63% [kernel{if_io_tqg_4}]
11 root 155 ki31 0B 128K RUN 0 26:34 69.36% [idle{idle: cpu0}]
0 root -76 - 0B 1376K - 0 1:27 30.30% [kernel{if_io_tqg_0}]
11 root 155 ki31 0B 128K RUN 4 26:34 27.30% [idle{idle: cpu4}]
11 root 155 ki31 0B 128K CPU6 6 26:57 21.08% [idle{idle: cpu6}]
0 root -76 - 0B 1376K - 2 1:40 3.32% [kernel{if_io_tqg_2}]
67535 unbound 20 0 107M 54M kqread 4 0:02 0.40% /usr/local/sbin/unbound -c /var/unbox
dmesg
root: dmesg | grep queues
ix0: Using 4 RX queues 4 TX queues
ix0: allocated for 4 queues
ix0: allocated for 4 rx queues
ix0: netmap queues/slots: TX 4/2048, RX 4/2048
ix1: Using 4 RX queues 4 TX queues
ix1: allocated for 4 queues
ix1: allocated for 4 rx queues
ix1: netmap queues/slots: TX 4/2048, RX 4/2048
-
@spacebass
Pretty much exactly the same on my end. I will now try to get in touch with our ISP again to make sure it's not their core router being the culprit here!
https://www.reddit.com/r/PFSENSE/comments/137iv07/comment/jj6oqw4/?utm_source=share&utm_medium=web2x&context=3
-
Are you guys using SATA on your hardware??
Remember there is a 6 Gbit/s limit to that when writing to the disk subsystem.
And I bet that is what you see.
In short... your NIC is pushing the limits of the disk subsystem.
-
It is, but pfSense should not write data to disk while transferring?!
Or better, not the data it is routing through!
-
@ogghi But you're downloading a file to test iperf. Guess where that is written?
-
@cool_corona if that were the case, hosts on the same network would also be bottlenecked.
-
@cool_corona Don’t run speed tests on pfSense if at all possible, use a host behind it. Then it (also) isn’t using CPU cycles on the test.
-
@spacebass Why?? Doesn't it pass through pfSense?
-
@cool_corona It's a single NIC route. If you want to test throughput you should test the THROUGH part of it
-
Hardware
pfSense box
Dell R230
Xeon E3-1270 v5 @ 3.6GHz
16GB
2x Samsung 850 SSD in ZFS redundant pool
HP NC523SFP NIC in PCIe port 2 (which I believe is full 16 lanes)
switches & cables & optics
UniFi aggregation 10G switches
Intel 850nm SFP+ optics
mm patch cables (same ones used to get faster results with 6100)
Testing
iperf3:
iperf3 -c server.fqdn.foo.bar -P 10
iperf3 -c server.fqdn.foo.bar -P 10 -R
iperf3 -c server.fqdn.foo.bar -P 10 -6
As I see it, when tested on LAN the traffic never reaches pfSense.
It's only throughput on pfSense that's the issue, and it could be disk subsystem related on the pfSense hardware if offloading is disabled.
-
Hmm, I wouldn't expect anything to be written to disk there unless something is misconfigured, somehow using swap maybe. You should see that in iostat if it was though.
It's clearly not a CPU limit with those numbers. No core is close to 100%.
-
@spacebass exactly, that's what I am doing, using a host behind!
-
@ogghi for what it’s worth, I determined those HP NICs just aren’t great in FreeBSD. It’s unclear how many queues they use and the driver doesn’t seem to support any kind of manual or dynamic assignment.
I moved to an Intel NIC in that same box and am now getting closer to 8Gbps.
-
@spacebass To be sure, our provider will actually change their core router in our office, so let's see. Maybe it's not our pfSense's issue after all xD
-
“Make sure you have multiple queues in attached for each NIC.”
And how do we do that?
-
@jimbob-indiana said in hoping for 10Gbps, getting sub 1Gbps speed Xeon E3-1270 v5 3.6GHz:
And how do we do that?
What I think I've learned is that it is both NIC and driver dependent. For instance, now that I've moved to an Intel NIC, at boot (via dmesg) I can see that the system automatically assigns tx and rx queues based on the number of CPU cores I have.
On the HP NC523SFP NIC's driver, there does not seem to be any way to set or have the system manually assign queues.
-
Most NICs will enable multiple queues by default if it's possible. You will usually see 1 Rx and 1 Tx queue per CPU core up to a limit defined by the NIC chip. Usually 4 or 8.
However, some NICs will use 1 queue by default, notably vmx, and most can be configured to use just one or might be detecting something incorrectly. So if you see poor throughput you should check to see how many queues are in use. Most drivers report it in the boot log.
ix0: <Intel(R) X553 N (SFP+)> mem 0x80400000-0x805fffff,0x80604000-0x80607fff at device 0.0 on pci9
ix0: Using 2048 TX descriptors and 2048 RX descriptors
ix0: Using 8 RX queues 8 TX queues
ix0: Using MSI-X interrupts with 9 vectors
ix0: allocated for 8 queues
ix0: allocated for 8 rx queues
ix0: Ethernet address: 00:08:a2:12:e2:ca
ix0: eTrack 0x8000084b PHY FW V65535
ix0: netmap queues/slots: TX 8/2048, RX 8/2048
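If you have several boxes to check, the queue counts can be pulled out of a saved boot log with a short script. The Python sketch below assumes the "Using N RX queues N TX queues" wording that the ix driver prints in this thread; other drivers may word it differently or not report it at all, and both function names are hypothetical.

```python
import re

# Matches lines such as "ix0: Using 8 RX queues 8 TX queues".
QUEUE_LINE = re.compile(r"^(\w+): Using (\d+) RX queues (\d+) TX queues\s*$")


def queues_from_dmesg(dmesg_text):
    """Return {nic: (rx_queues, tx_queues)} for every matching boot-log line."""
    result = {}
    for line in dmesg_text.splitlines():
        m = QUEUE_LINE.match(line.strip())
        if m:
            result[m.group(1)] = (int(m.group(2)), int(m.group(3)))
    return result


def single_queue_nics(queues):
    """List NICs that ended up with only one RX or one TX queue."""
    return sorted(nic for nic, (rx, tx) in queues.items() if rx <= 1 or tx <= 1)


# Lines as posted earlier in this thread for the X520.
sample = """
ix0: Using 4 RX queues 4 TX queues
ix0: allocated for 4 queues
ix1: Using 4 RX queues 4 TX queues
ix1: allocated for 4 queues
"""

queues = queues_from_dmesg(sample)
print(queues)
print("single-queue NICs:", single_queue_nics(queues))
```

On pfSense you would feed it the text from dmesg or /var/log/dmesg.boot; an empty single-queue list only tells you the driver asked for multiple queues, not that the traffic is actually being spread across them.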