10GbE Tuning?
-
You must be hitting some limit. Are the NICs connecting at 10Gbps? Are you seeing errors on the interface? What does your CPU usage look like? Large interrupt load?
Steve
-
@irj972:
My throughput completely sucks right now... I'm seeing 600 Mbps (you read that right, not even 1 gig) when testing with iperf from my desktop to my pfSense router. I've applied the calomel tips and tricks re buffers etc. and am still seeing sucky perf, so I need to do some debugging for sure. I'm dreaming of the lofty heights of a 2-gig connection right now!
BTW, this guy nails 9.x Gbps > https://forum.pfsense.org/index.php?topic=77144.msg435304#msg435304
FYI, I'm using an A1SRM-2758F board with an Intel X520, SFP+ optical cables, etc. I'm still limited to 600 Mbps on a gigabit Ethernet Cat6 wire to my quad i350 too.
pfSense 2.2 will have better multi-core, multi-stream performance. Your Atom CPU has poor single-thread performance, even though it should have decent aggregate throughput.
I'm getting 980 Mbps, or ~1.5 Gbps with a bidirectional test, with iperf through pfSense NAT, all at 7.7% CPU load and with no tweaking. The performance is limited entirely by the integrated NICs in my two test computers.
-
It still has almost double the single-thread rating of, say, a D525, which can itself manage close to 600 Mbps throughput.
This test used the pfSense box as the end point, though, so they are not comparable.
Steve
-
Steve, did you get anywhere with this?
I also just ran some iperf tests. I have an Atom D2550, and it's also maxing out at ~450-500 Mbps when I do UDP from my pfSense box. I see the CPU staying right at 25-27% load during tests. I'm thinking this is limited by iperf's single thread on the Atom.
Interestingly enough, I have a Lenovo T440 laptop with Win7, and when I run the UDP test from that (Intel NIC) it also maxes out at 450-500 Mbps.
I'm not sure what to make of that. Maybe an issue with 2.0.x iperf?
-Dmitri
-
Run 'top -SH' at the console to see how the usage breaks down across the cores.
How are the NICs connected? If they're PCI you might hit a bottleneck there.
Try running a test through pfSense instead of using it as an end-point.
The previous user who got greater than 600 Mbps through his Atom had to make some tweaks. I forget the details, but I think he disabled some PCI power-saving options in the BIOS.
You could try enabling IP fast-forwarding if you're not using IPsec.
Steve
-
I have embedded Broadcom NICs, not PCI.
Unfortunately, I don't have enough (or powerful enough) equipment to push 1 Gbps through the pfSense box. I have a Lenovo T440 with an i5 but, as I said in my previous post, I can't saturate 1 Gbps via iperf on it either (it should be able to; maybe it's a Win7 issue or something). I also have a NAS, but it has a very slow processor, and a MacBook Air, but without a gigabit adapter (Wi-Fi only).
So, using what I've got: pfSense -> Lenovo, TCP window size of 128 KB:
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 1.0 sec 37.6 MBytes 316 Mbits/sec
[ 3] 1.0- 2.0 sec 39.1 MBytes 328 Mbits/sec
[ 3] 2.0- 3.0 sec 38.4 MBytes 322 Mbits/sec
[ 3] 3.0- 4.0 sec 37.8 MBytes 317 Mbits/sec
[ 3] 4.0- 5.0 sec 37.1 MBytes 311 Mbits/sec
[ 3] 5.0- 6.0 sec 36.9 MBytes 309 Mbits/sec
[ 3] 6.0- 7.0 sec 37.1 MBytes 311 Mbits/sec
[ 3] 7.0- 8.0 sec 37.0 MBytes 310 Mbits/sec
[ 3] 8.0- 9.0 sec 40.0 MBytes 336 Mbits/sec
[ 3] 9.0-10.0 sec 37.9 MBytes 318 Mbits/sec
[ 3] 0.0-10.0 sec 379 MBytes 318 Mbits/sec
I was running top -SH in another session:
last pid: 65943; load averages: 0.18, 0.04, 0.01 up 2+03:16:25 20:26:55
169 processes: 10 running, 139 sleeping, 3 stopped, 17 waiting
CPU: 0.0% user, 0.0% nice, 23.7% system, 24.9% interrupt, 51.3% idle
Mem: 834M Active, 1198M Inact, 699M Wired, 296K Cache, 416M Buf, 1180M Free
Swap: 8192M Total, 8192M Free
PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
11 root 171 ki31 0K 64K CPU2 2 49.9H 91.16% idle{idle: cpu2}
11 root 171 ki31 0K 64K RUN 3 50.3H 87.50% idle{idle: cpu3}
11 root 171 ki31 0K 64K RUN 1 50.2H 83.25% idle{idle: cpu1}
12 root -68 - 0K 336K CPU0 0 10:10 60.89% intr{irq18: bge1
65943 root 76 0 13556K 2628K CPU1 1 0:08 54.88% iperf{iperf}
11 root 171 ki31 0K 64K RUN 0 50.5H 43.55% idle{idle: cpu0}
34264 root 64 20 619M 301M bpf 1 17:53 0.00% snort{snort}
258 root 76 20 6908K 1404K kqread 3 15:34 0.00% check_reload_stat
12 root -68 - 0K 336K WAIT 0 10:05 0.00% intr{irq16: bge0
12 root -32 - 0K 336K RUN 0 7:13 0.00% intr{swi4: clock}
64693 proxy 64 20 380M 364M kqread 2 3:35 0.00% squid
28093 root 44 0 5784K 1484K select 2 1:29 0.00% apinger
23 root 20 - 0K 16K syncer 3 0:58 0.00% syncer
0 root -16 0 0K 176K sched 2 0:44 0.00% kernel{swapper}
14 root -16 - 0K 16K - 2 0:32 0.00% yarrow
20488 root 44 0 26272K 7532K kqread 0 0:24 0.00% lighttpd
86216 root 76 20 8296K 1932K wait 0 0:21 0.00% sh
12 root -32 - 0K 336K RUN 0 0:18 0.00% intr{swi4: clock}
8 root -16 - 0K 16K pftm 1 0:14 0.00% pfpurge
30278 dhcpd 44 0 15180K 10444K select 2 0:13 0.00% dhcpd
I'm not sure what the bottleneck is here. On second thought, it doesn't look like a processor issue. Also, I already have IP fast-forwarding turned on (I do use IPsec, but have not had any issues with fast-forwarding yet).
Thanks for any help!
-
Good news: I figured out the issue. The buffer length was too short (1470 bytes for UDP by default); once I increased it to 16000 bytes, things got moving much quicker.
Again, pfSense -> Lenovo:
[2.1.4-RELEASE]: iperf -c 192.168.1.107 -u -b 1000m -i 1 -l 16000
------------------------------------------------------------
Client connecting to 192.168.1.107, UDP port 5001
Sending 16000 byte datagrams
UDP buffer size: 56.0 KByte (default)
[ 3] local 192.168.1.1 port 46600 connected with 192.168.1.107 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 1.0 sec 104 MBytes 872 Mbits/sec
[ 3] 1.0- 2.0 sec 105 MBytes 884 Mbits/sec
[ 3] 2.0- 3.0 sec 108 MBytes 908 Mbits/sec
[ 3] 3.0- 4.0 sec 107 MBytes 894 Mbits/sec
[ 3] 4.0- 5.0 sec 109 MBytes 914 Mbits/sec
[ 3] 5.0- 6.0 sec 109 MBytes 915 Mbits/sec
[ 3] 6.0- 7.0 sec 109 MBytes 912 Mbits/sec
[ 3] 7.0- 8.0 sec 108 MBytes 909 Mbits/sec
[ 3] 8.0- 9.0 sec 106 MBytes 890 Mbits/sec
[ 3] 9.0-10.0 sec 105 MBytes 883 Mbits/sec
[ 3] 0.0-10.0 sec 1.05 GBytes 898 Mbits/sec
[ 3] Sent 70583 datagrams
I'm pretty much hitting the practical limit of gigabit right there.
But when I switch to TCP, I'm still getting ~300 Mbps.
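A rough back-of-the-envelope check of the UDP numbers above, as a Python sketch. The throughput and datagram figures are copied from the iperf runs in this thread; the 1500-byte MTU, the header sizes, and the function names are assumptions made for illustration. It suggests one plausible reading of the result: with -l 16000 iperf issues far fewer sends per second for more payload, while each datagram is split into several IP fragments on the wire.

```python
# Figures from the iperf runs above; MTU and header sizes are assumptions.
MTU = 1500        # assumed standard Ethernet MTU (no jumbo frames)
IP_HEADER = 20    # IPv4 header without options
UDP_HEADER = 8

def fragments_per_datagram(payload_bytes, mtu=MTU):
    """Number of IPv4 packets needed to carry one UDP datagram."""
    total = payload_bytes + UDP_HEADER           # UDP header rides in the first fragment
    per_fragment = (mtu - IP_HEADER) // 8 * 8    # fragment data is a multiple of 8 bytes
    return -(-total // per_fragment)             # ceiling division

def datagrams_per_second(mbits_per_sec, payload_bytes):
    """Datagrams (roughly, send calls) per second at a given payload rate."""
    return mbits_per_sec * 1e6 / (payload_bytes * 8)

# Default 1470-byte datagrams at the ~500 Mbps ceiling reported earlier
small_rate = datagrams_per_second(500, 1470)
# 16000-byte datagrams at the 898 Mbps reported above
large_rate = datagrams_per_second(898, 16000)

print(f"1470 B:  {small_rate:,.0f} datagrams/s, "
      f"{fragments_per_datagram(1470)} packet each")
print(f"16000 B: {large_rate:,.0f} datagrams/s, "
      f"{fragments_per_datagram(16000)} fragments each")

# Cross-check against iperf's own summary: 70583 datagrams of 16000 bytes
print(f"total: {70583 * 16000 / 2**30:.2f} GBytes")
```

Whether the per-send overhead in iperf really is the limiting factor here is a guess; the arithmetic only shows that the numbers are consistent with it.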
-
Even though your NICs are on-board they will still be connected via either a PCI or PCIe bus to the chipset. It seems unlikely that it would be PCI but you never know. The exact NIC chip code will tell you. Clearly the CPU is not the restriction here, all the cores are still running idle processes.
Steve
-
12 root -68 - 0K 336K CPU0 0 10:10 60.89% intr{irq18: bge1
The interrupt load seems pretty high for <1Gbps throughput.
-
I'm sure these are not the best NICs out there. :) But considering there are 4 cores here, this is only ~15% of total CPU usage. Probably not too bad, but not great either. Intel NICs would fare better for sure.
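The ~15% figure can be reproduced with a couple of lines of Python. The WCPU values are copied from the top -SH output earlier in the thread; the helper names are made up for this sketch. It also shows the other way to read the same number: the bge1 interrupt thread is a single thread, so its headroom is bounded by one core, not by the whole box.

```python
CORES = 4
INTR_WCPU = 60.89    # intr{irq18: bge1} from the top -SH output
IPERF_WCPU = 54.88   # iperf{iperf} from the same output

def share_of_whole_box(wcpu, cores=CORES):
    """Thread usage expressed as a percentage of all cores combined."""
    return wcpu / cores

def headroom_on_one_core(wcpu):
    """How much of a single core is left for a single thread."""
    return 100.0 - wcpu

print(f"bge1 interrupt thread: {share_of_whole_box(INTR_WCPU):.1f}% of the box, "
      f"{headroom_on_one_core(INTR_WCPU):.1f}% of its core left")
print(f"iperf: {share_of_whole_box(IPERF_WCPU):.1f}% of the box, "
      f"{headroom_on_one_core(IPERF_WCPU):.1f}% of its core left")
```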
-
And Chelsio better still.
-
A new Intel driver, v2.5.25, for X520/X540 cards was released last week. Has anybody tried it yet?
https://downloadcenter.intel.com/Detail_Desc.aspx?DwnldID=14688&lang=eng&ProdId=3412
-
@gonzopancho:
I was able to get ~8Gbit/s between two FreeNAS 9.x boxes without jumbo frames when using 4 threads. That's pretty close to wire.
OK, Jason… FreeBSD won't forward at wirespeed on 10Gbps networks.
Since the BSDRP guy can only manage to forward (no firewall, just fast forwarding) at a pinch over 1.8 Mpps (and you were doing, by my best estimate, 5.5 Mpps), I'm going to assert that we still have work to do.
brunoc: we're currently engaged in a 10G performance study, but yes, part of the solution will be tuning, and part of it will be the threaded pf in pfSense version 2.2.
Hmm, if all I need is a pair of routers running CARP and NAT with a pool of IPs and 10GbE Intel NICs, would it make sense to go with the 2.2 alpha snapshots?
-
"8Gbps" is not how we measure these things.
Quote PPS or go home.
-
@gonzopancho:
"8Gbps" is not how we measure these things.
Quote PPS or go home.
My bad. Let's say I need NAT (PAT, really) for 500 kpps.
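For anyone converting between the two units, here is a small Python sketch. It assumes standard Ethernet framing (frame sizes of 64 to 1518 bytes including header and FCS, plus 20 bytes of preamble and inter-frame gap per frame on the wire); the function names are made up for illustration. The point is that a bit rate says little about router load unless the packet size is known.

```python
WIRE_OVERHEAD = 20   # bytes per frame: 8 preamble/SFD + 12 inter-frame gap

def line_rate_pps(link_bps, frame_bytes):
    """Maximum packets per second a link can carry at a given frame size."""
    return link_bps / ((frame_bytes + WIRE_OVERHEAD) * 8)

def bps_for_pps(pps, frame_bytes):
    """Wire bandwidth consumed by a given packet rate and frame size."""
    return pps * (frame_bytes + WIRE_OVERHEAD) * 8

# 10GbE line rate at the two extremes of frame size
print(f"10GbE, 64 B frames:   {line_rate_pps(10e9, 64) / 1e6:.2f} Mpps")
print(f"10GbE, 1518 B frames: {line_rate_pps(10e9, 1518) / 1e6:.2f} Mpps")

# What 500 kpps means in bits per second, depending on frame size
print(f"500 kpps, 64 B frames:   {bps_for_pps(500_000, 64) / 1e6:.0f} Mbps")
print(f"500 kpps, 1518 B frames: {bps_for_pps(500_000, 1518) / 1e9:.2f} Gbps")
```

So 500 kpps is anywhere from about a third of a gigabit to over 6 Gbps on the wire, depending entirely on packet size.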
-
There is an active internal project to get the performance of 'pf' up.
-
@gonzopancho:
There is an active internal project to get the performance of 'pf' up.
It would be nice to know a little more about that project. For the time being, how near that mark can I get with a Xeon E5520/E5620, PCIe and a decent 10GbE Intel NIC?
Should I stay with 2.1.5 or venture into 2.2 ALPHA because of the FreeBSD 10 baseline?
-
I'd go 2.2-BETA, personally. There are only a couple of things left to fix.
The test harness is here: https://github.com/gvnn3/conductor
(Remember, people say I don't know how to open source.)
-
I didn't know there was a Beta already, I'll look at it. Thanks.
-
It's not, but should be quite soon.