Solved - 10Gb link, 1Gb speeds
-
Thanks for the reply
I have 5 of these C2758 servers; two already have CentOS on them, and one runs OpenBSD, which gives me the same numbers as pfSense.
With FreeBSD 11 and all the settings applied I am seeing nearly 3 Gbit/s using iperf. I do understand that there are other tools out there, but it's what I have all my metrics in.
When I use the Linux box as a router, I see roughly the same numbers as vanilla BSD.
I would be happy with even half the throughput I get client to server.
Thanks again.
If I find a solution to this issue, I will report back.
Next I am going to try a fresh install of pfSense; the one I have now has been upgraded in place for the last several years.
pfSense has been just fantastic, and until I started doing performance tuning, I really didn't even notice…
Seeing as I have a 10Gb network, I would really like to get the most out of every device on it.
-
So using the latest development version of pfSense I am getting 1.4 Gbit/s with pf on and the default rule set, and 3.0 Gbit/s with pf turned off.
A BIOS update was worth about another 400 Mbit/s.
-
Looks like this equipment just isn't capable of routing 10Gb or even 5Gb of traffic. Looking at the system interrupts, they're reaching nearly 90% when I run this test.
                    /0   /1   /2   /3   /4   /5   /6   /7   /8   /9   /10
Load Average    |||||

                    /0%  /10  /20  /30  /40  /50  /60  /70  /80  /90  /100
root    idle    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root    idle    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root    idle    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root    idle    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root    idle    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root    idle    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root    idle    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root    intr    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root    idle    XXXXXX
root    intr    X

Guess I will just put pfSense on some better gear and hope for better results.
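(For reference, the interrupt load shown above can also be watched with stock FreeBSD tools; a rough sketch, assuming shell access on the box:)

top -SH      # -S shows system processes, -H shows threads, so the intr threads are visible
vmstat -i    # per-device interrupt counters and rates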
-
Before you throw it all out: try polling. This isn't always the solution, but if you are starving due to interrupts, polling might solve some of it.
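(For reference, on FreeBSD polling is toggled per interface; roughly like the following, assuming the kernel was built with the DEVICE_POLLING option. igb0 is just a placeholder interface name.)

ifconfig igb0 polling     # switch the interface to polled operation
ifconfig igb0 -polling    # revert to interrupt-driven operation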
-
With FreeBSD 11 and all the settings applied I am seeing nearly 3 Gbit/s using iperf. I do understand that there are other tools out there, but it's what I have all my metrics in.
iperf is OK, but you should use the parallel streams (-P) option in iperf to push several streams across the wire at once. Would that do the trick? Or perhaps the box is a little bit underpowered to get a real 10 Gbit/s out of there.
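For example, something like this on the existing iperf setup (the address is a placeholder) drives four streams at once:

# on the server
iperf -s
# on the client: four parallel TCP streams for 30 seconds
iperf -c 192.0.2.10 -P 4 -t 30
-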
Try some of the changes in here. If they don't help, change them back to default.
https://forum.pfsense.org/index.php?topic=113496.msg631076#msg631076
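For reference, the sort of tunables that typically come up in those tuning threads looks roughly like this (a sketch only; these are not necessarily the values from that post, the numbers are illustrative, and some may not help on a C2758):

# /boot/loader.conf.local
kern.ipc.nmbclusters="1000000"     # more mbuf clusters for 10GbE
net.isr.maxthreads="8"             # up to the number of cores (hypothetical value)
net.isr.bindthreads="1"            # pin netisr threads to cores

# runtime sysctls (System > Advanced > System Tunables)
net.inet.tcp.sendbuf_max=16777216  # allow larger TCP send buffers
net.inet.tcp.recvbuf_max=16777216  # allow larger TCP receive buffers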
Those are some crazy high interrupts for only 1.4Gb/s. I'm getting 25%-30% CPU usage when doing 2Gb/s (1Gb/s bi-directional) through pfSense, 4Gb/s total, with 64-byte packets and HFSC traffic shaping enabled on LAN and WAN.
I also find it interesting that nearly all the load is on one core. My load is evenly distributed, unless you're using a single stream to test.
-
I am only using single streams for testing. It could be that my testing is flawed; however, when I run the exact same test on another machine with nearly the exact same config, I get the results I am seeking.
For instance, on a machine with Linux installed I get the throughput I am looking for over a single old Mellanox link. It also doesn't seem to tax the machine as much.
I have a ton of gear, and so I am setting this up on one of my blades… which is proving to be a challenge of its own.
Thanks all for the help, but this hassle just isn't worth the 4 days I have put into it.
-
pfSense tuning for 10 Gbit Throughput
The frequency of my CPU is 2.6 GHz; scaling to 3.8 GHz (Xeon E3-1275 TurboBoost) is a linear factor of 1.46 (3.8 / 2.6), so 5.0 Gbit/s -> 7.3 Gbit/s. -
10GbE Tuning?
I set the MTU on these to 9000 yesterday and 9000 on the iperf servers I'm using and was able to saturate (9.5Gb/s) the link. So I'm pretty sure I'm hitting just one interface. -
10gbe firewall using open source tools
We're using Xeon E3 boxes (1260L) with Intel 10 GbE NICs (520 series) and pfSense 2.0.1 and it's working really well. We peak around 9000 Mbps at 55% CPU utilization.
I don't know them personally and wasn't there when those tests were done, but I am pretty sure that with today's options such as HT, SpeedStep and TurboBoost, and perhaps with no PPPoE on the WAN and 10 Gbit/s on the LAN, it should be possible to get nearly 10 Gbit/s out of it. But perhaps it also depends on the hardware used. If your FreeNAS is able to deliver such numbers, what would the sticking point on pfSense be? The packet filter, the rules, something else? I really don't know, but from time to time we see more and more threads here in the forum about this; perhaps one day someone will be able to deliver results and tips that work for everyone else too.
-
So I moved my pfSense machine to one of my blades. It's not new or anything fancy, but it should yield better performance.
And I was correct, the performance was 3x that of the C2758:

[ 3] 0.0- 1.0 sec 370 MBytes 3.10 Gbits/sec
[ 3] 1.0- 2.0 sec 363 MBytes 3.05 Gbits/sec
[ 3] 2.0- 3.0 sec 365 MBytes 3.06 Gbits/sec
[ 3] 3.0- 4.0 sec 366 MBytes 3.07 Gbits/sec
[ 3] 4.0- 5.0 sec 368 MBytes 3.08 Gbits/sec
[ 3] 5.0- 6.0 sec 372 MBytes 3.12 Gbits/sec
[ 3] 6.0- 7.0 sec 373 MBytes 3.13 Gbits/sec
[ 3] 7.0- 8.0 sec 373 MBytes 3.13 Gbits/sec
[ 3] 8.0- 9.0 sec 375 MBytes 3.15 Gbits/sec
[ 3] 9.0-10.0 sec 373 MBytes 3.13 Gbits/sec

I have hyperthreading disabled and the BIOS performance level set to maximum.
However with this type of equipment, I expected things to move about twice as fast. I have a fairly simple ruleset.
With pf turned off, I do get performance that is more in line with the other systems I have on my network. For instance, I have OpenStack routers (which are just Linux/SNAT/iptables) that run around 6.5 to 7 Gbit/s on the other blades.
Now with multiple threads (4) I can get a little closer to my mark, at around 6 Gbit/s.
I know there are people out there getting near wire-line speeds from their gear; I just don't know how they are doing it.
-
Try disabling all the offloading options, using polling, and using bigger buffers.
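On FreeBSD that looks roughly like this (a sketch only; the interface name and values are illustrative, flags vary per driver, and pfSense also exposes the offload switches under System > Advanced > Networking):

# turn off NIC offloads on the interface
ifconfig mlxen0 -txcsum -rxcsum -tso -lro

# allow larger network buffers
sysctl kern.ipc.maxsockbuf=16777216
sysctl net.inet.tcp.sendbuf_max=16777216
sysctl net.inet.tcp.recvbuf_max=16777216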
-
It would seem that any sort of tuning actually makes it run slower. I have seen no marked improvement over the pfSense defaults.
It would seem I need to get some better hardware that is more suited to the task. The good news is a 40G card is coming, along with a 40G switch (6 ports).
I am curious to see what kind of hurt I can put on this box with 40G gear.
You can mark this thread closed; I am moving on to more important things. I will open a new one when the 40G gear gets here and I have a chance to tinker.
-
So I have an update. The 40G NIC from Mellanox performs wonderfully on vanilla FreeBSD and Linux; however, I see the same performance with pfSense that I was getting with the 10Gb NICs. I would like to know what the differences are from the raw BSD kernel.
I really love pfSense, it makes my life so easy to do otherwise complicated stuff. But these performance issues should be addressed.
-
@johnkeates:
Before you throw it all out: try polling. This isn't always the solution, but if you are starving due to interrupts, polling might solve some of it.
I am not familiar with that; any good places to start?
edit:
ifconfig mlxen0 polling

Client connecting to ..., TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 1.0 sec 110 MBytes 922 Mbits/sec
[ 4] 0.0- 1.0 sec 64.6 MBytes 542 Mbits/sec
[ 5] 0.0- 1.0 sec 53.2 MBytes 447 Mbits/sec
[SUM] 0.0- 1.0 sec 228 MBytes 1.91 Gbits/sec
[ 3] 1.0- 2.0 sec 110 MBytes 925 Mbits/sec
[ 5] 1.0- 2.0 sec 57.4 MBytes 481 Mbits/sec
[ 4] 1.0- 2.0 sec 56.5 MBytes 474 Mbits/sec
[SUM] 1.0- 2.0 sec 224 MBytes 1.88 Gbits/sec
[ 3] 2.0- 3.0 sec 112 MBytes 936 Mbits/sec
[ 4] 2.0- 3.0 sec 54.5 MBytes 457 Mbits/sec
[ 5] 2.0- 3.0 sec 59.9 MBytes 502 Mbits/sec
[SUM] 2.0- 3.0 sec 226 MBytes 1.90 Gbits/sec
[ 4] 3.0- 4.0 sec 52.8 MBytes 442 Mbits/sec
[ 3] 3.0- 4.0 sec 113 MBytes 948 Mbits/sec
[ 5] 3.0- 4.0 sec 62.1 MBytes 521 Mbits/sec
[SUM] 3.0- 4.0 sec 228 MBytes 1.91 Gbits/sec

ifconfig mlxen0 -polling

------------------------------------------------------------
Client connecting to ..., TCP port 5001
TCP window size: 85.0 KByte (default)
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 1.0 sec 108 MBytes 905 Mbits/sec
[ 5] 0.0- 1.0 sec 109 MBytes 915 Mbits/sec
[ 4] 0.0- 1.0 sec 107 MBytes 898 Mbits/sec
[SUM] 0.0- 1.0 sec 324 MBytes 2.72 Gbits/sec
[ 5] 1.0- 2.0 sec 108 MBytes 904 Mbits/sec
[ 4] 1.0- 2.0 sec 107 MBytes 898 Mbits/sec
[ 3] 1.0- 2.0 sec 107 MBytes 901 Mbits/sec
[SUM] 1.0- 2.0 sec 322 MBytes 2.70 Gbits/sec
[ 5] 2.0- 3.0 sec 108 MBytes 910 Mbits/sec
[ 4] 2.0- 3.0 sec 107 MBytes 900 Mbits/sec
[ 3] 2.0- 3.0 sec 108 MBytes 906 Mbits/sec
[SUM] 2.0- 3.0 sec 324 MBytes 2.72 Gbits/sec
-
So I have an update. The 40G NIC from Mellanox performs wonderfully on vanilla FreeBSD and Linux; however, I see the same performance with pfSense that I was getting with the 10Gb NICs. I would like to know what the differences are from the raw BSD kernel.
pfSense runs pf (the packet filter) and NAT as an additional step in the packet path, and that is not done on plain FreeBSD or Linux! If you want to compare against those, that is the most likely explanation. On top of this it may also depend on the hardware used: with a high-clocking Xeon E3 CPU (3.7 GHz, 4C/8T) you will perhaps get more throughput out of it than with a C2758-based machine.

I really love pfSense, it makes my life so easy to do otherwise complicated stuff. But these performance issues should be addressed.

Take hardware with more horsepower, or better-specced CPUs (and RAM), and there will be nothing left to address.
-
To debug this a bit more, try setting up pfSense as a test with no NAT enabled. At the same time, disable pf in the advanced settings. With that done, try an iperf test again. If we're gonna figure out why this is happening, we're gonna need to start excluding stuff.
On the other hand, if you need this to work, you might be better off buying support at Netgate since they build pfSense.
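A minimal version of that test from the pfSense shell might look like this (the address is a placeholder; pfctl -d/-e toggle the packet filter entirely, so only do this on a lab box):

pfctl -d                          # disable pf
iperf -c 192.0.2.10 -P 4 -t 30    # re-run the throughput test
pfctl -e                          # re-enable pf afterwards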
-
I agree with your point, and these are not complaints. If I wanted this to just work, I would stick with Fedora. However, I'm just trying to get to the bottom of what appears to be a pfSense-specific issue. With pfctl -d I still only get around 5 Gbit/s and high CPU/interrupt load. Are there settings that I am missing? This is a clean install with default settings.
On FreeBSD and Linux there is almost no CPU utilization, as most of the work is offloaded to the NIC. However, I'm not seeing this reflected in the pfSense build.
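For what it's worth, the offloads that are actually active can be checked on both sides (a sketch; interface names will differ):

# FreeBSD / pfSense: look at the "options=" line for TSO, LRO and checksum offload
ifconfig mlxen0
# Linux: list offload features
ethtool -k eth0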
Thanks all for your input and time.
~/D
-
@BlueKobold:
So I have an update. The 40G NIC from Mellanox performs wonderfully on vanilla FreeBSD and Linux; however, I see the same performance with pfSense that I was getting with the 10Gb NICs. I would like to know what the differences are from the raw BSD kernel.
pfSense runs pf (the packet filter) and NAT as an additional step in the packet path, and that is not done on plain FreeBSD or Linux! If you want to compare against those, that is the most likely explanation. On top of this it may also depend on the hardware used: with a high-clocking Xeon E3 CPU (3.7 GHz, 4C/8T) you will perhaps get more throughput out of it than with a C2758-based machine.

I'm only routing packets, no NAT. Also, with pf fully disabled I still get very high utilization numbers.
I really love pfSense, it makes my life so easy to do otherwise complicated stuff. But these performance issues should be addressed.
Take hardware with more horsepower, or better-specced CPUs (and RAM), and there will be nothing left to address.
There isn’t really a need for better equipment, it works fine with other options.
-
Have you tried to run VyOS on your hardware? With basic NAT and firewalling enabled it will allow you to assess what your hardware is really capable of as a basic gateway/firewall.
-
Hmm, next would probably be comparing sysctl output (I guess just getting both sysctl outputs and running a diff on them will do), and perhaps kernel/driver build configs (again, a diff should suffice).
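Something along these lines (filenames are placeholders) would capture the comparison:

# on the stock FreeBSD box
sysctl -a > freebsd-sysctl.txt
# on the pfSense box
sysctl -a > pfsense-sysctl.txt
# then compare them side by side
diff -u freebsd-sysctl.txt pfsense-sysctl.txt | less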
-
There are some cheap ways to increase the throughput.
1. Increase MTU
If you are lucky you can use jumbo frames throughout your environment; this can give up to a factor of 6 in throughput, assuming an MTU of 9000 (the maximum usable in VMware) instead of 1500 (see the jumbo-frame sketch at the end of this post). However, if you talk to the outside world you are likely to create a bottleneck due to the need to fragment.

2. Packet Rates
For high packet rates with small packets this will not help. There is a limit in FreeBSD's packet processing which may be lower than in other network stacks; compare for example:
http://rhelblog.redhat.com/2015/09/29/pushing-the-limits-of-kernel-networking/
A good source seems to be the FreeBSD Router Project (BSDRP):
https://bsdrp.net/documentation/examples/forwarding_performance_lab_of_a_hp_proliant_dl360p_gen8_with_10-gigabit_with_10-gigabit_chelsio_t540-cr
They also give figures for pf.
3. Real World examples
Remember always to measure through the device:

[ Pc1 ] --> [ pfsense-system ] --> [ Pc2 ]
I can give some real-world examples: ESXi guests with 8 CPUs (2.6 GHz) can push 5 Gbit/s with MTU 1500. Therefore I assume that real hardware should be able to achieve higher throughput.
The main problem seems to be the high interrupt rate.
I did some measurements on an X710 40 Gbit/s card (8 CPUs, > 2 GHz) and was able to reach throughput of around 12.3 Gbit/s.
As far as I have heard, with commodity hardware the limit seems to be around 26 Gbit/s:
https://www.ntop.org/products/packet-capture/pf_ring/pf_ring-zc-zero-copy/
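Regarding point 1, a jumbo-frame setup might look roughly like this (interface names and the address are placeholders; every hop, including the switch, must accept MTU 9000):

# pfSense / FreeBSD side
ifconfig ix0 mtu 9000
# Linux iperf endpoints
ip link set dev eth0 mtu 9000
# verify the path passes full-size frames without fragmenting
# (FreeBSD ping: -D sets the don't-fragment bit; 8972 + 8 ICMP + 20 IP = 9000)
ping -D -s 8972 192.0.2.10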