Load testing methods, PPS & Bandwidth - performance with igb/em
I'm replacing my existing firewalls with some pfsense boxes and I'm trying to get an idea of performance and how it should be tested. To give a quick overview, the configuration is listed below. The servers are probably overkill, but it's what I have spare. I should note that I tested this with two different motherboards, one with igb and one with em, and saw the same results. The servers listed below use a Supermicro X9SCD-F with 2x integrated Intel 82580DB.
2 pfsense servers: 3.4 GHz Intel Xeon E3-1240v2 / 16GB RAM / 2x 10KRPM SATAIII HDD (RAID1 gmirror)
2 test servers: 3.4 GHz Intel Xeon E3-1240v2 / 16GB RAM / 1x 10KRPM SATAIII HDD
2 switches: 1Gbit Juniper EX-3200
The configuration is a router on a stick set-up to provide firewalling and inter-vlan routing - with a single trunked 1Gbit interface to the switch (carrying the WAN VLAN and the internal VLANs)
I've been using iperf for inter-vlan testing, using the following commands:
Server 1: iperf -s
Server 2: iperf -c 18.104.22.168 -d
Packets per second
pfsense 1: netstat -w 1 -I igb0 (to view packets/second)
Server 1: hping 10.0.1.1 -q -i u2 --data 64 --icmp | tail -n10
Server 2: hping 10.0.2.1 -q -i u2 --data 64 --icmp | tail -n10
(ie. pinging each other).
Are there any better / more accurate ways of performing testing? I'm not quite getting the output that I expect.
Eg. Bandwidth results
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.0 sec   459 MBytes   385 Mbits/sec
[ ID] Interval       Transfer     Bandwidth
[  5]  0.0-10.0 sec   642 MBytes   538 Mbits/sec
I would expect this (with full duplex 1Gb, auto-negotiation turned off on everything) to be 1Gb each way, not 1Gb total?
Eg. Packets per second
The theoretical max over a 1Gbit connection should be about 1.4 million 64 byte packets per second, but I'm falling well short of this
            input         (igb0)           output
   packets  errs idrops      bytes    packets  errs      bytes colls
    731579     0      0   77851066     489258     0   56387720     0
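As a cross-check on these numbers, here is a minimal sketch of the arithmetic. It assumes standard Ethernet framing overhead (8 bytes of preamble/SFD plus a 12-byte inter-frame gap per frame) and ignores any VLAN tag. The function names are made up for illustration.

```python
# Line-rate arithmetic for Ethernet, plus the average packet size implied
# by the netstat counters quoted above.
PREAMBLE_SFD_BYTES = 8      # preamble + start-of-frame delimiter
INTERFRAME_GAP_BYTES = 12   # minimum gap between frames, in byte times


def max_frames_per_second(link_bps, frame_bytes):
    """Frames per second at line rate; frame_bytes includes the 4-byte FCS."""
    wire_bits = (frame_bytes + PREAMBLE_SFD_BYTES + INTERFRAME_GAP_BYTES) * 8
    return link_bps / wire_bits


def average_packet_bytes(total_bytes, total_packets):
    """Mean packet size from a byte counter and a packet counter."""
    return total_bytes / total_packets


if __name__ == "__main__":
    # Minimum-size (64-byte) frames on a 1 Gbit/s link
    print(round(max_frames_per_second(10**9, 64)))
    # Input side of the one-second netstat sample
    print(round(average_packet_bytes(77851066, 731579), 1))
```

The first figure comes to roughly 1.488 million frames per second, so "about 1.4 million" is the right ballpark. The second works out to roughly 106 bytes per received packet rather than 64. That may simply reflect that --data 64 sets the payload size, with ICMP, IP and Ethernet headers added on top. If so, the packet rate achievable at line rate would be lower than the 64-byte figure.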
During the test, top -aSCHIP shows
last pid: 55400;  load averages: 0.38, 0.14, 0.09  up 0+00:21:53  14:30:55
157 processes: 10 running, 110 sleeping, 37 waiting
CPU 0:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU 1:  0.0% user,  0.0% nice,  0.0% system,  100% interrupt,  0.0% idle
CPU 2:  0.0% user,  0.0% nice,  0.0% system, 28.2% interrupt, 71.8% idle
CPU 3:  0.0% user,  0.0% nice,  0.0% system, 27.4% interrupt, 72.6% idle
CPU 4:  0.0% user,  0.0% nice,  6.0% system,  0.0% interrupt, 94.0% idle
CPU 5:  0.0% user,  0.0% nice, 12.8% system,  0.0% interrupt, 87.2% idle
CPU 6:  0.0% user,  0.0% nice, 13.2% system,  0.0% interrupt, 86.8% idle
CPU 7:  0.0% user,  0.0% nice,  9.8% system,  0.0% interrupt, 90.2% idle
Mem: 52M Active, 15M Inact, 434M Wired, 72K Cache, 34M Buf, 15G Free
Swap: 32G Total, 32G Free
And vmstat -i shows
interrupt                  total       rate
irq1: atkbd0                  18          0
irq16: ehci0                2014          1
irq19: atapci0             11985          9
irq23: ehci1                2015          1
cpu0: timer              2584464       1995
irq256: igb0:que 0        778068        600
irq257: igb0:que 1        740291        571
irq258: igb0:que 2      10529010       8130
irq259: igb0:que 3      10489491       8099
irq260: igb0:que 4        830229        641
irq261: igb0:que 5        762681        588
irq262: igb0:que 6        798454        616
irq263: igb0:que 7        887188        685
irq264: igb0:link              3          0
cpu1: timer              2564435       1980
cpu4: timer              2564434       1980
cpu3: timer              2564434       1980
cpu5: timer              2564434       1980
cpu6: timer              2564434       1980
cpu2: timer              2564434       1980
cpu7: timer              2564434       1980
Total                   46366950      35804
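A quick way to read the per-queue figures is to turn them into shares. This is a minimal sketch using only the igb0 queue rates from the vmstat -i output above. Note that vmstat -i reports rates averaged since boot, so they are not a snapshot of the test itself.

```python
# Average interrupt rates (per second, since boot) for igb0 queues 0..7,
# copied from the vmstat -i output.
queue_rates = [600, 571, 8130, 8099, 641, 588, 616, 685]


def queue_shares(rates):
    """Fraction of the total interrupt rate handled by each queue."""
    total = sum(rates)
    return [r / total for r in rates]


if __name__ == "__main__":
    for queue, share in enumerate(queue_shares(queue_rates)):
        print(f"que {queue}: {share:6.1%}")
```

Queues 2 and 3 together account for about 81% of the queue interrupts. That would fit a test with only a couple of flows being hashed onto a couple of queues, although that reading is an interpretation and is not confirmed anywhere in the thread.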
In terms of BSD tunables
/etc/sysctl.conf
dev.igb.0.enable_lro=0
dev.igb.1.enable_lro=0
kern.random.sys.harvest.interrupt=0
kern.random.sys.harvest.ethernet=0
net.inet.ip.fastforwarding=1
kern.timecounter.hardware=HPET
dev.igb.0.rx_processing_limit=480
dev.igb.1.rx_processing_limit=480
kern.ipc.nmbclusters=512000

/boot/loader.conf
autoboot_delay="3"
vm.kmem_size="435544320"
vm.kmem_size_max="535544320"
kern.ipc.nmbclusters="655356"
hw.igb.num_queues="8"
hw.igb.max_interrupt_rate="30000"
hw.igb.rxd="3096"
hw.igb.txd="3096"
What I want to know is,
1. Are the testing methods I am using accurate?
2. Are the results I am seeing good/average/poor?
3. Is there anything else I should be doing?
NB. There is no NAT/rate limiting, just pure firewalling and VLAN routing.
Incidentally, I did contact the consultancy wing of pfsense for paid professional support - but after 3 emails without a response, I'm not sure that anyone actually supports it?
Actually, regarding bandwidth, I've managed to answer my own question. It appears that the rate is normal (ie. the 1Gbit total). After reviewing systat I can see that it is processing at max performance on the interface.
# systat -ifstat
Interface    Traffic           Peak              Total
igb0   in    115.311 MB/s      115.311 MB/s      6.273 GB
       out   115.497 MB/s      115.497 MB/s      6.033 GB
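To compare the systat figure with iperf's Mbits/sec, the byte rate has to be converted to bits. Whether systat's "MB" means 10**6 or 2**20 bytes is an assumption here, so this small sketch shows both.

```python
def mbytes_to_mbits(rate_mb_per_s, bytes_per_mb):
    """Convert a rate in MB/s to Mbit/s (10**6 bits per second)."""
    return rate_mb_per_s * bytes_per_mb * 8 / 1_000_000


if __name__ == "__main__":
    for label, unit in (("MB = 10**6 bytes", 10**6), ("MB = 2**20 bytes", 2**20)):
        print(label, round(mbytes_to_mbits(115.311, unit), 1), "Mbit/s")
```

Either reading (about 922 or about 967 Mbit/s) puts each direction of igb0 close to what a 1Gbit link can carry.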
I split WAN off from the VLAN trunk and put them on igb0 and igb1 respectively, ran iperf again and saw a full 1Gbps in each direction. So the trunk was certainly the limiting factor.
1. Are the testing methods I am using accurate?
They measure what they measure, and they are accurate in that respect. But perhaps what they measure is not particularly relevant to your circumstances. Consider a motor car. 0-60 km/h is probably a very relevant metric if you want to drag other people at the traffic lights, but probably not particularly relevant to an elderly person purchasing a car for trips on suburban roads to destinations at most a few suburbs away.
Your ping statistic is interesting, but how much of your "real life" traffic is continuous pings?
Some people have reported that putting a pfSense box between two systems results in significant loss of bandwidth over a single TCP connection between the systems. This might be relevant if they are looking primarily to reduce the time of a single bulk transfer (e.g. a large backup) through a pfSense box but is perhaps of much less relevance if they are more concerned that the pfSense box is adequate to support large numbers of concurrent web page downloads. What attributes of a pfSense box are most important to you?
At the moment, pfsense is already suitable. The key task is to route <20Mbps over a 1000Mb bearer - but as it is an edge appliance, there is a requirement to be able to cope in non-normal situations (small DOS attacks and high levels of inter-vlan traffic). Note, I say cope, this is not the purpose of the firewall, but it is going to be best if the firewall is tuned to the best of its ability.
So to answer your question. No, they won't be under continuous ping nor sustained transfers.
I was actually just aiming to work towards a target of 1.4M pps, but I wonder if the single VLAN trunk (ie. just 1x 1Gb interface) is the limiting factor, due to tx and rx occurring 4 times over (hence the halved iperf results seen above).
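The single-trunk limit can be sketched with simple accounting. On a router on a stick, every routed packet arrives on the trunk and leaves on the same trunk, so each direction of the link carries the sum of all inter-VLAN flows. This is a minimal sketch using the two iperf figures from earlier in the thread. The function names and the 1000 Mbit/s default are illustrative.

```python
def trunk_load_mbps(flow_rates_mbps):
    """Load on each direction of a single trunk that routes every flow.

    Each routed flow is received on the trunk once and transmitted on it once,
    so both directions carry the sum of all flows.
    """
    total = sum(flow_rates_mbps)
    return {"rx": total, "tx": total}


def fits_on_link(flow_rates_mbps, link_mbps=1000):
    """True if the summed flows fit within one direction of the link."""
    load = trunk_load_mbps(flow_rates_mbps)
    return max(load.values()) <= link_mbps


if __name__ == "__main__":
    print(trunk_load_mbps([385, 538]))   # the two iperf -d results
    print(fits_on_link([1000, 1000]))    # 1Gb each way cannot fit on one trunk
```

On this accounting, the two iperf directions together come to 923 Mbit/s, close to line rate for one trunk, which fits the full 1Gbps each way seen once WAN and the VLANs were moved to separate interfaces.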
Given the server has 2x 1Gb interfaces, what would be an optimal configuration?
igb1: vlan trunk
igb0 + igb1 (lacp lagg): vlan trunk
Regarding the testing methods - I wanted to know how people actually test PPS rates. I've really struggled to find examples of what testing/tools/commands people use when coming up with a figure for 64 byte packet forwarding. Ie. whether they use hping or not, whether it is UDP or not, whether it is ICMP or not, etc.
Again, to follow up here. I set up a LAGG with LACP and bonded the two interfaces for the VLAN trunk to see if it altered the bandwidth test. Between 2 servers it didn't change anything, but when testing with 4 servers the improvement showed. There's a good explanation of the limitations of LACP here: https://supportforums.cisco.com/thread/2132362
If you are just transferring between 2 addresses, that conversation will only flow down a single port within that port channel; that's the way port channels work. As you get more inputs from different addresses, the port channels will be more evened out due to the way the switch hashes the traffic from different sources down each port in the port channel. A single given conversation will only go down a single port.
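The behaviour described in that quote can be illustrated with a toy hash. This is only a sketch: real switches and lagg use their own hash functions and header fields, and the CRC32 over the address pair here is a stand-in chosen for the example. The addresses are hypothetical.

```python
import zlib


def pick_link(src_ip, dst_ip, n_links):
    """Choose a member link from a hash of the address pair (toy model)."""
    key = f"{src_ip}|{dst_ip}".encode()
    return zlib.crc32(key) % n_links


if __name__ == "__main__":
    # One conversation: every packet hashes to the same member link.
    print("single pair:", {pick_link("10.0.1.1", "10.0.2.1", 2) for _ in range(1000)})

    # Several sources: the pairs can spread across the member links.
    for i in range(1, 9):
        src = f"10.0.1.{i}"
        print(src, "->", pick_link(src, "10.0.2.1", 2))
```

Because the link is chosen per conversation rather than per packet, a two-host test can never use more than one member link, which matches the result seen with 2 servers versus 4.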
I would go for the LAGG option, for redundancy (at least for NIC/cable).
As for PPS testing, why not just lower the MTU on the sending side and run iperf again?