Benchmark of pf / ipfw / forwarding on FreeBSD-HEAD

dhatz

Olivier Cochard-Labbé ran an interesting benchmark of the two FreeBSD packet filters (pf and ipfw) across various -HEAD revisions (i.e. what will eventually become FreeBSD 10), and posted the results to the freebsd-current mailing list.

http://lists.freebsd.org/pipermail/freebsd-current/2013-April/041323.html
forwarding/ipfw/pf evolution (in pps) on -current
Olivier Cochard-Labbé olivier at cochard.me
Wed Apr 24 10:45:53 UTC 2013

Hi all,

here is the result of my simple-and-dummy bench script regarding
forwarding/ipfw/pf performance evolution on -current on a single-core
server with one flow only.
It's the result of more than 810 bench tests (including reboot between
each) done twice for validating my methodology.

Disclaimer

1. It's not a "max performance" bench: The purpose is to graph the
variation of the performance only.
2. I know that using a single-core server in 2013 is a stupid idea but
it's all I've got in my lab :-(

Why all these benchs?

I've found a performance regression in packet forwarding/ipfw/pf
speed on -current compared to 9.1 on my old server.
glebius@ asked me to do some bisection hunting across different -current
revisions to spot the culprit commit.
But as a lazy guy, instead of doing a bisection, I chose about 50
svn revisions and graphed them all: it's a lot easier to script this
than a bisection algorithm :-)
And the result is interesting…

The results

The gnuplot diagram in png format, with some confirmed specific spots
marked, is available here:
http://gugus69.free.fr/freebsd/benchs/current/current-pps.png

A confirmed spot is a measurable change between revision N-1 and revision N.

=> Remember that I used a single-core server before reading the results!
The "regression" of the new SMP pf is not really a regression: The
system is now usable during this high PPS bench and it was not the
case before this improvement.

gnuplot data

Available here: http://gugus69.free.fr/freebsd/benchs/current/plot/
These are the data and plot files used for generating the graph: you can
use them to zoom in on it.

ministat data

Available here: http://gugus69.free.fr/freebsd/benchs/current/ministat/

You can use it to compare results between two revisions, for example:
ministat -s 242160.ipfw 242161.ipfw

raw data

Output of pkt-gen during all tests:
http://gugus69.free.fr/freebsd/benchs/current/raw/

nanobsd images

All binary images used for these benchs are here:
http://gugus69.free.fr/freebsd/benchs/current/nanobsd-images/

There is only one "full" image, to be used for the first installation;
all the others are "upgrade" images.
They use the serial port as default console too.

Methodology used

First step: building a small lab

I've used 3 old unused servers and a good switch:

One server as netmap pkt-gen packet generator (1.38 Mpps of
minimum-size packets);

One server as netmap pkt-gen receiver;

One server with 2 NICs in the middle as a router/firewall, serial
connection, and nanobsd image on it (very easy to upgrade): IBM
eServer xSeries 306m with one core (Intel Pentium4 3.00GHz,
hyper-threading disabled) and a dual NIC 82546GB connected to the
PCI-X Bus;

a Cisco Catalyst switch for connecting all (its own statistics can
be used as a tie breaker if I've got a doubt regarding the result
given by netmap pkt-gen).

All servers have another NIC for the admin network (the bench script sends
SSH commands and nanobsd image upgrades over this dedicated NIC).

I've used netmap pkt-gen to send minimum-size packets from the
generator to the receiver like this:
pkt-gen -i em0 -t 0 -l 42 -d 1.1.1.1 -D 00:0e:0c45:df -s 2.2.2.2 -w 10
Results were collected on the pkt-gen receiver.

Second step: building small nanobsd images

Now we need lots of small nanobsd images generated from the svn
revision numbers selected for the bench: cf. script [1].
About 50 revisions were selected between 236884 and 249506: the
candidates were chosen by reading the svn commit log.

Third step: auto-bench script

This auto-bench script [2] does these tasks:
1. Upgrading the server to the release to be tested;
2. Uploading configuration set to be tested (forwarding-only, ipfw
or pf) & reboot;
3. Start the bench test, collecting the result, and reboot: 5
times for each configuration-set;
4. Loop to next configuration set;
5. Loop to next release.
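
As an illustration only (this is not the actual script [2]), the nested
loop described above can be sketched as a dry run in sh. The function
names upgrade_to, load_config and run_bench are hypothetical placeholders
that just print what a real script would do over SSH; the two revision
numbers are the ones used in the ministat example.

```sh
#!/bin/sh
# Dry-run skeleton of the nested benchmark loop.
# upgrade_to, load_config and run_bench are hypothetical stubs:
# a real script would drive the router over SSH instead of echoing.
RUNS=5   # bench runs per configuration set, as stated in step 3

upgrade_to()  { echo "upgrade to r$1"; }
load_config() { echo "r$1: upload $2 config, reboot"; }
run_bench()   { echo "r$1 $2: run $3, collect result, reboot"; }

bench_all() {
    # $1: space-separated revisions, $2: space-separated configuration sets
    for rev in $1; do
        upgrade_to "$rev"
        for cfg in $2; do
            load_config "$rev" "$cfg"
            run=1
            while [ "$run" -le "$RUNS" ]; do
                run_bench "$rev" "$cfg" "$run"
                run=$((run + 1))
            done
        done
    done
}

bench_all "242160 242161" "forwarding ipfw pf"
```

With R revisions and 3 configuration sets, this loop performs R x 3 x 5
bench runs, each followed by a reboot.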

Last step: converting result for ministat and gnuplot

I've used a last script for interpreting the output of pkt-gen
receiver for ministat and gnuplot [3].

Because I'm not sure I've used the right method for preparing my
data, here is how I've generated the ministat data and gnuplot graph:

For just one test, the output of pkt-gen in receive mode is lots of
lines like this:
main [1085] 400198 pps
main [1085] 400287 pps
main [1085] 400240 pps
main [1085] 400235 pps
main [1085] 400245 pps
…

I've calculated the median value [3] (thanks, ministat) of all these
results: this gives me a single number for the test.
=> I did the same for each of the 5 identical bench tests (same
configuration set, just a reboot between them), and put these 5
numbers in the file named SVN-REV.CONFIG-SET.
=> From these 5 numbers, I've calculated the "median" value again:
this gives me a single performance number that I've used in the gnuplot
data file.
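
The two-level median described above can be sketched in sh with sort and
awk. This is a minimal illustration, not the actual script [3], which
relies on ministat. The helper names pps_samples and median are
hypothetical, the input is assumed to look like the "main [1085] 400198
pps" lines shown earlier, and for an even number of samples the sketch
averages the two middle values, which may differ from what ministat
reports.

```sh
#!/bin/sh
# pps_samples: extract the number before "pps" from pkt-gen receiver
# output (hypothetical helper; assumes "main [1085] 400198 pps" lines).
pps_samples() {
    awk '$NF == "pps" { print $(NF - 1) }' "$@"
}

# median: read one number per line on stdin, print the median.
# For an even count, the two middle values are averaged (an assumption).
median() {
    sort -n | awk '
        { v[NR] = $1 }
        END {
            if (NR == 0) exit 1
            m = (NR % 2) ? v[(NR + 1) / 2] : (v[NR / 2] + v[NR / 2 + 1]) / 2
            printf "%.10g\n", m
        }'
}

# Usage (file names are placeholders):
#   pps_samples run1.log | median    # one number for one bench run
#   median < 242161.ipfw             # one number for five runs
```

Taking the median at both levels keeps a single outlier sample, or a
single odd run, from shifting the value plotted for a revision.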

Bisection

From this first result, I've selected other svn revisions to
generate: the goal was to spot the exact commit that brought each
change.
But it was not feasible for every regression spotted, because of
unbuildable sources or a non-bootable resulting nanobsd image.

Final: a full re-run

Once all my benchs were done, I waited a few days and re-started all
tests a second time: before publishing my results, I wanted to check
that they were all reproducible.

Annexes

configuration sets

common to all configuration

Forwarding enabled
Ethernet flow-control disabled (dev.em.0.fc=0 and/or dev.em.0.flow_control=0)
NIC drivers tuned:
hw.em.rx_process_limit: 500
hw.em.txd: 4096
hw.em.rxd: 4096
Static ARP entries configured on all servers, and a static MAC/port
entry on the switch too (prevents the switch from aging out the packet
receiver's MAC address).
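
For readers who want to reproduce these common settings, here is a
minimal sketch of where they might be placed on a FreeBSD router. The
file placement and the use of gateway_enable are assumptions, not taken
from the original configuration, and only em0 is shown because it is the
only interface named above.

```
# /boot/loader.conf (em(4) loader tunables, read at boot)
hw.em.rx_process_limit="500"
hw.em.txd="4096"
hw.em.rxd="4096"

# /etc/sysctl.conf (which of the two names exists depends on the driver)
dev.em.0.fc=0
dev.em.0.flow_control=0

# /etc/rc.conf (enables IPv4 forwarding)
gateway_enable="YES"
```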

forwarding

nothing special

ipfw

/etc/ipfw.rules:
#!/bin/sh
fwcmd="/sbin/ipfw"
# Flush out the list before we begin.
${fwcmd} -f flush
${fwcmd} add 3000 allow ip from any to any

pf

/etc/pf.conf:
set skip on lo0
pass

[1] http://sourceforge.net/p/bsdrp/code/HEAD/tree/trunk/BSDRP/tools/bisection-gen.sh
[2] http://sourceforge.net/p/bsdrp/code/HEAD/tree/trunk/BSDRP/tools/bench-lab.sh
[3] http://sourceforge.net/p/bsdrp/code/HEAD/tree/trunk/BSDRP/tools/bench-lab-ministat.sh

A related note:

http://lists.freebsd.org/pipermail/freebsd-current/2013-April/041326.html
forwarding/ipfw/pf evolution (in pps) on -current
Andre Oppermann andre at freebsd.org
Wed Apr 24 12:35:14 UTC 2013

On 24.04.2013 12:45, Olivier Cochard-Labbé wrote:

> Hi all,
>
> here is the result of my simple-and-dummy bench script regarding
> forwarding/ipfw/pf performance evolution on -current on a single-core
> server with one flow only.
> It's the result of more than 810 bench tests (including reboot between
> each) done twice for validating my methodology.

Thanks for your excellent work in doing this benchmark time-series,

> One server with 2 NIC in the middle as a router/firewall, serial
> connection, and nanobsd image on it (very easy to upgrade): IBM
> eServer xSeries 306m with one core (Intel Pentium4 3.00GHz,
> hyper-threading disabled) and a dual NIC 82546GB connected to the
> PCI-X Bus;

however I want to point out that the Pentium4 has about the worst
lock overhead of all cpu architectures, even on UP. This may cause
certain changes to look much worse than they are on currently popular
architectures.

For an estimate and time-series comparison your bench test is very
helpful though.
