Benchmark of pf / ipfw / forwarding on FreeBSD-HEAD



  • Olivier Cochard-Labbé ran an interesting benchmark of the two FreeBSD packet filters (pf and ipfw) across various -HEAD revisions (i.e. what will eventually become FreeBSD 10), and posted the results to the freebsd-current mailing list.

    http://lists.freebsd.org/pipermail/freebsd-current/2013-April/041323.html
    forwarding/ipfw/pf evolution (in pps) on -current
    Olivier Cochard-Labbé olivier at cochard.me
    Wed Apr 24 10:45:53 UTC 2013

    Hi all,

    here is the result of my simple-and-dummy bench script regarding
    forwarding/ipfw/pf performance evolution on -current on a single-core
    server with one flow only.
    It's the result of more than 810 bench tests (including reboot between
    each) done twice for validating my methodology.

    Disclaimer

    1. It's not a "max performance" bench: The purpose is to graph the
    variation of the performance only.
    2. I know that using a single-core server in 2013 is a stupid idea, but
    it's all I've got in my lab :-(

    Why all these benchmarks?

    I found a performance regression in packet forwarding/ipfw/pf
    speed on -current compared to 9.1 on my old server.
    glebius@ asked me to do some bisection hunting across different -current
    revisions to spot the culprit commit.
    But being lazy, instead of doing a bisection I chose about 50
    svn revisions and graphed them all: it's a lot easier to script this
    than a bisection algorithm :-)
    And the result is interesting…

    The results

    The gnuplot diagram in PNG format, with some confirmed specific spots,
    is available here:
    http://gugus69.free.fr/freebsd/benchs/current/current-pps.png

    A confirmed spot is a measurable change between revision N-1 and revision N.

    => Keep in mind that I used a single-core server when reading the results!
    The "regression" of the new SMP pf is not really a regression: The
    system is now usable during this high PPS bench and it was not the
    case before this improvement.

    gnuplot data

    Available here: http://gugus69.free.fr/freebsd/benchs/current/plot/
    These are the data and plot files used to generate the graph: you can use
    them to zoom in on it.

    ministat data

    Available here: http://gugus69.free.fr/freebsd/benchs/current/ministat/

    You can use it to compare results between 2 revisions, for example:
    ministat -s 242160.ipfw 242161.ipfw

    raw data

    Output of pkt-gen during all tests:
    http://gugus69.free.fr/freebsd/benchs/current/raw/

    nanobsd images

    All binary images used for these benchmarks are here:
    http://gugus69.free.fr/freebsd/benchs/current/nanobsd-images/

    There is only one "full" image, to be used for the first installation;
    all the others are "upgrade" images.
    They also use the serial port as the default console.

    Methodology used

    First step: building a small lab

    I've used 3 old unused servers and a good switch:

    • One server as a netmap pkt-gen packet generator (1.38Mpps of
      minimum-size packets);
    • One server as a netmap pkt-gen receiver;
    • One server with 2 NICs in the middle as a router/firewall, serial
      connection, and nanobsd image on it (very easy to upgrade): IBM
      eServer xSeries 306m with one core (Intel Pentium4 3.00GHz,
      hyper-threading disabled) and a dual NIC 82546GB connected to the
      PCI-X Bus;
    • A Cisco Catalyst switch connecting everything (its own statistics can
      be used as a tie-breaker if I have a doubt about the results
      given by netmap pkt-gen).

    All servers have another NIC for the admin network (the bench script sends
    SSH commands and nanobsd image upgrades over this dedicated NIC).

    I used netmap pkt-gen to generate minimum-size packets from the
    generator to the receiver, like this:
    pkt-gen -i em0 -t 0 -l 42 -d 1.1.1.1 -D 00:0e:0c:de:45:df -s 2.2.2.2 -w 10
    Results were collected on the pkt-gen receiver.

    Second step: building small nanobsd images

    Now we need lots of small nanobsd images generated from the svn
    revision numbers selected for the bench: cf. script [1].
    About 50 revisions were selected between 236884 and 249506: candidates
    were chosen by reading the svn commit log.

    Third step: auto-bench script

    This auto-bench script [2] does the following:
    1. Upgrade the server to the release to be tested;
    2. Upload the configuration set to be tested (forwarding-only, ipfw
       or pf) & reboot;
    3. Start the bench test, collect the result, and reboot: 5
       times for each configuration set;
    4. Loop to the next configuration set;
    5. Loop to the next release.
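
    The loop above can be sketched in POSIX sh as follows. This is a
    minimal illustration under stated assumptions, not the actual
    bench-lab.sh [2]: upgrade_router, push_config, reboot_router and
    run_bench are hypothetical stubs that only echo the step they stand
    for (the real script drives the router over SSH), so the sketch runs
    anywhere and just prints the sequence of actions.

```shell
#!/bin/sh
# Sketch of the auto-bench loop. ASSUMPTION: the four helpers below are
# hypothetical stubs that only print the step they stand for.
RUNS=5                           # bench runs per configuration set
CONFIG_SETS="forwarding ipfw pf"

upgrade_router() { echo "upgrade $1"; }           # flash nanobsd upgrade image
push_config()    { echo "config $1"; }            # upload configuration set
reboot_router()  { echo "reboot"; }               # reboot and wait for SSH
run_bench()      { echo "bench $1 $2 run $3"; }   # run pkt-gen, collect result

# bench_all REV...: bench every configuration set RUNS times per revision.
bench_all() {
    for rev in "$@"; do
        upgrade_router "$rev"                    # step 1
        for cfg in $CONFIG_SETS; do              # step 4 loops here
            push_config "$cfg"                   # step 2
            reboot_router
            i=1
            while [ "$i" -le "$RUNS" ]; do       # step 3
                run_bench "$rev" "$cfg" "$i"
                reboot_router
                i=$((i + 1))
            done
        done
    done                                         # step 5 loops here
}
```

    For example, `bench_all 242160 242161` prints the plan for two
    revisions; each revision gets 3 configuration sets x 5 runs = 15 bench
    runs, each followed by a reboot.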

    Last step: converting result for ministat and gnuplot

    I used one last script to convert the output of the pkt-gen
    receiver for ministat and gnuplot [3].

    Because I'm not sure I used the right method to prepare my
    data, here is how I generated the ministat data and the gnuplot graph:

    For just one test, the output of pkt-gen in receive mode is lots of
    lines like these:
    main [1085] 400198 pps
    main [1085] 400287 pps
    main [1085] 400240 pps
    main [1085] 400235 pps
    main [1085] 400245 pps

    I calculated the median value [3] (thanks ministat) of all these
    results: this gives me a single number for the test.
    => I did the same for each of the 5 identical bench tests (same
    configuration set, just a reboot between them), and put these 5
    numbers in a file named SVN-REV.CONFIG-SET.
    => From these 5 numbers, I calculated the median value again:
    this gives me a unique performance number that I used in the gnuplot
    data file.
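
    The two-level median described above can be sketched with sort and
    awk. These are hypothetical helpers, not the bench-lab-ministat.sh [3]
    script (which relies on ministat). log_median extracts the pps value
    from every "main [...] N pps" line of one receiver log and prints the
    median; median works on any one-number-per-line input, such as an
    SVN-REV.CONFIG-SET file. For an even number of values this sketch
    averages the two middle ones; it does not claim to reproduce
    ministat's own convention.

```shell
#!/bin/sh
# median: print the median of the numbers read on stdin, one per line.
# For an even count, the two middle values are averaged (an assumption;
# not necessarily ministat's convention).
median() {
    sort -n | awk '
        { v[NR] = $1 }
        END {
            if (NR == 0) exit 1              # no data
            if (NR % 2) print v[(NR + 1) / 2]
            else printf "%.1f\n", (v[NR / 2] + v[NR / 2 + 1]) / 2
        }'
}

# log_median LOG: median pps of one pkt-gen receiver log, whose lines
# look like "main [1085] 400198 pps"; other lines are ignored.
log_median() {
    awk '$NF == "pps" { print $(NF - 1) }' "$1" | median
}
```

    On the five sample lines shown above, log_median prints 400240. With
    hypothetical log names, `for f in run1.log run2.log run3.log run4.log
    run5.log; do log_median "$f"; done > 242160.ipfw` builds the
    per-revision file, and `median < 242160.ipfw` gives the number that
    would be plotted for that revision and configuration set.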

    Bisection

    From this first result, I selected other svn revisions to
    generate: the goal was to spot the exact commit that brought the
    change.
    But it was not feasible for every spotted regression, because of
    unbuildable sources or non-bootable nanobsd images.
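
    For comparison, a plain bisection between a known-good and a
    known-bad revision can be sketched as below. It is an illustration
    under assumptions: is_slow is a hypothetical hook that would build,
    flash and bench one revision and return 0 when that revision shows the
    regression; the search also assumes a single change between the two
    bounds, and it does not handle the unbuildable or non-bootable
    revisions mentioned above.

```shell
#!/bin/sh
# first_bad GOOD BAD: GOOD is a revision known to be fast, BAD one known
# to be slow; prints the first slow revision between them.
# ASSUMPTION: is_slow REV is a hypothetical hook (build, flash, bench)
# returning 0 if REV shows the regression and non-zero otherwise.
first_bad() {
    good=$1
    bad=$2
    while [ $((bad - good)) -gt 1 ]; do
        mid=$(( (good + bad) / 2 ))
        if is_slow "$mid"; then
            bad=$mid                # regression is at or before mid
        else
            good=$mid               # regression is after mid
        fi
    done
    echo "$bad"
}
```

    Each is_slow call is a full build, image and bench cycle, so searching
    the whole 236884..249506 range this way would take about 14 of them,
    done one after the other.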

    Final: a full re-run

    Once all my benchmarks were done, I waited a few days and re-ran all
    tests a second time: before publishing my results, I wanted to check
    that they were reproducible.

    Annexes

    configuration sets

    common to all configurations

    Forwarding enabled
    Ethernet flow-control disabled (dev.em.0.fc=0 and/or dev.em.0.flow_control=0)
    NIC drivers tuned:
      hw.em.rx_process_limit: 500
      hw.em.txd: 4096
      hw.em.rxd: 4096
    Static ARP entries configured on all servers, and static MAC/port entries
    on the switch too (this prevents the switch from aging out the packet
    receiver's MAC address).
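
    As an illustration, these common settings could be written into a
    nanobsd image tree with a helper like the one below.
    write_common_config is hypothetical, and the em0/em1 interface names
    and the RECEIVER_IP / RECEIVER_MAC values are placeholder assumptions
    (documentation address ranges), not the lab's real ones. The hw.em.*
    values are em(4) loader tunables, so they go in /boot/loader.conf; the
    flow-control OID is a runtime sysctl; gateway_enable and
    static_arp_pairs are rc.conf knobs.

```shell
#!/bin/sh
# ASSUMPTION: placeholder receiver address (documentation IP/MAC ranges).
RECEIVER_IP=${RECEIVER_IP:-192.0.2.2}
RECEIVER_MAC=${RECEIVER_MAC:-00:00:5e:00:53:01}

# write_common_config ROOT: hypothetical helper appending the settings
# shared by all configuration sets to an image tree rooted at ROOT.
write_common_config() {
    root=$1
    mkdir -p "$root/boot" "$root/etc" || return 1
    # em(4) loader tunables: ring sizes and RX processing budget
    printf '%s\n' \
        'hw.em.rx_process_limit="500"' \
        'hw.em.txd="4096"' \
        'hw.em.rxd="4096"' >> "$root/boot/loader.conf"
    # Disable Ethernet flow control (assumed router NICs: em0 and em1;
    # some revisions name this OID flow_control instead of fc)
    printf '%s\n' 'dev.em.0.fc=0' 'dev.em.1.fc=0' >> "$root/etc/sysctl.conf"
    # IP forwarding and a static ARP entry for the packet receiver
    printf '%s\n' \
        'gateway_enable="YES"' \
        'static_arp_pairs="receiver"' \
        "static_arp_receiver=\"$RECEIVER_IP $RECEIVER_MAC\"" >> "$root/etc/rc.conf"
}
```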

    forwarding

    nothing special

    ipfw

    /etc/ipfw.rules:
      #!/bin/sh
      fwcmd="/sbin/ipfw"
      # Flush out the list before we begin.
      ${fwcmd} -f flush
      ${fwcmd} add 3000 allow ip from any to any

    pf

    /etc/pf.conf:
      set skip on lo0
      pass
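
    Activating one configuration set at boot could be sketched the same
    way. enable_config_set is a hypothetical helper; the rc.conf variables
    it writes (firewall_enable, firewall_script, pf_enable, pf_rules) are
    standard FreeBSD knobs, but this is not necessarily how bench-lab.sh
    [2] uploads the sets.

```shell
#!/bin/sh
# enable_config_set ROOT SET: hypothetical helper appending the rc.conf
# knobs that activate one configuration set in the image tree at ROOT.
enable_config_set() {
    root=$1
    set_name=$2
    mkdir -p "$root/etc" || return 1
    case $set_name in
    forwarding)
        ;;  # nothing beyond the common settings
    ipfw)
        # run /etc/ipfw.rules (flush + "allow ip from any to any") at boot
        printf '%s\n' 'firewall_enable="YES"' \
            'firewall_script="/etc/ipfw.rules"' >> "$root/etc/rc.conf" ;;
    pf)
        # load /etc/pf.conf ("set skip on lo0" + "pass") at boot
        printf '%s\n' 'pf_enable="YES"' \
            'pf_rules="/etc/pf.conf"' >> "$root/etc/rc.conf" ;;
    *)
        echo "unknown configuration set: $set_name" >&2
        return 1 ;;
    esac
}
```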

    [1] http://sourceforge.net/p/bsdrp/code/HEAD/tree/trunk/BSDRP/tools/bisection-gen.sh
    [2] http://sourceforge.net/p/bsdrp/code/HEAD/tree/trunk/BSDRP/tools/bench-lab.sh
    [3] http://sourceforge.net/p/bsdrp/code/HEAD/tree/trunk/BSDRP/tools/bench-lab-ministat.sh

    A related note:

    http://lists.freebsd.org/pipermail/freebsd-current/2013-April/041326.html
    forwarding/ipfw/pf evolution (in pps) on -current
    Andre Oppermann andre at freebsd.org
    Wed Apr 24 12:35:14 UTC 2013

    On 24.04.2013 12:45, Olivier Cochard-Labbé wrote:

    > Hi all,
    >
    > here is the result of my simple-and-dummy bench script regarding
    > forwarding/ipfw/pf performance evolution on -current on a single-core
    > server with one flow only.
    > It's the result of more than 810 bench tests (including reboot between
    > each) done twice for validating my methodology.

    Thanks for your excellent work in doing this benchmark time-series,

    > • One server with 2 NICs in the middle as a router/firewall, serial
    >   connection, and nanobsd image on it (very easy to upgrade): IBM
    >   eServer xSeries 306m with one core (Intel Pentium4 3.00GHz,
    >   hyper-threading disabled) and a dual NIC 82546GB connected to the
    >   PCI-X Bus;

    however I want to point out that the Pentium4 has about the worst
    lock overhead of all cpu architectures, even on UP.  This may cause
    certain changes to look much worse than they are on currently popular
    architectures.

    For an estimate and time-series comparison your bench test is very
    helpful though.


