Performance issue with filtering enabled



  • Hello,

    I have been testing performance on a platform that I resell.
    I have set up a simple configuration which looks like this:

    Station_1 <<< WAN >>> pfSense_FW <<< LAN >>> Station_2
    1.2.3.4 <> 1.2.3.5/24  192.168.1.1 <> DHCP

    I have been conducting these tests with pfSense 1.2.3

    The hardware I have been testing the solution on is the following:

    • Intel® Atom N270 1.6 GHz
    • Intel® 945GSE North & ICH7-M South Bridge Chipset
    • 512MB DDR2 RAM on board + 1 SODIMM 1024MB slot
    • 5 LAN ports (4 Gigabit Intel 82574L + 1 FE Intel 82551ER)

    Stations were connected on the Gigabit ports.

    I have tried all sorts of things to optimize the settings on the firewall. All in all, I have obtained the following results:

    WITH PACKET FILTERING ENABLED
    gregober 18:24:15 ~ -> iperf -c 1.2.3.4
    ------------------------------------------------------------
    Client connecting to 1.2.3.4, TCP port 5001
    TCP window size:  129 KByte (default)

    [  3] local 192.168.1.199 port 53298 connected with 1.2.3.4 port 5001
    [ ID] Interval      Transfer    Bandwidth
    [  3]  0.0-10.0 sec  257 MBytes  216 Mbits/sec

    WITHOUT PACKET FILTERING ENABLED
    gregober 18:40:12 ~ -> iperf -c 1.2.3.4
    ------------------------------------------------------------
    Client connecting to 1.2.3.4, TCP port 5001
    TCP window size:  129 KByte (default)

    [  3] local 192.168.1.199 port 53391 connected with 1.2.3.4 port 5001
    [ ID] Interval      Transfer    Bandwidth
    [  3]  0.0-10.0 sec  1.03 GBytes  882 Mbits/sec

    This means that when packet filtering is enabled, throughput is reduced by roughly 75%.

    I found this quite surprising because my hardware is very far from being saturated; in fact, it does not seem to be impacted at all by these tests.
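    As a quick sanity check on that figure, the drop can be computed directly from the two iperf results above (a trivial sketch, nothing pfSense-specific):

```shell
# Compute the throughput drop from the two iperf results above.
with_pf=216      # Mbit/s with packet filtering enabled
without_pf=882   # Mbit/s with packet filtering disabled
drop=$(( (without_pf - with_pf) * 100 / without_pf ))
echo "throughput drop: ${drop}%"   # prints "throughput drop: 75%"
```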

    • I was wondering whether this is normal.

    • Are there any settings I might optimize somewhere?

    • I have seen that the em driver from Intel is not very well optimized; is this still the case?

    Any suggestions on how to optimize this would be welcome.



  • Not really sure what more you can expect from a single-core Atom.



  • @jasonlitka:

    Not really sure what more you can expect from a single-core Atom.

    What strikes me is the difference between filtering enabled and filtering disabled…

    Is the filtering process so intensive that it reduces throughput by 75%?

    This figure is really striking; I'll try to investigate the reasons.



  • @gregober:

    I found this quite surprising because my hardware is very far from being saturated; in fact, it does not seem to be impacted at all by these tests.

    I presume by hardware here you mean the network interfaces and not the CPU. Is that correct?

    I believe there are some em related sysctls that could be modified to attempt to reduce interrupt overhead.

    If you enable jumbo frames all around, you could use a much larger MTU, which would also reduce interrupt overhead, but this would probably not be of much use if you expect serious Internet traffic.
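    For reference, on FreeBSD the MTU change would look something like this (em0/em1 are just example interface names; every device in the path, including both stations and any switch, must support the larger frame size):

```
ifconfig em0 mtu 9000
ifconfig em1 mtu 9000
```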

    I agree that the apparent overhead of packet filtering is "interesting". Is the CPU fully utilised when you perform that test? Does the N270 have hyperthreading? If so, does it make a difference to run single CPU?

    What version of pfSense are you testing?



  • @wallabybob:

    @gregober:

    I found this quite surprising because my hardware is very far from being saturated; in fact, it does not seem to be impacted at all by these tests.

    I presume by hardware here you mean the network interfaces and not the CPU. Is that correct?

    Yes this is correct, I mean the performance of the network interface.

    @wallabybob:

    I believe there are some em related sysctls that could be modified to attempt to reduce interrupt overhead.

    Yes, it looks like you can tune some parameters for em interfaces; more details here: http://www.freebsd.org/cgi/man.cgi?query=em&apropos=0&sektion=0&manpath=FreeBSD+7.3-RELEASE&format=html

    Among all of these parameters I don't really know which ones might be of interest…
    Most of them seem to have sensible default values?

    hw.em.rxd
        Number of receive descriptors allocated by the driver.  The
        default value is 256.  The 82542 and 82543-based adapters can
        handle up to 256 descriptors, while others can have up to 4096.

    hw.em.txd
        Number of transmit descriptors allocated by the driver.  The
        default value is 256.  The 82542 and 82543-based adapters can
        handle up to 256 descriptors, while others can have up to 4096.

    hw.em.rx_int_delay
        This value delays the generation of receive interrupts in units
        of 1.024 microseconds.  The default value is 0, since adapters
        may hang with this feature being enabled.

    hw.em.rx_abs_int_delay
        If hw.em.rx_int_delay is non-zero, this tunable limits the
        maximum delay in which a receive interrupt is generated.

    hw.em.tx_int_delay
        This value delays the generation of transmit interrupts in units
        of 1.024 microseconds.  The default value is 64.

    hw.em.tx_abs_int_delay
        If hw.em.tx_int_delay is non-zero, this tunable limits the
        maximum delay in which a transmit interrupt is generated.
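    If I do experiment with these, my understanding is that they are loader tunables, so they would be set in /boot/loader.conf; something like the following (values purely illustrative, not tested recommendations — and note the man page's warning that some adapters may hang with rx_int_delay enabled):

```
# /boot/loader.conf -- illustrative values only, not recommendations
# The 82574L is not 82542/82543-based, so it should accept up to 4096 descriptors
hw.em.rxd=1024
hw.em.txd=1024
```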

    @wallabybob:

    If you enable jumbo frames all around you could use a much larger MTU which would also reduce interrupt overhead but this would probably not be of much use if you will have serious internet traffic.

    My goal is to try to discover why throughput is reduced by 75% when filtering is enabled.
    Most parameters should be kept at their default values, including the MTU, as my purpose is to get an accurate picture of the capacity of the firewall.

    @wallabybob:

    I agree that the apparent overhead of packet filtering is "interesting". Is the CPU fully utilised when you perform that test? Does the N270 have hyperthreading? If so, does it make a difference to run single CPU?

    No, the CPU is very lightly used. I'll take time to run some more tests tomorrow (CEST) and post them in this thread; I'll use the diagnostic advice from JimP to try to give as much info as possible in order to understand precisely what's going on.

    @wallabybob:

    What version of pfSense are you testing?

    1.2.3, from July 2010



  • @gregober:

    No, the CPU is very lightly used,

    On both tests? What are you using to determine the "light" usage?
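    On FreeBSD, packet-filtering load often shows up as kernel/interrupt time rather than in an ordinary process list, so something along these lines (run while iperf is going) gives a better picture:

```
top -SH           # -S: show system (kernel) threads, -H: show each thread;
                  # watch the em interrupt/taskq threads during the test
vmstat -i         # per-device interrupt counts and rates
systat -vmstat 1  # live CPU, interrupt and context-switch summary
```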



    Filtering has vastly more overhead than simply routing; a drop along those lines is to be expected, and those numbers are close to what I've seen on similar hardware.

