Netgate Discussion Forum

    Performance issue with filtering enabled

    gregober

      Hello,

      I have been testing performance on a platform that I resell.
      I have set up a simple configuration that looks like this:

      Station_1 <<< WAN >>> pfSense_FW <<< LAN >>> Station_2
      1.2.3.4 <> 1.2.3.5/24  192.168.1.1 <> DHCP

      I have been conducting these tests with pfSense 1.2.3

      The hardware I have been testing the solution on is the following:

      • Intel® Atom N270, 1.6 GHz
      • Intel® 945GSE North & ICH7-M South Bridge chipset
      • 512 MB DDR2 RAM on board + 1 SODIMM slot (1024 MB)
      • 5 LAN ports (4 Gigabit Intel 82574L + 1 FE Intel 82551ER)

      Both stations were connected to Gigabit ports.

      I have tried all sorts of things to optimize the settings on the firewall; all in all, I have obtained the following results:

      WITH PACKET FILTERING ENABLED
      gregober 18:24:15 ~ -> iperf -c 1.2.3.4
      ------------------------------------------------------------
      Client connecting to 1.2.3.4, TCP port 5001
      TCP window size:  129 KByte (default)

      [  3] local 192.168.1.199 port 53298 connected with 1.2.3.4 port 5001
      [ ID] Interval      Transfer    Bandwidth
      [  3]  0.0-10.0 sec  257 MBytes  216 Mbits/sec

      WITHOUT PACKET FILTERING ENABLED
      gregober 18:40:12 ~ -> iperf -c 1.2.3.4
      ------------------------------------------------------------
      Client connecting to 1.2.3.4, TCP port 5001
      TCP window size:  129 KByte (default)

      [  3] local 192.168.1.199 port 53391 connected with 1.2.3.4 port 5001
      [ ID] Interval      Transfer    Bandwidth
      [  3]  0.0-10.0 sec  1.03 GBytes  882 Mbits/sec

      This means that when the firewall is enabled, throughput is reduced by roughly 75%.

      I found this quite surprising, because my hardware is very far from being saturated; in fact it does not appear to be loaded at all by these tests.

      • I was wondering: is this normal?

      • Are there any settings I might optimize somewhere?

      • I have read that the Intel em driver is not particularly well optimized; is this still the case?

      Any suggestions on how to optimize this would be welcome.
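
      In case anyone wants to reproduce this, here is a minimal sketch of the whole test (iperf simply runs as a listener on Station_1; on the firewall, filtering can be toggled with pfctl, or via the equivalent checkbox under System > Advanced):

      # On Station_1 (1.2.3.4): start the iperf server
      iperf -s

      # On Station_2: run the client through the firewall
      iperf -c 1.2.3.4

      # On the pfSense shell, for the "without filtering" run:
      pfctl -d   # disable packet filtering (pf)
      pfctl -e   # re-enable it afterwards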

      jasonlitka

        Not really sure what more you can expect from a single-core Atom.

        I can break anything.

        gregober

          @jasonlitka:

          Not really sure what more you can expect from a single-core Atom.

          What strikes me is the difference between filtering enabled and filtering disabled…

          Is the filtering process really so intensive that it reduces throughput by 75%?

          This figure is really striking; I'll try to investigate the reasons.

          wallabybob

            @gregober:

            I found this quite surprising because my hardware is very far from beeing saturated, It is in fact not impacted at all by these tests.

            I presume by hardware here you mean the network interfaces and not the CPU. Is that correct?

            I believe there are some em-related sysctls that could be modified to attempt to reduce interrupt overhead.

            If you enable jumbo frames all around, you could use a much larger MTU, which would also reduce interrupt overhead (see the example below); but this would probably not be of much use if you expect serious Internet traffic.
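
            For instance, on FreeBSD that is just an ifconfig setting (a hypothetical one-liner; em0 stands in for whichever interface carries the traffic, and every device on the path has to accept the larger frames):

            # raise the MTU on one em interface to a common jumbo-frame size
            ifconfig em0 mtu 9000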

            I agree that the apparent overhead of packet filtering is "interesting". Is the CPU fully utilised when you perform that test? Does the N270 have hyperthreading? If so, does it make a difference to run single CPU?

            What version of pfSense are you testing?

            gregober

              @wallabybob:

              @gregober:

              I found this quite surprising because my hardware is very far from beeing saturated, It is in fact not impacted at all by these tests.

              I presume by hardware here you mean the network interfaces and not the CPU. Is that correct?

              Yes, this is correct; I mean the performance of the network interfaces.

              @wallabybob:

              I believe there are some em related sysctls that could be modified to attempt to reduce interrupt overhead.

              Yes, it looks like you can tune some parameters for em interfaces; more details here: http://www.freebsd.org/cgi/man.cgi?query=em&apropos=0&sektion=0&manpath=FreeBSD+7.3-RELEASE&format=html

              Among all of these parameters I don't really know which ones might be of interest…
              Most of them seem to have sensible default values (see the sketch after the list for how I might experiment with them):

              hw.em.rxd
                  Number of receive descriptors allocated by the driver.  The
                  default value is 256.  The 82542 and 82543-based adapters can
                  handle up to 256 descriptors, while others can have up to 4096.

              hw.em.txd
                  Number of transmit descriptors allocated by the driver.  The
                  default value is 256.  The 82542 and 82543-based adapters can
                  handle up to 256 descriptors, while others can have up to 4096.

              hw.em.rx_int_delay
                  This value delays the generation of receive interrupts in units
                  of 1.024 microseconds.  The default value is 0, since adapters
                  may hang with this feature being enabled.

              hw.em.rx_abs_int_delay
                  If hw.em.rx_int_delay is non-zero, this tunable limits the
                  maximum delay in which a receive interrupt is generated.

              hw.em.tx_int_delay
                  This value delays the generation of transmit interrupts in units
                  of 1.024 microseconds.  The default value is 64.

              hw.em.tx_abs_int_delay
                  If hw.em.tx_int_delay is non-zero, this tunable limits the
                  maximum delay in which a transmit interrupt is generated.
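
              If I get that far, I would try them as loader tunables, something like this (a sketch only; the values are guesses to experiment with, not recommendations, and the man page above warns that a non-zero rx_int_delay may hang some adapters):

              # /boot/loader.conf.local -- hypothetical values to experiment with
              hw.em.rxd=1024             # larger receive ring (default 256; the 82574L allows up to 4096)
              hw.em.txd=1024             # larger transmit ring (default 256)
              hw.em.rx_int_delay=32      # coalesce receive interrupts (default 0; may hang some adapters)
              hw.em.rx_abs_int_delay=64  # cap on how long a receive interrupt may be delayed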

              @wallabybob:

              If you enable jumbo frames all around you could use a much larger MTU which would also reduce interrupt overhead but this would probably not be of much use if you will have serious internet traffic.

              My goal is to discover why the firewall's throughput drops by 75% when packet filtering is enabled.
              Most parameters should be kept at their default values, including the MTU, as my purpose is to get an accurate picture of the firewall's capacity.

              @wallabybob:

              I agree that the apparent overhead of packet filtering is "interesting". Is the CPU fully utilised when you perform that test? Does the N270 have hyperthreading? If so, does it make a difference to run single CPU?

              No, the CPU is very lightly used. I'll take time to run some more tests tomorrow (CEST) and post them in this thread; I'll use the diagnostic advice from JimP to provide as much information as possible, in order to understand precisely what's going on.

              @wallabybob:

              What version of pfSense are you testing?

              1.2.3, from July 2010.

              wallabybob

                @gregober:

                No, the CPU is very lightly used,

                On both tests? What are you using to determine the "light" usage?
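
                For example, something like this on the firewall console while iperf is running (just standard FreeBSD tools, nothing pfSense-specific):

                top -aSH     # -S includes kernel threads, -H lists them individually; watch any interrupt/em threads
                vmstat -i    # per-device interrupt rates; check the em0/em1 counters

                A single overall CPU percentage can hide time spent in interrupt context, so the per-thread view is worth a look.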

                cmb

                   Filtering has vastly more overhead than simply routing; a drop along those lines is to be expected, and those numbers are close to what I've seen on similar hardware.
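
                   If you want to see where that effort goes, pf keeps counters you can inspect (plain pfctl, sketched here; run it on the firewall during a test):

                   pfctl -si     # state-table size plus search/insert/removal counters
                   pfctl -vsr    # loaded rules with per-rule evaluation and packet counts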
