Netgate Discussion Forum

    Performance issue with filtering enabled

    gregober

      Hello,

      I have been testing performance on a platform that I resell.
      I have set up a simple configuration that looks like this:

      Station_1 <<< WAN >>> pfSense_FW <<< LAN >>> Station_2
      1.2.3.4 <> 1.2.3.5/24  192.168.1.1 <> DHCP

      I have been conducting these tests with pfSense 1.2.3

      The hardware I have been testing the solution on is the following:

      • Intel® Atom N270, 1.6 GHz
      • Intel® 945GSE North & ICH7-M South Bridge chipset
      • 512 MB DDR2 RAM on board + 1 SODIMM slot (1024 MB)
      • 5 LAN ports (4 Gigabit Intel 82574L + 1 FE Intel 82551ER)

      Both stations were connected to Gigabit ports.

      I have tried all sorts of things to optimize the settings on the firewall; all in all, I have obtained the following results:

      WITH PACKET FILTERING ENABLED
      gregober 18:24:15 ~ -> iperf -c 1.2.3.4
      ------------------------------------------------------------
      Client connecting to 1.2.3.4, TCP port 5001
      TCP window size:  129 KByte (default)

      [  3] local 192.168.1.199 port 53298 connected with 1.2.3.4 port 5001
      [ ID] Interval      Transfer    Bandwidth
      [  3]  0.0-10.0 sec  257 MBytes  216 Mbits/sec

      WITHOUT PACKET FILTERING ENABLED
      gregober 18:40:12 ~ -> iperf -c 1.2.3.4
      ------------------------------------------------------------
      Client connecting to 1.2.3.4, TCP port 5001
      TCP window size:  129 KByte (default)

      [  3] local 192.168.1.199 port 53391 connected with 1.2.3.4 port 5001
      [ ID] Interval      Transfer    Bandwidth
      [  3]  0.0-10.0 sec  1.03 GBytes  882 Mbits/sec

      This means that when the firewall is enabled, throughput is reduced by roughly 75%.

      I found this quite surprising, because my hardware is very far from being saturated; in fact it does not appear to be loaded at all by these tests.

      • I was wondering: is this normal?

      • Are there any settings I might optimize somewhere?

      • I have read that the Intel em driver is not particularly well optimized; is this still the case?

      Any suggestions on how to optimize this would be welcome.
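
      In case anyone wants to reproduce this, here is a minimal sketch of the whole test (iperf simply runs as a listener on Station_1; on the firewall, filtering can be toggled with pfctl, or via the equivalent checkbox under System > Advanced):

      # On Station_1 (1.2.3.4): start the iperf server
      iperf -s

      # On Station_2: run the client through the firewall
      iperf -c 1.2.3.4

      # On the pfSense shell, for the "without filtering" run:
      pfctl -d   # disable packet filtering (pf)
      pfctl -e   # re-enable it afterwards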

      jasonlitka

        Not really sure what more you can expect from a single-core Atom.

        I can break anything.

        gregober

          @jasonlitka:

          Not really sure what more you can expect from a single-core Atom.

          What strikes me is the difference between filtering enabled and filtering disabled…

          Is the filtering process really so intensive that it reduces throughput by 75%?

          This figure is really striking; I'll try to investigate the reasons.

          wallabybob

            @gregober:

            I found this quite surprising because my hardware is very far from beeing saturated, It is in fact not impacted at all by these tests.

            I presume by hardware here you mean the network interfaces and not the CPU. Is that correct?

            I believe there are some em-related sysctls that could be modified to attempt to reduce interrupt overhead.

            If you enable jumbo frames all around, you could use a much larger MTU, which would also reduce interrupt overhead (see the example below); but this would probably not be of much use if you expect serious Internet traffic.
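
            For instance, on FreeBSD that is just an ifconfig setting (a hypothetical one-liner; em0 stands in for whichever interface carries the traffic, and every device on the path has to accept the larger frames):

            # raise the MTU on one em interface to a common jumbo-frame size
            ifconfig em0 mtu 9000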

            I agree that the apparent overhead of packet filtering is "interesting". Is the CPU fully utilised when you perform that test? Does the N270 have hyperthreading? If so, does it make a difference to run single CPU?

            What version of pfSense are you testing?

            gregober

              @wallabybob:

              @gregober:

              I found this quite surprising because my hardware is very far from beeing saturated, It is in fact not impacted at all by these tests.

              I presume by hardware here you mean the network interfaces and not the CPU. Is that correct?

              Yes, this is correct; I mean the performance of the network interfaces.

              @wallabybob:

              I believe there are some em related sysctls that could be modified to attempt to reduce interrupt overhead.

              Yes, it looks like you can tune some parameters for em interfaces; more details here: http://www.freebsd.org/cgi/man.cgi?query=em&apropos=0&sektion=0&manpath=FreeBSD+7.3-RELEASE&format=html

              Among all of these parameters I don't really know which ones might be of interest…
              Most of them seem to have sensible default values (see the sketch after the list for how I might experiment with them):

              hw.em.rxd
                  Number of receive descriptors allocated by the driver.  The
                  default value is 256.  The 82542 and 82543-based adapters can
                  handle up to 256 descriptors, while others can have up to 4096.

              hw.em.txd
                  Number of transmit descriptors allocated by the driver.  The
                  default value is 256.  The 82542 and 82543-based adapters can
                  handle up to 256 descriptors, while others can have up to 4096.

              hw.em.rx_int_delay
                  This value delays the generation of receive interrupts in units
                  of 1.024 microseconds.  The default value is 0, since adapters
                  may hang with this feature being enabled.

              hw.em.rx_abs_int_delay
                  If hw.em.rx_int_delay is non-zero, this tunable limits the
                  maximum delay in which a receive interrupt is generated.

              hw.em.tx_int_delay
                  This value delays the generation of transmit interrupts in units
                  of 1.024 microseconds.  The default value is 64.

              hw.em.tx_abs_int_delay
                  If hw.em.tx_int_delay is non-zero, this tunable limits the
                  maximum delay in which a transmit interrupt is generated.
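
              If I get that far, I would try them as loader tunables, something like this (a sketch only; the values are guesses to experiment with, not recommendations, and the man page above warns that a non-zero rx_int_delay may hang some adapters):

              # /boot/loader.conf.local -- hypothetical values to experiment with
              hw.em.rxd=1024             # larger receive ring (default 256; the 82574L allows up to 4096)
              hw.em.txd=1024             # larger transmit ring (default 256)
              hw.em.rx_int_delay=32      # coalesce receive interrupts (default 0; may hang some adapters)
              hw.em.rx_abs_int_delay=64  # cap on how long a receive interrupt may be delayed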

              @wallabybob:

              If you enable jumbo frames all around you could use a much larger MTU which would also reduce interrupt overhead but this would probably not be of much use if you will have serious internet traffic.

              My goal is to discover why the firewall's throughput drops by 75% when packet filtering is enabled.
              Most parameters should be kept at their default values, including the MTU, as my purpose is to get an accurate picture of the firewall's capacity.

              @wallabybob:

              I agree that the apparent overhead of packet filtering is "interesting". Is the CPU fully utilised when you perform that test? Does the N270 have hyperthreading? If so, does it make a difference to run single CPU?

              No, the CPU is very lightly used. I'll take time to run some more tests tomorrow (CEST) and post them in this thread; I'll use the diagnostic advice from JimP to provide as much information as possible, in order to understand precisely what's going on.

              @wallabybob:

              What version of pfSense are you testing?

              1.2.3, from July 2010.

              wallabybob

                @gregober:

                No, the CPU is very lightly used,

                On both tests? What are you using to determine the "light" usage?
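
                For example, something like this on the firewall console while iperf is running (just standard FreeBSD tools, nothing pfSense-specific):

                top -aSH     # -S includes kernel threads, -H lists them individually; watch any interrupt/em threads
                vmstat -i    # per-device interrupt rates; check the em0/em1 counters

                A single overall CPU percentage can hide time spent in interrupt context, so the per-thread view is worth a look.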

                cmb

                   Filtering has vastly more overhead than simply routing; a drop along those lines is to be expected, and those numbers are close to what I've seen on similar hardware.
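
                   If you want to see where that effort goes, pf keeps counters you can inspect (plain pfctl, sketched here; run it on the firewall during a test):

                   pfctl -si     # state-table size plus search/insert/removal counters
                   pfctl -vsr    # loaded rules with per-rule evaluation and packet counts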
