pfSense performance on an Atom D525 box (my experiences so far)

Rural

The situation: Schools. Low budget. Over-worked tech staff in need of better tools. Old Cisco 2821 routers are maxing out at 400 Mb/s between VLANs/subnets. L3 switches are way out of our budget and don't offer good tools for traffic monitoring, something we sorely need. We're looking at pfSense on fairly modest hardware (Supermicro SYS-5015A-EHF-D525, Atom D525, 4 GB of RAM, Intel NICs on board, and two additional Intel NICs on a PCI-E card).

These boxes seem to be hitting a performance wall at about 640 Mb/s. Running iperf between VLANs/subnets, using various kinds and numbers of workstations, yields an aggregate throughput no higher than about 640 Mb/s. Although this isn't horrible, I was expecting more than 1 Gb/s (using LAGG) because of the throughput figures I've read in several vendors' literature (e.g. here).

My configuration is simple: LAN, OPT1, and OPT2 are all on different VLANs on top of a three-port LAGG. We're using SquidGuard on top of Squid configured as a transparent proxy. For testing purposes there is nothing but accept-all rules on LAN, OPT1, and OPT2. We are using aggregate links between switches, but the bulk of our testing is between workstations on a single switch.

One thing that concerns me is that I'm seeing the following printed on the console after running iperf tests:

interrupt storm detected on "irq258" throttling interrupt source

irq258 corresponds to the em2 interface (RX). Nothing else seems to be using it according to vmstat -i. To my untrained eye, it looks like the D525 just can't keep up.
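For anyone wanting to confirm which interrupt source is the busy one, here is a minimal sketch that ranks the rows of `vmstat -i` style output by rate. The sample text is made up for illustration (it is not output from this box), and the function names `parse_vmstat_i` and `busiest` are my own, so treat it as a starting point rather than a finished tool.

```python
# Rank interrupt sources from `vmstat -i` style output by interrupt rate.
# SAMPLE is invented text in the usual FreeBSD column layout
# (source, total, rate); it is NOT real output from the box in this thread.

SAMPLE = """\
interrupt                          total       rate
irq256: em0                       120000         40
irq257: em1                       450000        150
irq258: em2                     98000000      32000
Total                           98570000      32190
"""


def parse_vmstat_i(text):
    """Return (source, total, rate) tuples, skipping the header and Total rows."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3 or parts[0] in ("interrupt", "Total"):
            continue
        try:
            total, rate = int(parts[-2]), int(parts[-1])
        except ValueError:
            continue  # not a data row, ignore it
        rows.append((" ".join(parts[:-2]), total, rate))
    return rows


def busiest(text, top=3):
    """Return the `top` sources with the highest interrupt rate, busiest first."""
    return sorted(parse_vmstat_i(text), key=lambda row: row[2], reverse=True)[:top]


if __name__ == "__main__":
    for source, total, rate in busiest(SAMPLE):
        print(f"{source:<16} total={total:<10} rate={rate}/s")
```

On a live system the text would come from running `vmstat -i` on the firewall itself instead of from the hard-coded sample.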

I'm going to mess with the configuration a bit (disable Squid and Squidguard) to see if I can get better numbers. We also have a much higher powered box that we can do testing on as well. I'll update this thread if I discover anything interesting.

In the meantime, I'd be open to suggestions as to how to boost performance above 1 Gb/s.

stephenw10

Probably not what you want to hear but 640Mbps seems quite good to me.
It looks reasonable compared to Databeestje's results with an Atom D510 here.

I'm not sure how they are measuring 'throughput' on the site you linked to.
It does seem to tie in with this though. :-\

Is it actually maxing out the CPU?

Steve

Rural

Thanks. I wasn't expecting any miracles. Hoping, but not expecting.

Doing a simple iperf between two hosts, top shows about 27% system and 36% interrupt, and I get 665 Mb/s throughput. So the Atom is being taxed. A bidirectional test (iperf -d) shows about 50% system and 50% interrupt, but the aggregate throughput falls to 581 Mb/s (538 Mb/s one way and 43.3 Mb/s the other). That lopsidedness is very interesting.
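To put a number on that lopsidedness, here is a small sketch that sums the two directions and works out each one's share. The two rates are the ones quoted above; the function name `summarise` and the "forward"/"reverse" labels are assumptions for illustration, since the post does not say which direction was the fast one.

```python
# Summarise a bidirectional iperf run: aggregate rate and per-direction share.
# The "forward"/"reverse" labels are arbitrary; the post does not say which
# direction carried 538 Mb/s and which carried 43.3 Mb/s.


def summarise(forward_mbps, reverse_mbps):
    """Return the aggregate rate and each direction's fraction of it."""
    total = forward_mbps + reverse_mbps
    if total == 0:
        raise ValueError("both directions report zero throughput")
    return {
        "aggregate_mbps": total,
        "forward_share": forward_mbps / total,
        "reverse_share": reverse_mbps / total,
    }


if __name__ == "__main__":
    result = summarise(538.0, 43.3)
    print(f"aggregate:     {result['aggregate_mbps']:.1f} Mb/s")
    print(f"forward share: {result['forward_share']:.1%}")
    print(f"reverse share: {result['reverse_share']:.1%}")
```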

Mine might be one more data point for somebody in the market for hardware to run pfSense on. I'm also calling out the advertisers claiming 1.5-1.6 Gb/s performance on a D525-based board with Intel hardware. I'd very much like to see their testing methodology. Maybe they were just bridging.

stephenw10

I suspect you are seeing a limit of the NIC or driver or both.
I also think you could probably get a total throughput approaching 1.5Gbps if you had, say, three WAN and three LAN interfaces and tried to max out all of them! Although just how you would do that with only one PCI slot….. ::)

Edit: I'm getting confused between threads. Your board has PCI-e. :-[

You could try enabling device polling. This isn't normally recommended because it uses all spare CPU cycles, so the web GUI and everything else can become slow. It may give you better throughput though. I played around with it for a while, but I don't need more bandwidth. I had to do some command-line tweaking to get polling working.

You could also try turning the hardware offloading options on or off, in case one of them is slowing things down.

Steve

wallabybob

If I recall correctly, em devices can be configured with a non-zero interrupt timer causing interrupt requests to be delayed for the specified time. The delay can reduce overhead by reducing the number of interrupts and interrupt overheads because multiple frames can be processed on an interrupt.
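As a rough back-of-the-envelope sketch of why a delay helps: the em(4) man page describes these delay tunables in units of 1.024 microseconds, so you can estimate how many full-size frames arrive while one interrupt is being held back. The 1538-byte on-wire frame size, the example tick values, and the function names are my assumptions for illustration, and iperf reports payload rather than wire rate, so read the result as an order-of-magnitude figure only.

```python
# Estimate how many frames can be serviced per interrupt when the NIC
# delays interrupts (interrupt moderation).
# Assumptions, not measurements from this thread:
#   - full-size Ethernet frames, 1538 bytes on the wire
#     (1500-byte MTU plus headers, FCS, preamble and inter-frame gap)
#   - the delay timer counts in ticks of 1.024 microseconds, as in em(4)

TICK_SECONDS = 1.024e-6


def frames_per_second(throughput_mbps, wire_bytes=1538):
    """Frames per second needed to carry the given rate with full-size frames."""
    return throughput_mbps * 1_000_000 / (wire_bytes * 8)


def frames_per_window(throughput_mbps, delay_ticks, wire_bytes=1538):
    """Frames that arrive while one interrupt is held back for delay_ticks."""
    window = delay_ticks * TICK_SECONDS
    return frames_per_second(throughput_mbps, wire_bytes) * window


if __name__ == "__main__":
    rate = 640.0  # Mb/s, the ceiling reported earlier in the thread
    print(f"about {frames_per_second(rate):.0f} frames/s at {rate:.0f} Mb/s")
    for ticks in (0, 66, 512):
        frames = frames_per_window(rate, ticks)
        print(f"delay of {ticks:>3} ticks: about {frames:.1f} frames per window")
```

With no delay, every frame can raise its own interrupt; with a longer window, several frames share one interrupt, at the cost of added latency.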

Tikimotel

Wallabybob mentioned em device tweaks.

This is my /boot/loader.conf.local:

# Increase nmbclusters for Squid and intel
kern.ipc.nmbclusters="131072"

# Max. backlog size
kern.ipc.somaxconn="4096"

# On some systems HPET is almost 2 times faster than default ACPI-fast
# Useful on systems with lots of clock_gettime / gettimeofday calls
# See http://old.nabble.com/ACPI-fast-default-timecounter,-but-HPET-83--faster-td23248172.html
# After revision 222222 HPET became default: http://svnweb.freebsd.org/base?view=revision&revision=222222
kern.timecounter.hardware="HPET"

# Tweaks hardware
coretemp_load="yes"
legal.intel_wpi.license_ack="1"
legal.intel_ipw.license_ack="1"

# Useful if you are using an Intel gigabit NIC
hw.em.rxd="4096"
hw.em.txd="4096"
hw.em.tx_int_delay="512"
hw.em.rx_int_delay="512"
hw.em.tx_abs_int_delay="1024"
hw.em.rx_abs_int_delay="1024"
hw.em.enable_msix="1"
hw.em.msix_queues="2"
hw.em.rx_process_limit="100"
hw.em.fc_setting="0"

I also use "Hardware Checksum Offloading" and "Hardware TCP Segmentation Offloading".
My iperf result on my Intel dual-NIC PCI-X card is around 230-245 Mbit/s (pfSense -> client PC), but with almost no CPU load.

Rural

Thanks Tikimotel! I'll try your config (adding piece-by-piece) on Monday or Tuesday and get some numbers back here.

I'm curious how you enable hardware checksum and segmentation offloading with pfSense. Searching for it now.

Would it be too much trouble to ask for a line-by-line explanation of your posted config, where one doesn't exist already? I can guess what most of it means, but would hate to guess wrong.

ptt

@Rural:

I'm curious how you enable hardware checksum and segmentation off-loading with pfSense. Searching for it now.

In System -> Advanced, on the Networking tab, you have those options (both enabled by default).

stephenw10

Bear in mind though that your results seem to show your CPU is not running at 100%, so there is no need to offload calculations to the NIC. In fact, if the NIC/driver is the bottleneck in your system, it may be better to have your CPU doing those calculations.

Steve