Incredibly Poor Performance - AMD K10 & Intel i350-T4 (igb driver)

  • Hi Everyone,

    I'm running pfSense 2.4.4-RELEASE-p1 on an older system with the following specs:

    • AMD Athlon II X2 255 (3.2GHz Dual Core)
    • 8GB DDR3-1333
    • Intel I350-T4 Quad Port NIC

    I'm not running any sort of packet inspection and I've made sure that powerd is set to maximum performance. The CPU clock is always at 3200MHz. The issue I'm having is insanely high CPU loads under relatively modest bandwidths, to the point that I can only push ~70mbit over a 1Gbit interface with iperf -s running on the pfsense box.

    I've tried disabling the various hardware offloads, but this hasn't changed anything. According to the igb(4) man page, only LRO needs to be disabled, because it's not compatible with packet forwarding. So currently I'm running with TSO and checksum offloading enabled.
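    For anyone following along, the offload toggles look like this from the shell (just a sketch; igb2 here is one interface from my setup, and changes made this way don't persist across reboots):

    ```
    # Disable LRO only, per igb(4) (incompatible with forwarding):
    ifconfig igb2 -lro

    # To rule offloads out entirely, TSO and checksum offload can be
    # disabled too (I've tried this without any change):
    ifconfig igb2 -tso4 -tso6 -rxcsum -txcsum
    ```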

    A quick overview of my setup:
    igb0: WAN
    igb1.1: LAN (default VLAN)
    igb1.2: PRIV (VLAN 10)
    igb2: SERVE

    My WAN connection is capable of a consistent 110/5. Initially, when running a speedtest from a Linux server behind igb2, I was only able to get 80/5, and top -aSH would show 100% WCPU assigned to kernel{igb0: que 0}. The overall usage was approximately 65% system and 25% interrupts.

    I had a read about using hw.igb.num_queues=1 to limit the number of queues and stop them contending for cores. This seems to have helped: I can now get a consistent 110/5 on speedtests and the system load has disappeared completely. However, I still see 40% interrupt load at these speeds.

    The most telling tests I've done are iperf tests between pfSense and the directly connected server. Running iperf with pfSense as the server results in 70mbit over a 1Gbit interface; top -aSH shows 70% system CPU usage and 2.5% interrupt usage. The kernel{igb2: que} thread is using 100% WCPU, and iperf -s is also using 50% WCPU. It wouldn't surprise me if the server ran slower on the older pfSense box than on my new server, but 70mbit seems very wrong.
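    For reference, the test commands were roughly as follows (iperf2 syntax; the address is just a placeholder for the pfSense interface facing the server):

    ```
    # On pfSense (the slow, receiving direction):
    iperf -s

    # On the directly connected server behind igb2:
    iperf -c <pfsense-igb2-address> -t 30
    ```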

    If the other server is the iperf server, then I get results between 500 and 800mbit, with 60% system usage in top and 20% interrupts at the start of the test, dropping to 5% by the end. All the system usage is again kernel{igb2: que} at about 80% WCPU; the iperf -c process is down to 15% WCPU.

    It seems as though the igb driver in this case is happy to send data (still not as fast as it should be) but really dislikes receiving it. The other thing I find confusing is that during the speedtests where I can get 110/5 on a server behind the pfSense box, the box is receiving all that traffic, running it through firewall rules and then routing it out, yet it no longer locks up a thread with kernel{igb0: que}.

    The interrupt load still seems high, and I'll admit I haven't yet tried forcing MSI rather than MSI-X interrupts to see if that changes anything. To me this looks like some sort of thread-locking issue; I just don't know enough about the internals of the igb driver to know how to check.
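    If I do test it, my understanding (an assumption on my part) is that it's a loader tunable for the legacy igb driver, set in /boot/loader.conf.local followed by a reboot:

    ```
    # Force legacy MSI instead of MSI-X for the igb driver
    hw.igb.enable_msix=0
    ```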

    At this point my main concern is that I can't push data to the pfSense box faster than about 70mbit, which is a huge problem considering I'm running a backup ZFS pool on it. I know this hardware is relatively old these days, but surely 2010-era hardware should be perfectly capable of receiving (not routing) 1Gbit of data.

    I hope someone might be able to shed some light or point me in the direction of more tests I can run to help narrow down what's causing this bottleneck. For reference, I've always been a Linux guy; this is my first venture into FreeBSD. I'm comfortable with Unix-style terminals, I just don't know all the equivalents to the Linux commands.

  • Netgate Administrator

    pfSense will perform better routing traffic than receiving it; that's what it's designed to do.

    But that is truly terrible performance! Something is very badly wrong there.
    iperf should not be using 50% of a core to push 70Mbps. Are you sure it's actually running at 3.2GHz? It seems more like 300MHz...

    Try enabling powerd if you haven't, or disabling it if you have: System > Advanced > Miscellaneous.
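    You can also verify the actual clock from a shell, using the standard FreeBSD sysctls exposed by cpufreq(4):

    ```
    sysctl dev.cpu.0.freq         # current core frequency in MHz
    sysctl dev.cpu.0.freq_levels  # available frequency/power levels
    ```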


    So I managed to solve this with a BIOS update in the end. Part of the update process was clearing the CMOS, so I had to change a bunch of settings back. The only setting I know was different, and which I chose to leave at its default this time, was the ACPI HPET Table option: previously it was enabled, now it's disabled. I don't really see how that would affect performance to the extent I was seeing, so I suspect it was something in the BIOS update itself that solved the issue. Also, MSI interrupts are definitely slower than MSI-X; forcing them took me down to 50mbit.

    For the next sod that goes googling this: the motherboard was an ASRock N68-S3 UCC, initial BIOS version 1.4, updated to 1.6. I'm now running all defaults except that I've restricted my queues to 1 per NIC with the following in /boot/loader.conf.local:

        hw.igb.num_queues=1

    I'm doing this because I have a dual core CPU and 4 NICs, so I'm trying to reduce the amount of context switching. It may work fine as a default, but after the nightmare of getting it to this point I'm just going to leave it be.
