Underwhelmed by inter-subnet routing and LAGG performance
Rural last edited by
The initial reason I started playing with pfSense is that our aging Cisco 2821 routers topped out at about 400Mbps between VLANs/subnets. So I slapped pfSense on a Supermicro Atom D525 server and started playing around. I quickly came to the conclusion that even if the performance were identical we would use it. My saga is documented here on the forums (1, 2).
So fast forward a few months and I have some time to test the inter-VLAN/subnet performance of pfSense on decent hardware. This is a rack-mount box running a fairly current (but inexpensive) Intel E5506 CPU. Four GigE Intel NICs (two built-in, two through an add-on card). With a 3xGigE LAGG we've been running iperf between VLANs using multiple workstations. We've tried modern and older workstations, but either way we see about 1500Mbps (plus-or-minus 100Mbps) total throughput. I'm not expecting anything close to the theoretical throughput of 3Gbps, but I am expecting a bit more than 1.5Gbps.
The CPU doesn't seem to be the bottleneck, as top shows less than half of one of the four cores being utilised.
I'm also tempted to blame the D-Link DGS-1210-48 switches. Although they support LACP, I know that if their CPU is involved in any way, it will be the problem.
Are my expectations wonky? Has anyone else had experience using LAGG under pfSense and willing to share what sort of performance they saw?
wallabybob last edited by
What iperf parameters have you used?
It seems you checked CPU utilisation on pfSense, but did you check CPU utilisation on the client and server? I expect forwarding a packet (pfSense) requires far less CPU than protocol handling and transferring a packet between kernel and user memory (the iperf client and server).
idmud last edited by
Some NICs can consume a lot of interrupt time on a specific core. I've seen it happen in higher-throughput environments (30-40Gbit networks), but a funky driver could cause it at 1Gbit as well.
Do the tests again and run top on your firewall; hit P and S while it's running to show how much of each core is being used. If one core hits 100% then you know the driver isn't spreading its load across the cores and is bottlenecking things.
cmb last edited by
I've seen LACP at 3 Gbps without maxing out the hardware, on hardware similar to what you're using there. You'll never get more than 1 Gbps between one particular source and destination because traffic is balanced by MAC. With 3-4 hosts you should be able to get near 3 Gbps. Though I've only tested with much better switches than a D-Link (HP and Cisco), so I'm not sure what caveats that switch may introduce, or whether LACP even functions properly on it.
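The MAC-balancing behaviour described above can be sketched in a few lines of Python. This is a toy model, not pfSense's actual lagg code: I'm assuming a typical layer-2 hash that XORs the MAC octets and takes the result modulo the number of member links. Real implementations differ in detail, but they share the property that one source/destination pair always maps to the same member link.

```python
# Toy model of MAC-based LAGG/LACP link selection (hypothetical hash,
# not the real FreeBSD lagg or D-Link implementation).

def lagg_port(src_mac: str, dst_mac: str, n_links: int = 3) -> int:
    """Pick a member link by XOR-ing all octets of both MACs, mod link count."""
    h = 0
    for mac in (src_mac, dst_mac):
        for octet in mac.split(":"):
            h ^= int(octet, 16)
    return h % n_links

# One workstation pair: every frame hashes to the same member link,
# so that pair can never exceed ~1 Gbps no matter how wide the LAGG is.
pair = lagg_port("00:1b:21:aa:00:01", "00:1b:21:bb:00:02")
print(pair)

# Several host pairs: flows spread (imperfectly) across the member links,
# which is why aggregate throughput only approaches 3 Gbps with 3+ pairs.
hosts = [f"00:1b:21:aa:00:{i:02x}" for i in range(1, 7)]
flows = {(a, b): lagg_port(a, b) for a in hosts for b in hosts if a != b}
print(sorted(set(flows.values())))
```

Because the hash is deterministic per MAC pair, adding streams between the *same* two hosts (e.g. iperf `-P`) doesn't help; only adding more host pairs spreads load across links.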
Rural last edited by
Very interesting comments.
wallabybob, I don't think that iperf is the bottleneck. In several tests we ran the iperf client and server on the same machine and saw very high throughput. I don't have the numbers handy from the workstations we were testing on, but I was just able to get 2.43Gbps running both the iperf server and client on an old Atom 230 server I have handy. We also tested with the workstations on the same subnet/VLAN, taking the routing performance of the pfSense box out of the equation, and got much better numbers.
idmud, thanks for the suggestion. I'll give that a try when we have a spare moment.
cmb, I understand the issue with a single source-destination pair and LACP. In all cases we were using at least three pairs of machines, with several tests using eight pairs. Your point about the switches is well-taken and probably spot-on. I may PM you regarding suggestions for better switches. (I have some money left in a budget for experimentation, only a month to make use of it, and switches had my attention anyway.)