Extreme load when testing a LAGG on a specific NIC
-
I have two NICs in my system, both are igb but different chipsets. One is four 82580 ports on x4 PCIe lanes. One is two 82576 ports on x1 PCIe lane. I picked up the x1 card because I plan on replacing the 82580 NIC with a 10Gbps card.
For the past year or so I've been running a 2 port LAGG with LACP to a Dell Powerconnect managed switch. That hosts my VLANs and all my inside interfaces. The WAN has its own port.
When using the 82580 interfaces in the LAGG, or one 82576 port and one 82580 port, I get the expected results. I fire up a bunch of iperf3 client-server pairs and just load it up until I'm positive both NICs are being used. I can get 2Gbps aggregate throughput no problem, as I'd expect with optimal loading of the ports, and the system load seems minimal.
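Something like this, between hosts on different VLANs so the traffic routes through the LAGG and the LACP hash actually spreads the flows across both ports (addresses and ports here are just examples; each simultaneous pair needs its own iperf3 server instance):
iperf3 -s -p 5201    (on the first target host)
iperf3 -s -p 5202    (on the second target host)
iperf3 -c 10.0.10.10 -p 5201 -P 4 -t 60
iperf3 -c 10.0.20.10 -p 5202 -P 4 -t 60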
However, if both interfaces are on the 82576 card, the throughput is capped at about 1.3Gbps. That's not the odd part though. When I get it loaded to that point, load averages and CPU usage seem to go through the roof. I'm testing on the bench so I only have a photo of the console to show the load, but it's nuts. We're talking load averages in the 20s or higher. The CPU consumption is by [kernel{if_io_tqg_0}] and [kernel{if_io_tqg_1}].
Is there any tuning I can do? I have another 2-port x1 card (Broadcom chipset) that I tried, but my motherboard didn't like it (wouldn't even POST), so that's why I'm using this particular one. I don't really need crazy throughput here since my plan is to move most stuff over to the 10Gbps side, but I'd really like to figure out what is going on. You know, for science.
EDIT: I also have a Realtek NIC on the mobo that can go into the LAGG just fine and get the full throughput, albeit with the expected CPU interrupt overhead. It's just when both LAGG ports are on the 82576 card.
-
because I plan on replacing the 82580 NIC with a 10Gbps card.
An Intel X520-DA2 should be fine.
Is there any tuning I can do?
- Set everything to a jumbo frame size of 90xx
- Try enabling or disabling TSO and/or LSO
- Increase the mbuf cluster count towards 1000000, then tune it higher or lower
- Raise or lower the number of NIC queues
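Roughly where those knobs live on FreeBSD/pfSense, just as a sketch (the values are only placeholders, and the queue tunable depends on the driver version):
ifconfig igb0 mtu 9000    (jumbo frames; the switch and the far end need it too)
ifconfig igb0 -tso        ("tso" re-enables it)
And in /boot/loader.conf.local:
kern.ipc.nmbclusters="1000000"
hw.igb.num_queues="2"     (legacy igb driver; newer iflib-based drivers expose queue overrides under dev.igb.X.iflib instead)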
Dobby
-
Hmm, that is weird. What PCIe version is it? If it's v1 I could imagine it might be saturating the PCIe lane and introducing some huge overhead to deal with that. Maybe.
Try running without the LAGG. Test each port on the 82576 NIC to saturation and see if you can replicate it. Maybe test both ports but outside the LAGG if you can.
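For example, with the LAGG broken apart and a test host behind each port, something like this per port (addresses are just placeholders), while watching top -HS on the firewall to see whether those if_io_tqg threads ramp up the same way:
iperf3 -s    (on the test host)
iperf3 -c 192.168.30.10 -P 8 -t 60
iperf3 -c 192.168.30.10 -P 8 -t 60 -R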
Steve
-
@stephenw10 said in Extreme load when testing a LAGG on a specific NIC:
Hmm, that is weird. What PCIe version is it?
The NIC chipset is PCIe 2.0 and the motherboard supports up to 3.0.
Is there a way to tell what the actual version in use is?
Edit: Figured it out.
pciconf -lvc igb0
as an example. It's only using 2.5GT/s despite being PCIe 2.0. In fact, Intel's ARK page for this chipset bears this out, stating that the interface is PCIe v2.0 (2.5 GT/s).
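Quick back-of-the-envelope, just to sanity-check (rough numbers):
2.5 GT/s x 1 lane x 8/10 (8b/10b encoding) = 2 Gbps raw, each direction
minus PCIe packet/descriptor overhead = maybe 1.5-1.7 Gbps usable
With both LAGG ports squeezed through that one lane, topping out around 1.3Gbps seems about right.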
So I guess we know why now. The good news is that moving my main LAN interface off onto the 10Gbps card has alleviated any practical recurrence of this issue. Can I reproduce it now? If I try really hard. Will any real-world conditions in my (home) network actually cause this? Highly unlikely.
Thanks for the help.
-
Hmm, some low level incompatibility perhaps? BIOS setting?
That does start to look likely though.
-
No, I think it's just a limitation of the NIC chipset. Here's the Intel ARK page; clearly states 2.5GT/s.
-
Ah, yes. Interesting.
-
TL;DR is read the fine print. I knew that PCIe 2.0 upped the transfer rate to 5GT/s, but I was not aware that a card or chipset could meet the spec and still run at the older 2.5GT/s rate. I also learned that I can tell what PCIe revision and number of lanes my cards expect, and what they're actually given, using pciconf.
Here's output from asking my TrueNAS box about one of the HBAs.
What it says is that I'm running PCIe 2 at 5GT/s, but the card expects 8 lanes and is only getting 4.
cap 10[a0] = PCI-Express 2 endpoint max data 128(512) FLR NS max read 512 link x4(x8) speed 5.0(5.0) ASPM disabled(L0s/L1)
Here's output from the offending NIC in pfSense:
cap 10[a0] = PCI-Express 2 endpoint max data 256(512) FLR RO NS max read 512 link x1(x4) speed 2.5(2.5) ASPM L1(L0s/L1)
Same format as above: negotiated value first, device capability in parentheses. So the NIC is linked at x1, and 2.5GT/s is all the silicon offers. The moar you know....
-
Yeah, I think that is what you're hitting there, but I will say the values reported by pciconf do not always reflect reality. For example, the 10G NICs in C3K are shown as being connected via x1 at 2.5GT/s but are actually capable of far more than the 2Gbps that would imply. I suspect that's because they are in the SoC and not a physical card in a slot.
Steve
-
@stephenw10 Thanks. There are also some Intel NIC chipsets whose ARK pages list the interface as proprietary rather than PCIe or some other standard bus. I assume that means they are part of the SoC or the motherboard chipset. Curious how pciconf handles those. I don't have FreeBSD-specific info, but the 82579LM is an example.