LACP not balancing
-
I have set up a LAG between my router and switch, but it doesn't seem to be balancing correctly: heavy traffic always ends up sharing the same NIC.
What I've tried so far for testing is running two iperf3 servers on pfSense and connecting to them from the same client, then from two different clients.
I also tried the reverse, connecting to two different iperf3 servers on the LAN from pfSense.
It always ends up the same way, sending all traffic down a single NIC, no matter which direction I do the test in.
I turned on debugging but nothing jumps out at me:
Nov 26 00:28:45 Router kernel: actor=(0001,08-BD-43-75-2B-45,03E8,0080,0002)
Nov 26 00:28:45 Router kernel: igb4: lacpdu receive
Nov 26 00:28:45 Router kernel: maxdelay=0
Nov 26 00:28:45 Router kernel: partner.state=3d<ACTIVITY,AGGREGATION,SYNC,COLLECTING,DISTRIBUTING>
Nov 26 00:28:45 Router kernel: partner=(8000,40-62-31-02-D2-B9,018B,8000,0005)
Nov 26 00:28:46 Router kernel: actor.state=3d<ACTIVITY,AGGREGATION,SYNC,COLLECTING,DISTRIBUTING>
Nov 26 00:28:46 Router kernel: actor=(0001,08-BD-43-75-2B-45,03E8,0080,0001)
Nov 26 00:28:46 Router kernel: igb5: lacpdu receive
Nov 26 00:28:46 Router kernel: maxdelay=0
Nov 26 00:28:46 Router kernel: partner.state=3d<ACTIVITY,AGGREGATION,SYNC,COLLECTING,DISTRIBUTING>
Nov 26 00:28:46 Router kernel: partner=(8000,40-62-31-02-D2-B9,018B,8000,0006)
I understand balancing tends not to work to a single client on the LAN due to how the hashing works, but why is it not working to different clients?
It's not like it sticks to the same NIC all the time either. It will sometimes switch, but the bulk of the traffic still goes down a single NIC, so the combined speed of both tests is equal to a single Gigabit port.
Is iperf3 unsuitable for this test, and should I just wait until I have two WANs fast enough to need both links and see what happens then? It does look like actual downloads are possibly balancing, but that could also be an illusion caused by a lag in the SNMP reporting I'm doing from the switch, with traffic bouncing between both ports rather than using them concurrently.
-
Which balancing algorithm is used?
Does it match on the switch and the router?
Maybe try the "src-dst-ip" algorithm?
c3750x-24-core(config)#port-channel load-balance ?
  dst-ip       Dst IP Addr
  dst-mac      Dst Mac Addr
  src-dst-ip   Src XOR Dst IP Addr
  src-dst-mac  Src XOR Dst Mac Addr
  src-ip       Src IP Addr
  src-mac      Src Mac Addr
-
Yes, it depends on which hash you are using. Though if you tested using different hosts for both servers and clients, you would expect at least some variation.
If you look at the lagg(4) man page there is a sysctl you can set to vary how the hash is generated:
The loadbalance and lacp modes will use the RSS hash from the network card if available to avoid computing one, this may give poor traffic distribution if the hash is invalid or uses less of the protocol header information. Local hash computation can be forced per interface by setting the -use_flowid ifconfig(8) flag. The default for new interfaces is set via the net.link.lagg.default_use_flowid sysctl(8).
Steve
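To illustrate why the choice of header layers matters, here is a minimal Python sketch of a locally computed hash. It is only a model: `pick_link`, the MAC and IP addresses, and the ports are all made up for illustration, and `zlib.crc32` is a stand-in for whatever hash lagg(4) really uses.

```python
import zlib

def pick_link(flow, layers, n_links=2):
    """Pick an outgoing link index by hashing only the selected header layers."""
    fields = {
        "l2": (flow["src_mac"], flow["dst_mac"]),
        "l3": (flow["src_ip"], flow["dst_ip"]),
        "l4": (flow["src_port"], flow["dst_port"]),
    }
    key = "|".join(str(v) for layer in layers for v in fields[layer])
    # crc32 is a stand-in here; the real lagg(4) hash function differs.
    return zlib.crc32(key.encode()) % n_links

# Two iperf3 streams between the same pair of hosts (hypothetical addresses):
# only the source port differs between them.
base = dict(src_mac="02:00:00:00:00:01", dst_mac="02:00:00:00:00:02",
            src_ip="192.168.1.1", dst_ip="192.168.1.50", dst_port=5201)
flow_a = dict(base, src_port=50001)
flow_b = dict(base, src_port=50002)

for layers in (("l2",), ("l2", "l3"), ("l3", "l4")):
    print(layers, pick_link(flow_a, layers), pick_link(flow_b, layers))
```

With only l2 or l2+l3 in the hash, both streams always produce the same key and so share a link. Once l4 is included the keys differ, so the streams can land on different links, although nothing guarantees that they will.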
-
Thanks for the suggestions. I think I have to focus on pfSense specifically here as according to Netgear "Smart Managed Switches offer fixed Layer 2 (MAC) destination parsing only for packets entering the LAG".
I can confirm that "ifconfig lagg0 -use_flowid 1 lagghash l3,l4" causes the traffic to randomly split across both interfaces, so that does seem to work. I then of course had to change my QoS setting for the LAN from 940Mbit to double that and behold, it works.
It's a shame it will still randomly assign both flows to the same interface, but I guess that's the limitation of using a lagg? Or is there something I can tweak to make that less likely to happen? Would roundrobin distribute evenly, or should I stick to LACP?
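As a rough picture of the destination-MAC-only behaviour in the Netgear quote above, the sketch below hashes nothing but the destination MAC. The hash (`zlib.crc32`), the function name `switch_pick_link` and the MAC addresses are assumptions for illustration; Netgear does not document the actual function.

```python
import zlib

def switch_pick_link(dst_mac, n_links=2):
    """Model of a switch that hashes only the destination MAC of a frame."""
    # crc32 is an assumed stand-in for the switch's undocumented hash.
    return zlib.crc32(dst_mac.lower().encode()) % n_links

router_mac = "02:00:00:00:00:01"  # hypothetical MAC of the lagg on the router

# Frames from any number of LAN clients addressed to the router all carry
# the same destination MAC, so the switch puts them all on the same link.
links = [switch_pick_link(router_mac) for _client in range(10)]
print(links)
```

Under this model, changing the hash on pfSense only affects traffic leaving pfSense. Anything the switch sends to the router's MAC stays on one link regardless of how many clients are sending.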
-
I would definitely stick to LACP since you have it available. It generally gives better results than the other lagg types.
I'm not aware of any way to force a more even distribution though. Given enough connections it will average across them.
Steve
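The averaging effect can be checked with a small simulation. This is a sketch under simplifying assumptions: the hash is modelled as a uniform random choice of link, every flow carries the same amount of traffic, and the function names are made up for illustration.

```python
import random

def same_link_rate(n_links=2, trials=10_000, seed=1):
    """Fraction of trials in which two independent flows pick the same link."""
    rng = random.Random(seed)
    hits = sum(rng.randrange(n_links) == rng.randrange(n_links)
               for _ in range(trials))
    return hits / trials

def busiest_share(n_flows, n_links=2, seed=1):
    """Share of all flows that end up on the most heavily used link."""
    rng = random.Random(seed)
    counts = [0] * n_links
    for _ in range(n_flows):
        counts[rng.randrange(n_links)] += 1
    return max(counts) / n_flows

print("two flows on the same link:", same_link_rate())
for n in (2, 10, 100, 1000):
    print(n, "flows, busiest link carries", busiest_share(n))
```

With two links and a uniform hash, two flows should share a link about half the time, which matches the behaviour seen in a two-stream iperf3 test. As the number of flows grows, the busiest link's share should drift towards one half.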
-
@stephenw10 Pretty much as expected then. Hopefully some new cheap appliances will turn up with 2.5Gbit NICs, although I'm not sure if FreeBSD supports any yet.
-
It supports several. Intel i225, Realtek and Aquantia at least. There are threads here detailing experience running each.
Steve
-
@stephenw10 Does it support the USB versions? Although my current box only has USB 2.0 ports so not immediately helpful.
I did buy a newer appliance with USB 3.0 but I wasn't sure an i5-8250U would be up to PPPoE at 1Gbit as it only clocks to 1.6GHz on pfSense, yet oddly will boost up to 3.4GHz in Linux (TDP is unlocked in BIOS). I also use OpenVPN (though I believe 300Mbit is about all you can expect from a single client anyway due to the huge resource cost at the other end) and was hoping to keep QoS enabled to ensure VoIP doesn't get swamped.
My current appliance is an i5 7200U that runs at 2.4GHz which AFAIK is what I will need to push Gigabit PPPoE.
-
I don't believe pfSense supports any 2.5G USB NICs yet, but FreeBSD does. I'd be reluctant to use one anyway though.
-
@stephenw10 I'm not a fan either, at least not for a router where you want the absolute lowest overhead/latency possible.
Though for a home network, it may be worth an experiment if/when it's supported.
-
Latency is not what I would worry about. It's the long history of USB NICs failing in interesting ways. And the fact it's possible to unplug it accidentally and the consequences of doing so.
-
@stephenw10 They certainly have a lot more scope for overheating, though personally I've only had one fail on me and it was a dirt cheap model off eBay.
I have an Aquantia model running off that i5-8250U appliance at the moment as I decided if I weren't going to replace my router with it, might as well replace the old router I was using as a switch with a Linux box with the ports bridged and ~3.6Gbit uplink over that adapter.