Netgate Discussion Forum

    LACP not balancing

    General pfSense Questions
      Alex Atkin UK

      I have set up a LAG between my router and switch, but it doesn't seem to be balancing correctly: heavy traffic always ends up sharing the same NIC.

      What I've tried so far for testing is running two iperf3 servers on pfSense and connecting to them from the same client, then from two different clients.

      I also tried the reverse, connecting to two different iperf3 servers on the LAN from pfSense.

      It always ends up the same way, sending all traffic down a single NIC, no matter which direction I do the test in.

      I turned on debugging but nothing jumps out at me:

      Nov 26 00:28:45 Router kernel: actor=(0001,08-BD-43-75-2B-45,03E8,0080,0002)
      Nov 26 00:28:45 Router kernel: igb4: lacpdu receive
      Nov 26 00:28:45 Router kernel: maxdelay=0
      Nov 26 00:28:45 Router kernel: partner.state=3d<ACTIVITY,AGGREGATION,SYNC,COLLECTING,DISTRIBUTING>
      Nov 26 00:28:45 Router kernel: partner=(8000,40-62-31-02-D2-B9,018B,8000,0005)
      Nov 26 00:28:46 Router kernel: actor.state=3d<ACTIVITY,AGGREGATION,SYNC,COLLECTING,DISTRIBUTING>
      Nov 26 00:28:46 Router kernel: actor=(0001,08-BD-43-75-2B-45,03E8,0080,0001)
      Nov 26 00:28:46 Router kernel: igb5: lacpdu receive
      Nov 26 00:28:46 Router kernel: maxdelay=0
      Nov 26 00:28:46 Router kernel: partner.state=3d<ACTIVITY,AGGREGATION,SYNC,COLLECTING,DISTRIBUTING>
      Nov 26 00:28:46 Router kernel: partner=(8000,40-62-31-02-D2-B9,018B,8000,0006)

      I understand balancing tends not to work to a single client on the LAN due to how the hashing works, but why is it not working to different clients?

      It's not like it sticks to the same NIC all the time either. It will sometimes switch, but the bulk of the traffic still goes down a single NIC, so the combined speed of both tests is equal to a single Gigabit port.

      Is iperf3 unsuitable for this test, and should I just wait until I have two WANs fast enough to need both links and see what happens then? It does look like actual downloads may be balancing, but that could just be lag in the SNMP reporting I'm doing from the switch giving that illusion, as the traffic bounces between both ports rather than using them concurrently.

        pukoid

        What balancing algorithm is used?
        Does it match on the switch and the router?
        Maybe try the "src-dst-ip" algorithm?

        c3750x-24-core(config)#port-channel load-balance ?
          dst-ip       Dst IP Addr
          dst-mac      Dst Mac Addr
          src-dst-ip   Src XOR Dst IP Addr
          src-dst-mac  Src XOR Dst Mac Addr
          src-ip       Src IP Addr
          src-mac      Src Mac Addr
        
          stephenw10 Netgate Administrator

          Yes, it depends on which hash you are using. Though if you tested using different hosts for both servers and clients, you would expect at least some variation.

          If you look at the lagg(4) man page there is a sysctl you can set to vary how the hash is generated:

               The loadbalance and lacp modes will use the RSS hash from the network
               card if available to avoid computing one, this may give poor traffic
               distribution if the hash is invalid or uses less of the protocol header
               information. Local hash computation can be forced per interface by setting
               the -use_flowid ifconfig(8) flag. The default for new interfaces is set
               via the net.link.lagg.default_use_flowid
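
For anyone following along, here is a rough sketch of why the choice of hash inputs matters. It is not the FreeBSD code: the function name pick_port, the addresses, and the use of SHA-256 are all assumptions made for illustration. The layer names only mirror the l2, l3 and l4 options that ifconfig's lagghash accepts.

```python
import hashlib

def pick_port(flow, layers, n_ports=2):
    """Pick an egress port by hashing only the header fields named in `layers`."""
    fields = {
        "l2": (flow["src_mac"], flow["dst_mac"]),
        "l3": (flow["src_ip"], flow["dst_ip"]),
        "l4": (flow["src_port"], flow["dst_port"]),
    }
    key = "|".join(str(v) for layer in layers for v in fields[layer])
    # SHA-256 is only a stand-in here; the kernel uses its own, cheaper hash.
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % n_ports

# Sixteen streams between the same two hosts (hypothetical addresses),
# differing only in their TCP source port.
base = {"src_mac": "02:00:00:00:00:01", "dst_mac": "02:00:00:00:00:02",
        "src_ip": "192.0.2.1", "dst_ip": "192.0.2.10", "dst_port": 5201}
flows = [dict(base, src_port=p) for p in range(50000, 50016)]

l2_ports = {pick_port(f, ["l2"]) for f in flows}
l34_ports = {pick_port(f, ["l3", "l4"]) for f in flows}
print("ports used with l2 only:", sorted(l2_ports))
print("ports used with l3 + l4:", sorted(l34_ports))
```

With only the MAC pair as input, every stream between the same two hosts produces the same key, so they can never land on different ports. Adding the IP and port fields gives each stream its own key, which is what allows them to spread out.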
          

          Steve

            Alex Atkin UK @stephenw10

            @pukoid
            @stephenw10

            Thanks for the suggestions. I think I have to focus on pfSense specifically here, as according to Netgear, "Smart Managed Switches offer fixed Layer 2 (MAC) destination parsing only for packets entering the LAG".

            I can confirm that "ifconfig lagg0 -use_flowid 1 lagghash l3,l4" causes the traffic to randomly split across both interfaces, so that does seem to work. I then of course had to change my QoS setting for the LAN from 940Mbit to double that and behold, it works.

            It's a shame it will still randomly assign both flows to the same interface, but I guess that's the limitation of using a lagg? Or is there something I can tweak to make that less likely to happen? Would roundrobin distribute evenly, or should I stick to LACP?
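
As a back-of-the-envelope check on how often that happens, here is a small sketch. It assumes each flow is hashed independently and uniformly across the links, which is an idealisation and not a measured property of lagg. The helper name p_all_same_link is made up for this example.

```python
from fractions import Fraction

def p_all_same_link(n_flows, n_links):
    """Chance that n_flows uniformly hashed flows all end up on one link.

    The first flow can go anywhere; each later flow must match it,
    which happens with probability 1/n_links.
    """
    return Fraction(1, n_links) ** (n_flows - 1)

print(p_all_same_link(2, 2))  # 1/2
print(p_all_same_link(4, 2))  # 1/8
```

Under that assumption, a two-stream test over a two-port lagg shares a single port about half the time, so seeing it fairly often is expected and does not by itself mean the hash is broken.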

              stephenw10 Netgate Administrator

              I would definitely stick to LACP since you have it available. It generally gives better results than the other lagg types.
              I'm not aware of any way to force a more even distribution though. Given enough connections it will average across them.
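
To put rough numbers on that averaging, here is a small simulation sketch. It assumes equal-sized flows assigned to links uniformly at random, which is a simplification of real traffic. The function name busiest_link_share and its parameters are invented for this example.

```python
import random

def busiest_link_share(n_flows, n_links=2, trials=2000, seed=0):
    """Average fraction of equal-sized flows carried by the busiest link."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    total = 0.0
    for _ in range(trials):
        counts = [0] * n_links
        for _ in range(n_flows):
            counts[rng.randrange(n_links)] += 1
        total += max(counts) / n_flows
    return total / trials

for n in (2, 8, 64, 512):
    print(n, "flows -> busiest link carries", round(busiest_link_share(n), 3))
```

A perfectly even split on two links would give 0.5. With only two flows the busiest link carries about three quarters of the traffic on average, and the figure drifts toward 0.5 as the number of flows grows.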

              Steve

                Alex Atkin UK @stephenw10

                @stephenw10 Pretty much as expected then. Hopefully some new cheap appliances will turn up with 2.5Gbit NICs, although I'm not sure if FreeBSD supports any yet.

                  stephenw10 Netgate Administrator

                  It supports several. Intel i225, Realtek and Aquantia at least. There are threads here detailing experience running each.

                  Steve

                    Alex Atkin UK @stephenw10

                    @stephenw10 Does it support the USB versions? Although my current box only has USB 2.0 ports so not immediately helpful.

                    I did buy a newer appliance with USB 3.0, but I wasn't sure an i5-8250U would be up to PPPoE at 1Gbit, as it only clocks to 1.6GHz on pfSense yet oddly will boost up to 3.4GHz in Linux (TDP is unlocked in BIOS). I also use OpenVPN (though I believe 300Mbit is about all you can expect from a single client anyway, due to the huge resource cost at the other end) and was hoping to keep QoS enabled to ensure VoIP doesn't get swamped.

                    My current appliance is an i5-7200U that runs at 2.4GHz, which AFAIK is what I will need to push Gigabit PPPoE.

                      stephenw10 Netgate Administrator

                      I don't believe pfSense supports any 2.5G USB NICs yet, but FreeBSD does. I'd be reluctant to use one anyway, though.

                        Alex Atkin UK @stephenw10

                        @stephenw10 I'm not a fan either, at least not for a router where you want the absolute lowest overhead/latency possible.

                        Though for a home network, it may be worth an experiment if/when it's supported.

                          stephenw10 Netgate Administrator

                          Latency is not what I would worry about. It's the long history of USB NICs failing in interesting ways, plus the fact that it's possible to unplug one accidentally, and the consequences of doing so.

                            Alex Atkin UK @stephenw10

                            @stephenw10 They certainly have a lot more scope for overheating, though personally I've only had one fail on me and it was a dirt cheap model off eBay.

                            I have an Aquantia model running off that i5-8250U appliance at the moment. I decided that if I wasn't going to replace my router with it, I might as well replace the old router I was using as a switch with a Linux box with the ports bridged and a ~3.6Gbit uplink over that adapter.

                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.