Netgate Discussion Forum

    Low throughput over LAGG with 1Gb clients

    Scheduled Pinned Locked Moved Hardware
    20 Posts 4 Posters 3.3k Views
    • C
      cursixx
      last edited by

      I'm stuck on a throughput problem. Traffic goes from my switch over a LAGG to pfSense, then over a second LAGG to my cable modem. 1Gb clients get low throughput, but 10Gb clients are fine.

      Hardware Specs

      pfSense:
      Supermicro X10SLQ, i5-4590S, 8GB DDR3, mSATA SSD, 2x Intel I350-T4 Cards

      Switch:
      Cisco 3850-48P and SFP+ module

      Cable Modem:
      MB8600 with provisioning @ 1200Mbps/43Mbps "Spectrum 1gig plan"

      I have a LAGG for the WAN to the MB8600 LACP
      I have a LAGG for the LAN to the C3850 etherchannel

      C3850 etherchannel config

      HASH#
      port-channel load-balance src-dst-mixed-ip-port

      interface Port-channel1
      switchport mode trunk
      spanning-tree portfast disable

      interface GigabitEthernet1/0/47
      switchport mode trunk
      channel-group 1 mode active
      spanning-tree portfast disable

      interface GigabitEthernet1/0/48
      switchport mode trunk
      channel-group 1 mode active
      spanning-tree portfast disable

      pfSense is using LACP for the LAGG interfaces

      So this is where it gets interesting.
      When I run a speed test from a 1Gb client connected to the C3850, it gets 400-500Mbps.
      When I run a speed test from a 10Gb client connected to the C3850, it gets 1100-1200Mbps.

      Clearly I don't have a bottleneck with the switch, pfSense or the cable modem, but my 1Gb clients can't even reach full line speed. If I remove the LAGG between the switch and pfSense and use just a 1Gb uplink, I get 940Mbps for both 1Gb and 10Gb clients. Same goes for the modem: if I remove the LAGG between pfSense and the MB8600 but leave the LAGG from pfSense to the C3850, I get 940Mbps. The problem only happens when both LAGG interfaces are active.

      So it gets even more interesting.
      I added a 10Gb broadcom card to the pfSense box and ran a DAC back to the C3850 and get the same results as the two LAGG interfaces did. I tried fiber modules with the same results too.

      I have tried every load-balance hash on the C3850 (src/dst MAC, IP, and mixed IP + port) and get the same results with all of them.
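
For readers wondering what the load-balance hash actually controls, here is a rough sketch. It is not the C3850's real algorithm: the CRC32 hash, the key layout, and the `pick_member` name are all assumptions made for illustration. What it shows is that every packet of one flow lands on the same member link, so changing the hash mode only changes how flows are spread, not how fast a single flow can go.

```python
import zlib

def pick_member(src_ip, dst_ip, src_port, dst_port,
                n_links=2, mode="src-dst-mixed-ip-port"):
    """Map a flow to a LAG member index (0..n_links-1).

    Illustrative only: real switches use their own hash function.
    CRC32 is just a convenient deterministic stand-in.
    """
    if mode == "src-dst-mixed-ip-port":
        key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}"
    elif mode == "src-dst-ip":
        key = f"{src_ip}|{dst_ip}"  # ports are ignored in this mode
    else:
        raise ValueError(f"unsupported mode: {mode}")
    return zlib.crc32(key.encode("ascii")) % n_links

if __name__ == "__main__":
    # Hypothetical flows from one client to one server (documentation addresses).
    for sport in range(50000, 50008):
        link = pick_member("192.0.2.10", "198.51.100.5", sport, 443)
        print(f"source port {sport} -> member link {link}")
```

A multi-stream speed test opens several flows with different source ports, which is why it can use both members while any one stream stays on a single link.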

      I have tried Broadcom quad port cards in place of the I350-T4 with the same results.

      I just did a completely clean install of pfSense and still have the same problem. I have tried the latest stable and the latest dev build with the same results. I have tried tuning the Intel and Broadcom cards with no success.

      Any ideas? Have you run into this problem before, or something similar with LAGG interfaces on pfSense? I can provide more details, and I'm sure I left something out. Thanks

      • stephenw10S
        stephenw10 Netgate Administrator
        last edited by

        Hmm, well that's curious!

        Seems like it must be something like a packet size issue from the connected client. If you connect the switch to pfSense with a single 1Gb link, either it gets resized somewhere before it hits pfSense or the TCP packet size is correctly negotiated for the complete route.

        What happens if you leave the links as LAGG but pull one of the connections so it only has a single 1Gb link?

        If it is some catastrophic fragmentation issue, a pcap of the traffic will show it.

        How are you actually testing the speed?

        Steve

        • johnpozJ
          johnpoz LAYER 8 Global Moderator
          last edited by

          Also curious how exactly you're testing the speed.. I have seen issues before where the client just isn't using a large enough window size to saturate the link at line speed.
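
On the window size point: a single TCP stream can have at most one window of data in flight per round trip, so its throughput is capped at roughly window / RTT. A minimal sketch of that arithmetic follows. The 20 ms RTT is an assumed figure, not something measured in this thread, and the function names are made up for the example.

```python
def max_tcp_throughput_mbps(window_bytes, rtt_ms):
    """Upper bound for one TCP stream: one full window per round trip."""
    return window_bytes * 8 / (rtt_ms / 1000) / 1e6

def required_window_bytes(target_mbps, rtt_ms):
    """Window needed to sustain target_mbps at the given RTT (bandwidth-delay product)."""
    return target_mbps * 1e6 * (rtt_ms / 1000) / 8

if __name__ == "__main__":
    rtt_ms = 20  # assumed round-trip time to the speed test server
    # 65535 bytes is the largest window without TCP window scaling.
    print(f"{max_tcp_throughput_mbps(65535, rtt_ms):.1f} Mbps with a 64 KB window")
    print(f"{required_window_bytes(940, rtt_ms):,.0f} bytes needed for 940 Mbps")
```

With those assumed numbers an unscaled window tops out around 26 Mbps, and sustaining 940 Mbps needs a window of about 2.35 MB, which is why window scaling or multiple parallel streams matter for gigabit speed tests.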

          I take it since you're seeing more than 1gig that you're doing more than 1 session, so you're getting sessions over both paths in the lagg for that test.

          An intelligent man is sometimes forced to be drunk to spend time with his fools
          If you get confused: Listen to the Music Play
          Please don't Chat/PM me for help, unless mod related
          SG-4860 24.11 | Lab VMs 2.8, 24.11

          • C
            cursixx
            last edited by

            I'm using online speed tests: dslreports, Google Fiber speed test, speedtest.net and the speedtest.net Windows application, plus real-world downloads from usenet and other high-speed sources. With a single 1Gb client I'm limited to about 500Mbps max and 200Mbps on the low side. In past testing, if I used multiple 1Gb clients I could push the speed past 940Mbps, but it took 3-4 devices to do it, and even then it was more of a peak speed that dropped off quickly.

            I have also tested with iperf3 in the past and was able to reach 940Mbps with a 1Gb link, but I don't have a good way to test with the traffic passing across the modem. In that test I had the two LAGGs in place but used different VLANs on the Cisco ("WAN" and "LAN"), attached a 1Gb client on the WAN VLAN side, and ran the test to inside devices at 1Gb and 10Gb.

            I have attached a 1Gb device to another interface on the pfsense box and speeds are fine.

            @stephenw10

            "What happens if you leave the links as LAGG but pull one of the connections so it only has a single 1Gb link?"

            Good idea, and I have not tried that. When I get home I'll test that on the WAN and LAN sides of the LAGG.

            @johnpoz

            "I take it since you're seeing more than 1gig that you're doing more than 1 session, so you're getting sessions over both paths in the lagg for that test."

            Best I can tell, yes, it seems to be load balancing sessions, and the interfaces are clean on the C3850 and pfSense, but I'm not sure what percentage of traffic goes over each link. In past testing I even tried to set up multi-WAN with the MB8600, because Spectrum allows two IP addresses. That took the LAGG between pfSense and the modem out of the picture, and I still had the same issue. The bandwidth logs on pfSense showed the links balanced 50/50.

            @stephenw10

            "If it is some catastrophic fragmentation issue, a pcap of the traffic will show it."

            Would it be best to packet capture at the device or do a port span of the etherchannel? Or can I do this in pfSense?

            • C
              cursixx
              last edited by

              Here are two pcaps from pfSense

              1Gb client, speed test was 300Mbps
              1_1550765155317_packetcapture-1Gb-client.pcap

              10Gb client speed test was 1165Mbps
              0_1550765155316_packetcapture-10Gb-client.pcap

              • johnpozJ
                johnpoz LAYER 8 Global Moderator
                last edited by johnpoz

                There are no speed tests in those pcaps..

                Looks like you left it at the default 100 packets

                • C
                  cursixx
                  last edited by

                  Here are better captures. Looks like a lot of dupes.

                  https://drive.google.com/open?id=1IVUkRSVoXe4fKdFwdkZvfAxNXTJWknAE

                  • C
                    cursixx
                    last edited by

                    Added two new pcap files to the drive share. Switched the LAN LAGG over to a 10Gb uplink and got the same results.

                    @stephenw10 I took down a member of the LAGG on the LAN side and performance was restored for 1Gb clients. Did the same thing for the WAN LAGG and performance was restored as well.

                    • stephenw10S
                      stephenw10 Netgate Administrator
                      last edited by

                      Hmm, well that's interesting.

                      So if either side is reduced to one link but still as a lagg it works as expected?

                      Try running ifconfig -v lagg0 with both links in place and then with one.

                      Steve

                      • johnpozJ
                        johnpoz LAYER 8 Global Moderator
                        last edited by johnpoz

                        Well, with the amount of dupes you're seeing in the sniff there is no way you're ever going to see full speed.

                        You have a sniff of 71k packets, where 12k of that is listed as dupes.. going to be hard to see full wire speed ;)
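
To put a number on that: 12k out of 71k is roughly 17% of the capture. The sketch below does that arithmetic and includes a crude duplicate counter. The counter is only a stand-in for what a packet analyzer flags, since real TCP analysis uses more involved heuristics, and the tuple layout and the `duplicate_fraction` name are assumptions for illustration.

```python
def duplicate_fraction(packets):
    """Fraction of packets whose identifying key was already seen.

    `packets` is any iterable of hashable keys, for example
    (src, dst, sport, dport, seq, ack, length) tuples pulled from a capture.
    """
    seen = set()
    total = 0
    dupes = 0
    for key in packets:
        total += 1
        if key in seen:
            dupes += 1
        else:
            seen.add(key)
    return dupes / total if total else 0.0

if __name__ == "__main__":
    # Rounded figures quoted in the thread: about 12k dupes in about 71k packets.
    print(f"{12_000 / 71_000:.1%} of the trace flagged as dupes")  # about 16.9%
```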

                        • C
                          cursixx @stephenw10
                          last edited by

                          @stephenw10 Correct. With one member down on any of the LAGG interfaces I see normal performance, but when I swapped the LAN-side LAGG for a 10Gb uplink I saw the same problem.

                          WAN LAGG and 1Gb LAN uplink = normal performance
                          WAN LAGG and 10Gb LAN uplink = low performance
                          WAN and LAN LAGG = low performance
                          LAN and WAN LAGG with a WAN LAGG member down = normal performance
                          LAN and WAN LAGG with a LAN LAGG member down = normal performance
                          WAN 1Gb and 1Gb LAN = normal performance
                          WAN 1Gb and 10Gb LAN = normal performance

                          @johnpoz Do you think this is something related to how pfSense is handling the traffic or even FreeBSD? I know a dev build of 2.5 will be released soon so I will try that once it goes public. Long shot I know

                          • stephenw10S
                            stephenw10 Netgate Administrator
                            last edited by

                            Mmm, dupes on the 10G client test also though unless that was overlapping.

                            The fact that removing either lagg allows full speed points at some size negotiation failure to me.

                            Steve

                            • C
                              cursixx @stephenw10
                              last edited by

                              @stephenw10 I did see some dupes on the 10Gb test, but was that in the WAN captures? If so, I could see some overlap because it was all WAN traffic. 10Gb performance seems fine, easily hitting 1100-1200Mbps on speed tests.

                              iperf from inside the network checks out fine between 1Gb and 10Gb clients.

                              • C
                                cursixx
                                last edited by

                                Messed around with it a little more tonight and switched back to 1Gb on the WAN and LAN. Turned off LACP on the MB8600. A back-to-basics approach. And..... performance is good, 940Mbps easily, but here is an interesting find: I'm still seeing a lot of dupes in the captures even with both interfaces running at a gig.

                                Could the dupes be related to packet loss over the cable lines to the CMTS? The reason I ask is that I'm seeing some loss over the modem due to signal issues or something ("Spectrum is working on it"). I'm only seeing 1-2% on the upstream, so I'm not sure it could cause that many dupes. Downstream has been pretty clean, without packet loss.

                                • johnpozJ
                                  johnpoz LAYER 8 Global Moderator
                                  last edited by

                                  There are dupes, and then there is 17% of the whole trace being dupes..

                                  • C
                                    cursixx
                                    last edited by

                                    I'm going to work on tracking down the source of the errors. Pretty sure I have the Cisco switch ruled out so far.

                                    • C
                                      cursixx
                                      last edited by

                                      I have some updated info.

                                      I created a new VLAN on the Cisco switch and set up two etherchannels, one for the pfSense WAN and one for the modem, and put them both on that VLAN. Basically a "WAN" bridge between the two devices. I also added a gig port to the new VLAN. This way I can test speeds between the modem and/or the pfSense WAN side. My test computer picked up a public IP on the WAN VLAN, and I ran some iperf tests for ingress and egress to an inside client. That produced 970Mbps both ways with no dupes or other problems in the packet capture. So I feel this rules out any issues with my inside network and pfSense.

                                      Running a speed test across the etherchannel to the cable modem from a 1Gb connection, I was able to reproduce the exact same low-throughput issue. I even used new cables and tested them with my Fluke to be sure. So it seems like this is an issue with modem + LACP or firmware. If it is a modem firmware problem I'm SOL anyway. I'm a little burned out on it right now, so I'll come back in a few days and keep working at it. I have a Mikrotik I could try, or maybe a Windows computer with a LAGG directly connected to the modem.

                                      • C
                                        cursixx
                                        last edited by

                                        The issue ended up being output drops on the Cisco switch due to the higher interface speed. Increasing the buffers and adding a QoS rule on the switch side fixed it.

                                        https://www.cisco.com/c/en/us/support/docs/switches/catalyst-3850-series-switches/200594-Catalyst-3850-Troubleshooting-Output-dr.html

                                        and

                                        https://community.cisco.com/t5/switching/catalyst-3850-high-total-output-drops-and-output-errors/td-p/2896553

                                        Hope this helps someone else
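
For anyone trying to picture why a faster interface upstream of a 1Gb port leads to output drops, here is a back-of-the-envelope fluid model. It is only a sketch: the burst size and buffer size are made-up numbers, not values from this thread or from Cisco documentation, and `dropped_bytes` is a hypothetical helper. While a burst arrives faster than the egress port can drain it, the excess has to sit in the egress buffer, and anything beyond the buffer is dropped.

```python
def dropped_bytes(burst_bytes, in_gbps, out_gbps, buffer_bytes):
    """Bytes lost when one burst arrives at in_gbps and drains at out_gbps.

    Fluid approximation: the queue grows at (in - out) for the duration of
    the burst, so its peak is burst * (1 - out/in). Whatever exceeds the
    egress buffer is counted as dropped.
    """
    if in_gbps <= out_gbps:
        return 0.0  # egress keeps up, nothing queues
    peak_queue = burst_bytes * (1 - out_gbps / in_gbps)
    return max(0.0, peak_queue - buffer_bytes)

if __name__ == "__main__":
    burst = 1_000_000   # assumed 1 MB burst
    buf = 200_000       # assumed 200 KB of egress buffer for the port
    cases = [
        ("10Gb uplink -> 1Gb client", 10, 1),
        ("2x1Gb LAGG  -> 1Gb client", 2, 1),
        ("1Gb uplink  -> 1Gb client", 1, 1),
        ("2x1Gb LAGG  -> 10Gb client", 2, 10),
    ]
    for label, in_gbps, out_gbps in cases:
        lost = dropped_bytes(burst, in_gbps, out_gbps, buf)
        print(f"{label}: {lost:,.0f} bytes dropped")
```

Under those assumptions only the cases where the ingress side is faster than the client port lose data, which lines up with the test matrix earlier in the thread, and a larger buffer shrinks or removes the loss.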

                                        • DerelictD
                                          Derelict LAYER 8 Netgate
                                          last edited by

                                          Amazing.

                                          Chattanooga, Tennessee, USA
                                          A comprehensive network diagram is worth 10,000 words and 15 conference calls.
                                          DO NOT set a source address/port in a port forward or firewall rule unless you KNOW you need it!
                                          Do Not Chat For Help! NO_WAN_EGRESS(TM)

                                          • stephenw10S
                                            stephenw10 Netgate Administrator
                                            last edited by

                                            Wow. Fun.

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.