Low throughput over LAGG with 1Gb clients
-
There are no speed tests in those pcaps.
Looks like you left the capture at the default of 100 packets.
-
Here are better captures. Looks like a lot of DUPs.
https://drive.google.com/open?id=1IVUkRSVoXe4fKdFwdkZvfAxNXTJWknAE
-
Added two new pcap files to the drive share. Switched the LAN LAGG over to a 10Gb uplink and got the same results.
@stephenw10 I took down a member of the LAGG on the LAN side and performance was restored for 1Gb clients. Did the same thing for the WAN LAGG and performance was restored as well.
-
Hmm, well that's interesting.
So if either side is reduced to one link but still as a lagg it works as expected?
Try running
ifconfig -v lagg0
with both links in place and then with one.
Steve
-
Well, with the amount of dupes you're seeing in the sniff there is no way you're ever going to see full speed.
You have a sniff of 71k packets, where 12k of that is listed as dupes.. going to be hard to see full wire speed ;)
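For a sense of scale, the arithmetic on those rounded figures can be sketched as below. The 71k and 12k counts are the approximate numbers quoted above, not exact values read from the capture.

```python
# Rough figures quoted above: ~71k packets captured, ~12k flagged as dupes.
total_packets = 71_000
dup_packets = 12_000

# Share of the whole trace that the analyzer flagged as duplicates.
dup_ratio = dup_packets / total_packets
print(f"duplicate share: {dup_ratio:.1%}")
```

That works out to roughly 17% of the trace.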
-
@stephenw10 Correct, with one member down on either of the LAGG interfaces I see normal performance. But when I swapped the LAGG on the LAN side out for a 10Gb uplink, I saw the same problem.
WAN LAGG and 1Gb LAN uplink = normal performance
WAN LAGG and 10Gb LAN uplink = low performance
WAN and LAN LAGG = low performance
LAN and WAN LAGG with a WAN LAGG member down = normal performance
LAN and WAN LAGG with a LAN LAGG member down = normal performance
WAN 1Gb and 1Gb LAN = normal performance
WAN 1Gb and 10Gb LAN = normal performance
@johnpoz Do you think this is related to how pfSense, or even FreeBSD, is handling the traffic? I know a dev build of 2.5 will be released soon, so I will try that once it goes public. Long shot, I know.
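One way to read the results matrix above is to check it against a simple hypothesis: throughput only drops when both the WAN and LAN sides are faster than 1Gb. The sketch below does that check. It assumes each LAGG has two 1Gb members (2Gb aggregate, 1Gb with a member down), which is not stated outright in the thread. The `predicted` helper and the row labels are made up for illustration.

```python
# Each row: (label, WAN capacity in Gb/s, LAN capacity in Gb/s, observed result).
# Assumption: every LAGG has two 1Gb members, so a healthy LAGG is 2Gb/s
# aggregate and a LAGG with one member down is 1Gb/s.
RESULTS = [
    ("WAN LAGG, 1Gb LAN uplink", 2, 1, "normal"),
    ("WAN LAGG, 10Gb LAN uplink", 2, 10, "low"),
    ("WAN LAGG, LAN LAGG", 2, 2, "low"),
    ("both LAGG, WAN member down", 1, 2, "normal"),
    ("both LAGG, LAN member down", 2, 1, "normal"),
    ("WAN 1Gb, LAN 1Gb", 1, 1, "normal"),
    ("WAN 1Gb, LAN 10Gb", 1, 10, "normal"),
]

def predicted(wan_gb, lan_gb):
    """Hypothesis only: performance is low when both sides exceed 1Gb/s."""
    return "low" if wan_gb > 1 and lan_gb > 1 else "normal"

# Collect any test rows whose observed result disagrees with the hypothesis.
mismatches = [name for name, wan, lan, seen in RESULTS
              if predicted(wan, lan) != seen]
print("rows that contradict the hypothesis:", mismatches)
```

Under those assumptions no row contradicts the hypothesis, which points at a speed mismatch somewhere in the path rather than at LAGG itself. It is pattern matching on seven data points, not proof.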
-
Mmm, dupes on the 10G client test also though unless that was overlapping.
The fact that removing either lagg allows full speed points at some size negotiation failure to me.
Steve
-
@stephenw10 I did see some dupes on the 10Gb test, but was that in the WAN captures? If so, I could see some overlap because it was all WAN traffic. 10Gb performance seems fine, easily hitting 1100-1200Mbps on speed tests.
iperf from inside the network checks out fine between 1Gb and 10Gb clients.
-
Messed around with it a little more tonight and switched back to 1Gb on both the WAN and LAN. Turned off LACP on the MB8600. Back-to-basics approach, and... performance is good, 940Mbps easily. But here is an interesting find: I'm still seeing a lot of dupes in the captures even with both interfaces running at a gig.
Could the dupes be related to packet loss over the cable lines to the CMTS? I ask because I'm seeing some loss over the modem due to signal issues or something ("Spectrum is working on it"). I'm only seeing 1-2% on the upstream, so I'm not sure it could cause that many dupes. Downstream has been pretty clean, without packet loss.
-
There are dupes, and then there is 17% of the whole trace being dupes..
-
I'm going to work on tracking down the source of the errors. Pretty sure I have the Cisco switch ruled out so far.
-
I have some updated info.
I created a new VLAN on the Cisco switch and set up two EtherChannels, one for the pfSense WAN and one for the modem, and put them both on that VLAN: basically a "WAN" bridge between the two devices. I also added a gig port to the new VLAN so I can test speeds against the modem and/or the pfSense WAN side. My test computer picked up a public IP on the WAN VLAN, and I ran some iperf tests for ingress and egress to an inside client. That produced 970Mbps both ways with no dupes or other problems in the packet capture. So I feel this rules out any issues with my inside network and pfSense.
Running a speed test across the EtherChannel to the cable modem from a 1Gb connection, I was able to reproduce the exact same low-throughput issue. I even used new cables and tested them with my Fluke to be sure. So it seems like this is an issue with modem + LACP or firmware. If it is a modem firmware problem I'm SOL anyway. I'm a little burned out on it right now, so I'll come back in a few days and keep working at it. I have a Mikrotik I could try, or maybe a Windows computer with a LAGG directly connected to the modem.
-
The issue ended up being output drops on the Cisco switch due to the higher interface speed. A buffer increase and a QoS rule on the switch side fixed it.
Hope this helps someone else
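For anyone wondering why a faster upstream link can hurt a 1Gb client, here is a toy sketch of the idea: traffic arrives in bursts faster than the slower egress port can drain, and a shallow buffer tail-drops the excess. This is not the switch's actual queueing logic or my config. The `simulate` function and every number in it are made up purely for illustration.

```python
def simulate(burst_ticks, in_rate, out_rate, buffer_size):
    """Return packets dropped when a burst arrives faster than the port drains.

    Rates are packets per tick; all values are illustrative, not measured.
    """
    queued = 0
    dropped = 0
    for _ in range(burst_ticks):
        queued += in_rate                    # burst arrives from the faster side
        if queued > buffer_size:             # tail drop once the buffer is full
            dropped += queued - buffer_size
            queued = buffer_size
        queued = max(0, queued - out_rate)   # slower egress port drains
    return dropped

# Same 2:1 speed mismatch, shallow buffer versus deeper buffer.
small = simulate(burst_ticks=100, in_rate=2, out_rate=1, buffer_size=20)
large = simulate(burst_ticks=100, in_rate=2, out_rate=1, buffer_size=200)
print("drops with shallow buffer:", small)
print("drops with deeper buffer:", large)
```

In this toy model the shallow buffer drops packets for most of the burst while the deeper one absorbs it. Dropped TCP segments lead to retransmissions and duplicate ACKs, which would be consistent with the dupes in the captures.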
-
Amazing.
-
Wow. Fun.