Understanding BufferBloat and LAGG
-
I have Comcast gigabit service and I'm trying to take advantage of the 1.2 Gb/s over-provisioning by using LAGG.
Modem: Motorola MB8600
pfSense Appliance: Tometek MAX-TTS
Switch: D-Link DGS-1510-20
As you can see from the pictures below, if I have the switch do the LAGG and bring the WAN over to pfSense via VLAN, I consistently get a bufferbloat grade of "C".
If, on the other hand, I have pfSense do the LAGG directly, I get identical speeds but a bufferbloat grade of "D", very often "F".
Does it mean that pfSense is more efficient than the switch in doing the LAGG? Or vice-versa? Which configuration is better overall?
Switch doing WAN LAGG:
pfSense doing WAN LAGG:
DSLReports speed test:
-
I'm surprised that makes much difference. Do you have the actual numbers from those tests?
Where are you testing from internally, how is that connected?
Steve
-
@mircolino said in Understanding BufferBloat and LAGG:
pfSense Appliance: Tometek MAX-TTS
hi,
Sorry, this is not closely related to your question, but allow me to ask: do they sell these Tometek boxes as pfSense appliances?
BTW:
this LAGG issue is a really interesting topic, I will pay attention to the thread
-
@DaddyGo, yes I bought mine directly from Tometek on Alibaba.
The model I got has an Intel 7th gen dual core Celeron 3865U, TDP 15W, 8GB RAM, 64GB SSD, 4 SFP+ and 2 SFP (all Intel).
I negotiated a price of $380 plus CC fees and shipping for a total of $420. Received it (in California) the week after.
I put it in service 10 days ago and so far I've had zero problems (restarted it yesterday to setup LAGG).
The only initial issue was pfSense complaining about having to generate a new UUID because it was unable to read it from the BIOS DMI.
Tometek support gave me the AMI DmiEdit utility and after rewriting the DMI now everything is OK.
A total overkill, I know.
-
@stephenw10, I'll re-run the tests tonight when nobody is using the Internet (right now my wife is on zoom with 16 other coworkers).
But overall the numbers, whether it's the switch doing the WAN LAGG or the pfSense appliance, are similar. Always in the 1.2 Gb/s range ± 20 Mb/s. It's only the bufferbloat that's higher when pfSense is handling the aggregation.
Is it because pfsense LAGG is too fast and the rest of the firewall can't keep up?
I'm running all the tests from Chrome on Windows Server 2019.
Windows Server (Intel X520-DA1), connected by SFP+ Twinax DAC to switch port 20 (set up as a VLAN trunk)
-
I sure hope you installed pfSense yourself on that.....
-
@stephenw10 said in Understanding BufferBloat and LAGG:
I sure hope you installed pfSense yourself on that.....
I did. Why?
UPDATE: it actually came with Ubuntu preinstalled.
-
Why is that? Isn't pfSense open source and free to use?
-
Several reasons. But for me the biggest is this: if you buy a firewall direct from China you have no idea what's actually installed on it. Even if it came with pfSense installed (which it shouldn't, because that's commercial redistribution) you should format it and reinstall.
Steve
-
Check out this video https://www.youtube.com/watch?v=iXqExAALzR8
I went from an F to an A+ on bufferbloat.
-
@winger46146, thank you for the link. Yes, I was going to set up limiters next.
I'd obviously rather have pfSense handle the WAN directly, instead of going through the switch first.
I just don't understand why, with pfSense doing the WAN LAGG, the overall performance degrades slightly. I'd expect the opposite.
-
I did not see any difference in bufferbloat on that test going from non-LAGG to LAGG on my MB8600 to pfSense (on my XTM5 box). Did you see a difference?
I'm on an M400 box now so I could try that test with it, but it's kind of one of those buzzwords that DSLR seems to have brought into the picture and made everyone worry about.
Do you get your full speed from your ISP? When you max out your connection while on Zoom, VoIP, etc., does your jitter increase to the point where the call suffers?
I can't say I see any issue from the "D" bufferbloat grade that DSLR reports for me. I'm not sure the effort is worth the payback. But that said, I am curious. :)
-
I really wanted to "remove" WAN traffic from the switch and let pfSense handle it directly.
But after reading here that LAGG interfaces don't support limiters, while VLANs do, I basically had no choice but let the switch handle the Motorola MB8600 LAGG.
Not my preferred choice, but after adding limiters to the WAN interface, following the link posted by @winger46146, this is the outcome:
I'll take a 50 Mb/s speed penalty for straight As.
-
LAGG interfaces can use Limiters no problem. They can't use ALTQ based traffic shaping.
It would be interesting to test without the switch in play at all if you can. So modem - pfSense - test client directly.
And, yeah, fixing buffer bloat can make a big difference to some things if you have it bad, like 'F'!
Steve
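As a rough illustration of why bad bufferbloat hurts: the extra latency a full buffer adds is simply its size divided by the rate at which the link drains it. The sketch below is only that arithmetic; the 1 MB buffer and the 50 Mb/s and 1200 Mb/s rates are illustrative assumptions, not measurements from this setup.

```python
def queue_delay_ms(buffer_bytes, link_mbps):
    """Time to drain a full buffer at the given link rate, in milliseconds."""
    bits_queued = buffer_bytes * 8
    bits_per_second = link_mbps * 1_000_000
    return bits_queued / bits_per_second * 1000


# Hypothetical 1 MB buffer at two illustrative link rates.
ASSUMED_BUFFER_BYTES = 1_000_000
for mbps in (50, 1200):
    delay = queue_delay_ms(ASSUMED_BUFFER_BYTES, mbps)
    print(f"{mbps} Mb/s: {delay:.1f} ms of added latency when the buffer is full")
```

The same buffer costs far more latency on the slow direction than on the fast one, which is why the upload side is often where the grade falls apart.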
-
@stephenw10 said in Understanding BufferBloat and LAGG:
LAGG interfaces can use Limiters no problem. They can't use ALTQ based traffic shaping.
Didn't know that.
It would be interesting to test without the switch in play at all if you can. So modem - pfSense - test client directly.
That's pretty easy to try. Tonight when again nobody's using the Internet I'll run another set of tests and post the results.
In the meantime, this is the slightly redacted "netstat -i" output with the switch doing the WAN LAGG:
ix0: LAN
ix0.2: WAN
ix0.3: DMZ
ix0.4: IOT
ix0.5: GUEST
Name     Mtu Network       Address              Ipkts Ierrs Idrop    Opkts Oerrs Coll
ix0     1500 <Link#1>      00:f0:xx:xx:xx:44 34496952     0     0 34515981     0    0
ix0        - 172.xx.8.0/24 edge                  5747     -     -     4307     -    -
ix0        - fe80::%ix0/64 fe80::1:1%ix0          634     -     -     5006     -    -
ix0        - 2601:646:8302 edge                 13558     -     -    15001     -    -
ix1*    1500 <Link#2>      00:f0:xx:xx:xx:45        0     0     0        0     0    0
igb0*   1500 <Link#3>      00:f0:xx:xx:xx:b5        0     0     0        0     0    0
igb1*   1500 <Link#4>      00:f0:xx:xx:xx:b6        0     0     0        0     0    0
ix2*    1500 <Link#5>      00:f0:xx:xx:xx:46        0     0     0        0     0    0
ix3*    1500 <Link#6>      00:f0:xx:xx:xx:47        0     0     0        0     0    0
lo0    16384 <Link#7>      lo0                     80     0     0       80     0    0
lo0        - localhost     localhost                0     -     -        0     -    -
lo0        - fe80::%lo0/64 fe80::1%lo0              0     -     -        0     -    -
lo0        - your-net      localhost               80     -     -       80     -    -
enc0*   1536 <Link#8>      enc0                     0     0     0        0     0    0
pfsyn   1500 <Link#9>      pfsync0                  0     0     0        0     0    0
pflog  33160 <Link#10>     pflog0                   0     0     0     5441     0    0
ix0.3   1500 <Link#11>     00:f0:xx:xx:xx:44    69106     0     0    42185     0    0
ix0.3      - 172.xx.9.0/24 edge-dmz                 8     -     -        8     -    -
ix0.3      - fe80::%ix0.3/ fe80::1:1%ix0.3        270     -     -     4147     -    -
ix0.3      - 2601:646:8302 edge-dmz               292     -     -      151     -    -
ix0.4   1500 <Link#12>     00:f0:xx:xx:xx:44  4123549     0     0  2684756     0    0
ix0.4      - 172.xx.10.0/2 edge-iot              2110     -     -     1798     -    -
ix0.4      - fe80::%ix0.4/ fe80::1:1%ix0.4        927     -     -     5309     -    -
ix0.4      - 2601:646:8302 edge-iot              1217     -     -      622     -    -
ix0.5   1500 <Link#13>     00:f0:xx:xx:xx:44     1861     0     0     3738     0    0
ix0.5      - 172.xx.11.0/2 edge-guest               0     -     -        0     -    -
ix0.5      - fe80::%ix0.5/ fe80::1:1%ix0.5          0     -     -     3732     -    -
ix0.5      - 2601:646:8302 edge-guest               0     -     -        0     -    -
ix0.2   1500 <Link#14>     00:f0:xx:xx:xx:44 25567476     0     0  9123326     0    0
ix0.2      - fe80::%ix0.2/ fe80::xxx:xxxx:fe    46506     -     -    46538     -    -
ix0.2      - 73.xxx.xx.0/2 c-73-xxx-xx-189.h   106079     -     -    46471     -    -
ix0.2      - 2001:558:6045 2001:558:6045:xx:    58138     -     -       12     -    -
-
OK. Reconfigured the WAN with pfSense doing the LAGG and connected the Windows Server directly to the appliance (nothing else connected).
Speed test without limiters:
Speed test with limiters (CoDel, 1200 Mb/s down, 50 Mb/s up, queue length left empty, both IPv4 and IPv6 floating rules):
Pretty impressive I have to say
I can probably gain a bit more by playing with up/down speeds and queue length, but for now I'll leave it alone.
The following is the "netstat -i" output:
Name     Mtu Network       Address              Ipkts Ierrs Idrop    Opkts Oerrs Coll
ix0     1500 <Link#1>      00:f0:xx:xx:xx:44  4033957     0     0  6836137     0    0
ix0        - 172.xx.8.0/24 edge                   371     -     -      563     -    -
ix0        - fe80::%ix0/64 fe80::1:1%ix0           31     -     -      134     -    -
ix0        - 2601:646:8302 edge                   254     -     -      300     -    -
ix1*    1500 <Link#2>      00:f0:xx:xx:xx:45        0     0     0        0     0    0
igb0    1500 <Link#3>      00:f0:xx:xx:xx:b5  2440366     0     0  1306587     0    0
igb1    1500 <Link#4>      00:f0:xx:xx:xx:b5  4545738     0     0  2941815     0    0
ix2*    1500 <Link#5>      00:f0:xx:xx:xx:46        0     0     0        0     0    0
ix3*    1500 <Link#6>      00:f0:xx:xx:xx:47        0     0     0        0     0    0
lo0    16384 <Link#7>      lo0                     77     0     0       77     0    0
lo0        - localhost     localhost                0     -     -        0     -    -
lo0        - fe80::%lo0/64 fe80::1%lo0              0     -     -        0     -    -
lo0        - your-net      localhost               77     -     -       77     -    -
enc0*   1536 <Link#8>      enc0                     0     0     0        0     0    0
pfsyn   1500 <Link#9>      pfsync0                  0     0     0        0     0    0
pflog  33160 <Link#10>     pflog0                   0     0     0     5607     0    0
lagg0   1500 <Link#11>     00:f0:xx:xx:xx:b5  6986138     0     0  4248402     5    0
lagg0      - fe80::%lagg0/ fe80::xxx:xxxx:fe    32878     -     -    32915     -    -
lagg0      - 73.xxx.xx.0/2 c-73-xxx-xx-178.h    91007     -     -        4     -    -
lagg0      - 2001:558:6045 2001:558:6045:xx:     2153     -     -        0     -    -
ix0.3   1500 <Link#12>     00:f0:xx:xx:xx:44    64950     0     0    25768     0    0
ix0.3      - 172.xx.9.0/24 edge-dmz                 0     -     -        0     -    -
ix0.3      - fe80::%ix0.3/ fe80::1:1%ix0.3          0     -     -      153     -    -
ix0.3      - 2601:646:8302 edge-dmz                 0     -     -        0     -    -
ix0.4   1500 <Link#13>     00:f0:xx:xx:xx:44  3090700     0     0  1940401     0    0
ix0.4      - 172.xx.10.0/2 edge-iot               384     -     -      323     -    -
ix0.4      - fe80::%ix0.4/ fe80::1:1%ix0.4         11     -     -      107     -    -
ix0.4      - 2601:646:8302 edge-iot                25     -     -       21     -    -
ix0.5   1500 <Link#14>     00:f0:xx:xx:xx:44     1342     0     0     2721     0    0
ix0.5      - 172.xx.11.0/2 edge-guest               0     -     -        0     -    -
ix0.5      - fe80::%ix0.5/ fe80::1:1%ix0.5          0     -     -       90     -    -
ix0.5      - 2601:646:8302 edge-guest               0     -     -        0     -    -
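For anyone wanting to read the LAGG behaviour out of the "netstat -i" output above, here is a minimal Python sketch that computes how the packets were split between the two member ports. The counter values are copied from the igb0 and igb1 link-level rows; the dictionary layout and the helper name member_share are just illustrative choices.

```python
# Link-level packet counters copied from the igb0/igb1 rows of the
# "netstat -i" output (pfSense doing the WAN LAGG).
COUNTERS = {
    "igb0": {"ipkts": 2440366, "opkts": 1306587},
    "igb1": {"ipkts": 4545738, "opkts": 2941815},
}


def member_share(counters, field):
    """Return each member port's fraction of the total for one counter."""
    total = sum(port[field] for port in counters.values())
    return {name: port[field] / total for name, port in counters.items()}


for field in ("ipkts", "opkts"):
    shares = member_share(COUNTERS, field)
    summary = ", ".join(f"{name} {share:.1%}" for name, share in shares.items())
    print(f"{field}: {summary}")
```

An uneven split like this is not by itself a sign of trouble: LACP-style aggregation normally assigns each flow to one member link by hashing, so a handful of large speed-test flows will rarely balance evenly across the two ports.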
I cannot prove it, but I still have the feeling that the switch is ever so slightly better at doing the LAGG.
However, the convenience of having WAN traffic out of the way easily outweighs that.
-
I would not be at all surprised that a switch is better at a Layer 2 protocol like LACP than FreeBSD.
-
With the CoDel limiters now in place, I noticed a new warning in the log, every time the system boots up:
config_aqm Unable to configure flowset, flowset busy!
I read somewhere else on this forum that this message can be ignored.
Is that true? Any way to prevent it?
-
If it only appears at boot then, yes, it probably can be ignored.
It looks like it's also associated with setting the Queue Management Algorithm to CoDel, which is not usually necessary. Leaving it as Taildrop with FQ-CoDel as the Scheduler should get the same results.
Steve