Understanding BufferBloat and LAGG



  • I have Comcast gigabit service and I'm trying to take advantage of the 1.2 Gb/s over-provisioning by using LAGG.

    Modem: Motorola MB8600
    pfSense Appliance: Tometek MAX-TTS
    Switch: D-Link DGS-1510-20

    As you can see from the pictures below, if I have the switch do the LAGG and bring the WAN to pfSense over a VLAN, I consistently get a bufferbloat grade of "C".

    If, on the other hand, I have pfSense do the LAGG directly, I get identical speeds but a bufferbloat grade of "D", and very often "F".

    Does this mean pfSense is more efficient than the switch at doing the LAGG, or vice versa? Which configuration is better overall?

    Switch doing WAN LAGG:
    [image: Switch doing LAGG]

    pfSense doing WAN LAGG:
    [image: pfSense doing LAGG]

    DSLReports speed test:
    [image: DSLReports speed test]
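
For context on how a two-port LAGG can carry more than 1 Gb/s in total: link aggregation normally pins each flow to a single member port by hashing header fields, so the extra capacity only shows up when several flows run in parallel, as multi-stream speed tests do. The sketch below illustrates the idea only. The CRC32 hash, the `pick_member` helper, and the documentation-range addresses are assumptions for illustration, not the hash that pfSense, FreeBSD, or the D-Link switch actually uses.

```python
import zlib

def pick_member(src_ip, dst_ip, src_port, dst_port, members):
    """Map a flow to one LAGG member by hashing its addresses and ports.

    Illustrative only: real LAGG implementations use their own hash functions.
    """
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    return members[zlib.crc32(key) % len(members)]

members = ["igb0", "igb1"]

# Eight parallel flows to the same server, differing only in source port
# (hypothetical addresses from the documentation ranges).
for sport in range(50000, 50008):
    port = pick_member("192.0.2.10", "203.0.113.5", sport, 443, members)
    print(sport, "->", port)
```

A given flow always lands on the same member, so a single TCP stream cannot exceed the speed of one link regardless of which device performs the aggregation.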


  • Netgate Administrator

    I'm surprised that makes much difference. Do you have the actual numbers from those tests?

    Where are you testing from internally, how is that connected?

    Steve



  • @mircolino said in Understanding BufferBloat and LAGG:

    pfSense Appliance: Tometek MAX-TTS

    Hi,
    Sorry, this isn't closely related to your question, but allow me to ask:

    • Are these Tometek boxes sold as pfSense appliances?

    BTW:
    this LAGG issue is a really interesting topic, I will pay attention to the thread



  • @DaddyGo, yes I bought mine directly from Tometek on Alibaba.

    The model I got has an Intel 7th gen dual core Celeron 3865U, TDP 15W, 8GB RAM, 64GB SSD, 4 SFP+ and 2 SFP (all Intel).

    I negotiated a price of $380 plus CC fees and shipping for a total of $420. Received it (in California) the week after.

    I put it in service 10 days ago and so far I've had zero problems (restarted it yesterday to set up LAGG).
    The only initial issue was pfSense complaining about having to generate a new UUID because it was unable to read it from the BIOS DMI.
    Tometek support gave me the AMI DmiEdit utility and after rewriting the DMI now everything is OK.

    [image: pfSense]

    Total overkill, I know 😎



  • @stephenw10, I'll re-run the tests tonight when nobody is using the Internet (right now my wife is on Zoom with 16 other coworkers).

    But overall the numbers are similar whether the switch or the pfSense appliance is doing the WAN LAGG: always in the 1.2 Gb/s range, ± 20 Mb/s. It's only the bufferbloat that's higher when pfSense is handling the aggregation.

    Is it because the pfSense LAGG is too fast and the rest of the firewall can't keep up?

    I'm running all the tests from Chrome on Windows Server 2019.

    Windows Server (Intel X520-DA1) ↔ SFP+ Twinax DAC ↔ Switch port 20 (set up as a VLAN trunk)
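
Since throughput is the same in both setups and only the bufferbloat grade differs, it can help to look at the underlying quantity: how much round-trip latency rises while the link is saturated. A minimal sketch, assuming RTT samples have already been collected (for example by pinging a nearby host while idle and again during a speed test). The `added_latency_ms` helper and the sample values are hypothetical, and this does not reproduce how DSLReports assigns its letter grades.

```python
from statistics import median

def added_latency_ms(idle_rtts_ms, loaded_rtts_ms):
    """Return how much the median RTT rises under load, in milliseconds."""
    if not idle_rtts_ms or not loaded_rtts_ms:
        raise ValueError("need at least one sample in each list")
    return median(loaded_rtts_ms) - median(idle_rtts_ms)

# Hypothetical samples, not measurements from this thread.
idle = [11.0, 12.0, 11.5, 12.5]
loaded = [95.0, 140.0, 120.0, 110.0]

print(f"latency added under load: {added_latency_ms(idle, loaded):.1f} ms")
```

Comparing this figure for the switch LAGG and the pfSense LAGG gives actual numbers to post instead of a letter grade.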


  • Netgate Administrator

    I sure hope you installed pfSense yourself on that.....



  • @stephenw10 said in Understanding BufferBloat and LAGG:

    I sure hope you installed pfSense yourself on that.....

    I did. Why?

    UPDATE: it actually came with Ubuntu preinstalled.



  • @stephenw10

    Why is that? Isn't pfSense open source and free to use?


  • Netgate Administrator

    Several reasons. But for me the biggest is this: if you buy a firewall direct from China, you have no idea what's actually installed on it. Even if it came with pfSense installed (which it shouldn't, because that's commercial redistribution), you should format it and reinstall.

    Steve



  • Check out this video https://www.youtube.com/watch?v=iXqExAALzR8
    I went from an F to an A+ on bufferbloat.



  • @winger46146, thank you for the link. Yes, I was going to set up limiters next.

    I'd obviously rather have pfSense handle the WAN directly instead of going through the switch first.
    I just don't understand why overall performance degrades slightly when pfSense does the WAN LAGG. I'd expect the opposite.



  • I did not see any difference in bufferbloat on that test going from non-LAGG to LAGG from my MB8600 to pfSense (on my XTM5 box). Did you see a difference?

    I'm on an M400 box now, so I could try that test with it, but it's kind of one of those buzzwords that DSLR seems to have brought into the picture and made everyone worry about.

    Do you get your full speed from your ISP? When you max out your connection while on Zoom, VoIP, etc., does your jitter increase to the point where the call suffers?

    I can't say I see any issue from the "D" bufferbloat grade reported by DSLR. I'm not sure the effort is worth the payback. That said, I am curious. :)



  • I really wanted to "remove" WAN traffic from the switch and let pfSense handle it directly.
    But after reading here that LAGG interfaces don't support limiters, while VLANs do, I basically had no choice but to let the switch handle the Motorola MB8600 LAGG.
    Not my preferred choice, but after adding limiters to the WAN interface, following the link posted by @winger46146, this is the outcome:

    [image: speed test result]

    I'll take a 50 Mb/s speed penalty for straight As 🙂


  • Netgate Administrator

    LAGG interfaces can use Limiters no problem. They can't use ALTQ based traffic shaping.

    It would be interesting to test without the switch in play at all if you can. So modem - pfSense - test client directly.

    And, yeah, fixing buffer bloat can make a big difference to some things if you have it bad, like 'F'!

    Steve



  • @stephenw10 said in Understanding BufferBloat and LAGG:

    LAGG interfaces can use Limiters no problem. They can't use ALTQ based traffic shaping.

    Didn't know that 😞.

    It would be interesting to test without the switch in play at all if you can. So modem - pfSense - test client directly.

    That's pretty easy to try. Tonight when again nobody's using the Internet I'll run another set of tests and post the results.

    In the meantime, this is the slightly redacted "netstat -i" output with the switch doing the WAN LAGG:

    ix0: LAN
    ix0.2: WAN
    ix0.3: DMZ
    ix0.4: IOT
    ix0.5: GUEST

    Name    Mtu Network       Address              Ipkts Ierrs Idrop    Opkts Oerrs  Coll
    ix0    1500 <Link#1>      00:f0:xx:xx:xx:44 34496952     0     0 34515981     0     0
    ix0       - 172.xx.8.0/24 edge                  5747     -     -     4307     -     -
    ix0       - fe80::%ix0/64 fe80::1:1%ix0          634     -     -     5006     -     -
    ix0       - 2601:646:8302 edge                 13558     -     -    15001     -     -
    ix1*   1500 <Link#2>      00:f0:xx:xx:xx:45        0     0     0        0     0     0
    igb0*  1500 <Link#3>      00:f0:xx:xx:xx:b5        0     0     0        0     0     0
    igb1*  1500 <Link#4>      00:f0:xx:xx:xx:b6        0     0     0        0     0     0
    ix2*   1500 <Link#5>      00:f0:xx:xx:xx:46        0     0     0        0     0     0
    ix3*   1500 <Link#6>      00:f0:xx:xx:xx:47        0     0     0        0     0     0
    lo0   16384 <Link#7>      lo0                     80     0     0       80     0     0
    lo0       - localhost     localhost                0     -     -        0     -     -
    lo0       - fe80::%lo0/64 fe80::1%lo0              0     -     -        0     -     -
    lo0       - your-net      localhost               80     -     -       80     -     -
    enc0*  1536 <Link#8>      enc0                     0     0     0        0     0     0
    pfsyn  1500 <Link#9>      pfsync0                  0     0     0        0     0     0
    pflog 33160 <Link#10>     pflog0                   0     0     0     5441     0     0
    ix0.3  1500 <Link#11>     00:f0:xx:xx:xx:44    69106     0     0    42185     0     0
    ix0.3     - 172.xx.9.0/24 edge-dmz                 8     -     -        8     -     -
    ix0.3     - fe80::%ix0.3/ fe80::1:1%ix0.3        270     -     -     4147     -     -
    ix0.3     - 2601:646:8302 edge-dmz               292     -     -      151     -     -
    ix0.4  1500 <Link#12>     00:f0:xx:xx:xx:44  4123549     0     0  2684756     0     0
    ix0.4     - 172.xx.10.0/2 edge-iot              2110     -     -     1798     -     -
    ix0.4     - fe80::%ix0.4/ fe80::1:1%ix0.4        927     -     -     5309     -     -
    ix0.4     - 2601:646:8302 edge-iot              1217     -     -      622     -     -
    ix0.5  1500 <Link#13>     00:f0:xx:xx:xx:44     1861     0     0     3738     0     0
    ix0.5     - 172.xx.11.0/2 edge-guest               0     -     -        0     -     -
    ix0.5     - fe80::%ix0.5/ fe80::1:1%ix0.5          0     -     -     3732     -     -
    ix0.5     - 2601:646:8302 edge-guest               0     -     -        0     -     -
    ix0.2  1500 <Link#14>     00:f0:xx:xx:xx:44 25567476     0     0  9123326     0     0
    ix0.2     - fe80::%ix0.2/ fe80::xxx:xxxx:fe    46506     -     -    46538     -     -
    ix0.2     - 73.xxx.xx.0/2 c-73-xxx-xx-189.h   106079     -     -    46471     -     -
    ix0.2     - 2001:558:6045 2001:558:6045:xx:    58138     -     -       12     -     -
    


  • OK. Reconfigured the WAN with pfSense doing the LAGG and connected the Windows Server directly to the appliance (nothing else connected).

    Speed test without limiters:

    [image: speed test result without limiters]

    Speed test with limiters (CoDel, 1200 Mb/s down, 50 Mb/s up, queue length left empty, floating rules for both IPv4 and IPv6):

    [image: speed test result with limiters]

    Pretty impressive, I have to say 🙂
    I can probably gain a bit more by playing with the up/down speeds and queue length, but for now I'll leave it alone.

    The following is "netstat -i" output:

    Name    Mtu Network       Address              Ipkts Ierrs Idrop    Opkts Oerrs  Coll
    ix0    1500 <Link#1>      00:f0:xx:xx:xx:44  4033957     0     0  6836137     0     0
    ix0       - 172.xx.8.0/24 edge                   371     -     -      563     -     -
    ix0       - fe80::%ix0/64 fe80::1:1%ix0           31     -     -      134     -     -
    ix0       - 2601:646:8302 edge                   254     -     -      300     -     -
    ix1*   1500 <Link#2>      00:f0:xx:xx:xx:45        0     0     0        0     0     0
    igb0   1500 <Link#3>      00:f0:xx:xx:xx:b5  2440366     0     0  1306587     0     0
    igb1   1500 <Link#4>      00:f0:xx:xx:xx:b5  4545738     0     0  2941815     0     0
    ix2*   1500 <Link#5>      00:f0:xx:xx:xx:46        0     0     0        0     0     0
    ix3*   1500 <Link#6>      00:f0:xx:xx:xx:47        0     0     0        0     0     0
    lo0   16384 <Link#7>      lo0                     77     0     0       77     0     0
    lo0       - localhost     localhost                0     -     -        0     -     -
    lo0       - fe80::%lo0/64 fe80::1%lo0              0     -     -        0     -     -
    lo0       - your-net      localhost               77     -     -       77     -     -
    enc0*  1536 <Link#8>      enc0                     0     0     0        0     0     0
    pfsyn  1500 <Link#9>      pfsync0                  0     0     0        0     0     0
    pflog 33160 <Link#10>     pflog0                   0     0     0     5607     0     0
    lagg0  1500 <Link#11>     00:f0:xx:xx:xx:b5  6986138     0     0  4248402     5     0
    lagg0     - fe80::%lagg0/ fe80::xxx:xxxx:fe    32878     -     -    32915     -     -
    lagg0     - 73.xxx.xx.0/2 c-73-xxx-xx-178.h    91007     -     -        4     -     -
    lagg0     - 2001:558:6045 2001:558:6045:xx:     2153     -     -        0     -     -
    ix0.3  1500 <Link#12>     00:f0:xx:xx:xx:44    64950     0     0    25768     0     0
    ix0.3     - 172.xx.9.0/24 edge-dmz                 0     -     -        0     -     -
    ix0.3     - fe80::%ix0.3/ fe80::1:1%ix0.3          0     -     -      153     -     -
    ix0.3     - 2601:646:8302 edge-dmz                 0     -     -        0     -     -
    ix0.4  1500 <Link#13>     00:f0:xx:xx:xx:44  3090700     0     0  1940401     0     0
    ix0.4     - 172.xx.10.0/2 edge-iot               384     -     -      323     -     -
    ix0.4     - fe80::%ix0.4/ fe80::1:1%ix0.4         11     -     -      107     -     -
    ix0.4     - 2601:646:8302 edge-iot                25     -     -       21     -     -
    ix0.5  1500 <Link#14>     00:f0:xx:xx:xx:44     1342     0     0     2721     0     0
    ix0.5     - 172.xx.11.0/2 edge-guest               0     -     -        0     -     -
    ix0.5     - fe80::%ix0.5/ fe80::1:1%ix0.5          0     -     -       90     -     -
    ix0.5     - 2601:646:8302 edge-guest               0     -     -        0     -     -
    

    I cannot prove it, but I still have the feeling that the switch is ever so slightly better at doing the LAGG.
    However, the convenience of having WAN traffic out of the way easily outweighs that.
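
One way to get beyond a feeling is to read the per-member counters from the "netstat -i" output above and see how traffic is spread across the two LAGG ports. A minimal sketch, assuming the ten-column layout shown in this thread. The `link_counters` helper is hypothetical, and the sample rows are copied from the output posted above.

```python
def link_counters(netstat_output):
    """Parse `netstat -i` text and return per-interface packet counters.

    Only link-layer rows (Network column of the form <Link#N>) are kept.
    """
    counters = {}
    for line in netstat_output.splitlines():
        fields = line.split()
        # Columns: Name Mtu Network Address Ipkts Ierrs Idrop Opkts Oerrs Coll
        if len(fields) == 10 and fields[2].startswith("<Link#"):
            name = fields[0].rstrip("*")  # a trailing "*" marks an interface that is down
            counters[name] = {
                "ipkts": int(fields[4]),
                "opkts": int(fields[7]),
                "oerrs": int(fields[8]),
            }
    return counters

sample = """\
Name    Mtu Network       Address              Ipkts Ierrs Idrop    Opkts Oerrs  Coll
igb0   1500 <Link#3>      00:f0:xx:xx:xx:b5  2440366     0     0  1306587     0     0
igb1   1500 <Link#4>      00:f0:xx:xx:xx:b5  4545738     0     0  2941815     0     0
lagg0  1500 <Link#11>     00:f0:xx:xx:xx:b5  6986138     0     0  4248402     5     0
"""

stats = link_counters(sample)
lagg_members = ["igb0", "igb1"]
total_in = sum(stats[m]["ipkts"] for m in lagg_members)
for m in lagg_members:
    print(f"{m}: {stats[m]['ipkts'] / total_in:.1%} of inbound packets")
print("lagg0 output errors:", stats["lagg0"]["oerrs"])
```

In the posted numbers the inbound split is roughly 35% on igb0 and 65% on igb1, and lagg0 shows 5 output errors. Whether either relates to the bufferbloat grade is not established here.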


  • LAYER 8 Netgate

    I would not be at all surprised that a switch is better at a Layer 2 protocol like LACP than FreeBSD.



  • With the CoDel limiters now in place, I noticed a new warning in the log, every time the system boots up:

    config_aqm Unable to configure flowset, flowset busy!
    

    I read elsewhere on this forum that this message can be ignored.
    Is that true? Is there any way to prevent it?


  • Netgate Administrator

    If it only appears at boot then, yes, it probably can be ignored.

    It looks like it's also associated with setting the Queue Management Algorithm to CoDel, which is not usually necessary. Leaving it as Taildrop with FQ-CoDel as the Scheduler should give the same results.

    Steve

