How I fixed apinger and packet loss (Hint: It wasn't broken)



  • Preface:  For anyone not aware of bufferbloat, a rather simple explanation is here:  http://www.bufferbloat.net/projects/bloat/wiki/introduction

    Hi everyone. I'm brand new to pfSense, and purchased a Netgate RCC-VE 2440 last week, along with a pfSense Gold subscription to get me started.  I wanted to post this message for new users of pfSense who may be going through the same thing I went through, and for anyone else it may help.  One of the issues that plagued me from the beginning was problems related to apinger (or so I thought).  The box is used on a home single-WAN cable modem install (Cablevision/Optimum Online 101/35 package) that is serving about 50 clients behind it.  Previously, I was using an Asus RT-AC68U for gateway routing - that is now relegated to being one of two APs in the house.  With the RT-AC68U, we didn't have too many issues, but in the evenings, using things like iPhones and iPads became a bit laggy, though not annoyingly so.  My connection is stable. I've been a participant in the FCC SamKnows program for several years now, and my report cards reflect that.

    My initial configuration was very straightforward: single WAN with DHCP, and the default LAN rules for allowing outgoing traffic.  No extra packages installed. I'm running 2.2.4-RELEASE (ADI Community Edition).  While things seemed to go well initially, it wasn't long before I started having apinger issues.  I also started experiencing real packet loss - this was verified not only by what apinger was complaining about, but also by a remote monitor I have at DSL Reports, which was showing a good deal of packet loss. In the evenings, clients experienced the symptoms of packet loss as well: delayed responses, pages that wouldn't completely load, severe latency issues with my VoIP service for 2 lines through an OBi box, etc.

    Over the past week, I searched for a solution.  One of the things I found was that most solutions to the issues surrounding apinger identifying packet loss and resetting the WAN were either a) raise apinger's threshold values so high that it wouldn't reset the connection, b) reduce the frequency of probes, or c) disable gateway monitoring completely. All of those "solutions" temporarily helped -- but it was only a matter of time before I would experience the issues again.  What put me on the right track were a few posts that suggested prioritizing ICMP traffic.

    I had no desire to do any type of QoS or traffic shaping.. I had never used it in the past, and everything here just worked well (enough) without it.  Even though I had read many accounts of the bufferbloat issue, I never considered it a problem here.  I ran a few speed tests at DSL Reports, and received a bufferbloat score of F on all of them - while at the same time reporting speeds in excess of my subscribed tier (Generally, 120 down, 40 up).  It became clear that the reason apinger was having such a hard time was due to bufferbloat on my connection.  The fact that the SamKnows box was running its tests every hour or two would cause the apinger probes to get delayed. Additionally, I run a weather website and have a PC as well as a plug computer that is constantly uploading data to both my web hosting provider as well as a few weather sites, such as Weather Underground.

    If you have an excessive amount of bufferbloat, apinger will cause you grief once your WAN traffic increases.  For me, loading up my bandwidth introduced latency as high as 2000ms - of course apinger is going to complain.  Apinger may have other issues that I haven't yet encountered - but at the very least, with bufferbloat present, apinger simply won't work as designed.  Its monitoring is really quite simple.  A ping every second should not bring a network to its knees.  For me (and I'm guessing numerous other home users), this never came up before because consumer grade gateway devices simply don't do gateway monitoring.
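
    To put the 2000ms figure in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes the entire delay comes from one FIFO buffer draining at the uplink rate, and it uses the roughly 40 Mbit/s upload speed mentioned above. The function name `queued_bytes` is made up for illustration.

```python
def queued_bytes(delay_ms: float, link_mbit_s: float) -> float:
    """Bytes that must sit in a FIFO queue to add delay_ms at link_mbit_s."""
    bits_per_ms = link_mbit_s * 1_000_000 / 1000  # link rate in bits per millisecond
    return delay_ms * bits_per_ms / 8             # bits -> bytes


# Assumption: ~40 Mbit/s uplink, 2000 ms of added latency under load.
backlog = queued_bytes(2000, 40)
print(f"{backlog / 1_000_000:.1f} MB queued ahead of each ping")
```

    Under those assumptions, a once-per-second apinger probe is stuck behind megabytes of other traffic, which is why it reports high latency and loss even though the link itself is fine.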

    If you do not currently use any shapers at all, you should, at the very least, have a very basic CODELQ shaper on both your WAN and LAN interfaces to get a handle on bufferbloat.  Initially, I started with one of the wizards, but in the end it wasn't necessary for what I needed.

    Steps – this assumes you have NO shaper currently in place.  If you follow these instructions, you will lose what you previously had:

    1. Head over to DSLReports and run a speed test to measure your bufferbloat:  http://www.dslreports.com/speedtest
    2. Also get a few results from the Ookla speedtest at http://www.speedtest.net using, preferably, a server hosted by your ISP

    3. Go to Firewall, Traffic Shaper.  The first tab, 'By Interface' is where you want to be.
    4. Start with the WAN connection by clicking on the icon to its left
    5. Make the following changes:
        a) Make sure "Enable/disable discipline and its children" is CHECKED
        b) Change Scheduler Type to "CODELQ"
        c) Set your average UPLOAD bandwidth in the Bandwidth box.  I used the average upload reported by my SamKnows reports, but you can use an average of the Ookla upload results.  For me, that was 39 Mbit/s.
        d) Make sure you change the units accordingly, since they default to Kbit/s.
        e) Click Save
    6. Now do the LAN:
        a) Make sure "Enable/disable discipline and its children" is CHECKED
        b) Change Scheduler Type to "CODELQ"
        c) For bandwidth, you want this based on your DOWNLOAD speed here. I entered a value of 92.5% of my average 120 Mbit/s (111 Mbit/s)
        d) Click Save
    7. Reset states by going to Diagnostics, States, Reset States
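
    The arithmetic behind steps 5c and 6c can be sketched in a few lines of Python. This is only an illustration: the sample lists are hypothetical speedtest results chosen to match the averages quoted above, `shaper_bandwidths` is a made-up helper name, and the 92.5% factor is just the value that worked on this particular connection.

```python
from statistics import mean


def shaper_bandwidths(upload_mbit, download_mbit, download_factor=0.925):
    """Return (wan_mbit, lan_mbit) starting values for the two CODELQ queues."""
    wan = mean(upload_mbit)                      # step 5c: average upload speed
    lan = mean(download_mbit) * download_factor  # step 6c: a bit below average download
    return round(wan), round(lan)


# Hypothetical samples in Mbit/s, averaging 39 up and 120 down.
wan, lan = shaper_bandwidths([38, 39, 40], [118, 120, 122])
print(f"WAN queue: {wan} Mbit/s, LAN queue: {lan} Mbit/s")
```

    Treat the result as a starting point only; the values still need to be checked against a bufferbloat test and adjusted.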

    Now, head back over to DSL Reports and perform a speedtest.  Compare your bufferbloat scores to what you had previously.  You may need to tweak the bandwidth values entered in the shaper until you get optimal results. If you do, be sure to reset states after each change.  For me, the values I reported were what worked out best and got me an 'A' score on the bufferbloat report.  This simple step reduced my bufferbloat latency from 2000ms under load to about 10ms.  From what I understand, CODELQ does not require bandwidth values or adjustment - but my testing did find that providing it with values helps significantly.  Also, if you decide to tweak your bandwidth values, you should notice that when implemented properly, your tested speeds stay the same or get better while the bufferbloat letter grade improves.  In other words, the shaper shouldn't negatively affect your speeds seen FOR THE TESTS.

    ALSO - after making these changes, I removed any custom thresholds for gateway monitoring and let it use the default values.  If you normally have low latency to your gateway (I would say less than 50ms or so), the default values are realistic and good for determining loss and latency.  I tested my new configuration by hammering my WAN connection with numerous streaming apps on multiple devices (Netflix on Roku, HBO Go and Showtime Anytime on iPads) plus a large download on a computer, all done simultaneously. My Quality RRD graphs did not deviate much during the tests, which suggests the shaper was doing its job correctly.

    Like I said -- as a new user, I wish I would have had an easy guide like this.  I hope this helps anyone that may have been facing the same problems.

    Regards,
    Rick



  • CoDel is an AQM and does not require bandwidth settings because it does not shape any traffic. CODELQ is not an AQM, but a traffic shaper with CoDel AQM built in, and the shaper does need a bandwidth setting. CoDel can only work if there is a backlog of packets, and if you don't rate limit your interface, you will send packets at line rate and almost never have a backlog.
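
    A toy queue model in Python shows why the rate limit matters. The numbers are made up for illustration: `peak_backlog` is a hypothetical helper, each tick is an arbitrary time slice, and the drain rates only stand in for "NIC line rate" versus "shaped to roughly the modem's uplink".

```python
def peak_backlog(arrivals, drain_per_tick):
    """Largest queue length seen when arrivals[i] packets arrive in tick i
    and at most drain_per_tick packets leave the interface per tick."""
    backlog = peak = 0
    for arriving in arrivals:
        backlog = max(0, backlog + arriving - drain_per_tick)
        peak = max(peak, backlog)
    return peak


traffic = [8] * 100  # assumption: steady 8 packets per tick from the LAN

unshaped = peak_backlog(traffic, drain_per_tick=83)  # interface sends at line rate
shaped = peak_backlog(traffic, drain_per_tick=3)     # interface limited below the offered load

print("unshaped peak backlog:", unshaped)
print("shaped peak backlog:  ", shaped)
```

    In the unshaped case the local queue stays empty, so CoDel never sees a packet wait and has nothing to act on; the backlog forms further along the path instead, in a buffer the firewall does not control. Once the interface is limited, the queue builds locally, where CoDel can manage it.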

    I have a 100Mb connection




    Apinger does have some major issues and is broken, but not all connections trigger the bug. When apinger claims you have 0% packetloss and a 1ms ping on a satellite connection, it's lying or the speed of light has changed.



  • Hi,

    I found this thread very interesting. Had a go at applying CODELQ for my interfaces and managed to improve my buffer-bloat score.

    As pointed out by Harvy66, the CODELQ algorithm doesn't actually use peak bandwidth as an input parameter. Obviously, you want to do before and after measurements, but there is no need to enter a bandwidth value in the web form (for the sake of simplicity, the web GUI uses the same form for all scheduler types).

    Best Regards

    //Jimmy



  • I get a good buffer bloat score using just the standard CBQ traffic shaper setup on pfSense.



  • The default queue depth created for traffic shaping is 50 packets. You may have a good bloat result because of this, but your burst bandwidth and many-flow average bandwidth may take a sizable hit and you may notice some bursty packetloss.
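
    As a rough illustration of that trade-off, the worst-case delay of a full fixed-depth queue can be estimated in Python. This sketch assumes full-size 1500-byte packets, and the link speeds are only examples; `full_queue_delay_ms` is a made-up name.

```python
def full_queue_delay_ms(packets, link_mbit_s, packet_bytes=1500):
    """Time in ms to drain a full queue of `packets` at `link_mbit_s`."""
    queue_bits = packets * packet_bytes * 8
    bits_per_ms = link_mbit_s * 1000  # 1 Mbit/s = 1000 bits per millisecond
    return queue_bits / bits_per_ms


# Assumption: 1500-byte packets; example speeds of 20 and 250 Mbit/s.
for speed in (20, 250):
    print(f"{speed} Mbit/s: {full_queue_delay_ms(50, speed):.1f} ms for 50 packets")
```

    A 50-packet queue can only ever add a small delay, which helps the bloat score, but on a fast link it holds just a few milliseconds of traffic, so a burst overflows it and gets dropped.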

    The issue codel attempts to solve is that a buffer large enough to handle bursts is typically too large for sustained load. Codel is elastic and will allow a burst, but if the burst continues too long, packets start getting dropped at an increasing rate until the queue size comes back down.
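
    The "increasing rate" part can be sketched with CoDel's control law, where the gap between drops is the interval divided by the square root of the drop count. The Python below is a simplified illustration only: it uses the 100 ms default interval from RFC 8289, assumes the queue stays congested the whole time, and leaves out the sojourn-time checks that real CoDel uses to enter and leave the dropping state. `drop_times_ms` is a made-up name.

```python
from math import sqrt

INTERVAL_MS = 100.0  # CoDel's default interval (RFC 8289)


def drop_times_ms(drops):
    """Times, in ms after entering the dropping state, of the first `drops`
    drops, assuming the queue stays above target throughout."""
    times, now = [], 0.0
    for count in range(1, drops + 1):
        times.append(round(now, 1))
        now += INTERVAL_MS / sqrt(count)  # the gap shrinks as the drop count grows
    return times


print(drop_times_ms(5))
```

    The gaps between successive drops shrink from 100 ms to roughly 71, 58 and 50 ms, so a short burst costs at most a packet or two while a sustained overload is pushed back harder and harder.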



  • Thanks all for the tips. My internet is 250/20. I set up the traffic shaper using CODELQ without specifying the bandwidth on the LANs, and disabled gateway monitoring. My BufferBloat scores A (was C before), Quality scores A+, and Speed scores A~A+.



  • My BufferBloat scores A (was C before), Quality scores A+, and Speed scores A~A+.

    Where & how is this "scoring" done?



  • @KOM:

    My BufferBloat scores A (was C before), Quality scores A+, and Speed scores A~A+.

    Where & how is this "scoring" done?

    I think they are using this test: http://www.dslreports.com/speedtest



  • @KOM:

    My BufferBloat scores A (was C before), Quality scores A+, and Speed scores A~A+.

    Where & how is this "scoring" done?

    I was using:  http://www.dslreports.com/speedtest



  • I have a 250/25 cable connection and am running the latest pfSense version without any traffic shaper currently.

    http://www.dslreports.com/speedtest tells me I have a high download buffer bloat but a very low upload buffer bloat. This is a grade C in the end.

    I would like to improve this now. I applied the CODELQ shaper to my WAN interface and set it to 250 MBit/s (I had to set a value btw, otherwise I get an error message).

    Afterwards I deleted the state tables and ran the test again. No change. Same speeds, same ratings.

    Am I missing something, or is there just nothing I can do?


  • Netgate

    You can probably do nothing about download bloat upstream. Shaping controls traffic leaving an interface. You cannot control what happens upstream or when traffic arrives on WAN.

    You almost certainly have higher bandwidth outbound LAN than inbound WAN. If you are 100M anywhere on LAN, you'll want to make that all gig.



  • I realize this is total necro, but this post shows up on the first page of DuckDuckGo results.

    I was getting Ds and mostly Fs on DSL Reports bandwidth test.

    In 2.4.2, setting CODELQ without bandwidth was not permitted by the interface.

    Setting bandwidth to a number higher than my ISP advertised rate resulted in no change in bufferbloat.

    Setting bandwidth to my ISP's advertised rate resulted in all As.

    What I found interesting is that even though I can get ~10% higher than advertised actual speed, setting bandwidth to even 50 kbps higher than advertised resulted in increased bufferbloat.