BufferBloat with Traffic Shaping

cityfanminimos

Hi,

I am trying to figure out why bufferbloat occurs when I enable traffic shaping. The setup is in a test environment behind my current router, which has SQM enabled (it gets A+ on bufferbloat), so the pfSense box is getting an address from this gateway. I eventually want to remove my existing router and replace it with a bare-metal pfSense build, but my observations so far are not encouraging. It is obviously a rather open-ended question, so here are my observations.

Default pfSense, with no traffic shaping out of the box: A+ rating on the Waveform and DSLReports tests, with jitter under 5 ms under load.
Enable an FQ_CODEL limiter as per the Netgate documentation, along with a floating rule: A rating on bufferbloat, with around 5 ms jitter on both download and upload.
Enable traffic shaping so that traffic can be prioritized and no single device can hog all the bandwidth: C rating on bufferbloat.
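For anyone unfamiliar with what the FQ_CODEL limiter is doing, the sketch below models only the CoDel part of it (the drop decision fq_codel applies to each flow queue). It is a simplified illustration, not the dummynet code: it assumes the commonly cited CoDel defaults of a 5 ms target and 100 ms interval, and the class and method names are hypothetical. It omits details such as the minimum-backlog check and count reuse on re-entry.

```python
import math

# Simplified sketch of the CoDel drop decision that fq_codel runs on each
# flow queue. TARGET and INTERVAL are the usual CoDel defaults; the class
# and method names are hypothetical and this is not the dummynet source.
TARGET = 0.005    # acceptable standing queue delay (seconds)
INTERVAL = 0.100  # delay must stay above TARGET this long before dropping


class CoDelSketch:
    def __init__(self) -> None:
        self.first_above_time = None  # deadline after delay first exceeded TARGET
        self.dropping = False         # True while in the dropping state
        self.count = 0                # drops since entering the dropping state
        self.drop_next = 0.0          # time of the next scheduled drop

    def should_drop(self, sojourn: float, now: float) -> bool:
        """Return True if a packet that queued for `sojourn` s should be dropped."""
        if sojourn < TARGET:
            # Delay is acceptable again: reset and leave the dropping state.
            self.first_above_time = None
            self.dropping = False
            return False
        if self.first_above_time is None:
            # Delay just went above TARGET; give the queue INTERVAL to drain.
            self.first_above_time = now + INTERVAL
            return False
        if not self.dropping:
            if now < self.first_above_time:
                return False
            # Delay persisted for a whole INTERVAL: start dropping.
            self.dropping = True
            self.count = 1
            self.drop_next = now + INTERVAL / math.sqrt(self.count)
            return True
        if now >= self.drop_next:
            # Control law: drops get closer together while delay stays high.
            self.count += 1
            self.drop_next += INTERVAL / math.sqrt(self.count)
            return True
        return False
```

Roughly, this is why the limiter on its own can hold jitter near 5 ms: it drops packets to keep each flow's standing queue near the target instead of letting a long FIFO fill up.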

To test these theories, I have made both the shaping and the test very simple.

One client is connected to pfSense (virtualized in KVM, hardware offloading disabled). The traffic shaper is set to PRIQ (I tried other types as well) and uses the wizard only to move ports 80/443 into a high-priority queue. There are 3 floating rules matching outbound traffic on the WAN, with the limiter added as the Up/Down pipe in the floating rules.
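As a rough illustration of what PRIQ does with that setup, the sketch below models strict priority scheduling: the scheduler always serves the highest-priority non-empty queue, so anything in a lower queue waits until the higher queue is empty. The queue names and priority values are hypothetical and loosely mirror a wizard setup with 80/443 in a high-priority queue; this shows only the scheduling order and is not a diagnosis of the latency seen here.

```python
from collections import deque


class PriqSketch:
    """Minimal model of strict priority queueing (PRIQ)."""

    def __init__(self, priorities: dict) -> None:
        # priorities maps a (hypothetical) queue name to its priority;
        # a higher number is served first.
        self.order = sorted(priorities, key=priorities.get, reverse=True)
        self.queues = {name: deque() for name in priorities}

    def enqueue(self, queue_name: str, packet) -> None:
        self.queues[queue_name].append(packet)

    def dequeue(self):
        # Always serve the highest-priority queue that has packets waiting.
        for name in self.order:
            if self.queues[name]:
                return self.queues[name].popleft()
        return None  # every queue is empty


sched = PriqSketch({"qHigh": 7, "qDefault": 1})
sched.enqueue("qDefault", "ping")
sched.enqueue("qHigh", "https-1")
sched.enqueue("qHigh", "https-2")
print([sched.dequeue() for _ in range(3)])  # ['https-1', 'https-2', 'ping']
```

Note that PRIQ on its own gives no per-flow fairness within a queue, so it does not by itself stop one device from using most of a queue's share.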

I can see that the limiter is working, but no matter what I set the algorithm to, I get lots of extra latency with traffic shaping enabled.

I have tried many variations of the floating rules, but no matter what I do, the results are consistent with the above.

I want to eliminate as much bufferbloat as possible whilst preventing a scenario where multiple devices start hogging bandwidth, but it just does not seem possible.

Is there something I have missed, or is this just how pfSense is?

Thanks in advance

mer

@cityfanminimos
Since bufferbloat is mostly about latency, the test is trying to saturate the link, and you say the limiter is working, I think that could actually end up increasing latency.
Also, is it possible that traffic shaping on pfSense, sitting behind your current router, causes that router to affect the test?

That's just me thinking out loud, I've not tried to do anything like this.

cityfanminimos

@mer Thanks for the reply.

I had the same thought about the other router skewing the results; however, with no shaping or limiters I get better results. In my head, pfSense is shaping the traffic before it reaches the other router, so I would expect it only to help by limiting the amount of traffic and so reducing the opportunity for bufferbloat.

Also, the limiter works by itself (albeit with slightly higher ping than without it), but combining the limiter with the traffic shaper pushes the latency higher.

I also thought that maybe the shaping rules are adding latency, but my other router is ARM-based, so it has hardly any grunt, and it has no issues with more firewall rules than this, so I can't see this being the case.

It feels like a real mystery, or maybe a misconfiguration/bug in how I have it virtualized?

Austin 0

@cityfanminimos Have you tried adjusting the queue length? Also, traffic shapers do have a resource impact. Have you checked the system resources while the traffic shaper is applied?

cityfanminimos

@Austin-0 I did have a play with queue lengths, as I could see drops in the queue status, so my initial thought was that TCP retransmissions were effectively counteracting the fq_codel limiter. It had limited success: I managed to get the grade up to a B or C from an F.

I am not an expert on queue lengths, though, so I may have gone from one extreme to the other. Are there any tips on tuning?
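One way to reason about queue length is simple arithmetic: a full FIFO queue adds roughly (queue length × packet size) / link rate of delay for the last packet in it. The sketch assumes the queue length is counted in packets, every packet is a full 1500-byte frame, and the queue drains at the shaped rate; the rates used are hypothetical, since the thread does not state the link speeds.

```python
def max_queue_delay_ms(queue_len_packets: int, rate_mbps: float,
                       packet_bytes: int = 1500) -> float:
    """Worst-case wait behind a full FIFO queue, in milliseconds."""
    queued_bits = queue_len_packets * packet_bytes * 8
    # bits / (Mbit/s * 1e6) gives seconds; * 1000 gives ms.
    return queued_bits / (rate_mbps * 1000)


def packets_for_target(target_ms: float, rate_mbps: float,
                       packet_bytes: int = 1500) -> int:
    """Largest queue length (packets) whose full-queue delay stays within target_ms."""
    return int(target_ms * rate_mbps * 1000 // (packet_bytes * 8))


print(max_queue_delay_ms(50, 20))    # 30.0 ms at a hypothetical 20 Mbit/s
print(max_queue_delay_ms(1000, 20))  # 600.0 ms: a long queue means bloat
print(packets_for_target(5, 20))     # 8 packets for a 5 ms budget
```

This shows the trade-off in both directions: a long queue adds a lot of delay at low rates, while a very short queue drops packets during bursts, which is consistent with the drops visible in the queue status.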

In terms of resources, I over-provisioned the VM with 8 GB of RAM and 4 cores, with CPU affinity set away from other guests. CPU usage on the dashboard never goes above 3%.

Are there any specific tunables for virtio that might need setting?

cityfanminimos

Just an update for any others who may come looking in the forum for a similar answer.

I could not get a decent bufferbloat score when using the traffic shaper by itself, or with limiters on top.

I can get an A with the limiters by themselves, so I have set up a number of different pipes with different levels of bandwidth, which I can then apply to various VLANs. For example, IoT devices won't require much bandwidth, so I can throttle them at the VLAN level and use the pipes like pseudo-queues.
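To picture "one pipe per VLAN", the sketch below caps each VLAN with its own rate limiter. It uses a token bucket, which is a swapped-in simplified model rather than how dummynet pipes are implemented; the VLAN names, rates and burst sizes are hypothetical. This version only reports whether a packet conforms (policing); a real pipe queues non-conforming packets instead, which is where its own latency comes from.

```python
class TokenBucket:
    """Simplified token-bucket rate limiter (a model, not dummynet)."""

    def __init__(self, rate_bps: float, burst_bytes: float) -> None:
        self.rate = rate_bps / 8        # refill rate in bytes per second
        self.capacity = burst_bytes     # maximum burst size in bytes
        self.tokens = burst_bytes       # start with a full bucket
        self.last = 0.0                 # time of the previous check (s)

    def allow(self, size_bytes: int, now: float) -> bool:
        # Refill for the time elapsed since the last check, up to capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= size_bytes:
            self.tokens -= size_bytes
            return True
        return False  # over the cap: a real pipe would queue this packet


# Hypothetical per-VLAN caps: IoT gets far less than the main LAN.
PIPES = {
    "VLAN_IOT": TokenBucket(rate_bps=5_000_000, burst_bytes=15_000),
    "VLAN_LAN": TokenBucket(rate_bps=50_000_000, burst_bytes=150_000),
}
print(PIPES["VLAN_IOT"].allow(1500, now=0.0))  # True: within the burst
```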

If you need a specific device routed to a particular pipe, you will need to tag its traffic with a firewall rule on the appropriate interface and then match that tag in a floating rule; otherwise all your pipes will get matched and your traffic will flit between the queues. Don't forget to do this for all traffic types, e.g. TCP/UDP.
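The tag-then-match approach can be modelled as two steps: an interface rule marks a device's packets with a tag, and the WAN floating rules choose a pipe purely by that tag. This is a conceptual sketch, not pf rule syntax; the IP address, tag name and pipe names are hypothetical. It also shows why every protocol needs its own tagging rule: traffic not covered by one falls through to the default pipe.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    src_ip: str
    proto: str                 # e.g. "tcp", "udp", "icmp"
    tag: Optional[str] = None  # set by an interface rule


# Step 1 (hypothetical): interface rules tag one host's traffic,
# with a separate rule for each protocol.
INTERFACE_RULES = [
    {"src_ip": "192.168.20.50", "proto": "tcp", "tag": "SLOWPIPE"},
    {"src_ip": "192.168.20.50", "proto": "udp", "tag": "SLOWPIPE"},
]

# Step 2 (hypothetical): WAN floating rules pick a pipe by tag only.
FLOATING_RULES = {"SLOWPIPE": "pipe_5mbit"}
DEFAULT_PIPE = "pipe_wan_default"


def apply_interface_rules(pkt: Packet) -> Packet:
    for rule in INTERFACE_RULES:
        if pkt.src_ip == rule["src_ip"] and pkt.proto == rule["proto"]:
            pkt.tag = rule["tag"]
            break  # first matching tagging rule wins in this model
    return pkt


def select_pipe(pkt: Packet) -> str:
    # Untagged traffic (tag is None) goes to the default pipe.
    return FLOATING_RULES.get(pkt.tag, DEFAULT_PIPE)


print(select_pipe(apply_interface_rules(Packet("192.168.20.50", "udp"))))
print(select_pipe(apply_interface_rules(Packet("192.168.20.50", "icmp"))))
```

Here the UDP packet lands in `pipe_5mbit`, while the ICMP packet from the same host is untagged and ends up in `pipe_wan_default`.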

I see this very much as a fudge. To be honest, with just one floating rule and multiple clients running a bufferbloat test at once, it does OK at limiting the bandwidth out to the WAN and keeping latency under control; I post the above in case you want a smidgen more control.

Of course, I may be in the minority with this, and once my new hardware arrives I may try again to get it working a bit better.