Traffic Shaper

MrHorizontal

A month ago a fellow forum reader asked me to try and do a Howto in the same vein as my Howto on OVPN, but this time for the traffic shaping capabilities. That meant meticulously reading Ermal's original traffic shaper briefing, as well as looking at other guides on ALTQ (especially the OpenBSD and DragonFly ones).

For most basic single LAN / single WAN setups, the current shaper just about works, provided you can get the wizard to work, you're not using any exotic setups, and pfSense is ONLY being used as an edge Internet access router.

Unfortunately, as my previous Howtos show, my setups are anything but normal. But that just goes to prove how extensible and configurable pfSense is for the task: I keep throwing curve balls at it, and it usually handles them with flying colours, even though I'm probably insane for doing it…

This, unfortunately, doesn't extend to the traffic shaper, which is a shame: ALTQ is a feature of pf, and pfSense is basically the ultimate ambassador for the power of pf (the clue is in the name), yet it fails badly with ALTQ.

Worse still, I feel there's no point writing a Howto for it at this stage, because IMHO the Traffic Shaper in 2.0 is still too alpha in its development, and sorting it out properly will mean drastic changes from what it is now. A Howto written today would quickly go out of date, or else be limited to what's feasible with the current shaper management tools.

Here are some of the issues I've uncovered:

1. I have a WAN connection that's 18 Mbps down and 2.5 Mbps up. However, my WAN gateway is not my WAN interface but the endpoint of an OpenVPN tunnel. So aside from the assumption that interface = gateway, OpenVPN poses 2 problems for traffic shaping:
a) it compresses data, which increases effective speed (I get about 4.3 Mbps up and on average an 11% gain on the downlink)
b) when an OVPN client is configured as an interface, ALTQ can't determine its speed, because the tun device doesn't report one, and throws an error.

2. On the LAN side, pfSense acts as an inter-VLAN router with 3 different interfaces. Downstream bandwidth has to be assigned to the LAN interface, and one of mine is a bond of 2 gigabit interfaces with jumbo frames, so, as you can imagine, it's not very useful gimping a 2 Gbps connection down to 18 Mbps even when it's just routing traffic between subnets...

So, not to be disheartened, I thought: given my ADSL line, why not just shape the uplink (i.e. WAN)? That matters more, especially for small packets like ACKs and VoIP, and in any case if you shape your uplink, your downlink is pretty much optimised too, because the returning packets largely have to come back in the order they were requested. The asymmetrically higher downstream bandwidth should compensate for any anomalies anyway...
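For anyone curious what uplink-only shaping with ACK priority looks like underneath, here is a small Python helper that prints a hand-written pf.conf fragment in the ALTQ syntax pf used at the time. It is a sketch, not what pfSense generates: the interface name `em0`, the queue names `qACK`/`qDefault` and the `2200Kb` rate (a little under the 2.5 Mbps uplink, to leave headroom for ADSL overhead) are all illustrative assumptions.

```python
def priq_ack_ruleset(iface: str, bandwidth: str) -> str:
    """Return an illustrative pf.conf fragment (ALTQ-era syntax) that shapes
    only the uplink and gives bare TCP ACKs priority. Hypothetical helper."""
    lines = [
        # Root PRIQ scheduler on the WAN, with the usable uplink rate set explicitly
        f"altq on {iface} priq bandwidth {bandwidth} queue {{ qACK, qDefault }}",
        # In PRIQ a higher number means higher priority (range 0-15)
        "queue qACK priority 7",
        "queue qDefault priority 1 priq(default)",
        # With two queues in the tuple, pf sends TCP ACKs without payload (and
        # lowdelay-TOS packets) to the second queue, everything else to the first
        f"pass out on {iface} proto tcp from any to any flags S/SA keep state "
        "queue (qDefault, qACK)",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(priq_ack_ruleset("em0", "2200Kb"))
```

The two-queue form `queue (qDefault, qACK)` on the outgoing rule is what does the ACK trick; no separate rule is needed to match ACKs.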

To do this, I planned to put a root PRIQ scheduler on each of the VPN tunnels to split the actual traffic by percentage, and then assign the actual bandwidth (2.5 Mbps) on the WAN interface, which all the VPN tunnels share. No luck: a root PRIQ scheduler throws an alert saying ALTQ can't determine the bandwidth of the underlying interface (a tun device)... so much for that, then.
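The failure above comes down to where the root scheduler gets its bandwidth from. Here is a minimal Python sketch of that logic, assuming the scheduler only falls back to the interface's reported speed when no explicit rate is configured; `root_altq_line` and the interface name `ovpnc1` are hypothetical. It is consistent with Ermal's advice later in the thread to enter a fixed bandwidth on the root scheduler.

```python
from typing import Optional, Sequence


def root_altq_line(iface: str, scheduler: str, queues: Sequence[str],
                   bandwidth_kbps: Optional[int] = None,
                   reported_kbps: Optional[int] = None) -> str:
    """Build a root 'altq on' line (hypothetical helper). An explicit
    bandwidth wins; otherwise use the speed the interface reports, and fail
    if it reports none, as a tun device does."""
    bw = bandwidth_kbps if bandwidth_kbps is not None else reported_kbps
    if bw is None:
        raise ValueError(f"{iface}: interface reports no speed; "
                         "set the root scheduler bandwidth explicitly")
    return (f"altq on {iface} {scheduler} bandwidth {bw}Kb "
            f"queue {{ {', '.join(queues)} }}")


if __name__ == "__main__":
    print(root_altq_line("ovpnc1", "priq", ["qVoIP", "qDefault"], bandwidth_kbps=2500))
```

Calling it for `ovpnc1` without `bandwidth_kbps` (and with no reported speed) raises the error instead of producing a line, which mirrors the alert described above.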

This of course is an 'advanced' traffic shaping setup that can't be done with the wizard, and woe betide anyone who dares set up traffic shaping without the wizard. Ahh, the wizard. More to the point, its bugs:

The wizard's settings are saved in config.xml, so if you balls it up, the errors stay there and the wizard will always fail validation. Surely the point of starting a wizard is to configure things from scratch, or at most to read already-configured values as prefilled fields?
As raised in other threads, there are disparities between bandwidth units: validation works on the raw number entered in the field instead of translating it (e.g. kbps to x*1024).
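The unit problem is easy to illustrate: comparing raw numbers says 768 Kb does not fit under 1.5 Mb. The Python sketch below normalises every value to bits per second before comparing. It assumes decimal multipliers, as with pf.conf's `Kb`/`Mb` suffixes; the post above mentions x*1024, so swap the table if your shaper really uses binary units. The helper names are hypothetical.

```python
# Assumption: decimal multipliers, as with pf.conf's b/Kb/Mb/Gb suffixes.
# Change the table if your shaper uses 1024-based units instead.
UNIT_BPS = {"b": 1, "Kb": 1_000, "Mb": 1_000_000, "Gb": 1_000_000_000}


def to_bps(value: float, unit: str) -> int:
    """Normalise a (value, unit) pair from a form field to bits per second."""
    return round(value * UNIT_BPS[unit])


def child_fits_parent(child: tuple, parent: tuple) -> bool:
    """Compare bandwidths only after converting both to the same unit."""
    return to_bps(*child) <= to_bps(*parent)


if __name__ == "__main__":
    child, parent = (768, "Kb"), (1.5, "Mb")
    print(child[0] <= parent[0])             # naive raw-number check: False
    print(child_fits_parent(child, parent))  # unit-aware check: True
```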

Next, we have the L7 bucket which just plain doesn't work.

From reading other threads, shaping is almost always configured when VoIP is used. Given that ACKs and VoIP are the clear out-and-out winners from shaping, would it be beneficial to have them auto-queued by default on any WAN interface, much like the little bogons checkbox on the interface page?

Given that accurate bandwidth measurement is so important when setting up traffic shaping, and that measuring DSL speed is a complex affair for laymen (most people enter their sync speed without taking into account the ATM cell overhead, which can put usable throughput as much as 15% below sync speed), would it also be an advantage to actually measure bandwidth within pfSense when setting up a WAN interface?
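To put rough numbers on the ATM overhead, here is a small Python calculator. It assumes each IP packet gets some encapsulation bytes plus an 8-byte AAL5 trailer, is padded up to whole 48-byte cell payloads, and is sent as 53-byte cells. The `encap_bytes=10` value is a placeholder: the real figure depends on whether the line uses PPPoA, PPPoE or another encapsulation.

```python
import math

ATM_CELL = 53      # bytes on the wire per ATM cell
ATM_PAYLOAD = 48   # payload bytes carried by each cell
AAL5_TRAILER = 8   # AAL5 trailer added to every packet


def atm_cells(ip_bytes: int, encap_bytes: int) -> int:
    """Cells needed to carry one IP packet; the last cell is padded."""
    return math.ceil((ip_bytes + encap_bytes + AAL5_TRAILER) / ATM_PAYLOAD)


def usable_kbps(sync_kbps: float, ip_bytes: int, encap_bytes: int) -> float:
    """IP-layer throughput at a given sync rate if all packets are ip_bytes long."""
    wire_bytes = atm_cells(ip_bytes, encap_bytes) * ATM_CELL
    return sync_kbps * ip_bytes / wire_bytes


if __name__ == "__main__":
    # encap_bytes=10 is a placeholder, not a measured value
    for size in (1500, 40):
        print(size, atm_cells(size, 10), round(usable_kbps(2500, size, 10)))
```

Under these assumptions a 2500 kbps sync rate gives about 2211 kbps of IP throughput with 1500-byte packets (32 cells each), but only about 943 kbps if every packet were a 40-byte ACK (2 cells each), so the right figure to enter depends on the traffic mix as well as the sync speed.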

So the issues are:

Root schedulers don't work on interfaces with unknown bandwidth.
The traffic shaper doesn't work at all on a WiFi interface (with urtw devices, the interface doesn't even appear in the list).
The traffic shaper doesn't work on interfaces that don't report their speed (i.e. OVPN tun).
Wizards are b0rked.
L7 isn't done.
Downstream shaping is assigned to the LAN interfaces rather than to the incoming data on the WAN interface, which means LAN-side traffic is shaped even when you don't want it to be.
No info on what the different schedulers (FAIRQ, PRIQ, HFSC, CBQ) do, despite frequent reminders that choosing the right scheduler is pretty critical (for reference: FAIRQ is PRIQ with more burstability when another priority isn't using its quota, CBQ is class-based queuing for different classes (i.e. subnets), and HFSC is bandwidth and priority delay).
The UI is, quite frankly, rubbish, and the 'necessity' of using a wizard screams that there's a problem with the UI.
Given the number of threads discussing the shaper and L7, it's pretty clear that users are having a lot of problems with it, and it's a necessary target for pfSense 2.0's development.
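The PRIQ behaviour mentioned in the list can be illustrated with a toy model. This Python sketch only captures strict priority ordering (always serve the highest-priority non-empty queue); it has no bandwidth limits and does not model FAIRQ, CBQ or HFSC. The class name and queue names are hypothetical.

```python
from collections import deque
from typing import Dict, Optional


class ToyPriq:
    """Toy strict-priority scheduler: dequeue always serves the highest-priority
    non-empty queue, so lower queues only get what is left over (and can
    starve). It models ordering only, not bandwidth limits."""

    def __init__(self, priorities: Dict[str, int]):
        self.priorities = priorities
        self.queues = {name: deque() for name in priorities}

    def enqueue(self, queue_name: str, packet: str) -> None:
        self.queues[queue_name].append(packet)

    def dequeue(self) -> Optional[str]:
        # Scan queues from highest to lowest priority
        for name in sorted(self.queues, key=self.priorities.get, reverse=True):
            if self.queues[name]:
                return self.queues[name].popleft()
        return None


if __name__ == "__main__":
    q = ToyPriq({"qACK": 7, "qDefault": 1})
    for pkt in ("bulk-1", "bulk-2"):
        q.enqueue("qDefault", pkt)
    q.enqueue("qACK", "ack-1")
    print([q.dequeue() for _ in range(4)])  # ['ack-1', 'bulk-1', 'bulk-2', None]
```

The ACK jumps ahead of bulk traffic that arrived earlier; the flip side is that as long as `qACK` has packets, `qDefault` gets nothing.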

Now, I don't want to insult anyone by saying that 1.2.3's shaper is better than 2.0's, because technically it's inferior. But in terms of UI configurability it's vastly superior. So what are the plans for the shaper? Nothing's really changed since Ermal first implemented it as a bounty in March 2008 (over 2 years ago!).

So this is a plea to the devs to actually plan and decide on actions for the shaper going forward, and a plea to forum users to reply to this thread with all the issues they have with the shaper, so we have a central reference thread for all traffic shaping issues and for discussing design choices going forward. In the meantime, I promise to deliver a full-on Howto once the shaper actually works, plays nicely, and becomes a worthy feature of pfSense once again.

jimp

There are several open shaper tickets in redmine, and from the sound of it you're hitting a few that weren't already there.

At least one thing you mentioned already has a bug report (sort of):
http://redmine.pfsense.org/issues/302 (Shaper wizard remembers values on error, but are disabled)

You might want to open tickets for each of those individually, with as much detail as possible. You have a lot of good detail except for the L7 bits. Some people are using L7 successfully, so I would hesitate to say it's completely broken.

voona

I agree with everything in Mr Horizontal's post; I'd just add that the L7 stuff doesn't work for me either, and it doesn't look like the Redmine ticket has been looked at for quite some time, even after repeated efforts from Mike Stupalov.

http://redmine.pfsense.org/issues/636

MrHorizontal

@jimp:

You might want to open tickets for each of those individually, with as much detail as possible.

I've opened 3 bugs and a feature relating to this.

My real point with this post, though, is that I think a lot of the design decisions made for the shaper's interface need to be reconsidered. To put it bluntly, the shaper UI is just plain wrong, and doesn't fit into the rest of pfSense, given its awkwardness and complexity and the shaper's relative importance compared to, say, rule filtering. It just feels out of place, and wrong.

Equally, things like binding downstream bandwidth delay to the LAN interface instead of the incoming stream of the WAN interface, and mistaken assumptions like interface = gateway, are pretty fundamental design decisions that I'm showing have unintended consequences…

As such, I think this is more than just a bug; it may well be worthwhile seeing whether a better UI and approach can be developed for the shaper, one that's not as complex and doesn't constrain the system as much as the current implementation does.

As for the L7 stuff, I didn't really try much, because a) it was a headache to set up through the interface, and b) for my use I can identify traffic by port, source and destination, so I can effectively control an app's traffic by what it generates at OSI layers 3/4; in other words, using the firewall rules.

In any case, does L7 actually do anything other than build some sort of special alias that identifies an 'App' and create a rule? Does it use pf itself, or some other program / module?

It's also this sort of vagueness and 'dark arts' in 2.0's shaper that makes administering it so confusing, which is only highlighted by the lack of explanation of everything about it.

So when I say it needs a good hard look, I'm essentially saying 'weigh the advantages of fixing the shaper's UI versus rewriting it from scratch', because in its current form it really is pfSense 2.0's most poorly implemented feature IMO.

jimp

@MrHorizontal:

Equally doing things like binding downstream bandwidth delay to the LAN interface instead of the incoming stream of the WAN interface

This is just how ALTQ works. You have to shape when exiting an interface. You can't shape incoming traffic, only outgoing.

As for the other points, I haven't spent enough time with the shaper in 2.0 to really say one way or the other.

stompro

@jimp:

This is just how ALTQ works. You have to shape when exiting an interface. You can't shape incoming traffic, only outgoing.

As for the other points, I haven't spent enough time with the shaper in 2.0 to really say one way or the other.

Does this also mean that multi-LAN doesn't really limit all the LAN interfaces (in a multi-LAN setup) as a whole to the downlink speed, but only individually?

Say I want to limit my downloads to 90% of the available download speed to make sure latency stays low, and I go through the multi-LAN or multi-all wizard. If LAN and OPT1 both have downloads going at the same time, will each be individually limited to 90% of the bandwidth, so that both together would saturate the downlink? Or are the queues on LAN and OPT1 tied together so that the collective downlink speed won't go over 90%?

Thanks

jimp

I believe they are handled separately, but I'd have to really check to confirm that.

When a request goes out a given WAN, the rule it matches will be tied to a state, and the return traffic for that state (the actual download) is tied to that state as well. If the in/out queues on the outgoing rule specify queues with the appropriate bandwidth, it will work properly.
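The state behaviour described here can be sketched as a toy Python model. It assumes, as I understand pf to behave, that the state remembers the queue *name* chosen by the outgoing rule, and that a reply is queued by that name on whichever interface it exits through, falling back to that interface's default queue when no queue of that name exists there. `ToyStateTable` and the addresses are hypothetical; this is an illustration, not pf's code.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

# (src addr, src port, dst addr, dst port, protocol)
Flow = Tuple[str, int, str, int, str]


@dataclass
class ToyStateTable:
    """Toy model of state-based queue assignment (not pf's implementation)."""
    iface_queues: Dict[str, Set[str]]  # queue names under each interface's ALTQ root
    states: Dict[Flow, str] = field(default_factory=dict)

    def outbound(self, flow: Flow, queue_name: str) -> None:
        # The rule that passes the request records its queue name in the state
        self.states[flow] = queue_name

    def queue_for_reply(self, reply: Flow, egress_iface: str) -> Optional[str]:
        src, sport, dst, dport, proto = reply
        queue_name = self.states.get((dst, dport, src, sport, proto))
        if queue_name is None:
            return None  # no matching state
        # The reply is queued by name on the interface it leaves through;
        # without a queue of that name there, it lands in the default queue
        if queue_name in self.iface_queues.get(egress_iface, set()):
            return queue_name
        return "default"
```

If that model is right, it would also explain why a wizard would create identically named queues on every interface: the name chosen on the WAN rule is what the download gets matched against on the LAN side.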

eri--

For your PRIQ problem: just set your fixed bandwidth, in Mbits or kilobits, on the root scheduler and you will not have problems.

stompro

@jimp:

I believe they are handled separately, but I'd have to really check to confirm that.

When a request goes out a given WAN, the rule it matches will be tied to a state, and the return traffic for that state (the actual download) is tied to that state as well. If the in/out queues on the outgoing rule specify queues with the appropriate bandwidth, it will work properly.

I really want to give another example to make sure my meaning is clear.

For instance, the queues for both the LAN and OPT1 (LAN2) interfaces have 768 kbit/s set as the bandwidth (that is what the wizard is given). The actual downlink connection is a T1, so 1536 kbit/s. The uplink doesn't matter, since I never fill it, so its queues never have to do anything. If my intent is to never use more than half of my download bandwidth, would this setup handle it? If there is a 768 kbit/s download happening on LAN and a 768 kbit/s download happening on OPT1, would the shaper limit each of them to 384 kbit/s, so that the traffic leaving the queues on those two interfaces totals no more than 768 kbit/s? And once the download on LAN is done, would the download on OPT1 scale back up to 768 kbit/s, the max download bandwidth I set in the traffic shaper wizard?

Or is the multi-LAN traffic shaper only useful if you want to subdivide your download bandwidth between different interfaces (X total bandwidth, y = LAN bandwidth, z = OPT1 bandwidth, y + z = X), or if all you are concerned with is upload shaping?
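The T1 numbers above can be checked with simple arithmetic. This sketch assumes jimp is right that each LAN interface's queues are handled separately (he says he would need to confirm it): the per-interface caps then add up, and the only way to enforce a combined limit is a fixed split, which cannot lend an idle interface's share to the busy one. The function names are hypothetical.

```python
from typing import Dict, Iterable


def aggregate_cap(per_iface_kbps: Dict[str, int]) -> int:
    """Worst-case combined downstream rate when every LAN interface has its
    own independent ALTQ root: the caps simply add up."""
    return sum(per_iface_kbps.values())


def static_split(total_kbps: int, ifaces: Iterable[str]) -> Dict[str, float]:
    """Keep the combined rate under total_kbps by giving each interface a fixed
    share. Without a shared parent, an idle interface cannot lend its share."""
    names = list(ifaces)
    return {name: total_kbps / len(names) for name in names}


if __name__ == "__main__":
    t1_kbps = 1536
    wizard = {"lan": 768, "opt1": 768}
    print(aggregate_cap(wizard), aggregate_cap(wizard) >= t1_kbps)  # 1536 True
    print(static_split(768, ["lan", "opt1"]))  # {'lan': 384.0, 'opt1': 384.0}
```

Under that assumption, the 768 + 768 setup can fill the whole T1, while the static 384/384 split stays under the target but never lets OPT1 scale back up to 768 kbit/s when LAN goes quiet.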

Thanks

stompro

@ermal:

I can tell you for your PRIQ problem to just setup with the root scheduler your fixed bandwith in Mbits or kilobits and you will not have problems.

Ermal, I'm dealing with connections that have buffer issues: the modems/routers, which I have no control over, have large data buffers, so any time I get close to our max download bandwidth, the ISP's equipment starts to queue up packets and my latency climbs to 800-2000 ms.

So I'm trying to use the packet shaper to make sure I never get too close to the max download bandwidth.
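Those latency figures translate directly into how much data the ISP's equipment is holding. Here is a rough Python calculation, assuming a single FIFO buffer draining at the line rate and using the 1536 kbit/s T1 from the earlier example (the real links here may differ). The function names are hypothetical.

```python
def queued_bytes(delay_ms: int, link_kbps: int) -> float:
    """Bytes sitting in a FIFO buffer that drains at link_kbps when it adds
    delay_ms of latency: ms * kbit/s = bits, then / 8 for bytes."""
    return delay_ms * link_kbps / 8


def shaped_rate_kbps(link_kbps: int, fraction: float = 0.9) -> float:
    """A rate slightly below the line rate, so that pfSense rather than the
    modem becomes the bottleneck and the backlog builds in queues it controls."""
    return link_kbps * fraction


if __name__ == "__main__":
    print(queued_bytes(800, 1536))   # 153600.0
    print(queued_bytes(2000, 1536))  # 384000.0
```

So 800-2000 ms at T1 speed corresponds to roughly 150-375 KiB of backlog in the modem; shaping at around 90% (the figure from earlier in the thread) is an attempt to keep that backlog where it can be prioritised.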

So are you saying that if I use the traffic shaper wizard (multi-all) and tell it my max fixed bandwidth, up/down, it will coordinate between the different queues on each of my LAN interfaces and make sure that, as a whole, they don't transmit more than my max bandwidth to clients at any one time? Is that just a built-in feature that works because the queues on each interface are named the same? Is that the point of qinternet? Is qinternet the root scheduler?

Thanks
Josh

eri--

Tell me what policy you want to configure and i will try my best to guide you to implement it.

Regarding splitting the bandwidth evenly between two LANs: it will do the even split if you place both subnets at the same priority (in the PRIQ case, for example), i.e. port 80 from both subnets at priority 8.

Liath.WW

@MrHorizontal:

Perhaps this touches on the issues I was having, which led me to give up on 2.0 for the time being. Has any of this been addressed, or are there plans to address it before or around 2.0, or is it slated for 2.x? The only reason I've given up on 2.0 is traffic shaping: considering the shaper rules worked perfectly in 1.2.3, I'd have thought they would in 2.0 as well.

I've configured a *BSD machine for this purpose by hand before (Google helped lots), and it worked really nicely, but having a GUI to address any issues quickly and efficiently is what got me to try m0n0wall and then pfSense, and I do not look forward to going back to mucking around in console text editors.
I could edit the pf.conf rules, and did so fairly easily once I got the knack of it, and I knew exactly what was going on in my configuration because I wrote every line of it. Reading the pfSense rules loses me, because I have no idea what goes where when someone (or something) else writes out the file. And I can't figure out how to configure the file manually... or I would :D

I'd love to see the 2.0 shaper working and the rules making more sense. I'm sure the shaper itself does work, but the GUI is so mangled that configuring it requires some knowledge of how pfSense translates the GUI into the real pf rules, which is why certain people seem to have zero problems with it while others can't get it to work right at all.

If I could work backwards from the generated pf rules, maybe I could figure out how the GUI options work, have one of those 'aha!' moments and be golden.
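One way to work backwards is to pull the shaper-related lines out of the ruleset that pfSense actually generated. The Python helper below splits a pf ruleset's text into queue definitions (lines starting with `altq on` or `queue`) and the filter rules that assign traffic to queues. It assumes you can get the generated ruleset as text; on the pfSense installs I know of it is written to `/tmp/rules.debug`, but treat that path as an assumption. `shaper_view` is a hypothetical helper.

```python
import re
from typing import List, Tuple

# Lines that define the shaper: root schedulers and queue declarations
DEFINITION = re.compile(r"^\s*(altq\s+on|queue)\b")
# Filter rules that assign traffic to queues, e.g. "... queue (qDefault, qACK)"
ASSIGNMENT = re.compile(r"\bqueue\b")


def shaper_view(ruleset: str) -> Tuple[List[str], List[str]]:
    """Split a pf ruleset's text into queue definitions and the rules that
    use them, so the generated shaper config can be read on its own."""
    definitions, assignments = [], []
    for line in ruleset.splitlines():
        if line.lstrip().startswith("#"):
            continue  # skip comments
        if DEFINITION.match(line):
            definitions.append(line.strip())
        elif ASSIGNMENT.search(line):
            assignments.append(line.strip())
    return definitions, assignments
```

For example, passing it the contents of the generated rules file returns two lists that can be read side by side with the GUI settings.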

Also, just to note: I know I sound really b****y at times, and I want the devs to know that I'm trying not to be, as I do appreciate all the work that goes into pfSense, and for free at that, which is a really strong point in pfSense's favor for me. But traffic shaping is a huge part of pf itself, as MrHorizontal states in the OP, and it's why I chose pfSense to begin with. So when the rules in the 2.0 GUI don't work as intuitively as in 1.2.3, I get frustrated, because 2.0 has so very many improvements in so many areas, yet one of the core parts of pf seems to have fallen by the wayside.

dszp

I don't have a solution, and I agree that pfSense 2.0 traffic shaping is more complicated and so far doesn't always work (for me; possibly user error). However, I wouldn't assume that a beta works, for one, or is documented, for two, and I would say that "because it worked in 1.2.3 it should work in 2" is a completely invalid statement. The shaper was basically rewritten from scratch to be more flexible. By definition, that means what worked in 1.2.3 won't necessarily work in 2, and that there are likely bugs to work out and new documentation to write, which I'm waiting on myself :-) But the shaper's all new, so it's all to be expected, IMNSHO :-)

Liath.WW

@David:

However, I wouldn't assume that a beta works, for one, or is documented, for two, and I would say that "because it worked in 1.2.3 it should work in 2" is a completely invalid statement.

Actually, it is a valid statement. While I understand that BETA software is buggy, this thread is meant to point out that there are some serious deficiencies with the shaper and/or webGUI that need to be addressed, especially before pfSense goes to RC or release. One of the best attributes of pfSense is PF, and its ability to shape traffic without murdering performance. I used to run a Linux firewall before I learned the awesomeness that is BSD and PF + ALTQ.

For the sake of testing, it's frustrating. I don't know if it is PF itself not acting quite right (less likely), or if there are issues with the GUI (likely) or with the code that translates the "WYSIWYG" of the GUI into actual PF rules (maybe, but probably not; Ermal is really good).

Actually, I'm not sure if Ermal coded the traffic shaper part of pfSense, but since he wrote quite a bit on the subject and seems to know his stuff when it comes to that aspect, I'm making a bet that he did :P

sullrich

We are going to review the shaper again in the coming weeks. Unfortunately it has not received the attention it requires, due to a number of other lingering issues, but we'll get there soon.

PS: if anyone wants to help speed up the review/fixes, you can put portal.pfsense.org time towards it to make it a priority.

Liath.WW

Ahh okay. That helps a bit. I wish there were another way to get some more priority, something like the bounty system, for aspects of pfSense "Base" that people could contribute to.

Most people don't exactly have half a grand lying around, but those of us with 10-50 bucks might be able to 'raise interest' in the areas they need. I think I discussed this in a thread about bounties and such before.

[Edit] I guess a plain and simple donation might work, considering I've been using pfSense for about… I think 3-4 years now. :P Dunno if there is a comment field in the donation form.

Anyhow, once you all start looking into the traffic shaper, I'll be more than happy to test and provide feedback on the GUI aspect of it. It might be a month or so before I get back online and get a chance to, though, as I'm moving!