Calculating the required bandwidth for ACK queues for asymetric link

• Hi,

First of all I would say thank you to the pfSense developers for this great product.

And I would request for comments on my calculation below. Any comment will be very appretiated.

Thank you and best regards.

Dusan

–------------------------------------------
Motivation

I tried to setup Traffic Shaper for an asymmetric Internet link (uplink = 256 kb/s, downlink = 512 kb/s). The Traffic Shaper Wizard offer the same value of 10% bandwidth for both qWANack and qLANack queues.

This would mean that an ACK queue's bandwidth for a network interface be directly related (proportional) to the total bandwidth of that interface. For me, this doesn't seem reasonable. Imo, as an ACK queue is queuing ACK packets for traffics in the opposite direction, its bandwith should be somehow related to that of the opposite interface. Not very satisfied with the default bandwidth by Traffic Shaper Wizard, I decided to calculate the bandwidths myself.

Formulas

I assume that ACK queues are queuing only ACK packets.

Let A be bandwidth(qWANroot). Let B be bandwidth(qLANroot).
Let X be bandwidth(qWANack). Let Y be bandwidth(qLANack).

The bandwidth of uplink ACKs should be directly related to that of other traffic's downlink, and vice versa:

(1)  X / (B - Y) = avg. packet size(qWANack) / avg. packet size(other incoming traffic)
(2)  Y / (A - X) = avg. packet size(qLANack) / avg. packet size(other outgoing traffic)

Average packet sizes are not known a priori, but it appears a good practice to use the same value, say, 1/R, for the right side of both the equations. R = 9 seems to be a good choice. As we'll see below, one should choose R = 9 to be consistent with the 10% default bandwidth set by the pfSense' Traffic Shaper Wizard in case of symetric (up = down) links. Thus

(1') X / (B - Y) = 1/R
(2') Y / (A - X) = 1/R

The solution of these equations is

(3)  X = (BR - A) / (R^2 - 1)
(4)  Y = (A
R - B) / (R^2 - 1)

We prefer expressing things in ratios, so let K = B/A and the solution be re-written as

(3') X = A * (R*K - 1) / (R^2 - 1)
(4') Y = B * (R/K - 1) / (R^2 - 1)

Examples

For all examples, R = 9, thus R^2-1 = 9^2-1 = 80.

Ex.1. For K = 1 (symetric link):
X = A * (9*1-1)/80 = A * 0.1
Y = B * (9/1-1)/80 = B * 0.1

So, the default bandwidth 10% offered by Traffic Shaper Wizard would be good for symmetric (up = down) links.

Ex. 2. For K = 2 (my "motivation" case)
X = A * (9*2-1)/80 = A * 0.21
Y = B * (9/2-1)/80 = B * 0.044

So, I would increase bandwidth of qWANack to 21% and I might reduce qLANack to 5% or even 4%.

Ex. 3. For K = 4.
X = A * (9*4-1)/80 = A * 0.44
Y = B * (9/4-1)/80 = B * 0.017

i.e. one should set qWANack to 44%, qLANack 2%.

X = A * (9*8-1)/80 = A * 0.89
Y = B * (9/4-1)/80 = B * 0.0016

One should set qWANack to 89%, qLANack 1%.

Discussion

The equations above only recommend bandwidth values, which are only used as m2, the second slope in a HFSC service curve, namely the linkshare curve. For better latency, one should set some enough high value to m1, the first slope, for a service curve (typically the realtime curve) as well.

• I don't quite understand where the equations came from or the variables defined in them.  However, feel free to come up with a patch, or add this information to http://wiki.pfsense.com/wikka.php?wakka=HFSCBandwidthShapingNotes

–Bill

• Well, R represents the ratio of amounts of service that should be allocated to all other traffics and ACK traffic in the same duration (say, 1s).

Whether it is ratio of packet sizes (as shown in equations (1) and (2)) is not important. The point is the bandwidth 10% set by the Traffic Shaper Wizard, which corresponds to R = 9. I believed it is a good choice in the case of symmetric link, and tried to use it as a known parameter in the general case. ;)

• Well, R represents the ratio of amounts of service that should be allocated to all other traffics and ACK traffic in the same duration (say, 1s).

Whether it is ratio of packet sizes (as shown in equations (1) and (2)) is not important. The point is the bandwidth 10% set by the Traffic Shaper Wizard, which corresponds to R = 9. I believed it is a good choice in the case of symmetric link, and tried to use it as a known parameter in the general case. ;)

Ahh, see the 10% value is just a value chosen almost purely at random.  It works…I agree, it may not be optimal, but it's a value that seems to work for now.  If you can come up with a better method, by all means, I'm listening.  HFSC is enough of a black art that I certainly don't claim to fully understand it.

--Bill

• Based on statistical data found via Google (confirmed by pftop on my own network traffic) I've re-calculated the maximum bandwidth of qWANack and qLANack. (I'm totally unfamiliar with traffic engineering and the likes, so please correct me if I'm wrong.) As shown below, the calculation supports our feeling that 10% is good in case of symmetric link, and shows that my figures in post #1 for asymmetric links were too aggressive.

For the purpose of calculation we assume a simple model, with qWANroot consisting of only qWANack and qWANdef, and qLANroot consisting of only qLANack and qLANdef. Thus outgoing and incomming packets other than ACKs are all queued in qWANdef and qLANdef, respectively.

A, B, X, Y are defined as above. Let C = bandwidth(qWANdef), D = bandwith(qLANdef). Thus A = C+X and B = D+Y. In post #1, calculation was based on the system of equations X/D = Y/C = 1/9, which was too simplified:

• It establishes some relation between the amount of data in certain direction and ACKnowledges in the opposite direction, but not yet the relation between the amount of incomming and outgoing data.

Now we fix both the problems by assuming every (type of) traffic be characterized by a pattern, which is a 4-dimensional vector (c,d,x,y), where c, d, x, y is the amount of data passing through qWANdef, qLANdef, qWANack, qLANack, respectively, in the same duration (say, 1s), of that particular traffic.

For example, the traffic pattern for Web surfing is (0.11, 1, 0.075, 0.075), saying that in order to receive 1 kb of useful Web data, 0.11 kb must be transmitted, furthermore 0.075 kb of ACKs must be transmitted and other 0.075 kb of ACKs must be received in 1s.

The actual activity of i-th traffic is then obtained simply by multiplying that vector with a non-negative variable. For example, v*(0.11, 1, 0.075, 0.075) shows how much bandwidth are required for each queue in order to surf Web at v kb/s.

The (new) calculation is based on six types of traffic, corresponding to variables v0 through v5:

v0 * (1, 0.40, 0.01, 0.04) is p2p upload activity
v2 * (1, 0.1, 0.015, 0.05) is other bulk upload activity
v4 * (1, 0.11, 0.075, 0.075) is web server activity
v5 * (0.11, 1, 0.075, 0.075) is web surf activity

and constrained by:

sum(v_i * a_i) <= A
sum(v_i * b_i) <= B

where a_i = c_i + x_i and b_i = d_i + y_i and i running in 0,…,5.

v_0 * c_0 <= A * 0.8
v_1 * d_1 <= B * 0.8

We are now ready to estimate X and Y, the required bandwidth of qWANack and qLANack, in a worst-case fashion:

X = max(sum(v_i * x_i))
Y = max(sum(v_i * y_i))

One can find the max's using any linear programming solver. I used M\$ Excel :-).

Examples

Recall that K = B/A. If (say) A=128 kb/s and B=1024 kb/s, then K=8.

Ex 1. For K=1, namely A=1, B=1 [mb/s], X/A = Y/B = 11.90% at (v4 = v5 = 0.794, others v_i = 0, here and below, unshown v_i are zeros), both links saturated.

Ex 2. For K=2, namely A=1, B=2 [mb/s], X/A = 17.86% and Y/B = 8.93% at (v4 = 0.629, v5 = 1.752), both links saturated.

Ex 3. For K=4, namely A=1, B=4 [mb/s], X/A = 29.76% and Y/B = 7.44% at (v4 = 0.299 ; v5 = 3.67), both links saturated.

Ex 4. For K=8, namely A=1, B=8 [mb/s], maximums of sum(v_ix_i) and sum(v_iy_i) are not reached at the same time:
X/A = 43.58% while sum(v_iy_i) = 3.94% at (v1 = 4.016 ; v5 = 3.669), both links saturated.
Y/B = 5.07% while sum(v_i
x_i) = 40.54% at (v5 = 5.405), uplink is saturated (at A100%), while downlink is loaded at B72.64%.

Notes

The traffic patterns (c_i, d_i, x_i, y_i) used here are illustrative. Nevertheless, the examples of calculation suggest that 10% would suffice in the case of symmetric and nearly-symmetric link, and in the case of very asymmetric link, one might consider setting some larger bandwidth for qWANack (although not too large as suggested in my post#1 above).

Game and VoIP clients do not figure here, as they typically use UDP which – I think -- does not use ACK queues. In a network with game and VoIP continuously backlogged, then ACK queues require -- in percentage -- yet lower bandwidth.

This is a worst-case analysis. But it is possible to estimate the bandwidths more closely to the average case by further constraining with upper limits that shouldn't be much higher than the desired linkshare.

• Maybe you can try to verify your formulas with these " shot in the dark unscientific" testing on ACK queues and an asymetric 16000kbit/s down 800 kbit/s up line:

When I used 10% acks up and down I was not able to get more than about 8 mbit down when up was saturated.

I changed the qwan acks (were showing drops with the 10% setting) to 40% and my downloads raised to 12 mbit with saturated upload.

I changed qwan acks to 60% and my downloads went up and down between 13 and 15 mbit/s with saturated upload.

Some more observations:

Pings with no load are about 7 ms on my line, with full upload I have about 10-20 ms, with full up and down and 60% ack setting (only qwan acks btw) they are between 30 and 110 ms (with even some lost pings judging from the rrd's).

• Thank you for your data.

Based on the "400 kbit/s ack only traffic to download with 16 mbit/s", I added another pattern, "FTP client", which is (epsilon, 1, 0.025, epsilon/5), where epsilon is somewhat arbitrarily chosen (I let it be 2.5E-4).

I've tried using them as a next example of calculation:

K=20. A=0.8, B=16 [mb/s].
"web server" and "web surf" are excluded (i.e. upper limited by 0).

Let v7 be the "activity" variable for FTP client (unconstrained).

Given that, WAN acks reached its maximum = 60,59% (while LAN acks reached 0.34% which isn't its maximum) at (v1 = 1.6; v3=2.483; v7=11.863), both links saturated.

As noted, my traffic patterns are only illustrative ("bulk download" and "bulk download" was based on audio streaming), to be accurate one would collect his/her own FTP download traffic patterns, say, from the values shown by pftop in its "Queues" screen in "P/S" (packets/second) and "B/S" (bytes/second) columns for qWANack, qLANack, qWANothers, qLANothers, qFTPup, qFTPdown when running (almost) solely FTP at full load.

Where qWANothers (qLANothers) is the sum of all uplink (downlink) queues except qWANack (qLANack), qFTPup and qFTPdown are the queues containing FTP traffic.

• If you want to calculate a trafficshapingsetup for my connection and give me advice how to configure it or even sending me the trafficshaperpart of the config.xml (you can backup only this part from your pfSense) I can test it for you and report back how well these settings work. If we get some good results we can start integrating your formulars into the wizard  ;D

• As you can see, even with not-so-cool traffic patterns, my first attempt for an open formula (post#1) fails. For now, I can only offer closed formulas which must be solved by linear programming solvers.

Linear programming based on traffic patterns cannot answer the "What should I do" question, so it cannot recommend any optimal traffic shaping setup. It can only answer the "What if" questions. For example:

• What maximum % qWANack would reach if I run p2p limited by 80% bandwidth, surf Web and check email at most 50% bandwidth and download FTP unlimited?

• Isn't that maximum too overestimated if I want to exploit my downlink at 90%?

Note that if we want to predict the behavior with respect to linkshare setup, we must constrain not only the variables v_i but also ratios between some of them. Then the problem goes beyond the scope of linear programming. However, one can still analyze it with the same "traffic pattern" approach, using a non-linear solver (MS Excel, for example).

Can we still develop an (empirical) open formula? Yes we can, but much works must be done before any second attempt:

1. Collect more realistic traffic patterns.

2. Select several of them as representatives.

3. Analyze and estimate the maximum bandwidth of qWANack (maybe qLANack too) for K=2, 4, 8, 16, 20, 24 and 32 (if any).

4. Let people try them out. Listen to feedback.

If no problems are reported, build the formula and (if it's desired,) integrate it to the wizard.

–--------------------
Hoba,

• What about a userland daemon that watches traffic and dynamically alters the HFSC profile as traffic patterns change?

Thanks!

• Adaptive traffic shaping? Would be great. There is a software named QoSbox that change linkshare dynamically. Maybe you'll find it helpful. For more details, refer to http://www.cs.virginia.edu/~mngroup/projects/qosbox/docs.html.

In the context of ACK queues in HFSC, however, I think it's not so hot topic. Because it's almost surely harmless assigning some large value (say, 60%) to qWANack linkshare or realtime curve's m1. If that bandwidth is needed, it's used. Otherwise it is allocated to other traffics.

Just my opinion.

–----

• Sounds great.  You seem to know a lot more about HFSC than we do.  Want to adopt our code?  :)

qWanAck.xls

• the outcome of this topic sounds really interesting and potentially very useful :)
best of luck

Very thanks!!!

Very thanks!!!

• where find cell for LAN ans WAN ack? I am not uderstand this sheet :-(
Thanks!

• A (UP)
B (DOWN)

• Then

• click Tools/Solver…
• click Set Target Cell, click R15C12 (or R16C12), click Max, click Solve
• click Keep Solver Solution, click OK
The required X [kb/s or mb/s] is shown in R15C12. The required X/A [%] is shown in R16C12.

• dusan can you please explain the rationale behind this in a formal way.

I want to integrate this in the shaper wizard and the excel is not easily readble/understandable.

• Well here's an explaination which – I hope -- is more detailed and formal.

All symbols are defined as before but we rather sumarize them here:

A = bandwidth of qWANroot
B = bandwidth of qLANroot
C = bandwidth of qWANdef
D = bandwidth of qLANdef
X = bandwidth of qWANack
Y = bandwidth of qLANack

In this model, we make use of no other queues:

A = C + X
B = D + Y

(To be exact, we know A, B and don't know C, D, X, Y. The analysis shows how much qWANack and qLANack could be actually utilized assuming they may be as large as needed, i.e. unbounded by anything else than A and B, respectively, that's what would be taken as the required value for X and Y and the rest of A and B would simply become C and D, respectively.)

Consider a single, i-th, traffic. The traffic varies in time, and utilizes four queues qWANdef, qLANdef, qWANack and qLANack at the same time. However, assuming that for the traffic, the four queue utilization "amounts" are directly related (rather than independent) by some constant coefficients (c_i,d_i,x_i,y_i), we represent the traffic "activity" as a single variable (v_i) rather than four. For example, for i=5 (Web surf traffic), the queue utilization "amounts" are assumed to be (v5c5, v5d5, v5x5, v5y5) at every time. The vector (c5, d5, x5, y5) = (0.135, 1, 0.0375, 0.0125), which is assumed constant for every network with any ratio of asymmetricity, is called Web surf traffic pattern.

(The assumption is supported by experimental observation. At time t in network N, it was observed that the qWANdef, qLANdef, qWANack, qLANack utilization "amounts" are

0.135, 1, 0.0375, 0.0125 [kb/s], respectively

and at time t' in network N', it was observed that the four queue utilization "amounts" are

135, 1000, 37.5, 12.5 [kb/s], respectively.

The experiments were made under Web surf traffic solely and no others involved, of course.)

The spreadsheed makes use of 8 traffic patterns, indexed by 0 through 7.

For every i in 0…7, let

a_i = c_i + x_i
b_i = d_i + y_i

At every time, the qWANdef utilization is

v0 * c0 + ... + v7 * c7

the qWANack utilization is

v0 * x0 + ... + v7 * x7

the qWANroot (ie. uplink) utilization is the sum of the two above:

(v0 * c0 + ... + v7 * c7) + (v0 * x0 + ... + v7 * x7)
= (v0 * c0 + v0 * x0) + ... + (v7 * c7 + v7 * x7)
= v0 * (c0 + x0) + ... + v7 * (c7 + x7)
= v0 * a0 + ... + v7 * a7

the qLANdef utilization is

v0 * d0 + ... + v7 * d7

the qLANack utilization is

v0 * y0 + ... + v7 * y7

the qLANroot (ie. downlink) utilization is the sum of the two above:

(v0 * d0 + ... + v7 * d7) + (v0 * y0 + ... + v7 * y7)
= (v0 * d0 + v0 * y0) + ... + (v7 * d7 + v7 * y7)
= v0 * (d0 + y0) + ... + v7 * (d7 + y7)
= v0 * b0 + ... + v7 * b7

Now we can construct a system of inequations, each represents a constraint, of 8 unknowns v_0 through v_7.

(C1) v0 * a0 + ... + v7 * a7 <= A

(C2) v0 * b0 + ... + v7 * b7 <= B

All network traffic must be non-negative:

(C3) v_i >= 0

(C4) v0 * c0 <= A * 0.8

(C5) v1 * d1 <= B * 0.8

The bounds like A and B are to be filled as values in row 3 of the Excel spreadsheet. The bounds like A0.8 and B0.8 are pre-filled as formulars in columns 8 and 9 of the sheet.

The MS Excel Solver find a solution of (C1)-(C5) that maximize a user-selected target cell. Note that we are concerned of queues' utilization implied from the solution, not the solution itself. Of particular interest may be the utilization of qWANack, qLANack, uplink and downlink. The spreadsheet includes formulars for them. We can select one of them as the target and observe others as the implied consequence.

As I've said, the Excel is for the purpose of analysis.  It's not suitable and not worth to integrate in the Wizard as such.

• A = interface/tocken bucket bandwidth
c = observed from the samples.
x = observed from the samples.

SO you are saying that the basic equation for a queue is:
A >= c*x (if we have a single queue under A).

and it transforms to
A >= sum(c_i* x_i) (for i queues).
and each queue gets c_i + x_i

If that is right and one wants to write a daemon that start by the assumtion of the constants calculated/observed by your testings and smaples traffic to adjust the i queues accordingly how can one calculate the c_i/x_i to be used later on a new calculation.
Basically can you provide even the calculation for the variables so one can write such a daemon?!

I hope to have understood your explanation. The rationale of integrating this with the wizard is to be coupled with such a daemon to make sense.

• @eri--:

A = interface/tocken bucket bandwidth
c = observed from the samples.
x = observed from the samples.

SO you are saying that the basic equation for a queue is:
A >= c*x (if we have a single queue under A).

I am not sure if we are using the same symbol definition. My equations doesn't say anything about c*x, only c+x.
And given only one [sub-]queue under qWANroot, there exists either c or x, but not both.

@eri--:

and it transforms to
A >= sum(c_i* x_i) (for i queues).
and each queue gets c_i + x_i

Neither i was defined to be the number of queues, or the index of a queue. It was defined to be the index of a traffic (or, to be precise: a type of traffic). We use four [leaf-level] queues and eight [types of] traffic. The 48 matrix is enough to estimate largest qWANack utilization. You can extend it to, say, 14500 if you see any benefit of the extension and have a good traffic analyzer (see below).

@eri--:

If that is right and one wants to write a daemon that start by the assumtion of the constants calculated/observed by your testings and smaples traffic to adjust the i queues accordingly how can one calculate the c_i/x_i to be used later on a new calculation.
Basically can you provide even the calculation for the variables so one can write such a daemon?!

I hope to have understood your explanation. The rationale of integrating this with the wizard is to be coupled with such a daemon to make sense.

I interpret that such a daemon should

1. analyze traffics to determine their patterns, and

2. optimize linkshare setup using the traffic patterns determined.

As for 1), with firewall's (at the transport and/or the application layer) service we can parse every packet and determine the traffic it belongs to. Packet classification would enable statistical analysis. Based on the analysis we can observe patterns, if any exists in any sense. So the problem is solved, in principle.

As for 2), however, one may cast question: what is the objective, the goal, the target, or the criteria of such an optimization?

• Heh i misunderstood you.

So what you're saying is that your is just an approach based on values observed for a specific traffic and specific config?!

I thought it was some generalized schema to achieve this.

For 1) i concur it is easily solvable.
2) is just providing an adaptive shaping to the patterns/classes of traffic the user selects.

think in terms of RSVP which wants for a specific class reserved traffic.
What i wanted to accomplish was just pessimization or optimization of traffic in behalf of the consumer.
Say you want that if the web traffic increases by 10% and at the same time VoIP traffic does the same we choose to serve VoIP better and pessimize the web traffic.

• @eri--:

Heh i misunderstood you.

So what you're saying is that your is just an approach based on values observed for a specific traffic and specific config?!

I thought it was some generalized schema to achieve this.

Play with the spreadsheet, then try other traffic patterns, play with it again and draw your own conclusion.

@eri--:

For 1) i concur it is easily solvable.

Me too. I just said that it is principally solvable.

@eri--:

1. is just providing an adaptive shaping to the patterns/classes of traffic the user selects.

think in terms of RSVP which wants for a specific class reserved traffic.
What i wanted to accomplish was just pessimization or optimization of traffic in behalf of the consumer.
Say you want that if the web traffic increases by 10% and at the same time VoIP traffic does the same we choose to serve VoIP better and pessimize the web traffic.

That's not an optimization. That's a static (and simple) policy. I see no need to adaptively change the linkshare or realtime service curves. Construct static curves preferring VoIP and pfSense will do the rest for you.

Btw, I don't think there could be a smart deamon that knows what user wants. User must express his/her needs, i.e. shaping policy, in terms of service curves. That's user's job, not the deamon's one.

• Yeah but i cannot teach HFSC or CBQ to anybody in the forum and they need some backroung to undertand/control their behaviour. This daemon configurable by the user would at least make it easier for home users to get right and get me some statistical data to generalize the configuration from a wizard.

• Then the wizard/daemon should simulate the policy-making-and-adjusting process of an exprerienced user, right? That seems to be a different approach (as opposed to optimalization). It's more practical and simpler to implement.

• We get rid of the need to define optimalization criteria. (So, the policy need not be optimal in any sense.)

• There is good chance that such a policy works well.

Problem is, no one know exactly how such a policy looks like. So let's discuss it.

Here a policy that works fine for me (as a home user). The basic rationale behind is to apply run-time curves whenever there is VoIP and/or game traffic, and to apply link-share curves otherwise.

A. Root queues

Set qWANroot and qLANroot to 80% up/downlink bandwidth. (Maybe 90-95% for fast [mb/s] links.)

B. Sub-queues

For each up/down link, use only seven predefined sub-queues – p2p, low, default, high, games, voip, ack, and set the following queue priorities (from lowest to highest).

Priority 1. p2p
Priority 2. low
Priority 3. def[ault]
Priority 4. high
Priority 5. game
Priority 6. voip
Priority 7. ack

C. Upper-limit curves

1. Set upper-limit curves for p2p queues only.

2. Suitable upper limits of qWANp2p and qLANp2p are 80% of qWANroot and qLANroot, respectively.

1. The link share (i.e., m1 and m2) of qWANack should be made as large as required (by calculation, excluding VoIP and game activities).

2. The link share of qWANhigh should be made several (say, 2 - 3) times larger than that of qWANdef.

3. The link share of qWANdef should be made several (say, 4 - 6) times larger than that of qWANlow.

4. The link share of qLANlow should be the same as that of qWANp2p.

5. The link shares of qLANack, high, def, low, p2p are set similarly.

E. Realtime curves

1. If VoIP or games also use ACK queues, set qWANack's m1 as large as required (by calculation, excluding everything in qWANp2p, qWANlow, qWANdef, qWANhigh).

2. Set qWANack's m2 = m1.

3. Set qLANack's m1 and m2 similarly.

4. For qWANvoip, given a required number of concurrent calls and average voip packet size, set m1 such that an average packet shall not delay longer than specified. Set qLANvoip's m1 similarly. If the 'given' parameters are not given then, as the default, set the required number of concurrent calls to 1, the average packet size to 150 bytes and the maximum allowable delay to 10 ms (so, qWANvoip's m1 = qLANvoip's m1 = 120 kb/s per call).

5. qWANgame and qLANgame's m1 are set similarly to voip. Note however that the average packet size (and the maximal allowable packet delays) for need not be the same for incoming and outgoing direction. If the 'given' parameters are not given then, as the default, set the required number of concurent local players to 1, the average packet size to 200 bytes in both directions, and the maximum allowable delay to 50 ms in both directions (so, m1 = 32 kb/s per player for both qWANgame and qLANgame). Also note that in some cases, after allocating m1 to qWANack and qWANvoip, taking into account that the sum of real-time service curves must not exceed 80% qWANroot, few percentage of m1 remains and the best we can do is simply to allocate the rest to qWANgame, thus we can only guarantee the game delay as low as possible.

6. [Part of] the rest of m2 is allocated to qWANvoip's m2 and qWANgame's m2 in amounts required for reasonable traffic. Similarly for qLANvoip's m2 and qLANgame's m2. If not further specified, the 'reasonable traffic' means 32 kb/s per concurrent VoIP call plus 32 kb/s (in both directions) per concurrent local game player.

• Thank you for helping in this, 2 minds are always better than 1.

Then the wizard/daemon should simulate the policy-making-and-adjusting process of an exprerienced user, right? That seems to be a different approach (as opposed to optimalization). It's more practical and simpler to implement.

• We get rid of the need to define optimalization criteria. (So, the policy need not be optimal in any sense.)

• There is good chance that such a policy works well.

Problem is, no one know exactly how such a policy looks like. So let's discuss it.

Well just an update, the new wizard uses percentages values for getting input from user. Since there can be multiple links, which the wizard now supports, it is not preferable to ask the user how much he/she wants for each separate link so to me percentages made the most sense. They allow me to apply the correct policy to all the links selected, which might be of different scheduler even.

My comments and point of discussion are below.

Here a policy that works fine for me (as a home user). The basic rationale behind is to apply run-time curves whenever there is VoIP and/or game traffic, and to apply link-share curves otherwise.

A. Root queues

Set qWANroot and qLANroot to 80% up/downlink bandwidth. (Maybe 90-95% for fast [mb/s] links.)

B. Sub-queues

For each up/down link, use only seven predefined sub-queues – p2p, low, default, high, games, voip, ack, and set the following queue priorities (from lowest to highest).

Priority 1. p2p
Priority 2. low
Priority 3. def[ault]
Priority 4. high
Priority 5. game
Priority 6. voip
Priority 7. ack

The number of queues might be less depending on what the user chooses, but the wizard will not create more than this queues if you select all options.
I select the default queue depending if the user has selected p2pcatchall or not. If he does p2p becomes the default queue otherwise default queue will keep all uncategorized traffic and default queue always has better priority than p2p and lowerPriority queue.

Keep in mind we are talking about the modified wizard that is on RELENG_1 tree.

C. Upper-limit curves

1. Set upper-limit curves for p2p queues only.

2. Suitable upper limits of qWANp2p and qLANp2p are 80% of qWANroot and qLANroot, respectively.

2- Do not need to since default already do that. For your information, there is no more a qWANroot/qLANroot since there is one provided by HFSC discipline by default. That's why i do not need to setup those queues and i setup upperlimit of p2p for now to 20- 30% of the link.

1. The link share (i.e., m1 and m2) of qWANack should be made as large as required (by calculation, excluding VoIP and game activities).

2. The link share of qWANhigh should be made several (say, 2 - 3) times larger than that of qWANdef.

3. The link share of qWANdef should be made several (say, 4 - 6) times larger than that of qWANlow.

4. The link share of qLANlow should be the same as that of qWANp2p.

5. The link shares of qLANack, high, def, low, p2p are set similarly.

To me makes more sense playing with delay more than bandwidth since the overall result is the same.
Even more, for HFSC i would really question the need of an ACK queue unless i know the link latency and configure it to not get in between VoIP and Games shaping. Since VoIP is not a consumer of ACK queue and Games are questionable about that, see below.

E. Realtime curves

1. If VoIP or games also use ACK queues, set qWANack's m1 as large as required (by calculation, excluding everything in qWANp2p, qWANlow, qWANdef, qWANhigh).

2. Set qWANack's m2 = m1.

3. Set qLANack's m1 and m2 similarly.

4. For qWANvoip, given a required number of concurrent calls and average voip packet size, set m1 such that an average packet shall not delay longer than specified. Set qLANvoip's m1 similarly. If the 'given' parameters are not given then, as the default, set the required number of concurrent calls to 1, the average packet size to 150 bytes and the maximum allowable delay to 10 ms (so, qWANvoip's m1 = qLANvoip's m1 = 120 kb/s per call).

5. qWANgame and qLANgame's m1 are set similarly to voip. Note however that the average packet size (and the maximal allowable packet delays) for need not be the same for incoming and outgoing direction. If the 'given' parameters are not given then, as the default, set the required number of concurent local players to 1, the average packet size to 200 bytes in both directions, and the maximum allowable delay to 50 ms in both directions (so, m1 = 32 kb/s per player for both qWANgame and qLANgame). Also note that in some cases, after allocating m1 to qWANack and qWANvoip, taking into account that the sum of real-time service curves must not exceed 80% qWANroot, few percentage of m1 remains and the best we can do is simply to allocate the rest to qWANgame, thus we can only guarantee the game delay as low as possible.

6. [Part of] the rest of m2 is allocated to qWANvoip's m2 and qWANgame's m2 in amounts required for reasonable traffic. Similarly for qLANvoip's m2 and qLANgame's m2. If not further specified, the 'reasonable traffic' means 32 kb/s per concurrent VoIP call plus 32 kb/s (in both directions) per concurrent local game player.

1- VoIP will not use ACK i have not seen any phone using TCP for this kind of traffic and i think we know that tcp gives pretty bad latency ofr such traffic.
2 and 3 - i do not see the reason for that, since i need still to be convinced that ACK queue does not get in the tracks of VoIP and Games in HFSC case.
4- Well now i set m1 = 25% of the link  d = 30ms and m2 = 5%-20% depending on the user choices. That should take care of most home setups others might need to tweak it.

5- In the new shaper i use percentages as input from the user which makes me unbound to the speed specified for LAN/WAN or whatever number of links you have in the box. For Games d = 50ms will be OK(i think, but i am not much of a gamer), and HFSC realtime scheduler should overcommit when needed in the bounds of 80% so i think its better to leave it to choose what to do. Even though that 80% upperlimit for realtime is constructed to allow safe overcommit when needed and giving it some more amount where to overcommit safely is better.
For ACK queue for game i don't think is needed since usual home setups only have 2 - max 4 players and the delay can be guaranteed by the real time scheduler. Taking in consideration that most setups are asymmetric the bandwidth is there already and only delay is the culprit.

6- Well i think that most of the user will choose more then needed bandwidth during the wizard so that amount will be distributed accordingly. I think that delay parameter to the linkshare part of Games, OthersHigh and VoIP queue would need more discussion than the remaining bandwidth.

• Well it appears that in essence, my figures agree with yours. Only few details diverge.

• The delay. As far as I know, the maximum allowable delay is specified by m1, not d. (Thus if m1 = m2, then d does not matter at all.) If we require that an average packet of S bytes shall not delay longer than D ms (assuming single voip call/game player), we do that by setting m1 to

m1 = 8 * S / D   [kb/s]

• The unit of m1 and m2 of real-time curves. I prefer specifying m1 and m2 in kb/s rather than in %. That's because the link is asymmetric, the same m1 kb/s in both directions would result in significantly different % of uplink and downlink. For S=150 bytes, m1 = 120 kb/s (per concurrent voip call) guarantees D=10 ms  regardless of the link, while m1= 25% of 1024 kb/s downlink would guarantee D = 4.7 ms, and m1=25% of 256 kb/s uplink would guarantee D = 18.8 ms. Although 4.7 ms and 18.8 ms may be both acceptable, they look like a choice at random rather than an exact delay specification.

• The meaning of the d parameter. I believe that d should be directly related to D and never less than D. For a few N concurrent voip calls (or concurrent game players) and the required maximum delay D ms, the suitable d is

d = (N+2) * D [ms]

One may also set d = (N+1) * D or N*D.

Thus if N = 1 and D = 10 ms (VoIP) then d = 10-30 ms,
and if N = 1 and D = 50 ms (games) then d = 50-150 ms.

• Right, but i can have the problem that the user selects CBQ for downlink and HFSC for uplink and i could not translate the formula above to have some meaning for CBQ.(Personal thought, i could only if i patch PF to let me set the maxbust packet of CBQ :). I will think about handling different schedulers some more and let you know what i choose, in the mean time if you have a proposal apart a different wizard for different cases(which even might consider after all the discussion and your help) i will consider it.
My choices in percentages are forced from the generalization the wizard should give.
Yeah d is propperly what you understand it to be. As i said above, i would think about this better to see what conclusion i will arrive.

• I have nothing against expressing every bandwidth in percentage or against the single-wizard-for-multiple-scheduler. It is nice and it should work. But I think it should offer at least two set of percentages – one for uplink(s), one for downlink(s) -- that may be specified independently. That's not because of the use of different schedulers, that's because of the link asymmetricity.

requirement like qWANack's linkshare m2 = 40 kb/s become 20%,

requirement like qLANack's linkshare m2 = 16 kb/s become 2%,

requirement like qWANvoip's realtime m1 = qLANvoip's realtime m1 = 120 kb/s become 60% and 15%, respectively.

And these percentages can be translated meaningfully (maybe taken as-is) from HFSC to other percentage-aware schedulers as well.

• Than multiple wizards it is.

1- CBQ or HFSC only WAN-LAN wizard simple setup( 1 uplink/ 1 downlink )
2- CBQ or HFSC only multiple links. You specify which are the uplinks which are the downlinks and you enter values for each of them indipendently.
All values can be specified with whatever you prefer (%|Kb|Mb).

3- Mixed schedulers multiple links. Just specify bandwidths in percentage and tweak it after the wizard.

And some other for DMZ setups and such.

This way everybody is happy and we provide a better product.

• Well it appears that in essence, my figures agree with yours. Only few details diverge.

• The delay. As far as I know, the maximum allowable delay is specified by m1, not d. (Thus if m1 = m2, then d does not matter at all.) If we require that an average packet of S bytes shall not delay longer than D ms (assuming single voip call/game player), we do that by setting m1 to

m1 = 8 * S / D  [kb/s]

• The unit of m1 and m2 of real-time curves. I prefer specifying m1 and m2 in kb/s rather than in %. That's because the link is asymmetric, the same m1 kb/s in both directions would result in significantly different % of uplink and downlink. For S=150 bytes, m1 = 120 kb/s (per concurrent voip call) guarantees D=10 ms  regardless of the link, while m1= 25% of 1024 kb/s downlink would guarantee D = 4.7 ms, and m1=25% of 256 kb/s uplink would guarantee D = 18.8 ms. Although 4.7 ms and 18.8 ms may be both acceptable, they look like a choice at random rather than an exact delay specification.

• The meaning of the d parameter. I believe that d should be directly related to D and never less than D. For a few N concurrent voip calls (or concurrent game players) and the required maximum delay D ms, the suitable d is

d = (N+2) * D [ms]

One may also set d = (N+1) * D or N*D.

Thus if N = 1 and D = 10 ms (VoIP) then d = 10-30 ms,
and if N = 1 and D = 50 ms (games) then d = 50-150 ms.

I want to ask you something i do not really found an answer or am totally sure of.

What is upperlimit m1 meaning:
1- is it burst?!
2- is it that for such delay you will not get more than this bandwidth(although this sounds like 1-)?!

• What is upperlimit m1 meaning:
1- is it burst?!
2- is it that for such delay you will not get more than this bandwidth(although this sounds like 1-)?!

To tell the true, I don't know. It's not very well documented.

Assume that the upper-limit curve does what it should do, i.e. as a upper-limitting curve, then

• A packet (as a point plotted in the time - service coordinate system) receives service only if it is plotted on or below the curve.

• The curve moves whenever the linkshare curve moves.

If this is truth, one may think of upperlimit m1 as a minimal delay specification.

• Here is an (empirical) open formula to calculate qWANack and qLANack for home user:

log(X/A) =  0.8 * log(B/A) + log(0.0558)
log(Y/B) = -0.8 * log(B/A) + log(0.0558)

Linear programming (LP) with 12 traffic patterns was used to estimate maximal qWANack and qLANack utilizations. The formula was then built from the estimated figures. It approximates the figures quite well (see table).

Using LP      Using formula
B/A    X/A    Y/B      X/A    Y/B
===  ======  =====    ======  =====
1    5.58%  5.58%    5.58%  5.58%
2    10.39%  3.17%    9.72%  3.20%
3    15.21%  2.37%    13.44%  2.32%
4    20.02%  1.97%    16.92%  1.84%
5    24.83%  1.73%    20.22%  1.54%
6    29.64%  1.57%    23.40%  1.33%
7    34.09%  1.44%    26.47%  1.18%
8    37.34%  1.31%    29.45%  1.06%
9    40.58%  1.21%    32.36%  0.96%
10    43.83%  1.13%    35.21%  0.88%
11    47.08%  1.07%    38.00%  0.82%
12    50.32%  1.01%    40.74%  0.76%
13    53.57%  0.97%    43.43%  0.72%
14    56.82%  0.93%    46.08%  0.68%
15    60.06%  0.89%    48.70%  0.64%
16    63.15%  0.86%    51.28%  0.61%
17    65.77%  0.84%    53.83%  0.58%
18    68.39%  0.82%    56.34%  0.55%
19    70.25%  0.79%    58.84%  0.53%
20    71.64%  0.75%    61.30%  0.51%

Edit 2008-01-30: The table is now correct.

• From the code seems that your conclusion makes the more sense.
The code updates the delay of a linkshare curve, after it has been selected as the winner, to that of the upperlimit only if it will exceed what the upperlimit guarantees. So it makes a guarantee that the next time linkshare takes on it will be in the bounds specified in the upperlimit.

So this makes the most sense of it. It might be used as a burst if careful enough though its usage is for something else.

• Here is an (empirical) open formula to calculate qWANack and qLANack for home user:

log(X/A) =  0.8 * log(B/A) + log(0.0558)
log(Y/B) = -0.8 * log(B/A) + log(0.0558)

Linear programming with 12 traffic patterns was used to estimate maximal qWANack and qLANack bandwidths. The formula was then built from the estimated figures. It approximates the figures quite well (see table).

``````	Using LP		Using formula
B/A 	X/A	  Y/B	  X/A	Y/B
1	5.58%	5.58%	5.58%	5.58%
2	9.62%	3.08%	9.72%	3.20%
3	13.53%	2.21%	13.44%	2.32%
4	17.44%	1.78%	16.92%	1.84%
5	21.35%	1.52%	20.22%	1.54%
6	25.26%	1.35%	23.40%	1.33%
7	29.17%	1.23%	26.47%	1.18%
8	33.08%	1.13%	29.45%	1.06%
9	37.00%	1.06%	32.36%	0.96%
10	40.91%	1.00%	35.21%	0.88%
11	44.82%	0.96%	38.00%	0.82%
12	48.73%	0.92%	40.74%	0.76%
13	52.64%	0.88%	43.43%	0.72%
14	56.55%	0.86%	46.08%	0.68%
15	60.06%	0.83%	48.70%	0.64%
16	63.15%	0.79%	51.28%	0.61%
17	65.77%	0.74%	53.83%	0.58%
18	68.39%	0.70%	56.34%	0.55%
19	70.25%	0.67%	58.84%	0.53%
20	71.64%	0.63%	61.30%	0.51%

``````

Great, i was just dumping ACK queue but with this i have an approximation for it.

Thanks.

• Updated formula:

log(X/A) =  0.773765872 * log(B/A) + log(0.0739086792)
log(Y/B) = -0.773765872 * log(B/A) + log(0.0739086792)

X/A, Y/B by linear programming and by the formula, corrected:

Using LP      Using formula
B/A    X/A    Y/B      X/A    Y/B
===  ======  =====    ======  =====
1    5.58%  5.58%    7.39%  7.39%
2    10.39%  3.17%    12.64%  4.32%
3    15.21%  2.37%    17.29%  3.16%
4    20.02%  1.97%    21.60%  2.53%
5    24.83%  1.73%    25.68%  2.13%
6    29.64%  1.57%    29.57%  1.85%
7    34.09%  1.44%    33.31%  1.64%
8    37.34%  1.31%    36.94%  1.48%
9    40.58%  1.21%    40.46%  1.35%
10    43.83%  1.13%    43.90%  1.24%
11    47.08%  1.07%    47.26%  1.16%
12    50.32%  1.01%    50.55%  1.08%
13    53.57%  0.97%    53.78%  1.02%
14    56.82%  0.93%    56.95%  0.96%
15    60.06%  0.89%    60.08%  0.91%
16    63.15%  0.86%    63.15%  0.86%
17    65.77%  0.84%    66.19%  0.83%
18    68.39%  0.82%    69.18%  0.79%
19    70.25%  0.79%    72.14%  0.76%
20    71.64%  0.75%    75.06%  0.73%

• Can i assume this is right now?!

I am just asking to be sure and i would hate to rerun the calculations again on this :).

• The calculation is now formally exact and the formula approximates it tightly. However, for K = 20, the old formula gives X/A = 61% which corresponds to the largest qWANack I ever hear of (see hoba's post in this thread), while the new formula shows something else: X/A=75%, which is clearly more aggressive and may be over-estimate.

Which formula is better, I don't know. I think the better approach is to use the old as the default, users should apply the new only if they'll experience packet drops.