Weird issues with limiters
Hi everyone. I have run into two problems while setting up traffic shaping on my home network, and I am wondering whether anyone else has seen them and whether there is an open bug for them.
My setup is as follows: pfSense 2.4.4_p2. The machine maintains an L2TP tunnel to my ISP, and all traffic is configured to go through that interface (WAN_L2TP).
The first problem is as follows. I configured a floating rule (this is the only rule configured, everything else is default):
Action=Pass, Quick=checked, Interface=WAN_L2TP, Direction=out, Address Family=IPv4, Protocol=Any, Gateway=WAN_L2TP_L2TP, In/Out pipes=WanUpQ/WanDownQ. The limiters are simple Tail Drop/FIFO ones; I also tried FQ_CODEL limiters, and the result is the same. With this setup, everything works as far as I can tell, except for traceroute and mtr. The output looks like this:
$ traceroute -n yahoo.com
traceroute to yahoo.com (184.108.40.206), 30 hops max, 60 byte packets
 1  192.168.1.3  0.544 ms  0.510 ms  0.485 ms
 2  220.127.116.11  9.542 ms  9.715 ms  9.854 ms
 3  * * *
 4  18.104.22.168  9.645 ms  9.693 ms  9.663 ms
 5  22.214.171.124  66.242 ms  69.697 ms  66.402 ms
 6  126.96.36.199  67.251 ms  67.036 ms  67.506 ms
 7  188.8.131.52  75.363 ms  72.511 ms  86.762 ms
 8  184.108.40.206  76.549 ms  76.740 ms  153.401 ms
 9  220.127.116.11  153.355 ms  161.405 ms  161.197 ms
10  18.104.22.168  169.345 ms  158.462 ms  158.295 ms
11  22.214.171.124  166.803 ms  164.302 ms  163.989 ms
12  126.96.36.199  171.817 ms  176.850 ms  172.274 ms
13  188.8.131.52  165.496 ms  166.961 ms  181.393 ms
14  184.108.40.206  172.873 ms  170.853 ms  172.007 ms

$ mtr google.com
Host                 Loss%   Snt   Last   Avg   Best   Wrst   StDev
1. 192.168.1.3        0.0%     4    0.2   0.3    0.2    0.4     0.0
2. 220.127.116.11     0.0%     4    9.0  10.8    9.0   13.3     1.6
As you can see, traceroute shows the same IP (which is the target host's IP) for every hop, but the timings are different. mtr is broken even worse: it only shows my router and then (I guess) the second hop, labeled with the target's IP.
If I set the "out" pipe in the rule to "none" and leave only the "in" pipe set to WanUpQ, the problem disappears. If I configure the limiter on the LAN or WAN interface instead, the problem also disappears.
The second problem is even more mysterious. One of my machines runs an IPsec (strongSwan) tunnel. I tried to configure traffic shaping so that this tunnel gets the lowest priority and does not interfere with high-priority interactive tasks. I nearly achieved that, except when the tunnel carries a large amount of ingress traffic (above 50 Mbps out of the ~130 Mbps I have in total). When that happens, my high-priority traffic starts experiencing packet loss.
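For anyone who wants to check their own output for the same symptom, here is a minimal Python sketch that parses `traceroute -n` text and reports whether every hop past the router carries the target's address. The function names, the regular expression, and the sample addresses (taken from the 203.0.113.0/24 documentation range) are illustrative assumptions, not part of pfSense or traceroute.

```python
import re

# Matches a hop line such as " 2  203.0.113.10  9.542 ms ..." (IPv4 only).
HOP_RE = re.compile(r"^\s*(\d+)\s+(\d{1,3}(?:\.\d{1,3}){3})\s")

def hop_addresses(traceroute_output):
    """Return the responding IPv4 address of each hop; timed-out hops are skipped."""
    hops = []
    for line in traceroute_output.splitlines():
        match = HOP_RE.match(line)
        if match:
            hops.append(match.group(2))
    return hops

def looks_collapsed(traceroute_output, target_ip, lan_hops=1):
    """True if every hop after the first `lan_hops` reports the target's own address."""
    hops = hop_addresses(traceroute_output)[lan_hops:]
    return len(hops) > 1 and all(ip == target_ip for ip in hops)

# Hypothetical output showing the symptom (addresses are placeholders).
SAMPLE = """traceroute to example.com (203.0.113.10), 30 hops max, 60 byte packets
 1  192.168.1.3  0.544 ms  0.510 ms  0.485 ms
 2  203.0.113.10  9.542 ms  9.715 ms  9.854 ms
 3  * * *
 4  203.0.113.10  9.645 ms  9.693 ms  9.663 ms
 5  203.0.113.10  66.242 ms  69.697 ms  66.402 ms
"""

if __name__ == "__main__":
    print(looks_collapsed(SAMPLE, "203.0.113.10"))
```

A healthy trace, where intermediate hops answer from their own addresses, would make `looks_collapsed` return False.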
I configured it using PRIQ on outgoing and HFSC on incoming traffic. I took a look at the pfTop queues, and they show that no high-priority packets are being dropped, which I take to mean that it is my ISP that is dropping the ingress ICMP replies. I know none of this is very surprising, considering that IPsec works over UDP and UDP lacks congestion control, but bear with me.
When experimenting with limiters, I stumbled onto a configuration that removes the dropped-packet problem completely. I configured a pair of Tail Drop/FIFO limiters and routed all the IPsec traffic through them. Problem solved, no more packet loss. I then tried to achieve the same effect with other instruments: FQ-CODEL limiters and HFSC with upper limits. Neither of those solves the problem; only the Tail Drop/FIFO limiter does. This leaves me wondering what is going on here.
Furthermore (and here I come back to issue #1), because I want to use an FQ-CODEL limiter but am forced to use a Tail Drop limiter to deal with the UDP problem, I ended up with an awkward setup where I have to install one of the limiters on the LAN interface and the other on the WAN_L2TP interface (I tried to install both of them on the LAN interface, but it seems this is impossible?), which leads to the traceroute/mtr issue.
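For readers unfamiliar with the term, a Tail Drop/FIFO limiter can be pictured as a bounded first-in, first-out queue drained at a fixed rate: a packet that arrives while the queue is full is simply discarded. The Python sketch below is a toy model of that idea only. It is not how pfSense or dummynet implements limiters, it does not explain the behaviour described above, and the function name and parameters are hypothetical.

```python
from collections import deque

def simulate_tail_drop(arrivals, queue_limit, drain_per_tick):
    """Toy tail-drop FIFO.

    arrivals: number of packets arriving in each tick.
    queue_limit: maximum number of packets the queue may hold.
    drain_per_tick: packets forwarded per tick (the configured rate).
    Returns (sent, dropped, still_queued).
    """
    queue = deque()
    sent = dropped = 0
    next_id = 0
    for count in arrivals:
        # Enqueue arrivals; anything that does not fit is dropped at the tail.
        for _ in range(count):
            if len(queue) < queue_limit:
                queue.append(next_id)
            else:
                dropped += 1
            next_id += 1
        # Forward the oldest packets first, up to the rate limit.
        for _ in range(min(drain_per_tick, len(queue))):
            queue.popleft()
            sent += 1
    return sent, dropped, len(queue)

if __name__ == "__main__":
    # A sender offering 5 packets per tick to a limiter that forwards only 2.
    print(simulate_tail_drop([5, 5, 5], queue_limit=4, drain_per_tick=2))
```

In this model the excess is discarded inside the limiter itself, so the sender's offered rate is clipped before the traffic goes any further.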
Opened an issue in the bugtracker: https://redmine.pfsense.org/issues/9263
@kirillkh Take a look at the following guide, as it should explain the issue you are witnessing and show how to work around it (hint: floating rule #1).
@uptownvagrant That's handy, thanks. But actually it's also unhelpful, because the point of traceroute is to diagnose network issues, e.g. measure packet loss and lag. In order to do that, it should be subjected to the same shaping as the other traffic. So circumventing the limiter actually makes traceroute output meaningless as far as debugging traffic shaping is concerned.
@kirillkh What are you trying to debug with regard to traffic shaping using ICMP traceroute? The workaround I posted gets you a proper ICMP trace of hops where you didn't have it before.
I don't use traceroute to measure packet loss - it's not the right tool. If you want to measure the effectiveness of FQ-CoDel, please check out Flent as it's pretty brilliant. I agree that the policy route issue is annoying but the workaround I posted has value.
It is what it is:
@uptownvagrant E.g. I want to see whether there is packet loss when I download or upload under certain conditions (full saturation of bandwidth, many open sockets, whether destination/source prioritization is working correctly, etc.). I haven't tried Flent, but doesn't it also use ICMP TTL as part of its operation?
@kirillkh Flent is a Python front end for netperf, iperf, and irtt and has many test options. E.g. the objective of the Flent RRUL test is to use ICMP and UDP to measure RTT while loading the pipe with eight TCP streams (four downloads, four uploads). There are options to change the down/up streams, and there are many other tests. The Flent RRUL test uses ping timestamps with ICMP to determine latency/loss, not the incrementing TTL function that you are exercising with traceroute.
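To make the distinction concrete, latency and loss can be derived from ping-style timestamps alone, with no TTL manipulation involved. The Python sketch below shows the arithmetic under the assumption that send and receive times are available per sequence number. It is an illustration only, not Flent's code, and the function name and data layout are hypothetical.

```python
def rtt_and_loss(sent, received):
    """Compute round-trip times and loss from per-probe timestamps.

    sent: dict mapping sequence number -> send time in seconds.
    received: dict mapping sequence number -> reply arrival time in seconds.
    Returns (list of RTTs in milliseconds, loss as a fraction of probes sent).
    """
    rtts = [
        (received[seq] - send_time) * 1000.0
        for seq, send_time in sorted(sent.items())
        if seq in received
    ]
    loss = 1.0 - len(rtts) / len(sent) if sent else 0.0
    return rtts, loss

if __name__ == "__main__":
    # Hypothetical probes: the reply to sequence number 2 never arrived.
    sent = {0: 0.0, 1: 1.0, 2: 2.0, 3: 3.0}
    received = {0: 0.010, 1: 1.012, 3: 3.050}
    rtts, loss = rtt_and_loss(sent, received)
    print([round(r, 1) for r in rtts], loss)
```

Every probe here travels the full path to the target and back, so the numbers describe end-to-end behaviour under load rather than per-hop replies.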