Weird issues with limiters



  • Hi everyone. I've run into two problems while setting up traffic shaping on my home network, and I'm wondering if anyone else has seen them and whether there's an open bug for them.

    My setup is as follows: pfSense 2.4.4-p2; the machine maintains an L2TP tunnel to my ISP, and all traffic is configured to go through that interface (WAN_L2TP).

    The first problem is this. I configured a floating rule (the only rule configured, everything else is default):
    Action=Pass, Quick=checked, Interface=WAN_L2TP, Direction=out, Address Family=IPv4, Protocol=Any, Gateway=WAN_L2TP_L2TP, In/Out pipes=WanUpQ/WanDownQ. The limiters are plain Tail Drop (FIFO) ones, but I also tried FQ-CoDel limiters with the same result. With this setup everything works as far as I can tell, except... traceroute and mtr. It looks like this:

    $ traceroute -n yahoo.com
    traceroute to yahoo.com (72.30.35.10), 30 hops max, 60 byte packets
     1  192.168.1.3  0.544 ms  0.510 ms  0.485 ms
     2  72.30.35.10  9.542 ms  9.715 ms  9.854 ms
     3  * * *
     4  72.30.35.10  9.645 ms  9.693 ms  9.663 ms
     5  72.30.35.10  66.242 ms  69.697 ms  66.402 ms
     6  72.30.35.10  67.251 ms  67.036 ms  67.506 ms
     7  72.30.35.10  75.363 ms  72.511 ms  86.762 ms
     8  72.30.35.10  76.549 ms  76.740 ms  153.401 ms
     9  72.30.35.10  153.355 ms  161.405 ms  161.197 ms
    10  72.30.35.10  169.345 ms  158.462 ms  158.295 ms
    11  72.30.35.10  166.803 ms  164.302 ms  163.989 ms
    12  72.30.35.10  171.817 ms  176.850 ms  172.274 ms
    13  72.30.35.10  165.496 ms  166.961 ms  181.393 ms
    14  72.30.35.10  172.873 ms  170.853 ms  172.007 ms
    
    
    $ mtr google.com
    Host             Loss%   Snt   Last    Avg   Best   Wrst  StDev
     1. 192.168.1.3   0.0%     4    0.2    0.3    0.2    0.4    0.0
     2. 72.30.35.10   0.0%     4    9.0   10.8    9.0   13.3    1.6
    

    As you can see, traceroute shows the same IP (the target host's IP) for every hop, just with different timings. mtr is broken even worse: it shows only my router and then (I assume) the second hop, labeled with the target's IP.

    If I set the "out" pipe in the rule to "none" and leave only the "in" pipe filled with WanUpQ, the problem disappears. If I configure the limiter on the LAN or WAN interface instead, the problem also disappears.
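    For anyone following along at the shell: pfSense limiters are dummynet objects underneath, so the GUI setup above roughly corresponds to ipfw pipe/sched config. This is a hedged sketch; the pipe numbers, bandwidths, and queue sizes here are illustrative assumptions, not what pfSense actually generated (inspect the real state with `ipfw pipe show` / `ipfw sched show`):

```shell
# Illustrative only: pipe numbers, bandwidths, and queue sizes are assumptions.
ipfw pipe 1 config bw 12Mbit/s queue 50     # WanUpQ   (Tail Drop / FIFO)
ipfw pipe 2 config bw 130Mbit/s queue 50    # WanDownQ (Tail Drop / FIFO)
# The FQ_CODEL variant swaps the FIFO scheduler on a pipe for fq_codel:
ipfw sched 1 config pipe 1 type fq_codel target 5ms interval 100ms
```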

    The second problem is even more mysterious. One of my machines runs an IPsec (strongSwan) tunnel. I tried to configure traffic shaping so that this tunnel gets the lowest priority and doesn't interfere with high-priority interactive traffic. I nearly achieved that, except when the tunnel carries large amounts (above 50 Mbps out of the ~130 Mbps I have in total) of ingress traffic. When that happens, my high-priority traffic starts experiencing packet loss.

    I configured it using PRIQ on outgoing and HFSC on incoming traffic. pfTop shows that no high-priority packets are being dropped in the queues, which I take to mean it's my ISP dropping the ingress ICMP replies. I know none of this is very surprising, considering that IPsec here runs over UDP, and UDP lacks congestion control, but bear with me.

    While experimenting with limiters, I stumbled onto a configuration that removes the packet loss completely: I configured a pair of Tail Drop/FIFO limiters and routed all the IPsec traffic through them. Problem solved, no more packet loss. I then tried to achieve the same effect with other instruments: FQ-CoDel limiters and HFSC with upper limits. Neither solves the problem; only the Tail Drop/FIFO limiter does. So I'm left wondering what is going on here.

    Furthermore (coming back to issue #1): because I want to use an FQ-CoDel limiter but am forced to use a Tail Drop limiter to deal with the UDP problem, I ended up with an awkward setup where one limiter is installed on the LAN interface and the other on WAN_L2TP (I tried installing both on the LAN interface, but it seems that's impossible?), which leads to the traceroute/mtr issue.
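    The intuition for why a plain Tail Drop limiter can stop downstream loss: it behaves like a bounded FIFO drained at a fixed rate, so excess packets are dropped at the router rather than deeper in the ISP's network, and the link behind it never sees more than the configured rate. A toy discrete-time simulation of that behavior (rates and queue depth are made-up numbers, not anything pfSense uses):

```python
from collections import deque

def tail_drop_limiter(arrivals, rate, queue_len):
    """Simulate a FIFO tail-drop limiter in discrete time steps.

    arrivals[t] -- packets arriving at step t
    rate        -- packets dequeued (transmitted) per step
    queue_len   -- FIFO capacity in packets
    Returns (transmitted, dropped, still_queued).
    """
    queue = deque()
    transmitted = dropped = 0
    for pkts in arrivals:
        for _ in range(pkts):
            if len(queue) < queue_len:
                queue.append(1)
            else:
                dropped += 1          # tail drop: queue is full
        for _ in range(min(rate, len(queue))):
            queue.popleft()           # drain at the configured rate
            transmitted += 1
    return transmitted, dropped, len(queue)

# Offer 10 packets/step against a 5 packets/step limiter with an 8-packet FIFO:
# the limiter transmits exactly at its rate and sheds the excess locally.
print(tail_drop_limiter([10] * 10, 5, 8))   # -> (50, 47, 3)
```

    With the limiter's rate set just below the real downstream capacity, everything leaving the bucket fits the link, so the ISP's queue never overflows and the high-priority flows stop being collateral damage.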



  • Opened an issue in the bugtracker: https://redmine.pfsense.org/issues/9263



  • @kirillkh Take a look at the following guide; it should explain the issue you are seeing and show how to work around it - hint: floating rule #1.

    https://forum.netgate.com/post/807490



  • @uptownvagrant Thanks, that's handy. Unfortunately, it doesn't really help me, because the point of traceroute here is to diagnose network issues, e.g. to measure packet loss and lag. To do that, it has to be subjected to the same shaping as the rest of the traffic, so bypassing the limiter makes the traceroute output meaningless as far as debugging traffic shaping is concerned.



  • @kirillkh What are you trying to debug with regard to traffic shaping using ICMP traceroute? The workaround I posted gets you a proper ICMP trace of hops where you didn't have it before.

    I don't use traceroute to measure packet loss - it's not the right tool. If you want to measure the effectiveness of FQ-CoDel, please check out Flent as it's pretty brilliant. I agree that the policy route issue is annoying but the workaround I posted has value.

    It is what it is:
    https://www.netgate.com/docs/pfsense/routing/troubleshooting-traceroute-output.html



  • @uptownvagrant E.g. I want to see whether there is packet loss when I download or upload under certain conditions (full saturation of the bandwidth, many open sockets, whether destination/source prioritization is working correctly, etc.). I haven't tried Flent, but doesn't it also use ICMP TTL as part of its operation?



  • @kirillkh Flent is a Python frontend for netperf, iperf, and irtt, and it has many test options. E.g. the objective of the Flent RRUL test is to use ICMP and UDP to measure RTT while loading the pipe with eight TCP streams (four downloads, four uploads); there are options to change the number of down/up streams, and there are many other tests. The RRUL test determines latency/loss from ping timestamps over ICMP, not from the incrementing-TTL function you are executing with traceroute.

    https://flent.org
    https://github.com/heistp/irtt
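    A typical invocation looks something like this; the netserver hostname is a placeholder, point it at your own netperf server or one of the public ones:

```shell
# 60-second RRUL run against a netperf server (hostname is a placeholder):
flent rrul -H netperf.example.org -l 60 -t "fq_codel-on-wan" -o rrul.png
# Replot an existing data file, e.g. latency/throughput totals:
flent -i rrul-*.flent.gz -p totals -o totals.png
```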