Multi-WAN traffic shaper: packet loss



  • In multi-WAN configuration, with several gateway groups each consisting of several tiers, the traffic shaper causes ~30% packet loss on the LAN interface. The loss does not occur when the LAN root queue is disabled.

    My system is pretty simple:

    – LAN, WAN, OPT1, ... interfaces all have static IP addresses.

    – DHCP server enabled on the LAN interface.

    – Virtual machine with E1000 (em0, em1, ...) NICs hosted on VMware ESXi 4.

    This was observed on the Sat Mar 13 22:12:42 EST snapshot.



  • You can increase the queue lengths!
    It really depends on the speeds you use, but increasing the queue lengths will reduce your queue drops.

    Trying a higher HZ setting would probably help too.

    This is how the traffic shaper works; it is not a bug/problem.
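
    To put rough numbers on the queue-length trade-off: a longer queue drops fewer packets, but a full queue also takes longer to drain. The sketch below is only back-of-the-envelope arithmetic, not a pfSense tool. It assumes 1500-byte packets and a 2 Mbit/s link, and uses 50 packets as the starting point because that is, as far as I know, the pf default qlimit. The function name is made up for illustration.

```python
def queue_drain_time_ms(qlimit_packets, packet_bytes, link_bits_per_s):
    """Milliseconds needed to drain a completely full queue at the link rate."""
    total_bits = qlimit_packets * packet_bytes * 8
    return total_bits * 1000 / link_bits_per_s


# Assumed values: 1500-byte packets on a 2 Mbit/s link.
for qlimit in (50, 500):
    ms = queue_drain_time_ms(qlimit, 1500, 2_000_000)
    print(f"qlimit={qlimit}: full queue drains in {ms:.0f} ms")
```

    So a bigger queue trades drops for latency: ten times the queue length means up to ten times the worst-case queueing delay on a saturated link.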



  • It is a problem, as pfSense warns me about packet loss on the Status: Gateways page because of dead interface detection. The loss was confirmed by pinging from a local PC to pfSense's LAN interface.

    The WAN link is 2 Mb/s; all OPTx links are >10 Mb/s. At the time the loss was measured, the throughput on the LAN interface was ~0.8 Mb/s out and ~0.1 Mb/s in, i.e. the network was almost idle.



  • There is a setting in FreeBSD to limit the number of ICMP packets.
    There is even a bug open at redmine.pfsense.org about this.

    I was never able to reproduce this, but it seems it might affect your case.



  • I also experience strange packet loss in the WAN tests. I see ICMP requests leave the interface and arrive at the test host, and the test host replies, but the answer does not arrive at the pfSense interface.

    I placed a tap between the cable modem and the pfSense box (ALIX board), and that confirms that the ICMP reply does not come out of the cable box and into pfSense. pfSense concludes (and so it should) that the link is down.

    I don't know why the replies don't always make it back to my pfsense box. ADSL and cable modem have the same problems :(



  • @TuxTiger:

    I don't know why the replies don't always make it back to my pfsense box. ADSL and cable modem have the same problems :(

    After pinging from pfSense to other hosts... suddenly the replies do get back! It must be some strange quirk in my cable and ADSL providers' networks.
    I do think we should be able to add more test hosts to each gateway and to slow down the ping frequency.

    Vyatta's solution with 1...n test hosts per gateway, where you can select which test method to use per test host (icmp, udp, tcp, ttl), is great.



  • @dusan
    Actually, now that I remember, a fix for ALTQ with em(4) was committed lately.
    It will be available in pfSense as soon as the snapshots reach the date of the fix.



  • @ermal:

    @dusan
    Actually, now that I remember, a fix for ALTQ with em(4) was committed lately.
    It will be available in pfSense as soon as the snapshots reach the date of the fix.

    Great news. I hope the fix also solves the problem of incorrect policy-based routing, as it appears to be a NIC-specific problem too – nobody has reproduced it.

    I'll turn on traffic shaper and report back soon.



  • Mar 26 01:15:32 EDT 2010 snapshot with traffic shaper turned on: it is better now.

    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
    Reply from 192.168.0.74: bytes=32 time=1ms TTL=62
    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
    Reply from 192.168.0.74: bytes=32 time=393ms TTL=62
    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
    Reply from 192.168.0.74: bytes=32 time=276ms TTL=62
    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
    Reply from 192.168.0.74: bytes=32 time=480ms TTL=62
    Reply from 192.168.0.74: bytes=32 time=539ms TTL=62
    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62

    Ping statistics for 192.168.0.74:
        Packets: Sent = 1156, Received = 1118, Lost = 38 (3% loss),
    Approximate round trip times in milli-seconds:
        Minimum = 0ms, Maximum = 1006ms, Average = 64ms

    The "red" warnings about packet loss have gone. But there are still many logged apinger ALARMs about link ** delay **, and what's more, pf reloads much more often. It reloaded every few minutes before. Now it reloads almost every few seconds, so no packet counter can reach more than a few thousand packets.



  • Well, basically, if you do not classify ICMP into a high-priority queue it might do that.
    As for the reloads, I have not checked yet.



  • Using the traffic shaper, I always assign ICMP to the qOthersHigh queue. My qOthersHigh uses HFSC at 40% bandwidth and priority 4.

    Maybe I spoke too soon last weekend. Now, after a week of testing, I can say that it is not better. The problem remains.

    Pinging 192.168.0.74 with 32 bytes of data:

    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
    Request timed out.
    Reply from 192.168.0.74: bytes=32 time=962ms TTL=62
    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
    Request timed out.
    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
    Request timed out.
    Request timed out.
    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
    Request timed out.
    Reply from 192.168.0.74: bytes=32 time=569ms TTL=62
    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
    Request timed out.
    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
    Request timed out.
    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
    Request timed out.
    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
    Request timed out.
    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
    Reply from 192.168.0.74: bytes=32 time=260ms TTL=62
    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
    Request timed out.
    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
    Request timed out.
    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
    Request timed out.
    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
    Reply from 192.168.0.74: bytes=32 time=711ms TTL=62
    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
    Reply from 192.168.0.74: bytes=32 time<1ms TTL=62

    Ping statistics for 192.168.0.74:
        Packets: Sent = 61, Received = 49, Lost = 12 (19% loss),
    Approximate round trip times in milli-seconds:
        Minimum = 0ms, Maximum = 962ms, Average = 51ms
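
    For anyone who wants to compare runs like the ones above without counting lines by hand, here is a minimal Python sketch that summarizes Windows-style ping output. It is my own helper, not part of pfSense or apinger. It assumes the exact "Reply from ..." and "Request timed out." line formats shown in this thread, and it counts "time<1ms" as 0 ms, which is an assumption based on the summary reporting Minimum = 0ms.

```python
import re

# Matches lines such as: Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
REPLY_RE = re.compile(r"Reply from \S+ bytes=\d+ time([<=])(\d+)ms")


def summarize_ping(text):
    """Count replies and timeouts in Windows ping output and compute basic stats."""
    rtts = []
    lost = 0
    for raw in text.splitlines():
        line = raw.strip()
        match = REPLY_RE.match(line)
        if match:
            # Assumption: "time<1ms" is treated as 0 ms.
            rtts.append(0 if match.group(1) == "<" else int(match.group(2)))
        elif line == "Request timed out.":
            lost += 1
    sent = len(rtts) + lost
    return {
        "sent": sent,
        "received": len(rtts),
        "lost": lost,
        "loss_pct": 100.0 * lost / sent if sent else 0.0,
        "avg_ms": sum(rtts) / len(rtts) if rtts else None,
        "max_ms": max(rtts) if rtts else None,
    }


sample = """\
Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
Request timed out.
Reply from 192.168.0.74: bytes=32 time=962ms TTL=62
Reply from 192.168.0.74: bytes=32 time<1ms TTL=62
"""
print(summarize_ping(sample))
```

    Lines that match neither pattern (headers, the statistics footer) are simply ignored, so a whole console capture can be pasted in.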



  • I should add that current traffic on LAN is 0.15 Mb/s in and 0.5 Mb/s out which, for my 10 Mb/s Internet links, is almost nothing.



  • Can you show me your queue configuration and rules?
    You can even send it privately so I can have a look.

    BTW the reloads should be better now.



  • On the Apr 3 snapshot, packets are not lost anymore. The issue was somehow resolved.

    The ping is very slow, however. So are the SSH console and the Web UI. Round-trip delay is now about 900 ms on average. It looks like everything is now going into the qP2P queue, which is my penalty and catch-all queue (it is at the top of my Floating Rules page) and which is the only queue with an upper-limit curve, with bandwidth=1%, m1=0, d=1000ms, m2=50%.
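
    As a rough sanity check on why an upper-limit curve with m1=0 and d=1000ms could line up with a delay near one second, here is a small Python sketch of a two-piece linear service curve: rate m1 for the first d milliseconds, rate m2 afterwards. This is a simplified model, not the actual ALTQ/HFSC scheduler, and it is not a confirmed diagnosis. The 10 Mbit/s interface rate (so m2=50% becomes 5 Mbit/s) and the 74-byte frame size for a 32-byte ping are assumptions, and the function names are made up.

```python
import math


def service_curve_bits(t_ms, m1_bps, d_ms, m2_bps):
    """Cumulative bits allowed by time t_ms under a two-piece linear curve."""
    if t_ms <= d_ms:
        return m1_bps * t_ms / 1000.0
    return m1_bps * d_ms / 1000.0 + m2_bps * (t_ms - d_ms) / 1000.0


def first_time_allowed_ms(bits, m1_bps, d_ms, m2_bps):
    """Earliest time (ms) at which `bits` fit under the curve."""
    if bits <= 0:
        return 0.0
    first_segment_bits = m1_bps * d_ms / 1000.0
    if m1_bps > 0 and bits <= first_segment_bits:
        return bits * 1000.0 / m1_bps
    if m2_bps <= 0:
        return math.inf
    return d_ms + (bits - first_segment_bits) * 1000.0 / m2_bps


# Assumed numbers: 74-byte frame, m1=0, d=1000 ms, m2 = 50% of 10 Mbit/s.
ping_bits = 74 * 8
delay = first_time_allowed_ms(ping_bits, 0, 1000, 5_000_000)
print(f"first packet allowed after about {delay:.1f} ms")
```

    In this model nothing may be sent during the first d milliseconds when m1 is 0, so a packet arriving in an otherwise idle queue would wait roughly one second before it is released.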

    I'll send you my configs later when I get access to the local console. Right now it is a real pain to work with it remotely.

    I'll test the reloads later, as pftop is inaccessible in the current snapshot. Thanks for your consideration.

    Reply from 192.168.0.74: bytes=32 time=997ms TTL=62
    Reply from 192.168.0.74: bytes=32 time=1000ms TTL=62
    Reply from 192.168.0.74: bytes=32 time=1000ms TTL=62
    Reply from 192.168.0.74: bytes=32 time=1003ms TTL=62
    Reply from 192.168.0.74: bytes=32 time=1000ms TTL=62
    Reply from 192.168.0.74: bytes=32 time=1005ms TTL=62
    Reply from 192.168.0.74: bytes=32 time=1002ms TTL=62
    Reply from 192.168.0.74: bytes=32 time=1003ms TTL=62
    Reply from 192.168.0.74: bytes=32 time=1000ms TTL=62
    Reply from 192.168.0.74: bytes=32 time=1004ms TTL=62
    Reply from 192.168.0.74: bytes=32 time=1003ms TTL=62
    Reply from 192.168.0.74: bytes=32 time=1001ms TTL=62
    Reply from 192.168.0.74: bytes=32 time=1004ms TTL=62
    Reply from 192.168.0.74: bytes=32 time=1001ms TTL=62
    Reply from 192.168.0.74: bytes=32 time=1003ms TTL=62
    Reply from 192.168.0.74: bytes=32 time=1009ms TTL=62
    Reply from 192.168.0.74: bytes=32 time=1004ms TTL=62
    Reply from 192.168.0.74: bytes=32 time=1000ms TTL=62
    Reply from 192.168.0.74: bytes=32 time=1003ms TTL=62
    Reply from 192.168.0.74: bytes=32 time=1000ms TTL=62
    Reply from 192.168.0.74: bytes=32 time=1000ms TTL=62
    Reply from 192.168.0.74: bytes=32 time=1000ms TTL=62
    Reply from 192.168.0.74: bytes=32 time=1003ms TTL=62
    Reply from 192.168.0.74: bytes=32 time=1001ms TTL=62
    Reply from 192.168.0.74: bytes=32 time=1002ms TTL=62
    Reply from 192.168.0.74: bytes=32 time=1001ms TTL=62
    Reply from 192.168.0.74: bytes=32 time=1000ms TTL=62
    Reply from 192.168.0.74: bytes=32 time=1001ms TTL=62
    Reply from 192.168.0.74: bytes=32 time=1000ms TTL=62
    Reply from 192.168.0.74: bytes=32 time=1002ms TTL=62
    Reply from 192.168.0.74: bytes=32 time=1002ms TTL=62
    Reply from 192.168.0.74: bytes=32 time=1001ms TTL=62
    Reply from 192.168.0.74: bytes=32 time=1001ms TTL=62
    Reply from 192.168.0.74: bytes=32 time=1000ms TTL=62
    Reply from 192.168.0.74: bytes=32 time=1000ms TTL=62
    Reply from 192.168.0.74: bytes=32 time=1006ms TTL=62
    Reply from 192.168.0.74: bytes=32 time=1001ms TTL=62
    Reply from 192.168.0.74: bytes=32 time=1000ms TTL=62
    Reply from 192.168.0.74: bytes=32 time=1000ms TTL=62
    Reply from 192.168.0.74: bytes=32 time=1000ms TTL=62
    Reply from 192.168.0.74: bytes=32 time=1002ms TTL=62
    Reply from 192.168.0.74: bytes=32 time=1001ms TTL=62

    Ping statistics for 192.168.0.74:
       Packets: Sent = 32840, Received = 32814, Lost = 26 (0% loss),
    Approximate round trip times in milli-seconds:
       Minimum = 0ms, Maximum = 1027ms, Average = 873ms

