Torrents kill DNS lookup?



  • Hi,

    I have pfSense (latest) running as a VM on an ESXi server with two NICs (WAN and LAN). The WAN is an ADSL2+ connection on a new TP-Link modem.

    I also have two Linux VMs and one Windows Server VM running on the same box.

    The problem I am having is that whenever a torrent is being downloaded, you suddenly can't browse the internet from any computer on the LAN. I also use a reverse proxy for Transmission (torrents), and sometimes I can't even reach the web server externally. If I'm connected to the server over RDP from, say, work, the RDP connection stays up even when the internet stops working.

    What's strange is that I can still run Google searches and results come up. But if I try to go to a domain name like www.facebook.com it just doesn't go anywhere, saying "resolving proxy" or "connection timeout".

    I ran Fiddler and noticed that when the problem is occurring, i.e. when a torrent is being downloaded, the DNS lookups go from 0-2ms to 12000ms. Below is an example of Fiddler running on the Windows server while a torrent is being downloaded on the Linux server (Ubuntu running Transmission):

    Request Count:  1
    Bytes Sent:      228 (headers:228; body:0)
    Bytes Received:  207 (headers:207; body:0)

    ACTUAL PERFORMANCE
    ------------
    ClientConnected: 09:40:54.434
    ClientBeginRequest: 09:40:54.437
    GotRequestHeaders: 09:40:54.437
    ClientDoneRequest: 09:40:54.437
    Determine Gateway: 0ms
    DNS Lookup: 12003ms
    TCP/IP Connect: 0ms
    HTTPS Handshake: 0ms
    ServerConnected: 00:00:00.000
    FiddlerBeginRequest: 00:00:00.000
    ServerGotRequest: 00:00:00.000
    ServerBeginResponse: 00:00:00.000
    GotResponseHeaders: 00:00:00.000
    ServerDoneResponse: 00:00:00.000
    ClientBeginResponse: 09:41:06.474
    ClientDoneResponse: 09:41:06.474

    Overall Elapsed: 0:00:12.036

    RESPONSE BYTES (by Content-Type)

    ~headers~: 207

    ESTIMATED WORLDWIDE PERFORMANCE

    The following are VERY rough estimates of download times when hitting servers based in Seattle.

    US West Coast (Modem - 6KB/sec)
    RTT: 0.10s
    Elapsed: 0.10s

    Japan / Northern Europe (Modem)
    RTT: 0.15s
    Elapsed: 0.15s

    China (Modem)
    RTT: 0.45s
    Elapsed: 0.45s

    US West Coast (DSL - 30KB/sec)
    RTT: 0.10s
    Elapsed: 0.10s

    Japan / Northern Europe (DSL)
    RTT: 0.15s
    Elapsed: 0.15s

    China (DSL)
    RTT: 0.45s
    Elapsed: 0.45s

    And here's one when the torrent is turned off:

    ACTUAL PERFORMANCE

    ClientConnected: 09:58:13.435
    ClientBeginRequest: 09:58:13.511
    GotRequestHeaders: 09:58:13.511
    ClientDoneRequest: 09:58:13.511
    Determine Gateway: 0ms
    DNS Lookup: 34ms
    TCP/IP Connect: 174ms
    HTTPS Handshake: 0ms
    ServerConnected: 09:58:13.721
    FiddlerBeginRequest: 09:58:13.721
    ServerGotRequest: 09:58:13.721
    ServerBeginResponse: 09:58:13.903
    GotResponseHeaders: 09:58:13.904
    ServerDoneResponse: 09:58:13.904
    ClientBeginResponse: 09:58:13.904
    ClientDoneResponse: 09:58:13.904

    Overall Elapsed: 0:00:00.393

    RESPONSE BYTES (by Content-Type)

    ~headers~: 1,247
    text/html: 340

    ESTIMATED WORLDWIDE PERFORMANCE

    The following are VERY rough estimates of download times when hitting servers based in Seattle.

    US West Coast (Modem - 6KB/sec)
    RTT: 0.10s
    Elapsed: 0.10s

    Japan / Northern Europe (Modem)
    RTT: 0.15s
    Elapsed: 0.15s

    China (Modem)
    RTT: 0.45s
    Elapsed: 0.45s

    US West Coast (DSL - 30KB/sec)
    RTT: 0.10s
    Elapsed: 0.10s

    Japan / Northern Europe (DSL)
    RTT: 0.15s
    Elapsed: 0.15s

    China (DSL)
    RTT: 0.45s
    Elapsed: 0.45s


    Learn more about HTTP performance at http://fiddler2.com/r/?HTTPPERF

    So I'm really at a loss to explain this.

    pfSense has two CPU cores and 2GB of RAM, and it is not being hammered by the torrents. I tried changing my DNS from Google to my ISP, but that made virtually zero difference; the same problem still applies.

    The only package I'm running is the reverse proxy. I did have load balancing on, but turning it off changed nothing, so it wasn't the cause of the problem. I have limited Transmission to a small number of connections and low bandwidth, so it is not maxing out my WAN.

    What could this be? Any help would be appreciated.
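
    A rough way to reproduce the Fiddler "DNS Lookup" figure without Fiddler is to time a name lookup directly on a LAN client. This is only a minimal Python sketch: `time_lookup` is a made-up helper name, it goes through the operating system's resolver (so a locally cached name may come back faster than a fresh one), and the `resolver` parameter exists only so the function can be tried without network access.

```python
import socket
import time


def time_lookup(hostname, resolver=socket.getaddrinfo):
    """Return (elapsed_ms, ok) for a single name lookup.

    By default this uses the OS resolver via socket.getaddrinfo, so the
    result reflects whatever DNS server the client is configured to use.
    """
    start = time.perf_counter()
    try:
        resolver(hostname, 80)
        ok = True
    except OSError:
        # socket.gaierror (lookup failure) is a subclass of OSError
        ok = False
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return elapsed_ms, ok
```

    Calling `time_lookup("www.facebook.com")` once while the torrent is running and once while it is stopped should show whether the lookup itself is what takes the 12 seconds.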



  • Sounds to me like state-exhaustion.
    Especially with torrents.
    What is the number of states when this occurs?
    Try to increase the maximum, since you seem to have plenty of RAM.


  • Netgate Administrator

    +1 It does sound like state exhaustion, except that:

    @andyroo54:

    I have limited transmission to a small number of connections and low bandwidth so it is not maxing out my WAN.

    How many states exactly? Is that limit working correctly? Check the Status: RRD Graphs: system states.

    It certainly looks like you've hit some resource limit somewhere.

    Steve



  • Hi guys,

    Well, the states certainly increase when the torrent is downloading. The count goes to 1.4k, which I'm assuming is 1400? Check out the graph here: I started the torrent around 7:45 and you can see the increase, and I turned it off just before 8pm. While it was on, internet access on all LAN-side clients was basically nonexistent: just "resolving", or it would hang for ages and then load a bit of text.

    In system>advanced it says my default for "Firewall Maximum States" is 202000.

    Should I increase this? I don't seem to be hitting this.
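
    To put those two numbers side by side, here is a small Python sketch of the arithmetic. The 1400 and 202000 figures come from this post; the figure of roughly 1 KB of RAM per state is an assumption (a commonly quoted rule of thumb), and `state_table_usage` is just an illustrative name.

```python
def state_table_usage(current_states, max_states, bytes_per_state=1024):
    """Return (fraction_used, approx_ram_mib_if_full).

    bytes_per_state is an assumed rule of thumb (about 1 KB per state),
    not a value measured on this system.
    """
    fraction = current_states / max_states
    ram_mib = max_states * bytes_per_state / (1024 * 1024)
    return fraction, ram_mib


fraction, ram_mib = state_table_usage(1400, 202000)
print(f"{fraction:.2%} of the state table in use")
print(f"about {ram_mib:.0f} MiB of RAM if the table were completely full")
```

    On these numbers the table is under 1% full, which supports the impression that the state limit itself is not being hit.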

    In Transmission the bandwidth is limited to 500kB/s down and 5kB/s up (I have a 10 megabit ADSL2 connection).

    For peers I set max peers per torrent to 60 and max peers overall to 240. I was only downloading one torrent though.
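
    For reference, the units here are easy to mix up (megabits for the line, kilobytes for Transmission), so this is a quick Python sketch of the conversion. It assumes decimal units (1 kB = 1000 bytes) and ignores ADSL/PPP overhead; the function name is made up for illustration.

```python
def mbit_to_kbyte_per_sec(mbit_per_sec):
    """Convert a line rate in Mbit/s to kB/s (decimal units, no overhead)."""
    return mbit_per_sec * 1_000_000 / 8 / 1000


line_kbyte = mbit_to_kbyte_per_sec(10)   # nominal 10 Mbit/s line
download_cap = 500                       # Transmission limit in kB/s
print(f"line is about {line_kbyte:.0f} kB/s")
print(f"download cap uses {download_cap / line_kbyte:.0%} of it")
```

    So the 500kB/s download limit is well under half of the nominal line rate.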

    Oddly enough, when I download with uTorrent on my local machine it seems OK: bandwidth is reduced but browsing still works fine. So I think it must be some problem, maybe because the Ubuntu Transmission VM is running on the same ESXi box as the pfSense VM?


  • Netgate Administrator

    Is your 10Mb ADSL connection actually 10Mb or is that max rate given by the ISP?

    @andyroo54:

    Oddly enough, when I download with uTorrent on my local machine it seems OK: bandwidth is reduced but browsing still works fine. So I think it must be some problem, maybe because the Ubuntu Transmission VM is running on the same ESXi box as the pfSense VM?

    This seems like a big clue. I would now look at some state limit within the virtual networking of ESXi. Not something I'm particularly familiar with though.

    Steve



  • @stephenw10:

    Is your 10Mb ADSL connection actually 10Mb or is that max rate given by the ISP?

    @andyroo54:

    Oddly enough, when I download with uTorrent on my local machine it seems OK: bandwidth is reduced but browsing still works fine. So I think it must be some problem, maybe because the Ubuntu Transmission VM is running on the same ESXi box as the pfSense VM?

    This seems like a big clue. I would now look at some state limit within the virtual networking of ESXi. Not something I'm particularly familiar with though.

    Steve

    Actually I get about 12 Mbps on speed tests.

    Anyway, it was pfSense causing the problem. I was getting about 100kbps on torrents and couldn't browse at all. I took pfSense out of the equation, leaving just my cheap TP-Link modem/router, and now I'm downloading at 1.3MB/s and still browsing beautifully.

    So... I have no idea. It's either a configuration problem or some kind of problem with running pfSense on an AMD ESXi box. I don't know. It's a shame really. Clearly it isn't my little modem anyway.


  • LAYER 8 Global Moderator

    "the DNS lookups go from 0-2ms to 12000ms"

    And did you tell your torrent client not to do DNS? Or point the client at something outside for DNS, other than pfSense's built-in DNS forwarder?

    How many DNS servers are you forwarding to in pfSense? Did you change to sequential or leave the default? pfSense out of the box will send any DNS query that is not cached to all the DNS servers listed at the same time. This could become a sort of small DNS amplification attack, if you ask me. If you have some client asking for thousands of PTR requests for the IPs in the swarm, and you're then sending each one to, say, 4 DNS servers on the outside, you've just amplified your number of queries, using up your upload pipe, etc.
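
    The fan-out is simple arithmetic, sketched below in Python. Everything except the "4 DNS servers" is a made-up illustration: the lookup rate, the 80 bytes per query and the function name are assumptions, and sequential mode is modelled as one upstream query per lookup (i.e. the first server answers).

```python
def upstream_query_load(lookups_per_sec, forwarders, bytes_per_query=80,
                        parallel=True):
    """Estimate outbound DNS queries/s and upload kB/s for uncached lookups.

    parallel=True models sending every uncached query to all forwarders at
    once; parallel=False models one upstream query per lookup.
    """
    fanout = forwarders if parallel else 1
    queries_per_sec = lookups_per_sec * fanout
    kbyte_per_sec = queries_per_sec * bytes_per_query / 1000
    return queries_per_sec, kbyte_per_sec


print(upstream_query_load(50, 4))                  # all forwarders at once
print(upstream_query_load(50, 4, parallel=False))  # sequential
```

    With these illustrative inputs, the parallel case sends four times as many queries upstream as the sequential one, which is the amplification being described.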



  • I had stuff here that was causing similar issues a couple of years ago, with pfsense 2.0.x.  The advice above matches what I found in the end. Some more things to try:

    1. Check the system RRD graphs, especially quality. A big issue for me was that dropped packets rose from 0.2% to 35-40% under heavy load, if the config didn't allow enough resources.  Worth checking if that's part of your issue.
    2. If you do see lots of dropped packets, it could also be worth considering whether the ISP/Telco/Cable connection is reliable under such load, too, or if you're being disconnected or having ISP issues of any kind when the going gets heavy. (For example in some countries, a series of connection issues - for whatever reason - can lead to changes and reductions in your ISP line tuning, leading to additional pressure on your pipe under load.)
    3. Allow a lot of state space and tweak a few other options, if you haven't already: I use firewall optimisation="conservative", firewall max table entries 100 million (yes, really!), and max states 300,000. Your system has comparable resources, so these kinds of levels should work on it.
    4. Consider moving DNS onto the router. I'm using Unbound (package) for LAN lookups, which means lookups are held locally and cached: they don't need to be looked up again after a restart, they aren't subject to the client machine's own DNS caching policy or limits, and as a "top-down" resolver it probably won't hammer the same lower-level DNS server all the time. It also allows a lot of DNS flexibility if I want it (I also use it to block a bunch of hosts by domain name, which IP-based blocklists can't do, and to set a "minimum DNS cache time" policy, which can help with short or zero cache DNS entries that can require a new lookup every time they are used). This might be worth trying if your usage impacts/amplifies DNS or makes very heavy demands on it. Either way, I would separately try to identify whether DNS issues are a cause of the problem or a consequence of it: is DNS slowed or failing because of high state change/throughput/resource usage, or are high DNS lookup latency/failure rates causing failed resolution and broken traffic? One tip: if you do use Unbound or another DNS caching package on the router in this way, make sure that the router itself uses ISP-assigned or fixed DNS servers for its own purposes. Otherwise, when you reinstall Unbound and restore config.xml, the router won't have a way to get DNS itself in order to re-install its configured DNS package :)
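
    To illustrate what a "minimum DNS cache time" policy does, here is a toy Python sketch. It is not how Unbound is implemented (in Unbound this is a configuration setting, `cache-min-ttl`); the class name, the injected `clock` and the example addresses are all made up for illustration.

```python
import time


class MinTtlCache:
    """Toy DNS cache that never keeps an answer for less than min_ttl seconds."""

    def __init__(self, min_ttl, clock=time.monotonic):
        self.min_ttl = min_ttl
        self.clock = clock          # injectable so the cache can be tested
        self._entries = {}          # name -> (address, expiry time)

    def put(self, name, address, ttl):
        # Short or zero TTLs are raised to the minimum before storing.
        effective_ttl = max(ttl, self.min_ttl)
        self._entries[name] = (address, self.clock() + effective_ttl)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None
        address, expires_at = entry
        if self.clock() >= expires_at:
            del self._entries[name]  # expired: caller must look it up again
            return None
        return address
```

    With `min_ttl=60`, a record published with a TTL of 0 is still answered from the cache for a minute instead of triggering a new upstream lookup on every use.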

    Hope these help in addition to the input above!



  • @stilez:

    I had stuff here that was causing similar issues a couple of years ago, with pfsense 2.0.x.  The advice above matches what I found in the end. Some more things to try:

    1. Check the system RRD graphs, especially quality. A big issue for me was that dropped packets rose from 0.2% to 35-40% under heavy load, if the config didn't allow enough resources.  Worth checking if that's part of your issue.

    I got frustrated with this and ended up turning the ESXi box off (and pfSense along with it). I set it up about a month ago because I had an assignment for uni where I needed to build a test domain environment.

    Anyway, I got pfSense running again with all clients using it. I still had the torrenting issue. But I noticed the RAM usage was high, even though I had given it, I think, 4GB of RAM. I decided to turn RRD graphs off.

    Problem solved! For whatever reason, the RRD graphs were killing browsing for my clients, as well as killing the reverse proxy (Squid would just stop, and the service would NOT restart).

    Hopefully this might help people in the future!

