Torrents kill DNS lookup?
-
Hi,
I have pfSense (latest) running as a VM on an ESXi server with two NICs (WAN and LAN). The WAN is an ADSL2+ connection on a new TP-Link modem.
I also have two Linux VMs and one Windows Server VM running on the same box.
The problem I am having is that whenever a torrent is being downloaded, browsing the internet suddenly stops working on every computer on the LAN. I also use a reverse proxy for Transmission (torrents), and sometimes I can't even reach the webserver from outside. If I'm connected to the server over RDP from, say, work, the RDP connection keeps working even when the internet stops working.
What's strange is that I can still run Google searches and the results come up. But if I try to go to a domain name like www.facebook.com, it just doesn't go anywhere, saying "resolving proxy" or "connection timeout".
I ran Fiddler and noticed that when the problem is occurring, i.e. when a torrent is being downloaded, the DNS lookups go from 0-2ms to 12000ms. Below is an example of Fiddler running on the Windows server while a torrent is being downloaded on the Linux server (Ubuntu running Transmission):
Request Count: 1
Bytes Sent: 228 (headers:228; body:0)
Bytes Received: 207 (headers:207; body:0)
ACTUAL PERFORMANCE
--------------
ClientConnected: 09:40:54.434
ClientBeginRequest: 09:40:54.437
GotRequestHeaders: 09:40:54.437
ClientDoneRequest: 09:40:54.437
Determine Gateway: 0ms
DNS Lookup: 12003ms
TCP/IP Connect: 0ms
HTTPS Handshake: 0ms
ServerConnected: 00:00:00.000
FiddlerBeginRequest: 00:00:00.000
ServerGotRequest: 00:00:00.000
ServerBeginResponse: 00:00:00.000
GotResponseHeaders: 00:00:00.000
ServerDoneResponse: 00:00:00.000
ClientBeginResponse: 09:41:06.474
ClientDoneResponse: 09:41:06.474
Overall Elapsed: 0:00:12.036
RESPONSE BYTES (by Content-Type)
~headers~: 207
ESTIMATED WORLDWIDE PERFORMANCE
The following are VERY rough estimates of download times when hitting servers based in Seattle.
US West Coast (Modem - 6KB/sec)
RTT: 0.10s
Elapsed: 0.10s
Japan / Northern Europe (Modem)
RTT: 0.15s
Elapsed: 0.15s
China (Modem)
RTT: 0.45s
Elapsed: 0.45s
US West Coast (DSL - 30KB/sec)
RTT: 0.10s
Elapsed: 0.10s
Japan / Northern Europe (DSL)
RTT: 0.15s
Elapsed: 0.15s
China (DSL)
RTT: 0.45s
Elapsed: 0.45s
And here's one when the torrent is turned off:
ACTUAL PERFORMANCE
ClientConnected: 09:58:13.435
ClientBeginRequest: 09:58:13.511
GotRequestHeaders: 09:58:13.511
ClientDoneRequest: 09:58:13.511
Determine Gateway: 0ms
DNS Lookup: 34ms
TCP/IP Connect: 174ms
HTTPS Handshake: 0ms
ServerConnected: 09:58:13.721
FiddlerBeginRequest: 09:58:13.721
ServerGotRequest: 09:58:13.721
ServerBeginResponse: 09:58:13.903
GotResponseHeaders: 09:58:13.904
ServerDoneResponse: 09:58:13.904
ClientBeginResponse: 09:58:13.904
ClientDoneResponse: 09:58:13.904
Overall Elapsed: 0:00:00.393
RESPONSE BYTES (by Content-Type)
~headers~: 1,247
text/html: 340
ESTIMATED WORLDWIDE PERFORMANCE
The following are VERY rough estimates of download times when hitting servers based in Seattle.
US West Coast (Modem - 6KB/sec)
RTT: 0.10s
Elapsed: 0.10s
Japan / Northern Europe (Modem)
RTT: 0.15s
Elapsed: 0.15s
China (Modem)
RTT: 0.45s
Elapsed: 0.45s
US West Coast (DSL - 30KB/sec)
RTT: 0.10s
Elapsed: 0.10s
Japan / Northern Europe (DSL)
RTT: 0.15s
Elapsed: 0.15s
China (DSL)
RTT: 0.45s
Elapsed: 0.45s
Learn more about HTTP performance at http://fiddler2.com/r/?HTTPPERF
So I'm really at a loss to explain this.
pfSense has two CPU cores and 2GB of RAM, and it is not being hammered by the torrents. I tried changing my DNS from Google to my ISP's, but that made virtually no difference; the same problem still applies.
The only package I'm running is the reverse proxy. I did have load balancing on, but turning it off didn't fix the problem. I have limited Transmission to a small number of connections and low bandwidth, so it is not maxing out my WAN.
What could this be? Any help would be appreciated.
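One way to cross-check the Fiddler numbers is to time name resolution directly on a LAN client, once with the torrent running and once without. Below is a minimal Python sketch, assuming lookups go through the OS resolver via `socket.getaddrinfo`. The helper name `time_lookup` is made up for illustration, and client-side caching can make repeat lookups look faster than they really are.

```python
import socket
import time


def time_lookup(hostname, resolver=socket.getaddrinfo):
    """Return (elapsed_ms, ok) for a single name lookup.

    `resolver` is injectable so the timing logic can be exercised
    without touching the network; by default the OS resolver is used.
    """
    start = time.perf_counter()
    try:
        resolver(hostname, 443)
        ok = True
    except OSError:  # socket.gaierror is a subclass of OSError
        ok = False
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return elapsed_ms, ok
```

Calling `time_lookup("www.facebook.com")` a few times in each situation should show whether the jump from milliseconds to seconds appears outside the browser as well. Picking hostnames that have not been visited recently makes a cached answer less likely.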
-
Sounds to me like state-exhaustion.
Especially with torrents.
What is the number of states when this occurs?
Try increasing the maximum, since you seem to have plenty of RAM.
-
+1 It does sound like state exhaustion, except that:
I have limited transmission to a small number of connections and low bandwidth so it is not maxing out my WAN.
How many states exactly? Is that limit working correctly? Check the Status: RRD Graphs: system states.
It certainly looks like you've hit some resource limit somewhere.
Steve
-
Hi guys,
Well, the states certainly increase when the torrent is downloading. The count goes to 1.4k, which I'm assuming is 1400? Check out the graph here: I started the torrent around 7:45 and you can see the increase, and I turned it off just before 8pm. While it was on, internet access on all LAN-side clients was basically nonexistent: pages would just sit at "resolving", or hang for ages and then load a bit of text.
In System > Advanced it says my default for "Firewall Maximum States" is 202000.
Should I increase this? I don't seem to be hitting this.
On Transmission the bandwidth is limited to 500kB/s down and 5kB/s up (I have a 10 megabit ADSL2+ connection).
For peers, I set max peers per torrent to 60 and max peers overall to 240. I was only downloading one torrent though.
Oddly enough, when I download with uTorrent on my local machine it seems OK: bandwidth is reduced but browsing still works fine. So I think the problem may be that the Ubuntu Transmission VM is running on the same ESXi box as the pfSense VM?
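To put the state numbers in proportion, here is a quick Python calculation using only the figures quoted in this post (about 1400 states observed, a maximum of 202000). The function name `state_table_usage` is just for illustration.

```python
def state_table_usage(current_states, max_states):
    """Return the fraction of the firewall state table in use."""
    if max_states <= 0:
        raise ValueError("max_states must be positive")
    return current_states / max_states


# Figures from this thread: ~1.4k states on the RRD graph,
# default maximum of 202000 from System > Advanced.
usage = state_table_usage(1400, 202000)
print(f"{usage:.2%} of the state table in use")  # 0.69% of the state table in use
```

At well under 1% of the table, the pfSense state limit itself does not look like the resource being exhausted, at least on these numbers.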
-
Is your 10Mb ADSL connection actually 10Mb, or is that the max rate given by the ISP?
Oddly enough when I download on utorrent on my local machine it seems OK, bandwidth is reduced but browsing still works fine. So I think it must be some problem maybe because the ubuntu transmission VM is running on the same ESXI box as the PFSense VM?
This seems like a big clue. I would now look at some state limit within the virtual networking of ESXi. Not something I'm particularly familiar with though.
Steve
-
Actually I get about 12Mbps on speed tests.
Anyway, it was pfSense causing the problem. I was getting about 100kbps on torrents and couldn't browse at all. I took pfSense out of the equation, leaving just my cheap TP-Link modem/router, and now I'm downloading at 1.3MB/s and still browsing beautifully.
So I have no idea. It's either a configuration problem or some kind of problem with running pfSense on an AMD ESXi box. It's a shame really. Clearly it isn't my little modem, anyway.
-
"the DNS lookups go from 0-2ms to 12000ms"
And did you tell your torrent client not to do DNS? Or point the client at something outside for DNS, rather than pfSense's built-in DNS forwarder?
How many DNS servers are you forwarding to in pfSense? Did you change to sequential or leave the default? Out of the box, pfSense sends any DNS query that is not cached to all the listed DNS servers at the same time. This could become a sort of small DNS amplification attack, if you ask me. If some client is asking for thousands of PTR lookups for the IPs in the swarm, and you're sending each one to, say, 4 DNS servers on the outside, you've just multiplied your number of queries, using up your upload pipe, etc.
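The multiplication is easy to sketch in Python. This is a deliberately simple model: it assumes every uncached lookup goes to every forwarder in parallel mode and to only the first one in sequential mode, ignoring retries and failover. The numbers are hypothetical and `upstream_queries` is an illustrative name.

```python
def upstream_queries(uncached_lookups, forwarders, sequential=False):
    """Estimate queries leaving the WAN for a batch of uncached lookups.

    Simplified model: parallel mode asks every forwarder for every
    lookup; sequential mode asks only the first forwarder.
    """
    if uncached_lookups < 0 or forwarders < 1:
        raise ValueError("need lookups >= 0 and at least one forwarder")
    return uncached_lookups * (1 if sequential else forwarders)


# Hypothetical numbers: 2000 PTR lookups for swarm IPs, 4 forwarders.
parallel = upstream_queries(2000, 4)
one_at_a_time = upstream_queries(2000, 4, sequential=True)
print(parallel, one_at_a_time)  # 8000 2000
```

On a 5kB/s upload cap even small queries add up, so the factor of 4 here could matter more than it first appears.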
-
I had a setup here with similar issues a couple of years ago, on pfSense 2.0.x. The advice above matches what I found in the end. Some more things to try:
- Check the system RRD graphs, especially quality. A big issue for me was that dropped packets rose from 0.2% to 35-40% under heavy load when the config didn't allow enough resources. Worth checking if that's part of your issue.
- If you do see lots of dropped packets, it could also be worth considering whether the ISP/telco/cable connection is reliable under such load, or whether you're being disconnected or having ISP issues of any kind when the going gets heavy. (For example, in some countries a series of connection issues, for whatever reason, can lead to changes and reductions in your ISP line tuning, putting additional pressure on your pipe under load.)
- Allow a lot of state space and tweak a few other options, if you haven't already: I use firewall optimisation="conservative", firewall max table entries 100 million (yes, really!), and max states 300,000. Your system has comparable resources, so these kinds of levels should work on it.
- Consider moving DNS onto the router. I'm using Unbound (package) for LAN lookups. That means lookups are held locally and cached, they don't need to be looked up again after a restart, and they aren't subject to the client machine's own DNS caching policy or limits. As a "top-down" resolver it probably won't hammer the same lower-level DNS server all the time. It also allows a lot of DNS flexibility if I want it: I use it to block a bunch of hosts by domain name, which IP-based blocklists can't do, and to set a "minimum DNS cache time" policy, which helps with short or zero cache time DNS entries that would otherwise need a new lookup every time they are used. This might be worth trying if your usage impacts or amplifies DNS, or makes very heavy demands on it. Either way, I would separately try to identify whether DNS issues are a cause or a consequence: is DNS slowed or failing because of high state change, throughput or resource use, or are high DNS latency and failure rates causing failed resolution and broken traffic? One tip: if you do use Unbound or another DNS caching package on the router in this way, make sure the router itself uses ISP-assigned or fixed DNS servers for its own purposes. Otherwise, when you reinstall Unbound and restore config.xml, the router won't have a way to get DNS itself in order to re-install its configured DNS package :)
Hope these help in addition to the input above!
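For the "cause or consequence" question, one rough approach is to compare a DNS lookup time against a TCP connect to an IP literal, which needs no DNS query. A minimal Python sketch follows. The helper names, the 1000 ms "slow" threshold and the wording of the verdicts are all my own assumptions, not anything pfSense provides.

```python
import socket
import time


def time_connect(ip, port, timeout=5.0, connector=socket.create_connection):
    """Time one TCP connect to an IP literal; return (elapsed_ms, ok).

    `connector` is injectable so the logic can be tested offline.
    """
    start = time.perf_counter()
    try:
        conn = connector((ip, port), timeout)
        conn.close()
        ok = True
    except OSError:  # covers timeouts and refused connections
        ok = False
    return (time.perf_counter() - start) * 1000.0, ok


def classify(dns_ms, dns_ok, tcp_ms, tcp_ok, slow_ms=1000.0):
    """Very rough reading of one DNS probe plus one direct-IP probe."""
    dns_bad = (not dns_ok) or dns_ms >= slow_ms
    tcp_bad = (not tcp_ok) or tcp_ms >= slow_ms
    if dns_bad and tcp_bad:
        return "both slow: DNS is probably a symptom of a wider problem"
    if dns_bad:
        return "only DNS slow: look at the resolver/forwarder path"
    if tcp_bad:
        return "only direct-IP slow: DNS is not the bottleneck"
    return "both fine"


# Figures from the Fiddler traces earlier in the thread, used as input.
print(classify(12003, True, 174, True))
```

Fed the 12003 ms lookup from the first trace and a connect time like the 174 ms from the second, this reports that only DNS is slow. A single pair of samples proves little, so it is worth repeating the probes several times under load.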
-
I had a setup here with similar issues a couple of years ago, on pfSense 2.0.x. The advice above matches what I found in the end. Some more things to try:
- Check the system RRD graphs, especially quality. A big issue for me was that dropped packets rose from 0.2% to 35-40% under heavy load when the config didn't allow enough resources. Worth checking if that's part of your issue.
I got frustrated with this and ended up turning the ESXi box off (and pfSense along with it). I had set it up about a month ago because I had a uni assignment where I needed to build a test domain environment.
Anyway, I got pfSense running again with all clients going through it. I still had the torrenting issue, but I noticed the RAM usage was high, even though I had given it, I think, 4GB of RAM. I decided to turn RRD graphs off.
Problem solved! For whatever reason, the RRD graphs were killing browsing for clients, as well as killing the reverse proxy (Squid would just stop, and the service would NOT restart).
Hopefully this helps people in the future!