Super long DNS times intermittently
-
Hi all. I've been having intermittent issues with DNS lookups failing or taking 10's of seconds to happen. I was finally able to catch these unbound logs:
May 4 19:52:10 unbound 63124 [63124:0] info: histogram of recursion processing times
May 4 19:52:10 unbound 63124 [63124:0] info: average recursion processing time 98.123002 sec
May 4 19:52:10 unbound 63124 [63124:0] info: server stats for thread 0: requestlist max 123 avg 22.3883 exceeded 0 jostled 0
May 4 19:52:10 unbound 63124 [63124:0] info: server stats for thread 0: 29610 queries, 11626 answers from cache, 17984 recursions, 0 prefetch, 0 rejected by ip ratelimiting
May 4 19:52:10 unbound 63124 [63124:0] info: service stopped (unbound 1.13.1).Not 100% sure what DNS recursion means, but 98 seconds as an average seems super wrong/bad. My other Google search show sub 1 second avg dns recursion times. I've tried having my pfSense SG1000 box as the DNS source vs. pfSense first then going to 8.8.8.8.
I am trying to figure out why I'm seeing these intermittent super long DNS resolutions, hoping someone can give me some advice based on this log. My Google searches haven't found anything useful. I've looked at Unbound logging levels, but the next most detailed one generates 20+ logs per each DNS lookup and it's too much to even look at.
Thanks for your ideas.
-
@mark-dodrill said in Super long DNS times intermittently:
I've tried having my pfSense SG1000 box as the DNS source vs. pfSense first then going to 8.8.8.8.
Use the default resolver DNS settings.
With only one (1 !) exception :
Do NOT check this :
( It's ok check this option when you fully understood the consequences )
I'm pretty sure that pfSense (Netgate) doesn't want you to communicate all your private DNS info to some known (8.8.8.8) or unknown company. The exception might be that you have a contract with these entities.
-
@gertjan Thanks for the thoughts here. I've tried it both with and without this check box being checked, with the same results, unfortunately.
I think I have all the DNS Resolver settings reverted to defaults: Enable DNS Resolver checked, port 53, default SSL/TLS cert, SSL Port 853, All network interfaces, all outgoing network interfaces, DNSSEC checked, all other options unchecked.Also, in System, General Setup, DNS Resolution Behavior. I presume the best option would be use Local DNS, fall back to Remote DNS, since it is default. Is that correct?
Do you happen to know how to dig into exactly which DNS lookups took so long? Maybe I can take that up with my ISP?
-
@mark-dodrill said in Super long DNS times intermittently:
Is that correct?
Yep.
@mark-dodrill said in Super long DNS times intermittently:
Do you happen to know how to dig into exactly which DNS lookups took so long
Get more details.
What part takes so long ?Ask dig again :
dig forum.netgate.com +trace
and you see it talking to one of the root name servers.
Then to of of the TLD servers.
Then to one off the (at least two) domain name servers.
So it get a result for A - the default query.If the "root name servers" was slow, everybody on the net would suffer. These are not just 13 servers somewhere : they are cloned everywhere.
Same thing for the TLD's.The third step : the final query to the domain name servers could take a while, as there are tyically only 2 or 3 of them. And no CDN's.
Example :netgate.com. 3600 IN NS ns2.netgate.com. netgate.com. 3600 IN NS ns1.netgate.com.
-
@gertjan said in Super long DNS times intermittently:
dig forum.netgate.com +trace
Ha, I didn't know "dig" was actually a Linux command. :) I was thinking in the general sense of trying to find out which DNS queries (what exact names) were taking so long. I have looked at the unbound docs I could find and couldn't see anything. Do you know any specifics of how to identify which exact DNS lookups are taking that long?
-
Did you try :
dig forum.netgate.com +trace
??
Or, why not :
dig forum.netgate.com +trace | grep 'Received'
I saw :
;; Received 525 bytes from 127.0.0.1#53(127.0.0.1) in 0 ms ;; Received 1177 bytes from 192.203.230.10#53(e.root-servers.net) in 24 ms ;; Received 691 bytes from 192.55.83.30#53(m.gtld-servers.net) in 36 ms ;; Received 186 bytes from 208.123.73.80#53(ns1.netgate.com) in 140 ms
A more clearer answer isn't possible.
But you have to know it is the answer ;) -
@gertjan
Thanks. I've enabled the periodic histogram of DNS resolution times to see if that helps. -
Update: I increased the DNS cache size from 20k to 50k, and that seems to have resolved the long average recursion time dropped from 100+ seconds to less than 2 seconds. Number of DNS queries dropped from about 40k per min to less than 400 per min. I guess the too small cache size was the key root cause.
-
@mark-dodrill said in Super long DNS times intermittently:
DNS cache size from 20k to 50k
You changed something on this page : Services > DNS Resolver > Advanced Settings ?
There is no "20k" or "50k" setting. It's more like "50 MB".
-
@gertjan On the DNS Resolver, Advanced Settings, I changed:
Prefetch Support = Checked/Enabled
Message Cache Size = 20MB (4MB default)
Outgoing TCP Buffers = 20 (10 default)
Incoming TCP Buffers = 20 (10 default)
Number of Hosts to cache = 100000 (10k default)On the main screen, I added:
statistics-interval: 60
So that I can see how many requests there were in a minute, how many came from cache, how many recursions.It looked good for a while after the changes, but I'm back to super long recursion times again. :(
May 7 08:32:18 unbound 5057 [5057:0] info: [25%]=0.225168 median[50%]=4.35495 [75%]=128.882
May 7 08:32:18 unbound 5057 [5057:0] info: histogram of recursion processing times
May 7 08:32:18 unbound 5057 [5057:0] info: average recursion processing time 124.006674 sec
May 7 08:32:18 unbound 5057 [5057:0] info: server stats for thread 0: requestlist max 131 avg 14.078 exceeded 0 jostled 0
May 7 08:32:18 unbound 5057 [5057:0] info: server stats for thread 0: 33981 queries, 15757 answers from cache, 18224 recursions, 1325 prefetch, 0 rejected by ip ratelimitingHaving 33,981 DNS queries per minute seems crazy to me (that's 566 per second). This my home network with my work VPN and one Netflix stream going (max) at a time. Not sure how to determine which IP is generating most of these requests, any ideas would be appreciated.
Back to the drawing board. -
For testing it might be worth setting up a DNS server, just to make sure it is not pfSense. I use PiHole and it has good statistics on usage, it will at least help you ID why you have so many queries. There are about 50 devices on my network and I only get ~200k queries per day. And that includes Roku's that can't report usage and cameras that really want to call home.
-
@andyrh Thanks for that idea.
I did a packet capture on my pfSense HW for just DNS queries on the LAN, and it's not showing hundreds per second. I guess it just shows cumulative totals, rather than totals per my one minute output. Interpreting the data that way results in a much smaller amount of queries per minute.
Thanks for your thoughts.