Automatically fill Unbound DNS cache with top hits list?
-
Is there a way that one could automatically (shell commands via cron?) have Unbound resolve a list of top hits in order to keep its cache full of useful DNS entries?
i.e., keep a list of hits on a network, have a Cron job go resolve those hits so that local DNS is faster for the websites actually used on the network.
Unbound resolution is noticeably slower when it has to resolve a name recursively instead of answering from its cache.
-
Something like
nslookup < /tmp/list_of_urls
or
dig -f /tmp/list_of_urls
These are pretty slow for running through lists; is there a fast way to do a lot of hosts in a short period of time?
EDIT: Is there some way to run a whole bunch of these processes simultaneously in order to process 100+ DNS requests/sec?
Doing it this way, it looks like it averages ~4 lookups a second with dig -f. So if I could get 25+ instances working on looking up the file entries simultaneously, it would be great. I ran dig -f for maybe 20 minutes and my Unbound cache output went from <5k lines to >180k lines, and DNS resolution feels much snappier.
unbound-control -c /var/unbound/unbound.conf dump_cache > /tmp/dnsdump
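Incidentally, dump_cache has a load_cache counterpart in unbound-control, so a dump taken while the cache is warm can be pushed back in later. A crontab sketch (the schedule and dump path are made up, and @reboot depends on your cron supporting that extension):

```
# m h dom mon dow  command
55 3 * * *  unbound-control -c /var/unbound/unbound.conf dump_cache > /var/db/unbound_dump
@reboot     unbound-control -c /var/unbound/unbound.conf load_cache < /var/db/unbound_dump
```

Note that entries whose TTL has expired since the dump won't be served from the reloaded cache.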
-
First of all, why would you want to run all of them at once? As long as each one is re-queried before its TTL expires, that should suffice. Running them all at once will create utilization spikes, potentially impacting network performance during those times.
If I were going to do something like this I'd get the TTL of each and re-query for them at maybe 10% remaining. This would probably reduce the need for them to be fast.
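That re-query-at-10%-remaining idea could be sketched in shell like this. The dig call is stubbed out with a canned answer line so the parsing logic is visible, and the TTL numbers are made up — dig +noall +answer really does print lines shaped like "name TTL class type rdata":

```shell
#!/bin/sh
# Stand-in for: dig +noall +answer forum.pfsense.org A
answer='forum.pfsense.org. 300 IN A 208.123.73.18'

# Second field of the answer line is the remaining TTL in seconds
ttl=$(printf '%s\n' "$answer" | awk '{print $2}')

orig_ttl=3600   # TTL the record was originally served with (assumed here)

# Re-query when 10% or less of the original TTL remains
if [ "$ttl" -le $((orig_ttl / 10)) ]; then
  echo "time to re-query"
else
  echo "still fresh"
fi
```

With the numbers above (300 remaining out of 3600), the sketch prints "time to re-query".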
A possible query method might be to use curl (PHP or shell). There may be some other PHP get host type functions available too. You might look at the get host functions pfSense uses for some ideas.
-
Well, my thought was that if I set it up on cron to run early in the morning, before the network is in use, the biggest surge would come when no one would notice anyway. Subsequent re-queries would only go outside of the network for the entries with shorter TTLs.
The system is also way overkill for what it needs, so I wouldn't think that even pulling in a lot of DNS requests at once would be noticeable.
I tried using host before I tried nslookup or dig, but I couldn't get it to use a file. If you could tell me what I'm messing up there I'd really appreciate it!
Finally, are there any tools that can natively run multiple processes simultaneously? Like can I run multiple instances of dig, nslookup, host, or anything that can pull in DNS requests?
I've no idea how to use cURL or PHP. I don't know if they are simple enough that I could hack something together given a good tutorial, or if that would be a bridge too far.
-
host < /tmp/list_of_url
Just returns the instructions for usage of host
What am I doing wrong there?
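The problem is that host(1) takes a single name as an argument and doesn't read from stdin, which is why redirecting a file into it just prints the usage text. A read loop can feed it one name per line instead (sketch — the sample file contents are made up, and the echo stands in for the real host call):

```shell
#!/bin/sh
# Sample list; in practice this would be the existing /tmp/list_of_urls
printf 'example.com\nexample.org\n' > /tmp/list_of_urls

# Feed host one hostname per line; drop the echo to actually run host
while IFS= read -r name; do
  echo "would run: host $name"
done < /tmp/list_of_urls
```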
-
I tried running multiple instances of dig and nslookup in batch mode and it always fails. The shell just hangs after a while, and when I Ctrl+C it tells me one or more of the instances exited with "Exit 9", while the rest are either "Done" or "+ Done". I don't know what the difference between "Done" and "+ Done" is, what "Exit 9" means, or why it's happening.
The way I'm running multiple instances is splitting the file containing the list of URLs into many smaller files, then running
dig -f /tmp/list00 & dig -f /tmp/list01 & dig -f /tmp/list02
This always fails after running for a bit. However, if I open up multiple shells and run a single instance in each, they all run concurrently until they finish with no problems. I've tried this with 8-10 shells all running dig at the same time:
dig -f /tmp/BIGlist
I've tried this with nslookup as well with the same results, and I still can't get host to work from a file; I don't think it's possible.
It seems like when I run many parallel dig (or nslookup) instances in the background, the first one to finish halts everything else. I don't have any evidence of this; it just seems that way. Any input from anyone would be greatly appreciated!
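For what it's worth, if memory serves, dig's man page documents exit status 9 as "no reply from server", so the "Exit 9" jobs likely hit query timeouts rather than a shell problem. Either way, xargs can fan the list out without hand-splitting files: -P keeps up to N processes running at once and -n 1 hands each one a single name. A sketch (sample list made up; the echo is a stand-in — remove it to really run dig):

```shell
#!/bin/sh
# Sample list; in practice this would be the full /tmp/list_of_urls
printf 'example.com\nexample.org\nexample.net\n' > /tmp/list_of_urls

# Up to 25 concurrent lookups, one hostname per invocation.
# Drop the `echo` to actually execute dig.
xargs -n 1 -P 25 echo dig +short +time=3 +tries=1 < /tmp/list_of_urls
```

Note that with -P the output lines can arrive in any order, since the processes finish whenever they finish.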
-
Doesn't the prefetch checkbox do this for you?
My understanding is that after a cached item has been handed to the requester, if the remaining TTL on the item is less than 10% of its original TTL, then it will automatically be refreshed.
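For reference, that GUI checkbox corresponds to Unbound's prefetch option, which in unbound.conf terms is just:

```
server:
  prefetch: yes
```

Unbound's documentation describes it the same way: cache entries that get hit while less than 10% of their TTL remains are refreshed before they expire.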
But to be honest, if you're having issues with using a resolver because items take too long to resolve.. then maybe you should just use a forwarder.. There should really be no noticeable issues with lookups from clients once your resolver has been running for a while.
That being said, if you're in a part of the world where the NS of most of the sites you visit are on the other side of the planet from you ;) then yeah, you could have some added latency issues with DNS queries. Or if you're on some sort of sat internet connection with HIGH latency..
If you're going to go to the trouble of prefilling your cache.. might as well just let a forwarder do that for you ;)
-
Yeah, indeed - why don't you just let Unbound do its job as suggested above (plus tuning the cache via the GUI if it doesn't fit your needs - TTL/number of hosts)?
-
I'm not sure what the parameters are for prefetch, but I've always had it turned on.
When I pull in a bunch of DNS entries this way, the network is noticeably snappier.
I didn't think tweaking TTLs on my end was a good idea? I thought if I set it too long I could end up causing problems?
I've used the resolver because I like the idea of cutting out the middle man and letting my own machine do the work. Also, I use DNSBL which requires Unbound. I guess I could use Unbound in forwarding mode and it might work fine with DNSBL?
Honestly at this point I'm really just curious to make it work by pulling in requests.
If I can figure out an efficient way of doing it, my network is really snappy and I'm still resolving my own requests. Seems like a win-win to me!
-
What exactly is taking a long time to resolve?? I have not seen any sort of added delay in resolving anything.. And I have been running Unbound as the resolver since it was first possible to do so. You are talking a few ms.. This is not going to be noticeable.. If you are saying you can tell the difference between a cached query that takes <3 ms and one that takes 30 ms, I would say you're crazy.. Let's say it took a whole 100 ms - you're talking 0.1 of a second.. Come on - there is no way this is going to be an issue.
With most domains, the NS records are going to be cached for extended periods because of the longer TTLs of the NS for a domain and down the tree.. So the only thing you are talking about having to look up is the actual record of the thing in question, say www.domain.com -- what is the TTL of said record and their NS?
If you're noticing issues with specific domains - maybe it's just that their DNS sucks and they have a really short TTL with crappy NS that take a long time to respond, etc.
If you're having issues with resolving something specific.. I would suggest you troubleshoot that specific issue: what are the TTLs involved in resolving said record, what sort of response time are you getting from their authoritative NS, etc.
Preloading your Unbound cache just seems pointless to shave a few ms off some initial resolve of an fqdn.
-
I would say most sites take ~2-3 seconds to load up unless they've been recently visited.
If I go fetch a lot of the top URL DNS entries from a published list (Alexa, Majestic Million, etc.), then everything seems to load just about instantly.
My Unbound cache also increases dramatically in size, obviously. I don't think it's any issue with Unbound itself; like you stated before, my latency is higher, which is why (I assume) the network is notably slower without DNS cached locally.
It's really not a huge deal, but I wanted to poke around and see if I could improve performance while still resolving. I found a way, I'm just having trouble making it automatic and reliable.
I think I might have figured it out.
Fetch a list, trim it down to a much smaller number of entries, split the big list into many smaller lists, add an entry before each URL for dig (so "google.com" becomes "dig google.com"), chmod +x each of the split files, then run all of the split files in parallel (/tmp/f0 & /tmp/f1 & /tmp/f2…). This was successful with two files running in parallel. We'll see if it scales up reliably.
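The recipe above can be sketched like this, with a toy three-name list standing in for the real one (all paths and names here are made up; chunk sizes are shrunk so the mechanics fit on screen):

```shell
#!/bin/sh
# Work in a scratch directory so the chunk files are easy to find
mkdir -p /tmp/prefetch && cd /tmp/prefetch

# Toy list; the real thing would be thousands of names
printf 'google.com\nexample.com\nwikipedia.org\n' > biglist

# "google.com" -> "dig +short google.com"
sed 's/^/dig +short /' biglist > digs

# One command per chunk here (100 per chunk in the real run):
# produces chunk.aa, chunk.ab, chunk.ac
split -l 1 digs chunk.
chmod +x chunk.*

cat chunk.aa    # -> dig +short google.com
```

The chunks can then be run concurrently with something like `sh /tmp/prefetch/chunk.aa & sh /tmp/prefetch/chunk.ab & wait`.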
-
"I would say most sites take ~2-3 seconds to load up unless they've been recently visited. "
How does that relate to DNS, and not more to do with your browser caching all the stuff it downloads from the site?? If you're saying it takes 2-3 seconds for your DNS to resolve.. then yeah, you have something going on with resolving.
You do understand there are multiple DNS caches involved here: your OS is going to cache, your browser is going to cache, and then you have the resolver that will cache. Delays in browser loading more often have to do with issues with your browser cache and less to do with how long it takes to query the fqdns of the stuff used on the page.
You sure you're not just clearing your browser cache when you close it? And having to download everything vs pulling it from cache..
Let's take forum.pfsense.org for example.. It takes all of 36 ms to do that query direct to their authoritative servers.
;; QUESTION SECTION:
;forum.pfsense.org.		IN	A

;; ANSWER SECTION:
forum.pfsense.org.	300	IN	A	208.123.73.18

;; AUTHORITY SECTION:
pfsense.org.		300	IN	NS	ns1.netgate.com.
pfsense.org.		300	IN	NS	ns2.netgate.com.

;; ADDITIONAL SECTION:
ns1.netgate.com.	3600	IN	A	192.207.126.6
ns2.netgate.com.	3600	IN	A	162.208.119.38

;; Query time: 36 msec
;; SERVER: 162.208.119.38#53(162.208.119.38)
;; WHEN: Tue Apr 18 13:27:55 CDT 2017
;; MSG SIZE rcvd: 141

How does that equate to your 2 to 3 second delay in page loads?? Even if you multiply that by 10 you're talking 360 ms.. That is .36 of a second.. How is that going to be an issue??
If you feel your resolving is taking too long - then pick some domains that are of concern and look into how long they actually take to resolve.. Let's say you shave that .036 seconds it takes to resolve forum.pfsense.org off your load time of 3 seconds. You're talking 1% savings in time out of 3 seconds.. Come on - how is that an issue?
edit: Let's say you want to preload that.. So you're going to run your script to look up forum.pfsense.org every 5 minutes? Since that is the TTL of that record.. For your preloading to do anything, it would have to query forum.pfsense.org every 5 minutes, or you are going to run into the situation where your browser asks for forum.pfsense.org and the resolver has to resolve it.
So you're going to run a script every 5 minutes to query forum.pfsense.org so your cache stays loaded, to save .036 seconds of load time in your browser?? How much more CPU does that use up, and so electricity.. How much more bandwidth? Let's call it 140 bytes from my above query, times every 5 minutes, 24 hours a day, just for the 1 record.. That is about 40 Kbytes extra per record per day, to save .036 seconds when you happen to load up the pfSense forums.
-
Haha, I don't know what you want to hear, man. All I can tell you is that if I sat you down at a computer on my network, had you browse under normal conditions, then pulled in a bunch of DNS queries at once, you would be able to tell the difference.
It isn't anything crazy, but it does make it feel noticeably faster.
I am definitely not worried about the few bucks more a year I'll have to pay for the CPU cycles or the kBs of data that it will take to do it.
-
Well what you really should do is identify the issue. So it can be resolved specifically and properly.
I'd start with httpWatch. The basic edition is free.
https://www.httpwatch.com/
Pretty sure Chrome has similar capabilities that can show where time is being consumed.
-
That's very cool, I'll check it out!
I don't think that there is a real problem per se; I think I just have a high-latency connection and that's life. For example, when I run dig on a lot of URLs, at a glance it looks like most resolve in ~300 ms, but some queries are in the 2-3 second range; as John pointed out, he has a normal-latency connection and is getting returns in the 30 ms range. Again, everything works fine and I wouldn't call it a significant problem at all (after all, I am still resolving). It's just something I thought I'd see if I could tweak.
You definitely do tell a difference when you go from 2-3 s in some cases to 0 ms. While 2-3 seconds is abysmal for DNS, it's still only 2-3 seconds, and those times are the exception, not the norm. This really is something I'm just curious about, especially since it's actually a noticeable improvement on my network. I'm not suggesting this is something others should do either. For me it's educational, messing around with the commands and trying to hack something together that does what I want.
But I will check that program out, maybe there is something wrong. Thank you.
-
Hopefully the basic edition will show the DNS lookup timing. I have access to the pro edition.
-
Very cool, getting that now!
-
Alright, I got it working. It's pretty hacked together but it works (so far).
The first cron job retrieves the Majestic Million top URL list every day. It pulls out only the column listing the URLs and cuts it down to the first 5000 lines. Then it prepends dig commands, limiting the timeout to 3 s (I think the default is 5 s) and only one try (default = 3?). It splits the file up into 50 files of 100 lines each, makes them executable, and deletes all of the unnecessary files.
30 04 * * * root
fetch -o /tmp/mm.csv http://downloads.majestic.com/majestic_million.csv && cut -d , -f 3 /tmp/mm.csv | cat >> /tmp/mmf && rm /tmp/mm.csv && sed -I -e '1d;5002,$d' /tmp/mmf && rm /tmp/mmf-e && sed -I -e 's/^/dig +short +time=3 +tries=1 +ttlid /' /tmp/mmf && rm /tmp/mmf-e && split -d -l 100 /tmp/mmf /tmp/mmf && rm /tmp/mmf && chmod +x /tmp/mmf*
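For what it's worth, the same processing can be written a little more readably as a pipeline. This is a sketch: the fetch step is commented out and a canned two-line CSV stands in for the download, and the Majestic CSV layout (header row, domain in column 3) is assumed from the command above:

```shell
#!/bin/sh
# Stand-in for the real download:
# fetch -o /tmp/mm.csv http://downloads.majestic.com/majestic_million.csv
printf 'GlobalRank,TldRank,Domain\n1,1,google.com\n2,2,facebook.com\n' > /tmp/mm.csv

cut -d , -f 3 /tmp/mm.csv |                        # keep only the domain column
  tail -n +2 |                                     # drop the CSV header row
  head -n 5000 |                                   # cap the list at 5000 names
  sed 's/^/dig +short +time=3 +tries=1 +ttlid /' \
  > /tmp/mmf

split -l 100 /tmp/mmf /tmp/mmf.                    # 100 lookups per chunk: mmf.aa, mmf.ab, ...
chmod +x /tmp/mmf.*
rm /tmp/mm.csv /tmp/mmf
```

This avoids the in-place sed edits (and their stray backup files) by streaming everything through one pipeline.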
The second cron job runs the DNS query every 5 minutes.
0,5,10,15,20,25,30,35,40,45,50,55 * * * * root
/tmp/mmf00 & /tmp/mmf01 & /tmp/mmf02 & /tmp/mmf03 & /tmp/mmf04 & /tmp/mmf05 & /tmp/mmf06 & /tmp/mmf07 & /tmp/mmf08 & /tmp/mmf09 & /tmp/mmf10 & /tmp/mmf11 & /tmp/mmf12 & /tmp/mmf13 & /tmp/mmf14 & /tmp/mmf15 & /tmp/mmf16 & /tmp/mmf17 & /tmp/mmf18 & /tmp/mmf19 & /tmp/mmf20 & /tmp/mmf21 & /tmp/mmf22 & /tmp/mmf23 & /tmp/mmf24 & /tmp/mmf25 & /tmp/mmf26 & /tmp/mmf27 & /tmp/mmf28 & /tmp/mmf29 & /tmp/mmf30 & /tmp/mmf31 & /tmp/mmf32 & /tmp/mmf33 & /tmp/mmf34 & /tmp/mmf35 & /tmp/mmf36 & /tmp/mmf37 & /tmp/mmf38 & /tmp/mmf39 & /tmp/mmf40 & /tmp/mmf41 & /tmp/mmf42 & /tmp/mmf43 & /tmp/mmf44 & /tmp/mmf45 & /tmp/mmf46 & /tmp/mmf47 & /tmp/mmf48 & /tmp/mmf49 &
The initial run takes ~34s right after a flushed cache. This pretty much lines up with my ~300ms/request average (cutting out the really high time outliers).
Subsequent runs take ~13s.
CPU usage during the run is ~40%. This is significant considering my router runs an i5-2400.
Bandwidth usage is ~40kbps during the initial run.
Cache size increases by a factor of ~165 compared to its size a minute or so after a flush under normal usage.
It works for me and makes my network noticeably snappier. But it also sucks down a lot of CPU for a router. That doesn't bother me on my network but this is obviously not a useful thing to do for most.
-
So it takes you 300 ms to query for forum.pfsense.org from their NS??
Where are you in the world? What is your internet connection? If your latency is that bad, that is going to affect all downloads, not just DNS queries.. So I again do not see how trimming .3 of a second is going to make a freaking difference in your performance..
Let's see your httpWatch traces, etc.
-
What you're doing will cut down on the initial seeding time of the cache, but you can't cheat on the TTLs set on the records; they have to be refetched when they expire, and that's going to be equally costly compared to starting with an empty cache.
-
So it takes you 300 ms to query for forum.pfsense.org from their NS??
Where are you in the world? What is your internet connection? If your latency is that bad, that is going to affect all downloads, not just DNS queries.. So I again do not see how trimming .3 of a second is going to make a freaking difference in your performance..
Let's see your httpWatch traces, etc.
Not sure who that question is directed at - probably the OP. But just shy of 300 ms is what it takes to get the IP address using the DNS resolver. This can be seen in that httpWatch screen capture. It's not simply a query to the authoritative NS; it has to walk the chain, so it adds up. Three tenths of a second is humanly perceptible, though not by much.
For me, going to the pfSense home page, the total DNS time was about 400 ms for about 4 lookups, two of which were to Google services. But by then most of the page is probably already rendered.
The OP hasn't really given many details about the situation other than that it's slow, but snappier if DNS is pre-cached - nothing about the service, its latency, bandwidth, etc. Some httpWatch traces could potentially reveal some relevant info.
-
"It's not simply just a query to the authoritative NS. Has to walk the chain."
It only has to walk the whole chain if none of the chain is cached.. But part of the chain should already be cached.. Many NS have much longer TTLs than just the records, etc. Once you ask the roots for the NS of the TLDs.. those have a TTL of:
;; QUESTION SECTION:
;org.			IN	NS

;; ANSWER SECTION:
org.	86400	IN	NS	a0.org.afilias-nst.info.
org.	86400	IN	NS	b2.org.afilias-nst.org.
org.	86400	IN	NS	b0.org.afilias-nst.org.
org.	86400	IN	NS	c0.org.afilias-nst.info.
org.	86400	IN	NS	a2.org.afilias-nst.info.
org.	86400	IN	NS	d0.org.afilias-nst.org.

So you sure do not have to walk the chain to get those unless the TTL has expired.
Now, a problem that you might have with pfsense.org is that they have their NS with a very low 300-second TTL, which doesn't make a lot of sense unless they were about to change their NS..
Looks like someone forgot to update the TTL on those records.. since I show the actual ns1 and ns2 having a TTL of 3600:
;; AUTHORITY SECTION:
netgate.com.	3600	IN	NS	ns1.netgate.com.
netgate.com.	3600	IN	NS	ns2.netgate.com.

So normally when there is a low TTL on a record, you would only have to query the authoritative NS directly when it expires, not walk the whole chain again.
-
Suggest you Wireshark the DNS of an actual http://pfSense.org/ browsing session after the TTL has expired.
The attached Wireshark screen capture is of browsing to http://pfSense.org/ (in a new browser session with the site's cache and cookies cleared; not that that should matter) after having been there several times already within the past hour and the DNS TTL had expired.
Up the chain it goes to:
Name: a0.org.afilias-nst.info
Address: 199.19.56.1
![pfSense.org DNS.jpg](/public/imported_attachments/1/pfSense.org DNS.jpg)