DNS Reply Resolver vs Reply and cache
-
@jrey said in DNS Reply Resolver vs Reply and cache:
implies the "resolver" (localhost) doesn't cache results for itself, correct?
Huh? Yes, unbound caches. I'm not really sure what you're confused about.
Yes, a local host also runs its own local cache, and it will cache what it got from unbound for the length of the TTL.
So let's say a client asks unbound for www.domain.tld, and unbound resolves this from the authoritative NS. Let's say the TTL on this record from the authoritative NS is 3600 seconds.
The client should get the IP with a 3600-second TTL.
Now client 2 asks unbound for the same FQDN, www.domain.tld, say 60 seconds later; this client would get the response from unbound with a TTL of 3540 seconds.
Client 3 comes in, say, 3000 seconds after unbound first looked it up; that third client would get an answer with 600 seconds left on the TTL.
Now any of the clients that want to look up www.domain.tld again will just pull it from their local cache until that TTL has expired. They would then need to go ask unbound again.
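To make the TTL arithmetic above concrete, here is a minimal Python sketch of a cache that hands out the remaining TTL rather than the original one, the way the example describes. It is a toy model, not unbound's code; the names TtlCache and CacheEntry and the address 192.0.2.10 (a documentation address) are made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class CacheEntry:
    address: str
    original_ttl: int  # seconds, as set by the authoritative NS
    stored_at: float   # time (seconds) when the answer was cached


class TtlCache:
    """Toy cache that returns what is left of the TTL, not the original value."""

    def __init__(self) -> None:
        self._entries: Dict[str, CacheEntry] = {}

    def store(self, name: str, address: str, ttl: int, now: float) -> None:
        self._entries[name] = CacheEntry(address, ttl, now)

    def lookup(self, name: str, now: float) -> Optional[Tuple[str, int]]:
        """Return (address, remaining TTL), or None if missing or expired."""
        entry = self._entries.get(name)
        if entry is None:
            return None
        remaining = entry.original_ttl - int(now - entry.stored_at)
        if remaining <= 0:
            del self._entries[name]  # expired: the client has to ask unbound again
            return None
        return entry.address, remaining


cache = TtlCache()
cache.store("www.domain.tld", "192.0.2.10", ttl=3600, now=0)  # client 1 gets 3600
print(cache.lookup("www.domain.tld", now=60))    # client 2: ('192.0.2.10', 3540)
print(cache.lookup("www.domain.tld", now=3000))  # client 3: ('192.0.2.10', 600)
print(cache.lookup("www.domain.tld", now=3600))  # TTL used up: None
```

The same countdown applies at every layer: unbound's cache, the OS cache and the browser cache each hand out whatever is left.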
Now some devices, especially IoT devices, don't have their own local cache, and any time such a device wants to talk to www.domain.tld it asks unbound. But Windows, Linux, and pretty much any actual OS will run its own local cache. Browsers will normally run their own cache as well, separate from the local machine's cache. For example, to view what is in Firefox's own cache, open:
about:networking#dns
Unbound, be it in resolver mode or forwarder mode, will cache what it looks up, and will answer from cache with whatever is left on the TTL for any client asking it for that record.
If you want to see what is in unbound's cache:
[24.03-RELEASE][admin@sg4860.home.arpa]/: unbound-control -c /var/unbound/unbound.conf dump_cache
You can also see what unbound would use to look up a specific FQDN and what it has cached for it:
[24.03-RELEASE][admin@sg4860.home.arpa]/: unbound-control -c /var/unbound/unbound.conf lookup forum.netgate.com
The following name servers are used for lookup of forum.netgate.com.
;rrset 584 3 0 7 0
netgate.com. 584 IN NS ns3.netgate.com.
netgate.com. 584 IN NS ns2.netgate.com.
netgate.com. 584 IN NS ns1.netgate.com.
;rrset 584 1 0 3 0
ns1.netgate.com. 584 IN A 208.123.73.80
;rrset 584 1 0 3 0
ns2.netgate.com. 584 IN A 208.123.73.90
;rrset 584 1 0 3 0
ns3.netgate.com. 584 IN A 34.197.184.5
Delegation with 3 names, of which 3 can be examined to query further addresses.
It provides 3 IP addresses.
34.197.184.5 not in infra cache.
208.123.73.90 not in infra cache.
208.123.73.80 not in infra cache.
[24.03-RELEASE][admin@sg4860.home.arpa]/:
If you want to see what unbound has in its cache for a specific record:
[24.03-RELEASE][admin@sg4860.home.arpa]/: unbound-control -c /var/unbound/unbound.conf dump_cache | grep forum.netgate.com
forum.netgate.com. 484 IN A 208.123.73.71
msg forum.netgate.com. IN A 32896 1 484 0 1 1 3 -1
forum.netgate.com. IN A 0
[24.03-RELEASE][admin@sg4860.home.arpa]/:
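If you want to pull the records for one name out of a full dump_cache listing in a script rather than with grep, a rough Python sketch follows. It assumes the resource-record lines look like the grep output above (owner, TTL, class, type, data) and simply skips anything that doesn't fit that shape (";rrset" comments, "msg" lines, the short msg-reference lines, section markers). The function and type names are hypothetical, and the dump format may differ between unbound versions.

```python
from typing import List, NamedTuple


class CachedRecord(NamedTuple):
    name: str
    ttl: int    # seconds left in unbound's cache
    rtype: str
    rdata: str


def records_for(dump_text: str, name: str) -> List[CachedRecord]:
    """Pick the resource-record lines for `name` out of dump_cache text."""
    wanted = name.lower().rstrip(".") + "."  # dump_cache uses fully qualified names
    found = []
    for line in dump_text.splitlines():
        parts = line.split()
        # Expect "owner TTL CLASS TYPE RDATA...": anything else is not an RR line.
        if len(parts) < 5 or not parts[1].isdigit() or parts[2] != "IN":
            continue
        if parts[0].lower() == wanted:
            found.append(CachedRecord(parts[0], int(parts[1]), parts[3], " ".join(parts[4:])))
    return found


sample = """forum.netgate.com. 484 IN A 208.123.73.71
msg forum.netgate.com. IN A 32896 1 484 0 1 1 3 -1
forum.netgate.com. IN A 0"""
print(records_for(sample, "forum.netgate.com"))
```

Fed the grep output above, it returns only the A record with 484 seconds left; in practice you would pass it the full text printed by unbound-control -c /var/unbound/unbound.conf dump_cache.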
If you want to see, for example, what Windows has in its local cache:
ipconfig /displaydns
-
@jrey said in DNS Reply Resolver vs Reply and cache:
in the response what is the subtle difference between
resolver vs. reply
Unbound receives a request.
If it's a host name that unbound knows about, the answer is given right away.
For example: host overrides or static DHCP leases.
If not, the cache is checked, and if the record is found and its TTL is still valid => bingo.
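The lookup order described above (local data such as host overrides and static DHCP leases first, then the cache, and only then actual resolving or forwarding) can be sketched in Python like this. It is a simplified model under assumptions: the source labels "local", "cache" and "resolved" are made up here and are not unbound's or pfBlocker's log values, and resolve is a hypothetical stand-in for whatever recursion or forwarding the resolver does.

```python
from typing import Callable, Dict, Tuple

# name -> (address, absolute expiry time in seconds)
Cache = Dict[str, Tuple[str, float]]


def answer(name: str, local_data: Dict[str, str], cache: Cache,
           resolve: Callable[[str], Tuple[str, int]], now: float) -> Tuple[str, str]:
    """Return (address, source), where source is "local", "cache" or "resolved"."""
    # 1. Names known locally (host overrides, static DHCP leases): answer right away.
    if name in local_data:
        return local_data[name], "local"
    # 2. Cache hit with the TTL still valid => bingo, no upstream traffic.
    hit = cache.get(name)
    if hit is not None and now < hit[1]:
        return hit[0], "cache"
    # 3. Otherwise resolve (or forward), which takes time, and cache the result.
    address, ttl = resolve(name)
    cache[name] = (address, now + ttl)
    return address, "resolved"


def fake_resolve(name: str) -> Tuple[str, int]:
    """Hypothetical stand-in for real recursion/forwarding."""
    return "198.51.100.7", 300


local = {"nas.home.arpa": "192.168.0.50"}  # hypothetical host override
demo_cache: Cache = {}
for t in (0, 10, 300):
    print(t, answer("example.com", local, demo_cache, fake_resolve, now=t))
print(answer("nas.home.arpa", local, demo_cache, fake_resolve, now=0))
```

Only step 3 generates upstream traffic, which is why the hit ratio matters so much more than the speed of the upstream.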
A cache hit is what normally happens most of the time: grey = hit.
Some other color: resolving took place, so this took some time.
Btw, there is always more to it, like this one:
The first and fourth are obvious... and they make caching even faster, as expired entries get renewed "if ever needed again", so the cache will grow...
-
What is in the local cache for both cases is:
unbound-control -c /var/unbound/unbound.conf lookup sample_in_question
"The following name servers are used for lookup of ...." and it lists the upstreams.
This is the same for both cases, and so is the dig.
However, starting with no record upstream:
- on a localhost query, a cache record is created on the upstream
- delete the upstream cache
- query again: on the subsequent response, a cache record is created on the upstream again
When the query is made from localhost, it does not return "cache", and it always creates the record on the upstream if I delete the cache there between queries; i.e. it goes to the upstream every single time the request is made from localhost. It looks like this:
(For clarity, this made 2 trips upstream.)
Now the client test:
Again starting with no record upstream, take exactly the same query and run it on a client:
- the Netgate returns "reply" for the first one
- an upstream cache record is created (i.e. it went out to get it)
- delete the upstream cache
- query again: the response is "cache" for subsequent queries, with nothing touched upstream (that is, no record added). The reply indicates this is coming from the Netgate's cache; it would have created the record upstream if it had reached out.
After the TTL expires, the next query from a client creates the cache record upstream again ("reply"), followed by "cache" for subsequent client queries, again until it expires.
(This made 1 upstream trip and 2 cache answers.)
Because the localhost query recreates the upstream cache record every time, and the client query creates it the first time but not on subsequent queries, I might conclude that localhost queries are not cached on the Netgate, as they go upstream every time (resolver), whereas client queries get cache hits from the Netgate (reply/cache) and do not go upstream until the local entry expires. Then the next query is "reply" again, followed by "cache", etc.
A packet capture does confirm that localhost sends traffic upstream with every request, while client traffic does not: only the first "reply" goes upstream, and subsequent queries create no upstream traffic.
-
@jrey said in DNS Reply Resolver vs Reply and cache:
I might conclude the localhost queries are not cached on the Netgate as it is going upstream every time (resolver)
Huh? What are you actually doing a query for, the name localhost? From where?
Where are you getting this info?
When you say localhost is doing a query for some FQDN, are you talking about pfSense itself? Are you using the DNS Lookup GUI page? That page is always going to ask all the name servers you have listed in General Setup.
Sorry, but I am having a hard time understanding what you're even asking about, and what you're referring to when you say localhost. Is this pfSense, or some other device on your network asking unbound?
-
The client shown in both screen captures is the client IP making the request.
The query most often comes from the Netgate, so in the case of 127.0.0.1 (resolver), that is the Netgate making the request itself (typically, for example, when pfBlocker downloads a list of files from the same source, or when it has to look up a reverse (PTR) record).
The 192.168.0.19 is the client going through the Netgate (my workstation, actually).
My testing was done by SSHing into the Netgate and doing a dig me_a_name (some external name, not internal),
the same dig as on my workstation, which gives the same results but as reply/cache.
There is nothing wrong with the reply in either case; it just seems that localhost (the Netgate) goes upstream for every single query it makes itself and therefore doesn't cache the result,
whereas the client going through the same resolver gets the same result, and it is cached.
It's as if, to localhost (the Netgate), "the following name servers are used for lookup of" means "go upstream every time",
but when a client hits the same Netgate it means "go upstream and cache the result".
I.e. they should either both go upstream every time, based on "unbound-control -c /var/unbound/unbound.conf lookup sample_in_question"
and its response of
"the following name servers are used for lookup of",
or they should both cache.
Clearly, by the packet capture,
when the Netgate does the dig me_a_name it goes upstream every time:
dig me_a_name (round trip/resolver)
dig me_a_name (round trip/resolver)
When the client going through the Netgate does the same dig me_a_name:
dig me_a_name (round trip/reply)
dig me_a_name (cache)
dig me_a_name (cache)
me_a_name is always external.
When you are on the Netgate, the query goes from itself -> to itself via the localhost address 127.0.0.1 -> follows the path upstream.
Queries on the Netgate go upstream every time (packet capture).
When a client makes a request, the only difference would be:
192.168.0.x (.19 in my sample), which has the DNS server 192.168.0.1 (the Netgate) -> follows the path upstream.
This clearly does not go upstream every time: only the first query is sent upstream, and the rest come from cache (packet capture).
It's not the end of the world, I'm just curious; there are no DNS issues as such, and everything/everyone is getting the right/same answer (the Netgate itself or a client on the Netgate).
If the packet capture showed either of these (for the same query, regardless of the source):
a) upstream traffic for every query, that would be OK
or
b) traffic where the first query goes upstream and the second does not (cache), that would be OK too.
But that is not the case: we actually see a lot of "a" (all traffic by the Netgate to the Netgate) and a little of "b" (clients), and this happens all the time.
The data is the DNS Reply records generated by unbound/pfBlocker.
In this data, the "resolver" records are always the Netgate doing the query to unbound,
and the "reply/cache" records are always the clients.
It's the traffic path (to the upstream or to the cache), and why the two are different, that is the curiosity.
-
@jrey maybe I am having a bad day, but where are you getting cache/reply from? I am having a real hard time understanding what you're concerned with, or why you think it's not being returned from cache.
Do you have prefetch enabled? It will do a background refresh of something whose TTL is almost expired, etc.
What specific log are you looking at, or where are you getting the cache/reply info shown in your screenshot?
If I do a dig on pfSense for something that is cached, it sure isn't having to look it up:
[24.03-RELEASE][admin@sg4860.home.arpa]/: dig forum.netgate.com
; <<>> DiG 9.18.20 <<>> forum.netgate.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 47027
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;forum.netgate.com. IN A
;; ANSWER SECTION:
forum.netgate.com. 3586 IN A 208.123.73.71
;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1) (UDP)
;; WHEN: Thu Aug 29 15:08:38 CDT 2024
;; MSG SIZE rcvd: 62
[24.03-RELEASE][admin@sg4860.home.arpa]/:
Do you really think unbound went out, talked to the authoritative NS, and gave me back an answer in 0 ms, i.e. less than 1 ms?
Please post the output of your dig command. For example, here is one where it's not from cache. I then look it up again, and you can see I got an answer in 0 ms vs. the 264 ms it took the first time, when unbound had to resolve it:
[24.03-RELEASE][admin@sg4860.home.arpa]/: dig www.yahoo.com
; <<>> DiG 9.18.20 <<>> www.yahoo.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 40981
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;www.yahoo.com. IN A
;; ANSWER SECTION:
www.yahoo.com. 3600 IN CNAME me-ycpi-cf-www.g06.yahoodns.net.
me-ycpi-cf-www.g06.yahoodns.net. 3600 IN A 69.147.65.251
me-ycpi-cf-www.g06.yahoodns.net. 3600 IN A 69.147.65.252
;; Query time: 264 msec
;; SERVER: 127.0.0.1#53(127.0.0.1) (UDP)
;; WHEN: Thu Aug 29 15:12:46 CDT 2024
;; MSG SIZE rcvd: 119
[24.03-RELEASE][admin@sg4860.home.arpa]/: dig www.yahoo.com
; <<>> DiG 9.18.20 <<>> www.yahoo.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 30
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;www.yahoo.com. IN A
;; ANSWER SECTION:
www.yahoo.com. 3592 IN CNAME me-ycpi-cf-www.g06.yahoodns.net.
me-ycpi-cf-www.g06.yahoodns.net. 3592 IN A 69.147.65.252
me-ycpi-cf-www.g06.yahoodns.net. 3592 IN A 69.147.65.251
;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1) (UDP)
;; WHEN: Thu Aug 29 15:12:54 CDT 2024
;; MSG SIZE rcvd: 119
[24.03-RELEASE][admin@sg4860.home.arpa]/:
Along with response time, you can almost always tell when something was served from cache vs. having to be resolved, because the TTL is some odd number as it has been counting down. Notice that in my first query the TTL is 3600, which is what the authoritative NS has set.
Edit: here I started a sniff on my WAN for port 53, did a dig for yahoo, which is cached (no outside queries), then did one for cnn, and you can clearly see where that was resolved.
But you don't see anything for yahoo in the sniff.
You can even see the resolve process, where it asked a gTLD server for the NS of cnn.com, then asked that NS and got back a CNAME that it followed. But nothing is listed for yahoo, because it didn't need to look anything up; it served it from cache.
It didn't have to ask the roots for the gTLD servers for .com, because it had those cached as well.
-
@johnpoz said in DNS Reply Resolver vs Reply and cache:
Or why you think its not being returned from cache?
Well, because I can see the hit on the upstream DNS server, which I also control, sitting right in front of me (on a different screen), and a packet capture says it is going there and creating a cache record there (if I delete the cache record there between queries) every time I or the Netgate itself query 127.0.0.1 (itself, which is the default). You actually have to select localhost on the list for this.
But this is interesting:
dig microsoft.com over SSH, the way most packages look stuff up (pfBlocker, for example),
and the DNS Lookup GUI page as well, default to using 127.0.0.1 (as you mentioned).
So everything by default does in fact go there, and IMHO there is no caching going on here: there is round-trip traffic upstream with every query it makes against itself at 127.0.0.1.
Change nothing, stay SSH'd into the Netgate, and then query targeting the LAN interface's IP (which is still the Netgate's unbound):
dig microsoft.com @192.168.0.1
Same response from the server, except reply/cache (1 round trip upstream / 1 cache).
A DNS Lookup from the web page and a plain dig both default to 127.0.0.1, and all of those queries have created traffic to the upstream.
The same dig from the same SSH session, but with the server specified as @192.168.0.1 (still the Netgate's resolver), gives 1 upstream, 1 cache.
Because of the proximity and speed of the upstreams, the response-time difference is negligible.
However, a query against 127.0.0.1 returns "resolver" and IMHO does not cache (only because the packet capture shows it went next door for the answer with every query),
while a query to 192.168.0.1 returns reply/cache (only 1 trip next door for the reply, and 0 trips next door for the query that logged the cache).
-
I think the subtle difference might be that you allow your Netgate to go directly to the root servers, whereas mine are specifically named internal servers.
-
@jrey said in DNS Reply Resolver vs Reply and cache:
where as mine are specifically named internal
Huh? Again, where are you seeing this??
And you're forwarding in unbound? What are you doing a query for? microsoft, from where?? This is some client asking unbound on pfSense. Where are you seeing this log that says resolver in it???
Where is the output from your dig on pfSense?
This is not the output of dig, and it is not any sort of log in pfSense that I am aware of.
Is that some pfBlocker log you're looking at?? Where exactly in the pfSense GUI are you grabbing those screenshots from?
There is no difference between asking loopback or the IP unbound is listening on.
First one, no cache; second one is cached:
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;microsoft.com. IN A
;; ANSWER SECTION:
microsoft.com. 3600 IN A 20.76.201.171
microsoft.com. 3600 IN A 20.70.246.20
microsoft.com. 3600 IN A 20.236.44.162
microsoft.com. 3600 IN A 20.112.250.133
microsoft.com. 3600 IN A 20.231.239.246
;; Query time: 32 msec
;; SERVER: 192.168.9.253#53(192.168.9.253) (UDP)
;; WHEN: Thu Aug 29 17:32:56 CDT 2024
;; MSG SIZE rcvd: 122
[24.03-RELEASE][admin@sg4860.home.arpa]/: dig @192.168.9.253 microsoft.com
; <<>> DiG 9.18.20 <<>> @192.168.9.253 microsoft.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 49554
;; flags: qr rd ra; QUERY: 1, ANSWER: 5, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;microsoft.com. IN A
;; ANSWER SECTION:
microsoft.com. 3597 IN A 20.70.246.20
microsoft.com. 3597 IN A 20.236.44.162
microsoft.com. 3597 IN A 20.112.250.133
microsoft.com. 3597 IN A 20.231.239.246
microsoft.com. 3597 IN A 20.76.201.171
;; Query time: 0 msec
;; SERVER: 192.168.9.253#53(192.168.9.253) (UDP)
;; WHEN: Thu Aug 29 17:32:59 CDT 2024
;; MSG SIZE rcvd: 122
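To repeat this loopback vs. LAN-IP comparison from a script, a stdlib-only Python sketch can send the same A query to each address and report the answer TTLs. It is deliberately minimal and assumed for illustration: no EDNS, no TCP fallback, no rcode checking, and the helper names are hypothetical. If both addresses hit the same unbound cache, the later query should come back with a counted-down TTL regardless of which address was used.

```python
import random
import socket
import struct
from typing import List


def build_query(name: str, qid: int) -> bytes:
    """Build a minimal DNS query for an A record with recursion desired (RD)."""
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)  # 1 question
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def _skip_name(msg: bytes, offset: int) -> int:
    """Return the offset just past a (possibly compressed) domain name."""
    while True:
        length = msg[offset]
        if length == 0:
            return offset + 1
        if length & 0xC0 == 0xC0:  # compression pointer ends the name
            return offset + 2
        offset += 1 + length


def answer_ttls(msg: bytes) -> List[int]:
    """TTLs of all records in the answer section of a DNS response."""
    qdcount, ancount = struct.unpack("!HH", msg[4:8])
    offset = 12
    for _ in range(qdcount):
        offset = _skip_name(msg, offset) + 4  # skip QTYPE and QCLASS
    ttls = []
    for _ in range(ancount):
        offset = _skip_name(msg, offset)
        _rtype, _rclass, ttl, rdlength = struct.unpack("!HHIH", msg[offset:offset + 10])
        ttls.append(ttl)
        offset += 10 + rdlength
    return ttls


def query_ttls(server: str, name: str, timeout: float = 3.0) -> List[int]:
    """Send one UDP query to `server` port 53 and return the answer TTLs."""
    qid = random.randint(0, 0xFFFF)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name, qid), (server, 53))
        reply, _ = sock.recvfrom(4096)
    if struct.unpack("!H", reply[:2])[0] != qid:
        raise ValueError("reply ID does not match query ID")
    return answer_ttls(reply)
```

On the firewall you could call query_ttls("127.0.0.1", "microsoft.com") and then query_ttls("192.168.9.253", "microsoft.com") a few seconds apart and compare the numbers.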
Unbound doesn't care if it resolves or forwards to some other NS. Once it gets an answer and something else asks it for that same FQDN, it will return its cache entry. It's not going to go asking for it again until that cache entry has expired. Or, if you have prefetch set and something asks for a record with only some amount of time left on the TTL, it will answer from cache and then go refresh its cache in the background. You would have to look at the unbound specifics for when it will refresh its cache, etc.
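The cache-then-prefetch behavior described above can be modeled roughly as follows. The sketch assumes unbound's prefetch trigger of about 10% of the original TTL remaining; treat that number, the class name, and the explicit run_background_refresh step (which stands in for unbound's background work) as assumptions for illustration, not unbound's implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

PREFETCH_FRACTION = 0.10  # assumption: prefetch when under ~10% of the TTL is left


@dataclass
class Entry:
    address: str
    ttl: int          # original TTL in seconds
    stored_at: float  # when it was cached


class PrefetchingCache:
    def __init__(self, resolve: Callable[[str], Tuple[str, int]], prefetch: bool = True) -> None:
        self.resolve = resolve  # recursion or forwarding; the cache doesn't care which
        self.prefetch = prefetch
        self.entries: Dict[str, Entry] = {}
        self.pending_refresh: List[str] = []  # stands in for unbound's background work

    def query(self, name: str, now: float) -> Tuple[str, int, str]:
        """Return (address, TTL handed to the client, "cache" or "resolved")."""
        entry = self.entries.get(name)
        if entry is not None:
            remaining = entry.ttl - int(now - entry.stored_at)
            if remaining > 0:
                # Answer from cache right away; if the entry is nearly expired,
                # queue a refresh instead of making this client wait.
                if (self.prefetch and remaining < entry.ttl * PREFETCH_FRACTION
                        and name not in self.pending_refresh):
                    self.pending_refresh.append(name)
                return entry.address, remaining, "cache"
        address, ttl = self.resolve(name)
        self.entries[name] = Entry(address, ttl, now)
        return address, ttl, "resolved"

    def run_background_refresh(self, now: float) -> None:
        """Re-resolve queued names so popular entries don't expire."""
        for name in self.pending_refresh:
            address, ttl = self.resolve(name)
            self.entries[name] = Entry(address, ttl, now)
        self.pending_refresh.clear()
```

The design point is that the client asking near the end of the TTL still gets an immediate cache answer; only the refresh touches the upstream.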
But I am still at a loss as to where you're seeing what you're posting. I am not aware of any log in pfSense that would show something like what you're showing.
If you enable query and reply logging in unbound via the custom options box:
log-queries: yes
log-replies: yes
log-tag-queryreply: yes
log-servfail: yes
log-local-actions: yes
You get stuff like this in the log.
Here I did a query on pfSense to its own address 192.168.9.253, and a query for microsoft.com from my PC at 192.168.9.100.
You can see that on pfSense this was not cached:
[24.03-RELEASE][admin@sg4860.home.arpa]/: dig @192.168.9.253 microsoft.com
; <<>> DiG 9.18.20 <<>> @192.168.9.253 microsoft.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 21138
;; flags: qr rd ra; QUERY: 1, ANSWER: 5, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;microsoft.com. IN A
;; ANSWER SECTION:
microsoft.com. 3600 IN A 20.76.201.171
microsoft.com. 3600 IN A 20.70.246.20
microsoft.com. 3600 IN A 20.236.44.162
microsoft.com. 3600 IN A 20.112.250.133
microsoft.com. 3600 IN A 20.231.239.246
;; Query time: 23 msec
;; SERVER: 192.168.9.253#53(192.168.9.253) (UDP)
;; WHEN: Thu Aug 29 18:07:14 CDT 2024
;; MSG SIZE rcvd: 122
And then when I did it from my PC, it was clearly a cached response:
$ dig @192.168.9.253 microsoft.com
; <<>> DiG 9.16.50 <<>> @192.168.9.253 microsoft.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 40345
;; flags: qr rd ra; QUERY: 1, ANSWER: 5, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;microsoft.com. IN A
;; ANSWER SECTION:
microsoft.com. 3586 IN A 20.70.246.20
microsoft.com. 3586 IN A 20.236.44.162
microsoft.com. 3586 IN A 20.112.250.133
microsoft.com. 3586 IN A 20.231.239.246
microsoft.com. 3586 IN A 20.76.201.171
;; Query time: 0 msec
;; SERVER: 192.168.9.253#53(192.168.9.253)
;; WHEN: Thu Aug 29 18:07:28 Central Daylight Time 2024
;; MSG SIZE rcvd: 122
Notice the difference in response time, and how the TTL was lower than when I did the query on pfSense.
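With log-replies and log-tag-queryreply turned on, the reply lines can be tallied per client to see how many answers came from cache. The Python sketch below assumes each reply line contains "reply:" followed by the client IP, query name, type, class, rcode, the time taken, a 0/1 cached flag and the reply size; check that field order against your own unbound log before relying on it. The function name is hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumed layout of a log-replies line with log-tag-queryreply: yes
#   ... reply: <client> <qname> <qtype> <qclass> <rcode> <seconds> <cached 0/1> <size>
REPLY_LINE = re.compile(
    r"reply: (?P<client>\S+) (?P<qname>\S+) (?P<qtype>\S+) (?P<qclass>\S+) "
    r"(?P<rcode>\S+) (?P<seconds>\d+\.\d+) (?P<cached>[01]) (?P<size>\d+)"
)


def cache_hits_by_client(lines: Iterable[str]) -> Dict[str, Tuple[int, int]]:
    """Map client IP -> (replies seen, replies served from cache)."""
    counts: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for line in lines:
        match = REPLY_LINE.search(line)
        if match is None:
            continue  # query: lines and anything else in the log
        tally = counts[match["client"]]
        tally[0] += 1
        tally[1] += int(match["cached"])
    return {client: (total, cached) for client, (total, cached) in counts.items()}
```

Grouping by client IP this way would show directly whether answers to 127.0.0.1 ever carry the cached flag, without relying on a packet capture.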
-
Q. Where are you seeing this??
I'm actually seeing that in Graylog, but the source of the data is the pfSense/pfBlocker unified log, sent in real time. The data is not wrong; I was just trying to clarify the distinction for the entries named resolver. Sample screenshot of the log file on pfSense:
Q. And you're forwarding in unbound?
A. Yes, I'm forwarding in unbound.
Q. What are you doing a query for? microsoft, from where??
A1. For microsoft.com (but it could be, and is, anything).
A2. In this case, from the pfSense box, with and without the server specified.
The response showing as resolver is the default dig microsoft.com: it goes to and from the 127 address and does a round trip next door (the server in this reply shows as 127.0.0.1). The second screen capture, with reply/cache, is simply dig microsoft.com @192.168.0.1. The responding server is just that, 192.168.0.1, which is what clients would hit. In this case only the reply caused a query next door; the other came from cache and didn't even open the door.
Q. Where are you seeing this log that says resolver in it???
A. See the first answer, unless you mean something else.
Statement: there is no difference between asking loopback or the IP unbound is listening on.
A. Clearly there is.
unbound-control -c /var/unbound/unbound.conf lookup forum.netgate.com
The following name servers are used for lookup of....
And I said (maybe not clearly enough) that for all of mine,
unbound-control -c /var/unbound/unbound.conf lookup sample_in_question
"The following name servers are used for lookup of ...." lists the upstreams.
The list of upstreams is in a different format from the responding servers you are showing.
So then, here is what I think the definition is for the original question, Resolver vs. Reply/Cache:
When you specify a forward-to (and maybe only if it is local), a query on the Netgate against localhost (127.0.0.1) will always be resolved by reaching upstream, and it is reported as "resolver" (perhaps because in the unified log 127.0.0.1 simply implies "I'm going to resolve this", no questions asked about cache).
However, when you hit the same unbound on the non-local IP (192.168.0.1), it says: ah, here is a query, let me look that up for you; I don't have it, so I'll go upstream, get it, and cache the result. Next query: I have that in cache.
The dig structure as shown above is exactly the same except for the explicit server on the 192 address.
No magic query or anything like that. The results are exactly the same, except one says it came from 127.0.0.1 and the other from 192.168.0.1. Queries against 127.0.0.1 in this setup are most assuredly going upstream every time (but I'm not worried about the response time); it's actually not any better or worse than those returned on 192.168.0.1 from cache.
The only reason for the question in the first place was to determine the correct filters on the Graylog dashboard. It's not really a question about the DNS query or answer, just to confirm what the difference was. After talking to you and testing, I'm okay with the answer as I understand it.
As you can see, the pfSense box on its own does a bunch of queries by and for itself (stuff running on the box that hits 127.0.0.1, i.e. itself: pfBlocker, other stuff, anything that runs its queries there).
The queries by and for clients are exactly that: 55.1% go upstream and 38.6% are cache.
With the pfSense queries in there, the numbers are whacked.
Clearly, in this setup, I'm actually OK with the stuff to and from localhost being isolated and in fact talking upstream every time. There is no issue with performance, and none of that pfSense DNS traffic counts against a specific client IP anyway. That is just the various pfSense bits doing their thing, looking stuff up as they need to.
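For the Graylog dashboard question, the idea of separating the firewall's own lookups from client lookups before computing percentages can be sketched in Python. The (client_ip, reply_type) record shape is an assumption for illustration; it is not pfBlocker's actual DNS Reply log format, and mapping it onto a Graylog stream or query is left to your own setup.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

LOCALHOST = {"127.0.0.1", "::1"}  # the firewall asking itself


def reply_type_shares(records: Iterable[Tuple[str, str]],
                      exclude_localhost: bool = True) -> Dict[str, float]:
    """Percentage of each reply type, optionally ignoring the firewall's own queries."""
    counts = Counter(
        reply_type
        for client_ip, reply_type in records
        if not (exclude_localhost and client_ip in LOCALHOST)
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {reply_type: round(100 * n / total, 1) for reply_type, n in counts.items()}
```

Computing the shares with and without the localhost records makes it obvious how much the firewall's own "resolver" traffic skews the client numbers.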
Even if my definition is wrong, based solely on the observation of what goes upstream and from whom, I'm really OK with the way it is working.
Thanks; even though you may not realize it, a couple of things you said gave me some clues about things to look at. I needed that sounding board. Much appreciated.