Caching of NXDOMAIN

Jesper 1

In my DNS log and in the pfBlocker reports I'm getting a lot of reverse lookups, mainly for 77.90.x.x but also for other IPs, that result in NXDOMAIN. The log says they have a TTL of 3600 sec (might be because I configured that as the minimum TTL), but the resolver responds every time, even if just a couple of seconds have passed since the last lookup. How do I get the DNS resolver to cache these responses and not do a new lookup every time?

johnpoz

@Jesper-1 not sure what you're asking - why wouldn't the resolver answer the client because it was NX?

The negative answer should be cached for the length sent by the SOA, or whatever your min ttl was set to.. Or for whatever you had set as a cache-max-negative-ttl value. I don't believe that parameter is exposed in the GUI, but you could always set it in the custom options.

But if I query for example www.lgsjldjhlsjfsfd.com, the NS for .com would respond NX, with the .com SOA in the authority section..

; <<>> DiG 9.16.45 <<>> @192.168.9.253 www.lsjdlsdjfs.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 18839
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;www.lsjdlsdjfs.com.            IN      A

;; AUTHORITY SECTION:
com.                    3600    IN      SOA     a.gtld-servers.net. nstld.verisign-grs.com. 1701515704 1800 900 604800 86400

;; Query time: 13 msec
;; SERVER: 192.168.9.253#53(192.168.9.253)
;; WHEN: Sat Dec 02 05:15:29 Central Standard Time 2023
;; MSG SIZE  rcvd: 120

From that response there would be a min ttl, but you overrode it with yours.. The min ttl in the SOA was originally the default ttl for records under the SOA that don't have their own specific ttl set. These days it is used as the neg ttl, ie how long to cache a NX..
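As a rough sketch of how those values interact: RFC 2308 defines the negative-cache TTL as the smaller of the SOA record's own TTL and its MINIMUM field, and a resolver can then clamp that locally. The function below is a simplified model, not Unbound's actual code. The parameter names mirror the cache-min-ttl and cache-max-negative-ttl options, the order in which the clamps are applied is an assumption, and the upstream SOA TTL of 900 is a hypothetical input (the dig output above only shows the value after the local minimum was applied).

```python
def negative_ttl(soa_ttl, soa_minimum, cache_min_ttl=0, cache_max_negative_ttl=3600):
    """Simplified model of the TTL given to a cached NXDOMAIN (assumed clamp order)."""
    ttl = min(soa_ttl, soa_minimum)          # RFC 2308: smaller of SOA TTL and MINIMUM
    ttl = min(ttl, cache_max_negative_ttl)   # local ceiling for negative answers
    ttl = max(ttl, cache_min_ttl)            # local floor (the "min ttl" setting)
    return ttl

# Hypothetical upstream SOA TTL of 900; MINIMUM of 86400 as in the SOA shown above.
print(negative_ttl(900, 86400))                       # no local floor configured
print(negative_ttl(900, 86400, cache_min_ttl=3600))   # local floor of 3600
```

With a floor of 3600 the model returns 3600, which lines up with the TTL in the dig answer above.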

That NX response would be cached, and if a client asks again, then that NX would be served from the cache.. Are you saying that unbound constantly tries to resolve it, and it's not being served from the cache?

So for example in the above see how ttl returned is 3600 (I too have min ttl set to 3600) but if I ask it a little bit later notice the ttl has gone down, and notice the query time is only 2 ms, ie it was served to me (the client) from cache.

; <<>> DiG 9.16.45 <<>> @192.168.9.253 www.lsjdlsdjfs.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 24497
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;www.lsjdlsdjfs.com.            IN      A

;; AUTHORITY SECTION:
com.                    2872    IN      SOA     a.gtld-servers.net. nstld.verisign-grs.com. 1701515704 1800 900 604800 86400

;; Query time: 2 msec
;; SERVER: 192.168.9.253#53(192.168.9.253)
;; WHEN: Sat Dec 02 05:27:37 Central Standard Time 2023
;; MSG SIZE  rcvd: 120
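A quick sanity check of that countdown, using only the timestamps and TTLs from the two dig outputs above: if the second answer came from cache, the drop in TTL should equal the wall-clock time between the two queries. A minimal Python sketch:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# WHEN lines from the two dig runs above (same time zone for both).
first_query = datetime.strptime("2023-12-02 05:15:29", FMT)
second_query = datetime.strptime("2023-12-02 05:27:37", FMT)

# TTLs shown on the SOA line in the AUTHORITY section of each answer.
first_ttl, second_ttl = 3600, 2872

elapsed = int((second_query - first_query).total_seconds())
ttl_drop = first_ttl - second_ttl

print(f"elapsed: {elapsed}s, ttl drop: {ttl_drop}s, match: {elapsed == ttl_drop}")
```

Both numbers come out the same, which is what you would expect when the second reply is the cached entry aging rather than a fresh lookup.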

If you have a client that keeps asking for something, unbound will answer and log it, even if that answer is NX. When you have a client that does not have a local cache, you can see it bombing your NS over and over again, even if the response was NX.. You see this a lot in IoT devices and such that do not run a local cache.. With any response, be it an IP or NX, the client shouldn't ask the NS again until the TTL the NS responded with expires.. But this is not the case for many a client.. So yeah, your NS logs can be filled with constant asks..
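To illustrate what a well-behaved client would do, here is a minimal sketch of a client-side cache that remembers any answer, NXDOMAIN included, until its TTL runs out. The class name and its methods are hypothetical and only for illustration; this is not how any particular stub resolver is implemented.

```python
import time

class TtlCache:
    """Hypothetical client-side cache: keeps answers (including NXDOMAIN) until the TTL expires."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # (name, rrtype) -> (expires_at, answer)

    def put(self, name, rrtype, answer, ttl):
        self._entries[(name, rrtype)] = (self._clock() + ttl, answer)

    def get(self, name, rrtype):
        entry = self._entries.get((name, rrtype))
        if entry is None:
            return None                       # never asked: go to the NS
        expires_at, answer = entry
        if self._clock() >= expires_at:
            del self._entries[(name, rrtype)]
            return None                       # TTL ran out: ask the NS again
        return answer                         # still valid: no query needed

# Fake clock so the example does not have to wait an hour.
now = [0.0]
cache = TtlCache(clock=lambda: now[0])
cache.put("4.3.2.1.in-addr.arpa.", "PTR", "NXDOMAIN", ttl=3600)

now[0] = 6.0
print(cache.get("4.3.2.1.in-addr.arpa.", "PTR"))   # still cached
now[0] = 3600.0
print(cache.get("4.3.2.1.in-addr.arpa.", "PTR"))   # expired
```

A client without something like this sends every lookup to the NS, which is what fills the logs.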

PTRs quite often have not been set by the owners of the IPs, so yeah, if something really wants the PTR for some IP and it doesn't exist, and it keeps asking, unbound will keep answering, even if that answer is NX.

example of a PTR.. so this is NX

; <<>> DiG 9.16.45 <<>> -x 1.2.3.4 @192.168.9.253
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 64130
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;4.3.2.1.in-addr.arpa.          IN      PTR

;; AUTHORITY SECTION:
1.in-addr.arpa.         3600    IN      SOA     ns.apnic.net. read-txt-record-of-zone-first-dns-admin.apnic.net. 21462 7200 1800 604800 3600

;; Query time: 319 msec
;; SERVER: 192.168.9.253#53(192.168.9.253)
;; WHEN: Sat Dec 02 05:45:38 Central Standard Time 2023
;; MSG SIZE  rcvd: 137

If I ask again.. Sure unbound answered, but look at the query time = 1 ms, it answered from its cache. And you will notice the ttl counting down.. When you get an oddball sort of ttl, and not a typical configured value like 60, 300, 3600 or even 1 day, that is telling you it was pulled from a cache and not from an authoritative NS.. Be it your NS cache or, in the case when you forward, the cache of where you forwarded to.

; <<>> DiG 9.16.45 <<>> -x 1.2.3.4 @192.168.9.253
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 4794
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;4.3.2.1.in-addr.arpa.          IN      PTR

;; AUTHORITY SECTION:
1.in-addr.arpa.         3594    IN      SOA     ns.apnic.net. read-txt-record-of-zone-first-dns-admin.apnic.net. 21462 7200 1800 604800 3600

;; Query time: 1 msec
;; SERVER: 192.168.9.253#53(192.168.9.253)
;; WHEN: Sat Dec 02 05:45:44 Central Standard Time 2023
;; MSG SIZE  rcvd: 137
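If you want to compare two dig runs without eyeballing them, something like the following Python sketch can pull out the relevant fields. The helper name is made up for illustration, and the sample strings are trimmed copies of the two PTR outputs above; it assumes the negative answer carries a SOA line in the authority section, as these do.

```python
import re

def summarize_dig(output):
    """Extract (status, SOA TTL, query time in ms) from dig's text output."""
    status = re.search(r"status: (\w+)", output).group(1)
    soa_ttl = int(re.search(r"^\S+\s+(\d+)\s+IN\s+SOA\b", output, re.M).group(1))
    query_ms = int(re.search(r"Query time: (\d+) msec", output).group(1))
    return status, soa_ttl, query_ms

first = """;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 64130
1.in-addr.arpa.         3600    IN      SOA     ns.apnic.net. read-txt-record-of-zone-first-dns-admin.apnic.net. 21462 7200 1800 604800 3600
;; Query time: 319 msec
"""
second = """;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 4794
1.in-addr.arpa.         3594    IN      SOA     ns.apnic.net. read-txt-record-of-zone-first-dns-admin.apnic.net. 21462 7200 1800 604800 3600
;; Query time: 1 msec
"""

_, ttl1, ms1 = summarize_dig(first)
_, ttl2, ms2 = summarize_dig(second)
# A lower TTL on the repeat query is the sign of a cached answer; the faster reply supports it.
print(f"ttl {ttl1} -> {ttl2}, query time {ms1} ms -> {ms2} ms, likely cached: {ttl2 < ttl1}")
```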

Jesper 1

@johnpoz

Thanks for your extensive reply.

Yes, so I have a lot of logs from reverse resolves done by the dns server itself. I'm not exactly sure where they are coming from. My guess is from some of the blocklists in pfBlocker. When I view the logs from them it looks like this (pasted from the DNS Reply section of pfBlocker). Some of them are with the 3600 TTL and some with a super long TTL

Dec 2 16:18:52 127.0.0.1
router.<mydomain> resolver PTR PTR 185.185.90.77.in-addr.arpa 17015... NXDOMAIN unk
Dec 2 16:16:36 127.0.0.1
router.<mydomain> resolver PTR PTR 185.185.90.77.in-addr.arpa 17015... NXDOMAIN unk
Dec 2 16:10:21 127.0.0.1
router.<mydomain> resolver PTR PTR 185.185.90.77.in-addr.arpa 3600 NXDOMAIN unk
Dec 2 16:10:21 127.0.0.1
router.<mydomain> resolver PTR PTR 185.185.90.77.in-addr.arpa 17015... NXDOMAIN unk
Dec 2 15:58:13 127.0.0.1
router.<mydomain> resolver PTR PTR 185.185.90.77.in-addr.arpa 17015... NXDOMAIN unk

But if I do like you did, and run "dig -x 185.185.90.77.in-addr.arpa @192.168.6.1" from an Ubuntu VM on the same network, it works and it caches the data. Same behavior here, some with a 3600 TTL and some with a really long TTL:

Dec 2 17:07:57 192.168.6.20
<ubuntu-client> cache PTR PTR arpa.in-addr.77.90.185.185.in-addr.arpa 3596 NXDOMAIN unk
Dec 2 17:07:56 127.0.0.1
router.<mydomain> resolver PTR PTR 71.185.90.77.in-addr.arpa 17015... NXDOMAIN unk
Dec 2 17:07:55 192.168.6.20
<ubuntu-client> cache PTR PTR arpa.in-addr.77.90.185.185.in-addr.arpa 3598 NXDOMAIN unk
Dec 2 17:07:53 192.168.6.20
<ubuntu-client> reply PTR PTR arpa.in-addr.77.90.185.185.in-addr.arpa 3600 NXDOMAIN unk

It seems like it works and does caching when I query manually, but for the ones from the DNS resolver itself, it doesn't cache, it just keeps resolving over and over again. Because of this I only get about a 20% cache hit ratio.
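One thing worth noting about the manual test: dig's -x option expects a plain IP address and builds the reverse name itself, which would explain why the client lines above show arpa.in-addr.77.90.185.185.in-addr.arpa rather than the name seen in the resolver lines. A small Python sketch of the mapping, where the address 77.90.185.185 is inferred from the logged PTR name:

```python
import ipaddress

# Address inferred from the logged name 185.185.90.77.in-addr.arpa.
addr = ipaddress.ip_address("77.90.185.185")

# reverse_pointer builds the PTR query name for the address.
ptr_name = addr.reverse_pointer
print(ptr_name)
```

So a like-for-like manual query would presumably be "dig -x 77.90.185.185 @192.168.6.1", which asks for the same name the resolver lines show.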

johnpoz

@Jesper-1 said in Caching of NXDOMAIN:

router.<mydomain> resolver PTR PTR 185.185.90.77.in-addr.arpa 17015... NXDOMAIN unk
Dec 2 16:10:21 127.0.0.1
router.<mydomain> resolver PTR PTR 185.185.90.77.in-addr.arpa 3600 NXDOMAIN unk

So are you asking why some have 3600 and some have that 17015? Which is also cached..

;; AUTHORITY SECTION:
77.in-addr.arpa.        3600    IN      SOA     pri.authdns.ripe.net. dns.ripe.net. 1701527412 3600 600 864000 3600

That 17015... looks to be the serial number, and you're just not seeing all of it in whatever you're looking at; see the ... there. As you can see from the full SOA I posted for that IP, the min ttl is set at 3600 per the SOA.. So you get 3600 either way, be it from your min setting or from the SOA record.
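To make the field order explicit, here is a short Python sketch that splits the SOA rdata posted above into its named fields (the order follows RFC 1035; the helper name is just for illustration):

```python
from collections import namedtuple

SOA = namedtuple("SOA", "mname rname serial refresh retry expire minimum")

def parse_soa_rdata(rdata):
    """Split SOA rdata into primary NS, contact, and the five numeric fields."""
    mname, rname, *numbers = rdata.split()
    return SOA(mname, rname, *map(int, numbers))

soa = parse_soa_rdata(
    "pri.authdns.ripe.net. dns.ripe.net. 1701527412 3600 600 864000 3600"
)
print("serial :", soa.serial)
print("minimum:", soa.minimum)
print("serial starts with 17015:", str(soa.serial).startswith("17015"))
```

The serial is the first number after the two names, which fits the truncated 17015... value in the pfBlocker view.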

but for the ones from the dns resolver itself, it doesn't cache

Not sure what you're asking there.. and not really sure where you're looking at those logs? A resolver is not going to ask for random IP PTRs; a resolver would only try to resolve after some client asked it.. It would hand the TTL to whoever asked, be it an actual client or some NS that is forwarding to your resolver.

The client not adhering to the TTL is up to the client..

Jesper 1

@johnpoz
What I'm asking is why nothing gets cached? All these replies are resolver replies. If I do a dig manually, it caches, but all these replies do not get cached. How do I make the DNS resolver cache them?

I'm looking in the pfBlocker reports section, and also in the dns_reply.log

johnpoz

@Jesper-1 said in Caching of NXDOMAIN:

What I'm asking is why nothing gets cached?

Where are you seeing it's not cached? Because there was a log that something asked for something?

Unbound is going to cache the answer it gets.. You can adjust the min cache so it stores it longer, but it's going to cache answers it gets to things it asked for.. You could set NX to cache for like 1 second if you wanted to via the option I posted above.. But that is not exposed in the GUI.

You sure your unbound isn't just constantly restarting? DHCP registration can do that, and that will flush the cache.. If you're only seeing like a 20% hit ratio, that could maybe account for such a low hit rate..

You can look in your cache directly if you want via

unbound-control -c /var/unbound/unbound.conf dump_cache

I just very recently restarted unbound.. but a look at the stats_noreset

total.num.queries=1855
total.num.queries_ip_ratelimited=0
total.num.queries_cookie_valid=0
total.num.queries_cookie_client=0
total.num.queries_cookie_invalid=0
total.num.cachehits=1480

So from that, the cache was hit 1480 times out of 1855, or about 79.8 percent.
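If you want that number without doing the division by hand, a small Python sketch like this works on the key=value text that stats_noreset prints. The function name is made up for illustration, and it assumes every value in the output is numeric.

```python
def cache_hit_ratio(stats_text):
    """Compute cachehits / queries from unbound-control stats output (key=value lines)."""
    stats = {}
    for line in stats_text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep:  # skip anything that is not a key=value line
            stats[key] = float(value)
    return stats["total.num.cachehits"] / stats["total.num.queries"]

sample = """total.num.queries=1855
total.num.queries_ip_ratelimited=0
total.num.cachehits=1480"""

print(f"cache hit ratio: {cache_hit_ratio(sample):.1%}")
```

You could feed it the saved output of the stats_noreset command instead of the pasted sample.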

What exactly is not getting cached? If you query unbound for something and then ask it again, are you saying the ttl is not going down? A ttl that counts down indicates it served you from cache. Also, as in the above, if your first query for something takes like 100 ms and your next query only 2 ms, that was served from cache and not resolved..

Jesper 1

@johnpoz

The server is not restarting (even if I restarted it manually recently to change settings).

I think it is actually the pfBlocker GUI that is showing wrong.

Because when I look in the GUI-Top Reply Type it says 22% cache and 75% from the resolver
When I compare that to the GUI-Top Reply DST IP it says 71% is NXDOMAIN

So I just assumed these were not cached because when I went into the Reports - DNS reply all these NXDOMAIN replies were from the resolver and not from the cache. Like I said before, if I did a dig command from a VM manually for the same IPs, those logs showed cache.

Though if I run the command:
unbound-control -c /var/unbound/unbound.conf stats_noreset | grep total.num

I actually get a very high cache rate. (79%)

total.num.queries=26750
total.num.queries_ip_ratelimited=0
total.num.queries_cookie_valid=0
total.num.queries_cookie_client=0
total.num.queries_cookie_invalid=0
total.num.cachehits=21199
total.num.cachemiss=5551
total.num.prefetch=3823
total.num.queries_timed_out=0
total.num.expired=2430

johnpoz

@Jesper-1 said in Caching of NXDOMAIN:

When I compare that to the GUI-Top Reply DST IP it says 71% is NXDOMAIN

A breakdown of what answers were found for what was asked has little to do with whether that answer was actually resolved or served from cache.

You could have 0 or 100% cache hits. That really wouldn't have anything to do with whether they all had answers or all were NX.

The info there like you provided direct from unbound, is the info you would want to look at to know how much was answered from cache by unbound, and how much was not.

How to interpret what pfblocker might be saying I am not sure - I don't use pfblocker to block any dns, I use it to create aliases that I use in my rules. Sorry. Unbound is the resolver - to know your cache hit or miss rate, you should look to the stats directly from unbound.

Keep in mind any sort of stats on NX can be skewed, depending even on your settings for how to respond. For example I block some stuff directly in unbound to respond with NX. Even if said thing might resolve to something, unbound returns NX.