DNS Resolver Infrastructure Cache Speed



  • Does anyone know how to read all the statistics on the Status > DNS Resolver page? Is there some documentation on this? (https://pfsense.url/status_unbound.php)
    I'm having a DNS issue and now I'm wondering whether that page can help me or not.


  • Rebel Alliance Global Moderator

    Are you seeing timeouts?

    The infra cache is just information that lets unbound know which NS it should talk to for a specific domain or TLD. It keeps a record of every NS it talks to, how fast each one responds, etc.

    What is the problem you're having with DNS?

    While the info in the infra cache can be quite useful in troubleshooting a specific domain, in general, no, it's not the first place to look when you have a DNS issue. For starters: is it that nothing works, or is it only specific domains or a specific FQDN?

    We can for sure help you figure out what your problem is - but need some details.



  • I'm experiencing a strange issue where the users are presented with an unresolvable address sometimes. This can happen on any name, at any time and will recover a minute or so later.
    The browser reports a DNS failure with an 'unresolvable' error on something silly like 'microsoft.com' or so. Wait a few moments, try again, and it'll work. When the issue occurs, you can query from the command line (or in pfSense itself) with nslookup and you'll get a correct answer immediately. I have no idea how to tackle or analyze this issue.



  • Hi,

    Play a little bit with a tool like https://www.grc.com/dns/benchmark.htm - see if something pops up.

    Just a wild guess: even if you use the Resolver, something upstream (typically your ISP) could intercept DNS requests "and do their thing". So, play a little bit with another 'sure' source like DNS over TLS, which will take your ISP out of the equation.


  • Rebel Alliance Global Moderator

    So in such a case, where you have random issues with resolving, then sure, the infra cache could give you some insight. When you have a problem, look at the domain that had the problem, let's say microsoft.com, and look at the NS for it.

    You can get better info than from looking at the whole cache by just asking unbound how it would look that name up, which NS it would talk to, etc. For example, let's use www.microsoft.com:

    [2.4.4-RELEASE][root@sg4860.local.lan]/root: unbound-control -c /var/unbound/unbound.conf lookup www.microsoft.com
    The following name servers are used for lookup of www.microsoft.com.
    ;rrset 1955 4 0 2 0
    microsoft.com.  88355   IN      NS      ns3.msft.net.
    microsoft.com.  88355   IN      NS      ns1.msft.net.
    microsoft.com.  88355   IN      NS      ns2.msft.net.
    microsoft.com.  88355   IN      NS      ns4.msft.net.
    ;rrset 56823 1 0 8 0
    ns4.msft.net.   143223  IN      A       208.76.45.53
    ;rrset 56823 1 0 8 0
    ns4.msft.net.   143223  IN      AAAA    2620:0:37::53
    ;rrset 56823 1 0 8 0
    ns2.msft.net.   143223  IN      A       208.84.2.53
    ;rrset 56823 1 0 8 0
    ns2.msft.net.   143223  IN      AAAA    2620:0:32::53
    ;rrset 8342 1 0 1 0
    ns1.msft.net.   94742   IN      A       208.84.0.53
    ;rrset 8342 1 0 1 0
    ns1.msft.net.   94742   IN      AAAA    2620:0:30::53
    ;rrset 8342 1 0 1 0
    ns3.msft.net.   94742   IN      A       193.221.113.53
    ;rrset 8342 1 0 1 0
    ns3.msft.net.   94742   IN      AAAA    2620:0:34::53
    Delegation with 4 names, of which 0 can be examined to query further addresses.
    It provides 8 IP addresses.
    2620:0:34::53           expired, rto 146608736 msec, tA 0 tAAAA 0 tother 0.
    193.221.113.53          expired, rto 146608736 msec, tA 0 tAAAA 0 tother 0.
    2620:0:30::53           expired, rto 146608736 msec, tA 0 tAAAA 0 tother 0.
    208.84.0.53             expired, rto 146608736 msec, tA 0 tAAAA 0 tother 0.
    2620:0:32::53           expired, rto 146608736 msec, tA 0 tAAAA 0 tother 0.
    208.84.2.53             expired, rto 146608736 msec, tA 0 tAAAA 0 tother 0.
    2620:0:37::53           expired, rto 146608736 msec, tA 0 tAAAA 0 tother 0.
    208.76.45.53            expired, rto 146608736 msec, tA 0 tAAAA 0 tother 0.
    [2.4.4-RELEASE][root@sg4860.local.lan]/root: 
    

    Are you sure unbound is not just restarting a lot, and sometimes you ask it when it's not really up? Check the log to see if it is restarting a lot. DHCP registrations can do that.

    Another thing: take a look at your cache hits with the stats_noreset command:

    unbound-control -c /var/unbound/unbound.conf stats_noreset

    Take a look at these counters:

    total.num.queries=153681
    total.num.queries_ip_ratelimited=0
    total.num.cachehits=134493
    total.num.cachemiss=19188
    total.num.prefetch=93624
    total.num.zero_ttl=100855
    

    What is your cache hit %? If it's really low, try enabling zero_ttl (serving expired records) and prefetch to raise it. When your cache hit rate is low, you're having to resolve whenever someone asks for something, and depending on where that domain's authoritative NS is, how well it performs, etc., you may see a slight delay on lookups. Maybe, as mentioned, your ISP does mess with your queries? Maybe you're just on a high-latency line like a sat connection or something, in which case resolving might not be the best solution.

    Turning prefetch on allows unbound to refresh a record when someone asks for it and the remaining TTL is 10% or less of the full TTL. This way the TTL of a record that is asked for regularly should never expire, so that record never has to be resolved in full while a client waits.

    Also, allowing for zero_ttl means that if client 1 asks for something and the TTL has actually hit 0, they will still get an answer from the cache, and unbound will update the record by resolving it in the background.
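
    The two behaviours described above can be sketched as a small decision function. This is a simplified model in Python, not unbound's actual code: the name `cache_decision` and its parameters are made up for illustration, and the 10% threshold is simply the figure mentioned above.

```python
def cache_decision(remaining_ttl, original_ttl, prefetch=True, serve_expired=True):
    """Return (answer_from_cache, background_refresh) for one cached record.

    Simplified illustration only; the real resolver logic is more involved.
    """
    if remaining_ttl > 0:
        # Record is still valid, so it is always answered from the cache.
        # With prefetch on, a refresh starts once 10% or less of the TTL is left.
        refresh = prefetch and remaining_ttl <= 0.10 * original_ttl
        return True, refresh
    if serve_expired:
        # TTL hit 0: answer from the stale entry and resolve in the background.
        return True, True
    # TTL hit 0 and stale answers are not allowed: the client has to wait
    # for a full resolution, so nothing comes from the cache.
    return False, False


for ttl_left in (3600, 300, 0):
    print(ttl_left, cache_decision(ttl_left, 3600))
```

    With both options off, every record that reaches TTL 0 costs the next client a full resolution, which is where a slow or far-away authoritative NS becomes noticeable.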

    Notice my cache hit rate is 87.5% (134493 of 153681 queries), which means that about 87 out of 100 times something is asked for, it's just served up from unbound's cache vs having to be resolved on the fly for that specific query.
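
    The hit rate can be worked out from the counters shown above. A minimal sketch in Python, assuming the output of stats_noreset is available as plain key=value lines; the helper names `parse_unbound_stats` and `cache_hit_percent` are illustrative only.

```python
def parse_unbound_stats(text):
    """Parse 'key=value' lines such as those printed by stats_noreset."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if not sep:
            continue  # skip blank or malformed lines
        try:
            stats[key] = float(value)
        except ValueError:
            continue  # skip values that are not numbers
    return stats


def cache_hit_percent(stats):
    """Cache hits as a percentage of all queries (0.0 if there were none)."""
    queries = stats.get("total.num.queries", 0)
    if queries == 0:
        return 0.0
    return 100.0 * stats.get("total.num.cachehits", 0) / queries


# The counters quoted in this post.
SAMPLE = """\
total.num.queries=153681
total.num.queries_ip_ratelimited=0
total.num.cachehits=134493
total.num.cachemiss=19188
total.num.prefetch=93624
total.num.zero_ttl=100855
"""

print(f"cache hit rate: {cache_hit_percent(parse_unbound_stats(SAMPLE)):.1f}%")
# prints: cache hit rate: 87.5%
```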

    What do you show for ping times in the status page for the NS? Are many of them really high? As I already asked, are you seeing any timeouts, etc. in that status page?

    This should be helpful in understanding the details given and how unbound determines which NS to ask most often, etc.:
    https://nlnetlabs.nl/documentation/unbound/info-timeout/

    And yeah, troubleshooting odd DNS problems can be a bit more involved than just looking up what is causing the error code in your log ;) And it's prob going to be a bit of a learning curve for you as well.

    Also, no offense to Gertjan, but the tool linked to is not much more than a shiny toy for users to play with. It checks the caches of forwarders and not much more than that: it tells you how fast forwarder X answers vs forwarder Y. That has nothing really to do with running your own resolver. It might be something Billy would use to determine whether it's better to use 8.8.8.8 or his ISP's DNS. Other than that it is pretty useless.

    If your problem is that your ISP is messing with your DNS queries, then sure, you could put your queries inside a tunnel, or use a VPN to do your queries through. But to be honest, if your ISP is messing with your resolving, it's prob best to look for a new ISP ;) Now if they want to "pretend" they are doing you favors with what they serve up from their own DNS, no biggy: you don't have to use them. You're resolving, not forwarding!



  • @johnpoz said in DNS Resolver Infrastructure Cache Speed:

    Also, no offense to Gertjan, but the tool linked to is not much more than a shiny toy for users to play with..

    No offense taken. I used the word "play" ;)
    And I was indirectly suggesting another approach to the question:
    How many times per minute/hour/day does unbound restart?
    (The DNS Resolver, unbound, has a log, so checking is easy.)
    Are big DNS-consuming packages installed? (== the startup delay gets longer ...)
    Every time unbound starts, it goes offline for xxx ms, or even seconds, maybe more. The issue could be explained like that; it has been seen before.

    For example: a DHCP lease time of 7200 seconds and a small hundred devices, with the option "Register DHCP leases in the DNS Resolver" checked. Add some tools like pfBlockerNG (loaded to the attic) and see what happens: no more DNS. unbound does only one thing: start up, stop, start up again ...
    This is a real example taken from this forum.
    (Yeah, right: a DHCP lease time of 7200 secs ... don't ask me ...)
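
    Counting restarts in the log can be scripted. A rough sketch in Python, under two assumptions that should be checked against your own log: that the resolver log has been exported to plain text, and that each start-up writes a line containing the text "start of service". The function name `find_restarts` and the sample lines are made up for illustration.

```python
def find_restarts(log_lines, marker="start of service"):
    """Return the log lines that contain the start-up marker.

    The default marker is an assumption; adjust it to whatever your
    own resolver log prints when unbound starts.
    """
    return [line.rstrip("\n") for line in log_lines if marker in line]


# Made-up sample lines, only to show the idea.
SAMPLE_LOG = [
    "10:15:02 unbound: info: service stopped\n",
    "10:15:03 unbound: info: start of service\n",
    "10:15:40 unbound: info: service stopped\n",
    "10:15:41 unbound: info: start of service\n",
]

restarts = find_restarts(SAMPLE_LOG)
print(len(restarts), "restarts found")
for line in restarts:
    print(line)
```

    If the restart times line up with DHCP lease renewals, or with the moments users see the unresolvable error, that points at the restarts rather than at the upstream name servers.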



  • Thanks for the advice on how to analyze the issue.
    It might be a difficult one because the issue has always almost instantly gone away when I wanted to diagnose it. The reference to the DHCP registration is a good one, I'll take a look at the logging to see if that correlates.
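
    Since the failure disappears before it can be examined, one option is to leave a small probe running that records when lookups fail, so the timestamps can be compared with the resolver log afterwards. A minimal sketch in Python; the names `resolves` and `probe` are hypothetical, it uses the operating system's resolver via `socket.getaddrinfo`, and any caching on the client could hide short outages.

```python
import socket
import time
from datetime import datetime


def resolves(name, resolver=socket.getaddrinfo):
    """Return True if the name resolves through the system resolver."""
    try:
        resolver(name, None)
        return True
    except socket.gaierror:
        return False


def probe(names, rounds, interval, check=resolves, sleep=time.sleep, now=datetime.now):
    """Check every name once per round; return (timestamp, name) per failure."""
    failures = []
    for i in range(rounds):
        for name in names:
            if not check(name):
                failures.append((now().isoformat(timespec="seconds"), name))
        if i < rounds - 1:
            sleep(interval)  # wait before the next round
    return failures


# Example use (does real lookups, so it is left commented out here):
# for stamp, name in probe(["microsoft.com"], rounds=360, interval=10):
#     print(stamp, name)
```

    Run from a client on the LAN, this gives a list of failure times that can be matched against unbound restarts in the log.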



  • @aslatius said in DNS Resolver Infrastructure Cache Speed:

    The reference to the DHCP registration

    It works like this: with "Register DHCP leases in the DNS Resolver" checked, every time a DHCP lease comes in, unbound is restarted.