Netgate Discussion Forum

    Serve Expired - Clarification :)

    DHCP and DNS
    • T
      Taz79
      last edited by Taz79

      Hello!

      I have some questions about the "Serve Expired" feature. It might be basic DNS knowledge, but I have tried googling it without finding much about exactly how it is handled.

      dd4e7b07-be5d-4d18-83f6-ca0fc02279a5-image.png

      After 2 days my DNS statistics look like this:

      unbound-control -c /var/unbound/unbound.conf stats_noreset | grep total.num
      total.num.queries=131025
      total.num.queries_ip_ratelimited=0
      total.num.cachehits=127505
      total.num.cachemiss=3520
      total.num.prefetch=15811
      total.num.zero_ttl=15027
      total.num.recursivereplies=3520
      

      With Cache count:

      unbound-control -c /var/unbound/unbound.conf stats_noreset |grep cache.count
      msg.cache.count=4722
      rrset.cache.count=10998
      infra.cache.count=3620
      key.cache.count=680
      

      From this I gather that more than 10% of the queries (15027 of 131025, about 11.5%) end up being answered from a DNS entry which has TTL=0. There are very few cache misses: only 2.7% (3520 of 131025).
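The percentages above can be reproduced from the raw counters. This is a minimal sketch in Python, assuming the `stats_noreset` output has been captured as text. The helper names `parse_stats` and `share` are made up for illustration and are not part of Unbound.

```python
def parse_stats(text):
    """Parse 'key=value' lines (as printed by unbound-control) into a dict."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep:  # skip lines that are not key=value pairs
            stats[key] = float(value)
    return stats


def share(stats, counter):
    """Return a counter as a percentage of total.num.queries."""
    return 100.0 * stats[counter] / stats["total.num.queries"]


# Counters copied from the first post.
SAMPLE = """\
total.num.queries=131025
total.num.queries_ip_ratelimited=0
total.num.cachehits=127505
total.num.cachemiss=3520
total.num.prefetch=15811
total.num.zero_ttl=15027
total.num.recursivereplies=3520
"""

stats = parse_stats(SAMPLE)
for name in ("total.num.cachehits", "total.num.cachemiss", "total.num.zero_ttl"):
    print(f"{name}: {share(stats, name):.2f}% of queries")
```

Note that zero_ttl replies are also counted as cache hits, so the two percentages overlap rather than add up.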

      So my questions are:
      How does Serve Expired work? Will the records stay in the DNS cache with TTL=0 forever until they get a hit again? Or will the TTL=0 entries eventually be purged by some setting?

      This is what I have found in the Unbound documentation regarding the different statistics:

      num.queries
      number of queries received by thread

      num.cachehits
      number of queries that were successfully answered using a cache
      lookup

      num.cachemiss
      number of queries that needed recursive processing

      num.prefetch
      number of cache prefetches performed. This number is included
      in cachehits, as the original query had the unprefetched answer
      from cache, and resulted in recursive processing, taking a slot
      in the requestlist. Not part of the recursivereplies (or the
      histogram thereof) or cachemiss, as a cache response was sent.

      num.zero_ttl
      number of replies with ttl zero, because they served an expired
      cache entry.

      num.recursivereplies
      The number of replies sent to queries that needed recursive
      processing. Could be smaller than threadX.num.cachemiss if due to
      timeouts no replies were sent for some queries.

      msg.cache.count
      The number of items (DNS replies) in the message cache.

      rrset.cache.count
      The number of RRsets in the rrset cache. This includes rrsets
      used by the messages in the message cache, but also delegation
      information.

      infra.cache.count
      The number of items in the infra cache. These are IP addresses
      with their timing and protocol support information.

      key.cache.count
      The number of items in the key cache. These are DNSSEC keys,
      one item per delegation point, and their validation status.

      • T
        Taz79
        last edited by

        I have been monitoring this since yesterday and I cannot see the cache.count values declining at all. So it seems all the TTL=0 records stay in the cache?

        [2.4.4-RELEASE][admin@Fenix.localdomain]/root: unbound-control -c /var/unbound/unbound.conf stats_noreset | egrep 'total.num|cache.count'

        15/4-2019 10:30
        total.num.queries=138229
        total.num.queries_ip_ratelimited=0
        total.num.cachehits=134153
        total.num.cachemiss=4076
        total.num.prefetch=17233
        total.num.zero_ttl=16396
        total.num.recursivereplies=4076
        msg.cache.count=5893
        rrset.cache.count=13071
        infra.cache.count=4319
        key.cache.count=884
        
        15/4-2019  23:11
        total.num.queries=178540
        total.num.queries_ip_ratelimited=0
        total.num.cachehits=173816
        total.num.cachemiss=4724
        total.num.prefetch=23519
        total.num.zero_ttl=22422
        total.num.recursivereplies=4724
        msg.cache.count=6518
        rrset.cache.count=13949
        infra.cache.count=4848
        key.cache.count=957
        
        16/4-2019 08:11
        total.num.queries=203688
        total.num.queries_ip_ratelimited=0
        total.num.cachehits=198712
        total.num.cachemiss=4976
        total.num.prefetch=25949
        total.num.zero_ttl=24683
        total.num.recursivereplies=4976
        msg.cache.count=6774
        rrset.cache.count=14133
        infra.cache.count=5119
        key.cache.count=961
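To make the trend across the three snapshots easier to read, the changes between consecutive samples can be computed. This is a small Python sketch using a subset of the counters shown above; the function name `deltas` is hypothetical.

```python
# A subset of the counters from the three snapshots above.
SNAPSHOTS = [
    ("15/4-2019 10:30", {"total.num.queries": 138229, "total.num.zero_ttl": 16396,
                         "msg.cache.count": 5893, "rrset.cache.count": 13071}),
    ("15/4-2019 23:11", {"total.num.queries": 178540, "total.num.zero_ttl": 22422,
                         "msg.cache.count": 6518, "rrset.cache.count": 13949}),
    ("16/4-2019 08:11", {"total.num.queries": 203688, "total.num.zero_ttl": 24683,
                         "msg.cache.count": 6774, "rrset.cache.count": 14133}),
]


def deltas(snapshots):
    """Return [(label, {counter: change since the previous snapshot})]."""
    result = []
    for (_, prev), (label, cur) in zip(snapshots, snapshots[1:]):
        result.append((label, {key: cur[key] - prev[key] for key in cur}))
    return result


for label, change in deltas(SNAPSHOTS):
    print(label)
    for key, value in change.items():
        print(f"  {key}: {value:+d}")
```

Both cache counts rise in every interval, which matches the observation that nothing appears to be purged during the monitoring period.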
        
        • T
          Taz79
          last edited by

          I found some more configuration options for serve-expired, and these parameters explain it all. The TTL=0 entries will stay in the cache if these options are not set (serve-expired-ttl defaults to 0, which means no limit). That is what I was looking for. :) Case closed! :)

             serve-expired-ttl: <seconds>
                    Limit serving of expired responses to configured seconds after
                    expiration. 0 disables the limit. This option only applies when
                    serve-expired is enabled. The default is 0.
          
             serve-expired-ttl-reset: <yes or no>
                    Set the TTL of expired records to the serve-expired-ttl value
                    after a failed attempt to retrieve the record from upstream.
                    This makes sure that the expired records will be served as long
                    as there are queries for it. Default is "no".
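The serve-expired-ttl rule quoted above can be written down as a small decision function. This is only a Python sketch of the documented behaviour, not Unbound's code. The function name `may_serve_expired` is invented here, treating the exact boundary second as still servable is an assumption, and serve-expired-ttl-reset is not modelled.

```python
def may_serve_expired(seconds_since_expiry, serve_expired_ttl=0):
    """Decide whether an expired record may still be served.

    serve_expired_ttl == 0 means "no limit" (the documented default).
    Assumption: a record expired for exactly serve_expired_ttl seconds
    is still allowed; the quoted text does not spell out the boundary.
    """
    if seconds_since_expiry < 0:
        raise ValueError("record has not expired yet")
    if serve_expired_ttl == 0:
        return True
    return seconds_since_expiry <= serve_expired_ttl


# With the default of 0, a record that expired two days ago is still served.
print(may_serve_expired(2 * 86400))
# With a one-hour limit, the same record would no longer be served.
print(may_serve_expired(2 * 86400, serve_expired_ttl=3600))
```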
          
          • C
            chrcoluk
            last edited by

            I am the one who got this feature added to pfSense.

            So basically.

            The reason it was added is that on the modern internet many mainstream services use DNS to route their traffic, and because of things like maintenance, DDoS attacks and so forth, they use extremely low TTL values so they can reroute very quickly if required.

            TTL values of 30 seconds or less are now fairly common.

            As you can imagine, having to do a new DNS lookup so often has a performance hit.

            The issue with the prefetch feature is that it only triggers if a DNS lookup arrives when less than 10% of the TTL is left. With a 30-second TTL, if you don't do another lookup within the last 3 seconds of the TTL, prefetch isn't providing you any benefit. Its operating scope is too narrow.
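The 10% rule is easy to check with a little arithmetic. This is a hedged Python sketch of the window as described in this post; the function names and the `percent` parameter are made up for illustration, and integer arithmetic is used to avoid floating-point rounding at the boundary.

```python
def prefetch_window_seconds(original_ttl, percent=10):
    """Length of the window at the end of the TTL in which a query triggers a prefetch."""
    return original_ttl * percent / 100


def triggers_prefetch(original_ttl, remaining_ttl, percent=10):
    """True if a query arriving with remaining_ttl seconds left falls in the window."""
    return 0 < remaining_ttl and remaining_ttl * 100 < original_ttl * percent


print(prefetch_window_seconds(30))     # 3.0 seconds for a 30-second TTL
print(triggers_prefetch(30, 2))        # True: inside the last 3 seconds
print(triggers_prefetch(30, 5))        # False: too early, no prefetch
print(prefetch_window_seconds(3600))   # 360.0 seconds for a one-hour TTL
```

For long TTLs the window is generous, but for a 30-second record a client has to query again within a 3-second slot, which is why prefetch alone helps little with short-lived records.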

            So Unbound implemented serve expired. When a record expires, it stays in the cache with a TTL value of 0. If another lookup comes in from the LAN (or from whatever networks your Unbound is serving), the expired record is served from the cache for performance. At the same time, Unbound initiates a new lookup to the authoritative server, so that a later lookup will be served a newer record.

            So it is important to note that the same expired record isn't served forever. It is only served once, then a new one is fetched.
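The behaviour described above can be illustrated with a toy cache. This is a simplified Python model under stated assumptions, not how Unbound is implemented: the class `ToyServeExpiredCache` and the fake resolver are invented for this example, the refresh happens synchronously right after the stale answer is chosen, and the addresses come from the documentation range 192.0.2.0/24.

```python
class ToyServeExpiredCache:
    """Toy model of serve-expired: answer stale with TTL 0, then refresh."""

    def __init__(self, resolve):
        self.resolve = resolve   # callable: name -> (address, ttl_seconds)
        self.entries = {}        # name -> (address, expires_at)

    def _refresh(self, name, now):
        address, ttl = self.resolve(name)
        self.entries[name] = (address, now + ttl)
        return self.entries[name]

    def lookup(self, name, now):
        """Return (address, ttl_in_reply, source) for a client query at time `now`."""
        entry = self.entries.get(name)
        if entry is None:
            address, expires_at = self._refresh(name, now)
            return address, expires_at - now, "miss"
        address, expires_at = entry
        if now < expires_at:
            return address, expires_at - now, "hit"
        # Expired: reply with the old data and TTL 0, and fetch a new record.
        self._refresh(name, now)
        return address, 0, "expired"


# Fake upstream that hands out a different address on the second resolution.
answers = iter([("192.0.2.1", 30), ("192.0.2.2", 30)])
cache = ToyServeExpiredCache(lambda name: next(answers))

print(cache.lookup("example.com", now=0))    # first query: cache miss
print(cache.lookup("example.com", now=10))   # fresh cache hit
print(cache.lookup("example.com", now=100))  # expired: old address, TTL 0
print(cache.lookup("example.com", now=101))  # refreshed record is served
```

The third lookup returns the old address with TTL 0, and the fourth already sees the refreshed record, which is the "served once, then a new one is fetched" behaviour.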

            Newer versions of Unbound allow this to be tuned further, and the good news is that the latest stable build of pfSense ships the newer version (it was updated for security reasons). I am considering getting another commit done to take advantage of it. For example, if you are uncomfortable serving a cached record that might have been sitting there for a day, the more granular controls now available let you set an effective expiry on the cached record itself. I will see if I can get that field added to the UI as well.

            pfSense CE 2.7.2

            • C
              chrcoluk
              last edited by

              Sorry for the typos above, I cannot edit the post to fix them.

              Also, to clarify: there is a reason this is off by default. As you can imagine, it is down to the admin whether they are OK with records being served from the cache after they have expired upstream :)

              I tried to make the description in pfSense as understandable as possible whilst keeping it as short as possible, so it wasn't bloating the interface.


              Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.