Netgate Discussion Forum

    Unbound cache hit rate is anaemic

    DHCP and DNS
    28 Posts 7 Posters 6.1k Views
    • Gertjan @stevogas

      @stevogas said in Unbound cache hit rate is anaemic:

      Something is causing Unbound to restart,

      pfBlocker could do so.
      So could new DHCP leases (not the static ones, as they are static), but you took care of that.
      Other reasons exist, such as (uplink) interfaces going bad: dpinger notices this and restarts all packages that are interface-related.
      unbound is one of them.
      See the main system log for package restart messages and their reasons.

      @stevogas said in Unbound cache hit rate is anaemic:

      Oct 15 12:18:46 pfSense unbound: [38494:0] info: control cmd: stats_noreset

      This command is sent to unbound to collect stats.
      It means (see the manual): "collect the stats and don't reset them to zero".

      @stevogas said in Unbound cache hit rate is anaemic:

      update unbound every 3 hours

      Even for files that get a new version only once a week?
      Note that 'liveupdate' only works under specific conditions: if they are not met, a classic restart is executed.

      No "help me" PM's please. Use the forum, the community will thank you.
      Edit : and where are the logs ??

      • stevogas

        Thanks for the suggestions. After 45 hours with no service restarts, I'm at about a 38% hit rate.

        uptime: 164536 seconds
        
        total.num.queries=53491
        total.num.queries_ip_ratelimited=0
        total.num.cachehits=20382
        total.num.cachemiss=33109
        total.num.prefetch=0
        total.num.expired=0
        total.num.recursivereplies=33109
        

        I'll have to follow some of @johnpoz's clues to track down what's diluting my hit rates. I am going to add prefetching to see what effect that will have, and may consider a Pi-hole setup.
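
For reference, the hit rates quoted in this thread are simply `total.num.cachehits` divided by `total.num.queries`. Below is a minimal Python sketch of that arithmetic, using the counters from the post above. The helper names `parse_stats` and `hit_rate` are made up for illustration and are not part of unbound.

```python
def parse_stats(text):
    """Parse 'name=value' lines (unbound-control stats style) into a dict."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition("=")
        if sep:  # skip lines without '=' (e.g. blank lines)
            stats[name] = float(value)
    return stats


def hit_rate(stats):
    """Cache hits as a fraction of all queries (0.0 when there are no queries)."""
    queries = stats.get("total.num.queries", 0)
    return stats["total.num.cachehits"] / queries if queries else 0.0


# Counters copied from the post above.
SAMPLE = """\
total.num.queries=53491
total.num.cachehits=20382
total.num.cachemiss=33109
"""

stats = parse_stats(SAMPLE)
print(f"hit rate: {hit_rate(stats):.1%}")
```

For these counters the result is roughly 38%, matching the figure in the post. On a live system the same text could presumably be taken from the `stats_noreset` output shown further down the thread.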

        • stevogas @stevogas

          @stevogas said in Unbound cache hit rate is anaemic:

          I am going to add prefetching

          Edit: Turned on prefetch and qname minimization (not strict), set cache-min-ttl to 1200, and raised the msg cache size to 20m. I'll see how this does for a while.

          • johnpoz LAYER 8 Global Moderator

            @stevogas said in Unbound cache hit rate is anaemic:

            qname minimization (not strict)

            Not strict might be OK, but I can tell you for sure that strict will cause stuff not to resolve. While it's a great idea in theory, the problem is that so many domains are pretty messed up, to be honest: CNAME to CNAME to CNAME to CNAME, etc.

            Cache size is going to do nothing: you're not hitting the default, so what is making it bigger going to do? The only time you would even need to change that is if you're seeing evictions from the cache.

            You could bump the min TTL to 3600. I've been running that for a long time and have never had any issues. I hate the trend of moving to 60-second and 5-minute TTLs, it's just asinine!!!

            I would suggest you enable 0 TTL answering, if what you're wanting to do is answer stuff locally vs resolving it. What happens then is that the last known record is returned, and then the record is looked up again.

            Never had any issues with that either.

            An intelligent man is sometimes forced to be drunk to spend time with his fools
            If you get confused: Listen to the Music Play
            Please don't Chat/PM me for help, unless mod related
            SG-4860 24.11 | Lab VMs 2.8, 24.11

            • stevogas

              Unbound has been up over 45 hours.

              uptime: 164767 seconds
              

              My cache hit rate has been hovering around the mid 40% range and looks like it's settled in. That may just be the prefetch effect after my last tweaks. Still, a disappointing baseline.

              total.num.queries=43409
              total.num.queries_ip_ratelimited=0
              total.num.cachehits=19764
              total.num.cachemiss=23645
              total.num.prefetch=462
              total.num.expired=0
              total.num.recursivereplies=23645
              

              Most of the optimizations described by NLnet Labs have to do with speed, but I'm also interested in privacy and trying to keep resolving local as much as possible.

              • johnpoz LAYER 8 Global Moderator

                Well, you're really going to need to look at what is being looked up, and you might want to adjust your min TTL. As stated before, many a TTL these days is just so freaking low, for no reason I can see other than trying to up their query count.

                The only sane reason to have a 5-minute or 60-second TTL is if you were in the middle of a migration. A round robin with a TTL of 1 hour or 8 hours works just the same as a round robin with 5 minutes: you're still spreading the load across those IPs. And many things point to a CDN, which is behind load balancing anyway.

                The only reason I can see for such low TTLs is wanting to inflate the number of queries so you can charge more ;)

                • stevogas @johnpoz

                  @johnpoz said in Unbound cache hit rate is anaemic:

                  min TTL

                  Bumped from 1200 to 3600.

                  • stevogas

                    Up over 60 hrs, which is an accomplishment in itself.

                    uptime: 216706 seconds
                    

                    And a 60.9% hit rate. It reached that level after only a few hours and has been steady.

                    total.num.queries=58241
                    total.num.queries_ip_ratelimited=0
                    total.num.cachehits=35473
                    total.num.cachemiss=22768
                    total.num.prefetch=933
                    total.num.expired=0
                    total.num.recursivereplies=22768
                    

                    Thanks @johnpoz for the suggestions! We may be at the upper limit for this config.

                    • johnpoz LAYER 8 Global Moderator

                      @stevogas said in Unbound cache hit rate is anaemic:

                      total.num.expired=0

                      Turn on serve expired (0 TTL answering) and see if that bumps your rate up some.

                      • chrcoluk

                        OP, try enabling the "serve expired" option in the DNS resolver advanced settings.

                        This is mine with the option enabled.

                        root@PFSENSE ~ # unbound-control -c /var/unbound/unbound.conf stats_noreset | egrep 'total.num|cache.count'
                        total.num.queries=218959
                        total.num.queries_ip_ratelimited=0
                        total.num.cachehits=216339
                        total.num.cachemiss=2620
                        total.num.prefetch=28326
                        total.num.expired=21661
                        total.num.recursivereplies=2620
                        msg.cache.count=2333
                        rrset.cache.count=2034
                        infra.cache.count=3
                        key.cache.count=0
                        

                        pfSense CE 2.7.2

                        • stevogas

                          Thank you very much @johnpoz @chrcoluk, after 24 hrs the cache hit rate is up to over 88%!

                          total.num.queries=35176
                          total.num.queries_ip_ratelimited=0
                          total.num.cachehits=31218
                          total.num.cachemiss=3958
                          total.num.prefetch=7532
                          total.num.expired=7027
                          total.num.recursivereplies=3958
                          msg.cache.count=4324
                          rrset.cache.count=3322
                          infra.cache.count=5
                          key.cache.count=0
                          

                          @chrcoluk there is a big internet out there, don't keep going to the same sites every day 😀! (98%, wow).

                          • johnpoz LAYER 8 Global Moderator

                            @stevogas said in Unbound cache hit rate is anaemic:

                            total.num.expired=7027

                            So you must be hitting some sites less than once an hour: even with min TTL set to 3600, you have that many hits served from expired entries.

                            Doing stats on what is asked for, how often, what the normal TTLs are, how often you query, which clients the queries come from, how many users you have... it can become an obsessive pastime ;)

                            The normal, old-school practice was to never mess with TTLs, and serving up expired records could be problematic depending on what is being asked for. But to be honest, these companies have decided that a TTL of 60 or 300 seconds is fine. Which is BS if you ask me, unless what you're looking for is user data. Really, how often is the IP of some FQDN going to change? So what is the point of having the client look it up again 60 seconds later?

                            Much of the net is served up by CDNs, much of it anycast to allow for global networks and use of the same IP, etc. You cannot serve the public with 1 or 2 IPs assigned to a couple of servers, so you're behind some load balancing system, with hundreds if not thousands of servers serving the content from a pool, etc.

                            So you set up some IPs on the front end of your load balancing: how often would those change? So what is the point of a 60- or 300-second TTL? To lower the number of queries, those TTLs should be getting longer, not shorter. But hey, then we won't know how often somebody wants to go to some FQDN. And hey, company X pays for DNS by the number of queries they are getting, so why not have them query way more than they need to ;)

                            I have not run into any issues that I am aware of from setting min TTL or serving expired. Here is the thing: even when unbound serves up an expired record, the TTL on that answer is 0. So while your client will try to go to that IP, if for whatever reason it fails (IP changed) and you try to go there again, unbound has in the meantime looked up that record again in the background and would now have the new IP if it changed.

                            DNS is a fascinating protocol, and the deeper you get into it - the more interesting it becomes.

                            Have fun!

                            • tomashk

                              And if somebody is worried about serving expired records, you can always put the following into custom options:

                              server:
                              serve-expired-ttl: 86400
                              
                              • stevogas

                                Over 38 hours and hitting 91%.

                                total.num.queries=65336
                                total.num.queries_ip_ratelimited=0
                                total.num.cachehits=59497
                                total.num.cachemiss=5839
                                total.num.prefetch=14832
                                total.num.expired=13774
                                total.num.recursivereplies=5839
                                msg.cache.count=6355
                                rrset.cache.count=4728
                                infra.cache.count=5
                                key.cache.count=0
                                
                                • chrcoluk @tomashk

                                  @tomashk said in Unbound cache hit rate is anaemic:

                                  And if somebody is afraid about serving expired you can always put the following into custom options:

                                  server:
                                  serve-expired-ttl: 86400
                                  

                                  Yep, it can be fine tuned now.

                                  I will present two scenarios. Both will be rare in practice, but they are possible.

                                  Scenario: a content provider changes the IP for its services, and the old IP is dead.

                                  You, the user, have boosted the TTL to 3600 via min TTL, and the change happened 20 minutes into the TTL, so about 40 minutes remain. You hit refresh when the page fails to load, and it will stay broken for another 40 minutes or until the DNS cache is flushed.

                                  However, if you used serve expired, you would get an initial page load failure, but on refresh it will work, as the cache got refreshed in the background.

                                  So whilst neither complies with the records' TTLs, serve expired is much more elegant.

                                  In practice this will be rare, and you will just get much higher cache hit rates.
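
The two scenarios above can be sketched with a toy cache in Python. This is only an illustration of the behaviour described in the thread, not how unbound is implemented: `ToyCache`, `scenario`, the name `example.test`, the 300-second upstream TTL, and the timestamps are all assumptions picked for the example, and the refresh that unbound does in the background is done inline here.

```python
class ToyCache:
    """Toy resolver cache; an illustration only, NOT unbound's implementation."""

    def __init__(self, upstream, min_ttl=0, serve_expired=False):
        self.upstream = upstream            # dict: name -> (ip, ttl_seconds)
        self.min_ttl = min_ttl              # floor applied to upstream TTLs
        self.serve_expired = serve_expired  # answer from stale entries if True
        self.entries = {}                   # name -> (ip, expires_at)

    def _refresh(self, name, now):
        """Fetch the current record and cache it with the (raised) TTL."""
        ip, ttl = self.upstream[name]
        self.entries[name] = (ip, now + max(ttl, self.min_ttl))
        return ip

    def lookup(self, name, now):
        """Return (ip, source), where source is 'miss', 'hit' or 'expired'."""
        entry = self.entries.get(name)
        if entry is None:
            return self._refresh(name, now), "miss"
        ip, expires_at = entry
        if now < expires_at:
            return ip, "hit"
        if self.serve_expired:
            # Answer with the stale record, then refresh it. The real
            # resolver refreshes in the background; here it is done inline.
            self._refresh(name, now)
            return ip, "expired"
        return self._refresh(name, now), "miss"


def scenario(cache):
    """Provider moves from 192.0.2.1 to 192.0.2.2 at t=1200 s (hypothetical)."""
    name = "example.test"
    cache.upstream[name] = ("192.0.2.1", 300)
    answers = [cache.lookup(name, 0)]          # first visit
    cache.upstream[name] = ("192.0.2.2", 300)  # the old IP is now dead
    answers.append(cache.lookup(name, 1260))   # first page load after the move
    answers.append(cache.lookup(name, 1320))   # user hits refresh
    return answers


print("min TTL 3600: ", scenario(ToyCache({}, min_ttl=3600)))
print("serve expired:", scenario(ToyCache({}, serve_expired=True)))
```

In this sketch the min TTL cache keeps handing out the dead address on refresh, while the serve expired cache returns the stale address once and the new one on the next lookup.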

                                  I do agree with John that the modern practice of extremely low DNS TTLs is stupid. I am pretty sure it breaks the original RFC as well: it used to be that you only set a TTL that low if you were moving content to another IP, and then increased it again afterwards. This is why I pushed for serve expired to be added to the pfSense UI.

                                  • johnpoz LAYER 8 Global Moderator

                                    @chrcoluk said in Unbound cache hit rate is anaemic:

                                    it used to be to only set that low if you moving content to another ip

                                    Yup, we used to do that all the time back in the day. As you got closer and closer to zero hour for the switch, you would lower the TTL in steps. Let's say you had a TTL of 24 hours: a couple of days before, you might change it to 12 hours, then 6 later, and then 3 later, until you were down to, say, 1 minute for a short time before zero hour. You would then change your IP, and within 1 minute everyone should be using the new IP. Then you could ramp it back up, again in steps, just in case you find out something is not working and you need to change it back. You wouldn't want some clients having grabbed your 24-hour TTL, etc.

                                    You would ramp down rather than just jumping down because a sudden drop could cause a huge spike in your DNS traffic. Moving it down slowly prevents that spike, so you can be sure you can handle the number of queries with the shorter TTL.
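
As a rough illustration of the ramp-down described above, the Python sketch below works backwards from the cutover to find the latest moment each lower TTL could be published. It assumes (my assumption, not something stated in the thread) that each step waits for the previously published TTL to age out of caches before the next change. The function name `ramp_down_schedule` and the step values are hypothetical.

```python
HOUR = 3600


def ramp_down_schedule(ttls):
    """Return [(seconds_before_cutover, new_ttl), ...] for each TTL change.

    ttls[0] is the TTL currently published; each later entry is the next,
    lower TTL. Assumption: a change to ttls[k] must be made at least
    ttls[k-1] seconds before the following change (or the cutover), so
    that every cache has dropped the older, longer-lived record by then.
    """
    schedule = []
    lead = 0
    # Walk backwards from the cutover, accumulating the wait per step.
    for k in range(len(ttls) - 1, 0, -1):
        lead += ttls[k - 1]
        schedule.append((lead, ttls[k]))
    schedule.reverse()  # earliest change first
    return schedule


# 24 h -> 12 h -> 6 h -> 3 h -> 1 minute, as in the example above.
steps = [24 * HOUR, 12 * HOUR, 6 * HOUR, 3 * HOUR, 60]
for lead, ttl in ramp_down_schedule(steps):
    print(f"{lead / HOUR:5.1f} h before cutover: set TTL to {ttl} s")
```

Under that assumption the first change lands 45 hours before the cutover, which lines up with the "couple of days before" in the post.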

                                    Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.