Netgate Discussion Forum

    Slow DNS after 22.05

    DHCP and DNS
    270 Posts 31 Posters 157.8k Views
      johnpoz LAYER 8 Global Moderator @kvhs

      @kvhs said in Slow DNS after 22.05:

      preferred test to demonstrate the problem, given its intermittent nature?

      How about making sure unbound isn't restarting every 300 seconds, for starters. That is only going to cause trouble when trying to find the actual issue.

      Simple stats output could be very enlightening as to what might be going on.

      [22.05-RELEASE][admin@sg4860.local.lan]/root: unbound-control -c /var/unbound/unbound.conf stats_noreset | grep total.
      total.num.queries=66013
      total.num.queries_ip_ratelimited=0
      total.num.cachehits=52460
      total.num.cachemiss=13553
      total.num.prefetch=27990
      total.num.expired=24624
      total.num.recursivereplies=13553
      total.requestlist.avg=0.319452
      total.requestlist.max=30
      total.requestlist.overwritten=0
      total.requestlist.exceeded=0
      total.requestlist.current.all=0
      total.requestlist.current.user=0
      total.recursion.time.avg=0.086462
      total.recursion.time.median=0.0408701
      total.tcpusage=0
      [22.05-RELEASE][admin@sg4860.local.lan]/root: 
      

      You can see average recursion time, median etc..

      There are lots of things that might present themselves just from looking at the stats.
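
As a rough illustration of reading those stats, here is a minimal Python sketch that turns the `key=value` lines into numbers and derives a cache hit ratio. The sample values are copied from the output above; the helper names (`parse_stats`, `cache_hit_ratio`) are made up for this example and are not part of unbound or pfSense.

```python
# Sample lines copied from the stats_noreset output above.
SAMPLE = """\
total.num.queries=66013
total.num.cachehits=52460
total.num.cachemiss=13553
total.recursion.time.avg=0.086462
total.recursion.time.median=0.0408701
"""


def parse_stats(text):
    """Turn 'key=value' lines into a dict of floats, skipping anything else."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if not sep:
            continue  # not a key=value line (for example a shell prompt)
        try:
            stats[key] = float(value)
        except ValueError:
            continue  # value was not numeric
    return stats


def cache_hit_ratio(stats):
    """Fraction of queries answered from cache (0.0 if there were no queries)."""
    queries = stats.get("total.num.queries", 0.0)
    if queries == 0:
        return 0.0
    return stats.get("total.num.cachehits", 0.0) / queries


stats = parse_stats(SAMPLE)
ratio = cache_hit_ratio(stats)
print(f"cache hit ratio: {ratio:.1%}")
print(f"avg recursion:   {stats['total.recursion.time.avg'] * 1000:.1f} ms")
```

A resolver that keeps getting restarted starts with an empty cache each time, so a persistently low ratio together with a small query count is one hint that restarts are happening.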

      But you're saying something is causing NX in your browser. OK, do that specific query. www.facebook.com shouldn't come back NX, so why did your browser say that? Did you actually look up www.facebook.com, or was it something else?

      Let's see a dig +trace so we can see if you're having a connection issue to something in the resolve path. But again, a connection issue wouldn't cause an NX. An NX is a specific response to what you asked for: some NS saying sorry, that does not exist. That could be a root server for the TLD, a gTLD server for the domain, or the authoritative NS for the domain telling you that record does not exist.

      I don't buy that you were told www.facebook.com was NX. Maybe it was www.facbook.c0m or some other typo. If www.facebook.com came back as NX, let's see the query showing that.

      It's hard to get to the bottom of what is going on when users just say "me too", or that they have had a DNS problem since going to 22.05, with zero information on what they are doing, what they are trying to do, or what the specific failure actually is. Like I said, for all we know their browser is using DoH, or maybe they are trying to route through a VPN and that VPN is going down, or maybe something they are trying to look up is blocking their VPN connection, etc.

      There are loads of things that could be going on..

      The only thing I can say for sure is that I have seen zero issues with DNS going from 22.01 to 22.05. Zero! So if someone is having an issue, we need info to figure it out. It sure is not something generically wrong in unbound, or everyone would be seeing the issue and the board would be aflame with posts complaining that DNS broke on 22.05. Clearly that is not the case.

      An intelligent man is sometimes forced to be drunk to spend time with his fools
      If you get confused: Listen to the Music Play
      Please don't Chat/PM me for help, unless mod related
      SG-4860 24.11 | Lab VMs 2.8, 24.11

        tentpiglet @johnpoz

        @johnpoz said in Slow DNS after 22.05:

        Maybe you were doing a query for www.facebook.com.org or www.facebook.com.somethingelse.tld

        Yeah... no, it wasn't that.

          tentpiglet @Mikymike82

          @mikymike82 can you reiterate what your solution was, I've scrolled back but there's a lot of fluff in here so I can't seem to find it.

            Mikymike82 @johnpoz

            @johnpoz To answer your question (from my experience): it's not just "facebook.com", it's everything. Apps and normal websites randomly fail to resolve, and apps stop working. So it's not a specific client, website or browser; it's mobile, desktop, laptops, narrowcasting, everything that's trying to resolve an address.
            Again, my "solution" seems to resolve the problem at hand, but this is not "normal" behaviour in my opinion.

              johnpoz LAYER 8 Global Moderator @tentpiglet

              @tentpiglet Well, where did it fail? www.facebook.com is a CNAME:

              ;; ANSWER SECTION:
              www.facebook.com.       30      IN      CNAME   star-mini.c10r.facebook.com.
              star-mini.c10r.facebook.com. 30 IN      A       157.240.18.35
              

              That has a 30 second TTL. Where did you go ask after that? Are you doing strict qname minimisation? An NX is a specific response from an NS. It's not a timeout or a SERVFAIL; it's a specific response saying hey, what you're asking for doesn't exist.

              Even a typo with four w's (wwww) returns an answer, not an NX:

              ;; QUESTION SECTION:
              ;wwww.facebook.com.             IN      A
              
              ;; ANSWER SECTION:
              wwww.facebook.com.      3600    IN      CNAME   star.facebook.com.
              star.facebook.com.      3600    IN      CNAME   star.c10r.facebook.com.
              star.c10r.facebook.com. 3600    IN      A       157.240.18.15
              

              If I ask for some gibberish, then I get back NX, with the authoritative SOA for that domain in the authority section:

              ;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 33101
              ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
              
              ;; OPT PSEUDOSECTION:
              ; EDNS: version: 0, flags:; udp: 4096
              ;; QUESTION SECTION:
              ;lsjfdsldjf.facebook.com.       IN      A
              
              ;; AUTHORITY SECTION:
              facebook.com.           3600    IN      SOA     a.ns.facebook.com. dns.facebook.com. 220198737 14400 1800 604800 300
              
              

              So to troubleshoot an NX you are getting, we need to know the specifics. It's not an "oops, unbound is not running currently", or an "I'm trying to ask 1.2.3.4 but they are not answering".
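
To make the distinction concrete, here is a small Python sketch that pulls the `status:` field out of saved dig output and separates "a server said the name does not exist" from "nobody answered at all". The function names and the wording of the explanations are invented for this example; only the header line format is taken from the dig output shown above.

```python
import re

# Header line copied from the dig output above.
SAMPLE = ";; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 33101"

STATUS_RE = re.compile(r"status:\s*([A-Z]+)")

# Wording is illustrative only.
EXPLANATIONS = {
    "NOERROR": "a server answered; check the ANSWER section",
    "NXDOMAIN": "a server explicitly said the name does not exist",
    "SERVFAIL": "the resolver failed to get an answer; not the same as NX",
}


def dig_status(output):
    """Return the status from dig's ';; ->>HEADER<<-' line, or None if absent."""
    for line in output.splitlines():
        if "->>HEADER<<-" in line:
            match = STATUS_RE.search(line)
            if match:
                return match.group(1)
    return None


def classify(output):
    """Describe what a saved dig output says about the lookup."""
    status = dig_status(output)
    if status is None:
        return "no response header: timeout or unreachable server, not an NX"
    return EXPLANATIONS.get(status, f"other response code: {status}")


print(classify(SAMPLE))
print(classify(";; connection timed out; no servers could be reached"))
```

Posting the full dig output for the failing name lets everyone see which of these cases it actually is.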

                Cool_Corona @johnpoz

                @johnpoz

                total.num.queries=77
                total.num.queries_ip_ratelimited=0
                total.num.cachehits=22
                total.num.cachemiss=55
                total.num.prefetch=0
                total.num.expired=0
                total.num.recursivereplies=55
                total.requestlist.avg=0.272727
                total.requestlist.max=4
                total.requestlist.overwritten=0
                total.requestlist.exceeded=0
                total.requestlist.current.all=0
                total.requestlist.current.user=0
                total.recursion.time.avg=0.127636
                total.recursion.time.median=0.0505173
                total.tcpusage=0

                What do you make of this? Unbound has restarted again...

                  johnpoz LAYER 8 Global Moderator @Cool_Corona

                  @cool_corona said in Slow DNS after 22.05:

                  Unbound has restarted again...

                  Yup, it's going to be horrible as a caching resolver if it restarts every few seconds.

                    Kempain @Mikymike82

                    @mikymike82 said in Slow DNS after 22.05:

                    @Kempain @tentpiglet ; have you tried my suggestion for resolving the issue (although maybe a temporary resolution)? As stated, I'm running 1.5 weeks without problems at this moment.

                    I'm actually not keen on forwarding my DNS unnecessarily but I can understand how it would resolve the issue since it would then bypass unbound.

                    I was planning on doing a fresh install until I heard you're still experiencing issues after doing that.
                    I still might, to rule it out for me too.

                    I've been monitoring unbound and it's been going strong for 13371 seconds now!
                    Memory usage of unbound doesn't seem to be particularly high although it is increasing slowly which is probably to be expected.

                    Just set logging to level 5 to try and capture more info, although I'm not sure the differences between 4 and 5 will help, as they seem to be related to identifying which client is having an issue, and I know I'm experiencing it across device types.
                    This has meant that unbound has been restarted so will see how it goes...

                      Kempain

                      unbound-control -c /var/unbound/unbound.conf stats_noreset | grep total
                      total.num.queries=159
                      total.num.queries_ip_ratelimited=0
                      total.num.cachehits=61
                      total.num.cachemiss=98
                      total.num.prefetch=0
                      total.num.expired=0
                      total.num.recursivereplies=98
                      total.requestlist.avg=1.11224
                      total.requestlist.max=31
                      total.requestlist.overwritten=0
                      total.requestlist.exceeded=0
                      total.requestlist.current.all=0
                      total.requestlist.current.user=0
                      total.recursion.time.avg=1.462659
                      total.recursion.time.median=1.03385
                      total.tcpusage=0
                      
                      

                      Unbound has only just restarted so take those with a pinch of salt.

                      Interesting @tentpiglet mentioned 'DNS_PROBE_FINISHED_NXDOMAIN' errors in the browser because I've also been experiencing those at the same time as having DNS issues so I believe they are in some way related.

                        Mikymike82 @Kempain

                        @kempain Same here, but in my production environment I can only troubleshoot so much. I'm very curious whether you can find anything else, rather than my workaround.

                          Kempain @Mikymike82

                          @mikymike82

                          Mine is at home fortunately not in a corporate environment although I do have the wife's complaints to contend with 😅

                          I have the image on USB ready to go with my backup in conf so I should be able to get back up and running pretty quickly once I finally decide to bite the bullet and do a re-install.
                          Just a bit wary of blowing out my settings because I'm using HAProxy and a bunch of certs for internal services.
                          Relatively new to pfSense so don't want to F it up and spend all night fixing it.

                            Kempain

                            Could it be an issue with cache?

                            4972725c-10bd-4085-af99-e1ecb74987b0-image.png

                              Kempain

                              Doing more nslookups from client during the issue I noticed a few things that seem pretty consistent.

                              My initial request/s to pfSense seem to timeout despite my client knowing the IP of pfSense.

                              If I keep placing requests, eventually I get a response, and usually only to IPv6 first.
                              Then in the next response both IPv6 and IPv4 after more timeouts.

                              After it does fully resolve, subsequent requests seem ok for a while.
                              It seems like it takes a few tries to resolve some un-cached addresses sometimes.

                              Unbound is not restarting at this time as I can see it's been running for a while now.

                              nslookup youtu.be
                              Server:  pfsense.localdomain
                              Address:  10.x.x.x
                              
                              DNS request timed out.
                                  timeout was 2 seconds.
                              DNS request timed out.
                                  timeout was 2 seconds.
                              *** Request to pfsense.localdomain timed-out
                              
                              nslookup youtu.be
                              Server:  pfsense.localdomain
                              Address:  10.x.x.x
                              
                              DNS request timed out.
                                  timeout was 2 seconds.
                              DNS request timed out.
                                  timeout was 2 seconds.
                              DNS request timed out.
                                  timeout was 2 seconds.
                              Name:    youtu.be
                              Address:  2a00:1450:4009:81e::200e
                              
                              nslookup youtu.be
                              Server:  pfsense.localdomain
                              Address:  10.x.x.x
                              
                              DNS request timed out.
                                  timeout was 2 seconds.
                              DNS request timed out.
                                  timeout was 2 seconds.
                              DNS request timed out.
                                  timeout was 2 seconds.
                              Name:    youtu.be
                              Address:  2a00:1450:4009:81e::200e
                                        142.250.180.14
                              
                              version: 1.15.0
                              verbosity: 5
                              threads: 4
                              modules: 2 [ validator iterator ]
                              uptime: 19646 seconds
                              options: control(ssl)
                              unbound (pid 96286) is running...
                              
                              
                              total.num.queries=10304
                              total.num.queries_ip_ratelimited=0
                              total.num.cachehits=2806
                              total.num.cachemiss=7498
                              total.num.prefetch=0
                              total.num.expired=0
                              total.num.recursivereplies=7497
                              total.requestlist.avg=3.25433
                              total.requestlist.max=39
                              total.requestlist.overwritten=0
                              total.requestlist.exceeded=0
                              total.requestlist.current.all=3
                              total.requestlist.current.user=1
                              total.recursion.time.avg=9.417250
                              total.recursion.time.median=0.423254
                              total.tcpusage=0
                              
                              
                                Gertjan @Kempain

                                @kempain
                                b3d85d64-b963-4a86-b712-6a8f2936f59c-image.png

                                IMHO, this says to me:
                                nslookup tries to contact a first DNS server and, after the timeout, decides it can't.
                                A next DNS server is tried. That one can't be reached either.
                                A third one is tried (pfSense?) and this time there is an answer.

                                What do you have here ( Dashboard System information ) :

                                c273b375-e8fd-4928-9b50-e20d9ddf8a76-image.png

                                ?

                                ccadcb29-260a-4c71-b077-ec3020d92940-image.png

                                You have unbound running with maximum log details ?
                                Ok to debug, but think about putting that back to default as soon as possible.
                                Max log detail will overflow the (small) max log file size, so the log will get rotated often, which means even more system and disk resources used.

                                Run on the command line

                                grep 'start' /var/log/resolver.log
                                

                                and try the settings I showed above, under Services > DNS Resolver > General Settings, remove the check from :
                                DHCP Registration
                                OpenVPN Clients

                                These two should be unchecked if you use pfBlockerNG-devel anyway.

                                Wait a day or so and run the command again.

                                unbound will also restart on interface events, like a WAN that changes its IP, or some other interface that goes down and up. These events can be seen in the main system log.
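
For anyone who wants more than a raw grep, here is a hedged Python sketch that counts unbound starts in a saved copy of the resolver log and prints the gaps between them. The sample lines are made up for illustration, and the syslog-style layout (month, day, time at the start of each line) is an assumption about how the log looks on a given install; adjust the parsing if yours differs.

```python
from datetime import datetime

# Hypothetical log lines for illustration; not taken from a real system.
SAMPLE_LOG = """\
Jul 12 09:00:01 pfSense unbound[1111]: [1111:0] info: start of service (unbound 1.15.0).
Jul 12 09:05:01 pfSense unbound[1111]: [1111:0] info: service stopped (unbound 1.15.0).
Jul 12 09:05:02 pfSense unbound[2222]: [2222:0] info: start of service (unbound 1.15.0).
Jul 12 09:10:03 pfSense unbound[3333]: [3333:0] info: start of service (unbound 1.15.0).
"""


def start_times(log_text, marker="start of service"):
    """Collect timestamps of lines containing the start marker."""
    times = []
    for line in log_text.splitlines():
        if marker not in line:
            continue
        stamp = " ".join(line.split()[:3])  # e.g. "Jul 12 09:00:01"
        # Syslog stamps carry no year; a fixed leap year is prepended so the
        # stamp parses unambiguously. Only differences between stamps matter.
        times.append(datetime.strptime("2000 " + stamp, "%Y %b %d %H:%M:%S"))
    return times


def restart_intervals(times):
    """Seconds between consecutive starts."""
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]


times = start_times(SAMPLE_LOG)
intervals = restart_intervals(times)
print(f"{len(times)} starts, seconds between them: {intervals}")
```

Gaps that repeat at a regular interval point at something scheduled, while irregular gaps are worth matching against interface events in the main system log.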

                                @tentpiglet said in Slow DNS after 22.05:

                                My wife was also reporting random disconnects from an online game she was playing,

                                This might be a red flag.
                                Game playing involves no DNS interaction.
                                It's her PC/device against the game server. If this connection gets interrupted, then the issue is: you have a bad connection.
                                It could be local, like : the wifi is plain bad. Easy to test : that issue goes away as soon as you roll out a cable.
                                Or worse, your ISP uplink isn't as good as you think it is.
                                A bad uplink would also explain unreachable remote DNS servers.

                                No "help me" PM's please. Use the forum, the community will thank you.
                                Edit : and where are the logs ??

                                  Kempain @Gertjan

                                  @gertjan said in Slow DNS after 22.05:

                                  What do you have here ( Dashboard System information ) :

                                  Thanks for the reply @Gertjan

                                  I have pfSense itself (127.0.0.1) and 2 remote name servers from quad9 listed, although my understanding is that those aren't actually used anyway because I have it set to use local ignore remote. Interestingly I don't have the IPv6 listing that you have but I may have disabled something there?

                                  5fecc523-6a18-4d0a-a964-4a0bd5e755d6-image.png

                                  b6a08085-faf2-4ab4-8e35-05717cee237a-image.png

                                  I don't see any unbound restarts (start of service messages) when checking the logs:
                                  1742478a-0823-4e25-9d19-e956f2b8fefb-image.png

                                  Unbound has been up for about 15 hours now:
                                  ebecc122-7012-4e17-96d3-2072649edda7-image.png

                                  I've removed the external DNS now just to rule that out so it should just be using local now.

                                  From what you said, it sounds like it's failing on all DNS servers if it's trying one after the other, as you can see there are occasions it times out twice and also three times. Not sure why it would bomb out after 2 attempts if it is running through the list of DNS servers though because there are 3 and the 2 remote are supposed to be ignored anyway from my understanding.

                                  What I find strange is it seems to be having issues contacting pfSense itself like pfSense isn't responding at all (or DNS on pfSense at least).

                                    Kempain @Gertjan

                                    @gertjan said in Slow DNS after 22.05:

                                    DHCP Registration
                                    OpenVPN Clients

                                    Forgot to mention these were already disabled and have always been on my box.

                                      johnpoz LAYER 8 Global Moderator @Kempain

                                      @kempain said in Slow DNS after 22.05:

                                      total.recursion.time.avg=1.462659
                                      total.recursion.time.median=1.03385

                                      Your avg to resolve something is 1.5 seconds? Well, that is going to be problematic for sure. Clients normally time out at 2 seconds.

                                      Then you have this

                                      total.recursion.time.avg=9.417250

                                      Yeah you have a problem with talking to the NSs for sure..
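
A quick Python sketch of that comparison, using the three avg/median pairs posted in this thread. The 2 second figure is the per-attempt timeout shown in the nslookup output; the "half the timeout" and "5x the median" thresholds are arbitrary choices for this example, not anything unbound defines.

```python
CLIENT_TIMEOUT_S = 2.0  # per-attempt timeout shown in the nslookup output


def assess(avg_s, median_s, timeout_s=CLIENT_TIMEOUT_S):
    """Return a list of warnings for a recursion time avg/median pair."""
    notes = []
    if avg_s >= timeout_s:
        notes.append("average recursion time exceeds the client timeout")
    elif avg_s >= timeout_s / 2:  # arbitrary "getting close" threshold
        notes.append("average recursion time is close to the client timeout")
    # An average far above the median means a minority of lookups are
    # extremely slow and drag the mean up. The factor 5 is arbitrary.
    if median_s > 0 and avg_s / median_s >= 5:
        notes.append("average far above median: some lookups are extremely slow")
    return notes


samples = [
    ("sg4860", 0.086462, 0.0408701),
    ("Kempain, just restarted", 1.462659, 1.03385),
    ("Kempain, 19646 s uptime", 9.417250, 0.423254),
]
for label, avg, median in samples:
    print(label, assess(avg, median) or ["nothing flagged"])
```

The last pair is the interesting one: a median under half a second with an average over nine seconds suggests most lookups are fine and a subset is taking a very long time, which fits talking to some name servers failing.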

                                        Gertjan @Kempain

                                        @kempain
                                        Ok, good info.
                                        Unbound is running for 15 hours already.
                                        So no 'very frequent' restarts.

                                        Still, strange.
                                        unbound should be listening on port 53 of every LAN interface.
                                        Run this

                                        sockstat | grep 'unbound'
                                        

                                        to check.

                                        Can you also check the device where you have run nslookup ?
                                        Start with

                                        ipconfig /all
                                        

                                        You should have a line with :

                                           Serveurs DNS. . .  . . . . . . . . . . : 192.168.1.1
                                                                               2001:470:1f14:5d0:2::1
                                        

                                        where 192.168.1.1 is the default pfSense IP. The IPv6 address is my IPv6 LAN IP, as I'm using both IPv4 and IPv6 without issues.

                                        Or was this :

                                        nslookup youtu.be
                                        Server:  pfsense.localdomain
                                        Address:  10.x.x.x
                                        
                                        DNS request timed out.
                                            timeout was 2 seconds.
                                        DNS request timed out.
                                            timeout was 2 seconds.
                                        *** Request to pfsense.localdomain timed-out
                                        

                                        executed on pfSense ? In that case, you can see that nslookup is using 127.0.0.1 and 9.9.9.9 and 149.112.112.112 and two of them are 'unreachable'.

                                        If nslookup was run on pfSense and it couldn't reach unbound on 127.0.0.1, then I would ditch the entire system, as this is close to impossible.

                                        Do you have :

                                        fa1c9abc-4dcf-4d6d-9840-5a078dad99d2-image.png

                                        ?
                                        I know, other choices are available, like :

                                        5bed802b-2fdd-4fc7-86bc-6a45106e457a-image.png

                                        but those would kill DNS on my system and internal networks. I'll leave it up to you to understand why 😊

                                        And drop this one :

                                        d8ed609f-23d2-43a2-ba5d-f6670256f928-image.png
                                        to a lower level, like "1"

                                        @kempain said in Slow DNS after 22.05:

                                        Forgot to mention these were already disabled and have always been on my box.

                                        Yeah, I get that ;)
                                        Your unbound isn't restarting every minute or so. It restarted 15 hours ago, and that is ok ;)

                                          Kempain @Gertjan

                                          @gertjan said in Slow DNS after 22.05:

                                          Game playing involves no DNS interaction.

                                          Sorry for spamming replies.

                                          I do seem to get connection issues to all services not just web requests. This includes games also when connecting to game servers/lobbies usually which I assume must have to do some kind of lookup?

                                            johnpoz LAYER 8 Global Moderator @Kempain

                                            @kempain you cannot expect to run a good resolver if your avg time to resolve is 9 seconds. It's just not going to work.

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.