Netgate Discussion Forum
    • Categories
    • Recent
    • Tags
    • Popular
    • Users
    • Search
    • Register
    • Login

    Super long DNS times intermittently

    Scheduled Pinned Locked Moved DHCP and DNS
    12 Posts 3 Posters 1.4k Views
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • M
      mark.dodrill
      last edited by

      Hi all. I've been having intermittent issues with DNS lookups failing or taking 10's of seconds to happen. I was finally able to catch these unbound logs:

      May 4 19:52:10 unbound 63124 [63124:0] info: histogram of recursion processing times
      May 4 19:52:10 unbound 63124 [63124:0] info: average recursion processing time 98.123002 sec
      May 4 19:52:10 unbound 63124 [63124:0] info: server stats for thread 0: requestlist max 123 avg 22.3883 exceeded 0 jostled 0
      May 4 19:52:10 unbound 63124 [63124:0] info: server stats for thread 0: 29610 queries, 11626 answers from cache, 17984 recursions, 0 prefetch, 0 rejected by ip ratelimiting
      May 4 19:52:10 unbound 63124 [63124:0] info: service stopped (unbound 1.13.1).

      Not 100% sure what DNS recursion means, but 98 seconds as an average seems super wrong/bad. My other Google search show sub 1 second avg dns recursion times. I've tried having my pfSense SG1000 box as the DNS source vs. pfSense first then going to 8.8.8.8.

      I am trying to figure out why I'm seeing these intermittent super long DNS resolutions, hoping someone can give me some advice based on this log. My Google searches haven't found anything useful. I've looked at Unbound logging levels, but the next most detailed one generates 20+ logs per each DNS lookup and it's too much to even look at.

      Thanks for your ideas.

      GertjanG 1 Reply Last reply Reply Quote 0
      • GertjanG
        Gertjan @mark.dodrill
        last edited by

        @mark-dodrill said in Super long DNS times intermittently:

        I've tried having my pfSense SG1000 box as the DNS source vs. pfSense first then going to 8.8.8.8.

        Use the default resolver DNS settings.

        With only one (1 !) exception :

        Do NOT check this :

        051ef3dd-538b-4906-8630-712eed98d81e-image.png

        ( It's ok check this option when you fully understood the consequences )

        I'm pretty sure that pfSense (Netgate) doesn't want you to communicate all your private DNS info to some known (8.8.8.8) or unknown company. The exception might be that you have a contract with these entities.

        No "help me" PM's please. Use the forum, the community will thank you.
        Edit : and where are the logs ??

        M 1 Reply Last reply Reply Quote 0
        • M
          mark.dodrill @Gertjan
          last edited by

          @gertjan Thanks for the thoughts here. I've tried it both with and without this check box being checked, with the same results, unfortunately.
          I think I have all the DNS Resolver settings reverted to defaults: Enable DNS Resolver checked, port 53, default SSL/TLS cert, SSL Port 853, All network interfaces, all outgoing network interfaces, DNSSEC checked, all other options unchecked.

          Also, in System, General Setup, DNS Resolution Behavior. I presume the best option would be use Local DNS, fall back to Remote DNS, since it is default. Is that correct?

          Do you happen to know how to dig into exactly which DNS lookups took so long? Maybe I can take that up with my ISP?

          GertjanG 1 Reply Last reply Reply Quote 0
          • GertjanG
            Gertjan @mark.dodrill
            last edited by

            @mark-dodrill said in Super long DNS times intermittently:

            Is that correct?

            Yep.

            @mark-dodrill said in Super long DNS times intermittently:

            Do you happen to know how to dig into exactly which DNS lookups took so long

            Get more details.
            What part takes so long ?

            Ask dig again :

            dig forum.netgate.com +trace
            

            and you see it talking to one of the root name servers.
            Then to of of the TLD servers.
            Then to one off the (at least two) domain name servers.
            So it get a result for A - the default query.

            If the "root name servers" was slow, everybody on the net would suffer. These are not just 13 servers somewhere : they are cloned everywhere.
            Same thing for the TLD's.

            The third step : the final query to the domain name servers could take a while, as there are tyically only 2 or 3 of them. And no CDN's.
            Example :

            netgate.com.            3600    IN      NS      ns2.netgate.com.
            netgate.com.            3600    IN      NS      ns1.netgate.com.
            

            No "help me" PM's please. Use the forum, the community will thank you.
            Edit : and where are the logs ??

            M 1 Reply Last reply Reply Quote 0
            • M
              mark.dodrill @Gertjan
              last edited by

              @gertjan said in Super long DNS times intermittently:

              dig forum.netgate.com +trace

              Ha, I didn't know "dig" was actually a Linux command. :) I was thinking in the general sense of trying to find out which DNS queries (what exact names) were taking so long. I have looked at the unbound docs I could find and couldn't see anything. Do you know any specifics of how to identify which exact DNS lookups are taking that long?

              GertjanG 1 Reply Last reply Reply Quote 0
              • GertjanG
                Gertjan @mark.dodrill
                last edited by Gertjan

                @mark-dodrill

                Did you try :

                dig forum.netgate.com +trace
                

                ??

                Or, why not :

                dig forum.netgate.com +trace | grep 'Received'
                

                I saw :

                ;; Received 525 bytes from 127.0.0.1#53(127.0.0.1) in 0 ms
                ;; Received 1177 bytes from 192.203.230.10#53(e.root-servers.net) in 24 ms
                ;; Received 691 bytes from 192.55.83.30#53(m.gtld-servers.net) in 36 ms
                ;; Received 186 bytes from 208.123.73.80#53(ns1.netgate.com) in 140 ms
                

                A more clearer answer isn't possible.
                But you have to know it is the answer ;)

                No "help me" PM's please. Use the forum, the community will thank you.
                Edit : and where are the logs ??

                M 1 Reply Last reply Reply Quote 0
                • M
                  mark.dodrill @Gertjan
                  last edited by

                  @gertjan
                  Thanks. I've enabled the periodic histogram of DNS resolution times to see if that helps.

                  M 1 Reply Last reply Reply Quote 0
                  • M
                    mark.dodrill @mark.dodrill
                    last edited by

                    Update: I increased the DNS cache size from 20k to 50k, and that seems to have resolved the long average recursion time dropped from 100+ seconds to less than 2 seconds. Number of DNS queries dropped from about 40k per min to less than 400 per min. I guess the too small cache size was the key root cause.

                    GertjanG 1 Reply Last reply Reply Quote 0
                    • GertjanG
                      Gertjan @mark.dodrill
                      last edited by

                      @mark-dodrill said in Super long DNS times intermittently:

                      DNS cache size from 20k to 50k

                      You changed something on this page : Services > DNS Resolver > Advanced Settings ?

                      There is no "20k" or "50k" setting. It's more like "50 MB".

                      No "help me" PM's please. Use the forum, the community will thank you.
                      Edit : and where are the logs ??

                      M 1 Reply Last reply Reply Quote 0
                      • M
                        mark.dodrill @Gertjan
                        last edited by

                        @gertjan On the DNS Resolver, Advanced Settings, I changed:

                        Prefetch Support = Checked/Enabled
                        Message Cache Size = 20MB (4MB default)
                        Outgoing TCP Buffers = 20 (10 default)
                        Incoming TCP Buffers = 20 (10 default)
                        Number of Hosts to cache = 100000 (10k default)

                        On the main screen, I added:
                        statistics-interval: 60
                        So that I can see how many requests there were in a minute, how many came from cache, how many recursions.

                        It looked good for a while after the changes, but I'm back to super long recursion times again. :(

                        May 7 08:32:18 unbound 5057 [5057:0] info: [25%]=0.225168 median[50%]=4.35495 [75%]=128.882
                        May 7 08:32:18 unbound 5057 [5057:0] info: histogram of recursion processing times
                        May 7 08:32:18 unbound 5057 [5057:0] info: average recursion processing time 124.006674 sec
                        May 7 08:32:18 unbound 5057 [5057:0] info: server stats for thread 0: requestlist max 131 avg 14.078 exceeded 0 jostled 0
                        May 7 08:32:18 unbound 5057 [5057:0] info: server stats for thread 0: 33981 queries, 15757 answers from cache, 18224 recursions, 1325 prefetch, 0 rejected by ip ratelimiting

                        Having 33,981 DNS queries per minute seems crazy to me (that's 566 per second). This my home network with my work VPN and one Netflix stream going (max) at a time. Not sure how to determine which IP is generating most of these requests, any ideas would be appreciated.
                        Back to the drawing board.

                        1 Reply Last reply Reply Quote 0
                        • AndyRHA
                          AndyRH
                          last edited by

                          For testing it might be worth setting up a DNS server, just to make sure it is not pfSense. I use PiHole and it has good statistics on usage, it will at least help you ID why you have so many queries. There are about 50 devices on my network and I only get ~200k queries per day. And that includes Roku's that can't report usage and cameras that really want to call home.

                          o||||o
                          7100-1u

                          M 1 Reply Last reply Reply Quote 0
                          • M
                            mark.dodrill @AndyRH
                            last edited by

                            @andyrh Thanks for that idea.

                            I did a packet capture on my pfSense HW for just DNS queries on the LAN, and it's not showing hundreds per second. I guess it just shows cumulative totals, rather than totals per my one minute output. Interpreting the data that way results in a much smaller amount of queries per minute.
                            Thanks for your thoughts.

                            1 Reply Last reply Reply Quote 0
                            • First post
                              Last post
                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.