Netgate Discussion Forum

    Unbound DNS resolver - high latency at resolution of local mappings

    DHCP and DNS
    • C
      carpet
      last edited by carpet

      Hi all,

      Since I updated to 23.01, I've been having problems with local DNS resolution.
      To clarify: I mean resolving the hostnames that are registered in DNS from the static mappings on my DHCP server interfaces.

      I do not register dynamic leases in DNS, just the static mappings!
      This setting is absolutely necessary because all communication in my infrastructure (apart from the switches and the firewall) depends on these DNS entries.
      DNS resolver is configured to forward queries to an external DNS server if necessary.
      dns_general.PNG

      I'm using neither DNSSEC nor DoT/DoH (I enabled them for testing but disabled them afterwards).

      About one week after the update I noticed problems with several devices on my network.
      At first I assumed these were temporary problems with the individual devices, but as the problems accumulated I started digging through the log files of some devices and found that all of them had problems with DNS resolution.
      So I checked my DNS configuration on pfSense but didn't find anything that looked wrong to me.

      nat.PNG
      unbound_settings_1.PNG unbound_settings_2.PNG unbound_settings_advanced_1.PNG unbound_settings_advanced_2.PNG unbound_settings_advanced_3.PNG

      After reading through some related topics, I decided to do some deeper testing of my own.
      https://forum.netgate.com/topic/157590/unbound-cache-hit-rate-is-anaemic/22
      https://forum.netgate.com/topic/158126/sudden-high-latency-with-dns-local-resolver

      So I wrote a small PowerShell script that asks for the resolution of my firewall's hostname, waits 1 second, and then asks again.

      powershell.txt
      As you can see, I'm getting responses between 40 ms and 50 ms, which is OK (not great, but OK).
      But sometimes it takes about 100 ms, which seems really unrealistic to me, and occasionally more than 1000 ms, which leads to an error.
      Then I compared my results with the unified.log from pfBlockerNG.
      unified_log.txt

      Some of the delays seem to occur when another DNS query from another client or interface arrives at the same time,
      e.g.:
      simultaneous_request_1.PNG

      Others seem to be unrelated to any other DNS query.
      simultaneous_request_2.PNG
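The correlation done by hand here (matching slow probe timestamps against other queries in the log) can be sketched in Python. This is a minimal sketch under stated assumptions: pfBlockerNG's unified.log format is not reproduced, so the code assumes the log has already been parsed into hypothetical (timestamp, client, qname) tuples, and the 500 ms window is an arbitrary choice.

```python
from datetime import datetime, timedelta


def correlate(slow_probes, other_queries, window_ms=500):
    """Map each slow probe time to the other queries seen within +/- window_ms.

    slow_probes:   list of datetime objects (when a probe answer was slow)
    other_queries: list of (datetime, client, qname) tuples parsed from the
                   resolver log (hypothetical, already-parsed format)
    """
    window = timedelta(milliseconds=window_ms)
    return {
        probe: [q for q in other_queries if abs(q[0] - probe) <= window]
        for probe in slow_probes
    }


def split_by_overlap(correlation):
    """Separate slow probes that coincided with other traffic from isolated ones."""
    with_traffic = [p for p, qs in correlation.items() if qs]
    isolated = [p for p, qs in correlation.items() if not qs]
    return with_traffic, isolated
```

If most slow probes land in the "isolated" list, concurrent traffic alone is an unlikely explanation, which matches the observation that some delays had no related query.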

      The next thing was to check the performance of pfSense during the test period.
      monitoring_cpu.PNG

      monitoring_memory.PNG

      The CPU chart shows some interrupts, but I don't know yet whether they are relevant.
      During testing (18:55 to 19:00) the memory chart shows irregular behavior, but I'm not able to interpret it because I'm not familiar with how pfSense allocates memory.

      So any help moving me forward from here is appreciated :)

      Edit: I forgot to mention that I'm also a bit surprised that Unbound does an AAAA lookup although I disabled IPv6 everywhere.

      regards
      Markus

      • johnpozJ
        johnpoz LAYER 8 Global Moderator @carpet
        last edited by johnpoz

        @carpet said in Unbound DNS resolver - high latency at resolution of local mappings:

        surprised that Unbound does an AAAA lookup although I disabled IPv6 everywhere.

        Unbound is only going to look up what is asked.. Yeah, clients with no GUA IPv6 still ask for AAAA, it's stupid! But it is what it is.. Unbound is doing what it was asked.

        When you say high latency for local.. what is that exactly? When asking for a local resource the timing really should be sub 1 ms, maybe 2 tops..

        $ dig @192.168.9.253 nas.home.arpa                                                 
                                                                                           
        ; <<>> DiG 9.16.36 <<>> @192.168.9.253 nas.home.arpa                               
        ; (1 server found)                                                                 
        ;; global options: +cmd                                                            
        ;; Got answer:                                                                     
        ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 35940                          
        ;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1            
                                                                                           
        ;; OPT PSEUDOSECTION:                                                              
        ; EDNS: version: 0, flags:; udp: 4096                                              
        ;; QUESTION SECTION:                                                               
        ;nas.home.arpa.                 IN      A                                          
                                                                                           
        ;; ANSWER SECTION:                                                                 
        nas.home.arpa.          3600    IN      A       192.168.9.10                       
                                                                                           
        ;; Query time: 0 msec                                                              
        ;; SERVER: 192.168.9.253#53(192.168.9.253)                                         
        ;; WHEN: Fri May 12 14:29:29 Central Daylight Time 2023                            
        ;; MSG SIZE  rcvd: 58                                                              
                                                                                           
        

        Notice the 0 msec; that means it was under 1 ms.. What are you seeing for such queries?

        Are you really using only localdomain as your domain? Not a fan of doing that.. I would use a domain with a TLD, not a single label. But in what you're showing there, are you saying that when you asked unbound for the local record fw01.localdomain it took 351 ms? Something is going on there, since yeah, I would agree that shouldn't be happening.. That points to something wrong with resolving a local resource.

        edit: just to follow through on the stupid AAAA queries: here I did a capture on the pfSense LAN, then went to cnn.com in my browser.. Look, it got a response for the A record, and there it goes asking for AAAA anyway. WTF for? The machine doesn't even show a link-local IPv6 address.. I have IPv6 disabled on my Windows PC... But it continues to ask for AAAA.

        AAAA.jpg

        But unbound isn't going to go off on its own and ask for an AAAA record when the client only asked for the A record of host.domain.tld.

        edit2: I got Firefox to stop doing it by setting network.dns.disableIPv6 to true in about:config. I just need to remember to set it back to false if I want to do something with IPv6, because I do enable IPv6 on my PC now and then for testing stuff.

        An intelligent man is sometimes forced to be drunk to spend time with his fools
        If you get confused: Listen to the Music Play
        Please don't Chat/PM me for help, unless mod related
        SG-4860 24.11 | Lab VMs 2.7.2, 24.11

        • C
          carpet
          last edited by

          Hello johnpoz,

          thank you for your fast reply!
          I checked the IPv6 topic: you were right, unbound only does what it's asked for. I hadn't checked the type parameter in my request.
          But as you can see, the timings vary strongly for me.
          dnslookup_pfsense.PNG
          A lookup compared to complete lookup.txt unified.txt

          For A-only lookups it's ~20 ms to 30 ms (with some spikes).
          For full lookups it's ~30 ms to 40 ms (with some spikes).
          If I resolve the names on pfSense itself, I get a 5 ms delay...

          But I'm still far from 0 ms or 1 ms response times.

          Do you have any idea where I can start diagnosing this problem?

          regards
          Markus

          • johnpozJ
            johnpoz LAYER 8 Global Moderator @carpet
            last edited by

            @carpet how are you getting a reply for fw01.localdomain from this 5.1 server?

            • C
              carpet @johnpoz
              last edited by

              @johnpoz

              I do all the administration from a Windows PC,
              so I'm using the best onboard tool available -> PowerShell.

              I already attached the script and its results as txt.
              Here it is again:
              A lookup compared to complete lookup.txt

              But I also searched for a "dig" tool in the Microsoft Store and made a query with it:
              31728dfa-e994-4672-a6a3-5e60fef746ee-image.png

              As you can see, the response time of the query was 119 ms.

              Just to provide you some more information:
              My PC is directly connected to my main switch.
              Cable length is ~2 m.
              The switch port is configured as an access port with untagged VLAN 31.
              The switch is a D-Link DGS-1100-24 (I know it's crappy, but it does its job, at least until now).
              Port 24 of the switch is connected to pfSense and configured as a trunk allowing only tagged packets from VLANs 21, 31, 32, 33, 41, 42, 43, 44, 45, 46, 47.
              All these VLANs have appropriate interfaces and gateways on pfSense.
              Each VLAN has its own subnet, so VLAN 21 is subnet 192.168.21.0/24 and so on; the gateway for each subnet is .254.
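The addressing scheme described above (VLAN n maps to 192.168.n.0/24 with the gateway at .254) can be written down as a small Python helper, for example to double-check which resolver address each subnet's clients should be querying. A sketch under that stated assumption only; the function name is made up for illustration.

```python
import ipaddress


def vlan_network(vlan_id, prefix="192.168", gateway_host=254):
    """Return (subnet, gateway) for the 'VLAN n -> 192.168.n.0/24, gateway .254' scheme.

    Assumes the VLAN ID doubles as the third octet, so only IDs 1-255 fit.
    """
    if not 1 <= vlan_id <= 255:
        raise ValueError("VLAN ID must fit in the third octet (1-255) for this scheme")
    subnet = ipaddress.IPv4Network(f"{prefix}.{vlan_id}.0/24")
    gateway = subnet.network_address + gateway_host
    return subnet, gateway


# The VLANs tagged on the pfSense trunk port, as listed in the post
TRUNK_VLANS = (21, 31, 32, 33, 41, 42, 43, 44, 45, 46, 47)

for vlan in TRUNK_VLANS:
    subnet, gateway = vlan_network(vlan)
    print(f"VLAN {vlan}: {subnet}, gateway {gateway}")
```

For VLAN 31 this yields 192.168.31.254, the server address used in the PowerShell test script.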

              Windows version of my PC is up to date.
              c1825597-643f-4941-9865-10b65b006ed9-image.png

              I encounter the DNS problem not only on my Windows PC but also on all other devices (Crestron control system, ESP01, ESP8266, ...) in different subnets.
              All devices use pfSense for DNS resolution, and all of the communication relies on the Unbound resolver resolving the static DHCP mappings.

              regards
              Markus

              • C
                carpet
                last edited by

                OK, I've spent the whole afternoon investigating.
                The only thing I can say is that I'm now 100% sure the problem comes from pfSense or my configuration.

                How did I test?
                I used a second SG-1100 and a spare switch (Netgear S3300-52X) and applied my current pfSense config (from my live system) to this test system.
                I configured the switch with all necessary VLANs and connected pfSense to port 48 and my laptop by cable to port 13.
                Port 13 = Untagged 31
                Port 48 = Trunk Tagged 21,31,32,33,41,42,43,44,45,46,47
                Nothing else is connected; pfSense has no WAN connection.

                Then I started the PowerShell script on my laptop, querying static DHCP entries, and got responses in ~20 ms, but again with some spikes (up to 1xx ms).
                After I connected to the web interface of this pfSense and checked some logs, the spikes accumulated, but I still have no idea why this happens.

                • johnpozJ
                  johnpoz LAYER 8 Global Moderator @carpet
                  last edited by

                  @carpet said in Unbound DNS resolver - high latency at resolution of local mappings:

                  powershell script

                  did you post this script? I'd like to try it on my setup.

                  • C
                    carpet @johnpoz
                    last edited by carpet

                    @johnpoz

                    I posted it here, don't know if that's visible to you:
                    dff91430-ac43-48f7-99d9-aaa3a63886f6-image.png

                    I can't share *.ps1 files, so please see my screenshot or use the txt file I posted.
                    b0901663-666d-4597-9688-762b2fad4fc9-image.png

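                    edit:
                    # Query the firewall's hostname once per second and print how long each answer took.
                    while ($true)
                    {
                        try {
                            $timestamp = Get-Date
                            # Use TotalMilliseconds: .Milliseconds is only the 0-999 ms component of the TimeSpan,
                            # so a 1,100 ms answer would have been reported as 100 ms.
                            $timeout = Measure-Command { Resolve-DnsName -Name fw01.localdomain -Type A -DnsOnly -Server 192.168.31.254 -QuickTimeout -ErrorAction Stop } |
                                Select-Object -Property @{n="time"; e={[math]::Floor($_.TotalMilliseconds), "Milliseconds" -join " "}}
                            Write-Output "$($timestamp) - $($timeout)"
                        }
                        catch
                        {
                            # $_ is the error that just occurred; $error would dump the whole session's error history.
                            $_
                            break
                        }
                        Start-Sleep -Milliseconds 1000
                    }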
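For comparison without a PowerShell cmdlet in the timed section, the same probe can be sketched with a raw UDP socket in Python. It hand-builds a minimal RFC 1035 query for the A record, so the timing covers only the network round trip plus unbound's processing. This is a hypothetical sketch, not what Resolve-DnsName does internally; the function names are made up and the 1-second timeout is an arbitrary choice.

```python
import random
import socket
import struct
import time


def build_a_query(name, query_id=None):
    """Build a minimal DNS query packet (RFC 1035) asking for the A record of name."""
    if query_id is None:
        query_id = random.randint(0, 0xFFFF)
    # Header: ID, flags (only RD set), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = name.rstrip(".").split(".")
    if any(not 0 < len(label) <= 63 for label in labels):
        raise ValueError(f"invalid DNS name: {name!r}")
    # QNAME: length-prefixed labels terminated by a zero byte
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN)
    return header + qname + struct.pack("!HH", 1, 1)


def probe(server, name, timeout=1.0):
    """Send one A query over UDP and return the round-trip time in milliseconds.

    Raises socket.timeout (an OSError) if no answer arrives within `timeout` seconds.
    """
    packet = build_a_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.perf_counter()
        sock.sendto(packet, (server, 53))
        reply, _ = sock.recvfrom(512)
        elapsed_ms = (time.perf_counter() - start) * 1000
    # A genuine answer echoes the query ID in its first two bytes
    if reply[:2] != packet[:2]:
        raise ValueError("reply ID does not match query ID")
    return elapsed_ms
```

Calling `probe("192.168.31.254", "fw01.localdomain")` once per second (e.g. with `time.sleep(1)`) mirrors the PowerShell loop; if the numbers here are consistently lower than the script's, part of the measured time is likely cmdlet overhead rather than unbound.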

                    • johnpozJ
                      johnpoz LAYER 8 Global Moderator @carpet
                      last edited by

                      @carpet hmmm... yeah, that is odd..

                      I created a local record with that same name on mine.. 1 ms response.

                      while (1)
                      >> {
                      >> try{
                      >> $timestamp = Get-Date
                      >> $timeout = measure-command{Resolve-DnsName -Name fw01.localdomain -Type A -DnsOnly -server 192.168.9.253 -QuickTimeout -ErrorAction stop} | Select-Object -Property @{n="time";e={$_.Milliseconds,"Milliseconds" -join " "}}
                      >> Write-Output "$($timestamp) - $($timeout)"
                      >> }
                      >> catch
                      >> {
                      >> $error
                      >> break
                      >> }
                      >> sleep -Milliseconds 1000
                      >> }
                      05/13/2023 17:26:33 - @{time=1 Milliseconds}
                      05/13/2023 17:26:34 - @{time=1 Milliseconds}
                      05/13/2023 17:26:35 - @{time=1 Milliseconds}
                      05/13/2023 17:26:36 - @{time=1 Milliseconds}
                      05/13/2023 17:26:37 - @{time=1 Milliseconds}
                      05/13/2023 17:26:38 - @{time=1 Milliseconds}
                      05/13/2023 17:26:39 - @{time=1 Milliseconds}
                      05/13/2023 17:26:40 - @{time=1 Milliseconds}
                      05/13/2023 17:26:41 - @{time=1 Milliseconds}
                      05/13/2023 17:26:42 - @{time=2 Milliseconds}
                      05/13/2023 17:26:43 - @{time=1 Milliseconds}
                      05/13/2023 17:26:44 - @{time=1 Milliseconds}
                      05/13/2023 17:26:45 - @{time=1 Milliseconds}
                      05/13/2023 17:26:46 - @{time=1 Milliseconds}
                      05/13/2023 17:26:47 - @{time=1 Milliseconds}
                      

                      • C
                        carpet @johnpoz
                        last edited by carpet

                        @johnpoz

                        OK, I think unbound is not the problem.
                        I disabled pfBlockerNG and rebooted my test pfSense.
                        Result:
                        results.jpg

                        testing envireonment.jpg

                        So my next assumption: pfBlockerNG is causing my problems, but I'm still wondering why I don't get 1 ms replies like you do.

                        Apart from pfBlockerNG, only the Avahi package is running.

                        But one thing remains: if I do anything in the pfSense GUI (e.g. switch to the dashboard page), the response time of one DNS reply goes up by about 10-15 ms. Afterwards it's back at 2-5 ms...

                        Maybe another interesting detail:
                        both of my SG-1100s were affected by this:
                        https://forum.netgate.com/topic/178049/pfsense-plus-23-01-updates-on-the-1100-and-2100-systems
                        so I had to request the firmware image from TAC Lite and reinstall the systems with ZFS.
                        I have no idea how this relates to my problems, but I think I should share this information.

                        EDIT:
                        after disabling Avahi:

                        PS C:\WINDOWS\system32> while (1)
                        {
                        try{
                        $timestamp = Get-Date
                        $timeout = measure-command{Resolve-DnsName -Name fw01.localdomain -Type A -DnsOnly -server 192.168.31.254 -QuickTimeout -ErrorAction stop} | Select-Object -Property @{n="time";e={$_.Milliseconds,"Milliseconds" -join " "}}
                        Write-Output "$($timestamp) - $($timeout)"
                        }
                        catch
                        {
                        $error
                        break
                        }
                        sleep -Milliseconds 1000
                        }
                        05/14/2023 11:40:28 - @{time=27 Milliseconds}
                        05/14/2023 11:40:29 - @{time=2 Milliseconds}
                        05/14/2023 11:40:30 - @{time=3 Milliseconds}
                        05/14/2023 11:40:31 - @{time=2 Milliseconds}
                        05/14/2023 11:40:32 - @{time=13 Milliseconds}
                        05/14/2023 11:40:33 - @{time=3 Milliseconds}
                        05/14/2023 11:40:34 - @{time=3 Milliseconds}
                        05/14/2023 11:40:35 - @{time=14 Milliseconds}
                        05/14/2023 11:40:36 - @{time=3 Milliseconds}
                        05/14/2023 11:40:37 - @{time=3 Milliseconds}
                        05/14/2023 11:40:38 - @{time=3 Milliseconds}
                        05/14/2023 11:40:39 - @{time=9 Milliseconds}
                        05/14/2023 11:40:40 - @{time=3 Milliseconds}
                        05/14/2023 11:40:41 - @{time=2 Milliseconds}
                        05/14/2023 11:40:42 - @{time=4 Milliseconds}
                        05/14/2023 11:40:43 - @{time=2 Milliseconds}
                        05/14/2023 11:40:44 - @{time=3 Milliseconds}
                        05/14/2023 11:40:45 - @{time=2 Milliseconds}
                        05/14/2023 11:40:46 - @{time=3 Milliseconds}
                        05/14/2023 11:40:47 - @{time=2 Milliseconds}
                        05/14/2023 11:40:48 - @{time=3 Milliseconds}
                        05/14/2023 11:40:49 - @{time=2 Milliseconds}
                        05/14/2023 11:40:50 - @{time=3 Milliseconds}
                        05/14/2023 11:40:51 - @{time=2 Milliseconds}
                        05/14/2023 11:40:52 - @{time=7 Milliseconds}
                        05/14/2023 11:40:53 - @{time=14 Milliseconds}
                        05/14/2023 11:40:54 - @{time=5 Milliseconds}
                        05/14/2023 11:40:55 - @{time=2 Milliseconds}
                        05/14/2023 11:40:56 - @{time=8 Milliseconds}
                        05/14/2023 11:40:58 - @{time=3 Milliseconds}

                        PS C:\WINDOWS\system32>
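To compare runs like this one (or the earlier ones with pfBlockerNG enabled) without eyeballing them, the script's output lines can be parsed and summarized. A minimal Python sketch, assuming the exact `@{time=N Milliseconds}` line format printed by the PowerShell loop; the 10 ms spike threshold is an arbitrary choice.

```python
import re
import statistics

# Matches lines like "05/14/2023 11:40:28 - @{time=27 Milliseconds}"
LINE_RE = re.compile(r"^(?P<ts>\S+ \S+) - @\{time=(?P<ms>\d+) Milliseconds\}$")


def parse_samples(text):
    """Extract the millisecond values from the probe script's output, skipping other lines."""
    samples = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            samples.append(int(m.group("ms")))
    return samples


def summarize(samples, spike_ms=10):
    """Count, median, maximum, and number of samples above spike_ms.

    Raises statistics.StatisticsError if samples is empty.
    """
    return {
        "count": len(samples),
        "median": statistics.median(samples),
        "max": max(samples),
        "spikes": sum(1 for s in samples if s > spike_ms),
    }
```

Comparing the median and spike count between runs with and without pfBlockerNG loaded gives a more objective picture than single outliers.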

                        • S
                          SteveITS Galactic Empire @carpet
                          last edited by

                          @carpet

                          DNS resolver is configured to forward queries to an external DNS server if necessary.

                          It either forwards or doesn't. I'm not aware of a way to forward as a fallback. It should cache answers, though, and I'd expect local LAN queries to take ~1 ms.

                          You might check the last few messages in https://forum.netgate.com/topic/178413/major-dns-bug-23-01-with-quad9-on-ssl/140; it seems there's a bug in unbound with ASLR enabled in FreeBSD 14.

                          Pre-2.7.2/23.09: Only install packages for your version, or risk breaking it. Select your branch in System/Update/Update Settings.
                          When upgrading, allow 10-15 minutes to restart, or more depending on packages and device speed.
                          Upvote 👍 helpful posts!

                          • C
                            carpet @SteveITS
                            last edited by

                            @steveits

                            thank you for this information!

                            I'm not sure whether that problem is related to my topic.
                            I'm seeing high delays for lookups of local resources, and in my test environment pfSense has no WAN connection but still shows the delays.

                            About forwarding:
                            As I understand it, unbound uses its local cache for lookups; if a resource is not cached, it uses the external DNS server(s) configured on the General tab of pfSense.
                            The cache is built from the static DHCP mappings + previous queries (+ DHCP leases if you check that box; I did not!).
                            Previous queries are kept in the cache for the period defined under "Advanced Options" -> Minimum TTL... (see pic)
                            Restarting unbound empties the cache.
                            So (in my eyes) changing the ASLR setting should not change the handling of cached resources.
                            Please correct me if I'm wrong about that! :)
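As a point of comparison for the Minimum TTL description above: unbound's `cache-min-ttl` option (which the pfSense Minimum TTL setting presumably maps to) acts as a floor on a cached record's TTL, and `cache-max-ttl` (default 86400 s) as a ceiling, rather than as a fixed retention period. A tiny Python sketch of that clamping, not unbound's code; entries can also leave the cache earlier if it fills up.

```python
def effective_cache_ttl(record_ttl, cache_min_ttl=0, cache_max_ttl=86400):
    """Clamp a record's TTL the way cache-min-ttl / cache-max-ttl are documented to.

    cache-min-ttl is a floor: a record whose own TTL is already longer than
    the minimum keeps its own TTL; cache-max-ttl caps very long TTLs.
    """
    return min(max(record_ttl, cache_min_ttl), cache_max_ttl)
```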

                            d0ae676e-767c-4f1f-9402-8f597443c31c-image.png

                            • johnpozJ
                              johnpoz LAYER 8 Global Moderator @carpet
                              last edited by johnpoz

                              @carpet I wish I had an SG-1100 to test with..

                              But a couple of thoughts: is it possible you have something else bombing your DNS, like IoT devices? I had an issue with my ISP a while back and was down for about a day.. In that time my Alexas were pretty much DoSing my DNS: in that 24-hour period they were asking for the same shit over and over again, like 2.6 million queries each, and I have 4 of them in the house.. If I had been trying to access anything while my internet was down, I have to believe this would have caused me some slowness resolving even local stuff..

                              Even with working DNS, since I block stuff, devices love to hammer away over and over trying to resolve the thing I block.

                              Here is a test I would do: turn off pfBlocker, and make sure its thousands, tens of thousands, or even hundreds of thousands of entries are not being loaded into unbound.. Remove that load line from your custom options box. It wouldn't be a bad test to turn off forwarding completely for testing, and to make sure you're hitting the pfSense IP directly, without any redirect firewall rule NATing the traffic to loopback..

                              Now do your test.. Also check that you're not getting lots of queries from something else on your network while you're doing your test..

                              Once you restart unbound, you can check how many queries you're seeing via stats:

                              unbound-control -c /var/unbound/unbound.conf stats
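The `stats` output is a list of `key=value` lines, so it is easy to turn into a query rate and cache hit ratio. A minimal Python sketch, assuming you copy the output into a string or file; it reads only `total.num.queries`, `total.num.cachehits` and `time.elapsed`. Note that plain `stats` resets the counters each time it runs (unless `statistics-cumulative` is enabled), while `unbound-control stats_noreset` prints them without resetting.

```python
def parse_stats(text):
    """Parse 'key=value' lines printed by `unbound-control stats` into a dict of floats."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line
        try:
            stats[key.strip()] = float(value)
        except ValueError:
            pass  # skip any non-numeric values
    return stats


def query_summary(stats):
    """Queries seen, queries per second, and cache hit ratio since the last reset."""
    queries = stats.get("total.num.queries", 0.0)
    hits = stats.get("total.num.cachehits", 0.0)
    elapsed = stats.get("time.elapsed", 0.0)
    return {
        "queries": int(queries),
        "queries_per_sec": queries / elapsed if elapsed else 0.0,
        "cache_hit_ratio": hits / queries if queries else 0.0,
    }
```

A high queries-per-second figure while only the test script is running would point at something else on the network hammering the resolver.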

                              • C
                                carpet @johnpoz
                                last edited by

                                @johnpoz

                                I'm sure there is no spamming (this was also my first thought, and I checked it).

                                And please remember that the problem also occurs in my test environment with nothing connected, not even a WAN uplink.

                                @johnpoz said in Unbound DNS resolver - high latency at resolution of local mappings:

                                Here is a test I would do: turn off pfBlocker, and make sure its thousands, tens of thousands, or even hundreds of thousands of entries are not being loaded into unbound.. Remove that load line from your custom options box. It wouldn't be a bad test to turn off forwarding completely for testing, and to make sure you're hitting the pfSense IP directly, without any redirect firewall rule NATing the traffic to loopback..
                                Now do your test.. Also check that you're not getting lots of queries from something else on your network while you're doing your test..
                                Once you restart unbound, you can check how many queries you're seeing via stats:
                                unbound-control -c /var/unbound/unbound.conf stats

                                I will do this and post my results.

                                Thank you very much for your support so far!

                                Oh, btw: I opened a case with TAC Lite and requested the image for +22.05. If nothing else works, that will be my last option. Maybe I will also give my old Zotac ZBOX a try with +23.01...

                                • S
                                  SteveITS Galactic Empire @carpet
                                  last edited by

                                  @carpet Without rereading the long thread: they tied the FreeBSD bug to slow queries. I agree it makes little sense, unless the code maybe assumes memory locations and somehow fails or retries without crashing, but I didn't write it. :)

                                  If forwarding is enabled, the DNS server list in the General settings is used. If not, it uses the root DNS servers to do lookups directly.

                                  If you are querying registered DHCP leases or host or domain overrides, it should do neither. If it can't find the info locally, it will presumably try to connect out and time out.
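The lookup order described above can be modeled roughly as follows. This is a deliberately simplified Python sketch, not unbound's implementation: `local_data` stands in for host overrides and DHCP static mappings, `cache` for answers learned from earlier queries, and `forward`/`recurse` are hypothetical callables standing in for the configured upstream servers or recursion from the root servers.

```python
def resolve(name, local_data, cache, forwarding_enabled, forward, recurse):
    """Simplified lookup order: local data first, then cache, then forward or recurse.

    local_data: dict name -> address (host overrides, static DHCP mappings)
    cache:      dict name -> address (answers learned from earlier queries)
    forward / recurse: callables used only on a miss (hypothetical stand-ins)
    Returns (address, source).
    """
    if name in local_data:
        return local_data[name], "local"
    if name in cache:
        return cache[name], "cache"
    upstream = forward if forwarding_enabled else recurse
    answer = upstream(name)  # would time out here if there is no WAN connection
    cache[name] = answer
    return answer, ("forwarded" if forwarding_enabled else "recursed")
```

In this model a local name such as fw01.localdomain never touches the upstream path, which is why the delays in a test setup without WAN point at something inside the resolver itself.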

                                  • C
                                    carpet @SteveITS
                                    last edited by

                                    @steveits
                                    Thank you for this explanation!

                                    @johnpoz
                                    After a lot of testing, I'm now sure my problems are related to the pfBlockerNG blacklists.
                                    I do not understand why local DNS entries are checked against pfBlocker blacklists with unbound... but I hope I find a way to change this behaviour.

                                    Please consider this topic closed!
                                    Thanks for your help!

                                    • johnpozJ
                                      johnpoz LAYER 8 Global Moderator @carpet
                                      last edited by

                                      @carpet said in Unbound DNS resolver - high latency at resolution of local mappings:

                                      I do not understand why local DNS entries are checked against pfblocker blacklists with unbound

                                      You understand that the way it blocks is by creating local entries, say baddomain.tld pointing to 127.0.0.1?

                                      If you have 10,000 entries in your local DB, then looking up fw01.localdomain takes a bit longer.. vs looking through 100 records..
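To illustrate the mechanism described here (blocked domains become local entries that live alongside the static mappings), the Python sketch below renders a blocklist as unbound `local-zone`/`local-data` lines, where a `redirect` zone with data at its apex answers the domain and all its subdomains. It is only an illustration of the idea: the exact lines pfBlockerNG writes, and the sink address it uses, are not shown in this thread and may differ.

```python
def blocklist_to_unbound(domains, sink="127.0.0.1"):
    """Render blocked domains as unbound local-zone/local-data config lines.

    Duplicates, case differences and trailing dots are normalized away.
    """
    cleaned = {d.strip().lower().rstrip(".") for d in domains if d.strip()}
    lines = []
    for domain in sorted(cleaned):
        lines.append(f'local-zone: "{domain}" redirect')
        lines.append(f'local-data: "{domain} 60 IN A {sink}"')
    return lines
```

A 10,000-domain blocklist becomes 20,000 configuration lines of local data, which is the scale difference johnpoz contrasts with a handful of static mappings.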

                                      • C
                                        carpet @johnpoz
                                        last edited by

                                        @johnpoz said in Unbound DNS resolver - high latency at resolution of local mappings:

                                        @carpet said in Unbound DNS resolver - high latency at resolution of local mappings:

                                        I do not understand why local DNS entries are checked against pfblocker blacklists with unbound

                                        You understand that the way it blocks is by creating local entries, say baddomain.tld pointing to 127.0.0.1?

                                        If you have 10,000 entries in your local DB, then looking up fw01.localdomain takes a bit longer.. vs looking through 100 records..

                                        I understand now.
                                        I expected unbound to check the blacklists only as a second step (and not to create these entries at the same level).

                                        Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.