Netgate Discussion Forum

    After upgrade to 23.01: unbound has a high CPU load resulting in DNS request timeouts

    17 Posts 6 Posters 4.3k Views
    • getcomG
      getcom @SteveITS
      last edited by getcom

      @steveits said in After upgrade to 23.01: unbound has a high CPU load resulting in DNS request timeouts:

      I've seen several posts about DNS issues with forwarding enabled but generally unchecking DNSSEC fixes it. You don't have that enabled.

      That's a lot of memory too, 2.7 GB. Do you have a super large DHCP lease file maybe, that it's trying to reload every time it restarts? (which it does at each lease renewal, with Registration checked)

      Do you have pfBlockerNG installed, or other packages?

      What counts as a super large DHCP lease file? 140 entries is not large. We have clients with 600 or more entries running pfSense 22.05 without any issue.
      Yes, pfBlockerNG-devel version 3.2.0_1 is running. Should I switch to the regular pfBlockerNG package, which should now have the same features as the devel version?

      pfSense23.01_unbound_installed packages.png

      • johnpozJ
        johnpoz LAYER 8 Global Moderator @getcom
        last edited by johnpoz

        @getcom A load of 18, that seems crazy high.. In comparison my unbound is barely using anything.

        unbound.jpg

        Is the issue that unbound is stuck or something, or that it's working like crazy trying to open a port, causing the high CPU?

        As to moving to just NG vs -devel, I would think so, but I find it highly unlikely that it is the reason for your issues. My understanding (which could be flawed) is that the versions are now in sync. What might be done going forward with the -devel version I am not sure; maybe he will start working on a newer version of pfBlocker?

        An intelligent man is sometimes forced to be drunk to spend time with his fools
        If you get confused: Listen to the Music Play
        Please don't Chat/PM me for help, unless mod related
        SG-4860 24.11 | Lab VMs 2.8, 24.11

        • S
          SteveITS Galactic Empire @getcom
          last edited by SteveITS

          @getcom said in After upgrade to 23.01: unbound has a high CPU load resulting in DNS request timeouts:

          What is a super large DHCP lease file

          A long time ago, I vaguely recall, there was a post where the person had relatively few devices but a corrupted file with thousands upon thousands of entries. I'm just trying to guess at what could be making unbound use CPU, and reading in files was one thought.

          @getcom said in After upgrade to 23.01: unbound has a high CPU load resulting in DNS request timeouts:

          Should I update this to the pfBlockerNG

          They are supposed to be the same now, at least at release. (Edit: release of 23.01)

          Pre-2.7.2/23.09: Only install packages for your version, or risk breaking it. Select your branch in System/Update/Update Settings.
          When upgrading, allow 10-15 minutes to restart, or more depending on packages and device speed.
          Upvote 👍 helpful posts!

          • GertjanG
            Gertjan @SteveITS
            last edited by Gertjan

            @steveits said in After upgrade to 23.01: unbound has a high CPU load resulting in DNS request timeouts:

            What is a super large DHCP lease file

            This one, for the IPv4 'pool' leases: /var/dhcpd/var/db/dhcpd.leases

            There is a small glue-ware program called /usr/local/sbin/dhcpleases that is kick-started when the dhcpd IPv4 server assigns a new lease to some LAN device.
            dhcpleases parses the file for valid, active leases and writes them out to /etc/hosts.

            When it has finished, it picks up a stone from the ground and slingshots it at unbound, which makes unbound restart.

            To see if this happens a lot: you ... can't, because you can't be sure why unbound was instructed to restart.
            You can see how often unbound restarts:

            grep 'start' /var/log/resolver.log
            

            And again, this could also happen after a WAN event (your ISP wanted to give you a new IP) or when someone removes a LAN cable from the pfSense box.
            Or the pfSense admin instructed pfBlockerNG to reload every hour.
            Or some idiot admin (like me) likes to mess around with his pfSense.

            So keep in mind that some of these events are normal, and it's OK for unbound to restart once in a while.

            ......
            <30>1 2023-02-19T08:32:55.264556+01:00 pfSense.mydomain.tld unbound 65402 - - [65402:0] info: start of service (unbound 1.17.1).
            <29>1 2023-02-19T14:35:35.966065+01:00 pfSense.mydomain.tld unbound 65402 - - [65402:0] notice: Restart of unbound 1.17.1.
            <30>1 2023-02-21T14:35:42.050332+01:00 pfSense.mydomain.tld unbound 65402 - - [65402:0] info: start of service (unbound 1.17.1).
            <29>1 2023-02-21T14:35:44.782969+01:00 pfSense.mydomain.tld unbound 65402 - - [65402:0] notice: Restart of unbound 1.17.1.
            

            But one thing is sure: those who have a lot of LAN devices using DHCP, and have "Register DHCP leases in the DNS Resolver" enabled, will get a free membership in the 'DNS/unbound sucks' club.
            At least they know why (now).
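
To put a number on the restarts found with the grep above, the "start of service" lines can be grouped by day. This is only a sketch: it assumes the syslog layout of the sample lines in this post (an ISO 8601 timestamp in the second field), and the function name is made up for illustration.

```shell
# Count "start of service" events per calendar day in an unbound log.
# Assumes field 2 is an ISO 8601 timestamp such as
# 2023-02-19T08:32:55.264556+01:00, as in the sample log lines.
restarts_per_day() {
    awk '/info: start of service/ {
             day = substr($2, 1, 10)   # keep only YYYY-MM-DD
             count[day]++
         }
         END { for (day in count) print day, count[day] }' "$@" | sort
}
```

Run it as `restarts_per_day /var/log/resolver.log`. A handful of starts per day is probably normal; dozens per hour would point at lease registration or another frequent trigger.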

            This club of course has close ties with the 'I have to forward to someone' club.
            Sorry for the rant.

            Edit: Yes, this will get resolved.
            Even if that means 'KEA' joins the party (see other threads/Redmine).

            No "help me" PM's please. Use the forum, the community will thank you.
            Edit : and where are the logs ??

            • getcomG
              getcom @Gertjan
              last edited by

              @SteveITS, @Gertjan
              The /var/dhcpd/var/db/dhcpd.leases file looks normal, no different from other setups.
              There are ~4000 open connections to the forwarder DNS servers. I'm wondering whether unbound asks every forwarding server for every client request.
              It looks like the connections are never closed.
              If so, every additional site-to-site VPN will increase the load dramatically in the 23.01 release. In previous versions this setup caused no problems.

              lsof | grep unbound | wc -l
                  4039
              
              sockstat | grep unbound | wc -l
                  3750
              
              cat /var/unbound/host_entries.conf | wc -l
                   291
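
The raw counts above do not show which forwarders hold the connections. A possible next step is to group unbound's sockets by remote address. This is a sketch only: it assumes the usual FreeBSD `sockstat` column order (USER, COMMAND, PID, FD, PROTO, LOCAL ADDRESS, FOREIGN ADDRESS), and the function name is hypothetical.

```shell
# Summarise unbound sockets by remote address, busiest first.
# Reads `sockstat` output on stdin and assumes the foreign address
# is the seventh whitespace-separated field.
unbound_peers() {
    awk '$2 == "unbound" && $7 != "" && $7 != "*:*" {
             peer = $7
             sub(/:[0-9]+$/, "", peer)   # drop the trailing :port
             n[peer]++
         }
         END { for (peer in n) print n[peer], peer }' | sort -rn
}
```

Used as `sockstat | unbound_peers`, this prints one line per remote address with its socket count, which would show whether one forwarder (for example a VPN-side DNS server) accounts for most of the open connections.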
              

              If I restart unbound every hour via cron, things do not look so bad, but this is an ugly workaround.

              last pid:  1173;  load averages:  1.19,  1.18,  1.39                                                                                                               up 1+17:00:36  21:34:05
              75 processes:  1 running, 74 sleeping
              CPU:  1.7% user,  0.1% nice,  4.0% system,  1.1% interrupt, 93.2% idle
              Mem: 743M Active, 4365M Inact, 3241M Wired, 54G Free
              ARC: 1431M Total, 218M MFU, 1095M MRU, 156K Anon, 17M Header, 96M Other
                   1201M Compressed, 2544M Uncompressed, 2.12:1 Ratio
              Swap: 2048M Total, 2048M Free
              
                PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
              67966 unbound      16  23    0   568M   471M kqread  12  34:05  69.36% unbound
              

              Compared to a similar setup on pfSense release 2.6 without the unbound cron restart:

              last pid:  6402;  load averages:  0.58,  0.73,  0.86                                                                                                                                                                                            up 368+03:08:31 21:40:34
              69 processes:  1 running, 67 sleeping, 1 zombie
              CPU:  0.7% user,  0.0% nice,  3.8% system,  0.2% interrupt, 95.4% idle
              Mem: 448M Active, 319M Inact, 3283M Wired, 58G Free
              ARC: 1144M Total, 200M MFU, 907M MRU, 32K Anon, 6030K Header, 31M Other
                   356M Compressed, 771M Uncompressed, 2.17:1 Ratio
              Swap: 24G Total, 24G Free
              
                PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
              38497 unbound      16  22    0   253M   163M kqread   4  50:33  52.76% unbound
              

              Is there any difference or disadvantage if I use BIND instead of unbound?
              My external DNS servers all run BIND 9 in LXC containers, and I have never noticed any problem with it.

              • S
                SteveITS Galactic Empire @getcom
                last edited by

                @getcom Hm, I have had 23.01 running at my home on a 2100, about 18 hours since I changed a setting last night. I don't have a billion devices at home though.

                : sockstat | grep unbound | wc -l
                      12
                
                  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
                25498 unbound       2  20    0   158M   124M kqread   0   7:37   0.25% unbound
                

                DNS Forwarder has a "Query DNS servers sequentially" option; Resolver does not. I found a 2015 post (https://serverfault.com/questions/732920/how-to-do-parallel-queries-to-the-upstream-dns-using-unbound) saying that unbound queries them all.

                How many DNS servers do you have configured?
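
One way to answer that from the shell is to pull the `forward-addr:` entries out of the generated unbound configuration. This is a sketch under assumptions: the paths /var/unbound/unbound.conf and /var/unbound/domainoverrides.conf are where pfSense commonly writes these files but may differ on your install, and the function name is made up.

```shell
# List the distinct forwarder addresses found in one or more unbound
# configuration files. Only lines whose first field is "forward-addr:"
# are considered; the address is taken from the second field.
list_forwarders() {
    awk '$1 == "forward-addr:" { print $2 }' "$@" | sort -u
}
```

For example, `list_forwarders /var/unbound/unbound.conf /var/unbound/domainoverrides.conf | wc -l` gives the number of distinct upstream servers unbound may contact.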

                re: BIND, there is a package but as I vaguely recall it's not meant for resolving? Not too familiar with the package (am of course familiar with BIND).

                • getcomG
                  getcom @SteveITS
                  last edited by

                  @steveits said in After upgrade to 23.01: unbound has a high CPU load resulting in DNS request timeouts:

                  How many DNS servers do you have configured?

                  re: BIND, there is a package but as I vaguely recall it's not meant for resolving? Not too familiar with the package (am of course familiar with BIND).

                  Two external DNS servers, one for each WAN connection (a dual-WAN setup), four for the four site-to-site VPNs, and one for an AD subdomain.
                  Regarding the 2015 post: unbound also made these parallel queries in the 22.05 release, so that is not the root cause.

                  What is strange is that after a few cron-triggered unbound restarts, the load is now in a normal range (~0.19). sockstat also shows only 400 to 600 connections.
                  Phew... I don't like this, because the problem went away before I could find the root cause, so it can happen again.
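
Instead of a blind hourly restart, a cron job could restart unbound only when its socket count climbs past a limit. A minimal sketch, assuming FreeBSD `sockstat` output on stdin; the threshold is an arbitrary tuning value and the function name is hypothetical.

```shell
# Succeed (exit status 0) when unbound holds more sockets than the
# threshold given as the first argument. Reads `sockstat` output on
# stdin and counts lines whose COMMAND column is "unbound".
too_many_unbound_sockets() {
    threshold=$1
    count=$(awk '$2 == "unbound" { n++ } END { print n + 0 }')
    [ "$count" -gt "$threshold" ]
}
```

A cron entry could then run something like `sockstat | too_many_unbound_sockets 1000 && pfSsh.php playback svc restart unbound`. The restart command is an assumption about the pfSense shell tooling, so check it on your version first. This is still a workaround, but it leaves a trace of how often the limit is actually hit.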

                  • getcomG
                    getcom @getcom
                    last edited by

                    It is back again. The load is definitely too high. I noticed audio problems in Jitsi meetings.
                    The system load goes up and down all the time.
                    I think I will go back to the previous release until this is fixed.

                    pfSense23.01_unbound_top3.png

                    • getcomG
                      getcom @getcom
                      last edited by

                      After updating another system to 23.01 and seeing the same behavior, I have come to the conclusion that the parallel requests are so much more aggressive in 23.01 that the answers are killing the messenger. I saw 4500+ requests/packets per second hammering unbound until it was no longer able to answer. VMs and clients got lots of timeouts lasting seconds to minutes, and everything froze.
                      @Gertjan I think you are right... I'm a member of the 'DNS/unbound sucks' club now...
                      My conclusion is that as long as unbound cannot query its forwarders sequentially, I should give dnsmasq, alias DNS Forwarder, a try, as mentioned by @steveits.
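
A rough way to check a figure like the 4500+ packets per second is to bucket tcpdump output by second. This is a sketch, assuming tcpdump's default HH:MM:SS.ffffff timestamp in the first field; the function name is made up for illustration.

```shell
# Count packets per wall-clock second from tcpdump text output on stdin.
# The first eight characters of field 1 (HH:MM:SS) form the bucket key.
packets_per_second() {
    awk '{ sec = substr($1, 1, 8); n[sec]++ }
         END { for (sec in n) print sec, n[sec] }' | sort
}
```

For example: `tcpdump -n -l -i igb0 -c 20000 port 53 | packets_per_second`, where igb0 stands in for whichever interface carries the DNS traffic. Running it once on the LAN side and once on the WAN side would show whether the flood comes from clients or from upstream answers.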

                      • S
                        SteveITS Galactic Empire @getcom
                        last edited by

                        @getcom One other idea that has come up in other threads e.g. https://forum.netgate.com/topic/178413/major-dns-bug-23-01-with-quad9-on-ssl/ is to disable DNS over TLS. It doesn't seem to be a problem for everyone but some people say it is necessary when using Unbound in forwarding mode.

                        I don't recall anyone reporting high CPU usage though.

                        • A
                          aduzsardi
                          last edited by aduzsardi

                          rant:
                          There are quite a few problems with pfSense and DNS services. I've just given up trying to chase them down: high CPU usage, flaky DNS service, DNS not working at all on localhost on the firewall (it just times out), domain overrides not working, and so on.

                          I'm now running DNS servers on a couple of separate boxes, with no issues at all.
                          pfSense does show its limits even on Netgate hardware (we have a couple of 7100 appliances). They cram a lot of things into this firewall appliance for whatever reason... I'm guessing that if we need a professional business-level firewall, that's what TNSR is for.

                          • getcomG
                            getcom @SteveITS
                            last edited by

                            @steveits said in After upgrade to 23.01: unbound has a high CPU load resulting in DNS request timeouts:

                            @getcom One other idea that has come up in other threads e.g. https://forum.netgate.com/topic/178413/major-dns-bug-23-01-with-quad9-on-ssl/ is to disable DNS over TLS. It doesn't seem to be a problem for everyone but some people say it is necessary when using Unbound in forwarding mode.

                            DNS over TLS is/was not activated...

                              • getcomG
                                getcom @aduzsardi
                                last edited by

                                Some tcpdumps later: all clear.
                                At the moment we are seeing lots of DoS and brute-force attacks here in Germany. We are delivering tanks and defense systems to Ukraine. Perhaps there is a connection? I don't know, but I have never seen anything like this before.
                                With a regular VDSL or cable contract you get an asymmetric connection, e.g. 100 Mbps downstream/40 Mbps upstream. I saw up to 6000 DNS requests per second; each was only a few kB, but the answers were up to four times larger. With the smaller upstream bandwidth on top of that, the whole internet connection became unusable.
                                The smaller Netgate SG boxes ran hot in this situation...
                                Since I activated GeoIP blocking for all countries except Germany and the USA and added ~150 bad-bot blacklists to pfBlockerNG, it has been quiet...
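
The bandwidth argument above can be checked with simple arithmetic. The sketch below takes 1 kB per request purely as an illustrative assumption (the post only says "a few kB") and the stated factor of four for the answers; the function name is made up.

```shell
# Convert a packet rate and an average packet size into megabits per
# second, using integer arithmetic and 1 Mbit = 1,000,000 bits.
mbps() {
    rate_per_second=$1
    bytes_per_packet=$2
    echo $(( rate_per_second * bytes_per_packet * 8 / 1000000 ))
}
```

Under those assumptions, `mbps 6000 1000` gives 48 for the incoming requests and `mbps 6000 4000` gives 192 for the answers, far more than a 40 Mbps upstream can carry, which would be consistent with the whole connection becoming unusable.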

                                • E
                                  Enhance2736 @getcom
                                  last edited by

                                  @getcom dang man! i feel for you. keep up the good work and keep those ruzzkies out !!!

                                  Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.