Netgate Discussion Forum

    After upgrade to 23.01: unbound has a high CPU load resulting in DNS request timeouts

    • S
      SteveITS Galactic Empire @getcom
      last edited by SteveITS

      @getcom said in After upgrade to 23.01: unbound has a high CPU load resulting in DNS request timeouts:

      What is a super large DHCP lease file

      A long time ago, I vaguely recall a post where someone had relatively few devices but a corrupted lease file with thousands upon thousands of entries. I was just guessing at what could make unbound use so much CPU, and reading in large files was one thought.

      @getcom said in After upgrade to 23.01: unbound has a high CPU load resulting in DNS request timeouts:

      Should I update this to the pfBlockerNG

      They are supposed to be the same now, at least at release. (Edit: release of 23.01)

      Pre-2.7.2/23.09: Only install packages for your version, or risk breaking it. Select your branch in System/Update/Update Settings.
      When upgrading, allow 10-15 minutes to restart, or more depending on packages and device speed.
      Upvote 👍 helpful posts!

      • GertjanG
        Gertjan @SteveITS
        last edited by Gertjan

        @steveits said in After upgrade to 23.01: unbound has a high CPU load resulting in DNS request timeouts:

        What is a super large DHCP lease file

        This one, for the IPv4 'pool' leases : /var/dhcpd/var/db/dhcpd.leases

        There is a small glue-ware program called /usr/local/sbin/dhcpleases that is kick-started whenever the dhcpd IPv4 server assigns a new lease to a LAN device.
        dhcpleases parses the file for valid, active leases and writes them out to /etc/hosts.

        When it has finished, it picks up a stone from the ground and slingshots unbound, that is, it makes unbound restart.

        To see if this happens a lot: you ... can't, because you can't be sure why unbound was instructed to restart.
        What you can see is how often unbound restarts:

        grep 'start' /var/log/resolver.log
        

        Again, this can also happen after a WAN event (your ISP decided to give you a new IP) or when someone removes a LAN cable from the pfSense box.
        Or the pfSense admin told pfBlockerNG to reload every hour.
        Or some idiot admin (like me) likes to mess around with his pfSense.

        So keep in mind that some of these events are normal, and it's OK that unbound restarts once in a while.

        ......
        <30>1 2023-02-19T08:32:55.264556+01:00 pfSense.mydomain.tld unbound 65402 - - [65402:0] info: start of service (unbound 1.17.1).
        <29>1 2023-02-19T14:35:35.966065+01:00 pfSense.mydomain.tld unbound 65402 - - [65402:0] notice: Restart of unbound 1.17.1.
        <30>1 2023-02-21T14:35:42.050332+01:00 pfSense.mydomain.tld unbound 65402 - - [65402:0] info: start of service (unbound 1.17.1).
        <29>1 2023-02-21T14:35:44.782969+01:00 pfSense.mydomain.tld unbound 65402 - - [65402:0] notice: Restart of unbound 1.17.1.
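
A raw grep gives one line per event, which is hard to scan over several days. As a minimal sketch, assuming the log lines keep the layout shown in the excerpt above (timestamp as the second whitespace-separated field), the starts could be tallied per day. The function name count_unbound_starts is made up for this illustration:

```shell
# count_unbound_starts LOGFILE
# Prints "YYYY-MM-DD N" for each day on which unbound logged
# "start of service", where N is the number of starts that day.
# Assumption: the line layout matches the excerpt above, so the
# second whitespace-separated field is the ISO timestamp.
count_unbound_starts() {
    grep 'info: start of service' "$1" |
        awk '{ split($2, ts, "T"); starts[ts[1]]++ }
             END { for (day in starts) print day, starts[day] }' |
        sort
}

# Possible usage on the firewall:
#   count_unbound_starts /var/log/resolver.log
```

A handful of starts per day is probably unremarkable; dozens per hour would point at something like the DHCP registration described above. The count alone still does not say what triggered each restart.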
        

        But one thing is sure: those who have a lot of LAN devices using DHCP and have "Register DHCP leases in the DNS Resolver" enabled get a free membership in the 'DNS/unbound sucks' club.
        At least they know why (now).

        This club of course has close ties with the 'I have to forward to someone' club.
        Sorry for the rant.

        Edit: Yes, this will get resolved.
        Even if that means 'KEA' joins the party (see other threads/Redmine).

        No "help me" PM's please. Use the forum, the community will thank you.
        Edit : and where are the logs ??

        • getcomG
          getcom @Gertjan
          last edited by

          @SteveITS, @Gertjan
          The /var/dhcpd/var/db/dhcpd.leases file looks normal, no different from other setups.
          There are ~4000 open connections to the forwarder DNS servers. I'm wondering whether unbound asks every forwarding server for every client request.
          It looks like the connections are never closed.
          This would mean that with every site-to-site VPN the load increases dramatically in the 23.01 release. In previous versions this setup was unobtrusive.

          lsof | grep unbound | wc -l
              4039
          
          sockstat | grep unbound | wc -l
              3750
          
          cat /var/unbound/host_entries.conf | wc -l
               291
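
The totals above do not show where the connections go. As a rough sketch, the sockstat output could be grouped by remote address to see whether one forwarder accounts for most of the sockets. This assumes the usual FreeBSD sockstat column order (USER, COMMAND, PID, FD, PROTO, LOCAL ADDRESS, FOREIGN ADDRESS) and IPv4-style address:port values; the function name count_by_remote is made up for this illustration:

```shell
# count_by_remote
# Reads sockstat output on stdin and prints "N REMOTE" per remote
# host for sockets owned by the unbound command, busiest first.
# Assumption: column 2 is the command and column 7 is the foreign
# address in "address:port" form (IPv4 layout).
count_by_remote() {
    awk '$2 == "unbound" && NF >= 7 {
             remote = $7
             sub(/:[0-9*]+$/, "", remote)   # drop the trailing :port
             seen[remote]++
         }
         END { for (r in seen) print seen[r], r }' |
        sort -rn
}

# Possible usage on the firewall (connected IPv4 sockets only):
#   sockstat -4 -c | count_by_remote
```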
          

          If I restart unbound every hour by cron, it does not look so bad. But this is an ugly workaround.

          last pid:  1173;  load averages:  1.19,  1.18,  1.39    up 1+17:00:36  21:34:05
          75 processes:  1 running, 74 sleeping
          CPU:  1.7% user,  0.1% nice,  4.0% system,  1.1% interrupt, 93.2% idle
          Mem: 743M Active, 4365M Inact, 3241M Wired, 54G Free
          ARC: 1431M Total, 218M MFU, 1095M MRU, 156K Anon, 17M Header, 96M Other
               1201M Compressed, 2544M Uncompressed, 2.12:1 Ratio
          Swap: 2048M Total, 2048M Free
          
            PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
          67966 unbound      16  23    0   568M   471M kqread  12  34:05  69.36% unbound
          

          Compared to a similar setup on pfSense release 2.6 without the unbound cron restart:

          last pid:  6402;  load averages:  0.58,  0.73,  0.86    up 368+03:08:31  21:40:34
          69 processes:  1 running, 67 sleeping, 1 zombie
          CPU:  0.7% user,  0.0% nice,  3.8% system,  0.2% interrupt, 95.4% idle
          Mem: 448M Active, 319M Inact, 3283M Wired, 58G Free
          ARC: 1144M Total, 200M MFU, 907M MRU, 32K Anon, 6030K Header, 31M Other
               356M Compressed, 771M Uncompressed, 2.17:1 Ratio
          Swap: 24G Total, 24G Free
          
            PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
          38497 unbound      16  22    0   253M   163M kqread   4  50:33  52.76% unbound
          

          Would there be any difference or disadvantage if I used BIND instead of unbound?
          My external DNS servers all run BIND 9 in LXC containers, and I have never noticed any problem with it.

          • S
            SteveITS Galactic Empire @getcom
            last edited by

            @getcom Hm, I have had 23.01 running at my home on a 2100, about 18 hours since I changed a setting last night. I don't have a billion devices at home though.

            : sockstat | grep unbound | wc -l
                  12
            
              PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
            25498 unbound       2  20    0   158M   124M kqread   0   7:37   0.25% unbound
            

            The DNS Forwarder has a "Query DNS servers sequentially" option; the Resolver does not. I found a 2015 post (https://serverfault.com/questions/732920/how-to-do-parallel-queries-to-the-upstream-dns-using-unbound) saying that unbound queries them all.

            How many DNS servers do you have configured?
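
One way to answer that from the shell is to count the forward-addr entries per forward-zone in the generated unbound configuration. This is only a sketch: it assumes each zone's name: line comes before its forward-addr: lines, and the config path in the usage comment is an assumption about where pfSense writes the file. The function name forwarders_per_zone is made up for this illustration:

```shell
# forwarders_per_zone [FILE...]
# Reads unbound configuration from the given files (or stdin) and
# prints "ZONE N", where N is the number of forward-addr entries
# listed under that forward-zone.
# Assumption: "name:" appears before "forward-addr:" in each zone;
# addresses seen before any name are counted under "?".
forwarders_per_zone() {
    awk '
        /^[[:space:]]*forward-zone:/ { zone = "?"; next }
        /^[[:space:]]*name:/         { zone = $2; gsub(/"/, "", zone); next }
        /^[[:space:]]*forward-addr:/ { count[zone]++ }
        END { for (z in count) print z, count[z] }
    ' "$@" | LC_ALL=C sort
}

# Possible usage (path is an assumption, adjust to your system):
#   forwarders_per_zone /var/unbound/unbound.conf
```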

            re: BIND, there is a package but as I vaguely recall it's not meant for resolving? Not too familiar with the package (am of course familiar with BIND).

            • getcomG
              getcom @SteveITS
              last edited by

              @steveits said in After upgrade to 23.01: unbound has a high CPU load resulting in DNS request timeouts:

              How many DNS servers do you have configured?

              re: BIND, there is a package but as I vaguely recall it's not meant for resolving? Not too familiar with the package (am of course familiar with BIND).

              Two external DNS servers, one for each WAN connection (dual-WAN setup), four for the four site-to-site VPNs, and one for an AD subdomain.
              Regarding the 2015 post: unbound also made these parallel queries in the 22.05 release, so they are not the root cause.

              What is strange is that after a few cron-triggered unbound restarts, the load is now in a normal range (~0.19). sockstat also shows only 400 to 600 connections.
              Phew... I don't like this, because the problem disappeared before I could find the root cause, so it can happen again.

              • getcomG
                getcom @getcom
                last edited by

                It is back again. The load is definitely too high. I noticed audio problems in Jitsi meetings.
                The system load goes up and down all the time.
                I think I will go back to the previous release until this is fixed.

                pfSense23.01_unbound_top3.png

                • getcomG
                  getcom @getcom
                  last edited by

                  After updating another system to 23.01 and seeing the same behavior, I came to the conclusion that the parallel requests are so much more aggressive in 23.01 that the answers are killing the messenger. I saw 4500+ requests/packets per second hammering unbound to the point where it could no longer answer. VMs and clients got lots of timeouts lasting seconds to minutes, and everything froze.
                  @Gertjan I think you are right... I'm a member of the 'DNS/unbound sucks' club now...
                  My conclusion is that as long as unbound cannot send requests sequentially, I should give dnsmasq (the DNS Forwarder) a try, as mentioned by @steveits.

                  • S
                    SteveITS Galactic Empire @getcom
                    last edited by

                    @getcom One other idea that has come up in other threads e.g. https://forum.netgate.com/topic/178413/major-dns-bug-23-01-with-quad9-on-ssl/ is to disable DNS over TLS. It doesn't seem to be a problem for everyone but some people say it is necessary when using Unbound in forwarding mode.

                    I don't recall anyone reporting high CPU usage though.

                    • A
                      aduzsardi
                      last edited by aduzsardi

                      rant:
                      there are quite a few problems with pfSense and DNS services, I've just given up trying to chase them down: high CPU usage, flaky DNS service, DNS not working at all on localhost on the firewall (it just times out), domain overrides not working, and so on

                      now running DNS servers on a couple of separate boxes, no issues at all
                      pfSense does show its limits even on Netgate hardware (we have a couple of 7100 appliances), they cram a lot of things into this firewall appliance for whatever reason ... I'm guessing if we need a professional business-level firewall, that's what TNSR is for

                      • getcomG
                        getcom @SteveITS
                        last edited by

                        @steveits said in After upgrade to 23.01: unbound has a high CPU load resulting in DNS request timeouts:

                        @getcom One other idea that has come up in other threads e.g. https://forum.netgate.com/topic/178413/major-dns-bug-23-01-with-quad9-on-ssl/ is to disable DNS over TLS. It doesn't seem to be a problem for everyone but some people say it is necessary when using Unbound in forwarding mode.

                        DNS over TLS is/was not activated...

                          • getcomG
                            getcom @aduzsardi
                            last edited by

                            Some tcpdumps later: all clear.
                            At the moment we have lots of DoS and brute-force attacks here in Germany. We are delivering tanks and defense systems to Ukraine. Perhaps there is a connection? I don't know, but I have never seen anything like this before.
                            With a regular VDSL or cable contract you get an asymmetric connection, e.g. 100 Mbps downstream / 40 Mbps upstream. I saw up to 6000 DNS requests per second; each was only a few kB, but the answers were up to four times larger. Combined with the smaller upstream bandwidth, this made the whole internet connection unusable.
                            The smaller Netgate SG boxes ran hot in this situation...
                            Since I activated GeoIP blocking for all countries except Germany and the USA and added ~150 bad-bot blacklists to pfBlockerNG, it has been quiet...
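
For anyone who wants to check their own request rate, here is a rough sketch of how a per-second count could be derived from tcpdump text output. It assumes tcpdump's default timestamp format (HH:MM:SS.microseconds as the first field); the interface name em0 in the usage comment is a placeholder, and the function name dns_packets_per_second is made up for this illustration:

```shell
# dns_packets_per_second
# Reads tcpdump text output on stdin and prints "HH:MM:SS N",
# where N is the number of packets captured in that second.
# Assumption: the first field of each line is tcpdump's default
# timestamp, e.g. 21:34:05.123456.
dns_packets_per_second() {
    awk 'NF > 0 {
             split($1, ts, ".")      # keep HH:MM:SS, drop microseconds
             packets[ts[1]]++
         }
         END { for (s in packets) print s, packets[s] }' |
        sort
}

# Possible usage (em0 is a placeholder interface name; -c stops
# the capture after 10000 packets so the summary gets printed):
#   tcpdump -n -l -c 10000 -i em0 port 53 | dns_packets_per_second
```

The count covers requests and answers together, so it is an upper bound on the request rate rather than an exact figure.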

                            • E
                              Enhance2736 @getcom
                              last edited by

                              @getcom dang man! i feel for you. keep up the good work and keep those ruzzkies out !!!

                              Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.