Netgate Discussion Forum
    • Categories
    • Recent
    • Tags
    • Popular
    • Users
    • Search
    • Register
    • Login

    Unbound seems to be restarting frequently

    Scheduled Pinned Locked Moved DHCP and DNS
    178 Posts 43 Posters 96.4k Views 11 Watching
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • K Offline
      kejianshi
      last edited by

      Here is a clip from my logs:

      Apr 15 20:02:42 unbound: [53580:0] info: start of service (unbound 1.5.3).
      Apr 15 20:02:42 unbound: [53580:0] notice: init module 1: iterator
      Apr 15 20:02:42 unbound: [53580:0] notice: init module 0: validator
      Apr 15 20:02:42 unbound: [53580:0] notice: Restart of unbound 1.5.3.
      Apr 15 20:02:42 unbound: [53580:0] info: server stats for thread 1: requestlist max 6 avg 3.88889 exceeded 0 jostled 0
      Apr 15 20:02:42 unbound: [53580:0] info: server stats for thread 1: 9 queries, 0 answers from cache, 9 recursions, 0 prefetch
      Apr 15 20:02:42 unbound: [53580:0] info: server stats for thread 0: requestlist max 21 avg 11.6667 exceeded 0 jostled 0
      Apr 15 20:02:42 unbound: [53580:0] info: server stats for thread 0: 39 queries, 0 answers from cache, 39 recursions, 0 prefetch
      Apr 15 20:02:42 unbound: [53580:0] info: service stopped (unbound 1.5.3).
      Apr 15 20:02:42 unbound: [53580:0] info: start of service (unbound 1.5.3).
      Apr 15 20:02:42 unbound: [53580:0] notice: init module 1: iterator
      Apr 15 20:02:42 unbound: [53580:0] notice: init module 0: validator
      Apr 15 20:02:42 unbound: [53580:0] notice: Restart of unbound 1.5.3.
      Apr 15 20:02:42 unbound: [53580:0] info: server stats for thread 1: requestlist max 2 avg 1.33333 exceeded 0 jostled 0
      Apr 15 20:02:42 unbound: [53580:0] info: server stats for thread 1: 3 queries, 0 answers from cache, 3 recursions, 0 prefetch
      Apr 15 20:02:42 unbound: [53580:0] info: server stats for thread 0: requestlist max 0 avg 0 exceeded 0 jostled 0
      Apr 15 20:02:42 unbound: [53580:0] info: server stats for thread 0: 1 queries, 0 answers from cache, 1 recursions, 0 prefetch
      Apr 15 20:02:42 unbound: [53580:0] info: service stopped (unbound 1.5.3).
      Apr 15 20:02:42 unbound: [53580:0] info: start of service (unbound 1.5.3).
      Apr 15 20:02:42 unbound: [53580:0] notice: init module 1: iterator
      Apr 15 20:02:42 unbound: [53580:0] notice: init module 0: validator
      Apr 15 20:02:42 unbound: [53580:0] notice: Restart of unbound 1.5.3.
      Apr 15 20:02:42 unbound: [53580:0] info: server stats for thread 1: requestlist max 0 avg 0 exceeded 0 jostled 0
      Apr 15 20:02:42 unbound: [53580:0] info: server stats for thread 1: 0 queries, 0 answers from cache, 0 recursions, 0 prefetch
      Apr 15 20:02:42 unbound: [53580:0] info: server stats for thread 0: requestlist max 0 avg 0 exceeded 0 jostled 0
      Apr 15 20:02:42 unbound: [53580:0] info: server stats for thread 0: 0 queries, 0 answers from cache, 0 recursions, 0 prefetch
      Apr 15 20:02:42 unbound: [53580:0] info: service stopped (unbound 1.5.3).
      Apr 15 20:02:42 unbound: [53580:0] info: start of service (unbound 1.5.3).
      Apr 15 20:02:42 unbound: [53580:0] notice: init module 1: iterator
      Apr 15 20:02:42 unbound: [53580:0] notice: init module 0: validator
      Mar 29 23:23:26 unbound: [88947:0] info: start of service (unbound 1.5.3).
      Mar 29 23:23:26 unbound: [88947:0] notice: init module 1: iterator
      Mar 29 23:23:26 unbound: [88947:0] notice: init module 0: validator
      Mar 29 23:23:26 unbound: [88947:0] notice: Restart of unbound 1.5.3.
      Mar 29 23:23:26 unbound: [88947:0] info: 2.000000 4.000000 1
      Mar 29 23:23:26 unbound: [88947:0] info: 1.000000 2.000000 1
      Mar 29 23:23:26 unbound: [88947:0] info: 0.524288 1.000000 8
      Mar 29 23:23:26 unbound: [88947:0] info: 0.262144 0.524288 50
      Mar 29 23:23:26 unbound: [88947:0] info: 0.131072 0.262144 91

      So, to me, it looks like its only really restarting when its supposed to to deal with attacks against it, badly formed DNS entries etc etc. 
      It went 15 days without a restart from Mar 29 - April 15 and whats today?  23rd?  so its been 8 days since last restart of unbound?

      Doesn't seem too broken to me….

      However, I really do think setting matter.  Alot.

      1 Reply Last reply Reply Quote 0
      • dennypageD Offline
        dennypage
        last edited by

        I'm guessing you have "Register DHCP leases in the DNS Resolver" disabled, or you don't have DHCP leases with dynamic address.

        I've just turned this off in my configuration, and waiting/watching to see if it resolves my issues.

        1 Reply Last reply Reply Quote 0
        • K Offline
          kejianshi
          last edited by

          Nope - I have those things enabled.  Both of them.

          But I don't have IP conflicts and nor am I connected to one machine with both wire and wireless.

          And the general settings matter and so does the advanced setting in unbound.

          Without looking at people's pfsense with my own eyes and going through all the setting its hard to know what is causing some people problems.

          1 Reply Last reply Reply Quote 0
          • dennypageD Offline
            dennypage
            last edited by

            @kejianshi:

            Nope - I have those things enabled.  Both of them.

            Hmm… only one of these things is a setting (in the General tab) that can be enabled. The other is a dynamic artifact of the environment and DHCP mapping.

            But I don't have IP conflicts and nor am I connected to one machine with both wire and wireless.

            I also do not have this.

            And the general settings matter and so does the advanced setting in unbound.

            Just for clarity, can you explain what you are referring to in the settings? In particular, is there anything other than "Enable DNSSEC Support" and "Register DHCP leases in the DNS Resolver" (mentioned above) that you are referring to?

            Thanks

            1 Reply Last reply Reply Quote 0
            • K Offline
              kejianshi
              last edited by

              I can tell you my settings…

              Enabled:

              DNSSEC

              DHCP Registration

              Static DHCP

              In advanced Settings besides what came enabled, I also Enabled:
              Hide Identity
              Hide Version
              Prefetch Support
              Prefetch DNS Key Support
              Harden DNSSEC data
              Unwanted Reply Threshold 10 million

              I have no DNS IPs in the general settings of pfsense.  None.

              1 Reply Last reply Reply Quote 0
              • W Offline
                waingro
                last edited by

                @kevindd992002:

                But even though it restarts, it doesn't lose any "resolving" capability, right?

                Depends. Often it takes about 30 seconds after a restart before it starts resolving, long enough to be annoying since it happens all the time on some routers.

                1 Reply Last reply Reply Quote 0
                • W Offline
                  waingro
                  last edited by

                  @waingro:

                  @kevindd992002:

                  But even though it restarts, it doesn't lose any "resolving" capability, right?

                  Depends. Often it takes about 30 seconds after a restart before it starts resolving, long enough to be annoying since it happens all the time on some routers.

                  Interestingly, I have two exact copies of pfSense now, but different hardware. unbound bugs out on one of them (config is 100% equal, I differed the conf)

                  Works: atom, 2gb, 1 intel pro nics
                  Fails: i5 4 cores, 8gb, intel pro nics

                  It's not the nice, because they're physically the same, I switch between tests.

                  Adding more memory shouldn't be a problem either.

                  So is it the fact that it's 4-cores? I guess more things can go wrong if there's issue with synchronization code, but that's just speculation.

                  At any rate, an interesting result. Note that the only problem on the 4-core is unbound, everything else works great in pfSense.

                  wtf

                  1 Reply Last reply Reply Quote 0
                  • C Offline
                    ConfusedUser
                    last edited by

                    Another one here:
                    I got 2.2.2 running on different hardware (Dell Server, "normal" PC, etc.). All with the same issue that unbound keeps restarting. Single WAN and multi WAN configs. On all of them there are issues with unbound restarting

                    I'm using unbound for three main reasons:
                    1. DNSsec
                    2. Prefetch
                    3. Recursive queries

                    The bad thing is that the constant restarts (probably 10-20 per day)  basically make the cache and the Prefetch and also the cache useless and it is a big hit on overall DNS performance.
                    I was reading the whole thread and also the linked ones and I'll try a few things now but I really want to me to add myself to the list of users so the devs are aware: There is a problem.  :'(

                    Btw, on a big test network I got the unbound service running on a Windows 7 box. I'm using the same config as in pfSense. Guess what… It's running completely stable. No restarts or whatsoever.

                    1 Reply Last reply Reply Quote 0
                    • GertjanG Online
                      Gertjan
                      last edited by

                      @ConfusedUser:

                      Btw, on a big test network I got the unbound service running on a Windows 7 box. I'm using the same config as in pfSense. Guess what… It's running completely stable. No restarts or whatsoever.

                      This Windows box is also the DHCP server ?
                      Without DHCP serving, unbound won't restart on pfSEnse.

                      No "help me" PM's please. Use the forum, the community will thank you.
                      Edit : and where are the logs ??

                      1 Reply Last reply Reply Quote 0
                      • C Offline
                        ConfusedUser
                        last edited by

                        @Gertjan:

                        @ConfusedUser:

                        Btw, on a big test network I got the unbound service running on a Windows 7 box. I'm using the same config as in pfSense. Guess what… It's running completely stable. No restarts or whatsoever.

                        This Windows box is also the DHCP server ?
                        Without DHCP serving, unbound won't restart on pfSEnse.

                        No, it's no DHCP server and even if it was I doubt that would make a difference. There SHOULD be no reason for a DHCP server to be able to restart a DNS service (unless there is something in the code that forces/allows it which I expect is the issue with pfSense).
                        So even if I added a DHCP server to the Windows 7 machine I'm nearly certain it wouldn't make a difference.

                        1 Reply Last reply Reply Quote 0
                        • GertjanG Online
                          Gertjan
                          last edited by

                          @ConfusedUser:

                          (unless there is something in the code that forces/allows it which I expect is the issue with pfSense).

                          And there is.
                          Read my posts above, or look at the (PHP) code: when the system hosts file changed (because of a new entry, put in by the DHCP server) then unbound is reloaded …. (it shows in the logs : stopped -> started).

                          Running unbound on pfSEnse without DHCP server (so the system hosts file is nearly static), unbound will NOT get reloaded ....

                          No "help me" PM's please. Use the forum, the community will thank you.
                          Edit : and where are the logs ??

                          1 Reply Last reply Reply Quote 0
                          • dennypageD Offline
                            dennypage
                            last edited by

                            Differences between your and my configuration are the DHCP parameters and the prefetch.

                            I will set prefetch aside as I doubt it is relevant to the issue at hand.

                            Static DHCP registration would only have affect at initial startup or static table edit, so I think we can set that aside as well.

                            Dynamic DHCP address registration would appear to be the key issue. Based on the logs you posted, you have essentially no dynamic DHCP activity in your network. None between March 29 and April 15. The people experiencing the problem have lots of dynamic DHCP activity.

                            I don't think it's your settings that make you immune, I think it's your environment.

                            As an experiment, I turned off DHCP registration last night. I have not had a issue since. This isn't how I want to run, but I am more interested in general DNS reliability than being able to resolve dynamic client names.

                            @kejianshi:

                            I can tell you my settings…

                            Enabled:

                            DNSSEC

                            DHCP Registration

                            Static DHCP

                            In advanced Settings besides what came enabled, I also Enabled:
                            Hide Identity
                            Hide Version
                            Prefetch Support
                            Prefetch DNS Key Support
                            Harden DNSSEC data
                            Unwanted Reply Threshold 10 million

                            1 Reply Last reply Reply Quote 0
                            • K Offline
                              kejianshi
                              last edited by

                              I am using static leases in DHCP for everything that is normally present (minus wireless for android devices and visitors etc)

                              I wouldn't want to register every DHCP client in DNS if I was hosting open free wifi to 100s of random users though.

                              I just can't imagine the need.  Why?

                              1 Reply Last reply Reply Quote 0
                              • W Offline
                                waingro
                                last edited by

                                This bug is incredibly annoying, as resolves fails for quite a few minutes after restart.

                                Is there a ticket on this, or is it being worked on already?

                                1 Reply Last reply Reply Quote 0
                                • C Offline
                                  ConfusedUser
                                  last edited by

                                  Like I wrote a bit earlier for me it is important that the cache works and that I get fast DNS response time for the clients so I'd also like to use the prefetch feature. Since both doesn't work when the service is constantly restarted (cache hit rate around 5% and basically no prefetches) I really wanted to fix this one and I started digging into the code.

                                  - Disclaimer: I am NO programmer so everything I'm writing here could be wrong! -

                                  For me the whole unbound code looks very messy. Example:
                                  In many files a function "services_unbound_configure()" is called when starting or re-starting unbound and also in various other scenarios. This function is defined in the services.inc.
                                  The first and worst thing that always happens when calling this function:
                                  // kill any running Unbound instance
                                  if (file_exists("{$g['varrun_path']}/unbound.pid"))
                                  sigkillbypid("{$g['varrun_path']}/unbound.pid", "TERM");

                                  That means: Cache is lost. Each time this function is called.
                                  Unbound got a few nice features using unbound-control, e.g. killing the process is not necessary in many cases since unbound-control reload could help often. Also the cache could be stored and recovered afterwards. But I can't see this happening here.
                                  So after killing unbound an doing a few other things the function calls sync_unbound_service().
                                  This function is defined in unbound.inc and does various things:
                                  function sync_unbound_service() {
                                  global $config, $g;
                                  create_unbound_chroot_path();
                                  // Configure our Unbound service
                                  do_as_unbound_user("unbound-anchor");
                                  unbound_remote_control_setup();
                                  unbound_generate_config();
                                  do_as_unbound_user("start");
                                  require_once("service-utils.inc");
                                  if (is_service_running("unbound"))
                                  do_as_unbound_user("restore_cache");
                                  }

                                  So there would be an option to recover the cache - but - that works only if it is stored before killing unbound.
                                  Also the function call do_as_unbound_user("restore_cache") wouldn't work because it would end up in a break since "restore_cache" is not defined for this function.
                                  function do_as_unbound_user($cmd) {
                                  global $g;
                                  switch ($cmd) {
                                  case "start":
                                  mwexec("/usr/local/sbin/unbound -c {$g['unbound_chroot_path']}/unbound.conf");
                                  break;
                                  case "stop":
                                  mwexec("echo '/usr/local/sbin/unbound-control stop' | /usr/bin/su -m unbound", true);
                                  break;
                                  case "reload":
                                  mwexec("echo '/usr/local/sbin/unbound-control reload' | /usr/bin/su -m unbound", true);
                                  break;
                                  case "unbound-anchor":
                                  mwexec("echo '/usr/local/sbin/unbound-anchor -a {$g['unbound_chroot_path']}/root.key' | /usr/bin/su -m unbound", true);
                                  break;
                                  case "unbound-control-setup":
                                  mwexec("echo '/usr/local/sbin/unbound-control-setup -d {$g['unbound_chroot_path']}' | /usr/bin/su -m unbound", true);
                                  break;
                                  default:
                                  break;
                                  }
                                  }

                                  Anyway, after seeing this and many other areas that look VERY strange to me I gave up. Again, it might be me, I'm no programmer, I might be misinterpreting a lot here but it looks like the whole unbound implementation requires cleanup from an experienced programmer.

                                  So I implemented a workaround which works for me and it might work for others as well.

                                  • I'm registering the client DHCP addresses at a Windows DNS server.
                                  • I'm using only static WAN IPs
                                  • I'm not using any DDNS

                                  So in other words: I can't see any reason why unbound should ever be restarted on my box (probably unless I do a config change).
                                  What I did is I removed last night the ability to restart unbound. That means for me if I'm doing config changes I either need to manually execute "unbound-control reload" or I need to stop and start the service.

                                  These are the changes I did last night in services.inc (changes are in bold red letters):
                                  function services_unbound_configure() {
                                  global $config, $g;
                                  $return = 0;
                                  if (isset($config['system']['developerspew'])) {
                                  $mt = microtime();
                                  echo "services_unbound_configure() being called $mt\n";
                                  }
                                  // kill any running Unbound instance
                                  //if (file_exists("{$g['varrun_path']}/unbound.pid"))
                                  //sigkillbypid("{$g['varrun_path']}/unbound.pid", "TERM");
                                  require_once("service-utils.inc");
                                  if (!(get_service_status(find_service_by_name('unbound')))) {

                                  if (isset($config['unbound']['enable'])) {
                                  if (platform_booting())
                                  echo gettext("Starting DNS Resolver…");
                                  else
                                  sleep(1);
                                  /* generate hosts file */
                                  if(system_hosts_generate()!=0)
                                  $return = 1;
                                  require_once('/etc/inc/unbound.inc');
                                  sync_unbound_service();
                                  if (platform_booting())
                                  echo gettext("done.") . "\n";
                                  system_dhcpleases_configure();
                                  }
                                  if (!platform_booting()) {
                                  if (services_dhcpd_configure()!=0)
                                  $return = 1;
                                  }
                                  } return $return;
                                  }

                                  That means for me since last night:
                                  Prefetches from around 0 to around 5%
                                  Cache hit rate from around 5% to around 30%

                                  Hope that helps others as well who are suffering from the constant unbound restarts…

                                  1 Reply Last reply Reply Quote 0
                                  • D Offline
                                    doktornotor Banned
                                    last edited by

                                    @ConfusedUser:

                                    In many files a function "services_unbound_configure()" is called when starting or re-starting unbound and also in various other scenarios. This function is defined in the services.inc.
                                    The first and worst thing that always happens when calling this function:
                                    // kill any running Unbound instance
                                    if (file_exists("{$g['varrun_path']}/unbound.pid"))
                                    sigkillbypid("{$g['varrun_path']}/unbound.pid", "TERM");

                                    That means: Cache is lost. Each time this function is called.

                                    The cache is lost no matter what, even on the "graceful" unbound-control reload (per unbound docs). It would flush it anyway.

                                    man unbound-control

                                    reload Reload the server. This flushes the cache and reads the config file fresh.

                                    I cannot see any solution here to losing the cache beyond crude hacks (restore the cache every time…) until the retarded design is fixed upstream.

                                    1 Reply Last reply Reply Quote 0
                                    • C Offline
                                      ConfusedUser
                                      last edited by

                                      @doktornotor:

                                      The cache is lost no matter what, even on the "graceful" unbound-control reload (per unbound docs). It would flush it anyway. I cannot see any solution here to losing the cache beyond crude hacks (restore the cache every time…) until the retarded design is fixed upstream.

                                      Completely correct, that's why I wrote

                                      Also the cache could be stored and recovered afterwards. But I can't see this happening here.

                                      Regarding the "reload" I see this simply as a more elegant way than killing a process. That's why I mentioned it.

                                      1 Reply Last reply Reply Quote 0
                                      • D Offline
                                        doktornotor Banned
                                        last edited by

                                        Did you try this instead?

                                        
                                         if (file_exists("{$g['varrun_path']}/unbound.pid"))
                                              sigkillbypid("{$g['varrun_path']}/unbound.pid", "HUP");
                                        
                                        

                                        This would need to be moved just before the final

                                        
                                        return $return;
                                        
                                        

                                        obviously - I don't get the "logic" behind killing the service first and then doing some reconfiguration.  :o ::)

                                        1 Reply Last reply Reply Quote 0
                                        • C Offline
                                          ConfusedUser
                                          last edited by

                                          @doktornotor:

                                          Did you try this instead?

                                          
                                           if (file_exists("{$g['varrun_path']}/unbound.pid"))
                                                sigkillbypid("{$g['varrun_path']}/unbound.pid", "HUP");
                                          
                                          

                                          This would need to be moved just before the final

                                          
                                          return $return;
                                          
                                          

                                          obviously - I don't get the "logic" behind killing the service first and then doing some reconfiguration.  :o ::)

                                          I haven't tried because I don't want/need the process to be killed or restarted at all.
                                          Since this function is called to start and to restart the process I'm checking if the process is running with this line "if (!(get_service_status(find_service_by_name('unbound'))))" and if it is already running I simply do nothing.

                                          In my opinion the first thing that should happen is to split the start and the restart function and also to add a check box "keep cache when restarting" to the GUI which causes an "unbound-control dump_cache" and after restarting it runs an "unbound-control load_cache".
                                          But changing the scripts to accomplish that goes far beyond my skills.
                                          Also the fact that things are called currently that don't exist means for me someone really needs to look into the whole implementation.

                                          1 Reply Last reply Reply Quote 0
                                          • D Offline
                                            doktornotor Banned
                                            last edited by

                                            @ConfusedUser:

                                            I haven't tried because I don't want/need the process to be killed or restarted at all.
                                            Since this function is called to start and to restart the process I'm checking if the process is running with this line "if (!(get_service_status(find_service_by_name('unbound'))))" and if it is already running I simply do nothing.

                                            Well, this hack obviously is not acceptable, since it breaks things like the DHCP registration… So, we should move to testing something that might get accepted instead. :P

                                            1 Reply Last reply Reply Quote 0
                                            • First post
                                              Last post
                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.