Netgate Discussion Forum

    Unbound seems to be restarting frequently

    DHCP and DNS
    178 Posts 43 Posters 96.4k Views 11 Watching
    • dennypage

      The differences between your configuration and mine are the DHCP parameters and prefetch.

      I will set prefetch aside as I doubt it is relevant to the issue at hand.

      Static DHCP registration would only have an effect at initial startup or when the static table is edited, so I think we can set that aside as well.

      Dynamic DHCP address registration would appear to be the key issue. Based on the logs you posted, you have essentially no dynamic DHCP activity in your network. None between March 29 and April 15. The people experiencing the problem have lots of dynamic DHCP activity.

      I don't think it's your settings that make you immune, I think it's your environment.

      As an experiment, I turned off DHCP registration last night, and I have not had an issue since. This isn't how I want to run, but I care more about general DNS reliability than about being able to resolve dynamic client names.

      @kejianshi:

      I can tell you my settings…

      Enabled:

      DNSSEC

      DHCP Registration

      Static DHCP

      In advanced Settings besides what came enabled, I also Enabled:
      Hide Identity
      Hide Version
      Prefetch Support
      Prefetch DNS Key Support
      Harden DNSSEC data
      Unwanted Reply Threshold 10 million

      • kejianshi

        I am using static DHCP leases for everything that is normally present (except wireless Android devices, visitors, etc.).

        I wouldn't want to register every DHCP client in DNS if I were hosting open free Wi-Fi for hundreds of random users, though.

        I just can't imagine the need. Why?

        • waingro

          This bug is incredibly annoying, as name resolution fails for quite a few minutes after each restart.

          Is there a ticket on this, or is it being worked on already?

          • ConfusedUser

            As I wrote a bit earlier, it is important to me that the cache works and that clients get fast DNS responses, so I'd also like to use the prefetch feature. Since neither works when the service is constantly restarted (cache hit rate around 5% and basically no prefetches), I really wanted to fix this, so I started digging into the code.

            - Disclaimer: I am NO programmer so everything I'm writing here could be wrong! -

            For me the whole unbound code looks very messy. Example:
            In many files a function "services_unbound_configure()" is called when starting or re-starting unbound and also in various other scenarios. This function is defined in the services.inc.
            The first and worst thing that always happens when calling this function:
            // kill any running Unbound instance
            if (file_exists("{$g['varrun_path']}/unbound.pid"))
                sigkillbypid("{$g['varrun_path']}/unbound.pid", "TERM");

            That means: Cache is lost. Each time this function is called.
            Unbound has a few nice features via unbound-control; e.g., killing the process is not necessary in many cases, since unbound-control reload would often do the job. Also the cache could be stored and recovered afterwards. But I can't see this happening here.
            So after killing unbound and doing a few other things, the function calls sync_unbound_service().
            This function is defined in unbound.inc and does various things:
            function sync_unbound_service() {
                global $config, $g;

                create_unbound_chroot_path();

                // Configure our Unbound service
                do_as_unbound_user("unbound-anchor");
                unbound_remote_control_setup();
                unbound_generate_config();
                do_as_unbound_user("start");
                require_once("service-utils.inc");
                if (is_service_running("unbound"))
                    do_as_unbound_user("restore_cache");
            }

            So there is a way to restore the cache, but it only works if the cache was dumped before unbound was killed.
            Also, the call do_as_unbound_user("restore_cache") would not work anyway: "restore_cache" is not one of the cases in that function's switch, so it just hits default: and breaks without doing anything.
            function do_as_unbound_user($cmd) {
                global $g;

                switch ($cmd) {
                case "start":
                    mwexec("/usr/local/sbin/unbound -c {$g['unbound_chroot_path']}/unbound.conf");
                    break;
                case "stop":
                    mwexec("echo '/usr/local/sbin/unbound-control stop' | /usr/bin/su -m unbound", true);
                    break;
                case "reload":
                    mwexec("echo '/usr/local/sbin/unbound-control reload' | /usr/bin/su -m unbound", true);
                    break;
                case "unbound-anchor":
                    mwexec("echo '/usr/local/sbin/unbound-anchor -a {$g['unbound_chroot_path']}/root.key' | /usr/bin/su -m unbound", true);
                    break;
                case "unbound-control-setup":
                    mwexec("echo '/usr/local/sbin/unbound-control-setup -d {$g['unbound_chroot_path']}' | /usr/bin/su -m unbound", true);
                    break;
                default:
                    break;
                }
            }
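A minimal sketch of how the switch above could gain the two cache cases it lacks, assuming the same unbound-control path and su invocation it already uses. unbound_user_command() and the /var/tmp/unbound_cache default are illustrative names, not pfSense code; the helper only returns the command string, so it can be checked without a firewall. The redirection sits inside the echoed string, so the shell that su starts as the unbound user performs it; that is also why passing "load_cache < file" as $cmd to the existing function cannot work.

```php
<?php
// Hypothetical helper (not part of pfSense): returns the shell command that a
// do_as_unbound_user()-style function would hand to mwexec(), including the
// two cache cases the current switch lacks. Paths follow the code quoted above.
function unbound_user_command(string $cmd, string $dumpfile = "/var/tmp/unbound_cache"): ?string
{
    $ctl = "/usr/local/sbin/unbound-control";
    switch ($cmd) {
        case "stop":
        case "reload":
            $inner = "{$ctl} {$cmd}";
            break;
        case "dump_cache":
            // The redirection is part of the echoed string, so the shell
            // started by su (running as unbound) performs it.
            $inner = "{$ctl} dump_cache > {$dumpfile}";
            break;
        case "load_cache":
            $inner = "{$ctl} load_cache < {$dumpfile}";
            break;
        default:
            // Unknown commands are reported to the caller (null) instead of
            // being silently dropped, as "restore_cache" is today.
            return null;
    }
    return "echo '{$inner}' | /usr/bin/su -m unbound";
}
```

A do_as_unbound_user() built on it might keep its "start" and anchor cases as they are and, for the rest, run mwexec() only when the helper returns a non-null string.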

            Anyway, after seeing this and many other areas that look VERY strange to me, I gave up. Again, it might be me: I'm no programmer and might be misinterpreting a lot here, but it looks like the whole unbound implementation needs a cleanup by an experienced programmer.

            So I implemented a workaround that works for me and might work for others as well.

            • I'm registering the client DHCP addresses at a Windows DNS server.
            • I'm using only static WAN IPs
            • I'm not using any DDNS

            In other words, I can't see any reason why unbound should ever be restarted on my box (except perhaps when I change the configuration).
            So last night I removed the ability to restart unbound. This means that when I change the configuration, I need to either run "unbound-control reload" manually or stop and start the service.

            These are the changes I made last night in services.inc (originally highlighted in bold red: the two kill lines are commented out, and the rest of the function is wrapped in a check that only proceeds if unbound is not already running):
            function services_unbound_configure() {
                global $config, $g;
                $return = 0;

                if (isset($config['system']['developerspew'])) {
                    $mt = microtime();
                    echo "services_unbound_configure() being called $mt\n";
                }

                // kill any running Unbound instance
                //if (file_exists("{$g['varrun_path']}/unbound.pid"))
                //    sigkillbypid("{$g['varrun_path']}/unbound.pid", "TERM");

                require_once("service-utils.inc");
                if (!(get_service_status(find_service_by_name('unbound')))) {

                    if (isset($config['unbound']['enable'])) {
                        if (platform_booting())
                            echo gettext("Starting DNS Resolver...");
                        else
                            sleep(1);

                        /* generate hosts file */
                        if (system_hosts_generate() != 0)
                            $return = 1;

                        require_once('/etc/inc/unbound.inc');
                        sync_unbound_service();
                        if (platform_booting())
                            echo gettext("done.") . "\n";

                        system_dhcpleases_configure();
                    }

                    if (!platform_booting()) {
                        if (services_dhcpd_configure() != 0)
                            $return = 1;
                    }
                }
                return $return;
            }

            Results since last night:
            Prefetches: from around 0% to around 5%
            Cache hit rate: from around 5% to around 30%

            Hope that helps others as well who are suffering from the constant unbound restarts…

            • doktornotor

              @ConfusedUser:

              In many files a function "services_unbound_configure()" is called when starting or re-starting unbound and also in various other scenarios. This function is defined in the services.inc.
              The first and worst thing that always happens when calling this function:
              // kill any running Unbound instance
              if (file_exists("{$g['varrun_path']}/unbound.pid"))
              sigkillbypid("{$g['varrun_path']}/unbound.pid", "TERM");

              That means: Cache is lost. Each time this function is called.

              The cache is lost no matter what, even on the "graceful" unbound-control reload (per unbound docs). It would flush it anyway.

              man unbound-control

              reload Reload the server. This flushes the cache and reads the config file fresh.

              I cannot see any solution here to losing the cache beyond crude hacks (restore the cache every time…) until the broken design is fixed upstream.

              • ConfusedUser

                @doktornotor:

                The cache is lost no matter what, even on the "graceful" unbound-control reload (per unbound docs). It would flush it anyway. I cannot see any solution here to losing the cache beyond crude hacks (restore the cache every time…) until the broken design is fixed upstream.

                Completely correct; that's why I wrote:

                Also the cache could be stored and recovered afterwards. But I can't see this happening here.

                Regarding the "reload", I simply see it as a more elegant way than killing a process. That's why I mentioned it.

                • doktornotor

                  Did you try this instead?

                  
                  if (file_exists("{$g['varrun_path']}/unbound.pid"))
                      sigkillbypid("{$g['varrun_path']}/unbound.pid", "HUP");

                  This would need to be moved just before the final

                  return $return;

                  obviously - I don't get the "logic" behind killing the service first and then doing some reconfiguration.  :o ::)
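A sketch of the ordering suggested here, reconfigure first and signal last, under stated assumptions: the four callables stand in for pfSense helpers such as system_hosts_generate(), unbound_generate_config() and sigkillbypid(), and reconfigure_then_hup() is a hypothetical name. Per the unbound-control man page quoted earlier in the thread, a reload still flushes the cache; the point of the ordering is only that all new files exist before unbound is told to re-read them, instead of the process being killed first.

```php
<?php
// Hypothetical sketch: regenerate everything first, then tell the running
// unbound to re-read its configuration with SIGHUP. The callables stand in
// for pfSense helpers so the ordering can be exercised without a firewall.
function reconfigure_then_hup(
    callable $generate_config, // e.g. wraps system_hosts_generate() + unbound_generate_config()
    callable $is_running,      // e.g. checks unbound.pid / the service status
    callable $send_signal,     // e.g. wraps sigkillbypid($pidfile, $sig)
    callable $start            // e.g. wraps do_as_unbound_user("start")
): int {
    $return = 0;
    // 1. Write every new file while the old process keeps answering queries.
    if ($generate_config() != 0) {
        $return = 1;
    }
    // 2. Only now touch the process: HUP if it is running, start it otherwise.
    if ($is_running()) {
        $send_signal("HUP"); // unbound re-reads its config; the cache is still flushed
    } else {
        $start();
    }
    return $return;
}
```

Keeping the start path inside the same function means the caller no longer needs a separate "kill, then configure" step.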

                  • ConfusedUser

                    @doktornotor:

                    Did you try this instead?

                    
                    if (file_exists("{$g['varrun_path']}/unbound.pid"))
                        sigkillbypid("{$g['varrun_path']}/unbound.pid", "HUP");

                    This would need to be moved just before the final

                    return $return;

                    obviously - I don't get the "logic" behind killing the service first and then doing some reconfiguration.  :o ::)

                    I haven't tried because I don't want/need the process to be killed or restarted at all.
                    Since this function is called to start and to restart the process I'm checking if the process is running with this line "if (!(get_service_status(find_service_by_name('unbound'))))" and if it is already running I simply do nothing.

                    In my opinion, the first thing that should happen is to split the start and restart functions, and to add a "keep cache when restarting" checkbox to the GUI that runs "unbound-control dump_cache" before the restart and "unbound-control load_cache" after it.
                    But changing the scripts to accomplish that goes far beyond my skills.
                    Also, the fact that the code currently calls things that don't exist tells me someone really needs to look into the whole implementation.
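The "keep cache when restarting" proposal could look roughly like this. unbound_restart() is a hypothetical name; $run stands in for a helper like do_as_unbound_user() that understands "dump_cache" and "load_cache" (which, as noted in this thread, the current one does not), and $is_running stands in for is_service_running("unbound"). It is a sketch of the control flow only, not a tested pfSense patch.

```php
<?php
// Hypothetical sketch of a "keep cache when restarting" option.
function unbound_restart(callable $run, callable $is_running, bool $keep_cache): void
{
    $dumped = false;
    if ($is_running()) {
        if ($keep_cache) {
            $run("dump_cache"); // save the cache while the process still exists
            $dumped = true;
        }
        $run("stop");
    }
    $run("start");
    // Restore only a dump taken just now, and only into a process that came
    // up, so a stale file from an earlier run is never loaded.
    if ($dumped && $is_running()) {
        $run("load_cache");
    }
}
```

The $dumped flag matters because the unbound-control man page text quoted later in this thread warns that loading old or wrong data can return old or wrong answers to clients.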

                    • doktornotor

                      @ConfusedUser:

                      I haven't tried because I don't want/need the process to be killed or restarted at all.
                      Since this function is called to start and to restart the process I'm checking if the process is running with this line "if (!(get_service_status(find_service_by_name('unbound'))))" and if it is already running I simply do nothing.

                      Well, this hack obviously is not acceptable, since it breaks things like the DHCP registration… So, we should move to testing something that might get accepted instead. :P

                      • ConfusedUser

                        @doktornotor:

                        Well, this hack obviously is not acceptable, since it breaks things like the DHCP registration… So, we should move to testing something that might get accepted instead. :P

                        Sure… This is really just a workaround and for me it works like a charm. Obviously this can't go into release this way.

                        I was posting this because it might help others, too, who don't use dynamic WAN IPs and DHCP registration in unbound.

                        What really needs to happen is to fix AND clean up unbound.inc and services.inc.
                        I would love to help (I'm obviously not too lazy; the investigation and the workaround took me a few hours), but my skills are simply not good enough to provide a "proper" solution. Then again, it doesn't look too complicated to me, so I might give it a try...

                        For now (and for anyone who wants to use it): the workaround works perfectly for me. But don't blame me if it kills your box.  :)

                        • doktornotor

                          Actually, all the code to dump/restore the cache is there. Only the checkbox that enables the dumpcache variable went AWOL; it was there in the 2.1 package.

                          https://github.com/pfsense/pfsense-packages/blob/master/config/unbound/unbound.xml#L179

                          EDIT: Filed a bug about the missing cache save/restore - https://redmine.pfsense.org/issues/4667

                          • Gertjan

                            I saw the code in unbound.inc that handles dump_cache and load_cache.
                            It might be the solution.
                            So I activated the 'dump_cache' facility, so that the cache is dumped before a 'reload'.

                            Have a look at this:

                            function unbound_control($action) {
                            	global $config, $g;
                            
                            	$cache_dumpfile = "/var/tmp/unbound_cache";
                            
                            	switch ($action) {
                            	case "start":
                            .....
                            	case "stop":
                            .....
                            	case "reload":
                            .....
                            	case "dump_cache":
                            		// Dump Unbound's Cache
                            		if ($config['unbound']['dumpcache'] == "on")
                            			do_as_unbound_user("dump_cache");
                            		break;
                            	case "restore_cache":
                            		// Restore Unbound's Cache
                            		if ((is_service_running("unbound")) && ($config['unbound']['dumpcache'] == "on")) {
                            			if (file_exists($cache_dumpfile) && filesize($cache_dumpfile) > 0)
                            				do_as_unbound_user("load_cache < /var/tmp/unbound_cache");
                            		}
                            		break;
                            	default:
                            		break;
                            
                            	}
                            }
                            
                            

                            Be careful: $config['unbound']['dumpcache'] is neither defined nor exposed in the GUI.
                            And something like do_as_unbound_user("load_cache < /var/tmp/unbound_cache"); will not work.
                            The function do_as_unbound_user() doesn't work that way.
                            Anyway, I had my unbound cache being dumped before restart.

                            And then I started to think … after reading this:

                                   load_cache
                                      The contents of the cache is loaded from stdin. Uses the same format as dump_cache uses. Loading the cache with old, or wrong data can result in old or wrong data returned to clients. Loading data into the cache in this way is supported in order to aid with debugging.

                            Consider the situation: unbound starts and reads all the files it needs, like /var/etc/hosts, the DHCP leases file, etc.
                            Then we instruct it to load the cache file, /var/tmp/unbound_cache
                            It doesn't take long to discover that the internal working cache (with the new local info) in unbound is being replaced by what has been written in /var/tmp/unbound_cache.
                            F*ck.
                            Doing so makes it completely useless to restart unbound to begin with …...  >:(
                            As the docs say, dump_cache and load_cache exist for debugging purposes.

                            That leaves me with one idea: ditch unbound, even if it is the most secure name server, and make friends with BIND (one of the most used name servers on the planet!?), which, I'm pretty sure, doesn't 'work' like this.

                            Edit: hopping through the unbound source code makes me think it 'reloads' when reading in a cache file....  :o
                            Well.....


                            • doktornotor

                              Yeah, I had a look at the source code, and I have a hard time understanding the design. Also, the "reload", which the rest of the world presumes to be graceful, is pretty much equal to a restart and disrupts the service. Unless this gets fixed quickly upstream, I frankly would suggest removing the DHCP registration features from the DNS Resolver GUI altogether. With this broken design, they are not usable.

                              • kejianshi

                                It seems to work ok IF you are using static DHCP entries for the clients, like I do.

                                Of course this isn't possible if you are providing DHCP for a mall full of wireless clients.

                                But then again, why the hell would you want a mall full of wireless clients registered in DNS?

                                So maybe register the static entries but not the dynamic entries.

                                • ConfusedUser

                                  I saw host entries can be removed with unbound-control at runtime w/o restarting the service. But I can't see an option to add a host using unbound-control. If that is possible this could eliminate the issues when using DHCP.
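For reference, unbound-control(8) does document a runtime add: local_data takes a resource record, and local_data_remove drops the local data for a name. A sketch that builds those commands for a single IPv4 lease follows, assuming a single-label host name, a configured local domain and a 3600 s TTL; all function names are hypothetical, and it only returns command strings rather than executing them. Whether pfSense's lease handling could call something like this instead of restarting is an assumption, not something tested here.

```php
<?php
// Hypothetical sketch: unbound-control commands that add or remove one DHCP
// client's A and PTR records at runtime, instead of restarting unbound.

function ptr_name(string $ipv4): string
{
    // 192.168.1.10 -> 10.1.168.192.in-addr.arpa.
    return implode(".", array_reverse(explode(".", $ipv4))) . ".in-addr.arpa.";
}

function check_lease(string $host, string $ipv4): void
{
    // Lease host names come from clients, so refuse anything that is not a
    // plain DNS label before it reaches a shell command line.
    if (!preg_match('/^[A-Za-z0-9-]{1,63}$/', $host)) {
        throw new InvalidArgumentException("unsafe host name: {$host}");
    }
    if (filter_var($ipv4, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) === false) {
        throw new InvalidArgumentException("not an IPv4 address: {$ipv4}");
    }
}

function lease_add_commands(string $host, string $domain, string $ipv4, int $ttl = 3600): array
{
    check_lease($host, $ipv4);
    $fqdn = "{$host}.{$domain}.";
    $ptr = ptr_name($ipv4);
    return [
        "unbound-control local_data '{$fqdn} {$ttl} IN A {$ipv4}'",
        "unbound-control local_data '{$ptr} {$ttl} IN PTR {$fqdn}'",
    ];
}

function lease_remove_commands(string $host, string $domain, string $ipv4): array
{
    check_lease($host, $ipv4);
    return [
        "unbound-control local_data_remove {$host}.{$domain}.",
        "unbound-control local_data_remove " . ptr_name($ipv4),
    ];
}
```

The domain is taken from the firewall's own configuration here, so only the client-supplied host name and address are validated.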

                                  I'm registering ALL my leases with a Windows Server DNS service (since there is an AD running on it). So even though DHCP is not updating the local DNS, unbound still keeps restarting. I'm really only using unbound for Internet DNS lookups; everything local is handled by the Microsoft DNS service. So there is definitely no need to restart anything, since the hosts file will never change.

                                  Well, "…keeps restarting..." is actually wrong; "...kept restarting..." is correct in my case, since I removed the ability to restart unbound from the code a couple of days ago (as I posted here). Even though it's a very dirty workaround, it has worked perfectly so far without any adverse effects for my usage. Couldn't be happier that I've done that.

                                  • kevindd992002

                                    Is there any progress to this issue?

                                    • Liath.WW

                                      Yeah, I'm hoping this gets fixed, and soon. I don't have any DNS mappings done in DHCP (I do have some statics), but I keep getting the restart issue every flippin' time a phone is turned on and connects. It's getting old fast.

                                      • CiscoKid85

                                        In its current implementation, unbound is unusable for me. The restarts clear the cache, so I've had to use the forwarder to make everything work.

                                        • kevindd992002

                                          It's clearly happening to very many users but is it already acknowledged as a real "issue"? No pun intended.

                                          • doktornotor

                                            @kevindd992002:

                                            It's clearly happening to very many users but is it already acknowledged as a real "issue"? No pun intended.

                                            Perhaps take this to the upstream mailing list? As discussed many times above, there's no graceful reload anywhere; the code is braindead.

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.