Netgate Discussion Forum
    • Categories
    • Recent
    • Tags
    • Popular
    • Users
    • Search
    • Register
    • Login

    Unbound seems to be restarting frequently

    Scheduled Pinned Locked Moved DHCP and DNS
    178 Posts 43 Posters 96.4k Views 11 Watching
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • C Offline
      ConfusedUser
      last edited by

      @doktornotor:

      Did you try this instead?

      
       if (file_exists("{$g['varrun_path']}/unbound.pid"))
            sigkillbypid("{$g['varrun_path']}/unbound.pid", "HUP");
      
      

      This would need to be moved just before the final

      
      return $return;
      
      

      obviously - I don't get the "logic" behind killing the service first and then doing some reconfiguration.  :o ::)

      I haven't tried because I don't want/need the process to be killed or restarted at all.
      Since this function is called to start and to restart the process I'm checking if the process is running with this line "if (!(get_service_status(find_service_by_name('unbound'))))" and if it is already running I simply do nothing.

      In my opinion the first thing that should happen is to split the start and the restart function and also to add a check box "keep cache when restarting" to the GUI which causes an "unbound-control dump_cache" and after restarting it runs an "unbound-control load_cache".
      But changing the scripts to accomplish that goes far beyond my skills.
      Also the fact that things are called currently that don't exist means for me someone really needs to look into the whole implementation.

      1 Reply Last reply Reply Quote 0
      • D Offline
        doktornotor Banned
        last edited by

        @ConfusedUser:

        I haven't tried because I don't want/need the process to be killed or restarted at all.
        Since this function is called to start and to restart the process I'm checking if the process is running with this line "if (!(get_service_status(find_service_by_name('unbound'))))" and if it is already running I simply do nothing.

        Well, this hack obviously is not acceptable, since it breaks things like the DHCP registration… So, we should move to testing something that might get accepted instead. :P

        1 Reply Last reply Reply Quote 0
        • C Offline
          ConfusedUser
          last edited by

          @doktornotor:

          Well, this hack obviously is not acceptable, since it breaks things like the DHCP registration… So, we should move to testing something that might get accepted instead. :P

          Sure… This is really just a workaround and for me it works like a charm. Obviously this can't go into release this way.

          I was posting this because it might help others, too, who don't use dynamic WAN IPs and DHCP registration in unbound.

          What really needs to be happen would be to fix AND clean up unbound.inc and services.inc.
          I would love to help (obviously I'm not too lazy, the investigation and the workaround took me a few hours) but my skills are simply not good enough to provide a "proper" solution even though it doesn't look too complicated to me so I might give it a try...

          For now (and for anyone who wants to use it): The workaround works perfect for me. But don't blame me if it kills your box.  :)

          1 Reply Last reply Reply Quote 0
          • D Offline
            doktornotor Banned
            last edited by

            Actually all the code to dump/restore cache is there. Just the checkbox to enable the dumpcache var went AWOL. This was there with the 2.1 package.

            https://github.com/pfsense/pfsense-packages/blob/master/config/unbound/unbound.xml#L179

            EDIT: Filed a bug about the missing cache save/restore - https://redmine.pfsense.org/issues/4667

            1 Reply Last reply Reply Quote 0
            • GertjanG Online
              Gertjan
              last edited by

              I saw the code in unbound.inc, that handles the dump_cache and load_cache.
              It might be the solution.
              So I activated the 'dump_cache' facilities, so that before a 'reload', the cache was dumped.

              Have a look at this:

              function unbound_control($action) {
              	global $config, $g;
              
              	$cache_dumpfile = "/var/tmp/unbound_cache";
              
              	switch ($action) {
              	case "start":
              .....
              	case "stop":
              .....
              	case "reload":
              .....
              	case "dump_cache":
              		// Dump Unbound's Cache
              		if ($config['unbound']['dumpcache'] == "on")
              			do_as_unbound_user("dump_cache");
              		break;
              	case "restore_cache":
              		// Restore Unbound's Cache
              		if ((is_service_running("unbound")) && ($config['unbound']['dumpcache'] == "on")) {
              			if (file_exists($cache_dumpfile) && filesize($cache_dumpfile) > 0)
              				do_as_unbound_user("load_cache < /var/tmp/unbound_cache");
              		}
              		break;
              	default:
              		break;
              
              	}
              }
              
              

              Be carefully : $config['unbound']['dumpcache'] isn't defined neither declared in the GUI.
              And doing something like this: do_as_unbound_user("load_cache < /var/tmp/unbound_cache");
              will not work. The function do_as_unbound_user() works not that way.
              Anyway, I had my unbound cache being dumped before restart.

              And then I started to think … after reading this:

                     load_cache
              	      The contents of the cache	is loaded from stdin.  Uses  the  same
              	      format as	dump_cache uses.  Loading the cache with old, or wrong
              	      data can result in old or	wrong data returned to clients.	 Load-
              	      ing data into the	cache in this way is supported in order	to aid
              	      with debugging.
              

              Consider the situation: unbound starts, and read all the files its need, like /var/etc/hosts, the DHCP leases file, etc.
              Then we instruct it to load the cache file, /var/tmp/unbound_cache
              It doesn't take long to discover that the internal working cache (with the new local info) in unbound is being replaced by what has been written in /var/tmp/unbound_cache.
              F*ck.
              Doing so makes it completely useless to restart unbound to begin with …...  >:(
              As said in the doc: dump_cache and load_cache exists for debugging purposes.

              Leaves me with one idea: ditch unbound, even if it is the most secured name server, and make friends with bind...... (one of the most used name server on the planet !?!), which, I'm pretty sure, doesn't 'work' like this.

              edit: hopping through the unbound source code make me think: it 'reloads' when reading in a cache file ….  :o
              Well .....

              No "help me" PM's please. Use the forum, the community will thank you.
              Edit : and where are the logs ??

              S 1 Reply Last reply Reply Quote 0
              • D Offline
                doktornotor Banned
                last edited by

                Yeah I had a look at the source code and I have hard time understanding the design. Also, the "reload" which is something presumed to be graceful in the rest of the world is pretty much equal to restart and disrupts the service. Unless this gets fixed quickly upstream, I frankly would suggest removing the DHCP registration features from the DNS Resolver GUI altogether. With this broken design, it is not usable.

                1 Reply Last reply Reply Quote 0
                • K Offline
                  kejianshi
                  last edited by

                  It seems to work ok IF you are using static DHCP entries for the clients, like I do.

                  Of course this isn't possible if you are providing DHCP for a mall full of wireless clients.

                  But then again, why the hell would you want a mall full of wireless clients registered in DNS?

                  So maybe register the static entries but not the dynamic entries.

                  1 Reply Last reply Reply Quote 0
                  • C Offline
                    ConfusedUser
                    last edited by

                    I saw host entries can be removed with unbound-control at runtime w/o restarting the service. But I can't see an option to add a host using unbound-control. If that is possible this could eliminate the issues when using DHCP.

                    I'm registering ALL my leases at a Windows Server DNS Service (since there is an AD running on it). So even DHCP is not updating the local DNS it still keeps restarting it.  So I'm using unbound only for Internet DNS loolup really but everything locally is done by Microsoft DNS service. So definitely no need for restarting anything since hosts file will never change.

                    Well, "…keeps restarting..." is wrong in fact, "...kept restarting..." is correct in my case since I removed the ability to restart unbound from the code a couple of days ago (which I posted here). Even it's a very dirty workaround it works perfect so far without any adverse effects for my usage. Couldn't be happier that I've done that.

                    1 Reply Last reply Reply Quote 0
                    • K Offline
                      kevindd992002
                      last edited by

                      Is there any progress to this issue?

                      1 Reply Last reply Reply Quote 0
                      • L Offline
                        Liath.WW
                        last edited by

                        yeah hoping myself that this is fixed, and soon. I don't have any dns mappings done in DHCP (do have some statics), but I keep gettign the restart issue every flippin time a phone is turned on and connects. It's getting old fast.

                        1 Reply Last reply Reply Quote 0
                        • C Offline
                          CiscoKid85
                          last edited by

                          In its current implementation, unbound is unusable for me. The restarts are causing the cache to clear so I've had to use forwarder to make everything work.

                          1 Reply Last reply Reply Quote 0
                          • K Offline
                            kevindd992002
                            last edited by

                            It's clearly happening to very many users but is it already acknowledged as a real "issue"? No pun intended.

                            1 Reply Last reply Reply Quote 0
                            • D Offline
                              doktornotor Banned
                              last edited by

                              @kevindd992002:

                              It's clearly happening to very many users but is it already acknowledged as a real "issue"? No pun intended.

                              Perhaps take this to upstream mailing list? As discussed many times above, there's no graceful reload anywhere, the code is braindead.

                              1 Reply Last reply Reply Quote 0
                              • C Offline
                                ConfusedUser
                                last edited by

                                Just want to mention that my "solution" https://forum.pfsense.org/index.php?topic=89589.msg517047#msg517047 is working great on 6 production systems.  :)

                                1 Reply Last reply Reply Quote 0
                                • C Offline
                                  cmb
                                  last edited by

                                  @doktornotor:

                                  @kevindd992002:

                                  It's clearly happening to very many users but is it already acknowledged as a real "issue"? No pun intended.

                                  Perhaps take this to upstream mailing list? As discussed many times above, there's no graceful reload anywhere, the code is braindead.

                                  That's what I was planning on doing post-2.2.4. I dug into it a few days ago looking at something else to find basically what's been discussed previously. Sending it a -HUP is a full stop/start rather than a reload, and same for unbound-control reload. There is no way to just reload the config without doing a full stop/start. It is certainly braindead.

                                  Anyone else is welcome to report upstream, just please post back a link to the thread here.

                                  1 Reply Last reply Reply Quote 0
                                  • K Offline
                                    ky41083 Banned
                                    last edited by

                                    How many people having this issue are using a config that used to use the forwarder rather than the resolver? And how many of those people still have the top 3 DHCP options checked in the forwarder, even though the forwarder is disabled?

                                    If this is the case, humor me here… go to the DNS forwarder page, check the top enable box to enable access to the 3 DHCP options boxes below it, uncheck all 3 DHCP options boxes, then uncheck the top enable box (so we don't unwantingly enable the forwarder), click the save button, verify that all 3 DHCP options boxes for the forwarder are now unchecked and disabled once the page reloads, and reboot pfSense for sanity.

                                    Please post back as to what effect this has on Unbound reloads… either the same, less frequently, or eliminates them all together.

                                    1 Reply Last reply Reply Quote 0
                                    • J Offline
                                      jpns
                                      last edited by

                                      I have the same problem with unbound constantly reloading and having no DNS resolution for about 30 seconds every few minutes. I did have both the dynamic and static DHCP clients options ticked for DNS but I disabled them both and it didn't fix it. I have come to the conclusion that unbound is so broken that I've had to switch back to dnsmasq and now it's all working fine. I was using forwarding in unbound anyway (I am running multi-WAN with failover), so it doesn't really bother me having to switch.

                                      1 Reply Last reply Reply Quote 0
                                      • E Offline
                                        Ecnerwal
                                        last edited by

                                        @ky41083:

                                        How many people having this issue are using a config that used to use the forwarder rather than the resolver?
                                        …
                                        Please post back as to what effect this has on Unbound reloads... either the same, less frequently, or eliminates them all together.

                                        I sure hope it eliminates them altogether. I have noticed (and came looking, and found this thread) that it NEVER runs more than an hour at a time, and that can't help cache performance one stinking bit. Sometimes it does appear to get down to 30 minutes or even a minute or two (without settings changes going on.) But absolutely never more than an hour. Not so peachy.

                                        Got it rebooted, we'll see.

                                        That would be "Nope, didn't help a bit." Well, I guess it did get just over 4 hours in one place there. But mostly no help.

                                        Oct 16 08:49:49	unbound: [69305:0] info: start of service (unbound 1.5.3).
                                        Oct 16 08:49:48	unbound: [69305:0] info: service stopped (unbound 1.5.3).
                                        Oct 16 07:49:21	unbound: [69305:0] info: start of service (unbound 1.5.3).
                                        Oct 16 07:49:21	unbound: [69305:0] info: service stopped (unbound 1.5.3).
                                        Oct 16 06:48:58	unbound: [69305:0] info: start of service (unbound 1.5.3).
                                        Oct 16 06:48:58	unbound: [69305:0] info: service stopped (unbound 1.5.3).
                                        Oct 16 06:47:45	unbound: [69305:0] info: start of service (unbound 1.5.3).
                                        Oct 16 06:47:45	unbound: [69305:0] info: service stopped (unbound 1.5.3).
                                        Oct 16 06:35:23	unbound: [69305:0] info: start of service (unbound 1.5.3).
                                        Oct 16 06:35:23	unbound: [69305:0] info: service stopped (unbound 1.5.3).
                                        Oct 16 06:30:07	unbound: [69305:0] info: start of service (unbound 1.5.3).
                                        Oct 16 06:30:07	unbound: [69305:0] info: service stopped (unbound 1.5.3).
                                        Oct 16 06:26:31	unbound: [69305:0] info: start of service (unbound 1.5.3).
                                        Oct 16 06:26:31	unbound: [69305:0] info: service stopped (unbound 1.5.3).
                                        Oct 16 06:24:05	unbound: [69305:0] info: start of service (unbound 1.5.3).
                                        Oct 16 06:24:05	unbound: [69305:0] info: service stopped (unbound 1.5.3).
                                        Oct 16 06:22:25	unbound: [69305:0] info: start of service (unbound 1.5.3).
                                        Oct 16 06:22:25	unbound: [69305:0] info: service stopped (unbound 1.5.3).
                                        Oct 16 06:21:29	unbound: [69305:0] info: start of service (unbound 1.5.3).
                                        Oct 16 06:21:29	unbound: [69305:0] info: service stopped (unbound 1.5.3).
                                        Oct 16 06:20:58	unbound: [69305:0] info: start of service (unbound 1.5.3).
                                        Oct 16 06:20:58	unbound: [69305:0] info: service stopped (unbound 1.5.3).
                                        Oct 16 06:19:57	unbound: [69305:0] info: start of service (unbound 1.5.3).
                                        Oct 16 06:19:56	unbound: [69305:0] info: service stopped (unbound 1.5.3).
                                        Oct 16 06:12:45	unbound: [69305:0] info: start of service (unbound 1.5.3).
                                        Oct 16 06:12:45	unbound: [69305:0] info: service stopped (unbound 1.5.3).
                                        Oct 16 06:11:16	unbound: [69305:0] info: start of service (unbound 1.5.3).
                                        Oct 16 06:11:16	unbound: [69305:0] info: service stopped (unbound 1.5.3).
                                        Oct 16 06:09:50	unbound: [69305:0] info: start of service (unbound 1.5.3).
                                        Oct 16 06:09:50	unbound: [69305:0] info: service stopped (unbound 1.5.3).
                                        Oct 16 06:07:16	unbound: [69305:0] info: start of service (unbound 1.5.3).
                                        Oct 16 06:07:16	unbound: [69305:0] info: service stopped (unbound 1.5.3).
                                        Oct 16 06:01:39	unbound: [69305:0] info: start of service (unbound 1.5.3).
                                        Oct 16 06:01:39	unbound: [69305:0] info: service stopped (unbound 1.5.3).
                                        Oct 16 06:01:02	unbound: [69305:0] info: start of service (unbound 1.5.3).
                                        Oct 16 06:01:02	unbound: [69305:0] info: service stopped (unbound 1.5.3).
                                        Oct 16 06:00:36	unbound: [69305:0] info: start of service (unbound 1.5.3).
                                        Oct 16 06:00:35	unbound: [69305:0] info: service stopped (unbound 1.5.3).
                                        Oct 16 06:00:34	unbound: [69305:0] info: start of service (unbound 1.5.3).
                                        Oct 16 06:00:34	unbound: [69305:0] info: service stopped (unbound 1.5.3).
                                        Oct 16 05:59:10	unbound: [69305:0] info: start of service (unbound 1.5.3).
                                        Oct 16 05:59:10	unbound: [69305:0] info: service stopped (unbound 1.5.3).
                                        Oct 16 05:59:04	unbound: [69305:0] info: start of service (unbound 1.5.3).
                                        Oct 16 05:59:04	unbound: [69305:0] info: service stopped (unbound 1.5.3).
                                        Oct 16 05:52:44	unbound: [69305:0] info: start of service (unbound 1.5.3).
                                        Oct 16 05:52:44	unbound: [69305:0] info: service stopped (unbound 1.5.3).
                                        Oct 16 05:52:24	unbound: [69305:0] info: start of service (unbound 1.5.3).
                                        Oct 16 05:52:24	unbound: [69305:0] info: service stopped (unbound 1.5.3).
                                        Oct 16 05:51:53	unbound: [69305:0] info: start of service (unbound 1.5.3).
                                        Oct 16 05:51:53	unbound: [69305:0] info: service stopped (unbound 1.5.3).
                                        Oct 16 05:50:09	unbound: [69305:0] info: start of service (unbound 1.5.3).
                                        Oct 16 05:50:09	unbound: [69305:0] info: service stopped (unbound 1.5.3).
                                        Oct 16 05:49:59	unbound: [69305:0] info: start of service (unbound 1.5.3).
                                        Oct 16 05:49:59	unbound: [69305:0] info: service stopped (unbound 1.5.3).
                                        Oct 16 05:49:15	unbound: [69305:0] info: start of service (unbound 1.5.3).
                                        Oct 16 05:49:15	unbound: [69305:0] info: service stopped (unbound 1.5.3).
                                        Oct 16 05:48:56	unbound: [69305:0] info: start of service (unbound 1.5.3).
                                        Oct 16 05:48:55	unbound: [69305:0] info: service stopped (unbound 1.5.3).
                                        Oct 16 05:47:53	unbound: [69305:0] info: start of service (unbound 1.5.3).
                                        Oct 16 05:47:53	unbound: [69305:0] info: service stopped (unbound 1.5.3).
                                        Oct 16 04:43:13	unbound: [69305:0] info: start of service (unbound 1.5.3).
                                        Oct 16 04:43:13	unbound: [69305:0] info: service stopped (unbound 1.5.3).
                                        Oct 16 03:41:43	unbound: [69305:0] info: start of service (unbound 1.5.3).
                                        Oct 16 03:41:43	unbound: [69305:0] info: service stopped (unbound 1.5.3).
                                        Oct 16 03:41:42	unbound: [69305:0] info: start of service (unbound 1.5.3).
                                        Oct 16 03:41:42	unbound: [69305:0] info: service stopped (unbound 1.5.3).
                                        Oct 16 03:41:41	unbound: [69305:0] info: start of service (unbound 1.5.3).
                                        Oct 16 03:41:40	unbound: [22642:0] info: service stopped (unbound 1.5.3).
                                        Oct 16 03:41:00	unbound: [22642:0] info: start of service (unbound 1.5.3).
                                        Oct 16 03:41:00	unbound: [22642:0] info: service stopped (unbound 1.5.3).
                                        Oct 16 03:40:59	unbound: [22642:0] info: start of service (unbound 1.5.3).
                                        Oct 16 03:40:59	unbound: [22642:0] info: service stopped (unbound 1.5.3).
                                        Oct 16 03:40:58	unbound: [22642:0] info: start of service (unbound 1.5.3).
                                        Oct 16 03:40:58	unbound: [22642:0] info: service stopped (unbound 1.5.3).
                                        Oct 16 03:40:57	unbound: [22642:0] info: start of service (unbound 1.5.3).
                                        Oct 16 03:40:57	unbound: [22642:0] info: service stopped (unbound 1.5.3).
                                        Oct 16 03:40:57	unbound: [22642:0] info: start of service (unbound 1.5.3).
                                        Oct 15 23:33:46	unbound: [35431:0] info: start of service (unbound 1.5.3).
                                        Oct 15 23:33:46	unbound: [35431:0] info: service stopped (unbound 1.5.3).
                                        Oct 15 23:33:46	unbound: [35431:0] info: start of service (unbound 1.5.3).
                                        Oct 15 23:33:46	unbound: [35431:0] info: service stopped (unbound 1.5.3).
                                        Oct 15 23:33:46	unbound: [35431:0] info: start of service (unbound 1.5.3).
                                        Oct 15 23:33:46	unbound: [15076:0] info: service stopped (unbound 1.5.3).
                                        Oct 15 23:33:41	unbound: [15076:0] info: start of service (unbound 1.5.3).
                                        

                                        Having a boatload of resources free because I can't run traffic shaping and transparent squid since 2.1.x, I cranked all the settings in the resolver to maximum (they were already bumped up quite a bit from defaults) though I doubt it will have any effect. But what the heck…

                                        So with all the settings cranked we got - an hour and 17 seconds.

                                        Oct 16 10:47:34	unbound: [63283:0] info: service stopped (unbound 1.5.3).
                                        Oct 16 09:47:17	unbound: [63283:0] info: start of service (unbound 1.5.3).
                                        

                                        …and then things went downhill.

                                        Oct 16 12:03:38	unbound: [63283:0] info: start of service (unbound 1.5.3).
                                        Oct 16 12:03:38	unbound: [63283:0] info: service stopped (unbound 1.5.3).
                                        Oct 16 12:03:20	unbound: [63283:0] info: start of service (unbound 1.5.3).
                                        Oct 16 12:03:20	unbound: [63283:0] info: service stopped (unbound 1.5.3).
                                        Oct 16 12:02:08	unbound: [63283:0] info: start of service (unbound 1.5.3).
                                        Oct 16 12:02:08	unbound: [63283:0] info: service stopped (unbound 1.5.3).
                                        Oct 16 12:01:26	unbound: [63283:0] info: start of service (unbound 1.5.3).
                                        Oct 16 12:01:26	unbound: [63283:0] info: service stopped (unbound 1.5.3).
                                        Oct 16 11:59:31	unbound: [63283:0] info: start of service (unbound 1.5.3).
                                        Oct 16 11:59:31	unbound: [63283:0] info: service stopped (unbound 1.5.3).
                                        Oct 16 11:58:50	unbound: [63283:0] info: start of service (unbound 1.5.3).
                                        Oct 16 11:58:50	unbound: [63283:0] info: service stopped (unbound 1.5.3).
                                        Oct 16 11:56:58	unbound: [63283:0] info: start of service (unbound 1.5.3).
                                        Oct 16 11:56:58	unbound: [63283:0] info: service stopped (unbound 1.5.3).
                                        Oct 16 11:56:30	unbound: [63283:0] info: start of service (unbound 1.5.3).
                                        Oct 16 11:56:30	unbound: [63283:0] info: service stopped (unbound 1.5.3).
                                        Oct 16 11:52:29	unbound: [63283:0] info: start of service (unbound 1.5.3).
                                        Oct 16 11:52:29	unbound: [63283:0] info: service stopped (unbound 1.5.3).
                                        Oct 16 11:51:46	unbound: [63283:0] info: start of service (unbound 1.5.3).
                                        Oct 16 11:51:45	unbound: [63283:0] info: service stopped (unbound 1.5.3).
                                        Oct 16 11:50:39	unbound: [63283:0] info: start of service (unbound 1.5.3).
                                        Oct 16 11:50:39	unbound: [63283:0] info: service stopped (unbound 1.5.3).
                                        Oct 16 11:49:04	unbound: [63283:0] info: start of service (unbound 1.5.3).
                                        Oct 16 11:49:03	unbound: [63283:0] info: service stopped (unbound 1.5.3).
                                        Oct 16 11:48:37	unbound: [63283:0] info: start of service (unbound 1.5.3).
                                        Oct 16 11:48:37	unbound: [63283:0] info: service stopped (unbound 1.5.3).
                                        Oct 16 10:47:34	unbound: [63283:0] info: start of service (unbound 1.5.3).
                                        

                                        Back to dnsMasq for now. Perhaps I'll try unbound on a separate server at some point, clearly there are issues with the package that degrade its potential usefulness.

                                        pfSense on i5 3470/DQ77MK/16GB/500GB

                                        1 Reply Last reply Reply Quote 0
                                        • K Offline
                                          ky41083 Banned
                                          last edited by

                                          On my hardware at least, this stops Unbound from restarting until a config change is made to DHCP. When the DHCP config change is applied, Unbound gets restarted, and then starts doing the same old restart every hour or less deal, until the pfSense box is rebooted. On a fresh reboot with no config changes Unbound has run for about 15 days without restarting (longest I've gone without a config change lately).

                                          The issue is not in the package, it's in the scripts controlling it. The way the scripts are written now, they insert data into Unbound's config files, then restart the service for Unbound to reload those config files and pull in the changes. Dnsmasq is handled the same way, except it will hold its cache and re-read config files upon getting a SIGHUP.

                                          The right way to handle local DNS changes, for Unbound at least, would basically be to do the opposite of what is being done now. Rather than write to the config files and bounce the service, you would use unbound-control to tell Unbound about the new local DNS changes and let it write the config files.

                                          The code is almost completely written already, all that needs to be done is for someone that knows what to change and has the time to change it, to go through all the Unbound functions and replace config file writes with unbound-control command calls that use almost identical syntax. For example, instead of writing local-data-ptr to a file like this and bouncing the service (how it's done now):

                                          $unbound_entries .= "local-data: \"{$host['fqdn']} {$type} {$host['ipaddr']}\"\n";
                                          

                                          You would do something like this and not ever restart the service:

                                          $unbound_cmd .= "unbound-control local_data {$host['fqdn']} {$type} {$host['ipaddr']}";
                                          

                                          Same thing with removing local DNS entries, every unbound-control add command has a corresponding remove command, local_data_remove for this example.

                                          If Unbound doesn't write the entries to the config files in case of a user forced service stop / restart, then the code in use now is already maintaining consistency in the config files. Or you could simply re-add all the local DNS records with unbound-control calls on service start. Either way would work fine.

                                          If memory serves, this was a lot of the reasoning behind using Unbound over Dnsmasq. Because Dnsmasq has nothing like unbound-control to insert and remove DNS records, and even make live config changes, without doing a full config reload / service restart. Why this was not carried over to implementation is beyond me. Ask the man upstairs.

                                          Also, clearly, some functions are being called in the wrong place, and this needs to be cleaned up as well. The DHCP register check boxes on the forwarder page should have no effect on the resolver what so ever, but it appears they absolutely do, as noted above. Also, the same DHCP register check boxes on the resolver page seem to only write these items to the Unbound config files without restarting the service, which is only half right the way it is currently implemented (you would expect a service restart to follow). This all needs to be changed to use unbound-control anyways, but should be noted on the fix Unbound to do list with everything else.

                                          Thank you and good night  :D

                                          1 Reply Last reply Reply Quote 1
                                          • D Offline
                                            David_W
                                            last edited by

                                            @ky41083:

                                            The issue is not in the package, it's in the scripts controlling it. The way the scripts are written now, they insert data into Unbound's config files, then restart the service for Unbound to reload those config files and pull in the changes. Dnsmasq is handled the same way, except it will hold its cache and re-read config files upon getting a SIGHUP.

                                            The right way to handle local DNS changes, for Unbound at least, would basically be to do the opposite of what is being done now. Rather than write to the config files and bounce the service, you would use unbound-control to tell Unbound about the new local DNS changes and let it write the config files.

                                            Whilst this sounds promising, have you code read unbound to check what unbound-control local_data and unbound-control local_data_remove actually do? If these commands force a cache flush, then there is no gain and potential losses in your proposed approach. If unbound uses a cache walk to discard affected RR(s), a different potential performance issue arises if this process is blocking, especially if unbound is started with no local RRs and they are added one at a time using unbound-control. If cache discard takes place via a non-blocking cache walk or no cached RR removal takes place, the possibility of race conditions exists, as stale RR data will be served for an indeterminate but potentially significant time.

                                            Rather than using unbound-control to insert all RRs on startup, a hybrid approach is possible - start unbound with the current state in the configuration files, then use unbound-control for subsequent changes.

                                            One complexity in implementing your proposed approach is a frustrating non-orthogonality in the local_data and local_data_remove commands. local_data adds a single RR. local_data_remove removes all RRs for the given name - which might include both A and AAAA records for the same name (and, if local_data_remove for PTR records is called by the name rather than the IP address, this means IPv4 and/or IPv6 PTR records pointing to that name will also be removed). IPv4 and IPv6 address allocation use entirely separate processes that have no guarantees on temporal relationship. There are circumstances where you might want to remove just one RR, which means you have to remove all RRs for the name then re-add any wanted RRs.

                                            If there are advantages to using unbound-control, it might be better to add support for unbound control to pfSense by generating a new unbound configuration as at present, then diffing it with the old configuration. If there are changes solely in local data (i.e. in dhcpleases_entries.conf and/or host_entries.conf), pfSense could use the output of the diff to make appropriate calls to unbound-control, otherwise replace the old configuration with the new and send unbound a SIGHUP. This approach has the advantage of ensuring consistency (for example by ensuring explicitly configured local-data is not accidentally removed) and is easily removed if unbound gets saner SIGHUP behaviour in the future. One implementational complexity is the need to call unbound-control local_data for any explicit local-data and local-data-ptr in unbound.conf after you call unbound-control local_data_remove for the same name.

                                            Another possibility is to use an entirely separate DNS server for local zone(s), defining them in Unbound as stub zones. There is potential advantage here in terms of being able to DNSSEC sign these zones. However, when data changes, stale data would be cached in Unbound that would require carefully targeted calls to unbound-control flush_type or a costly (because of the cache walk) unbound-control flush_zone to remove.

                                            1 Reply Last reply Reply Quote 0
                                            • First post
                                              Last post
                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.