Unbound frequently restarts on 2.2 - is this normal?

phil.davis

I imagine/hope/think that "reload" causes the running unbound to simply re-read its config (which includes reading associated leases or whatever other stuff is pointed to by the config) and internally implement that on-the-fly without any significant interruption of user service.

But maybe that is not the case! Someone could look in the unbound source code and see how a "reload" message really is processed.

doktornotor

@Gertjan:

I didn't have a look at the unbound source code (yet) but "unbound-control reload" means to me: restart it.
All this to force it to reread the hosts file (and dhcp leases file ?).

Hell no…

reload
Reload the server. This flushes the cache and reads the config file fresh.

This should not cause a service restart! Ever. The feature would be totally pointless otherwise.

If anyone's experiencing the issue, truss the reload and see if it's trying to load some include which does not exist… Like:


truss unbound-control -c /var/unbound/unbound.conf reload

Note: Having two threads about exactly the same thing does NOT help. >:(

Gertjan

@doktornotor:

Hell no…

reload
Reload the server. This flushes the cache and reads the config file fresh.

… and that is what I would like to see.

https://www.unbound.net/documentation/unbound.conf.html shows how to reload: "kill -HUP `cat /usr/local/etc/unbound/unbound.pid`".
For pfSense, that will be:
kill -HUP `cat /var/run/unbound.pid`

When doing so, the log shows :

04-23-2015 12:43:41 Daemon.Info 192.168.1.1 Apr 23 12:43:45 unbound: [45284:0] info: start of service (unbound 1.5.3).
04-23-2015 12:43:41 Daemon.Notice 192.168.1.1 Apr 23 12:43:45 unbound: [45284:0] notice: init module 0: iterator
04-23-2015 12:43:41 Daemon.Notice 192.168.1.1 Apr 23 12:43:45 unbound: [45284:0] notice: Restart of unbound 1.5.3.

Maybe unbound isn't logging exactly what it does.
HUPping ….. but it plain restarts.
HUPping, it reloads, but it says it restarts.
"unbound" uses a new definition of "reload" or "HUP" ?
Whatever ...

Anyway, again, not a bad thing, not a pfSense issue, it's more a small "unbound" issue.
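
One way to tell "reload inside the running process" apart from "a new process was started" is to compare the PID before and after the HUP. A minimal sketch, assuming the pid file is rewritten whenever a new daemon process starts; hup_keeps_pid is a made-up helper name, not part of unbound or pfSense:

```shell
#!/bin/sh
# Sketch only: send SIGHUP to the daemon named in a pid file and report
# whether the same process is still running afterwards.
# hup_keeps_pid is a hypothetical helper; the pid file path is an argument.
hup_keeps_pid() {
    pidfile="$1"
    before=$(cat "$pidfile") || return 2   # PID before the signal
    kill -HUP "$before" || return 2        # same signal as the kill -HUP above
    sleep 1                                # give the daemon a moment to react
    after=$(cat "$pidfile") || return 2    # PID after the signal
    # Success only if the PID is unchanged and that process is still alive
    [ "$before" = "$after" ] && kill -0 "$after" 2>/dev/null
}

# Example, using the pfSense pid file path quoted in this thread:
# hup_keeps_pid /var/run/unbound.pid && echo "same process" || echo "different process"
```

This only answers the process-level question. It says nothing about whether the cache survived, which is the part that matters for resolution speed.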

doktornotor

I did some digging into the unbound source code and it seems like it restarts the worker process on SIGHUP/unbound-control reload. Looks like some half-assed server restart if you ask me. Then again, I'm definitely not familiar with the code, and in general do not have time to dig into it in detail. Since the upstream seemed pretty communicative, maybe get in touch with them instead. (Also, the cache flushing on reload seems like highly unwanted behaviour to me…)

kejianshi

In my case, restart/reload and cache flush is selected in the advanced settings under certain conditions when DNS is getting flooded with unwanted replies to prevent DNS cache poisoning. I'm not sure why I'm not seeing all this other bad behavior.

Gertjan

@doktornotor:

I did some digging into the unbound source code and it seems like it restarts the worker process on SIGHUP/unbound-control reload. Looks like some half-assed server restart if you ask me. Then again, I'm definitely not familiar with the code, and in general do not have time to dig into it in detail. Since the upstream seemed pretty communicative, maybe get in touch with them instead. (Also, the cache flushing on reload seems like highly unwanted behaviour to me…)

I did the same thing this afternoon.
Your conclusion is mine ….

@kejianshi:

In my case, restart/reload and cache flush is selected in the advanced settings under certain conditions when DNS is getting flooded with unwanted replies to prevent DNS cache poisoning. I'm not sure why I'm not seeing all this other bad behavior.

That's the real issue in 'the issue': who cares about (frequent) reloading/starting/whatever, but if the cache is flushed … then what is unbound good for? DNSForwarder is doing the same thing already.

@kejianshi: From what I saw, there is a strict relationship between DHCP activity (the system hosts file being rewritten when a host is added or removed) and unbound reloading. Pretty logical, as a DNS server should know about local hosts: unbound parses the file upon start/reload (from what I understood).

dugeem

Running pfSense 2.2.2 on an Alix 2D3 with dual WANs - one with IPv4/IPv6 and other with only IPv4.

Regardless of whether I use Unbound or Dnsmasq I'm currently seeing DNS service restarts every 30 minutes which I think is due to DHCP6 (likely dhcp6c but this needs to be confirmed).

As per the comments of many - this renders DNS caching largely ineffective. I think doktornotor put his finger on it earlier in this thread noting that too many services are needlessly restarted when something happens - even if the something results in no IPv4/IPv6 address changes. Hopefully improvements can be made in the next few releases to mitigate this.

Secondly, in the case of unbound DHCP registration, it is possible to insert/remove A records using the unbound-control local_data (& local_data_remove) commands. This will obviously need some code refactoring, the main change being that, when unbound is enabled, DHCP registrations would result in unbound-control local_data commands with no service restart. However, the unbound.conf file would still need updating so that if unbound is restarted for other reasons it will load all the hosts defined in local-data etc.

Comments? Do we have sufficient understanding of these issues to create some problem tickets? Or has someone already done this?
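
The local_data idea described above could look roughly like this. It is a dry-run sketch that only prints the unbound-control commands instead of executing them; lease_cmd, the example host name and the example address are made up for illustration, and the config path is the one from the truss example earlier in the thread:

```shell
#!/bin/sh
# Sketch only: print the unbound-control command that would register or
# remove one DHCP host without restarting the resolver.
# lease_cmd is a hypothetical helper, not something pfSense ships.
lease_cmd() {
    action="$1"; host="$2"; ip="$3"
    conf="/var/unbound/unbound.conf"   # path as used earlier in this thread
    case "$action" in
        add) # insert an A record into the running server
             printf 'unbound-control -c %s local_data "%s. IN A %s"\n' "$conf" "$host" "$ip" ;;
        del) # drop all local data for that name
             printf 'unbound-control -c %s local_data_remove "%s."\n' "$conf" "$host" ;;
        *)   echo "unknown action: $action" >&2; return 1 ;;
    esac
}

# Example with made-up values:
# lease_cmd add laptop.example.lan 192.168.1.50
# lease_cmd del laptop.example.lan
```

Printing rather than running keeps the sketch harmless on a live box; pipe the output to sh once it looks right. As noted above, unbound.conf would still have to be updated separately so the records survive a real restart.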

Ben. 0

Unbound restarts here roughly every 3 hours. At this time I completely lose the connection to pfSense.

Do you have the same symptoms, or is it just the logs being written? I'm curious whether the restart really affects me or whether something else is going on that interrupts the connection between WiFi and pfSense.
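
To compare restart times against connection drops, it may help to count the "start of service" lines per hour. A rough sketch, assuming syslog-style lines that begin with "Mon DD HH:MM:SS" like the excerpt earlier in the thread; count_starts_per_hour is a made-up name, and where the resolver log lives on a given install is not stated here, so the function just reads from stdin:

```shell
#!/bin/sh
# Sketch only: read syslog-style lines on stdin and print how many unbound
# "start of service" messages were logged in each hour.
# count_starts_per_hour is a hypothetical helper name.
count_starts_per_hour() {
    awk '/unbound: .*start of service/ {
             split($3, t, ":")                 # $3 is HH:MM:SS
             key = $1 " " $2 " " t[1] ":00"    # e.g. "Apr 23 12:00"
             n[key]++
         }
         END { for (k in n) print n[k], k }' | sort -k2
}

# Example: feed it whatever command prints your resolver log
# some-command-that-prints-the-log | count_starts_per_hour
```

If the hours with many restarts line up with the dropouts, the restart is a likely suspect; if not, the WiFi side deserves a closer look.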

ky41083

Anyone still having this issue, try this:
https://forum.pfsense.org/index.php?topic=89589.msg558373#msg558373

Would like to get some feedback beyond the handful of devices I manage.

55c40e301e

@ky41083:

Anyone still having this issue, try this:
https://forum.pfsense.org/index.php?topic=89589.msg558373#msg558373

Would like to get some feedback beyond the handful of devices I manage.

For me this stopped the repeated "unbound: service stopped", "unbound: start of service" messages 2-3 times per minute. Thanks - this was a longstanding issue.

It fit because this installation was previously dnsmasq, switched to unbound some time ago.

Specifically, the relevant part of the config export looked like this before:

	 <dnsmasq><regdhcpstatic><custom_options><domain_needed><no_private_reverse><interface></interface></no_private_reverse></domain_needed></custom_options></regdhcpstatic></dnsmasq>

and like this after:

	 <dnsmasq><custom_options><domain_needed><no_private_reverse><interface></interface></no_private_reverse></domain_needed></custom_options></dnsmasq>

It also took a reboot.

A more subtle issue for me is that machines seem to lose DNS resolution (maybe all connectivity?) for about 5 seconds every time their DHCP lease expires and is renewed. For now I've just lengthened DHCP leases significantly - they were short for testing. Separate issue I guess.