Frequent unbound restarts

jasonArloUser

@Gertjan unbound doesn't need to be reprogrammed. It already has the ability to reload configuration with the unbound-control command. The issue here is that the process that updates the leases doesn't use that, it sends the process a HUP. This is a bug as the lease program knows it can only be affecting the local zone so there is no possible reason to do anything other than reload that zone. Sending HUP and expecting unbound to somehow know that HUP means a reload of the local zone would be incorrect.

There are 2 simple ways to fix this issue: (a) fix the script to call unbound-control as it should, or (b) set up a HUP handler that calls unbound-control in the shell command that starts unbound, which is probably possible. The second solution is not really correct, because HUP should reload everything, but at least it would get things working without having to rewrite anything else.
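
To make option (a) a bit more concrete, here is a minimal sketch in Python of the unbound-control calls a lease updater could issue instead of sending HUP. It only builds the command lines and does not run them. The hostname, domain and TTL are made-up values, and `lease_commands` is a hypothetical helper, not anything taken from pfSense or dhcpleases. It also assumes remote control is enabled in unbound so that unbound-control can reach the running daemon.

```python
import ipaddress
import shlex


def lease_commands(hostname, domain, ip, ttl=3600, control="unbound-control"):
    """Build unbound-control argv lists that publish one lease without a HUP.

    Hypothetical helper: names and default TTL are illustrative only.
    """
    addr = ipaddress.ip_address(ip)
    fqdn = f"{hostname}.{domain}."
    rtype = "A" if addr.version == 4 else "AAAA"
    # reverse_pointer gives e.g. 50.1.168.192.in-addr.arpa for IPv4
    ptr = addr.reverse_pointer + "."
    return [
        # Drop whatever was registered for this name before
        [control, "local_data_remove", fqdn],
        # Forward record for the new lease
        [control, "local_data", fqdn, str(ttl), "IN", rtype, str(addr)],
        # Matching reverse record
        [control, "local_data", ptr, str(ttl), "IN", "PTR", fqdn],
    ]


if __name__ == "__main__":
    for argv in lease_commands("laptop", "home.arpa", "192.168.1.50"):
        print(shlex.join(argv))
```

Each list can be handed to subprocess.run() as-is. Since these calls only touch local data, the resolver cache should be left alone, which is the whole point.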

Gertjan

@jasonArloUser said in Frequent unbound restarts:

@Gertjan unbound doesn't need to be reprogrammed. It already has the ability to reload configuration with the unbound-control command. The issue here is that the process that updates the leases doesn't use that, it sends the process a HUP. This is a bug as the lease program knows it can only be affecting the local zone so there is no possible reason to do anything other than reload that zone. Sending HUP and expecting unbound to somehow know that HUP means a reload of the local zone would be incorrect.

I know. Still waiting for this to happen.
Look at the word reload here. Or from the authors.

What I read here is : it just restarts, or something close to stop and start.
Cache is lost - all config files are read in again ... The only benefit is that the process isn't destroyed, and recreated.

This means, for me, that when a relatively small /etc/hosts file has to be read in again, unbound also rereads other config files like the "/var/unbound/pfb_dnsbl.*conf" from pfBlocker, if it is installed and activated.

A Former User

Hi

Just wanted to add I came here after googling this event in the logs.

I have a feeling the restarts started or became worse when my ISP enabled IPv6. I did have the check box set for Register DHCP leases in the DNS Resolver, but with IPv4 I noticed no issues.

When I enabled IPv6 I found web pages didn't zip in quite as fast as they did when it was IPv4 only, but were okay after the first visit. I put this down to the web browser using a fallback mechanism and then caching the results of it for a while. A few weeks on I saw the restarts in the log every second or so and came here.

So I've disabled the DHCP registration option, the restarts have stopped, and web pages are loading without the extra latency. That makes sense, as the results are staying in the DNS resolver cache and there are no delays from hitting pfSense while the service is restarting.

It's like I've had a speed upgrade.

Regards

Phil

RichMawdsley

I installed pfSense about a month ago and have been trying to track down my random internet drops for weeks. Finally saw the HUP signal message in the logs this morning and realised it happened on every DHCP request, which led me to this page.

I'm a little in shock over how absolutely stone age this is. A really basic ability causes it to restart the whole thing. What the hell?!

Static DHCP as a workaround is great and all, but as others have said, it's nowhere near a solution. It's manually doing the job for DHCP & DNS.

Has anyone tested if this is fixed in any of the later v5 releases etc? Or is that not where the problem lies?

Gertjan

I never actually tested (looked for) this : if a new lease comes in and the option is checked, then unbound is HUPped.
A lease renewal : no, as only the duration is updated, which is not a DNS 'thing'.

jt

How's the BIND implementation currently doing? Would switching from unbound help here?

Gertjan

bind checks the config files it's using, and parses them again when it detects one was 'touched' by some other process.
What I do know is this : it won't ditch the cache when this happens.
It can also unbind from and bind to interfaces as they come and go.

But I can't tell if it would be better.
bind is huge. Setup has to be done manually, even if you use a GUI like the one pfSense has.
bind needs users to look at the manual (huge also) for sure, otherwise DNS becomes a mess.

I don't bother. Practically all my devices on my trusted LANs have static MAC leases, for IPv4 and IPv6, so 'my' unbound doesn't restart very often (less than once a day, probably even less frequently).

Orbixx

I have this problem after adding pfBlockerNG with a significantly large DNSBL list. Will try the following to reduce/eliminate impact:

Reduce pfBlockerNG lists to a more reasonable size
Add more static IP leases where reasonable
Increase lease time

lawrencedol

[Post deleted. My problem is not related to this.]

stephenw10

If you are getting disruption to VoIP calls, that's clearly not directly DNS related. It may have nothing to do with Unbound at all, and may just be a symptom of something else that also causes Unbound to take far longer to reload.
Have you been seeing this before 2.4.5 or just since upgrading? If it's only in 2.4.5 you are probably hitting this:
https://redmine.pfsense.org/issues/10414

Try opening top -aSH and also pinging the firewall, then go to Status > Filter Reload in the GUI and reload the filter.
If you see pings spike and processes such as pfctl, sshd, dpinger etc. shoot up to the top of the top table, then you almost certainly are hitting that.
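
If eyeballing the ping output is awkward, a rough sketch like the one below could flag the spikes for you. It is only an illustration: `find_spikes` is a hypothetical helper, the thresholds are arbitrary, and the round-trip times in the demo are made-up sample values, not measurements from any real firewall.

```python
from statistics import median


def find_spikes(rtts_ms, factor=5.0, floor_ms=20.0):
    """Return (index, rtt) for samples far above the median round-trip time.

    A sample counts as a spike when it exceeds both `factor` times the
    median and the absolute `floor_ms` value (both thresholds are arbitrary).
    """
    baseline = median(rtts_ms)
    threshold = max(baseline * factor, floor_ms)
    return [(i, rtt) for i, rtt in enumerate(rtts_ms) if rtt > threshold]


if __name__ == "__main__":
    # Made-up sample: LAN pings with one large spike during a filter reload
    samples = [0.4, 0.5, 0.4, 180.0, 0.5, 0.4]
    print(find_spikes(samples))
```

Feed it the times collected while the filter reload runs and compare against a quiet period.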

Steve

lawrencedol

@stephenw10

Appreciate the feedback, thanks. I guess I am still digging on my issue because my son just confirmed to me that my specific problem is not yet resolved.

stephenw10

I'm assuming you are running this at home and don't have a massive number of dhcp clients?

There are thousands of users in the same situation, including me, who are not hitting this. I think it's likely that Unbound reloading causing disruption is in fact a symptom of something else rather than a cause.

Steve

stephenw10

Check the cron table; installing the Cron package makes that easy. What is running at those intervals?

Steve

jasonArloUser

I have turned off DHCP DNS registration for guest users and so on, and I have static DHCP leases for all my known devices, but I still have the internet go offline once per day. The fact remains: this is a bug in pfSense, in that the lease script is calling HUP. It needs to be changed to reload only the local zone, as described above.

stephenw10

This is the open bug covering this issue: https://redmine.pfsense.org/issues/5413

There is an open pull request there. dhcpleases is a binary, though, so it would need to be compiled and swapped out to test.

Steve

Gertjan

@jasonArloUser said in Frequent unbound restarts:

I have turned off DHCP DNS registration for guest users and so on, and I have static DHCP leases for all my known devices, but I still have the internet go offline once per day. The fact remains: this is a bug in pfSense, in that the lease script is calling HUP. It needs to be changed to reload only the local zone, as described above.

If you remove the check from the 'Register DHCP leases in the DNS Resolver' option,

then the process 'dhcpleases' - example :

2930  -  Ss       0:00.01 /usr/local/sbin/dhcpleases -l /var/dhcpd/var/db/dhcpd.leases -d your-local-domain.tld -p /var/run/unbound.pid -u /var/unbound/dhcpleases_entries.conf -h /etc/hosts

will not run : unbound will not get restarted by a new DHCP lease.
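
To see what that process is being handed, here is a small sketch in Python that splits the command line from the example above into its flag/value pairs. `parse_flags` is just a hypothetical helper written for this post, and the comments on what each flag means are my reading of the values, not taken from any dhcpleases documentation.

```python
import shlex


def parse_flags(cmdline):
    """Split a command line into (program, {flag: value}) for '-x value' pairs."""
    argv = shlex.split(cmdline)
    program, rest = argv[0], argv[1:]
    flags = {}
    i = 0
    while i < len(rest) - 1:
        if rest[i].startswith("-"):
            flags[rest[i]] = rest[i + 1]
            i += 2
        else:
            i += 1
    return program, flags


# Command line copied from the ps output above
CMD = (
    "/usr/local/sbin/dhcpleases -l /var/dhcpd/var/db/dhcpd.leases "
    "-d your-local-domain.tld -p /var/run/unbound.pid "
    "-u /var/unbound/dhcpleases_entries.conf -h /etc/hosts"
)

if __name__ == "__main__":
    program, flags = parse_flags(CMD)
    print(program)
    for flag, value in flags.items():
        # Guessing from the values: -l leases file, -d domain,
        # -p unbound pid file, -u unbound include file, -h hosts file
        print(flag, value)
```

The -p value is the interesting one here: it points at the unbound pid file, which fits with unbound being signalled when a lease changes.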

If (your) unbound is restarting too often, some other process is responsible for this.

Btw : my 'unbound' process restarted 5 days ago - and I guess it was me doing so by changing a setting.

@stephenw10 said in Frequent unbound restarts:

This is the open bug covering this issue: https://redmine.pfsense.org/issues/5413
There is an open pull request there. dhcpleases is a binary, though, so it would need to be compiled and swapped out to test.

Unbelievable that this issue still exists after 4 years. One might ask : is it really an issue, as circumventing it is rather easy to do ?
Also, unbound is a rather small resolver that handles the job very well.
It's 'unbound' that has to be rewritten by its authors so it can dynamically reread its config files when one changes, as the alternative, bind (named), does.
But bind also has its disadvantages. It's huge. And it's hard to administer when you hide its options (hundreds of them) behind a GUI like pfSense's.

stephenw10

I have never hit it myself but clearly some people do. Switching to reload instead of restart does seem like the obvious option here. The fact it hasn't happened yet may imply I'm missing something though.

Steve

lucas_nz

@stephenw10 It's particularly noticeable if you are using pfBlockerNG, which adds large lists of sites to the unbound config (to provide DNS based blocking), so the reload can take some seconds (the restart wasn't noticeable before I implemented pfBlockerNG). This was a major issue for me until I unticked the DHCP registration option. But having DHCP registration disabled is a bit lame.

Luke