Frequent unbound restarts

bimmerdriver

Upgraded to 2.3.4 last night. The upgrade did not go well (https://forum.pfsense.org/index.php?topic=130297.0), so it's possible there are underlying problems.

I have the log set to hold 2000 entries. Since this morning, the log is only holding 5 hours of messages, including almost 40 restarts of unbound. During this time, there was virtually no load on the system. A sample of the messages is below. There is no obvious error.

In Services / DNS Resolver / General Settings, DNSSEC, DHCP registration and Static DHCP are enabled. Aside from that, unbound is configured with defaults.

In System / General Setup / DNS Server Settings, no DNSes are configured and DNS Server Override is unchecked.

Is there a reason for so many restarts?


May 10 08:55:36	unbound	13085:0	info: start of service (unbound 1.6.1).
May 10 08:55:36	unbound	13085:0	notice: init module 1: iterator
May 10 08:55:36	unbound	13085:0	notice: init module 0: validator
May 10 08:55:36	unbound	13085:0	notice: Restart of unbound 1.6.1.
May 10 08:55:36	unbound	13085:0	info: 16.000000 32.000000 1
May 10 08:55:36	unbound	13085:0	info: 0.524288 1.000000 2
May 10 08:55:36	unbound	13085:0	info: 0.262144 0.524288 38
May 10 08:55:36	unbound	13085:0	info: 0.131072 0.262144 46
May 10 08:55:36	unbound	13085:0	info: 0.065536 0.131072 54
May 10 08:55:36	unbound	13085:0	info: 0.032768 0.065536 36
May 10 08:55:36	unbound	13085:0	info: 0.016384 0.032768 27
May 10 08:55:36	unbound	13085:0	info: 0.008192 0.016384 42
May 10 08:55:36	unbound	13085:0	info: 0.004096 0.008192 1
May 10 08:55:36	unbound	13085:0	info: 0.000000 0.000001 2
May 10 08:55:36	unbound	13085:0	info: lower(secs) upper(secs) recursions
May 10 08:55:36	unbound	13085:0	info: [25%]=0.0268516 median[50%]=0.0855609 [75%]=0.201594
May 10 08:55:36	unbound	13085:0	info: histogram of recursion processing times
May 10 08:55:36	unbound	13085:0	info: average recursion processing time 0.246546 sec
May 10 08:55:36	unbound	13085:0	info: server stats for thread 3: requestlist max 20 avg 3.25703 exceeded 0 jostled 0
May 10 08:55:36	unbound	13085:0	info: server stats for thread 3: 332 queries, 83 answers from cache, 249 recursions, 0 prefetch, 0 rejected by ip ratelimiting
May 10 08:55:36	unbound	13085:0	info: 1.000000 2.000000 1
May 10 08:55:36	unbound	13085:0	info: 0.524288 1.000000 2
May 10 08:55:36	unbound	13085:0	info: 0.262144 0.524288 8
May 10 08:55:36	unbound	13085:0	info: 0.131072 0.262144 13
May 10 08:55:36	unbound	13085:0	info: 0.065536 0.131072 17
May 10 08:55:36	unbound	13085:0	info: 0.032768 0.065536 13
May 10 08:55:36	unbound	13085:0	info: 0.016384 0.032768 5
May 10 08:55:36	unbound	13085:0	info: 0.008192 0.016384 10
May 10 08:55:36	unbound	13085:0	info: lower(secs) upper(secs) recursions
May 10 08:55:36	unbound	13085:0	info: [25%]=0.0384394 median[50%]=0.0905939 [75%]=0.199129
May 10 08:55:36	unbound	13085:0	info: histogram of recursion processing times
May 10 08:55:36	unbound	13085:0	info: average recursion processing time 0.150241 sec
May 10 08:55:36	unbound	13085:0	info: server stats for thread 2: requestlist max 11 avg 0.521739 exceeded 0 jostled 0
May 10 08:55:36	unbound	13085:0	info: server stats for thread 2: 82 queries, 13 answers from cache, 69 recursions, 0 prefetch, 0 rejected by ip ratelimiting
May 10 08:55:36	unbound	13085:0	info: 4.000000 8.000000 1
May 10 08:55:36	unbound	13085:0	info: 0.262144 0.524288 7
May 10 08:55:36	unbound	13085:0	info: 0.131072 0.262144 9
May 10 08:55:36	unbound	13085:0	info: 0.065536 0.131072 14
May 10 08:55:36	unbound	13085:0	info: 0.032768 0.065536 7
May 10 08:55:36	unbound	13085:0	info: 0.016384 0.032768 9
May 10 08:55:36	unbound	13085:0	info: 0.008192 0.016384 8
May 10 08:55:36	unbound	13085:0	info: 0.000256 0.000512 1
May 10 08:55:36	unbound	13085:0	info: 0.000000 0.000001 3
May 10 08:55:36	unbound	13085:0	info: lower(secs) upper(secs) recursions
May 10 08:55:36	unbound	13085:0	info: [25%]=0.0213902 median[50%]=0.0725577 [75%]=0.16384
May 10 08:55:36	unbound	13085:0	info: histogram of recursion processing times
May 10 08:55:36	unbound	13085:0	info: average recursion processing time 0.172102 sec
May 10 08:55:36	unbound	13085:0	info: server stats for thread 1: requestlist max 4 avg 0.389831 exceeded 0 jostled 0
May 10 08:55:36	unbound	13085:0	info: server stats for thread 1: 86 queries, 27 answers from cache, 59 recursions, 0 prefetch, 0 rejected by ip ratelimiting
May 10 08:55:36	unbound	13085:0	info: 1.000000 2.000000 1
May 10 08:55:36	unbound	13085:0	info: 0.524288 1.000000 1
May 10 08:55:36	unbound	13085:0	info: 0.262144 0.524288 19
May 10 08:55:36	unbound	13085:0	info: 0.131072 0.262144 33
May 10 08:55:36	unbound	13085:0	info: 0.065536 0.131072 37
May 10 08:55:36	unbound	13085:0	info: 0.032768 0.065536 29
May 10 08:55:36	unbound	13085:0	info: 0.016384 0.032768 19
May 10 08:55:36	unbound	13085:0	info: 0.008192 0.016384 26
May 10 08:55:36	unbound	13085:0	info: 0.004096 0.008192 2
May 10 08:55:36	unbound	13085:0	info: 0.000000 0.000001 5
May 10 08:55:36	unbound	13085:0	info: lower(secs) upper(secs) recursions
May 10 08:55:36	unbound	13085:0	info: [25%]=0.0250072 median[50%]=0.0743922 [75%]=0.174763
May 10 08:55:36	unbound	13085:0	info: histogram of recursion processing times
May 10 08:55:36	unbound	13085:0	info: average recursion processing time 0.120666 sec
May 10 08:55:36	unbound	13085:0	info: server stats for thread 0: requestlist max 17 avg 2.08721 exceeded 0 jostled 0
May 10 08:55:36	unbound	13085:0	info: server stats for thread 0: 219 queries, 47 answers from cache, 172 recursions, 0 prefetch, 0 rejected by ip ratelimiting
May 10 08:55:36	unbound	13085:0	info: service stopped (unbound 1.6.1).

kpa

Something must be triggering a false WAN interface configuration changed condition, that's what usually causes the Unbound restart but it's only run when the WAN DHCP lease gets renewed.

johnpoz

" DHCP registration "

I do believe this is what causes the restarts every time a dhcp lease is updated, given out.. Atleast this use to be an issue - not sure if that was suppose to be fixed or not. I don't have that set. Only statics..

bimmerdriver

Thank you both for the suggestions. I'll look at the logs for concurrent events.

bimmerdriver

I compared the dns and dhcp logs and there is a correlation.

For every instance of

May 10 10:39:24	unbound	85179:0	info: service stopped (unbound 1.6.1).

there is an instance of

May 10 10:39:24	dhcpleases		Sending HUP signal to dns daemon(85179)

Occasionally, it takes more than one attempt:

May 10 10:33:06	dhcpleases		Sending HUP signal to dns daemon(85179)
May 10 10:33:06	dhcpleases		kqueue error: unkown
May 10 10:33:05	dhcpleases		Could not deliver signal HUP to process because its pidfile (/var/run/unbound.pid) does not exist, No such process.
May 10 10:33:05	dhcpleases		Sending HUP signal to dns daemon(13085)

Sometimes the HUP messages are preceeded by DHCPACK and sometimes by dhcpd restarting. It appears to be "normal", although messy would be a better word to describe it.

bimmerdriver

On second thought, maybe inefficient would be a better way to describe it. This is a home system. There are under 20 dhcp and dhcpv6 leases and under 30 reservations. In a large network with numerous devices, unbound would be spending more time restarting than resolving dns queries. Not very scalable.

johnpoz

There has been lots of discussion on this.. I have not paid much attention to the threads because I don't have it registering dhcp other than my devices I have set for static via dhcp. Other devices that would just get something out of the dhcp pool.. This would normally just be guest devices since all my other devices I setup a reservation for.

I have zero need to being able to resolve by name a friends phone for example that might be on my guest wifi network once in a blue moon.

But I do recall there being multiple threads and much discussion on this - I do believe pfblocker came up in the conversations as well as some sort of contributor to the restarts. Since it reloads its lists into unbound when they update.

For now I would disable dhcp leases registration and see if that lowers the number of restarts you see.

bimmerdriver

Thanks for your reply. I did search and noted one thread in particular that was started a long time ago about this. It sounds like the same issue.

I have reservations for all of my servers, desktops and laptops. Virtually all of the dynamic leases are mobile phones or tablets. I agree that it doesn't serve much purpose to register these leases. I occasionally have 30 tablets on the network in addition to the usual mobile phones. I'll take a look at the log when that happens next time. Irrespective of the utility of registering leases, the implementation (requiring unbound to restart) seems to be inherently inefficient.

kpa

I haven't looked at the implementation that closely but Unbound has a 'reload' subcommand that should be enough to reload all of the configuration file without requiring a full restart of the daemon. However, Unbound is run under chroot in pfSense that might be the reason why a restart is required to fully reload all of the settings. This is guesswork on my part based on what I know about Unbound and chroot'ed daemons in general.

RonpfS

Disable DHCP Registration and configure Host Overrides in DNS Resolver. ;)

As for unbound, BBCan177 is presently trying to use live reload to make change to unbound DB instead of having it reload conf file at every change. But he ran into problems with DB getting out of sync with the conf file.

So at some point in the future, maybe DHCP could use the same technique.

bimmerdriver

@RonpfS:

Disable DHCP Registration and configure Host Overrides in DNS Resolver. ;)

As for unbound, BBCan177 is presently trying to use live reload to make change to unbound DB instead of having it reload conf file at every change. But he ran into problems with DB getting out of sync with the conf file.

So at some point in the future, maybe DHCP could use the same technique.

Can you be more specific about host overrides? Maybe I'm missing something, but I didn't see any such setting.

RonpfS

Under Services / DNS Resolver / General Settings ?
https://doc.pfsense.org/index.php/Unbound_DNS_Resolver

Host Overrides allows creation of custom DNS responses/records to create new entries that do not exist in DNS outside the firewall, or to override DNS responses for other hosts.

This is were I input the DHCP reservations so the hostname are resolved by unbound.

Maybe you can just keep Static DHCP enabled to get the same result.

mtarbox

I'm glad that someone else posted on this topic, albeit back in 2017.
I enabled TLD yesterday, and pfblocker is set to update at 0030.
Sometime around 0200 this morning my pfsense box started sending out multiple emails from the service watchdog that the dhcpd service and unbound had stopped and it was restarting those services. However I did not receive my daily email from the pfsense box, normally that comes in around 5-530am.
I'm unable to ssh into the pfsense box, and it was unresponsive this morning.
Unfortunately I start my work week today, so I can't even delve into it.
Luckily I do have a current backup, so I'm thinking I will follow the above, disable DHCP Registration and configure Host Overrides in DNS Resolver, and see if that solves the issues.

2.4.2
8gb ram

![Screenshot (20).png](/public/imported_attachments/1/Screenshot (20).png)
![Screenshot (20).png_thumb](/public/imported_attachments/1/Screenshot (20).png_thumb)

bbrendon

It seems like a good interim "hack" would be to patch the code to prevent unbound from restarting with dhcp udpates and create a cron job to do it on a more "controlled" basis.

Does anyone know which files in the php code needs changing? Is it under /usr/local/www ?

Gertjan

You will find what you're looking for here /etc/inc.
Check services.inc and system.inc.

It's far more easier to disable the "DHCP Registration" completly, and give all host that you want be able to resolve at any time a static fixed MAC DHCP lease entry.
When these host renew, they will receive the same IP every time, and will unbound not be told to restart.

Remember : the real issue is unbound itself : it reads config file when it start. If some host-IP information is known afterwards - unbound can only be made aware when it restarts. A cron solution would make your device "non resolvable" for a certain time, the time it takes before the cron job runs.

RonpfS

@gertjan said in Frequent unbound restarts:

unbound can only be made aware when it restarts

pfblockerNG-devel is changing unbound internal db (Resolver Live Sync) using unbound-control cmd. So maybe at some point DHCP server could be doing something similar.

sotirone

Is there really no other way to refresh other than restarting Unbound and reloading everything?

I mean this is not only a problem with the "Register DHCP leases in the DNS Resolver" but also with Remote OpenVPN clients that are using the DNS Resolver.

Every time a Remote Client connection is initiated or stopped, Unbound has to restart. If you are using pfBlocker-ng, that could mean upwards of 60 seconds of downtime.

I thought I read here on the forum that a fix was being developed but I guess I am mistaken.

Either Netgate or upstream needs to do something about this. Having to disable important features for the sake of uptime seems like quite a big problem for me.

Is there any way to escalate this?

RonpfS

DHCP could change the Unbound in-memory db as pfblockerNG does with Live Reload, instead of restarting Unbound.

jasonArloUser

Hi all

Is there any update on this? I'm trying to use PFBlockerNG-devel but this causes unbound restarts to take a noticeable amount of time and them happening every 10-30 minutes makes the whole package unusable.

I did a quick check on github and I see where the C code is sending a HUP to DNS. I could change this to call unbound-control like pfBlockerNG does (though I can't assume unbound is running the way PFBlocker can, so it will be slightly more complicated). Would such a pull request be accepted?

EDIT: Alternatively, we could change the unbound startup script to catch HUP and call unbound-control instead of restarting? That might be more robust.

Gertjan

@jasonArloUser said in Frequent unbound restarts:

I'm trying to use PFBlockerNG-devel but this causes unbound restarts

as per pfBlocker settings, it will reload restart unbound every :

@jasonArloUser said in Frequent unbound restarts:

to take a noticeable amount of time

Other options are : use less feeds. Or a bigger system.

@jasonArloUser said in Frequent unbound restarts:

on github and I see where the C code is sending a HUP to DNS

The DHCP daemon ?
Do what @RonpfS mentioned.
Give all important devices static mac mappings, and the DHCP server daemon won't restart unbound any-more.