Frequent unbound restarts



  • Upgraded to 2.3.4 last night. The upgrade did not go well (https://forum.pfsense.org/index.php?topic=130297.0), so it's possible there are underlying problems.

    I have the log set to hold 2000 entries. Since this morning, the log is only holding 5 hours of messages, including almost 40 restarts of unbound. During this time, there was virtually no load on the system. A sample of the messages is below. There is no obvious error.

    In Services / DNS Resolver / General Settings, DNSSEC, DHCP registration and Static DHCP are enabled. Aside from that, unbound is configured with defaults.

    In System / General Setup / DNS Server Settings, no DNSes are configured and DNS Server Override is unchecked.

    Is there a reason for so many restarts?

    
    May 10 08:55:36	unbound	13085:0	info: start of service (unbound 1.6.1).
    May 10 08:55:36	unbound	13085:0	notice: init module 1: iterator
    May 10 08:55:36	unbound	13085:0	notice: init module 0: validator
    May 10 08:55:36	unbound	13085:0	notice: Restart of unbound 1.6.1.
    May 10 08:55:36	unbound	13085:0	info: 16.000000 32.000000 1
    May 10 08:55:36	unbound	13085:0	info: 0.524288 1.000000 2
    May 10 08:55:36	unbound	13085:0	info: 0.262144 0.524288 38
    May 10 08:55:36	unbound	13085:0	info: 0.131072 0.262144 46
    May 10 08:55:36	unbound	13085:0	info: 0.065536 0.131072 54
    May 10 08:55:36	unbound	13085:0	info: 0.032768 0.065536 36
    May 10 08:55:36	unbound	13085:0	info: 0.016384 0.032768 27
    May 10 08:55:36	unbound	13085:0	info: 0.008192 0.016384 42
    May 10 08:55:36	unbound	13085:0	info: 0.004096 0.008192 1
    May 10 08:55:36	unbound	13085:0	info: 0.000000 0.000001 2
    May 10 08:55:36	unbound	13085:0	info: lower(secs) upper(secs) recursions
    May 10 08:55:36	unbound	13085:0	info: [25%]=0.0268516 median[50%]=0.0855609 [75%]=0.201594
    May 10 08:55:36	unbound	13085:0	info: histogram of recursion processing times
    May 10 08:55:36	unbound	13085:0	info: average recursion processing time 0.246546 sec
    May 10 08:55:36	unbound	13085:0	info: server stats for thread 3: requestlist max 20 avg 3.25703 exceeded 0 jostled 0
    May 10 08:55:36	unbound	13085:0	info: server stats for thread 3: 332 queries, 83 answers from cache, 249 recursions, 0 prefetch, 0 rejected by ip ratelimiting
    May 10 08:55:36	unbound	13085:0	info: 1.000000 2.000000 1
    May 10 08:55:36	unbound	13085:0	info: 0.524288 1.000000 2
    May 10 08:55:36	unbound	13085:0	info: 0.262144 0.524288 8
    May 10 08:55:36	unbound	13085:0	info: 0.131072 0.262144 13
    May 10 08:55:36	unbound	13085:0	info: 0.065536 0.131072 17
    May 10 08:55:36	unbound	13085:0	info: 0.032768 0.065536 13
    May 10 08:55:36	unbound	13085:0	info: 0.016384 0.032768 5
    May 10 08:55:36	unbound	13085:0	info: 0.008192 0.016384 10
    May 10 08:55:36	unbound	13085:0	info: lower(secs) upper(secs) recursions
    May 10 08:55:36	unbound	13085:0	info: [25%]=0.0384394 median[50%]=0.0905939 [75%]=0.199129
    May 10 08:55:36	unbound	13085:0	info: histogram of recursion processing times
    May 10 08:55:36	unbound	13085:0	info: average recursion processing time 0.150241 sec
    May 10 08:55:36	unbound	13085:0	info: server stats for thread 2: requestlist max 11 avg 0.521739 exceeded 0 jostled 0
    May 10 08:55:36	unbound	13085:0	info: server stats for thread 2: 82 queries, 13 answers from cache, 69 recursions, 0 prefetch, 0 rejected by ip ratelimiting
    May 10 08:55:36	unbound	13085:0	info: 4.000000 8.000000 1
    May 10 08:55:36	unbound	13085:0	info: 0.262144 0.524288 7
    May 10 08:55:36	unbound	13085:0	info: 0.131072 0.262144 9
    May 10 08:55:36	unbound	13085:0	info: 0.065536 0.131072 14
    May 10 08:55:36	unbound	13085:0	info: 0.032768 0.065536 7
    May 10 08:55:36	unbound	13085:0	info: 0.016384 0.032768 9
    May 10 08:55:36	unbound	13085:0	info: 0.008192 0.016384 8
    May 10 08:55:36	unbound	13085:0	info: 0.000256 0.000512 1
    May 10 08:55:36	unbound	13085:0	info: 0.000000 0.000001 3
    May 10 08:55:36	unbound	13085:0	info: lower(secs) upper(secs) recursions
    May 10 08:55:36	unbound	13085:0	info: [25%]=0.0213902 median[50%]=0.0725577 [75%]=0.16384
    May 10 08:55:36	unbound	13085:0	info: histogram of recursion processing times
    May 10 08:55:36	unbound	13085:0	info: average recursion processing time 0.172102 sec
    May 10 08:55:36	unbound	13085:0	info: server stats for thread 1: requestlist max 4 avg 0.389831 exceeded 0 jostled 0
    May 10 08:55:36	unbound	13085:0	info: server stats for thread 1: 86 queries, 27 answers from cache, 59 recursions, 0 prefetch, 0 rejected by ip ratelimiting
    May 10 08:55:36	unbound	13085:0	info: 1.000000 2.000000 1
    May 10 08:55:36	unbound	13085:0	info: 0.524288 1.000000 1
    May 10 08:55:36	unbound	13085:0	info: 0.262144 0.524288 19
    May 10 08:55:36	unbound	13085:0	info: 0.131072 0.262144 33
    May 10 08:55:36	unbound	13085:0	info: 0.065536 0.131072 37
    May 10 08:55:36	unbound	13085:0	info: 0.032768 0.065536 29
    May 10 08:55:36	unbound	13085:0	info: 0.016384 0.032768 19
    May 10 08:55:36	unbound	13085:0	info: 0.008192 0.016384 26
    May 10 08:55:36	unbound	13085:0	info: 0.004096 0.008192 2
    May 10 08:55:36	unbound	13085:0	info: 0.000000 0.000001 5
    May 10 08:55:36	unbound	13085:0	info: lower(secs) upper(secs) recursions
    May 10 08:55:36	unbound	13085:0	info: [25%]=0.0250072 median[50%]=0.0743922 [75%]=0.174763
    May 10 08:55:36	unbound	13085:0	info: histogram of recursion processing times
    May 10 08:55:36	unbound	13085:0	info: average recursion processing time 0.120666 sec
    May 10 08:55:36	unbound	13085:0	info: server stats for thread 0: requestlist max 17 avg 2.08721 exceeded 0 jostled 0
    May 10 08:55:36	unbound	13085:0	info: server stats for thread 0: 219 queries, 47 answers from cache, 172 recursions, 0 prefetch, 0 rejected by ip ratelimiting
    May 10 08:55:36	unbound	13085:0	info: service stopped (unbound 1.6.1).
    
    


  • Something must be triggering a false WAN interface configuration changed condition, that's what usually causes the Unbound restart but it's only run when the WAN DHCP lease gets renewed.


  • Rebel Alliance Global Moderator

    " DHCP registration "

    I do believe this is what causes the restarts every time a dhcp lease is updated, given out.. Atleast this use to be an issue - not sure if that was suppose to be fixed or not.  I don't have that set.  Only statics..



  • Thank you both for the suggestions. I'll look at the logs for concurrent events.



  • I compared the dns and dhcp logs and there is a correlation.

    For every instance of

    May 10 10:39:24	unbound	85179:0	info: service stopped (unbound 1.6.1).
    

    there is an instance of

    May 10 10:39:24	dhcpleases		Sending HUP signal to dns daemon(85179)
    

    Occasionally, it takes more than one attempt:

    May 10 10:33:06	dhcpleases		Sending HUP signal to dns daemon(85179)
    May 10 10:33:06	dhcpleases		kqueue error: unkown
    May 10 10:33:05	dhcpleases		Could not deliver signal HUP to process because its pidfile (/var/run/unbound.pid) does not exist, No such process.
    May 10 10:33:05	dhcpleases		Sending HUP signal to dns daemon(13085)
    

    Sometimes the HUP messages are preceeded by DHCPACK and sometimes by dhcpd restarting. It appears to be "normal", although messy would be a better word to describe it.



  • On second thought, maybe inefficient would be a better way to describe it. This is a home system. There are under 20 dhcp and dhcpv6 leases and under 30 reservations. In a large network with numerous devices, unbound would be spending more time restarting than resolving dns queries. Not very scalable.


  • Rebel Alliance Global Moderator

    There has been lots of discussion on this.. I have not paid much attention to the threads because I don't have it registering dhcp other than my devices I have set for static via dhcp.  Other devices that would just get something out of the dhcp pool.. This would normally just be guest devices since all my other devices I setup a reservation for.

    I have zero need to being able to resolve by name a friends phone for example that might be on my guest wifi network once in a blue moon.

    But I do recall there being multiple threads and much discussion on this - I do believe pfblocker came up in the conversations as well as some sort of contributor to the restarts.  Since it reloads its lists into unbound when they update.

    For now I would disable dhcp leases registration and see if that lowers the number of restarts you see.



  • Thanks for your reply. I did search and noted one thread in particular that was started a long time ago about this. It sounds like the same issue.

    I have reservations for all of my servers, desktops and laptops. Virtually all of the dynamic leases are mobile phones or tablets. I agree that it doesn't serve much purpose to register these leases. I occasionally have 30 tablets on the network in addition to the usual mobile phones. I'll take a look at the log when that happens next time. Irrespective of the utility of registering leases, the implementation (requiring unbound to restart) seems to be inherently inefficient.



  • I haven't looked at the implementation that closely but Unbound has a 'reload' subcommand that should be enough to reload all of the configuration file without requiring a full restart of the daemon. However, Unbound is run under chroot in pfSense that might be the reason why a restart is required to fully reload all of the settings. This is guesswork on my part based on what I know about Unbound and chroot'ed daemons in general.



  • Disable DHCP Registration and configure Host Overrides in DNS Resolver.  ;)

    As for unbound, BBCan177 is presently trying to use live reload to make change to unbound DB instead of having it reload conf file at every change. But he ran into problems with DB getting out of sync with the conf file.

    So at some point in the future, maybe DHCP could use the same technique.



  • @RonpfS:

    Disable DHCP Registration and configure Host Overrides in DNS Resolver.  ;)

    As for unbound, BBCan177 is presently trying to use live reload to make change to unbound DB instead of having it reload conf file at every change. But he ran into problems with DB getting out of sync with the conf file.

    So at some point in the future, maybe DHCP could use the same technique.

    Can you be more specific about host overrides? Maybe I'm missing something, but I didn't see any such setting.



  • Under Services / DNS Resolver / General Settings ?
    https://doc.pfsense.org/index.php/Unbound_DNS_Resolver

    Host Overrides allows creation of custom DNS responses/records to create new entries that do not exist in DNS outside the firewall, or to override DNS responses for other hosts.

    This is were I input the DHCP reservations so the hostname are resolved by unbound.

    Maybe you can just keep Static DHCP enabled to get the same result.



  • I'm glad that someone else posted on this topic, albeit back in 2017.
    I enabled TLD yesterday, and pfblocker is set to update at 0030.
    Sometime around 0200 this morning my pfsense box started sending out multiple emails from the service watchdog that the dhcpd service and unbound had stopped and it was restarting those services. However I did not receive my daily email from the pfsense box, normally that comes in around 5-530am.
    I'm unable to ssh into the pfsense box, and it was unresponsive this morning.
    Unfortunately I start my work week today, so I can't even delve into it.
    Luckily I do have a current backup, so I'm thinking I will follow the above, disable DHCP Registration and configure Host Overrides in DNS Resolver, and see if that solves the issues.

    2.4.2
    8gb ram

    ![Screenshot (20).png](/public/imported_attachments/1/Screenshot (20).png)
    ![Screenshot (20).png_thumb](/public/imported_attachments/1/Screenshot (20).png_thumb)