Unbound and dnsmasq frequently crash



  • Hello. I'm running pfSense 2.4.4-RELEASE-p2 on a PC Engines APU device. I've been noticing that unbound frequently crashes and is restarted from a default configuration. The logs I can see through the web console don't offer much help either. All it shows are these messages over and over again (along with server statistics):

    info: start of service (unbound 1.8.1).
    info: service stopped (unbound 1.8.1).

    I noticed several threads describing a bug about Unbound running as a single threaded process. In those threads, there seemed to be two pieces of advice:

    1. Set server:so-reuseport: no
    2. Disable DNSSEC

    I tried both of these things, but they actually seem to have made the problem worse.

    Afterwards, I also tried using the DNS Forwarder (dnsmasq) instead of the DNS Resolver (unbound). As with unbound, I used a default configuration. This actually led to an improvement, but dnsmasq still ultimately crashed, again without much information in the logs.

    The failures cause a noticeable lag when browsing websites, since clients wait for a response from dnsmasq/unbound before trying an upstream DNS server. Any thoughts on how the issues can be resolved, or how more information about the failures can be found? Thank you.


  • LAYER 8 Global Moderator

    You sure they are not just stop and starting because of dhcp lease.

    Set unbound to not register dhcp leases, only static reservations.



  • Thanks for your reply. I checked and DHCP registration is not set. My settings are:




  • @kjiwa said in Unbound and dnsmasq frequently crash:

    As long as you see a

    info: service stopped (unbound 1.8.1).

    Followed by a

    info: start of service (unbound 1.8.1).

    You will now know that it isn't a crash but a restart.
    A crash is easy to recognize : the process is still in memory, but doing nothing. Also : a crash will not log out "service stopped".
    Unbound, most often, will not restart itself, it is instructed to do so.
    The DNS log can reveal many clues about the reason.
    An interface goes up and down.
    Another service goes up and down.
    DHCP could be a (a very known) reason.
    Other packages that work a lot with DNS can restart unbound.

    And yes, dns resolving stops during unbound restarts. It's just a side effect, one more reason to take some time and and investigate the issue.
    On a default system, this one : "Set unbound to not register dhcp leases, only static reservations" will bring unbound restarting to a halt (except maybe for some interfaces going up and down).

    Some packages are known for unbound/dnsmasq kicking.


  • LAYER 8 Global Moderator

    when you say over an over again on the restarts... How often, once a minute, day, hour what?



  • Thank you both for your responses. @Gertjan You are correct that the static mappings part is not part of the default configuration, but I turned it off earlier and then back on after johnpoz's feedback.

    unbound is restarting approximately every 10 seconds. In the most recent logs it restarted 3 times in the span of 24 seconds.


  • LAYER 8 Global Moderator

    Well yeah that would make it completely useless then sure..

    Do you have anything else installed? Any other packages?



  • No, it was a pretty fresh install. I had 2.4.4-RELEASE-p1 installed from a memstick installer, and then upgraded to p2.


  • LAYER 8 Global Moderator

    well clearly it shouldn't be doing that.. Mine restarts every few days maybe..

    What is in your system log?



  • I see these messages repeated:

    Jan 12 17:24:15 unbound 21782:0 info: server stats for thread 0: 8 queries, 0 answers from cache, 8 recursions, 0 prefetch, 0 rejected by ip ratelimiting
    Jan 12 17:24:15 unbound 21782:0 info: server stats for thread 0: requestlist max 2 avg 0.5 exceeded 0 jostled 0
    Jan 12 17:24:15 unbound 21782:0 info: average recursion processing time 0.133886 sec
    Jan 12 17:24:15 unbound 21782:0 info: histogram of recursion processing times
    Jan 12 17:24:15 unbound 21782:0 info: [25%]=1e-06 median[50%]=0.065536 [75%]=0.262144
    Jan 12 17:24:15 unbound 21782:0 info: lower(secs) upper(secs) recursions
    Jan 12 17:24:15 unbound 21782:0 info: 0.000000 0.000001 2
    Jan 12 17:24:15 unbound 21782:0 info: 0.032768 0.065536 2
    Jan 12 17:24:15 unbound 21782:0 info: 0.065536 0.131072 1
    Jan 12 17:24:15 unbound 21782:0 info: 0.131072 0.262144 1
    Jan 12 17:24:15 unbound 21782:0 info: 0.262144 0.524288 2
    Jan 12 17:24:15 unbound 21782:0 info: server stats for thread 1: 0 queries, 0 answers from cache, 0 recursions, 0 prefetch, 0 rejected by ip ratelimiting
    Jan 12 17:24:15 unbound 21782:0 info: server stats for thread 1: requestlist max 0 avg 0 exceeded 0 jostled 0
    Jan 12 17:24:19 unbound 80355:0 notice: init module 0: validator
    Jan 12 17:24:19 unbound 80355:0 notice: init module 1: iterator
    Jan 12 17:24:19 unbound 80355:0 info: start of service (unbound 1.8.1).
    Jan 12 17:24:20 unbound 80355:0 info: generate keytag query _ta-4f66. NULL IN
    Jan 12 17:24:22 unbound 80355:0 info: service stopped (unbound 1.8.1).



  • Ok : these are the startup bla-bla of unbound.

    How many

    info: start of service (unbound 1.8.1)
    

    a day ?



  • I inspected /var/log/resolver.log and found that it contains approximately 4800 lines for around 30 minutes.

    # grep "start of service" /var/log/resolver.log | wc -l
    479
    


  • Thanks for pointing out that unbound will restart whenever an interface changes status. Separately I found another thread where a user describes a similar issue (https://forum.netgate.com/topic/139513/dns-resolver-fails-to-work-when-pfsense-has-an-ipv6-address/).

    I inspected my DHCP log and found that the dhcp6 client was repeatedly releasing and renewing its lease, which then led me to find an erroneous setting in my modem. Unbound now appears to operate stably.

    Thank you both @johnpoz and @Gertjan. I doubt I'd have discovered the issue without your help and advice.