Unbound service start problem



  • pfSense 2.1 and Unbound, x86, stable config (apart from updated to IPs in rules) for months.

    I'm not sure what the issue is. Unbound is these days regularly halting, and neither "Services status -> restart" nor the "services watchdog" package (which presumably also uses restart) can restart it. Unbound service will restart if disabled and re-enabled though.

    It's done this maybe 8 - 12 times this fortnight. The logs don't provide any information to know what's going on. I'd like not to have to completely uninstall/reinstall it, if it can be diagnosed some easier way.

    Next time it stops and won't auto-restart, is there a manual version of the auto-restart command with verbose logging (or other things to try) to get a better idea what the error is?



  • Anything in the logs which might be give an idea as to why?



  • No; nothing useful in the logs.

    In the end I uninstalled/reinstalled the package, maybe that fixed something as it hasn't recurred for the past few hours?

    But if it did recur, the same would apply - I'd therefore appreciate how to get more info anyway.



  • Hmm… But i see this:

    php: config.inc: The command '/usr/pbi/unbound-amd64/sbin/unbound-control start' returned exit code '1', the output was '/libexec/ld-elf.so.1: /usr/sbin/unbound: Undefined symbol "_ThreadRuneLocale"'
    

    PS: It's 2.1.1. In my case i think because downgrade from 2.2

    root(103): where unbound
    /usr/sbin/unbound
    /usr/local/sbin/unbound
    

    PPS: After cleanup all running ok.
    But I think it might conflict with built-in whatchdog.



  • @stilez:

    No; nothing useful in the logs.

    In the end I uninstalled/reinstalled the package, maybe that fixed something as it hasn't recurred for the past few hours?

    But if it did recur, the same would apply - I'd therefore appreciate how to get more info anyway.

    Has it happened again at all?
    Otherwise still trying to see if I can simulate the issue but to no avail :/



  • @Michael:

    Hmm… But i see this:

    php: config.inc: The command '/usr/pbi/unbound-amd64/sbin/unbound-control start' returned exit code '1', the output was '/libexec/ld-elf.so.1: /usr/sbin/unbound: Undefined symbol "_ThreadRuneLocale"'
    

    PS: It's 2.1.1. In my case i think because downgrade from 2.2

    root(103): where unbound
    /usr/sbin/unbound
    /usr/local/sbin/unbound
    

    PPS: After cleanup all running ok.

    Yeah - Unbound is going into 2.2 base so using you will have issues on there. On 2.2 you can check for 'DNS Resolver' under Services. Its near complete (barring the nice to have stuff) but does the job that package does. Upgrade code to convert existing unbound package installs into the base still needs to be done as well.



  • I have the same problem - but worse.  Back in January 2014 I installed unbound to pfSense v2.1.0.  It worked great until I rebooted the pfSense - then it would stop caching.  I messed around with it for a few days but could not get it to work without uninstalling/reinstalling after every reboot (stop/start was not enough).

    I decided to try it again with pfSense v2.1.2, but it has the same issue.  It works GREAT when I first install it.  It even worked for a few days… until I rebooted the pfSense this morning and now I cannot get it to restart.  I played around with the settings for several hours this morning but just managed to lock up the firewall (it hung on "starting unbound") and I could not get back in the gui.

    I just uninstalled it again.  :'(

    Rather than not do anything about it I decided to report it in the forums in the hope that people will not view this as whining rant but as a genuine desire to give back to the pfSense community to help improve the product.

    edit
    Our current setup:
    PFSense 2.1.2
    2 WAN's with failover
    4 LAN's
    installed packages:
    PFBlocker
    Squid
    Squidguard
    openVPN client
    NUT
    Cron
    NTPD
    dhcpd
    apinger



  • Hello Everybody,

    same here ^^

    some news on this issues how to solve it?

    greetz



  • I'm seeing something similar.  Unbound will just exit with nothing interesting in the logs.  Using the service menu to restart fails with no error message.  It really feels like something is amiss with the start/stop script.  The only way I've found to restart it is to save the unbound settings on its settings page.

    This is pfsense 2.1.3.  I don't recall what I was running prior to this, either 2.1.2 or 2.1.1, but the issue did not occur there.  The packages tab reports I have unbound version 1.4.22_2.

    I think it's connected witha filter reload.  I caught it stalling when an openvpn client disconnected, and other failures may very well coincide with one of my wan connections failing over.  From the system logs:

    ul 2 11:36:35	kernel: ovpns1: link state changed to DOWN
    Jul 2 11:36:35	check_reload_status: Reloading filter
    Jul 2 11:36:36	kernel: ovpns1: link state changed to UP
    Jul 2 11:36:36	check_reload_status: Reloading filter
    Jul 2 11:36:36	check_reload_status: rc.newwanip starting ovpns1
    Jul 2 11:36:37	check_reload_status: Syncing firewall
    Jul 2 11:36:41	kernel: Bump sched buckets to 256 (was 0)
    Jul 2 11:36:43	php: rc.newwanip: rc.newwanip: Informational is starting ovpns1.
    Jul 2 11:36:43	php: rc.newwanip: rc.newwanip: on (IP address: 10.3.3.1) (interface: []) (real interface: ovpns1).
    Jul 2 11:36:43	php: rc.newwanip: pfSense package system has detected an ip change -> 10.3.3.1 ... Restarting packages.
    Jul 2 11:36:43	check_reload_status: Starting packages
    Jul 2 11:36:49	kernel: Bump sched buckets to 256 (was 0)
    Jul 2 11:36:51	php: rc.start_packages: Restarting/Starting all packages.
    Jul 2 11:36:55	kernel: Bump sched buckets to 256 (was 0)
    Jul 2 11:39:31	check_reload_status: Syncing firewall
    Jul 2 11:40:13	check_reload_status: Reloading filter
    Jul 2 11:40:17	kernel: Bump sched buckets to 256 (was 0)
    

    And the unbound logs from the same period:

    Jul 2 11:36:51	unbound: [23619:0] info: service stopped (unbound 1.4.22).
    Jul 2 11:36:51	unbound: [23619:0] info: server stats for thread 0: 7482 queries, 3107 answers from cache, 4375 recursions, 0 prefetch
    Jul 2 11:36:51	unbound: [23619:0] info: server stats for thread 0: requestlist max 31 avg 2.4464 exceeded 0 jostled 0
    Jul 2 11:36:51	unbound: [23619:0] info: average recursion processing time 0.222316 sec
    Jul 2 11:36:51	unbound: [23619:0] info: histogram of recursion processing times
    Jul 2 11:36:51	unbound: [23619:0] info: [25%]=0.0216225 median[50%]=0.0547315 [75%]=0.182784
    Jul 2 11:36:51	unbound: [23619:0] info: lower(secs) upper(secs) recursions
    Jul 2 11:36:51	unbound: [23619:0] info: 0.000000 0.000001 256
    Jul 2 11:36:51	unbound: [23619:0] info: 0.000512 0.001024 1
    Jul 2 11:36:51	unbound: [23619:0] info: 0.004096 0.008192 2
    Jul 2 11:36:51	unbound: [23619:0] info: 0.008192 0.016384 539
    Jul 2 11:36:51	unbound: [23619:0] info: 0.016384 0.032768 925
    Jul 2 11:36:51	unbound: [23619:0] info: 0.032768 0.065536 693
    Jul 2 11:36:51	unbound: [23619:0] info: 0.065536 0.131072 638
    Jul 2 11:36:51	unbound: [23619:0] info: 0.131072 0.262144 576
    Jul 2 11:36:51	unbound: [23619:0] info: 0.262144 0.524288 251
    Jul 2 11:36:51	unbound: [23619:0] info: 0.524288 1.000000 289
    Jul 2 11:36:51	unbound: [23619:0] info: 1.000000 2.000000 144
    Jul 2 11:36:51	unbound: [23619:0] info: 2.000000 4.000000 47
    Jul 2 11:36:51	unbound: [23619:0] info: 4.000000 8.000000 11
    Jul 2 11:36:51	unbound: [23619:0] info: 8.000000 16.000000 3
    Jul 2 11:36:52	unbound: [46406:0] notice: init module 0: validator
    Jul 2 11:36:52	unbound: [46406:0] notice: init module 1: iterator
    Jul 2 11:36:52	unbound: [46406:0] info: start of service (unbound 1.4.22).
    Jul 2 11:36:53	unbound: [46406:0] info: service stopped (unbound 1.4.22).
    Jul 2 11:36:53	unbound: [46406:0] info: server stats for thread 0: 0 queries, 0 answers from cache, 0 recursions, 0 prefetch
    Jul 2 11:36:53	unbound: [46406:0] info: server stats for thread 0: requestlist max 0 avg 0 exceeded 0 jostled 0
    Jul 2 11:36:54	unbound: [51840:0] notice: init module 0: validator
    Jul 2 11:36:54	unbound: [51840:0] notice: init module 1: iterator
    Jul 2 11:36:54	unbound: [51840:0] info: start of service (unbound 1.4.22).
    Jul 2 11:36:56	unbound: [51840:0] info: service stopped (unbound 1.4.22).
    Jul 2 11:36:56	unbound: [51840:0] info: server stats for thread 0: 0 queries, 0 answers from cache, 0 recursions, 0 prefetch
    Jul 2 11:36:56	unbound: [51840:0] info: server stats for thread 0: requestlist max 0 avg 0 exceeded 0 jostled 0
    


  • I'll also add that I reinstalled the unbound package in hopes that if there was any old cruft around, that might get rid of it, but no such luck.



  • Hello again,

    unbound is still useless for me… is this so difficult to debug?

    regards



  • My unbound runs OK. (2.1.x, x64)
    Have you tried, without the cache restoration option turned on?

    With cache restoration turned on, my system reboot would take forever, because of unbound hanging/processing a maybe corrupt cache-file.