2.4.4_1: Unbound Not Starting Upon Reboot



  • I noticed this in 2.4.4 and still see it in 2.4.4_1. If I reboot the pfSense box, upon re-logging in I find Unbound to be stopped. I have to manually restart the service. Once I manually start it, it seems to work just fine.
    Nothing leaps out at me from the DNS Resolver log (and I'm not sure what the exact time of my last reboot was), but here's a chunk of it:

    Dec 3 08:46:25	unbound	91958:3	info: generate keytag query _ta-4a5c-4f66. NULL IN
    Dec 3 08:46:24	unbound	91958:0	info: start of service (unbound 1.8.1).
    Dec 3 08:46:24	unbound	91958:0	notice: init module 1: iterator
    Dec 3 08:46:24	unbound	91958:0	notice: init module 0: validator
    Dec 3 08:27:42	unbound	39849:3	info: generate keytag query _ta-4a5c-4f66. NULL IN
    Dec 3 08:27:11	unbound	39849:0	info: start of service (unbound 1.8.1).
    Dec 3 08:27:11	unbound	39849:0	notice: init module 1: iterator
    Dec 3 08:27:11	unbound	39849:0	notice: init module 0: validator
    Dec 3 08:27:08	unbound	26959:0	info: 0.131072 0.262144 1
    Dec 3 08:27:08	unbound	26959:0	info: 0.065536 0.131072 1
    Dec 3 08:27:08	unbound	26959:0	info: lower(secs) upper(secs) recursions
    Dec 3 08:27:08	unbound	26959:0	info: [25%]=0 median[50%]=0 [75%]=0
    Dec 3 08:27:08	unbound	26959:0	info: histogram of recursion processing times
    Dec 3 08:27:08	unbound	26959:0	info: average recursion processing time 0.190406 sec
    Dec 3 08:27:08	unbound	26959:0	info: server stats for thread 3: requestlist max 0 avg 0 exceeded 0 jostled 0
    Dec 3 08:27:08	unbound	26959:0	info: server stats for thread 3: 2 queries, 0 answers from cache, 2 recursions, 0 prefetch, 0 rejected by ip ratelimiting
    Dec 3 08:27:08	unbound	26959:0	info: server stats for thread 2: requestlist max 0 avg 0 exceeded 0 jostled 0
    Dec 3 08:27:08	unbound	26959:0	info: server stats for thread 2: 0 queries, 0 answers from cache, 0 recursions, 0 prefetch, 0 rejected by ip ratelimiting
    Dec 3 08:27:08	unbound	26959:0	info: server stats for thread 1: requestlist max 0 avg 0 exceeded 0 jostled 0
    Dec 3 08:27:08	unbound	26959:0	info: server stats for thread 1: 0 queries, 0 answers from cache, 0 recursions, 0 prefetch, 0 rejected by ip ratelimiting
    Dec 3 08:27:08	unbound	26959:0	info: server stats for thread 0: requestlist max 0 avg 0 exceeded 0 jostled 0
    Dec 3 08:27:08	unbound	26959:0	info: server stats for thread 0: 0 queries, 0 answers from cache, 0 recursions, 0 prefetch, 0 rejected by ip ratelimiting
    Dec 3 08:27:08	unbound	26959:0	info: service stopped (unbound 1.8.1).
    Dec 3 08:27:07	unbound	26959:3	info: generate keytag query _ta-4a5c-4f66. NULL IN
    Dec 3 08:26:34	unbound	26959:0	info: start of service (unbound 1.8.1).
    

    EDIT: I found the time (08:37:53 - 08:37:56) and more info in the General log:

    Dec 3 08:37:56	php-cgi		rc.bootup: sync unbound done.
    Dec 3 08:37:56	php-cgi		rc.bootup: The command '/usr/local/sbin/unbound -c /var/unbound/unbound.conf' returned exit code '1', the output was '[1543855076] unbound[3652:0] error: can't bind socket: Can't assign requested address for fe80::208:a2ff:fe0b:9184 port 53 [1543855076] unbound[3652:0] fatal error: could not open ports'
    Dec 3 08:37:55	check_reload_status		Linkup starting igb0
    Dec 3 08:37:55	kernel		igb0: link state changed to UP
    Dec 3 08:37:55	php-cgi		rc.bootup: dhcp6 init complete. Continuing
    Dec 3 08:37:54	php-cgi		rc.bootup: Unbound start waiting on dhcp6c.
    Dec 3 08:37:53	php-cgi		rc.bootup: Unbound start waiting on dhcp6c.
    


  • Having the exact same problem. Running on an SG-3100.



  • @beremonavabi said in 2.4.4_1: Unbound Not Starting Upon Reboot:

    could not open ports

    I used these words "could not open ports" and had a grep into all my log files (starting at 2017-01-01) and found the same line.

    2017-11-16 16:19:46	Daemon.Error	192.168.1.1	Nov 16 16:19:45 php-fpm[9926]: /rc.newwanip: The command '/usr/local/sbin/unbound -c /var/unbound/unbound.conf' returned exit code '1', the output was '[1510845585] unbound[60104:0] error: bind: address already in use [1510845585] unbound[60104:0] fatal error: could not open ports'
    

    The context was : my WAN IP (interface) went down - I had a Link DOWN/UP and came back right away.
    At that moment, many scripts kick in, and as far as I know unbound restart several times.
    What probably happened is : unbound is instructed to stop - and restart on the fly. The old instance couldn't release ports it was listening on (maybe the entire interface vanished ? - isn't ready yet / not up yet) and the restart - new instance - errors out : an old instance is still on that interface & port.

    More info can be seen if you use the entire log from boot up until the point where unbound gets start - restarted - many times probably.

    Note : that I found this errors ones (2017-11-16) in 13 month of 24/24H pfSense usage.



  • I wonder if this bug report is related:

    https://redmine.pfsense.org/issues/7096

    As in that report, I've got specific interfaces selected in Services > DNS Resolver > General Settings > Network Interfaces instead of the default ALL. I haven't checked the addresses, yet, though.



  • @beremonavabi said in 2.4.4_1: Unbound Not Starting Upon Reboot:

    I wonder if this bug report is related:

    https://redmine.pfsense.org/issues/7096

    As in that report, I've got specific interfaces selected in Services > DNS Resolver > General Settings > Network Interfaces instead of the default ALL. I haven't checked the addresses, yet, though.

    Yep. It sure looks like it's that bug to me. If I change the Network Interfaces to the default ALL, pfSense reboots fine and Unbound comes up running. If I go back and change those interfaces back to be just my local interfaces (LAN, LAN IPv6 Link-Local, and Localhost) and reboot, when pfSense comes back, Unbound is not running. Also, it takes pfSense an additional 30 seconds to finish signing on and load the Dashboard (according to Diagnostics > Monitoring, it's not a CPU load issue). Checking Status > System Logs > System > General shows this:

    Dec 7 10:18:26	php-cgi		rc.bootup: sync unbound done.
    Dec 7 10:18:26	php-cgi		rc.bootup: The command '/usr/local/sbin/unbound -c /var/unbound/unbound.conf' returned exit code '1', the output was '[1544206706] unbound[6088:0] error: can't bind socket: Can't assign requested address for fe80::208:a2ff:fe0b:9184 port 53 [1544206706] unbound[6088:0] fatal error: could not open ports'
    Dec 7 10:18:26	php-cgi		rc.bootup: dhcp6 init complete. Continuing
    Dec 7 10:18:25	php-cgi		rc.bootup: Unbound start waiting on dhcp6c.
    Dec 7 10:18:24	php-fpm	341	/rc.newwanipv6: Removing static route for monitor fe80::242:5aff:fe9e:b019 and adding a new route through fe80::242:5aff:fe9e:b019%igb1
    Dec 7 10:18:24	php-fpm	341	/rc.newwanipv6: rc.newwanipv6: on (IP address: 2600:8801:7f01:1600:57c:cf0c:99ff:9d51) (interface: wan) (real interface: igb1).
    Dec 7 10:18:24	php-fpm	341	/rc.newwanipv6: rc.newwanipv6: Info: starting on igb1.
    Dec 7 10:18:24	php-cgi		rc.bootup: Unbound start waiting on dhcp6c.
    

    As that bug report notes, the interface it can't assign the requested address for is one off from the WAN interface's actual address and the WAN isn't even included in the interfaces it's supposed to be listening to.

    I suppose just accepting the default ALL on DNS Resolver's Network Interfaces is a possibility. But, first off, I don't want it listening on the WAN port for DNS queries, and secondly, I've got Enable SSL/TLS Service turned on. The text associated with that says:

    "Respond to incoming SSL/TLS queries from local clients
    Configures the DNS Resolver to act as a DNS over SSL/TLS server which can answer queries from clients which also support DNS over TLS. Activating this option disables automatic interface response routing behavior, thus it works best with specific interface bindings."



  • I've been lurking on this thread...the delay on login is going to be because it's trying to resolve DNS queries and timing out. There is at least one other recent thread about that from another perspective. For instance by default the home page checks for pfSense updates.

    If it was set to ALL, then just don't allow port 53 on the WAN? (I recognize the suggestion to use specific interface bindings is a problem, but using ALL might help others that aren't using DNS over SSL/TLS)

    Edit: I meant to note that we have routers I recently upgraded 2.4.4 with only certain interfaces selected, however it is on WAN, LAN, and localhost, with outgoing network interfaces set to All. I don't think we are seeing this issue...?