2.4.4_1: Unbound Not Starting Upon Reboot
-
I noticed this in 2.4.4 and still see it in 2.4.4_1. If I reboot the pfSense box, upon re-logging in I find Unbound to be stopped. I have to manually restart the service. Once I manually start it, it seems to work just fine.
Nothing leaps out at me from the DNS Resolver log (and I'm not sure what the exact time of my last reboot was), but here's a chunk of it:Dec 3 08:46:25 unbound 91958:3 info: generate keytag query _ta-4a5c-4f66. NULL IN Dec 3 08:46:24 unbound 91958:0 info: start of service (unbound 1.8.1). Dec 3 08:46:24 unbound 91958:0 notice: init module 1: iterator Dec 3 08:46:24 unbound 91958:0 notice: init module 0: validator Dec 3 08:27:42 unbound 39849:3 info: generate keytag query _ta-4a5c-4f66. NULL IN Dec 3 08:27:11 unbound 39849:0 info: start of service (unbound 1.8.1). Dec 3 08:27:11 unbound 39849:0 notice: init module 1: iterator Dec 3 08:27:11 unbound 39849:0 notice: init module 0: validator Dec 3 08:27:08 unbound 26959:0 info: 0.131072 0.262144 1 Dec 3 08:27:08 unbound 26959:0 info: 0.065536 0.131072 1 Dec 3 08:27:08 unbound 26959:0 info: lower(secs) upper(secs) recursions Dec 3 08:27:08 unbound 26959:0 info: [25%]=0 median[50%]=0 [75%]=0 Dec 3 08:27:08 unbound 26959:0 info: histogram of recursion processing times Dec 3 08:27:08 unbound 26959:0 info: average recursion processing time 0.190406 sec Dec 3 08:27:08 unbound 26959:0 info: server stats for thread 3: requestlist max 0 avg 0 exceeded 0 jostled 0 Dec 3 08:27:08 unbound 26959:0 info: server stats for thread 3: 2 queries, 0 answers from cache, 2 recursions, 0 prefetch, 0 rejected by ip ratelimiting Dec 3 08:27:08 unbound 26959:0 info: server stats for thread 2: requestlist max 0 avg 0 exceeded 0 jostled 0 Dec 3 08:27:08 unbound 26959:0 info: server stats for thread 2: 0 queries, 0 answers from cache, 0 recursions, 0 prefetch, 0 rejected by ip ratelimiting Dec 3 08:27:08 unbound 26959:0 info: server stats for thread 1: requestlist max 0 avg 0 exceeded 0 jostled 0 Dec 3 08:27:08 unbound 26959:0 info: server stats for thread 1: 0 queries, 0 answers from cache, 0 recursions, 0 prefetch, 0 rejected by ip ratelimiting Dec 3 08:27:08 unbound 26959:0 info: server stats for thread 0: requestlist max 0 avg 0 exceeded 0 jostled 0 Dec 3 08:27:08 unbound 26959:0 info: server stats for thread 0: 0 queries, 0 answers from cache, 0 recursions, 0 prefetch, 0 rejected by ip ratelimiting Dec 3 08:27:08 unbound 26959:0 info: service stopped (unbound 1.8.1). Dec 3 08:27:07 unbound 26959:3 info: generate keytag query _ta-4a5c-4f66. NULL IN Dec 3 08:26:34 unbound 26959:0 info: start of service (unbound 1.8.1).
EDIT: I found the time (08:37:53 - 08:37:56) and more info in the General log:
Dec 3 08:37:56 php-cgi rc.bootup: sync unbound done. Dec 3 08:37:56 php-cgi rc.bootup: The command '/usr/local/sbin/unbound -c /var/unbound/unbound.conf' returned exit code '1', the output was '[1543855076] unbound[3652:0] error: can't bind socket: Can't assign requested address for fe80::208:a2ff:fe0b:9184 port 53 [1543855076] unbound[3652:0] fatal error: could not open ports' Dec 3 08:37:55 check_reload_status Linkup starting igb0 Dec 3 08:37:55 kernel igb0: link state changed to UP Dec 3 08:37:55 php-cgi rc.bootup: dhcp6 init complete. Continuing Dec 3 08:37:54 php-cgi rc.bootup: Unbound start waiting on dhcp6c. Dec 3 08:37:53 php-cgi rc.bootup: Unbound start waiting on dhcp6c.
-
Having the exact same problem. Running on an SG-3100.
-
@beremonavabi said in 2.4.4_1: Unbound Not Starting Upon Reboot:
could not open ports
I used these words "could not open ports" and had a grep into all my log files (starting at 2017-01-01) and found the same line.
2017-11-16 16:19:46 Daemon.Error 192.168.1.1 Nov 16 16:19:45 php-fpm[9926]: /rc.newwanip: The command '/usr/local/sbin/unbound -c /var/unbound/unbound.conf' returned exit code '1', the output was '[1510845585] unbound[60104:0] error: bind: address already in use [1510845585] unbound[60104:0] fatal error: could not open ports'
The context was : my WAN IP (interface) went down - I had a Link DOWN/UP and came back right away.
At that moment, many scripts kick in, and as far as I know unbound restart several times.
What probably happened is : unbound is instructed to stop - and restart on the fly. The old instance couldn't release ports it was listening on (maybe the entire interface vanished ? - isn't ready yet / not up yet) and the restart - new instance - errors out : an old instance is still on that interface & port.More info can be seen if you use the entire log from boot up until the point where unbound gets start - restarted - many times probably.
Note : that I found this errors ones (2017-11-16) in 13 month of 24/24H pfSense usage.
-
I wonder if this bug report is related:
https://redmine.pfsense.org/issues/7096
As in that report, I've got specific interfaces selected in Services > DNS Resolver > General Settings > Network Interfaces instead of the default ALL. I haven't checked the addresses, yet, though.
-
@beremonavabi said in 2.4.4_1: Unbound Not Starting Upon Reboot:
I wonder if this bug report is related:
https://redmine.pfsense.org/issues/7096
As in that report, I've got specific interfaces selected in Services > DNS Resolver > General Settings > Network Interfaces instead of the default ALL. I haven't checked the addresses, yet, though.
Yep. It sure looks like it's that bug to me. If I change the Network Interfaces to the default ALL, pfSense reboots fine and Unbound comes up running. If I go back and change those interfaces back to be just my local interfaces (LAN, LAN IPv6 Link-Local, and Localhost) and reboot, when pfSense comes back, Unbound is not running. Also, it takes pfSense an additional 30 seconds to finish signing on and load the Dashboard (according to Diagnostics > Monitoring, it's not a CPU load issue). Checking Status > System Logs > System > General shows this:
Dec 7 10:18:26 php-cgi rc.bootup: sync unbound done. Dec 7 10:18:26 php-cgi rc.bootup: The command '/usr/local/sbin/unbound -c /var/unbound/unbound.conf' returned exit code '1', the output was '[1544206706] unbound[6088:0] error: can't bind socket: Can't assign requested address for fe80::208:a2ff:fe0b:9184 port 53 [1544206706] unbound[6088:0] fatal error: could not open ports' Dec 7 10:18:26 php-cgi rc.bootup: dhcp6 init complete. Continuing Dec 7 10:18:25 php-cgi rc.bootup: Unbound start waiting on dhcp6c. Dec 7 10:18:24 php-fpm 341 /rc.newwanipv6: Removing static route for monitor fe80::242:5aff:fe9e:b019 and adding a new route through fe80::242:5aff:fe9e:b019%igb1 Dec 7 10:18:24 php-fpm 341 /rc.newwanipv6: rc.newwanipv6: on (IP address: 2600:8801:7f01:1600:57c:cf0c:99ff:9d51) (interface: wan) (real interface: igb1). Dec 7 10:18:24 php-fpm 341 /rc.newwanipv6: rc.newwanipv6: Info: starting on igb1. Dec 7 10:18:24 php-cgi rc.bootup: Unbound start waiting on dhcp6c.
As that bug report notes, the interface it can't assign the requested address for is one off from the WAN interface's actual address and the WAN isn't even included in the interfaces it's supposed to be listening to.
I suppose just accepting the default ALL on DNS Resolver's Network Interfaces is a possibility. But, first off, I don't want it listening on the WAN port for DNS queries, and secondly, I've got Enable SSL/TLS Service turned on. The text associated with that says:
"Respond to incoming SSL/TLS queries from local clients
Configures the DNS Resolver to act as a DNS over SSL/TLS server which can answer queries from clients which also support DNS over TLS. Activating this option disables automatic interface response routing behavior, thus it works best with specific interface bindings." -
I've been lurking on this thread...the delay on login is going to be because it's trying to resolve DNS queries and timing out. There is at least one other recent thread about that from another perspective. For instance by default the home page checks for pfSense updates.
If it was set to ALL, then just don't allow port 53 on the WAN? (I recognize the suggestion to use specific interface bindings is a problem, but using ALL might help others that aren't using DNS over SSL/TLS)
Edit: I meant to note that we have routers I recently upgraded 2.4.4 with only certain interfaces selected, however it is on WAN, LAN, and localhost, with outgoing network interfaces set to All. I don't think we are seeing this issue...?
-
Similar issue over here, 2.4.5-RELEASE-p1 having LAN, VLAN and WAN1, WAN2 (LoadBalancing&Failover) and IPv4, IPv6 and pfb_dnsbl (stable) and snort (stable). Unbound was starting before 2.4.5 without any issues.
see also similar on pfsense.org:
Bug #9567
Bug #7455
Bug #7096