2.4.4_1: unbound frequently stops answering domain overrides
-
I found this issue after upgrading to 2.4.4-p1 (from 2.4.4) on my APU2 board.
pfSense runs on 192.168.16.1 with unbound enabled on ALL interfaces. Several host overrides are set up there, together with a couple of domain overrides. We have a separate DHCP and DNS server running on 192.168.16.203. The separate DHCP server hands out the firewall's IP as the nameserver, and the DHCP server in pfSense is disabled. The DNS on 192.168.16.203 is secondary for a couple of internal zones (other sites connected using OpenVPN). Unbound points several zones over to 192.168.16.203 for resolving.
We have had this setup for a long time, but with 2.4.4-p1 we have started to get "not found" replies from the firewall, meaning clients cannot connect to servers at other sites. The DNS on 192.168.16.203 resolves the address correctly, but unbound does not.
From a client in the network (192.168.17.0/24 is a remote network with a DNS zone hosted by 192.168.16.203):
msa@sieglinde:~$ host fs01.internaldomain.local 192.168.16.1
Using domain server:
Name: 192.168.16.1
Address: 192.168.16.1#53
Aliases:
Host fs01.internaldomain.local not found: 3(NXDOMAIN)
msa@sieglinde:~$ host fs01.internaldomain.local 192.168.16.203
Using domain server:
Name: 192.168.16.203
Address: 192.168.16.203#53
Aliases:
fs01.internaldomain.local has address 192.168.17.162
Restarting unbound on the firewall and all is well again:
msa@sieglinde:~$ host fs01.internaldomain.local 192.168.16.1
Using domain server:
Name: 192.168.16.1
Address: 192.168.16.1#53
Aliases:
fs01.internaldomain.local has address 192.168.17.162
The problem is that this only works for a while (sometimes hours, sometimes less) before unbound has to be restarted again. I have checked the log files but don't see anything except the unbound stop and start messages from when I manually restart the service.
Dec 9 08:56:19 unbound 75625:0 info: generate keytag query _ta-4a5c-4f66. NULL IN
Dec 9 08:56:16 unbound 75625:0 info: start of service (unbound 1.8.1).
Dec 9 08:56:16 unbound 75625:0 notice: init module 1: iterator
Dec 9 08:56:16 unbound 75625:0 notice: init module 0: validator
Dec 9 08:56:13 unbound 15426:0 info: server stats for thread 1: requestlist max 0 avg 0 exceeded 0 jostled 0
Dec 9 08:56:13 unbound 15426:0 info: server stats for thread 1: 0 queries, 0 answers from cache, 0 recursions, 0 prefetch, 0 rejected by ip ratelimiting
Dec 9 08:56:13 unbound 15426:0 info: 32.000000 64.000000 18
Dec 9 08:56:13 unbound 15426:0 info: 16.000000 32.000000 46
Dec 9 08:56:13 unbound 15426:0 info: 8.000000 16.000000 58
Dec 9 08:56:13 unbound 15426:0 info: 4.000000 8.000000 114
Dec 9 08:56:13 unbound 15426:0 info: 2.000000 4.000000 268
Dec 9 08:56:13 unbound 15426:0 info: 1.000000 2.000000 379
Dec 9 08:56:13 unbound 15426:0 info: 0.524288 1.000000 825
Dec 9 08:56:13 unbound 15426:0 info: 0.262144 0.524288 3115
Dec 9 08:56:13 unbound 15426:0 info: 0.131072 0.262144 5210
Dec 9 08:56:13 unbound 15426:0 info: 0.065536 0.131072 4818
Dec 9 08:56:13 unbound 15426:0 info: 0.032768 0.065536 3189
Dec 9 08:56:13 unbound 15426:0 info: 0.016384 0.032768 582
Dec 9 08:56:13 unbound 15426:0 info: 0.008192 0.016384 22
Dec 9 08:56:13 unbound 15426:0 info: 0.004096 0.008192 27
Dec 9 08:56:13 unbound 15426:0 info: 0.002048 0.004096 179
Dec 9 08:56:13 unbound 15426:0 info: 0.001024 0.002048 214
Dec 9 08:56:13 unbound 15426:0 info: 0.000512 0.001024 238
Dec 9 08:56:13 unbound 15426:0 info: 0.000256 0.000512 10
Dec 9 08:56:13 unbound 15426:0 info: 0.000000 0.000001 379
Dec 9 08:56:13 unbound 15426:0 info: lower(secs) upper(secs) recursions
Dec 9 08:56:13 unbound 15426:0 info: [25%]=0.0666616 median[50%]=0.135789 [75%]=0.259635
Dec 9 08:56:13 unbound 15426:0 info: histogram of recursion processing times
Dec 9 08:56:13 unbound 15426:0 info: average recursion processing time 0.384283 sec
Dec 9 08:56:13 unbound 15426:0 info: server stats for thread 0: requestlist max 48 avg 1.42532 exceeded 0 jostled 0
Dec 9 08:56:13 unbound 15426:0 info: server stats for thread 0: 137145 queries, 117454 answers from cache, 19691 recursions, 0 prefetch, 0 rejected by ip ratelimiting
Dec 9 08:56:13 unbound 15426:0 info: service stopped (unbound 1.8.1).
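Because the override fails after an unpredictable interval and leaves nothing in the logs, it may help to record when the firewall and the internal DNS start to disagree. The following is only a minimal sketch, not a tested monitoring tool: it assumes the `host` utility is installed on the client and that its output looks like the transcripts above. The function names (`parse_host_output`, `run_host`, `override_is_broken`) are made up for this example.

```python
import re
import subprocess

# Patterns based on the `host` output shown in this thread (an assumption
# about the format; other versions of `host` may print differently).
ADDRESS_RE = re.compile(r"(\S+) has address (\S+)")
NOT_FOUND_RE = re.compile(r"not found: \d+\((\w+)\)")


def parse_host_output(text):
    """Return ("ok", [addresses]) or ("error", rcode) for `host` output."""
    matches = ADDRESS_RE.findall(text)
    if matches:
        return "ok", sorted(address for _name, address in matches)
    failure = NOT_FOUND_RE.search(text)
    if failure:
        return "error", failure.group(1)
    return "error", "UNKNOWN"


def run_host(name, server):
    """Run `host NAME SERVER` and return everything it printed."""
    result = subprocess.run(
        ["host", name, server],
        capture_output=True,
        text=True,
        timeout=10,
        check=False,  # `host` exits non-zero on NXDOMAIN; that is expected here
    )
    return result.stdout + result.stderr


def override_is_broken(resolver_output, authoritative_output):
    """True when the internal DNS answers but the resolver does not agree."""
    authoritative = parse_host_output(authoritative_output)
    if authoritative[0] != "ok":
        return False  # the internal DNS itself has no answer; not an unbound issue
    return parse_host_output(resolver_output) != authoritative
```

Run from cron or a loop on a client, a call such as `override_is_broken(run_host("fs01.internaldomain.local", "192.168.16.1"), run_host("fs01.internaldomain.local", "192.168.16.203"))` could be logged with a timestamp to show how long unbound keeps answering after each restart.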
External lookups and host overrides continue to work even when the domain overrides are not working.
Nothing too fancy is going on in unbound, and the configuration has been carried over since at least 2.3.x of pfSense.
Any ideas?
-
Unbound was updated to 1.8.1 and has a bug where it's single-threaded. Enter this in custom options under Services | DNS Resolver:
server:
so-reuseport: no
See this thread for the details:
https://forum.netgate.com/topic/138274/unbound-1-8-1-only-single-thread-processing-dns-requests/5
-
@lnguyen
Thanks. Added that and keeping fingers crossed :-)
-
@matsan I had to restart the DNS Resolver service again so that workaround may not be related to this bug.
-
@matsan I disabled DNSSEC and that seems to be a workaround but compromises DNS security.
-
@lnguyen said in 2.4.4_1: unbound frequently stops answering domain overrides:
@matsan I disabled DNSSEC and that seems to be a workaround but compromises DNS security.
Will try that as well. Restarted once this morning already...
-
@matsan Did disabling DNSSEC work for you?
-
@lnguyen
So far so good.
-
@lnguyen I am not the person who started this thread, but I had a problem that seems to be the same. My Domain Override in DNS Resolver has always stopped working from time to time. With 2.4.3 and older it did not happen very often. With 2.4.4-RELEASE-p2 it happens relatively often. If I simply save/apply settings on the DNS Resolver page (may work for other pages...not sure), it works again for a while. I don't need to change anything. There is nothing in the logs that I can see.
I disabled DNSSEC and that seems to have kept things working. I will also try the method referenced where unbound threading config is changed.
-
I did notice that only forward zone domain overrides failed with DNSSEC enabled. Reverse zone domain overrides work perfectly fine whether DNSSEC is disabled or enabled.