2.4.4_1: unbound frequently stops answering domain overrides



  • I found this issue after upgrading to 2.4.4-p1 (from 2.4.4) on my APU2 board.
    pfSense is running on 192.168.16.1 with unbound enabled on ALL interfaces. Several host overrides are set up there, together with a couple of domain overrides. We have a separate DHCP and DNS server running on 192.168.16.203. The separate DHCP server hands out the firewall's IP as the nameserver, and the DHCP server in pfSense is disabled.

    The DNS server on 192.168.16.203 is secondary for a couple of internal zones (other sites connected using OpenVPN). Unbound forwards several zones to 192.168.16.203 for resolving.
    We have had this setup for a long time, but with 2.4.4-p1 we have started to get "not found" replies from the firewall, meaning clients cannot connect to servers at other sites. The DNS server on 192.168.16.203 resolves the address correctly, but unbound does not.
    From a client in the network (192.168.17.0/24 is a remote network with a DNS zone hosted by 192.168.16.203):

    msa@sieglinde:~$ host fs01.internaldomain.local 192.168.16.1
    Using domain server:
    Name: 192.168.16.1
    Address: 192.168.16.1#53
    Aliases:
    
    Host fs01.internaldomain.local not found: 3(NXDOMAIN)
    msa@sieglinde:~$ host fs01.internaldomain.local 192.168.16.203
    Using domain server:
    Name: 192.168.16.203
    Address: 192.168.16.203#53
    Aliases:
    
    fs01.internaldomain.local has address 192.168.17.162
    

    Restarting unbound on the firewall and all is well again:

    msa@sieglinde:~$ host fs01.internaldomain.local 192.168.16.1
    Using domain server:
    Name: 192.168.16.1
    Address: 192.168.16.1#53
    Aliases:
    
    fs01.internaldomain.local has address 192.168.17.162
    

    The problem is that this only works for a while (sometimes hours, sometimes less) before unbound has to be restarted again. I have checked the log files but don't see anything except the unbound stop and start messages from when I manually restart the service.

    Dec 9 08:56:19	unbound	75625:0	info: generate keytag query _ta-4a5c-4f66. NULL IN
    Dec 9 08:56:16	unbound	75625:0	info: start of service (unbound 1.8.1).
    Dec 9 08:56:16	unbound	75625:0	notice: init module 1: iterator
    Dec 9 08:56:16	unbound	75625:0	notice: init module 0: validator
    Dec 9 08:56:13	unbound	15426:0	info: server stats for thread 1: requestlist max 0 avg 0 exceeded 0 jostled 0
    Dec 9 08:56:13	unbound	15426:0	info: server stats for thread 1: 0 queries, 0 answers from cache, 0 recursions, 0 prefetch, 0 rejected by ip ratelimiting
    Dec 9 08:56:13	unbound	15426:0	info: 32.000000 64.000000 18
    Dec 9 08:56:13	unbound	15426:0	info: 16.000000 32.000000 46
    Dec 9 08:56:13	unbound	15426:0	info: 8.000000 16.000000 58
    Dec 9 08:56:13	unbound	15426:0	info: 4.000000 8.000000 114
    Dec 9 08:56:13	unbound	15426:0	info: 2.000000 4.000000 268
    Dec 9 08:56:13	unbound	15426:0	info: 1.000000 2.000000 379
    Dec 9 08:56:13	unbound	15426:0	info: 0.524288 1.000000 825
    Dec 9 08:56:13	unbound	15426:0	info: 0.262144 0.524288 3115
    Dec 9 08:56:13	unbound	15426:0	info: 0.131072 0.262144 5210
    Dec 9 08:56:13	unbound	15426:0	info: 0.065536 0.131072 4818
    Dec 9 08:56:13	unbound	15426:0	info: 0.032768 0.065536 3189
    Dec 9 08:56:13	unbound	15426:0	info: 0.016384 0.032768 582
    Dec 9 08:56:13	unbound	15426:0	info: 0.008192 0.016384 22
    Dec 9 08:56:13	unbound	15426:0	info: 0.004096 0.008192 27
    Dec 9 08:56:13	unbound	15426:0	info: 0.002048 0.004096 179
    Dec 9 08:56:13	unbound	15426:0	info: 0.001024 0.002048 214
    Dec 9 08:56:13	unbound	15426:0	info: 0.000512 0.001024 238
    Dec 9 08:56:13	unbound	15426:0	info: 0.000256 0.000512 10
    Dec 9 08:56:13	unbound	15426:0	info: 0.000000 0.000001 379
    Dec 9 08:56:13	unbound	15426:0	info: lower(secs) upper(secs) recursions
    Dec 9 08:56:13	unbound	15426:0	info: [25%]=0.0666616 median[50%]=0.135789 [75%]=0.259635
    Dec 9 08:56:13	unbound	15426:0	info: histogram of recursion processing times
    Dec 9 08:56:13	unbound	15426:0	info: average recursion processing time 0.384283 sec
    Dec 9 08:56:13	unbound	15426:0	info: server stats for thread 0: requestlist max 48 avg 1.42532 exceeded 0 jostled 0
    Dec 9 08:56:13	unbound	15426:0	info: server stats for thread 0: 137145 queries, 117454 answers from cache, 19691 recursions, 0 prefetch, 0 rejected by ip ratelimiting
    Dec 9 08:56:13	unbound	15426:0	info: service stopped (unbound 1.8.1).
    

    External lookups and host overrides continue to work even when the domain overrides are not working.

    Nothing too fancy is going on in unbound, and the configuration has been carried over since at least pfSense 2.3.x.

    Any ideas?
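
The symptom in the post above (the firewall's resolver answers NXDOMAIN while the zone's own DNS server returns an address) can be detected by comparing the two answers. What follows is only a minimal Python sketch under stated assumptions, not something from the thread: the function names `parse_host_output`, `query`, `override_is_broken`, and `check` are hypothetical, the `host` utility is assumed to be installed on the machine running the check, and the parser only looks for the IPv4 "has address" lines shown in the output above.

```python
import re
import subprocess

# Matches lines such as "fs01.internaldomain.local has address 192.168.17.162".
ADDRESS_RE = re.compile(r"has address (\d{1,3}(?:\.\d{1,3}){3})")


def parse_host_output(text):
    """Return the set of IPv4 addresses found in `host` output.

    An NXDOMAIN reply contains no "has address" line, so it yields an empty set.
    """
    return set(ADDRESS_RE.findall(text))


def query(name, server, timeout=10):
    """Run `host NAME SERVER` and return the parsed addresses (hypothetical helper)."""
    try:
        result = subprocess.run(
            ["host", name, server],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # Treat a hung lookup the same as "no answer".
        return set()
    return parse_host_output(result.stdout)


def override_is_broken(resolver_answer, authoritative_answer):
    """True when the zone's own server knows the name but the resolver disagrees."""
    return bool(authoritative_answer) and resolver_answer != authoritative_answer


def check(name, resolver, authoritative):
    """Compare both servers' answers for one name and report the result."""
    broken = override_is_broken(query(name, resolver), query(name, authoritative))
    print(f"{name}: {'override NOT answering' if broken else 'ok'}")
    return broken
```

Calling `check("fs01.internaldomain.local", "192.168.16.1", "192.168.16.203")` from a scheduled job would show how often the override stops answering. What to do when it reports a failure (send an alert, restart the resolver) is deliberately left open.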



  • Unbound was updated to 1.8.1 and has a bug where it processes requests on a single thread only. Enter this in the custom options under Services | DNS Resolver:

    server:
    so-reuseport: no

    See this thread for the details:
    https://forum.netgate.com/topic/138274/unbound-1-8-1-only-single-thread-processing-dns-requests/5



  • @lnguyen
    Thanks. Added that and keeping fingers crossed :-)



  • @matsan I had to restart the DNS Resolver service again, so that workaround may not be related to this bug.



  • @matsan I disabled DNSSEC and that seems to be a workaround but compromises DNS security.



  • @lnguyen said in 2.4.4_1: unbound frequently stops answering domain overrides:

    @matsan I disabled DNSSEC and that seems to be a workaround but compromises DNS security.

    Will try that as well. Restarted once this morning already...



  • @matsan Did disabling DNSSEC work for you?



  • @lnguyen
    So far so good.



  • @lnguyen I am not the person who started this thread, but I have a problem that seems to be the same. My Domain Override in the DNS Resolver has always stopped working from time to time. With 2.4.3 and older it did not happen very often. With 2.4.4-RELEASE-p2 it happens relatively often. If I simply save and apply the settings on the DNS Resolver page (this may work for other pages too, I'm not sure), it works again for a while. I don't need to change anything. There is nothing in the logs that I can see.

    I disabled DNSSEC and that seems to have kept things working. I will also try the method referenced where unbound threading config is changed.



  • @john41 I had no issues with domain overrides (across an IPsec VPN) until 2.4.4-p1. It is still an issue with 2.4.4-p2. @jimp Should I open a bug report for this on Redmine?



  • I did notice that only forward-zone domain overrides failed with DNSSEC enabled. Reverse-zone domain overrides work perfectly fine whether DNSSEC is disabled or enabled.
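
To reproduce the forward-versus-reverse observation above, both kinds of override can be queried for the same host. The sketch below only builds the two `host` command lines and does not run them. It is an illustration under assumptions: the function name `override_test_queries` is hypothetical, and the example name and addresses are the ones from the first post.

```python
import ipaddress


def override_test_queries(hostname, address, resolver):
    """Build `host` commands that exercise a forward and a reverse override.

    The reverse name is derived from the address, for example
    192.168.17.162 becomes 162.17.168.192.in-addr.arpa.
    """
    reverse_name = ipaddress.ip_address(address).reverse_pointer
    return {
        # A-record lookup, answered through the forward-zone domain override.
        "forward": ["host", "-t", "A", hostname, resolver],
        # PTR lookup, answered through the reverse-zone domain override.
        "reverse": ["host", "-t", "PTR", reverse_name, resolver],
    }


queries = override_test_queries(
    "fs01.internaldomain.local", "192.168.17.162", "192.168.16.1"
)
for kind, command in queries.items():
    print(kind, " ".join(command))
```

Running both commands while the problem is occurring, once with DNSSEC enabled and once with it disabled, would show whether only the forward lookup fails on a given installation.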