23.01 Upgrade unbound Issue
-
@steveits Also, to clarify, I do not think LinkedIn was the only issue at the time. I was on my phone and flipping between things. I vaguely recall some issues on web pages, and the LinkedIn app wouldn't load most things, which got my attention, especially since I'd just upgraded to 23.01 that evening. Their web site wouldn't load, so I started investigating. No further issues in the last few days after turning off DNSSEC. My testing was querying pfSense, I flushed DNS cache on my PC, etc.
-
@bingo600 ok did a bit of looking at the neg cache time..
So you can tell that neg cache is being used, if you looking up something you know will nx.. So for example I did a query for random.cnn.com
If I look in the unbound cache, I can see that its counting down the ttl from 3600
[23.01-RELEASE][admin@sg4860.local.lan]/var/unbound: unbound-control -c /var/unbound/unbound.conf dump_cache | grep lsjfdsld.cnn.com msg lsjfdsld.cnn.com. IN A 32899 1 3554 3 0 1 0 [23.01-RELEASE][admin@sg4860.local.lan]/var/unbound: unbound-control -c /var/unbound/unbound.conf dump_cache | grep lsjfdsld.cnn.com msg lsjfdsld.cnn.com. IN A 32899 1 3546 3 0 1 0 [23.01-RELEASE][admin@sg4860.local.lan]/var/unbound:
See where it goes down from 3554, to 3546.. You could try setting the min-neg cache setting to something lower and see if using that via similar test that I did.
-
@bingo600 Which list are you using for DoH blocking? I'm currently using a list from "oneoffdallas" that appears to be maintained. Wondering if there is a better one.
-
@moelassus There's a TheGreatWall_DoH_IP list in pfBlocker, though the list itself says it was last updated in 2020.
-
@steveits Yeah, TheGreatWall list doesn't appear to be maintained. For example, it doesn't include any of Apple's DoH servers. OneOffDallas's list was last updated in Dec 2022.
Trying to block DoH is really a pointless exercise because bad actors aren't going to use a well-known DoH server anyway. DoH is a scourge. ļø
-
@moelassus said in 23.01 Upgrade unbound Issue:
going to use a well-known DoH server anyway
While I agree with you - it would be quite possible for some bad software to use something on their own, and not a well known doh server. What it does do is stop stuff that is just trying to do you a "favor" and use doh without specifically asking you.. Those would normally point to a well known doh service..
I block it - because I don't want my stuff using it, but sure if it just using some not well known doh server, not much I could do about that other than trying all of its https traffic, which is pretty difficult.
-
@johnpoz said in 23.01 Upgrade unbound Issue:
You could try setting the min-neg cache setting to something lower and see if using that via similar test that I did.
Thanx .. I will try that
And it works ....
server: #log-queries: yes #log-replies: yes # Set max failed lookup cache time cache-max-negative-ttl: 10 #
[23.01-RELEASE][admin@fwall]/var/unbound: unbound-control -c /var/unbound/unbound.conf dump_cache | grep .cnn.com msg garbage.cnn.com. IN A 33155 1 6 3 0 1 0 msg cnn.com. IN DS 33152 1 6 0 0 3 0 [23.01-RELEASE][admin@fwall]/var/unbound: unbound-control -c /var/unbound/unbound.conf dump_cache | grep .cnn.com msg garbage.cnn.com. IN A 33155 1 4 3 0 1 0 msg cnn.com. IN DS 33152 1 4 0 0 3 0 [23.01-RELEASE][admin@fwall]/var/unbound: unbound-control -c /var/unbound/unbound.conf dump_cache | grep .cnn.com msg garbage.cnn.com. IN A 33155 1 3 3 0 1 0 msg cnn.com. IN DS 33152 1 3 0 0 3 0 [23.01-RELEASE][admin@fwall]/var/unbound: unbound-control -c /var/unbound/unbound.conf dump_cache | grep .cnn.com msg garbage.cnn.com. IN A 33155 1 2 3 0 1 0 msg cnn.com. IN DS 33152 1 2 0 0 3 0 [23.01-RELEASE][admin@fwall]/var/unbound: unbound-control -c /var/unbound/unbound.conf dump_cache | grep .cnn.com msg garbage.cnn.com. IN A 33155 1 1 3 0 1 0 msg cnn.com. IN DS 33152 1 1 0 0 3 0 [23.01-RELEASE][admin@fwall]/var/unbound: unbound-control -c /var/unbound/unbound.conf dump_cache | grep .cnn.com msg garbage.cnn.com. IN A 33155 1 0 3 0 1 0 msg cnn.com. IN DS 33152 1 0 0 0 3 0 [23.01-RELEASE][admin@fwall]/var/unbound: unbound-control -c /var/unbound/unbound.conf dump_cache | grep .cnn.com [23.01-RELEASE][admin@fwall]/var/unbound:
-
@moelassus said in 23.01 Upgrade unbound Issue:
@bingo600 Which list are you using for DoH blocking? I'm currently using a list from "oneoffdallas" that appears to be maintained. Wondering if there is a better one.
I was basically following this guid
https://github.com/jpgpi250/piholemanualURL List defined in pfS
And one of the lists got updates today
@moelassus - There's a pfSense specific guide in this doc
https://github.com/jpgpi250/piholemanual/tree/master/docI just skimmed the new guide ... He has made it very complicated.
I have attached the guide i followed, when i made it .... (a previous version)
Tip : USE FLOATING RULES ...
1-pfs-blockDOH-2021-simple.pdf.zip/Bingo
-
Definitely still seeing some Unbound issues since upgrading to 23.01... sometimes it takes a couple days to happen, and you can "fix it" by restarting unbound but it appears to randomly crash/restart on its own then log messages about DNSKEYs being insecure.
I also had the issue of IPv6 interfaces not being auto-added to the ACL; I had to manually override the ACL and put in all of my v6 subnets / internal IPs which I thought was the only issue (error would be Query Refused) but I just had an incident where unbound crashed and gave "Server Fails" in nslookup w/the ACL fix in place. I did nothing but wait a few moments and it was working again.
Every server restart you see in this log was not caused by me and this self-fixed after several moments of trying various DNS lookups (suddenly they all worked). Not really sure what to do about this except revert back to 22.05.
Under DNS Resolver > Advanced Settings I have "Prefetch DNS Key support" and "Harden DNSSEC Data" both UNchecked (been unchecked since I had the Query Failed/ACL issues two days after I upgraded)
Feb 22 17:32:33 unbound 73817 [73817:5] info: failed to prime trust anchor -- DNSKEY rrset is not secure . DNSKEY IN
Feb 22 17:32:33 unbound 73817 [73817:5] info: generate keytag query _ta-4f66. NULL IN
Feb 22 17:31:25 unbound 73817 [73817:2] info: failed to prime trust anchor -- DNSKEY rrset is not secure . DNSKEY IN
Feb 22 17:31:25 unbound 73817 [73817:2] info: generate keytag query _ta-4f66. NULL IN
Feb 22 17:30:24 unbound 73817 [73817:2] info: failed to prime trust anchor -- DNSKEY rrset is not secure . DNSKEY IN
Feb 22 17:30:24 unbound 73817 [73817:2] info: generate keytag query _ta-4f66. NULL IN
Feb 22 17:29:21 unbound 73817 [73817:1] info: failed to prime trust anchor -- DNSKEY rrset is not secure . DNSKEY IN
Feb 22 17:29:21 unbound 73817 [73817:8] info: failed to prime trust anchor -- DNSKEY rrset is not secure . DNSKEY IN
Feb 22 17:29:21 unbound 73817 [73817:1] info: failed to prime trust anchor -- DNSKEY rrset is not secure . DNSKEY IN
Feb 22 17:29:21 unbound 73817 [73817:8] info: generate keytag query _ta-4f66. NULL IN
Feb 22 17:29:21 unbound 73817 [73817:1] info: generate keytag query _ta-4f66. NULL IN
Feb 22 17:29:21 unbound 73817 [73817:0] info: start of service (unbound 1.17.1).
Feb 22 17:29:21 unbound 73817 [73817:0] notice: init module 1: iterator
Feb 22 17:29:21 unbound 73817 [73817:0] notice: init module 0: validator
Feb 22 17:29:21 unbound 73817 [73817:0] notice: Restart of unbound 1.17.1.
(...) -
@inferno480 I had to truncate the log to fit my post in, but can attach it... I just don't think there's much of value to review in it unless I increase debugging somewhere. i.e. it shows the symptom more than the cause.
-
@inferno480 do you have dnssec checked, and your forwarding? If so to where?
Have you checked the date time on your box?
-
@johnpoz Time/Date seem accurate, they are NTP sync'd to 2.pfsense.pool.ntp.org - nothing noteworthy in the NTP logs
"DNSSEC" is checked under General Settings, I am using DNS Forwarding and my servers are Google DNS with the two V6 servers listed first, then the V4. DNS Resolution Behavior is "Use local DNS (127.0.0.1), fall back to remote DNS Servers (Default)". None of this was modified from 22.05 and I never had a problem until 23.01.
Any suggestions on additional logging I can enable (and how), for the next time it happens? I realize the randomness can make things difficult to troubleshoot.
-
@inferno480 uncheck DNSSEC and I suspect your issues will disappear. Unbound seems more sensitive in this version, when using it and forwarding. As discussed in this and other threads and the pfSense troubleshooting doc, DNSSEC is irrelevant if youāre already trusting the other DNS servers to do the lookup for you.
-
If 'DNSSEC' is enabled, pfSense, during preparation of the unbound start, gets a copy of the good, known DNSSEC root key.
/usr/bin/su -m unbound -c '/usr/local/sbin/unbound-anchor -a /var/unbound/root.key'
If this fails, unbound will know it is using a not-good copy, and bail out.
So, you could check with :
/usr/bin/su -m unbound -4 -v -c '/usr/local/sbin/unbound-anchor -a /var/unbound/root.key'
to know if your IPv4 works.
The same for IPv6 :/usr/bin/su -m unbound -6 -v -c '/usr/local/sbin/unbound-anchor -a /var/unbound/root.key'
Remember : no return message : all is well.
You could check content and time stamp of the /var/unbound/root.key file to see for yourself.But : if you are forwarding, as said a million times time by now : disable DNSSEC.
Remember : forwarding means : you don't want certified DNS answers. You've decided to trust "some one else"
That's why forwarding is not the default mode. -
It seems there's also an IPv6 ACL bug, if set to listen on "all" interfaces, that now has a patch:
https://forum.netgate.com/topic/176989/problems-with-pfsense-ipv6-dns-function-does-it-exist/36 -
@bingo600 said in 23.01 Upgrade unbound Issue:
following this guid[e]
https://github.com/jpgpi250/piholemanualI see that includes OneOffDallas. (@moelassus)
It has an important typo though, the three local-zone lines have a leading space:
local-zone: " use-application-dns.net."
That doesn't work in my testing; needs to be:
local-zone: "use-application-dns.net."
-
@steveits
Nice catchDid you use the new giude , or the old one i posted as a"PDF/zip" here (bottom):
https://forum.netgate.com/post/1089443I just briefly skimmed the new guide, and it seemed very "complicated" ..
I have just implemented the "Old guide"./Bingo
-
@bingo600 I just looked through what was on GitHub and set it up myself. Iād already done something similar using pfBlockerās Greatwall list.
-
@stephenw10 said in 23.01 Upgrade unbound Issue:
Mmm, it's odd because if it's enabled and fails then whatever you're forwarding to doesn't support it. So I would expect it to be an all or nothing situatiuon.
I wonder if it was previously disabled automatically in the old Unbound version.I'm starting to think it's TLS forwarding. We have changed over about a dozen firewalls now and all are having DNS issues with 23. Disabling "Use SSL/TLS for outgoing DNS Queries to Forwarding Servers" seems to be the only way to keep DNS Resolver working. We just switched all of them this morning to see if that holds up.
-
@cylosoft Interesting, let us know. I haven't noticed any DNS issues at home in a week after disabling DNSSEC while forwarding. Haven't upgraded others yet.
Either way 23.01 does have different/problematic behavior than prior versions for people, since there are a lot of posts about DNS.
To be fair I recall plenty of posts about DNS issues in 22.05, but I did not experience that in the routers we upgraded to 22.05.
@stephenw10 said in 23.01 Upgrade unbound Issue:
wonder if it was previously disabled automatically in the old Unbound version
Maybe internally to Unbound? I suggested pfSense do that in a redmine and it was declined, in the context of not wanting to disable people's security choices unexpectedly.