23.01 Upgrade unbound Issue

Cylosoft

@gertjan It's another layer of filtering/security. Encrypted connection out to a known DNS provider. For filtering we have a couple of different tiers based on how much filtering we want the location to have. Quad9 for example does some basic DNS blocking. Some sites get NextDNS connections so IT can manage rules centrally there rather than running pfBlocker locally. Some get wide open Cloudflare DNS servers. Some get filtered Cloudflare.

jimp

Some ISPs break acting as a resolver in various ways as well, either by rate limiting, by restricting which addresses you can reach on common DNS ports, or through other similar shenanigans.

DNS over TLS/DNS over HTTPS are OK if you really trust the provider on the other end. Those are for privacy and not for authentication/validation, though. If you forward you have to trust that the upstream DNS servers are also not changing your query results in unexpected ways.

SteveITS

@jimp I understand your logic. Is it therefore a regression in unbound, if it worked with DNSSEC enabled in prior pfSense versions? If the answer is, "it's not supposed to work" then I can understand that too. I'm just trying to help people out. Worst case, after people upgrade to 23.01 and have a problem, they won't have it enabled for future upgrades anyway. Perhaps an "if you are using forwarding and have problems, disable this option" note on that setting, if it's not too much handholding.

jimp

The answer really depends entirely on the upstream resolver behavior, so there is no way to know. If we put a note on the option then it's likely to be missed since they'd have to already know which option they'd need to disable to see that or to know their problem is related to forwarding, which isn't obvious. Bit of a chicken/egg issue there.

This is already covered in the docs under troubleshooting DNS issues:

https://docs.netgate.com/pfsense/en/latest/troubleshooting/dns.html#check-dns-service

SteveITS

@jimp It just seemed like a lot of people were newly hitting it on 23.01, so unbound behavior apparently changed. I need to check that setting when we get to updating the rest of our routers and all our clients' routers.
(Always good to read the docs. Not sure I cited it in this thread but I have in others.)

stephenw10

Mmm, it does seem like something has changed there causing people to hit this who weren't before. I wonder if it's some secondary effect though like they were always hitting it but cached values were hiding it until the upgrade.

johnpoz

@stephenw10 I have not had time to look through all the changes in unbound. But were we not on like 1.15.something, and now we are on 1.17.1 - have to assume lots of changes to unbound in such a big jump in version numbers.

Is it also possible that something changed with, say, Quad9? I had not noticed their actual recommendation to disable DNSSEC if you forward to them until recently, when it was pointed out in a thread. Has that been their standing recommendation for a long time, or did they possibly make some changes that now make the DNSSEC failures hit more often when using unbound 1.17.1?

bingo600

I haven't noticed any issues on my 23.01 "Test Box"
And i even use DNSSEC w. Forwarding.
Making my 20 meters from the pfSense to my two linux DNS servers "super secure" ...

/Bingo

stephenw10

Those are your own hosted DNS resolvers? Interesting. They support DNSSEC, I presume.

It seems unlikely any upstream change, at Quad9, would have coincided with 23.01 release. But perhaps people just started checking after upgrading. I agree some behaviour change in Unbound seems most likely to me.

Steve

johnpoz

@stephenw10 I just did a breeze over of the changes and bug fixes in unbound from 1.15 to 1.17.1 and nothing jumped out at me. But possibly something that was fixed now triggers failures on queries that should have failed before but didn't, etc.

I despise forwarding, so while I will turn it on to show how it can be done, I have no desire to leave it on for any length of time to see if errors occur on specific domains I might be going to, or after some amount of time.

If someone could point out a specific query that fails when forwarding with DNSSEC enabled, but works fine when just forwarding to a DNSSEC-validating resolver, be it over TLS or just in the clear, I would be up for looking into that.

bingo600

I'm only forwarding because:

1:
I already had a working linux DNS/DHCP setup , that does dynamic DHCP entries in DNS.

2:
Unbound DHCP adding "hosts", created unacceptable reload/delays.
I want my DHCP devices to be resolvable, and more important to have "sane names" in the DNS system.

DNSSEC all 20 meters on a local network ... is kinda "well it hasn't broken yet", and my Bind9's are set up to accept it.

/Bingo

johnpoz

@bingo600 said in 23.01 Upgrade unbound Issue:

only forwarding because

You're forwarding to your own DNS? What does it do then to resolve public DNS? Forwarding to your own internal resolvers is a different ball game than forwarding to some outside DNS ;)

bingo600

@johnpoz said in 23.01 Upgrade unbound Issue:

@bingo600 said in 23.01 Upgrade unbound Issue:

only forwarding because

You're forwarding to your own DNS? What does it do then to resolve public DNS?

My bind9's know all my DNS names & Dynamic (DHCP) DNS names, and will resolve public names if asked to do so.

That made the decision on setting pfSense to query those , super easy ....

On (almost) all of my lan's/vlan's - I only allow DNS (UDP/TCP 53) to the def-gw ip (pfSense), all other DNS servers are blocked.

So client's can only ask pfSense , that asks my linux'es , they resolve local names , and ask A-Root for the rest ...

My Phone Vlan + MMedia Vlan (ATV etc) , asks a Pi-Hole directly , that asks my linux'es.
It cleans out a lot of "noise" on phones , and blocks a lot of ATV "callbacks" , and gives me the possibility to block stuff manually.

But you use Pi-Hole so .. You know that :-)

I (hate DOH) .... Have floating rules on most IF's that "try to block" DOH .... , using internet based lists, that pfS downloads once a day.

Forwarding to your own internal resolvers is way different than forwarding to some outside dns ;)

I know , and might be why i'm not hit , and haven't removed DNSSEC
/Bingo

SteveITS

@johnpoz said in 23.01 Upgrade unbound Issue:

specific query that fails when forwarding with DNSSEC enabled

Not sure if it was this thread or another (and was in the redmine) but I posted "nslookup linkedin.com" failed for me, repeatedly. I restarted unbound while testing, to no avail. Unchecking the DNSSEC option (which I believe restarts unbound unfortunately, for A/B testing) let it work immediately. I was unable to replicate it later, but I did not leave it with DNSSEC enabled for several hours as it was originally. So it may be time based, or Microsoft may have changed something, or something else triggers it.

Re: why forward, Quad9 and others add malware protection. No, it's not necessary, but it is of value to some.

johnpoz

@steveits said in 23.01 Upgrade unbound Issue:

Quad9 and others add malware protection

That for sure could be a valid reason - but most browsers do that on their own to be honest.

https://support.mozilla.org/en-US/kb/how-does-phishing-and-malware-protection-work

Not saying that is better or worse than what bad-site filtering at the DNS level might do, but that feature is not worth me sending all my DNS traffic to them. Others might find it worthwhile, sure.

johnpoz

@steveits said in 23.01 Upgrade unbound Issue:

linkedin.com

They are not even using dnssec.. But I see this error checking their dns

linkedin.com/TXT: No response was received until the UDP payload size was decreased, indicating that the server might be attempting to send a payload that exceeds the path maximum transmission unit (PMTU) size. (2600:2000:2210::43, 2600:2000:2220::43, 2600:2000:2230::43, 2600:2000:2240::43, UDP_-_EDNS0_4096_D_KN)

bingo600

I have experienced in my setup , that "Unbound" caches a "non resolved answer too".

If i have forgotten to add ie. a new host to my zone , and i try to resolve it on a client. I get a "not found" as i should
But when i then add the host to bind9 , and retry the "lookup" , then i still get a "not found" from unbound.
If i try to resolve the name on the bind9 linux machine, it resolves fine.
Cure is to restart unbound , then it begins to work.

I know the "zones" specifies the TTL , but does it have to "cache a bad answer too" ...

I might have to dig into my "Cricket book" .....
Prob. to find out that TTL also applies to "non resolvables too, for that zone".

Well just don't GOOF in the first place ...

/Bingo
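For what it's worth, negative answers are cacheable by design: RFC 2308 sets the negative TTL to the smaller of the zone's SOA record TTL and its SOA MINIMUM field, and unbound then caps that with cache-max-negative-ttl (3600 by default). A minimal sketch of that arithmetic follows; the function and parameter names are made up for illustration, and it ignores other knobs such as a configured minimum TTL.

```python
def negative_cache_ttl(soa_ttl: int, soa_minimum: int,
                       cache_max_negative_ttl: int = 3600) -> int:
    """Upper bound, in seconds, on how long an NXDOMAIN/NODATA answer is cached.

    soa_ttl and soa_minimum come from the SOA record in the authority
    section of the negative answer (RFC 2308). cache_max_negative_ttl
    mirrors the unbound option of the same name; 3600 is its documented
    default.
    """
    values = (soa_ttl, soa_minimum, cache_max_negative_ttl)
    if any(v < 0 for v in values):
        raise ValueError("TTL values must be non-negative")
    return min(values)


if __name__ == "__main__":
    # A zone with a one-day SOA TTL and a one-hour SOA MINIMUM.
    print(negative_cache_ttl(soa_ttl=86400, soa_minimum=3600))
```

So with a typical internal zone, a "not found" for a host added a minute too late can stick around for up to an hour unless the zone's SOA values or the resolver's cap are lowered.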

johnpoz

@bingo600 said in 23.01 Upgrade unbound Issue:

cache a bad answer

Hmm, I just did a quick look, and it doesn't seem like the GUI exposes the cache-max-negative-ttl: setting. This could have something to do with it, especially if you have a min TTL set. There was a bug a while back, I believe, where even if you had cache-max-negative-ttl: set to, say, 1, the answer could sometimes get cached for your min TTL time instead of the negative TTL.

I will have to do a bit of looking at what the actual unbound.conf has - and what is default if not set. But maybe I missed it, but doesn't seem like that option is exposed in the gui to mess with even.

edit: OK, I don't see it in the .conf, so it looks like it would default to 3600, per the unbound documentation:

   cache-max-negative-ttl: <seconds>
          Time to live maximum for negative responses, these have a SOA in
          the authority section that is limited in time.  Default is 3600.
          This applies to nxdomain and nodata answers.

You should be able to set that in the custom options for unbound if it's something you want to adjust. Maybe put in a feature request to expose it in the GUI.
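As a rough sketch of what that could look like in the DNS Resolver's Custom options box: the 60-second value below is only an illustration, not a recommendation, and the leading server: line is there because custom options normally need to start with that clause.

```
server:
cache-max-negative-ttl: 60
```

After saving and applying, a lookup for a name that does not exist should be cached for no longer than the configured value, rather than up to the 3600-second default.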

SteveITS

@steveits Also, to clarify, I do not think LinkedIn was the only issue at the time. I was on my phone and flipping between things. I vaguely recall some issues on web pages, and the LinkedIn app wouldn't load most things, which got my attention, especially since I'd just upgraded to 23.01 that evening. Their web site wouldn't load, so I started investigating. No further issues in the last few days after turning off DNSSEC. My testing was querying pfSense, I flushed DNS cache on my PC, etc.

johnpoz

@bingo600 ok did a bit of looking at the neg cache time..

You can tell that the negative cache is being used if you look up something you know will return NXDOMAIN. For example, I did a query for a random name under cnn.com (lsjfdsld.cnn.com).

If I look in the unbound cache, I can see that it's counting down the TTL from 3600:

[23.01-RELEASE][admin@sg4860.local.lan]/var/unbound: unbound-control -c /var/unbound/unbound.conf dump_cache | grep lsjfdsld.cnn.com
msg lsjfdsld.cnn.com. IN A 32899 1 3554 3 0 1 0
[23.01-RELEASE][admin@sg4860.local.lan]/var/unbound: unbound-control -c /var/unbound/unbound.conf dump_cache | grep lsjfdsld.cnn.com
msg lsjfdsld.cnn.com. IN A 32899 1 3546 3 0 1 0
[23.01-RELEASE][admin@sg4860.local.lan]/var/unbound:

See where it goes down from 3554 to 3546. You could try setting the negative cache TTL setting (cache-max-negative-ttl) to something lower and then check the result with a test similar to the one I did.
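If you want to watch that countdown without eyeballing the dump, here is a small sketch that pulls the fields out of a msg line. It assumes the layout seen in the output above (name, class, type, a flags word, a count, then the remaining TTL) and that the flags word is the DNS header flags, so its low four bits are the response code. The MsgEntry and parse_msg_line names are hypothetical, not part of unbound.

```python
from typing import NamedTuple, Optional


class MsgEntry(NamedTuple):
    name: str
    rrclass: str
    rrtype: str
    flags: int
    ttl: int


def parse_msg_line(line: str) -> Optional[MsgEntry]:
    """Parse one 'msg ...' line from dump_cache output, or return None.

    Assumed field order (taken from the sample output in this thread):
    msg <name> <class> <type> <flags> <count> <ttl> ...
    """
    fields = line.split()
    if len(fields) < 7 or fields[0] != "msg":
        return None
    try:
        flags = int(fields[4])
        ttl = int(fields[6])
    except ValueError:
        return None
    return MsgEntry(fields[1], fields[2], fields[3], flags, ttl)


def rcode(entry: MsgEntry) -> int:
    """Low four bits of the flags word; 3 means NXDOMAIN."""
    return entry.flags & 0xF


if __name__ == "__main__":
    sample = "msg lsjfdsld.cnn.com. IN A 32899 1 3554 3 0 1 0"
    entry = parse_msg_line(sample)
    if entry is not None:
        print(entry.name, entry.ttl, rcode(entry))
```

Feeding it the two sample lines above gives TTLs of 3554 and 3546, and the flags value 32899 decodes to response code 3, which matches a cached NXDOMAIN.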