Slow DNS after 22.05
-
@gertjan But that behavior is not obvious.
It took me a while to trace the DNS lookup failures in the log back to where I had those references. I had long forgotten the alias I had created to test a f/w rule for DNS blocking.
-
@lohphat said in Slow DNS after 22.05:
Confusing
True.
But changing a setting on the first page that automatically changes (disables) a setting on a second page is more confusing.
All this is IMHO, of course.
Right now, you are prompted to first disable the extra 'special' DNSSEC settings on the second page and then, finally, disable DNSSEC altogether on the main page.
And strange... DNSSEC has worked well for me for the last several years.
I'm using DNSSEC, as pfSense is set up out of the box to use it.
Because a good 'flat' classic Internet connection should not interfere with my outgoing traffic.
If this wasn't the case, I would change ISP ASAP.
These:
I've checked years ago.
Never had the need to remove them (which means I rarely visit sites with DNSSEC issues, I guess).
-
@gertjan After all the testing, the setting that seems to have solved my problem is "Serve Expired".
That was not enabled when I was running pre-22.05, so I wonder what changed between 22.01 and 22.05 to alter the behavior in my environment.
So far, after re-enabling DNSSEC and the Experimental 0x20 support, things are working again -- just "Serve Expired" seems to have been the issue.
-
@lohphat I would turn on prefetch as well.
You also might want to set a min TTL. You mention the problem seems to be with CDN stuff. Possibly the TTL is so small that, if you're having timing problems with resolving, you could run into timeouts.
I looked at one example you gave: 300 seconds.
;; QUESTION SECTION:
;i.ytimg.com. IN A

;; ANSWER SECTION:
i.ytimg.com. 300 IN A 142.250.191.150
i.ytimg.com. 300 IN A 142.250.191.182
i.ytimg.com. 300 IN A 142.250.191.214
i.ytimg.com. 300 IN A 142.250.191.246
i.ytimg.com. 300 IN A 142.251.32.22
i.ytimg.com. 300 IN A 142.250.190.22
i.ytimg.com. 300 IN A 142.250.190.54
i.ytimg.com. 300 IN A 142.250.190.86
i.ytimg.com. 300 IN A 142.250.190.118
i.ytimg.com. 300 IN A 142.250.190.150
i.ytimg.com. 300 IN A 172.217.0.182
i.ytimg.com. 300 IN A 172.217.1.118
i.ytimg.com. 300 IN A 172.217.2.54
i.ytimg.com. 300 IN A 172.217.4.54
i.ytimg.com. 300 IN A 172.217.5.22
i.ytimg.com. 300 IN A 172.217.4.86
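If you want to check TTLs for several names without eyeballing dig output, something like the following could help. It is only a minimal Python sketch: it assumes answer lines shaped like the ones above (name, TTL, class, type, data), and the function name `lowest_ttls` and the shortened `SAMPLE` text are made up for illustration.

```python
def lowest_ttls(dig_output):
    """Return the smallest TTL seen per owner name in dig-style answer lines."""
    ttls = {}
    for line in dig_output.splitlines():
        fields = line.split()
        # Skip comment/section lines (they start with ';') and short lines.
        # Answer records are assumed to look like: name TTL class type rdata
        if line.startswith(";") or len(fields) < 5:
            continue
        name, ttl, rr_class = fields[0], fields[1], fields[2]
        if rr_class != "IN" or not ttl.isdigit():
            continue
        ttls[name] = min(int(ttl), ttls.get(name, int(ttl)))
    return ttls


# Shortened copy of the answer quoted above (first two records only).
SAMPLE = """\
;; QUESTION SECTION:
;i.ytimg.com. IN A
;; ANSWER SECTION:
i.ytimg.com. 300 IN A 142.250.191.150
i.ytimg.com. 300 IN A 142.250.191.182
"""

if __name__ == "__main__":
    for name, ttl in lowest_ttls(SAMPLE).items():
        print(f"{name} lowest TTL {ttl}s")
```

A name that keeps showing a very low TTL here would be a candidate for the minimum TTL setting suggested above.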
-
Hi, +1 here on the issue with erratic behaviour on DNS lookups since 22.05 update.
I'm going to read the full chain here, but lots of similarities in the bits I've skim-read.
Lots of these entries in the logs for unbound
Jul 13 21:43:18 unbound 982 [982:1] error: recvfrom 22 failed: Protocol not available
I'm using Cloudflare DNS servers, not allowing my WAN connection's DHCP settings to flow through, and have things set to "use remote, ignore local". I don't have these DNS servers set in my DHCP settings.
DNS Forwarder is inactive, DNS Resolver is active.
-
Over the last few days, the only change I've made in addition to "Serve Expired" is adding a minimum TTL of 900 sec (the setting's help text doesn't specify units, but I have a long-standing complaint about the lack of even minimal detail in setting help text). I also turned off "Use Experimental 0x20" for DNS spoofing protection; over several days this too proved unstable (and it is a change between 22.01 and 22.05, as it was working fine before).
So yes, something has significantly changed in unbound in the last release.
-
@lohphat said in Slow DNS after 22.05:
something has significantly changed
Yeah, it did: it went from version 1.12 or 1.13.something to 1.15.
I have had zero issues with resolving anything. And unbound has currently been running for:
[22.05-RELEASE][admin@sg4860.local.lan]/root: unbound-control -c /var/unbound/unbound.conf status
version: 1.15.0
verbosity: 1
threads: 4
modules: 2 [ validator iterator ]
uptime: 899181 seconds
options: control(ssl)
unbound (pid 87400) is running...
[22.05-RELEASE][admin@sg4860.local.lan]/root:
900k seconds = like 10 days..
While I'm not saying you're not having issues, clearly it's something with your connection or unique to your setup, because if something were wrong with unbound itself, everyone running 22.05 would be complaining.
-
@johnpoz I think you're probably right. The issue is most likely down to a combination of 22.05 running on my specific hardware (NG 3100 which uses ARM that someone said further up has quirks on occasion) with my specific setup (which isn't far from a few tweaks from vanilla).
What I'm hoping is, someone smarter than me will be able to point me in the right direction.
I'm going to try telling my devices to use an external DHCP server, effectively bypassing pfSense and see if that improves things.
-
@istacey said in Slow DNS after 22.05:
@johnpoz I think you're probably right. The issue is most likely down to a combination of 22.05 running on my specific hardware (NG 3100 which uses ARM that someone said further up has quirks on occasion) with my specific setup (which isn't far from a few tweaks from vanilla).
What I'm hoping is, someone smarter than me will be able to point me in the right direction.
I'm going to try telling my devices to use an external DHCP server, effectively bypassing pfSense and see if that improves things.
Hey!
As I said before, I have also been having the same problem since 22.05 on my NG-3100, without changing anything else in the configuration. I've also tested different DNS Resolver settings, but with no success. For now I'm using a DNS resolver installed on my NAS, which is set as the DNS server in the DHCP settings.
With this everything is fine and works like before. But I'd like to switch back to pfSense as the DNS Resolver, and I hope the error will be found.
Greetings,
Markus
-
So far so good with DNS servers issued via DHCP to client devices.
Simple things like playing audio via Amazon Echo works, no intermittent problems with websites that I know are up.
Fingers crossed this is a sufficient workaround.
-
Hi! Many helpful posts here!
Just wanted to mention that I'm also seeing the intermittently slow resolution described above:
Loading of websites often requires refreshes to have either the site name or the CDN for images or stylesheets resolved. I'd like to emphasize the intermittent nature of the problem -- I have duckduckgo.com set as my default search engine (i.e. a very frequently visited site) and have gotten name resolution errors in the browser time and time again over the last weeks, with no clear pattern for when it's happening.
I have a Netgate 2100 and upgraded from version 22.01 to 22.05 a few weeks ago. The problem started with the upgrade. I had not made changes to the DNS Resolver settings before, so the default of using the DNS servers given via DHCP on WAN was reflected on the front page with three servers listed, 127.0.0.1 being the first. Client devices were given the pfSense IP as their DNS server.
To remedy the situation I tried adding Cloudflare's 1.1.1.1 and 1.0.0.1 as DNS servers in System > General Setup and subsequently unchecked "Allow DNS server list to be overridden by DHCP/PPP on WAN or remote OpenVPN server" but the problem persisted.
Based on replies in this thread, I checked "Serve Expired" on Services > DNS Resolver > Advanced Settings. The problem still occurs from time to time, although seemingly less frequently. Resolution appears slow.
Further, I tried disabling DNSSEC (unchecked "Enable DNSSEC Support" in Services > DNS Resolver > General Settings) and disabled hardening of DNSSEC data (unchecked "Harden DNSSEC Data" in Services > DNS Resolver > Advanced Settings). Failures still occur.
To circumvent these problems I temporarily disabled the DNS Resolver.
I'll be watching this thread, hoping a solution pops up.
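Since the failures are intermittent, it may help to put numbers on them from a client machine. The following is a minimal Python sketch, not a proper benchmark: it times repeated lookups through the operating system's resolver using `socket.getaddrinfo`. The function names `time_lookup` and `probe`, the attempt count, and the pause are arbitrary choices for illustration. Keep in mind that the OS or a local caching stub may answer from its own cache, so fast results here do not prove the pfSense resolver was fast.

```python
import socket
import time


def time_lookup(hostname, resolve=socket.getaddrinfo):
    """Resolve hostname once; return (elapsed_seconds, error_or_None)."""
    start = time.monotonic()
    try:
        resolve(hostname, None)
        error = None
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        error = str(exc)
    return time.monotonic() - start, error


def probe(hostname, attempts=10, pause=1.0, resolve=socket.getaddrinfo):
    """Repeat the lookup and collect one (elapsed, error) tuple per attempt."""
    results = []
    for _ in range(attempts):
        results.append(time_lookup(hostname, resolve))
        time.sleep(pause)
    return results


if __name__ == "__main__":
    for elapsed, error in probe("duckduckgo.com", attempts=5):
        status = "ok" if error is None else f"FAILED ({error})"
        print(f"{elapsed * 1000:8.1f} ms  {status}")
```

Running this for a while before and after changing a resolver setting gives a rough before/after comparison instead of relying on how the browser feels.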
-
@kvhs said in Slow DNS after 22.05:
Hi! Many helpful posts here!
Just wanted to mention that I'm also seeing the intermittently slow resolution described above:
Loading of websites often requires refreshes to have either the site name or the CDN for images or stylesheets resolved. I'd like to emphasize the intermittent nature of the problem -- I have duckduckgo.com set as my default search engine (i.e. a very frequently visited site) and have gotten name resolution errors in the browser time and time again over the last weeks, with no clear pattern for when it's happening.
I have a Netgate 2100 and upgraded from version 22.01 to 22.05 a few weeks ago. The problem started with the upgrade. I had not made changes to the DNS Resolver settings before, so the default of using the DNS servers given via DHCP on WAN was reflected on the front page with three servers listed, 127.0.0.1 being the first. Client devices were given the pfSense IP as their DNS server.
To remedy the situation I tried adding Cloudflare's 1.1.1.1 and 1.0.0.1 as DNS servers in System > General Setup and subsequently unchecked "Allow DNS server list to be overridden by DHCP/PPP on WAN or remote OpenVPN server" but the problem persisted.
Based on replies in this thread, I checked "Serve Expired" on Services > DNS Resolver > Advanced Settings. The problem still occurs from time to time, although seemingly less frequently. Resolution appears slow.
Further, I tried disabling DNSSEC (unchecked "Enable DNSSEC Support" in Services > DNS Resolver > General Settings) and disabled hardening of DNSSEC data (unchecked "Harden DNSSEC Data" in Services > DNS Resolver > Advanced Settings). Failures still occur.
To circumvent these problems I temporarily disabled the DNS Resolver.
I'll be watching this thread, hoping a solution pops up.
Following on from my original reply where it looked like restarting the service resolved... it didn't.
Just wanted to say I have had the same experience - tried many of the suggestions here. I have tried with the resolver/forwarder, with DNSSEC enabled/disabled. Tried pre-fetch keys, harden DNSSEC data.
I have given up with the slow or unresponsive DNS resolution since 22.05 and put my clients on Google DNS over TLS which is working perfectly.
Hopefully somebody can find a solution as I rather liked using the resolver on my SG2100.
-
In summary, my fixes have been stable.
- Enable Serve Expired -- this helped with CDN lookups. This was not set in 22.01.
- Set minimum TTL to 300 seconds. This was not set in 22.01.
- Disable Experimental 0x20 support -- this was working in 22.01 but caused instability in 22.05.
So far things have been stable for over a week. I tried with and without pfBlocker-devel and various attempts to use forwarding or not (it was necessary while I was searching for a fix but I'm back to resolving locally again).
So yes, it seems "something has changed" but there's no smoking gun.
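For anyone who prefers to see the three fixes above as raw unbound options, here is a rough Python sketch that prints them as a `server:` clause. The option names are standard unbound.conf directives, but which GUI checkbox maps to which directive is an assumption on my part, so check it against the unbound.conf documentation and the generated /var/unbound/unbound.conf before relying on it. `SETTINGS` and `render_server_block` are names made up for this example.

```python
# Assumed mapping from the GUI settings discussed in this thread to
# unbound.conf directives; verify against the unbound.conf documentation.
SETTINGS = {
    "serve-expired": "yes",    # "Serve Expired" enabled
    "cache-min-ttl": "300",    # minimum TTL, in seconds
    "use-caps-for-id": "no",   # Experimental 0x20 support disabled
}


def render_server_block(settings):
    """Format the settings as an unbound.conf 'server:' clause."""
    lines = ["server:"]
    for option, value in settings.items():
        lines.append(f"    {option}: {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_server_block(SETTINGS), end="")
```

On pfSense these are normally set through the GUI rather than by hand, so the output is mainly useful for comparing against what the resolver is actually running with.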
-
There have been a number of bug fixes in Unbound since 1.15.0, which is the version pfSense 22.05 uses, but I don't know enough about DNS to tell whether those fixes are likely to solve these problems.
https://github.com/NLnetLabs/unbound/tags
I find this one solved in 1.16.0 interesting though: https://github.com/NLnetLabs/unbound/issues/670
-
Having this issue with an SG-6100 after going from 22.01 to 22.05 also. So far the Enable Serve Expired seems to be resolving the issue, but time will tell
-
Also seeing these intermittent DNS issues on my 5100 since updating to 22.05.
Haven't had a chance to troubleshoot yet but same issues outlined above.
Will try enabling Serve Expired tomorrow and see if that resolves it.
-
@lohphat said in Slow DNS after 22.05:
Set minimum TTL to 300 seconds. This was not set in 22.01
I enabled Serve Expired but this didn't seem to help in my case.
Experimental 0x20 support was already disabled.
Is the min TTL setting: Minimum TTL for RRsets and Messages?
So far I'm just thinking of rolling back to 22.01.
It seems like whatever was updated in unbound is causing issues for a small subset of us.
-
I may have to roll back as well. Enabling Serve Expired does (seemingly) help a little, but I am still getting DNS timeouts frequently. I have now also set cache-min-ttl (also known as Minimum TTL for RRsets and Messages) to 300 sec. My Experimental 0x20 support has never been enabled.
https://nlnetlabs.nl/documentation/unbound/unbound.conf/
Not sure if this is related (probably should be talking on unbound's GitHub at this point) but I'm seeing a bunch of "outnettcp got tcp error -1" in debug logs when turned up to logging level 4.
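When the log is turned up this far, it gets hard to see which messages actually dominate. Below is a minimal Python sketch for tallying them. It assumes log lines shaped like the "recvfrom 22 failed" entry quoted earlier in this thread (timestamp, "unbound", pid, "[pid:thread]", level, message); the regex, the function name `count_messages`, and the idea of collapsing digits to "N" are illustrative choices, and other log formats would need a different pattern.

```python
import re
from collections import Counter

# Matches lines shaped like the one quoted earlier in the thread:
#   Jul 13 21:43:18 unbound 982 [982:1] error: recvfrom 22 failed: ...
LINE_RE = re.compile(r"unbound\s+\d+\s+\[\d+:\d+\]\s+(\w+):\s+(.*)$")


def count_messages(lines):
    """Count (level, message) pairs, with digits collapsed to 'N' so that
    e.g. different socket numbers land in the same bucket."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            level, message = match.groups()
            counts[(level, re.sub(r"\d+", "N", message.strip()))] += 1
    return counts


# The one log line quoted in this thread, used as sample input.
SAMPLE_LINES = [
    "Jul 13 21:43:18 unbound 982 [982:1] error: recvfrom 22 failed: Protocol not available",
]

if __name__ == "__main__":
    for (level, message), n in count_messages(SAMPLE_LINES).most_common():
        print(f"{n:6d}  {level}: {message}")
```

Feeding it the real resolver log lines instead of `SAMPLE_LINES` would show whether the "outnettcp got tcp error -1" entries are frequent or just occasional noise.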
-
@kvhs said in Slow DNS after 22.05:
I find this one solved in 1.16.0 interesting though: https://github.com/NLnetLabs/unbound/issues/670
This seems a reasonable trail to start following -- this may be an out of memory/heap issue.
Just curious, for those of us seeing issues are you also running IPv6? I am.
In the bug notes it seems that disabling IPv6 addressed the issue as less memory overhead is needed. I wonder if the unbound changes may necessitate bumping up memory allocation to prevent spurious lookup failures.
-
Just enabled logging level 4 and also see a few 'outnettcp got tcp error -1' errors but no idea if it's related.
Also running IPv6.
Not sure I can actually roll back unless I can use a config backup from 22.05 on 22.01.
Wondering if it would be better if I just wipe and reinstall 22.05, then restore the config, just in case something got messed up with the upgrade.
I believe I saw that @johnpoz runs an SG-5100 too, upgraded from 22.01 to 22.05, and doesn't have the same problems.