Major DNS Bug 23.01 with Quad9 on SSL
-
I gave DNS over TLS another go after making two adjustments in my environment (under 23.01).
I unchecked Disable hardware checksum offload
I unchecked Enable the ALTQ support for hn NICs.
Not sure why I had the last option ticked, given I don't virtualise or use shaping; I use Intel igc / i225 on a dedicated Mini PC. Both these settings were on without issue in 22.05.
I ran some load testing on my machine and, funnily enough, this thing is now stable: the test actually completed.
348988 queries over 2853 seconds, an average of about 120 queries a second, which is way, way more than I normally do. It actually finished, and I could WFH comfortably and browse websites whilst this was running. Prior to that, DNS stopped working after a couple of minutes.
I feel more confident I may have finally (fingers crossed) solved my specific config issue.
-
@joedan
Thanks for the reminder: I completely forgot to reinstall my munin graphing for unbound. It's up and collecting as of now. Nice graph btw!
I'm still forwarding to 1.1.1.1, or actually mostly to 2606:4700:4700::1111, using TLS.
pfBlockerNG with some classic DNSBL, using Python mode of course. All seems fine to me.
The munin charts will give me some visual insights, which is far better than the usual "DNS doesn't work".
Now that I think about it: the built-in Status > Monitoring should have some basic DNS activity monitoring.
And, because it's Friday: why not a flag on the pfSense dashboard: "You've broken DNS!"?
-
Hmm, that's interesting.
The ALTQ for hn NICs setting does nothing if you don't have hn NICs.
Re-enabling hardware checksum offload would do something. Only after a reboot though, I assume you did that?
It's hard to see how that wouldn't affect a lot more than just DNS over TLS though.
It would likely also be NIC specific too. Is it possible this only affects igc? That seems unlikely, but possible.
-
Yes, rebooted immediately after the change.
I am the only one with access to pfSense, and I keep a detailed change log, snapshot and config backup for everything I modify. My last post talks about removing ntopng, which may just have taken some load off; however, that still had issues where DNS over TLS did eventually stop working, always as the first symptom.
During the load testing in my post before that, I did manage to break standard DNS forwarding once, but it was a lot harder to do and took several attempts. I didn't think much of it because of the huge DNS load, which seemed excessive anyway. Going back to standard resolving worked even better. When I loaded it up with DNS requests it wouldn't break and was rock solid, but things did on occasion slow down. Again, given the ridiculous amount of DNS requests it was generating, that seemed acceptable. I only have a small pipe (80 Mbit) to the internet and never had any other issues apart from DNS over TLS resolution on 23.01. Some other testing which I didn't post about was to change from Cloudflare to Quad9 to Google for DNS over TLS, but that made no difference. DNS over TLS would eventually stop with any upstream provider.
My machine, RAM and SSD are completely oversized, running bare metal (specs in my post above), and never broke a sweat. I am just glad it's fixed for me and was thrilled to see DNS over TLS back on.
I used the same input file for the DNS load tester which broke it last time; it was 25 MB. When I saw the test finish without issues I reran it twice, which just resulted in a lot of cache hits. I then got all of the parts from GitHub and had a 250 MB monster. Even this couldn't break it. DNS over TLS has been rock solid since.
-
Yup, glad it seems good for you and all info is good.
-
-
Late to the party here, guys. I am experiencing DNS resolution issues sporadically throughout the day, specifically when using Quad9 with DoT enabled. If I disable DoT, everything works fine. If I keep DoT enabled and switch to Cloudflare, it also works throughout the day with no issues.
Running a Netgate 6100 Max, using pfBlockerNG with DNSBL and the Unbound resolver in Python mode.
Disable hardware checksum offload was already unchecked; I unchecked Enable the ALTQ support for hn NICs and rebooted.
Hope this helps.
-
Are you using one of the igc ports as WAN?
-
@stephenw10 Yes sir, igc3.
I needed 2.5G configured so I can reuse the 10G ix0 for LAN.
-
Hmm.
Ok just for sanity can anyone confirm they are hitting this on some interface other than igc?
It seems very unlikely it would be that but....
-
I'll switch to ix on Wednesday and will report back, just in case no one has confirmed by then.
-
@stephenw10 We have several with WAN on ix0 and LAN on ix1. They stall out just like the others.
-
Ok, thanks.
And just to confirm that's to Quad9 specifically?
-
@stephenw10 Yes. They all started out with Quad9 and TLS.
-
Yes sir, Quad9 in System > General Setup > DNS Server Settings.
The settings below are causing the intermittent DNS resolution issues described above by others (can't resolve anything for a few minutes, then it eventually starts resolving):
Address: 9.9.9.9 Hostname for TLS Verification: dns.quad9.net
Address: 149.112.112.112 Hostname for TLS Verification: dns.quad9.net
When I change the above settings to put Cloudflare in front of Quad9, it resolves without issues:
Address: 1.1.1.1 Hostname for TLS Verification: cloudflare-dns.com
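To tell an upstream DoT problem apart from something inside unbound, one option is to query the forwarders directly over TLS on port 853 from a LAN client, bypassing pfSense's resolver. Below is a minimal Python sketch using only the standard library. The function names (`build_query`, `rcode_of`, `probe_dot`) are made up for illustration, and it assumes the firewall rules let the client reach port 853 outbound.

```python
import socket
import ssl
import struct


def build_query(name: str, qtype: int = 1, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query: RD flag set, one question, class IN."""
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def rcode_of(response: bytes) -> int:
    """Return the RCODE (low 4 bits of the flags word) of a DNS message."""
    (flags,) = struct.unpack("!H", response[2:4])
    return flags & 0x000F


def _recv_exact(sock, count: int) -> bytes:
    """Read exactly `count` bytes or raise if the peer closes early."""
    buf = b""
    while len(buf) < count:
        chunk = sock.recv(count - len(buf))
        if not chunk:
            raise ConnectionError("connection closed early")
        buf += chunk
    return buf


def probe_dot(server_ip: str, tls_hostname: str,
              name: str = "github.com", timeout: float = 5.0) -> int:
    """Send one A query over DNS-over-TLS and return the response RCODE."""
    ctx = ssl.create_default_context()  # verifies the cert against tls_hostname
    query = build_query(name)
    with socket.create_connection((server_ip, 853), timeout=timeout) as raw:
        with ctx.wrap_socket(raw, server_hostname=tls_hostname) as tls:
            # DNS over TCP/TLS prefixes each message with a 2-byte length.
            tls.sendall(struct.pack("!H", len(query)) + query)
            (length,) = struct.unpack("!H", _recv_exact(tls, 2))
            return rcode_of(_recv_exact(tls, length))
```

Calling `probe_dot("9.9.9.9", "dns.quad9.net")` and `probe_dot("1.1.1.1", "cloudflare-dns.com")` while unbound is returning SERVFAIL would show whether the upstream itself still answers; an RCODE of 0 is NOERROR and 2 is SERVFAIL, and connection or TLS errors surface as `OSError`.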
-
I am testing now on Quad9 with DoT and it is failing right now:
❯ ping github.com
ping: cannot resolve github.com: Unknown host
❯ dig github.com
; <<>> DiG 9.10.6 <<>> github.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 40969
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1432
;; QUESTION SECTION:
;github.com. IN A
;; Query time: 27 msec
;; SERVER: 10.11.100.1#53(10.11.100.1)
;; WHEN: Mon Apr 17 18:17:08 EDT 2023
;; MSG SIZE rcvd: 39
I opened the pfSense UI and left it sitting on the main page. Then unbound started to resolve:
❯ dig github.com
; <<>> DiG 9.10.6 <<>> github.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 25085
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1432
;; QUESTION SECTION:
;github.com. IN A
;; ANSWER SECTION:
github.com. 39 IN A 140.82.112.3
;; Query time: 38 msec
;; SERVER: 10.11.100.1#53(10.11.100.1)
;; WHEN: Mon Apr 17 18:21:11 EDT 2023
;; MSG SIZE rcvd: 55
What I noticed too is that it was failing until I opened the pfSense UI, and then it started to resolve. No idea what is going on lol
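Since the resolver log shows nothing at the time of the failures, it may help to record exactly when the stalls start and stop from the client side. The following is a rough Python sketch, not a tested monitoring tool: it runs `dig` periodically, pulls the `status:` field out of the header line, and collapses consecutive non-NOERROR samples into windows. The helper names are invented for this example, and it assumes `dig` is on the PATH of the machine running it.

```python
import re
import subprocess

# Matches the status field in dig's ";; ->>HEADER<<-" line.
STATUS_RE = re.compile(r"status:\s*([A-Z]+)")


def dig_status(output: str):
    """Return the status from dig output (e.g. 'SERVFAIL'), or None."""
    match = STATUS_RE.search(output)
    return match.group(1) if match else None


def sample(name: str, server: str):
    """Run dig once against `server` and return the parsed status."""
    result = subprocess.run(
        ["dig", f"@{server}", name],
        capture_output=True, text=True, timeout=30,
    )
    return dig_status(result.stdout)


def failure_windows(samples):
    """Collapse (timestamp, status) samples into (first_bad, last_bad) spans.

    Any status other than 'NOERROR' (including None) counts as a failure.
    """
    windows = []
    start = last = None
    for timestamp, status in samples:
        if status != "NOERROR":
            if start is None:
                start = timestamp
            last = timestamp
        elif start is not None:
            windows.append((start, last))
            start = None
    if start is not None:
        windows.append((start, last))
    return windows
```

Calling `sample("github.com", "10.11.100.1")` every 30 seconds or so and feeding the results to `failure_windows` would give start and end times for each stall, which can then be lined up against the resolver log and any dashboard logins.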
-
Still nothing logged in the resolver log?
That's weird that it started after you logged in. What widgets do you have on the dashboard?
-
After the changes I made and posted at https://forum.netgate.com/post/1094989 I do have fewer problems with DNS responses but, like somebody said, they mostly appear when using a mobile device and trying to load a page with a bunch of images and scripts, for example reddit. I see previews, but when I try to open a large image by clicking on the preview (not opening the post), it can either load immediately or just hang and give no output, showing a broken image icon. This happens randomly. I am also using pfBlockerNG, Suricata and Python mode. All the reddit domains I found are whitelisted. Nothing unusual in the pfBlockerNG or Suricata logs. Forwarding to Cloudflare and Google servers, using TLS / SSL, over both IPv4 and IPv6. Also using a multi-WAN setup. Planning to disable IPv6 and one of the two gateways just to test it again.
-
@stephenw10 Let me get some logs and I will post them here. As far as widgets go, I have Gateways, ZFS, Traffic Graphs, Interfaces, UPS Status, Firewall Logs, OpenVPN, WireGuard, HAProxy, Interface Statistics, Services Status, pfBlockerNG, SMART Status, Installed Packages, and System Information.
As far as system packages:
Name Version
acme 0.7.3_1
arpwatch 0.2.1
haproxy 0.61_9
Netgate_Firmware_Upgrade 0.56
ntopng 0.8.13_10
nut 2.8.0_2
pfBlockerNG 3.2.0_4
Service_Watchdog 1.8.7_1
WireGuard 0.1.6_5
-
Hmm, hard to imagine anything there would affect it. Could Unbound have been restarted when you connect to the dash? The resolver logs would show that if so.
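One way to check for restarts is to scan the resolver log for unbound's service start and stop messages. Here is a small Python sketch under two assumptions that should be verified on the box: that the log can be read as plain text (on pfSense it is usually at `/var/log/resolver.log`), and that unbound logs lines containing "start of service" and "service stopped" when it restarts. The function name is made up for this example.

```python
# Substrings assumed to appear in unbound's log on start and stop.
RESTART_MARKERS = ("start of service", "service stopped")


def restart_events(lines):
    """Return the log lines that look like an unbound start or stop."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(marker in line for marker in RESTART_MARKERS)
    ]


def restart_events_in_file(path: str):
    """Convenience wrapper: scan a log file on disk."""
    with open(path, "r", errors="replace") as handle:
        return restart_events(handle)
```

If `restart_events_in_file("/var/log/resolver.log")` shows a start entry right at the moment the dashboard was opened, that would point at a restart rather than at the dashboard itself.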
-
Ok it is happening now again lol
What I did was switch to Quad9 this morning at 8:30 Eastern, and the logs show a bunch of entries around that time. DNS works fine for some time and now it is failing again; the dig output is below. The logs literally stop around the time I made the switch to Quad9 and applied the changes (I assume that restarts unbound). There are no entries around when the DNS failures occur.
❯ dig github.com
; <<>> DiG 9.10.6 <<>> github.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 62433
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1432
;; QUESTION SECTION:
;github.com. IN A
;; Query time: 27 msec
;; SERVER: 10.11.100.1#53(10.11.100.1)
;; WHEN: Tue Apr 18 08:48:29 EDT 2023
;; MSG SIZE rcvd: 39
Unbound logs are set to 4