Major DNS Bug 23.01 with Quad9 on SSL
-
Unbound Logging Level 3 (although I did manage to change it to 4 very briefly but had to stop due to the amount of data). Snippet of Level 3 logging downloaded from graylog below.
Hopefully I am ingesting everything.
-
There is a closed bug report on the unbound GitHub site with this exact same TCP error: https://github.com/NLnetLabs/unbound/issues/535. The unbound developer attributed the TCP error to the other end of the connection (meaning not the unbound side) closing the TCP socket. Maybe this is a clue?
-
@gertjan said in Major DNS Bug 23.01 with Quad9 on SSL:
what about setting up the DNS (statically) so they all get their DNS from "1.1.1.1" using TLS (port 853)
@joedan Can you run your test against 1.1.1.1 directly, not using pfSense as the DNS server?
Quoting from @bmeeks' referenced link, "This is normal if the other server restarts for example, or maybe because it wants to manage the TCP connections that it has; possibly with timeouts for how long they can be used."
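For anyone who wants to try the "directly against 1.1.1.1" test without pfSense in the path, here is a minimal sketch of a single DNS-over-TLS query on port 853 using only the Python standard library. The function names are made up for illustration, and the TLS hostname `cloudflare-dns.com` is an assumption about what the Cloudflare certificate is verified against; adjust it to whatever hostname you use for TLS verification.

```python
import socket
import ssl
import struct


def build_query(name, qtype=1, query_id=0x1234):
    """Build a minimal DNS query message; qtype 1 = A, 28 = AAAA."""
    # Header: id, flags (only RD set), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = [label for label in name.rstrip(".").split(".") if label]
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # class IN


def frame_for_tcp(message):
    """DNS over TCP/TLS prefixes every message with a 2-byte length."""
    return struct.pack("!H", len(message)) + message


def _recv_exact(sock, count):
    """Read exactly `count` bytes or fail if the peer closes the socket."""
    data = b""
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            raise ConnectionError("server closed the connection")
        data += chunk
    return data


def query_dot(name, server="1.1.1.1", tls_hostname="cloudflare-dns.com", timeout=5.0):
    """Send one query over TLS port 853 and return the response RCODE."""
    context = ssl.create_default_context()
    with socket.create_connection((server, 853), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=tls_hostname) as tls:
            tls.sendall(frame_for_tcp(build_query(name)))
            (length,) = struct.unpack("!H", _recv_exact(tls, 2))
            response = _recv_exact(tls, length)
    flags = struct.unpack("!H", response[2:4])[0]
    return flags & 0x000F  # 0 = NOERROR, 3 = NXDOMAIN
```

Calling `query_dot("google.com.")` from a LAN client should return 0 when the upstream answers; a `ConnectionError` here would correspond to the remote end closing the TCP socket, as described in the quoted bug report.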
-
I've activated these again :
with :
When the log level was set to 3, I saw some
debug: tcp error for address 2606:4700:4700::1111 port 853
and
debug: tcp error for address 2606:4700:4700::1001 port 853
and
debug: tcp error for address 1.1.1.1 port 853
and
debug: tcp error for address 1.0.0.1 port 853
Also some repeating :
....
outnettcp got tcp error -1
outnettcp got tcp error -1
outnettcp got tcp error -1
outnettcp got tcp error -1
....
Recent reading makes me think these are rather harmless.
I'll leave it for the weekend.
When DNSSEC was activated, I saw a lot of DS and other DNSSEC related records were requested. DNS resolving worked just fine, though.
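To get a feel for how often each upstream produces these messages, something like the following could be run over an exported log. It is only a sketch: the regular expression is based on the "tcp error for address ... port ..." lines quoted above, and the sample list stands in for a real log file.

```python
import re
from collections import Counter

# Matches lines such as: "debug: tcp error for address 1.1.1.1 port 853"
TCP_ERROR = re.compile(r"tcp error for address (\S+) port (\d+)")


def count_tcp_errors(lines):
    """Count unbound 'tcp error for address' messages per (address, port)."""
    counts = Counter()
    for line in lines:
        match = TCP_ERROR.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


# Sample input; in practice, iterate over the lines of the exported log.
sample = [
    "debug: tcp error for address 2606:4700:4700::1111 port 853",
    "debug: tcp error for address 1.1.1.1 port 853",
    "debug: tcp error for address 1.1.1.1 port 853",
    "outnettcp got tcp error -1",  # no address, so not counted
]

for (address, port), total in count_tcp_errors(sample).most_common():
    print(f"{address} port {port}: {total}")
```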
-
Mmm, it looks like log level 4 might be needed to see any additional logging associated with those errors.
-
@stephenw10 said in Major DNS Bug 23.01 with Quad9 on SSL:
level 4
shows 'nothing' special.
And what you can't see doesn't exist ;)
I'm forwarding right now to 1.1.1.1 etc using 853 - see setup above.
I even have DNSSEC activated + "Harden DNSSEC Data" (on the Resolver Advanced Settings page) .... because, as Clause Kellerman always says : "why not !!".
I'll leave it like this for the weekend. I'll get back to this Monday morning.
I've been banking, sent some mails, received a ton of mail, all my colleagues are also doing their toktok things, and no one has come complaining to me (they know who to look for) to tell me that I have to stop "messing with the connection".
-
@gertjan
Hi, I don't see those errors here. I activated log level 4.
I use Cloudflare DNS servers. 1.1.1.1 and 1.0.0.1
But I use another hostname for the TLS verification.
Sorry, I take it back. It was on Log level 3.
-
As far as I'm concerned, it doesn't matter whether you use Quad9, Google, or anything else: 23.01 has an outstanding bug with DoT in forwarding mode (DNSSEC unchecked). Whatever config you use, you will get "FAILED TO RESOLVE HOST" in DoT mode for all aliases that have to resolve a dynamic DNS name (xxxx.dyndns.org).
-
Here is log level 3.
From pfSense's own resolution of ALIASES with dynamic DNS names (xxxx.dyndns.org):
Apr 7 21:33:30 unbound 88702 [88702:0] info: finishing processing for vrac-nicolas.dyndns.org.jimbohello.arpa. AAAA IN
Apr 7 21:33:30 unbound 88702 [88702:0] info: query response was NXDOMAIN ANSWER
Apr 7 21:33:30 unbound 88702 [88702:0] info: reply from <.> 1.1.1.1#853
Apr 7 21:33:30 unbound 88702 [88702:0] info: response for vrac-nicolas.dyndns.org.jimbohello.arpa. AAAA IN
Apr 7 21:33:30 unbound 88702 [88702:0] info: iterator operate: query vrac-nicolas.dyndns.org.jimbohello.arpa. AAAA IN
Apr 7 21:33:30 unbound 88702 [88702:0] debug: iterator[module 0] operate: extstate:module_wait_reply event:module_event_reply
From the client side (LAN):
Apr 7 21:38:42 unbound 88702 [88702:0] info: finishing processing for vrac-nicolas.dyndns.org. A IN
Apr 7 21:38:42 unbound 88702 [88702:0] info: query response was ANSWER
Apr 7 21:38:42 unbound 88702 [88702:0] info: reply from <.> 8.8.8.8#853
Apr 7 21:38:42 unbound 88702 [88702:0] info: response for vrac-nicolas.dyndns.org. A IN
Apr 7 21:38:42 unbound 88702 [88702:0] info: iterator operate: query vrac-nicolas.dyndns.org. A IN
JESUS, I FOUND THE ISSUE, I GUESS:
WHY IS PFSENSE ITSELF TRYING TO RESOLVE
vrac-nicolas.dyndns.org.jimbohello.arpa
when it is supposed to be vrac-nicolas.dyndns.org
pfSense is adding its own domain part! No wonder it can't resolve.
-
@jimbohello said in Major DNS Bug 23.01 with Quad9 on SSL:
vrac-nicolas.dyndns.org.jimbohello.arpa
when it suppose to be vrac-nicolas.dyndns.org
pfsense is adding the domain part of itself ! no wonder why it can't resolve
Your Windows PC is doing the same thing ...
Have a look what 'nslookup' does :
C:\Users\Gauche>nslookup
Serveur par defaut : pfSense.mydomain.tld
Address: 2a01:cb19:beef:a6dc::1

> set debug
> google.com
Serveur : pfSense.mydomain.tld
Address: 2a01:cb19:beef:a6dc::1

------------
Got answer:
    HEADER:
        opcode = QUERY, id = 2, rcode = NXDOMAIN
        header flags: response, want recursion, recursion avail.
        questions = 1, answers = 0, authority records = 1, additional = 0
    QUESTIONS:
        google.com.mydomain.tld, type = A, class = IN
    AUTHORITY RECORDS:
    -> mydomain.tld
        ttl = 446 (7 mins 26 secs)
        primary name server = ns1.mydomain.tld
        responsible mail addr = postmaster.mydomain.tld
        serial = 2023020723
        refresh = 14400 (4 hours)
        retry = 3600 (1 hour)
        expire = 1209600 (14 days)
        default TTL = 10800 (3 hours)
------------
------------
Got answer:
    HEADER:
        opcode = QUERY, id = 3, rcode = NXDOMAIN
        header flags: response, want recursion, recursion avail.
        questions = 1, answers = 0, authority records = 1, additional = 0
    QUESTIONS:
        google.com.mydomain.tld.net, type = AAAA, class = IN
    AUTHORITY RECORDS:
    -> mydomain.tld
        ttl = 446 (7 mins 26 secs)
        primary name server = ns1.mydomain.tld
        responsible mail addr = postmaster.mydomain.tld
        serial = 2023020723
        refresh = 14400 (4 hours)
        retry = 3600 (1 hour)
        expire = 1209600 (14 days)
        default TTL = 10800 (3 hours)
------------
------------
Got answer:
    HEADER:
        opcode = QUERY, id = 4, rcode = NOERROR
        header flags: response, want recursion, recursion avail.
        questions = 1, answers = 1, authority records = 0, additional = 0
    QUESTIONS:
        google.com, type = A, class = IN
    ANSWERS:
    -> google.com
        internet address = 142.250.74.238
        ttl = 30 (30 secs)
------------
Réponse ne faisant pas autorité :
------------
Got answer:
    HEADER:
        opcode = QUERY, id = 5, rcode = NOERROR
        header flags: response, want recursion, recursion avail.
        questions = 1, answers = 1, authority records = 0, additional = 0
    QUESTIONS:
        google.com, type = AAAA, class = IN
    ANSWERS:
    -> google.com
        AAAA IPv6 address = 2a00:1450:4007:80c::200e
        ttl = 30 (30 secs)
------------
Nom : google.com
Addresses: 2a00:1450:4007:80c::200e
           142.250.74.238

>
You saw what happened?
I wanted details (fact checking) so I used 'set debug' first.
Then it showed that when I look up a domain, it adds the local PC domain first, mydomain.tld.
Because ..... we (me and you) are doing it wrong.
When you want to do a DNS lookup, you have to ask :
google.com.
The final dot is important.
It's not really an issue.
If I wanted to look up the IP of my PC, called 'gauche2' :
( which is just the host name, not the FQDN !)
nslookup again adds mydomain.tld. and this time asks pfSense
gauche2.mydomain.tld
and that is a 'good' question :
I got an IPv4 and IPv6 as nslookup asks both by default.
So, not really an error, and you could consider adding a final dot if the GUI accepts it (it does, I guess).
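As a rough illustration of the suffixing seen in the nslookup output above, here is a simplified model of how a client expands a name with its search domain. The function name is made up, and real resolvers differ in ordering and in when they try the bare name, so treat this as a sketch of the idea only.

```python
def candidate_names(name, search_domains):
    """Return the names a client might query, in order, for `name`.

    Simplified model: a trailing dot marks the name as fully qualified,
    so nothing is appended. Otherwise each search domain is tried first,
    then the name as typed.
    """
    if name.endswith("."):
        return [name]
    candidates = [f"{name}.{domain.rstrip('.')}." for domain in search_domains]
    candidates.append(name + ".")
    return candidates


print(candidate_names("google.com", ["mydomain.tld"]))
print(candidate_names("google.com.", ["mydomain.tld"]))
print(candidate_names("gauche2", ["mydomain.tld"]))
```

With the final dot only one name is queried, which is why the extra "google.com.mydomain.tld" lookups disappear.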
Btw : when unbound receives "google.com.mydomain.tld." as the request, it knows that it is authoritative for "mydomain.tld." so it isn't going to ask upstream for details about "mydomain.tld." : after all, unbound handles "mydomain.tld" and the upstream resolver doesn't know anything about local domains and resources (normally).
I'm not going to ask 9.9.9.9 or 1.1.1.1 about FQDN info for the device in my LAN, that's not logical.
Btw : I have the Resolver's (unbound) "System Domain Local Zone Type" set to "Static", not to the (default?) "Transparent".
When set to "Transparent", unbound will ask 9.9.9.9 to resolve "vrac-nicolas.dyndns.org.jimbohello.arpa." which ... no surprise, will give no answer or an "NXDOMAIN" as this domain is unknown or "new ?" to 9.9.9.9
edit : since yesterday, I'm doing the forward thing : DoT to :
No issues whatsoever.
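The difference between "Static" and "Transparent" can be sketched as a small decision function. This is a names-only model of the local-zone behaviour as I understand it from the unbound.conf documentation (record types are ignored), and the function and argument names are made up for illustration.

```python
def handle_local_zone(qname, zone, zone_type, local_names):
    """Say where a query would be answered under a simplified local-zone model."""
    q = qname.lower().rstrip(".")
    z = zone.lower().rstrip(".")
    if q != z and not q.endswith("." + z):
        return "resolve upstream"  # the name is outside the local zone
    known = {name.lower().rstrip(".") for name in local_names}
    if q in known:
        return "answer from local data"
    if zone_type == "static":
        # Nothing is sent upstream for names inside a static zone.
        return "NXDOMAIN or NODATA (answered locally)"
    if zone_type == "transparent":
        # Unknown names inside a transparent zone are resolved normally.
        return "resolve upstream"
    raise ValueError(f"unsupported zone type: {zone_type}")


name = "vrac-nicolas.dyndns.org.jimbohello.arpa."
hosts = ["pfsense.jimbohello.arpa"]  # hypothetical local host entry
print(handle_local_zone(name, "jimbohello.arpa", "transparent", hosts))
print(handle_local_zone(name, "jimbohello.arpa", "static", hosts))
```

Under this model the suffixed dyndns name goes to the forwarder only in the "Transparent" case, which matches the reply from 1.1.1.1#853 in the log earlier in the thread.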
-
Well I am 95% sure I fixed my issue. Decided to switch dns over tls back on and after a couple of hours had the dreaded dns failures. This time I removed ntopng package completely. Ntopng has been monitoring both lan and wan since Nov 22 under 22.05 and 23.01 since release candidate.
As soon as I removed ntopng, dns over tls through Cloudflare has been running ok for 24 hours. Browsing websites is super quick. My machine and bandwidth were never under stress however since removing ntopng it has drastically sped up overall speed to load a website and even the pfsense web interface itself. Pfblockerng is showing around 20k dns entries per hour which is normal load.
-
Nevermind, thought I had fixed it, been over 24 hours and it happened again. Will just stay in Unbound resolver mode for now and leave it be. That seems to be stable and working at least.
-
I've tried Static! None of the dyndns names in my aliases resolve.
Before, 22.05 was on Transparent and had no issue.
I did a workaround:
Instead of regular network/host aliases I used "url ip table aliases" with an update frequency of 1 day. Now it's working as expected!
-
To be clear you created a file with the dyndns FQDNs in it hosted locally and added that as the URL Table location?
-
Exactly.
Aliases: URL IP table
Hosted on a web server
Http://server.com/mydnamicdns.txt
All my dynamic DNS names are in that file
DoT activated
All good, ninja style
-
Ok, good. I thought for a minute it was handling URL aliases incorrectly.
Well that seems like a clue then. Why is it resolving those entries differently.
-
@stephenw10
Hey, that's why I'm debugging!
I'm not a pfSense engineer!
But I don't let myself give up until I've found a solution.
And for DoT activated with forwarding to a remote DNS, that's the only solution I've found so far.
Hope this helps :)
Have a nice one!
I know that pfSense HOST/NETWORK aliases seem to use something called "dns filter".
Maybe when resolving from "URL IP TABLE" it does it using nslookup or dig or something else!
-
I gave DNS over TLS another go after making two adjustments in my environment (under 23.01).
I unchecked Disable hardware checksum offload
I unchecked Enable the ALTQ support for hn NICs.
Not sure why I had the last option ticked given I don't virtualise or use shaping; I use Intel igc / i225 on a dedicated Mini PC. Both these settings were on without issue in 22.05.
I ran some load testing on my machine and funnily enough this thing is now stable, it actually completed.
348988 queries over 2853 seconds, an average of about 120 queries a second, which is way, way more than I normally do. It actually finished and I could WFH comfortably and browse websites whilst this was running. Prior to that, DNS stopped working after a couple of minutes.
I feel more confident I may have finally (fingers crossed) solved my specific config issue.
-
@joedan
Thanks for the reminder : I completely forgot to reinstall my munin graphing for unbound. It's up and collecting as of now.
Nice graph btw !
I'm still forwarding to 1.1.1.1, actually more using 2606:4700:4700::1111 using TLS.
pfBlockerng with some classic DNSBL, using python mode of course.
All seems fine to me.
The munin charts will give me some visual insights, and are far better than the usual "DNS doesn't work".
Now I think about it : the built in Status > Monitoring should have some basic DNS activity monitoring.
And, because it's Friday : why not a flag on the pfSense dashboard : "You've broken DNS !" ?
-
Hmm, that's interesting.
The ALTQ for hn NICs setting does nothing if you don't have hn NICs.
Re-enabling hardware checksum offload would do something. Only after a reboot though, I assume you did that?
It's hard to see how that wouldn't affect a lot more than just DNS over TLS though.
It would likely also be NIC specific too. Is it possible this only affects igc? That seems unlikely, but possible.