Certain domains failing in DNS Resolver/unbound
-
@clesports could be stalling on waiting for dnssec info.. Or the dnssec info is large, and you're having an issue with UDP size..
I would sniff on your wan when you do that test.. So you can see the traffic..
Could you do this test for me?
[22.01-RELEASE][admin@sg4860.local.lan]/root: dig +short rs.dns-oarc.net TXT
rst.x4090.rs.dns-oarc.net.
rst.x4058.x4090.rs.dns-oarc.net.
rst.x4064.x4090.rs.dns-oarc.net.
[22.01-RELEASE][admin@sg4860.local.lan]/root:
info here
https://www.dns-oarc.net/oarc/services/replysizetest
edit: also look for interception of your dns.. I went over how to do testing for this in this thread.
https://forum.netgate.com/post/1034563
edit2: Upon looking at your traces again - you do seem to be getting really quick responses..
;; Received 218 bytes from 37.209.196.14#53(ns3.cctld.co) in 12 ms
That seems to be the same for the different levels of servers in the trace..
;; Received 411 bytes from 192.5.5.241#53(f.root-servers.net) in 11 ms
;; Received 256 bytes from 205.251.192.179#53(a.r06.twtrdns.net) in 12 ms
It's quite possible, especially in light of your dnssec comment, that some shenanigans are going on... I would do some of the tests I pointed out in that thread I linked to.
edit3: You mentioned something about putting this into production.. Where exactly is pfsense currently in your network - is it directly connected to the internet? Or is it internal to your network? Is it possible the network pfsense is on could be redirecting dns?
-
@johnpoz Here's what I got from running that first command:
[2.5.2-RELEASE][root@pfSense1.home.arpa]/root: dig +short rs.dns-oarc.net TXT
rst.x1363.rs.dns-oarc.net.
rst.x1373.x1363.rs.dns-oarc.net.
rst.x1344.x1373.x1363.rs.dns-oarc.net.
"172.253.8.134 sent EDNS buffer size 1400"
"172.253.8.134 DNS reply size limit is at least 1373"
I think it's ok since my MTU is 1500 but I'm not sure. DNS is not being redirected because my dig @1.2.3.4 www.google.com command timed out (followed your linked forum post).
My production pfSense is behind an Internet load-balancer of sorts, but I would think that shouldn't matter because these requests are UDP. The vast majority of traffic leaves via one Internet leg. That production pfSense has DNS Resolver and Forwarder disabled.
Not sure if this is related, but when I ran the packet capture, I saw a bunch of packets to my DNS server like this (assigned via dhcp) even when I wasn't trying DNS lookups. Turned off pfblocker but they were still happening.
12:54:42.283726 IP (tos 0x0, ttl 64, id 30711, offset 0, flags [none], proto UDP (17), length 59) 10.1.0.38.11885 > 10.1.0.185.53: [bad udp cksum 0x1519 -> 0xd84e!] 18342+% [1au] DNSKEY? co. ar: . OPT UDPsize=512 DO (31)
12:54:42.847699 IP (tos 0x0, ttl 64, id 17220, offset 0, flags [none], proto UDP (17), length 59) 10.1.0.38.41917 > 10.1.0.185.53: [bad udp cksum 0x1519 -> 0x66db!] 17353+% [1au] DNSKEY? co. ar: . OPT UDPsize=512 DO (31)
12:54:43.222827 IP (tos 0x0, ttl 64, id 23923, offset 0, flags [none], proto UDP (17), length 59) 10.1.0.38.15248 > 10.1.0.185.53: [bad udp cksum 0x1519 -> 0x7429!] 40616+% [1au] DNSKEY? co. ar: . OPT UDPsize=512 DO (31)
On my external firewall, here was the capture output running the "dig t.co @127.0.0.1 +trace +nodnssec" command:
12:54:41.674000 IP (tos 0x0, ttl 64, id 46512, offset 0, flags [none], proto UDP (17), length 73) 10.1.0.38.9606 > 198.97.190.53.53: [udp sum ok] 8236 [1au] A? t.co. ar: . OPT UDPsize=4096 (45)
12:54:41.699642 IP (tos 0x0, ttl 51, id 29929, offset 0, flags [none], proto UDP (17), length 439) 198.97.190.53.53 > 10.1.0.38.9606: [udp sum ok] 8236- q: A? t.co. 0/6/13 ns: co. NS ns1.cctld.co., co. NS ns2.cctld.co., co. NS ns3.cctld.co., co. NS ns4.cctld.co., co. NS ns5.cctld.co., co. NS ns6.cctld.co. ar: ns1.cctld.co. A 37.209.192.14, ns2.cctld.co. A 37.209.194.14, ns3.cctld.co. A 37.209.196.14, ns4.cctld.co. A 156.154.103.25, ns5.cctld.co. A 156.154.104.25, ns6.cctld.co. A 156.154.105.25, ns1.cctld.co. AAAA 2001:dcd:1::14, ns2.cctld.co. AAAA 2001:dcd:2::14, ns3.cctld.co. AAAA 2001:dcd:3::14, ns4.cctld.co. AAAA 2610:a1:1010::21, ns5.cctld.co. AAAA 2610:a1:1011::21, ns6.cctld.co. AAAA 2610:a1:1012::21, . OPT UDPsize=1232 (411)
12:55:43.341876 IP (tos 0x0, ttl 64, id 25555, offset 0, flags [none], proto UDP (17), length 73) 10.1.0.38.7725 > 37.209.194.14.53: [udp sum ok] 2571 [1au] A? t.co. ar: . OPT UDPsize=4096 (45)
12:55:43.357935 IP (tos 0x0, ttl 51, id 23788, offset 0, flags [none], proto UDP (17), length 246) 37.209.194.14.53 > 10.1.0.38.7725: [udp sum ok] 2571- q: A? t.co. 0/8/1 ns: t.co. NS b.r06.twtrdns.net., t.co. NS a.r06.twtrdns.net., t.co. NS d.r06.twtrdns.net., t.co. NS d01-02.ns.twtrdns.net., t.co. NS ns4.p34.dynect.net., t.co. NS d01-01.ns.twtrdns.net., t.co. NS c.r06.twtrdns.net., t.co. NS ns3.p34.dynect.net. ar: . OPT UDPsize=1232 (218)
12:55:45.021631 IP (tos 0x0, ttl 64, id 40569, offset 0, flags [none], proto UDP (17), length 73) 10.1.0.38.30222 > 108.59.164.34.53: [udp sum ok] 27645 [1au] A? t.co. ar: . OPT UDPsize=4096 (45)
12:55:45.049542 IP (tos 0x0, ttl 51, id 63418, offset 0, flags [none], proto UDP (17), length 374) 108.59.164.34.53 > 10.1.0.38.30222: [udp sum ok] 27645*- q: A? t.co. 4/10/1 t.co. A 104.244.42.69, t.co. A 104.244.42.197, t.co. A 104.244.42.5, t.co. A 104.244.42.133 ns: t.co. NS d01-02.ns.twtrdns.net., t.co. NS a.r06.twtrdns.net., t.co. NS b.r06.twtrdns.net., t.co. NS d.r06.twtrdns.net., t.co. NS c.r06.twtrdns.net., t.co. NS ns1.p34.dynect.net., t.co. NS ns2.p34.dynect.net., t.co. NS ns3.p34.dynect.net., t.co. NS ns4.p34.dynect.net., t.co. NS d01-01.ns.twtrdns.net. ar: . OPT UDPsize=1232 (346)
12:56:00.383798 IP (tos 0x0, ttl 64, id 31475, offset 0, flags [none], proto UDP (17), length 76) 10.1.0.38.52598 > 202.12.27.33.53: [udp sum ok] 48489 [1au] A? nbc.com. ar: . OPT UDPsize=4096 (48)
The large gap in time matches the previous video I posted.
-
@clesports said in Certain domains failing in DNS Resolver/unbound:
"172.253.8.134 DNS reply size limit is at least 1373"
Yeah that can be problematic - for large records, say dnssec.. Especially if you do not allow switching to tcp, or allow for tcp on 53.
You seem to have a mismatch going on, see here
OPT UDPsize=4096
But you're not allowing for that size upstream of pfsense.. Be it your network, or your isp network.. something upstream is limiting your size to 1373.
You could adjust this setting
And make sure you're allowing for tcp 53 upstream so you can fall back to tcp when something is too large for your UDP limits.
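To make the mismatch concrete, here is a minimal Python sketch that parses the two TXT strings returned by the reply size test and checks whether an advertised UDP size (such as the 4096 seen in the capture) is larger than the measured limit. The function names are made up for illustration and are not part of pfSense, unbound, or the DNS-OARC service.

```python
import re

def parse_reply_size_test(txt_records):
    """Pull the two numbers out of the reply size test's TXT strings."""
    found = {}
    for record in txt_records:
        match = re.search(r"sent EDNS buffer size (\d+)", record)
        if match:
            found["edns_buffer_size"] = int(match.group(1))
        match = re.search(r"DNS reply size limit is at least (\d+)", record)
        if match:
            found["reply_size_limit"] = int(match.group(1))
    return found

def advertised_exceeds_limit(advertised_udp_size, test_result):
    """True when the size we advertise is bigger than what the path delivered."""
    return advertised_udp_size > test_result["reply_size_limit"]

# TXT strings as posted earlier in this thread.
records = [
    "172.253.8.134 sent EDNS buffer size 1400",
    "172.253.8.134 DNS reply size limit is at least 1373",
]
result = parse_reply_size_test(records)
print(result)
print(advertised_exceeds_limit(4096, result))
```

The test only reports a lower bound ("at least"), so this comparison is a hint that something is off, not proof of where the limit is imposed.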
-
@johnpoz I just checked the unbound logs and earlier today I was getting a bunch of messages like this
[42819:3] info: failed to prime trust anchor -- could not fetch DNSKEY rrset . DNSKEY IN
Plus I saw these in captures I was running from the lab firewall I've been testing with:
15:27:16.282653 IP (tos 0x0, ttl 128, id 9169, offset 0, flags [DF], proto UDP (17), length 60) 10.1.0.185.53 > 10.1.0.38.50941: [udp sum ok] 7159 ServFail q: DNSKEY? co. 0/0/1 ar: . OPT UDPsize=4000 DO (32)
15:27:16.282658 IP (tos 0x0, ttl 128, id 9170, offset 0, flags [DF], proto UDP (17), length 60) 10.1.0.185.53 > 10.1.0.38.21142: [udp sum ok] 13473 ServFail q: DNSKEY? co. 0/0/1 ar: . OPT UDPsize=4000 DO (32)
15:27:16.282705 IP (tos 0x0, ttl 64, id 16374, offset 0, flags [none], proto UDP (17), length 60) 10.1.0.38.15283 > 10.1.0.185.53: [bad udp cksum 0x151a -> 0x53af!] 11278+% [1au] DNSKEY? co. ar: . OPT UDPsize=4096 DO (32)
15:27:16.282706 IP (tos 0x0, ttl 64, id 25421, offset 0, flags [none], proto UDP (17), length 60) 10.1.0.38.56556 > 10.1.0.185.53: [bad udp cksum 0x151a -> 0xcf88!] 3835+% [1au] DNSKEY? co. ar: . OPT UDPsize=4096 DO (32)
15:27:16.282768 IP (tos 0x0, ttl 128, id 9171, offset 0, flags [DF], proto UDP (17), length 60) 10.1.0.185.53 > 10.1.0.38.24595: [udp sum ok] 52395 ServFail q: DNSKEY? co. 0/0/1 ar: . OPT UDPsize=4000 DO (32)
15:27:16.282804 IP (tos 0x0, ttl 64, id 57620, offset 0, flags [none], proto UDP (17), length 60) 10.1.0.38.34577 > 10.1.0.185.53: [bad udp cksum 0x151a -> 0xbc52!] 30732+% [1au] DNSKEY? co. ar: . OPT UDPsize=4096 DO (32)
So, could all the ServFail responses be because of my internal DNS servers? I opened up TCP 53 on my external FW and it didn't seem to help, but I'm thinking this will all go away when I set a static IP and configure external DNS servers.
Also, I modified the EDNS Buffer size to 4096 like you said, but still ran into the delays.
-
@clesports said in Certain domains failing in DNS Resolver/unbound:
I modified the EDNS Buffer size to 4096 like you said, but still ran into the delays.
That is not what I said - that is the default ;)
You could adjust this setting
I meant for you to lower it to something below what you're limited to.
You seem to be limited..
"172.253.8.134 sent EDNS buffer size 1400" "172.253.8.134 DNS reply size limit is at least 1373"
So you need to adjust that to be in line with what you can do, which seems to be lower than 1400..
Try changing it to one of these
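As a rough illustration of "lower it to something below what you're limited to", the Python sketch below picks the largest buffer size that still fits under a measured limit. The candidate list is an assumption standing in for whatever values the EDNS Buffer Size setting actually offers on your version, and the function name is hypothetical.

```python
# Assumed candidate values, for illustration only; check the actual
# choices offered by the EDNS Buffer Size setting on your install.
CANDIDATE_SIZES = [512, 1220, 1232, 1432, 1480, 4096]

def pick_edns_buffer_size(reply_size_limit, candidates=CANDIDATE_SIZES):
    """Return the largest candidate that does not exceed the measured limit."""
    fitting = [size for size in candidates if size <= reply_size_limit]
    if not fitting:
        raise ValueError("no candidate fits under the measured limit")
    return max(fitting)

# 1373 is the "reply size limit is at least" value from the test above.
print(pick_edns_buffer_size(1373))
```

With the 1373 limit reported in this thread and the assumed list, the sketch lands on 1232.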
-
@johnpoz I'm sorry, I misunderstood what you were saying about the EDNS buffer size. I set it to 1232 and ran the reply size test and it returned the same information (buffer size 1400, size limit at least 1373)
The more I test, the more I'm starting to think the problem might be stemming from my Domain Controllers which are my DNS servers internally.
May 23 10:58:21 unbound 62587 [62587:2] info: generate keytag query _ta-4f66. NULL IN
--Packet capture at that time--
10:58:21.944462 00:1a:8c:5c:ca:05 > d4:ae:52:a2:1d:3c, ethertype IPv4 (0x0800), length 79: (tos 0x0, ttl 64, id 4903, offset 0, flags [none], proto UDP (17), length 65) 10.1.0.38.5393 > 10.1.0.186.53: [bad udp cksum 0x1520 -> 0x6fb2!] 9469+ [1au] NULL? _ta-4f66. ar: . OPT UDPsize=1232 DO (37)
10:58:21.944791 d4:ae:52:a2:1d:3c > 00:1a:8c:5c:ca:05, ethertype IPv4 (0x0800), length 79: (tos 0x0, ttl 128, id 17252, offset 0, flags [DF], proto UDP (17), length 65) 10.1.0.186.53 > 10.1.0.38.5393: [udp sum ok] 9469 ServFail q: NULL? _ta-4f66. 0/0/1 ar: . OPT UDPsize=4000 DO (37)
10:58:22.138338 00:1a:8c:5c:ca:05 > d4:ae:52:a2:1d:3c, ethertype IPv4 (0x0800), length 79: (tos 0x0, ttl 64, id 6195, offset 0, flags [none], proto UDP (17), length 65) 10.1.0.38.13070 > 10.1.0.186.53: [bad udp cksum 0x1520 -> 0x8684!] 61469+% [1au] NULL? _ta-4f66. ar: . OPT UDPsize=1232 DO (37)
10:58:22.138686 d4:ae:52:a2:1d:3c > 00:1a:8c:5c:ca:05, ethertype IPv4 (0x0800), length 79: (tos 0x0, ttl 128, id 17253, offset 0, flags [DF], proto UDP (17), length 65) 10.1.0.186.53 > 10.1.0.38.13070: [udp sum ok] 61469 ServFail q: NULL? _ta-4f66. 0/0/1 ar: . OPT UDPsize=4000 DO (37)
The long delays/timeouts still happen, and during the 50-some-odd seconds of deadtime waiting for the 'dig trace' to complete, there are a TON of DNSKEY packets being sent that get ServFail responses 8-13 seconds later:
10:58:55.782487 00:1a:8c:5c:ca:05 > d4:ae:52:a2:1d:3c, ethertype IPv4 (0x0800), length 73: (tos 0x0, ttl 64, id 3039, offset 0, flags [none], proto UDP (17), length 59) 10.1.0.38.61614 > 10.1.0.186.53: [bad udp cksum 0x151a -> 0x9688!] 63270+% [1au] DNSKEY? co. ar: . OPT UDPsize=1232 DO (31)
10:58:55.891835 00:1a:8c:5c:ca:05 > d4:ae:52:a2:1d:3c, ethertype IPv4 (0x0800), length 73: (tos 0x0, ttl 64, id 37474, offset 0, flags [none], proto UDP (17), length 59) 10.1.0.38.47573 > 10.1.0.186.53: [bad udp cksum 0x151a -> 0x4032!] 33878+% [1au] DNSKEY? co. ar: . OPT UDPsize=1232 DO (31)
10:58:56.101717 00:1a:8c:5c:ca:05 > d4:ae:52:a2:1d:3c, ethertype IPv4 (0x0800), length 73: (tos 0x0, ttl 64, id 50977, offset 0, flags [none], proto UDP (17), length 59) 10.1.0.38.13093 > 10.1.0.186.53: [bad udp cksum 0x151a -> 0xcab4!] 32900+% [1au] DNSKEY? co. ar: . OPT UDPsize=1232 DO (31)
10:58:56.319455 00:1a:8c:5c:ca:05 > d4:ae:52:a2:1d:3c, ethertype IPv4 (0x0800), length 73: (tos 0x0, ttl 64, id 2036, offset 0, flags [none], proto UDP (17), length 59) 10.1.0.38.31293 > 10.1.0.186.53: [bad udp cksum 0x151a -> 0x5601!] 44575+% [1au] DNSKEY? co. ar: . OPT UDPsize=1232 DO (31)
10:58:56.729863 00:1a:8c:5c:ca:05 > d4:ae:52:a2:1d:3c, ethertype IPv4 (0x0800), length 73: (tos 0x0, ttl 64, id 44968, offset 0, flags [none], proto UDP (17), length 59) 10.1.0.38.42776 > 10.1.0.186.53: [bad udp cksum 0x151a -> 0xaa95!] 11440+% [1au] DNSKEY? co. ar: . OPT UDPsize=1232 DO (31)
10:58:57.079466 00:1a:8c:5c:ca:05 > c8:1f:66:bd:c7:61, ethertype IPv4 (0x0800), length 73: (tos 0x0, ttl 64, id 19060, offset 0, flags [none], proto UDP (17), length 59) 10.1.0.38.62855 > 10.1.0.185.53: [bad udp cksum 0x1519 -> 0xf5dd!] 37625+% [1au] DNSKEY? co. ar: . OPT UDPsize=1232 DO (31)
10:58:57.164326 00:1a:8c:5c:ca:05 > c8:1f:66:bd:c7:61, ethertype IPv4 (0x0800), length 73: (tos 0x0, ttl 64, id 62098, offset 0, flags [none], proto UDP (17), length 59) 10.1.0.38.36867 > 10.1.0.185.53: [bad udp cksum 0x1519 -> 0x6f74!] 32487+% [1au] DNSKEY? co. ar: . OPT UDPsize=1232 DO (31)
10:59:05.108434 d4:ae:52:a2:1d:3c > 00:1a:8c:5c:ca:05, ethertype IPv4 (0x0800), length 73: (tos 0x0, ttl 128, id 17326, offset 0, flags [DF], proto UDP (17), length 59) 10.1.0.186.53 > 10.1.0.38.47573: [udp sum ok] 33878 ServFail q: DNSKEY? co. 0/0/1 ar: . OPT UDPsize=4000 DO (31)
10:59:05.108503 d4:ae:52:a2:1d:3c > 00:1a:8c:5c:ca:05, ethertype IPv4 (0x0800), length 73: (tos 0x0, ttl 128, id 17325, offset 0, flags [DF], proto UDP (17), length 59) 10.1.0.186.53 > 10.1.0.38.61614: [udp sum ok] 63270 ServFail q: DNSKEY? co. 0/0/1 ar: . OPT UDPsize=4000 DO (31)
10:59:05.108504 d4:ae:52:a2:1d:3c > 00:1a:8c:5c:ca:05, ethertype IPv4 (0x0800), length 73: (tos 0x0, ttl 128, id 17327, offset 0, flags [DF], proto UDP (17), length 59) 10.1.0.186.53 > 10.1.0.38.13093: [udp sum ok] 32900 ServFail q: DNSKEY? co. 0/0/1 ar: . OPT UDPsize=4000 DO (31)
10:59:05.108562 d4:ae:52:a2:1d:3c > 00:1a:8c:5c:ca:05, ethertype IPv4 (0x0800), length 73: (tos 0x0, ttl 128, id 17328, offset 0, flags [DF], proto UDP (17), length 59) 10.1.0.186.53 > 10.1.0.38.31293: [udp sum ok] 44575 ServFail q: DNSKEY? co. 0/0/1 ar: . OPT UDPsize=4000 DO (31)
10:59:05.108563 d4:ae:52:a2:1d:3c > 00:1a:8c:5c:ca:05, ethertype IPv4 (0x0800), length 73: (tos 0x0, ttl 128, id 17329, offset 0, flags [DF], proto UDP (17), length 59) 10.1.0.186.53 > 10.1.0.38.42776: [udp sum ok] 11440 ServFail q: DNSKEY? co. 0/0/1 ar: . OPT UDPsize=4000 DO (31)
10:59:05.108564 d4:ae:52:a2:1d:3c > 00:1a:8c:5c:ca:05, ethertype IPv4 (0x0800), length 73: (tos 0x0, ttl 128, id 17330, offset 0, flags [DF], proto UDP (17), length 59)
10:59:08.475907 c8:1f:66:bd:c7:61 > 00:1a:8c:5c:ca:05, ethertype IPv4 (0x0800), length 73: (tos 0x0, ttl 128, id 11056, offset 0, flags [none], proto UDP (17), length 59) 10.1.0.185.53 > 10.1.0.38.36867: [udp sum ok] 32487 ServFail q: DNSKEY? co. 0/0/1 ar: . OPT UDPsize=4000 DO (31)
10:59:08.475937 c8:1f:66:bd:c7:61 > 00:1a:8c:5c:ca:05, ethertype IPv4 (0x0800), length 73: (tos 0x0, ttl 128, id 11057, offset 0, flags [none], proto UDP (17), length 59) 10.1.0.185.53 > 10.1.0.38.62855: [udp sum ok] 37625 ServFail q: DNSKEY? co. 0/0/1 ar: . OPT UDPsize=4000 DO (31)
Maybe there's a setting on the Domain Controllers to allow DNSSEC or something similar. No TCP session gets opened either, it sticks to UDP.
If I change the configured DNS servers in my lab FW to 1.1.1.1 or 8.8.8.8 and disable the ones applied via DHCP, it seems to work (as of right now), and I see TCP DNS packets too for the DNSKEY part.
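For anyone wanting to put numbers on the query-to-ServFail delay, here is a small Python sketch that pairs each DNSKEY query with its ServFail response by client port and query ID, then reports the gap. It assumes one tcpdump record per line in the verbose format shown in this post, ignores captures that cross midnight, and uses four records copied from the capture above as sample input; the function and variable names are made up for illustration.

```python
import re

TS = re.compile(r"^(\d{2}):(\d{2}):(\d{2})\.(\d{6})")
FLOW = re.compile(r"(\d{1,3}(?:\.\d{1,3}){3})\.(\d+) > (\d{1,3}(?:\.\d{1,3}){3})\.(\d+):")
QID = re.compile(r"\] (\d+)")  # first number after the checksum bracket

def _micros(line):
    """Timestamp at the start of the record, as microseconds since midnight."""
    h, m, s, us = (int(x) for x in TS.match(line).groups())
    return ((h * 60 + m) * 60 + s) * 1_000_000 + us

def servfail_delays(capture_text):
    """Map query ID -> seconds between the query and its ServFail response."""
    pending, delays = {}, {}
    for line in capture_text.splitlines():
        line = line.strip()
        flow, qid = FLOW.search(line), QID.search(line)
        if not (TS.match(line) and flow and qid):
            continue  # skip blank or truncated records
        _, src_port, _, dst_port = flow.groups()
        query_id = int(qid.group(1))
        if dst_port == "53":  # query leaving the client
            pending[(src_port, query_id)] = _micros(line)
        elif src_port == "53" and "ServFail" in line:
            sent = pending.pop((dst_port, query_id), None)
            if sent is not None:
                delays[query_id] = (_micros(line) - sent) / 1_000_000
    return delays

SAMPLE = """
10:58:55.891835 00:1a:8c:5c:ca:05 > d4:ae:52:a2:1d:3c, ethertype IPv4 (0x0800), length 73: (tos 0x0, ttl 64, id 37474, offset 0, flags [none], proto UDP (17), length 59) 10.1.0.38.47573 > 10.1.0.186.53: [bad udp cksum 0x151a -> 0x4032!] 33878+% [1au] DNSKEY? co. ar: . OPT UDPsize=1232 DO (31)
10:58:57.164326 00:1a:8c:5c:ca:05 > c8:1f:66:bd:c7:61, ethertype IPv4 (0x0800), length 73: (tos 0x0, ttl 64, id 62098, offset 0, flags [none], proto UDP (17), length 59) 10.1.0.38.36867 > 10.1.0.185.53: [bad udp cksum 0x1519 -> 0x6f74!] 32487+% [1au] DNSKEY? co. ar: . OPT UDPsize=1232 DO (31)
10:59:05.108434 d4:ae:52:a2:1d:3c > 00:1a:8c:5c:ca:05, ethertype IPv4 (0x0800), length 73: (tos 0x0, ttl 128, id 17326, offset 0, flags [DF], proto UDP (17), length 59) 10.1.0.186.53 > 10.1.0.38.47573: [udp sum ok] 33878 ServFail q: DNSKEY? co. 0/0/1 ar: . OPT UDPsize=4000 DO (31)
10:59:08.475907 c8:1f:66:bd:c7:61 > 00:1a:8c:5c:ca:05, ethertype IPv4 (0x0800), length 73: (tos 0x0, ttl 128, id 11056, offset 0, flags [none], proto UDP (17), length 59) 10.1.0.185.53 > 10.1.0.38.36867: [udp sum ok] 32487 ServFail q: DNSKEY? co. 0/0/1 ar: . OPT UDPsize=4000 DO (31)
"""

for query_id, seconds in servfail_delays(SAMPLE).items():
    print(query_id, seconds)
```

On these four records the gaps come out to roughly 9.2 and 11.3 seconds, which lines up with the 8-13 second range mentioned above.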
-
@clesports you made no mention of pointing pfsense to any dns.. Out of the box pfsense RESOLVES.. that's the whole point of the +trace.
If you're forwarding to something then no, you shouldn't have dnssec checked. Where you forward to either does dnssec for you, i.e. it's a resolver, or it doesn't.
If you're not going to actually resolve with pfsense, then dnssec shouldn't be checked.
-
@johnpoz said in Certain domains failing in DNS Resolver/unbound:
@clesports you made no mention of pointing pfsense to any dns.. Out of the box pfsense RESOLVES.. that's the whole point of the +trace.
If you're forwarding to something then no, you shouldn't have dnssec checked. Where you forward to either does dnssec for you, i.e. it's a resolver, or it doesn't.
If you're not going to actually resolve with pfsense, then dnssec shouldn't be checked.
Ahh, that makes a big difference. In my initial post I did say I was doing DNS forwarding at the bottom of the post, but maybe it was missed:
I do have "Enable Forwarding Mode" checked on the DNS Resolver page under "DNS Query Forwarding". If it's disabled, no DNS lookups work at all. Even with that enabled, Twitter's t.co and others still don't work
I'll disable DNSSEC. I assume unbound in DNS Forwarding mode would still cache some results, and that's all that I'm looking to do since our Domain Controllers handle DNS records internally. I'm looking to point the DCs external requests to pfSense to take advantage of pfBlockerNG features.
Thank you for all your help John, I really do appreciate your knowledge and experience.
-
@clesports maybe I missed it - would have never had you do a trace if you were forwarding. A trace is only helpful in troubleshooting a resolver.
If you're forwarding - if you don't get an answer it's the forwarder's problem, or a communication issue with the forwarder. What a trace shows is how a resolver gets the answer.
dnssec is nothing but a problem if you're forwarding - it shouldn't be checked. Either where you forward to is doing dnssec already, or if not, you asking for it doesn't do anything anyway other than waste extra queries..
Make sure if you have further questions about unbound that you state at the beginning that you're in forwarder mode - and where you're actually forwarding to.. I looked over your first post again, and yeah you did say that. But it's buried at the end after posts showing your attempt at lookup.
I assumed, my bad, since Resolver/Unbound is in the title, that you were using the default, which is resolver..
If it's working for you now - all good, it just took us longer to get to the problem.. Sorry about that.. if I had realized you were forwarding, we probably could have gotten this straightened out in a couple of posts..
-
@johnpoz No problem, all water under the bridge. Maybe this lengthy thread will be of help to someone in the future in regular Resolver mode.
I should have been clearer in my post too. I knew the DNS Forwarder was dnsmasq and wanted to make sure people knew I was using unbound instead. Next time I'll state upfront which mode I'm running in.
I learned more about unbound and some dig queries along the way, which is always helpful. Thanks again!