Insanely weird issue with DNS resolution to www.cdc.gov
-
Hi All -- having a ridiculously strange issue with DNS resolution related to www.cdc.gov.
What's happening is that DNS queries for www.cdc.gov from network clients are resulting in a SERVFAIL response. Whenever querying the CloudFlare DNS servers directly using dig, the results are okay.
; <<>> DiG 9.10.6 <<>> @1.1.1.1 www.cdc.gov
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 62488
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;www.cdc.gov. IN A
;; ANSWER SECTION:
www.cdc.gov. 248 IN CNAME www.akam.cdc.gov.
www.akam.cdc.gov. 1 IN A 104.86.21.106
;; Query time: 7 msec
;; SERVER: 1.1.1.1#53(1.1.1.1)
;; WHEN: Fri Dec 18 14:36:34 PST 2020
;; MSG SIZE rcvd: 79
From the router itself, resolution also appears to be okay in the diagnostics -> DNS lookup pane:
But when clients query directly against the router, they get SERVFAIL:
; <<>> DiG 9.10.6 <<>> @10.10.0.1 www.cdc.gov
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 55373
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;www.cdc.gov. IN A
;; Query time: 588 msec
;; SERVER: 10.10.0.1#53(10.10.0.1)
;; WHEN: Fri Dec 18 14:38:34 PST 2020
;; MSG SIZE rcvd: 40
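For anyone comparing several resolvers side by side, the difference between the two outputs above comes down to the status field in the header line. A minimal Python sketch that pulls the status and header flags out of saved dig text might look like the following. The `summarize` helper and both regexes are illustrative assumptions, not part of dig or pfSense, and the sample strings are just the header lines from the two queries above.

```python
import re

# Matches the status in the header line dig prints for every response, e.g.
# ";; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 55373"
STATUS_RE = re.compile(r"status:\s*([A-Z]+)")
# Matches the header flags line, e.g. ";; flags: qr rd ra; QUERY: 1, ..."
FLAGS_RE = re.compile(r";; flags:\s*([a-z ]*);")


def summarize(dig_output: str) -> dict:
    """Extract the response status and header flags from raw dig text."""
    status = STATUS_RE.search(dig_output)
    flags = FLAGS_RE.search(dig_output)
    return {
        "status": status.group(1) if status else None,
        "flags": flags.group(1).split() if flags else [],
    }


# Header lines copied from the two queries in this post.
direct = """;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 62488
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1"""
via_router = """;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 55373
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1"""

for label, text in (("1.1.1.1", direct), ("10.10.0.1", via_router)):
    print(label, summarize(text))
```

The flags list is worth keeping next to the status: an `ad` flag in a reply means the answering resolver validated the data with DNSSEC, which helps when comparing a forwarder against a resolver.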
I'm currently not noticing this with any other website except the CDC, but I do feel like I've seen this behavior a handful of other times with random websites.
Are there additional debug logs I could gather from the router to identify if this is a bug in Unbound or something else going on? A colleague with the same device (SG-3100), software version, DNS servers (CloudFlare), and different ISP was able to reproduce.
Thanks!
- Mike
-
I can not duplicate it here..
$ dig @192.168.9.253 www.cdc.gov
; <<>> DiG 9.16.9 <<>> @192.168.9.253 www.cdc.gov
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 44762
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;www.cdc.gov. IN A
;; ANSWER SECTION:
www.cdc.gov. 2106 IN CNAME www.akam.cdc.gov.
www.akam.cdc.gov. 2106 IN A 23.66.90.90
;; Query time: 0 msec
;; SERVER: 192.168.9.253#53(192.168.9.253)
;; WHEN: Fri Dec 18 18:01:24 Central Standard Time 2020
;; MSG SIZE rcvd: 79
What are you doing with cloudflare - forwarding, tls? If unbound resolved it fine, then your client asking for it would get cache anyway.. So your test doesn't make a lot of sense.
-
@johnpoz Forwarding, yes, and I do have TLS enabled, but I can reproduce with it off as well. What would be next steps in trying to figure out what's going on here?
-
If you forward, you are at the mercy of whether the place you forward to wants to answer or not..
Sniff the traffic to where you're forwarding it, cloudflare - without using tls.. Does the query actually go out, and what does it send back for an answer? Or does it not answer at all, etc.
I have no issues doing a directed query to 1.1.1.1 and getting an answer.. If you find from doing your sniff or turning on logging of queries that unbound is not sending that on to 1.1.1.1 you will need to figure out why it's failing on unbound. Servfail could be lots of things - it's a pretty generic failure.. Basically something went wrong.. It's not specific like nxdomain or refused, etc.
-
@mboylan Hi Mike, I have the exact same issue (cdc.gov, CloudFlare, etc). Were you able to figure out what is going on?
Thanks
-
@marshmallow No. Unfortunately not. I did a packet capture on the WAN interface and can see the response coming back, but then capturing between the client and the router shows a SERVFAIL. Something is going amiss in passing the response back to the client, and I can’t figure out what. I’ve tried adjusting some of the cache settings, without any luck. Given this is reproducible by several people (at least 3) at this point, I hope Netgate can help figure out what’s going on. The CDC website is kind of important right now. I tried switching to Google for DNS and am getting the same result.
-
Post up this pcap of what they sent back and what you asked for.
It's not a pfsense thing - I resolve it just fine..
-
I have an additional data point for this discussion.
Let me start by saying I am not a DNS guru like @johnpoz, so my DNS troubleshooting skills are more limited.
I have a Windows 2012 R2 DNS server in my network. It's actually part of my Active Directory. I currently have the Windows DNS server resolving via the root servers. I have unbound configured in default mode on my pfSense firewall, so it is the default DNS server for pfSense itself and it also resolves via the roots.
unbound has a domain override for my private AD domain, so it asks my AD DNS server for any local stuff. All my network clients point to the AD DNS and use the AD DHCP server (which hands out the AD DNS IP as the DNS server for my LAN).
On pfSense, at a shell prompt using dig and the local unbound server, both "cdc.gov" and "www.cdc.gov" resolve just fine. Interestingly, "www.cdc.gov" is a CNAME that points to "www.akam.cdc.gov". The IP for that host is in a totally different IP block than "cdc.gov". I did not go searching to verify this, but my guess is the CDC is using the Akamai CDN for their web site hosting. That would make sense for handling load.
But on Windows DNS, "www.cdc.gov" will not resolve. It produces a SERVFAIL type of error. I tried turning off DNSSEC, clearing the cache, restarting the server (and even uttering some magic spells ... ), and it just would not work when resolving via the root servers. However, when I turned off resolving on the Microsoft side and just told my AD DNS to forward to unbound on pfSense, everything worked. So in my case, it appears the Microsoft DNS server does not like something about the info returned for "www.cdc.gov". At least when it resolves it itself. It seems happy to serve up the reply to requesting clients when it gets it via forwarding to unbound.
I've also had other sporadic weirdness in the past with resolving using the Microsoft DNS server and DNSSEC (at least in the 2012 R2 variant I have). So I am turning off resolving on the Microsoft side and just switching over to let it forward to unbound on pfSense. I don't really "need" the AD setup, so I may unwind it at some point. The only real reason I've kept it around is the DFS feature supporting a shared data setup in my LAN.
-
Not being a DNS expert myself, I wonder if this can shed some light on the issue:
https://community.cloudflare.com/t/cdc-gov-not-resolving/228798/3
-
@marshmallow said in Insanely weird issue with DNS resolution to www.cdc.gov:
Not being a DNS expert myself, I wonder if this can shed some light on the issue:
https://community.cloudflare.com/t/cdc-gov-not-resolving/228798/3
Thanks for the link with the possible answer to the riddle. Strange that unbound does not seem particularly bothered by the DNS reply, but other DNS resolvers don't seem to like it. Based on the link you shared, it seems the root issue is with the CNAME record in their DNS and it's not a problem with anything on pfSense.
-
@bmeeks said in Insanely weird issue with DNS resolution to www.cdc.gov:
it's not a problem with anything on pfSense.
Not anything to do with unbound.. Or pfsense
There’s a subset of nameservers for akam.cdc.gov that doesn’t return keys https://dnsviz.net/d/www.cdc.gov/dnssec/ so if you’re unlucky it’s going to fail. I added another workaround so it should be better.
So let's state this once again - when you forward you are at the mercy of where you forward..
This does not have, nor ever had, anything to do with pfsense or unbound.. But is a cloudflare problem.. or to be honest a cdc problem with their dnssec on some of their servers. But when you forward to somewhere - that becomes their problem.
If you can not resolve a cname that something points to - be it that you're asking for dnssec or not - then sure you can have problems.. If they have something wrong with their dnssec - you quite often can have more problems. This seems to be a group of NS that are part of that whole process that are having issues. If you try and talk to those - then you have problems, if those have issues talking to who you forward to, you could have problems.
This is why it's always better to resolve.. Since you can trace such problems yourself, vs just luck of the draw on whether who you forwarded to is having issues. Which could just be a connectivity issue to some NS in the chain when they are resolving, etc.
If you look to where they linked to
https://dnsviz.net/d/www.cdc.gov/dnssec/
You can see that some of the NS are having issues.. Not all of them - so it's going to be hit or miss.. I have never seen the problem, prob because I'm not talking to those specific NS. They have multiples of them, etc.
Look at all the NS for the domain the cname points to
;; QUESTION SECTION:
;akam.cdc.gov. IN NS
;; ANSWER SECTION:
akam.cdc.gov. 86393 IN NS a8-67.akam.net.
akam.cdc.gov. 86393 IN NS a5-66.akam.net.
akam.cdc.gov. 86393 IN NS a9-64.akam.net.
akam.cdc.gov. 86393 IN NS a1-43.akam.net.
akam.cdc.gov. 86393 IN NS a2-64.akam.net.
akam.cdc.gov. 86393 IN NS a28-65.akam.net.
That is a huge CDN.. which depending on which region of the globe you're in - could even point to some other NSers.. etc.. If some of those have bad or old info - and those are the ones you're trying to talk to - then you could have issues, etc.
-
@bmeeks What doesn't make a lot of sense in my case though is that my clients are using the pfSense box as their DNS server. pfSense is forwarding off the query to CloudFlare, getting a response, and then somehow that response is not making it back to the clients. This seems different from your case where once you told your windows servers to forward to unbound, it started working. I'm already doing that, and I get SERVFAIL. I'm happy to escalate to CloudFlare, but seeing as I can query the host from the pfSense box itself, as well as directly against CloudFlare using dig from my clients (but NOT when forwarding through unbound), I'm hard pressed to believe it's a CloudFlare issue. :-/
Edit: I can post the packet captures later today.
-
@mboylan said in Insanely weird issue with DNS resolution to www.cdc.gov:
@bmeeks What doesn't make a lot of sense in my case though is that my clients are using the pfSense box as their DNS server. pfSense is forwarding off the query to CloudFlare, getting a response, and then somehow that response is not making it back to the clients. This seems different from your case where once you told your windows servers to forward to unbound, it started working. I'm already doing that, and I get SERVFAIL. I'm happy to escalate to CloudFlare, but seeing as I can query the host from the pfSense box itself, as well as directly against CloudFlare using dig from my clients (but NOT when forwarding through unbound), I'm hard pressed to believe it's a CloudFlare issue. :-/
Edit: I can post the packet captures later today.
I agree your issue does not make sense. Are you 100% positive those clients are actually using unbound on pfSense? As I posted, in my case letting the AD DNS server forward to unbound on pfSense solved the issue. And I have unbound on pfSense resolving, not forwarding. I think in your case you have it forwarding to Cloudflare if I recall correctly. But then you said on pfSense itself unbound can resolve "www.cdc.gov". I assume that is with the Cloudflare forwarding in place ??
-
Saw this thread last night and for kicks tried to go to www.cdc.gov - page would not load. Tried again this morning with a dig www.cdc.gov and it came back with SERVFAIL. This is using a Pi-hole / Unbound setup (i.e. clients talk to Pi-hole and Pi-hole forwards the DNS query to pfSense/Unbound if not cached, and Unbound then resolves if not already cached). Tried again this afternoon (a few hours ago) and now all is working fine (i.e. DNS resolves properly and page loads fine). I made no changes on my end in the meantime.
I think @johnpoz might be on to something - perhaps the related name servers aren't or weren't properly configured and that causes issues. I do have DNSSEC enabled as well on Unbound - could that have been what was failing?
-
Just look at
https://dnsviz.net/d/www.cdc.gov/dnssec/
They have quite a few problems going on.. It's not cloudflare's job to fix it.. It's the domain owner's job to make sure their dns works correctly and is valid.
I would contact the cdc webmaster and show him that above dnsviz link.. Tell him to fix his shit..
All kinds of stuff wrong..
net to edgekey.net: The following NS name(s) were found in the authoritative NS RRset, but not in the delegation NS RRset (i.e., in the net zone): a11-65.akam.net, ns1-2.akam.net, a9-65.akam.net, a3-65.akam.net
net to edgekey.net: The following NS name(s) were found in the delegation NS RRset (i.e., in the net zone), but not in the authoritative NS RRset: ns1-66.akam.net, ns4-66.akam.net, ns5-66.akam.net, ns7-65.akam.net
www.akam.cdc.gov/CNAME: The server returned CNAME for www.akam.cdc.gov, but records of other types exist at that name.
That it resolves sometimes at all is just luck to be honest ;)
They have issues way up the chain..
gov to cdc.gov: The following NS name(s) were found in the authoritative NS RRset, but not in the delegation NS RRset (i.e., in the gov zone): icdc-us-ns1.cdc.gov, icdc-us-ns3.cdc.gov, icdc-us-ns2.cdc.gov
gov to cdc.gov: The following NS name(s) were found in the delegation NS RRset (i.e., in the gov zone), but not in the authoritative NS RRset: auth00.ns.uu.net, auth100.ns.uu.net
So again it's all going to depend on which NSs you're talking to, and what info they have or don't have
Sometimes it will work, sometimes it won't.. the cdc.gov is who should get this fixed..
If a domain has issues with their dnssec - and you forward to somewhere that does dnssec like cloudflare - then whether your own dnssec setting is on or off isn't going to change anything. It should be OFF if you forward.. Where you forward either does dnssec or it doesn't.. There is no point in asking for dnssec when you forward. If you want dnssec when you forward, then pick a place to forward to that does dnssec. I have been over this countless times ;)
edit: Even asking cloudflare you get different responses.. Depending, I assume, on which NS of theirs you hit via anycast..
;www.cdc.gov. IN A
;; ANSWER SECTION:
www.cdc.gov. 78 IN CNAME www.akam.cdc.gov.
www.akam.cdc.gov. 3378 IN CNAME www.cdc.gov.edgekey.net.
www.cdc.gov.edgekey.net. 20544 IN CNAME e9313.dscb.akamaiedge.net.
e9313.dscb.akamaiedge.net. 20 IN A 23.222.138.25
;; Query time: 15 msec
;; SERVER: 1.1.1.1#53(1.1.1.1)
;; WHEN: Tue Dec 29 06:17:04 Central Standard Time 2020
;; MSG SIZE rcvd: 152
sec later
;www.cdc.gov. IN A
;; ANSWER SECTION:
www.cdc.gov. 76 IN CNAME www.akam.cdc.gov.
www.akam.cdc.gov. 19 IN A 23.222.138.25
;; Query time: 132 msec
;; SERVER: 1.1.1.1#53(1.1.1.1)
;; WHEN: Tue Dec 29 06:17:05 Central Standard Time 2020
;; MSG SIZE rcvd: 79
The cdc really should fix up their shit ;)
-
@johnpoz said in Insanely weird issue with DNS resolution to www.cdc.gov:
The cdc really should fix up their shit ;)
I’m experiencing this problem, also. When I disable DNSSEC the problem goes away and CDC.GOV loads.
Can anything else be done as a workaround, which wouldn’t have as broad a scope as toggling DNSSEC?
Thank you —
-
@timtrace said in Insanely weird issue with DNS resolution to www.cdc.gov:
Can anything else be done as a workaround
One way would be to do a domain override to say 9.9.9.10, which is quad9 that doesn't do dnssec.. So that shouldn't fail.. You do a domain override for cdc.gov to any NS that doesn't do dnssec..
Another option should be to set unbound not to do dnssec for that domain.. In the options box
server:
domain-insecure: "cdc.gov"
You would think they would have fixed their shit by now to be honest.. You might actually have to do it for domains the cnames point to if you don't do the domain override forwarding to a non dnssec ns..
But looks like they just have the 1 cname currently www.akam.cdc.gov, so cdc.gov as the unsecure domain should work.
Worst case is you add the other domains as insecure as well
www.akam.cdc.gov. 3378 IN CNAME www.cdc.gov.edgekey.net.
www.cdc.gov.edgekey.net. 20544 IN CNAME e9313.dscb.akamaiedge.net.
Whoever is in charge of their dns should really be fired..
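If the worst case does apply, the extra entries can be derived from the CNAME chain instead of being typed by hand. Here is a rough Python sketch of that idea. The `insecure_zones` helper is hypothetical, and it assumes the zone to exempt is just the last two labels of each name, which is a naive guess and not how zone cuts are really determined. Check the result before pasting it into the options box.

```python
def insecure_zones(cname_chain):
    """Return unbound `domain-insecure` lines for the names in a CNAME chain.

    Assumption: the zone to exempt is approximated by the last two labels of
    each name (e.g. "edgekey.net"). This heuristic is wrong for suffixes such
    as "co.uk" and ignores where the real zone cuts are.
    """
    zones = []
    for name in cname_chain:
        labels = name.rstrip(".").lower().split(".")
        zone = ".".join(labels[-2:])
        if zone not in zones:  # keep first-seen order, drop duplicates
            zones.append(zone)
    return ['domain-insecure: "%s"' % zone for zone in zones]


# Names taken from the longer CNAME chain seen earlier in this thread.
chain = [
    "www.cdc.gov.",
    "www.akam.cdc.gov.",
    "www.cdc.gov.edgekey.net.",
    "e9313.dscb.akamaiedge.net.",
]

print("server:")
for line in insecure_zones(chain):
    print(line)
```

Each extra `domain-insecure` entry turns off validation for that whole zone, so it is better to add only the ones that are actually failing.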
-
@johnpoz said in Insanely weird issue with DNS resolution to www.cdc.gov:
server:
domain-insecure: "cdc.gov"
Thanks, man! That worked perfectly.
-
@johnpoz Thanks! This option fixed the issue immediately.
-
@johnpoz said in Insanely weird issue with DNS resolution to www.cdc.gov:
Another option should be to set unbound not to do dnssec for that domain.. In the options box
server:
domain-insecure: "cdc.gov"
Thank you! Worked for me, too.