DNS suddenly broken [on some VLANs]
-
@johnpoz Just chiming in here because I'm observing the same issue, which began this Monday around 6PM eastern for me, on two physically disparate pfSense machines both with unbound configured to use only NordVPN interfaces for outgoing. Completely agree with your assessment that it would not make obvious sense for Nord to block only the root servers and not other DNS, although my workaround so far has been to put unbound into forwarding mode with the system DNS servers set to Google, Cloudflare, and Quad9. I have the resolution behavior set to use 127.0.0.1 and ignore remote, and unbound's outgoing interfaces are still set to only Nord. That's working just fine, and I have verified by examining the states that the queries are being routed through the VPN.
So, not sure what's going on. I tried switching unbound back to recursive mode today to see if it was a transient issue, but it went right back to failing (with SERVFAIL, the same as @wfx). I can also observe that if I just switch the outgoing interfaces from Nord to WAN, it works fine. Whatever it is, it definitely has to do with Nord, and yet they're also definitely not just blocking everything but their own DNS servers.
I also realize these forums are not for diagnosing VPN provider issues :) Just wanted to provide some corroborating evidence, but not sure if it's helpful at all.
Quick edit just to be comprehensive: The onset of this issue for me did not coincide with any config changes to pfSense, and unbound in recursive mode routed through Nord has worked for ~8 years prior, so I'll definitely be on the lookout for any information about what they may have changed.
-
@TheNarc well it's possible that the roots are blocking - but it's a weird block to send back servfail - if I was going to block someone from talking to me, I would block them from even opening the connection. I wouldn't let them talk to me and then send them back - sorry buddy can't look that up ;) ie servfail.
But maybe it's something they do internally and not at the edge - where they just say hey don't answer stuff for these IPs.. But you would think that would send back a refused..
But also are you sure they are not just redirecting dns - so you think you got an answer from google, but you really got an answer from their servers? Did you try a directed query to 1.2.3.4? If that answers - it's a smoking gun that dns is being redirected.. Because 1.2.3.4 doesn't answer dns.. There are a few other ways to check for redirection as well - but that is a very quick, easy test.
If that doesn't answer, it doesn't mean they are not redirecting; there are other ways to check for redirection. They might only be redirecting queries to specific dns servers, etc. vs all port 53..
You can do a query to an authoritative server and look for the aa flag in the response, or you can check the ttl. If you do not get back the full ttl when you talk to the authoritative server, that is another smoking gun that you were redirected and got something from cache. If you're talking to the authoritative ns, it will always send the full ttl. If you ask it again and get a lower ttl, then that was pulled from cache and you didn't get the answer from the authoritative server.
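For anyone who wants to script the aa-flag and ttl checks instead of eyeballing them, here is a minimal Python sketch. It only parses text that dig has already printed; it does not send any queries. The function names and the SAMPLE text are made up for illustration, and the regexes assume dig's usual output layout, which can vary between versions.

```python
import re

# Illustrative text in the shape dig prints (values are examples only).
SAMPLE = """\
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 10891
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1
; EDNS: version: 0, flags:; udp: 4096
;netgate.com. IN A
netgate.com. 60 IN A 199.60.103.4
netgate.com. 60 IN A 199.60.103.104
"""

def header_flags(dig_output):
    """Return the header flags, e.g. {'qr', 'rd', 'ra'}, or an empty set."""
    match = re.search(r"^;; flags:([^;]*);", dig_output, re.MULTILINE)
    return set(match.group(1).split()) if match else set()

def is_authoritative_answer(dig_output):
    """True when the response header carries the aa flag."""
    return "aa" in header_flags(dig_output)

def record_ttls(dig_output):
    """TTLs of every resource record line (lines starting with ';' are skipped)."""
    pattern = re.compile(r"^[^;\s]\S*[ \t]+(\d+)[ \t]+IN[ \t]+\S+", re.MULTILINE)
    return [int(ttl) for ttl in pattern.findall(dig_output)]

def ttl_counted_down(first, second):
    """True when the second response shows a lower TTL than the first,
    which suggests the answer came from a cache rather than the authoritative ns."""
    a, b = record_ttls(first), record_ttls(second)
    return bool(a) and bool(b) and max(b) < max(a)
```

Feed it the output of two back-to-back queries against the same supposedly authoritative server: no aa flag, or a ttl that counts down between the two, points at a cache answering in its place.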
-
@johnpoz Quick and easy tests ftw, sorry for neglecting to see that request earlier. It would appear that you're correct!
$ dig @1.2.3.4 netgate.com
; <<>> DiG 9.18.24-1-Debian <<>> @1.2.3.4 netgate.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 10891
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;netgate.com. IN A
;; ANSWER SECTION:
netgate.com. 60 IN A 199.60.103.4
netgate.com. 60 IN A 199.60.103.104
;; Query time: 109 msec
;; SERVER: 1.2.3.4#53(1.2.3.4) (UDP)
;; WHEN: Fri Mar 08 16:39:50 EST 2024
;; MSG SIZE rcvd: 72
That seems like pretty solid evidence of a "security enhancement" courtesy of Nord. Thank you for suggesting that test, wish I'd thought of it!
-
@TheNarc yeah they are redirecting your dns - for a FACT.. because 1.2.3.4 isn't doing dns..
$ dig @1.2.3.4 netgate.com
; <<>> DiG 9.16.48 <<>> @1.2.3.4 netgate.com
; (1 server found)
;; global options: +cmd
;; connection timed out; no servers could be reached
I edited and added a few other ways to spot redirection.
here is example of another test.. Notice the aa in the flags when I talked directly to one of the authoritative ns for netgate
There are lots of ways to spot redirection.. Also another way is doing query to some server you know is X ms away, and getting a response that is no way possible.. you query abc ns on the other side of the planet like 200ms away, and you get a response in 10 ms or something - you know for sure you didn't actually talk to that ns on the other side of the planet ;)
edit:
also notice the recursion requested but not available - pretty much all authoritative ns would not allow recursion - so you should see that in the response. At least anyone set up correctly would not do recursion; auth servers shouldn't allow recursion. So if you ask an authoritative server for some domain and you don't see that.. Either they are misconfigured and allowing it - or you have been redirected to some other server that is doing recursion.
-
@johnpoz Great information, thanks again. And can doubly (triply?) confirm by looking for the other "tells" you point out. No aa flag and a query time that drops from 258ms to 42ms when querying a root server twice in a row (most output excluded for brevity):
dig @192.33.4.12 arstechnica.com
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1
;; Query time: 258 msec
dig @192.33.4.12 arstechnica.com
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1
;; Query time: 42 msec
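The timing tell can be checked the same way. This is a rough Python sketch that pulls the "Query time" line out of saved dig output and compares two runs; the function names and the 0.5 drop factor are arbitrary choices for illustration, not anything dig or pfSense defines.

```python
import re

def query_time_ms(dig_output):
    """Return the 'Query time' value in milliseconds, or None if absent."""
    match = re.search(r"^;; Query time: (\d+) msec", dig_output, re.MULTILINE)
    return int(match.group(1)) if match else None

def sharp_drop(first, second, factor=0.5):
    """True when the second query took less than `factor` times the first.
    A big drop on an identical repeat query hints at a cache in the path."""
    t1, t2 = query_time_ms(first), query_time_ms(second)
    if t1 is None or t2 is None:
        return False
    return t2 < t1 * factor

def faster_than_possible(dig_output, min_rtt_ms):
    """True when the reply arrived sooner than the round trip you know
    the real server needs (min_rtt_ms is your own measured estimate)."""
    t = query_time_ms(dig_output)
    return t is not None and t < min_rtt_ms

FIRST = ";; Query time: 258 msec\n"
SECOND = ";; Query time: 42 msec\n"
```

Timing alone is weak evidence, since paths and load vary, so it is best read together with the flags.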
-
@TheNarc yeah - unless your vpn can change the laws of physics and get you there faster ;) or your normal path is horrible ;) that is pretty good proof that you were redirected..
But there is another tell that the answer was redirected.. The root server you queried, "c.root-servers.net", wouldn't answer such a query in the first place ;)
It would send you back this
$ dig @c.root-servers.net arstechnica.com
; <<>> DiG 9.16.48 <<>> @c.root-servers.net arstechnica.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 42743
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 13, ADDITIONAL: 27
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: 6f21d9186b8310cf0100000065eb8b15aabe1bc77d5654a3 (good)
;; QUESTION SECTION:
;arstechnica.com. IN A
;; AUTHORITY SECTION:
com. 172800 IN NS a.gtld-servers.net.
com. 172800 IN NS c.gtld-servers.net.
com. 172800 IN NS h.gtld-servers.net.
com. 172800 IN NS i.gtld-servers.net.
com. 172800 IN NS f.gtld-servers.net.
com. 172800 IN NS g.gtld-servers.net.
com. 172800 IN NS b.gtld-servers.net.
com. 172800 IN NS d.gtld-servers.net.
com. 172800 IN NS e.gtld-servers.net.
com. 172800 IN NS k.gtld-servers.net.
com. 172800 IN NS l.gtld-servers.net.
com. 172800 IN NS m.gtld-servers.net.
com. 172800 IN NS j.gtld-servers.net.
;; ADDITIONAL SECTION:
m.gtld-servers.net. 172800 IN A 192.55.83.30
l.gtld-servers.net. 172800 IN A 192.41.162.30
k.gtld-servers.net. 172800 IN A 192.52.178.30
j.gtld-servers.net. 172800 IN A 192.48.79.30
i.gtld-servers.net. 172800 IN A 192.43.172.30
h.gtld-servers.net. 172800 IN A 192.54.112.30
g.gtld-servers.net. 172800 IN A 192.42.93.30
f.gtld-servers.net. 172800 IN A 192.35.51.30
e.gtld-servers.net. 172800 IN A 192.12.94.30
d.gtld-servers.net. 172800 IN A 192.31.80.30
c.gtld-servers.net. 172800 IN A 192.26.92.30
b.gtld-servers.net. 172800 IN A 192.33.14.30
a.gtld-servers.net. 172800 IN A 192.5.6.30
m.gtld-servers.net. 172800 IN AAAA 2001:501:b1f9::30
l.gtld-servers.net. 172800 IN AAAA 2001:500:d937::30
k.gtld-servers.net. 172800 IN AAAA 2001:503:d2d::30
j.gtld-servers.net. 172800 IN AAAA 2001:502:7094::30
i.gtld-servers.net. 172800 IN AAAA 2001:503:39c1::30
h.gtld-servers.net. 172800 IN AAAA 2001:502:8cc::30
g.gtld-servers.net. 172800 IN AAAA 2001:503:eea3::30
f.gtld-servers.net. 172800 IN AAAA 2001:503:d414::30
e.gtld-servers.net. 172800 IN AAAA 2001:502:1ca1::30
d.gtld-servers.net. 172800 IN AAAA 2001:500:856e::30
c.gtld-servers.net. 172800 IN AAAA 2001:503:83eb::30
b.gtld-servers.net. 172800 IN AAAA 2001:503:231d::2:30
a.gtld-servers.net. 172800 IN AAAA 2001:503:a83e::2:30
;; Query time: 14 msec
;; SERVER: 192.33.4.12#53(192.33.4.12)
;; WHEN: Fri Mar 08 16:03:02 Central Standard Time 2024
;; MSG SIZE rcvd: 868
Saying hey buddy, I don't answer for those, here is where you can ask for that - and it just hands you the NS records for .com in the response.
The root servers only answer for the NSs for the TLDs.. they won't answer for anything else..
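The difference between a genuine root referral and a redirected answer shows up in the header counts, so it can be tested mechanically. Below is a small Python sketch under the same assumptions as before: it works on saved dig text, the function names are invented for illustration, and the two sample strings are abbreviated from the outputs posted in this thread.

```python
import re

def section_counts(dig_output):
    """Parse the QUERY/ANSWER/AUTHORITY/ADDITIONAL counts from the header."""
    match = re.search(
        r"QUERY: (\d+), ANSWER: (\d+), AUTHORITY: (\d+), ADDITIONAL: (\d+)",
        dig_output,
    )
    if match is None:
        return None
    query, answer, authority, additional = map(int, match.groups())
    return {"query": query, "answer": answer,
            "authority": authority, "additional": additional}

def looks_like_referral(dig_output):
    """True for a delegation-style reply: no answers, NS records in authority.
    Heuristic only; it does not verify which zone the NS records belong to."""
    counts = section_counts(dig_output)
    if counts is None:
        return False
    has_ns = re.search(r"[ \t]IN[ \t]+NS[ \t]", dig_output) is not None
    return counts["answer"] == 0 and counts["authority"] > 0 and has_ns

# Abbreviated from the real root-server reply in this thread.
REFERRAL = (
    ";; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 13, ADDITIONAL: 27\n"
    ";; AUTHORITY SECTION:\n"
    "com. 172800 IN NS a.gtld-servers.net.\n"
)
# Abbreviated from the reply seen through the VPN.
REDIRECTED = ";; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1\n"
```

If a query aimed at a root server for an ordinary hostname comes back with records in the answer section, whatever replied was not a root server.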
-
@TheNarc you going to ask Nord wtf they are doing? Curious what they respond with.. ;)
-
OP here. Late to check back because with all of the troubleshooting that I was doing (and settings changes), my pfSense box became FUBAR. Oddly, even a restore didn't fix it. So I've been completely rebuilding it.
Happy to see a lot of discussion. Like other people in here, I am also using NordVPN. And just like @wfx, I also tried using Mullvad (lol) without any luck, which is why I was leaning away from VPN provider being the issue.
HOWEVER, unlike the other people in here, I did do a dig @8.8.8.8 netgate.com, which still failed to resolve. With that said, I was deep into modifying settings, so there may have been a DNS tweak that messed with things. I've almost got my box rebuilt, so I'll test over the weekend.
Side Question: Since both NordVPN and Mullvad seem to be greedy for DNS queries, are there any VPN providers that folks are having luck with that they would recommend?
-
@Generally-Lost In the past I have used Air VPN. If you trust your ISP check to see if they provide one. Mine does with my service.
-
@johnpoz I will, yes. Although I walked away from the computer and won't be back for an hour or two (on a phone now) and also want to run another test. Because as so often happens, while on the treadmill and left to my own thoughts, I realized I'm stupid once again. Someone is absolutely redirecting my DNS, but that person is me. Or at least I'm first in line. I've got a port forward to prevent LAN clients from going around the VPN, and wouldn't you know it, I forgot about it until just now. So Nord may well be redirecting me too, but all my test proved was that I'm an idiot, and I'm redirecting me. As soon as I can, I'll disable my redirect rule, test again, and post results. Should be able to within two hours.
-
@TheNarc that is actually good info to know.. Since maybe others enabled redirection? So yeah, if you're redirecting yourself - you would see the same sort of tells of redirection as if your provider or vpn service was doing it..
-
@Generally-Lost said in DNS suddenly broken [on some VLANs]:
dig @8.8.8.8 netgate.com, which still failed to resolv
Well a directed query like that shouldn't fail - unless you were blocking it yourself, or upstream they were blocking it.. That would have nothing to do with the root servers blocking anything for sure.. Because that query asks 8.8.8.8 hey look this up for me, or hand me whatever you have in your cache for it.. So did you get a timeout, an nxdomain, a servfail, a refused?
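Which of those outcomes you got matters, so here is a short Python sketch that classifies saved dig output accordingly. The function name and the TIMEOUT/UNKNOWN labels are my own; the status strings (NOERROR, NXDOMAIN, SERVFAIL, REFUSED) are what dig prints in the header, and the timeout phrases match the dig outputs shown in this thread.

```python
import re

def query_outcome(dig_output):
    """Return the DNS status dig reported, 'TIMEOUT' if no server replied,
    or 'UNKNOWN' if the text has neither."""
    match = re.search(r"status: ([A-Z]+)", dig_output)
    if match:
        return match.group(1)
    lowered = dig_output.lower()
    if "timed out" in lowered or "no servers could be reached" in lowered:
        return "TIMEOUT"
    return "UNKNOWN"

# What each outcome generally means for a directed query.
HINTS = {
    "TIMEOUT": "no reply at all: the query or the answer was dropped in the path",
    "REFUSED": "a server replied but declined to serve this client",
    "SERVFAIL": "a server replied but could not complete the lookup",
    "NXDOMAIN": "a server replied that the name does not exist",
    "NOERROR": "a server replied normally",
}
```

A timeout points at something dropping packets (your own rules or upstream), while refused or servfail means some server actually spoke to you.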
-
@johnpoz Okay, just tested with my redirection port forward rules disabled, and the results were the same. So I think it's fairly conclusive that Nord decided to start hijacking all DNS without notifying any of their customers, but at least it seems like we know what's going on.
Edit: One other (maybe?) interesting data point. I was expecting that if Nord is redirecting all DNS, then setting my system DNS server to 1.2.3.4 and leaving unbound in forwarding mode would still give working DNS resolution. But it doesn't; that configuration yields SERVFAIL. Likely a misconception or misunderstanding on my part, but does that still track with the theory here?
-
OK, got my new box up and running. I switched to AirVPN, and the DNS resolutions are working flawlessly. Shout out to @Uglybrian for the recommendation.
-
@TheNarc said in DNS suddenly broken [on some VLANs]:
left unbound in forwarding mode
You have turned off dnssec right - forwarding and dnssec is a combination for failure.
But sure, in general concept, if they are intercepting traffic - and if you are doing a directed query to 1.2.3.4 and just asking for www.google.co and you get an answer, there is redirection going on for sure - then yeah it should work.. In theory, but we are not sure exactly what they are doing.. You would have to figure out where the servfail is coming from. If you are doing dnssec and you forward and that gets messed up, it could be unbound saying yeah hey buddy that stuff isn't passing dnssec, servfail.
For example - this fails dnssec, which I am resolving and using
All kinds of weirdness can happen when you are being redirected and also asking for dnssec - I have never gone down that rabbit hole to the end to see exactly where it fails, etc.. But it's a bad combination..
You could sniff on pfsense and see what is actually being asked, and what gets answered or doesn't get answered, or what in the dnssec chain is failing, which would explain why your unbound, told to do dnssec when it forwards, might say it failed with a servfail.. Which is a generic answer and doesn't say exactly what failed, just that something did.
Are you forwarding to a resolver, or just another forwarder? There are quite a few variables at play.. And we don't know what exactly they are doing and how they are doing it.. But a directed query to 1.2.3.4 should not get an answer unless someone is messing with the query somewhere in the path. Because that IP doesn't answer dns.. Try some other IP you know for a fact doesn't answer dns etc..
Here is an IP for www.netgate.com, it doesn't answer dns
dig @199.60.103.30 www.google.com
;; communications error to 199.60.103.30#53: timed out
;; communications error to 199.60.103.30#53: timed out
;; communications error to 199.60.103.30#53: timed out
; <<>> DiG 9.18.24-1+ubuntu22.04.1+deb.sury.org+1-Ubuntu <<>> @199.60.103.30 www.google.com
; (1 server found)
;; global options: +cmd
;; no servers could be reached
Do you get an answer when you query it through your vpn?
-
@johnpoz I have made sure to disable DNSSEC when I've got unbound in forwarding mode. I'll try to gather some more data over the weekend and report back, and also see whether I can extract any information from Nord's support.
-
@just_a_user_34721 Thanks for providing more corroboration. I've still found no evidence on their site of an admission to this change. It's rather bizarre. I don't have much new data yet either. I'm going to try to test some more if I can think of things that seem valuable to try. One strange thing, though, was that I figured if all my DNS queries are going to be forwarded to Nord's DNS servers anyway, why not just set them as my system DNS servers that unbound will forward to? So I did (103.86.96.100 and 103.86.99.100) and that broke DNS entirely (SERVFAIL on every query).
-
Hi all,
I tried a few more tests over the weekend. I set the resolver to forwarding mode. As other posters found, DNS started working again.
I initially used NordVPN's DNS servers in general settings but then switched to Quad9. It kept working but they could obviously be redirecting the requests to their own servers.
I also tried a few dig commands suggested by @johnpoz to check for redirection of requests made to root servers. Requests were redirected.
I tried dig @1.2.3.4 netgate.com and it resolved, so that was clearly redirected. However, I then tried to set 1.2.3.4 as the sole DNS server in pfSense general settings and DNS stopped working. I would have thought it would also get redirected.
I think that, considering multiple people have had the same problem at the same time, it's quite obvious NordVPN has changed their servers' configuration and are now preventing the use of DNS resolvers in recursive mode (despite their tech support claiming no change was made!!!) and redirecting DNS requests to their own servers.
I am about to write a "please explain" email to them. I'll report back what their answer is.
Thanks to all who posted. I've learned quite a bit as a result of the discussion.