DNS suddenly broken [on some VLANs]
-
@Gertjan I initially also thought this had to be something NordVPN did. That is why I created an account with Mullvad: I wanted to be able to tell NordVPN support that the problem was only with them. I didn't expect to have the same problem with Mullvad! Is there any chance the root servers are not responding because of something set up incorrectly in pfSense?
Edit: Since root DNS servers are not answering, I would have thought a DNS lookup done with the diagnostics tools of pfSense would also fail since all DNS requests are sent through the VPN. However, those lookups succeed. How is that possible?
-
@wfx said in DNS suddenly broken [on some VLANs]:
all DNS requests are sent through the VPN.
You sure about that? If you are policy routing out your vpn, then no, not all traffic will go out the vpn. And even if you're not, queries made by unbound from the wan interface wouldn't go out the vpn..
A simple sniff would validate if pfsense unbound is sending traffic out the vpn or just your normal wan.
Why not just do a directed query to the roots.. do you get an answer from your client that is going out the vpn? And then via a query that doesn't go through the vpn..
$ dig @192.58.128.30 net.
; <<>> DiG 9.16.48 <<>> @192.58.128.30 net.
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 32761
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 13, ADDITIONAL: 27
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1472
;; QUESTION SECTION:
;net. IN A
;; AUTHORITY SECTION:
net. 172800 IN NS a.gtld-servers.net.
net. 172800 IN NS b.gtld-servers.net.
net. 172800 IN NS c.gtld-servers.net.
net. 172800 IN NS d.gtld-servers.net.
net. 172800 IN NS e.gtld-servers.net.
net. 172800 IN NS f.gtld-servers.net.
net. 172800 IN NS g.gtld-servers.net.
net. 172800 IN NS h.gtld-servers.net.
net. 172800 IN NS i.gtld-servers.net.
net. 172800 IN NS j.gtld-servers.net.
net. 172800 IN NS k.gtld-servers.net.
net. 172800 IN NS l.gtld-servers.net.
net. 172800 IN NS m.gtld-servers.net.
;; ADDITIONAL SECTION:
a.gtld-servers.net. 172800 IN A 192.5.6.30
b.gtld-servers.net. 172800 IN A 192.33.14.30
c.gtld-servers.net. 172800 IN A 192.26.92.30
d.gtld-servers.net. 172800 IN A 192.31.80.30
e.gtld-servers.net. 172800 IN A 192.12.94.30
f.gtld-servers.net. 172800 IN A 192.35.51.30
g.gtld-servers.net. 172800 IN A 192.42.93.30
h.gtld-servers.net. 172800 IN A 192.54.112.30
i.gtld-servers.net. 172800 IN A 192.43.172.30
j.gtld-servers.net. 172800 IN A 192.48.79.30
k.gtld-servers.net. 172800 IN A 192.52.178.30
l.gtld-servers.net. 172800 IN A 192.41.162.30
m.gtld-servers.net. 172800 IN A 192.55.83.30
a.gtld-servers.net. 172800 IN AAAA 2001:503:a83e::2:30
b.gtld-servers.net. 172800 IN AAAA 2001:503:231d::2:30
c.gtld-servers.net. 172800 IN AAAA 2001:503:83eb::30
d.gtld-servers.net. 172800 IN AAAA 2001:500:856e::30
e.gtld-servers.net. 172800 IN AAAA 2001:502:1ca1::30
f.gtld-servers.net. 172800 IN AAAA 2001:503:d414::30
g.gtld-servers.net. 172800 IN AAAA 2001:503:eea3::30
h.gtld-servers.net. 172800 IN AAAA 2001:502:8cc::30
i.gtld-servers.net. 172800 IN AAAA 2001:503:39c1::30
j.gtld-servers.net. 172800 IN AAAA 2001:502:7094::30
k.gtld-servers.net. 172800 IN AAAA 2001:503:d2d::30
l.gtld-servers.net. 172800 IN AAAA 2001:500:d937::30
m.gtld-servers.net. 172800 IN AAAA 2001:501:b1f9::30
;; Query time: 9 msec
;; SERVER: 192.58.128.30#53(192.58.128.30)
;; WHEN: Fri Mar 08 06:46:27 Central Standard Time 2024
;; MSG SIZE rcvd: 825
That was a directed query to a root server, asking for the NS of the .net tld
-
Hi @johnpoz,
Pretty sure. The only outgoing interfaces selected in the resolver are the VPNs and the firewall rules only allow traffic to go out through the VPN group.
I ran the dig @192.58.128.30 net. command. Here is the result:
; <<>> DiG 9.18.24 <<>> @192.58.128.30 net.
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 55994
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1424
;; QUESTION SECTION:
;net. IN A
;; Query time: 603 msec
;; SERVER: 192.58.128.30#53(192.58.128.30) (UDP)
;; WHEN: Fri Mar 08 23:24:57 AWST 2024
;; MSG SIZE rcvd: 32
No authority and additional sections so I guess that means no answer.
Thought further about why the pfSense DNS lookup works. It gives me results showing the DNS servers configured in System -> General Setup so, if I understand correctly, that means it forwards the request to those servers even though the resolver is not configured as a forwarder.
-
@wfx said in DNS suddenly broken [on some VLANs]:
that means it forwards the request to those servers even though the resolver is not configured as a forwarder.
Correct.. If you didn't want pfsense to do that, then you would set it not to use remote..
If pfsense doesn't get an answer from loopback.. It could try other servers listed, just like your windows client would if you had 2 dns servers listed.
Is it possible roots are blocking - sure ok maybe.. If some IPs were considered attacking or something. But I wouldn't think they would just say hey this range of IPs are vpn IPs - block them..
It's more likely the vpn service says use our dns.. And we will force you to by not allowing other dns.. Can you query 8.8.8.8 through your vpn? If they were going to the trouble to block roots, you would think they would block all the other major dns providers..
edit: you got a servfail back.. hmmm try doing a query to say 1.2.3.4 for like www.google.com - if you get an answer that is smoking gun that your dns is being intercepted..
-
@johnpoz Just chiming in here because I'm observing the same issue, which began this Monday around 6PM eastern for me, on two physically disparate pfSense machines both with unbound configured to use only NordVPN interfaces for outgoing. Completely agree with your assessment that it would not make obvious sense for Nord to block only the root servers and not other DNS, although my workaround so far has been to put unbound into forwarding mode with the system DNS servers set to Google, Cloudflare, and Quad9. I have the resolution behavior set to use 127.0.0.1 and ignore remote, and unbound's outgoing interfaces are still set to only Nord. That's working just fine, and I have verified by examining the states that the queries are being routed through the VPN.
So, not sure what's going on. I tried switching unbound back to recursive mode today to see if it was a transient issue, but it went right back to failing (with SERVFAIL, the same as @wfx). I can also observe that if I just switch the outgoing interfaces from Nord to WAN it works fine. Whatever it is, it definitely has to do with Nord, and yet they're also definitely not just blocking everything but their own DNS servers.
I also realize these forums are not for diagnosing VPN provider issues :) Just wanted to provide some corroborating evidence, but not sure if it's helpful at all.
Quick edit just to be comprehensive: The onset of this issue for me did not coincide with any config changes to pfSense, and unbound in recursive mode routed through Nord has worked for ~8 years prior, so I'll definitely be on the lookout for any information about what they may have changed.
-
@TheNarc well it's possible that the roots are blocking - but it's a weird block to send back servfail - if I was going to block someone from talking to me, I would block them from even opening the connection. I wouldn't let them talk to me and then send them back - sorry buddy can't look that up ;) ie servfail.
But maybe it's something they do internally and not at the edge - where they just say hey don't answer stuff for these IPs.. But you would think that would send back a refused..
But also you sure they are not just redirecting dns - so you think you got an answer from google, but you really got an answer from their servers? Did you try a directed query to 1.2.3.4? If that answers - it's a smoking gun that dns is being redirected.. Because 1.2.3.4 doesn't answer dns.. There are a few other ways to check for redirection as well - but that is a very quick easy test.
If that doesn't answer, it doesn't mean they are not redirecting, there are other ways to check for redirection. They might only be redirecting queries to specific dns, etc. vs all port 53..
You can do a query to an authoritative server and look for the aa flag in the response, or you can check the ttl: if you do not get back the full ttl when you talk to the authoritative server, that is another smoking gun you were redirected and got something from cache. If you're talking to the authoritative ns, it will always send the full ttl. If you ask it again and get a lower ttl, then that was pulled from cache and you didn't get the answer from the authoritative server.
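The aa-flag and ttl checks described above can be sketched in Python by parsing saved dig output. This is only a rough sketch: the function names are made up for illustration, and the sample text uses a hypothetical record for example.com rather than real output from anyone in this thread.

```python
import re

def has_aa_flag(dig_output):
    """True if the ';; flags:' header line of dig output lists the aa flag."""
    match = re.search(r";; flags:([^;]*);", dig_output)
    if match is None:
        return False
    return "aa" in match.group(1).split()

def answer_ttls(dig_output):
    """Collect the TTL column from records in the ANSWER SECTION."""
    ttls = []
    in_answer = False
    for raw in dig_output.splitlines():
        line = raw.strip()
        if line.startswith(";; ANSWER SECTION"):
            in_answer = True
            continue
        if in_answer:
            # A blank line or the next ';;' header ends the section.
            if not line or line.startswith(";"):
                break
            fields = line.split()
            if len(fields) >= 5 and fields[1].isdigit():
                ttls.append(int(fields[1]))
    return ttls

def ttl_dropped(first_output, second_output):
    """True if the lowest TTL fell between two runs, which hints at a cache."""
    first, second = answer_ttls(first_output), answer_ttls(second_output)
    return bool(first and second) and min(second) < min(first)

# Hypothetical output from two back-to-back queries (not from this thread).
run1 = """;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; ANSWER SECTION:
example.com. 300 IN A 192.0.2.10
"""
run2 = run1.replace(" 300 ", " 287 ")

print(has_aa_flag(run1))        # no aa flag in the sample header
print(ttl_dropped(run1, run2))  # TTL went from 300 to 287
```

A missing aa flag together with a falling ttl would point the same way as the 1.2.3.4 test, although neither check alone says who is doing the redirecting.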
-
@johnpoz Quick and easy tests ftw, sorry for neglecting to see that request earlier. It would appear that you're correct!
$ dig @1.2.3.4 netgate.com
; <<>> DiG 9.18.24-1-Debian <<>> @1.2.3.4 netgate.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 10891
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;netgate.com. IN A
;; ANSWER SECTION:
netgate.com. 60 IN A 199.60.103.4
netgate.com. 60 IN A 199.60.103.104
;; Query time: 109 msec
;; SERVER: 1.2.3.4#53(1.2.3.4) (UDP)
;; WHEN: Fri Mar 08 16:39:50 EST 2024
;; MSG SIZE rcvd: 72
That seems like pretty solid evidence of a "security enhancement" courtesy of Nord. Thank you for suggesting that test, wish I'd thought of it!
-
@TheNarc yeah they are redirecting your dns - for a FACT.. because 1.2.3.4 isn't doing dns..
$ dig @1.2.3.4 netgate.com
; <<>> DiG 9.16.48 <<>> @1.2.3.4 netgate.com
; (1 server found)
;; global options: +cmd
;; connection timed out; no servers could be reached
I edited and added a few other ways to spot redirection.
here is example of another test.. Notice the aa in the flags when I talked directly to one of the authoritative ns for netgate
There are lots of ways to spot redirection.. Also another way is doing query to some server you know is X ms away, and getting a response that is no way possible.. you query abc ns on the other side of the planet like 200ms away, and you get a response in 10 ms or something - you know for sure you didn't actually talk to that ns on the other side of the planet ;)
edit:
also notice the recursion requested but not available - pretty much all authoritative ns would not allow recursion - so you should see that in the response. At least any that are set up correctly would not do recursion, auth servers shouldn't allow recursion. So if you ask an authoritative server about some domain and you don't see that.. Either they are misconfigured and allowing it - or you have been redirected to some other server that is doing recursion.
-
@johnpoz Great information, thanks again. And can doubly (triply?) confirm by looking for the other "tells" you point out. No aa flag and a query time that drops from 258ms to 42ms when querying a root server twice in a row (most output excluded for brevity):
dig @192.33.4.12 arstechnica.com
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1
;; Query time: 258 msec
dig @192.33.4.12 arstechnica.com
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1
;; Query time: 42 msec
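The query-time tell can be checked mechanically too. Below is a minimal Python sketch that pulls the "Query time" value out of dig output and compares it with a round-trip time you already trust for that server. The function names and the 200 ms figure are assumptions for illustration; the 258 and 42 msec values are the ones quoted above.

```python
import re

def query_time_ms(dig_output):
    """Return the value from dig's ';; Query time: N msec' line, or None."""
    match = re.search(r";; Query time: (\d+) msec", dig_output)
    return int(match.group(1)) if match else None

def faster_than_path(dig_output, known_rtt_ms):
    """True if the reply arrived sooner than the known round trip allows."""
    elapsed = query_time_ms(dig_output)
    return elapsed is not None and elapsed < known_rtt_ms

first = ";; Query time: 258 msec"
second = ";; Query time: 42 msec"

print(query_time_ms(first), query_time_ms(second))
# known_rtt_ms=200 is a hypothetical measured round trip, not a real figure.
print(faster_than_path(second, known_rtt_ms=200))
```

The known round trip would have to come from somewhere independent of DNS, for example a ping over the same path, so treat the result as a hint and not as proof.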
-
@TheNarc yeah - unless your vpn can change the laws of physics and get you there faster ;) or your normal path is horrible ;) that is pretty good proof that you were redirected..
But there is another tell that the answer was redirected.. The root server you queried "c.root-servers.net" wouldn't answer such a query in the first place ;)
He would send you back this
$ dig @c.root-servers.net arstechnica.com
; <<>> DiG 9.16.48 <<>> @c.root-servers.net arstechnica.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 42743
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 13, ADDITIONAL: 27
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: 6f21d9186b8310cf0100000065eb8b15aabe1bc77d5654a3 (good)
;; QUESTION SECTION:
;arstechnica.com. IN A
;; AUTHORITY SECTION:
com. 172800 IN NS a.gtld-servers.net.
com. 172800 IN NS c.gtld-servers.net.
com. 172800 IN NS h.gtld-servers.net.
com. 172800 IN NS i.gtld-servers.net.
com. 172800 IN NS f.gtld-servers.net.
com. 172800 IN NS g.gtld-servers.net.
com. 172800 IN NS b.gtld-servers.net.
com. 172800 IN NS d.gtld-servers.net.
com. 172800 IN NS e.gtld-servers.net.
com. 172800 IN NS k.gtld-servers.net.
com. 172800 IN NS l.gtld-servers.net.
com. 172800 IN NS m.gtld-servers.net.
com. 172800 IN NS j.gtld-servers.net.
;; ADDITIONAL SECTION:
m.gtld-servers.net. 172800 IN A 192.55.83.30
l.gtld-servers.net. 172800 IN A 192.41.162.30
k.gtld-servers.net. 172800 IN A 192.52.178.30
j.gtld-servers.net. 172800 IN A 192.48.79.30
i.gtld-servers.net. 172800 IN A 192.43.172.30
h.gtld-servers.net. 172800 IN A 192.54.112.30
g.gtld-servers.net. 172800 IN A 192.42.93.30
f.gtld-servers.net. 172800 IN A 192.35.51.30
e.gtld-servers.net. 172800 IN A 192.12.94.30
d.gtld-servers.net. 172800 IN A 192.31.80.30
c.gtld-servers.net. 172800 IN A 192.26.92.30
b.gtld-servers.net. 172800 IN A 192.33.14.30
a.gtld-servers.net. 172800 IN A 192.5.6.30
m.gtld-servers.net. 172800 IN AAAA 2001:501:b1f9::30
l.gtld-servers.net. 172800 IN AAAA 2001:500:d937::30
k.gtld-servers.net. 172800 IN AAAA 2001:503:d2d::30
j.gtld-servers.net. 172800 IN AAAA 2001:502:7094::30
i.gtld-servers.net. 172800 IN AAAA 2001:503:39c1::30
h.gtld-servers.net. 172800 IN AAAA 2001:502:8cc::30
g.gtld-servers.net. 172800 IN AAAA 2001:503:eea3::30
f.gtld-servers.net. 172800 IN AAAA 2001:503:d414::30
e.gtld-servers.net. 172800 IN AAAA 2001:502:1ca1::30
d.gtld-servers.net. 172800 IN AAAA 2001:500:856e::30
c.gtld-servers.net. 172800 IN AAAA 2001:503:83eb::30
b.gtld-servers.net. 172800 IN AAAA 2001:503:231d::2:30
a.gtld-servers.net. 172800 IN AAAA 2001:503:a83e::2:30
;; Query time: 14 msec
;; SERVER: 192.33.4.12#53(192.33.4.12)
;; WHEN: Fri Mar 08 16:03:02 Central Standard Time 2024
;; MSG SIZE rcvd: 868
Saying hey buddy I don't answer for those, here is where you can ask for that, and the only thing in the answer is the NS for .com.
The root servers only answer for the NSs for the TLDs.. they won't answer for anything else..
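To make the difference concrete, here is a rough Python sketch that classifies the header of a reply to a query sent at a root server. It assumes the simple rule of thumb from this thread (a real root hands back a referral: no answers, NS records in the authority section, no ra flag). The function name and the labels it returns are made up for illustration; the header lines fed to it are the ones posted above.

```python
import re

FLAGS_RE = re.compile(
    r";; flags:([^;]*); QUERY: (\d+), ANSWER: (\d+), AUTHORITY: (\d+)"
)
STATUS_RE = re.compile(r"status: (\w+)")

def classify_root_reply(dig_output):
    """Label a reply to a root-server query using only its header fields."""
    header = FLAGS_RE.search(dig_output)
    if header is None:
        return "no-header"
    flags = header.group(1).split()
    answers = int(header.group(3))
    authority = int(header.group(4))
    status = STATUS_RE.search(dig_output)
    if status is not None and status.group(1) != "NOERROR":
        return "error"
    if answers == 0 and authority > 0 and "ra" not in flags:
        return "referral"          # what a root is expected to send
    if answers > 0 or "ra" in flags:
        return "recursive-answer"  # something else answered
    return "unclear"

real_root = (";; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 42743\n"
             ";; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 13, ADDITIONAL: 27")
via_vpn = ";; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1"
servfail = (";; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 55994\n"
            ";; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1")

for sample in (real_root, via_vpn, servfail):
    print(classify_root_reply(sample))
```

This only reads the header, so it cannot tell who produced a "recursive-answer" reply, just that it does not look like something a root server would send.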
-
@TheNarc you going to ask Nord wtf they are doing? Curious what they respond with.. ;)
-
OP here. Late to check back because with all of the troubleshooting that I was doing (and settings changes), my pfSense box became FUBAR. Oddly, even a restore didn't fix it. So I've been completely rebuilding it.
Happy to see a lot of discussion. Like other people in here, I am also using NordVPN. And just like @wfx, I also tried using Mullvad (lol) without any luck, which is why I was leaning away from VPN provider being the issue.
HOWEVER, unlike the other people in here, I did do a dig @8.8.8.8 netgate.com, which still failed to resolve. With that said, I was deep into modifying settings, so there may have been a DNS tweak that messed with things. I've almost got my box rebuilt, so I'll test over the weekend.
Side Question: Since both NordVPN and Mullvad seem to be greedy for DNS queries, are there any VPN providers that folks are having luck with that they would recommend?
-
@Generally-Lost In the past I have used Air VPN. If you trust your ISP check to see if they provide one. Mine does with my service.
-
@johnpoz I will, yes. Although I walked away from the computer and won't be back for an hour or two (on a phone now) and also want to run another test. Because as so often happens, while on the treadmill and left to my own thoughts, I realized I'm stupid once again. Someone is absolutely redirecting my DNS, but that person is me. Or at least I'm first in line. I've got a port forward to prevent LAN clients from going around the VPN, and wouldn't you know it, I forgot about it until just now. So Nord may well be redirecting me too, but all my test proved was that I'm an idiot, and I'm redirecting me. As soon as I can I'll disable my redirect rule, test again, and post results. Should be able to within two hours.
-
@TheNarc that is actually good info to know.. Since maybe others enabled redirection? So yeah if you're redirecting yourself - you would see the same sort of tells of redirection as if your provider or vpn service was doing it..
-
@Generally-Lost said in DNS suddenly broken [on some VLANs]:
dig @8.8.8.8 netgate.com, which still failed to resolv
Well a directed query like that shouldn't fail - unless you were blocking it yourself, or upstream they were blocking it.. That would have nothing to do with the root servers blocking anything for sure.. Because that query asks 8.8.8.8 hey look this up for me, or hand me whatever you have in your cache for it.. So did you get a timeout, an nxdomain, a servfail, a refused?
-
@johnpoz Okay, just tested with my redirection port forward rules disabled, and the results were the same. So I think it's fairly conclusive that Nord decided to start hijacking all DNS without notifying any of their customers, but at least it seems like we know what's going on.
Edit: One other (maybe?) interesting data point. I was expecting that, if Nord is redirecting all DNS, setting my system DNS server to 1.2.3.4 and leaving unbound in forwarding mode would still give working DNS resolution. But it doesn't; that configuration yields SERVFAIL. Likely a misconception or misunderstanding on my part, but does that still track with the theory here?
-
OK, got my new box up and running. I switched to AirVPN, and the DNS resolutions are working flawlessly. Shout out to @Uglybrian for the recommendation.
-
@TheNarc said in DNS suddenly broken [on some VLANs]:
left unbound in forwarding mode
You have turned off dnssec right - forwarding and dnssec is a combination for failure.
But sure, in general concept: if you do a directed query to 1.2.3.4 just asking for www.google.co and you get an answer, they are intercepting traffic and there is redirection going on for sure. Then yeah it should work.. In theory, but we are not sure exactly what they are doing.. You would have to figure out where the servfail is coming from. If you are doing dnssec and you forward and that gets messed up, it could be unbound saying yeah hey buddy that stuff isn't passing dnssec, servfail.
For example - this fails dnssec, which I am resolving and using
All kinds of weirdness can happen when you are being redirected and also asking for dnssec - I have never gone down that rabbit hole to the end to see exactly where it fails, etc.. But it's a bad combination..
You could sniff on pfsense and see what is actually being asked, and what gets answered or doesn't get answered, or what in the dnssec chain is failing, which would explain why your unbound, told to do dnssec when it forwards, might say it failed with a servfail.. Which is a generic answer and doesn't say exactly what failed, just that something did.
Are you forwarding to a resolver, or just another forwarder? There are quite a few variables at play.. And we don't know what exactly they are doing and how they are doing it.. But a directed query to 1.2.3.4 should not get an answer unless someone is messing with the query somewhere in the path. Because that IP doesn't answer dns.. Try some other IP you know for a fact doesn't answer dns etc..
Here is an IP for www.netgate.com, it doesn't answer dns
dig @199.60.103.30 www.google.com
;; communications error to 199.60.103.30#53: timed out
;; communications error to 199.60.103.30#53: timed out
;; communications error to 199.60.103.30#53: timed out
; <<>> DiG 9.18.24-1+ubuntu22.04.1+deb.sury.org+1-Ubuntu <<>> @199.60.103.30 www.google.com
; (1 server found)
;; global options: +cmd
;; no servers could be reached
Do you get an answer when you query it through your vpn?
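For anyone who wants to script this test without dig, the sketch below builds a plain A-record query by hand and checks whether anything replies on UDP port 53. It uses only the Python standard library; the function names, the fixed transaction ID and the 3 second timeout are arbitrary choices for illustration, not anything pfSense or unbound provides.

```python
import socket
import struct

def build_query(name, txid=0x1234):
    """Build a minimal DNS query for an A record with recursion desired."""
    # Header: id, flags (RD set), 1 question, 0 answer/authority/additional.
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE 1 = A, QCLASS 1 = IN.
    return header + qname + struct.pack("!HH", 1, 1)

def answers_dns(ip, name="www.google.com", timeout=3.0):
    """True if something at ip:53 replies to the query before the timeout."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (ip, 53))
            data, _ = sock.recvfrom(4096)
        except OSError:  # covers timeouts and unreachable networks
            return False
    # A real reply carries at least a 12-byte header with our transaction ID.
    return len(data) >= 12 and data[:2] == query[:2]
```

Calling answers_dns("1.2.3.4") or answers_dns("199.60.103.30") from a client behind the VPN and getting True would be the same smoking gun as the dig results above, since neither address is expected to answer dns; False only means nothing replied within the timeout.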