DNS suddenly broken [on some VLANs]
-
Hi @Generally-Lost ,
I've got the exact same problem. Went to bed Monday night. Had been online all day and everything worked fine. My pfSense configuration has been unchanged for a long time (at least since pfSense 2.5 was released), yet, when I got back to my PC on Tuesday morning, domain name resolution was no longer working. The issue affects all devices connected to the network (via ethernet and WiFi) regardless of which VLAN is used. Traffic goes through the VPN tunnels fine (like torrent or IP pings).
I have been trying to resolve the issue since then but, so far, no cigar! I have contacted NordVPN support but, unfortunately, they have so far proved to be pretty useless.
My setup is as follows:
- Dual WAN: one DSL connection and one 4G connection in a fail-over group
- Triple VPN: all with NordVPN. Set up in a fail-over group as well. DNS servers used are the NordVPN servers.
- All DNS requests are routed through the VPN tunnels only.
- Multi-VLANs
Initially, I thought there was a problem with the NordVPN service; however, what I have tried so far seems to indicate there is not.
I have tried the following:
- Contacted my ISP to confirm there were no issues with my DSL connection. Test came up OK. Also replaced my DSL modem to completely rule out a DSL connection problem.
- Connected my laptop via ethernet directly to DSL modem, DNS works fine so hardware problem can be ruled out I think.
- Tried different NordVPN servers. No change.
- Configured another VPN tunnel from Mullvad and removed all NordVPN clients. Thought this was gonna fix it but no! Domain names still not resolving.
- Disconnected my DSL cable to force fail-over to 4G. The DNS issues are still there. 4G ISP is not the same as DSL ISP.
- Checked if I could ping domain names. Not working.
- Checked if I could ping IPs. OK.
- Checked if I could ping DNS servers' IPs. OK.
- Checked if local names resolved. OK.
- Tried pfSense DNS lookup. Didn't expect that but also OK.
- Tried without DNSSEC. No change.
- Although my firewall rules have been unchanged and working for years, I checked the firewall logs to confirm DNS requests were going through. OK.
- Ran packet capture on one of the VPN interfaces when running nslookup on one of my network clients, I can see DNS traffic going back and forth (the capture doesn't show whether the request was successful or not though).
- Tried with pfBlocker turned off. No change.
- In desperation, I even turned off all my WiFi APs! No change.
- I ran the sockstat -4 | grep 'unbound' command suggested by @Gertjan . localhost and all vpn tunnels are listed.
Although I do not see how ISPs could block DNS requests sent through the VPN tunnels, I am starting to wonder if they might have something to do with it as I am at a complete loss to explain why a config that has worked for years is suddenly not! Working network configs don't suddenly stop working without something changing!
Also, been wondering if location might be a factor? I am located in Australia. Where are you located @Generally-Lost ?
Any suggestion about the possible cause or what could be tried to narrow things down would be greatly appreciated. I am totally pulling my hair out on this one!!!
-
@wfx said in DNS suddenly broken [on some VLANs]:
Although I do not see how ISPs could block DNS requests sent through the VPN tunnels
They can't. All they see is a UDP or TCP TLS stream to the VPN server. They can't see (decode) the data payload, the actual 'Internet' traffic, whether this is a DNS request, a mail or a web site or whatever.
You can make unbound very verbose by raising its log level.
Now you'll see all DNS requests from pfSense and your LAN client and the forward/resolve process = traffic going out and answers coming back.
Remember to set it back to 1 when done; this will generate huge log files.
-
@Gertjan Thanks for the reply.
@Gertjan said in DNS suddenly broken [on some VLANs]:
They can't. All they see is a UDP or TCP TLS stream to the VPN server. They can't see (decode) the data payload, the actual 'Internet' traffic, whether this is a DNS request, a mail or a web site or whatever.
Yeah, that's what I thought. I'm just grasping at straws! What's got me is that the issue is occurring across 2 different VPN providers without me making any changes to the pfSense config!
I set the unbound log levels to level 3. Here are the last 1500 lines of the unbound log: unbound_logs.txt. I can see where the failures occur however I'm no DNS expert so I don't understand what the debug lines actually mean. Hopefully, this makes a lot more sense to you?
-
Many:
Mar 8 16:24:44 unbound 18848 [18848:2] info: reply from <.> 193.0.14.129#53
Mar 8 16:24:44 unbound 18848 [18848:2] info: query response was THROWAWAY
where "193.0.14.129" is special: it's one of the 13 DNS root servers. You can see unbound uses some of these 13 known DNS root servers.
The thing is: they don't answer / nothing comes back.
Yeah, looks like you can't reach any DNS root servers... that's bad for DNS business.
I don't think these '13' are blocking your (VPN) IP. If something is blocking, it's closer to you: I put my money on 'your VPN'...
-
@Gertjan I initially thought as well that this had to be because of something NordVPN did. That is why I created an account with Mullvad. I wanted to be able to say to NordVPN support that the problem was with them only. I didn't expect I would have the same problem with Mullvad! Is there any chance the root servers are not responding because of something set up incorrectly in pfSense?
Edit: Since root DNS servers are not answering, I would have thought a DNS lookup done with the diagnostics tools of pfSense would also fail since all DNS requests are sent through the VPN. However, those lookups succeed. How is that possible?
-
@wfx said in DNS suddenly broken [on some VLANs]:
all DNS requests are sent through the VPN.
Are you sure about that? If you are policy routing out your VPN, then no, not all traffic will go out the VPN. And even if you're not, queries made by unbound from the WAN interface wouldn't go out the VPN.
A simple sniff would validate whether pfSense's unbound is sending traffic out the VPN or just your normal WAN.
Why not just do a directed query to the roots? Do you get an answer from your client that is going out the VPN? And then via a query that doesn't go through the VPN?
$ dig @192.58.128.30 net.
; <<>> DiG 9.16.48 <<>> @192.58.128.30 net.
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 32761
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 13, ADDITIONAL: 27
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1472
;; QUESTION SECTION:
;net. IN A
;; AUTHORITY SECTION:
net. 172800 IN NS a.gtld-servers.net.
net. 172800 IN NS b.gtld-servers.net.
net. 172800 IN NS c.gtld-servers.net.
net. 172800 IN NS d.gtld-servers.net.
net. 172800 IN NS e.gtld-servers.net.
net. 172800 IN NS f.gtld-servers.net.
net. 172800 IN NS g.gtld-servers.net.
net. 172800 IN NS h.gtld-servers.net.
net. 172800 IN NS i.gtld-servers.net.
net. 172800 IN NS j.gtld-servers.net.
net. 172800 IN NS k.gtld-servers.net.
net. 172800 IN NS l.gtld-servers.net.
net. 172800 IN NS m.gtld-servers.net.
;; ADDITIONAL SECTION:
a.gtld-servers.net. 172800 IN A 192.5.6.30
b.gtld-servers.net. 172800 IN A 192.33.14.30
c.gtld-servers.net. 172800 IN A 192.26.92.30
d.gtld-servers.net. 172800 IN A 192.31.80.30
e.gtld-servers.net. 172800 IN A 192.12.94.30
f.gtld-servers.net. 172800 IN A 192.35.51.30
g.gtld-servers.net. 172800 IN A 192.42.93.30
h.gtld-servers.net. 172800 IN A 192.54.112.30
i.gtld-servers.net. 172800 IN A 192.43.172.30
j.gtld-servers.net. 172800 IN A 192.48.79.30
k.gtld-servers.net. 172800 IN A 192.52.178.30
l.gtld-servers.net. 172800 IN A 192.41.162.30
m.gtld-servers.net. 172800 IN A 192.55.83.30
a.gtld-servers.net. 172800 IN AAAA 2001:503:a83e::2:30
b.gtld-servers.net. 172800 IN AAAA 2001:503:231d::2:30
c.gtld-servers.net. 172800 IN AAAA 2001:503:83eb::30
d.gtld-servers.net. 172800 IN AAAA 2001:500:856e::30
e.gtld-servers.net. 172800 IN AAAA 2001:502:1ca1::30
f.gtld-servers.net. 172800 IN AAAA 2001:503:d414::30
g.gtld-servers.net. 172800 IN AAAA 2001:503:eea3::30
h.gtld-servers.net. 172800 IN AAAA 2001:502:8cc::30
i.gtld-servers.net. 172800 IN AAAA 2001:503:39c1::30
j.gtld-servers.net. 172800 IN AAAA 2001:502:7094::30
k.gtld-servers.net. 172800 IN AAAA 2001:503:d2d::30
l.gtld-servers.net. 172800 IN AAAA 2001:500:d937::30
m.gtld-servers.net. 172800 IN AAAA 2001:501:b1f9::30
;; Query time: 9 msec
;; SERVER: 192.58.128.30#53(192.58.128.30)
;; WHEN: Fri Mar 08 06:46:27 Central Standard Time 2024
;; MSG SIZE rcvd: 825
That was a directed query to root server, asking for the ns of .net tld
-
Hi @johnpoz,
Pretty sure. The only outgoing interfaces selected in the resolver are the VPNs and the firewall rules only allow traffic to go out through the VPN group.
I ran the dig @192.58.128.30 net. command. Here is the result:
; <<>> DiG 9.18.24 <<>> @192.58.128.30 net.
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 55994
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1424
;; QUESTION SECTION:
;net. IN A
;; Query time: 603 msec
;; SERVER: 192.58.128.30#53(192.58.128.30) (UDP)
;; WHEN: Fri Mar 08 23:24:57 AWST 2024
;; MSG SIZE rcvd: 32
No authority and additional sections so I guess that means no answer.
Thought further about why the pfSense DNS lookup works. It gives me results showing the DNS servers configured in System -> General Setup so, if I understand correctly, that means it forwards the request to those servers even though the resolver is not configured as a forwarder.
-
@wfx said in DNS suddenly broken [on some VLANs]:
that means it forwards the request to those servers even though the resolver is not configured as a forwarder.
Correct.. If you didn't want pfsense to do that, then you would set it not to use remote..
If pfsense doesn't get an answer from loopback.. It could try other servers listed, just like your windows client would if you had 2 dns servers listed.
Is it possible roots are blocking - sure ok maybe.. If some IPs were considered attacking or something. But I wouldn't think they would just say hey this range of IPs are vpn IPs - block them..
It's more likely the vpn service says use our dns.. And we will force you to by not allowing other dns.. Can you query 8.8.8.8 through your vpn? If they were going to the trouble to block roots, you would think they would block all the other major dns providers..
edit: you got a servfail back.. hmmm try doing a query to say 1.2.3.4 for like www.google.com - if you get an answer that is smoking gun that your dns is being intercepted..
-
@johnpoz Just chiming in here because I'm observing the same issue, which began this Monday around 6PM eastern for me, on two physically disparate pfSense machines both with unbound configured to use only NordVPN interfaces for outgoing. Completely agree with your assessment that it would not make obvious sense for Nord to block only the root servers and not other DNS, although my workaround so far has been to put unbound into forwarding mode with the system DNS servers set to Google, Cloudflare, and Quad9. I have the resolution behavior set to use 127.0.0.1 and ignore remote, and unbound's outgoing interfaces are still set to only Nord. That's working just fine, and I have verified by examining the states that the queries are being routed through the VPN.
So, not sure what's going on. I tried switching unbound back to recursive mode today to see if it was a transient issue, but it went right back to failing (with SERVFAIL, the same as @wfx). I can also observe that if I just switch the outgoing interfaces from Nord to WAN it works fine. Whatever it is, it definitely has to do with Nord, and yet they're also definitely not just blocking everything but their own DNS servers.
I also realize these forums are not for diagnosing VPN provider issues :) Just wanted to provide some corroborating evidence, but not sure if it's helpful at all.
Quick edit just to be comprehensive: The onset of this issue for me did not coincide with any config changes to pfSense, and unbound in recursive mode routed through Nord has worked for ~8 years prior, so I'll definitely be on the lookout for any information about what they may have changed.
-
@TheNarc well it's possible that the roots are blocking - but it's a weird block to send back servfail - if I was going to block someone from talking to me, I would block them from even opening the connection. I wouldn't let them talk to me and then send them back - sorry buddy can't look that up ;) ie servfail.
But maybe it's something they do internally and not at the edge - where they just say hey don't answer stuff for these IPs.. But you would think that would send back a refused..
But also are you sure they are not just redirecting dns - so you think you got an answer from google, but you really got an answer from their servers? Did you try a directed query to 1.2.3.4? If that answers - it's a smoking gun that dns is being redirected.. Because 1.2.3.4 doesn't answer dns.. There are a few other ways to check for redirection as well - but that is a very quick easy test.
If that doesn't answer, it doesn't mean they are not redirecting; there are other ways to check for redirection. They might only be redirecting queries to specific dns, etc. vs all port 53..
You can do a query to an authoritative server and look for the aa flag in the response, or you can check the ttl: if you do not get back the full ttl when you talk to the authoritative server, that is another smoking gun you were redirected and got something from cache. If you're talking to the authoritative ns, it will always send the full ttl. If you ask it again and get a lower ttl, then that was pulled from cache and you didn't get the answer from the authoritative server.
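For anyone who wants to script the aa-flag and TTL checks described above, here is a minimal Python sketch. It assumes you have saved the text of two back-to-back dig runs of the same query, aimed at a server that should be authoritative for the name. The helper names (dig_flags, answer_ttls, redirection_tells) are made up for illustration, and the parsing only covers the standard dig layout seen in this thread.

```python
import re


def dig_flags(dig_output):
    """Return the header flags from dig's ';; flags:' line as a set, e.g. {'qr', 'rd', 'ra'}."""
    match = re.search(r";; flags:([^;]*);", dig_output)
    return set(match.group(1).split()) if match else set()


def answer_ttls(dig_output):
    """Return the TTL (second column) of every record in the ANSWER SECTION."""
    ttls = []
    in_answer = False
    for line in dig_output.splitlines():
        if line.startswith(";; ANSWER SECTION:"):
            in_answer = True
            continue
        if in_answer:
            # The section ends at the first blank line or comment line.
            if not line.strip() or line.startswith(";"):
                break
            fields = line.split()
            if len(fields) >= 2 and fields[1].isdigit():
                ttls.append(int(fields[1]))
    return ttls


def redirection_tells(first_run, second_run):
    """List the tells found in two consecutive runs of the same query."""
    tells = []
    flags = dig_flags(first_run)
    # Only judge the aa flag if a header was actually received.
    if flags and "aa" not in flags:
        tells.append("no aa flag")
    first, second = answer_ttls(first_run), answer_ttls(second_run)
    # An authoritative server sends the full TTL every time; a cache counts down.
    if first and second and min(second) < min(first):
        tells.append("TTL counted down between runs")
    return tells
```

An empty list does not prove the path is clean (a redirecting resolver could happen to return a fresh TTL), so treat this as a quick screen rather than a verdict.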
-
@johnpoz Quick and easy tests ftw, sorry for neglecting to see that request earlier. It would appear that you're correct!
$ dig @1.2.3.4 netgate.com
; <<>> DiG 9.18.24-1-Debian <<>> @1.2.3.4 netgate.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 10891
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;netgate.com. IN A
;; ANSWER SECTION:
netgate.com. 60 IN A 199.60.103.4
netgate.com. 60 IN A 199.60.103.104
;; Query time: 109 msec
;; SERVER: 1.2.3.4#53(1.2.3.4) (UDP)
;; WHEN: Fri Mar 08 16:39:50 EST 2024
;; MSG SIZE rcvd: 72
That seems like pretty solid evidence of a "security enhancement" courtesy of Nord. Thank you for suggesting that test, wish I'd thought of it!
-
@TheNarc yeah they are redirecting your dns - for a FACT.. because 1.2.3.4 isn't doing dns..
$ dig @1.2.3.4 netgate.com
; <<>> DiG 9.16.48 <<>> @1.2.3.4 netgate.com
; (1 server found)
;; global options: +cmd
;; connection timed out; no servers could be reached
I edited and added a few other ways to spot redirection.
here is example of another test.. Notice the aa in the flags when I talked directly to one of the authoritative ns for netgate
There are lots of ways to spot redirection.. Also another way is doing query to some server you know is X ms away, and getting a response that is no way possible.. you query abc ns on the other side of the planet like 200ms away, and you get a response in 10 ms or something - you know for sure you didn't actually talk to that ns on the other side of the planet ;)
edit:
Also notice the "recursion requested but not available" warning - pretty much all authoritative ns would not allow recursion - so you should see that in the response. At least anyone set up correctly would not do recursion; auth servers shouldn't allow recursion. So if you ask an authoritative server about some domain and you don't see that.. Either they are misconfigured and allowing it - or you have been redirected to some other server that is doing recursion.
-
@johnpoz Great information, thanks again. And can doubly (triply?) confirm by looking for the other "tells" you point out. No aa flag and a query time that drops from 258ms to 42ms when querying a root server twice in a row (most output excluded for brevity):
dig @192.33.4.12 arstechnica.com
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1
;; Query time: 258 msec
dig @192.33.4.12 arstechnica.com
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1
;; Query time: 42 msec
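The timing tell from those two runs can be pulled out of saved dig output as well. A small Python sketch, with hypothetical helper names (query_time_ms, faster_than_possible). The min_rtt_ms baseline is an assumption you would have to supply yourself, for example from pinging the server over a path you trust.

```python
import re


def query_time_ms(dig_output):
    """Extract N from dig's ';; Query time: N msec' line, or None if it is missing."""
    match = re.search(r";; Query time: (\d+) msec", dig_output)
    return int(match.group(1)) if match else None


def faster_than_possible(dig_output, min_rtt_ms):
    """True if the reply arrived sooner than the known minimum round trip to that server."""
    elapsed = query_time_ms(dig_output)
    return elapsed is not None and elapsed < min_rtt_ms


# The two trimmed runs quoted above.
first_run = ";; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1\n;; Query time: 258 msec"
second_run = ";; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1\n;; Query time: 42 msec"

# A big drop on an identical repeat query suggests the second reply came from a cache.
print(query_time_ms(first_run) - query_time_ms(second_run))  # prints 216
```

On its own a drop like this only hints at caching somewhere on the path. It becomes convincing alongside the flag checks, since a real root server would not be caching answers for arstechnica.com at all.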
-
@TheNarc yeah - unless your vpn can change the laws of physics and get you there faster ;) or your normal path is horrible ;) that is pretty good proof that you were redirected..
But there's another tell that the answer was redirected.. The root server you queried, "c.root-servers.net", wouldn't answer such a query in the first place ;)
He would send you back this
$ dig @c.root-servers.net arstechnica.com
; <<>> DiG 9.16.48 <<>> @c.root-servers.net arstechnica.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 42743
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 13, ADDITIONAL: 27
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: 6f21d9186b8310cf0100000065eb8b15aabe1bc77d5654a3 (good)
;; QUESTION SECTION:
;arstechnica.com. IN A
;; AUTHORITY SECTION:
com. 172800 IN NS a.gtld-servers.net.
com. 172800 IN NS c.gtld-servers.net.
com. 172800 IN NS h.gtld-servers.net.
com. 172800 IN NS i.gtld-servers.net.
com. 172800 IN NS f.gtld-servers.net.
com. 172800 IN NS g.gtld-servers.net.
com. 172800 IN NS b.gtld-servers.net.
com. 172800 IN NS d.gtld-servers.net.
com. 172800 IN NS e.gtld-servers.net.
com. 172800 IN NS k.gtld-servers.net.
com. 172800 IN NS l.gtld-servers.net.
com. 172800 IN NS m.gtld-servers.net.
com. 172800 IN NS j.gtld-servers.net.
;; ADDITIONAL SECTION:
m.gtld-servers.net. 172800 IN A 192.55.83.30
l.gtld-servers.net. 172800 IN A 192.41.162.30
k.gtld-servers.net. 172800 IN A 192.52.178.30
j.gtld-servers.net. 172800 IN A 192.48.79.30
i.gtld-servers.net. 172800 IN A 192.43.172.30
h.gtld-servers.net. 172800 IN A 192.54.112.30
g.gtld-servers.net. 172800 IN A 192.42.93.30
f.gtld-servers.net. 172800 IN A 192.35.51.30
e.gtld-servers.net. 172800 IN A 192.12.94.30
d.gtld-servers.net. 172800 IN A 192.31.80.30
c.gtld-servers.net. 172800 IN A 192.26.92.30
b.gtld-servers.net. 172800 IN A 192.33.14.30
a.gtld-servers.net. 172800 IN A 192.5.6.30
m.gtld-servers.net. 172800 IN AAAA 2001:501:b1f9::30
l.gtld-servers.net. 172800 IN AAAA 2001:500:d937::30
k.gtld-servers.net. 172800 IN AAAA 2001:503:d2d::30
j.gtld-servers.net. 172800 IN AAAA 2001:502:7094::30
i.gtld-servers.net. 172800 IN AAAA 2001:503:39c1::30
h.gtld-servers.net. 172800 IN AAAA 2001:502:8cc::30
g.gtld-servers.net. 172800 IN AAAA 2001:503:eea3::30
f.gtld-servers.net. 172800 IN AAAA 2001:503:d414::30
e.gtld-servers.net. 172800 IN AAAA 2001:502:1ca1::30
d.gtld-servers.net. 172800 IN AAAA 2001:500:856e::30
c.gtld-servers.net. 172800 IN AAAA 2001:503:83eb::30
b.gtld-servers.net. 172800 IN AAAA 2001:503:231d::2:30
a.gtld-servers.net. 172800 IN AAAA 2001:503:a83e::2:30
;; Query time: 14 msec
;; SERVER: 192.33.4.12#53(192.33.4.12)
;; WHEN: Fri Mar 08 16:03:02 Central Standard Time 2024
;; MSG SIZE rcvd: 868
Saying hey buddy I don't answer for those, here is where you can ask for that, and it just hands you the NSs for .com in its answer.
The root servers only answer for the NSs for the TLDs.. they won't answer for anything else..
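To make the difference between a real root referral and a redirected answer concrete, here is a rough Python sketch that classifies the header line of a dig reply to a query aimed at a root server. The function name classify_root_reply and its labels are invented for this example, and it assumes dig's usual ";; flags: ...; QUERY: n, ANSWER: n, AUTHORITY: n, ADDITIONAL: n" header format.

```python
import re

# dig's summary header, e.g.
# ";; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 13, ADDITIONAL: 27"
HEADER = re.compile(
    r";; flags:(?P<flags>[^;]*); QUERY: (?P<query>\d+), ANSWER: (?P<answer>\d+), "
    r"AUTHORITY: (?P<authority>\d+), ADDITIONAL: (?P<additional>\d+)"
)


def classify_root_reply(dig_output):
    """Label a reply to a query that was aimed at a root server.

    'referral'  -> no answers, NS records in AUTHORITY, no 'ra' flag (what a root sends)
    'recursive' -> answers present or 'ra' set (some recursive resolver replied instead)
    'unknown'   -> header line not found, or a shape this sketch does not cover
    """
    match = HEADER.search(dig_output)
    if not match:
        return "unknown"
    flags = set(match.group("flags").split())
    answers = int(match.group("answer"))
    authority = int(match.group("authority"))
    if answers == 0 and authority > 0 and "ra" not in flags:
        return "referral"
    if answers > 0 or "ra" in flags:
        return "recursive"
    return "unknown"
```

Fed the headers posted in this thread, the reply from c.root-servers.net comes out as a referral, while the replies with "flags: qr rd ra" come out as recursive. That includes the SERVFAIL @wfx got from 192.58.128.30, which carries the ra flag even though a root server does not offer recursion.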
-
@TheNarc you going to ask Nord wtf they are doing? Curious what they respond with.. ;)
-
OP here. Late to check back because with all of the troubleshooting that I was doing (and settings changes), my pfSense box became FUBAR. Oddly, even a restore didn't fix it. So I've been completely rebuilding it.
Happy to see a lot of discussion. Like other people in here, I am also using NordVPN. And just like @wfx, I also tried using Mullvad (lol) without any luck, which is why I was leaning away from VPN provider being the issue.
HOWEVER, unlike the other people in here, I did do a dig @8.8.8.8 netgate.com, which still failed to resolve. With that said, I was deep into modifying settings, so there may have been a DNS tweak that messed with things. I've almost got my box rebuilt, so I'll test over the weekend.
Side Question: Since both NordVPN and Mullvad seem to be greedy for DNS queries, are there any VPN providers that folks are having luck with that they would recommend?
-
@Generally-Lost In the past I have used Air VPN. If you trust your ISP check to see if they provide one. Mine does with my service.
-
@johnpoz I will yes. Although I walked away from the computer and won't be back for an hour or two (on a phone now) and also want to run another test. Because as so often happens, while on the treadmill and left to my own thoughts, I realized I'm stupid once again. Someone is absolutely redirecting my DNS, but that person is me. Or at least I'm first in line. I've got a port forward to prevent LAN clients from going around the VPN, and wouldn't you know it, I forgot about it until just now. So Nord may well be redirecting me too, but all my test proved was that I'm an idiot, and I'm redirecting me. As soon as I can I'll disable my redirect rule, test again, and post results. Should be able to within two hours.
-
@TheNarc that is actually good info to know.. Since maybe others enabled redirection? So yeah if you're redirecting yourself - you would see the same sort of tells of redirection as if your provider or vpn service was doing it..
-
@Generally-Lost said in DNS suddenly broken [on some VLANs]:
dig @8.8.8.8 netgate.com, which still failed to resolv
Well a directed query like that shouldn't fail - unless you were blocking it yourself, or upstream they were blocking it.. That would have nothing to do with the root servers blocking anything for sure.. Because that query asks 8.8.8.8 hey look this up for me, or hand me whatever you have in your cache for it.. So what did you get: a timeout, an NXDOMAIN, a SERVFAIL, a REFUSED?
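Since the kind of failure matters here, a tiny Python sketch for sorting saved dig output into those buckets may help when comparing notes. The name dig_outcome is hypothetical. It keys off the "no servers could be reached" message and the "status:" field that dig prints in its header.

```python
import re


def dig_outcome(dig_output):
    """Summarise a dig run as 'timeout', a lower-cased status such as 'servfail', or 'unknown'."""
    # dig prints this when it never got a reply from the server it was pointed at.
    if "no servers could be reached" in dig_output:
        return "timeout"
    # Otherwise the response code is in the ->>HEADER<<- line, e.g. "status: SERVFAIL".
    match = re.search(r"status: ([A-Z]+)", dig_output)
    return match.group(1).lower() if match else "unknown"
```

A timeout from 8.8.8.8 would point to something dropping the packets (a local rule or an upstream block), whereas a servfail or refused means some server did receive the query and replied, which is a different problem to chase.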