DNS Resolver Timeouts
-
I have two pfsense boxes in two different places with the same ISP. Both sites are connected by VPN.
- Site A has a static public IP subscription (WAN assigned a public IP). Clients use pihole as a DNS server and pihole forwards to the pfsense DNS resolver.
- Site B is behind a CGNAT (WAN assigned a private IP). It has the same pihole setup as site A.
Site A's DNS resolver (no forwarding) works perfectly. Very seldom do I see timeouts in Status -> DNS Resolver.
Site B's DNS resolver, however, experiences tons of timeouts, and these happen randomly throughout the day. You notice it immediately because Internet access halts for all devices on the network. In the past (a few months ago), using unbound and forwarding to, say, 8.8.8.8 or 1.1.1.1 solved the problem, but now even with forwarding enabled I get the same timeout issue. When doing a packet capture on the WAN interface, I see packets from the WAN IP to DNS servers but don't see replies. For example:
https://www.dropbox.com/s/pxkgn3usii38qrn/packetcapture%281%29.cap?dl=0 -> pcap when unbound is set as a resolver. Try filtering the "205.251.194.154" ip address and it should show no replies
https://www.dropbox.com/s/ile9lc2wmfjvmhi/packetcapture%282%29.cap?dl=0 -> pcap when unbound is set to forward to 1.1.1.1
This has been happening for almost a year now and my ISP's solution is to subscribe to a static IP subscription which does not make sense! I decided to create a new thread but this is basically a continuation of my thread here.
When I forward to the DNS server in site A (on the other end of the tunnel), I don't see these timeouts but DNS resolution sometimes fails. I'm guessing it's because DNS through a tunnel is not that reliable?
Just for the heck of it, I tried using dnsmasq overnight and it seems to be better. Although I cannot really confirm because there are no stats for dnsmasq unlike what you have in unbound in Status -> DNS Resolver. It's just that dnsmasq seems to have not acted up during my testing. Could just be a coincidence and that I need to wait more to conclude more accurately, I don't know.
At this point, what is my next best step? I'm really tired of talking to my ISP's customer support agents as they are pretty useless. Their network team does not even want to talk directly to customers, it's insane. I still want to continue troubleshooting this to get a more conclusive idea of what's happening. Any tips please?
Thanks.
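For anyone who wants to put a number on the missing replies in captures like the two linked above, here is a minimal Python sketch that lists DNS queries with no matching response. It is not a pfSense tool. It assumes the capture is a classic libpcap file (not pcapng) with Ethernet framing and IPv4/UDP traffic on port 53, and the function name `unanswered_dns_queries` is made up for this example.

```python
import struct


def unanswered_dns_queries(pcap_bytes):
    """Return (src, dst, dns_id) for each UDP/53 query that never got a reply.

    Assumptions of this sketch: classic pcap format, Ethernet link type,
    IPv4 only, no VLAN tags, microsecond-resolution magic number.
    """
    magic = pcap_bytes[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # file written on a little-endian machine
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # file written on a big-endian machine
    else:
        raise ValueError("not a classic pcap file")

    linktype = struct.unpack(endian + "I", pcap_bytes[20:24])[0]
    if linktype != 1:
        raise ValueError("only Ethernet captures are handled in this sketch")

    pending = {}  # (client_ip, client_port, server_ip, dns_id) -> True
    offset = 24   # skip the 24-byte global header
    while offset + 16 <= len(pcap_bytes):
        # Per-packet header: ts_sec, ts_usec, captured length, original length
        _, _, incl_len, _ = struct.unpack(
            endian + "IIII", pcap_bytes[offset:offset + 16]
        )
        frame = pcap_bytes[offset + 16:offset + 16 + incl_len]
        offset += 16 + incl_len

        # Keep only IPv4 frames (EtherType 0x0800)
        if len(frame) < 14 + 20 or frame[12:14] != b"\x08\x00":
            continue
        ip = frame[14:]
        ihl = (ip[0] & 0x0F) * 4
        # Protocol 17 is UDP; need 8 bytes of UDP plus 12 bytes of DNS header
        if ip[9] != 17 or len(ip) < ihl + 8 + 12:
            continue

        src = ".".join(map(str, ip[12:16]))
        dst = ".".join(map(str, ip[16:20]))
        sport, dport = struct.unpack("!HH", ip[ihl:ihl + 4])
        dns_id, flags = struct.unpack("!HH", ip[ihl + 8:ihl + 12])
        is_response = bool(flags & 0x8000)  # QR bit

        if not is_response and dport == 53:
            pending[(src, sport, dst, dns_id)] = True
        elif is_response and sport == 53:
            # A reply travels in the opposite direction of its query
            pending.pop((dst, dport, src, dns_id), None)

    return [(src, dst, dns_id) for (src, _, dst, dns_id) in pending]
```

To use it, read a capture into memory with something like `open("packetcapture.cap", "rb").read()` (the file name here is only a placeholder) and pass the bytes in. Retransmitted queries that reuse the same source port and ID are counted once, so the result is a lower bound on what was lost.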
-
@kevindd992002 said in DNS Resolver Timeouts:
and pihole forwards to the pfsense DNS resolver.
Almost the same setup everywhere (in our sys) and there is no problem.
WITHOUT Pi-hole! +++edit: of course! I know everyone is stoning me now, but why don't we forget this Pi-hole stuff (on a serious NGFW) and use pfBlocker instead?
(I think it also runs in a crappy container and/or on a Raspberry Pi.)
@kevindd992002 "I'm guessing it's because DNS through a tunnel is not that reliable?"
This surprised me. Why do you think so?
+++edit2:
Seriously, Pi-hole also has its place, but pfSense offers it all in one, so please take advantage.
-
@daddygo Let's not get into the pihole vs. pfblocker debate, please. Pihole is not the issue here. It has been serving me well for the past few months. I have no complaints.
As for your surprise regarding DNS through a tunnel not being reliable, I don't know the answer to that. It's exactly why I said "I'm guessing": what I experience is that even though there are no timeouts when using the DNS server through the tunnel interface, I do get DNS resolution errors when browsing the Internet. Refreshing the page multiple times usually solves the issue.
On the same note, I've been running dnsmasq in pfsense now for a couple of hours without any issues. So I'm not sure what the difference is between dnsmasq and unbound with forwarding. Why am I seeing a difference in performance when both do forwarding? My ultimate goal here is to use unbound as a resolver but I'm having major timeout problems when doing that.
-
@kevindd992002 said in DNS Resolver Timeouts:
Let's not get into the pihole vs. pfblocker debate,
OK
Believe me there should be no problem with DNS through the tunnel, if it were, many of us would sweat here...
Have you ever tried to analyze the transmission parameters of your tunnel?
and yes, I am definitely a fan of using Unbound and all the other things that come with the environment (NGFW), because they are coordinated with each other
-
@daddygo said in DNS Resolver Timeouts:
@kevindd992002 said in DNS Resolver Timeouts:
Let's not get into the pihole vs. pfblocker debate,
OK
Believe me there should be no problem with DNS through the tunnel, if it were, many of us would sweat here...
Have you ever tried to analyze the transmission parameters of your tunnel?
and yes, I am definitely a fan of using Unbound and all the other things that come with the environment (NGFW), because they are coordinated with each other
Right, that's what I thought too. There shouldn't be any issues with the tunnel for whatever type of traffic.
What are the "transmission parameters"? Sorry, I'm not an expert with VPN but I know the basics. How do you propose I start with my analysis?
Yes, I'm also a fan of unbound. I want to maintain my own DNS Resolver and I don't like relying on other DNS servers or forwarding client DNS queries to them, but I really don't have a choice at this point because of this very issue.
I still don't understand why dnsmasq seems to be working perfectly fine though. I'm not experiencing any timeouts with it so far (a few days into the testing now).
-
@kevindd992002 said in DNS Resolver Timeouts:
How do you propose I start with my analysis?
I don't know how far apart the endpoints are and how many ISPs are in this ...
but I would also take short- and longer-term measurements (on tunnel network):
short: iperf3
https://docs.netgate.com/pfsense/en/latest/packages/iperf.html
for long-term discipline I use this, free for 5 endpoints:
https://emcosoftware.com/ping-monitor
(I set it up and let it run for hours)
although I still suspect it will be a different issue than the tunnel itself
DNS:
this pairing is best in your case
https://github.com/synackray/dns-load-generator
https://www.wireshark.org/
-
@daddygo said in DNS Resolver Timeouts:
@kevindd992002 said in DNS Resolver Timeouts:
How do you propose I start with my analysis?
I don't know how far apart the endpoints are and how many ISPs are in this ...
but I would also take short- and longer-term measurements (on tunnel network):
short: iperf3
https://docs.netgate.com/pfsense/en/latest/packages/iperf.html
for long-term discipline I use this, free for 5 endpoints:
https://emcosoftware.com/ping-monitor
(I set it up and let it run for hours)
although I still suspect it will be a different issue than the tunnel itself
DNS:
this pairing is best in your case
https://github.com/synackray/dns-load-generator
https://www.wireshark.org/
Yeah, I don't have issues with using iperf3 across the tunnel. Pfsense also has gateway monitoring for the IPsec tunnel (routed VTI) and long term ping monitoring seems stable.
Using the DNS server across the tunnel is just my last resort. My real goal here is for each site to use their own DNS resolvers. Do you have any ideas why I'm getting a lot of timeouts when DNS resolver is enabled in site B? Can an ISP block DNS requests to external DNS servers if it's a resolver (as opposed to a forwarder)?
-
@kevindd992002 said in DNS Resolver Timeouts:
Do you have any ideas why I'm getting a lot of timeouts when DNS resolver is enabled in site B?
So, honestly not, because there is little info - here Wireshark can help solve this
@kevindd992002 "Can an ISP block DNS requests to external DNS servers if it's a resolver (as opposed to a forwarder)?"
it would be very annoying and unfair (from ISP, but we have already seen a crow on a stick)
a packet capture also shows this
-
@daddygo said in DNS Resolver Timeouts:
@kevindd992002 said in DNS Resolver Timeouts:
Do you have any ideas why I'm getting a lot of timeouts when DNS resolver is enabled in site B?
So, honestly not, because there is little info - here Wireshark can help solve this
@kevindd992002 "Can an ISP block DNS requests to external DNS servers if it's a resolver (as opposed to a forwarder)?"
it would be very annoying and unfair (from ISP, but we have already seen a crow on a stick)
a packet capture also shows this
What can Wireshark provide that the packet capture (from pfsense's tcpdump) I already provided doesn't? If you check the OP again, you'll see that I have packet captures there.
-
Might not be related.
But when i had unbound "DNS issues".
I had "ticked" register DHCP Leases in unbound
That made unbound restart every time a DHCP event happened, and made my system unusable.
Untick that DHCP Registration if set
/Bingo
-
@bingo600 Yeah, tried that already, didn't make a difference. In Site A, where unbound is working perfectly, I have that checked and the DHCP service restart is so fast that it is barely noticeable. These are both home sites and it's not like the leases of my few DHCP clients are always expiring.
-
@kevindd992002 said in DNS Resolver Timeouts:
What can Wireshark provide that the packet capture
you can see an online and / or real-time scan on the Wireshark screen - when you launch an action
do you pass this on site B?
(for this installation, for example, the DNS goes through a tunnel)
translate this, of course, into your example (GW A site)
-
@daddygo said in DNS Resolver Timeouts:
@kevindd992002 said in DNS Resolver Timeouts:
What can Wireshark provide that the packet capture
you can see an online and / or real-time scan on the Wireshark screen - when you launch an action
do you pass this on page B?
(for this installation, for example, the DNS goes through a tunnel)
translate this, of course, into your example (GW A site)
Sorry, what? What do you mean by "page B"?
Let's forget about the tunnel for now. Like I said, that is my last resort/workaround. Let's treat site B as an independent site without an S2S VPN. My goal here is to simply use unbound on site B as a resolver (not forwarder) without any issues.
-
@kevindd992002 said in DNS Resolver Timeouts:
Let's treat site B as an independent site without an S2S VPN
Okay then we misunderstand each other...
can you draw a quick diagram of what you want to achieve?
A site pfSense A
B site pfSense B
or exactly what
-
@daddygo said in DNS Resolver Timeouts:
@kevindd992002 said in DNS Resolver Timeouts:
Let's treat site B as an independent site without an S2S VPN
Okay then we misunderstand each other...
can you draw a quick diagram of what you want to achieve?
A site pfSense A
B site pfSense B
or exactly what
It looks like it, yes.
So I have two sites that are connected through IPsec VPN, yes, but I just gave that information here because it was one of the tests I had (using the DNS resolver on the far end of the tunnel).
Site A (main site)
- WAN interface has a public static IP
- no problems with being a DNS resolver (without forwarding)
Site B (remote site)
- WAN interface is assigned a private IP since it is behind a CGNAT
- when DNS resolver (without forwarding) is set, tons of timeouts are seen in Status -> DNS Resolver and the whole network is affected, browsing is very intermittent
- when DNS resolver (with forwarding to 1.1.1.1, or 8.8.8.8, or even to the ISP's own DNS servers) is set, same behavior, lots of timeouts. I must say though, that this was my workaround before like a few months ago and it worked. For some reason, it is also timing out these past few days I tested.
- when DNS Forwarder (dnsmasq) is enabled instead, everything is working properly. It's been almost two days without any issues.
- as soon as I go back to using DNS resolver (unbound), then the problem is immediately back
-
@kevindd992002 said in DNS Resolver Timeouts:
It looks like it, yes.
so I understand, so in terms of your question, it has nothing to do with A - B.
in summary:
- The pfSense installation which is used in point B, works with a timeout.... DNS, if UNBOUND is used
- in addition, it is behind CGNAT
Can you do a test with this for both conditions? (Unbound / Forwarder):
https://www.grc.com/dns/benchmark.htm
Finally, can you show your UNBOUND settings, such as:
-
You got it.
I have to get back to you after Christmas for that benchmark test (which I'm familiar with as I used it before). I'm physically at site A right now and while troubleshooting another issue with IPsec, I accidentally lost access to site B's pfsense and no one is physically there to undo what I did.
As for the settings, they are exactly the same as the unbound settings I have at site A, and here they are:
I don't have a DNS server in the DNS settings under General because I don't need one. I'm using unbound as a "resolver" so it queries the root hints directly. In the settings that you've shown, it looks like you're using unbound as a forwarder too, why?
-
How is this not the same exact problem you had before.. If you have a shit isp, then you have a shit isp..
Your previous thread showed loss on your isp.. If either of these sites is on that isp, or whatever isp they have is losing packets.. Then yes you can have issues, be it dns or anything else.
Doesn't matter if you forward or tunnel or whatever.. If your isp sucks it sucks.. Nothing pfsense can do about that.
Previous you had sniffs showing traffic leaving your wan, with no answer.. There is nothing pfsense can do to fix that..
-
@kevindd992002 said in DNS Resolver Timeouts:
it looks like you're using unbound as a forwarder too, why?
Forwarding Mode to 1.1.1.1 = general tab
as I try to achieve more privacy and greater security
CloudFlare / 853 DoT
-
@daddygo said in DNS Resolver Timeouts:
as I try to achieve more privacy and greater security
Well that sure isn't doing anything about that..
-
@johnpoz said in DNS Resolver Timeouts:
Well that sure isn't doing anything about that..
I say I'm trying
at least I don't interrogate root servers through my own ISP, hihihihi
-
@johnpoz said in DNS Resolver Timeouts:
How is this not the same exact problem you had before.. If you have a shit isp, then you have a shit isp..
Your previous thread showed loss on your isp.. If either of these sites its on that isp, or whatever isp they have is loosing packets.. Then yes you can have issue, be it dns or anything else.
Doesn't matter if you forward or tunnel or whatever.. If your isp sucks it sucks.. Nothing pfsense can do about that.
Previous you had sniffs showing traffic leaving your wan, with no answer.. There is nothing pfsense can do to fix that..
Right, I just actually continued that old thread to this thread to make it "cleaner". The only new information I have now is that I tried with dnsmasq and it seems to have no timeouts. As to why, I don't know. But I was still having problems with unbound set as forwarder.
If you see my packet captures in the OP of this thread, it still does show traffic leaving the WAN and not getting any replies back. You're still right, I'm still pushing hard for my ISP to fix this shit, but what I don't understand is why dnsmasq seems to be working just fine?
-
@kevindd992002 said in DNS Resolver Timeouts:
I'm still pushing hard for my ISP to fix this shit
Indeed, if you have a shitty ISP, there's nothing you can do, but with the tests I suggested above such problems are caught quickly
-
@daddygo said in DNS Resolver Timeouts:
I say I'm trying
But all you have accomplished is handing your info off to someone else on a silver platter. With explicit trust of what they hand you back.. You're sure not hiding anything from your ISP with that.. Since they still know every IP you go to, and if they wanted to, they could simply sniff the sni of any https traffic to know what specific domain you're going to.. Just like they could with your dns.
So what are you trying to hide from the root servers?
Oh - the other thing you did accomplish is slowing down dns.. Guess you got that going for you ;)
-
@johnpoz said in DNS Resolver Timeouts:
@daddygo said in DNS Resolver Timeouts:
I say I'm trying
But all you have accomplished is handing your info off to someone else on silver platter. With explicit trust of what they hand you back.. Your sure not hiding anything from your ISP that.. Since they still know every IP you go to, and simple if they wanted to to just sniff your sni for any https traffic to know what specific domain your going to.. Just like they could with your dns.
So what your trying to hide from the root servers?
Oh - the other thing you did accomplish is slowing down dns.. Guess you got that going for you ;)
@DaddyGo sorry but I'm with @johnpoz on this one. He is completely right. If you're using unbound, then its primary purpose should be as a "resolver", like what I've been telling you in my earlier posts. I guess you misunderstood again.
-
@johnpoz said in DNS Resolver Timeouts:
Oh - the other thing you did accomplish is slowing down dns.. Guess you got that going for you ;)
I'm not that simple.....
look at the following...it's not that bad (3 ms)
which I did not show...... where is the ISP here.....
+++edit:
our ISP can't even set foot on us, it can only see the VPN IP and that's it
-
You are handing info over to company B because you don't trust company A, while company A still has all this info (if they want it), and you don't even know if company A is doing anything with that info in the first place. That in no way, shape or form increases privacy or security. If anything it lowers both of those..
I could see doing DoT if, for example, you knew that company A was intercepting your dns and messing with it..
But unless company A is doing that, forwarding all your dns to company B does not provide anything of value..
edit:
You're doing a query through a vpn, to cloudflare over DoT, in 3 ms.. Sorry but BS!!
edit: So you have hidden your traffic from your ISP with your vpn.. You have hidden your IP from the bad old root servers. But now you have handed over all your dns to xyz dns provider.. So how does that again do anything for privacy or security... You have just handed over all your info on a silver platter is all..
You have just traded who sees where you're going via IP and sni, from your isp to your vpn provider.. How does that improve anything? Again, unless you know your isp is messing with this traffic or filtering it, etc.
-
@johnpoz said in DNS Resolver Timeouts:
Your doing a query through a vpn, to cloudflare over dot in 3 ms.. Sorry but BS!!
I know your opinion on this theme (DNS) John, so I do not argue...
indeed, you are half right, but he / she who does nothing will stick his / her head in the sand...or rather I quote :
As Edward Snowden says:
“Arguing that you don’t care about the right to privacy because you have nothing to hide is no different than saying you don’t care about free speech because you have nothing to say.”
+++edit:
otherwise pls. name a secure third party DNS provider, 1.1.1.1 is only because we have a lot of services running on them, otherwise we use ExpVPN DNS servers / VPN servers
They run in RAM and restart every 24 hours
good old root servers:
-
@kevindd992002 said in DNS Resolver Timeouts:
I guess you misunderstood again.
for sure, that's right
-
@daddygo So to try and resolve this problem for Site B, I want to use unbound on Site B and forward all requests to the unbound in Site A (which acts as a real unbound, not forwarding, DNS server). The problem is when I do this, I still get DNS query timeouts even though the unbound server in Site A is perfectly working. This is randomly happening and is evident when shows me entries with a "retried" status:
When I do a DNS bench test, I get 100% results, but then again this test is only run for a few seconds and does not catch the moments when the drops are happening:
I also get a fairly stable IPsec tunnel between the two sites:
So I'm not sure why there are DNS drops here. How can I troubleshoot further?
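Since a benchmark that runs for a few seconds can miss intermittent drops, one option is a probe that keeps sending a query at a fixed interval for hours and logs every timeout with a timestamp. Below is a minimal Python sketch of that idea using only the standard library. The names `build_query`, `probe_once` and `monitor` are made up for this example, and the 2-second timeout and 5-second interval are arbitrary assumptions, not values taken from unbound or pfSense.

```python
import random
import socket
import struct
import time


def build_query(name, qtype=1):
    """Encode a minimal DNS query (recursion desired) for `name`.

    Returns (query_id, packet). qtype 1 is an A record, class is IN.
    """
    qid = random.randrange(0x10000)
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in name.strip(".").split(".")
    )
    return qid, header + labels + b"\x00" + struct.pack("!HH", qtype, 1)


def probe_once(server, name, port=53, timeout=2.0):
    """Send one query over UDP; return round-trip seconds or None on timeout."""
    qid, packet = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        sock.sendto(packet, (server, port))
        try:
            while True:
                data, _ = sock.recvfrom(4096)
                # Ignore anything that does not carry our query ID
                if len(data) >= 2 and struct.unpack("!H", data[:2])[0] == qid:
                    return time.monotonic() - start
        except socket.timeout:
            return None


def monitor(server, name, interval=5.0, count=None):
    """Probe repeatedly, printing each timeout; returns (sent, lost)."""
    sent = lost = 0
    while count is None or sent < count:
        rtt = probe_once(server, name)
        sent += 1
        if rtt is None:
            lost += 1
            print(time.strftime("%Y-%m-%d %H:%M:%S"),
                  f"TIMEOUT {server} ({lost}/{sent} lost)")
        time.sleep(interval)
    return sent, lost
```

Calling `monitor("192.168.1.1", "example.com")` from a LAN client would run until interrupted; the address is only a placeholder for whichever DNS server is under test. Running one instance against the local unbound and another against the far-end server at the same time would show whether the drops line up, which a short benchmark cannot.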
-
@johnpoz I see a lot of these errors in unbound:
Any thoughts?
-
You have a binding issue there.. Do you have something already using the same port, something trying to run bind with control on 953? Something else?
If you can not bind then no, you can not send to..
What else do you have running that could be trying to use 53 or 953? Do you have bind installed?
Or are you trying to bind to an address that is not there, like a vpn interface? Use localhost as your outbound interface.
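A quick way to check the two causes mentioned above (port already in use, or an address that does not currently exist on the box) is to try the bind yourself. The following Python sketch is an illustration only: `can_bind_udp` is a made-up helper, and it says nothing about how unbound itself opens its sockets.

```python
import errno
import socket


def can_bind_udp(address, port=0):
    """Try to bind a UDP socket to (address, port).

    Returns (True, "ok") on success, otherwise (False, symbolic errno name).
    Port 0 asks the OS for any free port, which tests only the address.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        try:
            sock.bind((address, port))
        except OSError as exc:
            return False, errno.errorcode.get(exc.errno, str(exc))
        return True, "ok"
```

Typically a result of `EADDRNOTAVAIL` means the address is not assigned to any interface at that moment (for example a tunnel interface that is down), while `EADDRINUSE` with a specific port points to another process holding it.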
-
@johnpoz said in DNS Resolver Timeouts:
You have a binding issue there.. you have something using the same port already trying to run bind with control 953? Something else.
If you can not bind then no you can not sent to..
What else do you have running that could be trying to use 53 or 953? Do you have bind installed?
Or you trying to bind to an address that is not there, like a vpn interface - use localhost as your outbound interface.
Well, I mean I have dnsmasq disabled, of course. I don't have BIND installed. These are all my packages:
These are my outbound interfaces:
So yes, I'm binding to a VPN interface. I was using only localhost before (because that was your suggestion when we were talking about another issue) but that was when I was using OpenVPN, where outbound NAT through the OpenVPN interface actually works. Outbound NAT is needed so that the return traffic for DNS requests from one site to another comes back properly to the source device. Now that I'm using an IPsec tunnel, outbound NAT does not work, which is a known issue, so I didn't have much choice but to use those individual interfaces as outbound interfaces in unbound. Does it matter though?
-
Well my take is that you are having issues with the ipsec interface then.. If the connection goes up and down or is having issues, then sure, unbound could have issues sending on that interface.
Not sure how you think sending dns over a vpn is going to fix a connectivity problem.. If you have connectivity issues over this connection, then you're going to have problems..
Putting the traffic inside a tunnel is just going to make it harder to troubleshoot that..
-
@johnpoz said in DNS Resolver Timeouts:
Well my take is that you having issues with ipsec interface then.. If the connection is updown or having issue then sure unbound could have issues sending on that interface.
Not sure how you think sending dns over a vpn is going to fix a connectivity problem.. If you have connectivity issues over this connection, then your going to have problems..
Putting the traffic inside a tunnel is just going to make it harder to troubleshoot that..
That's the thing, the IPsec interface is very stable (as you can see in the graph ping monitor). I'm also using it for a couple of site-to-site traffic traversal and I don't have any issues with it.
Well, since the IPsec tunnel is stable, forwarding DNS requests to the DNS server on the other side "can" serve as a workaround. I'm just testing it because sending DNS requests from a branch site to a main site is kinda common in an enterprise environment, so why not try it in my home setup.
-
Well your remote site clearly doesn't think something is stable, or is having issues, if you're getting errors like you posted..
You have something wrong, that is clear - what that something is, is the tricky part.. If your vpn is stable - why not move the dns function off the pfsense box and just route the traffic through pfsense.
-
@johnpoz said in DNS Resolver Timeouts:
Well your remote site clearly doesn't think something is stable or is having issues if your getting errors like you posted..
You have something wrong that is clear - what that something is, is the tricky part.. If your vpn is stable - why not move the dns function off the pfsense box and just route the traffic through pfsense.
Right, it's just unbound though so I don't know.
Yeah, I thought of that as well. Since I have pihole anyway, I probably can try forwarding from pihole directly to the DNS servers and see if there's any difference. The only downside to that is I lose the static DHCP DNS entries I have in pfsense.
-
Can these system tunables be related to any of the issues I'm having?
I disabled redirect because it was recommended in a thread about the PCEngines APU2C4 with pfsense.
-
@kevindd992002 said in DNS Resolver Timeouts:
it was recommended in a thread about the PCEngines APU2C4 with pfsense.
Why would that be an issue.. with some specific box? Doesn't make any sense to me at all.. Prob yet another idiot on the net thinking something they changed had some effect on whatever issue they were having without a clue..
What is the reasoning behind why some apu2c4 would have issues with redirects?
I don't see how that could be causing an issue with unbound.. Or your sendto or binding errors.
I lose the static DHCP DNS entries I have in pfsense.
No, you could have a conditional forward set up on your downstream dns to query pfsense for those.. Domain Override is what it's called in unbound.
Do you even have a pppoe connection? I believe I found the thread where that was mentioned, due to a kernel problem in freebsd.. But my take is that it is related to pppoe connections?
And corrected in 2.4.5?
-
@johnpoz said in DNS Resolver Timeouts:
@kevindd992002 said in DNS Resolver Timeouts:
it was recommended in a thread about the PCEngines APU2C4 with pfsense.
Why would that be an issue.. with some specific box? Doesn't make any sense to me at all.. Prob yet another idiot on the net thinking something they changed had some effect on whatever issue they were having without a clue..
What is the reasoning behind why some apu2c4 would have issues with redirects?
I don't see how that could be causing an issue with unbound.. Or your sendto or binding errors.
Here's some specific posts about it and it was explained in detail:
https://forum.netgate.com/post/908003
https://forum.netgate.com/post/908186
https://forum.netgate.com/post/908187
I don't think @dugeem is an idiot at all. He knows his stuff, from the looks of it. The issue is not APU2C4 specific as explained in the posts.
But yeah, I'm just thinking hard of all the "basic" modifications I did so far with pfsense to see if I messed up something but I doubt it because there isn't really a lot of modifications here.
Oh, you're right! I forgot about domain override. Yeah, that makes sense.
Another option (just for the heck of it) I'm testing now is to use dnsmasq to forward to the main site DNS server.