Strange issue - not sure how to fix
-
Fair enough. I appreciate your diligent help over the last couple of days with this vexing issue. Along the way, I have learned a lot thanks to you. Much appreciated. I will continue to monitor this to see if the problem recurs, and will post again if it does. Thanks again.
-
No problem - glad finally figured out the issue.
-
@johnpoz said in Strange issue - not sure how to fix:
No problem - glad finally figured out the issue.
... hopefully! I am keeping my fingers crossed.
-
Looks like I spoke too soon. The problem recurred, and the output of the dig feedly.com +trace command shows that the root servers are not being reached.
; <<>> DiG 9.12.2-P1 <<>> feedly.com +trace
;; global options: +cmd
. 86400 IN NS b.root-servers.net.
. 86400 IN NS a.root-servers.net.
. 86400 IN NS e.root-servers.net.
. 86400 IN NS m.root-servers.net.
. 86400 IN NS l.root-servers.net.
. 86400 IN NS j.root-servers.net.
. 86400 IN NS g.root-servers.net.
. 86400 IN NS c.root-servers.net.
. 86400 IN NS h.root-servers.net.
. 86400 IN NS d.root-servers.net.
. 86400 IN NS k.root-servers.net.
. 86400 IN NS f.root-servers.net.
. 86400 IN NS i.root-servers.net.
. 86400 IN RRSIG NS 8 0 518400 20200307170000 20200223160000 33853 . GN9hZh6mOFruU2IWiP4EIvALgU6uQLlXo748wScmwsJYCcmPiPFT6y2q NnsJfg06OrI2qhZueL0NNtcZ5W9hGLFff3nzUcOETUnEWcbW4MwIRWDx VQ4MVMmsnIhWM3BCQdA5hG0eIALwJ+9q3aUe+lHhORN98lpYxfs+tx73 A+GgmNZUm4Coz44hmhJ6G+mM0mYsMLZ1oAvDH/exgo/VExwEA9P3xyRQ b5H09yJdc0cdmygbD8R1L/yjyQUlnyKLOC8ZQ3bpei9NKRXWqv5p29cn pwt4AiaAuZNkCVQA9SIWIKdFVrBh40NsO+RDpEcmh84r30wTVm+qYGT4 PItLag==
;; Received 525 bytes from 127.0.0.1#53(127.0.0.1) in 4256 ms
;; connection timed out; no servers could be reached
But I am able to connect to Feedly because there is a cached entry in Unbound - at least until the cache expires. Back to being mystified.
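A note on reading a trace like this: dig +trace fetches the root NS list from the local resolver and then queries the root servers itself, so a single "Received ... from 127.0.0.1" line followed by a timeout means the failure is on the hop from the firewall to the roots, not in the lookup against Unbound. Below is a minimal Python sketch of that check, assuming the dig output has been saved as text. The function names are hypothetical and not part of any tool.

```python
import re

def trace_hops(dig_output):
    """Return the server address from every ';; Received N bytes from ADDR#PORT' line."""
    return re.findall(r";; Received \d+ bytes from ([^#\s]+)#\d+", dig_output)

def stalled_after_local_resolver(dig_output):
    """True when only the local resolver answered and dig then timed out."""
    hops = trace_hops(dig_output)
    timed_out = "connection timed out" in dig_output
    return timed_out and hops == ["127.0.0.1"]
```

On the trace above this would report True: one answer from 127.0.0.1, then the timeout.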
-
Any chance this NAT rule could be causing the problem?
Source any/any
Destination (non-local DNS resolver)/port 53
Redirect to 127.0.0.1:53
Would this be blocking access to the root servers?
-
No, you're just redirecting DNS that a client sends to, say, 8.8.8.8 so that it goes to pfSense instead..
Was unbound restarting when you tested that? Did you sniff while you did that? Did you see anything go out? Did you see answers? Look to see how long unbound was running after that..
You sure you're not just restarting unbound all the time? Do you have it registering DHCP?
-
Then I am stumped. I really don't understand what is blocking access to the root servers.
No, unbound was not restarting when I posted the last snippet last night.
Here is the output for unbound now:
version: 1.9.1
verbosity: 1
threads: 4
modules: 2 [ validator iterator ]
uptime: 33792 seconds
options: control(ssl)
unbound (pid 79706) is running...
(I rebooted my pfsense last night after my previous post - it has been running since then.)
Yes, Unbound is registering DHCP. Could that be the issue here? I have always had it set up like that (for many years) with none of these problems.
My suspicion is that the answer lies somewhere with pfBlockerNG, its updates, and their effect on the DNS resolver. But I am not sure exactly where, or how to troubleshoot or resolve it beyond disabling pfBlockerNG. For now, I have reduced the frequency of pfBlockerNG updates/cron jobs to once daily, which will make it easier for me to figure out whether the failure of DNS resolution coincides with that process.
-
Removing dhcp leases from Unbound will prevent it restarting anywhere near as often. If that stops this issue happening it will be a clue at least.
You can hit a critical point where the lists pfBlocker puts into Unbound are large enough to make it slow to start, and with enough clients creating DHCP leases the restarts end up running together and bad things happen. Though that usually takes a lot. Of both.
Steve
-
@stephenw10 said in Strange issue - not sure how to fix:
Removing dhcp leases from Unbound will prevent it restarting anywhere near as often. If that stops this issue happening it will be a clue at least.
You can hit a critical point where the lists pfBlocker puts into Unbound are large enough to make it slow to start, and with enough clients creating DHCP leases the restarts end up running together and bad things happen. Though that usually takes a lot. Of both.
Steve
I have not observed frequent Unbound restarts, so I am not sure whether that is part of the issue. (I do have DHCP leases in Unbound, but these leases have been there for years and have not previously caused any issues for me).
As I noted earlier, I reduced the frequency of pfBlockerNG updates to once daily. The first such update occurred at midnight last night, and this AM, I was in fact able to access feedly.com (good). However, when I checked the log for pfBlockerNG, every single feed had the following message:
Downloading update [ 02/25/20 00:26:26 ] . cURL Error: 7 Retry in 5 seconds... . cURL Error: 7 Retry in 5 seconds... . cURL Error: 7 Retry in 5 seconds... .. unknown http status code | 0
i.e. no feeds could be updated. I know for a fact that after I reboot pfSense, the pfBlockerNG list updates work, because I have previously checked the log at that time to ensure that the lists updated. Is this evidence that pfSense can't reach the root servers to resolve DNS? And if so, does this get me any closer to identifying the issue in my case? I should point out that I don't have a ton of lists active in pfBlocker, and my pfSense has plenty of RAM available.
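A note on reading that log, assuming pfBlockerNG is printing libcurl's own error number: in libcurl's numbering, 7 is CURLE_COULDNT_CONNECT (the connection to the host failed), whereas a name that cannot be resolved is reported as 6, CURLE_COULDNT_RESOLVE_HOST. Below is a rough Python sketch for tallying the codes in a saved log. It assumes the log contains entries in the "cURL Error: N" form shown above, and the function name is made up for illustration.

```python
import re
from collections import Counter

# Small subset of libcurl's documented error codes.
CURL_ERRORS = {
    6: "CURLE_COULDNT_RESOLVE_HOST",
    7: "CURLE_COULDNT_CONNECT",
    28: "CURLE_OPERATION_TIMEDOUT",
}

def summarize_curl_errors(log_text):
    """Count each 'cURL Error: N' occurrence, labelled by libcurl name when known."""
    codes = [int(code) for code in re.findall(r"cURL Error:\s*(\d+)", log_text)]
    return Counter(CURL_ERRORS.get(code, "unknown ({})".format(code)) for code in codes)
```

A log made up entirely of code 7 points at connectivity (routing, gateway, firewall) at least as much as at name resolution.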
output of dig feedly.com +trace command right now:
; <<>> DiG 9.12.2-P1 <<>> feedly.com +trace
;; global options: +cmd
. 86361 IN NS d.root-servers.net.
. 86361 IN NS c.root-servers.net.
. 86361 IN NS i.root-servers.net.
. 86361 IN NS l.root-servers.net.
. 86361 IN NS b.root-servers.net.
. 86361 IN NS g.root-servers.net.
. 86361 IN NS e.root-servers.net.
. 86361 IN NS m.root-servers.net.
. 86361 IN NS f.root-servers.net.
. 86361 IN NS h.root-servers.net.
. 86361 IN NS a.root-servers.net.
. 86361 IN NS k.root-servers.net.
. 86361 IN NS j.root-servers.net.
. 86361 IN RRSIG NS 8 0 518400 20200309050000 20200225040000 33853 . H9pXeT3s7yemEb6BFdL+eF38hP6nN2fY+S5UD+yM/06AdAlAVOo2wEBW oPAeplPIb1Zb7qNunqUQIHn+FDZqt79A43Poa3OevXxsQERuBcat6IkX v4H6/b3pGsA40KH1GfcbAnz8QlBhHrGIoFlgrQnvltIt7k9pCqA+D3iq YjMz/3dTlePWDQggN3Rao/nn6ZLO+4LF+FM52XTkCgXFPU5Ska6qzm5G 2gDOI7dpyk5DhHj2uc6IrsZWjwgG897Ba1G1fKZxUt8A208O58m4tbs1 ccrHmkfeVdQOwWOBOPCsnYQLtY8BVGL35U6VkGRdqU4U6jeGEFD0A2Pe QfIDkg==
;; Received 525 bytes from 127.0.0.1#53(127.0.0.1) in 0 ms
;; connection timed out; no servers could be reached
(feedly.com is reachable from my network - DNS resolver must be serving up a cached address).
Unbound has been up for a while (33+ hours, i.e. since my last reboot):
version: 1.9.1
verbosity: 1
threads: 4
modules: 2 [ validator iterator ]
uptime: 119718 seconds
options: control(ssl)
unbound (pid 79706) is running...
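For anyone checking the same thing: the uptime figure in the status output is in seconds, so it needs converting before comparing it with the time of the last reboot. Below is a small Python sketch, assuming the status text has been saved to a string. The function name is hypothetical.

```python
import re

def uptime_hours(status_text):
    """Extract the 'uptime: N seconds' value from Unbound status text and return hours."""
    match = re.search(r"uptime:\s*(\d+)\s*seconds", status_text)
    if match is None:
        raise ValueError("no uptime line found in status text")
    return int(match.group(1)) / 3600
```

With the value above (119718 seconds) this comes to a little over 33 hours, so Unbound has not restarted since the reboot.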
Other data:
The following name servers are used for lookup of .
;rrset 85994 13 1 11 5
. 85994 IN NS d.root-servers.net.
. 85994 IN NS c.root-servers.net.
. 85994 IN NS i.root-servers.net.
. 85994 IN NS l.root-servers.net.
. 85994 IN NS b.root-servers.net.
. 85994 IN NS g.root-servers.net.
. 85994 IN NS e.root-servers.net.
. 85994 IN NS m.root-servers.net.
. 85994 IN NS f.root-servers.net.
. 85994 IN NS h.root-servers.net.
. 85994 IN NS a.root-servers.net.
. 85994 IN NS k.root-servers.net.
. 85994 IN NS j.root-servers.net.
. 85994 IN RRSIG NS 8 0 518400 20200309050000 20200225040000 33853 . H9pXeT3s7yemEb6BFdL+eF38hP6nN2fY+S5UD+yM/06AdAlAVOo2wEBWoPAeplPIb1Zb7qNunqUQIHn+FDZqt79A43Poa3OevXxsQERuBcat6IkXv4H6/b3pGsA40KH1GfcbAnz8QlBhHrGIoFlgrQnvltIt7k9pCqA+D3iqYjMz/3dTlePWDQggN3Rao/nn6ZLO+4LF+FM52XTkCgXFPU5Ska6qzm5G2gDOI7dpyk5DhHj2uc6IrsZWjwgG897Ba1G1fKZxUt8A208O58m4tbs1ccrHmkfeVdQOwWOBOPCsnYQLtY8BVGL35U6VkGRdqU4U6jeGEFD0A2PeQfIDkg== ;{id = 33853}
;rrset 86035 1 0 8 3
j.root-servers.net. 86035 IN A 192.58.128.30
;rrset 86035 1 0 8 3
j.root-servers.net. 86035 IN AAAA 2001:503:c27::2:30
;rrset 86035 1 0 8 3
k.root-servers.net. 86035 IN A 193.0.14.129
;rrset 86035 1 0 8 3
k.root-servers.net. 86035 IN AAAA 2001:7fd::1
;rrset 86035 1 0 8 3
a.root-servers.net. 86035 IN A 198.41.0.4
;rrset 86035 1 0 8 3
a.root-servers.net. 86035 IN AAAA 2001:503:ba3e::2:30
;rrset 86035 1 0 8 3
h.root-servers.net. 86035 IN A 198.97.190.53
;rrset 86035 1 0 8 3
h.root-servers.net. 86035 IN AAAA 2001:500:1::53
;rrset 86034 1 0 8 3
f.root-servers.net. 86034 IN A 192.5.5.241
;rrset 86035 1 0 8 3
f.root-servers.net. 86035 IN AAAA 2001:500:2f::f
;rrset 86034 1 0 8 3
m.root-servers.net. 86034 IN A 202.12.27.33
;rrset 86034 1 0 8 3
m.root-servers.net. 86034 IN AAAA 2001:dc3::35
;rrset 86034 1 0 8 3
e.root-servers.net. 86034 IN A 192.203.230.10
;rrset 86034 1 0 8 3
e.root-servers.net. 86034 IN AAAA 2001:500:a8::e
;rrset 86034 1 0 8 3
g.root-servers.net. 86034 IN A 192.112.36.4
;rrset 86034 1 0 8 3
g.root-servers.net. 86034 IN AAAA 2001:500:12::d0d
;rrset 86034 1 0 8 3
b.root-servers.net. 86034 IN A 199.9.14.201
;rrset 86034 1 0 8 3
b.root-servers.net. 86034 IN AAAA 2001:500:200::b
;rrset 86034 1 0 8 3
l.root-servers.net. 86034 IN A 199.7.83.42
;rrset 86034 1 0 8 3
l.root-servers.net. 86034 IN AAAA 2001:500:9f::42
;rrset 86034 1 0 8 3
i.root-servers.net. 86034 IN A 192.36.148.17
;rrset 86034 1 0 8 3
i.root-servers.net. 86034 IN AAAA 2001:7fe::53
;rrset 86034 1 0 8 3
c.root-servers.net. 86034 IN A 192.33.4.12
;rrset 86034 1 0 8 3
c.root-servers.net. 86034 IN AAAA 2001:500:2::c
;rrset 86034 1 0 8 3
d.root-servers.net. 86034 IN A 199.7.91.13
;rrset 86034 1 0 8 3
d.root-servers.net. 86034 IN AAAA 2001:500:2d::d
Delegation with 13 names, of which 0 can be examined to query further addresses.
It provides 26 IP addresses.
2001:500:2d::d not in infra cache.
199.7.91.13 rto 315 msec, ttl 494, ping 3 var 78 rtt 315, tA 0, tAAAA 0, tother 0, EDNS 0 probed.
2001:500:2::c not in infra cache.
192.33.4.12 rto 199 msec, ttl 494, ping 11 var 47 rtt 199, tA 0, tAAAA 0, tother 0, EDNS 0 probed.
2001:7fe::53 not in infra cache.
192.36.148.17 rto 344 msec, ttl 494, ping 44 var 75 rtt 344, tA 0, tAAAA 0, tother 0, EDNS 0 probed.
2001:500:9f::42 not in infra cache.
199.7.83.42 rto 195 msec, ttl 494, ping 11 var 46 rtt 195, tA 0, tAAAA 0, tother 0, EDNS 0 probed.
2001:500:200::b not in infra cache.
199.9.14.201 rto 308 msec, ttl 112, ping 36 var 68 rtt 308, tA 0, tAAAA 0, tother 0, EDNS 0 probed.
2001:500:12::d0d not in infra cache.
192.112.36.4 not in infra cache.
2001:500:a8::e not in infra cache.
192.203.230.10 rto 195 msec, ttl 494, ping 3 var 48 rtt 195, tA 0, tAAAA 0, tother 0, EDNS 0 probed.
2001:dc3::35 not in infra cache.
202.12.27.33 rto 360 msec, ttl 494, ping 8 var 88 rtt 360, tA 0, tAAAA 0, tother 0, EDNS 0 probed.
2001:500:2f::f not in infra cache.
192.5.5.241 rto 229 msec, ttl 494, ping 1 var 57 rtt 229, tA 0, tAAAA 0, tother 0, EDNS 0 probed.
2001:500:1::53 not in infra cache.
198.97.190.53 rto 204 msec, ttl 494, ping 12 var 48 rtt 204, tA 0, tAAAA 0, tother 0, EDNS 0 probed.
2001:503:ba3e::2:30 not in infra cache.
198.41.0.4 rto 145 msec, ttl 494, ping 13 var 33 rtt 145, tA 0, tAAAA 0, tother 0, EDNS 0 probed.
2001:7fd::1 not in infra cache.
193.0.14.129 not in infra cache.
2001:503:c27::2:30 not in infra cache.
192.58.128.30 rto 163 msec, ttl 494, ping 15 var 37 rtt 163, tA 0, tAAAA 0, tother 0, EDNS 0 probed.
-
What I posted above would seem to confirm that the root servers are reachable by my system. So I am not sure why the dig feedly.com +trace command yields the output I posted above. But feedly is still reachable via cached DNS:
; <<>> DiG 9.12.2-P1 <<>> feedly.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 18657
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;feedly.com. IN A
;; ANSWER SECTION:
feedly.com. 0 IN A 104.20.59.241
feedly.com. 0 IN A 104.20.60.241
;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Tue Feb 25 07:24:23 EST 2020
;; MSG SIZE rcvd: 71
-
@johnpoz said in Strange issue - not sure how to fix:
dig @h.root-servers.net com NS
This was just now.
dig: couldn't get address for 'h.root-servers.net': not found
; <<>> DiG 9.12.2-P1 <<>> @198.97.190.53 com ns
; (1 server found)
;; global options: +cmd
;; connection timed out; no servers could be reached
I am really stumped by all this.
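The same direct test, one UDP query sent straight to a root server address with no resolver involved, can be sketched in Python. This is only an illustration, assuming Python is available on a machine behind the firewall. build_query and can_reach are hypothetical helper names, and 198.97.190.53 is the h.root-servers.net address used in the dig command above.

```python
import socket
import struct

def build_query(name, qtype=2, query_id=0x1234):
    """Build a minimal DNS query packet (qtype 2 = NS, class 1 = IN)."""
    # Header: id, flags (0 = standard query, recursion not requested),
    # one question, no answer/authority/additional records.
    header = struct.pack("!HHHHHH", query_id, 0, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in name.strip(".").split(".")
        if part
    )
    question = labels + b"\x00" + struct.pack("!HH", qtype, 1)
    return header + question

def can_reach(server_ip, timeout=3.0):
    """Return True if any UDP reply comes back from server_ip port 53 in time."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query("com"), (server_ip, 53))
            sock.recvfrom(4096)
            return True
        except OSError:  # includes socket timeouts and "no route to host"
            return False
```

Calling can_reach("198.97.190.53") should come back False while the problem is present. Because it queries by IP address, a failure here cannot be blamed on name resolution.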
-
Because unbound is not running.. Why don't you turn off pfBlocker for a bit and see if you continue to have issues?
-
@johnpoz said in Strange issue - not sure how to fix:
Because unbound is not running.. Why don't you turn off pfBlocker for a bit and see if you continue to have issues?
My previous post showed that unbound IS running. That's what makes this so perplexing.
I am at work now, but when I return home, I am going to give your suggestion of disabling pfBlocker a shot to see what I can discover.
-
Popping back in here. I think the issue might be solved. After searching these forums, I came across a post in this thread (https://forum.netgate.com/topic/147092/curl-error-7-on-all-downloads/8) that noted curl errors in pfBlockerNG after the default WAN gateway had been changed. I have been observing the same errors when pfBlockerNG updates, and lo and behold, my default gateway had also changed from what I had originally set. I changed it back to what it should be, and instantly DNS began to resolve. However, I am not sure how/why this unintended gateway change occurred, or how to prevent it from happening again.
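To catch an unintended change like this sooner, the default route can be checked from the routing table. Below is a rough Python sketch that pulls the default route out of saved netstat -rn output. It assumes the FreeBSD column layout (Destination, Gateway, Flags, Netif), and the function name is hypothetical.

```python
def default_routes(netstat_output):
    """Return (gateway, interface) for each 'default' row in netstat -rn text."""
    routes = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed layout: Destination, Gateway, Flags, Netif[, Expire]
        if len(fields) >= 4 and fields[0] == "default":
            routes.append((fields[1], fields[3]))
    return routes
```

If the interface returned is a VPN interface rather than the WAN, the default gateway has moved. Comparing the result against the expected WAN gateway after each pfBlockerNG update would show whether the two events coincide.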
-
@pfguy2018 said in Strange issue - not sure how to fix:
my default gateway had also changed from what I had originally set.
Meaning what exactly.. You have more than 1 WAN interface? You're using PPPoE? You're using a VPN? What do you mean your gateway changed?
-
@johnpoz said in Strange issue - not sure how to fix:
@pfguy2018 said in Strange issue - not sure how to fix:
my default gateway had also changed from what I had originally set.
Meaning what exactly.. You have more than 1 WAN interface? You're using PPPoE? You're using a VPN? What do you mean your gateway changed?
Yes - I have several outgoing interfaces set up due to VPN use. The default has always been the WAN (non VPN) interface (for many years). At some point this got changed (without any intervention on my part), and re-setting it seems to have fixed the DNS issue. I will continue to monitor to see if this remains fixed. But I have no idea how/why the change happened in the first place, and whether it might occur again.
-
Well, if you pull routes from your VPN service, it's possible that becomes the default..
If you're going to use a VPN service, it's best not to pull routes from them, even though pretty much all their guides say to, or don't mention it (and it's the default)..
-
Where would I adjust that setting for VPN?
Also - interestingly - the default interface became one of the incoming VPN servers that are run on my pfSense box (I have several). Not sure if that is relevant or not.
-
In your VPN client settings, check the box that says do not pull routes..
-
Thanks. Is there an equivalent setting for the VPN servers that I run on the pfSense box? I don't actually have any VPN clients set up on pfSense.