Intermittent DNS timeouts - DNS Resolver
-
Hi,
We have two sites with the same configuration (SG-2440 Netgate boxes), the same ISP fiber plan and the same number of PCs (nearly 250 machines).
One of them has intermittent DNS timeouts all day long, like this:
[2.4.2-RELEASE][root@pfSense.localdomain]/root: drill -V5 orange.fr
;; ->>HEADER<<- opcode: QUERY, rcode: NOERROR, id: 0
;; flags: rd ; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;; orange.fr. IN A
;; ANSWER SECTION:
;; AUTHORITY SECTION:
;; ADDITIONAL SECTION:
;; Query time: 0 msec
;; WHEN: Thu Mar 15 15:09:00 2018
;; MSG SIZE rcvd: 0
;; ->>HEADER<<- opcode: QUERY, rcode: NOERROR, id: 44419
;; flags: qr rd ra ; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;; orange.fr. IN A
;; ANSWER SECTION:
orange.fr. 2746 IN A 193.252.133.34
orange.fr. 2746 IN A 193.252.148.140
;; AUTHORITY SECTION:
;; ADDITIONAL SECTION:
;; Query time: 10036 msec
;; SERVER: 109.0.66.11
;; WHEN: Thu Mar 15 15:09:00 2018
;; MSG SIZE rcvd: 59
The ISP router has been changed, the cables have been replaced, and the pfSense configuration has been entirely redone from scratch.
I left a script running this drill command, which shows that the trouble happens only during the day, when the computers are on.
During the night the query time never exceeds 40 ms.
I've tried many options in the DNS Resolver section and switched to the DNS Forwarder, but the trouble keeps happening.
One thing I've noticed: we have three VLANs on the LAN interface, and one of these VLANs was inactive during the last holidays. During that time the timeouts were gone.
I don't see anything in the firewall and DNS Resolver logs that could explain this, but it seems that something from this VLAN disrupts the unbound service.
I've been using pfSense for years with the default DNS options and never encountered this trouble before.
Thanks in advance, I'm running out of ideas.
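A log collected by a script like the one described above can be summarized with a few lines of Python. This is only a sketch under stated assumptions: the function name `slow_queries` and the 1000 ms threshold are made up for illustration, and the only thing taken from the output above is the `;; Query time: N msec` line format. Note that `drill -V5` also prints a `Query time: 0 msec` line for the outgoing packet, which this simply treats as a fast sample.

```python
import re

# Matches lines such as ";; Query time: 10036 msec" from drill output.
QUERY_TIME = re.compile(r"^;; Query time: (\d+) msec")


def slow_queries(log_text, threshold_ms=1000):
    """Return every logged query time (in ms) above threshold_ms.

    threshold_ms is an arbitrary illustrative cut-off, not a pfSense setting.
    """
    slow = []
    for line in log_text.splitlines():
        match = QUERY_TIME.match(line.strip())
        if match and int(match.group(1)) > threshold_ms:
            slow.append(int(match.group(1)))
    return slow


sample = """\
;; Query time: 0 msec
;; Query time: 10036 msec
;; Query time: 38 msec
"""
print(slow_queries(sample))
```

Counting the returned values per hour of the day would be one way to confirm the "only when the computers are on" pattern.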
-
I would suggest you do a
dig orange.fr +trace
instead of using drill. You will get way more info, like where in the resolve tree your timeouts are happening: along the full path or just part of it.
-
Hi and thank you for your help.
I'm not at work now, but I already ran dig with the +trace option.
Here's the result:
dig orange.fr +trace
; <<>> DiG 9.11.2 <<>> orange.fr +trace
;; global options: +cmd
. 3598615 IN NS b.root-servers.net.
. 3598615 IN NS h.root-servers.net.
. 3598615 IN NS a.root-servers.net.
. 3598615 IN NS k.root-servers.net.
. 3598615 IN NS f.root-servers.net.
. 3598615 IN NS m.root-servers.net.
. 3598615 IN NS i.root-servers.net.
. 3598615 IN NS d.root-servers.net.
. 3598615 IN NS l.root-servers.net.
. 3598615 IN NS e.root-servers.net.
. 3598615 IN NS c.root-servers.net.
. 3598615 IN NS g.root-servers.net.
. 3598615 IN NS j.root-servers.net.
couldn't get address for 'l.root-servers.net': not found
;; Received 241 bytes from 109.0.66.21#53(109.0.66.21) in 38 ms
fr. 172800 IN NS f.ext.nic.fr.
fr. 172800 IN NS e.ext.nic.fr.
fr. 172800 IN NS d.ext.nic.fr.
fr. 172800 IN NS d.nic.fr.
fr. 172800 IN NS g.ext.nic.fr.
fr. 86400 IN DS 42104 8 2 8D913A49C3FA2A39BA0065B4E18BA793E3AD128F7C6C8AA008AEFE0A 14435DD5
fr. 86400 IN DS 35095 8 2 23C6CAADC9927EE98061F2B52C9B8DA6B53F3F648F814A4A86A0FAF9 843E2C4E
fr. 86400 IN RRSIG DS 8 1 86400 20180326130000 20180313120000 41824 . N5A1ChV2LEwsM8mjoAdXPxnWf5ILSAB/OQgi5cSnkHPLFp3u9+jP6/wD guTn8JGYtnKSJwwl9RWlM6dSpWYzprUAEwZX4CtKP6TdY+QudsmxMfnG 3bUj27TPeiv/Q9Ns7UzniDZ3r1xWWCA/b4CiIJyJPLrSRJaovXV2o6rE 8TiKcfKRm2Ngus/5eXSKHA6FY5whzRrgrDTOjl339A+GsRCC3JUKhHl1 jz9fl3qr52z4NiRo0MnSTMIaGL1ia4Tk+vQSDmBpRf/y4ya+KFqAs//p OSajqamUOfuTSZhHPZqI4NYRH+tXGQgTsRG/pDS/3GvrrEldvoLqzeU/ lLN4qA==
;; Received 729 bytes from 192.33.4.12#53(c.root-servers.net) in 36 ms
orange.fr. 172800 IN NS ns4.orange.fr.
orange.fr. 172800 IN NS ns2.orange.fr.
orange.fr. 172800 IN NS ns3.orange.fr.
orange.fr. 172800 IN NS ns1.orange.fr.
1QL8O4QD0QIL0LC26D6NPPUV889GM04R.fr. 5400 IN NSEC3 1 1 1 F3A72438 1QLBQ8M4EBBSRIGRN3T5896VQ40ICA4A NS SOA TXT NAPTR RRSIG DNSKEY NSEC3PARAM
1QL8O4QD0QIL0LC26D6NPPUV889GM04R.fr. 5400 IN RRSIG NSEC3 8 2 5400 20180429071937 20180228071937 33714 fr. DzfgHDff+ZUKQcypIE0hqmjqX/bvH6Zk1bPgcb2HqTcfM0kD9yyKl3Ix +ymSkBb1kyEd80SPk00qpKv5Lm7A1qNdOesrZygOkfexRlalMgdc0uYr HrEhFVsvjnwo7GMzHkRezvxvfseYKsuCjxV/GA/X0PVuJvjdqm5A63oA H1Q=
47GI9CMOK0V4S90B5T95V417T0QPO46A.fr. 5400 IN NSEC3 1 1 1 F3A72438 47GS4UGDSJ1GLPMV3QKN7C8CRP38LQE7 NS DS RRSIG
47GI9CMOK0V4S90B5T95V417T0QPO46A.fr. 5400 IN RRSIG NSEC3 8 2 5400 20180429071937 20180228071937 33714 fr. tpC3+hxZpWZHSGA6WJWAyJ74/UcTBUj7opbDInAzh/NYxDFnsbJcckok F0IF4eGdk74DAapOOssX6HX/GX0sPGzIi8n56HCBmwGyFWPbmoMAKA23 8eyABmbJRuTqE9kBCTrhUsc0V/80Uje42qzKU3tNOHr8D4yl2S82UBDy lXY=
;; Received 721 bytes from 192.5.4.2#53(d.ext.nic.fr) in 185 ms
orange.fr. 3600 IN A 193.252.148.140
orange.fr. 3600 IN A 193.252.133.34
orange.fr. 172800 IN NS ns1.orange.fr.
orange.fr. 172800 IN NS ns2.orange.fr.
;; Received 194 bytes from 80.10.201.224#53(ns1.orange.fr) in 38 ms
At random, I can't reach some of the root servers. Here it's "l.root-servers.net", but it can be "g.root-servers.net" a few minutes later.
Unlike drill, the dig command never returns timeouts.
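The `;; Received ... from ... in N ms` lines of a trace like the one above name the server that answered each step, and the first of them is the resolver dig asked for the root NS list. A rough Python sketch for pulling those lines apart follows; the helper names `trace_hops` and `first_hop_is_local` are invented for this example, and the loopback check only makes sense when dig is run on the firewall itself.

```python
import ipaddress
import re

# Matches e.g. ";; Received 241 bytes from 109.0.66.21#53(109.0.66.21) in 38 ms"
RECEIVED = re.compile(
    r"^;; Received \d+ bytes from ([0-9A-Fa-f.:]+)#\d+\(([^)]*)\) in (\d+) ms"
)


def trace_hops(trace_text):
    """Return a list of (server_ip, server_name, milliseconds), one per hop."""
    hops = []
    for line in trace_text.splitlines():
        match = RECEIVED.match(line.strip())
        if match:
            hops.append((match.group(1), match.group(2), int(match.group(3))))
    return hops


def first_hop_is_local(trace_text):
    """True if the root NS list was answered from a loopback address."""
    hops = trace_hops(trace_text)
    return bool(hops) and ipaddress.ip_address(hops[0][0]).is_loopback


trace = """\
;; Received 241 bytes from 109.0.66.21#53(109.0.66.21) in 38 ms
;; Received 729 bytes from 192.33.4.12#53(c.root-servers.net) in 36 ms
;; Received 721 bytes from 192.5.4.2#53(d.ext.nic.fr) in 185 ms
;; Received 194 bytes from 80.10.201.224#53(ns1.orange.fr) in 38 ms
"""
print(trace_hops(trace))
print(first_hop_is_local(trace))
```

For the trace above the check prints `False`, because the first answer came from 109.0.66.21 rather than from 127.0.0.1.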
-
Yeah this is a problem…
couldn't get address for 'l.root-servers.net': not found
;; Received 241 bytes from 109.0.66.21#53(109.0.66.21) in 38 ms
See the 38 ms? Not sure what you're running, but getting the roots should take ZERO ms, since the roots should be answered locally.
[2.4.2-RELEASE][root@sg4860.local.lan]/root: dig orange.fr +trace
; <<>> DiG 9.11.2 <<>> orange.fr +trace
;; global options: +cmd
. 515010 IN NS j.root-servers.net.
. 515010 IN NS a.root-servers.net.
. 515010 IN NS c.root-servers.net.
. 515010 IN NS i.root-servers.net.
. 515010 IN NS d.root-servers.net.
. 515010 IN NS b.root-servers.net.
. 515010 IN NS g.root-servers.net.
. 515010 IN NS k.root-servers.net.
. 515010 IN NS h.root-servers.net.
. 515010 IN NS f.root-servers.net.
. 515010 IN NS l.root-servers.net.
. 515010 IN NS m.root-servers.net.
. 515010 IN NS e.root-servers.net.
. 515010 IN RRSIG NS 8 0 518400 20180328200000 20180315190000 41824 . gZKff74Th31jl+jS470MQHNVnV0txz48FChiDL/brOf2CXl6XPyIRQ1C 22qzr69/S6pDoO8oPW0nS+2IBxXOhnbU8tfNjHSOVS6yvnmoP0SHEV+B yi5WUyJDF4GN+dS5aNW30RM1dtaQkunLpjY2jTIDkzstV9BmnQnKcYr0 2oltImSStLNxGxKwXzksXJ3rIAhBHKdc1bVSQyyLqbz9y7A8sLOiqUy5 yahLzv2zuIMcuMYvF7Sy72MwfQUnPZ4yR4DP2cvccVYbOox4V4smc9Uy 3Ncabk05gdceltRwgZ2t1c+8StNVR1oKLRUE9wkhyT1zVrBcQqy5pyB2 W9HBgQ==
;; Received 525 bytes from 127.0.0.1#53(127.0.0.1) in 0 ms
Not sure what you have running, but I know for a fact it's not the out-of-the-box default. Somebody messed with the root hints on your unbound.
Seems you got them from here?????
;; ANSWER SECTION:
21.66.0.109.in-addr.arpa. 86400 IN PTR vip-dns-entr-secondary.dns.sfr.net.
-
Seems you got them from here?????
;; ANSWER SECTION:
21.66.0.109.in-addr.arpa. 86400 IN PTR vip-dns-entr-secondary.dns.sfr.net.
Yes, it's one of the two DNS servers of our ISP.
I usually leave them in the General Setup section.
I've just removed them and left only localhost in /etc/resolv.conf.
Both "Allow DNS server list to be overridden by DHCP/PPP on WAN" and "Do not use the DNS Forwarder or Resolver as a DNS server for the firewall" are unchecked.
-
And now what does your +trace look like?
-
It has looked fine to me for the past two hours:
[2.4.2-RELEASE][root@pfSense.localdomain]/root: dig orange.fr +trace
; <<>> DiG 9.11.2 <<>> orange.fr +trace
;; global options: +cmd
. 511241 IN NS a.root-servers.net.
. 511241 IN NS b.root-servers.net.
. 511241 IN NS c.root-servers.net.
. 511241 IN NS d.root-servers.net.
. 511241 IN NS e.root-servers.net.
. 511241 IN NS f.root-servers.net.
. 511241 IN NS g.root-servers.net.
. 511241 IN NS h.root-servers.net.
. 511241 IN NS i.root-servers.net.
. 511241 IN NS j.root-servers.net.
. 511241 IN NS k.root-servers.net.
. 511241 IN NS l.root-servers.net.
. 511241 IN NS m.root-servers.net.
. 511241 IN RRSIG NS 8 0 518400 20180329050000 20180316040000 41824 . kZFqF+zfx/fehQ9aWp1H6BQ0ZwJjuaHsxFiTYqoKvXDI5fwUYnhwbn9Q AxWgzt5I6HfqGKDDloSNkgLSznj3i4ZB+CRGOtPHHfhpG276KeZhdtU7 H9fuCWsQg0AIwoZ3gyOgFGA4hzt4IGnwQhuJxc/tlQNUKif8hImZAvDm MLh+6fS4qQKE9wmFvPUgfxS+r9MXEYC52XozCVa1KLP6klOtQpbPvPtq FMjbUScZ3AgXNJFBAc1Ww/uBCyYJ97g4JbTB9Z7a+waLXJp2lnHlad/U gf+6idfXjtP5FUJJjiCKAByxmLz5HyD1y45liQPxBsNr1BNtSGY4b5kd GET6qw==
;; Received 525 bytes from 127.0.0.1#53(127.0.0.1) in 0 ms
fr. 172800 IN NS g.ext.nic.fr.
fr. 172800 IN NS f.ext.nic.fr.
fr. 172800 IN NS e.ext.nic.fr.
fr. 172800 IN NS d.nic.fr.
fr. 172800 IN NS d.ext.nic.fr.
fr. 86400 IN DS 35095 8 2 23C6CAADC9927EE98061F2B52C9B8DA6B53F3F648F814A4A86A0FAF9 843E2C4E
fr. 86400 IN DS 42104 8 2 8D913A49C3FA2A39BA0065B4E18BA793E3AD128F7C6C8AA008AEFE0A 14435DD5
fr. 86400 IN RRSIG DS 8 1 86400 20180329050000 20180316040000 41824 . h3iIDv0HQpg6A8RZ4UBRFTlzU18UUPVYZmGA6A/KvZKIzXeja6MKGYrN 1E0Wfy87G/zfjaR5xgBHF88Fo6dZmkbxQ+BlZeABfcfm9Xzarn9jAEWX Bc4tWVa3YYgR3jcEvUaPJN67EHQFSbUaaLqLAIr2azl0KuFBInL+l0cp YtZX4TRm9iXkW4WJtnTo0YSCuTLkWRz/gRGp5XKJsHsI/gav194XW1dd 37hxWkCaBH8bAaYXbodr1BLWJ5D/44wTm2SpSSTgcGfG6k0hT3obGkfb 9AdtLts5bgD3XzzTlXt+Zgw2KRbKashFNxR5argLEF3eK2BJkIGx6sp6 eiYmbA==
;; Received 729 bytes from 192.33.4.12#53(c.root-servers.net) in 33 ms
orange.fr. 172800 IN NS ns1.orange.fr.
orange.fr. 172800 IN NS ns2.orange.fr.
orange.fr. 172800 IN NS ns3.orange.fr.
orange.fr. 172800 IN NS ns4.orange.fr.
1ql8o4qd0qil0lc26d6nppuv889gm04r.fr. 5400 IN NSEC3 1 1 1 F3A72438 1QLBQ8M4EBBSRIGRN3T5896VQ40ICA4A NS SOA TXT NAPTR RRSIG DNSKEY NSEC3PARAM
1ql8o4qd0qil0lc26d6nppuv889gm04r.fr. 5400 IN RRSIG NSEC3 8 2 5400 20180512151953 20180313151953 50364 fr. Vhj8pmV5n6D814TSR4otFmVe7kbuYSnxBF7ukoHrTsUISSqoUVyc/tK7 L5tton0NJaf6HCo8iSr+wHwbjgvuN8oj/ZEOebbYGCgbP8FIHnjSLZwT M7FX+VZl5rWfvZbZ6ULyUbIanmA0gktREP1IFWl09IC66i/MjO8FByMK s2I=
47gi9cmok0v4s90b5t95v417t0qpo46a.fr. 5400 IN NSEC3 1 1 1 F3A72438 47GS4UGDSJ1GLPMV3QKN7C8CRP38LQE7 NS DS RRSIG
47gi9cmok0v4s90b5t95v417t0qpo46a.fr. 5400 IN RRSIG NSEC3 8 2 5400 20180512151953 20180313151953 50364 fr. vn2de50juxfGW2MdOqZSpxB4xPt5bXEyaE4/87Kk6WyJHV9NCAlSDGxh cngFihhcLv1Nh5TQf/YhIuF4L2ZdYAgaZ7MfYSq7HJqQSZZEn7sDF3w+ 2sckRwm1jbDKq7Ru7ty7j5nQsL3dWovBIHNDdBXtJUfLPUn8s9pVTANP NVY=
;; Received 721 bytes from 194.0.9.1#53(d.nic.fr) in 33 ms
orange.fr. 3600 IN A 193.252.133.34
orange.fr. 3600 IN A 193.252.148.140
orange.fr. 172800 IN NS ns1.orange.fr.
orange.fr. 172800 IN NS ns2.orange.fr.
;; Received 194 bytes from 80.10.201.224#53(ns1.orange.fr) in 33 ms
("dig_orange_local" below is the file generated by the "dig orange.fr +trace" command, run every minute from the crontab.)
[2.4.2-RELEASE][root@pfSense.localdomain]/root: tail -n 200 dig_orange_local | grep Received
;; Received 729 bytes from 199.7.91.13#53(d.root-servers.net) in 41 ms
;; Received 749 bytes from 193.176.144.22#53(e.ext.nic.fr) in 47 ms
;; Received 194 bytes from 80.10.202.224#53(ns2.orange.fr) in 50 ms
;; Received 525 bytes from 127.0.0.1#53(127.0.0.1) in 0 ms
;; Received 729 bytes from 192.58.128.30#53(j.root-servers.net) in 41 ms
;; Received 721 bytes from 194.0.36.1#53(g.ext.nic.fr) in 40 ms
;; Received 194 bytes from 80.10.202.224#53(ns2.orange.fr) in 39 ms
;; Received 525 bytes from 127.0.0.1#53(127.0.0.1) in 0 ms
;; Received 729 bytes from 192.58.128.30#53(j.root-servers.net) in 41 ms
;; Received 721 bytes from 194.0.36.1#53(g.ext.nic.fr) in 39 ms
;; Received 194 bytes from 80.10.203.224#53(ns4.orange.fr) in 33 ms
;; Received 525 bytes from 127.0.0.1#53(127.0.0.1) in 0 ms
;; Received 729 bytes from 202.12.27.33#53(m.root-servers.net) in 253 ms
;; Received 721 bytes from 194.0.9.1#53(d.nic.fr) in 33 ms
;; Received 194 bytes from 80.10.200.224#53(ns3.orange.fr) in 39 ms
;; Received 525 bytes from 127.0.0.1#53(127.0.0.1) in 0 ms
;; Received 729 bytes from 198.41.0.4#53(a.root-servers.net) in 105 ms
;; Received 721 bytes from 192.5.4.2#53(d.ext.nic.fr) in 180 ms
;; Received 194 bytes from 80.10.202.224#53(ns2.orange.fr) in 39 ms
"drill_orange" is for the "drill -V5 orange.fr" command, run the same way from the crontab:
[2.4.2-RELEASE][root@pfSense.localdomain]/root: tail -n 500 drill_orange | grep Query
;; Query time: 0 msec
;; Query time: 0 msec
;; Query time: 0 msec
;; Query time: 0 msec
;; Query time: 0 msec
;; Query time: 0 msec
;; Query time: 0 msec
;; Query time: 0 msec
;; Query time: 1 msec
;; Query time: 0 msec
;; Query time: 0 msec
;; Query time: 0 msec
;; Query time: 1 msec
;; Query time: 0 msec
;; Query time: 0 msec
;; Query time: 0 msec
;; Query time: 0 msec
;; Query time: 0 msec
;; Query time: 0 msec
;; Query time: 0 msec
;; Query time: 0 msec
;; Query time: 0 msec
;; Query time: 0 msec
;; Query time: 0 msec
;; Query time: 0 msec
;; Query time: 0 msec
;; Query time: 0 msec
-
So it was my mistake, I presume?
Is it a bad habit to put DNS servers in the General Setup section instead of letting unbound handle resolution by itself?
But this is the only site where I have this trouble. All other sites have the isp dns servers set like this and they don't have timeouts.
-
Your drill is just showing you what is cached, not how long it takes to actually resolve something. Once something is in the cache, you do not have to look it up again until the TTL expires.
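To make the caching point concrete, here is a toy TTL cache in Python. It is not how unbound is implemented; the class name `TtlCache` and its methods are hypothetical, and time is passed in explicitly so the behavior is easy to follow. The record and the 2746-second TTL are the ones from the drill output earlier in the thread.

```python
class TtlCache:
    """Toy DNS-style cache: an answer is served until its TTL runs out."""

    def __init__(self):
        self._entries = {}  # name -> (answer, expiry time in seconds)

    def put(self, name, answer, ttl, now):
        """Store an answer that stays valid for ttl seconds from now."""
        self._entries[name] = (answer, now + ttl)

    def get(self, name, now):
        """Return the cached answer, or None if it is missing or expired."""
        entry = self._entries.get(name)
        if entry is None:
            return None
        answer, expires_at = entry
        if now >= expires_at:
            # Expired: the next lookup has to go out and resolve again.
            del self._entries[name]
            return None
        return answer


cache = TtlCache()
cache.put("orange.fr.", ["193.252.133.34", "193.252.148.140"], ttl=2746, now=0)
print(cache.get("orange.fr.", now=60))    # still cached, answered instantly
print(cache.get("orange.fr.", now=2746))  # TTL expired, must resolve again
```

A drill run every minute against a name with a TTL of several thousand seconds will therefore hit the cache almost every time, which matches the long run of 0 msec results.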
-
Your drill is just showing you what is cached, not how long it takes to actually resolve something. Once something is in the cache, you do not have to look it up again until the TTL expires.
Of course, that makes sense. That was useless information from me.
-
" All other sites have the isp dns servers set like this and they don't have timeouts."
Maybe their isp dns doesn't suck ;)
There is ZERO reason to set ISP DNS if you're going to resolve. pfSense resolves out of the box, and uses itself to resolve whatever it needs. What it resolves it caches and hands to clients from its cache. If you're having problems resolving something, you need to figure out where the problem is. More than likely it's in talking to the authoritative servers directly, since the roots shouldn't be a problem; if they are, then you have an overall network issue and nothing is going to resolve.
When you forward, you're at the mercy of how fast your forwarder wants to answer, and of what it answers with. You have no idea: it might be something they had cached, so you get a 3-second TTL and then have to ask them again 3 seconds later. Or they do not have it cached and have to forward or resolve what you're looking for, etc.
Unless you're on a very high latency line or have very limited bandwidth, I don't see why anyone would forward vs resolve.
-
Ok, got it, thank you very much :).
-
" All other sites have the isp dns servers set like this and they don't have timeouts."
Maybe their isp dns doesn't suck ;)
There is ZERO reason to set ISP DNS if you're going to resolve.
Unless you're on a very high latency line or have very limited bandwidth, I don't see why anyone would forward vs resolve.
John, I agree with what you are saying, but maybe you can set my mind at ease. I live on the West Coast, south of Vancouver and north of Seattle.
When I was resolving, I felt I was seeing too much traffic to East Asia, and experienced disturbing things like duplicate login screens to places like PayPal. I became suspicious of DNS malfeasance at the TLD level in East Asia. So my decision to forward Unbound to North America-based public servers was for security and privacy purposes.
Do you believe this is a rational concern?
Is it possible to force Unbound to resolve only via specific TLD root servers in the United States and Canada?
-
@Locked:
When I was resolving, I felt I was seeing too much traffic to East Asia, and experienced disturbing things like duplicate login screens to places like PayPal. I became suspicious of DNS malfeasance at the TLD level in East Asia. So my decision to forward Unbound to North America-based public servers was for security and privacy purposes.
Enforce DNSSEC, which is only used by the Resolver.
Add a tool to your browser so you can see that DNSSEC is validated.
When using a forwarder, you hand over your request to another DNS server (so just one more place where spoofing can happen), which will ultimately use "worldwide resolving" anyway.
"duplicate login screens to places like PayPal" = phishing pages? These are not indexed by the major search engines. You reach these sites/pages when you receive a pure spam mail with a link.
This isn't really a problem: your bank, PayPal, the tax office, etc. will never send you a mail with a link to the login page.
Falling into the trap of a phishing page is, I'm sorry to say, for those who still really do not understand what the "Internet" is …
These people need to learn,
or
access to the Internet should be controlled, locked down, and only made available to those who have passed all the exams.
Choose your option.
@Locked:
Is it possible to force Unbound to resolve only via specific TLD root servers in the United States and Canada?
The TLD root servers are syncing all the time.
If one has wrong info, they all have.
I guess it's possible to edit the root-level file that unbound uses to find the TLD servers, which are used to find the domain name servers.
Keep in mind that this USA-based root server could still direct the request to a TLD (registry) server situated "on the other side of the planet".
-
"When I was resolving I felt I was seeing too much traffic to East Aisa"
And how exactly did you determine that? Do you know where the root servers are? Which TLD are you talking about?
The roots are mostly all anycast anyway. Unbound normally determines which NS to talk to based on how fast it responds, etc.
This might help you understand how unbound picks the NS to talk to, or which ones it will try first, etc.
https://www.unbound.net/documentation/info_timeout.html
If you were seeing lots of traffic to places you're not familiar with, my guess would be that you had something on your network requesting that. When you forward, you have no clue where that answer might be coming from, since you're just asking X. With unbound you will know all the authoritative NS for a specific domain, etc.
And yes, if the domain is using DNSSEC, it would be validated; if validation fails, unbound wouldn't give those results to the client that asked for them. All the roots are signed.
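The "picks the NS by how fast it responds" idea can be sketched roughly as follows. This is a simplification in Python, not unbound's code: the function names are hypothetical, the RTT values are invented, and the 400 ms band is my reading of the linked page, so treat it as an assumption to check there.

```python
import random


def candidate_servers(rtt_ms, band_ms=400):
    """Return the servers whose RTT is within band_ms of the fastest one.

    rtt_ms maps a server name to its measured round-trip time in ms.
    band_ms=400 is an assumption taken from the linked Unbound page.
    """
    fastest = min(rtt_ms.values())
    return sorted(name for name, rtt in rtt_ms.items() if rtt <= fastest + band_ms)


def pick_server(rtt_ms, band_ms=400, rng=random):
    """Pick one of the candidate servers at random."""
    return rng.choice(candidate_servers(rtt_ms, band_ms))


# Invented RTTs for illustration only.
rtts = {"ns1": 33, "ns2": 50, "ns3": 39, "far-away": 900}
print(candidate_servers(rtts))
print(pick_server(rtts))
```

With numbers like these, a distant server that answers in 900 ms would not be among the candidates as long as nearby servers keep answering quickly.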