DNS Servers being blocked

johnpoz

Your caching NS should never cache anything longer than the TTL.. Be it MS, BIND, Unbound, Dnsmasq, etc. At least it shouldn't.

Do you have the AD NS resolve, or do you have it forward? If you had it forward then you could have it forward to pfsense and pfsense would resolve. Then you could also leverage dnssec; dnssec on a forwarder is pretty freaking pointless. Where dnssec matters is on the resolver.

So have your AD forward to unbound on pfsense, and pretty much out of the box you're golden. I am NOT a fan of these services like quad and opendns... While they do provide a service for billy bob the user, no thanks, I can do that myself.. I don't need to send you every query I make so you can filter it for me ;) I can filter out what I want on my own so that clients don't get an answer to somebadsite.org

Pfblocker on pfsense can make that easy for you as well... Or something as simple as pihole can make local filtering a no-brainer vs sending every single query to some company that says sure we are good, and don't do anything with your info, etc.

justme2

@Stewart

Sorry, I was generalizing various companies into a single response. Normally if I'm using OpenDNS I will use 208.67.222.123 and 208.67.220.123 as both should filter the same. I normally use OpenDNS filtering primarily, but for some customers I'll use QuadDNS instead. Some providers have devices, like VoIP phones, that they require to use Google DNS 8.8.8.8 and 8.8.4.4.

A consumer 'forcing' a DNS server is simply bad. Likely a result of some experience and then a blanket decision. Many orgs implement a firewall policy that only allows "approved DNS Servers" to make outbound queries through the firewall to TCP/UDP 53, to ensure non-evasion of policy. The next hurdle will likely be DoH, which requires SSL MitM to inspect for (application/dns-message) and drop that traffic. DoH was originally intended for privacy, but it inherently enables malware/CNC to bypass conventional protections.

If there is an AD then all clients point to it and it has forwarders that go out to OpenDNS. If there isn't a DC then I'll generally just use OpenDNS as forwarders in the router. I almost never use the actual ISP DNS servers if I can help it. Sometimes when there is an issue, though, an ISP will require it to test with.

DNS query and response logging is invaluable. Note, however, that logging has overhead: the performance reduction is north of 33% (throughput commonly drops to about 55-65% of the original ceiling).

I've often wondered about DNS sequential vs serial requests. Servers may be queried sequentially if a response doesn't come in time, but the client may not wait for a full timeout from the first source before querying the second. If you put a secondary DNS server into a PC in addition to the Domain Controller, lookups will fail randomly, but it usually does work, sometimes for years. We often find that to be the case when we pick up a new client (we are an MSP) where the previous company or in-house tech put in 8.8.8.8 as a secondary DNS server and they never really knew when some PCs would start and stop connecting to the domain. Take out the Google DNS and the issues resolve, but I've never determined a rhyme or reason to it.

The issue described sounds like the consumer was unable to reach the primary upstream DNS server and then started using 'alternate options'. Usually an artifact of packet drops or simply that the DNS Service on the primary DNS Server is overloaded. MS has historically lacked any viable telemetry to understand DNS/DHCP service health.

The requests (from the consumer) are dispatched as needed and response as obtained (you'll see the transaction ID in a packet capture). However, for a given request, if response doesn't happen in the appropriate duration, then the next DNS Server is tried for that request. DNS itself is parallel, but error condition per lookup (transaction) is serial.
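A rough Python sketch of that per-lookup fallback, assuming a made-up `query` callable in place of real network I/O (the server addresses and the `corp.example` domain are illustrative only). It also shows why a public "secondary" next to a Domain Controller bites at random: the moment the DC misses one deadline, that lookup goes to a server that knows nothing about the AD zone.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical stand-in for "send one DNS query, wait up to `timeout` seconds".
# Returns the answer, or None if no response arrived in time.
QueryFn = Callable[[str, str, float], Optional[str]]


def lookup(name: str, servers: Sequence[str], query: QueryFn,
           timeout: float = 1.0) -> Tuple[Optional[str], List[str]]:
    """Resolve ONE name, trying the configured servers in order.

    Fallback is serial per lookup: the next server is only asked
    after the previous one failed to answer in time.
    """
    tried: List[str] = []
    for server in servers:
        tried.append(server)
        answer = query(server, name, timeout)
        if answer is not None:
            return answer, tried
    return None, tried


def fake_query(server: str, name: str, timeout: float) -> Optional[str]:
    # Simulated network: the DC (10.0.0.10) is overloaded and drops the
    # query; the public server answers but has no idea about the AD zone.
    if server == "10.0.0.10":
        return None
    return "NXDOMAIN" if name.endswith(".corp.example") else "203.0.113.7"


answer, tried = lookup("dc01.corp.example", ["10.0.0.10", "8.8.8.8"], fake_query)
print(answer, tried)
```

Separate lookups can still be in flight at the same time; only the retry path for a single transaction is ordered.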

MS DNS is actually based on old BIND 4.9.x code (IIRC) vs. modern BIND which continually monitors all configured forwarders - fixating on the one that demonstrates the best (lowest latency of response) over time. Thus, order of forwarders is immaterial for modern BIND.

By resolving, is the reference to having something like pfSense essentially be a DNS caching server? It would still need forwarders and would cache the lookup. I believe MS DNS servers also cache the lookups but I've never looked into just how long they hold the cache. Would the best policy for pfSense, then, be to act as a resolver instead of a forwarder? Seems like it would be. I haven't really paid much attention to it. DCs would use it as the forwarder and it would be the resolver and cache the responses.

Certainly a viable option - load/volume notwithstanding. Unbound is excellent for recursion (non-forwarding) with DNSSEC validation. BIND, on the other hand, is the workhorse for DDNS, Views, RPZ/feeds and namespace complexity.

@johnpoz
You are correct. However, some UNIX OSs use nscd (I seem to recall Solaris favoring it, at least for a while, and various Linux flavors appear to still use it). nscd has the nasty default behavior of caching DNS purely based on either a "default" or local value. In environments where it's present and disabling it is not considered an option, you simply have to make the arbitrary caching value no longer than the shortest record TTL that you may encounter (usually "shortest internal value that you encounter"). FYI - never use "0" - it's undefined, which means that a given vendor's interpretation can be infinite or don't cache <sigh>. So, the shortest TTL on a record should be 1 ("1 second") to ensure 'correctness' of behavior across any/all DNS client/server variants. You can't utilize DNSSEC validation on a DNS Server where forwarding is in play (unless you use a negative trust anchor for the forwarding namespace - which brings a host of other issues). The forwarding inherently 'breaks' the chain, hence the reason that you only apply DNSSEC validation on a non-forwarding [recursive] resolver.

johnpoz

@justme2 said in DNS Servers being blocked:

FYI - never use "0" - it's undefined, which means that a given vendor's interpretation can be infinite or don't cache <sigh>

This is a good point.. I will have to look into what unbound does when set to answer 0
"Serve cache records even with TTL of 0 When enabled, allows unbound to serve one query even with a TTL of 0, if TTL is 0 then new record will be requested in the background when the cache is served to ensure cache is updated without latency on service of the DNS request."

I would have figured it hands the client a 0 ttl, but agreed it should prob hand out a 1.. This could be problematic depending on your client base I guess if handing out 0.. I would have to sniff to see exactly what gets handed to the client when that option is set.

Keep in mind I am not suggesting he set that as ttl or use it in any way - talking about the unbound option of being able to answer when the cache has actually expired..

justme2

@johnpoz

Absolutely. Short TTLs (<5 minutes) aren't generally useful, except for GSLB activities. However, long TTLs (>60 minutes) are generally a liability [internally] in modern enterprise environments.

The problem with 0 is that you have interpretation by each device in the chain of resolution. Therefore, using a TTL >=1 and <= 86400 (1 day, which is still on the long side) - you have assurance of behavior if caching is supported. Otherwise, the consumer will query each and every time (most vanilla UNIX-based installs).
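To put numbers on that range, here is a minimal Python sketch of clamping a TTL into 1..86400 seconds. The function and constant names are made up for illustration and are not any DNS server's actual setting.

```python
MIN_TTL = 1        # never 0: how 0 is interpreted varies by implementation
MAX_TTL = 86400    # 1 day, still on the long side


def clamp_ttl(ttl: int, low: int = MIN_TTL, high: int = MAX_TTL) -> int:
    """Force a TTL (in seconds) into the range [low, high]."""
    return max(low, min(high, ttl))


for ttl in (0, 300, 604800):
    print(ttl, "->", clamp_ttl(ttl))
```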

Stewart

So, I've started looking into and implementing what's being discussed here, and I've learned that if you turn on DNSSEC and use OpenDNS as the forwarder for the Resolver, it fails. OpenDNS doesn't use DNSSEC. It apparently uses DNSCrypt. It looks like the benefit is that traffic is encrypted, but how would that be any stronger than using DNSSEC and selecting SSL/TLS over port 853?

I'm also struggling with what is being recommended here. Are you saying that rather than using external DNS servers in the DHCP options I should use a local Resolver like pfSense? Simple enough. But that resolver needs to have forwarders to know what data to cache, correct. Which servers are you saying to use if not someone like OpenDNS, QuadDNS, or Google? That wouldn't preclude the use of Forwarders, only limit the amount of traffic to them to once per TTL, no?

@NogBadTheBad
I run Suricata blocking on the WAN and logging on the LAN. That way if something blocks and is triggered on the WAN, I can see the corresponding LAN entry that may have generated the bad request. Very useful if it's an infection or something like that.

@johnpoz
That .top rule is 2023883. Rule is:
alert dns $HOME_NET any -> any any (msg:"ET DNS Query to a *.top domain - Likely Hostile"; dns_query; content:".top"; nocase; isdataat:!1,relative; threshold:type limit, track by_src, count 1, seconds 30; reference:url,www.symantec.com/connect/blogs/shady-tld-research-gdn-and-our-2016-wrap; reference:url,www.spamhaus.org/statistics/tlds/; classtype:bad-unknown; sid:2023883; rev:2; metadata:affected_product Windows_XP_Vista_7_8_10_Server_32_64_Bit, attack_target Client_Endpoint, deployment Perimeter, signature_severity Major, created_at 2017_02_07, updated_at 2017_02_07;)
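
For anyone reading that rule: `content:".top"; nocase; isdataat:!1,relative;` amounts to "the query name ends in .top, any case", and the threshold limits it to one alert per source per 30 seconds. A rough Python approximation of that logic follows. It is a sketch of my reading of the rule, not how Suricata implements it, and the class and method names are invented.

```python
from typing import Dict


class TopTldRule:
    """Approximates the match and threshold logic of sid 2023883."""

    def __init__(self, window: float = 30.0) -> None:
        self.window = window
        self._window_start: Dict[str, float] = {}  # src IP -> start of window

    def matches(self, qname: str) -> bool:
        # ".top" with no data after it, case-insensitive
        return qname.lower().endswith(".top")

    def should_alert(self, src: str, qname: str, now: float) -> bool:
        if not self.matches(qname):
            return False
        start = self._window_start.get(src)
        if start is not None and now - start < self.window:
            return False  # already alerted for this source in this window
        self._window_start[src] = now
        return True


rule = TopTldRule()
print(rule.should_alert("10.0.0.5", "evil.TOP", now=0.0))
print(rule.should_alert("10.0.0.5", "other.top", now=10.0))
```

Note that the source tracked is whatever host sent the query, so on the WAN side that will be the resolver itself rather than the LAN client.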

johnpoz

@stewart said in DNS Servers being blocked:

But that resolver needs to have forwarders to know what data to cache, correct.

NO... Unbound out of the box on pfsense is a RESOLVER... You should prob look up the difference between a resolver and a forwarder as the first thing in your journey to understanding DNS ;)

BTW.. Per the rfc on answering stale.. it will give a TTL of 1 for stale records if they have not been resolved by the time the client is given an answer..

" If so, it adds that data to the response message and SHOULD set the TTL of each expired record in the message to 1 second"
https://tools.ietf.org/id/draft-tale-dnsop-serve-stale-01.html
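
A simplified Python sketch of that behavior, assuming a toy in-memory cache (class and field names are made up; this is not Unbound's code, and it skips the draft's step of first attempting a refresh before falling back to stale data):

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

STALE_TTL = 1  # seconds, the value the draft says SHOULD be used


@dataclass
class Entry:
    value: str
    expires_at: float


class StaleCache:
    def __init__(self) -> None:
        self._entries: Dict[str, Entry] = {}
        self.refresh_queue: List[str] = []  # names to re-resolve in background

    def put(self, name: str, value: str, ttl: int, now: float) -> None:
        self._entries[name] = Entry(value, now + ttl)

    def get(self, name: str, now: float) -> Optional[Tuple[str, int]]:
        """Return (value, ttl_to_hand_out), or None if never cached."""
        entry = self._entries.get(name)
        if entry is None:
            return None
        remaining = int(entry.expires_at - now)
        if remaining > 0:
            return entry.value, remaining
        # Expired: answer from the stale record with TTL 1, never 0,
        # and schedule a refresh so the next client gets fresh data.
        if name not in self.refresh_queue:
            self.refresh_queue.append(name)
        return entry.value, STALE_TTL


cache = StaleCache()
cache.put("www.example.com", "192.0.2.1", ttl=300, now=0)
print(cache.get("www.example.com", now=100))
print(cache.get("www.example.com", now=400))
```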

justme2

@justme2 said in DNS Servers being blocked:

@stewart said in DNS Servers being blocked:

So, I've started looking into and implementing what's being discussed here, and I've learned that if you turn on DNSSEC and use OpenDNS as the forwarder for the Resolver, it fails. OpenDNS doesn't use DNSSEC. It apparently uses DNSCrypt. It looks like the benefit is that traffic is encrypted, but how would that be any stronger than using DNSSEC and selecting SSL/TLS over port 853?

Without getting too far into the weeds: don't look at DNSSEC like it's a "different" critter. It's still the same protocol, with some additional lookups, and the "crypto" is simply public key methods to determine authenticity - not to 'encrypt' traffic. It still uses port 53. Hence the reason that DNSSEC doesn't change anything about your existing DNS and flows, per se. Other incantations, like DoH (DNS over HTTPS), are meant to encrypt the traffic and can easily bypass controls - by intention. That also means that CNC could then bypass protections.

You cannot perform DNSSEC validation and forward - that will fail. DNSSEC validation is (as previously mentioned) the validation of authenticity of namespace and of infrastructure. It does not 'encrypt' the traffic.

I'm also struggling with what is being recommended here. Are you saying that rather than using external DNS servers in the DHCP options I should use a local Resolver like pfSense? Simple enough. But that resolver needs to have forwarders to know what data to cache, correct. Which servers are you saying to use if not someone like OpenDNS, QuadDNS, or Google? That wouldn't preclude the use of Forwarders, only limit the amount of traffic to them to once per TTL, no?

External resolvers (outside your control) should always be a last resort option. Internal (organizational control/policy) DNS servers are always the preference. You want a lookup to appear as:

internal consumer => internal DNS servers => Internet Root => <follow NS chain until resolution>

Caching is driven by TTL (on the response). Thus, a recursive DNS Server doesn't "forward". It starts with the Root DNS Servers (".") and then follows the namespace of the query until resolution. Each 'hop' will be cached for the duration of TTL.
Eg: "Query 1" = www.google.com

Once you've hit the roots and looked up the com servers, you'll cache that for the duration of the TTL. Any successive queries within the COM TLD don't need to re-validate against the roots. You can go direct to the com servers.
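
The walk from the roots and the re-use of cached hops can be sketched in Python with a toy delegation table. Everything in `SERVERS` is invented for illustration (documentation addresses, fake server names), nothing touches the network, and TTL expiry of the cached hops is left out to keep it short.

```python
from typing import Dict, List, Optional, Tuple

# Toy stand-in for the DNS tree: each server either refers us to the
# servers for a child zone or answers authoritatively.
SERVERS: Dict[str, Dict[str, Dict[str, str]]] = {
    "root": {"delegations": {"com.": "com-tld"}, "answers": {}},
    "com-tld": {
        "delegations": {"example.com.": "ns-example",
                        "example2.com.": "ns-example2"},
        "answers": {},
    },
    "ns-example": {"delegations": {},
                   "answers": {"www.example.com.": "192.0.2.10"}},
    "ns-example2": {"delegations": {},
                    "answers": {"www.example2.com.": "192.0.2.20"}},
}


def in_zone(qname: str, zone: str) -> bool:
    return qname == zone or qname.endswith("." + zone)


class Resolver:
    def __init__(self) -> None:
        self.delegation_cache: Dict[str, str] = {}     # zone -> server
        self.queries_sent: List[Tuple[str, str]] = []  # (server, qname)

    def _closest_known_server(self, qname: str) -> str:
        # Start at the deepest cached zone containing qname, else the root.
        best_zone, best_server = "", "root"
        for zone, server in self.delegation_cache.items():
            if in_zone(qname, zone) and len(zone) > len(best_zone):
                best_zone, best_server = zone, server
        return best_server

    def resolve(self, qname: str) -> Optional[str]:
        server = self._closest_known_server(qname)
        while True:
            self.queries_sent.append((server, qname))
            data = SERVERS[server]
            if qname in data["answers"]:
                return data["answers"][qname]
            referral = [(z, s) for z, s in data["delegations"].items()
                        if in_zone(qname, z)]
            if not referral:
                return None  # no answer and nowhere further to go
            zone, server = referral[0]
            self.delegation_cache[zone] = server  # cache this hop


r = Resolver()
r.resolve("www.example.com.")    # root -> com-tld -> ns-example
r.queries_sent.clear()
r.resolve("www.example2.com.")   # starts at com-tld, the root is skipped
print(r.queries_sent)
```

The second lookup never touches the root because the com delegation is already cached, which is the point made above.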

@NogBadTheBad
I run Suricata blocking on the WAN and logging on the LAN. That way if something blocks and is triggered on the WAN, I can see the corresponding LAN entry that may have generated the bad request. Very useful if it's an infection or something like that.

@johnpoz
That .top rule is 2023883. Rule is:
alert dns $HOME_NET any -> any any (msg:"ET DNS Query to a *.top domain - Likely Hostile"; dns_query; content:".top"; nocase; isdataat:!1,relative; threshold:type limit, track by_src, count 1, seconds 30; reference:url,www.symantec.com/connect/blogs/shady-tld-research-gdn-and-our-2016-wrap; reference:url,www.spamhaus.org/statistics/tlds/; classtype:bad-unknown; sid:2023883; rev:2; metadata:affected_product Windows_XP_Vista_7_8_10_Server_32_64_Bit, attack_target Client_Endpoint, deployment Perimeter, signature_severity Major, created_at 2017_02_07, updated_at 2017_02_07;)

That's indicative of a mistake IF it's a recursive DNS server. If it's not a recursive DNS Server (eg: by policy) - then the rule is correct. Likely the $DNS_SERVERS (sp?) variable?

justme2

@johnpoz said in DNS Servers being blocked:

@stewart said in DNS Servers being blocked:

But that resolver needs to have forwarders to know what data to cache, correct.

NO... Unbound out of the box on pfsense is a RESOLVER... You should prob look up the difference between a resolver and a forwarder as the first thing in your journey to understanding DNS ;)

As johnpoz points out, there is a distinct difference between "resolver" and "forwarder". A forwarder means that you need an answer and are -forwarding- the query to another party, asking them to do all the work to get you the answer. A -resolver- does all the work to obtain the answer and doesn't rely on a specific 3rd party. In common terms, forwarding means that the 3rd party (to whom you forward all your queries) now has a record of everything you asked for. Think of it from a logging stance. With the roots and TLD servers, they only see the "lower level(s)". A forwarder sees the entire FQDN requested - of every request.

BTW.. Per the rfc on answering stale.. it will give a TTL of 1 for stale records if they have not been resolved by the time the client is given an answer..

" If so, it adds that data to the response message and SHOULD set the TTL of each expired record in the message to 1 second"
https://tools.ietf.org/id/draft-tale-dnsop-serve-stale-01.html

Only issue is that it's a draft and not an RFC, i.e. no obligation to accept or follow. On the plus side, someone took the time to document what -they- believe would be the most optimal behavior. Although apparently this draft has a dependency on Vixie's resimprove (evidently also in draft state).

Stewart

@justme2

I thought I understood the difference between a resolver and a forwarder, but a resolver needs to forward the request somewhere if it doesn't have the answer, right? So I would assume that resolvers are also forwarders in that regard but maybe that's just the wrong terminology. If a resolver doesn't have an answer, where does it look? Automatically to the root servers? If so then I've been wrong in assuming you need to provide upstream servers for them to check. When set to resolver I've always told it to use forwarders and DNSSEC. You can use OpenDNS as the forwarder but if you switch on DNSSEC it fails, so using DNSSEC and forwarding together appears to at least do something. If I do switch on forwarders in the resolver, does that essentially just turn it into a forwarder with a local copy of the records inside of the TTL window? It still forwards the requests on but only once per record as long as it's in the TTL?

Would the proper setup, then, be turning off forwarders and enabling DNSSEC? Is there any way to integrate the features of something like OpenDNS and QuadDNS in that mode? Being able to stop DNS resolutions for malicious or restricted domains is helpful.

justme2

@stewart said in DNS Servers being blocked:

@justme2

I thought I understood the difference between a resolver and a forwarder, but a resolver needs to forward the request somewhere if it doesn't have the answer, right? So I would assume that resolvers are also forwarders in that regard but maybe that's just the wrong terminology. If a resolver doesn't have an answer, where does it look? Automatically to the root servers? If so then I've been wrong in assuming you need to provide upstream servers for them to check. When set to resolver I've always told it to use forwarders and DNSSEC. You can use OpenDNS as the forwarder but if you switch on DNSSEC it fails, so using DNSSEC and forwarding together appears to at least do something. If I do switch on forwarders in the resolver, does that essentially just turn it into a forwarder with a local copy of the records inside of the TTL window? It still forwards the requests on but only once per record as long as it's in the TTL?

A resolver doesn't "forward" to obtain an answer. If a resolver doesn't have the answer ("in cache"), it will start with the internet Root Servers to obtain it. Once the authoritative nameserver(s) is determined, it is queried for the value requested and the response is provided back to the consumer.

A forwarder simply does one of two things: 1) if the requested value is already in cache, provide the answer from cache, or 2) ask the configured forwarder(s) to obtain the answer on your behalf.

The behavior of most DNS Servers is that when they are forwarding - they will cache the results for the TTL duration received. Yes, they re-use the answer provided that the record has not reached TTL duration.
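
In Python terms, a caching forwarder boils down to something like the sketch below. The `upstream` callable is a made-up placeholder for whichever forwarder is configured and is assumed to return a value plus its TTL.

```python
from typing import Callable, Dict, Tuple

# Hypothetical upstream: given a name, returns (value, ttl_seconds).
Upstream = Callable[[str], Tuple[str, int]]


class Forwarder:
    def __init__(self, upstream: Upstream) -> None:
        self.upstream = upstream
        self.cache: Dict[str, Tuple[str, float]] = {}  # name -> (value, expiry)
        self.upstream_queries = 0

    def lookup(self, name: str, now: float) -> str:
        cached = self.cache.get(name)
        if cached is not None and now < cached[1]:
            return cached[0]                  # 1) answer from cache
        self.upstream_queries += 1            # 2) ask the forwarder
        value, ttl = self.upstream(name)
        self.cache[name] = (value, now + ttl)
        return value


fwd = Forwarder(lambda name: ("192.0.2.1", 60))
fwd.lookup("www.example.com", now=0)
fwd.lookup("www.example.com", now=30)   # inside the TTL, served from cache
fwd.lookup("www.example.com", now=61)   # TTL expired, forwarded again
print(fwd.upstream_queries)
```

So yes, the upstream sees each name once per TTL window, but it still sees every distinct name you ask for.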

Would the proper setup, then, be turning off forwarders and enabling DNSSEC? Is there any way to integrate the features of something like OpenDNS and QuadDNS in that mode? Being able to stop DNS resolutions for malicious or restricted domains is helpful.

Optimal is to turn off forwarders, enable DNSSEC validation and add RPZ feed(s) as appropriate. Unfortunately, you can either forward or obtain the benefit of DNSSEC validation (requires resolver, not forwarder), as you cannot validate unless you are performing the resolution. DNS reputation feeds are the means to protect against domains in the fashion to which you are referring.

Stewart

@justme2

OK. I get it now. So, besides enabling DNSBL in pfBlockerNG and setting the unit to be a resolver with DNSSEC on and forwarding off, is there anything else I should do? I see there is an Alexa whitelist to enable. I see that the EasyList is enabled but no feeds are in there. One guide I've found is here if I want to upgrade to the dev version or here if I want to use the general release. If those look right then I'll read up on them and figure out how to get my own feeds integrated in and stop relying on third party servers entirely.

johnpoz

@justme2 said in DNS Servers being blocked:

With the roots and TLD servers, they only see the "lower level(s)". A forwarder sees the entire FQDN requested - of every request.

Not actually true, unless you turn on Qname Minimization.. And for sure using strict is going to break a lot of domains.. MS has many of them that are broken with this on... Tested that even before they added the option to the GUI.

But completely agree with you in a perfect world.. roots would only see queries for the tld NS, the tld ns for that domain would only see queries for the domain.tld, and then the authoritative ns for the domain would see the fqdn query host.domain.tld

In theory this seems ideal and is what everyone would want, but in practice it doesn't always work out that well.
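
To make the difference concrete, here is a small Python sketch of what each tier gets asked for with and without minimization. It assumes a plain three-level host.domain.tld delegation, and the function name is made up.

```python
from typing import Dict


def names_seen(fqdn: str, minimize: bool) -> Dict[str, str]:
    """Which query name each tier sees for a host.domain.tld style name."""
    labels = fqdn.rstrip(".").split(".")
    if len(labels) < 3:
        raise ValueError("expected at least host.domain.tld")
    full = ".".join(labels)
    if not minimize:
        # Classic behavior: the full name is sent at every step.
        return {"root": full, "tld": full, "authoritative": full}
    return {
        "root": labels[-1],            # only the TLD
        "tld": ".".join(labels[-2:]),  # only domain.tld
        "authoritative": full,         # the full FQDN
    }


print(names_seen("host.domain.tld", minimize=False))
print(names_seen("host.domain.tld", minimize=True))
```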

Pfsense unbound config is pretty good out of the box.. I change it to not listen on all, and only the interfaces I want and only query outbound on interface(s) I need. Also change from transparent mode to static type. I also turn off the automatic ACLs and do my own.

But the config out of the box should really work for pretty much everyone.

Oh set to only register static dhcp as well.. And enable prefetch and serve 0 ttl option.

justme2

@johnpoz said in DNS Servers being blocked:

@justme2 said in DNS Servers being blocked:

With the roots and TLD servers, they only see the "lower level(s)". A forwarder sees the entire FQDN requested - of every request.

RE: minimization, correct. Should have more clearly (or perhaps more generically correctly) stated that the roots and TLD infrastructure may see the full FQDN, but no other 3rd party would have full visibility of all the queries generated by an organization. Generally, roots/TLD infrastructure are uninterested parties due to mandatory involvement.

Pfsense unbound config is pretty good out of the box.. I change it to not listen on all, and only the interfaces I want and only query outbound on interface(s) I need. Also change from transparent mode to static type. I also turn off the automatic ACLs and do my own.

Yes, when it comes to recursion performance - unbound is particularly good. When tuned on a dedicated box for load - it can be phenomenal for recursion.