DNS validation firewall ruleset - $10000
-
$5000 for development of this firewall ruleset. 50% payment upon software completion, 50% payment upon acceptance into pfSense community distribution.
Only-Valid-DNS Firewall Rule
My non-profit uses (disclaimer: we also produce it) a DNS system that has a built-in malware host list, built on domains/hostnames. We see IoT devices evading this blocking system by simply hard-coding IP addresses into themselves, thus performing no DNS lookups at all. Many of these hosts have legitimate reasons to be on the Internet, but we also want to be able to dynamically block them from reaching places we’ve determined are “bad” or at least “unauthorized”. A DNS-based filter seems to be the only way to do it, so I’m proposing this filter for pfSense as a first place to start.
Function: This is a meta-rule that is used to deny or log packets to destinations which have not been validly looked up in the DNS with a resulting AAAA or A record returned. This does not overrule any other filters - it would be the “first” filter in any rule list applied to an interface, and subsequent rules would be applied in normal fashion. It is very similar to “pfBlockerNG” in how it is parsed on interfaces.
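As a rough sketch of how this meta-rule might hook into pf (the table name `dns_validated` and the rule wording here are illustrative assumptions, not part of the proposal), the DNS daemon could maintain a pf table of validated destinations, and the generated rule would sit first on the interface:

```python
# Hypothetical sketch: render the "first" rule for an interface.
# The table <dns_validated> would be kept current by the DNS monitoring
# daemon (e.g. via `pfctl -t dns_validated -T add <ip>`); names are
# illustrative only.
def render_meta_rule(iface: str, action: str = "block return") -> str:
    """Deny (or log/pass) packets whose destination was never resolved in DNS."""
    return (
        f"{action} out quick on {iface} "
        "to ! <dns_validated> label \"no-valid-dns-lookup\""
    )

# render_meta_rule("lan") yields:
# block return out quick on lan to ! <dns_validated> label "no-valid-dns-lookup"
```

Because the rule carries `quick`, a non-conforming packet is disposed of immediately; conforming packets fall through to the normal rule list, matching the "does not overrule any other filters" intent.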
- START OF CONFIG OPTIONS
-
Enable/Disable packet filter: Turn the packet-filtering system on and off. Turning the system off does not flush the DNS cache; even while blocking is disabled, new DNS requests are still ingested and the DNS table is updated.
-
Enable/Disable DNS rule mapping: Turn the DNS monitoring and table creation system on and off. This monitors DNS requests and builds a table of permitted destinations based on DNS lookups of AAAA and A records. It may be useful upon first activation to turn this on for several hours and leave the packet blocking system disabled until a suitable DNS cache has been built up so as not to interrupt users.
-
DNS mapping modes:
-
Strict: Every host has a map of every DNS lookup it has performed, and only connections to IP addresses (AAAA, A) that have been looked up by that host are allowed through the firewall rules. This is tricky: it creates, in essence, hundreds of “permit” lines per host, and if there are hundreds of hosts, this produces a very large table that each packet must pass through.
-
Loose: A single map is created of every DNS lookup that has been performed by any host on the interface, and connections to any IP address (AAAA, A) that any host has looked up are allowed through the firewall. This conserves memory but is less rigorously secure: it creates one “permit” line per valid DNS lookup rather than per host, though the table may still be thousands of lines deep if there are many browsing users in the pool being examined.
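The two modes differ only in the key used for the permit lookup. A minimal sketch of the distinction (hypothetical in-memory structure; a real implementation would live in pf tables or a daemon):

```python
from collections import defaultdict

class DnsPermitMap:
    """Tracks observed A/AAAA answers and decides whether a flow is permitted."""

    def __init__(self, strict: bool):
        self.strict = strict
        self.per_host = defaultdict(set)  # strict mode: src host -> {resolved IPs}
        self.global_set = set()           # loose mode: all resolved IPs on the interface

    def record_answer(self, src_host: str, resolved_ip: str) -> None:
        """Called for each A/AAAA answer seen in a DNS response to src_host."""
        self.per_host[src_host].add(resolved_ip)
        self.global_set.add(resolved_ip)

    def is_permitted(self, src_host: str, dst_ip: str) -> bool:
        if self.strict:
            # Only the host that performed the lookup may use the answer.
            return dst_ip in self.per_host[src_host]
        # Any host may use any answer seen on the interface.
        return dst_ip in self.global_set
```

In strict mode, a second host cannot ride on the first host's lookup; in loose mode it can, which is exactly the memory/rigor trade-off described above.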
Log events: Yes/No. Send trigger events to the logfile? Default: Yes.
Reject/Block/Pass: How should non-conforming packets be treated? “Pass” can be used to examine (via the logs) what would happen if the rules were enforced, as a way to gauge impact on the network without actually causing operational faults during testing. Default value: Reject.
No block list: Allow a list (or alias) of IP addresses to be accessed, regardless of DNS status. Default value: Empty.
TTL global override: DNS TTL is used to age cache entries; after the TTL expires, the “permit” rule is removed (though any existing states are left open as normal if there is traffic flowing). If TTL global override is set to a non-zero number, all “permit” entries are kept for that many seconds regardless of DNS TTL. Default value: 0, meaning do not override.
TTL minimum override: Some DNS providers hand back preposterously short DNS TTLs, which would cause a lot of churn in the table. If a record’s TTL is lower than this threshold, override it with the value below. Default value: 240
TTL minimum override value: How many seconds should the TTL be, locally, for those short-TTL records? Default value: 3600
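Taken together, the three TTL options would resolve to an effective table-entry lifetime roughly like this (a sketch using the defaults above; the precedence of the global override over the minimum clamp is my reading of the spec):

```python
def effective_ttl(dns_ttl: int,
                  global_override: int = 0,
                  min_threshold: int = 240,
                  min_value: int = 3600) -> int:
    """How long a "permit" entry stays in the table, in seconds."""
    if global_override > 0:
        # TTL global override wins outright, regardless of the DNS TTL.
        return global_override
    if dns_ttl < min_threshold:
        # Preposterously short TTL: clamp up to the local override value.
        return min_value
    return dns_ttl

# effective_ttl(30)  -> 3600  (short TTL clamped up)
# effective_ttl(600) -> 600   (sane TTL kept as-is)
```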
Persistence: (number) Write DNS data to disk every X seconds. A reboot or reload of the DNS filter would mean that previously valid connections might become invalid, and internal hosts attempting to reconnect will often not perform another DNS lookup when the connection fails. Restarting the firewall or this daemon would therefore cause complete failures for internal hosts until additional DNS lookups re-opened the mapping capability. Clearly, this is not a good situation, so it is possible to write the existing DNS “cache” (the valid DNS mapping table from this ruleset) to permanent storage. This file is loaded every time the app is started or restarted, to pre-populate the valid list of permit rules. The suggested value is 60, but larger values may be required with more hosts or heavier activity. A default value of “0” means never write.
Persistence path: [filename] Default: /var/DNSmapping/DNSmapping-cache
- END OF CONFIG OPTIONS
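The persistence option amounts to periodically serializing the permit table and reloading it at daemon start. A sketch (the JSON file format and atomic rename-in-place approach are illustrative assumptions, not a spec):

```python
import json
import os
import tempfile

def write_cache(path: str, table: dict) -> None:
    """Persist the mapping table (e.g. IP -> expiry timestamp) atomically.

    Writing to a temp file and renaming avoids leaving a torn cache file
    if the box loses power mid-write.
    """
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(table, f)
    os.replace(tmp, path)  # atomic on POSIX filesystems

def load_cache(path: str) -> dict:
    """Pre-populate the permit table on startup; empty if no cache exists yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}
```

A daemon would call `write_cache` on the configured interval (and skip it entirely when the interval is 0), then call `load_cache` once at startup before enforcement begins.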
Backstory:
WHY: In our experience, hosts that attempt connections to IP addresses without first looking up the A or AAAA record in DNS are very frequently malicious in nature. Blocking outbound connections from hosts that are not DNS-compliant has proven to be an effective way to mitigate certain botnet risks on human-facing systems (laptops, desktops), but especially on IoT platforms running unexpected and un-auditable code.
In addition, we utilize a DNS resolver with a DNS-based malware list (aka “RPZ”), which is not effective if hosts can simply bypass DNS entirely and reach distant systems without performing a lookup where the domains can be centrally examined. Therefore, limiting outbound connections (TCP/UDP/ICMP/etc.) to destinations that have made it through our DNS system seems to be a good idea. It also lets us force users onto the RPZ-enabled DNS system, because all of their outbound connections will fail unless they are using it.
DOWNSIDES:
1) If the DNS cache is lost (restart/reboot, software error, manual flush), then connections to previously valid IP addresses would be forbidden, even though they were legitimate prior to the flush event. Clients may or may not be able to see why their connections stopped working, as a cache-deletion event is distant and invisible from their perspective. The “persistence” disk-cache capability may mitigate this, at the expense of writing the valid lists to disk to be read upon restart, but many systems running pfSense do not have disks that can handle lots of writes (flash drives). There does not seem to be a middle ground on this.
-
2) This is not foolproof security. A security threat may exist on a host that has valid DNS. The benefit of this method is enhanced by using an upstream DNS resolver that has a malware list (aka “RPZ”). Still, significant mitigation can be obtained by blocking requests made to IP addresses without DNS lookups, as such connections are often malicious in nature. Your local policy for developers and applications may need to make use of DNS mandatory, otherwise this tool may cause headaches for systems that use hard-coded IP addresses. (But you’re not doing that, right?) All security is incremental, so the argument that this is not 100% effective against edge cases or particularly clever hackers is not a valid reason not to implement it.
-
3) This requires having each client under inspection pointing ONLY at the pfSense instance. No other resolvers could be used, as their responses would not be collected into the state table. In limited circumstances this could be considered a feature, since it enforces security policy, though it does remove the RFC-suggested practice of configuring multiple recursive resolvers on each client to manage failure conditions on the recursive systems. Local administrators will have to evaluate the risks and returns of this trade-off.
-
4) Failover or stateful exchange between two pfSense instances would be difficult, though not impossible. Mirroring all DNS requests from one system to the other would help keep state (imperfectly); otherwise some other state-exchange method would be required. Neither is considered in this first implementation.
-
-
Update: Make it $10,000.
Other things I seem to have missed in the original definition:
-
There needs to be a method to identify clients which are allowed to connect outbound even if there has been no valid A/AAAA lookup. In other words: trusted client definition.
-
The "No Block List" description needs to be made more specific in that it applies to destination IP addresses/networks which can be accessed without a valid DNS lookup.
-
-
I suggest you update the subject line with the new amount you are offering.
Hopefully someone will take up the "challenge" ;) -
Have you tried contacting Netgate directly?
-
Yes, they said they didn't have the time and the budget was too small. Fair enough. I think this might be a case of "the valley of death" where OSS projects can't actually provide a complex feature without a champion who wants to do it for free, but the for-profit sponsor of the project can't take on the bounty work either because of conflicting goals in trying to stay alive (i.e.: would this OSS addition in some way generate recurring revenue or a continuing customer? No.)
JT
-
Interesting idea, but is this not something that would be better handled in Squid or a proxy rather than within the pf firewall ruleset?
From a security point of view, nothing should be allowed out of your network unless you permit that destination IP. Redirecting all DNS port 53 traffic to your resolver and redirecting all web traffic to a proxy would probably be a better point to intercept.
For example, an infected IoT device will attempt SSH brute-force attacks against legitimate servers that have A records in DNS; your ruleset will permit this. The same goes for spam: bots will retry relaying through known-good accounts and SMTP relay servers, both of which will have valid DNS entries.
Don't want to knock security ideas (my god, the world needs to get its act together), just not sure you're spending that 10k wisely.
-
You're perhaps interpreting this idea a bit more widely than it is intended. It cannot solve all security problems; just a specific subset. And it makes DNS-based blacklists actually useful, instead of just prophylactic, which is the underlying assumption of the feature request. Building exception lists for entire organizations is difficult (or impossible, if you've tried for a technical team of more than 100), and blocking outbound ports is a whack-a-mole game that can't be won. The administrative overhead and limited protocol coverage of Squid can be avoided entirely with this method, which lets users have a "more" complete end-to-end Internet, with a single protocol (DNS) gating network access.
I think the firewall (pf) is the perfect and only place to create this gate, because that's the work it does already. It just needs a more active method to implement/remove filters. Why implement gates both in squid AND in pf? Seems redundant.
-
I can think of a way to do this on Linux, because dnsmasq can dump all resolved IPs into an ipset. That mechanism was probably made exactly for live-updating the firewall.
-
A thought on downsides 1 & 4… could this not be mitigated by performing a reverse lookup (and caching it) on all destination IPs? A cache miss due to reboot or sync issue would simply add more processing time, instead of causing the connection to be blocked.
Anyhow, I like this idea as I too have had cases of random apps connecting directly to obscure IPs...
Sadly I'm ill equipped to really help. Though it sounds like this would be something that would have to at least be partially built into pf itself. Is it possible to execute arbitrary code as part of rule execution?
-
I'm not clear on what you mean by a reverse lookup. Do you mean in-addr/ip6.arpa? Those are almost always wrong, and are also overloaded: it is only possible to have one in-addr entry, but there may be hundreds (or thousands, or tens of thousands) of A/AAAA records pointed at a single IP address. Can you fill this in a bit more, since I don't quite follow the logic?
There is possibly someone who is taking the task; more news in this thread if demonstrable progress is made.
JT
-
I'm already working on this; I'm about 70% done. As for the rDNS question, there are only two ways of doing that: in-addr.arpa or PTR records, the latter of which are typically only set for systems that send mail. Also, in-addr.arpa responses are not wrong: they return the hostname of the machine with that IP, or NXDOMAIN if it doesn't have one set.
-
The solution is almost complete and I'm currently testing it.
I would just like to clarify the purpose/intent behind the TTL override options. Am I correct in thinking these overrides would apply ONLY to the blocking system, and not to unbound as well?
-
Bowing out because the work has apparently already been committed to someone else.
-
Quite a large bounty! What ever happened with this…?
-