DNS server recursion vs. DNS server override
I'm trying to work out a preferred way of pointing LAN users to the correct DNS server. Normally, your LAN hosts will not care and will just use the DNS server supplied by DHCP. pfSense supplies itself as the DNS server, and all is well. But what if you run a Kerberos domain? Or FreeIPA with more network services? Or (yuck) Active Directory? You could change the DHCP configuration to have clients send their DNS requests to the IPA or AD server(s), or you could tell pfSense to direct its DNS lookups to those servers and leave DHCP alone. Or you could use only a domain override to point pfSense's lookups for the internal domain to the current IPA/AD server and have everything else go directly to the outside or to the local cache.
On top of that, the DNS zone might also exist outside the LAN, e.g. because you registered it so it couldn't be abused by some other party, but didn't add any records there. The ISP, however, did add NS records, so the public zone has nameservers and might interfere with your setup. So DNS lookups for the internal domain should never go outside.
This gets me to a list of requirements:
- DNS for the internal domain should be resolved internally
- DNS for the internal domain should never go 'outside'
- DNS for the internal domain should always be resolved by the same set of authoritative servers
There are a few things that would be nice to have:
- When the DNS server for the internal domain fails, it should only break lookups for the internal domain, not DNS lookups as a whole
- DNS should be able to fail over across replicated servers
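The requirements above boil down to a routing decision per query. A minimal sketch of that policy, assuming a hypothetical internal zone `corp.example` and made-up server addresses: names in the internal zone go only to the internal servers (so they never leak outside), and everything else goes to the external upstreams (so an internal outage only affects internal names).

```python
# Hypothetical zone and addresses, for illustration only.
INTERNAL_ZONE = "corp.example"
INTERNAL_SERVERS = ["10.0.0.11", "10.0.0.12"]  # replicated, authoritative internally
EXTERNAL_UPSTREAMS = ["192.0.2.53"]            # whatever pfSense uses upstream


def in_zone(qname: str, zone: str) -> bool:
    """True if qname is the zone itself or a name below it (label-wise, case-insensitive)."""
    q = qname.rstrip(".").lower().split(".")
    z = zone.rstrip(".").lower().split(".")
    return len(q) >= len(z) and q[-len(z):] == z


def upstreams_for(qname: str) -> list:
    """Choose which servers a query may be sent to.

    Internal names go ONLY to the internal servers: if all of them are down the
    lookup fails rather than leaking to the outside (where a public copy of the
    zone could return a wrong answer). Everything else goes to the external
    upstreams, so an internal outage does not affect external lookups.
    """
    if in_zone(qname, INTERNAL_ZONE):
        return list(INTERNAL_SERVERS)
    return list(EXTERNAL_UPSTREAMS)
```

Matching on whole labels rather than a plain string suffix keeps a name like `notcorp.example` from being treated as internal.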
To me, this means that we can't really use the simple 'domain override' in Unbound in pfSense, as that only allows for one server. Should that server go down, all lookups for that domain will fail. Listing the internal servers before the external servers in pfSense's general DNS server settings would mean that all requests hit the internal DNS servers first. That would be bad if they were down: DNS would become rather slow, since every lookup would first have to fail on all internal servers before reaching the external ones.
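To put a number on the slowdown, here is a naive model of a resolver that tries its servers strictly in order and waits a full timeout on each dead one. The 2 s timeout, 20 ms round trip and addresses are assumptions, and real resolvers (Unbound included) track server response times and back off from dead servers, so treat this as a worst case rather than a measurement.

```python
def first_answer_delay(order, down, timeout_s=2.0, rtt_s=0.02):
    """Seconds until the first answer when servers are tried strictly in order.

    Naive model: every server in `down` costs a full timeout, and the first live
    server answers after one round trip. Returns None if every server is down.
    """
    elapsed = 0.0
    for server in order:
        if server in down:
            elapsed += timeout_s
        else:
            return elapsed + rtt_s
    return None


# Hypothetical addresses: internal servers listed before the external one.
internal = ["10.0.0.11", "10.0.0.12"]
external = ["192.0.2.53"]
order = internal + external

print(first_answer_delay(order, down=set()))          # internal up: one round trip
print(first_answer_delay(order, down=set(internal)))  # two timeouts first
```

With both internal servers down, every external lookup in this model waits about 4 s (two timeouts) before the external server answers.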
That leaves the remaining option: we would add the internal DNS servers to the DHCP settings and have the internal servers use pfSense for lookups of anything they are not authoritative for.
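Under that option, what the clients actually receive is DHCP option 6 (Domain Name Server, RFC 2132): a code byte, a length byte and a list of IPv4 addresses in preference order. A sketch of the encoding, with hypothetical internal addresses, makes the point that the list holds only the internal servers; pfSense is reached indirectly, through the internal servers' forwarding.

```python
import ipaddress


def dhcp_dns_option(servers):
    """Encode DHCP option 6 (Domain Name Server, RFC 2132) for IPv4 servers.

    Layout: option code 6, length byte, then 4 bytes per server in preference order.
    """
    if not servers:
        raise ValueError("option 6 needs at least one server")
    payload = b"".join(ipaddress.IPv4Address(s).packed for s in servers)
    if len(payload) > 255:
        raise ValueError("too many servers for a single option")
    return bytes([6, len(payload)]) + payload


# Hypothetical internal DNS servers handed out instead of pfSense's LAN address.
print(dhcp_dns_option(["10.0.0.11", "10.0.0.12"]).hex())
```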
I wonder if anyone could offer additional perspective on this.
From your post, the trouble here appears to be complexity.
You want consistent, trustworthy responses even though you have numerous servers, and you naturally want redundancy wherever it affects resolution and network availability.
With these requirements and multiple constraints, you really need to implement controls that enforce the desired config. Start by making sure your external domains are correctly configured (you must check that things like ISP-hosted zones are not poorly configured).
So you need a base design that uses the various hosts, proxies, resolvers and forwarders as you feel necessary, plus controls that prevent subsequent changes from breaking the way resolution happens.
Perhaps tag packets from each host/interface and then set up catch-all rules on each interface that prevent things like egress via the WAN, to enforce security. Then, for each group of hosts on each interface, work through what you need to do to ensure they can reliably resolve via the resolver or forwarder (usually pfSense) and/or remote hosts.
Sorry to be so general, but you do mention a lot of moving pieces. Once you have an architecture for your networks, DNS hosts, other nodes and related servers, I'd suggest designing a suitable set of firewall and NAT rules for client DNS resolution on each interface, so that queries can only go the way you want them to, and working out how redundant resolution will work for each group. Give careful thought to the config of the listeners: Unbound and the forwarders have a lot of options, which is good, but they also affect the rule set and will consume time if you start exploring and tweaking once a busy network is in place. A carefully thought-out plan that covers controls, required features, related config and all the things you want to avoid will save a lot of pain later on.
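As a rough illustration of controls that only let queries go where you want them to, here is a first-match evaluator for a hypothetical LAN rule set: DNS to the pfSense LAN address is passed, any other port 53 traffic is blocked, and everything else is passed. pfSense evaluates interface rules top-down with the first match winning; the addresses and the three rules are assumptions, and the model ignores protocol (TCP vs UDP), DNS over TLS on 853 and NAT redirects, which a real rule set would also need to consider.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    action: str                 # "pass" or "block"
    dst: Optional[str] = None   # None matches any destination
    port: Optional[int] = None  # None matches any destination port


def evaluate(rules, dst, port, default="block"):
    """Return the action of the first matching rule, top to bottom."""
    for rule in rules:
        if (rule.dst is None or rule.dst == dst) and (rule.port is None or rule.port == port):
            return rule.action
    return default


LAN_ADDR = "10.0.0.1"  # hypothetical pfSense LAN address running Unbound

lan_rules = [
    Rule("pass", dst=LAN_ADDR, port=53),  # DNS to the local resolver only
    Rule("block", port=53),               # any other DNS, e.g. straight to 8.8.8.8
    Rule("pass"),                         # everything else
]

print(evaluate(lan_rules, "10.0.0.1", 53))  # pass
print(evaluate(lan_rules, "8.8.8.8", 53))   # block
```

Order matters: moving the catch-all pass above the block would silently let clients bypass the resolver.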
Perhaps most important is a way to test each host group: something you can refine as you implement, and keep handy for troubleshooting user queries that come up later or for when you add hosts or change the configuration. Then implement interface by interface, testing each client group.
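One possible starting point for such a test is a tiny probe that sends a single query over UDP and reports the response code, using only the Python standard library and the RFC 1035 wire format. The function names are hypothetical, and this is a sketch, not a replacement for `dig` or `nslookup`; it only reads the header and ignores the answer section.

```python
import random
import socket
import struct


def build_query(qname, qtype=1, qid=None):
    """Minimal DNS query (RFC 1035): 12-byte header, one question, RD bit set.

    qtype 1 = A; the class is always IN (1).
    """
    if qid is None:
        qid = random.randint(0, 0xFFFF)
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)  # id, flags=RD, QDCOUNT=1
    question = b""
    for label in qname.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) < 64:
            raise ValueError(f"bad label in {qname!r}")
        question += bytes([len(raw)]) + raw
    question += b"\x00" + struct.pack("!HH", qtype, 1)
    return header + question


def parse_header(response):
    """Return (id, rcode) from a response; rcode 0 = NOERROR, 3 = NXDOMAIN."""
    qid, flags = struct.unpack("!HH", response[:4])
    return qid, flags & 0x000F


def probe(server, qname, timeout=2.0):
    """Send one UDP query to server:53; return the rcode, or None on timeout/mismatch."""
    query = build_query(qname)
    expected_id = struct.unpack("!H", query[:2])[0]
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(query, (server, 53))
        try:
            data, _ = sock.recvfrom(4096)
        except socket.timeout:
            return None
    if len(data) < 12:
        return None
    qid, code = parse_header(data)
    return code if qid == expected_id else None
```

For example, from a host in each group, `probe("10.0.0.1", "dc1.corp.example")` should return 0 (NOERROR) for names you expect to exist, 3 (NXDOMAIN) for names that shouldn't, and `None` if the server cannot be reached (all values here are placeholders).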
If you have gotten to the point that you're worried about DNS failure for the internal domains, you have pretty much moved beyond a caching forwarder/resolver setup. Neither dnsmasq nor Unbound is designed to be authoritative, so no, you would not use them if you require actual internal DNS rather than a handful of records to resolve.
As for your mention of AD: like it or not, if you are an MS shop running AD, then your DNS is already covered and designed not to fail for your AD. Multiple servers in your AD can provide DNS, and they automatically sync any changes to all the name servers in the setup. Clients point only to your internal AD servers for DNS, and these servers are then set up either to forward to something that can resolve external names or to resolve external names themselves. Normally in such a setup they forward to something else that has external access; for example, they could forward to pfSense, which does the external resolving via Unbound. So your 3 requirements are really met right out of the gate.
If you're not running Microsoft, then normally you would use BIND for your authoritative setup, though there are alternative authoritative DNS products out there. Sure, if you wanted, you could run it on pfSense via the BIND package. But yes, you would run multiples. Authoritative servers share info by design: this is why there is a primary (the server named in the SOA record) and then secondaries. You create or edit a record on the primary, and it is then propagated to all of your secondaries via zone transfer. So you could have 2 or 200, depending on how big your network is. Since your clients would only ever point to these internal servers that are authoritative for your internal domains, your requirements are again met.
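The way a secondary decides it is out of date is by comparing SOA serial numbers: it checks the primary's serial (on its refresh timer or when it receives a NOTIFY) and requests a zone transfer if the primary's is newer. Serials wrap around at 2^32, so "newer" uses RFC 1982 serial number arithmetic; a minimal sketch of that comparison, with hypothetical function names:

```python
SERIAL_BITS = 32
HALF = 2 ** (SERIAL_BITS - 1)


def serial_newer(s1, s2):
    """RFC 1982: True if serial s1 is 'greater than' s2, allowing for wraparound."""
    return (s1 < s2 and s2 - s1 > HALF) or (s1 > s2 and s1 - s2 < HALF)


def needs_transfer(primary_serial, secondary_serial):
    """A secondary requests a zone transfer when the primary's SOA serial is newer."""
    return serial_newer(primary_serial, secondary_serial)


print(needs_transfer(2024010102, 2024010101))  # True: primary was edited
print(needs_transfer(5, 4294967290))           # True: serial wrapped around
```

The wraparound case is why a secondary treats serial 5 as newer than 4294967290.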
If in such a setup you need to delegate other internal domains to other name servers inside the network, that is again a simple delegation you add on the primary, and it is automatically shared with all the other name servers you run inside your network.
As long as your clients list more than one of these internal name servers, you have no issues, since all of these servers have the same records for your internal domains. If you are worried about a server itself going down, then run the boxes serving your internal domains in an HA or CARP setup. You could set up pfSense with CARP, running BIND as the authoritative DNS for your internal domains.
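For the CARP part, the basic idea is that the node advertising most often holds the shared virtual IP your clients point to; in CARP the advertisement interval is advbase + advskew/256 seconds. A simplified sketch of that election, assuming hypothetical node names and the common pattern of advskew 0 on the primary and a higher advskew on the backup. It ignores preemption settings, demotion counters and timing, so it only shows who ends up master when a box dies.

```python
def carp_master(nodes):
    """Return the node that would hold the CARP VIP, or None if all are down.

    nodes maps name -> (advbase_seconds, advskew, is_up). The advertisement
    interval is advbase + advskew/256; the live node with the shortest interval
    (the most frequent advertiser) wins. Ties are broken by name here, which is
    a simplification of this sketch, not CARP behaviour.
    """
    intervals = {
        name: base + skew / 256
        for name, (base, skew, up) in nodes.items()
        if up
    }
    if not intervals:
        return None
    return min(intervals, key=lambda name: (intervals[name], name))


# Hypothetical pair: primary with advskew 0, backup with advskew 100.
pair = {"pfsense-a": (1, 0, True), "pfsense-b": (1, 100, True)}
print(carp_master(pair))                                  # pfsense-a
print(carp_master({**pair, "pfsense-a": (1, 0, False)}))  # pfsense-b
```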
If you have money to spend and want high-end DNS functionality, you could run something like Infoblox. It's really just BIND at its heart, with a lot of GUI and code wrapped around it. It can do more than just DNS: IPAM, your DHCP, even network controls, etc. I love it when customers use it, since I like to manage it ;) But it can be a hard sell sometimes because it's not cheap ;) And if the shop is MS, they have kind of already paid for their DNS reliability and redundancy; it just has to be configured and managed correctly, is all.
"To me, this means that we can't really use the simple 'domain override' in Unbound in pfSense as that only allows for one server."
Says who? You can have multiple entries for a domain override, and if one server does not answer, the others will be queried. To test this, I put in a domain override for test.com pointing at two IPs and then queried pfSense (running Unbound) for a record in the test.com domain. Sniffing on pfSense, you can see that it then asks the listed IPs for what I asked, until the query timed out, because neither of them is actually running DNS at all. I just wanted to show that both would be asked. So sure, you could point your clients at pfSense. You could run pfSense in a CARP setup, or have multiple pfSense boxes and point your clients at either of them, with your domain overrides set up to send queries for your internal domains to your internal DNS servers.
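In plain unbound.conf terms, a domain override with several servers corresponds roughly to a `forward-zone` clause with one `forward-addr` line per server. pfSense generates its own config, so this is only an illustration; the helper below is hypothetical and the addresses are placeholders. Note that Unbound chooses among the listed servers itself (it tracks their response times), so the list behaves like a pool rather than a strict order.

```python
def forward_zone(domain, servers):
    """Render an unbound.conf forward-zone clause with one forward-addr per server."""
    lines = ["forward-zone:", f'    name: "{domain}"']
    lines += [f"    forward-addr: {server}" for server in servers]
    return "\n".join(lines) + "\n"


# Placeholder addresses for two internal name servers.
print(forward_zone("test.com", ["10.0.0.11", "10.0.0.12"]), end="")
# forward-zone:
#     name: "test.com"
#     forward-addr: 10.0.0.11
#     forward-addr: 10.0.0.12
```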
"When the internal domain server for the internal domain fails, it should not take down DNS lookups completely, only for the internal domain"
"DNS should be able to fail over across replicated servers"
This really goes hand in hand. If you set up internal authoritative servers, then yes, the data would be replicated, and your clients could point to more than one of them so your internal DNS does not go down. So if you have redundant internal servers, and these can all forward or resolve external names, then you kill both of those birds with one stone.
One thing to remember: clients should only ever point to name servers that can resolve the same things. This is a common problem with internal DNS: people point clients at an external server and an internal server, which is a failure waiting to happen. Your ISP, Google, OpenDNS, etc. are not going to have a clue about your internal domains and will most likely send back NXDOMAIN. Once a client gets back NXDOMAIN, it is not going to ask any other name server it has listed; it has been told the name doesn't exist, so why would it ask some other server whether it has a record? And while clients might list name servers as 1, 2, 3, etc., once you point a client at more than one name server you can never be sure which one it uses or latches onto. So pointing at multiple name servers that cannot resolve the same things is a broken config. If you have internal DNS, point your clients at your internal DNS. If you have no need to resolve internal names, point them at multiple name servers that can resolve external names.
So pointing clients at multiple public name servers is fine, so that if one is down or cannot be reached they try another, and another, etc. But do not point a client at external and internal servers at the same time; this is going to cause you grief.
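A toy model of why mixing internal and external servers breaks: a stub resolver only moves on to the next server on a timeout, and treats NXDOMAIN as a final answer. The zone name and the server behaviours are assumptions (the "internal" server here answers everything, standing in for one that is authoritative internally and forwards the rest).

```python
NOERROR, NXDOMAIN, TIMEOUT = "NOERROR", "NXDOMAIN", "TIMEOUT"
INTERNAL_SUFFIX = ".corp.example"  # hypothetical internal domain


def internal_dns(name):
    # Model assumption: authoritative for the internal zone, forwards the rest.
    return NOERROR


def public_dns(name):
    # A public resolver has never heard of the internal zone.
    return NXDOMAIN if name.endswith(INTERNAL_SUFFIX) else NOERROR


def dead_dns(name):
    # An unreachable server never answers.
    return TIMEOUT


SERVERS = {"internal": internal_dns, "public": public_dns, "dead": dead_dns}


def stub_lookup(name, order):
    """Try servers in `order`; only a timeout moves on. NXDOMAIN is final."""
    for server in order:
        result = SERVERS[server](name)
        if result != TIMEOUT:
            return server, result
    return None, TIMEOUT


print(stub_lookup("dc1.corp.example", ["internal", "public"]))  # ('internal', 'NOERROR')
print(stub_lookup("dc1.corp.example", ["public", "internal"]))  # ('public', 'NXDOMAIN')
print(stub_lookup("dc1.corp.example", ["dead", "internal"]))    # ('internal', 'NOERROR')
```

Which answer the client gets depends only on which server it happens to try first, while a list of equivalent servers (or a dead one followed by a working one) gives the same result either way.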