If you've gotten to the point that you're worried about DNS failure for your internal domains, you've pretty much moved beyond a caching forwarder/resolver setup. Neither dnsmasq nor unbound is designed to be an authoritative server. So no, you would not use them if you need actual internal DNS, as opposed to a handful of records to resolve.
You mention AD. Like it or not, if you are an MS shop running AD, then your DNS is already covered and designed not to fail for your AD. Multiple servers in your AD can provide DNS, and they automatically sync any changes to all the name servers in the setup. Clients point only to your internal AD servers for DNS; those servers are then set up either to forward to something that can resolve external names or to resolve external names themselves. Normally they would forward to something else that has external access. For example, they could forward to pfSense, which does the external resolving via unbound. So your 3 requirements are really met out of the gate.
If you're not running Microsoft, then normally you would use BIND for your authoritative setup, though there are alternative authoritative DNS products out there. And sure, if you wanted, you could run it on pfSense via the bind package. But yes, you would run multiples. Authoritative servers by design share info. This is why there is a primary (the server named in the zone's SOA record) and then secondaries. You create or edit a record on the primary, and it is then updated via zone transfer on all of your secondaries. So you could have 2, you could have 200, all depending on how big your network is. Since your clients would only ever point to these internal servers that are authoritative for your internal domains, again your requirements are met.
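To make the zone transfer part a bit more concrete: a secondary decides whether it needs a fresh copy of the zone by comparing the serial number in the primary's SOA record with its own. Here is a minimal Python sketch of that comparison, assuming the 32-bit wrap-around serial arithmetic from RFC 1982. The function name and the sample serials are made up for illustration; this is not code from BIND.

```python
def needs_transfer(primary_serial: int, secondary_serial: int) -> bool:
    """Return True if the primary's SOA serial is newer than the secondary's.

    Serials are 32-bit and may wrap around, so the difference is taken
    modulo 2**32 (RFC 1982 style comparison). Hypothetical helper name.
    """
    diff = (primary_serial - secondary_serial) & 0xFFFFFFFF
    # Newer means the primary is ahead by less than half the number space.
    return 0 < diff < 0x80000000


# Example: the admin edited a record and bumped the serial on the primary.
print(needs_transfer(2024010102, 2024010101))  # secondary is behind
print(needs_transfer(2024010101, 2024010101))  # already in sync
```

This is also why forgetting to bump the serial after an edit is the classic reason secondaries never pick up a change.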
If you need to delegate DNS for other internal domains to other name servers inside the network, again that is a simple delegation you would create on your primary, and it is automatically shared with all the other name servers you run inside your network.
As long as your clients list more than one of these internal name servers, you have no issues, since all of these servers have the same records for your internal domains. If you're worried about a server itself going down, then run the box serving your internal domains in an HA or CARP setup. You could set up pfSense with CARP, running bind as the authoritative DNS for your internal domains.
If you have money to spend and you want high-end DNS functionality, you could run something like Infoblox. It's really just BIND at its heart with a lot of GUI and code wrapped around it. It can do more than just DNS: IPAM, your DHCP, even network controls, etc. I love it when customers use this, since I like to manage it ;) But it can be a hard sell sometimes because it's not cheap ;) And if the shop is MS, they have kind of already paid for their DNS reliability and redundancy. It just has to be configured and managed correctly, is all.
"To me, this means that we can't really use the simple 'domain override' in Unbound in pfSense as that only allows for one server."
Says who? You can have multiple entries for a domain override. All of these servers will be queried if one does not answer. Here I put in a domain override for test.com, then queried pfSense running unbound for a record in that test.com domain. Sniffing on pfSense, you can see that pfSense then asks each of the listed IPs for what I asked, until the query finally times out, because neither of them is actually running DNS at all. I just wanted to show that they would both be asked. So sure, you could point your clients at pfSense. You could run pfSense in CARP, or you could have multiple pfSense boxes set up and point your clients to either of them, with your domain overrides set up to point to your internal DNS for your internal domains.
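If you want to see the "ask the next one when this one does not answer" idea without a sniffer, here is a rough Python sketch using only the standard library. It builds a bare-bones A query and tries each server in a list until one replies. To be clear, this is not how unbound chooses between override servers internally; it only illustrates the fallback. The function names and the 192.0.2.x addresses are placeholders I made up, so swap in your own.

```python
import socket
import struct


def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record with recursion desired."""
    # Header: ID, flags (RD set), 1 question, 0 answer/authority/additional.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def first_answering_server(name, servers, timeout=1.0, port=53):
    """Return (server, raw_reply) from the first server that answers, else None."""
    query = build_query(name)
    for server in servers:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            try:
                sock.sendto(query, (server, port))
                reply, _ = sock.recvfrom(512)
                return server, reply
            except OSError:  # timeout or unreachable: move on to the next one
                continue
    return None


if __name__ == "__main__":
    # Placeholder addresses, standing in for the two override servers.
    print(first_answering_server("host.test.com", ["192.0.2.10", "192.0.2.11"]))
```

With two addresses that are not running DNS, this ends the same way my test did: both get asked, both time out, and you get nothing back.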
"When the internal domain server for the internal domain fails, it should not take down DNS lookups completely, only for the internal domain"
"DNS should be able to fail over across replicated servers"
These really go hand in hand. If you set up internal authoritative servers, then yes, the data would be replicated, and your clients could point to more than one of them if you do not want your internal DNS to go down. So if you have redundant internal servers, and these can all forward or resolve external names, then you kill both of those birds.
One thing to remember: clients should only ever point to name servers that can resolve the same things. This is a common problem with internal DNS. People point the client to an external server and an internal server, and this is a failure waiting to happen. Your ISP, Google, OpenDNS, etc. are not going to have a clue about your internal names. They will most likely send back NXDOMAIN. Once the client gets back NXDOMAIN, it is not going to ask any other name server it has listed. It was told the name doesn't exist, so why should it ask some other server whether it has a record for it? And while the client might list its name servers as 1, 2, 3, etc., once you point a client to more than one NS you can never be sure which one it uses or latches onto. So pointing to multiple name servers that cannot resolve the same things is a broken config. If you have internal DNS, then point your clients to your internal DNS. If you have no need to resolve internal stuff, then point them to multiple name servers that can resolve external.
So pointing clients to multiple public name servers is fine: if one is down or cannot be reached, they try another, and another, etc. But do not point a client to external and internal servers at the same time. This is going to cause you grief.
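Here is a toy Python model of why mixing internal and external name servers bites you. It assumes a simplified client that only moves to the next server when the current one does not answer at all, and treats any answer, NXDOMAIN included, as final. Real clients differ by OS in which server they pick first, which is exactly the "you can never be sure" part. The names, the record, and the home.arpa zone are invented for the example.

```python
class NoResponse(Exception):
    """Stands in for a name server that is down or unreachable."""


# Hypothetical internal zone data, for illustration only.
INTERNAL_RECORDS = {"nas.home.arpa": "192.168.1.20"}


def internal_ns(name):
    """Internal server: knows the internal names.

    Simplified: a real one would also forward or resolve external names.
    """
    return INTERNAL_RECORDS.get(name, "NXDOMAIN")


def public_ns(name):
    """Public resolver: has no clue about internal names."""
    return "NXDOMAIN"


def dead_ns(name):
    """A server that never answers."""
    raise NoResponse(name)


def stub_lookup(name, servers):
    """Ask servers in order; move on only when a server does not answer."""
    for server in servers:
        try:
            return server(name)  # any answer, including NXDOMAIN, is final
        except NoResponse:
            continue
    return None  # nobody answered


# Public server asked first: the internal name "doesn't exist".
print(stub_lookup("nas.home.arpa", [public_ns, internal_ns]))
# Internal server asked first: works, but you cannot count on that order.
print(stub_lookup("nas.home.arpa", [internal_ns, public_ns]))
# A dead server is skipped, so two internal servers give real redundancy.
print(stub_lookup("nas.home.arpa", [dead_ns, internal_ns]))
```

The failover only helps when every server on the list would give the same answer, which is the whole point of keeping the list all internal or all external.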
overrides.png