Load Balancing DNS with relayd
-
Hi @johnpoz -
Thanks for responding. I should have been a bit more specific about my use case: the two DNS servers are really just two Pi-hole DNS servers (or DNS filters). I had been using one and decided to set up a second for redundancy and to make sure there is no downtime during upgrades, etc. I thought that balancing the DNS query load across both Pi-holes might be nice, but it's not the primary requirement (that would be availability). So in that sense, yes, I was doing this just to see if I could make it work. :)
I think what @NogBadTheBad suggested might be an easy way to balance some of the DNS traffic implicitly while maintaining the primary goal of redundancy (i.e. if I take one of the Pi-holes down for maintenance, regardless of the order of the DNS servers on the subnets, all clients will just switch to using the other Pi-hole that's still up and working).
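To make the redundancy idea concrete, here is a minimal Python sketch of "if one server doesn't answer, use the other". It is only an illustration: `resolve_with_failover` and `fake_query` are hypothetical names, the IP addresses are made up, and real client resolvers differ in how and when they pick a server.

```python
from typing import Callable, Optional, Sequence


def resolve_with_failover(
    name: str,
    servers: Sequence[str],
    query: Callable[[str, str], str],
) -> Optional[str]:
    """Try each server in turn; return the first answer, or None if all fail."""
    for server in servers:
        try:
            return query(server, name)
        except OSError:
            # Timeout or unreachable server: move on to the next one.
            continue
    return None


def fake_query(server: str, name: str) -> str:
    """Hypothetical stand-in for a real DNS lookup; 192.168.1.2 is 'down'."""
    if server == "192.168.1.2":
        raise TimeoutError("no response from " + server)
    return "192.0.2.10"  # documentation-range address, not a real answer


# First Pi-hole is down for maintenance, so the second one answers.
answer = resolve_with_failover(
    "example.com", ["192.168.1.2", "192.168.1.3"], fake_query
)
print(answer)
```

Passing the lookup function in as `query` keeps the failover logic separate from the network code, so it can be tried without touching a real DNS server.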
-
Exactly, the order of DNS servers means nothing really to a client. For sure it will at some point switch over to the other, etc. When more than one DNS server is listed, you can never really be sure which one will be used.
So if you list both your Pi-holes, as time goes on you should end up with the load shared across them. And yeah, if one goes offline, the other will be used.
The drawback to what you're doing is that it costs you cache efficiency. After your clients ask the Pi-hole, where does it go? Just to pfSense to be resolved? If so, you don't lose any cache, since they share a common cache on pfSense, which is on your local network. But if they forward to, say, an outside resolver, then you've just split your cache in two, and its efficiency is reduced.
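A rough way to see the cache-splitting effect is a toy simulation: the same stream of queries is sent either to one shared cache or split at random across two independent caches. Everything here is an assumption for illustration (the query rate, the 50 made-up names, the 300 second TTL, the random split); it is not a model of how Pi-hole or Unbound cache internally.

```python
import random


def count_hits(queries, ttl):
    """Count cache hits for (time, name) queries against one TTL cache."""
    expires = {}  # name -> time at which the cached record expires
    hits = 0
    for now, name in queries:
        if expires.get(name, -1) > now:
            hits += 1
        else:
            expires[name] = now + ttl  # miss: fetch upstream and cache it
    return hits


rng = random.Random(42)  # fixed seed so the run is repeatable
# One query per second for an hour, spread over 50 made-up names.
queries = [(t, "host%d.example" % rng.randrange(50)) for t in range(3600)]

# One shared cache sees every query.
shared_hits = count_hits(queries, ttl=300)

# Two independent caches: each query goes to one of them at random.
cache_a, cache_b = [], []
for q in queries:
    (cache_a if rng.random() < 0.5 else cache_b).append(q)
split_hits = count_hits(cache_a, ttl=300) + count_hits(cache_b, ttl=300)

print("shared hit rate:", shared_hits / len(queries))
print("split hit rate: ", split_hits / len(queries))
```

The split setup can never score more hits than the shared one, because every name has to be fetched separately by each cache that gets asked for it.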
As for being offline for maintenance: a Pi-hole update takes like 30 seconds at most. If something goes wrong, just fire up another box on that same IP that can do DNS for a bit, or adjust DHCP to point to a different NS, etc.
I've been running my Pi-hole for a year+ on an actual Pi; before that I ran it on a VM. I have never had any issues where it was offline for more than a few minutes. Even when an update failed, it took only a few minutes to get it back up and running.
-
You raise a great point on cache efficiency, and that is something I did think about. In my case pfSense acts as the DNS resolver, and both Pi-holes point to pfSense as the upstream DNS, effectively sharing pfSense's DNS cache. Granted, what does suffer is the cache on the individual Pi-holes themselves. However, a call upstream to pfSense from a Pi-hole usually takes less than 0.5 ms, so practically speaking there is essentially no performance hit. Related to that, I never found the cache on the Pi-holes that effective anyway, since most websites these days have such short TTLs (and records would just expire out of the cache, requiring a call upstream anyway). What has helped more was enabling "Serve cache records even with TTL of 0" in the pfSense DNS Resolver settings. This has kept lookups quite fast on my network even when there aren't a ton of clients keeping the cache hot at all times.
-
So you are just pointing back to pfSense, and yeah, you overcome the split of the cache. But now you're back to a single point of failure, and the point of two Pi-holes is moot.
If Unbound shuts down on pfSense, DNS is dead ;)
I concur, using serve 0 will for sure speed up DNS responses; you probably also want to turn on prefetch. And to be honest I hate the short-TTL nonsense, freaking 60 seconds is just freaking moronic!!!!
I set the min TTL to 3600 seconds. Setting 60 seconds is just asshat moronic. I have not seen an issue overriding that nonsense and using 3600 seconds. Personally I think it's AWS just wanting to up their query numbers. I mean, let's get real, 60 seconds, WTF! ;)
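For a feel of what a minimum TTL changes, here is a small Python sketch of the arithmetic. It assumes the resolver simply caches for the larger of the record's TTL and the configured minimum (as I understand it, this is what Unbound's `cache-min-ttl` option does), and that the name is asked for constantly, so it gets refetched as soon as it expires. The function names are made up for this example.

```python
def effective_ttl(record_ttl: int, min_ttl: int) -> int:
    """TTL the resolver actually caches for, after applying a minimum."""
    return max(record_ttl, min_ttl)


def upstream_lookups_per_day(ttl: int) -> int:
    """Rough upstream lookups per day for one name that is always in demand."""
    return 86400 // ttl


# A record published with a 60 second TTL, without and with a 3600 s minimum.
for min_ttl in (0, 3600):
    ttl = effective_ttl(60, min_ttl)
    print(min_ttl, ttl, upstream_lookups_per_day(ttl))
```

With these assumptions that is roughly 1440 upstream lookups per day for one busy name at 60 seconds versus 24 at 3600 seconds. The trade-off is that a changed record can be served stale for up to the minimum TTL.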
-
@johnpoz - Fair enough :). I suppose technically I would then also have to set up a second pfSense box and run them both in an HA configuration, but let's say I do that: is the DNS cache actually shared (or sync'd) between the two?
I always thought short TTLs were used nowadays to help with load balancing, etc., since so many sites use CDNs and the like. Otherwise I'm not really sure why it would need to be done.
-
You don't need a TTL of 60 to load share. They do it because they like lots and lots of queries, because you get charged per query. Set it to 5 or 10 minutes, 30 or so. Come on, 60 freaking seconds. Let's get real.
I believe that is what they default to, and the people using them never update it. The only reason you might ever get down to a 60 second TTL is when you're about ready to flip to another NS. And you should really work that down from whatever your standard is as you get closer to the switchover date and time, and then as soon as you flip over you would ramp it back up.
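The timing behind that ramp-down can be written out in a couple of lines of Python. The rule of thumb assumed here is that caches may hold the old record for up to its old TTL, so the lower TTL has to be published at least that long before the switch. The function name and the example numbers are made up.

```python
def latest_lowering_time(switch_time: int, old_ttl: int) -> int:
    """Latest time (in seconds) to publish the lower TTL so that any cache
    still holding the record with the old TTL has expired it by switch_time."""
    return switch_time - old_ttl


# Example: switchover at t = 86400 s (one day from now), old TTL of 3600 s.
print(latest_lowering_time(86400, 3600))
```

In this example the short TTL needs to be live no later than 82800 seconds in, i.e. one hour before the flip, and it can go back up once the switch is done.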
Another issue with current DNS is that IoT devices are not set to do any local caching, so every freaking time they want to go somewhere, like every few minutes, they have to query for it. And if where they go has a 60 second TTL, it's just nuts.
No, the DNS cache would not be shared across an HA pair, I don't think so. It doesn't make a lot of sense to be able to do that. You're not active/active, you're active/standby, etc.
-