Load Balancing DNS with relayd



  • Hi all,

    I'm trying to use relayd to set up a simple load balancer that distributes client DNS traffic between two internal DNS servers. The test setup is essentially as follows:

    Internal Subnets A, B, C: 192.168.10.0/24, 192.168.20.0/24, 192.168.30.0/24

    The two DNS Servers reside on internal subnet C with addresses 192.168.30.10 and 192.168.30.11

    I used the instructions here to set up the relayd load balancer pool and virtual server IP:
    https://docs.netgate.com/pfsense/en/latest/book/loadbalancing/server-load-balancing-pools.html

    For the virtual server IP I used an address on the same subnet as the DNS servers (subnet C), namely 192.168.30.5. The port for both the pool servers and the virtual server is set to 53. Appropriate firewall rules have also been set up so that subnets A and B can access subnet C.

    When I choose Relay Protocol = DNS, unfortunately I essentially get the same results as outlined in this more recent thread:

    https://forum.netgate.com/topic/129903/dns-load-balancing

    The pool servers both show as down, and for the virtual server IP I get "unknown status - relayd not running?" DNS queries over UDP or TCP to the virtual server IP (from subnets A or B) both fail.

    Now, if I switch Relay Protocol = TCP, both virtual server and pool servers show status as "Up". I'm also able to reach the DNS servers by sending DNS queries via TCP from subnets A or B (e.g. dig +tcp .....) to the virtual server IP. Queries over UDP do not work.

    While this is a step forward, I really need this to work with UDP DNS queries (since that is what the clients send).
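    For anyone reproducing this test without dig, the UDP and TCP checks described above can be sketched roughly as follows. This is only a minimal illustration using the Python standard library: the function names (`build_query`, `query_udp`, `query_tcp`, `check`) are made up for this example, and `example.com` is just a placeholder query name.

```python
import socket
import struct


def build_query(name, qtype=1, query_id=0x1234):
    """Build a minimal DNS query message (RFC 1035); qtype 1 is an A record."""
    # Header: ID, flags (only RD set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # QCLASS 1 = IN


def query_udp(server, name, timeout=2.0):
    """Send the query as a single UDP datagram and return the raw reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.sendto(build_query(name), (server, 53))
        data, _ = s.recvfrom(4096)
        return data


def _recv_exact(sock, n):
    """Read exactly n bytes from a TCP socket."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed early")
        buf += chunk
    return buf


def query_tcp(server, name, timeout=2.0):
    """Send the query over TCP (2-byte length prefix) and return the raw reply."""
    msg = build_query(name)
    with socket.create_connection((server, 53), timeout=timeout) as s:
        s.sendall(struct.pack("!H", len(msg)) + msg)
        (length,) = struct.unpack("!H", _recv_exact(s, 2))
        return _recv_exact(s, length)


def check(server, name="example.com"):
    """Try both transports and report which ones get an answer."""
    results = {}
    for transport, fn in (("udp", query_udp), ("tcp", query_tcp)):
        try:
            reply = fn(server, name)
            if len(reply) < 12:
                results[transport] = "failed: truncated reply"
            else:
                # RCODE is the low 4 bits of the fourth header byte
                results[transport] = f"ok (rcode={reply[3] & 0x0F})"
        except OSError as exc:
            results[transport] = f"failed: {exc}"
    return results
```

    Calling `check("192.168.30.5")` from a client on subnet A or B would then show whether the virtual server answers on each transport; with Relay Protocol = TCP I would expect the "tcp" entry to succeed and the "udp" entry to time out.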

    Does anyone have any idea what else I could try to get this to work? For instance, for DNS to work as the relay protocol, does the virtual server IP need to be the WAN IP or some other public IP? Or can I use internal IPs as well?

    Thanks in advance for your help and insight. I really appreciate it.


  • Galactic Empire

    FYI
    https://forum.netgate.com/topic/140790/heads-up-relayd-deprecated-on-pfsense-2-5-0

    Why not just hand out 192.168.30.10 then 192.168.30.11 on one subnet, then 192.168.30.11 & 192.168.30.10 on the next, etc ...



  • @NogBadTheBad said in Load Balancing DNS with relayd:

    FYI
    https://forum.netgate.com/topic/140790/heads-up-relayd-deprecated-on-pfsense-2-5-0

    Why not just hand out 192.168.30.10 then 192.168.30.11 on one subnet, then 192.168.30.11 & 192.168.30.10 on the next, etc ...

    Thanks @NogBadTheBad - I appreciate the response.

    I like your suggestion of switching the order of the DNS servers per subnet to help balance the load, and it may well be the simplest and most straightforward solution. The only caveat is that the number of clients on each subnet is not the same (one subnet is bigger than the other), so it may not balance perfectly, but it is definitely better than what I have now.
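    To make the caveat concrete, here is a rough sketch of the per-subnet ordering and the resulting split. The client counts (120 and 40) are invented for illustration, and the estimate assumes every client always queries the first server it was handed, which real clients do not guarantee.

```python
def rotate_dns_order(subnets, servers):
    """Give each subnet the server list rotated one position further."""
    plan = {}
    for i, subnet in enumerate(subnets):
        shift = i % len(servers)
        plan[subnet] = servers[shift:] + servers[:shift]
    return plan


def primary_load(plan, clients_per_subnet):
    """Count clients per server, assuming each client uses its first entry."""
    load = {}
    for subnet, order in plan.items():
        load[order[0]] = load.get(order[0], 0) + clients_per_subnet[subnet]
    return load


subnets = ["192.168.10.0/24", "192.168.20.0/24"]
servers = ["192.168.30.10", "192.168.30.11"]
plan = rotate_dns_order(subnets, servers)

# Hypothetical client counts, not taken from the real network
clients = {"192.168.10.0/24": 120, "192.168.20.0/24": 40}
load = primary_load(plan, clients)
for server, count in load.items():
    print(server, count)
```

    With these made-up numbers the split is 120 to 40, so the balance is only as even as the subnet sizes are.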

    Regarding relayd likely being deprecated in pfsense 2.5.0 -- does HAProxy represent a viable alternative for UDP DNS load balancing? From what I've been reading, it doesn't appear to support UDP, but I could be wrong.

    Thanks again.


  • Galactic Empire

    @tman222 said in Load Balancing DNS with relayd:

    Regarding relayd likely being deprecated in pfsense 2.5.0 -- does HAProxy represent a viable alternative for UDP DNS load balancing? From what I've been reading, it doesn't appear to support UDP, but I could be wrong.

    Not a clue, sorry. I only ever use load balancing at work (Cisco ACE); I just remember seeing a post from @jimp mentioning that relayd will be going away.


  • LAYER 8 Global Moderator

    So you're just doing this to see if you can? Do you have such load that sharing it between your 2 NS is required?

    Handing out both IPs to all your clients should amount to a split, with some clients asking 1 and others asking 2. Also, you could use pfSense as a DNS cache.

    So if client A asks for host.domain.tld, then client B asks for the same, it would just be served up from cache vs having to ask one of your NS every time.

    You can serve up a shitton of DNS with very little resources. If you need load sharing, something seems wrong; it should never be needed unless your NS were some old Pi 1s and you had thousands and thousands of clients asking a shitton of DNS.

    I take it these ns are authoritative for your local domain(s)?? Why are you having your clients query them and not just pfsense?



  • Hi @johnpoz -

    Thanks for responding. I should have been a bit more specific about my use case: the two DNS servers are really just two Pi-hole DNS servers (or DNS filters). I had been using one and decided to set up a second for redundancy and to make sure there is no downtime during upgrades, etc. I thought that balancing the DNS query load across both Pi-holes might be nice, but it's not the primary requirement, which is availability. So in that sense, yes, I was doing this just to see if I could make it work. :)

    I think what @NogBadTheBad suggested might be an easy way to balance some of the DNS traffic implicitly while maintaining the primary goal of redundancy (i.e. if I take one of the Pi-holes down for maintenance, regardless of the order of the DNS servers on the subnets, all clients will just switch to using the other Pi-hole that's still up and working).


  • LAYER 8 Global Moderator

    Exactly, the order of DNS servers means nothing really to a client. For sure it will at some point switch over to the other, etc. When there is more than 1 DNS server listed, you can never really be sure which one will be used.

    So if you list both your Pi-holes, as time goes on you should end up with the load shared across them. And yeah, if one goes offline, the other will be used.

    The drawback of what you're doing is that it costs you efficiency in the cache. After your clients ask the Pi-hole, where does it go? Just to pfSense to be resolved? If so, you don't cost yourself cache, since they have a common cache on pfSense that is on your local network. But if they are forwarding to, say, outside, then you just halved your cache, and now the efficiency of your cache is reduced.

    As for being offline for maintenance, a Pi-hole update takes like 30 seconds at most. If something goes wrong, just fire up another box on that same IP that can do DNS for a bit, or adjust DHCP to point to a different NS, etc.

    Been running my Pi-hole for a year+ on an actual Pi; before that I ran it on a VM. And I have never had any issues where it was offline for more than a few minutes. Even when an update failed, it took only a few minutes to get it back up and running.
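    A toy simulation can illustrate the cache point. This is only a sketch under simplifying assumptions (uniformly random query times and names, one fixed TTL for every record, unlimited cache size, each client query sent to one of two caches at random); none of the numbers come from a real network.

```python
import random


def count_misses(queries, ttl):
    """Count misses for time-ordered (time, name) queries against one TTL cache."""
    expires = {}
    misses = 0
    for t, name in queries:
        # A name is a miss if it was never cached or its entry has expired
        if expires.get(name, -1.0) <= t:
            misses += 1
            expires[name] = t + ttl
    return misses


def simulate(n_queries=20000, n_names=200, ttl=60.0, duration=3600.0, seed=1):
    """Return (misses with one shared cache, misses with two split caches)."""
    rng = random.Random(seed)
    queries = sorted(
        (rng.uniform(0, duration), f"host{rng.randrange(n_names)}.example")
        for _ in range(n_queries)
    )
    shared = count_misses(queries, ttl)

    # Split the same query stream at random across two independent caches
    cache_a, cache_b = [], []
    for q in queries:
        (cache_a if rng.random() < 0.5 else cache_b).append(q)
    split = count_misses(cache_a, ttl) + count_misses(cache_b, ttl)
    return shared, split


shared_misses, split_misses = simulate()
print("misses with one shared cache:", shared_misses)
print("misses with two split caches:", split_misses)
```

    In this model the split caches never produce fewer misses than the shared one; every extra miss is an extra upstream lookup, which matters if the Pi-holes forward outside rather than to a common local resolver.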



  • @johnpoz

    You raise a great point on cache efficiency, and that is something I did think about. In my case pfSense acts as the DNS resolver, and both Pi-holes point to pfSense as the upstream DNS, effectively sharing pfSense's DNS cache. Granted, what does suffer is the cache on the individual Pi-holes themselves. However, a call upstream to pfSense from a Pi-hole usually takes less than 0.5 ms, so practically speaking there is essentially no performance hit. Related to that, I never found the cache on the Pi-holes that effective anyway, since most websites these days have such short TTLs (records would just expire out of the cache, requiring a call upstream anyway). What has helped more was enabling "Serve cache records even with TTL of 0" in the pfSense DNS Resolver settings. This has kept lookups quite fast on my network even when there aren't a ton of clients keeping the cache hot at all times.


  • LAYER 8 Global Moderator

    So you are just pointing back to pfSense, and yeah, you overcome the split of the cache. But now you're back to a single point of failure, and the point of 2 Pi-holes is moot.

    If Unbound shuts down on pfSense, DNS is dead ;)

    I concur, use of serve 0 will for sure speed up DNS responses; you prob also want to turn on prefetch. And to be honest I hate the min TTL nonsense: freaking 60 seconds is just freaking moronic!!!!

    I set min TTL to 3600 seconds. Those that set 60 seconds are just asshat moronic. I have not seen an issue overriding that nonsense and using 3600 seconds. Personally I think it's AWS just wanting to up their query numbers. I mean, let's get real, 60 seconds - WTF! ;)
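    As a rough worked example of what the minimum TTL override changes: a resolver that clamps TTLs upward refetches a continuously queried name at most once per effective TTL. The helper names below are made up for this sketch, and it ignores prefetch and serve-expired behaviour. The trade-off is that a changed record can stay stale in the cache for up to the minimum TTL.

```python
import math


def effective_ttl(record_ttl, min_ttl=0):
    """TTL the cache actually honours once a minimum TTL override is applied."""
    return max(record_ttl, min_ttl)


def max_upstream_lookups(record_ttl, min_ttl=0, window=3600):
    """Upper bound on upstream lookups for one busy name within `window` seconds."""
    ttl = effective_ttl(record_ttl, min_ttl)
    if ttl <= 0:
        raise ValueError("effective TTL must be positive")
    return math.ceil(window / ttl)


print(max_upstream_lookups(60))                # 60-second TTL, no override
print(max_upstream_lookups(60, min_ttl=3600))  # same record, min TTL of 3600
```

    For a name that is queried constantly, that is up to 60 upstream lookups per hour with a 60-second TTL versus 1 per hour with the 3600-second minimum.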



  • @johnpoz - Fair enough :). I suppose technically I would then also have to set up a second pfSense box and run them both in an HA configuration -- but let's say I do that, is the DNS cache actually shared (or synced) between the two?

    I always thought short TTLs were used nowadays to help with load balancing, etc., since so many sites use CDNs and the like... otherwise I'm not really sure why it would need to be done.


  • LAYER 8 Global Moderator

    You don't need a TTL of 60 to load share. They do it because they like lots and lots of queries, because you get charged per query. Set it to 5 or 10 minutes, 30 or so. Come on, 60 freaking seconds. Let's get real.

    I believe that is what they default to, and people using them never update it. The only reason you might ever get down to a 60 second TTL is when you're about ready to flip to another NS. And you should really work it down from whatever your standard is as you get closer to the switch-over date and time, and then ramp it back up as soon as you flip over.

    Another issue with current DNS is that IoT devices are not set to do any local caching, so every freaking time they want to go somewhere, like every few minutes, they have to query for it. And if where they go has a 60 second TTL, it's just nuts...

    No, I don't think the DNS cache would be shared via an HA pair; it doesn't make a lot of sense to be able to do that. You're not active/active, you're active/standby, etc.

