Inbound DNS load balancing v2.0.1



  • Can someone explain exactly how this feature works?

    I set up everything normally like I would with load balanced webservers (pools, virtual servers, monitors, virtual IP) but selected DNS for the protocol in the virtual servers pool.  Nothing I can do seems to make the traffic pass.  Is there something I am missing?


  • Rebel Alliance Developer Netgate

    There isn't anything too complicated there. It should work almost exactly the same way as the TCP version, though last I heard fallback pools didn't work for it.

    Anything show up in the relayd tab of the system logs?

    I set it up several times when writing the code for it and it worked fine for me, and the customer I added the code for deployed it successfully as well.



  • Here is the log.  What happens is that relayd craps out with the DNS option enabled:

    
    Dec 24 14:39:37	relayd[1449]: terminating
    Dec 24 14:39:37	relayd[1955]: host check engine exiting
    Dec 24 14:39:37	relayd[1449]: check_child: lost child: socket relay engine exited
    Dec 24 14:39:37	relayd[1955]: host xxx.xxx.xxx.10, check icmp (0ms), state unknown -> up, availability 100.00%
    Dec 24 14:39:37	relayd[1449]: check_child: lost child: pf update engine exited
    Dec 24 14:39:37	relayd[1955]: host xxx.xxx.xxx.12, check icmp (0ms), state unknown -> up, availability 100.00%
    Dec 24 14:39:37	relayd[1955]: host xxx.xxx.xxx.10, check icmp (0ms), state unknown -> up, availability 100.00%
    Dec 24 14:39:37	relayd[1640]: pf update engine exiting
    Dec 24 14:39:37	relayd[1955]: host xxx.xxx.xxx.11, check icmp (0ms), state unknown -> up, availability 100.00%
    Dec 24 14:39:37	relayd[2188]: fatal: relay_privinit: failed to listen: Can't assign requested address
    Dec 24 14:39:37	relayd[1449]: startup
    Dec 24 14:39:19	relayd[8699]: terminating
    Dec 24 14:39:19	relayd[9024]: host check engine exiting
    Dec 24 14:39:19	relayd[9024]: host xxx.xxx.xxx.10, check icmp (0ms), state unknown -> up, availability 100.00%
    Dec 24 14:39:19	relayd[8699]: check_child: lost child: socket relay engine exited
    Dec 24 14:39:19	relayd[9024]: host xxx.xxx.xxx.11, check icmp (0ms), state unknown -> up, availability 100.00%
    Dec 24 14:39:19	relayd[8699]: check_child: lost child: pf update engine exited
    Dec 24 14:39:19	relayd[9024]: host xxx.xxx.xxx.10, check icmp (0ms), state unknown -> up, availability 100.00%
    Dec 24 14:39:19	relayd[8830]: pf update engine exiting
    Dec 24 14:39:19	relayd[9323]: fatal: relay_privinit: failed to listen: Can't assign requested address
    Dec 24 14:39:19	relayd[8699]: startup
    
    

    The external IP used is a virtual IP (Proxy ARP).


  • Rebel Alliance Developer Netgate

    It must be an IP Alias or CARP VIP.

    relayd must be able to bind to the IP, and the error in the log says just that.



  • Ok made that change but relayd is still not starting:

    
    Dec 30 11:31:18	relayd[58582]: terminating
    Dec 30 11:31:18	relayd[58582]: check_child: lost child: socket relay engine exited
    Dec 30 11:31:18	relayd[58582]: check_child: lost child: host check engine exited
    Dec 30 11:31:18	relayd[58582]: check_child: lost child: pf update engine exited
    Dec 30 11:31:18	relayd[58817]: host check engine exiting
    Dec 30 11:31:18	relayd[58745]: pf update engine exiting
    Dec 30 11:31:18	relayd[58996]: fatal: relay_privinit: failed to listen: Address already in use
    Dec 30 11:31:18	relayd[58582]: startup
    Dec 30 11:31:09	relayd[54839]: terminating
    Dec 30 11:31:09	relayd[55161]: host check engine exiting
    Dec 30 11:31:09	relayd[54839]: check_child: lost child: socket relay engine exited
    Dec 30 11:31:09	relayd[54839]: check_child: lost child: pf update engine exited
    Dec 30 11:31:09	relayd[55027]: pf update engine exiting
    Dec 30 11:31:09	relayd[55305]: fatal: relay_privinit: failed to listen: Address already in use
    Dec 30 11:31:09	relayd[54839]: startup
    
    

    Address already in use?


  • Rebel Alliance Developer Netgate

    You have the DNS forwarder on so it can't bind to port 53 on the IP.
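
    Both fatal errors in the logs above come from the same step: relayd trying to bind its listening socket. A minimal Python sketch of that step, useful for checking an address by hand, is below. The `try_bind` helper and its return labels are made up for illustration and are not part of relayd or pfSense; it assumes UDP and IPv4.

```python
import errno
import socket


def try_bind(ip, port, proto=socket.SOCK_DGRAM):
    """Try to bind ip:port and classify the outcome (illustrative helper)."""
    sock = socket.socket(socket.AF_INET, proto)
    try:
        sock.bind((ip, port))
    except OSError as exc:
        if exc.errno == errno.EADDRNOTAVAIL:
            # "Can't assign requested address": the IP is not configured
            # on any local interface (e.g. a Proxy ARP VIP).
            return "not-local"
        if exc.errno == errno.EADDRINUSE:
            # "Address already in use": another process already holds the
            # port (e.g. the DNS forwarder on port 53).
            return "in-use"
        if exc.errno == errno.EACCES:
            # Ports below 1024 normally require root.
            return "no-permission"
        return "error: " + str(exc)
    else:
        return "ok"
    finally:
        sock.close()
```

    Run as root on the firewall, something like `try_bind("<your VIP>", 53)` should return "ok" only when the VIP is bound locally and nothing else is listening on port 53 there.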



  • Perfect! That sorted it all out.  Thanks for the help.

    For any others wanting to know how to do this what I have is:

    • 4 DNS servers behind pfsense

    • 3 pools with all 4 of the DNS servers in each (DNS servers are on private IP space)

    • 3 Virtual IP on the WAN interface using IP Alias configuration

    • 3 Virtual servers using the 3 virtual IPs

    • Firewall rule that allows traffic to pass on the external Virtual IP addresses on port 53 (I have a private IP space rule also for internal traffic)

    So this effectively becomes a high availability round robin DNS affair.
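
    As a rough mental model only (this is not relayd's actual code), the behaviour described above can be sketched as a round-robin picker that skips hosts a monitor has marked down. The `Pool` class and the 10.0.0.x addresses are hypothetical.

```python
from itertools import cycle


class Pool:
    """Toy round-robin pool that skips hosts marked down (illustrative)."""

    def __init__(self, hosts):
        self.hosts = list(hosts)
        self.up = {host: True for host in self.hosts}
        self._ring = cycle(self.hosts)

    def mark(self, host, is_up):
        # A health monitor (ICMP in the logs above) would call this.
        self.up[host] = is_up

    def next_host(self):
        # Walk the ring at most once; return None if every host is down.
        for _ in range(len(self.hosts)):
            host = next(self._ring)
            if self.up[host]:
                return host
        return None


pool = Pool(["10.0.0.10", "10.0.0.11", "10.0.0.12", "10.0.0.13"])
first_round = [pool.next_host() for _ in range(4)]
pool.mark("10.0.0.11", False)  # simulate a failed health check
second_round = [pool.next_host() for _ in range(4)]
```

    With one host marked down, the remaining three keep receiving queries in turn, which is the "high availability" part of the setup.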


  • Rebel Alliance Developer Netgate

    Just be aware that due to the way relayd relays the connections, you lose the client IP in the process, so all requests appear to originate from the firewall.

    If you have any access controls, views, etc in the DNS config that key off of the source address, you may need to make other adjustments.
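
    To see why the client IP is lost, here is a small loopback sketch of a UDP relay hop. It is not relayd itself, only the general pattern: the relay receives the query on one socket and re-sends it from a socket of its own, so the backend sees the relay's address. All names are illustrative, and everything runs on 127.0.0.1, so only the ports differ.

```python
import socket


def udp_socket():
    """UDP socket on loopback with an OS-chosen port and a short timeout."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("127.0.0.1", 0))
    sock.settimeout(2.0)
    return sock


def relay_once():
    backend = udp_socket()   # stands in for a DNS server in the pool
    listener = udp_socket()  # stands in for the relay's listening VIP
    upstream = udp_socket()  # the relay's own socket toward the backend
    client = udp_socket()    # stands in for an outside resolver
    try:
        client.sendto(b"query", listener.getsockname())
        payload, client_addr = listener.recvfrom(512)
        # The relay opens a new conversation with the backend; the
        # original source address is not carried along.
        upstream.sendto(payload, backend.getsockname())
        _, seen_by_backend = backend.recvfrom(512)
        return {
            "client": client_addr,
            "relay": upstream.getsockname(),
            "seen_by_backend": seen_by_backend,
        }
    finally:
        for sock in (backend, listener, upstream, client):
            sock.close()


result = relay_once()
```

    In `result`, `seen_by_backend` matches `relay`, not `client`, which is why source-based ACLs or views on the DNS servers only ever see the firewall.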



  • Ok that makes sense.

    One other thing I have noticed is that the Virtual IP (IP Alias) is not automatically copied over to my failover firewall configuration.  Do I need to manually add that to the Virtual IP list on the failover firewall for this to work properly?  Is that something that must be done with Virtual IPs (non CARP) in general?


  • Rebel Alliance Developer Netgate

    If you are using this in a CARP cluster, you should be using a CARP VIP, not an IP alias.

    (Proxy ARP VIPs are also a no-no for CARP clusters)



  • Unfortunately, I cannot use these particular Virtual IPs with CARP for the moment, as the network is routed to me but is not currently bound to any of the firewall interfaces.  A Virtual IP from that network is created on the WAN interface, and a 1:1 NAT is usually done to allow access to a machine behind it.

    With that said, to make this work without CARP Virtual IPs, will I need to manually add the matching Virtual IP (IP Alias) entry on the 2nd firewall for this to work in failover?  Or is using CARP the only way for this to work?


  • Rebel Alliance Developer Netgate

    Yeah, you'd add one IP from the block to each cluster member as an IP Alias; then you can add CARP VIPs from that subnet to use.



  • Ok I tried to make that routed network an IP Alias (exact same entry) on both the master and slave firewalls.  This allowed me to change my IP Aliases to CARP without issue.  I can also see the CARP IP I am using for the DNS pools showing up on both firewalls.  I also unchecked the DNS forwarder on the slave firewall.

    However, relayd won't run on the slave firewall:

    
    Dec 30 12:48:39	relayd[10806]: terminating
    Dec 30 12:48:39	relayd[10806]: check_child: lost child: socket relay engine exited
    Dec 30 12:48:39	relayd[10806]: check_child: lost child: host check engine exited
    Dec 30 12:48:39	relayd[10806]: check_child: lost child: pf update engine exited
    Dec 30 12:48:39	relayd[11013]: host check engine exiting
    Dec 30 12:48:39	relayd[11013]: host xxx.xxx.xxx.12, check icmp (0ms), state unknown -> up, availability 100.00%
    Dec 30 12:48:39	relayd[11013]: host xxx.xxx.xxx.11, check icmp (0ms), state unknown -> up, availability 100.00%
    Dec 30 12:48:39	relayd[11013]: host xxx.xxx.xxx.10, check icmp (0ms), state unknown -> up, availability 100.00%
    Dec 30 12:48:39	relayd[10885]: pf update engine exiting
    Dec 30 12:48:39	relayd[11099]: fatal: relay_privinit: failed to listen: Can't assign requested address
    Dec 30 12:48:39	relayd[10806]: startup
    
    

  • Rebel Alliance Developer Netgate

    You must use a different IP Alias address on each cluster member.

    Just like they can't have the same interface IP, they can't have the same IP Alias IP; that creates an IP conflict.

    Only a CARP or 'other' type VIP can be the same on all cluster members.
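
    The addressing rule above can be sanity-checked before touching the firewalls. The sketch below is a hypothetical helper, not a pfSense tool: it takes the routed subnet, one IP Alias per cluster member, and the shared CARP VIPs, and reports duplicates or addresses outside the subnet. The addresses in the example come from the 198.51.100.0/24 documentation range and are placeholders. Treating a CARP VIP that equals an alias as a conflict is an assumption of this sketch.

```python
import ipaddress


def check_vip_plan(subnet, alias_by_member, carp_vips):
    """Return a list of problems found in a VIP address plan (illustrative)."""
    net = ipaddress.ip_network(subnet)
    problems = []
    seen = {}  # alias IP -> member that owns it

    for member, alias in alias_by_member.items():
        ip = ipaddress.ip_address(alias)
        if ip not in net:
            problems.append(f"{member}: alias {ip} is outside {net}")
        if ip in seen:
            # Same alias on two members is an IP conflict.
            problems.append(f"{member}: alias {ip} duplicates {seen[ip]}")
        else:
            seen[ip] = member

    for vip in carp_vips:
        ip = ipaddress.ip_address(vip)
        if ip not in net:
            problems.append(f"CARP VIP {ip} is outside {net}")
        if ip in seen:
            problems.append(f"CARP VIP {ip} collides with alias on {seen[ip]}")

    return problems


good = check_vip_plan(
    "198.51.100.0/29",
    {"master": "198.51.100.1", "slave": "198.51.100.2"},
    ["198.51.100.3", "198.51.100.4"],
)
bad = check_vip_plan(
    "198.51.100.0/29",
    {"master": "198.51.100.1", "slave": "198.51.100.1"},
    ["198.51.100.3"],
)
```

    Here `good` comes back empty, while `bad` reports the duplicated alias, which is the situation that produced the "Can't assign requested address" error on the second firewall.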



  • OK  I'll try that after hours and post back.

    I had to also revert to my old setup because there were some things that rely on the DNS forwarder.  I'll do a big clean up later on as well.  Thanks again for all the help.

