Inbound DNS load balancing v2.0.1
-
Can someone explain exactly how this feature works?
I set up everything normally, like I would with load-balanced web servers (pools, virtual servers, monitors, virtual IP), but selected DNS as the protocol on the virtual server. Nothing I do seems to make the traffic pass. Is there something I am missing?
-
There isn't anything too complicated there. It should work almost exactly the same way as the TCP version, though last I heard fallback pools didn't work for it.
Anything show up in the relayd tab of the system logs?
I set it up several times when writing the code for it and it worked fine for me, and the customer I added the code for deployed it successfully as well.
-
Here is the log. What happens is that relayd craps out with the DNS option enabled:
Dec 24 14:39:37 relayd[1449]: terminating
Dec 24 14:39:37 relayd[1955]: host check engine exiting
Dec 24 14:39:37 relayd[1449]: check_child: lost child: socket relay engine exited
Dec 24 14:39:37 relayd[1955]: host xxx.xxx.xxx.10, check icmp (0ms), state unknown -> up, availability 100.00%
Dec 24 14:39:37 relayd[1449]: check_child: lost child: pf update engine exited
Dec 24 14:39:37 relayd[1955]: host xxx.xxx.xxx.12, check icmp (0ms), state unknown -> up, availability 100.00%
Dec 24 14:39:37 relayd[1955]: host xxx.xxx.xxx.10, check icmp (0ms), state unknown -> up, availability 100.00%
Dec 24 14:39:37 relayd[1640]: pf update engine exiting
Dec 24 14:39:37 relayd[1955]: host xxx.xxx.xxx.11, check icmp (0ms), state unknown -> up, availability 100.00%
Dec 24 14:39:37 relayd[2188]: fatal: relay_privinit: failed to listen: Can't assign requested address
Dec 24 14:39:37 relayd[1449]: startup
Dec 24 14:39:19 relayd[8699]: terminating
Dec 24 14:39:19 relayd[9024]: host check engine exiting
Dec 24 14:39:19 relayd[9024]: host xxx.xxx.xxx.10, check icmp (0ms), state unknown -> up, availability 100.00%
Dec 24 14:39:19 relayd[8699]: check_child: lost child: socket relay engine exited
Dec 24 14:39:19 relayd[9024]: host xxx.xxx.xxx.11, check icmp (0ms), state unknown -> up, availability 100.00%
Dec 24 14:39:19 relayd[8699]: check_child: lost child: pf update engine exited
Dec 24 14:39:19 relayd[9024]: host xxx.xxx.xxx.10, check icmp (0ms), state unknown -> up, availability 100.00%
Dec 24 14:39:19 relayd[8830]: pf update engine exiting
Dec 24 14:39:19 relayd[9323]: fatal: relay_privinit: failed to listen: Can't assign requested address
Dec 24 14:39:19 relayd[8699]: startup
The external IP used is a virtual IP (proxy-arp)
-
It must be an IP Alias or CARP VIP.
relayd must be able to bind to the IP, and the error in the log says just that.
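The distinction matters because a process can only bind() to an address that is actually configured on a local interface. A Proxy ARP VIP answers ARP but is never assigned to an interface, so the bind fails with "Can't assign requested address" (EADDRNOTAVAIL). A minimal OS-level illustration in Python, assuming the documentation address 192.0.2.1 is not configured on any local interface:

```python
import errno
import socket

def can_bind(ip, port=0):
    """Return True if the OS allows binding a socket to this IP."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind((ip, port))
        return True
    except OSError as e:
        if e.errno == errno.EADDRNOTAVAIL:
            # Same failure relayd logs for a Proxy ARP VIP:
            # the address is not assigned to any local interface.
            return False
        raise
    finally:
        s.close()

print(can_bind("127.0.0.1"))  # loopback is always local -> True
print(can_bind("192.0.2.1"))  # TEST-NET address, not local -> False
```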
-
OK, I made that change, but relayd is still not starting:
Dec 30 11:31:18 relayd[58582]: terminating
Dec 30 11:31:18 relayd[58582]: check_child: lost child: socket relay engine exited
Dec 30 11:31:18 relayd[58582]: check_child: lost child: host check engine exited
Dec 30 11:31:18 relayd[58582]: check_child: lost child: pf update engine exited
Dec 30 11:31:18 relayd[58817]: host check engine exiting
Dec 30 11:31:18 relayd[58745]: pf update engine exiting
Dec 30 11:31:18 relayd[58996]: fatal: relay_privinit: failed to listen: Address already in use
Dec 30 11:31:18 relayd[58582]: startup
Dec 30 11:31:09 relayd[54839]: terminating
Dec 30 11:31:09 relayd[55161]: host check engine exiting
Dec 30 11:31:09 relayd[54839]: check_child: lost child: socket relay engine exited
Dec 30 11:31:09 relayd[54839]: check_child: lost child: pf update engine exited
Dec 30 11:31:09 relayd[55027]: pf update engine exiting
Dec 30 11:31:09 relayd[55305]: fatal: relay_privinit: failed to listen: Address already in use
Dec 30 11:31:09 relayd[54839]: startup
Address already in use?
-
You have the DNS forwarder enabled, so relayd can't bind to port 53 on the IP.
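"Address already in use" (EADDRINUSE) is the generic symptom of two services trying to own the same address/port pair, which is what happens here with the DNS forwarder already listening on port 53. A small Python sketch of the same failure, using an ephemeral port on loopback rather than port 53:

```python
import errno
import socket

def second_bind_errno(host="127.0.0.1"):
    """Bind one socket, then try to bind a second to the same address/port.

    Returns the errno of the second bind's failure, or None if it succeeded.
    """
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind((host, 0))          # first service (like the DNS forwarder)
        port = first.getsockname()[1]
        first.listen(1)
        second.bind((host, port))      # second service (like relayd) tries same port
        return None
    except OSError as e:
        return e.errno
    finally:
        second.close()
        first.close()

print(second_bind_errno() == errno.EADDRINUSE)  # True
```

Disabling the forwarder (or moving it off that IP/port) frees the address so relayd can take it.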
-
Perfect! That sorted it all out. Thanks for the help.
For any others wanting to know how to do this, what I have is:
- 4 DNS servers behind pfSense
- 3 pools, each containing all 4 DNS servers (the DNS servers are on private IP space)
- 3 Virtual IPs on the WAN interface using the IP Alias type
- 3 virtual servers using the 3 Virtual IPs
- A firewall rule that allows traffic to pass to the external Virtual IP addresses on port 53 (I also have a private-IP-space rule for internal traffic)

So this effectively becomes a high-availability, round-robin DNS affair.
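For reference, the relayd configuration behind a setup like this should look roughly as follows. This is a hand-written sketch, not the file pfSense actually generates; table and relay names and all addresses are placeholders, and it assumes the older relayd versions shipped with pfSense, which provided a dns protocol type:

```
# one table holding all four backend DNS servers (private space)
table <dnsfarm> { 10.0.0.10, 10.0.0.11, 10.0.0.12, 10.0.0.13 }

dns protocol "dnsproto" {
        tcp { nodelay }
}

# one relay per WAN Virtual IP; repeat for each of the three VIPs
relay "dns_vip1" {
        listen on 198.51.100.53 port 53
        protocol "dnsproto"
        forward to <dnsfarm> port 53 mode roundrobin check icmp
}
```

The key pieces are the same ones configured in the GUI: the listen address must be a VIP relayd can bind, and the forward statement carries the pool, the balancing mode, and the monitor.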
-
Just be aware that, due to the way relayd relays connections, you lose the client IP in the process, so all requests appear to originate from the firewall.
If you have any access controls, views, etc. in the DNS config that key off the source address, you may need to make adjustments.
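For example, if the backend servers run BIND, any view or ACL that previously matched client subnets would now need to match the firewall's internal address instead. A hypothetical named.conf fragment with placeholder addresses:

```
// All relayed queries arrive from the firewall's LAN-side address,
// so ACLs can no longer distinguish the original clients.
acl "from-relayd" { 10.0.0.1; };      // firewall's internal IP (placeholder)

view "relayed" {
    match-clients { from-relayd; };
    recursion no;                     // public-facing: authoritative answers only
};
```

Source-based policies (split-horizon views, per-client rate limits, allow-recursion lists) effectively collapse to a single client, the firewall, and need rethinking.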
-
Ok that makes sense.
One other thing I have noticed is that the Virtual IP (IP Alias) is not automatically copied over to my failover firewall configuration. Do I need to manually add that to the Virtual IP list on the failover firewall for this to work properly? Is that something that must be done with Virtual IPs (non-CARP) in general?
-
If you are using this in a CARP cluster, you should be using a CARP VIP, not an IP alias.
(Proxy ARP VIPs are also a no-no for CARP clusters)
-
Unfortunately I cannot use CARP for these particular Virtual IPs at the moment, as the network is routed to me but is not currently bound to any of the firewall interfaces. Normally a Virtual IP from that network block is created on the WAN interface, and a 1:1 NAT is then set up to allow access to a machine behind it.
With that said, to make this work without CARP Virtual IPs, will I need to manually add a matching Virtual IP (IP Alias) entry on the 2nd firewall for this to work in failover? Or is using CARP the only way to make this work?
-
Yeah, you'd add one IP from the block to each cluster member as an IP Alias; then you can add CARP VIPs from that subnet to use.
-
OK, I tried making that routed network an IP Alias (the exact same entry) on both the master and slave firewalls. This allowed me to change my IP Aliases to CARP without issue. I can also see the CARP IP I am using for the DNS pools showing up on both firewalls. I also unchecked the DNS forwarder on the slave firewall.
However, relayd won't run on the slave firewall:
Dec 30 12:48:39 relayd[10806]: terminating
Dec 30 12:48:39 relayd[10806]: check_child: lost child: socket relay engine exited
Dec 30 12:48:39 relayd[10806]: check_child: lost child: host check engine exited
Dec 30 12:48:39 relayd[10806]: check_child: lost child: pf update engine exited
Dec 30 12:48:39 relayd[11013]: host check engine exiting
Dec 30 12:48:39 relayd[11013]: host xxx.xxx.xxx.12, check icmp (0ms), state unknown -> up, availability 100.00%
Dec 30 12:48:39 relayd[11013]: host xxx.xxx.xxx.11, check icmp (0ms), state unknown -> up, availability 100.00%
Dec 30 12:48:39 relayd[11013]: host xxx.xxx.xxx.10, check icmp (0ms), state unknown -> up, availability 100.00%
Dec 30 12:48:39 relayd[10885]: pf update engine exiting
Dec 30 12:48:39 relayd[11099]: fatal: relay_privinit: failed to listen: Can't assign requested address
Dec 30 12:48:39 relayd[10806]: startup
-
You must use a different IP Alias IP on each cluster member.
Just as two members can't have the same interface IP, they can't have the same IP Alias IP; that creates an IP conflict.
Only a CARP or "other" type VIP can be the same on all cluster members.
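To make the scheme concrete, the address layout for a two-node cluster would look something like this (a hypothetical routed block with placeholder addresses):

```
# Routed block: 203.0.113.0/24 (placeholder)

# Per-node IP Aliases -- must be unique to each cluster member:
#   primary firewall:    IP Alias 203.0.113.2 on WAN
#   secondary firewall:  IP Alias 203.0.113.3 on WAN

# Shared addresses -- identical on both nodes, held by the CARP master:
#   CARP VIP 203.0.113.10 on WAN  -> DNS virtual server 1
#   CARP VIP 203.0.113.11 on WAN  -> DNS virtual server 2
#   CARP VIP 203.0.113.12 on WAN  -> DNS virtual server 3
```

The unique per-node aliases give each firewall a real address in the block to work from; the CARP VIPs then float between the nodes on failover, so relayd can bind them on whichever node is master.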
-
OK, I'll try that after hours and post back.
I had to revert to my old setup for now, because some things rely on the DNS forwarder. I'll do a big clean-up later on as well. Thanks again for all the help.