DNS Resolver
-
@cmb:
We're running the December 10th build. I can confirm issues with a new WAN address breaking unbound. When our PPPoE WAN link gets a new IP address, the resolver will reply with internal IPs set via DHCP clientIDs, but any external DNS lookup made via a system on the LAN fails.
DNS resolving on the firewall continues to work, so it's clearly an issue with unbound.
https://redmine.pfsense.org/issues/4095
The above referenced issue should be fixed. Those who were seeing that, please try on the 31st or newer snapshot.
I just discovered this issue, or one similar to it, today, the hard way. Unbound randomly fails on a machine with a PPPoE link, but DNS still works on the firewall itself, just not for any client. Build is 2.2-RC (i386)
built on Thu Jan 01 06:14:04 CST 2015
FreeBSD 10.1-RELEASE-p3
I went back to dnsmasq for now.
-
I'm not sure if this is a real issue or if it's particular to my setup, but I was having trouble starting DNS Resolver. To maximise my 10GbE throughput I use a high kern.ipc.maxsockbuf:
kern.ipc.maxsockbuf: 33554432
The so-rcvbuf value is derived from this, so in my case it came out as 'so-rcvbuf: 31m', which caused unbound to fail to launch with the following errors:
Jan 4 08:47:06 php-fpm[6441]: /status_services.php: The command '/usr/local/sbin/unbound -c /var/unbound/unbound.conf' returned exit code '1', the output was '[1420361226] unbound[24922:0] debug: creating udp4 socket 192.168.50.1 53 [1420361226] unbound[24922:0] error: setsockopt(..., SO_RCVBUF, ...) failed: No buffer space available [1420361226] unbound[24922:0] fatal error: could not open ports'
Adding an advanced option
so-rcvbuf: 8m
to reduce this from 31m down to 8m allows unbound to start correctly.
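A minimal sketch of that workaround in Python, in case it helps anyone reason about it. It assumes the value is derived as the whole mebibytes of kern.ipc.maxsockbuf minus one (which matches 33554432 giving 31m here); the real pfSense calculation may differ, and the function name and default cap are illustrative only.

```python
MIB = 1024 * 1024

def so_rcvbuf_option(maxsockbuf_bytes, cap_mib=8):
    """Build an unbound 'so-rcvbuf' line from kern.ipc.maxsockbuf.

    Assumption: the derived value is (maxsockbuf in MiB) - 1,
    clamped so it never exceeds cap_mib and never drops below 1.
    """
    derived_mib = maxsockbuf_bytes // MIB - 1
    clamped_mib = max(1, min(derived_mib, cap_mib))
    return f"so-rcvbuf: {clamped_mib}m"

# With the sysctl value from this post:
print(so_rcvbuf_option(33554432))              # capped at 8m
print(so_rcvbuf_option(33554432, cap_mib=64))  # uncapped derivation, 31m
```

With the cap in place, a large kern.ipc.maxsockbuf set for other reasons no longer pushes so-rcvbuf up to a value unbound cannot set.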
-
@irj972:
I'm not sure if this is a real issue or if it's particular to my setup, but I was having trouble starting DNS Resolver. To maximise my 10GbE throughput I use a high kern.ipc.maxsockbuf:
kern.ipc.maxsockbuf: 33554432
The so-rcvbuf value is derived from this, so in my case it came out as 'so-rcvbuf: 31m', which caused unbound to fail to launch with the following errors:
Jan 4 08:47:06 php-fpm[6441]: /status_services.php: The command '/usr/local/sbin/unbound -c /var/unbound/unbound.conf' returned exit code '1', the output was '[1420361226] unbound[24922:0] debug: creating udp4 socket 192.168.50.1 53 [1420361226] unbound[24922:0] error: setsockopt(..., SO_RCVBUF, ...) failed: No buffer space available [1420361226] unbound[24922:0] fatal error: could not open ports'
Adding an advanced option
so-rcvbuf: 8m
to reduce this from 31m down to 8m allows unbound to start correctly.
The unbound docs I have found all give 8m as the example for a busy system, so maybe something in the unbound compile or in FreeBSD limits that socket option to 8m anyway.
I made this pull request to limit the calculation to 8m: https://github.com/pfsense/pfsense/pull/1420
That might be a practical fix here to protect people like you who have set kern.ipc.maxsockbuf high for other reasons.
-
Hrmm, I have seen values as high as 32M, so further investigation into why it failed will be needed.
I will see what I can do to replicate.
-
Not sure if it's been mentioned, but on a dual-WAN setup, when one WAN link fails over to the secondary WAN link, DNS lookups start to fail on client devices.
It works fine when I set the outgoing interfaces to WAN1 and WAN2 rather than the default ALL.
-
THAT may have been the cause of the behaviour I saw that forced me to go back to dnsmasq.
-
THAT may have been the cause of the behaviour I saw that forced me to go back to dnsmasq.
UPDATE - no that wasn't it, as I already had it set to only allow out over the two interfaces that exist. One of the interfaces is a PPPoE.
-
@irj972:
I'm not sure if this is a real issue or if it's particular to my setup, but I was having trouble starting DNS Resolver. To maximise my 10GbE throughput I use a high kern.ipc.maxsockbuf:
kern.ipc.maxsockbuf: 33554432
Setting kern.ipc.maxsockbuf = 37748736 (36MB) allows Unbound to start, so adding 4MB of headroom in the optimise code section caters for this. As kern.ipc.maxsockbuf increases, the headroom needed grows. Needing more than 32m points towards moving the service off onto its own box.
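One possible explanation for these numbers, offered as an assumption rather than a confirmed diagnosis: FreeBSD does not let a single socket buffer use all of kern.ipc.maxsockbuf, only about 8/9 of it (sb_max * MCLBYTES / (MSIZE + MCLBYTES), with MSIZE = 256 and MCLBYTES = 2048). If that holds on this build, 31m cannot fit under a 32MB limit but does fit under 36MB. The Python sketch below only does that arithmetic; the constants and function names are mine, not taken from pfSense.

```python
MIB = 1024 * 1024
MSIZE = 256      # assumed mbuf size in bytes
MCLBYTES = 2048  # assumed mbuf cluster size in bytes

def max_settable_bytes(maxsockbuf):
    """Largest SO_RCVBUF the kernel would accept, under the 8/9 assumption."""
    return maxsockbuf * MCLBYTES // (MSIZE + MCLBYTES)

def fits(so_rcvbuf_mib, maxsockbuf):
    """True if a so-rcvbuf of so_rcvbuf_mib MiB stays within that limit."""
    return so_rcvbuf_mib * MIB <= max_settable_bytes(maxsockbuf)

print(fits(31, 33554432))  # 31m against the original 32MB sysctl
print(fits(31, 37748736))  # 31m against the 36MB sysctl
print(fits(8, 33554432))   # the 8m workaround against 32MB
```

Under this assumption the required headroom is roughly one eighth of the so-rcvbuf value, which would also explain why it grows as the buffer grows.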
-
THAT may have been the cause of the behaviour I saw that forced me to go back to dnsmasq.
UPDATE - no that wasn't it, as I already had it set to only allow out over the two interfaces that exist. One of the interfaces is a PPPoE.
So what happened in your setup then?
I'm guessing what happens in that circumstance is he has it doing recursion, which leaves all DNS traffic following the default route, and when the default route is unreachable then nothing will resolve. In that case, enabling default gateway switching is probably the best bet. Alternatively, forwarder mode would be an option as well, specifying at least one DNS server under System>General Setup for each WAN.
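For the forwarder-mode option, this is roughly what the resulting unbound configuration could look like with one DNS server per WAN. A sketch only, in Python: the WAN names and the 192.0.2.x / 198.51.100.x addresses are placeholders, and the text pfSense actually writes to unbound.conf may differ.

```python
def forward_zone(servers_by_wan):
    """Render an unbound forward-zone stanza for the root zone.

    servers_by_wan maps a WAN label (used only in comments) to the
    DNS server address configured for that WAN.
    """
    lines = ["forward-zone:", '    name: "."']
    for wan, addr in servers_by_wan.items():
        lines.append(f"    # server reached via {wan}")
        lines.append(f"    forward-addr: {addr}")
    return "\n".join(lines)

# Placeholder addresses from the documentation ranges, one per WAN.
print(forward_zone({"WAN1": "192.0.2.53", "WAN2": "198.51.100.53"}))
```

The point is just that with a forwarder on each WAN, losing the default route no longer leaves unbound without any reachable upstream.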
-
edit: nvm
-
@cmb:
THAT may have been the cause of the behaviour I saw that forced me to go back to dnsmasq.
UPDATE - no that wasn't it, as I already had it set to only allow out over the two interfaces that exist. One of the interfaces is a PPPoE.
So what happened in your setup then?
I'm guessing what happens in that circumstance is he has it doing recursion, which leaves all DNS traffic following the default route, and when the default route is unreachable then nothing will resolve. In that case, enabling default gateway switching is probably the best bet. Alternatively, forwarder mode would be an option as well, specifying at least one DNS server under System>General Setup for each WAN.
Correct, but as far as I know it was the second WAN (the PPPoE one) going down (or changing IPs), not the primary WAN, that killed resolution. Also, why would it still answer queries from localhost but not from machines on the network?
-
Correct, but as far as I know it was the second WAN (the PPPoE one) going down (or changing IPs), not the primary WAN, that killed resolution. Also, why would it still answer queries from localhost but not from machines on the network?
Hmm, that makes no sense if it's doing recursion; your DNS traffic is going via the default route, as Chris has mentioned. It would make sense if 'DNS Query Forwarding' and 'Allow DNS server list to be overridden by DHCP/PPP on WAN' were enabled, and the traffic to those DNS servers was going via the PPPoE connection. Any chance those were enabled at the time?
-
Correct, but as far as I know it was the second WAN (the PPPoE one) going down (or changing IPs), not the primary WAN, that killed resolution. Also, why would it still answer queries from localhost but not from machines on the network?
Hmm, that makes no sense if it's doing recursion; your DNS traffic is going via the default route, as Chris has mentioned. It would make sense if 'DNS Query Forwarding' and 'Allow DNS server list to be overridden by DHCP/PPP on WAN' were enabled, and the traffic to those DNS servers was going via the PPPoE connection. Any chance those were enabled at the time?
Nope, and to clarify, it didn't just kill it while it was down (or IP changed) - it KILLED it, needed to restart the service to get it resolving again. I gave up for now, back to DNSmasq.
-
Has anyone run namebench using unbound? It felt like DNS lookups were happening slower than what I'd seen with dnsmasq on 2.1.5 and Tomato USB, so I decided to give it a go. These were the results:
dnsmasq (2.2): https://dl.dropboxusercontent.com/u/90391152/pfsense/namebench_dnsmasq.html
unbound (recursive): https://dl.dropboxusercontent.com/u/90391152/pfsense/namebench_unbound_recursive.html
unbound (forward): https://dl.dropboxusercontent.com/u/90391152/pfsense/namebench_unbound_forward.html
Don't really know how to take these results other than dnsmasq appears to be the fastest, thoughts?
-
Has anyone run namebench using unbound? It felt like DNS lookups were happening slower than what I'd seen with dnsmasq on 2.1.5 and Tomato USB, so I decided to give it a go. These were the results:
dnsmasq (2.2): https://dl.dropboxusercontent.com/u/90391152/pfsense/namebench_dnsmasq.html
unbound (recursive): https://dl.dropboxusercontent.com/u/90391152/pfsense/namebench_unbound_recursive.html
unbound (forward): https://dl.dropboxusercontent.com/u/90391152/pfsense/namebench_unbound_forward.html
Don't really know how to take these results other than dnsmasq appears to be the fastest, thoughts?
Well, that's expected; you can't compare the two.
DNSMasq is a forwarder and Unbound is a resolver. There is a lot to consider, including how your Unbound service is configured, e.g. is DNSSEC enabled?
So Unbound performs the task of doing iterative queries as well as validating answers. DNSMasq does not; it relies on another name server to do all the hard work of doing iterative queries etc.
-
Correct, but as far as I know it was the second WAN (the PPPoE one) going down (or changing IPs), not the primary WAN, that killed resolution. Also, why would it still answer queries from localhost but not from machines on the network?
Hmm, that makes no sense if it's doing recursion; your DNS traffic is going via the default route, as Chris has mentioned. It would make sense if 'DNS Query Forwarding' and 'Allow DNS server list to be overridden by DHCP/PPP on WAN' were enabled, and the traffic to those DNS servers was going via the PPPoE connection. Any chance those were enabled at the time?
Nope, and to clarify, it didn't just kill it while it was down (or IP changed) - it KILLED it, needed to restart the service to get it resolving again. I gave up for now, back to DNSmasq.
OK, thanks, you gave me an idea of where the problem could be, but I would need to test to confirm.
-
dnssec was disabled
Does the option under DNS resolver "Enable forwarding mode" not do the same thing as dnsmasq?
-
A couple of small issues I spotted while tuning up my Resolver in the build dated December 28th.
In Resolver>General Settings>Advanced, parameters which include double quotes require a space on the end of each line to enforce the carriage return, i.e.
local-data: "example1.com A 10.10.10.1"
local-data: "example2.com A 10.10.10.1"
local-data: "example3.com A 10.10.10.1"
Save & Apply, navigate away and back to the General Settings page and you will see they have been reduced to one line.
Also, not sure if this is a real issue: on the advanced settings page, when increasing the Message Cache Size, the associated RRset cache size isn't set correctly in the unbound.conf file. I'm not quite sure whether it's being handled correctly internally, though.
-
I believe the double quotes carriage returns business has been fixed in later snapshots. Dec 28th was last year ;)