Private domain in resolver custom options randomly breaks resolution for that domain
-
I have the following "custom options" configuration in my DNS resolver settings to allow DNS over OpenVPN to work properly (domain generalized).
server:
private-domain: "test.foo.com"
About once a week, the pfSense Plus resolver stops resolving anything under the domain foo.com.
I have checked the general system logs and the DNS Resolver logs and see nothing from the resolver around the time the failures start.
Resolving other names, such as google.com, works fine. Only names under the foo.com domain fail to resolve.
$ ping google.com
PING google.com (142.251.214.142): 56 data bytes
64 bytes from 142.251.214.142: icmp_seq=0 ttl=57 time=20.139 ms
64 bytes from 142.251.214.142: icmp_seq=1 ttl=57 time=19.795 ms
^C
--- google.com ping statistics ---
2 packets transmitted, 2 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 19.795/19.967/20.139/0.172 ms

$ ping foo.com
ping: cannot resolve foo.com: Unknown host

$ dig foo.com

; <<>> DiG 9.10.6 <<>> foo.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 4996
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;foo.com. IN A

;; Query time: 89 msec
;; SERVER: 192.168.1.1#53(192.168.1.1)
;; WHEN: Wed Nov 23 10:36:52 PST 2022
;; MSG SIZE rcvd: 43
If I make no changes to the resolver config and click save/apply-changes on the services_unbound.php page, the resolver starts working again for names under the foo.com domain.
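Since the logs show nothing, a small probe run from a LAN client might help pin down when the failures begin. This is only a rough sketch: the function names dns_status and check_once are made up for illustration, and the log file path is whatever you choose. It pulls the status field out of dig output (the same field that shows SERVFAIL above) and appends a timestamped line whenever the answer is not NOERROR.

```shell
#!/bin/sh
# Print the response status (e.g. NOERROR, SERVFAIL) found in dig output
# read from stdin. Prints nothing if no status line is present.
dns_status() {
    sed -n 's/.*status: \([A-Z]*\),.*/\1/p' | head -n 1
}

# Query one name against one resolver and log any non-NOERROR result.
# Arguments (all illustrative): name, resolver address, log file path.
check_once() {
    name=$1
    resolver=$2
    logfile=$3
    status=$(dig "@$resolver" "$name" +time=3 +tries=1 | dns_status)
    if [ "$status" != "NOERROR" ]; then
        # "none" means dig produced no status line at all (e.g. a timeout).
        echo "$(date '+%Y-%m-%d %H:%M:%S') $name status=${status:-none}" >> "$logfile"
    fi
}
```

Something like check_once foo.com 192.168.1.1 /tmp/dns-fail.log, run every minute from cron, would give a first-failure timestamp to compare against the resolver logs.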
I'm running pfSense Plus on a Netgate 4100.
22.05-RELEASE (amd64)
built on Wed Jun 22 18:56:13 UTC 2022
FreeBSD 12.3-STABLE
-
In general that's not a known issue. Pretty much everyone here at Netgate runs with a private domain entry for our company domain and things hum along as usual.
Unbound can get cranky sometimes if it is trying to reach a specific upstream server and that server doesn't respond. Compare the entries under Status > DNS Resolver when resolution works vs when it doesn't. You can get the same output from the shell with:
: unbound-control -c /var/unbound/unbound.conf dump_infra
Odds are that when it stops resolving, there is an entry in there for a server that has also stopped responding. Restarting unbound clears all that knowledge and forces it to try again. You could also try manually flushing things for that domain (or all domains) to see if that's sufficient to make it try again:
: unbound-control -c /var/unbound/unbound.conf flush_zone foo.com
There are some other similar commands to try listed in the docs:
https://docs.netgate.com/pfsense/en/latest/services/dns/resolver-cli.html
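If it helps, the two steps above (look at the infra cache, then flush the zone) could be wrapped so the evidence gets saved before it is cleared. This is a minimal sketch to run on the firewall itself when the failure is happening. The names uc and snapshot_and_flush and the output directory are placeholders, not anything pfSense ships. The only commands it relies on are dump_infra and flush_zone as shown above.

```shell
#!/bin/sh
# Wrapper around the unbound-control invocation used in this thread.
uc() {
    unbound-control -c /var/unbound/unbound.conf "$@"
}

# Save the current infra cache to a timestamped file, then flush one zone.
# Arguments (both illustrative): zone name, writable output directory.
snapshot_and_flush() {
    zone=$1
    outdir=$2
    stamp=$(date '+%Y%m%d-%H%M%S')
    # Capture the infra cache first, because flushing or restarting loses it.
    uc dump_infra > "$outdir/infra-$stamp.txt"
    uc flush_zone "$zone"
}
```

For example, snapshot_and_flush foo.com /root would leave an infra-*.txt file to compare against a dump taken while things are working. If resolution still fails after the flush, that points toward needing the full restart instead.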