Host Overrides are Not Resolving (used to)
-
TL;DR - clients using dig do not resolve host overrides; instead they get "no host found"
@johnpoz - I fear I wasn't clear about the dig part. When I'm on a client (one that is on the network and was even assigned a static DHCP address from pfSense), that client lists the pfSense box as the only DNS server. Even if I also use a dig command with @ to force it to the pfSense IP address, the client does not resolve the hostname (though the pfSense diagnostic tool does). It's as if the pfSense is not passing the Host Overrides on to the clients, only itself.
Thanks for the tip about DNSSEC. This is an area where I have a gap in my knowledge. I don't remember setting Forwarding Mode, and from reading the documentation I understood it to mean that pfSense would use the list of DNS Servers for queries it couldn't resolve itself. I now wonder whether that's part of the source of the problem, and whether it's just passing 100% of client requests onward without caching or using Host Overrides... except turning it off didn't seem to do anything.
I'll poke around some more.
-
@wls said in Host Overrides are Not Resolving (used to):
It's as if the pfSense is not passing the Host Overrides on to the clients, only itself.
Well, what ACLs do you have set? Do you have that on automatic, and is this client on a downstream network or a VPN connection?
Even if you set up forwarding mode, it would still resolve local stuff, unless you have some odd ACL setup in unbound.
-
CONFIG UPDATE: DNSSEC and Forwarding Mode are now unchecked as recommended.
@johnpoz -- this is about as plain as one could imagine, no VPN, just clients downstream on the LAN interface.
ISP <--wan--> pfsense (10.0.0.1) <--lan--> Mac (10.0.0.5), Linux (10.0.0.6)
Where this gets weird is that if I ssh into 10.0.0.1 and do a dig (or ping) to resolve a Host Override, it gives the correct A record. If I do the same from a machine on the LAN interface, it does not.
mac$ ssh admin@10.0.0.1
[22.01-RELEASE][admin@pfsense.mynetwork.com]/home/admin: dig pfsense.mynetwork.com
...
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;pfsense.mynetwork.com. IN A
;; ANSWER SECTION:
pfsense.mynetwork.com. 3600 IN A 10.0.0.1
...
;; Query time: 0 msec
;; SERVER: 10.0.0.1#53(10.0.0.1)
;; WHEN: Fri Feb 25 11:23:04 EST 2022
;; MSG SIZE rcvd: 69
[22.01-RELEASE][admin@pfsense.mynetwork.com]/home/admin: exit
That's the response I expect.
mac$ dig @10.0.0.1 pfsense.mynetwork.com
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;pfsense.mynetwork.com. IN A
;; Query time: 13 msec
;; SERVER: 10.0.0.1#53(10.0.0.1)
;; WHEN: Fri Feb 25 11:23:22 EST 2022
;; MSG SIZE rcvd: 53
No 'ANSWER SECTION' is returned.
Just in case it comes up, I've substituted my domain name and machine IP addresses. The output is real, but this is a public post. The "client" machines all use local address spaces (RFC 1918) and hang off the LAN port.
As for ACLs, I appeal to your patience. I'm not familiar with them and don't believe I'm using them. I'm trying to look them up in the manual so I can understand what you're asking me.
According to the manual, "Unbound requires access lists (ACLs) to control which clients are allowed to submit queries. By default, IPv4 and IPv6 networks residing on internal interfaces of this firewall are permitted. Additional networks must be allowed manually." I want the default case: only internal interfaces (in this case the hardwired LAN segment) may resolve names, and no other networks.
The manual also has a note: "The automatic ACLs may be disabled using the Disable Auto-added Access Control option on the Advanced Settings tab." This box is not checked.
Looking at Services / DNS Resolver / Access Lists, there are no entries listed. It's completely empty.
The Services / DNS Resolver / Advanced Settings has these entries SET:
- Hide Identity: id.server and hostname.bind queries are refused
- Hide Version: version.server and version.bind queries are refused
- Prefetch DNS Key Support: DNSKEYs are fetched earlier in the validation process when a Delegation signer is encountered
- Harden DNSSEC Data: DNSSEC data is required for trust-anchored zones.
At the bottom of the same screen, I notice that the following are NOT set:
- Disable Auto-added Access Control — Disable the automatically-added access control entries
- Disable Auto-Added Host Entries — Disable the automatically-added host entries
Thanks all for the assistance so far. It's just as frustrating at this end, and I've tried many, many things after researching and only then brought it to the Netgate forum.
-
@wls You are using ACLs; you just might not know that you are. By default pfSense creates automatic ACLs for locally connected networks.
Not being a fan of auto stuff - I disable the auto and create my own.
Under the advanced settings for unbound you will see this.
I have it disabled, and create my own.
Let's validate: you're creating the host overrides in the unbound (Resolver) section and not in the Forwarder section?
What is this setting in the pfSense General Setup?
So I create a test host override.
You're not doing a DNS redirection, are you? As you can see above, my pfSense IP on this LAN interface is 192.168.9.253. When I do a directed query to pfSense from client 192.168.9.100, I get back what I put in the host override.
Edit: your local query time of 13 ms seems high.
-
@johnpoz -- I conducted the test like you did, but I do not get back what's in the host override as you did... I'm wondering if the problem is elsewhere. I feel like I'm missing something trivial.
I tend not to be a guy who goes in and changes settings randomly. I lean towards accepting the defaults and keeping them until a need arises, and then I document what was changed and why. I recently suffered a hardware failure of an SG-4860 running pfSense 2.5.2 and restored the config onto a Netgate 5100 running pfSense+ 22.01-RELEASE. I'm not sure whether any differences mattered, since Host Overrides had already broken after a 2.4.x update with no config changes on my part.
In System / General Setup, the DNS Resolution Behavior is currently set to "Use local DNS (127.0.0.1), fall back to remote DNS Servers (default)".
<TEST>
For the test, I set the DNS Resolution Behavior to "Use local DNS (127.0.0.1), ignore remote DNS Servers" to match yours, and made a Host Override similar to yours:
...and of course applying the changes...
ISP <--wan--> pfsense (10.0.0.1) <--lan--> mac (10.0.0.5)
client 'mac' (10.0.0.5) on LAN interface asking 'pfsense' (10.0.0.1) for a DNS query
This is what I see from a dig of the Host Override entry aaa.testdomain.tld (it doesn't resolve):
mac$ sudo killall -HUP mDNSResponder # Let's flush the local DNS cache, just to be sure
mac$ dig @10.0.0.1 aaa.testdomain.tld
; <<>> DiG 9.10.6 <<>> @10.0.0.1 aaa.testdomain.tld
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 53362
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;aaa.testdomain.tld. IN A
;; AUTHORITY SECTION:
. 599 IN SOA a.root-servers.net. nstld.verisign-grs.com. 2022022501 1800 900 604800 86400
;; Query time: 48 msec
;; SERVER: 10.0.0.1#53(10.0.0.1)
;; WHEN: Fri Feb 25 15:54:16 EST 2022
;; MSG SIZE rcvd: 122
And for comparison, here's a working dig to Google, same config settings (works, got an answer):
mac$ dig @10.0.0.1 www.google.com
; <<>> DiG 9.10.6 <<>> @10.0.0.1 www.google.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 24219
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;www.google.com. IN A
;; ANSWER SECTION:
www.google.com. 2399 IN A 172.217.9.196
;; Query time: 55 msec
;; SERVER: 10.0.0.1#53(10.0.0.1)
;; WHEN: Fri Feb 25 15:44:57 EST 2022
;; MSG SIZE rcvd: 59
And over on the pfSense box itself, the Host Override works like a charm:
mac$ ssh admin@10.0.0.1
[22.01-RELEASE][admin@pfsense.mynetwork.com]/home/admin: dig @10.0.0.1 aaa.testdomain.tld
; <<>> DiG 9.16.23 <<>> @10.0.0.1 aaa.testdomain.tld
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 25868
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;aaa.testdomain.tld. IN A
;; ANSWER SECTION:
aaa.testdomain.tld. 3600 IN A 1.2.3.4
;; Query time: 0 msec
;; SERVER: 10.0.0.1#53(10.0.0.1)
;; WHEN: Fri Feb 25 15:54:00 EST 2022
;; MSG SIZE rcvd: 63
So the test feels like it's producing exactly the same results I'm seeing:
- pfSense (10.0.0.1) returns Host Overrides for itself via DNS just fine
- LAN interface clients, such as mac (10.0.0.5), don't get those entries when asking pfSense for DNS
- External DNS queries work as expected for all
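The comparison above can be repeated with a small script instead of by eye. This is a minimal Python sketch that assumes dig is on the PATH of the machine running it. The helper names parse_dig and ask are hypothetical. The parser only reads the status: field and the ANSWER: count from dig's header, which is all that distinguishes the results in this thread.

```python
from __future__ import annotations

import re
import subprocess

STATUS_RE = re.compile(r"status: (?P<status>[A-Z]+)")
ANSWER_RE = re.compile(r"ANSWER: (?P<count>\d+)")


def parse_dig(output: str) -> tuple[str, int]:
    """Return (status, answer count) from dig's ->>HEADER<<- and flags lines."""
    status = STATUS_RE.search(output)
    answers = ANSWER_RE.search(output)
    if status is None or answers is None:
        # No header at all usually means a timeout or a refused connection.
        raise ValueError("no dig header found in output")
    return status["status"], int(answers["count"])


def ask(server: str, name: str) -> tuple[str, int]:
    """Run 'dig @server name' and summarise the reply (hypothetical helper)."""
    result = subprocess.run(
        ["dig", f"@{server}", name], capture_output=True, text=True, check=False
    )
    return parse_dig(result.stdout)
```

Going by the outputs above, ask("10.0.0.1", "aaa.testdomain.tld") would be expected to return ("NXDOMAIN", 0) when run on the Mac and ("NOERROR", 1) when run on the pfSense box itself. That asymmetry is exactly what is being chased here.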
...digging deeper... (no pun intended)
Possible extra data points: I decided to try to resolve my own hostname (e.g., mac should resolve to 10.0.0.5) from a client connected to the LAN interface, and got no answer (note: I use static DHCP addresses tied to MACs):
mac$ dig @10.0.0.1 mac # Ask pfSense with dig what's my own IP (got no answer section)
; <<>> DiG 9.10.6 <<>> @10.0.0.1 mac
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 10102
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;mac. IN A
;; AUTHORITY SECTION:
. 599 IN SOA a.root-servers.net. nstld.verisign-grs.com. 2022022501 1800 900 604800 86400
;; Query time: 55 msec
;; SERVER: 10.0.0.1#53(10.0.0.1)
;; WHEN: Fri Feb 25 16:06:29 EST 2022
;; MSG SIZE rcvd: 111
So, I tried that on pfSense expecting to get a response, but did not (likely user error, keep reading):
mac$ ssh admin@10.0.0.1
[22.01-RELEASE][admin@pfsense.mynetwork.com]/home/admin: dig @10.0.0.1 mac
; <<>> DiG 9.16.23 <<>> @10.0.0.1 mac
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 34009
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;mac. IN A
;; AUTHORITY SECTION:
. 2591 IN SOA a.root-servers.net. nstld.verisign-grs.com. 2022022501 1800 900 604800 86400
;; Query time: 176 msec
;; SERVER: 10.0.0.1#53(10.0.0.1)
;; WHEN: Fri Feb 25 16:09:57 EST 2022
;; MSG SIZE rcvd: 111
And yet ping works (though this is not a surprise):
[22.01-RELEASE][admin@pfsense.mynetwork.com]/home/admin: ping mac
PING mac.mynetwork.com (10.0.0.5): 56 data bytes
64 bytes from 10.0.0.5: icmp_seq=0 ttl=64 time=0.980 ms
^C
--- mac.mynetwork.com ping statistics ---
1 packets transmitted, 1 packets received, 0.0% packet loss
Simply because it's in /etc/hosts:
[22.01-RELEASE][admin@pfsense.mynetwork.com]/home/admin: fgrep mac /etc/hosts
10.0.0.5 mac.mynetwork.com mac
And, so are the Host Overrides, by the way:
[22.01-RELEASE][admin@pfsense.mynetwork.com]/home/admin: fgrep aaa /etc/hosts
1.2.3.4 aaa.testdomain.tld aaa
Then I thought I'd try the fully qualified name, just to be sure (and that worked):
[22.01-RELEASE][admin@pfsense.mynetwork.com]/home/admin: dig @10.0.0.1 mac.mynetwork.com
; <<>> DiG 9.16.23 <<>> @10.0.0.1 mac.mynetwork.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 62523
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;mac.mynetwork.com. IN A
;; ANSWER SECTION:
mac.mynetwork.com. 3600 IN A 10.0.0.1
;; Query time: 0 msec
;; SERVER: 10.0.0.1#53(10.0.0.1)
;; WHEN: Fri Feb 25 16:20:27 EST 2022
;; MSG SIZE rcvd: 69
That is another source of small confusion, as System / General Setup has the Domain set to mynetwork.com (and I figured that suffix would be applied to any non-qualified host names).
I think I misunderstood what dig was passing on. I assumed short host names would be fully qualified for me, but dig appears to pass the query name verbatim. A quick test confirms it: dig wants a query, so I need to use a fully qualified domain name. On the upside, nslookup on pfsense (10.0.0.1) behaved as I expected, which confirms my understanding of how dig uses its argument:
[22.01-RELEASE][admin@pfsense.mynetwork.com]/home/admin: nslookup mac
Server: 127.0.0.1
Address: 127.0.0.1#53
Name: mac.mynetwork.com
Address: 10.0.0.5
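The difference comes down to the search list. The stub resolver behind nslookup appends the configured domain to short names, as the output above shows. dig sends the name exactly as typed unless it is run with +search. This is a minimal Python sketch of the resolv.conf-style ordering rule. It assumes the default ndots:1 and a search list containing only mynetwork.com, and it only models the rule; it sends no queries.

```python
from __future__ import annotations


def candidate_names(name: str, search: list[str], ndots: int = 1) -> list[str]:
    """Order in which a resolv.conf-style stub resolver tries query names."""
    if name.endswith("."):
        # A trailing dot makes the name absolute; the search list is skipped.
        return [name]
    as_is = name + "."
    with_domains = [f"{name}.{domain.rstrip('.')}." for domain in search]
    if name.count(".") >= ndots:
        # Enough dots: try the name as typed first, then with each domain.
        return [as_is] + with_domains
    # Too few dots (a bare host name like "mac"): domains first, as typed last.
    return with_domains + [as_is]


def dig_query_name(name: str) -> str:
    """What plain dig (no +search) puts on the wire: the name as typed, made absolute."""
    return name if name.endswith(".") else name + "."


print(candidate_names("mac", ["mynetwork.com"]))  # ['mac.mynetwork.com.', 'mac.']
print(dig_query_name("mac"))                      # mac.
```

This matches what was observed. dig @10.0.0.1 mac sent the literal query "mac." and got NXDOMAIN, while nslookup mac expanded it to mac.mynetwork.com. dig +search mac should behave like nslookup, provided the machine's resolv.conf lists mynetwork.com as a search domain.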
And using nslookup from the mac (10.0.0.5) gives this, clearly hitting the right name server (10.0.0.1):
mac$ nslookup aaa.testdomain.tld
Server: 10.0.0.1
Address: 10.0.0.1#53
** server can't find aaa.testdomain.tld: NXDOMAIN
So what about asking pfsense (10.0.0.1) from mac (10.0.0.5), via nslookup, about a Host Override that's on the known domain:
mac$ nslookup pfsense.mynetwork.com
Server: 10.0.0.1
Address: 10.0.0.1#53
Non-authoritative answer:
*** Can't find pfsense.mynetwork.com: No answer
(Details of unexpected results are left in deliberately, for the education of future readers of this thread and in the hope that someone spots something I have not.)
CONCLUSION OF TEST:
- The machines on the LAN side are indeed hitting the pfSense server for DNS
- I'm still not getting Host Overrides to come back, but am getting DHCP Hostnames
- External hosts resolve just fine
</TEST>
I've restored System / General Setup's DNS Resolution Behavior back to the original "Use local DNS (127.0.0.1), fall back to remote DNS Servers (default)" post testing. Let me know if this was the wrong thing to do.
I concur that the 13 ms query time seems higher than it should be for a local network. It makes me wonder whether pfSense is just sending the query out to the raw internet without answering it on its own.
In the category of Education-of-ACLs, what is the ACL allowing or disallowing? (I come from the olden days where access control lists were a set of policies. However, this seems to almost be a mini-firewall for the DNS Resolver service itself.)
For S&Gs, I created a single ACL with extremely liberal policies for 10.0.0.0/8 that I hope overlapped the default permits:
However, this didn't seem to change any of the above behaviors, and so I removed the ACL.
In a way it feels like pfSense has a DNS Resolver running (and uses it itself) but isn't exposing this service. Instead, it seems to be forwarding the DNS requests it gets to outside the network. Why it'd do that, I can't figure out at the moment.
-
@wls you sure you're putting the host overrides in unbound and not in the forwarder?
Or sure your unbound (resolver) is running and not the forwarder?
-
@johnpoz - 99.999% sure
If I go to Services / DNS Forwarder, every box on the page is unchecked and there are no host or domain overrides. In short, the Enable box is unchecked.
Also, System / Package Manager / Available Packages shows I don't have the bind package installed. Services / DNS Resolver / General Settings shows the Resolver service enabled.
Note to self: this service is called "unbound". And the Host Overrides section on the same page has the test entry we made:
Furthermore, if I hop on the pfSense box via ssh, we can see unbound running:
[22.01-RELEASE][admin@pfsense.mynetwork.com]/home/admin: ps aux | fgrep bound
root 75284 0.0 0.1 10848 2252 - Is 17:34 0:00.01 /usr/local/sbin/dhcpleases -l /var/dhcpd/var/db/dhcpd.leases -d ashburn.wwco.com -p /var/run/unbound.pid -u /var/unbound
unbound 94064 0.0 0.7 63880 29712 - Ss 17:34 0:01.97 /usr/local/sbin/unbound -c /var/unbound/unbound.conf
admin 29489 0.0 0.1 11104 2456 0 S+ 00:32 0:00.00 fgrep bound
The config file is present and includes the link to the host entries:
[22.01-RELEASE][admin@pfsense.mynetwork.com]/home/admin: cat /var/unbound/unbound.conf
...
# Static host entries
include: /var/unbound/host_entries.conf
...
And if we look at the end of the host_entries.conf file, we see the Host Overrides defined:
[22.01-RELEASE][admin@pfsense.mynetwork.com]/home/admin: tail -2 /var/unbound/host_entries.conf
local-data-ptr: "1.2.3.4 aaa.testdomain.tld"
local-data: "aaa.testdomain.tld. A 1.2.3.4"
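Each Host Override becomes two unbound lines. One is a forward record (local-data, with an absolute name ending in a dot). The other is a reverse mapping (local-data-ptr, with the address first). The hypothetical Python helper below reproduces that shape for the test entry. It is only an illustration, not pfSense's own generator, and the HostOverride class is made up for the example.

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass


@dataclass
class HostOverride:
    """Hypothetical stand-in for one Host Override row in the GUI."""
    host: str
    domain: str
    ip: str


def to_unbound_lines(entry: HostOverride) -> list[str]:
    """Render one override as unbound local-data-ptr / local-data lines."""
    fqdn = f"{entry.host}.{entry.domain}"
    addr = ipaddress.ip_address(entry.ip)  # raises ValueError on a bad address
    rrtype = "A" if addr.version == 4 else "AAAA"
    return [
        # Reverse mapping: address first, then the name.
        f'local-data-ptr: "{addr} {fqdn}"',
        # Forward record: absolute name (trailing dot), record type, address.
        f'local-data: "{fqdn}. {rrtype} {addr}"',
    ]


for line in to_unbound_lines(HostOverride("aaa", "testdomain.tld", "1.2.3.4")):
    print(line)
```

Running it prints the same two lines as the tail output above. Because these records live inside unbound, only queries that actually reach unbound can see them.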
Even better, there's a /var/log/resolver.log, and inside it says: "start of service (unbound 1.13.2)".
But I can do one better: I can use unbound-control to show the service is up and dump its status.
[22.01-RELEASE][admin@pfsense.mynetwork.com]/var/unbound: unbound-control -c /var/unbound/unbound.conf status
version: 1.13.2
verbosity: 1
threads: 4
modules: 1 [ iterator ]
uptime: 28545 seconds
options: control(ssl)
unbound (pid 94064) is running...
It's worth noting that if one does not specify the file, it will check against /usr/local/etc/unbound/unbound.conf instead.
And if you double check the process list, you'll see the unbound process is using the correct one located in /var/unbound that has all the pfSense settings.
-
@wls that makes just zero sense then...
Are you using pfblocker?
Can we look at your unbound.conf located in /var/unbound
-
@johnpoz - I know. It makes no sense to me whatsoever, hence my reaching out.
I strongly suspect there's something else at play that I'm not aware of.
I'm not aware of what the pfBlocker-NG package does (the docs say it introduces Enhanced Alias Table Features, but I'm not sure what that is). I don't believe I have that package installed.
Installed packages are:
- aws-wizard — pfSense AWS VPC VPN Connection Wizard
- ipsec-profile-wizard — pfSense IPsec Profile Generation wizard for iOS and Windows devices
- ntopng — ntopng (replaces ntop) is a network probe that shows network usage in a way similar to what top does for processes.
I did not install them, they came installed by default apparently.
Here's the /var/unbound/unbound.conf:
##########################
# Unbound Configuration
##########################

##
# Server configuration
##
server:
chroot: /var/unbound
username: "unbound"
directory: "/var/unbound"
pidfile: "/var/run/unbound.pid"
use-syslog: yes
port: 53
verbosity: 1
hide-identity: yes
hide-version: yes
harden-glue: yes
do-ip4: yes
do-ip6: no
do-udp: yes
do-tcp: yes
do-daemonize: yes
module-config: "iterator"
unwanted-reply-threshold: 0
num-queries-per-thread: 512
jostle-timeout: 200
infra-host-ttl: 900
infra-cache-numhosts: 10000
outgoing-num-tcp: 10
incoming-num-tcp: 10
edns-buffer-size: 4096
cache-max-ttl: 86400
cache-min-ttl: 0
harden-dnssec-stripped: yes
msg-cache-size: 4m
rrset-cache-size: 8m
num-threads: 4
msg-cache-slabs: 4
rrset-cache-slabs: 4
infra-cache-slabs: 4
key-cache-slabs: 4
outgoing-range: 4096
#so-rcvbuf: 4m
prefetch: no
prefetch-key: yes
use-caps-for-id: no
serve-expired: no
aggressive-nsec: no

# Statistics
# Unbound Statistics
statistics-interval: 0
extended-statistics: yes
statistics-cumulative: yes

# TLS Configuration
tls-cert-bundle: "/etc/ssl/cert.pem"
tls-port: 853
tls-service-pem: "/var/unbound/sslcert.crt"
tls-service-key: "/var/unbound/sslcert.key"

# Interface IP(s) to bind to
interface-automatic: no
interface: 0.0.0.0
interface: 0.0.0.0@853
interface: ::0
interface: ::0@853

# DNS Rebinding
# For DNS Rebinding prevention
private-address: 127.0.0.0/8
private-address: 10.0.0.0/8
private-address: ::ffff:a00:0/104
private-address: 172.16.0.0/12
private-address: ::ffff:ac10:0/108
private-address: 169.254.0.0/16
private-address: ::ffff:a9fe:0/112
private-address: 192.168.0.0/16
private-address: ::ffff:c0a8:0/112
private-address: fd00::/8
private-address: fe80::/10

# Access lists
include: /var/unbound/access_lists.conf

# Static host entries
include: /var/unbound/host_entries.conf

# dhcp lease entries
include: /var/unbound/dhcpleases_entries.conf

# Domain overrides
include: /var/unbound/domainoverrides.conf

###
# Remote Control Config
###
include: /var/unbound/remotecontrol.conf
-
@wls said in Host Overrides are Not Resolving (used to):
ntopng — ntopng (replaces ntop) is a network probe that shows network usage in a way similar to what top does for processes.
I did not install them, they came installed by default apparently.
ntopng for sure did not come by default; someone installed that.
I thought you said unbound was forwarding? Did you turn that off? I don't see any forwarding in your conf.
Can we see what is in /var/unbound/access_lists.conf?
It should contain the automatically created rules.
But nothing there is jumping out at me saying - oh yeah that is wrong.
The only other thing I can see to do is actually sniff on pfSense, to validate that unbound is actually seeing the queries you're making for your overrides.
For example, here I sniffed on my LAN interface for port 53 and did a directed query for the test host override I put in.
I set the verbose level to medium so I could see what is asked and what is answered.
So you can see my client at 192.168.9.100 ask pfSense at 192.168.9.253, and pfSense send back the answer that is in my host override.
-
@johnpoz -- TL;DR: your suggestion to do a packet capture helped me find the cause (and thus a solution).
I was authoring this in real time, responding to your questions...
I don't recall installing ntopng, nor would I have thought to if ntop was available. Seeing the ng suffix, I presumed it was Netgate specific. Is it likely that this comes pre-installed with pfSense+ (noting that prior to the equipment failure, I was using plain old pfSense)?
The DNS Query Forwarding option is unchecked for the DNS Resolver (unbound):
Trying to figure out where the communication confusion arose... the "DNS Forwarder" service is unchecked (and thus not running). I probably said something along the lines of: DNS requests from a client that aren't known or cached by pfSense's DNS Resolver are correctly forwarded to an upstream DNS server for resolution. That part is working perfectly. (E.g., foo.com resolves to 34.206.29.153, so I do have completely working DNS, just not for my Host Overrides, for some very mysterious reason.)
Here's the contents of /var/unbound/access_lists.conf:
access-control: 127.0.0.1/32 allow_snoop
access-control: ::1 allow_snoop
access-control: 10.0.0.0/8 allow
access-control: 127.0.0.0/8 allow
access-control: ::1/128 allow
access-control: fd00:10:91::/48 allow
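Here is how these ACL entries are evaluated. unbound's access-control uses the most specific netblock that matches the client address, and it refuses the query if nothing matches. The Python sketch below applies that rule to the six entries above using the standard ipaddress module. It models the lookup only, not the full set of unbound actions. When a netblock is listed twice (as ::1 is here), the sketch keeps the first entry; that detail is an assumption.

```python
from __future__ import annotations

import ipaddress

# The six entries from /var/unbound/access_lists.conf above.
ACL_TEXT = """\
access-control: 127.0.0.1/32 allow_snoop
access-control: ::1 allow_snoop
access-control: 10.0.0.0/8 allow
access-control: 127.0.0.0/8 allow
access-control: ::1/128 allow
access-control: fd00:10:91::/48 allow
"""


def parse_acl(text: str) -> dict:
    """Map each netblock to its action.

    A bare address such as "::1" becomes a single-host network (/128).
    Assumption: when a netblock repeats, the first entry is kept.
    """
    rules = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3 or parts[0] != "access-control:":
            continue
        network = ipaddress.ip_network(parts[1], strict=False)
        rules.setdefault(network, parts[2])
    return rules


def lookup(rules: dict, client: str) -> str:
    """Return the action for a client: most specific match wins, else refuse."""
    address = ipaddress.ip_address(client)
    # An IPv4 address is never "in" an IPv6 network (and vice versa).
    matches = [net for net in rules if address in net]
    if not matches:
        return "refuse"
    best = max(matches, key=lambda net: net.prefixlen)
    return rules[best]


rules = parse_acl(ACL_TEXT)
for client in ("10.0.0.5", "127.0.0.1", "192.168.9.100"):
    print(client, lookup(rules, client))
```

By this model, the Mac at 10.0.0.5 falls under 10.0.0.0/8 and is allowed. That fits the observation that nothing in the list looks wrong.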
I agree with you, there's nothing that's jumping out at me that's wrong either. I'm highly baffled by the results I'm seeing.
But then... you put me on the path to solving it.
I used pfSense's Diagnostics / Packet Capture on the LAN interface to look at port 53, logging with a Medium level of detail. The result from dig @10.0.0.1 aaa.testdomain.tld was very unexpected: absolutely nothing. Same for dig @10.0.0.1 pfsense.mynetwork.com. Very strange indeed.
Then I tried dig @10.0.0.1 www.google.com and got an answer back... though not what I expected:
19:04:07.063431 IP (tos 0x0, ttl 64, id 61696, offset 0, flags [none], proto UDP (17), length 79)
10.0.0.5.50378 > 10.0.0.1.53: 24619+ [1au] A? github.com. (51)
19:04:07.063538 IP (tos 0x0, ttl 64, id 3438, offset 0, flags [none], proto UDP (17), length 83)
10.0.0.1.53 > 10.0.0.5.50378: 24619 1/0/1 github.com. A 140.82.113.4 (55)
Yes, you read that right. I went to search for google.com and what was asked for over the wire was github.com.
This told me something was altering traffic. And sure enough, there was a pass-through bridge device that was performing DNS over HTTPS.
The security device was intercepting DNS calls made in the clear and changing them to go over HTTPS to some endpoint service.
The moment I turned that setting off, PROBLEM SOLVED.
dig @10.0.0.1 aaa.testdomain.tld worked:
19:05:30.080199 IP (tos 0x0, ttl 64, id 1314, offset 0, flags [DF], proto UDP (17), length 87)
10.0.0.5.63700 > 10.0.0.1.53: 43654+ [1au] A? aaa.testdomain.tld. (59)
19:05:30.080286 IP (tos 0x0, ttl 64, id 60010, offset 0, flags [none], proto UDP (17), length 91)
10.0.0.1.53 > 10.0.0.5.63700: 43654* 1/0/1 aaa.testdomain.tld. A 1.2.3.4 (63)
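The giveaway was the mismatch between the name asked for and the name that actually showed up on pfSense's LAN interface. That comparison can be sketched in Python against capture text like the lines above. The sketch assumes tcpdump-style output at medium verbosity and only recognizes simple query lines sent to port 53. The regex and the function names are hypothetical.

```python
from __future__ import annotations

import re

# Matches a DNS query line as printed in the captures above, e.g.
# "10.0.0.5.50378 > 10.0.0.1.53: 24619+ [1au] A? github.com. (51)"
QUERY_RE = re.compile(
    r"(?P<src>[\d.]+)\.(?P<sport>\d+) > (?P<dst>[\d.]+)\.53: "
    r"(?P<id>\d+)\S*(?: \[\w+\])? (?P<qtype>[A-Z0-9]+)\? (?P<qname>\S+)"
)


def dns_queries(capture: str) -> list[tuple[str, str, str]]:
    """Return (client, qtype, qname) for every query line sent to port 53."""
    return [
        (m["src"], m["qtype"], m["qname"].rstrip("."))
        for m in QUERY_RE.finditer(capture)
    ]


def was_asked(capture: str, name: str) -> bool:
    """True if a query for `name` actually reached the captured interface."""
    wanted = name.rstrip(".").lower()
    return any(qname.lower() == wanted for _, _, qname in dns_queries(capture))


CAPTURE = """\
10.0.0.5.50378 > 10.0.0.1.53: 24619+ [1au] A? github.com. (51)
10.0.0.1.53 > 10.0.0.5.50378: 24619 1/0/1 github.com. A 140.82.113.4 (55)
"""
print(dns_queries(CAPTURE))                   # [('10.0.0.5', 'A', 'github.com')]
print(was_asked(CAPTURE, "www.google.com"))   # False
```

Applied to the first capture, the name that was typed (www.google.com) never appears on the wire. Applied to the second, was_asked(..., "aaa.testdomain.tld") is True, because the query finally reached pfSense.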
Everything now makes sense.
A DNS request for an external domain would get pushed over to HTTPS where it was resolved outside my network, not by pfSense.
- If I asked for something that didn't exist, like aaa.testdomain.tld, it would reply there was no domain.
- If I asked for something that is published in my public DNS hosted elsewhere, it'd work.
- If I asked for a Host Override, that request (just like all the others) would go out via HTTPS over the wire and get resolved elsewhere, where no Host Override information exists.
Plus, because I'm using "the cloud" and no longer a physical DNS server in a data center where I can get to low-level logs, I had no external evidence to look at on the responding side (as I can't see what a cloud provider does, nor was pfSense getting the request).
As such, the DNS conversation was passing through pfSense via an encrypted pipe – and so what we were actually seeing were the requests and replies, and those IP addresses were the middlemen, not pfSense responding!
This also explains why, when I was logged onto the pfSense device proper, it used its own DNS Resolver (unbound) and resolved correctly as expected.
It was acting like pfSense wasn't being queried because it wasn't.
I introduced that device about a week ago as part of diagnosing a completely different issue.
And, like I said, this problem cropped up quite a while ago — well before the device.
So now I'm curious if the issue is actually with the browser being "smarter" for security reasons.
With the not-so-passive bridge device no longer doing that, I'm now testing at the command line what happens if the browser offers to perform that service for me as well. For instance, in Firefox, setting this in Settings / General / Network Settings / [Connection] Settings:
Or over in Brave at Settings / Privacy and Security:
If DNS Over HTTPS is turned off, the browser resolves the URL (because it's going through pfSense).
Once the IP address has been resolved, the browser appears to cache the result.
So after turning DoH back on, the browser still reloads the page correctly, even if one clears the operating system's DNS cache and the dig command fails. It seems to be caching the DNS resolution.
This also explains behavior I didn't share from a prior experiment long ago, where some browsers worked and others did not. That was well before the monitoring bridge device was in-line.
Extra reading, after Googling to see if others have had problems:
- DNS-over-HTTPS causes more problems than it solves, experts say
- Block DNS over HTTPS (DoH), using pfsense
It appears that DNS with TLS (not HTTPS) is the better way to go.
And it looks like if you do want DNS with TLS, then let pfSense do the job, not an upstream system.
How? See DNS over TLS with pfSense by Netgate.
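DoT is simpler on the wire than DoH, which helps explain the preference for it here. It carries ordinary DNS messages over TLS on a dedicated port, 853 (RFC 7858). Each message gets the same two-byte length prefix DNS already uses over TCP. The unbound.conf posted earlier shows pfSense already listening on 0.0.0.0@853. The following is a minimal Python sketch of a DoT client using the standard ssl module, and build_query is a bare-bones hypothetical helper. Certificate verification is left on, so a query against pfSense's own certificate (sslcert.crt) may fail unless that certificate is trusted for the name you pass.

```python
from __future__ import annotations

import secrets
import socket
import ssl
import struct


def build_query(name: str, qtype: int = 1) -> bytes:
    """Minimal RFC 1035 query: random ID, RD flag set, one question, class IN."""
    header = struct.pack("!HHHHHH", secrets.randbelow(0x10000), 0x0100, 1, 0, 0, 0)
    qname = b""
    for label in (part for part in name.rstrip(".").split(".") if part):
        if len(label) > 63:
            raise ValueError(f"label too long: {label!r}")
        qname += bytes([len(label)]) + label.encode("ascii")
    return header + qname + b"\x00" + struct.pack("!HH", qtype, 1)


def frame(message: bytes) -> bytes:
    """DNS over TCP/TLS prefixes each message with its length as two bytes."""
    return struct.pack("!H", len(message)) + message


def recv_exact(sock, count: int) -> bytes:
    """Read exactly `count` bytes, since recv() may return partial data."""
    data = b""
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            raise ConnectionError("connection closed mid-message")
        data += chunk
    return data


def dot_query(server: str, tls_name: str, name: str, port: int = 853) -> bytes:
    """Send one query over TLS and return the raw DNS response bytes."""
    context = ssl.create_default_context()  # verifies the server certificate
    with socket.create_connection((server, port), timeout=5) as raw:
        with context.wrap_socket(raw, server_hostname=tls_name) as tls:
            tls.sendall(frame(build_query(name)))
            (length,) = struct.unpack("!H", recv_exact(tls, 2))
            return recv_exact(tls, length)
```

DoH (RFC 8484) instead wraps the same DNS message in an HTTPS request, normally on port 443. That is why DoH blends in with ordinary web traffic, whereas DoT on its own port is easy to spot and block.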
I would have so loved a troubleshooting footnote on the Host Overrides page pointing out that, if they're not working, you might not be using pfSense's DNS after all.
It's been a long road... thank you so much; having someone knowledgeable to talk this through with has been incredibly helpful.
-
@wls didn't I say doh in my first post? ;)
What security device?
I am not a fan of forwarding anything; I will resolve, thank you very much... I don't care if Billy my ISP wants to sniff my traffic and see I went to amazon.com.
He knows I went there anyway via the SNI, which is in the clear in the traffic. Until ESNI (or now ECH) is actually a thing, trying to hide where you're going by encrypting your DNS isn't doing much of anything.
But yes, DoT is much better than DoH. You can set it up very simply on the NS you're running locally if you want, and it's easy to block if you don't want your devices to use it.
DoH is just sneaky BS to get your traffic sent to them. They throw in the word "secure" to make you think they're giving you something, when what they want is to be able to serve you ads and not let you filter locally.
Glad you finally got to the bottom of it.
-
For visiting readers, the solution is above; this is just a follow-up, with additional thanks.
@johnpoz -- Yup, you called it. ;)
So, in a post-mortem, one should ask how I missed that.
John's answer, "your "browser" or what your using to try and resolve these fqdn isn't using your dns - browsers like to use doh now.. which would point to some dns outside of your control" mentioned it, but my brain didn't pick up on it.
When I read the phrase "fqdn isn't using your dns", I immediately turned to my client operating system's DNS resolvers, and that pointed to pfSense. It simply did not dawn on me that something else might be carrying what I thought was strictly local DNS resolution outside my network. So I wasn't in the mindset to go looking for that.
And, if I'm honest with myself, I completely misinterpreted the part that said "browsers like to use doh now". The all-lowercase abbreviation "doh" did not trigger anything for me. At the time it was an unknown acronym, and I didn't look it up because I thought he was talking about 'browsers not using fully qualified domain names' (which he was not):
And having just proved to myself (mistakenly, as it turned out) that I was pointing at pfSense, which I was, I missed the subtlety of what was actually being said.
'doh' didn't catch my attention, 'DoH' might have, but I was well down a different rabbit hole.
I had never encountered this kind of problem before, and DoH (DNS over HTTPS) had worked just fine in other environments I've messed with. So I didn't give it the second thought it so rightly deserved.
Instead, I got hyper-focused on the 'forwarding' and 'DNSSEC' parts, two things I went off to read about (and, in hindsight, may not fully grok in the context of pfSense's unbound DNS Resolver).
The security device is a Firewalla Gold.
The way the hardware lab is set up is akin to this at the moment (it changes often):
At any given time either (or both) Firewalla devices may be removed; they tend to sit in bridge mode and send live notifications to my phone of devices being on the network that shouldn't be there (and give good insight to device details that are), further limit traffic by region, and provide alerts for certain kinds of activity or data usage patterns.
It has identified a number of devices I'd forgotten about, sleeping in the closet. My DHCP pool was happily assigning them addresses, but I was having a hard time tracking them down. It has also alerted me to rogue behaviors that certain applications and devices exhibit at off hours, stuff I would not have thought to dig through logs to find.
While I know they can act as firewalls themselves, I am extremely fond of pfSense and enjoy the low-level bit-twiddling granularity it provides. But there are high-level features pfSense simply does not provide (or I am ignorant of), and this fills those gaps when I'm not using the devices at external locations.
I'm not interested in having Firewalla be "the" firewall, as this configuration isn't for a home (rather a personal lab for education purposes), and I have no minors requiring content restrictions or scheduled access times.
I am acutely aware that it is possible to forgo the need for two Firewalla devices (or even one) by integrating VLAN rules. Each Firewalla device has its own profiles, which lets me compare behaviors.
The current set of experiments is about keeping devices on various networks from seeing each other at layer 2 and/or layer 3, or only allowing certain kinds of access, such as from one network but not the other.
I'm with you on not forwarding DNS, your arguments are sound, and I've been reading a number of security articles that say the same thing – much of the "protection" you think you're getting isn't real, as it can easily be determined by other ways.
I'll have to look into DNS over TLS (DoT), as that was entirely new to me as well. Part of the adventure has been stepping into the deeper part of the networking pool to improve my own understanding. I hit the point where the gently tapering bottom of the swimming pool suddenly drops to well beyond my height.
Thank you again for all your help, insight, and advice.