UNBOUND stops resolving externally
-
Folks, apologies for the early-morning brain-dead post. I should have said UNBOUND, not untangled. Duh! Anyway...
Hi folks, I'm not sure whether this issue is specific to me or to pfSense 2.4.4_p3. Unbound works fine for a while (how long varies), but then it stops resolving external names. Another DNS server (running BIND 9) has no issue resolving. The queries are being sent to Google (8.8.8.8 and 8.8.4.4) and OpenDNS (208.67.222.222 and 208.67.220.220); I've swapped them back and forth while testing this out.
I checked the logs and nothing is showing up under General or DNS Resolver.
Any suggestions?
Thanks!
-
I have more info...
I changed the DHCP server to hand out my old internal DNS server (which also forwards external queries) and manually pointed my workstation at pfSense and Unbound.
Lookups worked fine for about a day (just for me) and then suddenly started failing. I SSH'ed into pfSense and got these results:
    root: nslookup
    > news.bbc.co.uk
    ;; Got SERVFAIL reply from 127.0.0.1, trying next server
    Server:         8.8.8.8
    Address:        8.8.8.8#53

    Non-authoritative answer:
    news.bbc.co.uk  canonical name = newswww.bbc.net.uk.
    Name:   newswww.bbc.net.uk
    Address: 212.58.249.144
    Name:   newswww.bbc.net.uk
    Address: 212.58.244.56
    ;; Got SERVFAIL reply from 127.0.0.1, trying next server
    >
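nslookup silently falls back to the next server when 127.0.0.1 answers SERVFAIL, which hides the local failure. A minimal stdlib-only Python sketch for querying one server at a time and reporting just the response code (NOERROR, SERVFAIL, NXDOMAIN, or a timeout) is below. The function names are hypothetical; it assumes IPv4, UDP port 53, a single A query, and no retries or TCP fallback.

```python
import random
import socket
import struct

# Response codes carried in the low 4 bits of the DNS header flags (RFC 1035).
RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL", 3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}


def build_query(name: str, qid: int) -> bytes:
    """Build a minimal DNS query for an A record (class IN) with recursion desired."""
    # Header: ID, flags (RD bit set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def rcode_name(response: bytes) -> str:
    """Return the textual RCODE of a DNS response."""
    (flags,) = struct.unpack("!H", response[2:4])
    code = flags & 0x000F
    return RCODES.get(code, f"RCODE{code}")


def probe(server: str, name: str, timeout: float = 3.0) -> str:
    """Send one A query to `server`; return its RCODE, 'TIMEOUT' or 'ERROR'."""
    qid = random.randint(0, 0xFFFF)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name, qid), (server, 53))
            data, _ = sock.recvfrom(4096)
        except socket.timeout:
            return "TIMEOUT"
        except OSError:
            return "ERROR"
    # Ignore truncated junk or a reply to some other query ID.
    if len(data) < 12 or struct.unpack("!H", data[:2])[0] != qid:
        return "ERROR"
    return rcode_name(data)
```

Python may not be available on the firewall itself, so one option is to run it from a LAN client against the pfSense interface address (e.g. 192.168.25.254) and against 8.8.8.8 for the same name; SERVFAIL from the first and NOERROR from the second reproduces the symptom above.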
The only error I found in the System | DNS Resolver logs was from about 12 hours earlier:
unbound 95721:0 info: generate keytag query _ta-4f66. NULL IN
When I check to see if the service is running on the pfSense box I get the following:
    root: ps -axwwl | grep unbound
    59 95721 1 0 20 0 79392 36088 kqread Ss - 0:12.90 /usr/local/sbin/unbound -c /var/unbound/unbound.conf
Unlike others, if I stop and start Unbound, DNS starts working again.
-
Not sure if this means anything (I'm a Linux guy... BSD was a long time ago).
    df -h
    Filesystem                      Size    Used   Avail Capacity  Mounted on
    /dev/ufsid/5c65f4bb632a04f4      27G    3.8G     21G      15%  /
    devfs                           1.0K    1.0K      0B     100%  /dev
    /dev/md0                        3.4M    132K    3.0M       4%  /var/run
    devfs                           1.0K    1.0K      0B     100%  /var/dhcpd/dev
Are /dev and /var/dhcpd/dev supposed to be full?
-
@nfld_republic said in UNBOUND stops resolving externally:
Are /dev and /var/dhcpd/dev supposed to be full?
Yep, that's ok.
About unbound:
When you check with nslookup and 127.0.0.1:53 comes back with a SERVFAIL, is unbound running?
Test with ps -axwwl | grep unbound, as you have shown above.
Also, check the "DNS log": does your unbound restart often?
-
@Gertjan - Thanks for the reply.
When I run nslookup on pfSense against 127.0.0.1, unbound is running. Local hosts resolve fine as well; only the external hosts fail. There is nothing in the DNS log about restarts... It seems like pfSense cannot reach the external DNS servers (e.g., 8.8.8.8, 8.8.4.4, etc.). If I use nslookup on the server and specify 8.8.8.8 manually, it resolves. This is weird...
-
@nfld_republic said in UNBOUND stops resolving externally:
the external hosts fail
Who are these external hosts ? Devices on LAN(s) ?
What DNS server do those devices use? The IP of pfSense? 127.0.0.1 can't be reached by those devices. Is unbound also listening on all the LAN IPs?
Also check how many unbound instances are present. There should be one.
Are any other programs, like bind or dnsmasq, listening on port 53? Does your unbound restart often?
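Piping ps through grep always lists the grep command as well, which makes the instance count easy to misread. A small hedged Python sketch that extracts only the unbound daemon PIDs from pasted ps -axwwl output is below; unbound_pids is a hypothetical helper, and it assumes the FreeBSD ps -l column order (UID, then PID) and the /usr/local/sbin/unbound path seen earlier in this thread. For the port 53 question, FreeBSD's sockstat -l -p 53 lists whatever is listening on that port.

```python
def unbound_pids(ps_output: str) -> list[int]:
    """Return the PIDs of unbound daemons found in FreeBSD `ps -axwwl` output.

    Assumes the `ps -l` column order (UID first, PID second) and the
    /usr/local/sbin/unbound binary path. Lines mentioning grep are skipped,
    because `ps ... | grep unbound` also matches the grep process itself.
    """
    pids = []
    for line in ps_output.splitlines():
        if "/usr/local/sbin/unbound" not in line or "grep" in line:
            continue
        fields = line.split()
        pids.append(int(fields[1]))  # second column is the PID
    return pids
```

A healthy box should give a list with exactly one PID; more than one would point at duplicate instances competing for port 53.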
-
@Gertjan - Sorry, I wasn't clear.
Basic topology (all gateways are the VLAN gateways on the pfSense box):
- Server subnet 192.168.20.0/24; gateway 192.168.20.254; DNS 192.168.20.254
- Trusted wired subnet 192.168.25.0/24; gateway 192.168.25.254; DNS 192.168.25.254
- Trusted WiFi subnet 192.168.30.0/24; gateway 192.168.30.254; DNS 192.168.30.254
There is an any-to-any, all-ports-open rule between these three subnets (I haven't gotten around to putting proper rules in yet).
When unbound stops responding to external host lookups:
- Internal clients connect to unbound using their respective gateways (e.g., 192.168.25.254, 192.168.30.254) that are on pfSense (I also tried having the clients use my server subnet gateway, 192.168.20.254, for the other subnets - it works either way. It also fails either way, too...)
- Internal clients can only resolve the internal hosts listed in Unbound
- Logging into pfSense, nslookup using localhost can only resolve the internal hosts listed in Unbound
- Internal hosts manually pointed to my other internal DNS server can resolve internal hosts and external hosts (internal DNS server forwards to 8.8.8.8/8.8.4.4)
- Internal hosts using the public DNS (8.8.8.8/8.8.4.4) can resolve the external hosts (not internal hosts, obviously)
- Logging into pfSense, nslookup can resolve internal hosts and external hosts when using my other internal DNS
- Logging into pfSense, nslookup can resolve external hosts when using the public DNS (8.8.8.8/8.8.4.4)
Only one instance of unbound is running when I check. bind and dnsmasq are not installed.
Unbound does not show very many restarts; it seems only when I manually restart the service.
-
I've been having similar problems over the same time frame: about a week ago my entire network stopped resolving external hosts.
First I thought it was pfBlockerNG-devel or Snort, so I disabled them. Then I tried switching from 1.1.1.1 to 9.9.9.9, since Cloudflare had issues last week, but the problem persisted until I put unbound into forwarding mode. Now my network is stable again.
Here's an article about Cloudflare's network outages and the Verizon BGP mishap, which caused some instability the week prior. It may be coincidental, but it seems to have happened when my local DNS resolution broke. https://arstechnica.com/information-technology/2019/07/facebook-cloudflare-microsoft-and-twitter-suffer-outages/
-
Could this be a root.hints update issue?
-
@lohphat I'm not sure about the root hints, though.
With my configuration I have had both Google (8.8.8.8/8.8.4.4) and OpenDNS (208.67.222.222/208.67.220.220) as number 2 and 3 options (127.0.0.1 - pfSense - as the initial lookup) and my ISP's DNSes as my number 4 and 5 options. The issue still persists.
Unlike one or two other issues with Unbound, I do not have to restart pfSense to have resolving work again. I just have to stop and start Unbound. Restarting Unbound doesn't seem to work consistently.
-
@nfld_republic Adding root.hints to my config didn't help either after testing.
So far, as long as I run in DNS forwarding mode things work. When I disable forwarding then DNS stops resolving after entries timeout. So for now, I'm running in forwarding mode.
-
@nfld_republic said in UNBOUND stops resolving externally:
- Not sure if this is an issue specific to me or with pfSense 2.4.4_p3
If there were something wrong with 2.4.4p3, the boards would be on fire with posts about it. So it's clearly something with your setup.
I can't make out what troubleshooting you have actually done. What exactly did you do with root.hints? How did you add them to your config?
Out of the box, unbound will resolve. You do not need to add or do anything for it to know the roots. If you're having issues with resolving, then troubleshoot what they are. Can you actually talk to the roots?
Up the log level of unbound (to 5, for example). You could also add

    server:
    log-queries: yes
    log-replies: yes

to your custom options box to get even more info. A simple test is to sniff on your WAN and do a query for something. Do you see unbound send out queries? Were there any answers? You should see it query one of the root IPs and then walk down the tree.
Another simple test is to run dig +trace on pfSense itself. What happens with this trace? Here is an example:
    [2.4.4-RELEASE][admin@sg4860.local.lan]/root: dig www.netgate.com +trace

    ; <<>> DiG 9.12.2-P1 <<>> www.netgate.com +trace
    ;; global options: +cmd
    . 59759 IN NS m.root-servers.net.
    . 59759 IN NS b.root-servers.net.
    . 59759 IN NS c.root-servers.net.
    . 59759 IN NS d.root-servers.net.
    . 59759 IN NS e.root-servers.net.
    . 59759 IN NS f.root-servers.net.
    . 59759 IN NS g.root-servers.net.
    . 59759 IN NS h.root-servers.net.
    . 59759 IN NS i.root-servers.net.
    . 59759 IN NS a.root-servers.net.
    . 59759 IN NS j.root-servers.net.
    . 59759 IN NS k.root-servers.net.
    . 59759 IN NS l.root-servers.net.
    . 59759 IN RRSIG NS 8 0 518400 20190720050000 20190707040000 59944 . plkjvKNHb3XcrFeXyGWhf8GL45LKQQrXzw1pvM76BEAG5v60RGAAKd9u n+mDAXed0Um1ohtMGRx3LvdZ+MVDU4iJG1CilUypWo9QWLxJ4pYoDwV7 ati2yAJVN8UwZYYF4GPTnRM28P8qTC1HsotgZVpkx1rhLnIn3ueD0CZv LVyoHvxZLB8sjFp0CbEpxhbfapAZhEWT+lo0fndGGDafVIWvg3Mmgue0 Nku9MgezIWQSHzjCBXoNnwZOjiNJtTLNQVLj70fmD5eq65JEREGypXhb tOaifiC/E6WTpCrzNQ9zlT3OttjGFYIE/l1imo2kgqc5QsaQgeVbXm0U vv5QJg==
    ;; Received 525 bytes from 127.0.0.1#53(127.0.0.1) in 0 ms

    com. 172800 IN NS g.gtld-servers.net.
    com. 172800 IN NS a.gtld-servers.net.
    com. 172800 IN NS h.gtld-servers.net.
    com. 172800 IN NS l.gtld-servers.net.
    com. 172800 IN NS j.gtld-servers.net.
    com. 172800 IN NS f.gtld-servers.net.
    com. 172800 IN NS e.gtld-servers.net.
    com. 172800 IN NS b.gtld-servers.net.
    com. 172800 IN NS i.gtld-servers.net.
    com. 172800 IN NS d.gtld-servers.net.
    com. 172800 IN NS c.gtld-servers.net.
    com. 172800 IN NS k.gtld-servers.net.
    com. 172800 IN NS m.gtld-servers.net.
    com. 86400 IN DS 30909 8 2 E2D3C916F6DEEAC73294E8268FB5885044A833FC5459588F4A9184CF C41A5766
    com. 86400 IN RRSIG DS 8 1 86400 20190720050000 20190707040000 59944 . ldQN2P62I2iLgn8E3uQPOp9bSU3BT3M4guSkK+JkFwpv0qqNN2NoUQ4l MiFfcD0V4oW5OX5G07+IU0kM8C5YJYZKwbh9dMJaTq7YrAHf3E9BHgNV 8iBTt4ET3mYt2WhnJQDylRgH3e/ATtfxzxSbB4k+H78ua08DrqMZRJR+ /pVynxUk14s4sEaQt/j8qgwyFcncXiZBlA/0ik76Uac/2aONRee6I99v 9drLYdfHvTmINs2bxrj0wGiLg58na0GM+0F0/xrTKGdYmkb65CWtZO0u tZhFbkD9GsoldIJ6zbh9/fE0VO6cT9VUGmYKe8eGdtS/ltbkmreAUnYO VosFJA==
    ;; Received 1203 bytes from 2001:500:12::d0d#53(g.root-servers.net) in 51 ms

    netgate.com. 172800 IN NS ns1.netgate.com.
    netgate.com. 172800 IN NS ns2.netgate.com.
    CK0POJMG874LJREF7EFN8430QVIT8BSM.com. 86400 IN NSEC3 1 1 0 - CK0Q1GIN43N1ARRC9OSM6QPQR81H5M9A NS SOA RRSIG DNSKEY NSEC3PARAM
    CK0POJMG874LJREF7EFN8430QVIT8BSM.com. 86400 IN RRSIG NSEC3 8 2 86400 20190714044431 20190707033431 3800 com. BKPFq/Z6OdQj3J/veD+Ty87mCyx1yfhuW3eFuZ4g6d6JOZ+CHghL6DEL y8ztytbZxVCMHrFRl5VkSrxM9buZ2MDJnHeZBqB/LwuCncLD9DRQ/5R3 tbvu8PIWFrwvpgfyez+h5/XVEKJqszN+rFlNEsOS4iaZDw+mIn3PYOt5 T2U=
    2U53SUOKS8OJJV178M90A8BMNI9USDVJ.com. 86400 IN NSEC3 1 1 0 - 2U541R8K3PEAS1JPB7275INB5153PFP8 NS DS RRSIG
    2U53SUOKS8OJJV178M90A8BMNI9USDVJ.com. 86400 IN RRSIG NSEC3 8 2 86400 20190711050210 20190704035210 3800 com. Kbmv+3vi91UkcCwF6Dtks91Zo9GA5gJ60/pDF7TcDtAlVGOsv+2J4sk4 S1j48tatjLUNkP3zPcMeUvAYA7SDriCIlG1IIe6VAqZjO/MOw5fzmrVI I4dM/Flg1VHItWWfcZ0lRXeGIvoTWPzx1gvjanVI9XZqgD2Og3WGpJzI DIs=
    ;; Received 625 bytes from 2001:503:d2d::30#53(k.gtld-servers.net) in 62 ms

    www.netgate.com. 3600 IN A 208.123.73.73
    netgate.com. 3600 IN NS ns1.netgate.com.
    netgate.com. 3600 IN NS ns2.netgate.com.
    ;; Received 156 bytes from 2610:160:11:11::80#53(ns1.netgate.com) in 72 ms

    [2.4.4-RELEASE][admin@sg4860.local.lan]/root:
This is how unbound resolves anything: it talks to the roots, then works down the line until it gets to the authoritative NS, and then asks that server for the record you're requesting.
If you're failing anywhere in that path, then yeah, you're going to have problems, and you would need to figure out why.
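When the failure comes back, one quick way to see how far down the tree a trace gets is to pull out dig's ";; Received N bytes from ADDRESS#53(SERVER) in T ms" lines, since each one marks a completed step. A hedged Python sketch follows; trace_hops and Hop are hypothetical names, and the regular expression assumes the dig 9.12 output format in the trace above.

```python
import re
from typing import NamedTuple

# Matches dig's per-step summary, e.g.
# ";; Received 625 bytes from 192.48.79.30#53(j.gtld-servers.net) in 38 ms"
HOP_RE = re.compile(
    r";; Received (\d+) bytes from ([0-9A-Fa-f:.]+)#\d+\(([^)]*)\) in (\d+) ms"
)


class Hop(NamedTuple):
    size: int      # bytes in the response
    address: str   # IPv4 or IPv6 address that answered
    server: str    # server name as printed by dig
    ms: int        # round-trip time in milliseconds


def trace_hops(dig_output: str) -> list[Hop]:
    """Return the completed steps of a `dig +trace` run, in order."""
    return [
        Hop(int(size), addr, server, int(ms))
        for size, addr, server, ms in HOP_RE.findall(dig_output)
    ]
```

A healthy trace for www.netgate.com yields four hops (local resolver, a root server, a .com server, a netgate.com server). Note that only the first step goes through the local resolver; dig performs the later referrals itself, so a trace that completes while unbound still fails would suggest the problem lies in unbound's state rather than in WAN connectivity.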
Are you using a VPN connection? Are you running any other packages (IPS, pfBlocker, etc.)?
You could have resolution issues for a few different reasons: you're on a high-latency connection (satellite, for example), your ISP is dicking with your DNS queries, your VPN is blocking them, etc. You need to find out where in the resolution process you're failing.
-
@johnpoz - Responses:
- The only changes to my config from out-of-the-box are (1) adding Host Overrides for the internal hosts/IPs to be resolved and (2) specifying the internal network interfaces to listen on (e.g., not the DMZ or WiFi guest VLANs).
- Log level for Unbound - will change this to 5
- dig +trace example
    [2.4.4-RELEASE][xxxx@xxxx.xxx.xxx/root: dig www.netgate.com +trace

    ; <<>> DiG 9.12.2-P1 <<>> www.netgate.com +trace
    ;; global options: +cmd
    . 81920 IN NS a.root-servers.net.
    . 81920 IN NS b.root-servers.net.
    . 81920 IN NS c.root-servers.net.
    . 81920 IN NS d.root-servers.net.
    . 81920 IN NS e.root-servers.net.
    . 81920 IN NS f.root-servers.net.
    . 81920 IN NS g.root-servers.net.
    . 81920 IN NS h.root-servers.net.
    . 81920 IN NS i.root-servers.net.
    . 81920 IN NS j.root-servers.net.
    . 81920 IN NS k.root-servers.net.
    . 81920 IN NS l.root-servers.net.
    . 81920 IN NS m.root-servers.net.
    . 81920 IN RRSIG NS 8 0 518400 20190720050000 20190707040000 59944 . plkjvKNHb3XcrFeXyGWhf8GL45LKQQrXzw1pvM76BEAG5v60RGAAKd9u n+mDAXed0Um1ohtMGRx3LvdZ+MVDU4iJG1CilUypWo9QWLxJ4pYoDwV7 ati2yAJVN8UwZYYF4GPTnRM28P8qTC1HsotgZVpkx1rhLnIn3ueD0CZv LVyoHvxZLB8sjFp0CbEpxhbfapAZhEWT+lo0fndGGDafVIWvg3Mmgue0 Nku9MgezIWQSHzjCBXoNnwZOjiNJtTLNQVLj70fmD5eq65JEREGypXhb tOaifiC/E6WTpCrzNQ9zlT3OttjGFYIE/l1imo2kgqc5QsaQgeVbXm0U vv5QJg==
    ;; Received 525 bytes from 127.0.0.1#53(127.0.0.1) in 0 ms

    com. 172800 IN NS c.gtld-servers.net.
    com. 172800 IN NS g.gtld-servers.net.
    com. 172800 IN NS f.gtld-servers.net.
    com. 172800 IN NS l.gtld-servers.net.
    com. 172800 IN NS d.gtld-servers.net.
    com. 172800 IN NS k.gtld-servers.net.
    com. 172800 IN NS m.gtld-servers.net.
    com. 172800 IN NS i.gtld-servers.net.
    com. 172800 IN NS h.gtld-servers.net.
    com. 172800 IN NS j.gtld-servers.net.
    com. 172800 IN NS e.gtld-servers.net.
    com. 172800 IN NS a.gtld-servers.net.
    com. 172800 IN NS b.gtld-servers.net.
    com. 86400 IN DS 30909 8 2 E2D3C916F6DEEAC73294E8268FB5885044A833FC5459588F4A9184CF C41A5766
    com. 86400 IN RRSIG DS 8 1 86400 20190720050000 20190707040000 59944 . ldQN2P62I2iLgn8E3uQPOp9bSU3BT3M4guSkK+JkFwpv0qqNN2NoUQ4l MiFfcD0V4oW5OX5G07+IU0kM8C5YJYZKwbh9dMJaTq7YrAHf3E9BHgNV 8iBTt4ET3mYt2WhnJQDylRgH3e/ATtfxzxSbB4k+H78ua08DrqMZRJR+ /pVynxUk14s4sEaQt/j8qgwyFcncXiZBlA/0ik76Uac/2aONRee6I99v 9drLYdfHvTmINs2bxrj0wGiLg58na0GM+0F0/xrTKGdYmkb65CWtZO0u tZhFbkD9GsoldIJ6zbh9/fE0VO6cT9VUGmYKe8eGdtS/ltbkmreAUnYO VosFJA==
    ;; Received 1175 bytes from 193.0.14.129#53(k.root-servers.net) in 69 ms

    netgate.com. 172800 IN NS ns1.netgate.com.
    netgate.com. 172800 IN NS ns2.netgate.com.
    CK0POJMG874LJREF7EFN8430QVIT8BSM.com. 86400 IN NSEC3 1 1 0 - CK0Q1GIN43N1ARRC9OSM6QPQR81H5M9A NS SOA RRSIG DNSKEY NSEC3PARAM
    CK0POJMG874LJREF7EFN8430QVIT8BSM.com. 86400 IN RRSIG NSEC3 8 2 86400 20190714044431 20190707033431 3800 com. BKPFq/Z6OdQj3J/veD+Ty87mCyx1yfhuW3eFuZ4g6d6JOZ+CHghL6DEL y8ztytbZxVCMHrFRl5VkSrxM9buZ2MDJnHeZBqB/LwuCncLD9DRQ/5R3 tbvu8PIWFrwvpgfyez+h5/XVEKJqszN+rFlNEsOS4iaZDw+mIn3PYOt5 T2U=
    2U53SUOKS8OJJV178M90A8BMNI9USDVJ.com. 86400 IN NSEC3 1 1 0 - 2U541R8K3PEAS1JPB7275INB5153PFP8 NS DS RRSIG
    2U53SUOKS8OJJV178M90A8BMNI9USDVJ.com. 86400 IN RRSIG NSEC3 8 2 86400 20190711050210 20190704035210 3800 com. Kbmv+3vi91UkcCwF6Dtks91Zo9GA5gJ60/pDF7TcDtAlVGOsv+2J4sk4 S1j48tatjLUNkP3zPcMeUvAYA7SDriCIlG1IIe6VAqZjO/MOw5fzmrVI I4dM/Flg1VHItWWfcZ0lRXeGIvoTWPzx1gvjanVI9XZqgD2Og3WGpJzI DIs=
    ;; Received 625 bytes from 192.48.79.30#53(j.gtld-servers.net) in 38 ms

    www.netgate.com. 3600 IN A 208.123.73.73
    netgate.com. 3600 IN NS ns2.netgate.com.
    netgate.com. 3600 IN NS ns1.netgate.com.
    ;; Received 156 bytes from 162.208.119.38#53(ns2.netgate.com) in 39 ms
- VPN is not being used
- IPS, pfBlocker, etc packages not used
- ISP is 500 Mbit fibre
- When resolution using Unbound fails (both from clients and on pfSense itself), using dig with external DNS servers (e.g., Google, my ISP, etc.) works fine. It does not seem like my ISP is dicking with my DNS queries...
When I manually stop and start unbound (as opposed to restarting unbound), lookups start working again.
-
So your trace looks very normal. You can see how it walked down the tree, and response times are good, so there doesn't seem to be any sort of connectivity issue at this point in time. I would try the same test when it fails.
When it fails, can you resolve local clients, i.e. pfSense's own name, from a client? Do a dig from a client on your LAN to the pfSense IP address where unbound is listening (normally its LAN IP). Does it respond? For example:
    $ dig @192.168.9.253 sg4860.local.lan

    ; <<>> DiG 9.14.3 <<>> @192.168.9.253 sg4860.local.lan
    ; (1 server found)
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 43864
    ;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

    ;; OPT PSEUDOSECTION:
    ; EDNS: version: 0, flags:; udp: 4096
    ;; QUESTION SECTION:
    ;sg4860.local.lan.              IN      A

    ;; ANSWER SECTION:
    sg4860.local.lan.       3600    IN      A       192.168.9.253

    ;; Query time: 1 msec
    ;; SERVER: 192.168.9.253#53(192.168.9.253)
    ;; WHEN: Sun Jul 07 11:01:36 Central Daylight Time 2019
    ;; MSG SIZE  rcvd: 61
But when you ask it for an external name, it fails? Does it give back a specific sort of answer, i.e. NXDOMAIN or SERVFAIL, or does it just time out?
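The distinction asked about above (NXDOMAIN, SERVFAIL or timeout, for a local name versus an external one against the same resolver) can be read straight from dig's header line. A hedged Python sketch is below; dig_status and diagnose are hypothetical helpers, the status regex assumes the ";; ->>HEADER<<-" line format shown in the dig output above, and the diagnosis strings are only a rough reading, not a definitive verdict.

```python
import re

# dig prints e.g. ";; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 43864"
STATUS_RE = re.compile(r"->>HEADER<<- opcode: \w+, status: (\w+)")


def dig_status(dig_output: str) -> str:
    """Return the response status from dig output, or TIMEOUT / UNKNOWN."""
    match = STATUS_RE.search(dig_output)
    if match:
        return match.group(1)
    if "connection timed out" in dig_output:
        return "TIMEOUT"
    return "UNKNOWN"


def diagnose(local_status: str, external_status: str) -> str:
    """Rough reading of a local-name vs external-name test against one resolver."""
    if local_status == "TIMEOUT":
        return "resolver not answering at all"
    if local_status == "NOERROR" and external_status == "NOERROR":
        return "resolver healthy"
    if local_status == "NOERROR":
        # Local data (host overrides) still answers, so the daemon is alive,
        # but recursion toward the internet is failing.
        return f"local data answers but recursion fails ({external_status})"
    return f"unexpected local result ({local_status})"
```

The middle case is the one described in this thread: the process answers from its local data, so it is not simply dead, yet recursive lookups come back SERVFAIL.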
When this happens, look in your now more detailed log for what could be happening.
Verify that unbound didn't just restart or isn't in the middle of a restart. DHCP registration can cause unbound to restart all the time, which can present as a DNS resolution issue. It will show in the log if it restarted.
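To check the restart question against text pasted from the DNS Resolver log mentioned earlier, a hedged Python sketch is below. It assumes unbound's usual "[pid:thread]" prefix (e.g. "[95721:0] info: ...") and its "start of service" message; service_starts and distinct_pids are hypothetical helpers. A reload may also log "start of service" under the same PID, so counting distinct PIDs is one way to separate full stop/starts from reloads; verify that against your own log before relying on it.

```python
import re

# unbound tags each log line with "[pid:thread]", e.g. "[95721:0] info: ..."
PID_RE = re.compile(r"\[(\d+):\d+\]")


def service_starts(log_text: str) -> list[str]:
    """Lines where unbound announces 'start of service' (a start or a reload)."""
    return [line for line in log_text.splitlines() if "start of service" in line]


def distinct_pids(log_text: str) -> set[int]:
    """Distinct unbound process IDs seen in the log; a new PID means a new process."""
    return {int(m.group(1)) for m in PID_RE.finditer(log_text)}
```

Many "start of service" lines clustered around DHCP lease activity would match the restart-on-registration pattern described above.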
It's possible your unbound is hanging for some reason, which would explain why a full stop and start works while just a restart command does not.
Are you doing any sort of redirection or manipulation of DNS queries via port forwards on pfSense?
-
@johnpoz - I think that is the crux of the problem. Let me see if I can sum this up:
- When working "normally" Unbound on the pfSense box resolves external and internal addresses without any problems
- When the issue occurs, Unbound can still resolve the internal Host Overrides but nothing external. This happens both for internal clients and when resolving on the pfSense box itself
- If you look up the external hosts, again from internal clients or on the pfSense box, while specifying an external DNS provider (e.g., Google's 8.8.8.8), they resolve
- Stopping and starting Unbound clears the problem (but a restart of Unbound does not)
- So far, I can see no commonality (e.g., number of clients resolving against Unbound, after 20 hours, or whatever) when external resolution stops working
- I am upping the log level for Unbound and when it happens I will post the logs
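Since no pattern has turned up yet, one way to catch the exact moment external resolution stops, so it can be lined up against the more detailed log, is to poll periodically and record only the state changes. A hedged Python sketch follows; sample, transitions and the check callable are hypothetical, and check stands for whatever test you trust (for example, a query for an external name against the pfSense LAN address that must return NOERROR).

```python
import time
from collections.abc import Callable, Iterable, Iterator


def sample(check: Callable[[], bool], interval: float, rounds: int) -> Iterator[tuple[float, bool]]:
    """Call check() `rounds` times, `interval` seconds apart, yielding (unix_time, ok)."""
    for i in range(rounds):
        yield time.time(), check()
        if i < rounds - 1:
            time.sleep(interval)


def transitions(samples: Iterable[tuple[float, bool]]) -> Iterator[tuple[float, bool]]:
    """Yield only the samples where the ok/failed state changes (the first always counts)."""
    previous = None
    for ts, ok in samples:
        if ok != previous:
            yield ts, ok
            previous = ok


# Example use (my_check is a hypothetical callable you supply):
# for ts, ok in transitions(sample(my_check, interval=60, rounds=24 * 60)):
#     print(time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(ts)), "OK" if ok else "FAILING")
```

Printing only transitions keeps a day-long run to a handful of lines, and each FAILING timestamp gives a precise point to search for in the resolver log.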
-
If unbound were hung, you would think it couldn't resolve anything, not even internal names when you ask pfSense from a client on your network. I've never seen such an issue; I'm not saying it's impossible, just very odd.
-
@johnpoz That is my thought as well. If Unbound "hung", then internal resolution would not work either (hence my "stops resolving externally" title).
Any suggestions for other logs to look at the next time this happens?
One of my colleagues at work suggested moving from Unbound to the BIND package. Thoughts?
-
If you're having issues with unbound, BIND can resolve and do DNSSEC as well. BIND is the DNS of the internet for a reason. But I would be curious to find out what is causing your issues with unbound.
I have been using unbound on pfSense since before it was built in, back when it was just a package, and other than the restarts caused by DHCP registration I have never had any issues.
I personally see no reason for DHCP registration of dynamic clients; I always set up a reservation for any client I need to resolve, so my setup just registers those.