DNS Resolver needs a constant reboot to work
-
I have no idea why unbound (the DNS resolver) is stopping, but I have had a few occurrences of the same. That led me to install the Service Watchdog package and let it monitor (and restart) unbound if it stops.
This doesn't answer the real question: why is unbound stopping?
But it does fix the "DNS outage".
/Bingo
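For readers wondering what such a watchdog check boils down to, here is a minimal sketch. It is not the Service Watchdog package's actual code: it only reads a pid file and asks the kernel whether that process still exists. The pid file path is the one used further down in this thread, and `pid_alive` is a made-up helper name.

```shell
#!/bin/sh
# Sketch only, not the Service Watchdog implementation.
# pid_alive is a hypothetical helper: returns 0 if the pid stored in the
# given pid file belongs to a live process, non-zero otherwise.
# Run it as root (as on the pfSense shell), otherwise kill -0 can fail
# on a process owned by another user even though it exists.
pid_alive() {
    pidfile="$1"
    [ -r "$pidfile" ] || return 1      # no pid file: treat as not running
    pid=$(cat "$pidfile")
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;       # empty or not a number
    esac
    kill -0 "$pid" 2>/dev/null         # signal 0: existence check only
}

# Example: report the state of unbound
if pid_alive /var/run/unbound.pid; then
    echo "unbound is running"
else
    echo "unbound is NOT running"
fi
```

A check like this only catches a dead process. It says nothing about whether a running unbound still answers queries from the LAN.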
-
@scottlindner said in DNS Resolver needs a constant reboot to work:
The DNS Lookup worked in pfSense. So then I tried pinging gmail.com in pfSense and it worked
I do the same test :
    [22.05-RELEASE][admin@pfSense.bhf.net]/root: host gmail.com
    gmail.com has address 142.251.208.133
    gmail.com has IPv6 address 2a00:1450:400d:803::2005
    gmail.com mail is handled by 10 alt1.gmail-smtp-in.l.google.com.
    gmail.com mail is handled by 20 alt2.gmail-smtp-in.l.google.com.
    gmail.com mail is handled by 40 alt4.gmail-smtp-in.l.google.com.
    gmail.com mail is handled by 30 alt3.gmail-smtp-in.l.google.com.
    gmail.com mail is handled by 5 gmail-smtp-in.l.google.com.

    [22.05-RELEASE][admin@pfSense.bhf.net]/root: nslookup gmail.com
    Server:     127.0.0.1
    Address:    127.0.0.1#53

    Non-authoritative answer:
    Name:    gmail.com
    Address: 142.251.208.133
    Name:    gmail.com
    Address: 2a00:1450:400d:803::2005
We all agree that unbound is answering here, as it listens on 127.0.0.1 port 53.
But let's not assume, let's fact check :
    [22.05-RELEASE][admin@pfSense.bhf.net]/root: sockstat | grep 'unbound'
    unbound  unbound  44608  3  udp4  *:53  *:*
    unbound  unbound  44608  4  tcp4  *:53  *:*
    ......
( I removed some non related lines )
So unbound listens for UDP and TCP, on port 53, on all interfaces, which includes 127.0.0.1 (and your LAN, WAN, whatever you have).
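If you want this check without reading the output by eye, a small filter can pull out just the port 53 sockets owned by unbound. This is only a sketch: `unbound_listeners` is a made-up name, and it assumes the sockstat column layout shown above (user, command, pid, fd, protocol, local address, foreign address).

```shell
#!/bin/sh
# Sketch only. Reads sockstat output on stdin and prints "proto address"
# for every socket that the unbound command has open on port 53.
# Assumed columns: USER COMMAND PID FD PROTO LOCAL-ADDRESS FOREIGN-ADDRESS
unbound_listeners() {
    awk '$2 == "unbound" && $6 ~ /:53$/ { print $5, $6 }'
}

# On the pfSense shell:
#   sockstat | unbound_listeners
```

No output at all means nothing from unbound is bound to port 53. A specific address instead of `*:53` tells you which interface it is bound to.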
Now, stop the unbound process. First I get the process ID of unbound :
    [22.05-RELEASE][admin@pfSense.bhf.net]/root: cat /var/run/unbound.pid
    78510
With this number, I kill unbound :
[22.05-RELEASE][admin@pfSense.bhf.net]/root: kill 78510
Now, do a host and nslookup again.
Long delays, no answer. Check with sockstat | grep 'unbound' and you'll see that unbound isn't there anymore.
If you understood what I described above, you agree that something isn't right in your case.
Your unbound is running - it isn't dead. Otherwise you could not have had answers when running the host and nslookup commands on the pfSense command line.
To me, it looks like unbound doesn't receive DNS requests on the LAN interface. This is where the DNS requests come in from a LAN based device, the device that runs your browsers.
But that could be many things.
Like :
Bad Wifi.
Bad cable.
Bad network settings.
Are you even using pfSense as a DNS from this device ?
And yes, it can happen that unbound doesn't listen on LAN anymore. But wait : you can test this !!
There is also an unbound setting ( go to Services > DNS Resolver > General Settings ) : look for DHCP Registration - Register DHCP leases in the DNS Resolver. If this is checked, then unbound will restart on every incoming DHCP lease request. If you have a device that asks for a new lease every xx seconds (because, for example, its wifi is bad) then your unbound restarts every xx seconds. That's probably not what you want, as DNS isn't available during a restart. And you think it's 'unbound' that fails.
If this is your issue : easy to check.
Look at the DHCP logs : is there a lot of activity == lease requests ?
Look at the DNS Resolver logs : are there a lot of "unbound stop" and subsequent "unbound start" lines ?
You can stop these frequent unbound stops and starts : uncheck the "DHCP Registration" option, save and reload unbound.
You said : my browser doesn't show the web page anymore.
That's an end user reflection.
You, as an admin, should know better.
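For the log check described above (many stop/start pairs), counting beats scrolling. The sketch below counts unbound starts in log lines read from stdin. Two assumptions: each start is logged with the text "start of service", which is the wording unbound itself uses, and the log lives in /var/log/resolver.log, which may differ on your version. `count_unbound_starts` is a made-up name.

```shell
#!/bin/sh
# Sketch only. Counts unbound (re)starts in resolver log lines read from stdin.
# Assumption: every start is logged with the text "start of service";
# adjust the pattern if your log lines look different.
count_unbound_starts() {
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true"
    grep -c 'start of service' || true
}

# On the pfSense shell (log path is an assumption, check your system):
#   count_unbound_starts < /var/log/resolver.log
```

A handful of starts over several days is normal. Dozens per hour points at something signalling unbound to restart, such as the DHCP registration option.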
Your web page doesn't work ? Ok, open a command line (cmd) and execute nslookup
Make it a bit verbose
set debug
now ask for gmail.com
    Serveur :   pfSense.bhf.net
    Address:  192.168.1.1

    ------------
    Got answer:
        HEADER:
            opcode = QUERY, id = 10, rcode = NOERROR
            header flags:  response, want recursion, recursion avail.
            questions = 1,  answers = 1,  authority records = 0,  additional = 0

        QUESTIONS:
            gmail.com, type = A, class = IN
        ANSWERS:
        ->  gmail.com
            internet address = 142.250.75.229
            ttl = 216 (3 mins 36 secs)

    ------------
    Réponse ne faisant pas autorité :
    ------------
    Got answer:
        HEADER:
            opcode = QUERY, id = 11, rcode = NOERROR
            header flags:  response, want recursion, recursion avail.
            questions = 1,  answers = 1,  authority records = 0,  additional = 0

        QUESTIONS:
            gmail.com, type = AAAA, class = IN
        ANSWERS:
        ->  gmail.com
            AAAA IPv6 address = 2a00:1450:400d:803::2005
            ttl = 216 (3 mins 36 secs)

    ------------
    Nom :    gmail.com
    Addresses:  2a00:1450:400d:803::2005
              142.250.75.229

    >
At the top you can see which server was asked for the DNS details ("Serveur" is "Server" on my French Windows). For me, it was :
    Serveur :   pfSense.bhf.net
    Address:  192.168.1.1
And while you are on the command line, execute
ipconfig /all
and you will see which DNS servers your system wants to use :
    Serveurs DNS. . . . . . . . . . . . . : 192.168.1.1
                                            2001:470:xxxx:5c0:2::1
192.168.1.1 is my pfSense.
If all this is ok, you can check the DNS traffic between your device and pfSense.
You use pfSense, so you have tons of options. I'll propose two of them.
The first one :
Go to this page : Services > DNS Resolver > Advanced Settings
and locate the log level setting.
Set it to "Level 3 : Query level information". Save and Apply changes.
Now, on your device, open a web site you did not visit before
( or flush your local device DNS cache first, a PC does this with "ipconfig /flushdns" ).
Now, have a look at the DNS Resolver log page, and Ctrl-F the host name you were looking for.
You see all the DNS requests related to this web site / host name.
And yes, there will be a lot of lines.
Btw : do not forget to set the logging level back to "1 : basic logging".
Also :
For the Resolver's interface settings, select "All" for both ( see other forum threads about what can happen if you select your own interfaces. My advice is : don't ).
The second one :
Go to Diagnostics > Packet Capture. Set the capture options :
Btw : 192.168.1.6 is the IP of my device, change it for your device IP.
And hit start at the bottom.
Browse some random sites on your device.
Hit Stop. You should now see the captured packets.
You can now switch "Level of detail" to High, and redo a capture.
Take note : if there is no traffic from your device, the capture is empty. This means the traffic never reached pfSense.
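The same capture can also be done from the pfSense shell with tcpdump, for those who prefer the command line over the GUI page. A minimal sketch: `dns_filter` is a made-up helper that builds the capture filter, and `igb1` is only an assumed LAN interface name, so replace it with your own.

```shell
#!/bin/sh
# Sketch only. Builds a capture filter for DNS traffic to/from one device.
# dns_filter is a hypothetical helper name.
dns_filter() {
    printf 'host %s and port 53\n' "$1"
}

# On the pfSense shell; igb1 is an assumed LAN interface name:
#   tcpdump -n -i igb1 -c 50 $(dns_filter 192.168.1.6)
# -n  : do not resolve names (avoids extra DNS traffic while debugging DNS)
# -c  : stop after 50 packets
```

As with the GUI capture, an empty result while the device is browsing means its DNS requests never reach pfSense.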
edit : I'm not a fan of "Service Watchdog".
unbound never died on me. Nor did any other package.
Like everybody else, as a pfSense admin, I'm very capable of rendering my pfSense unstable.
For the last 10+ years, I always managed to get a stable situation back by applying the default settings, and/or removing gadgets.
If, on some device, unbound was unstable, I would shift to plan B right away : deactivate unbound and activate the forwarder, dnsmasq, as it is still available.
@bingo600 said in DNS Resolver needs a constant reboot to work:
I have no idea why unbound (dns resolver) is stopping,
unbound, when launched as a background process, never stops.
It will stop if it was signaled by some other process to stop - see the DHCP leases event mentioned above.
Or some interface went away, or came back. Like a WAN interface that changes its WAN IP.
unbound can, like any piece of software, crash. Most of the time it will log what happened in its log file (DNS Resolver). This is an important piece of information and should be posted on the forum, so a workaround can be proposed.
-
@bingo600 said in DNS Resolver needs a constant reboot to work:
This doesn't fix the issue as of: Why unbound is stopping.
But it does fix the "DNS outage"/Bingo
To be honest, that is really all I care about. I'm a home user and I don't have time to reverse engineer the challenges of pfSense. I used to be into that but it just isn't worth it to me anymore. The volume of unbound posts kinda says something just isn't right, and it isn't idiot users. That's why I bought a Netgate five years ago: I wanted it to be truly turnkey. I have been looking at alternatives to my Netgate lately because of this issue. I just want to buy my way out of unreliable Internet. I just want my internet to work. I haven't done anything crazy with my pfSense configuration and I find this truly maddening. I used to praise pfSense over the years, and this DNS Resolver failing issue, which seems to have sprung up with later releases, is maddening because I haven't done anything special or funny.
I just set up watchdog and I have it set to notify me when it restarts unbound. I'm curious to see if I get emailed often or not.
-
@scottlindner said in DNS Resolver needs a constant reboot to work:
....... I just want my internet to work.
And I believe you.
I'm running a company behind mine, that is, it's a hotel, so I share (captive portal) my connection with my clients.
@scottlindner said in DNS Resolver needs a constant reboot to work:
This is becoming unusable and I don't have a clue what to do to fix it.
That's why I was answering with 'some' details.
I guess testing them will take you less time than it took me to write up the post.
You clearly stated in your second post that resolving worked for pfSense, but not for your desktop device.
That is, IMHO, uncommon.
If you do 'nothing', then at least : use 100 % Netgate Resolver default settings.
But uncheck "DHCP Registration" on that Resolver settings page; you probably understand by now why I say this.
Btw : is a 2220 an amd or arm device ?
-
@gertjan said in DNS Resolver needs a constant reboot to work:
@scottlindner said in DNS Resolver needs a constant reboot to work:
....... I just want my internet to work.
And I believe you.
I'm running a company behind mine, that is, it's a hotel, so I share (captive portal) my connection with my clients.
Ya. So you have a lot more responsibilities than I do, but it's also your paid job. This is my home network, and as a single dad of three kids who play hockey, coaching all three of their teams, I really don't have the time anymore to be a network engineer like I used to. That is why I bought the Netgate to begin with: I wanted turnkey. It used to be super reliable and now it isn't. I haven't changed anything. It just became unstable. I have restarted from default several times and this issue keeps coming back, so I'm feeling it's newer versions of pfSense... somehow.
@scottlindner said in DNS Resolver needs a constant reboot to work:
This is becoming unusable and I don't have a clue what to do to fix it.
That's why I was answering with 'some' details.
I guess testing them will take you less time than it took me to write up the post.
I had started a reply appreciating the great detail you put in. Was doing some digging here first.
You clearly stated in your second post that resolving worked for pfSense, but not for your desktop device.
That is, IMHO, uncommon.
Yah. That is what kinda blew me away. I didn't expect that at all.
If you do 'nothing', then at least : use 100 % Netgate Resolver default settings.
But uncheck "DHCP Registration" on that Resolver settings page; you probably understand by now why I say this.
It hasn't been checked. I haven't ever checked it. I haven't touched DNS Resolver settings... ever. I am using pfSense almost completely default. I do have a hunch though, related to some of the non-default stuff I have done. I do have two VLANs for separation of traffic: basically IoT devices that just need Internet, and everything else. The IoT network doesn't have access to the pfSense GUI. But the computer I'm using is wired on the everything-else VLAN. I have been doing this for five years, so it isn't like this is a sudden change. BUT... I have this hunch that it might be related to OpenVPN. Twice now it seems I have had stability issues after hitting my home network remotely using OpenVPN on pfSense. It's just a hunch for now, but I am starting to suspect it is related to the OpenVPN server I have configured.
Btw : is a 2220 a amd or arm device ?
Neither. It's an Intel Atom.
-
@scottlindner said in DNS Resolver needs a constant reboot to work:
Neither. It's an Intel Atom.
Like mine (Netgate SG 4100) : an Atom is known as 'amd' (the amd64 architecture).
(don't ask why an atom is known as amd ;) )
We're both using the same unbound version, unbound 1.15.0. Probably also the same pfSense Plus version, 22.05.
I'm also using the OpenVPN server.
You use VLANs : I can imagine that the slightest VLAN setting change on the SG 2220 can make your LAN unreachable.
Although : just restarting unbound wouldn't repair that. So I rule out VLAN issues.
There was a big 'what the heck is going on with unbound 1.15.0' forum thread a while back. I'll go over it, check if something matches your description.
-
@gertjan said in DNS Resolver needs a constant reboot to work:
(don't ask why an atom is known as amd ;) )
That's just the architecture it was compiled for.
And AMD was first with 64-bit.
/Bingo
-
@gertjan said in DNS Resolver needs a constant reboot to work:
(don't ask why an atom is known as amd ;) )
Gonna guess it's related to how software was packaged, built and labelled long ago when AMD and Intel had different builds and now they don't.
We're using both the same unbound version, unbound 1.15.0. Probably also the same pfSense Plus version 22.05.
Yup!
I'm using also the OpenVPN server.
You use VLANs : I can imagine that the slightest VLAN setting change on the SG 2220 can make your LAN unreachable.
Although : just restarting unbound wouldn't repair that. So I rule out VLAN issues.
Yah. I do too. And I rule out physical network issues. I have run Cat6 to all rooms and have multiple PoE APs throughout the house to ensure everything has a solid network. I feel this is truly a behavioral/configuration type of thing, but why now? Why not with earlier versions of pfSense? I have been running this Netgate since mid 2017 and this has been an issue for about a year or so.
There was a big 'what the heck is going on with unbound 1.15.0' forum thread a while back. I'll go over it, check if something matches your description.
Ya. I know a bunch of folks have had issues with unbound for various reasons. Appreciate you looking. I'm trying hard to find anything to correlate the issue with. I'm even losing confidence it's related to OpenVPN because of how quickly it happened yesterday after an unbound restart and there is nothing in the logs that makes any sense.
-
This is the thread : Slow DNS after 22.0
It talks about 'buffers' and IPv6.
Btw : a thread like that, I hate it.
I've been trying a lot to reproduce the same unbound behaviour.
I'm using an Intel Atom, a Netgate SG 4100 device. I do use IPv6 on my LANs, not that I really need it.
I'm using pfBlockerNG-devel, with a "reload feeds once a week" schedule, as I don't want unbound to get restarted every 60 minutes or so because someone somewhere added one DNSBL entry to some list I use.
The thing is : if we use the same hardware (a Netgate device) and the same software, then what is different ?
Our LAN : the cables, switches and devices.
I guess my unbound knows I'm watching him.
See here what I use so I can see what happens when. That's my unbound. It restarts a lot because, while I'm writing here, I try stuff before posting, so my unbound gets restarted.
But when I'm not poking in my pfSense, you can see unbound runs for days or weeks without a restart.
Of course, if unbound stops handling DNS for my LANs, credit card machines start to fail .... and then all hell breaks loose. As money stops coming in.
And there is of course the Unbound 1.15.0 release, and this version was later replaced by newer releases.
These newer versions came out because of 'issues'. pfSense doesn't allow us (easily) to try all these versions, it's "whatever they chose" up until the next pfSense release comes out.
What 1.15.0 changed, according to NLnet Labs, the author, was DNSSEC related.
What about shutting down DNSSEC entirely for a while ?
-
@gertjan said in DNS Resolver needs a constant reboot to work:
What about shutting down DNSSEC entirely for a while ?
I had already disabled DNSSEC a while ago, when I was trying to bang on this issue before.
Is there a way to reset DNS Resolver settings to default? I couldn't find anything. I just want to be sure I haven't done something unintentional as I have been trying to get my Internet to work without having to man the pfSense console constantly.
-
Crap. I tried upgrading to the latest experimental just to try to force my way out of this problem and now upgrading is jacked too. It failed with meaningless "unable to..." errors and now the Update system fails to check for updates. Guess I gotta rebuild this thing... again^3.
If I save off the XML is there an effective way to hack the XML for a clean install but to make sure Unbound is set to defaults? I know I can just manually rebuild everything and maybe that's what I'll do. Getting the VLANs back to working is the pivotal part for me because until that happens I gotta use a laptop connected to the Netgate.
-
@scottlindner said in DNS Resolver needs a constant reboot to work:
Crap. I tried upgrading to the latest experimental
You went from 22.05 to something new ?
I wouldn't dare do so.
Reading some posts from here : Home > pfSense Software > Development should make you think otherwise.
@scottlindner said in DNS Resolver needs a constant reboot to work:
Guess I gotta rebuild this thing... again
You have a good backup config file from when you left 22.05 ? That's the one you need.
If not, take one from here : /cf/conf/backup : look at the date / time stamp, and use the one you were using with 22.05.
Install a fresh 22.05, assign a minimal LAN + WAN, import the XML, reboot, and you'll be back to square one.
edit : extra info :
https://forum.netgate.com/topic/174248/need-help-troubleshooting-dns-after-upgrade-to-22-05
-
@gertjan said in DNS Resolver needs a constant reboot to work:
You have a good backup config file from when you left 22.05 ? That's the one you need.
If not, take one from here : /cf/conf/backup : look at the date / time stamp, and use the one you were using with 22.05.
Install a fresh 22.05, assign a minimal LAN + WAN, import the XML, reboot, and you'll be back to square one.
I just reinstalled and restored the latest config. The update check is now working. I'm kinda hopeful my DNS Resolver issues just magically go away. I did save off a copy of the config XML in the default fresh installation state so I can hack an XML file to restore certain features to default. This being the key that I was looking for in this particular thread:
<unbound>
    <enable></enable>
    <dnssec></dnssec>
    <active_interface></active_interface>
    <outgoing_interface></outgoing_interface>
    <custom_options></custom_options>
    <hideidentity></hideidentity>
    <hideversion></hideversion>
    <dnssecstripped></dnssecstripped>
</unbound>
This is the unbound config that I restored.
<unbound>
    <enable></enable>
    <active_interface>all</active_interface>
    <outgoing_interface>all</outgoing_interface>
    <custom_options></custom_options>
    <hideidentity></hideidentity>
    <hideversion></hideversion>
    <dnssecstripped></dnssecstripped>
    <port></port>
    <sslcertref>589929552f3cb</sslcertref>
    <regdhcpstatic></regdhcpstatic>
    <system_domain_local_zone_type>transparent</system_domain_local_zone_type>
    <tlsport></tlsport>
</unbound>
Would any of those differences cause issues? Seems pretty trivial and more like skeleton config rather than influential.
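To compare two such blocks without eyeballing them, the tag names can be extracted and diffed. A minimal sketch, assuming each block has been saved to its own file. The function names and file names are placeholders, and it compares tag names only, not the values inside them.

```shell
#!/bin/sh
# Sketch only. tag_names and new_tags are hypothetical helper names.

# Print the sorted, unique opening tag names found in an XML file.
tag_names() {
    grep -o '<[a-z_][a-z_]*>' "$1" | tr -d '<>' | sort -u
}

# Print tag names that appear in the second file but not in the first.
new_tags() {
    a=$(mktemp /tmp/tags.XXXXXX) && b=$(mktemp /tmp/tags.XXXXXX) || return 1
    tag_names "$1" > "$a"
    tag_names "$2" > "$b"
    comm -13 "$a" "$b"      # column 3 suppressed, column 1 suppressed: only-in-second
    rm -f "$a" "$b"
}

# Usage (file names are placeholders):
#   new_tags default-unbound.xml restored-unbound.xml
```

Swapping the two arguments lists what the restored config is missing compared to the default one, which for the blocks above would be the dnssec tag.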
-
One reason I love running my Netgate. Never would have gotten this with a traditional residential router.
-
This is interesting. After reinstalling pfSense and restoring my configuration things seem much more stable. I know it's still early to know for sure. I even shut off the reboot and unbound restart cron jobs and things are good so far. My previous pfSense installation was a lot of upgrades so I'm wondering if some configuration got left behind that a clean install took care of that doesn't show up in the XML but does on the device OS.
In the future when there is an update available for my Netgate, I'm going to do a clean install and config restore rather than upgrade.
-
I have the exact same problem here with a 5100. pfSense stops providing DNS services to everything on our internal network. unbound has not stopped. It shows it is still running. I can do a ping from the Diagnostics menu and it resolves the IP just fine and the ping works, but only from inside the pfSense GUI. I have not tried it when ssh'd into the 5100 yet.
Fortunately for us it only does this every few weeks. So when it does, I log into the GUI and restart unbound and everything works again. I am also running OpenVPN. I might be retiring OpenVPN soon for WireGuard. We will see if that changes anything.
-
@t__2 said in DNS Resolver needs a constant reboot to work:
I have the exact same problem here with a 5100. pfSense stops providing DNS services to everything on our internal network. unbound has not stopped. It shows it is still running. I can do a ping from the Diagnostics menu and it resolves the IP just fine and the ping works, but only from inside the pfSense GUI. I have not tried it when ssh'd into the 5100 yet.
Mine seems to be stable after a fresh install of pfSense. I saved off the config, installed fresh, loaded config and I seem to be fine now. Although I'm not going to say that with 100% confidence because sometimes it takes months.
I did have another similar issue but restarting Unbound didn't fix it but a reboot did. That happened twice so I set a Cron to reboot pfSense once a week and now I seem great. Again, waiting for long term to confirm.
-
@scottlindner Switched to WireGuard and removed OpenVPN from pfSense, but still had the problem. Upgraded to 23.01 on 2023-04-17 and have not had to restart unbound since. Maybe they fixed it?
-
@t__2
For me it wasn't the upgrade so much as a clean install after the upgrade. It has been totally fine since the fresh install.