Unbound seems to be restarting frequently
-
BTW - I see alot of people doing alot of really silly things with pfsense and the one thing all those people have in common it they all think their way should be fine and its got to be that pfsense is brrrrrroooooooked.
Anyway - Hope you figure it out.
-
BTW - I see alot of people doing alot of really silly things with pfsense and the one thing all those people have in common it they all think their way should be fine and its got to be that pfsense is brrrrrroooooooked.
Anyway - Hope you figure it out.
I did. Solution is not to use unbound.
-
Well - For forwarding, I agree with you there. Even if unbound worked flawlessly for you, I'd recommend dnsmasq for forwarding and unbound for recursive resolver.
-
Compare time of these:
In the DHCP log :
Apr 22 23:20:41 dhcpd: Wrote 250 leases to leases file.
Apr 22 23:20:41 dhcpd: Wrote 0 new dynamic host decls to leases file.
Apr 22 23:20:41 dhcpd: Wrote 0 deleted host decls to leases file.with these (in the Resolver log):
Apr 22 23:20:41 unbound: [68253:0] info: start of service (unbound 1.5.3).
Apr 22 23:20:41 unbound: [68253:0] info: service stopped (unbound 1.5.3).I tend to say: when leases are written, unbound is restarted.
Or: every time a new lease (unkown before) pops in, unbound is restarted.
If you have many (many !) device on your LAN (or a very short DHCP lease time like the default 7200 sec) then ….. well .... you'll see what you see.Read also https://forum.pfsense.org/index.php?topic=88240.msg514643#msg514643
-
Could be - But is that broken or is that normal function?
(I'm definitely not seeing lots of restarts BTW)
-
Compare time of these:
In the DHCP log :
Apr 22 23:20:41 dhcpd: Wrote 250 leases to leases file.
Apr 22 23:20:41 dhcpd: Wrote 0 new dynamic host decls to leases file.
Apr 22 23:20:41 dhcpd: Wrote 0 deleted host decls to leases file.with these (in the Resolver log):
Apr 22 23:20:41 unbound: [68253:0] info: start of service (unbound 1.5.3).
Apr 22 23:20:41 unbound: [68253:0] info: service stopped (unbound 1.5.3).I tend to say: when leases are written, unbound is restarted.
Or: every time a new lease (unkown before) pops in, unbound is restarted.
If you have many (many !) device on your LAN (or a very short DHCP lease time like the default 7200 sec) then ….. well .... you'll see what you see.Read also https://forum.pfsense.org/index.php?topic=88240.msg514643#msg514643
But even though it restarts, it doesn't lose any "resolving" capability, right?
-
Here is a clip from my logs:
Apr 15 20:02:42 unbound: [53580:0] info: start of service (unbound 1.5.3).
Apr 15 20:02:42 unbound: [53580:0] notice: init module 1: iterator
Apr 15 20:02:42 unbound: [53580:0] notice: init module 0: validator
Apr 15 20:02:42 unbound: [53580:0] notice: Restart of unbound 1.5.3.
Apr 15 20:02:42 unbound: [53580:0] info: server stats for thread 1: requestlist max 6 avg 3.88889 exceeded 0 jostled 0
Apr 15 20:02:42 unbound: [53580:0] info: server stats for thread 1: 9 queries, 0 answers from cache, 9 recursions, 0 prefetch
Apr 15 20:02:42 unbound: [53580:0] info: server stats for thread 0: requestlist max 21 avg 11.6667 exceeded 0 jostled 0
Apr 15 20:02:42 unbound: [53580:0] info: server stats for thread 0: 39 queries, 0 answers from cache, 39 recursions, 0 prefetch
Apr 15 20:02:42 unbound: [53580:0] info: service stopped (unbound 1.5.3).
Apr 15 20:02:42 unbound: [53580:0] info: start of service (unbound 1.5.3).
Apr 15 20:02:42 unbound: [53580:0] notice: init module 1: iterator
Apr 15 20:02:42 unbound: [53580:0] notice: init module 0: validator
Apr 15 20:02:42 unbound: [53580:0] notice: Restart of unbound 1.5.3.
Apr 15 20:02:42 unbound: [53580:0] info: server stats for thread 1: requestlist max 2 avg 1.33333 exceeded 0 jostled 0
Apr 15 20:02:42 unbound: [53580:0] info: server stats for thread 1: 3 queries, 0 answers from cache, 3 recursions, 0 prefetch
Apr 15 20:02:42 unbound: [53580:0] info: server stats for thread 0: requestlist max 0 avg 0 exceeded 0 jostled 0
Apr 15 20:02:42 unbound: [53580:0] info: server stats for thread 0: 1 queries, 0 answers from cache, 1 recursions, 0 prefetch
Apr 15 20:02:42 unbound: [53580:0] info: service stopped (unbound 1.5.3).
Apr 15 20:02:42 unbound: [53580:0] info: start of service (unbound 1.5.3).
Apr 15 20:02:42 unbound: [53580:0] notice: init module 1: iterator
Apr 15 20:02:42 unbound: [53580:0] notice: init module 0: validator
Apr 15 20:02:42 unbound: [53580:0] notice: Restart of unbound 1.5.3.
Apr 15 20:02:42 unbound: [53580:0] info: server stats for thread 1: requestlist max 0 avg 0 exceeded 0 jostled 0
Apr 15 20:02:42 unbound: [53580:0] info: server stats for thread 1: 0 queries, 0 answers from cache, 0 recursions, 0 prefetch
Apr 15 20:02:42 unbound: [53580:0] info: server stats for thread 0: requestlist max 0 avg 0 exceeded 0 jostled 0
Apr 15 20:02:42 unbound: [53580:0] info: server stats for thread 0: 0 queries, 0 answers from cache, 0 recursions, 0 prefetch
Apr 15 20:02:42 unbound: [53580:0] info: service stopped (unbound 1.5.3).
Apr 15 20:02:42 unbound: [53580:0] info: start of service (unbound 1.5.3).
Apr 15 20:02:42 unbound: [53580:0] notice: init module 1: iterator
Apr 15 20:02:42 unbound: [53580:0] notice: init module 0: validator
Mar 29 23:23:26 unbound: [88947:0] info: start of service (unbound 1.5.3).
Mar 29 23:23:26 unbound: [88947:0] notice: init module 1: iterator
Mar 29 23:23:26 unbound: [88947:0] notice: init module 0: validator
Mar 29 23:23:26 unbound: [88947:0] notice: Restart of unbound 1.5.3.
Mar 29 23:23:26 unbound: [88947:0] info: 2.000000 4.000000 1
Mar 29 23:23:26 unbound: [88947:0] info: 1.000000 2.000000 1
Mar 29 23:23:26 unbound: [88947:0] info: 0.524288 1.000000 8
Mar 29 23:23:26 unbound: [88947:0] info: 0.262144 0.524288 50
Mar 29 23:23:26 unbound: [88947:0] info: 0.131072 0.262144 91So, to me, it looks like its only really restarting when its supposed to to deal with attacks against it, badly formed DNS entries etc etc.
It went 15 days without a restart from Mar 29 - April 15 and whats today? 23rd? so its been 8 days since last restart of unbound?Doesn't seem too broken to me….
However, I really do think setting matter. Alot.
-
I'm guessing you have "Register DHCP leases in the DNS Resolver" disabled, or you don't have DHCP leases with dynamic address.
I've just turned this off in my configuration, and waiting/watching to see if it resolves my issues.
-
Nope - I have those things enabled. Both of them.
But I don't have IP conflicts and nor am I connected to one machine with both wire and wireless.
And the general settings matter and so does the advanced setting in unbound.
Without looking at people's pfsense with my own eyes and going through all the setting its hard to know what is causing some people problems.
-
Nope - I have those things enabled. Both of them.
Hmm… only one of these things is a setting (in the General tab) that can be enabled. The other is a dynamic artifact of the environment and DHCP mapping.
But I don't have IP conflicts and nor am I connected to one machine with both wire and wireless.
I also do not have this.
And the general settings matter and so does the advanced setting in unbound.
Just for clarity, can you explain what you are referring to in the settings? In particular, is there anything other than "Enable DNSSEC Support" and "Register DHCP leases in the DNS Resolver" (mentioned above) that you are referring to?
Thanks
-
I can tell you my settings…
Enabled:
DNSSEC
DHCP Registration
Static DHCP
In advanced Settings besides what came enabled, I also Enabled:
Hide Identity
Hide Version
Prefetch Support
Prefetch DNS Key Support
Harden DNSSEC data
Unwanted Reply Threshold 10 millionI have no DNS IPs in the general settings of pfsense. None.
-
But even though it restarts, it doesn't lose any "resolving" capability, right?
Depends. Often it takes about 30 seconds after a restart before it starts resolving, long enough to be annoying since it happens all the time on some routers.
-
But even though it restarts, it doesn't lose any "resolving" capability, right?
Depends. Often it takes about 30 seconds after a restart before it starts resolving, long enough to be annoying since it happens all the time on some routers.
Interestingly, I have two exact copies of pfSense now, but different hardware. unbound bugs out on one of them (config is 100% equal, I differed the conf)
Works: atom, 2gb, 1 intel pro nics
Fails: i5 4 cores, 8gb, intel pro nicsIt's not the nice, because they're physically the same, I switch between tests.
Adding more memory shouldn't be a problem either.
So is it the fact that it's 4-cores? I guess more things can go wrong if there's issue with synchronization code, but that's just speculation.
At any rate, an interesting result. Note that the only problem on the 4-core is unbound, everything else works great in pfSense.
wtf
-
Another one here:
I got 2.2.2 running on different hardware (Dell Server, "normal" PC, etc.). All with the same issue that unbound keeps restarting. Single WAN and multi WAN configs. On all of them there are issues with unbound restartingI'm using unbound for three main reasons:
1. DNSsec
2. Prefetch
3. Recursive queriesThe bad thing is that the constant restarts (probably 10-20 per day) basically make the cache and the Prefetch and also the cache useless and it is a big hit on overall DNS performance.
I was reading the whole thread and also the linked ones and I'll try a few things now but I really want to me to add myself to the list of users so the devs are aware: There is a problem. :'(Btw, on a big test network I got the unbound service running on a Windows 7 box. I'm using the same config as in pfSense. Guess what… It's running completely stable. No restarts or whatsoever.
-
Btw, on a big test network I got the unbound service running on a Windows 7 box. I'm using the same config as in pfSense. Guess what… It's running completely stable. No restarts or whatsoever.
This Windows box is also the DHCP server ?
Without DHCP serving, unbound won't restart on pfSEnse. -
Btw, on a big test network I got the unbound service running on a Windows 7 box. I'm using the same config as in pfSense. Guess what… It's running completely stable. No restarts or whatsoever.
This Windows box is also the DHCP server ?
Without DHCP serving, unbound won't restart on pfSEnse.No, it's no DHCP server and even if it was I doubt that would make a difference. There SHOULD be no reason for a DHCP server to be able to restart a DNS service (unless there is something in the code that forces/allows it which I expect is the issue with pfSense).
So even if I added a DHCP server to the Windows 7 machine I'm nearly certain it wouldn't make a difference. -
(unless there is something in the code that forces/allows it which I expect is the issue with pfSense).
And there is.
Read my posts above, or look at the (PHP) code: when the system hosts file changed (because of a new entry, put in by the DHCP server) then unbound is reloaded …. (it shows in the logs : stopped -> started).Running unbound on pfSEnse without DHCP server (so the system hosts file is nearly static), unbound will NOT get reloaded ....
-
Differences between your and my configuration are the DHCP parameters and the prefetch.
I will set prefetch aside as I doubt it is relevant to the issue at hand.
Static DHCP registration would only have affect at initial startup or static table edit, so I think we can set that aside as well.
Dynamic DHCP address registration would appear to be the key issue. Based on the logs you posted, you have essentially no dynamic DHCP activity in your network. None between March 29 and April 15. The people experiencing the problem have lots of dynamic DHCP activity.
I don't think it's your settings that make you immune, I think it's your environment.
As an experiment, I turned off DHCP registration last night. I have not had a issue since. This isn't how I want to run, but I am more interested in general DNS reliability than being able to resolve dynamic client names.
I can tell you my settings…
Enabled:
DNSSEC
DHCP Registration
Static DHCP
In advanced Settings besides what came enabled, I also Enabled:
Hide Identity
Hide Version
Prefetch Support
Prefetch DNS Key Support
Harden DNSSEC data
Unwanted Reply Threshold 10 million -
I am using static leases in DHCP for everything that is normally present (minus wireless for android devices and visitors etc)
I wouldn't want to register every DHCP client in DNS if I was hosting open free wifi to 100s of random users though.
I just can't imagine the need. Why?
-
This bug is incredibly annoying, as resolves fails for quite a few minutes after restart.
Is there a ticket on this, or is it being worked on already?