Frequent unbound restarts
-
Does anyone know if this has been fixed in the latest update, 2.4.5-p1?
-
@jt said in Frequent unbound restarts:
anyone knows if this has been fixed in the latest update 2.4.5-p1?
Everybody knows.
See here, just above. Again: until [NLnet Labs](https://nlnetlabs.nl/projects/unbound/about/) (the authors) rewrite unbound to implement something that could be the solution, this won't happen.
As such, it's not a (pfSense) bug. At most, one could say that unbound is good, but not perfect.
The easy workaround is: declare static (MAC-based) DHCP leases for all the devices that you need to address 'by name'; these devices often host services accessed from your LAN.
-
This is completely wrong. Shutting off services in the device to fix outages is not acceptable. If a service is not supported, don't offer it as a feature.
Also, there is no problem with unbound. I repeat, as people don't seem to be getting this: THERE IS NO PROBLEM WITH UNBOUND. The problem is with the other software. Unbound has a way to reload specific zones via a command. The DHCP lease scripts incorrectly send a HUP to the process. THIS IS A BUG. It's not a design choice, it's not a different way to do it. It's broken, incorrectly written software. A lease can never affect any zone other than the local one, so only the local zone should be reloaded.
If one pays for pfSense, is the support any better than this forum? Because this forum dismisses seemingly every problem. IPv6 support is also broken and no one cares about that either.
-
@jasonArloUser said in Frequent unbound restarts:
This is completely wrong.
I need workarounds (we all do, I guess), as the issue can't be solved easily.
In another thread I showed the part of the unbound source code that handles OS signals such as SIGHUP.
Whatever you send to unbound, there is no "reload the config" functionality: it restarts.
That is why I tend to say: do the next best thing and limit the number of SIGHUPs sent to unbound. Which means: stop the DHCP registration. Not perfect, I know, because now the admin has some work to do to compensate for this initial loss of functionality: let's say 30 seconds of work for every device in the network. I mean: add DHCP static leases for every device that needs a known host name in your network.
@jasonArloUser said in Frequent unbound restarts:
THERE IS NO PROBLEM WITH UNBOUND
I prefer to say: it's not perfect ;)
What I know is that the process that hands new DHCP leases to the DNS subsystem, known as "dhcpleases", is working as it should. It signals the current DNS system (unbound in our case) that a new host name is available.
dhcpleases doesn't say "here is xxx.localdomain.local with IP a.b.c.d".
This process watches the file /var/dhcpd/var/db/dhcpd.leases, maintained by the dhcpd server daemon.
When that file changes, dhcpleases parses /var/dhcpd/var/db/dhcpd.leases, rewrites /var/unbound/dhcpleases_entries.conf, gets the PID of unbound by reading /var/run/unbound.pid and sends a SIGHUP to it. Unbound then parses all the config files again ... by restarting.
It's not smart enough to detect that only a line (or more) was added to or removed from (or a combination) the /var/unbound/dhcpleases_entries.conf file .... Do you see this differently?
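The signalling step described above can be sketched in Python. This is not dhcpleases' actual C code; it only mirrors the described behaviour (read the PID from a pid file such as /var/run/unbound.pid, then send SIGHUP). The helper name `hup_from_pidfile` is hypothetical.

```python
import os
import signal


def hup_from_pidfile(pidfile: str) -> int:
    """Hypothetical helper: send SIGHUP to the PID stored in `pidfile`.

    Mirrors the step described above (read the daemon's pid file, then HUP
    it). Returns the PID that was signalled.
    """
    with open(pidfile, "r", encoding="ascii") as fh:
        # A pid file holds a single decimal PID, usually newline-terminated.
        pid = int(fh.read().strip())
    # os.kill raises ProcessLookupError if no such process exists.
    os.kill(pid, signal.SIGHUP)
    return pid
```

On the firewall this would be called as `hup_from_pidfile("/var/run/unbound.pid")` with sufficient privileges. A process without a SIGHUP handler is simply terminated by this signal; unbound installs a handler and, per the analysis in this thread, restarts in response.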
I obtained the "how it works" by reading the code (the C language is part of my professional education). Still, I could be wrong ^^
Keep in mind: unbound is a lightweight DNS resolver with DNSSEC and forwarding capabilities. It does not advertise more.
It's not BIND (BIND is huge ...).
Btw: I'm not judging the system as a whole, just trying to understand how it works. I like to understand why things happen. It's a necessary step if solutions are to be found.
Edit: in the past, and in the present, unbound stops and starts fast.
Of course, when it restarts, its DNS cache is gone ..... not good, but many didn't notice, so OK ...
Then someone came along and invented "DNSBL" for pfSense: a way to populate the local DNS resolver with pre-built replies that made it possible to screen out some IPs and domain names. You know the one I'm talking about.
The size of the config files read by unbound at startup exploded. Before, several kilobytes was usual. Now, check the forum: people don't even blink when they explain that they have a million or more DNSBL entries in their unbound config files .... Why stop at a certain size if you can have it all? "Just select them all." The consequences are unknown, and thus nonexistent .... The poor unbound has to parse them all at startup ..... because a stupid DHCP lease came in.
And during that time, tens of seconds, no more DNS .... and that was noticed. And here we are.
So, according to you: whose fault is this? ;)
-
@Gertjan said in Frequent unbound restarts:
Keep in mind : unbound is a light weight DNS Resolver with DNSSEC and forward capabilities. It does not advertise more.
It's not 'bind'. (bind is huge ...)
True, but Netgate encourages people to use this as their DNS solution. Another way to read it is that they sell $5300 devices that can't even serve DNS properly.
The poor unbound has to parse them all at startup ..... because a stupid DHCP lease came in.
And during that time, tens of seconds, no more DNS .... and that, that was noticed. And here we are.
So, according to you : who's fault is this ? ;)
It is true that a typical "fat" client with a DNS cache has no issues with DNS not being available for a few moments. However, on my network, with numerous embedded devices & sensors (and yes, even a Google Home device) which do NOT do any DNS caching but always do a DNS lookup, we have continuous measurement interruptions every time Unbound does a restart, even if it only takes 500 ms on my device.
That is an operational impact for Netgate device customers (not people running some free version of pfSense somewhere), for an issue that has already been known for three years (since the start of this thread).
BTW: Thanks for your investigative work, it was an interesting read!
-
Hello!
On a very small network (SG-3100) with 10 or so regular devices (devices not coming and going) with DHCP leases, I am seeing around 10 DNS restarts per hour initiated by dhcpd ("pfSense dhcpleases: Sending HUP signal to dns daemon"). Stock out-of-the-box Snort and pfB. No feed or rule craziness. I am not getting any blowback from users about internet flakiness.
Does dhcpd restart DNS on lease renewals or on DHCPREQUEST/DHCPACK traffic? I can't tell from the logs. It is hard to match those restarts to any specific dhcpd activity.
The default lease time is only 2 hours. On a large network with lots of leases, could that mean many restarts? Maybe a longer lease time?
John
-
Among all the things I love about pfSense, unbound is the one thing I have to say is the biggest letdown for me. I agree that the blame is not solely on unbound, since it is being bombarded by large DNS block lists in these worst-case scenarios. Overall though, if I could have one thing improved, it would be unbound. Even in the best-case scenario of an install without pfBlocker, your DNS cache would be wiped every time a DHCP lease is requested (assuming DHCP registration is enabled). This is not a huge deal since startup would be fast and a delay in a DNS query may not even be noticed, but that defeats the purpose of a DNS cache. You're left having to choose between a proper DNS cache and DHCP registration. I personally don't care about DHCP registration, but that shouldn't have to be a choice.
Edit: The other options mentioned, BIND or a separate DNS server, would be a solution. I opted for neither. As much as I would like to take on a new project like setting up a separate Pi-hole server, I'd rather not add more devices and complexity to my network to solve a problem that's not that big of a deal.
-
@TimJacobs said in Frequent unbound restarts:
Another way to read it is that they sell $5300 devices that can't even serve DNS properly.
I ... understand what you are saying.
And it's easy to stay on the positive side of things: with that budget, and probably some big network behind it, I would go even bigger: separate DNS and DHCP, and a separate DNS ... proxy? => another device, etc.
But I agree with your words:
HIGH AVAILABILITY XG-1541 1U Security Gateway with pfSense software
Now ....
Let's do the maths. First things first: the performance/cost ratio. And pfSense implodes with a division-by-zero error ..... (it's free, so it's infinite, nothing, ....). I tend to say: Netgate has another product .... ;)
Also, there is another solution (which would scare the hell out of a lot of people): stop unbound.
Install BIND, using only its own config files and disabling all GUI DNS/BIND-related settings.
But such a solution doesn't really belong on this forum ....
Edit: and permit me, it's Friday after all: you're about to buy that Ferrari and you're arguing about the free espresso machine that comes with it?
@serbus said in Frequent unbound restarts:
10 dns restarts per hour
I would investigate.
At least, see who (what) restarted unbound.
It is not necessarily a DHCP event.
Just compare (all) the logs at the same event time.
Quick test on my side:
clog /var/log/resolver.log | grep 'Restart'
.....
Jun 18 12:52:18 pfsense unbound: [32889:0] notice: Restart of unbound 1.10.1.
Jun 20 10:24:23 pfsense unbound: [3443:0] notice: Restart of unbound 1.10.1.
That's 6 days ago ...
Probably not good either. I'm not sure.
@serbus said in Frequent unbound restarts:
Does dhcpd restart the dns on lease renewals or DHCPREQUEST/DHCPACK traffic?
That one is on my want-to-know-for-sure investigation list!!
My guess: because a DHCP renewal does not (should not? can't? am I wrong?) change either the IP or the host name, the existing DNS data simply stays valid for longer, so no SIGHUP is needed because no "file" changes.
Rather easy to check: on a Windows device, open the "cmd" command prompt and run ipconfig /renew
and check the Resolver log.
I bet: no.
For me, unbound does not get restarted by any DHCP event, but that's because I do not register DHCP leases (I gave them all static DHCP leases).
@serbus said in Frequent unbound restarts:
The default lease time is only 2hrs
Mine is set to half a day, so renewals take place every 6 hours.
-
Hello!
Virtually all of my 10+/hr unbound restarts come from dhcpd -> dhcpleases for no apparent reason.
A typical log dump looks like this:
Jun 26 12:55:55 dhcpleases Sending HUP signal to dns daemon(61316)
Jun 26 12:55:55 dhcpd DHCPACK on 192.168.0.135 to e0:37:bf:87:b5:ff via mvneta1
Jun 26 12:55:55 dhcpd DHCPREQUEST for 192.168.0.135 from e0:37:bf:87:b5:ff via mvneta1
Jun 26 12:47:21 dhcpleases Sending HUP signal to dns daemon(61316)
Jun 26 12:47:21 dhcpd DHCPACK on 192.168.0.51 to 18:16:c9:34:79:1b (DIRECTV-HR54-C9347919) via mvneta1
Jun 26 12:47:21 dhcpd DHCPREQUEST for 192.168.0.51 from 18:16:c9:34:79:1b (DIRECTV-HR54-C9347919) via mvneta1
Jun 26 12:38:00 dhcpleases Sending HUP signal to dns daemon(61316)
Jun 26 12:38:00 dhcpd DHCPACK on 192.168.0.172 to 1c:4d:66:e0:cf:7e via mvneta1
Jun 26 12:38:00 dhcpd DHCPREQUEST for 192.168.0.172 from 1c:4d:66:e0:cf:7e via mvneta1
Jun 26 12:37:03 dhcpleases Sending HUP signal to dns daemon(61316)
Jun 26 12:37:03 dhcpd DHCPACK on 192.168.0.145 to b4:2a:0e:af:b8:af via mvneta1
Jun 26 12:37:03 dhcpd DHCPREQUEST for 192.168.0.145 from b4:2a:0e:af:b8:af via mvneta1
These are NOT new leases or even lease renewals... they are just devices that already have a lease "checking in".
Dhcpleases appears to perform this reload on any kevent with a NULL command. Not sure why dhcpd is sending that command.
There are some REQ/ACK pairs that do not result in dhcpleases reload.
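A rough way to line up each dhcpleases HUP with the dhcpd messages logged in the same second, as in the dump above. This is a sketch that assumes the "timestamp, program, message" layout shown by the GUI log view; raw syslog files also carry a hostname and `prog[pid]:` prefixes, so the regex would need adjusting there. The function name is hypothetical.

```python
import re
from collections import defaultdict

# "Jun 26 12:55:55 dhcpd DHCPACK on ..." -> (timestamp, program, message)
LINE_RE = re.compile(r"^(\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) (\S+) (.*)$")


def hups_with_dhcp_context(lines):
    """Map each second in which dhcpleases sent a HUP to the dhcpd
    messages logged in that same second (empty list if none)."""
    dhcpd_by_ts = defaultdict(list)
    hup_times = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines in any other layout
        ts, prog, msg = m.groups()
        if prog == "dhcpleases" and "Sending HUP" in msg:
            hup_times.append(ts)
        elif prog == "dhcpd":
            dhcpd_by_ts[ts].append(msg)
    return {ts: dhcpd_by_ts.get(ts, []) for ts in hup_times}
```

Fed the dump above, every HUP lines up with a DHCPREQUEST/DHCPACK pair for an address that already had a lease, which is the pattern the post describes.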
I don't see any way, or support, for cranking up the dhcpd/dhcpleases debug logging.
John
-
Hello!
The DHCP logging is not great, but a capture showed my 10 or so devices renewing/extending at the halfway point of the 2-hour lease. So, around 10 DHCP-related DNS restarts per hour.
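The arithmetic behind this observation, as a small sketch: if every renewal triggers one HUP and clients renew at the halfway point (RFC 2131's default T1 is 0.5 of the lease time), the restart rate is the device count divided by half the lease time. The function name is illustrative.

```python
def restarts_per_hour(devices: int, lease_hours: float, renew_fraction: float = 0.5) -> float:
    """Estimate dhcpleases-triggered unbound restarts per hour.

    Assumes every renewal produces one HUP (as the logs in this thread suggest)
    and that each client renews once `renew_fraction` of its lease has elapsed
    (0.5 is the usual DHCP T1 default).
    """
    if devices < 0 or lease_hours <= 0 or not 0 < renew_fraction <= 1:
        raise ValueError("need devices >= 0, lease_hours > 0, 0 < renew_fraction <= 1")
    renew_interval_hours = lease_hours * renew_fraction
    return devices / renew_interval_hours


print(restarts_per_hour(10, 2))   # 10.0 with 2-hour leases
print(restarts_per_hour(10, 12))  # about 1.67 with half-day leases
```

With the half-day lease mentioned earlier in the thread, the same 10 devices would cause fewer than two restarts an hour, which is the trade-off behind the "longer lease time" question.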
John
-
One solution would be for the DHCP service to write new leases to the DHCP lease file but, instead of restarting unbound, use unbound-control(8) to notify unbound about the new leases.
As I posted earlier, that's pfBlockerNG's strategy with Live Reload enabled.
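What that could look like for a single lease, sketched in Python: the helpers only build the unbound-control(8) argument lists (`local_data` / `local_data_remove`) for a forward record and its PTR record, without running them. The function names are hypothetical, and on pfSense the control command may need extra options such as `-c /var/unbound/unbound.conf` (an assumption; pass them through `ctl`).

```python
import ipaddress


def lease_add_commands(hostname, ip, domain, ctl=("unbound-control",)):
    """Hypothetical helper: build (not run) unbound-control argument lists that
    publish one lease as a forward record plus its PTR record."""
    addr = ipaddress.ip_address(ip)
    rtype = "A" if addr.version == 4 else "AAAA"
    fqdn = f"{hostname}.{domain}."
    ptr = addr.reverse_pointer + "."  # e.g. 120.1.168.192.in-addr.arpa.
    return [
        [*ctl, "local_data", fqdn, "IN", rtype, str(addr)],
        [*ctl, "local_data", ptr, "IN", "PTR", fqdn],
    ]


def lease_remove_commands(hostname, ip, domain, ctl=("unbound-control",)):
    """Hypothetical helper: argument lists that withdraw the same two names."""
    addr = ipaddress.ip_address(ip)
    return [
        [*ctl, "local_data_remove", f"{hostname}.{domain}."],
        [*ctl, "local_data_remove", addr.reverse_pointer + "."],
    ]
```

Each list can be handed to `subprocess.run(cmd, check=True)`; unbound keeps running and its cache survives, unlike with a HUP.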
-
@RonpfS said in Frequent unbound restarts:
Like I posted earlier,
I think I recall my answer to what you said back then.
Something like "then where is this smart reloading?". I guess I fell for the "can't find it, so it isn't there" way of thinking.
Well, I was very wrong.
I found the DNSBL reloading using unbound-control in pfBlockerNG.
Lots of conditions apply; otherwise you get:
Reloading Unbound Resolver..... completed [ 06/27/20 12:14:16 ]
which means: unbound is restarted. No Liveupdate.
If this condition isn't met, unbound will get restarted and unbound-control won't be used:
Once the TLD Domain limit below is exceeded, the balance of the Domains will be listed as-is. IE: Blocking only the listed Domain (Not Sub-Domains)
TLD Domain Limit Restrictions:
< 1.0GB RAM - Max 100k Domains
< 1.5GB RAM - Max 150k Domains
< 2.0GB RAM - Max 200k Domains
< 2.5GB RAM - Max 250k Domains
< 3.0GB RAM - Max 400k Domains
< 4.0GB RAM - Max 600k Domains
< 5.0GB RAM - Max 1.0M Domains
< 6.0GB RAM - Max 1.5M Domains
< 7.0GB RAM - Max 2.5M Domains
> 7.0GB RAM - > 2.5M Domains
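The table above as a lookup, purely to make the thresholds concrete. It reads each bound exactly as printed (strict "<"); how pfBlockerNG measures installed RAM is not shown here, so, under that assumption, a box sold as 2 GB that reports slightly less usable memory would land in the 200k row, which a 220K+ list exceeds. The names are illustrative.

```python
# (RAM upper bound in GB, max TLD domains) as listed in the pfBlockerNG text above.
TLD_LIMITS = [
    (1.0, 100_000),
    (1.5, 150_000),
    (2.0, 200_000),
    (2.5, 250_000),
    (3.0, 400_000),
    (4.0, 600_000),
    (5.0, 1_000_000),
    (6.0, 1_500_000),
    (7.0, 2_500_000),
]


def tld_domain_limit(ram_gb: float):
    """Return the listed max domain count for `ram_gb`, reading each bound as a
    strict '<'. Returns None at 7.0 GB and above, where no fixed cap is listed."""
    for bound, limit in TLD_LIMITS:
        if ram_gb < bound:
            return limit
    return None


print(tld_domain_limit(1.9))  # 200000
```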
I have a 220K+ domain list: "Liveupdate" never ran for me because my pfSense has 2 GB of RAM.
And you're right, unbound-control could be used to pass "DHCP lease DNS details" into unbound to insert/update/remove them. It's pretty straightforward.
I guess it's a matter of replacing the "dhcpleases" process with a shell script that uses unbound-control.
No more need for "/var/unbound/dhcpleases_entries.conf": just inject all active leases found into unbound, and done.
-
I lowered the number of DNSBL entries:
= somewhat less than 70000 entries: DNSBL = 43802 to be exact. See below.
......
TLD finalize... completed [ 06/29/20 00:00:20 ]
Saving DNSBL database... completed
Resolver Live Sync analysis... completed [ 06/29/20 00:00:21 ]
Resolver Live Sync finalizing:
  Remove local-zone(s): removed 270 zones
  Remove local-data(s): removed 205 datas
  Add local-zone(s): added 606 zones
  Add local-data(s): added 393 datas
Resolver Live Sync... completed [ 06/29/20 00:00:22 ]
DNSBL update [ 43799 | PASSED ]... completed
DNSBL DEBUG..[ Data(s): 43802 Zone(s): 32065 | 06/29/20 00:00:25 ]
....
These are synced in with unbound-control - NOT restarting unbound.
My last unbound restart :
Jun 20 10:24:23 pfsense unbound: [3443:0] notice: Restart of unbound 1.10.1.
That's more than a week ago. So I'm pretty sure even pfBlockerNG-devel doesn't restart unbound any more.
This method should be used for syncing DHCP leases in (and out?), and the issue would be gone.
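Syncing in *and* out amounts to diffing the previous lease snapshot against the new one, so only changed names touch unbound. A minimal sketch, assuming IPv4 A records and snapshots held as `{fqdn: ip}` dicts; it produces lines for unbound-control's `local_datas` and `local_datas_remove` commands, which read one entry per line from stdin (check unbound-control(8) for your unbound version). The function name is hypothetical.

```python
def lease_diff(old: dict, new: dict):
    """Compare two {fqdn: ipv4} snapshots of the registered leases.

    Returns (to_add, to_remove): RR lines for `unbound-control local_datas`
    and names for `unbound-control local_datas_remove`. A name whose address
    changed appears in both lists, so removals must be applied first.
    """
    to_remove = sorted(name for name, ip in old.items() if new.get(name) != ip)
    to_add = sorted(f"{name} IN A {ip}" for name, ip in new.items() if old.get(name) != ip)
    return to_add, to_remove
```

Applying it could look like `subprocess.run(["unbound-control", "local_datas_remove"], input="\n".join(to_remove) + "\n", text=True, check=True)`, followed by the same call with `local_datas` and `to_add`. An unchanged lease set yields two empty lists, so a plain renewal would not touch unbound at all.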
The "dhcpleases" process could be replaced with a (shell) script that parses the DHCP leases and puts them "into" unbound, instead of the less sophisticated "reload/restart".
-
@Gertjan I got back too late to answer your previous response so I'll answer this one. I believe unbound is doing the right thing: when it gets a HUP signal it restarts. That's standard Unix behaviour. dhcpleases has a bug in that it sends a HUP signal when it can only ever affect the local zone. Since it affects only the local zone, it should be using unbound-control to reload that zone only. In the worst case you can do "local_zone_remove <local zone>" and then add it back. There is no reason to restart the entire server since only a single zone will be affected by DHCP leases.
For pfSense, it makes sense that it sometimes reloads the whole server because it can potentially modify any zone. DHCP cannot.
EDIT: Oh, never mind. Didn't properly read your last post. We're on the same page now. :)
-
Since there is a simple way to fix this extremely frustrating problem, is there any progress on it?
Like, please Netgate, one of the most useful features is not working.
-
What is not working?
This thread highlights two possible reasons why unbound is restarting.
It restarts because it has to, in order to take changes into account.
@sotirone said in Frequent unbound restarts:
Since there is a simple way to fix this extremely frustrating problem, is there any progress on it?
I don't get it.
What progress is needed if a simple fix already exists?
An even simpler fix?
You agree with me that there will never be a program/service/functionality/etc. that works for every type of scenario?
-
@Gertjan Our problem is mostly caused by dhcpleases sending a HUP signal and triggering the unbound reload.
So, changing the code to use unbound-control instead of sending a HUP signal is the easy fix, no?
-
I would call this "easy": remove the check from here:
Changing https://github.com/pfsense/FreeBSD-ports/blob/9e4bb79caf876017a31a09176947b88d69588f1b/sysutils/dhcpleases/files/dhcpleases.c
is less straightforward.
Because using unbound-control to administer unbound's internal structures and records is less documented / less known (my opinion).
One minor glitch and the entire local DNS goes down ... or contains wrong info, which is even worse.
This small program, called dhcpleases, is a process that keeps running as a daemon and wakes up when the dhcpd.leases file changes, as happens when the DHCP server 'changes' something, like when a new lease is assigned or a lease is renewed. Note that the latter case doesn't actually change the IP, and thus doesn't change the required DNS info.
All active leases are then formatted for the system /etc/hosts file and copied over to that file. When done, unbound is signalled (HUPped). Upon receiving this signal, unbound .... restarts itself.
If dhcpleases used unbound-control, it could pass the 'new' or 'renewed' host name and IP address from the lease info to the unbound process, one by one (!).
I'm wondering:
Say we have a big network with many devices, like 48 PCs hooked up to a 48-port switch, and this switch powers down and up. The 48 PCs will receive an "interface UP" signal, and each will fire a DHCP client request at the DHCP server, pfSense.
48 renewed (== updated) leases will get written, actually updated, to the /var/dhcpd/var/db/dhcpd.leases file in rapid succession.
Parsing /var/dhcpd/var/db/dhcpd.leases and adding the distilled content to the /etc/hosts file is one thing.
Firing 'unbound-control', a big process by itself, to transmit just one changed host name + IP address, multiplied by 48 because 48 PCs are renewing, is another thing.
Note: I activated "DHCP Registration" in the Unbound config settings, and it looks like dhcpleases contains a minor bug: the DHCP info is added 3 (identical) times:
Have a look at my /etc/hosts file:
127.0.0.1 localhost localhost.my-pfsense.domain
::1 localhost localhost.my-pfsense.domain
192.168.1.1 pfsense.my-pfsense.domain pfsense
2001:470:1f13:5c0:2::1 pfsense.my-pfsense.domain pfsense
188.165.53.87 ns1.my-pfsense.domain ns1
2001:41d0:2:927b::3 ns1.my-pfsense.domain ns1
10.10.10.1 pfb.my-pfsense.domain pfb
192.168.2.1 portal.my-pfsense.domain portal
# dhcpleases automatically entered
192.168.1.121 DESKTOP-SHTTTGB.my-pfsense.domain DESKTOP-SHTTTGB # dynamic entry from dhcpd.leases
192.168.1.120 Gauche2.my-pfsense.domain Gauche2 # dynamic entry from dhcpd.leases
192.168.1.119 iPhone6sdeVera.my-pfsense.domain iPhone6sdeVera # dynamic entry from dhcpd.leases
192.168.1.111 iPhone11deVera.my-pfsense.domain iPhone11deVera # dynamic entry from dhcpd.leases
192.168.2.92 Jerome-EGC.my-pfsense.domain Jerome-EGC # dynamic entry from dhcpd.leases
192.168.2.139 Air-van-dirk.my-pfsense.domain Air-van-dirk # dynamic entry from dhcpd.leases
192.168.2.203 iPadProdier2019.my-pfsense.domain iPadProdier2019 # dynamic entry from dhcpd.leases
192.168.2.125 Galaxy-A71.my-pfsense.domain Galaxy-A71 # dynamic entry from dhcpd.leases
192.168.2.77 Galaxy-Tab-A.my-pfsense.domain Galaxy-Tab-A # dynamic entry from dhcpd.leases
192.168.2.224 HUAWEI_P_smart_2019-ce0dc.my-pfsense.domain HUAWEI_P_smart_2019-ce0dc # dynamic entry from dhcpd.leases
192.168.2.201 Galaxy-A70.my-pfsense.domain Galaxy-A70 # dynamic entry from dhcpd.leases
192.168.2.192 iPhone.my-pfsense.domain iPhone # dynamic entry from dhcpd.leases
192.168.2.147 Galaxy-Tab-S2.my-pfsense.domain Galaxy-Tab-S2 # dynamic entry from dhcpd.leases
192.168.2.217 Galaxy-S9.my-pfsense.domain Galaxy-S9 # dynamic entry from dhcpd.leases
192.168.2.153 iPad-de-Yann.my-pfsense.domain iPad-de-Yann # dynamic entry from dhcpd.leases
# dhcpleases automatically entered
192.168.1.121 DESKTOP-SHTTTGB.my-pfsense.domain DESKTOP-SHTTTGB # dynamic entry from dhcpd.leases
192.168.1.120 Gauche2.my-pfsense.domain Gauche2 # dynamic entry from dhcpd.leases
192.168.1.119 iPhone6sdeVera.my-pfsense.domain iPhone6sdeVera # dynamic entry from dhcpd.leases
192.168.1.111 iPhone11deVera.my-pfsense.domain iPhone11deVera # dynamic entry from dhcpd.leases
192.168.2.203 iPadProdier2019.my-pfsense.domain iPadProdier2019 # dynamic entry from dhcpd.leases
192.168.2.92 Jerome-EGC.my-pfsense.domain Jerome-EGC # dynamic entry from dhcpd.leases
192.168.2.139 Air-van-dirk.my-pfsense.domain Air-van-dirk # dynamic entry from dhcpd.leases
192.168.2.147 Galaxy-Tab-S2.my-pfsense.domain Galaxy-Tab-S2 # dynamic entry from dhcpd.leases
192.168.2.217 Galaxy-S9.my-pfsense.domain Galaxy-S9 # dynamic entry from dhcpd.leases
192.168.2.125 Galaxy-A71.my-pfsense.domain Galaxy-A71 # dynamic entry from dhcpd.leases
192.168.2.77 Galaxy-Tab-A.my-pfsense.domain Galaxy-Tab-A # dynamic entry from dhcpd.leases
192.168.2.224 HUAWEI_P_smart_2019-ce0dc.my-pfsense.domain HUAWEI_P_smart_2019-ce0dc # dynamic entry from dhcpd.leases
192.168.2.201 Galaxy-A70.my-pfsense.domain Galaxy-A70 # dynamic entry from dhcpd.leases
192.168.2.192 iPhone.my-pfsense.domain iPhone # dynamic entry from dhcpd.leases
192.168.2.153 iPad-de-Yann.my-pfsense.domain iPad-de-Yann # dynamic entry from dhcpd.leases
# dhcpleases automatically entered
192.168.1.121 DESKTOP-SHTTTGB.my-pfsense.domain DESKTOP-SHTTTGB # dynamic entry from dhcpd.leases
192.168.1.120 Gauche2.my-pfsense.domain Gauche2 # dynamic entry from dhcpd.leases
192.168.1.119 iPhone6sdeVera.my-pfsense.domain iPhone6sdeVera # dynamic entry from dhcpd.leases
192.168.1.111 iPhone11deVera.my-pfsense.domain iPhone11deVera # dynamic entry from dhcpd.leases
192.168.2.203 iPadProdier2019.my-pfsense.domain iPadProdier2019 # dynamic entry from dhcpd.leases
192.168.2.92 Jerome-EGC.my-pfsense.domain Jerome-EGC # dynamic entry from dhcpd.leases
192.168.2.139 Air-van-dirk.my-pfsense.domain Air-van-dirk # dynamic entry from dhcpd.leases
192.168.2.147 Galaxy-Tab-S2.my-pfsense.domain Galaxy-Tab-S2 # dynamic entry from dhcpd.leases
192.168.2.217 Galaxy-S9.my-pfsense.domain Galaxy-S9 # dynamic entry from dhcpd.leases
192.168.2.125 Galaxy-A71.my-pfsense.domain Galaxy-A71 # dynamic entry from dhcpd.leases
192.168.2.77 Galaxy-Tab-A.my-pfsense.domain Galaxy-Tab-A # dynamic entry from dhcpd.leases
192.168.2.224 HUAWEI_P_smart_2019-ce0dc.my-pfsense.domain HUAWEI_P_smart_2019-ce0dc # dynamic entry from dhcpd.leases
192.168.2.201 Galaxy-A70.my-pfsense.domain Galaxy-A70 # dynamic entry from dhcpd.leases
192.168.2.192 iPhone.my-pfsense.domain iPhone # dynamic entry from dhcpd.leases
192.168.2.153 iPad-de-Yann.my-pfsense.domain iPad-de-Yann # dynamic entry from dhcpd.leases
This probably has no side effects.
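One way around the 48-renewal burst described in this post, sketched under assumptions: coalesce lease-file change events and push a single batched update once the file has been quiet for a while (debouncing). This debouncing approach is an illustration, not something dhcpleases is known to do; the function name and the 2-second quiet period are hypothetical.

```python
def coalesce_bursts(event_times_ms, quiet_ms):
    """Group lease-file change events (millisecond timestamps) into bursts.

    A burst ends once no new event arrives within `quiet_ms` of the previous
    one; one (first, last) pair is returned per burst, i.e. per batched
    unbound update that would be sent instead of one update per event.
    """
    bursts = []
    for t in sorted(event_times_ms):
        if bursts and t - bursts[-1][1] <= quiet_ms:
            bursts[-1][1] = t  # extend the current burst
        else:
            bursts.append([t, t])  # start a new burst
    return [tuple(b) for b in bursts]


# 48 renewals arriving 100 ms apart after a switch reboot, then one more a minute later.
events = [i * 100 for i in range(48)] + [60_000]
print(coalesce_bursts(events, quiet_ms=2_000))  # [(0, 4700), (60000, 60000)]
```

That is two updates instead of 49; the trade-off is that a new host name becomes resolvable up to one quiet period later.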
-
What is the status on this? I see a lot of unbound restarts; I can't believe this issue has not been fixed in 4 years. Is a fix impossible or too hard, or are the developers unable to recognize this problem?
-
@lutel Still broken as you can see.
It actually works in OPNsense, which started as a pfSense fork but has since been refactored and is based on HardenedBSD rather than FreeBSD.