Frequent unbound restarts

lucas_nz

@stephenw10 It's particularly noticeable if you are using pfBlockerNG, which adds large lists of sites to the unbound config (to provide DNS-based blocking), so the reload can take some seconds (the restart wasn't noticeable before I implemented pfBlockerNG). This was a major issue for me until I unticked the DHCP registration option. But having DHCP registration disabled is a bit lame.

Luke

stephenw10

I imagine there is a threshold where the latency for the different processes becomes critical. I run pfBlocker and have dhcp leases enabled and never have an issue. I seemingly have not hit that limit yet.

RichMawdsley

@stephenw10 said in Frequent unbound restarts:

I have never hit it myself but clearly some people do. Switching to reload instead of restart does seem like the obvious option here. The fact it hasn't happened yet may imply I'm missing something though.

Steve

This is what I don't understand either. This seems like a reaaally simple thing to fix.. and yes, I say FIX because this is absolutely a ridiculous flaw.

Gertjan

@lucas_nz said in Frequent unbound restarts:

But having DHCP registration disabled is a bit lame.

Shutting down that option is half the work.
This one stays on :

and you add all your devices to the "DHCP Static Mappings for this Interface" list.

@stephenw10 said in Frequent unbound restarts:

I run pfBlocker and have dhcp leases enabled and never have an issue.

I bet you didn't select "all the feeds" either ;)

edit : https://redmine.pfsense.org/issues/5413

stephenw10

@Gertjan said in Frequent unbound restarts:

I bet you didn't select "all the feeds" either ;)

Indeed I did not.

jt

Does anyone know if this has been fixed in the latest update, 2.4.5-p1?

Gertjan

@jt said in Frequent unbound restarts:

Does anyone know if this has been fixed in the latest update, 2.4.5-p1?

Everybody knows.
See here - just above.

Again : as soon as [nlnetlabs.nl](https://nlnetlabs.nl/projects/unbound/about/) (the authors) rewrites unbound to implement something that could be the solution, this won't happen anymore.

As such, it's not a (pfSense) bug. At most, one could say that unbound is good, but not perfect.

The easy workaround is : declare static MAC DHCP leases for all the devices that you need to address 'by name' - these devices often host services to be accessed from your LAN.

jasonArloUser

This is completely wrong. Shutting off services in the device to fix outages is not acceptable. If a service is not supported, don't offer it as a feature.

Also, there is no problem with unbound. I repeat as people seem to not be getting this: THERE IS NO PROBLEM WITH UNBOUND. The problem is with the other software. Unbound has a way to reload specific zones via a command. The DHCP lease scripts incorrectly send a HUP to the process. THIS IS A BUG. It's not a design choice, it's not a different way to do it. It's broken, incorrectly written software. A lease can never affect any other zone than local so the local zone should be reloaded.

If one pays for PFSense is the support any better than this forum? Because this forum is just dismissive of seemingly every problem. IPV6 support is also broken and no one cares about that either.

Gertjan

@jasonArloUser said in Frequent unbound restarts:

This is completely wrong.

I need, and I guess we all need, workarounds, as the issue can't be solved easily.

In another thread I showed the source code of unbound : how it handles the reception of OS signals such as SIGHUP.
Whatever you send to unbound : there is no "reload the config" functionality - it restarts.
That is the reason why I tend to say : do the next best thing : limit the number of SIGHUPs that get sent to unbound. Which means : stop the DHCP registration. Not perfect, I know, because now the admin has some work to do to compensate for this initial loss of functionality - let's say 30 seconds of work for every device in the network. I mean : add DHCP static leases for every device that has to have a known host name in your network.

@jasonArloUser said in Frequent unbound restarts:

THERE IS NO PROBLEM WITH UNBOUND

I prefer to say : it's not perfect ;)

What I know is that the process that handles new DHCP leases so they get signalled to the DNS subsystem, known as "dhcpleases", is working as it should. It signals the current DNS system (unbound in our case) that there is a new host name available.
This dhcpleases doesn't say "here is xxx.localdomain.local with IP a.b.c.d".
This process observes the file /var/dhcpd/var/db/dhcpd.leases - maintained by the dhcpd server daemon.
When it changes, it parses /var/dhcpd/var/db/dhcpd.leases, rewrites /var/unbound/dhcpleases_entries.conf, gets the pid of unbound by reading /var/run/unbound.pid and sends a SIGHUP to it (unbound).

and unbound parses all the config files again ... by restarting.
It's not smart enough to detect that a line (or more) was added to or removed from (or a combination) the /var/unbound/dhcpleases_entries.conf file ....
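
To make the steps above concrete, here is a minimal Python sketch of that flow. It is not the real dhcpleases program (that one is written in C), just an illustration. The lease block layout follows the ISC dhcpd.leases format; the exact `local-data` / `local-data-ptr` line layout, the `localdomain` default and all function names are assumptions on my side.

```python
import os
import re
import signal

# Matches one "lease <ip> { ... }" block of an ISC dhcpd.leases file.
LEASE_RE = re.compile(r"lease\s+(\d+\.\d+\.\d+\.\d+)\s*\{(.*?)\}", re.S)
ACTIVE_RE = re.compile(r"^\s*binding state active\s*;", re.M)
HOSTNAME_RE = re.compile(r'client-hostname\s+"([^"]+)"\s*;')


def parse_leases(text):
    """Return {hostname: ip} for active leases that carry a client-hostname."""
    hosts = {}
    for ip, body in LEASE_RE.findall(text):
        if not ACTIVE_RE.search(body):
            continue  # skip free/expired leases
        match = HOSTNAME_RE.search(body)
        if match:
            hosts[match.group(1)] = ip  # later blocks override earlier ones
    return hosts


def render_entries(hosts, domain="localdomain"):
    """Render the include file content (line layout is an assumption)."""
    lines = []
    for name, ip in sorted(hosts.items()):
        lines.append(f'local-data-ptr: "{ip} {name}.{domain}"')
        lines.append(f'local-data: "{name}.{domain} IN A {ip}"')
    return "".join(line + "\n" for line in lines)


def hup_unbound(pid_file="/var/run/unbound.pid"):
    """Send SIGHUP to unbound; this is the step that causes the full restart."""
    with open(pid_file) as fh:
        os.kill(int(fh.read().strip()), signal.SIGHUP)
```

Note that nothing in this flow tells unbound which host changed: the only message is the signal itself.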

Do you see this any different ?
I obtained the "how it works" by reading the code (the C language is part of my professional education). Still, why not, I could be wrong ^^

Keep in mind : unbound is a light weight DNS Resolver with DNSSEC and forward capabilities. It does not advertise more.
It's not 'bind'. (bind is huge ...)

Btw : I'm not judging the system as a whole, just trying to understand how it works. I like to understand why things happen. It's a needed step if solutions need to be found.

edit : in the past, and in the present, unbound stops and starts fast.
Of course, when it restarts, its DNS cache is gone ..... no good, but many didn't notice, so ok ...
Then someone came by, and invented "DNSBL" for pfSense, and how to repopulate the local DNS cache with pre-built replies that made it possible to screen out some IPs and domain names. You know the one I'm talking about.
The size of the config files read by unbound at startup exploded. Before, several kilobytes was usual. Now, check the forum : people don't even blink when they explain that they have a million or more DNSBL entries in their 'unbound' config files .... Why would one stop at a certain size if you can have it all ? "Just select them all". Consequences are unknown, thus non-existent ....

The poor unbound has to parse them all at startup ..... because a stupid DHCP lease came in.
And during that time, tens of seconds, no more DNS .... and that, that was noticed. And here we are.

So, according to you : whose fault is this ? ;)

TimJacobs

@Gertjan said in Frequent unbound restarts:

Keep in mind : unbound is a light weight DNS Resolver with DNSSEC and forward capabilities. It does not advertise more.
It's not 'bind'. (bind is huge ...)

True, but Netgate encourages people to use this as their DNS solution. Another way to read it is that they sell $5300 devices that can't even serve DNS properly.

The poor unbound has to parse them all at startup ..... because a stupid DHCP lease came in.
And during that time, tens of seconds, no more DNS .... and that, that was noticed. And here we are.

So, according to you : whose fault is this ? ;)

It is true that a typical "fat" client with a DNS cache has no issues with DNS not being available for a few moments. However, on my network, with numerous embedded devices & sensors (and yes, even a Google Home device) which do NOT do any DNS caching but always do a DNS lookup, we have continuous measurement interruptions every time Unbound does a restart, even if it only takes 500 ms on my device.

That is operational impact for Netgate device customers (not people running some free version of pfSense somewhere), for an issue that has been known already for three years (start of this thread).

BTW: Thanks for your investigative work, was an interesting read!

serbus

Hello!

On a very small network (sg-3100) with 10 or so regular devices (devices not coming and going) with DHCP leases, I am seeing around 10 dns restarts per hour initiated by dhcpd ("pfSense dhcpleases: Sending HUP signal to dns daemon"). Stock OOTB snort and pfb. No feed or rule craziness. I am not getting any blowback from users about internet flakiness.

Does dhcpd restart the dns on lease renewals or DHCPREQUEST/DHCPACK traffic? I can't tell from the logs. It is hard to match up those restarts to any specific dhcpd activity.

The default lease time is only 2hrs. On a large network with lots of leases that could be many restarts? Maybe a longer lease time?

John

Raffi_

Among all the things I love about pfsense, unbound is the one thing I have to say is the biggest letdown for me. I agree that the blame is not solely on unbound since it is being bombarded by large DNS block lists in these worst case scenarios. Overall though, if I could have one thing that must be improved, it would be unbound. Even in the best case scenario of an install without pfblocker, your DNS cache would be wiped every time a DHCP lease is requested (assuming DHCP registration is enabled). This is not a huge deal since startup would be fast and a delay in a DNS query may not even be noticed, but that defeats the purpose of a DNS cache. You're left having to choose if you want a proper DNS cache or DHCP registration. I personally don't care about the DHCP registration, but that shouldn't have to be a choice.

Edit: The other options, as mentioned, bind or a separate DNS server, would be a solution. I opted for neither. As much as I would like to take on a new project like setting up a separate pi hole server, I'd rather not add more devices and complication to my network to solve a problem that's not that big of a deal.

Gertjan

@TimJacobs said in Frequent unbound restarts:

Another way to read it is that they sell $5300 devices that can't even serve DNS properly.

I ... understand what you are saying.
And it's easy to stay on the positive side of things : with that budget - and probably some big network behind it, I would go even bigger ; separated DNS and DHCP and separated DNS ...proxy ? => another device. etc.

But I rejoin your words :
HIGH AVAILABILITY XG-1541 1U Security Gateway with pfSense software
Now ....
Let's apply the maths. First things first : performance cost ratio. And pfSense implodes by its division by zero error ..... (it's free, so it's infinite, nothing, ....) I tend to say : Netgate has another product .... ;)

Also : there is another solution ( which would scare the hell out of a lot of people ) : stop unbound.
Install bind. Using all and only the config files, disabling all GUI DNS / bind related settings.

But such a solution doesn't really belong on this forum ....

edit : and permit me - it's Friday after all : you're about to buy that Ferrari and discuss about the free Espresso machine that comes with it ?

@serbus said in Frequent unbound restarts:

10 dns restarts per hour

I would investigate.
At least, see who (what) restarted unbound.
It is not necessarily a DHCP event.
Just compare (all) the logs at the same event time.

Quick test on my side :

clog /var/log/resolver.log | grep 'Restart'

.....
Jun 18 12:52:18 pfsense unbound: [32889:0] notice: Restart of unbound 1.10.1.
Jun 20 10:24:23 pfsense unbound: [3443:0] notice: Restart of unbound 1.10.1.

That's 6 days ago ...
Probably not good either. I'm not sure.

@serbus said in Frequent unbound restarts:

Does dhcpd restart the dns on lease renewals or DHCPREQUEST/DHCPACK traffic?

That one is on my want-to-know-for-sure investigation list !!

I guess not, because a DHCP renewal does not (should not ? can't ? I'm wrong ?) change the IP or the host name; the validity of the existing DNS data is just prolonged into the future, so no SIGHUP is needed because there is no "file" change.
Rather easy to check : On a Windows device, open your "cmd" command prompt, and launch a

ipconfig /renew

and check the Resolver log.

I bet : no.

For me, unbound does not get restarted by any DHCP event, but that's because I do not register DHCP leases (I DHCP Static leased them all).

@serbus said in Frequent unbound restarts:

The default lease time is only 2hrs

Mine is set to half a day, so renewals take place every 6 hours.

serbus

Hello!

Virtually all of my 10+/hr unbound restarts come from dhcpd -> dhcpleases for no apparent reason.
A typical log dump looks like :

Jun 26 12:55:55 dhcpleases Sending HUP signal to dns daemon(61316)
Jun 26 12:55:55 dhcpd DHCPACK on 192.168.0.135 to e0:37:bf:87:b5:ff via mvneta1
Jun 26 12:55:55 dhcpd DHCPREQUEST for 192.168.0.135 from e0:37:bf:87:b5:ff via mvneta1
Jun 26 12:47:21 dhcpleases Sending HUP signal to dns daemon(61316)
Jun 26 12:47:21 dhcpd DHCPACK on 192.168.0.51 to 18:16:c9:34:79:1b (DIRECTV-HR54-C9347919) via mvneta1
Jun 26 12:47:21 dhcpd DHCPREQUEST for 192.168.0.51 from 18:16:c9:34:79:1b (DIRECTV-HR54-C9347919) via mvneta1
Jun 26 12:38:00 dhcpleases Sending HUP signal to dns daemon(61316)
Jun 26 12:38:00 dhcpd DHCPACK on 192.168.0.172 to 1c:4d:66:e0:cf:7e via mvneta1
Jun 26 12:38:00 dhcpd DHCPREQUEST for 192.168.0.172 from 1c:4d:66:e0:cf:7e via mvneta1
Jun 26 12:37:03 dhcpleases Sending HUP signal to dns daemon(61316)
Jun 26 12:37:03 dhcpd DHCPACK on 192.168.0.145 to b4:2a:0e:af:b8:af via mvneta1
Jun 26 12:37:03 dhcpd DHCPREQUEST for 192.168.0.145 from b4:2a:0e:af:b8:af via mvneta1

These are NOT new leases or even lease renewals...they are just devices that already have a lease "checking in".
Dhcpleases appears to perform this reload on any kevent with a NULL command. Not sure why dhcpd is sending that command.
There are some REQ/ACK pairs that do not result in dhcpleases reload.
I don't see any way, or support, for cranking up the dhcpd/dhcpleases debug logging.
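
One way to put a number on this without extra debug logging is to count the "Sending HUP signal" lines per hour in a saved copy of the DHCP log. A minimal Python sketch, assuming the syslog-style lines shown above (month, day, time, no year); the function name is made up for illustration.

```python
from collections import Counter

HUP_MARKER = "Sending HUP signal to dns daemon"


def hups_per_hour(log_lines):
    """Count dhcpleases HUP lines, keyed by 'Mon DD HH' from the timestamp."""
    counts = Counter()
    for line in log_lines:
        if HUP_MARKER not in line:
            continue
        parts = line.split()
        if len(parts) < 3:
            continue  # not a timestamped log line
        month, day, clock = parts[0], parts[1], parts[2]
        hour = clock.split(":")[0]
        counts[f"{month} {day} {hour}"] += 1
    return counts
```

Feeding it the lines of the log and printing the result gives one count per hour, which can then be compared with the number of DHCPREQUEST/DHCPACK pairs in the same hour.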

John

serbus

Hello!

The DHCP logging is not great, but a capture showed my 10 or so devices are renewing/extending at the half-way point on the 2hr lease. So, around 10 dhcp related dns restarts per hour.
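
The arithmetic behind that estimate, as a small Python sketch. It assumes every client renews at half the lease time and that each renewal triggers one HUP, which is what the capture suggests; the function name and the `renew_fraction` parameter are hypothetical.

```python
def expected_restarts_per_hour(devices, lease_hours, renew_fraction=0.5):
    """Estimate DHCP-triggered unbound restarts per hour.

    Each device renews once every lease_hours * renew_fraction hours.
    """
    if lease_hours <= 0 or renew_fraction <= 0:
        raise ValueError("lease_hours and renew_fraction must be positive")
    renew_interval_hours = lease_hours * renew_fraction
    return devices / renew_interval_hours
```

With 10 devices and a 2 hour lease this gives 10 per hour; with a 12 hour lease the same 10 devices would cause fewer than 2 per hour.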

John

RonpfS

One solution would be for the DHCP service to write new leases to the dhcp lease file but, instead of restarting unbound, use unbound-control(8) to notify unbound about new leases.

Like I posted earlier, that's the strategy pfBlockerNG uses with Live Reload enabled.

Gertjan

@RonpfS said in Frequent unbound restarts:

Like I posted earlier,

I think I recall my answer to what you said back then.
Something like "then where is this smart reloading ?". I guess I fell for the "can't find it so it isn't there" way of thinking.
Well, I was very wrong.
I found the DNSBL reloading using unbound-control in pfBlockerNG.
Lots of conditions apply, otherwise :

Reloading Unbound Resolver..... completed [ 06/27/20 12:14:16 ]

which means : unbound is restarted. No Liveupdate.

If this condition isn't met : unbound will get restarted - unbound-control won't be used :

Once the TLD Domain limit below is exceeded, the balance of the Domains will be listed as-is. IE: Blocking only the listed Domain (Not Sub-Domains)

TLD Domain Limit Restrictions:

    < 1.0GB RAM - Max 100k Domains
    < 1.5GB RAM - Max 150k Domains
    < 2.0GB RAM - Max 200k Domains
    < 2.5GB RAM - Max 250k Domains
    < 3.0GB RAM - Max 400k Domains
    < 4.0GB RAM - Max 600k Domains
    < 5.0GB RAM - Max 1.0M Domains
    < 6.0GB RAM - Max 1.5M Domains
    < 7.0GB RAM - Max 2.5M Domains
    > 7.0GB RAM - > 2.5M Domains

I've a 220K+ domain list : "Liveupdate" never executed for me because my pfSense has 2 Gbytes of RAM.
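
As a sanity check, the table above can be written as a lookup. A minimal Python sketch; the function name is made up, the thresholds are copied from the table, and treating exactly 7.0 GB as "no fixed limit" is an assumption because the table does not say.

```python
# (RAM upper bound in GB, max domains) rows copied from the table above.
TLD_LIMITS = [
    (1.0, 100_000),
    (1.5, 150_000),
    (2.0, 200_000),
    (2.5, 250_000),
    (3.0, 400_000),
    (4.0, 600_000),
    (5.0, 1_000_000),
    (6.0, 1_500_000),
    (7.0, 2_500_000),
]


def tld_domain_limit(ram_gb):
    """Return the max domain count for the given RAM, or None past the last row."""
    for upper_gb, max_domains in TLD_LIMITS:
        if ram_gb < upper_gb:
            return max_domains
    return None  # "> 7.0GB RAM": no fixed limit listed
```

If a nominal 2 GB machine reports slightly under 2.0 GB, it lands in the 200k row, and a 220K+ list is over that limit.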

And you're right, unbound-control could be used to transmit "DHCP lease DNS details" into unbound to insert/update/remove them. It's pretty straightforward.

I guess it's a matter of replacing the "dhcpleases" process with a shell script that uses unbound-control.
No more need for the "/var/unbound/dhcpleases_entries.conf" - just inject all found active leases into unbound, and done.
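
A rough sketch of what such a replacement could do, in Python rather than shell to keep it easy to test. It builds `unbound-control local_data` command lines (`local_data` is a documented unbound-control command). The config path `/var/unbound/unbound.conf`, the `localdomain` default and the function names are assumptions for illustration, and actually running the commands requires remote-control to be enabled in unbound.

```python
import subprocess

UNBOUND_CONF = "/var/unbound/unbound.conf"  # assumed location, adjust as needed


def ipv4_ptr_name(ip):
    """Return the in-addr.arpa name for a dotted-quad IPv4 address."""
    return ".".join(reversed(ip.split("."))) + ".in-addr.arpa."


def local_data_commands(leases, domain="localdomain"):
    """Build one unbound-control local_data call per A and PTR record.

    leases maps hostname -> IPv4 address.
    """
    commands = []
    for name, ip in sorted(leases.items()):
        fqdn = f"{name}.{domain}."
        records = (f"{fqdn} IN A {ip}", f"{ipv4_ptr_name(ip)} IN PTR {fqdn}")
        for record in records:
            commands.append(
                ["unbound-control", "-c", UNBOUND_CONF, "local_data", *record.split()]
            )
    return commands


def inject(leases, domain="localdomain"):
    """Run the commands against a live unbound; no SIGHUP is sent."""
    for argv in local_data_commands(leases, domain):
        subprocess.run(argv, check=True)
```

Because the records go in through the control channel, unbound keeps running and keeps its cache.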

Gertjan

I lowered the number of DNSBL entries :

= somewhat less than 70000 entries : DNSBL = 43802 to be exact. See below.

......
TLD finalize... completed [ 06/29/20 00:00:20 ]

Saving DNSBL database... completed
Resolver Live Sync analysis... completed [ 06/29/20 00:00:21 ]
Resolver Live Sync finalizing:
	Remove local-zone(s):		removed 270 zones
	Remove local-data(s):		removed 205 datas
	Add local-zone(s):		added 606 zones
	Add local-data(s):		added 393 datas
Resolver Live Sync... completed [ 06/29/20 00:00:22 ]
DNSBL update [ 43799 | PASSED  ]... completed

DNSBL DEBUG..[ Data(s): 43802	Zone(s): 32065 | 06/29/20 00:00:25 ]
....

These are synced in with unbound-control - NOT restarting unbound.

My last unbound restart :

Jun 20 10:24:23 pfsense unbound: [3443:0] notice: Restart of unbound 1.10.1.

That's more than a week ago. So I'm pretty sure even pfBlockerNG-devel doesn't restart unbound any more.

This method should be used for syncing in (and out ?) DHCP leases, and the issue would be gone.
The "dhcpleases" process could be replaced with a (shell) script that parses the DHCP leases, and puts them "into" unbound instead of the less sophisticated "reload/restart".
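
A minimal Python sketch of that "sync in and out" idea, in the spirit of the Live Sync output above: compare the previous and the current hostname-to-IP map and emit only the changes as `unbound-control` arguments. `local_data` and `local_data_remove` are real unbound-control commands; the function names and the `localdomain` default are assumptions, and only A records are handled here. A plain renewal changes neither name nor IP, so it produces no commands at all.

```python
def lease_delta(previous, current):
    """Return (to_remove, to_add) between two {hostname: ip} maps.

    A changed IP shows up in both lists: remove the old name, add the new record.
    """
    to_remove = sorted(
        name for name, ip in previous.items() if current.get(name) != ip
    )
    to_add = sorted(
        (name, ip) for name, ip in current.items() if previous.get(name) != ip
    )
    return to_remove, to_add


def delta_commands(previous, current, domain="localdomain"):
    """Translate the delta into unbound-control argument lists (removes first)."""
    to_remove, to_add = lease_delta(previous, current)
    commands = [["local_data_remove", f"{name}.{domain}."] for name in to_remove]
    commands += [
        ["local_data", f"{name}.{domain}.", "IN", "A", ip] for name, ip in to_add
    ]
    return commands
```

The previous map would have to be kept somewhere between runs (a small state file, for example); that part is left out here.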

jasonArloUser

@Gertjan I got back too late to answer your previous response so I'll answer this one. I believe unbound is doing the right thing: when it gets a HUP signal it restarts. That's standard Unix behaviour. dhcpleases has a bug in that it is sending a HUP signal when it can only ever affect the local zone. Since it affects the local zone, it should be using unbound-control to reload that zone only. In the worst case you can do "local_zone_remove <local zone>" and then add it back. There is no reason to restart the entire server since only a single zone will be affected by DHCP leases.

For PFSense, it makes sense that it sometimes reloads the whole server because it can modify potentially any zone. DHCP cannot.

EDIT: Oh, never mind. Didn't properly read your last post. We're on the same page now. :)

sotirone

Since there is a simple way to fix this extremely frustrating problem, is there any progress on it?

Like, please Netgate, one of the most useful features is not working.