After upgrade to 23.01: unbound has a high CPU load resulting in DNS request timeouts
-
@getcom Hm, I have had 23.01 running at home on a 2100 for about 18 hours, since I changed a setting last night. I don't have a billion devices at home, though.
: sockstat | grep unbound | wc -l
      12

  PID USERNAME  THR PRI NICE  SIZE   RES STATE   C  TIME   WCPU COMMAND
25498 unbound     2  20    0  158M  124M kqread  0  7:37  0.25% unbound
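`sockstat | grep unbound | wc -l` counts every line that contains the string "unbound", with no breakdown. The sketch below is a slightly stricter count that filters on the COMMAND column and groups by protocol. It assumes FreeBSD's default `sockstat` column layout (USER COMMAND PID FD PROTO LOCAL ADDRESS FOREIGN ADDRESS); `count_sockets` and the sample lines are hypothetical, not pfSense tooling.

```python
from collections import Counter


def count_sockets(sockstat_text: str, command: str = "unbound") -> Counter:
    """Count sockets per protocol for one command in sockstat output.

    Assumes the default FreeBSD layout:
    USER COMMAND PID FD PROTO LOCAL-ADDRESS FOREIGN-ADDRESS
    """
    per_proto = Counter()
    for line in sockstat_text.splitlines():
        fields = line.split()
        # Skip blank/short lines and the header row.
        if len(fields) < 5 or fields[0] == "USER":
            continue
        if fields[1] == command:
            per_proto[fields[4]] += 1  # PROTO column, e.g. udp4 or tcp4
    return per_proto


# Hypothetical sample in the assumed sockstat layout.
SAMPLE = """\
USER     COMMAND    PID   FD PROTO  LOCAL ADDRESS         FOREIGN ADDRESS
unbound  unbound    25498 3  udp4   *:53                  *:*
unbound  unbound    25498 4  tcp4   *:53                  *:*
unbound  unbound    25498 9  udp4   192.0.2.1:41234       198.51.100.53:53
root     sshd       1234  4  tcp4   *:22                  *:*
"""

counts = count_sockets(SAMPLE)
print(dict(counts), sum(counts.values()))
```

Feeding it saved `sockstat` output from the firewall would show whether a rise like the 400 to 600 connections mentioned later in this thread is mostly UDP or TCP.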
The DNS Forwarder has a "Query DNS servers sequentially" option; the DNS Resolver does not. I found a 2015 post (https://serverfault.com/questions/732920/how-to-do-parallel-queries-to-the-upstream-dns-using-unbound) saying that unbound queries all of them.
How many DNS servers do you have configured?
re: BIND, there is a package, but as I vaguely recall it's not meant for resolving? I'm not too familiar with the package (though of course I am familiar with BIND).
-
@steveits said in After upgrade to 23.01: unbound has a high CPU load resulting in DNS request timeouts:
How many DNS servers do you have configured?
re: BIND, there is a package but as I vaguely recall it's not meant for resolving? Not too familiar with the package (am of course familiar with BIND).
Seven: two external DNS servers, one per WAN connection (dual-WAN setup), four for the four site-to-site VPNs, and one for an AD subdomain.
Regarding the 2015 post: unbound also sent these parallel queries in the 22.05 release, which means this is/was not the root cause. What is strange is that after a few unbound restarts via cron, the load is now in a normal range (~0.19), and sockstat shows only 400 to 600 connections.
Phew... this is something I don't like, because it went away before I could find the root cause... this issue can happen again.
-
It is back again. The load is definitely too high. I noticed audio problems in Jitsi meetings.
The system load goes up and down all the time.
I think I will go back to the previous release until this is fixed.
-
After updating another system to 23.01 and seeing the same behavior, I came to the conclusion that the parallel requests are so much more aggressive in 23.01 that the answers are killing the messenger. I saw 4500+ requests/packets per second hammering unbound until it was no longer able to answer. VMs and clients hit lots of timeouts lasting seconds to minutes, and everything froze.
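Tracing where 4500+ queries per second come from amounts to a per-second and per-source count over a packet capture. A minimal sketch, assuming IPv4 lines as printed by `tcpdump -n -tt` (epoch timestamp, `IP`, then the source address with the port appended); `parse_tcpdump_line`, `analyze`, and all addresses are hypothetical, and the flood below is synthetic, not data from this thread.

```python
from collections import Counter


def parse_tcpdump_line(line):
    """Return (timestamp, source_ip) from an IPv4 line of `tcpdump -n -tt`.

    Assumed format: '<epoch> IP <src>.<port> > <dst>.<port>: ...'
    Returns None for lines that do not match.
    """
    fields = line.split()
    if len(fields) < 3 or fields[1] != "IP":
        return None
    src_ip = fields[2].rsplit(".", 1)[0]  # drop the trailing source port
    return float(fields[0]), src_ip


def analyze(queries, top_n=3):
    """Return (peak queries in any one-second bucket, top_n busiest sources)."""
    per_second = Counter()
    per_source = Counter()
    for ts, src in queries:
        per_second[int(ts)] += 1  # bucket by whole second
        per_source[src] += 1
    peak = max(per_second.values(), default=0)
    return peak, per_source.most_common(top_n)


# Synthetic example: one source sends 4500 queries within one second,
# while a normal client sends two queries in later seconds.
flood = [(i / 4500, "203.0.113.7") for i in range(4500)]
normal = [(1.2, "192.0.2.10"), (2.5, "192.0.2.10")]
peak, top = analyze(flood + normal)
print(peak, top)
```

On a real capture, `analyze(filter(None, map(parse_tcpdump_line, lines)))` would skip lines that don't match. Bucketing by whole seconds is crude, but it is enough to spot a single source dominating the query rate.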
@Gertjan I think you are right... I'm a member of the 'DNS/unbound sucks' club now...
My conclusion is that until unbound is able to handle sequential requests, I should give dnsmasq, a.k.a. the DNS Forwarder, a try, as @steveits mentioned.
-
@getcom One other idea that has come up in other threads, e.g. https://forum.netgate.com/topic/178413/major-dns-bug-23-01-with-quad9-on-ssl/, is to disable DNS over TLS. It doesn't seem to be a problem for everyone, but some people say disabling it is necessary when using Unbound in forwarding mode.
I don't recall anyone reporting high CPU usage though.
-
rant:
There are quite a few problems with pfSense and DNS services, and I've just given up trying to chase them down: high CPU usage, flaky DNS service, DNS not working at all on localhost on the firewall (it just times out), domain overrides not working, and so on. Now I'm running the DNS server on a couple of separate boxes, with no issues at all.
pfSense does show its limits even on Netgate hardware (we have a couple of 7100 appliances); they cram a lot of things into this firewall appliance for whatever reason... I'm guessing that if we need a professional, business-level firewall, that's what TNSR is for.
-
@steveits said in After upgrade to 23.01: unbound has a high CPU load resulting in DNS request timeouts:
@getcom One other idea that has come up in other threads e.g. https://forum.netgate.com/topic/178413/major-dns-bug-23-01-with-quad9-on-ssl/ is to disable DNS over TLS. It doesn't seem to be a problem for everyone but some people say it is necessary when using Unbound in forwarding mode.
DNS over TLS is/was not activated...
-
@aduzsardi said in After upgrade to 23.01: unbound has a high CPU load resulting in DNS request timeouts:
rant:
there are quite a few problems with pfSense and DNS services , i just given up trying to chase them down, either high cpu usage, flaky dns service, dns not working at all on localhost on the firewall ... it just times out , domain overrides not working and so onnow running DNS server on a couple separate boxes, no issues at all
pfSense does show it's limits even on netgate hardware (we have a couple of 7100 appliances), they cram a lot of things into this firewall appliance for whatever reason ... i'm guessing if we need professional business level firewall that's what TNSR is for
Some tcpdumps later: all clear.
At the moment there are lots of DoS and brute-force attacks here in Germany. We are delivering tanks and defense systems to Ukraine; perhaps there is a connection? I don't know, but I have never seen anything like this before.
With a regular VDSL or cable contract you get an asymmetric connection, e.g. 100 Mbps downstream / 40 Mbps upstream. I saw up to 6000 DNS requests per second, each only a few kB in size, but the answers were up to four times larger. Combined with the smaller upstream bandwidth, this made the entire internet connection unusable.
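The upstream problem can be checked with back-of-the-envelope arithmetic. A minimal sketch, assuming the answers leave through the 40 Mbit/s upstream, as the post implies; the 6000 queries/s, the 4x answer size, and the link speed come from the post, while the 1000-byte query size is an assumption, since the post only says "a few kB". The function names are hypothetical.

```python
def upstream_mbps(queries_per_s, query_bytes, amplification):
    """Upstream Mbit/s consumed by answers `amplification` times the query size."""
    return queries_per_s * query_bytes * amplification * 8 / 1_000_000


def saturating_query_bytes(link_mbps, queries_per_s, amplification):
    """Average query size (bytes) at which the answers alone fill the link."""
    return link_mbps * 1_000_000 / (8 * queries_per_s * amplification)


RATE = 6000        # queries per second (from the post)
AMPLIFICATION = 4  # answers up to 4x larger (from the post)
UPSTREAM = 40      # Mbit/s upstream (from the post)

# ASSUMPTION: 1000-byte queries; the post only says "a few kB".
print(f"{upstream_mbps(RATE, 1000, AMPLIFICATION):.0f} Mbit/s needed vs {UPSTREAM} available")
print(f"link full at ~{saturating_query_bytes(UPSTREAM, RATE, AMPLIFICATION):.0f}-byte queries")
```

Under these assumptions the answers alone would need about 192 Mbit/s, and the 40 Mbit/s upstream fills once queries average roughly 208 bytes. Even 100-byte queries would produce about 19 Mbit/s of answers, roughly half the upstream.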
The smaller Netgate SG boxes ran hot in this situation...
Since I enabled GeoIP blocking for all countries except Germany and the USA and added ~150 bad-bot blacklists to pfBlockerNG, it has been quiet...
-
@getcom Dang, man! I feel for you. Keep up the good work and keep those ruzzkies out!!!