Unbound crashing
-
on 2.41 i have very often no working, reachable DNS Service.
DNS Logs say
Nov 14 23:03:28 unbound 45240:0 info: start of service (unbound 1.6.6). Nov 14 23:02:57 unbound 45240:0 info: service stopped (unbound 1.6.6). Nov 14 23:02:57 unbound 45240:0 info: start of service (unbound 1.6.6). Nov 14 23:02:25 unbound 45240:0 info: service stopped (unbound 1.6.6). Nov 14 23:00:08 filterdns adding entry 2a02:26f0:6a:280::3d5 to pf table certbot for host acme-v01.api.letsencrypt.org Nov 14 23:00:08 filterdns adding entry 2a02:26f0:6a:293::3d5 to pf table certbot for host acme-v01.api.letsencrypt.org Nov 14 23:00:08 filterdns adding entry 104.74.107.171 to pf table certbot for host acme-v01.api.letsencrypt.org
filternds logs this every 15 minutes. maybe the same issue?
-
These aren't related to pfBlockerNG… Best to post in the DNS forum section....
-
Since installing pfBlockerNG, unbound crashes randomly and as best as I can tell it's somehow related to what's going on here:
Oct 19 18:17:27 php-fpm /index.php: Successful login for user 'admin' from: 192.168.3.6 Oct 19 18:00:00 php [pfBlockerNG] Starting cron process. Oct 19 17:00:00 php [pfBlockerNG] Starting cron process. Oct 19 16:00:00 php [pfBlockerNG] Starting cron process. Oct 19 15:04:03 check_reload_status Reloading filter Oct 19 15:04:03 php-fpm 48009 /rc.newwanipv6: Removing static route for monitor fe80::201:5cff:fe6f:5046 and adding a new route through fe80::201:5cff:fe6f:5046%igb1 Oct 19 15:04:03 php-fpm 48009 /rc.newwanipv6: Removing static route for monitor 8.8.8.8 and adding a new route through 10.8.1.181 Oct 19 15:04:03 php-fpm 48009 /rc.newwanipv6: ROUTING: setting IPv6 default route to fe80::201:5cff:fe6f:5046%igb1 Oct 19 15:04:03 php-fpm 48009 /rc.newwanipv6: ROUTING: setting default route to comcast.ip.address Oct 19 15:04:03 php-fpm 48009 /rc.newwanipv6: The command '/sbin/route change -host att.ip.address' returned exit code '1', the output was 'route: writing to routing socket: No such process route: writing to routing socket: Invalid argument change host att.ip.address fib 0: Invalid argument' Oct 19 15:04:03 php-fpm 48009 /rc.newwanipv6: The command '/sbin/route change -host att.ip.address' returned exit code '1', the output was 'route: writing to routing socket: No such process route: writing to routing socket: Invalid argument change host att.ip.address fib 0: Invalid argument' Oct 19 15:04:03 php-fpm 48009 /rc.newwanipv6: The command '/sbin/route change -host comcast.ip.address' returned exit code '1', the output was 'route: writing to routing socket: No such process route: writing to routing socket: Invalid argument change host comcast.ip.address fib 0: Invalid argument' Oct 19 15:04:01 dhcpleases kqueue error: unkown Oct 19 15:04:01 php-fpm 48009 /rc.newwanipv6: The command '/usr/local/sbin/unbound -c /var/unbound/unbound.conf' returned exit code '1', the output was '[1508443441] unbound[60763:0] error: bind: address already in use [1508443441] unbound[60763:0] fatal error: could not open ports' Oct 19 15:04:00 dhcpleases /var/etc/hosts changed size from original! Oct 19 15:03:59 php-fpm 48009 /rc.newwanipv6: rc.newwanipv6: on (IP address: comcast.ip6.address) (interface: wan) (real interface: igb1). Oct 19 15:03:59 php-fpm 48009 /rc.newwanipv6: rc.newwanipv6: Info: starting on igb1. Oct 19 15:03:53 check_reload_status Reloading filter Oct 19 15:03:53 php-fpm 48009 /rc.newwanipv6: Removing static route for monitor fe80::201:5cff:fe6f:5046 and adding a new route through fe80::201:5cff:fe6f:5046%igb1 Oct 19 15:03:53 php-fpm 48009 /rc.newwanipv6: Removing static route for monitor 8.8.8.8 and adding a new route through 10.8.1.181 Oct 19 15:03:53 php-fpm 48009 /rc.newwanipv6: ROUTING: setting IPv6 default route to fe80::201:5cff:fe6f:5046%igb1 Oct 19 15:03:53 php-fpm 48009 /rc.newwanipv6: ROUTING: setting default route to comcast.ip.address Oct 19 15:03:53 php-fpm 48009 /rc.newwanipv6: The command '/sbin/route change -host att.ip.address' returned exit code '1', the output was 'route: writing to routing socket: No such process route: writing to routing socket: Invalid argument change host att.ip.address fib 0: Invalid argument' Oct 19 15:03:53 php-fpm 48009 /rc.newwanipv6: The command '/sbin/route change -host att.ip.address' returned exit code '1', the output was 'route: writing to routing socket: No such process route: writing to routing socket: Invalid argument change host att.ip.address fib 0: Invalid argument' Oct 19 15:03:53 php-fpm 48009 /rc.newwanipv6: The command '/sbin/route change -host comcast.ip.address' returned exit code '1', the output was 'route: writing to routing socket: No such process route: writing to routing socket: Invalid argument change host comcast.ip.address fib 0: Invalid argument' Oct 19 15:03:51 dhcpleases kqueue error: unkown Oct 19 15:03:50 dhcpleases Could not deliver signal HUP to process because its pidfile (/var/run/unbound.pid) does not exist, No such process. Oct 19 15:03:50 dhcpleases /var/etc/hosts changed size from original! Oct 19 15:03:49 php-fpm 48009 /rc.newwanipv6: rc.newwanipv6: on (IP address: comcast.ip6.address) (interface: wan) (real interface: igb1). Oct 19 15:03:49 php-fpm 48009 /rc.newwanipv6: rc.newwanipv6: Info: starting on igb1. Oct 19 15:00:00 php [pfBlockerNG] Starting cron process.
Using pfSense 2.3.4 and pfBlocker 2.1.1_10.
Is it related to this: https://redmine.pfsense.org/issues/7326#note-2?
Thoughts?
Do you have DHCP setup to register new leases in DNS so that they resolve ? I had this setup and when a device got a DHCP lease it needed to restart unbound to get the hostname registered
-
I am tracking a similar issue though I’m on the beta (which is shaping up nicely btw) which also seems to be an unbound problem… how many domains in total are blocked in your DNSBL?
What I’m finding is that unbound is not crashing, it is hanging temporarily - if you record a network trace then it does eventually respond to all the requests, just not before the client times out. Delays in responses can be up to 70s but will all come through at once. I have found that if I dial down the number of blocked hosts then the problem goes away.
What I am seeing could just be unbound taking a long time to reload if a client added from DHCP by pfSense causes a restart, but that could be fixable as unbound does allow dynamic updates. I’ll give the source a once-over and see if I can see anything (if I ever get time.)
-
Having the same problem, using pfBlockerNG with Unbound. What I am seeing is that things are working fine up to a point (say, half an hour) and then Unbound starts taking up 99% of cpu resources.
I have a script that does nslookups for twenty Alexa-listed sites which, if all is well, takes 2-5 seconds to complete. If Unbound is too busy they just time out but sometimes halfway Unbound gets the message and where the first ten lookups would time out the rest just flies by.
I'm not sure if this is related but I've been tracking changes in /var/unbound/pfb_dnsbl.conf and after about five to seven changes in pfBlockerNG's DNSBL lists this file stops getting updated. Restarting Unbound doesn't help, the only thing that fixes it it completely removing pfBlockerNG and reinstalling and reconfiguring it. Then Unbound works fine for a while untill the whole thing starts again.
Bringing down the number of blocked hosts seems to slightly help but because of the problem described above it is annoying to find a maximum number.
This behaviour is reproducable across installations. Using pfSense 2.4.2-RELEASE.
Has this been reported as bug? If so, where I can track it? If not, where should I report it?
-
Unbound is a project in its own right and you can report the problem to the maintainers there. However, you’re going to have to do a lot more digging to establish what configuration triggers the issue and what it actually does (are you sure that unbound is going into a race condition or if something else is happening). They will need to know how to reproduce the issue - you aren’t even close yet.
The behaviour sounds similar to what I see, you should consider putting a network trace on the DNS port on the LAN and CPU monitoring to see if unbound is doing what you think it is, and get back to us. Either way can you tell us how many domains you’re blocking and what hardware spec you’re on? I’m wondering if I should get my hands dirty with a patch to DHCP for the host registration (to reduce reloading)? Could be fun…
-
@jasonau said in Unbound crashing:
Do you have DHCP setup to register new leases in DNS so that they resolve ? I had this setup and when a device got a DHCP lease it needed to restart unbound to get the hostname registered
I am aware of the fact that this thread is a bit old but I was experiencing the same (or at least a similar) problem.
I disabled (unticked) the following options:
- [DHCP Registration] :: Register DHCP leases in the DNS Resolver
- [Static DHCP] :: Register DHCP static mappings in the DNS Resolver
- [OpenVPN Clients] :: Register connected OpenVPN clients in the DNS Resolver
Unbound seems to work as it should.
Restarting unbound takes around 3 minutes on my mini firewall (APU3C4). I have the CRON job for pfBlockerNG on once a day during night time.
I am running:
- 2.4.4-RELEASE-p1 (amd64)
- Unbound 1.8.1
- pfBlockerNG-devel 2.2.5_19
-
@bouke said in Unbound crashing:
pfBlockerNG-devel 2.2.5_19
You can try the new "Live Sync" feature, which will update the changes on the fly without an Unbound Reload. But that is only when the package updates DNSBL/Unbound and not any DHCP updates to Unbound from the pfSense side.
-
@bouke said in Unbound crashing:
APU3C4
Limit the size of DNSBL tables so that there is more free memory available. On my 8GB box, it runs fine with around 1MB entries in DNSBL. With a 4GB box I had to remove big (100K+) lists to get decent performance.
-
@bbcan177 said in Unbound crashing:
@bouke said in Unbound crashing:
pfBlockerNG-devel 2.2.5_19
You can try the new "Live Sync" feature, which will update the changes on the fly without an Unbound Reload. But that is only when the package updates DNSBL/Unbound and not any DHCP updates to Unbound from the pfSense side.
Happy new year and thanks. Just enabled this feature on 01-01-2019 08:05 am (local time) ;-)