DNS crashing every ~ 36 hours or so and unbound has to be restarted.
-
I've been having an issue where DNS craps out every other day or so, and I have to restart the DNS (unbound?) service to restore it.
The only thing I've really done in the time period before this began was configuring log forwarding to a Splunk instance...
I suppose it could be related, and I might try turning it back off, although that would hamper my ability to collect troubleshooting information.
Can someone please provide any insight, such as possible causes, processes, or other things to focus on when looking through the syslogs?
-
Think I may have found something.
I see the following entries right before a complete absence of any logging data from unbound, which lasts until the service is restarted.
12/6/19 3:13:42.000 AM Dec 6 03:13:39 unbound: [92636:0] fatal error: Could not read config file: /unbound.conf. Maybe try unbound -dd, it stays on the commandline to see more errors, or unbound-checkconf
host = gateway source = udp:7001 sourcetype = pfsense:unbound
12/6/19 3:13:42.000 AM Dec 6 03:13:39 unbound: [92636:0] fatal error: Could not read config file: /unbound.conf. Maybe try unbound -dd, it stays on the commandline to see more errors, or unbound-checkconf
host = gateway source = udp:7001 sourcetype = pfsense:unbound
12/6/19 3:13:42.000 AM Dec 6 03:13:39 unbound: [92636:0] notice: Restart of unbound 1.9.1.
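For anyone wanting to spot these silent periods without scrolling through Splunk, here is a rough Python sketch that pulls the unbound syslog entries out of exported lines and reports gaps between consecutive entries. The function names, the 30-minute threshold, and the regular expression are my own assumptions based on the line format shown above, not anything pfSense or Splunk provides. Syslog stamps carry no year, so the caller has to supply one.

```python
import re
from datetime import datetime, timedelta

# Matches the syslog part of each entry, e.g.
# "Dec 6 03:13:39 unbound: [92636:0] notice: Restart of unbound 1.9.1."
SYSLOG_RE = re.compile(
    r"(?P<stamp>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) unbound: "
    r"\[\d+:\d+\] (?P<level>[a-z ]+): (?P<msg>.*)"
)


def parse_unbound_lines(lines, year):
    """Yield (timestamp, level, message) for every unbound syslog entry."""
    for line in lines:
        m = SYSLOG_RE.search(line)
        if m:
            # Collapse repeated spaces ("Dec  6") before parsing the stamp.
            stamp = " ".join(m["stamp"].split())
            ts = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
            yield ts, m["level"], m["msg"]


def find_gaps(entries, threshold=timedelta(minutes=30)):
    """Return (start, end) pairs where consecutive entries are far apart."""
    entries = sorted(entries)
    return [
        (a[0], b[0])
        for a, b in zip(entries, entries[1:])
        if b[0] - a[0] > threshold
    ]
```

Feeding it the lines of an exported log file, e.g. `find_gaps(parse_unbound_lines(open(path), 2019))`, should list the periods where unbound logged nothing, assuming the export keeps the original syslog text intact.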
-
@gawainxx said in DNS crashing every ~ 36 hours or so and unbound has to be restarted.:
host = gatewaysource = udp:7001sourcetype = pfsense:unboun
Hi,
Can you show the unbound.conf file?
It's here: /var/unbound/unbound.conf (and not in the root, /unbound.conf).
-
@Gertjan
Sorry, I had a derp moment and was accessing the wrong server.
##########################
# Unbound Configuration
##########################

##
# Server configuration
##
server:
chroot: /var/unbound
username: "unbound"
directory: "/var/unbound"
pidfile: "/var/run/unbound.pid"
use-syslog: yes
port: 53
verbosity: 1
hide-identity: yes
hide-version: yes
harden-glue: yes
do-ip4: yes
do-ip6: yes
do-udp: yes
do-tcp: yes
do-daemonize: yes
module-config: "validator iterator"
unwanted-reply-threshold: 0
num-queries-per-thread: 4096
jostle-timeout: 200
infra-host-ttl: 900
infra-cache-numhosts: 10000
outgoing-num-tcp: 10
incoming-num-tcp: 10
edns-buffer-size: 4096
cache-max-ttl: 86400
cache-min-ttl: 0
harden-dnssec-stripped: yes
msg-cache-size: 4m
rrset-cache-size: 8m
num-threads: 2
msg-cache-slabs: 2
rrset-cache-slabs: 2
infra-cache-slabs: 2
key-cache-slabs: 2
outgoing-range: 4096
#so-rcvbuf: 4m
auto-trust-anchor-file: /var/unbound/root.key
prefetch: no
prefetch-key: no
use-caps-for-id: no
serve-expired: no

# Statistics
# Unbound Statistics
statistics-interval: 0
extended-statistics: yes
statistics-cumulative: yes

# TLS Configuration
tls-cert-bundle: "/etc/ssl/cert.pem"

# Interface IP(s) to bind to
interface-automatic: yes
interface: 0.0.0.0
interface: ::0

# Outgoing interfaces to be used

# DNS Rebinding
# For DNS Rebinding prevention
private-address: 10.0.0.0/8
private-address: ::ffff:a00:0/104
private-address: 172.16.0.0/12
private-address: ::ffff:ac10:0/108
private-address: 169.254.0.0/16
private-address: ::ffff:a9fe:0/112
private-address: 192.168.0.0/16
private-address: ::ffff:c0a8:0/112
private-address: fd00::/8
private-address: fe80::/10

# Set private domains in case authoritative name server returns a Private IP address
private-domain: "_msdcs.britannia2.local"
domain-insecure: "_msdcs.britannia2.local"
private-domain: "britannia2.local"
domain-insecure: "britannia2.local"

# Access lists
include: /var/unbound/access_lists.conf

# Static host entries
include: /var/unbound/host_entries.conf

# dhcp lease entries
include: /var/unbound/dhcpleases_entries.conf

# Domain overrides
include: /var/unbound/domainoverrides.conf

# Unbound custom options
server:
# Allow plex to work over LAN
private-domain: "plex.direct"
# Configuration for Britannia2.local with the PDC of mordred.britannia2.local
local-data: "_ldap._tcp.your.britannia2.local 600 IN SRV 0 100 389 mordred.britannia2.local"
local-data: "_ldap._tcp.Default-First-Site-Name._sites.britannia2.local 600 IN SRV 0 100 389 mordred.britannia2.local"
local-data: "_ldap._tcp.pdc._msdcs.britannia2.local 600 IN SRV 0 100 389 mordred.britannia2.local"
local-data: "_ldap._tcp.gc._msdcs.britannia2.local 600 IN SRV 0 100 3268 mordred.britannia2.local"
local-data: "_ldap._tcp.Default-First-Site-Name._sites.gc._msdcs.britannia2.local 600 IN SRV 0 100 3268 mordred.britannia2.local"
local-data: "_ldap._tcp.30e36ab8-a6ac-4c64-85aa-0fbeb612a33b.domains._msdcs.britannia2.local 600 IN SRV 0 100 389 mordred.britannia2.local"
local-data: "d4f866aa-a210-4c29-81a2-ebb256bdef7d._msdcs.britannia2.local 600 IN CNAME mordred.britannia2.local"
local-data: "_kerberos._tcp.dc._msdcs.britannia2.local 600 IN SRV 0 100 88 mordred.britannia2.local"
local-data: "_kerberos._tcp.Default-First-Site-Name._sites.dc._msdcs.britannia2.local 600 IN SRV 0 100 88 mordred.britannia2.local"
local-data: "_ldap._tcp.dc._msdcs.britannia2.local 600 IN SRV 0 100 389 mordred.britannia2.local"
local-data: "_ldap._tcp.Default-First-Site-Name._sites.dc._msdcs.britannia2.local 600 IN SRV 0 100 389 mordred.britannia2.local"
local-data: "_kerberos._tcp.britannia2.local 600 IN SRV 0 100 88 mordred.britannia2.local"
local-data: "_kerberos._tcp.Default-First-Site-Name._sites.britannia2.local 600 IN SRV 0 100 88 mordred.britannia2.local"
local-data: "_gc._tcp.britannia2.local 600 IN SRV 0 100 3268 mordred.britannia2.local"
local-data: "_gc._tcp.Default-First-Site-Name._sites.britannia2.local 600 IN SRV 0 100 3268 mordred.britannia2.local"
local-data: "_kerberos._udp.britannia2.local 600 IN SRV 0 100 88 mordred.britannia2.local"
local-data: "_kpasswd._tcp.britannia2.local 600 IN SRV 0 100 464 mordred.britannia2.local"
local-data: "_kpasswd._udp.britannia2.local 600 IN SRV 0 100 464 mordred.britannia2.local"
local-data: "_ldap._tcp.ForestDnsZones.britannia2.local 600 IN SRV 0 100 389 mordred.britannia2.local"
local-data: "_ldap._tcp.Default-First-Site-Name._sites.ForestDnsZones.britannia2.local 600 IN SRV 0 100 389 mordred.britannia2.local"
local-data: "_ldap._tcp.DomainDnsZones.britannia2.local 600 IN SRV 0 100 389 mordred.britannia2.local"
local-data: "_ldap._tcp.Default-First-Site-Name._sites.DomainDnsZones.britannia2.local 600 IN SRV 0 100 389 mordred.britannia2.local"
local-data: "britannia2.local 600 IN A 192.168.4.5"
local-data: "britannia2.local 600 IN A 192.168.4.5"
local-data: "gc._msdcs.britannia2.local 600 IN A 192.168.4.5"
local-data: "gc._msdcs.britannia2.local 600 IN A 192.168.4.5"
local-data: "ForestDnsZones.britannia2.local 600 IN A 192.168.4.5"
local-data: "ForestDnsZones.britannia2.local 600 IN A 192.168.4.5"
local-data: "DomainDnsZones.britannia2.local 600 IN A 192.168.4.5"
local-data: "DomainDnsZones.britannia2.local 600 IN A 192.168.4.5"

###
# Remote Control Config
###
include: /var/unbound/remotecontrol.conf
-
Looks pretty normal to me.
-
Do you have pfBlocker installed with DNS-BL enabled? I don't see it in the conf file, but it would update the file, potentially causing a problem.
Steve
-
@gawainxx said in DNS crashing every ~ 36 hours or so and unbound has to be restarted.:
config file
https://github.com/NLnetLabs/unbound/blob/e828d678bafb7ef0df32623f6883bc4bdc07dc5b/daemon/unbound.c#L664
The config file path /unbound.conf is actually OK: the file name is relative to the chroot directory, so it refers to /var/unbound/unbound.conf.
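As a small illustration of that mapping, here is a Python sketch that turns a path as seen from inside the chroot into the path on the host. The helper name is made up for this example; the chroot directory is the one from the posted config.

```python
import posixpath


def host_path(chroot_dir, path_inside_chroot):
    """Map an absolute path as seen inside a chroot to the path on the host."""
    # Strip the leading "/" so join() does not discard the chroot prefix.
    relative = path_inside_chroot.lstrip("/")
    return posixpath.normpath(posixpath.join(chroot_dir, relative))


print(host_path("/var/unbound", "/unbound.conf"))  # /var/unbound/unbound.conf
```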
Did the chroot go wrong? => File system errors?
-
Died again, help plox!
-
Need more info to help further. What's logged in the system log when it fails? Or just before it fails?
-
@stephenw10 I'll grab those logs for you in a bit, once I'm able to access my network again; I'm currently remote and locked out due to the issue.
My logs are divided by process within Splunk.
Is there a specific process that would have the most relevant log data?
-
Here are some logs; they are CSVs renamed to .txt.
Unbound logs from 10:40 AM - 3 PM
1576040675_650.txt
"System" logs from 10:40 AM - 3 PM
1576040737_651.txt
I've also purged the DC-related entries from my unbound config to see if that makes a difference, as I really only use them for labs/training stuff. I also adjusted my firewall rules so that I can access the router web UI over OpenVPN; I would have saved myself a lot of headache if I could have just restarted the service that way.
-
unbound is stopped and restarted.
More logs are needed to see which process is doing this. It could also be a hardware event like a "LINK UP / LINK UP"
Btw: this "Splunked" unbound log is close to totally unreadable. Is it possible to see the original one?
And while testing, can Snort be sent on holiday? What is Snort protecting?
-
@Gertjan said in DNS crashing every ~ 36 hours or so and unbound has to be restarted.:
File system errors ?
?
-
Yeah very hard to read that. You should export the snort logs separately and not log the main system log, that makes it much easier to see actual system events.
But anyway, nothing seems to be logged there, so there's not much to go on.
Running a filesystem check is probably a good idea.
Steve
-
@stephenw10
Unfortunately the system logs have already looped and only go as far back as this morning.
I've set up Service Watchdog to monitor the unbound process, which will hopefully prevent the issue from causing extended outages while I work on getting it figured out. Snort is protecting my home network as well as a few miscellaneous things; it's mostly running for added security, and I'm using one of the lighter pre-defined Snort ruleset bundles.
I've exported the data from Splunk in a raw format; perhaps that will be closer to the original?
Here are my logs from yesterday. The outage was around 10:56 AM, after which there is an absolute absence of unbound log data until I had someone at home restart the server via console.
UnboundIssues_SystemLogs.txt
UnboundIssues_UnboundLogs.txt
UnboundIssues_SnortLogs.txt
-
Nothing logged but that file error seems like a permissions issue.
I would definitely run the file system check. I would consider just reinstalling and restoring, it's usually pretty quick.
Steve
-
Thanks,
Here are the things I'm currently planning to do, in order, moving to the next one if I see the service failure in the logs afterwards.
- Gutting all non-critical code from my unbound.conf (awaiting results on this currently).
- SSHing to the router and running a filesystem check.
- Toggling snort
- Toggling Avahi
- Toggling NUT
- Reload and restore
Seem fair?
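To tell which of these steps actually makes a difference, it helps to know exactly when the resolver stops answering. Below is a rough Python sketch of a probe that sends a single DNS query over UDP and reports whether any reply comes back. It uses only the standard library; the function names, the query id, and the `example.com` test name are assumptions for illustration, and the server address would be the router's LAN IP.

```python
import socket
import struct


def build_query(name, query_id=0x1234):
    """Build a minimal DNS query packet for an A record (RFC 1035)."""
    # Header: id, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE 1 = A, QCLASS 1 = IN.
    return header + qname + struct.pack("!HH", 1, 1)


def is_valid_reply(packet, query_id=0x1234):
    """True if the packet is a response (QR bit set) to our query id."""
    if len(packet) < 12:
        return False
    reply_id, flags = struct.unpack("!HH", packet[:4])
    return reply_id == query_id and bool(flags & 0x8000)


def resolver_alive(server, name="example.com", timeout=2.0):
    """Send one UDP query to server:53 and report whether a reply arrives."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name), (server, 53))
            packet, _ = sock.recvfrom(512)
        except OSError:  # includes timeouts and unreachable hosts
            return False
    return is_valid_reply(packet)
```

Calling `resolver_alive()` from a LAN machine every minute or so and logging the timestamp of each `False` result would give a failure time to line up against the system log, independent of whether unbound itself logs anything.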
-
@gawainxx said in DNS crashing every ~ 36 hours or so and unbound has to be restarted.:
Thanks,
Here are the things I'm currently planning to do, in order, moving to the next one if I see the service failure in the logs afterwards.
- Gutting all non-critical code from my unbound.conf (awaiting results on this currently).
- SSHing to the router and running a filesystem check.
- Toggling snort
- Toggling Avahi
- Toggling NUT
- Reload and restore
Seem fair?
Just an FYI. Snort and Unbound have absolutely nothing to do with each other in terms of Unbound starting or stopping. However, the DNSBL function of pfBlockerNG does rewrite the unbound.conf file and that can lead to Unbound issues.
While troubleshooting it is certainly prudent to stop Snort to remove that variable, but Snort running or not will have no impact on Unbound stopping and failing to restart.
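One way to check whether anything rewrites the config shortly before a failure is to poll the file and note when its content changes. The following is only a sketch, assuming Python is available on the machine where it runs; the function names and the 60-second interval are arbitrary choices, and the path to watch would be /var/unbound/unbound.conf as mentioned earlier in the thread.

```python
import hashlib
import time
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file, or None if it is unreadable."""
    try:
        return hashlib.sha256(Path(path).read_bytes()).hexdigest()
    except OSError:
        return None


def watch(path, interval=60, rounds=None):
    """Print a timestamped line whenever the file's content changes."""
    last = file_digest(path)
    done = 0
    # rounds=None means poll forever; a number limits the polling cycles.
    while rounds is None or done < rounds:
        time.sleep(interval)
        current = file_digest(path)
        if current != last:
            print(time.strftime("%Y-%m-%d %H:%M:%S"), path, "changed")
            last = current
        done += 1
```

A digest of `None` means the file was missing or unreadable at that moment, which would itself be worth noting given the "Could not read config file" error.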