Unbound Failing after Short Outage
-
For some reason I am losing DNS Resolution after a short ISP outage. I'm hoping someone can give me some guidance as to how make sure that unbound restarts promptly when internet comes back after an outage. I have highlighted my questions.
I'm running pfSense community edition 2.6.0 on a generic J1900 box with Intel NICs and 4GB of RAM. Resource utilization doesn't seem to be an issue.
My wife complained of "no internet", and when I logged into pfSense and did a ping 8.8.8.8 I got ping replies so I restarted unbound. After doing so things were working again.
Here's my analysis of what I believe what happened (I have posted more complete logs below for context).
WAN em0 experienced a short outage:
Nov 1 10:46:21 fwrouter rc.gateway_alarm[8390]: >>> Gateway alarm: WAN_DHCP (Addr:8.8.8.8 Alarm:1 RTT:17.900ms RTTsd:2.951ms Loss:21%) Nov 1 10:51:46 fwrouter kernel: em0: link state changed to DOWN Nov 1 10:51:53 fwrouter kernel: em0: link state changed to UP
FYI this message is from a watchdog script that I built for my VPN:
Nov 1 10:50:01 fwrouter root[25004]: ***** 22/11/01-10:50:00-VPN_UDP_TOS_VPNV4-Attempting to issue restart to openvpn service... openvpn has been restarted.
I had trouble awhile back with the VPN losing connectivity, and a manual restart of the OpenVPN service was required to get it running agaion, so I just wrote a simple script that restarts the VPN every 5 minutes if it goes down as long as the WAN is up. Given the time stamp and the Gateway alarm, it appears as if connectivity was lost about 10:45
Unbound Crashed:
Nov 1 10:50:33 fwrouter php-fpm[47855]: /rc.linkup: The command '/usr/local/sbin/unbound -c /var/unbound/unbound.conf' returned exit code '1', the output was '[1667314233] unbound[36793:0] error: bind: address already in use [1667314233] unbound[36793:0] fatal error: could not open ports'
What is this? I assume it is due to the Unbound issue.
Nov 1 10:50:34 fwrouter php-fpm[47855]: /rc.linkup: The command '/usr/sbin/arp -s '10.10.50.1' '00:28:1a:XX:XX:X5'' returned exit code '1', the output was 'arp: writing to routing socket: Operation not permitted'
What is this? Does this mean Unbound is running out of cache memory?
debug: cache memory msg=516099 rrset=1547265 infra=578247 val=0Is this is because the link is down? Or is there likely something else going on?
Nov 1 10:48:36 guardian unbound[74982]: [74982:0] debug: request nleditor.osi.office.net. has exceeded the maximum number of glue fetches 60 to a single delegation point Nov 1 10:48:36 guardian unbound[74982]: [74982:0] debug: request nleditor.osi.office.net. has exceeded the maximum number of glue fetches 60 to a single delegation point
I have noticed a number of posts regarding unbound being "somewhat unreliable", but I was wondering if this is likely due to misconfiguration on my part and/or if there is a good workaround so I don't have to deal with a long term internet outage?
Any suggestions/assistance is much appreciated, and I have included more log info below.
Relevent syslog Entries:
Nov 1 10:46:21 fwrouter rc.gateway_alarm[8390]: >>> Gateway alarm: WAN_DHCP (Addr:8.8.8.8 Alarm:1 RTT:17.900ms RTTsd:2.951ms Loss:21%) Nov 1 10:46:21 fwrouter check_reload_status[396]: updating dyndns WAN_DHCP Nov 1 10:46:21 fwrouter check_reload_status[396]: Restarting IPsec tunnels Nov 1 10:46:21 fwrouter check_reload_status[396]: Restarting OpenVPN tunnels/interfaces Nov 1 10:46:21 fwrouter check_reload_status[396]: Reloading filter Nov 1 10:46:21 fwrouter rc.gateway_alarm[9850]: >>> Gateway alarm: VPN_UDP_TOS_VPNV4 (Addr:8.8.4.4 Alarm:1 RTT:36.706ms RTTsd:2.263ms Loss:22%) Nov 1 10:46:21 fwrouter check_reload_status[396]: updating dyndns VPN_UDP_TOS_VPNV4 Nov 1 10:46:21 fwrouter check_reload_status[396]: Restarting IPsec tunnels Nov 1 10:46:21 fwrouter check_reload_status[396]: Restarting OpenVPN tunnels/interfaces Nov 1 10:46:21 fwrouter check_reload_status[396]: Reloading filter Nov 1 10:46:22 fwrouter php-fpm[33266]: /rc.openvpn: Gateway, none 'available' for inet6, use the first one configured. '' Nov 1 10:46:22 fwrouter php-fpm[33266]: /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WAN_DHCP. Nov 1 10:46:22 fwrouter php-fpm[47855]: /rc.openvpn: Gateway, none 'available' for inet6, use the first one configured. '' Nov 1 10:46:22 fwrouter php-fpm[47855]: /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use VPN_UDP_TOS_VPNV4. Nov 1 10:48:32 fwrouter kernel: em0: link state changed to DOWN Nov 1 10:48:32 fwrouter check_reload_status[396]: Linkup starting em0 Nov 1 10:48:33 fwrouter php-fpm[80175]: /rc.linkup: DEVD Ethernet detached event for wan Nov 1 10:48:34 fwrouter php-fpm[80175]: /rc.linkup: Removing static route for monitor 8.8.4.4 and adding a new route through 10.15.110.7 Nov 1 10:48:34 fwrouter check_reload_status[396]: Reloading filter Nov 1 10:48:36 fwrouter rc.gateway_alarm[96558]: >>> Gateway alarm: VPN_UDP_TOS_VPNV4 (Addr:8.8.4.4 Alarm:1 RTT:0.000ms RTTsd:0.000ms Loss:100%) Nov 1 10:48:36 fwrouter check_reload_status[396]: updating dyndns VPN_UDP_TOS_VPNV4 Nov 1 10:48:36 fwrouter check_reload_status[396]: Restarting IPsec tunnels Nov 1 10:48:36 fwrouter check_reload_status[396]: Restarting OpenVPN tunnels/interfaces Nov 1 10:48:36 fwrouter check_reload_status[396]: Reloading filter Nov 1 10:48:37 fwrouter php-fpm[33266]: /rc.openvpn: Gateway, none 'available' for inet6, use the first one configured. '' Nov 1 10:48:37 fwrouter php-fpm[33266]: /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use VPN_UDP_TOS_VPNV4. Nov 1 10:48:39 fwrouter check_reload_status[396]: Linkup starting em0 Nov 1 10:48:39 fwrouter kernel: em0: link state changed to UP Nov 1 10:48:40 fwrouter php-fpm[47855]: /rc.linkup: DEVD Ethernet attached event for wan Nov 1 10:48:40 fwrouter php-fpm[47855]: /rc.linkup: HOTPLUG: Configuring interface wan Nov 1 10:49:00 fwrouter sshguard[15203]: Exiting on signal. Nov 1 10:49:00 fwrouter sshguard[86330]: Now monitoring attacks. Nov 1 10:49:21 fwrouter kernel: ovpnc1: link state changed to DOWN Nov 1 10:49:21 fwrouter check_reload_status[396]: Reloading filter Nov 1 10:49:51 fwrouter check_reload_status[396]: rc.newwanip starting em0 Nov 1 10:49:51 fwrouter php-fpm[47855]: /rc.linkup: Gateway, none 'available' for inet6, use the first one configured. '' Nov 1 10:49:51 fwrouter check_reload_status[396]: Restarting IPsec tunnels Nov 1 10:49:52 fwrouter php-fpm[17928]: /rc.newwanip: rc.newwanip: Info: starting on em0. Nov 1 10:49:52 fwrouter php-fpm[17928]: /rc.newwanip: rc.newwanip: on (IP address: 99.XXX.XXX.XXX) (interface: WAN[wan]) (real interface: em0). Nov 1 10:49:52 fwrouter check_reload_status[396]: Reloading filter Nov 1 10:49:58 fwrouter php-fpm[47855]: /rc.linkup: The command '/usr/sbin/arp -s '10.10.50.1' '00:28:1a:XX:XX:X5'' returned exit code '1', the output was 'arp: writing to routing socket: Operation not permitted' Nov 1 10:49:59 fwrouter check_reload_status[396]: updating dyndns wan Nov 1 10:50:00 fwrouter sshguard[86330]: Exiting on signal. Nov 1 10:50:00 fwrouter sshguard[82308]: Now monitoring attacks. Nov 1 10:50:01 fwrouter check_reload_status[396]: Reloading filter Nov 1 10:50:01 fwrouter root[25004]: ***** 22/11/01-10:50:00-VPN_UDP_TOS_VPNV4-Attempting to issue restart to openvpn service... openvpn has been restarted. Nov 1 10:50:33 fwrouter php-fpm[47855]: /rc.linkup: The command '/usr/local/sbin/unbound -c /var/unbound/unbound.conf' returned exit code '1', the output was '[1667314233] unbound[36793:0] error: bind: address already in use [1667314233] unbound[36793:0] fatal error: could not open ports' Nov 1 10:50:34 fwrouter php-fpm[47855]: /rc.linkup: The command '/usr/sbin/arp -s '10.10.50.1' '00:28:1a:XX:XX:X5'' returned exit code '1', the output was 'arp: writing to routing socket: Operation not permitted' Nov 1 10:50:34 fwrouter check_reload_status[396]: Reloading filter Nov 1 10:51:46 fwrouter kernel: em0: link state changed to DOWN Nov 1 10:51:46 fwrouter check_reload_status[396]: Linkup starting em0 Nov 1 10:51:47 fwrouter php-fpm[38020]: /rc.linkup: DEVD Ethernet detached event for wan Nov 1 10:51:48 fwrouter check_reload_status[396]: Reloading filter Nov 1 10:51:53 fwrouter check_reload_status[396]: Linkup starting em0 Nov 1 10:51:53 fwrouter kernel: em0: link state changed to UP Nov 1 10:51:54 fwrouter php-fpm[91311]: /rc.linkup: DEVD Ethernet attached event for wan Nov 1 10:51:54 fwrouter php-fpm[91311]: /rc.linkup: HOTPLUG: Configuring interface wan Nov 1 10:53:06 fwrouter check_reload_status[396]: rc.newwanip starting em0 Nov 1 10:53:06 fwrouter php-fpm[91311]: /rc.linkup: Gateway, none 'available' for inet6, use the first one configured. '' Nov 1 10:53:06 fwrouter check_reload_status[396]: Restarting IPsec tunnels Nov 1 10:53:07 fwrouter php-fpm[80175]: /rc.newwanip: rc.newwanip: Info: starting on em0. Nov 1 10:53:07 fwrouter php-fpm[80175]: /rc.newwanip: rc.newwanip: on (IP address: 99.XXX.XXX.XXX) (interface: WAN[wan]) (real interface: em0). Nov 1 10:53:07 fwrouter check_reload_status[396]: Reloading filter Nov 1 10:53:12 fwrouter php-fpm[91311]: /rc.linkup: The command '/usr/sbin/arp -s '10.10.50.1' '00:28:1a:XX:XX:X5'' returned exit code '1', the output was 'arp: writing to routing socket: Operation not permitted' Nov 1 10:53:12 fwrouter check_reload_status[396]: updating dyndns wan Nov 1 10:53:46 fwrouter php-fpm[91311]: /rc.linkup: The command '/usr/local/sbin/unbound -c /var/unbound/unbound.conf' returned exit code '1', the output was '[1667314426] unbound[41349:0] error: bind: address already in use [1667314426] unbound[41349:0] fatal error: could not open ports' Nov 1 10:53:47 fwrouter php-fpm[91311]: /rc.linkup: The command '/usr/sbin/arp -s '10.10.50.1' '00:28:1a:XX:XX:X5'' returned exit code '1', the output was 'arp: writing to routing socket: Operation not permitted' Nov 1 10:53:47 fwrouter check_reload_status[396]: Reloading filter
Sample resolver.log Entries:
Nov 1 10:47:59 guardian unbound[74982]: [74982:2] debug: sending to target: <com.> 192.52.178.30#53 Nov 1 10:47:59 guardian unbound[74982]: [74982:2] debug: cache memory msg=516099 rrset=1547265 infra=578247 val=0 Nov 1 10:48:00 guardian unbound[74982]: [74982:3] debug: iterator[module 0] operate: extstate:module_wait_reply event:module_event_noreply Nov 1 10:48:00 guardian unbound[74982]: [74982:3] info: iterator operate: query canadaeast1-0.pushnp.svc.ms. A IN Nov 1 10:48:00 guardian unbound[74982]: [74982:3] info: processQueryTargets: canadaeast1-0.pushnp.svc.ms. A IN Nov 1 10:48:00 guardian unbound[74982]: [74982:3] info: sending query: canadaeast1-0.pushnp.svc.ms. A IN Nov 1 10:48:00 guardian unbound[74982]: [74982:3] debug: sending to target: <ms.> 185.28.194.194#53 Nov 1 10:48:00 guardian unbound[74982]: [74982:3] debug: cache memory msg=516099 rrset=1547265 infra=578247 val=0 Nov 1 10:48:00 guardian unbound[74982]: [74982:2] debug: cache memory msg=516099 rrset=1547265 infra=578247 val=0 Nov 1 10:48:00 guardian unbound[74982]: [74982:1] debug: cache memory msg=516099 rrset=1547265 infra=578247 val=0 Nov 1 10:48:00 guardian unbound[74982]: [74982:0] debug: iterator[module 0] operate: extstate:module_wait_reply event:module_event_noreply Nov 1 10:48:00 guardian unbound[74982]: [74982:0] info: iterator operate: query ssl.gstatic.com. A IN Nov 1 10:48:00 guardian unbound[74982]: [74982:0] info: processQueryTargets: ssl.gstatic.com. A IN Nov 1 10:48:00 guardian unbound[74982]: [74982:0] info: sending query: ssl.gstatic.com. A IN Nov 1 10:48:00 guardian unbound[74982]: [74982:0] debug: sending to target: <gstatic.com.> 216.239.36.10#53 Nov 1 10:48:00 guardian unbound[74982]: [74982:0] debug: cache memory msg=516099 rrset=1547265 infra=578247 val=0 Nov 1 10:48:01 guardian unbound[74982]: [74982:3] debug: iterator[module 0] operate: extstate:module_wait_reply event:module_event_noreply [SNIP] Nov 1 10:48:36 guardian unbound[74982]: [74982:0] info: sending query: nleditor.osi.office.net. A IN Nov 1 10:48:36 guardian unbound[74982]: [74982:0] debug: sending to target: <net.> 192.31.80.30#53 Nov 1 10:48:36 guardian unbound[74982]: [74982:0] debug: error sending query to auth server 192.31.80.30 port 53 Nov 1 10:48:36 guardian unbound[74982]: [74982:0] info: processQueryTargets: nleditor.osi.office.net. A IN Nov 1 10:48:36 guardian unbound[74982]: [74982:0] debug: request nleditor.osi.office.net. has exceeded the maximum number of glue fetches 60 to a single delegation point [SNIP] Nov 1 10:48:36 guardian unbound[74982]: [74982:0] info: sending query: . NS IN Nov 1 10:48:36 guardian unbound[74982]: [74982:0] debug: sending to target: <.> 193.0.14.129#53 Nov 1 10:48:36 guardian unbound[74982]: [74982:0] debug: error sending query to auth server 193.0.14.129 port 53 Nov 1 10:48:36 guardian unbound[74982]: [74982:0] info: processQueryTargets: . NS IN Nov 1 10:48:36 guardian unbound[74982]: [74982:0] debug: request . has exceeded the maximum number of glue fetches 34 to a single delegation point Nov 1 10:48:36 guardian unbound[74982]: [74982:0] info: sending query: . NS IN [SNIP] Nov 1 10:49:00 guardian unbound[74982]: [74982:2] info: processQueryTargets: . NS IN Nov 1 10:49:00 guardian unbound[74982]: [74982:2] info: new pside target D.ROOT-SERVERS.NET. A IN Nov 1 10:49:00 guardian unbound[74982]: [74982:2] debug: iterator[module 0] operate: extstate:module_state_initial event:module_event_pass Nov 1 10:49:00 guardian unbound[74982]: [74982:2] info: iterator operate: query D.ROOT-SERVERS.NET. A IN Nov 1 10:49:00 guardian unbound[74982]: [74982:2] info: resolving D.ROOT-SERVERS.NET. A IN Nov 1 10:49:00 guardian unbound[74982]: [74982:2] info: priming . IN NS Nov 1 10:49:00 guardian unbound[74982]: [74982:2] debug: return error response REFUSED Nov 1 10:49:00 guardian unbound[74982]: [74982:2] debug: iterator[module 0] operate: extstate:module_wait_subquery event:module_event_pass Nov 1 10:49:00 guardian unbound[74982]: [74982:2] info: iterator operate: query . NS IN Nov 1 10:49:00 guardian unbound[74982]: [74982:2] info: processQueryTargets: . NS IN Nov 1 10:49:00 guardian unbound[74982]: [74982:2] info: sending query: . NS IN Nov 1 10:49:00 guardian unbound[74982]: [74982:2] debug: sending to target: <.> 199.9.14.201#53 Nov 1 10:49:00 guardian unbound[74982]: [74982:2] debug: error sending query to auth server 199.9.14.201 port 53 Nov 1 10:49:00 guardian unbound[74982]: [74982:2] info: processQueryTargets: . NS IN Nov 1 10:49:00 guardian unbound[74982]: [74982:2] info: sending query: . NS IN Nov 1 10:49:00 guardian unbound[74982]: [74982:2] debug: sending to target: <.> 192.203.230.10#53 Nov 1 10:49:00 guardian unbound[74982]: [74982:2] debug: error sending query to auth server 192.203.230.10 port 53 Nov 1 10:49:00 guardian unbound[74982]: [74982:2] info: processQueryTargets: . NS IN Nov 1 10:49:00 guardian unbound[74982]: [74982:2] info: sending query: . NS IN Nov 1 10:49:00 guardian unbound[74982]: [74982:2] debug: sending to target: <.> 192.36.148.17#53 Nov 1 10:49:00 guardian unbound[74982]: [74982:2] debug: error sending query to auth server 192.36.148.17 port 53 Nov 1 10:49:00 guardian unbound[74982]: [74982:2] info: processQueryTargets: . NS IN Nov 1 10:49:00 guardian unbound[74982]: [74982:2] info: sending query: . NS IN Nov 1 10:49:00 guardian unbound[74982]: [74982:2] debug: sending to target: <.> 192.33.4.12#53 Nov 1 10:49:00 guardian unbound[74982]: [74982:2] debug: error sending query to auth server 192.33.4.12 port 53 Nov 1 10:49:00 guardian unbound[74982]: [74982:2] info: processQueryTargets: . NS IN [SNIP] Nov 1 10:49:00 guardian unbound[74982]: [74982:1] debug: iterator[module 0] operate: extstate:module_wait_subquery event:module_event_pass Nov 1 10:49:00 guardian unbound[74982]: [74982:1] info: iterator operate: query m.root-servers.net. A IN Nov 1 10:49:00 guardian unbound[74982]: [74982:1] info: processQueryTargets: m.root-servers.net. A IN Nov 1 10:49:00 guardian unbound[74982]: [74982:1] debug: request has exceeded the maximum number of nxdomain nameserver lookups with 6 Nov 1 10:49:00 guardian unbound[74982]: [74982:1] debug: return error response SERVFAIL [SNIP] Nov 1 10:49:00 guardian unbound[74982]: [74982:3] info: sending query: . NS IN Nov 1 10:49:00 guardian unbound[74982]: [74982:2] info: processQueryTargets: shadow.mxrouting.net. AAAA IN Nov 1 10:49:00 guardian unbound[74982]: [74982:1] debug: sending to target: <org.> 199.19.54.1#53 Nov 1 10:49:00 guardian unbound[74982]: [74982:0] debug: sending to target: <.> 202.12.27.33#53 Nov 1 10:49:00 guardian unbound[74982]: [74982:0] debug: error sending query to auth server 202.12.27.33 port 53 Nov 1 10:49:00 guardian unbound[74982]: [74982:2] info: sending query: shadow.mxrouting.net. AAAA IN Nov 1 10:49:00 guardian unbound[74982]: [74982:1] debug: error sending query to auth server 199.19.54.1 port 53 Nov 1 10:49:00 guardian unbound[74982]: [74982:3] debug: sending to target: <.> 198.97.190.53#53
-
i was having similar symptoms for months after an isp change. elusive, intermittent failures. certain sites faring well, others almost unusable. then a day later, seemingly random redistribution of problematic sites and working sites.
figured out my isp was handing out ip6 and ip4 addresses to dhcp clients, while only allowing port 53 traffic over ip4 for some ungodly reason.
SOLVED: putting "do-ip6: no" in unbound.conf cleared everything right up. AAAA records still come back but everything port 53 up and downstream of my resolver now happens over ip4.
sadly i guess i cannot access any of the zero domains known to me with their authoritative nameservers only available via ip6 unless i use my isps cache (lazy bums actually only offer up google's in the dhcp response anyway; no thank you)