Unbound ipv6 problems anyone?
-
@w0w:
I think it can be related to https://forum.pfsense.org/index.php?topic=133129.0 not directly but somehow…
I don't see any other changes regarding unbound last days.This has been going on for more than a month, so it's not just from the past few days.
-
not on latest snapshot but am using 2.4, no issues with ipv6 lookups using unbound resolver.
-
I just updated and for the first time in a long time, nslookup and dig both worked without restarting unbound. However, I rebooted again to see if it would stick and unfortunately, both were broken again until after I restarted unbound.
-
I suggest you examine your logs to look for indicators of whats going on.
In unbound settings, disable options to bind to specific interfaces (so it binds to everything) as that can cause the problem you are reporting, basically it starts before ipv6 is up so doesnt bind to it.
-
I noticed watching the console while the system was rebooting that unbound starts well before dhcpv6. I am using the do not wait for RA setting, so PD does not happen until dhcp6c starts. That may be a problem, because dhcp6c and unbound are starting almost concurrently, as you can see from the log messages (in reverse chronological order).
All of the unbound settings are default. I have not enabled any options to bind to specific interfaces.
Here are the first few log messages from unbound after booting (note error messages):
Jul 18 18:44:30 unbound 23934:1 info: query response was ANSWER Jul 18 18:44:30 unbound 23934:1 info: reply from <.> 192.33.4.12#53 Jul 18 18:44:30 unbound 23934:1 info: response for . DNSKEY IN Jul 18 18:44:30 unbound 23934:1 info: resolving . DNSKEY IN Jul 18 18:44:30 unbound 23934:1 info: prime trust anchor Jul 18 18:44:30 unbound 23934:1 info: query response was NXDOMAIN ANSWER Jul 18 18:44:30 unbound 23934:1 info: reply from <.> 192.33.4.12#53 Jul 18 18:44:30 unbound 23934:1 info: response for localdomain. AAAA IN Jul 18 18:44:30 unbound 23934:1 info: priming successful for . NS IN Jul 18 18:44:30 unbound 23934:1 info: query response was ANSWER Jul 18 18:44:30 unbound 23934:1 info: reply from <.> 192.36.148.17#53 Jul 18 18:44:30 unbound 23934:1 info: response for . NS IN Jul 18 18:44:29 unbound 23934:1 info: error sending query to auth server 2001:500:84::b port 53 Jul 18 18:44:29 unbound 23934:1 info: error sending query to auth server 2001:500:1::53 port 53 Jul 18 18:44:29 unbound 23934:1 info: error sending query to auth server 2001:500:2f::f port 53 Jul 18 18:44:29 unbound 23934:1 info: priming . IN NS Jul 18 18:44:29 unbound 23934:1 info: resolving localdomain. AAAA IN Jul 18 18:44:27 unbound 23934:0 info: start of service (unbound 1.6.3). Jul 18 18:44:27 unbound 23934:0 notice: init module 1: iterator Jul 18 18:44:27 unbound 23934:0 notice: init module 0: validator
Here are the first few log messages from dhcp after booting:
Jul 18 18:44:28 dhcp6c 12378 add an address 2001:a:b:c:215:5dff:fe5c:e21e/64 on hn0 Jul 18 18:44:28 dhcp6c 12378 dhcp6c Received REQUEST Jul 18 18:44:27 dhcp6c 12378 Sending Request Jul 18 18:44:26 dhcp6c 12378 Sending Solicit Jul 18 18:44:25 dhcp6c 12058 skip opening control port Jul 18 18:44:25 dhcp6c 12058 failed initialize control message authentication Jul 18 18:44:25 dhcp6c 12058 failed to open /usr/local/etc/dhcp6cctlkey: No such file or directory Jul 18 18:44:22 dhclient Creating resolv.conf Jul 18 18:44:22 dhclient /sbin/route add default 154.20.116.1 Jul 18 18:44:22 dhclient Adding new routes to interface: hn1 Jul 18 18:44:22 dhclient New Routers (hn1): 154.20.116.1 Jul 18 18:44:22 dhclient New Broadcast Address (hn1): 154.20.119.255 Jul 18 18:44:22 dhclient New Subnet Mask (hn1): 255.255.252.0 Jul 18 18:44:22 dhclient New IP Address (hn1): 154.20.118.8 Jul 18 18:44:22 dhclient ifconfig hn1 inet 154.20.118.8 netmask 255.255.252.0 broadcast 154.20.119.255 Jul 18 18:44:22 dhclient Starting add_new_address() Jul 18 18:44:22 dhclient REBOOT Jul 18 18:44:22 dhclient PREINIT
Here are the first few log messages from unbound after restarting unbound (note no error messages):
Jul 18 18:47:35 unbound 90108:1 info: query response was REFERRAL Jul 18 18:47:35 unbound 90108:1 info: reply from <.> 198.97.190.53#53 Jul 18 18:47:35 unbound 90108:1 info: response for ns1.netgate.com. AAAA IN Jul 18 18:47:35 unbound 90108:1 info: query response was REFERRAL Jul 18 18:47:35 unbound 90108:1 info: reply from <.> 2001:7fd::1#53 Jul 18 18:47:35 unbound 90108:1 info: response for ns1.netgate.com. A IN Jul 18 18:47:34 unbound 90108:1 info: query response was REFERRAL Jul 18 18:47:34 unbound 90108:1 info: reply from <.> 2001:500:12::d0d#53 Jul 18 18:47:34 unbound 90108:1 info: response for ns2.netgate.com. A IN Jul 18 18:47:34 unbound 90108:1 info: resolving ns1.netgate.com. A IN Jul 18 18:47:34 unbound 90108:1 info: resolving ns2.netgate.com. AAAA IN Jul 18 18:47:34 unbound 90108:1 info: resolving ns2.netgate.com. A IN Jul 18 18:47:34 unbound 90108:1 info: resolving ns1.netgate.com. AAAA IN Jul 18 18:47:34 unbound 90108:1 info: query response was REFERRAL Jul 18 18:47:34 unbound 90108:1 info: reply from <org.> 2001:500:e::1#53 Jul 18 18:47:34 unbound 90108:1 info: response for _https._tcp.beta.pfsense.org. SRV IN Jul 18 18:47:34 unbound 90108:1 info: query response was REFERRAL Jul 18 18:47:34 unbound 90108:1 info: reply from <.> 192.5.5.241#53 Jul 18 18:47:34 unbound 90108:1 info: response for _https._tcp.beta.pfsense.org. SRV IN Jul 18 18:47:34 unbound 90108:1 info: priming successful for . NS IN Jul 18 18:47:34 unbound 90108:1 info: query response was ANSWER Jul 18 18:47:34 unbound 90108:1 info: reply from <.> 2001:500:12::d0d#53 Jul 18 18:47:34 unbound 90108:1 info: response for . NS IN Jul 18 18:47:34 unbound 90108:1 info: priming . IN NS Jul 18 18:47:34 unbound 90108:1 info: resolving _https._tcp.beta.pfsense.org. SRV IN Jul 18 18:47:32 unbound 90108:0 info: start of service (unbound 1.6.3). Jul 18 18:47:32 unbound 90108:0 notice: init module 1: iterator Jul 18 18:47:32 unbound 90108:0 notice: init module 0: validator</org.>
-
It would not be difficult to force Unbound to restart on a valid update from dhcp6c, the changes implemented a while back means that it's now possible to differentiate between an a refresh or new lease. However, Chris was having a lot of issues with unbound that I think are now fine and I am not sure if an Unbound restart on a new lease might bring his problems back.
If you like, I can have a look and send out a patch to you and Chris to see if it works for both of you.
The other option is a flag to enable or disable an unbound restart on a new ipv6 lease.
Thoughts.
-
please no :)
unbound restarting on every dhcp6 renew is crazy.
As long as the renew keeps the same prefix/ip I dont see why unbound would have a problem.
Martin I will test the patch tho. As I expect a new lease wont be triggered on a renew, as long as you also exclude a rebind.
-
:) I did not intend to restart it on a renew or rebind, that would be silly.
-
yeah I know you will find a solution that works for everyone, email me when ready with the patch I will test no problem. :)
Your optional flag seems a good idea.
-
I'll try and find time tomorrow… if I can stop playing Elite Dangerous :o
-
Something else is afoot here. I have rebooted multiple times, using dhcp6 before RA, and I cannot make it fall over.
Bimmer, is it possible that sometimes there is a slow response from your ISP for the v6 lease, hence the 'sometimes' its OK?
It maybe that if that is the case that there is a bit of a catch 22 going on, in that rc.newwanipv6 cannot restart unbound as it's still in the booting up process. Also, and I've not delved into that bit of the code before, unbound is being killed and restarted, if already running, rather than a SIGHUP; I need to study that a bit more to see if there is a reason for that.
- note *
Having studied and read Jimp's note that unbound should be killed then that answers that question. Whilst there is already work going on in this area I will not mess with it, too many cooks….
-
Sorry for the slow response. I've been away all week… I'll try running wireshark to see what's happening. I suspect it will the same as it always was. I have a suspicion this problem has been present all along, but I haven't noticed it because unbound was getting restarted so often. Possibly now that it's not getting restarted, the problem is visible. That would explain why I didn't notice it before recently, along with the other changes.
-
I captured icmpv6 and dhcpv6 packets with wireshark during a reboot. Everything appears the way it always has. There are no significant delays in any of the replies.
-
I created bug #7750 "unbound refuses ipv6 queries after reboot" for this issue.
-
I just did another reboot to capture log info. It's consistent that unbound starts before the PD process has completed.
Aug 2 13:12:15 dhcp6c 11947 add an address 2001:*:*:*:215:5dff:fe5c:e21e/64 on hn0 Aug 2 13:12:15 dhcp6c 11947 dhcp6c Received REQUEST Aug 2 13:12:15 dhcp6c 11947 Sending Request Aug 2 13:12:14 dhcp6c 11947 Sending Solicit Aug 2 13:12:13 dhcp6c 11867 skip opening control port Aug 2 13:12:13 dhcp6c 11867 failed initialize control message authentication Aug 2 13:12:13 dhcp6c 11867 failed to open /usr/local/etc/dhcp6cctlkey: No such file or directory
Aug 2 13:12:27 unbound 24093:1 info: response for . NS IN Aug 2 13:12:27 unbound 24093:1 info: priming . IN NS Aug 2 13:12:27 unbound 24093:1 info: resolving ipv6.msftconnecttest.com. A IN Aug 2 13:12:27 unbound 24093:1 info: priming . IN NS Aug 2 13:12:27 unbound 24093:1 info: resolving ipv6.msftconnecttest.com. AAAA IN Aug 2 13:12:27 unbound 24093:1 info: priming . IN NS Aug 2 13:12:27 unbound 24093:1 info: resolving 0.pfsense.pool.ntp.org. A IN Aug 2 13:12:14 unbound 24093:0 info: start of service (unbound 1.6.3). Aug 2 13:12:14 unbound 24093:0 notice: init module 1: iterator Aug 2 13:12:14 unbound 24093:0 notice: init module 0: validator
-
I've been trying to recreate this, as I'm also on Telus using dhcp6c before RA, but so far no luck, if I reboot from an upgrade or otherwise, everything still works fine. For curiosity sake, I even looked to see when my unbound is starting and it happens after I get my prefix. I'm running on the latest 2.4 snapshot. I'm in AB on FTTH with the T3200M bridged to port1 with a bare metal install on an SG-2440.
Aug 2 17:10:24 dhcp6c 23456 dhcp6c Received REQUEST Aug 2 17:10:24 dhcp6c 23456 nameserver[0] 2001:568:ff09:10c::53 Aug 2 17:10:24 dhcp6c 23456 nameserver[1] 2001:568:ff09:10b::122 Aug 2 17:10:24 dhcp6c 23456 make an IA: PD-0 Aug 2 17:10:24 dhcp6c 23456 create a prefix 2001:56a:73d0:3c00::/56 pltime=140733193402432, vltime=14700 Aug 2 17:10:24 dhcp6c 23456 add an address 2001:56a:73d0:3c01:208:a3ff:fe0a:5ad6/64 on igb1_vlan101 Aug 2 17:10:24 dhcp6c 23456 executes /var/etc/dhcp6c_wan_dhcp6withoutra_script.sh Aug 2 17:10:29 unbound 48227:0 notice: init module 0: validator Aug 2 17:10:29 unbound 48227:0 notice: init module 1: iterator Aug 2 17:10:29 unbound 48227:0 info: start of service (unbound 1.6.3). Aug 2 17:10:30 unbound 48227:0 info: resolving s.youtube.com. A IN
-
That's really strange. Your dhcp6c is running well ahead of mine, relative to unbound.
I was on a T2200H but just switched to a T3200M. I didn't expect it would make any difference and it didn't.
All unbound settings are default.
I've tried a clean install in a fresh VM (running on hyper-v), but that didn't make any difference either.
![Capture 1.PNG_thumb](/public/imported_attachments/1/Capture 1.PNG_thumb)
![Capture 1.PNG](/public/imported_attachments/1/Capture 1.PNG)
![Capture 2.PNG](/public/imported_attachments/1/Capture 2.PNG)
![Capture 2.PNG_thumb](/public/imported_attachments/1/Capture 2.PNG_thumb) -
I enabled dhcp debug mode to see if there would be any earlier messages.
Here are the first messages (newest at the top):
Aug 2 18:09:14 dhcp6c 11792 failed initialize control message authentication Aug 2 18:09:14 dhcp6c 11792 failed to open /usr/local/etc/dhcp6cctlkey: No such file or directory Aug 2 18:09:14 dhcp6c 11792 extracted an existing DUID from /var/db/dhcp6c_duid: 00:01:00:01:20:d5:a4:23:00:15:5d:5c:e2:1e
Here are the last messages:
Aug 2 18:09:19 dhcp6c 11840 got an expected reply, sleeping. Aug 2 18:09:19 dhcp6c 11840 removing server (ID: 00:03:00:01:0c:a4:02:29:50:01) Aug 2 18:09:19 dhcp6c 11840 removing an event on hn1, state=REQUEST Aug 2 18:09:19 dhcp6c 11840 script "/var/etc/dhcp6c_wan_dhcp6withoutra_script.sh" terminated Aug 2 18:09:19 dhcp6c dhcp6c REQUEST on hn1 - running rc.newwanipv6 Aug 2 18:09:17 dhcp6c 11840 executes /var/etc/dhcp6c_wan_dhcp6withoutra_script.sh Aug 2 18:09:17 dhcp6c 11840 add an address 2001:*:*:*:215:5dff:fe5c:e21e/64 on hn0 Aug 2 18:09:17 dhcp6c 11840 create a prefix 2001:*:*:*::/56 pltime=140733193402432, vltime=14700 Aug 2 18:09:17 dhcp6c 11840 make an IA: PD-0 Aug 2 18:09:17 dhcp6c 11840 nameserver[1] 2001:568:ff09:10b::122 Aug 2 18:09:17 dhcp6c 11840 nameserver[0] 2001:568:ff09:10a::55 Aug 2 18:09:17 dhcp6c 11840 dhcp6c Received REQUEST
Here are the first unbound messages:
Aug 2 18:09:16 unbound 24030:0 info: start of service (unbound 1.6.3). Aug 2 18:09:16 unbound 24030:0 notice: init module 1: iterator Aug 2 18:09:16 unbound 24030:0 notice: init module 0: validator
-
It would appear that something is odd with your system Bimmer, otherwise Chris and Ned over here would be suffering also, and they are not having any issues.
Time for you and Scott to do some comparisons :)
So, let's start with the basics, Scott, what are you running pfSense on?
-
@marjohn56:
It would appear that something is odd with your system Bimmer, otherwise Chris and Ned over here would be suffering also, and they are not having any issues.
Time for you and Scott to do some comparisons :)
So, let's start with the basics, Scott, what are you running pfSense on?
Something is strange, that's for sure.
Do you notice these messages in the dhcp log? They are the very first dhcp6c messages
Aug 3 13:38:05 dhcp6c 10883 failed initialize control message authentication Aug 3 13:38:05 dhcp6c 10883 failed to open /usr/local/etc/dhcp6cctlkey: No such file or directory Aug 3 13:38:05 dhcp6c 10883 extracted an existing DUID from /var/db/dhcp6c_duid: 00:01:00:01:20:d5:a4:23:00:15:5d:5c:e2:1e Aug 3 13:38:02 dhclient Creating resolv.conf
I'm wondering if perhaps they are slowing down the execution of dhcp6c.