Unbound ipv6 problems anyone?



  • yeah I know you will find a solution that works for everyone, email me when ready with the patch I will test no problem. :)

    Your optional flag seems a good idea.



  • I'll try and find time tomorrow… if I can stop playing Elite Dangerous :o



  • Something else is afoot here. I have rebooted multiple times, using dhcp6 before RA, and I cannot make it fall over.

    Bimmer, is it possible that sometimes there is a slow response from your ISP for the v6 lease, hence the 'sometimes' its OK?

    It maybe that if that is the case that there is a bit of a catch 22 going on, in that rc.newwanipv6 cannot restart unbound as it's still in the booting up process. Also, and I've not delved into that bit of the code before, unbound is being killed and restarted, if already running, rather than a SIGHUP; I need to study that a bit more to see if there is a reason for that.

    • note *

    Having studied and read Jimp's note that unbound should be killed then that answers that question. Whilst there is already work going on in this area I will not mess with it, too many cooks….



  • Sorry for the slow response. I've been away all week… I'll try running wireshark to see what's happening. I suspect it will the same as it always was. I have a suspicion this problem has been present all along, but I haven't noticed it because unbound was getting restarted so often. Possibly now that it's not getting restarted, the problem is visible. That would explain why I didn't notice it before recently, along with the other changes.



  • I captured icmpv6 and dhcpv6 packets with wireshark during a reboot. Everything appears the way it always has. There are no significant delays in any of the replies.




  • I created bug #7750 "unbound refuses ipv6 queries after reboot" for this issue.



  • I just did another reboot to capture log info. It's consistent that unbound starts before the PD process has completed.

    Aug 2 13:12:15	dhcp6c	11947	add an address 2001:*:*:*:215:5dff:fe5c:e21e/64 on hn0
    Aug 2 13:12:15	dhcp6c	11947	dhcp6c Received REQUEST
    Aug 2 13:12:15	dhcp6c	11947	Sending Request
    Aug 2 13:12:14	dhcp6c	11947	Sending Solicit
    Aug 2 13:12:13	dhcp6c	11867	skip opening control port
    Aug 2 13:12:13	dhcp6c	11867	failed initialize control message authentication
    Aug 2 13:12:13	dhcp6c	11867	failed to open /usr/local/etc/dhcp6cctlkey: No such file or directory
    
    Aug 2 13:12:27	unbound	24093:1	info: response for . NS IN
    Aug 2 13:12:27	unbound	24093:1	info: priming . IN NS
    Aug 2 13:12:27	unbound	24093:1	info: resolving ipv6.msftconnecttest.com. A IN
    Aug 2 13:12:27	unbound	24093:1	info: priming . IN NS
    Aug 2 13:12:27	unbound	24093:1	info: resolving ipv6.msftconnecttest.com. AAAA IN
    Aug 2 13:12:27	unbound	24093:1	info: priming . IN NS
    Aug 2 13:12:27	unbound	24093:1	info: resolving 0.pfsense.pool.ntp.org. A IN
    Aug 2 13:12:14	unbound	24093:0	info: start of service (unbound 1.6.3).
    Aug 2 13:12:14	unbound	24093:0	notice: init module 1: iterator
    Aug 2 13:12:14	unbound	24093:0	notice: init module 0: validator
    


  • I've been trying to recreate this, as I'm also on Telus using dhcp6c before RA, but so far no luck, if I reboot from an upgrade or otherwise, everything still works fine. For curiosity sake, I even looked to see when my unbound is starting and it happens after I get my prefix. I'm running on the latest 2.4 snapshot. I'm in AB on FTTH with the T3200M bridged to port1 with a bare metal install on an SG-2440.

    
    Aug 2 17:10:24 	dhcp6c 	23456 	dhcp6c Received REQUEST
    Aug 2 17:10:24 	dhcp6c 	23456 	nameserver[0] 2001:568:ff09:10c::53
    Aug 2 17:10:24 	dhcp6c 	23456 	nameserver[1] 2001:568:ff09:10b::122
    Aug 2 17:10:24 	dhcp6c 	23456 	make an IA: PD-0
    Aug 2 17:10:24 	dhcp6c 	23456 	create a prefix 2001:56a:73d0:3c00::/56 pltime=140733193402432, vltime=14700
    Aug 2 17:10:24 	dhcp6c 	23456 	add an address 2001:56a:73d0:3c01:208:a3ff:fe0a:5ad6/64 on igb1_vlan101
    Aug 2 17:10:24 	dhcp6c 	23456 	executes /var/etc/dhcp6c_wan_dhcp6withoutra_script.sh 
    
    Aug 2 17:10:29 	unbound 	48227:0 	notice: init module 0: validator
    Aug 2 17:10:29 	unbound 	48227:0 	notice: init module 1: iterator
    Aug 2 17:10:29 	unbound 	48227:0 	info: start of service (unbound 1.6.3). 
    Aug 2 17:10:30 	unbound 	48227:0 	info: resolving s.youtube.com. A IN 
    


  • That's really strange. Your dhcp6c is running well ahead of mine, relative to unbound.

    I was on a T2200H but just switched to a T3200M. I didn't expect it would make any difference and it didn't.

    All unbound settings are default.

    I've tried a clean install in a fresh VM (running on hyper-v), but that didn't make any difference either.

    ![Capture 1.PNG_thumb](/public/imported_attachments/1/Capture 1.PNG_thumb)
    ![Capture 1.PNG](/public/imported_attachments/1/Capture 1.PNG)
    ![Capture 2.PNG](/public/imported_attachments/1/Capture 2.PNG)
    ![Capture 2.PNG_thumb](/public/imported_attachments/1/Capture 2.PNG_thumb)



  • I enabled dhcp debug mode to see if there would be any earlier messages.

    Here are the first messages (newest at the top):

    Aug 2 18:09:14	dhcp6c	11792	failed initialize control message authentication
    Aug 2 18:09:14	dhcp6c	11792	failed to open /usr/local/etc/dhcp6cctlkey: No such file or directory
    Aug 2 18:09:14	dhcp6c	11792	extracted an existing DUID from /var/db/dhcp6c_duid: 00:01:00:01:20:d5:a4:23:00:15:5d:5c:e2:1e
    

    Here are the last messages:

    Aug 2 18:09:19	dhcp6c	11840	got an expected reply, sleeping.
    Aug 2 18:09:19	dhcp6c	11840	removing server (ID: 00:03:00:01:0c:a4:02:29:50:01)
    Aug 2 18:09:19	dhcp6c	11840	removing an event on hn1, state=REQUEST
    Aug 2 18:09:19	dhcp6c	11840	script "/var/etc/dhcp6c_wan_dhcp6withoutra_script.sh" terminated
    Aug 2 18:09:19	dhcp6c		dhcp6c REQUEST on hn1 - running rc.newwanipv6
    Aug 2 18:09:17	dhcp6c	11840	executes /var/etc/dhcp6c_wan_dhcp6withoutra_script.sh
    Aug 2 18:09:17	dhcp6c	11840	add an address 2001:*:*:*:215:5dff:fe5c:e21e/64 on hn0
    Aug 2 18:09:17	dhcp6c	11840	create a prefix 2001:*:*:*::/56 pltime=140733193402432, vltime=14700
    Aug 2 18:09:17	dhcp6c	11840	make an IA: PD-0
    Aug 2 18:09:17	dhcp6c	11840	nameserver[1] 2001:568:ff09:10b::122
    Aug 2 18:09:17	dhcp6c	11840	nameserver[0] 2001:568:ff09:10a::55
    Aug 2 18:09:17	dhcp6c	11840	dhcp6c Received REQUEST
    

    Here are the first unbound messages:

    Aug 2 18:09:16	unbound	24030:0	info: start of service (unbound 1.6.3).
    Aug 2 18:09:16	unbound	24030:0	notice: init module 1: iterator
    Aug 2 18:09:16	unbound	24030:0	notice: init module 0: validator
    


  • It would appear that something is odd with your system Bimmer, otherwise Chris and Ned over here would be suffering also, and they are not having any issues.

    Time for you and Scott to do some comparisons :)

    So, let's start with the basics, Scott, what are you running pfSense  on?



  • @marjohn56:

    It would appear that something is odd with your system Bimmer, otherwise Chris and Ned over here would be suffering also, and they are not having any issues.

    Time for you and Scott to do some comparisons :)

    So, let's start with the basics, Scott, what are you running pfSense  on?

    Something is strange, that's for sure.

    Do you notice these messages in the dhcp log? They are the very first dhcp6c messages

    Aug 3 13:38:05	dhcp6c	10883	failed initialize control message authentication
    Aug 3 13:38:05	dhcp6c	10883	failed to open /usr/local/etc/dhcp6cctlkey: No such file or directory
    Aug 3 13:38:05	dhcp6c	10883	extracted an existing DUID from /var/db/dhcp6c_duid: 00:01:00:01:20:d5:a4:23:00:15:5d:5c:e2:1e
    Aug 3 13:38:02	dhclient		Creating resolv.conf
    

    I'm wondering if perhaps they are slowing down the execution of dhcp6c.



  • @marjohn56:

    So, let's start with the basics, Scott, what are you running pfSense  on?

    I'm running it on an official pfSense (Negate) SG-2440 appliance.

    @bimmerdriver:

    Do you notice these messages in the dhcp log? They are the very first dhcp6c messages

    I believe I have these messages in my log too, I will confirm and post when I get home from work.



  • Ignore those messages, they are in everyone's log and have been since day one. dhcp6c is looking for the a key that does not exist and will not exist as pfSense does not use that method of control, it's just noise in the log. I could stop it but it really is nothing.



  • @marjohn56:

    Ignore those messages, they are in everyone's log and have been since day one. dhcp6c is looking for the a key that does not exist and will not exist as pfSense does not use that method of control, it's just noise in the log. I could stop it but it really is nothing.

    If they are normal, I don't care that they are there. I was just wondering if dhcp6c was blocking momentarily because of some issue.



  • @scott83:

    @marjohn56:

    So, let's start with the basics, Scott, what are you running pfSense  on?

    I'm running it on an official pfSense (Negate) SG-2440 appliance.

    I'm running on a hyper-v server. The guest has 2000 MB of RAM and 2 processors. It's basically idling. The server is lightly loaded.

    @scott83:

    @bimmerdriver:

    Do you notice these messages in the dhcp log? They are the very first dhcp6c messages

    I believe I have these messages in my log too, I will confirm and post when I get home from work.

    It sounds like they are a non-issue.



  • @marjohn56:

    It would appear that something is odd with your system Bimmer, otherwise Chris and Ned over here would be suffering also, and they are not having any issues.

    Here is some further info that makes me wonder why dhcp6c and unbound messages start to appear in the log around the same time during the reboot. Not clear what is happening during the "Configuring WAN interface…done." phase, but it's a few steps before unbound is started, yet dhcp6c is only barely into the startup phase by the time unbound is starting.

    ![pfsense console.PNG](/public/imported_attachments/1/pfsense console.PNG)
    ![pfsense console.PNG_thumb](/public/imported_attachments/1/pfsense console.PNG_thumb)



  • And here are a few more log messages from general:

    Aug 4 09:19:22	php-fpm	291	/rc.newwanipv6: Removing static route for monitor fe80::ea4:2ff:fe29:5001 and adding a new route through fe80::ea4:2ff:fe29:5001%hn1
    Aug 4 09:19:22	php-fpm	291	/rc.newwanipv6: ROUTING: setting IPv6 default route to fe80::ea4:2ff:fe29:5001%hn1
    Aug 4 09:19:22	php-fpm	291	/rc.newwanipv6: ROUTING: setting default route to 154.20.116.1
    Aug 4 09:19:22	php-fpm	291	/rc.newwanipv6: rc.newwanipv6: on (IP address: fe80::215:5dff:fe5c:e21d%hn1) (interface: wan) (real interface: hn1).
    Aug 4 09:19:22	php-fpm	291	/rc.newwanipv6: rc.newwanipv6: Info: starting on hn1.
    Aug 4 09:19:21	rtsold		Received RA specifying route fe80::ea4:2ff:fe29:5001 for interface wan(hn1)
    Aug 4 09:19:17	kernel		done.
    Aug 4 09:19:16	php-cgi		rc.bootup: ROUTING: setting default route to 154.20.116.1
    Aug 4 09:19:16	kernel		.done.
    Aug 4 09:19:15	kernel		pflog0: promiscuous mode enabled
    Aug 4 09:19:15	php-cgi		rc.bootup: Resyncing OpenVPN instances.
    Aug 4 09:19:15	kernel		hn0: link state changed to UP
    Aug 4 09:19:15	check_reload_status		Linkup starting hn0
    Aug 4 09:19:15	php-cgi		rc.bootup: Starting dhcp6 client for interface wan hn1 in DHCP6 without RA mode
    

    The first unbound log message appeared at 09:19:17.



  • scrapped my post as you already said its defaults, I would follow the advice given and compare ipv6 settings.



  • I'm using the same ipv6 settings that I've been using for months which are correct for my ISP. The only variable is whether or not to send a prefix hint and that makes no difference. Without the other settings are they are, the PD process will not complete.

    I had thought this was a new problem since the other recent changes to unbound, but I restarted my 2.3.4 system and noticed it does the same thing. As I said before, I think this slipped through the cracks before because unbound used to restart so often on its own. However, when I rebooted my 2.3.4 system, I checked nslookup google.com immediately after booting and it failed. After restarting unbound, it worked.

    Unless someone has another explanation, this problem appears to be caused by unbound starting before the PD process is complete.



  • I did a couple more packet captures with wireshark, two reboots from the 2.3.4 system and two from the 2.4 beta system.

    The 2.3.4 system took 2.34 seconds and 1.36 seconds. The 2.4 beta system took 4.84 seconds both times.

    Both systems have identical WAN ipv6 settings. It's interesting to notice that pfsense 2.3.4 appears to send the RS after sending the second dhcp request before the reply is received, rather than apparently waiting for the reply as pfsense 2.4 beta does. I have never noticed this behaviour before. It could be a coincidence.

    You can see that the link-local address of the edge router is the same in both cases, so any difference is from pfsense, not the edge router.



    ![2.4 B.PNG](/public/imported_attachments/1/2.4 B.PNG)
    ![2.4 B.PNG_thumb](/public/imported_attachments/1/2.4 B.PNG_thumb)



  • When I get some time to play I'll put something together for you. Don't know when that will be as real work is exceptionally busy at the moment.



  • I've issued a PR for this.



  • I installed the PR and I'm happy to confirm that it works perfectly.

    So you can see it in action, here are some log messages:

    Aug 11 10:23:32	php-cgi		rc.bootup: sync unbound done.
    Aug 11 10:23:31	php-cgi		rc.bootup: dhcp6 init complete. Continuing
    Aug 11 10:23:30	php-fpm	291	/rc.newwanipv6: Removing static route for monitor fe80::ea4:2ff:fe29:5001 and adding a new route through fe80::ea4:2ff:fe29:5001%hn1
    Aug 11 10:23:30	php-fpm	291	/rc.newwanipv6: ROUTING: setting IPv6 default route to fe80::ea4:2ff:fe29:5001%hn1
    Aug 11 10:23:30	php-fpm	291	/rc.newwanipv6: ROUTING: setting default route to *.*.116.1
    Aug 11 10:23:30	php-fpm	291	/rc.newwanipv6: rc.newwanipv6: on (IP address: fe80::215:5dff:fe5c:e21d%hn1) (interface: wan) (real interface: hn1).
    Aug 11 10:23:30	php-fpm	291	/rc.newwanipv6: rc.newwanipv6: Info: starting on hn1.
    Aug 11 10:23:30	php-cgi		rc.bootup: Unbound start waiting on dhcp6c.
    Aug 11 10:23:29	rtsold		Received RA specifying route fe80::ea4:2ff:fe29:5001 for interface wan(hn1)
    Aug 11 10:23:29	php-cgi		rc.bootup: Unbound start waiting on dhcp6c.
    Aug 11 10:23:28	php-cgi		rc.bootup: Unbound start waiting on dhcp6c.
    Aug 11 10:23:27	php-cgi		rc.bootup: Unbound start waiting on dhcp6c.
    Aug 11 10:23:33	php-cgi		rc.bootup: Unbound start waiting on dhcp6c.
    Aug 11 10:23:32	php-cgi		rc.bootup: Unbound start waiting on dhcp6c.
    Aug 11 10:23:31	php-cgi		rc.bootup: Unbound start waiting on dhcp6c.
    Aug 11 10:23:31	php-cgi		rc.bootup: ROUTING: setting default route to 154.20.116.1
    Aug 11 10:23:30	php-cgi		rc.bootup: Resyncing OpenVPN instances.
    Aug 11 10:23:30	kernel		hn0: link state changed to UP
    Aug 11 10:23:30	check_reload_status		Linkup starting hn0
    Aug 11 10:23:30	php-cgi		rc.bootup: Starting dhcp6 client for interface wan hn1 in DHCP6 without RA mode
    Aug 11 10:23:28	php-fpm	291	/rc.newwanip: rc.newwanip: on (IP address: *.*.118.8) (interface: WAN[wan]) (real interface: hn1).
    Aug 11 10:23:28	php-fpm	291	/rc.newwanip: rc.newwanip: Info: starting on hn1.
    Aug 11 10:23:27	php-cgi		rc.bootup: Accept router advertisements on interface hn1
    Aug 11 10:23:27	php-cgi		rc.bootup: calling interface_dhcpv6_configure.
    Aug 11 10:23:27	check_reload_status		rc.newwanip starting hn1
    

    Thank you very much for fixing this problem.



  • Apart from the weird time stamps.  8)



  • @marjohn56:

    Apart from the weird time stamps.  8)

    I never noticed that. I guess that's a bug in the log.



  • @marjohn56:

    Apart from the weird time stamps.  8)

    You've cracked time travel as well  :-)



  • Just to be sure I did test this and seems to not break anything on uk sky connections.  So good from my end.



  • bimmerdriver, is it possible that you have trouble with hardware clock?  ???



  • @nivek1612:

    @marjohn56:

    Apart from the weird time stamps.  8)

    You've cracked time travel as well  :-)

    I rebooted and took another look at the logs. I guess the time travel was a once-only event. Strange.



  • Maybe NTP daemon syncs sometimes in the same time unbound starts?



  • I think the time traveller idea is more fun. 8)


Log in to reply