IPv6 changes in 2.2.5
-
Try to reset modem and reboot pfsense.
-
@cmb:
There were fixes for PPP and IPv6 which were well tested by us and others. Specifically "Correct handling of SLAAC, DHCP6 and DHCP-PD with PPP interfaces" sounds like it might be most relevant. But that fixed up some edge cases that would have been problematic in that situation before, like the disconnect/reconnect potentially (and definitely upon loss and regain of NIC link).
https://doc.pfsense.org/index.php?title=2.2.5_New_Features_and_Changes#Interfaces
I'll hold my hand up as the author of that code. As cmb says, it was well tested, and it fixed definite brokenness in my environment, as previously pfSense made the false assumption that IPv6 would just start to work again following link up -> link down -> link up transitions.
Unfortunately, there is the possibility of some timing related issues with this code as it's trying to be all things to all people - it has important work to do but has limited scope to wait to try to ensure safety, as additional delay leaves traffic flowing over the interface that pfSense isn't ready to handle, causing packet loss and errors in the PPP log (I experimented at some length in the development phase).
If you want to return to the pre-2.2.5 behaviour, edit /usr/local/sbin/ppp-linkup and /usr/local/sbin/ppp-linkdown, in each case putting a # before the call to /usr/local/sbin/ppp-ipv6. If that fixes your problem, the new code is causing your problem, but the previous behaviour was actually more broken - that is, my code needs further refining, not reverting. I had limited options in terms of what I could do because of the ways IPv6 is handled in /etc/inc/interfaces.inc . I documented the design decisions in https://github.com/pfsense/pfsense/pull/1961 and in the comments in the ppp-ipv6 script.
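If you would rather script that edit than do it by hand, here is a minimal sketch. It assumes each script calls /usr/local/sbin/ppp-ipv6 on a line of its own (worth confirming with grep ppp-ipv6 /usr/local/sbin/ppp-link* first). The function name is made up for illustration, and sed keeps a .bak copy of each file so the change is easy to undo.

```sh
# Hypothetical helper: comment out every uncommented line in file $1 that
# mentions /usr/local/sbin/ppp-ipv6, keeping the original as "$1.bak".
# -i.bak is accepted by both FreeBSD sed and GNU sed.
disable_ppp_ipv6_call() {
    # Lines already starting with "#" are skipped, so re-running does not
    # double-comment (but it does overwrite the .bak with the edited file).
    sed -i.bak '/^[[:space:]]*#/!s|^\(.*/usr/local/sbin/ppp-ipv6.*\)$|# \1|' "$1"
}

# Usage, as root on the firewall:
#   for f in /usr/local/sbin/ppp-linkup /usr/local/sbin/ppp-linkdown; do
#       disable_ppp_ipv6_call "$f"
#   done
# To undo, move each .bak file back over the edited script.
```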
There is a nasty 'immediately after boot' issue with IPv6 PPPoE that I never got round to characterising and reporting, which is not caused by the new code in 2.2.5 but might interact negatively with it. On my system at least, with the PPPoE parent interface on a vlan, the MAC address of the PPPoE interface is random for the first connection after boot, resulting in a random link local address for the PPPoE interface. Once the system is booted, this settles down to a predictable MAC address based on the system hardware and PPPoE works reliably. It is possible that the new code in 2.2.5 allows the random link local address to be used for IPV6CP, but the predictable link local address for dhcp6c, which will likely result in strange IPv6 brokenness.
The workround for this issue is to Disconnect the PPPoE interface in Status -> Interfaces, then Connect it again. All will then work properly until the system is next booted.
This really needs to be on redmine - I just didn't get round to it.
-
Thanks David, was hoping you'd chime in with thoughts.
M_Devil: Making the change he suggested is a good first step, see if that changes the behavior.
-
Hi,
Apologies if this is hijacking the thread but it seems like the best place to put this.
I have just upgraded to 2.2.5 and while my IPv6 is still working I have a new error in the system logs. The changes in 2.2.5 have fixed some of the error messages I was getting in the system logs, but now I see this every 30 minutes:
Nov 7 10:06:48 php-fpm[69022]: /rc.newwanipv6: The command '/sbin/route change -host -inet6 2001:44b8:1::1 fe80::xxx:xxxx:xxxx:bc00' returned exit code '1', the output was 'route: writing to routing socket: No such process route: writing to routing socket: Network is unreachable change host 2001:44b8:1::1: gateway fe80::xxx:xxxx:xxxx:bc00 fib 0: Network is unreachable'
Nov 7 10:06:48 php-fpm[69022]: /rc.newwanipv6: The command '/sbin/route change -host -inet6 2001:44b8:2::2 fe80::xxx:xxxx:xxxx:bc00' returned exit code '1', the output was 'route: writing to routing socket: No such process route: writing to routing socket: Network is unreachable change host 2001:44b8:2::2: gateway fe80::xxx:xxxx:xxxx:bc00 fib 0: Network is unreachable'
Nov 7 10:06:48 php-fpm[69022]: /rc.newwanipv6: ROUTING: setting default route to yyy.yyy.yyy.yyy
2001:44b8:1::1 & 2001:44b8:2::2 are the DNS servers for my ISP.
fe80::xxx:xxxx:xxxx:bc00 is listed as the gateway address for the WAN_DHCP6 gateway in the GUI.
yyy.yyy.yyy.yyy is listed as the gateway address for the WAN_PPPOE gateway in the GUI (IPv4 address).
My connection is using PPPoE.
Any thoughts on this? I had a look at the code for rc.newwanipv6 but the route change command must be called by an external library as it doesn't seem to be there. I'm wondering if a piece of code is getting confused by the PPPoE connection as the routing table has fe80::xxx:xxxx:xxxx:bc00%pppoe0 as the default route gateway not just fe80::xxx:xxxx:xxxx:bc00.
Any thoughts on this would be appreciated. As I said everything IPv6 seems to be working but I'd like to get this error out of the logs.
Greg
-
It is possible that the new code in 2.2.5 allows the random link local address to be used for IPV6CP, but the predictable link local address for dhcp6c, which will likely result in strange IPv6 brokenness.
On further examination, this does not appear possible. The link local address of the PPPoE interface will not change unless and until the interface is destroyed and recreated, which can only happen via an explicit Disconnect / Connect cycle. The ppp-ipv6 script will only delete autoconf IPv6 addresses that are about to become stale on a link down event - it does not affect link local addresses in any way.
One problem I had when writing the ppp-ipv6 script is that, on link up, it needs to call interface_dhcpv6_configure() if and only if this function hasn't been called since the link last went down. When a PPPoE interface is first created, this function is called by interface_configure() and must not be called again by ppp-linkup. Following a link down -> link up transition, ppp-linkup needs to call interface_dhcpv6_configure() in order to restart dhcp6c.
The work round I used was to test whether ACCEPT_RTADV is unset on the interface. This option is set on the interface towards the end of interface_dhcpv6_configure() in order to allow rtsold to receive an RA. ACCEPT_RTADV is unset shortly after interface creation in interface_configure() and by the code called on link down in ppp-ipv6. If ppp-ipv6 is called on link up with ACCEPT_RTADV unset, this indicates the need to call interface_dhcpv6_configure().
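The link-up test described above can be sketched in shell. This is a hypothetical helper, not the ppp-ipv6 code itself; it assumes FreeBSD reports the per-interface flags on an "nd6 options=<hex><FLAG,...>" line, which is what the second diagnostic command later in this post greps for.

```sh
# Hypothetical helper. stdin: output of `ifconfig <interface> inet6`.
# Succeeds (exit 0) when ACCEPT_RTADV is absent from the nd6 options,
# i.e. interface_dhcpv6_configure() has not run since the last link down.
needs_dhcpv6_configure() {
    # Look only at the nd6 options line and match ACCEPT_RTADV as a whole flag.
    if grep 'nd6 options=' | grep -q '[<,]ACCEPT_RTADV[,>]'; then
        return 1
    fi
    return 0
}

# Usage:
#   if ifconfig pppoe0 inet6 | needs_dhcpv6_configure; then
#       echo "pppoe0: interface_dhcpv6_configure() would be called"
#   fi
```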
There is the possibility of this logic failing. Depending on the value of the net.inet6.ip6.accept_rtadv sysctl, ACCEPT_RTADV can be set by default on interface creation before being removed later in interface_configure(). (As an aside, I question why interface_dhcpv6_configure() sets this sysctl to 1, creating the possibility of unwanted RA reception. As sysctl -d notes, this sysctl is the "Default value of per-interface flag for accepting ICMPv6 RouterAdvertisement messages" and I don't believe setting this sysctl to 1 is a prerequisite for RA reception. ifconfig <interface> inet6 accept_rtadv should be all that is needed.)
It might be that an explicit flag (a file in /tmp) would be better than this logic, but I can't see how a race condition on initial PPPoE connection is possible, as ifconfig <interface> inet6 -accept_rtadv is called fairly early in interface_configure(), long before the call to interface_ppps_configure() that configures and launches the mpd5 daemon (setting in motion an eventual call to ppp-ipv6 via ppp-linkup).
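For comparison, the explicit-flag alternative might look something like this. It is a sketch only: FLAG_DIR, the "<if>_dhcp6_configured" file name and all four functions are hypothetical, not existing pfSense code.

```sh
# Hypothetical sketch: FLAG_DIR and the "<if>_dhcp6_configured" name are
# illustrative only, not an existing pfSense convention.
FLAG_DIR=${FLAG_DIR:-/tmp}

flag_path() {
    printf '%s/%s_dhcp6_configured\n' "$FLAG_DIR" "$1"
}

# Call wherever interface_dhcpv6_configure() has just run for interface $1.
mark_dhcpv6_configured() {
    : > "$(flag_path "$1")"
}

# Call on link down, alongside removal of the stale autoconf addresses.
clear_dhcpv6_configured() {
    rm -f "$(flag_path "$1")"
}

# Call on link up: succeeds (exit 0) only if configuration has not run
# since the link last went down, i.e. the caller should configure now.
dhcpv6_configure_needed() {
    [ ! -e "$(flag_path "$1")" ]
}
```

Because checking and marking are separate steps, two callers arriving at the same moment could both see "needed", so a flag on its own does not close every race.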
The approach I adopted aimed to avoid any edits to /etc/inc/interfaces.inc . If I was to change /etc/inc/interfaces.inc, I would suggest changing interface_configure() so that it does not call interface_dhcpv6_configure() for a PPPoE interface, allowing ppp-ipv6 to call interface_dhcpv6_configure() unconditionally once mpd5 signals IPv6 link up.
If anyone having problems uses SLAAC, DHCPv6 and/or DHCP-PD over PPPoE, it would be interesting to see the output of the following commands (by PM if you don't want to clutter this thread):
clog /var/log/ppp.log | grep -A 1 -E -e 'IPV6CP: LayerUp' | tail -n 2
ifconfig pppoe0 inet6 | grep -E -e '( fe80::|nd6)'
ps -auwwx | grep -E -e '(dhcp6c|rtsold)'
clog /var/log/dhcpd.log | grep dhcp6c | tail -n 40
The first command displays the link local addresses of the PPPoE interface and the gateway as negotiated by IPV6CP. (N.B. it's a number 1 after -A, not letter l).
The second command displays the link local address of the PPPoE interface now. If you remove the fe80:: prefix and the %pppoe0 scope, it should be the same as the left hand side of the second line output by the first command. If it is not, the link local address has changed somehow since IPV6CP came up, which may well mean IPv6 is completely broken until the interface has been destroyed and rebuilt (Disconnect and then Connect in Status->Interfaces will do the trick).
This command also displays the nd6 options, which should include ACCEPT_RTADV.
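The comparison between the first two commands can be automated roughly as follows. This is a sketch with hypothetical function names. It assumes, as described above, that the second line of the ppp.log output reads "<local id> -> <peer id>". Both sides go through the same prefix/scope stripping and have leading zeros removed from each group, so a zero-padded log value still matches ifconfig's compressed form.

```sh
# Hypothetical helpers; this is not part of pfSense.

# stdin: the two lines printed by the clog | grep -A 1 command.
# Prints the word immediately left of "->" on the second line.
local_id_from_log() {
    awk 'NR == 2 { for (i = 2; i <= NF; i++) if ($i == "->") print $(i - 1) }'
}

# Drop any %scope suffix and a leading fe80:: prefix:
#   fe80::211:22ff:fe33:4455%pppoe0 -> 211:22ff:fe33:4455
id_from_linklocal() {
    printf '%s\n' "$1" | sed -e 's/%.*$//' -e 's/^fe80:://'
}

# Lower-case and strip leading zeros from each 16-bit group, so that
# 0211:22ff:fe33:4455 and 211:22ff:fe33:4455 compare equal.
normalize_id() {
    printf '%s\n' "$1" | tr 'A-F' 'a-f' | awk -F: '{
        out = ""
        for (i = 1; i <= NF; i++) {
            g = $i
            sub(/^0+/, "", g)
            if (g == "") g = "0"
            out = out (i > 1 ? ":" : "") g
        }
        print out
    }'
}

# Succeeds when both arguments name the same interface identifier.
same_interface_id() {
    [ "$(normalize_id "$(id_from_linklocal "$1")")" = \
      "$(normalize_id "$(id_from_linklocal "$2")")" ]
}

# Usage on the firewall:
#   log_id=$(clog /var/log/ppp.log | grep -A 1 -E -e 'IPV6CP: LayerUp' | tail -n 2 | local_id_from_log)
#   now=$(ifconfig pppoe0 inet6 | awk '$1 == "inet6" && $2 ~ /^fe80::/ { print $2; exit }')
#   same_interface_id "$log_id" "$now" || echo "link local address changed since IPV6CP came up"
```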
The third command displays any running dhcp6c and rtsold processes. Assuming you are using DHCP6 and/or DHCP-PD, you should have one dhcp6c process for pppoe0 and no rtsold processes. If rtsold is still running, pfSense hasn't managed to get an RA from the remote end, so dhcp6c will not have been triggered. If there is more than one dhcp6c running, it is possible that the new ppp-ipv6 script has called interface_dhcpv6_configure() when it had already been called by interface_configure(), which suggests one of the two changes I mooted above should be made (having a flag in /tmp or my preferred option of not calling interface_dhcpv6_configure() in interface_configure() for a PPPoE interface and changing the call to interface_dhcpv6_configure() in ppp-ipv6 to be unconditional).
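A sketch of the duplicate check, reading ps output on stdin so it can also be run against saved output. The function names are hypothetical. It assumes, as the pkill pattern later in the thread does, that the interface name is the last word of the dhcp6c command line.

```sh
# stdin: output of `ps -auwwx`; $1: program name; $2: interface name.
# Counts lines that mention the program and whose last word is the
# interface; the grep/awk doing the searching end in other words.
count_procs_on_if() {
    awk -v prog="$1" -v ifname="$2" \
        'index($0, prog) > 0 && $NF == ifname { n++ } END { print n + 0 }'
}

# stdin: output of `ps -auwwx`; $1: interface name.
check_dhcp6c() {
    n=$(count_procs_on_if dhcp6c "$1")
    case $n in
        0) echo "$1: no dhcp6c running" ;;
        1) echo "$1: one dhcp6c running (expected)" ;;
        *) echo "$1: $n dhcp6c processes (possible duplicate start)" ;;
    esac
}

# Usage:  ps -auwwx | check_dhcp6c pppoe0
```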
If you do have more than one dhcp6c process for pppoe0, it might be worth seeing if you can simply kill off the mess and restart DHCPv6 and DHCP-PD using:
/usr/local/sbin/ppp-ipv6 pppoe0 down ; pkill -f 'dhcp6c .* pppoe0' ; sleep 2 ; /usr/local/sbin/ppp-ipv6 pppoe0 up
The fourth command displays the last section of the DHCP log that relates to dhcp6c. Unfortunately this isn't very verbose, as dhcp6c is called without verbose debugging turned on. It would help if those affected changed the verbosity - edit the call to /usr/local/sbin/dhcp6c around line 3560 of /etc/inc/interfaces.inc to start /usr/local/sbin/dhcp6c -D instead of /usr/local/sbin/dhcp6c -d.
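The verbosity change can be scripted too. This is a sketch assuming the call appears literally as /usr/local/sbin/dhcp6c -d in the file, with a hypothetical function name. It prints the matching line numbers first so you can check it found the call near line 3560, and it keeps a .bak copy. A pfSense upgrade will probably replace the edited file.

```sh
# Hypothetical helper; $1 defaults to the file named above.
# Assumes the call appears literally as "/usr/local/sbin/dhcp6c -d".
enable_dhcp6c_debug() {
    f=${1:-/etc/inc/interfaces.inc}
    # Show where the call is; stop if it is absent (or already changed).
    grep -n '/usr/local/sbin/dhcp6c -d' "$f" || return 1
    # Switch -d to -D, keeping the original as "$f.bak".
    sed -i.bak 's|/usr/local/sbin/dhcp6c -d|/usr/local/sbin/dhcp6c -D|' "$f"
}
```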
If there are problems with IPv6, I wonder whether it is to do with interface_dhcpv6_configure() returning before SLAAC, DHCPv6 and DHCP-PD have completed. This means various services are configured at the end of interface_configure() without IPv6 necessarily being active. If DHCPv6 and/or DHCP-PD later completes, /etc/rc.newwanipv6 is called by dhcp6c's script, which should sort out any problems.
However, if SLAAC is in use without DHCPv6 and/or DHCP-PD, I can see the possibility that interface_configure() will complete before SLAAC has completed and there is nothing to call /etc/rc.newwanipv6. This would be an unusual configuration, as normally domain name servers are received via DHCPv6. However problems appear possible if SLAAC is in use and DHCPv6 is inactive, which might be the case if RDNSS is in use.
I can only endorse the desire to design and implement an entirely new architecture! If there was a clearly defined API to an abstract interface object, with a system for callbacks on certain events including address changes, this would be much cleaner. The current maze of helper scripts and the lack of clear modularisation creates all manner of possible race and error conditions.
-
Any thoughts on this? I had a look at the code for rc.newwanipv6 but the route change command must be called by an external library as it doesn't seem to be there.
This is unrelated to the issue we were discussing, but it is worth thinking through.
The code in question is in the system_resolvconf_generate() function in /etc/inc/system.inc.
I'm wondering if a piece of code is getting confused by the PPPoE connection as the routing table has fe80::xxx:xxxx:xxxx:bc00%pppoe0 as the default route gateway not just fe80::xxx:xxxx:xxxx:bc00.
As fe80::/10 addresses are link local, they only have meaning in a single scope. %pppoe0 is the correct syntax to indicate that scope.
It looks as if the IPv6 nameservers your ISP is returning are not reachable via the IPv6 gateway returned via IPv6CP (which will have a link local address). Can you ping6 the addresses?
-
It looks as if the IPv6 nameservers your ISP is returning are not reachable via the IPv6 gateway returned via IPv6CP (which will have a link local address). Can you ping6 the addresses?
Yes I can ping the DNS servers. I figured out what it was though and it wasn't related to the 2.2.5 fixes specifically. I had those DNS servers manually entered in the System - General Setup - DNS Servers list and somehow had forced them to use the WAN_DHCP6 gateway as the gateway for the traffic to them. Once I removed that the errors are gone from the logs.
The only issue I have now is that for the last 20 hours or so access via IPv6 to only the pfSense websites (www, blog, packages, doc, etc.) has been very slow and times out frequently. These are the only IPv6 sites that I am having trouble with. Everything was working fine after the upgrade to 2.2.5 so I don't think it's specifically related to that. Given no one else seems to be having issues I'm not sure what is going on. I no longer have any errors in the system logs and have verified no traffic to the pfSense IPs is being blocked. This occurs from inside the network but also affects the pfSense GUI as well. If I check "Prefer IPv4" in System - Advanced Settings or connect from a machine with only IPv4 then everything is fine.
-
The only issue I have now is that for the last 20 hours or so access via IPV6 to only the pfSense websites (www, blog, packages, doc, etc) are all very slow and timeout frequently.
Yeah it's been completely broken for about a day now. Certainly not related to upgrade.
-
Yeah it's been completely broken for about a day now. Certainly not related to upgrade.
Thanks! I'll stop trying to troubleshoot. :D
-
I too am having problems with IPv6 in 2.2.5. My previously rock solid IPv6 connection is now disconnecting after only a few hours.
Different from others in this thread, I'm not on PPPoE. I just have a normal Comcast connection.
In that case, the new code I contributed to 2.2.5 cannot be to blame, as it is PPPoE specific.
You could usefully try:
ps -auwwx | grep -E -e '(dhcp6c|rtsold)'
clog /var/log/dhcpd.log | grep dhcp6c | tail -n 40
As explained above, the first command displays details of any running dhcp6c and rtsold processes, whilst the second one displays the last 40 lines of available dhcp6c related messages.
You should have one dhcp6c process for every interface that uses DHCP6 and/or DHCP-PD. Likely, this will only be your WAN interface.
I have seen dhcp6c disappear once on my WAN interface in 2.2.5, apparently dying silently, but I have no idea whether that was a one-off or not.
-
…
There is a nasty 'immediately after boot' issue with IPv6 PPPoE that I never got round to characterising and reporting, which is not caused by the new code in 2.2.5 but might interact negatively with it. On my system at least, with the PPPoE parent interface on a vlan, the MAC address of the PPPoE interface is random for the first connection after boot, resulting in a random link local address for the PPPoE interface. Once the system is booted, this settles down to a predictable MAC address based on the system hardware and PPPoE works reliably. It is possible that the new code in 2.2.5 allows the random link local address to be used for IPV6CP, but the predictable link local address for dhcp6c, which will likely result in strange IPv6 brokenness.
The workround for this issue is to Disconnect the PPPoE interface in Status -> Interfaces, then Connect it again. All will then work properly until the system is next booted.
...I "suffer" from this phenomenon too…
Could you tell if the System: Advanced: System Tunables settings net.inet6.ip6.use_tempaddr or net.inet6.ip6.prefer_tempaddr are (or should be) related, and whether changing them would be a fix?
I tested and changing the value to '0' has no effect.
-
@hda:
On my system at least, with the PPPoE parent interface on a vlan, the MAC address of the PPPoE interface is random for the first connection after boot, resulting in a random link local address for the PPPoE interface. Once the system is booted, this settles down to a predictable MAC address based on the system hardware and PPPoE works reliably.
I "suffer" from this phenomenon too…
Is your PPPoE parent interface a vlan or a physical interface?
@hda:
Could you tell if the System: Advanced: System Tunables settings net.inet6.ip6.use_tempaddr or net.inet6.ip6.prefer_tempaddr are (or should be) related, and whether changing them would be a fix?
I tested and changing the value to '0' has no effect.
Those two sysctls control IPv6 Privacy Extensions (see RFC 4941 for details). I'm pretty certain they are not involved here, especially as their default value under FreeBSD is 0. If these sysctls were set to 1, the link local address would always be generated using privacy extensions.
I instrumented up interface_ppps_configure() and have verified that the MAC addresses of the parent interface (the VLAN) and the parent of the parent interface (the physical interface) are set as I expect and the ngctl msg <parent interface>: setautosrc 1 call made by interface_ppps_configure() has succeeded. I've also verified the two sysctls you mention are set to 0, as I expect.
I added a six second delay in interface_ppps_configure() just before the call to invoke mpd5 if the system is booting, but that didn't correct the behaviour.
I have a suspicion that the IPv6CP code in mpd5 is responsible for this problem. The current code arguably goes against the spirit of section 4.1 of RFC 5072, which notes (emphasis added):
The non-zero value of the tentative interface identifier SHOULD be chosen such that the value is unique to the link and, preferably, consistently reproducible across initializations of the IPV6CP finite state machine (administrative Close and reOpen, reboots, etc.). The rationale for preferring a consistently reproducible unique interface identifier to a completely random interface identifier is to provide stability to global scope addresses (see Appendix A) that can be formed from the interface identifier.
On boot, /etc/rc.bootup calls interfaces_configure() in /etc/inc/interfaces.inc. This walks through the configured interfaces, initialising them sequentially. It may well be that the WAN interface is configured before any other interfaces on the machine are up, which is significant as CreateInterfaceID() in ipv6cp.c calls GetEther(NULL, &hwaddr) in an attempt to discover a hardware address to base the interface identifier on before falling back to a random interface identifier.
If you look at the definition of GetEther() in util.c, it ignores interfaces that only have point-to-point or loopback addresses even if they have a MAC address (i.e. an EUI-48). This failure to use an available EUI-48 violates section 4.1 of RFC 5072 as the RFC requires the use of any EUI-64 or EUI-48 on the machine before falling back to other sources of uniqueness and then a random interface identifier. CreateInterfaceID() should use MAC addresses of interfaces that have no addresses at all.
I suspect the fix will require changing CreateInterfaceID() to follow the RFC more closely, basing the interface ID on the first valid value from:
1. the MAC address(es) of any interface used as part of the PPP link (if any, as PPPoE is not necessarily in use)
2. the MAC address(es) of any other interface on the machine
3. GetEther() (as now, though this is arguably redundant as all MAC addresses should be considered by the first two steps)
4. randomness (as now)
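To make the proposed order concrete, here is a shell sketch (not mpd5's C code) of picking the first usable candidate. It uses the standard modified EUI-64 construction from RFC 4291 appendix A to turn an EUI-48 into an interface identifier: flip the universal/local bit of the first octet and insert ff:fe in the middle. The caller supplies candidate MAC addresses in priority order. The function names are hypothetical, and the exact way mpd5 builds or randomises its identifier may differ.

```sh
# Hypothetical sketch of the proposed selection order; not mpd5 code.

# EUI-48 -> modified EUI-64 interface identifier (RFC 4291 appendix A):
# flip the universal/local bit (0x02) of the first octet, insert ff:fe.
#   00:11:22:33:44:55 -> 0211:22ff:fe33:4455
eui48_to_ifid() {
    mac=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    case $mac in
        ''|*[!0-9a-f:]*) return 1 ;;   # reject before word splitting
    esac
    old_ifs=$IFS
    IFS=:
    set -- $mac
    IFS=$old_ifs
    [ $# -eq 6 ] || return 1
    for octet in "$@"; do
        case $octet in
            [0-9a-f][0-9a-f]) ;;
            *) return 1 ;;
        esac
    done
    printf '%02x%s:%s%s:%s%s:%s%s\n' $(( 0x$1 ^ 0x02 )) "$2" "$3" ff fe "$4" "$5" "$6"
}

# Random fallback: 8 random bytes with the u bit cleared, marking the
# identifier as locally generated rather than derived from a MAC.
random_ifid() {
    set -- $(od -An -N8 -tx1 /dev/urandom)
    printf '%02x%s:%s%s:%s%s:%s%s\n' $(( 0x$1 & 0xfd )) "$2" "$3" "$4" "$5" "$6" "$7" "$8"
}

# Arguments: candidate MAC addresses in priority order (PPP link
# interfaces first, then every other interface). The first usable
# EUI-48 wins; empty, all-zero and broadcast addresses are skipped.
pick_ifid() {
    for cand in "$@"; do
        c=$(printf '%s' "$cand" | tr 'A-F' 'a-f')
        case $c in
            ''|00:00:00:00:00:00|ff:ff:ff:ff:ff:ff) continue ;;
        esac
        if id=$(eui48_to_ifid "$c"); then
            printf '%s\n' "$id"
            return 0
        fi
    done
    random_ifid
}
```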
-
Is your PPPoE parent interface a vlan or a physical interface?
Yes, a physical interface in my case.
…
...I suspect the fix will require changing to CreateInterfaceID() to follow the RFC more closely,...
OK, I will continue my procedure, ever since 2.2.x, to disconnect/connect once every time after a reboot/boot of pfSense (until your recommendation leads to a code change).
Doing so, my IPv6 ISP lease will be continued every hour and not terminated after 2 hours (remarkably, the IPv4 has no problems with hourly re-lease).
Thank you for the clear insight and references in the case.
-
Thank you for the clear insight and references in the case.
David_W, many many thanks from myself too for the advice you have shared here and in the Zen forums. I think I have pfSense working with IPv6 and Zen, although I need to go through everything to be one thousand percent sure. I certainly would not have got this far without your taking the time to discuss your setup.
-
@hda:
I suspect the fix will require changing to CreateInterfaceID() to follow the RFC more closely,…
OK, I will continue my procedure, ever since 2.2.x, to disconnect/connect once every time after a reboot/boot of pfSense (until your recommendation will lead to code change).
Doing so, my IPv6 ISP lease will be continued every hour and not terminated after 2 hours (remarkably, the IPv4 has no problems with hourly re-lease).
I've pretty much confirmed my earlier diagnosis about CreateInterfaceID() by showing that you can work round the problem by temporarily assigning a bogon IPv4 address to the PPPoE parent interface at boot.
Install the Shellcmd package and create a new entry:
| Command | sh -c 'ifconfig igb0 inet 192.0.2.248/31 alias > /dev/null 2>&1 ; sleep 120 ; ifconfig igb0 inet 192.0.2.248/31 -alias > /dev/null 2>&1' >/dev/null 2>/dev/null & |
| Shellcmd type | earlyshellcmd |
| Description | Temporarily assign a bogon (RFC 5737) IPv4 address to an interface to ensure sane IPv6CP interface identifier allocation immediately after boot - https://forum.pfsense.org/index.php?topic=101967.0 |
You will need to change igb0 (twice) to the interface that is normally used to set the interface identifier. You should be able to recognise this interface from its MAC address.
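To find which interface carries a given MAC address, something like the following may help. It is a sketch that parses FreeBSD ifconfig output (an "ether" line under each interface heading), and the function name is hypothetical. VLAN interfaces normally carry their parent's MAC, so more than one name may be printed.

```sh
# Hypothetical helper. stdin: output of plain `ifconfig`; $1: MAC address.
# Prints the name of every interface whose "ether" line matches.
iface_for_mac() {
    want=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    awk -v want="$want" '
        /^[^ \t][^ \t]*:/ { name = $1; sub(/:$/, "", name) }
        $1 == "ether" && tolower($2) == want { print name }
    '
}

# Usage:  ifconfig | iface_for_mac 00:11:22:33:44:55
```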
When you reboot, the IPv6 WAN address should be the same on the initial connection as on subsequent connections.
When I have the time, I will open a redmine bug about this issue.
-
…
When you reboot, the IPv6 WAN address should be the same on the initial connection as on subsequent connections.
Yes, confirmed at the 1st step after reboot. Great temporary solution, thanks David.
Will report again after 1 and 2 hr uptime.(Alix-box on 2.2.6)
-
Well, maybe half a solution :)
No IPv6 lease renewal after 1 hr and not after the 2hr limit, then connection is gone as usual in such case (for IPv6 only).
Then:
Do Status-Interfaces PPPoE-Disconnect then Diagnostic-Command Prompt(ps ax | grep dhcp6c) :: still the dhcp6c PID there ! (bad)
Do Diagnostic-Command Prompt(kill -9 $PID) :: OK
Do Status-Interfaces PPPoE-Connect :: Diagnostic-Command Prompt(ps ax | grep dhcp6c) - new PID dhcp6c, but address change from WAN-MAC to LAN-MAC !?
Check back after 1 & 2 hr if lease renewal is reported in System Logs.
-
1 hr, no lease renewal. Wait for definite 2hr…. No and lost IPv6 as expected.
-
@hda:
Well, maybe half a solution :)
No IPv6 lease renewal after 1 hr and not after the 2hr limit, then connection is gone as usual in such case (for IPv6 only).
Then:
Do Status-Interfaces PPPoE-Disconnect then Diagnostic-Command Prompt(ps ax | grep dhcp6c) :: still the dhcp6c PID there ! (bad)
Do Diagnostic-Command Prompt(kill -9 $PID) :: OK
Do Status-Interfaces PPPoE-Connect :: Diagnostic-Command Prompt(ps ax | grep dhcp6c) - new PID dhcp6c, but address change from WAN-MAC to LAN-MAC !?
Check back after 1 & 2 hr if lease renewal is reported in System Logs.
The temporary work-round I posted earlier today only fixes the random IPv6CP interface identifier after boot. That problem is now clearly characterised. Leave the temporary work-round in place pending a more permanent solution. I'm thinking of implementing this work-round in interfaces_configure() as the next stage, because reimplementing mpd5's CreateInterfaceID() to follow section 4.1 of RFC 5072 more closely is a longer term project.
The ongoing symptoms you are describing relate to the issue we were discussing earlier in the thread. I suspect I've already identified the fix:
If I was to change /etc/inc/interfaces.inc, I would suggest changing interface_configure() so that it does not call interface_dhcpv6_configure() for a PPPoE interface, allowing ppp-ipv6 to call interface_dhcpv6_configure() unconditionally once mpd5 signals IPv6 link up.
Edit to add: On further reflection, I'd make this change for all PPP interfaces, not just PPPoE.
I tried hard not to change /etc/inc/interfaces.inc, because my local systems run with the RFC 4638 patch, which changes /etc/inc/interfaces.inc in various places. I hope that the pull request to merge the RFC 4638 patch into pfSense 2.3 will be looked at soon.
I suspect that dhcp6c is somehow starting twice on your system following initial boot-up, presumably from a race condition. I know someone e-mailed me some debugging output a while back, and I can't remember whether it was you. In any event, I wonder if you would be kind enough to reboot and send me the output of the debugging commands I gave earlier in the thread (PM preferred):
clog /var/log/ppp.log | grep -A 1 -E -e 'IPV6CP: LayerUp' | tail -n 2
ifconfig pppoe0 inet6 | grep -E -e '( fe80::|nd6)'
ps -auwwx | grep -E -e '(dhcp6c|rtsold)'
clog /var/log/dhcpd.log | grep dhcp6c | tail -n 40
Your interface appears from your screen shots to be pppoe1 - you will need to make the appropriate substitution.
If the expected lease renewal doesn't happen or you note there is more than one dhcp6c process running on the pppoe1 interface, try:
/usr/local/sbin/ppp-ipv6 pppoe0 down ; pkill -xf '^.*dhcp6c.*pppoe0$' ; sleep 2 ; /usr/local/sbin/ppp-ipv6 pppoe0 up
Again, if the problem is with pppoe1, make the three substitutions.
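Rather than making the three substitutions by hand, the same steps can be wrapped in a small function that takes the interface name. This is a sketch only. The function name and the PPP_IPV6/RESTART_DELAY overrides are hypothetical, added so it can be tried without touching a live system. On the firewall it needs to be run as root from a console shell.

```sh
# Hypothetical wrapper around the three commands above.
# $1: PPP interface name, e.g. pppoe0 or pppoe1. PPP_IPV6 and
# RESTART_DELAY are overrides added for illustration/testing only.
restart_ppp_dhcp6() {
    ifname=${1:?usage: restart_ppp_dhcp6 IFACE}
    ppp_ipv6=${PPP_IPV6:-/usr/local/sbin/ppp-ipv6}

    "$ppp_ipv6" "$ifname" down
    # Kill every dhcp6c whose full argument list ends with the interface
    # name. pkill exits non-zero when nothing matched; that is fine here.
    pkill -xf "^.*dhcp6c.*${ifname}\$" || true
    sleep "${RESTART_DELAY:-2}"
    "$ppp_ipv6" "$ifname" up
}

# Usage, as root from a console shell:  restart_ppp_dhcp6 pppoe1
```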
-
…
I suspect that dhcp6c is somehow starting twice on your system following initial boot-up, presumably from a race condition. I know someone e-mailed me some debugging output a while back, and I can't remember whether it was you. In any event, I wonder if you would be kind enough to reboot and send me the output of the debugging commands I gave earlier in the thread (PM preferred):
...I can confirm all this and have sent you the data. Thanks so far. :)
EDIT:
/usr/local/sbin/ppp-ipv6 pppoe0 down ; pkill -xf '^.*dhcp6c.*pppoe0$' ; sleep 2 ; /usr/local/sbin/ppp-ipv6 pppoe0 up
This does the job from the console shell as root, but not from the GUI command prompt as admin. I did it immediately after a reboot: it keeps the addresses the same (..:b371), makes an entry in system.log for rc.newwanipv6, and yes, after 1hr there is a renewal of the lease, as I expect and know will work out OK until the next (re)boot. Thanks again David!