ipv6 broken: radvd: can't join ipv6-allrouters on <interface>
-
Having squashed a regression in 6RD a few days ago (see #9649), I thought, at least for the first day, that my IPv6 problems were solved. No such luck; it looks like I have joined this party now.
I first eliminated the log-spamming "IPv6 forwarding" message. In the pfSense version, "check_ipv6_iface_forwarding" is stubbed out, which forces the log spam. I removed the call and the log message in interface.c (in retrospect, it could easily have been changed to a dlog call instead).
I begin to receive the "can't join ipv6-allrouters" message after about 24 hours. Hoping to capture the cause, I restarted radvd with debug level 4. When the "can't join" message begins to appear, the debug message "em0 is ready" for my interface (em0 in my case) stops appearing. That suggests to me that radvd is failing in the setsockopt calls of the "setup_allrouters_membership" code patch in device-bsd44.c. I'm not sure yet why this starts after 24 hours; I will have to modify the patch code to get more debugging information.
-
Welcome to the party! I was unable to say hello earlier due to force majeure...
-
@rschell
I see only one user-defined setting that is exactly 24 hours:
AdvValidLifetime 86400;
I don't think it's related, but...
-
This is still not fixed.
-
@yon-0
To be fixed, someone has to fix it. So far I see only @rschell digging deeper.
-
@w0w I don't understand this technology, so I can only report the problem.
-
@w0w
Not exactly, as you can see here: https://forum.netgate.com/post/847189
The test that rschell is doing was already done by me at that time. There was nothing useful to share; I found the same thing. You can prevent the log message from appearing, but that would not solve the problem.
-
@kiokoman said in ipv6 broken: radvd: can't join ipv6-allrouters on <interface>:
@w0w
Not exactly, as you can see here: https://forum.netgate.com/post/847189
The test that rschell is doing was already done by me at that time. There was nothing useful to share; I found the same thing. You can prevent the log message from appearing, but that would not solve the problem.
So you think that debugging, as in
@rschell said in ipv6 broken: radvd: can't join ipv6-allrouters on <interface>:
Not sure why this is occurring after 24 hours yet. I will have to modify the patch code to get more debugging information.
does not get us any closer to the solution? You already did that with no luck?
-
Yes, I already did that with no luck, but I'm no expert, so maybe he has more luck/knowledge than me.
-
I improved the logging in the "setup_allrouters_membership" routine with:
```c
/* XXX: See pfSense ticket #2878 */
if (setsockopt(sock, IPPROTO_IPV6, IPV6_LEAVE_GROUP, &mreq, sizeof(mreq)) < 0) {
    dlog(LOG_ERR, 4, "can't leave ipv6-allrouters on %s, failed: %s(%d)",
         iface->props.name, strerror(errno), errno);
}
if (setsockopt(sock, IPPROTO_IPV6, IPV6_JOIN_GROUP, &mreq, sizeof(mreq)) < 0) {
    flog(LOG_ERR, "can't join ipv6-allrouters on %s, failed: %s(%d)",
         iface->props.name, strerror(errno), errno);
    return (-1);
}
```
This code has been running for about 15 hours so far. The result of the first setsockopt call on every radvd cycle is:
"can't leave ipv6-allrouters on em0, failed: Can't assign requested address(49)"
So I'm not sure what that call (the one referencing ticket #2878) is trying to accomplish, but it doesn't appear to do anything on FreeBSD 12. I'm afraid I'll have to dig deeper into the kernel.
The second setsockopt call hasn't produced an error yet; there are still 9 hours or so to go.
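To check whether the kernel can find a membership it has just created, a standalone probe might help. The sketch below is hypothetical (probe_allrouters and struct mcast_probe are made-up names, not radvd code): it joins ff02::2 on one interface and immediately leaves it on the same socket. On a healthy stack one would expect both calls to return 0; a leave that fails with EADDRNOTAVAIL right after a successful join would point at the membership lookup in the kernel rather than at radvd.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <net/if.h>

/* Hypothetical result of one join/leave round trip (not radvd code). */
struct mcast_probe {
    int join_err;   /* 0 or errno from IPV6_JOIN_GROUP  */
    int leave_err;  /* 0 or errno from IPV6_LEAVE_GROUP */
};

/* Join ff02::2 (all routers) on ifname, then leave it on the same socket.
   Both fields are ENXIO if the interface name is unknown. */
struct mcast_probe probe_allrouters(int sock, const char *ifname)
{
    struct mcast_probe r = { ENXIO, ENXIO };
    struct ipv6_mreq mreq;

    memset(&mreq, 0, sizeof(mreq));
    mreq.ipv6mr_interface = if_nametoindex(ifname);
    if (mreq.ipv6mr_interface == 0)
        return r;                       /* no such interface */
    inet_pton(AF_INET6, "ff02::2", &mreq.ipv6mr_multiaddr);

    r.join_err = setsockopt(sock, IPPROTO_IPV6, IPV6_JOIN_GROUP,
                            &mreq, sizeof(mreq)) < 0 ? errno : 0;
    /* Leaving right after joining should find the membership just added. */
    r.leave_err = setsockopt(sock, IPPROTO_IPV6, IPV6_LEAVE_GROUP,
                             &mreq, sizeof(mreq)) < 0 ? errno : 0;
    return r;
}
```

For example, call it with a fresh socket(AF_INET6, SOCK_DGRAM, 0) and "em0"; joining a multicast group on a UDP socket should not require root.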
-
Yeah, I don't remember well, but I think the "address" in question was ff02::1 or something.
You can easily find out by reading the output of truss -p <pid of radvd>.
-
The second setsockopt call error results in:
"can't join ipv6-allrouters on em0, failed: Too many references: can't splice(59)"
-
It might be worth focusing some attention on what has changed in
sys/netinet6/in6_mcast.c
around the error codes EADDRNOTAVAIL and ETOOMANYREFS.
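Note that 49 and 59 in the logs are FreeBSD's numbers for EADDRNOTAVAIL and ETOOMANYREFS; other systems number them differently, so when comparing notes it is safer to match the symbolic constants. A small sketch (mcast_errno_name is a hypothetical helper; the meanings in the comments are my reading of the setsockopt semantics, not quoted from in6_mcast.c):

```c
#include <errno.h>
#include <string.h>

/* Hypothetical helper: name the errno values seen in this thread.
   The numbers (49, 59) are FreeBSD's; compare symbols, not numbers. */
const char *mcast_errno_name(int e)
{
    switch (e) {
    case EADDRNOTAVAIL: return "EADDRNOTAVAIL"; /* leave: membership not found */
    case ETOOMANYREFS:  return "ETOOMANYREFS";  /* join: limit reached (assumed) */
    case EADDRINUSE:    return "EADDRINUSE";    /* join: already a member */
    default:            return "other";
    }
}
```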
-
My guess at the time was some kind of buffer overrun: it keeps opening sockets until there are so many that it is unable to go on and stops working. Depending on the hardware, it could happen after 4, 8, or 24 hours.
setsockopt(4,IPPROTO_IPV6,IPV6_LEAVE_GROUP,0x7fffffffd3e0,20) ERR#49 'Can't assign requested address'
setsockopt(4,IPPROTO_IPV6,IPV6_JOIN_GROUP,0x7fffffffd3e0,20) = 0 (0x0)
It is joining, but not leaving, every time.
This is from the FreeBSD ip6(4) man page:
IPV6_LEAVE_GROUP struct ipv6_mreq *
Drop membership from the associated multicast group. Memberships are automatically dropped when the socket is closed or when the process exits.
This is what made me think that restarting radvd would temporarily solve the problem.
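The truss output (leave always fails, join always succeeds) together with the ETOOMANYREFS error could fit a leak: each cycle adds a membership that the kernel then cannot find to remove. The toy model below is not kernel code; it simply assumes that failure mode and a fixed per-socket membership limit (struct toy_socket and the toy_* functions are hypothetical). The limit of 4095 used in the usage note is a placeholder; I believe it matches IPV6_MAX_MEMBERSHIPS in FreeBSD's ip6_var.h, but check your source tree. It also models the man page point: closing the socket, which is what a radvd restart does, resets the count, which would explain why a restart only helps temporarily.

```c
#include <errno.h>

/* Toy model of one socket's multicast membership table (not kernel code).
   Assumes every leave fails and every join adds a new entry. */
struct toy_socket {
    int memberships;  /* entries currently held */
    int limit;        /* hypothetical per-socket limit */
};

/* Leave is modelled as always failing, as in the truss output. */
int toy_leave(struct toy_socket *s)
{
    (void)s;
    return EADDRNOTAVAIL;
}

/* Join adds an entry until the table is full. */
int toy_join(struct toy_socket *s)
{
    if (s->memberships >= s->limit)
        return ETOOMANYREFS;
    s->memberships++;
    return 0;
}

/* Closing the socket (e.g. restarting radvd) drops all memberships. */
void toy_close(struct toy_socket *s)
{
    s->memberships = 0;
}

/* Run leave+join cycles like the patched radvd; return the 1-based cycle
   on which the join first fails, or 0 if it never fails. */
int cycles_until_join_fails(struct toy_socket *s, int max_cycles)
{
    for (int i = 1; i <= max_cycles; i++) {
        toy_leave(s);
        if (toy_join(s) != 0)
            return i;
    }
    return 0;
}
```

With struct toy_socket s = { 0, 4095 }, the first failing join is on cycle 4096. At several hundred cycles per hour that is on the order of hours to a day, which would fit the timing reported in this thread, but it remains a model, not a measurement.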
But I don't understand a damn thing about C/C++ or coding,
and again, that could be completely unrelated.
-
I've just grabbed the radvd 2.17 binary from an earlier 2.5 build and am testing it on the latest snapshot (2.5.0-DEVELOPMENT (amd64) built on Thu Sep 19 17:07:24 EDT 2019). At least there is no "IPv6 forwarding on interface seems to be disabled, but continuing anyway" spam. I will wait another 48 hours.
-
A couple of thoughts:
- The call to "setup_allrouters_membership" is made several hundred times an hour. Should it be more selective about when it asks to join a group, rather than repeatedly leaving and joining? I think that is just asking for the trouble we are seeing.
- There are a number of upstream commits to in6_mcast.c that have been applied to stable/12 since releng/12, which suggests there are issues lurking there.
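A minimal sketch of the "more selective" idea, assuming radvd keeps a little per-interface state (struct allrouters_state and the function names are hypothetical, not part of radvd): join ff02::2 only when we are not yet a member or the interface index has changed, treat EADDRINUSE as "already joined", and skip the pre-emptive IPV6_LEAVE_GROUP entirely. It also assumes that when an interface is destroyed the kernel drops its memberships, so no leave on the old index is attempted.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical per-interface state (not radvd's struct). */
struct allrouters_state {
    unsigned int ifindex;   /* interface index at last successful join */
    int joined;             /* 1 if we believe we hold the membership */
};

/* Returns 1 if a (re)join is needed for this interface index. */
int allrouters_join_needed(const struct allrouters_state *st, unsigned int ifindex)
{
    return !st->joined || st->ifindex != ifindex;
}

/* Record the outcome of a join attempt (err is 0 or an errno). */
void allrouters_record_join(struct allrouters_state *st, unsigned int ifindex, int err)
{
    /* EADDRINUSE means the kernel says we are already a member. */
    if (err == 0 || err == EADDRINUSE) {
        st->joined = 1;
        st->ifindex = ifindex;
    } else {
        st->joined = 0;
    }
}

/* Join ff02::2 on ifindex only when needed; returns 0 or an errno. */
int maybe_join_allrouters(int sock, struct allrouters_state *st, unsigned int ifindex)
{
    struct ipv6_mreq mreq;
    int err = 0;

    if (!allrouters_join_needed(st, ifindex))
        return 0;                       /* already joined: no leave/join churn */

    memset(&mreq, 0, sizeof(mreq));
    mreq.ipv6mr_interface = ifindex;
    inet_pton(AF_INET6, "ff02::2", &mreq.ipv6mr_multiaddr);
    if (setsockopt(sock, IPPROTO_IPV6, IPV6_JOIN_GROUP, &mreq, sizeof(mreq)) < 0)
        err = errno;

    allrouters_record_join(st, ifindex, err);
    return (err == EADDRINUSE) ? 0 : err;
}
```

Called every radvd cycle, this issues at most one join per interface instead of a leave/join pair each time; whether that sidesteps the kernel issue or merely hides it is something only testing would show.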
-
Yes, indeed.
I'm sure it will be OK sooner or later; it's a development snapshot after all, and we have our workaround in the meantime. They are aware of the problem, and I'm using it at home, so it's not a priority for me at the moment.
-
@kiokoman The problem is in open-source or kernel code, so who are "they" who will fix it?
-
If radvd 2.17_5 works well, then I think the issue is isolated to radvd or the radvd patches. Let me test it and we will know soon. Am I right or wrong?
The fact that there is no spam gives me hope.
-
Sorry, but if you check this
https://redmine.pfsense.org/issues/9577
you will see that we had 2.17 until 2019-07-22 with the same problem; it was updated to 2.18 at the end of July.
@irata I don't think Netgate will release a product with a broken service. They will find a way, either by asking upstream or with a patch of their own.