ipv6 broken: radvd: can't join ipv6-allrouters on <interface>
-
@w0w I don't understand this technology, so I can only report the problem.
-
@w0w
not exactly, as you can see here: https://forum.netgate.com/post/847189
the test that rschell is doing was already done by me at that time; there was nothing useful to share, I found out the same. You can prevent the log message from appearing, but that would not solve the problem.
-
@kiokoman said in ipv6 broken: radvd: can't join ipv6-allrouters on <interface>:
@w0w
not exactly, as you can see here: https://forum.netgate.com/post/847189
the test that rschell is doing was already done by me at that time; there was nothing useful to share, I found out the same. You can prevent the log message from appearing, but that would not solve the problem.
So you think that debugging
@rschell said in ipv6 broken: radvd: can't join ipv6-allrouters on <interface>:
Not sure why this is occurring after 24 hours yet. will have to modify the patch code to get more debugging information.
does not get us closer to the solution? You already did that with no luck?
-
yes, I already did that with no luck, but I'm no expert, so maybe he will have more luck/knowledge than me
-
I improved the logging in "setup_allrouters_membership" routine with:
/* XXX: See pfSense ticket #2878 */
if (setsockopt(sock, IPPROTO_IPV6, IPV6_LEAVE_GROUP, &mreq, sizeof(mreq)) < 0) {
    dlog(LOG_ERR, 4, "can't leave ipv6-allrouters on %s, failed: %s(%d)", iface->props.name, strerror(errno), errno);
}
if (setsockopt(sock, IPPROTO_IPV6, IPV6_JOIN_GROUP, &mreq, sizeof(mreq)) < 0) {
    flog(LOG_ERR, "can't join ipv6-allrouters on %s, failed: %s(%d)", iface->props.name, strerror(errno), errno);
    return (-1);
}
This code has been running for about 15 hours so far. The result of the first setsockopt call on every radvd cycle is:
"can't leave ipv6-allrouters on em0, failed: Can't assign requested address(49)"
so I'm not sure what that call is trying to accomplish in Ticket #2878, but it doesn't appear to do/result in anything in version 12 of FreeBSD. Have to dig deeper in the kernel I'm afraid.
The second setsockopt call hasn't produced an error yet, still 9 hours or so to go.
-
yeah, I don't remember well, but I think that the "address" in question was ff02::1 or something,
you can easily find out if you read the output of truss -p pidofradvd
-
The second setsockopt call error results in:
"can't join ipv6-allrouters on em0, failed: Too many references: can't splice(59)"
-
Might focus some attention on what has changed in:
sys/netinet6/in6_mcast.c
with the error codes EADDRNOTAVAIL and ETOOMANYREFS
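For anyone matching the log lines against the kernel source: the text in the radvd log comes from strerror(), while in6_mcast.c uses the symbolic names. A minimal C sketch of that mapping follows. mcast_errno_name is a hypothetical helper written for this thread, not part of radvd or FreeBSD, and the numbers 49 and 59 seen in the logs are the FreeBSD values (other systems number these errors differently).

```c
#include <errno.h>

/*
 * Hypothetical helper: map the two errno values seen in this thread to the
 * symbolic names used in sys/netinet6/in6_mcast.c. It compares against the
 * macros from <errno.h>, so it does not depend on the numeric values.
 */
const char *mcast_errno_name(int err)
{
    switch (err) {
    case EADDRNOTAVAIL:
        return "EADDRNOTAVAIL"; /* logged as "Can't assign requested address" */
    case ETOOMANYREFS:
        return "ETOOMANYREFS";  /* logged as "Too many references: can't splice" */
    default:
        return "other";
    }
}
```

Logging this name next to strerror(errno) would make it easier to grep the kernel source for the code path that returned the error.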
-
my guess at that time was some kind of buffer overrun, as if it keeps opening sockets until there are so many that it is unable to go on and stops working; depending on the hardware, that could happen after 4 / 8 / 24 hours
setsockopt(4,IPPROTO_IPV6,IPV6_LEAVE_GROUP,0x7fffffffd3e0,20) ERR#49 'Can't assign requested address'
setsockopt(4,IPPROTO_IPV6,IPV6_JOIN_GROUP,0x7fffffffd3e0,20) = 0 (0x0)
it is joining but not leaving every time
this is from ip6 man page for freebsd:
IPV6_LEAVE_GROUP struct ipv6_mreq *
Drop membership from the associated multicast group. Memberships are automatically dropped when the socket is closed or when the process exits.
this is what made me think that restarting radvd would temporarily solve the problem
but I don't understand a damn thing about C/C++ or coding,
so again, that could be completely unrelated
-
I've just grabbed the radvd 2.17 binary from an earlier 2.5 snapshot and am testing it on the latest one (2.5.0-DEVELOPMENT (amd64) built on Thu Sep 19 17:07:24 EDT 2019). At least there is no spam of "IPv6 forwarding on interface seems to be disabled, but continuing anyway". Will wait another 48 hours.
-
A couple of thoughts:
-
The “setup_allrouters_membership” routine is being called several hundred times an hour. Should it be more selective about when it asks to join a group, rather than repeatedly leaving and joining? I think that is just asking for the trouble we are seeing.
-
There are a number of upstream commits that have been applied to stable/12 since releng/12 in in6_mcast.c that suggest there are issues lurking there.
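The first thought above could be sketched roughly like this. It is only an illustration under stated assumptions, not radvd's code and not a tested fix: struct mcast_state, ensure_allrouters_membership and join_allrouters are hypothetical names, and ff02::2 is the standard link-local all-routers group. Whether skipping the leave/re-join is safe depends on why the patch for ticket #2878 added it, which this thread has not established.

```c
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical per-interface bookkeeping; radvd's real structures differ. */
struct mcast_state {
    bool joined;         /* true once a join has succeeded on this socket */
    unsigned join_calls; /* number of join attempts actually made */
};

/* Performs the actual join; returns 0 on success, -1 on failure. */
typedef int (*join_fn)(int sock, unsigned int ifindex);

/* Join ff02::2 (all-routers) on the given interface index. */
int join_allrouters(int sock, unsigned int ifindex)
{
    struct ipv6_mreq mreq;

    memset(&mreq, 0, sizeof(mreq));
    mreq.ipv6mr_interface = ifindex;
    if (inet_pton(AF_INET6, "ff02::2", &mreq.ipv6mr_multiaddr) != 1)
        return -1;
    return setsockopt(sock, IPPROTO_IPV6, IPV6_JOIN_GROUP, &mreq, sizeof(mreq));
}

/*
 * Ask for membership only when we are not already a member, instead of
 * leaving and re-joining on every cycle.
 */
int ensure_allrouters_membership(struct mcast_state *st, int sock,
                                 unsigned int ifindex, join_fn do_join)
{
    if (st->joined)
        return 0;       /* already a member: no system call at all */
    st->join_calls++;
    if (do_join(sock, ifindex) < 0)
        return -1;      /* joined stays false, so the next cycle retries */
    st->joined = true;
    return 0;
}
```

Called once per cycle, this attempts the join only until it succeeds. A caller would have to reset joined whenever the socket is reopened, since memberships are dropped when the socket is closed.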
-
-
yes, indeed..
i'm sure it will be ok sooner or later, it's a development snapshot after all, we have our workaround in the meantime. they are aware of the problem and i'm using it at home so it's not a priority for me at the moment
-
@kiokoman the problem is within open source or kernel code, so who are "they" who will fix it?
-
If radvd 2.17_5 works well, then the issue is isolated to radvd or the radvd patches, I think. Let me test it and we will soon know whether I am right or wrong.
The fact that there is no spam gives me hope.
-
sorry but if you check this
https://redmine.pfsense.org/issues/9577
you will see that we had 2.17 until 2019-07-22 with the same problem; it was updated to 2.18 at the end of July.
@irata I don't think that Netgate will release a product with a broken service. They will find a way: either they ask upstream or they fix it with a patch of their own.
-
@kiokoman
One thing that looks quite different for me is that radvd can't join routers right after start, not hours later as we have it now.
EDITED: Oh yes, epic fail.
-
@kiokoman I agree, 2.5 will be left in beta until someone fixes it.
-
@rschell good work about that update on redmine
personally I had decided to wait for 2.5 to advance before trying to do anything, as any patch that we come up with today will probably be lost, resulting in a waste of time, since it is a kernel problem and not pfSense's fault
-
The IPv6 problem also caused problems with FRR
-
@rschell
I see you have some progress on 12.1 builds, but you did not report back on redmine, is it working?