Router solicitations not working on vlans (2.7.1-RC)
-
Hi,
Running 2.7.1-RC with ipv6 working, and upgraded to this RC as the release note states that:
Fixed: IPv6 neighbor discovery protocol (NDP) fails in some cases #13423
However I still think this bug exists when one is using vlans on a vtnet interface as I don't see any response to solicitation events and clients have to wait for the periodic RA every couple of minutes.
For example, ifmcstat for vtnet0.10 shows no entry for ff02::1:ff00:1 (the solicited-node multicast address) as per below, and this is also the case on all other interfaces except lo0.

vtnet0.10:
    inet 10.0.10.1
    igmpv3 rv 2 qi 125 qri 10 uri 3
        group 224.0.0.1 mode exclude
            mcast-macaddr 01:00:5e:00:00:01
    inet6 fe80::e0d4:3fff:febd:39ef%vtnet0.10 scopeid 0x7
    mldv2 flags=2<USEALLOW> rv 2 qi 125 qri 10 uri 3
        group ff02::2%vtnet0.10 scopeid 0x7 mode exclude
            mcast-macaddr 33:33:00:00:00:02
        group ff02::1:ff01:1%vtnet0.10 scopeid 0x7 mode exclude
            mcast-macaddr 33:33:ff:01:00:01
        group ff01::1%vtnet0.10 scopeid 0x7 mode exclude
            mcast-macaddr 33:33:00:00:00:01
        group ff02::2:1861:20ce%vtnet0.10 scopeid 0x7 mode exclude
            mcast-macaddr 33:33:18:61:20:ce
        group ff02::2:ff18:6120%vtnet0.10 scopeid 0x7 mode exclude
            mcast-macaddr 33:33:ff:18:61:20
        group ff02::1%vtnet0.10 scopeid 0x7 mode exclude
            mcast-macaddr 33:33:00:00:00:01
        group ff02::1:ffbd:39ef%vtnet0.10 scopeid 0x7 mode exclude
            mcast-macaddr 33:33:ff:bd:39:ef
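As a cross-check on the ifmcstat output above, here is a minimal Python sketch of how a solicited-node multicast group and its Ethernet multicast address are derived from a unicast address. The function names are made up for illustration; the only inputs taken from this thread are the addresses themselves.

```python
import ipaddress


def solicited_node(addr: str) -> ipaddress.IPv6Address:
    """Return the solicited-node multicast group for an IPv6 unicast address."""
    # Drop any zone suffix such as %vtnet0.10 before parsing.
    ip = ipaddress.IPv6Address(addr.split("%")[0])
    low24 = int(ip) & 0xFFFFFF  # the low 24 bits of the unicast address
    base = int(ipaddress.IPv6Address("ff02::1:ff00:0"))
    return ipaddress.IPv6Address(base | low24)


def multicast_mac(group: ipaddress.IPv6Address) -> str:
    """Map an IPv6 multicast group to its Ethernet address (33:33 + low 32 bits)."""
    low32 = int(group) & 0xFFFFFFFF
    octets = [0x33, 0x33] + list(low32.to_bytes(4, "big"))
    return ":".join(f"{o:02x}" for o in octets)


group = solicited_node("fe80::e0d4:3fff:febd:39ef%vtnet0.10")
print(group, multicast_mac(group))
```

By this rule the link-local address on vtnet0.10 maps to ff02::1:ffbd:39ef, which is in the list, while ff02::1:ff00:1 would only be joined for an address whose low 24 bits are 00:0001 (for example one ending in ::1).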
-
Replying to myself here as I had another look at the issue, and something else is at play here because if I reboot the pfsense instance everything works.
e.g. from another machine on the lan I can run:
$ rdisc6 eth0
Soliciting ff02::2 (ff02::2) on eth0...
Hop limit                 : 64 ( 0x40)
Stateful address conf.    : No
Stateful other conf.      : Yes
Mobile home agent         : No
Router preference         : medium
Neighbor discovery proxy  : No
Router lifetime           : 1800 (0x00000708) seconds
...snip...
 from fe80::e0d4:3fff:febd:39ef
However after about 5 minutes post bootup, without touching anything, it just stops working:
$ rdisc6 eth0
Soliciting ff02::2 (ff02::2) on eth0...
Timed out.
Timed out.
Timed out.
No response.
Any ideas how to debug this?
N.B. Same experience when running 2.7.0-RELEASE too.
-
@chill_out said in Router solicitations not working on vlans (2.7.1-RC):
However I still think this bug exists when one is using vlans on a vtnet interface as I don't see any response to solicitation events and clients have to wait for the periodic RA every couple of minutes.
Use Packet Capture, filtering on ICMP6, and post the capture file here. You can also use Wireshark on a computer attached to that VLAN.
-
@JKnott thanks, will have a look later.
What I have observed is that stopping and restarting radvd means solicitations are answered again for about 5 minutes, then it's back to no response.
-
Something screwy is definitely going on here.
I have a test vlan set up with 3 machines on it: [A] the pfsense box, [B] a client sending Router Solicitations, and [C] a 3rd machine listening in on ff02::2. I run tcpdump on nodes A and C filtering for icmpv6 types 133 or 134, and see the below:
Scenario 1 - client (B) joins vlan and sends an RS just after pfsense (A) boots up
tcpdump on pfsense (A):
- source fe80::5827:93a5:da8b:6dbc (client-b), destination ff02::2 (all routers), message "Router Solicitation"
- source fe80::e0d4:3fff:febd:39ef (pfsense-a), destination fe80::5827:93a5:da8b:6dbc (client-b), message "Router Advertisement from pfsense"
tcpdump on 3rd machine (C):
- source fe80::5827:93a5:da8b:6dbc (client-b), destination ff02::2 (all routers), message "Router Solicitation"
Everything is as expected. The RS is seen by all members of the multicast group ff02::2, and pfsense responds with an RA to the client directly when asked. The client gets a global ipv6 address immediately on joining the vlan.
Scenario 2 - client (B) joins vlan and sends an RS 10 minutes after pfsense (A) boots up
tcpdump on pfsense (A):
- nothing!
tcpdump on 3rd machine (C):
- source fe80::5827:93a5:da8b:6dbc (client-b), destination ff02::2 (all routers), message "Router Solicitation"
This is strange: pfsense no longer sees the RS message even though others on the vlan can hear it, and hence it never responds with an RA. The client has no global ipv6 address.
Scenario 2 continued - client (B) waits until default announcement
tcpdump on pfsense (A):
- source fe80::e0d4:3fff:febd:39ef (pfsense-a), destination ff02::1 (all nodes), message "Router Advertisement from pfsense"
tcpdump on 3rd machine (C):
- source fe80::e0d4:3fff:febd:39ef (pfsense-a), destination ff02::1 (all nodes), message "Router Advertisement from pfsense"
I see the radvd timer send the periodic announcement to all nodes, and at this point the client finally gets a global ipv6 address.
Scenario 3 - stop radvd, start radvd, client sends RS again
When stopping radvd one can see ifmcstat doesn't show membership of ff02::2, and after starting it one can see membership is returned. (Did this reset something, a buffer, a queue, etc??)
tcpdump on pfsense (A):
- source fe80::5827:93a5:da8b:6dbc (client-b), destination ff02::2 (all routers), message "Router Solicitation"
- source fe80::e0d4:3fff:febd:39ef (pfsense-a), destination fe80::5827:93a5:da8b:6dbc (client-b), message "Router Advertisement from pfsense"
tcpdump on 3rd machine (C):
- source fe80::5827:93a5:da8b:6dbc (client-b), destination ff02::2 (all routers), message "Router Solicitation"
Everything is back working as expected as per scenario 1 again for approximately 5 minutes, then it breaks down...
-
Replying to myself here, but mystery solved.
Stepping back and thinking about "multicast that stops working after a few minutes": that's possibly a bridge forgetting membership, and sure enough that was the case.
For anyone curious, it was a Proxmox bridge:
At radvd startup we see ff02::2 in the bridge:
root@proxmox:~# bridge mdb show
dev fwbr111i0 port fwln111i0 grp ff02::2 temp
dev fwbr111i0 port fwln111i0 grp ff12::8384 temp
dev fwbr111i0 port fwln111i0 grp ff02::1:ff31:7cc2 temp
dev fwbr111i0 port fwln111i0 grp ff02::fb temp
dev fwbr111i0 port fwln111i0 grp ff02::1:ff78:10d0 temp
dev fwbr111i0 port fwln111i0 grp ff02::1:ffdc:db7f temp
dev fwbr112i0 port fwln112i0 grp ff02::2 temp
Then 5 minutes later it is gone:
root@proxmox:~# bridge mdb show
dev fwbr111i0 port fwln111i0 grp ff12::8384 temp
dev fwbr111i0 port fwln111i0 grp ff02::1:ff31:7cc2 temp
dev fwbr111i0 port fwln111i0 grp ff02::fb temp
dev fwbr111i0 port fwln111i0 grp ff02::1:ff78:10d0 temp
dev fwbr111i0 port fwln111i0 grp ff02::1:ffdc:db7f temp
A quick test by turning off multicast filtering on that bridge had router solicitations working again immediately, so the smoking gun was identified.
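For anyone wanting to spot this kind of expiry without comparing listings by eye, here is a rough Python sketch that diffs two saved `bridge mdb show` snapshots. It assumes the one-entry-per-line "dev ... port ... grp ... temp" layout shown above; `parse_mdb` is a made-up helper, and the sample text is abbreviated from the listings in this post.

```python
def parse_mdb(text: str) -> set:
    """Parse `bridge mdb show` lines into (bridge, port, group) tuples."""
    entries = set()
    for line in text.splitlines():
        tokens = line.split()
        # Read "key value" pairs; a trailing flag such as "temp" has no value
        # and is dropped when zip() runs out of values.
        fields = dict(zip(tokens[0::2], tokens[1::2]))
        if {"dev", "port", "grp"} <= fields.keys():
            entries.add((fields["dev"], fields["port"], fields["grp"]))
    return entries


at_startup = """\
dev fwbr111i0 port fwln111i0 grp ff02::2 temp
dev fwbr111i0 port fwln111i0 grp ff02::fb temp
"""
five_minutes_later = """\
dev fwbr111i0 port fwln111i0 grp ff02::fb temp
"""

# Entries present in the first snapshot but missing from the second.
expired = parse_mdb(at_startup) - parse_mdb(five_minutes_later)
for dev, port, grp in sorted(expired):
    print(f"{grp} is no longer forwarded to {port} on {dev}")
```

In practice the two strings would come from running the command twice a few minutes apart and saving the output.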
-
@chill_out - I have the same issue (I think). IPv6 was working fine until I moved to 2.7.1-RC (latest). I've tried both the existing and the new Kea DHCP backend, but macOS devices (or maybe just all mobile devices, as they're all Apple) are not getting IPv6 addresses.
What did you change on pfSense to make it work again? I'm not aware of ever setting up multicast filtering (I don't know how to on pfSense, bar adding rules in the firewall).
Thanks
-
@jonatremoteeyes @chill_out replying to myself too... I recognised you weren't talking about a pfSense setting, so I checked out multicast on my switches. I had IGMP snooping enabled, so I disabled it on both switches, toggled Wi-Fi on a MacBook, and now I've got an IPv6 address again. Could you explain?
-
@jonatremoteeyes said in Router solicitations not working on vlans (2.7.1-RC):
could you explain
My understanding is that with ipv6 there's no more broadcasts, everything is either unicast or multicast and the latter is used for activities like Neighbor Discovery which is where nodes get their information about their local router and can use that to perform SLAAC configuration etc.
What I was observing was that Router Solicitation messages were not getting to the pfsense host (at least not after the initial 5 minutes). When I turned off multicast snooping on the bridge that pfsense was connected through (which effectively floods multicast traffic to every port), it all started working.
What's supposed to happen is the bridge/switch/etc maintains a table of which port is subscribed to what multicast address - in ipv6 this is the mdb on a linux bridge - and that is populated by a "multicast querier" that periodically asks each node what multicast they are interested in. I suspect there's an issue in the linux bridge setup when the queries are via 802.1q tagged interfaces, but that's just a hunch.
TL;DR: turning off multicast snooping means ipv6 multicast is flooded to every port on your lan, much like broadcasts.
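To make the "bridge forgetting membership" idea concrete, here is a toy Python model of a snooping table in which an entry expires unless a fresh membership report arrives. It is a simplification, not how any particular bridge is implemented: the class, the port name, and the 260-second interval are all assumptions chosen for illustration.

```python
class SnoopingTable:
    """Toy model of a snooping bridge's multicast membership table."""

    def __init__(self, membership_interval: float = 260.0):
        # 260 s is an assumed value for illustration; real bridges make it tunable.
        self.membership_interval = membership_interval
        self._expires = {}  # (port, group) -> expiry time in seconds

    def report(self, port: str, group: str, now: float) -> None:
        """Record a membership report seen on `port` at time `now`."""
        self._expires[(port, group)] = now + self.membership_interval

    def ports_for(self, group: str, now: float) -> set:
        """Return the ports that still have a live membership for `group`."""
        return {p for (p, g), t in self._expires.items() if g == group and t > now}


# No querier: the only report is the join sent when radvd starts.
no_querier = SnoopingTable()
no_querier.report("pfsense-port", "ff02::2", now=0)
print(no_querier.ports_for("ff02::2", now=60))
print(no_querier.ports_for("ff02::2", now=600))

# With a querier: the host re-reports every 125 s, so the entry is refreshed.
with_querier = SnoopingTable()
for t in range(0, 600, 125):
    with_querier.report("pfsense-port", "ff02::2", now=t)
print(with_querier.ports_for("ff02::2", now=600))
```

In this model the entry for ff02::2 is gone at the 10-minute mark when nothing refreshes it, which is at least consistent with solicitations working for a few minutes after a radvd restart and then stopping.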
-
@chill_out said in Router solicitations not working on vlans (2.7.1-RC):
My understanding is that with ipv6 there's no more broadcasts, everything is either unicast or multicast
That is correct. The closest thing to a broadcast is the all-nodes multicast address (ff02::1). There are some differences: the scope can be specified and, for some things, the hop limit can be set to 255 as protection against a bogus packet being sent through a router.
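As a small illustration of the scope point, the scope of an IPv6 multicast address is carried in the low four bits of its second byte. The sketch below decodes it in Python; the helper name and the table of labels are my own, with scope values as I understand them from the IPv6 addressing architecture.

```python
import ipaddress

# Scope values as defined for IPv6 multicast addresses (assumed complete enough here).
SCOPES = {
    0x1: "interface-local",
    0x2: "link-local",
    0x4: "admin-local",
    0x5: "site-local",
    0x8: "organization-local",
    0xE: "global",
}


def multicast_scope(addr: str) -> str:
    """Return a human-readable scope for an IPv6 multicast address."""
    ip = ipaddress.IPv6Address(addr)
    if not ip.is_multicast:
        raise ValueError(f"{addr} is not a multicast address")
    scope = ip.packed[1] & 0x0F  # low nibble of the second byte
    return SCOPES.get(scope, f"scope 0x{scope:x}")


for addr in ("ff01::1", "ff02::1", "ff02::2", "ff12::8384"):
    print(addr, multicast_scope(addr))
```

All of the ff02:: groups in this thread are link-local scope, so they are never forwarded by a router.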