Router solicitations not working on vlans (2.7.1-RC)

chill_out

Hi,

I'm running 2.7.1-RC with IPv6 working. I upgraded to this RC because the release notes state:

Fixed: IPv6 neighbor discovery protocol (NDP) fails in some cases #13423

However, I think this bug still exists when using VLANs on a vtnet interface: I don't see any response to Router Solicitations, and clients have to wait for the periodic RA every couple of minutes.

For example, ifmcstat for vtnet0.10 shows no entry for ff02::1:ff00:1 (the solicited-node multicast address), as shown below, and the same is true on all other interfaces except lo0.

vtnet0.10:
        inet 10.0.10.1
        igmpv3 rv 2 qi 125 qri 10 uri 3
                group 224.0.0.1 mode exclude
                        mcast-macaddr 01:00:5e:00:00:01
        inet6 fe80::e0d4:3fff:febd:39ef%vtnet0.10 scopeid 0x7
        mldv2 flags=2<USEALLOW> rv 2 qi 125 qri 10 uri 3
                group ff02::2%vtnet0.10 scopeid 0x7 mode exclude
                        mcast-macaddr 33:33:00:00:00:02
                group ff02::1:ff01:1%vtnet0.10 scopeid 0x7 mode exclude
                        mcast-macaddr 33:33:ff:01:00:01
                group ff01::1%vtnet0.10 scopeid 0x7 mode exclude
                        mcast-macaddr 33:33:00:00:00:01
                group ff02::2:1861:20ce%vtnet0.10 scopeid 0x7 mode exclude
                        mcast-macaddr 33:33:18:61:20:ce
                group ff02::2:ff18:6120%vtnet0.10 scopeid 0x7 mode exclude
                        mcast-macaddr 33:33:ff:18:61:20
                group ff02::1%vtnet0.10 scopeid 0x7 mode exclude
                        mcast-macaddr 33:33:00:00:00:01
                group ff02::1:ffbd:39ef%vtnet0.10 scopeid 0x7 mode exclude
                        mcast-macaddr 33:33:ff:bd:39:ef
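For reference: the solicited-node group for an address is the prefix ff02::1:ff00:0/104 followed by the low 24 bits of that address (RFC 4291), and on Ethernet any IPv6 multicast group maps to a MAC address of 33:33 followed by the group's low 32 bits (RFC 2464). The small Python sketch below (helper names are illustrative only) reproduces the mapping seen in the output above: fe80::e0d4:3fff:febd:39ef gives ff02::1:ffbd:39ef and 33:33:ff:bd:39:ef, while an address whose last 24 bits are 00:0001 (for example one ending in ::1) would need ff02::1:ff00:1.

```python
import ipaddress

# ff02::1:ff00:0/104 is the solicited-node multicast prefix (RFC 4291, section 2.7.1)
SOLICITED_NODE_PREFIX = int(ipaddress.IPv6Address("ff02::1:ff00:0"))


def solicited_node_group(unicast: str) -> ipaddress.IPv6Address:
    """Append the low 24 bits of a unicast address to ff02::1:ff00:0/104."""
    low24 = int(ipaddress.IPv6Address(unicast)) & 0xFFFFFF
    return ipaddress.IPv6Address(SOLICITED_NODE_PREFIX | low24)


def multicast_mac(group: str) -> str:
    """Map an IPv6 multicast group to 33:33 plus its low 32 bits (RFC 2464, section 7)."""
    low32 = int(ipaddress.IPv6Address(group)) & 0xFFFFFFFF
    octets = [0x33, 0x33] + list(low32.to_bytes(4, "big"))
    return ":".join(f"{octet:02x}" for octet in octets)


if __name__ == "__main__":
    group = solicited_node_group("fe80::e0d4:3fff:febd:39ef")
    print(group, multicast_mac(str(group)))  # ff02::1:ffbd:39ef 33:33:ff:bd:39:ef
```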

chill_out

Replying to myself: I had another look at the issue, and something else is at play, because if I reboot the pfSense instance everything works.

For example, from another machine on the LAN I can run:

$ rdisc6 eth0
Soliciting ff02::2 (ff02::2) on eth0...

Hop limit                 :           64 (      0x40)                
Stateful address conf.    :           No                             
Stateful other conf.      :          Yes                             
Mobile home agent         :           No                             
Router preference         :       medium                             
Neighbor discovery proxy  :           No                             
Router lifetime           :         1800 (0x00000708) seconds        

...snip...

 from fe80::e0d4:3fff:febd:39ef

However, about 5 minutes after boot, without my touching anything, it just stops working:

$ rdisc6 eth0
Soliciting ff02::2 (ff02::2) on eth0...
Timed out.
Timed out.
Timed out.
No response.

Any ideas how to debug this?

N.B. I see the same behaviour on 2.7.0-RELEASE.
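For anyone wondering what rdisc6 actually puts on the wire: a Router Solicitation is an ICMPv6 message of type 133, code 0, sent to ff02::2 with an IPv6 hop limit of 255, optionally carrying a Source Link-Layer Address option (RFC 4861, section 4.1). Its checksum covers an IPv6 pseudo-header (source, destination, length, next header 58) as well as the message. The Python sketch below only builds and checksums the ICMPv6 bytes; it does not send anything, and the function names and the MAC address in the example are hypothetical.

```python
import ipaddress
import struct

ICMPV6_ROUTER_SOLICITATION = 133
ALL_ROUTERS = "ff02::2"


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's complement sum, padding odd-length input with a zero byte."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total


def icmpv6_checksum(src: str, dst: str, message: bytes) -> int:
    """Checksum over the IPv6 pseudo-header plus the ICMPv6 message."""
    pseudo = (
        ipaddress.IPv6Address(src).packed
        + ipaddress.IPv6Address(dst).packed
        + struct.pack("!I3xB", len(message), 58)  # length, 3 zero bytes, next header
    )
    return ~ones_complement_sum(pseudo + message) & 0xFFFF


def router_solicitation(src: str, mac: str) -> bytes:
    """Type 133, code 0, reserved field, then a Source Link-Layer Address option."""
    lladdr = bytes.fromhex(mac.replace(":", ""))
    option = struct.pack("!BB", 1, 1) + lladdr  # option type 1, length 1 (8 octets)
    message = struct.pack("!BBHI", ICMPV6_ROUTER_SOLICITATION, 0, 0, 0) + option
    checksum = icmpv6_checksum(src, ALL_ROUTERS, message)
    return message[:2] + struct.pack("!H", checksum) + message[4:]


if __name__ == "__main__":
    src = "fe80::5827:93a5:da8b:6dbc"  # client-b's link-local address in this thread
    rs = router_solicitation(src, "02:00:00:00:00:01")  # hypothetical MAC
    print(rs.hex())
    print(icmpv6_checksum(src, ALL_ROUTERS, rs))  # prints 0: the checksum verifies
```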

JKnott

@chill_out said in Router solicitations not working on vlans (2.7.1-RC):

However I still think this bug exists when one is using vlans on a vtnet interface as I don't see any response to solicitation events and clients have to wait for the periodic RA every couple of minutes.

Use Packet Capture, filtering on ICMP6, and post the capture file here. You can also use Wireshark on a computer attached to that VLAN.

chill_out

@JKnott thanks, will have a look later.

What I have observed is that stopping and restarting radvd gets solicitations answered again for about 5 minutes, and then it goes back to no response.

chill_out

Something screwy is definitely going on here.

I have a test VLAN set up with 3 machines on it: [A] the pfSense box, [B] a client sending Router Solicitations, and [C] a third machine listening on ff02::2. I run tcpdump on nodes A and C, filtering for ICMPv6 types 133 (Router Solicitation) or 134 (Router Advertisement), and see the following:

Scenario 1 - client (B) joins the VLAN and sends an RS just after pfSense (A) boots up

tcpdump on pfsense (A):

source fe80::5827:93a5:da8b:6dbc (client-b), destination ff02::2 (all routers), message "Router Solicitation"
source fe80::e0d4:3fff:febd:39ef (pfsense-a), destination fe80::5827:93a5:da8b:6dbc (client-b), message "Router Advertisement from pfsense"

tcpdump on 3rd machine (C):

source fe80::5827:93a5:da8b:6dbc (client-b), destination ff02::2 (all routers), message "Router Solicitation"

Everything is as expected. The RS is seen by all members of ff02::2, and pfSense responds with an RA sent directly to the client. The client gets a global IPv6 address immediately on joining the VLAN.
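The "ICMPv6 types 133 or 134" capture used in these scenarios can be written as a tcpdump filter such as `icmp6 and (ip6[40] == 133 or ip6[40] == 134)`, because byte 40 is the first byte after the fixed 40-byte IPv6 header. That shortcut assumes no extension headers sit between the IPv6 header and the ICMPv6 message, which is the normal case for RS and RA. The Python sketch below applies the same test to a raw IPv6 packet; both helper names are illustrative only.

```python
from __future__ import annotations

import ipaddress
import struct

ICMPV6 = 58  # IPv6 Next Header value for ICMPv6
ND_NAMES = {133: "Router Solicitation", 134: "Router Advertisement"}


def ipv6_packet(src: str, dst: str, payload: bytes,
                next_header: int = ICMPV6, hop_limit: int = 255) -> bytes:
    """Prepend a fixed 40-byte IPv6 header (no extension headers) to payload."""
    header = struct.pack(
        "!IHBB16s16s",
        6 << 28,  # version 6, traffic class 0, flow label 0
        len(payload),
        next_header,
        hop_limit,
        ipaddress.IPv6Address(src).packed,
        ipaddress.IPv6Address(dst).packed,
    )
    return header + payload


def rs_or_ra(packet: bytes) -> str | None:
    """Mirror `icmp6 and (ip6[40] == 133 or ip6[40] == 134)` on a raw IPv6 packet."""
    if len(packet) < 41 or packet[0] >> 4 != 6 or packet[6] != ICMPV6:
        return None
    return ND_NAMES.get(packet[40])


if __name__ == "__main__":
    rs = ipv6_packet("fe80::5827:93a5:da8b:6dbc", "ff02::2", bytes([133]) + bytes(7))
    print(rs_or_ra(rs))  # Router Solicitation
```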

Scenario 2 - client (B) joins the VLAN and sends an RS 10 minutes after pfSense (A) boots up

tcpdump on pfsense (A):

nothing!

tcpdump on 3rd machine (C):

source fe80::5827:93a5:da8b:6dbc (client-b), destination ff02::2 (all routers), message "Router Solicitation"

This is strange: tcpdump on pfSense no longer shows the RS even though other hosts on the VLAN can hear it, so pfSense never responds with an RA. The client has no global IPv6 address.

Scenario 2 continued - client (B) waits for the periodic announcement

tcpdump on pfsense (A):

source fe80::e0d4:3fff:febd:39ef (pfsense-a), destination ff02::1 (all nodes), message "Router Advertisement from pfsense"

tcpdump on 3rd machine (C):

source fe80::e0d4:3fff:febd:39ef (pfsense-a), destination ff02::1 (all nodes), message "Router Advertisement from pfsense"

I see the radvd timer send the periodic announcement to all nodes, and at this point the client finally gets a global IPv6 address.

Scenario 3 - stop radvd, start radvd, client sends RS again

After stopping radvd, ifmcstat no longer shows membership of ff02::2, and after starting it again the membership is back. (Did this reset something, e.g. a buffer or a queue?)

tcpdump on pfsense (A):

source fe80::5827:93a5:da8b:6dbc (client-b), destination ff02::2 (all routers), message "Router Solicitation"
source fe80::e0d4:3fff:febd:39ef (pfsense-a), destination fe80::5827:93a5:da8b:6dbc (client-b), message "Router Advertisement from pfsense"

tcpdump on 3rd machine (C):

source fe80::5827:93a5:da8b:6dbc (client-b), destination ff02::2 (all routers), message "Router Solicitation"

Everything works as expected again, as in scenario 1, for approximately 5 minutes, and then it breaks down again...
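The ifmcstat check from scenario 3 is easy to repeat while waiting for the failure. Below is a rough Python sketch that parses ifmcstat-style text (in the layout shown at the top of this thread) into the groups joined per interface, so a script can assert that ff02::2 is still there; the parser and its name are hypothetical, and it simply drops the %scope suffix from IPv6 groups.

```python
from __future__ import annotations


def parse_ifmcstat(output: str) -> dict[str, set[str]]:
    """Map each interface name to the multicast groups listed under it."""
    groups: dict[str, set[str]] = {}
    current = None
    for line in output.splitlines():
        stripped = line.rstrip()
        if stripped and not line[0].isspace() and stripped.endswith(":"):
            current = stripped[:-1]  # interface header such as "vtnet0.10:"
            groups.setdefault(current, set())
        elif current is not None:
            fields = line.split()
            if len(fields) >= 2 and fields[0] == "group":
                groups[current].add(fields[1].split("%", 1)[0])  # drop "%vtnet0.10"
    return groups
```

Feeding it the output of ifmcstat every minute or so and checking `"ff02::2" in groups["vtnet0.10"]` would show whether the pfSense side ever loses the all-routers membership while radvd is running.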

chill_out

Replying to myself here, but mystery solved.

Stepping back and thinking "multicast that stops working after a few minutes", that pointed to a bridge forgetting group membership, and sure enough that was the case.

For anyone curious, it was a Proxmox bridge:

At radvd startup we see ff02::2 in the bridge's multicast database (mdb):

root@proxmox:~# bridge mdb show
dev fwbr111i0 port fwln111i0 grp ff02::2 temp
dev fwbr111i0 port fwln111i0 grp ff12::8384 temp
dev fwbr111i0 port fwln111i0 grp ff02::1:ff31:7cc2 temp
dev fwbr111i0 port fwln111i0 grp ff02::fb temp
dev fwbr111i0 port fwln111i0 grp ff02::1:ff78:10d0 temp
dev fwbr111i0 port fwln111i0 grp ff02::1:ffdc:db7f temp
dev fwbr112i0 port fwln112i0 grp ff02::2 temp

Then 5 minutes later it is gone:

root@proxmox:~# bridge mdb show
dev fwbr111i0 port fwln111i0 grp ff12::8384 temp
dev fwbr111i0 port fwln111i0 grp ff02::1:ff31:7cc2 temp
dev fwbr111i0 port fwln111i0 grp ff02::fb temp
dev fwbr111i0 port fwln111i0 grp ff02::1:ff78:10d0 temp
dev fwbr111i0 port fwln111i0 grp ff02::1:ffdc:db7f temp
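Comparing two snapshots by eye is error-prone, so here is a small Python sketch that parses `bridge mdb show` lines of the form shown above (dev, port, grp) and lists the entries that vanished between snapshots. The function names are illustrative, and lines in any other layout are simply ignored.

```python
from typing import List, Set, Tuple

MdbEntry = Tuple[str, str, str]  # (bridge, port, multicast group)


def parse_mdb(output: str) -> Set[MdbEntry]:
    """Collect (bridge, port, group) from lines like 'dev BR port PORT grp GROUP temp'."""
    entries: Set[MdbEntry] = set()
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 6 and fields[0] == "dev" and fields[2] == "port" and fields[4] == "grp":
            entries.add((fields[1], fields[3], fields[5]))
    return entries


def vanished(before: str, after: str) -> List[MdbEntry]:
    """Entries present in the first snapshot but missing from the second."""
    return sorted(parse_mdb(before) - parse_mdb(after))
```

For example, save the `bridge mdb show` output right after starting radvd and again a few minutes later, then pass both strings to `vanished()`; with the two snapshots above it reports the ff02::2 entries on fwbr111i0 and fwbr112i0.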

As a quick test, turning off multicast filtering (snooping) on that bridge got router solicitations working again immediately, so the smoking gun was identified.

jonatremoteeyes

@chill_out - I think I have the same issue. IPv6 was working fine until I moved to 2.7.1-RC (latest). I've tried both the existing and the new Kea DHCP backend, but macOS devices (or maybe all mobile devices, as they're all Apple) are not getting IPv6 addresses.

What did you change on pfSense to make it work again? I'm not aware of ever setting up multicast filtering, and I don't know how to on pfSense apart from adding firewall rules.
Thanks

jonatremoteeyes

@jonatremoteeyes @chill_out replying to myself too: I realised you weren't talking about a pfSense setting, so I checked multicast on my switches. I had IGMP snooping enabled, so I disabled it on both switches, toggled Wi-Fi on a MacBook, and now I've got an IPv6 address again. Could you explain?

chill_out

@jonatremoteeyes said in Router solicitations not working on vlans (2.7.1-RC):

could you explain

My understanding is that with IPv6 there are no more broadcasts. Everything is either unicast or multicast, and multicast is used for things like Neighbor Discovery, which is how nodes learn about their local router and get what they need for SLAAC configuration, etc.

What I observed was that Router Solicitation messages were not reaching the pfSense host (at least not after the first 5 minutes), and when I turned off multicast snooping on the bridge pfSense was connected through (which effectively floods multicast to every port), it all started working.

What's supposed to happen is that the bridge (or switch, etc.) maintains a table of which port is subscribed to which multicast group (on a Linux bridge this is the mdb), and that table is kept current by the membership reports nodes send when a "multicast querier" periodically asks them which groups they are interested in (for IPv6 these are MLD queries and reports). I suspect there's an issue in the Linux bridge setup when the queries go over 802.1Q-tagged interfaces, but that's just a hunch.

TL;DR: turning off multicast snooping means IPv6 multicast gets flooded to every port on your LAN, much like broadcast.
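To make the querier mechanism concrete, here is a deliberately simplified Python model of a snooping bridge: an mdb entry lives for a fixed membership interval and is refreshed only when the host sends a report, which after the initial join normally happens only in response to a query. The 260 s membership interval and 125 s query interval are assumptions (they are the usual Linux bridge defaults for `multicast_membership_interval` and `multicast_query_interval`), the class and function names are hypothetical, and the model ignores everything else the kernel does, such as querier election and its flooding rules.

```python
from __future__ import annotations

from dataclasses import dataclass, field

MEMBERSHIP_INTERVAL = 260.0  # seconds (assumed Linux bridge default)
QUERY_INTERVAL = 125.0       # seconds (assumed Linux bridge default)


@dataclass
class SnoopingBridge:
    """Toy mdb: maps (port, group) to the time at which the entry expires."""
    membership_interval: float = MEMBERSHIP_INTERVAL
    expires: dict[tuple[str, str], float] = field(default_factory=dict)

    def report(self, port: str, group: str, now: float) -> None:
        # A membership report (re)arms the entry's timer.
        self.expires[(port, group)] = now + self.membership_interval

    def groups(self, now: float) -> set[tuple[str, str]]:
        # Entries whose timer has run out are dropped, as in `bridge mdb show`.
        self.expires = {key: t for key, t in self.expires.items() if t > now}
        return set(self.expires)


def time_entry_lost(duration: int, query_interval: float | None) -> float | None:
    """Second at which (pfsense, ff02::2) leaves the mdb, or None if it survives."""
    bridge = SnoopingBridge()
    bridge.report("pfsense", "ff02::2", 0.0)  # unsolicited report when radvd joins ff02::2
    next_query = query_interval
    for now in range(1, duration + 1):
        if next_query is not None and now >= next_query:
            bridge.report("pfsense", "ff02::2", float(now))  # host answers the query
            next_query += query_interval
        if ("pfsense", "ff02::2") not in bridge.groups(float(now)):
            return float(now)
    return None


if __name__ == "__main__":
    print(time_entry_lost(600, None))            # 260.0: no querier, entry ages out
    print(time_entry_lost(600, QUERY_INTERVAL))  # None: queries keep it alive
```

With no queries reaching the host, the ff02::2 entry ages out after about 4.3 minutes, in the same ballpark as the roughly 5 minutes observed here; that is consistent with queries not getting through, but the model does not prove it.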

JKnott

@chill_out said in Router solicitations not working on vlans (2.7.1-RC):

My understanding is that with ipv6 there's no more broadcasts, everything is either unicast or multicast

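That is correct. The closest thing to a broadcast is the all-nodes multicast (ff02::1). There are some differences: the scope can be specified, and for some things, such as Neighbor Discovery, the hop limit is set to 255 as protection against a bogus packet being sent through a router, since a router decrements the hop limit and the receiver discards anything that does not arrive with 255.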
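As a rough illustration of the hop-limit point: RFC 4861 has receivers silently discard Neighbor Discovery messages unless the IP hop limit is 255 and the ICMP code is 0, with a minimum length per message type, and Router Advertisements and Redirects must also come from a link-local source. The Python sketch below checks only those fields (not the checksum or options), and the function name is hypothetical.

```python
import ipaddress

# Minimum ICMPv6 length per ND type (RFC 4861 sections 6.1.1, 6.1.2, 7.1.1, 7.1.2, 8.1)
ND_MIN_LENGTH = {133: 8, 134: 16, 135: 24, 136: 24, 137: 40}
LINK_LOCAL_SOURCE_REQUIRED = {134, 137}  # Router Advertisement, Redirect


def nd_basic_checks(hop_limit: int, icmp_type: int, icmp_code: int,
                    icmp_length: int, src: str) -> bool:
    """Subset of RFC 4861 receive-side validation; checksum and options are not checked."""
    if icmp_type not in ND_MIN_LENGTH:
        return False  # not a Neighbor Discovery message at all
    if hop_limit != 255:
        return False  # below 255 means it crossed a router, so drop it
    if icmp_code != 0 or icmp_length < ND_MIN_LENGTH[icmp_type]:
        return False
    if icmp_type in LINK_LOCAL_SOURCE_REQUIRED and not ipaddress.IPv6Address(src).is_link_local:
        return False
    return True
```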