IPv6: CARP VIP with Router Advertisements?

techstone

What is the proper way to combine a CARP VIP with IPv6 Router Advertisements?

For example, I've defined an IPv6 CARP VIP for a LAN interface, and when I configured Router Advertisements for that LAN interface I made sure to choose the IPv6 VIP value in the "RA Interface" pulldown. My expectation was that the RA daemon would only advertise a virtual link-local address out of the interface on the MASTER node (as I've seen on recent OpenBSD nodes with the rad daemon). However, on my pfSense cluster I've observed that both the MASTER and BACKUP nodes still simultaneously send out RAs with their own link-local addresses. The result is that a Windows laptop on that LAN subnet ends up with two default routes and sometimes sends traffic towards the pfSense box that is BACKUP, which results in asymmetric routing and some delays in connectivity until both the MASTER and BACKUP state tables sync up via pfsync.

Is there a way to use RAs and still make sure IPv6 traffic only goes to the MASTER device?

Thanks,
-Martin

junicast

I have discovered the same problem. What I did was to set the priority of the master to high and the priority of the slave to low in the RA settings. This way the traffic runs over the same node, but in case of failover the client is offline for like 30 seconds in my case.
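To illustrate why the priority setting steers traffic but still leaves a gap on failover, here is a rough Python model of how a host might pick its default router from the RAs it has heard. It assumes the host honours the RFC 4191 router preference and only drops a router once its advertised lifetime runs out. The class and function names are made up for this example, and the addresses and the 30-second lifetime are placeholders.

```python
from dataclasses import dataclass

# Higher number = more preferred (RFC 4191 router preference).
PREFERENCE_RANK = {"low": 0, "medium": 1, "high": 2}


@dataclass
class Router:
    link_local: str   # source address of the RA (placeholder values below)
    preference: str   # "low", "medium" or "high"
    lifetime: int     # router lifetime from the RA, in seconds
    heard_at: float   # time the last RA was received, in seconds

    def valid_at(self, now: float) -> bool:
        # The entry stays usable until the advertised lifetime runs out.
        return now < self.heard_at + self.lifetime


def pick_default_router(routers, now):
    """Return the usable router with the highest preference, or None."""
    usable = [r for r in routers if r.valid_at(now)]
    if not usable:
        return None
    return max(usable, key=lambda r: PREFERENCE_RANK[r.preference])


# Hypothetical pair: the master advertises "high", the backup "low".
# Assume the master dies right after its RA at t=0.
master = Router("fe80::1", "high", lifetime=30, heard_at=0.0)
backup = Router("fe80::2", "low", lifetime=30, heard_at=5.0)

print(pick_default_router([master, backup], now=10.0).link_local)  # fe80::1
print(pick_default_router([master, backup], now=29.0).link_local)  # fe80::1 (stale)
print(pick_default_router([master, backup], now=31.0).link_local)  # fe80::2
```

Real stacks also run neighbour unreachability detection, so they may switch sooner than this simple model suggests.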

What I don't understand is why I have CARP when the client won't use a virtual CARP IP as its default gateway. My understanding of the purpose of CARP is to have a virtual IP that is moved between hosts. There is of course a virtual IP, but it is a global unicast address and not a link-local one.
I tried to configure a CARP link-local address and use that one for router advertisements, but it has no effect.

Can someone please explain why pfSense is acting this way, or how to set it up properly? The way pfSense is acting now, I don't have any improvement over a setup that does not use CARP at all, right? I could just place two pfSense boxes next to each other and let both of them send router advertisements.
The documentation doesn't cover that part, though.

junicast

I tested a bit more and I would like to share my results with you, just in case someone else has the same questions.

radvd only runs on the master, so the client only gets a link-local default gateway from that one. Only on failover or in manual CARP maintenance mode does the slave become the new master and start radvd. When entering maintenance mode, the old master sends a router advertisement with a lifetime of 0, announcing itself dead. The interruption for the client in that scenario is very short. When the old master leaves maintenance mode, the other node also sends an advertisement with a lifetime of 0, so everything gets back to normal again.
On the other hand, when a real problem occurs, e.g. the master just vanishes, it takes some time until the slave becomes the new master and sends its router advertisement so that the client learns its new route.
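The difference between the two cases can be sketched with a toy model of a host's default router list. This assumes the RFC 4861 rule that an RA with a router lifetime of 0 removes the sender from the list immediately, while a router that vanishes silently stays listed until its lifetime expires. The function names, addresses and timings are made up for illustration.

```python
def apply_ra(default_routers, source, lifetime, now):
    """Update a host's default router list for one received RA.

    default_routers maps link-local source address -> expiry time.
    A router lifetime of 0 means "stop using me as a default router",
    so the entry is removed straight away.
    """
    if lifetime == 0:
        default_routers.pop(source, None)
    else:
        default_routers[source] = now + lifetime
    return default_routers


def expire(default_routers, now):
    """Drop entries whose lifetime has run out without a fresh RA."""
    return {src: exp for src, exp in default_routers.items() if exp > now}


# Placeholder addresses: fe80::a is the old master, fe80::b the new one.
routers = {}
apply_ra(routers, "fe80::a", lifetime=30, now=0)

# Graceful handover (maintenance mode): the old master says goodbye with
# lifetime 0 and the new master starts advertising.
graceful = dict(routers)
apply_ra(graceful, "fe80::a", lifetime=0, now=10)
apply_ra(graceful, "fe80::b", lifetime=30, now=10)
print(sorted(graceful))                  # ['fe80::b']

# Crash: no goodbye RA, so the stale entry lingers until it expires.
crashed = dict(routers)
apply_ra(crashed, "fe80::b", lifetime=30, now=15)
print(sorted(expire(crashed, now=20)))   # ['fe80::a', 'fe80::b']
print(sorted(expire(crashed, now=31)))   # ['fe80::b']
```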

I hoped that pfSense would maybe be able to craft a virtual link-local IP to be used for router advertisements, but that's not the case. I attempted several times to create a CARP link-local IP address myself, like fe80::aaaa:bbbb:cccc:dddd/64, but such an IP seems to be master on both nodes.

techstone

"The radvd just runs on the master so the client only gets a link local default gateway from that one."

In my pfSense HA pair I actually have the radvd daemon running on both devices. tcpdump also shows RAs being sent from both boxes. Are you sure it's only running in one box in your case?

In the Services > DHCPv6 Server & RA > (network) > Router Advertisements screen the RA Interface field is set to the CARP VIP, and not the physical interface for the subnet; do you have the same setup?

junicast

Yes techstone, same setup here. I just made sure that radvd is only running on the master device. The radvd process just isn't there on the slave, and when I look at Status > Services on the slave machine it shows the red circle with a white cross in it, indicating it's not running. ps on the command line confirms that.
Beyond that, I observed that radvd stops when a node goes into backup state, while radvd starts on the new master.
Are you also on 2.4.4?

techstone

I had already confirmed that radvd was running on both boxes using ps waux | grep radvd, but I checked Status > Services as you suggested and it's green for me on both boxes as well.

I'm currently running 2.4.3-patch1 on both boxes.

I looked at the code in src/etc/inc/services.inc (from the latest master branch commit), namely the services_radvd_configure() function, and I couldn't readily see any code in there that checks whether the node is master before starting radvd (unlike OpenVPN, which only starts if the CARP VIP master check is true). So based on that code it seems normal that I see radvd running on both my boxes. What I don't understand is why that's not the case for you.

I have my router mode set to Unmanaged - RA Flags [none], prefix Flags [onlink, auto, router]; do you have the same?

junicast

No, mine is set to Assisted.
Did you check what lifetime your slave sends? If it sends 0 then it isn't being used by the clients. If it sends something else it would mean the client might use the slave as gateway. I don't think this is a problem, since pfsync seems to sync in both directions (states) but that would mean it's master/master for IPv6 instead of what the admin would expect, i.e. master/slave.

techstone

Both the master and the slave send a lifetime of 30 seconds, which matches the value set for the AdvDefaultLifetime parameter in the automatically generated /var/etc/radvd.conf on both boxes. However, I have set the Router Priority to Normal on the master and Low on the slave, so traffic normally always goes to the master. It's just the 30-second delay between the time the master goes down and the route disappears from the client PC that bugs me. With a setup where clients point to a CARP VIP (as in IPv4), the VIP can move from the master to the slave in a split second, which makes for a much faster failover.
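To compare the two boxes quickly, something like the following Python sketch can pull the simple "Name value;" options out of radvd.conf text. The sample fragment is hypothetical (it is not copied from a real pfSense box, and the interface name is a placeholder), the helper name is made up, and the parser only handles a single interface block.

```python
import re

SAMPLE_CONF = """
# Hypothetical fragment, not copied from a real pfSense box.
interface em1 {
    AdvSendAdvert on;
    MinRtrAdvInterval 5;
    MaxRtrAdvInterval 20;
    AdvDefaultLifetime 30;
    AdvDefaultPreference low;
};
"""


def radvd_options(conf_text):
    """Collect simple 'Name value;' options from radvd.conf text."""
    options = {}
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments
        match = re.fullmatch(r"(\w+)\s+(\S+);", line)
        if match:
            options[match.group(1)] = match.group(2)
    return options


opts = radvd_options(SAMPLE_CONF)
lifetime = int(opts["AdvDefaultLifetime"])
print("preference:", opts["AdvDefaultPreference"])
print("a dead router can stay in a client's default router list "
      f"for up to {lifetime} s after its last RA")
```

On a real box the text would come from reading /var/etc/radvd.conf instead of the sample string.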