IPv6 and CARP: Can't access slave from a publicly addressed network, tentative IP

npr · Jun 7, 2024, 9:53 AM

I have the following situation:

Three hosts: 'client', 'pf1' and 'pf2'. The setup is similar to the example at https://docs.netgate.com/pfsense/en/latest/recipes/high-availability.html, but a bit more complicated: there are two switches on the 'local' side (four cables), connected via LAGG.

To take DNS out of the picture, I ping these directly by IP address.

+------------+------+------+--------+------+
| src/target | pf1  | pf2  | client | VIP  |
+------------+------+------+--------+------+
| pf1v4      | OK   | OK   | OK     | OK   |
| pf2v4      | OK   | OK   | OK     | FAIL |
| pf1v6      | OK   | FAIL | OK     | OK   |
| pf2v6      | FAIL | FAIL | FAIL   | OK   |
| client4    | OK   | OK   | OK     | OK   |
| client6    | OK   | FAIL | OK     | OK   |
+------------+------+------+--------+------+

My slave pfSense (the CARP backup) does not use its primary address as the source when pinging over IPv6; instead it tries to use the CARP address, even though it is in BACKUP state. This makes the host unreachable over IPv6.

In fact, it can't even ping itself, which rules out a firewall problem or a problem with the LAGG/switching. It looks as if pfSense is confused about what its own IP address is.

PING6(56=40+8+8 bytes) xxxx:xxxx:xxxx:1::1 --> xxxx:xxxx:xxxx:1::3
---- xxxx:xxxx:xxxx:1::3 ping6 statistics ----
3 packets transmitted, 0 packets received, 100.0% packet loss
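
To check this from a script rather than by eye, here is a minimal Python sketch that reads the source and destination out of the PING6 header line and reports whether the source is the CARP VIP. The function names are made up for illustration, and the comparison is a plain string match (the addresses here are redacted, so nothing is normalised).

```python
import re

# Matches the first line ping6 prints, e.g.
# "PING6(56=40+8+8 bytes) <source> --> <destination>"
PING6_HEADER = re.compile(r"^PING6\(.*?\)\s+(\S+)\s+-->\s+(\S+)")


def ping6_endpoints(output):
    """Return (source, destination) from the PING6 header line, or None."""
    for line in output.splitlines():
        match = PING6_HEADER.match(line.strip())
        if match:
            return match.group(1), match.group(2)
    return None


def uses_vip_as_source(output, vip):
    """True if ping6 picked the given VIP as its source address.

    Assumption: both strings are written the same way; no IPv6
    normalisation is attempted.
    """
    endpoints = ping6_endpoints(output)
    return endpoints is not None and endpoints[0] == vip


SAMPLE = """\
PING6(56=40+8+8 bytes) xxxx:xxxx:xxxx:1::1 --> xxxx:xxxx:xxxx:1::3
---- xxxx:xxxx:xxxx:1::3 ping6 statistics ----
3 packets transmitted, 0 packets received, 100.0% packet loss
"""

if __name__ == "__main__":
    print(ping6_endpoints(SAMPLE))
    print(uses_vip_as_source(SAMPLE, "xxxx:xxxx:xxxx:1::1"))
```

With the output above, the source is the ::1 address (the VIP with vhid 10) and not the node's own ::3 address.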

When trying to ping the VIP, the situation is reversed: it can't find the master's IPv4 VIP, but it can (erroneously) point to itself for the IPv6 VIP.

This is the output of ifconfig lagg0.203:

lagg0.203: flags=1008943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
	description: VM_Internet
	options=80003<RXCSUM,TXCSUM,LINKSTATE>
	ether mm:mm:mm:mm:mm:mm
	inet xxx.xxx.xxx.125 netmask 0xffffffc0 broadcast xxx.xxx.xxx.127
	inet xxx.xxx.xxx.126 netmask 0xffffffc0 broadcast xxx.xxx.xxx.127 vhid 9
	inet6<linklocal>%lagg0.203 prefixlen 64 scopeid 0xd
	inet6 xxxx:xxxx:xxxx:1::3 prefixlen 64 tentative
	inet6 xxxx:xxxx:xxxx:1::1 prefixlen 64 vhid 10
	groups: vlan
	carp: BACKUP vhid 9 advbase 1 advskew 100
	      peer 224.0.0.18 peer6 ff02::12
	carp: BACKUP vhid 10 advbase 1 advskew 100
	      peer 224.0.0.18 peer6 ff02::12
	vlan: 203 vlanproto: 802.1q vlanpcp: 0 parent interface: lagg0
	media: Ethernet autoselect
	status: active
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>

While the IPv4 address is assigned, the IPv6 address remains 'tentative'. How can I find out why it remains stuck in this state?
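
For checking several interfaces at once, a rough Python sketch that pulls out every inet6 address still carrying a DAD-related flag from ifconfig output. It assumes the FreeBSD ifconfig line layout shown above ("inet6 <address> prefixlen <n> <flags...>"); the function name is hypothetical and the sample is just the two global lines from my output.

```python
# Flags FreeBSD's ifconfig prints after an inet6 address while duplicate
# address detection has not finished (tentative) or has failed (duplicated).
DAD_FLAGS = {"tentative", "duplicated"}


def stuck_inet6(ifconfig_text):
    """Return (address, flag) pairs for inet6 lines that carry a DAD flag."""
    found = []
    for line in ifconfig_text.splitlines():
        words = line.split()
        # Only look at lines of the form "inet6 <address> ...".
        if len(words) < 2 or words[0] != "inet6":
            continue
        for flag in sorted(DAD_FLAGS.intersection(words[2:])):
            found.append((words[1], flag))
    return found


SAMPLE = """\
\tinet6 xxxx:xxxx:xxxx:1::3 prefixlen 64 tentative
\tinet6 xxxx:xxxx:xxxx:1::1 prefixlen 64 vhid 10
"""

if __name__ == "__main__":
    for address, flag in stuck_inet6(SAMPLE):
        print(address, flag)
```

On a live box the text could come from running ifconfig via subprocess; that part is left out here so the sketch runs anywhere.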

Edit: After some more digging, the kernel log shows these messages:

lagg0.203: a looped back NS message is detected during DAD for xxxx:xxxx:xxxx:1::3. Another DAD probes are being sent.

Yet there isn't any other host with this address. If I use a random value instead of 3 for the host part of the address, the error starts occurring for the new address, which shows that the probe really is looping back to the sender, yet only over IPv6. Is the host getting confused when pf2.igb2 sends out a packet that is forwarded at layer 2 via sw1 and sw2 and then received by pf2.igb3, while both ports are members of lagg0?
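
To see which interfaces and addresses are affected, and how often, a minimal Python sketch that counts these kernel messages per (interface, address) pair. The regular expression is written against the exact wording of the log line quoted above; the function name and the second sample address (::abcd) are made up for illustration.

```python
import re
from collections import Counter

# "<ifname>: a looped back NS message is detected during DAD for <addr>. ..."
LOOP_RE = re.compile(
    r"(\S+): a looped back NS message is detected during DAD for (\S+)\. "
)


def count_looped_dad(log_text):
    """Count looped-back DAD messages per (interface, address) pair."""
    hits = Counter()
    for line in log_text.splitlines():
        match = LOOP_RE.search(line)
        if match:
            hits[(match.group(1), match.group(2))] += 1
    return hits


SAMPLE = """\
lagg0.203: a looped back NS message is detected during DAD for xxxx:xxxx:xxxx:1::3. Another DAD probes are being sent.
lagg0.203: a looped back NS message is detected during DAD for xxxx:xxxx:xxxx:1::3. Another DAD probes are being sent.
lagg0.203: a looped back NS message is detected during DAD for xxxx:xxxx:xxxx:1::abcd. Another DAD probes are being sent.
"""

if __name__ == "__main__":
    for (ifname, address), count in count_looped_dad(SAMPLE).items():
        print(ifname, address, count)
```

If the count follows whatever address is configured, as described above, that points at the interface seeing its own neighbor solicitations rather than at a real duplicate on the network.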