HA CARP - IPv6 Two masters
-
I don't know of any fixes regarding this, but if you are rebooting these units you should be on 2.4.3_1.
https://www.netgate.com/docs/pfsense/highavailability/redundant-firewalls-upgrade-guide.html
-
I've rebuilt these units from scratch since the move from 2.3.5 to 2.4.x. We had issues with other things as well, so I decided to redo them from blank (new) devices.
Haven't had time for a real investigation yet.
I just did a dump and viewed it in Wireshark. It's complaining that the packet is "malformed".
IPv4 for comparison:
At least by the looks of it, the backup unit is receiving the advertisements, but failing to release "master".
-
That's probably because it is not VRRP, it is CARP. They are similar but different protocols.
I say again: Right-click on one of the packets, Select Decode As, and tell it to decode protocol 112 as CARP.
It would be nice if Wireshark did that automatically but it doesn't.
-
Also, try this:
In Firewall > Virtual IPs, when you define your IPv6 CARP VIPs, do NOT use any capital hex digits. (Do not enter FE80::EC4:7AFF:FEAB:3724, enter fe80::ec4:7aff:feab:3724).
Also, do NOT enter any insignificant, leading zeroes in any of the colon-delimited groups (Do not enter fe80::0ec4:7aff:feab:3724, enter fe80::ec4:7aff:feab:3724).
Thinking there is a parsing problem in the CARP code there. It looks like they are synced OK but something is happening on the ifconfig on the secondary when the CARP VIP is added to the interface. The above workarounds should clear up whatever is happening. (That's why many people never see this: they invariably enter IP addresses the way that works.) ETA: There it is: https://redmine.pfsense.org/issues/6579
ETA: When I created both of these problematic IPv6 address formats, I had to reboot the secondary to get them to go back into BACKUP status after "correcting" the VIP definitions on the primary and they synced over. It looks like the VIPs cannot be properly reapplied after they are in the broken state. It might work if the VHIDs are manually removed from the interfaces with ifconfig but I didn't pursue it. A simple restart of the secondary node is generally a hitless event if configured correctly.
-
You were correct. The backup unit goes off into some broken state unless you reboot it. I had initially used the shorthand notation, but it seemed to make no difference. This explains a lot. Any chance you can elaborate on how to delete VIPs/CARPs with ifconfig? I could not find any examples.
For other people struggling with this, the easiest way I found was to paste the IP here. Use the "compressed format":
https://subnettingpractice.com/ipv6_subnetting.html
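If you'd rather not paste addresses into a website, the same compressed form can be produced locally. This is only a rough sketch using Python's standard `ipaddress` module; the helper names `canonical_vip` and `is_canonical` are made up for illustration and have nothing to do with pfSense itself. It just shows what the lowercase, no-leading-zero form of an address looks like before you type it into Firewall > Virtual IPs.

```python
import ipaddress


def canonical_vip(addr: str) -> str:
    """Return the lowercase, zero-compressed text form of an IPv6 address.

    Hypothetical helper for illustration only; not part of pfSense.
    """
    return ipaddress.IPv6Address(addr).compressed


def is_canonical(addr: str) -> bool:
    """True if the string as typed already matches the compressed form."""
    return addr == canonical_vip(addr)


if __name__ == "__main__":
    # The two problematic spellings from this thread, plus the one that works.
    for typed in (
        "FE80::EC4:7AFF:FEAB:3724",   # capital hex digits
        "fe80::0ec4:7aff:feab:3724",  # leading zero in a group
        "fe80::ec4:7aff:feab:3724",   # already in compressed form
    ):
        print(f"{typed} -> {canonical_vip(typed)} (as typed OK: {is_canonical(typed)})")
```

All three inputs come out as fe80::ec4:7aff:feab:3724; only the last one is reported as already OK as typed.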
-
I would have to research it as well. The easiest thing to do is just reboot the secondary.
The primary seems unaffected by this and rebooting the secondary should be a non-event.
The FreeBSD ifconfig man page states:
Whenever a last address that refers to a particular vhid is removed from an interface, the vhid is automatically removed from interface and destroyed.
So this should not be possible. I could not find a way to delete that VHID after it was in this state short of a reboot. I didn't look really closely at it.
-
I'm seeing exactly this problem on 2 pairs of (vm) firewalls running 2.4.3-RELEASE-p1 (amd64) - carp works perfectly on v4, but v6 gets into master-master state. Rebooting the secondary solves it.
Both pairs are new builds with config restored.
Happy to provide any debug info that would help.
Thanks
Ed
-
Something related to the issues outlined above or something that works then spontaneously goes MASTER/MASTER?
-
It seems to be exactly the same symptoms, but I've checked that there are no leading zeros in the IPv6 address, and it's all in lower case. It got into the master-master state when doing a failover to the secondary and back again.
-
You are going to have to provide more details. You might consider starting another thread since you are probably looking at a different problem, a layer 2 issue, or a misconfiguration.
-
@whisk0r Just lurking by... I have seen this behavior (WAN IPv6 on router2 left as Master) for a while. I've been using the general process:
upgrade router2
Enter Persistent Maintenance Mode on router1
upgrade router1
Leave Persistent Maintenance Mode...and router2 has the one IPv6 stuck on Master and needs a restart.
I do know it happened several times on 2.3.x and 2.4.x upgrades when we were running pfSense under VMs, under Virtuozzo. Possibly not every time. We have since installed two Netgate SG-4860, and our last ticket to upgrade to 2.4.3 (the only upgrade since the 4860s) didn't specifically say we had this issue then.
-
I never see that. You probably want to check that VIP for any of the issues described above.
-
Since I opened my mouth I felt obligated to test this tonight. I entered persistent maintenance mode a couple times and did not see issues switching back. So I suppose it might be related to our prior setup.
It didn't happen every time, but I'd say a majority of the time. Then again I seem to recall it happening occasionally just entering and leaving persistent maintenance mode so I don't think it's related to the upgrading process.
The VIPs are lower case and have no leading zero, however the LAN IP is "2607:xxxx:0:4c::1/64 (vhid: 154)" with a lone zero in there. Note it was the WAN IP that got stuck in dual Master (2607:xxxx::12/125 (vhid: 153)).
-
@derelict I have just experienced an interesting mutation of the issue https://redmine.pfsense.org/issues/6579 . My IPv6 CARP virtual address ended in zero: fddf:c8:4011:13:: . Entering it exactly like that was not possible in "Firewall / Virtual IPs / Edit" - I got the following error message:
The following input errors were detected: * The network address cannot be used for this VIP
So I had to enter fddf:c8:4011:13::0 instead, which caused the described problem. After changing the CARP address to fddf:c8:4011:13::100 the problem went away.
I added this information also to the issue.
P.S. I am using the latest pfSense: 2.4.4-RELEASE-p2
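For anyone wondering why the ::0 spelling would fall under the same issue: as far as address parsing goes, fddf:c8:4011:13::0 and fddf:c8:4011:13:: are the same address, and the trailing 0 is just a non-compressed way of writing it. A quick check with Python's standard `ipaddress` module (a sketch for illustration only, it says nothing about what the pfSense code does internally):

```python
import ipaddress

# The form the GUI accepted, and the form it rejected.
typed = ipaddress.IPv6Address("fddf:c8:4011:13::0")
bare = ipaddress.IPv6Address("fddf:c8:4011:13::")

# Both strings parse to the same 128-bit address.
same_address = typed == bare

# The compressed text form drops the trailing zero group.
compressed = typed.compressed

print(same_address)
print(compressed)
```

This prints True and then fddf:c8:4011:13::, so the string that had to be entered differs from the compressed form in the same way a leading zero does.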
-
There is no network/broadcast address in IPv6. PREFIX::0/64 is a valid host address. It is possible there is a problem with the validation code in the GUI.
-
@awebster That was exactly what I thought too!