One VLAN is master on both HA's??? Strange networking issue
-
This is a strange one.
Context: I have a number of VLANs. They are handled using a single trunk ethernet interfacing to a smart switch that breaks things out as needed. I have two hardware-identical boxes for HA/CARP.
I've got an issue I have never seen before.
My CARP is fine, except:
- While VLAN 19 is 100% fine on the primary
- The secondary thinks the primary host is down on that VLAN
- And therefore it too is Primary CARP for that VLAN :(
Running tcpdump at both ends of the link shows:
- the secondary is sending but not receiving packets on ONLY that VLAN
- all other VLANs are fine (and share the exact same cable)
I down/up'd that interface
Checked smart switch config (yes that VLAN is enabled)...
I even checked cables ;).
Not sure how long ago this changed. I don't see the issue in any log so far.
Ideas MOST welcome!
-
@mrpete Sounds like the switches are dropping the CARP packets in one direction...
Lots of reasons why this could be happening... CARP is very similar to VRRP; in fact it uses the same IP protocol number (112) as VRRP (insert big discussion about why the overlap here). So if you have VRRP running on your switches and the VID (the VHID in CARP terms, the VRID in VRRP terms) is the same for both, they will conflict with each other.
Thus, the CARP VID must be unique and distinct from any VRRP VID.
Use tcpdump -T carp to see protocol 112 decoded as CARP and not VRRP.
CARP also uses the same multicast address as VRRP, 224.0.0.18. Make sure nothing else is using it.
Next if you are running IPv4 and IPv6, I've found it works best if each CARP Virtual IP uses a different VID, ie: different ones for IPv4 and IPv6.
The VRRP/CARP VID is mapped into the sending MAC address as 00:00:5e:00:01:xx, where xx is the VID in hex, so these need to be kept separate to prevent mayhem. This is true for both IPv4 and IPv6.
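To make the mapping concrete, here is a minimal sketch that computes the source MAC a given VHID would advertise from. It assumes the 00:00:5e:00:01:xx layout that the tcpdump sample further down this thread shows; the function name carp_virtual_mac is made up for illustration and is not part of any tool.

```python
def carp_virtual_mac(vhid: int) -> str:
    """Return the virtual source MAC for a CARP VHID.

    Assumption: the 00:00:5e:00:01:xx layout, with xx being the VHID in hex.
    """
    if not 1 <= vhid <= 255:
        raise ValueError("VHID must be in the range 1-255")
    return f"00:00:5e:00:01:{vhid:02x}"


# The two VHIDs used in the example setup later in this thread.
print(carp_virtual_mac(174))  # 00:00:5e:00:01:ae
print(carp_virtual_mac(175))  # 00:00:5e:00:01:af
```

Two VIPs that share a VHID on the same L2 segment would therefore share a source MAC, which is why giving IPv4 and IPv6 different VHIDs keeps them apart on the wire.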
Check that you don't have any VLANs "short circuited" together, or you'll also have issues: IPv4 CARP packets are sent to the multicast MAC address 01:00:5e:00:00:12, which will be seen by all devices reachable in the L2 broadcast domain. IPv6 CARP packets go to 33:33:00:00:00:12 instead, with the same effect.
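For anyone wondering where those two destination MACs come from, they follow the standard multicast IP-to-Ethernet mappings: 01:00:5e plus the low 23 bits of an IPv4 group, and 33:33 plus the low 32 bits of an IPv6 group. The sketch below is only an illustration; the helper name multicast_mac is hypothetical.

```python
import ipaddress


def multicast_mac(group: str) -> str:
    """Map a multicast IP group to the Ethernet MAC it is sent to."""
    addr = ipaddress.ip_address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not a multicast address")
    raw = addr.packed
    if addr.version == 4:
        # 01:00:5e followed by the low 23 bits of the group address.
        octets = b"\x01\x00\x5e" + bytes([raw[1] & 0x7F, raw[2], raw[3]])
    else:
        # 33:33 followed by the low 32 bits of the group address.
        octets = b"\x33\x33" + raw[-4:]
    return ":".join(f"{b:02x}" for b in octets)


print(multicast_mac("224.0.0.18"))  # 01:00:5e:00:00:12
print(multicast_mac("ff02::12"))    # 33:33:00:00:00:12
```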
Finally, check that there isn't some sort of IGMP configured on the switches that is filtering the multicast packets sent to 224.0.0.18 on the affected VLAN.
-
@awebster Thanks for that good list.
I've not solved the problem so far... mostly have new questions.
Progress:
- Confirmed the list doesn't reveal issues: not using VRRP; VHIDs are all unique; no shared use of 224.0.0.18; VLANs not interlinked; no IGMP filtering.
- NOTE: underneath, on a (VLAN) interface where both Prim/Sec are Master, the primary sees the secondary as Up, but the secondary thinks the primary is Down. Packets from Primary never reach Secondary on that VLAN, period. :(
Additional lessons learned:
- This is a Very Dangerous problem. Any interface with two Masters means that both will receive and respond to LAN packets... thus destroying the integrity of various LAN communications. :(
- While testing, a second interface suddenly went into this "mode" of both being Master. :(
My temporary workaround for now: I've shut down my secondary HA machine until I have time and a strategy to diagnose or fully rebuild the setup.
One QUESTION: @awebster you wrote "Next if you are running IPv4 and IPv6, I've found it works best if each CARP Virtual IP uses a different VID, ie: different ones for IPv4 and IPv6." -- where is the VID for IPv6 separately configurable? I can't find this.
-
@mrpete Curious problem to be sure...
Perhaps share the output of ifconfig -a so we can have a look at what the underlying OS has actually got configured on the interfaces.
In regards to the question about where you set the VHID: you do it in the Aliases when defining a CARP IP.
For example:
I also use Base = 1, Skew = 0 on the primary, and Base = 1, Skew = 100 on the backup.
You can use the command tcpdump -e -s0 -nn -i interface -T carp proto 112 to look at the actual packets you're receiving / sending and make sure everything is working as expected.
You should see something similar to this:
00:00:5e:00:01:xx > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 70: Primary_REAL_IP > 224.0.0.18: CARPv2-advertise 36: vhid=xx advbase=1 advskew=0 authlen=7 counter=some_long_number
IPv6 is a little less interesting, it looks like this, but note that the source MAC address should be different between the IPv4 and IPv6 versions based on differing VHID values:
00:00:5e:00:01:yy > 33:33:00:00:00:12, ethertype IPv6 (0x86dd), length 90: fe80::link-local > ff02::12: ip-proto-112 36
You should only see one sender of the CARPv2-advertise packets. Unfortunately, since the REAL MAC address is not revealed, you need to rely on the source IP address to determine whether the correct system is sending the packets. Similarly, with the IPv6 version you need to look at the link-local IP address on the actual interface (use ifconfig in the shell to see this).
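If you capture a lot of these lines, a small script can pull out the fields that matter (source IP, vhid, advskew) and check that the source MAC matches the vhid. This is only a sketch that assumes the IPv4 advertisement format quoted above; the sample line uses made-up values (192.0.2.174 and the counter are placeholders), and parse_carp_advert is a hypothetical helper name.

```python
import re

# Matches the IPv4 "CARPv2-advertise" line format shown in this thread.
ADVERT_RE = re.compile(
    r"(?P<src_mac>[0-9a-f:]{17}) > (?P<dst_mac>[0-9a-f:]{17}), .*?: "
    r"(?P<src_ip>\S+) > (?P<dst_ip>\S+): CARPv2-advertise \d+: "
    r"vhid=(?P<vhid>\d+) advbase=(?P<advbase>\d+) advskew=(?P<advskew>\d+)"
)


def parse_carp_advert(line: str):
    """Return the advertisement fields as a dict, or None if the line does not match."""
    match = ADVERT_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    for key in ("vhid", "advbase", "advskew"):
        fields[key] = int(fields[key])
    # The last octet of the virtual source MAC should equal the vhid.
    fields["mac_matches_vhid"] = int(fields["src_mac"][-2:], 16) == fields["vhid"]
    return fields


# Hypothetical sample line; the IP address and counter are placeholders.
sample = (
    "00:00:5e:00:01:ae > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 70: "
    "192.0.2.174 > 224.0.0.18: CARPv2-advertise 36: "
    "vhid=174 advbase=1 advskew=0 authlen=7 counter=1234567890"
)
advert = parse_carp_advert(sample)
print(advert["src_ip"], advert["vhid"], advert["advskew"], advert["mac_matches_vhid"])
```

Grouping the parsed lines by vhid and counting distinct source IPs is then a quick way to spot a VLAN where two boxes are advertising at once.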
Here is a dump of an interface on both my primary and backup systems, hopefully that will provide some clues:
In this setup:
VIP: nnn.mmm.208.74/24 VHID 174 and xxxx:yyyy:zzzz:e0d0::74/64 VHID 175
Primary: nnn.mmm.208.174/24 and xxxx:yyyy:zzzz:e0d0::174/64
Backup: nnn.mmm.208.175/24 and xxxx:yyyy:zzzz:e0d0::175/64

em3: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=9b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM>
    ether 00:50:56:a6:89:3c
    hwaddr 00:50:56:a6:89:3c
    inet6 fe80::250:56ff:fea6:893c%em3 prefixlen 64 scopeid 0x4
    inet6 xxxx:yyyy:zzzz:e0d0::174 prefixlen 64
    inet6 xxxx:yyyy:zzzz:e0d0::74 prefixlen 64 vhid 175
    inet nnn.mmm.208.174 netmask 0xffffff00 broadcast nnn.mmm.208.255
    inet nnn.mmm.208.74 netmask 0xffffff00 broadcast nnn.mmm.208.255 vhid 174
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
    media: Ethernet autoselect (1000baseT <full-duplex>)
    status: active
    carp: MASTER vhid 174 advbase 1 advskew 0
    carp: MASTER vhid 175 advbase 1 advskew 0

em3: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=9b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM>
    ether 00:50:56:a6:1d:39
    hwaddr 00:50:56:a6:1d:39
    inet6 fe80::250:56ff:fea6:1d39%em3 prefixlen 64 scopeid 0x4
    inet6 xxxx:yyyy:zzzz:e0d0::175 prefixlen 64
    inet6 xxxx:yyyy:zzzz:e0d0::74 prefixlen 64 vhid 175
    inet nnn.mmm.208.175 netmask 0xffffff00 broadcast nnn.mmm.208.255
    inet nnn.mmm.208.74 netmask 0xffffff00 broadcast nnn.mmm.208.255 vhid 174
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
    media: Ethernet autoselect (1000baseT <full-duplex>)
    status: active
    carp: BACKUP vhid 174 advbase 1 advskew 100
    carp: BACKUP vhid 175 advbase 1 advskew 100
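If you want to compare the two boxes quickly, the "carp:" status lines in ifconfig output are easy to parse. The sketch below assumes the line format shown in my dump and that you feed it the output for one interface from each box; the function names and the split-brain sample (vhid 19) are invented for illustration.

```python
import re

# Matches status lines such as "carp: MASTER vhid 174 advbase 1 advskew 0".
CARP_RE = re.compile(r"carp: (MASTER|BACKUP|INIT) vhid (\d+) advbase (\d+) advskew (\d+)")


def carp_states(ifconfig_output: str) -> dict:
    """Return {vhid: state} for every carp status line in the given text."""
    return {int(vhid): state for state, vhid, _base, _skew in CARP_RE.findall(ifconfig_output)}


def dual_masters(primary_output: str, secondary_output: str) -> list:
    """Return the VHIDs that report MASTER on both boxes at the same time."""
    primary = carp_states(primary_output)
    secondary = carp_states(secondary_output)
    return sorted(v for v in primary.keys() & secondary.keys()
                  if primary[v] == "MASTER" and secondary[v] == "MASTER")


# Status lines taken from the healthy dump in this post.
healthy_primary = "carp: MASTER vhid 174 advbase 1 advskew 0\ncarp: MASTER vhid 175 advbase 1 advskew 0"
healthy_backup = "carp: BACKUP vhid 174 advbase 1 advskew 100\ncarp: BACKUP vhid 175 advbase 1 advskew 100"
print(dual_masters(healthy_primary, healthy_backup))  # []

# Hypothetical split-brain case: vhid 19 is MASTER on both boxes.
broken_primary = "carp: MASTER vhid 19 advbase 1 advskew 0"
broken_backup = "carp: MASTER vhid 19 advbase 1 advskew 100"
print(dual_masters(broken_primary, broken_backup))  # [19]
```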
-
@mrpete This is invariably a switching issue. If the secondary does not receive the heartbeats sent from the primary it will think there is a failure and assume the MASTER role.
Even if the primary receives the resulting heartbeats from the secondary, it will remain MASTER too since it is advskew 0 and the secondary is advskew 100.
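A toy model of that decision may help show why both ends settle on MASTER. This is a simplified sketch, not the actual CARP implementation: it assumes a node compares advertisement intervals (advbase + advskew/256 seconds), that a MASTER only steps down when it hears a peer advertising more frequently, and that a BACKUP which hears nothing eventually takes over. The next_state function and its preempt flag are illustrative names.

```python
from fractions import Fraction


def advert_interval(advbase: int, advskew: int) -> Fraction:
    """Advertisement interval in seconds: advbase plus advskew/256."""
    return advbase + Fraction(advskew, 256)


def next_state(state: str, own: tuple, heard, preempt: bool = False) -> str:
    """Simplified role decision.

    own   -- this node's (advbase, advskew)
    heard -- the peer's (advbase, advskew), or None if no advertisement arrived
             before this node's timeout expired
    """
    if heard is None:
        # Nothing heard: a BACKUP takes over, a MASTER simply carries on.
        return "MASTER"
    own_interval = advert_interval(*own)
    peer_interval = advert_interval(*heard)
    if state == "MASTER":
        # Step down only for a peer that advertises more frequently.
        return "BACKUP" if peer_interval < own_interval else "MASTER"
    # BACKUP that hears a master: take over only when preempting a slower peer.
    if preempt and own_interval < peer_interval:
        return "MASTER"
    return "BACKUP"


# The situation in this thread: the primary hears the secondary,
# but the secondary hears nothing on the affected VLAN.
print(next_state("MASTER", (1, 0), (1, 100)))  # MASTER
print(next_state("BACKUP", (1, 100), None))    # MASTER
# Healthy case: the secondary hears the primary and stays BACKUP.
print(next_state("BACKUP", (1, 100), (1, 0)))  # BACKUP
```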
-
@derelict said in One VLAN is master on both HA's??? Strange networking issue:
@mrpete This is invariably a switching issue. If the secondary does not receive the heartbeats sent from the primary it will think there is a failure and assume the MASTER role.
Exactly. My thought is that there is MAC address confusion at the switching level, hence the need to verify that there are no incorrect configs; they'd be very hard to spot given that CARP packets aren't sent from the NIC's real MAC address.
-
@awebster Ah HA! The key to IPv6 CARP is that you create TWO CARP Virtual IPs :)
-
@derelict Understood. What's so strange is that most VLAN's are working just fine and DO see the heartbeats.
I'm digging in on it further...
-
@mrpete Maybe try changing the VID on the problematic VLAN on both sides to see if that makes a difference since we know this will cause the source MAC address to change.
-
@awebster pfSense's tcpdump groks CARP. If you pcap for it you can generally tell primary from secondary advertisements by the advskew (0 and 100 respectively by default).
-
@awebster and @Derelict My problem: secondary does not see ANY packets from primary on that VLAN, period. This presumably has nothing to do with CARP??
Quite confusing to me, how a single VLAN on a trunked ethernet wire can be nonfunctional like that.
I'll soon rip into this at a more detailed level. Have a monitoring switch or two I can use to observe ... something... in the wire. ;)
-
@mrpete It must be something on that VLAN. Blocking multicast. Something.
-
Maybe your STP topology is different in that VLAN, so traffic goes on an unexpected path
-
Thanks all for the suggestions. Digging into it...