Netgate Discussion Forum

    One VLAN is master on both HA's??? Strange networking issue

    HA/CARP/VIPs
    14 Posts 4 Posters 1.3k Views
    • MrPete

      This is a strange one.

      Context: I have a number of VLANs. They are handled using a single trunk ethernet interfacing to a smart switch that breaks things out as needed. I have two hardware-identical boxes for HA/CARP.

      I've got an issue I have never seen before.

      My CARP is fine, except:

      • While VLAN 19 is 100% fine on the primary
      • The secondary thinks the primary host is down on that VLAN
      • And therefore it too is Primary CARP for that VLAN :(

      Running tcpdump at both ends of the link shows:

      • the secondary is sending but not receiving packets on ONLY that VLAN
      • all other VLANs are fine (and share the exact same cable)

      I down/up'd that interface
      Checked smart switch config (yes that VLAN is enabled)...
      I even checked cables ;).

      Not sure how long ago this changed. I don't see the issue in any log so far.

      Ideas MOST welcome!

      • awebster @MrPete

        @mrpete Sounds like the switches are dropping the CARP packets in one direction...
        Lots of reasons why this could be happening...

        CARP is very similar to VRRP; in fact it uses the same IP protocol number (112) as VRRP (insert big discussion about why the overlap here). Thus, if you have VRRP running on your switches and a CARP VHID matches a VRRP VRID, they will conflict with each other.
        So each CARP VHID must be unique and distinct from any VRRP VRID.
        Use tcpdump -T carp to see protocol 112 decoded as CARP and not VRRP.
        CARP uses the same multicast address, 224.0.0.18 as VRRP as well. Make sure nothing else is using it.
        Next, if you are running IPv4 and IPv6, I've found it works best if each CARP Virtual IP uses a different VHID, i.e. different ones for IPv4 and IPv6.
        The CARP VHID is mapped into the sending MAC address as 00:00:5e:00:01:xx, where xx is the VHID in hex (VRRP does the same with its VRID), so these need to be kept separate to prevent mayhem. For CARP this holds for both IPv4 and IPv6.
        Check that you don't have any VLANs "short circuited" together, or you'll also have issues: IPv4 CARP packets are sent to the multicast MAC address 01:00:5e:00:00:12, which is seen by all devices reachable in the L2 broadcast domain. IPv6 CARP packets go to 33:33:00:00:00:12, with the same effect.
        Finally, check that there isn't some sort of IGMP snooping or filtering configured on the switches that is dropping the multicast packets sent to 224.0.0.18 on the affected VLAN.
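
To make the address mapping in this post concrete, here is a minimal Python sketch. The function names `carp_virtual_mac` and `multicast_mac` are made up for illustration. It assumes the CARP source MAC is 00:00:5e:00:01 followed by the VHID (which matches the tcpdump samples later in this thread) and uses the standard IPv4/IPv6 multicast-to-Ethernet mapping.

```python
import ipaddress


def carp_virtual_mac(vhid: int) -> str:
    """Source MAC assumed for CARP advertisements: 00:00:5e:00:01:<vhid in hex>."""
    if not 1 <= vhid <= 255:
        raise ValueError("VHID must be between 1 and 255")
    return f"00:00:5e:00:01:{vhid:02x}"


def multicast_mac(group: str) -> str:
    """Map an IPv4 or IPv6 multicast group address to its Ethernet multicast MAC."""
    addr = ipaddress.ip_address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not a multicast address")
    raw = addr.packed
    if addr.version == 4:
        # 01:00:5e followed by the low 23 bits of the IPv4 address
        octets = bytes([0x01, 0x00, 0x5E, raw[1] & 0x7F, raw[2], raw[3]])
    else:
        # 33:33 followed by the low 32 bits of the IPv6 address
        octets = bytes([0x33, 0x33]) + raw[-4:]
    return ":".join(f"{b:02x}" for b in octets)


if __name__ == "__main__":
    print(carp_virtual_mac(174))        # VHID 174 from the example later in the thread
    print(multicast_mac("224.0.0.18"))  # IPv4 CARP/VRRP group
    print(multicast_mac("ff02::12"))    # IPv6 CARP/VRRP group
```

Knowing the expected source MAC for a VHID makes it easier to look for that address in the switch's MAC table on the affected VLAN.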

        –A.

        • MrPete @awebster

          @awebster Thanks for that good list.
          I've not solved the problem so far... mostly have new questions.

          Progress:

          • Confirmed the list doesn't reveal issues: not using VRRP; VHIDs are all unique; no shared use of 224.0.0.18; VLANs not interlinked; no IGMP filtering.
          • NOTE: underneath, on a (VLAN) interface where both Prim/Sec are Master, the primary sees secondary as Up, but secondary thinks primary is Down. Packets from Primary never reach Secondary, period. :(

          Additional lessons learned:

          • This is a Very Dangerous problem. Any interface with two Masters means that both will receive and respond to LAN packets... thus destroying the integrity of various LAN communications. :(
          • While testing, a second interface suddenly went into this "mode" of both being Master. :(

          My temporary workaround for now: I've shut down my secondary HA machine until I have time and a strategy to diagnose or fully rebuild the setup.

          One QUESTION: @awebster you wrote "Next, if you are running IPv4 and IPv6, I've found it works best if each CARP Virtual IP uses a different VHID, i.e. different ones for IPv4 and IPv6." Where is the VHID for IPv6 separately configurable? I don't find this.

          • awebster @MrPete

            @mrpete Curious problem to be sure...
            Perhaps share output of ifconfig -a to have a look at what the underlying OS has actually got configured on the interfaces.

            Regarding where you set the VHID: you do it in the Virtual IP entry (Firewall > Virtual IPs) when defining a CARP IP.
            For example: [two screenshots, not reproduced here]

            I also use Base = 1, Skew = 0 on the primary, and Base = 1, Skew = 100 on the backup.

            You can use tcpdump -e -s0 -nn -i interface -T carp proto 112 to look at the actual packets you're sending and receiving, and to make sure everything is working as expected.
            You should see something similar to this:

            00:00:5e:00:01:xx > 01:00:5e:00:00:12,  ethertype IPv4 (0x0800), length 70: Primary_REAL_IP > 224.0.0.18: CARPv2-advertise 36: vhid=xx  advbase=1 advskew=0 authlen=7 counter=some_long_number
            
            

            IPv6 is a little less interesting. It looks like this, but note that the source MAC address should differ between the IPv4 and IPv6 versions because their VHID values differ:

            00:00:5e:00:01:yy > 33:33:00:00:00:12, ethertype IPv6 (0x86dd), length 90: fe80::link-local > ff02::12: ip-proto-112 36
            

            You should only see one sender of the CARPv2-advertise packets. Unfortunately, since the real MAC address is not revealed, you need to rely on the source IP address to determine whether the correct system is sending the packets. Similarly, for IPv6 you need to look at the link-local address on the actual interface (use ifconfig in the shell to see this).
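
The "only one sender" check can be automated on saved tcpdump output. The following is a rough Python sketch, not a tested tool: the regex assumes the IPv4 line format shown above, the helper names are invented for illustration, and the 192.0.2.x addresses in the sample are documentation placeholders, not values from this thread.

```python
import re
from collections import defaultdict

# Assumed to match IPv4 lines produced by: tcpdump -e -nn -T carp proto 112
CARP_LINE = re.compile(
    r"(?P<src_mac>(?:[0-9a-f]{2}:){5}[0-9a-f]{2}) > "
    r"(?P<dst_mac>(?:[0-9a-f]{2}:){5}[0-9a-f]{2}),.*?: "
    r"(?P<src_ip>\d+\.\d+\.\d+\.\d+) > 224\.0\.0\.18: "
    r".*?vhid=(?P<vhid>\d+)\s+advbase=(?P<advbase>\d+)\s+advskew=(?P<advskew>\d+)",
    re.IGNORECASE,
)


def parse_adverts(lines):
    """Extract one dict per CARP advertisement line; other lines are skipped."""
    adverts = []
    for line in lines:
        m = CARP_LINE.search(line)
        if m:
            adverts.append({
                "src_mac": m["src_mac"].lower(),
                "src_ip": m["src_ip"],
                "vhid": int(m["vhid"]),
                "advbase": int(m["advbase"]),
                "advskew": int(m["advskew"]),
            })
    return adverts


def dual_master_vhids(adverts):
    """Return VHIDs advertised from more than one real source IP."""
    senders = defaultdict(set)
    for advert in adverts:
        senders[advert["vhid"]].add(advert["src_ip"])
    return sorted(vhid for vhid, ips in senders.items() if len(ips) > 1)


if __name__ == "__main__":
    sample = [
        "00:00:5e:00:01:ae > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 70: "
        "192.0.2.174 > 224.0.0.18: CARPv2-advertise 36: vhid=174 advbase=1 advskew=0 "
        "authlen=7 counter=1",
        "00:00:5e:00:01:ae > 01:00:5e:00:00:12, ethertype IPv4 (0x0800), length 70: "
        "192.0.2.175 > 224.0.0.18: CARPv2-advertise 36: vhid=174 advbase=1 advskew=100 "
        "authlen=7 counter=2",
    ]
    print(dual_master_vhids(parse_adverts(sample)))
```

Two senders for one VHID can also show up briefly during a normal failover, so a capture of more than a few seconds is a safer basis for a conclusion.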

            Here is a dump of an interface on both my primary and backup systems, hopefully that will provide some clues:
            In this setup:
            VIP: nnn.mmm.208.74/24 VHID 174 and xxxx:yyyy:zzzz:e0d0::74/64 VHID 175
            Primary: nnn.mmm.208.174/24 and xxxx:yyyy:zzzz:e0d0::174/64
            Backup: nnn.mmm.208.175/24 and xxxx:yyyy:zzzz:e0d0::175/64

            em3: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
                    options=9b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM>
                    ether 00:50:56:a6:89:3c
                    hwaddr 00:50:56:a6:89:3c
                    inet6 fe80::250:56ff:fea6:893c%em3 prefixlen 64 scopeid 0x4
                    inet6 xxxx:yyyy:zzzz:e0d0::174 prefixlen 64
                    inet6 xxxx:yyyy:zzzz:e0d0::74 prefixlen 64 vhid 175
                    inet nnn.mmm.208.174 netmask 0xffffff00 broadcast nnn.mmm.208.255
                    inet nnn.mmm.208.74 netmask 0xffffff00 broadcast nnn.mmm.208.255 vhid 174
                    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
                    media: Ethernet autoselect (1000baseT <full-duplex>)
                    status: active
                    carp: MASTER vhid 174 advbase 1 advskew 0
                    carp: MASTER vhid 175 advbase 1 advskew 0
            		
            		
            em3: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
                    options=9b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM>
                    ether 00:50:56:a6:1d:39
                    hwaddr 00:50:56:a6:1d:39
                    inet6 fe80::250:56ff:fea6:1d39%em3 prefixlen 64 scopeid 0x4
                    inet6 xxxx:yyyy:zzzz:e0d0::175 prefixlen 64
                    inet6 xxxx:yyyy:zzzz:e0d0::74 prefixlen 64 vhid 175
                    inet nnn.mmm.208.175 netmask 0xffffff00 broadcast nnn.mmm.208.255
                    inet nnn.mmm.208.74 netmask 0xffffff00 broadcast nnn.mmm.208.255 vhid 174
                    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
                    media: Ethernet autoselect (1000baseT <full-duplex>)
                    status: active
                    carp: BACKUP vhid 174 advbase 1 advskew 100
                    carp: BACKUP vhid 175 advbase 1 advskew 100	
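
If you save the ifconfig output from both boxes, a few lines of Python can compare the `carp:` status lines and flag any VHID that is MASTER on both. This is only a sketch: `carp_states` and `both_master` are hypothetical helper names, and it assumes you pass the output for a single interface (for example `ifconfig em3`), since it keys on the VHID alone.

```python
import re

# Matches status lines such as: "carp: MASTER vhid 174 advbase 1 advskew 0"
CARP_STATE = re.compile(r"carp:\s+(?P<state>MASTER|BACKUP|INIT)\s+vhid\s+(?P<vhid>\d+)")


def carp_states(ifconfig_output: str) -> dict:
    """Map each VHID found in the ifconfig text to its reported CARP state."""
    return {int(m["vhid"]): m["state"] for m in CARP_STATE.finditer(ifconfig_output)}


def both_master(primary_output: str, backup_output: str) -> list:
    """Return the VHIDs that report MASTER on both systems."""
    primary = carp_states(primary_output)
    backup = carp_states(backup_output)
    shared = primary.keys() & backup.keys()
    return sorted(v for v in shared if primary[v] == "MASTER" and backup[v] == "MASTER")


if __name__ == "__main__":
    primary_text = (
        "carp: MASTER vhid 174 advbase 1 advskew 0\n"
        "carp: MASTER vhid 175 advbase 1 advskew 0\n"
    )
    backup_text = (
        "carp: BACKUP vhid 174 advbase 1 advskew 100\n"
        "carp: BACKUP vhid 175 advbase 1 advskew 100\n"
    )
    # A healthy pair like the one above should yield an empty list.
    print(both_master(primary_text, backup_text))
```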
            

            –A.

            • Derelict LAYER 8 Netgate @MrPete

              @mrpete This is invariably a switching issue. If the secondary does not receive the heartbeats sent from the primary it will think there is a failure and assume the MASTER role.

              Even if the primary receives the resulting heartbeats from the secondary, it will remain MASTER too since it is advskew 0 and the secondary is advskew 100.
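
The reasoning above can be sketched as a toy model in Python. It is a deliberate simplification and not how the kernel implements CARP: it assumes a node settles as BACKUP only when it hears a peer whose advertisement interval (taken here as advbase + advskew/256 seconds) is shorter than its own, and it ignores timers, preemption settings, and ties. `Node` and `steady_state` are names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    advbase: int
    advskew: int

    @property
    def interval(self) -> float:
        # Assumed advertisement interval in seconds; a lower value is preferred.
        return self.advbase + self.advskew / 256


def steady_state(node: Node, peer: Node, hears_peer: bool) -> str:
    """Simplified rule: BACKUP only if a more-preferred peer is actually heard."""
    if hears_peer and peer.interval < node.interval:
        return "BACKUP"
    return "MASTER"


if __name__ == "__main__":
    primary = Node("primary", advbase=1, advskew=0)
    secondary = Node("secondary", advbase=1, advskew=100)
    # One-way loss as described in this thread: secondary never hears primary.
    print(primary.name, steady_state(primary, secondary, hears_peer=True))
    print(secondary.name, steady_state(secondary, primary, hears_peer=False))
```

Under this model, one-way loss from primary to secondary leaves both nodes as MASTER, which is the symptom reported at the top of the thread.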

              Chattanooga, Tennessee, USA
              A comprehensive network diagram is worth 10,000 words and 15 conference calls.
              DO NOT set a source address/port in a port forward or firewall rule unless you KNOW you need it!
              Do Not Chat For Help! NO_WAN_EGRESS(TM)

              • awebster @Derelict

                @derelict said in One VLAN is master on both HA's??? Strange networking issue:

                @mrpete This is invariably a switching issue. If the secondary does not receive the heartbeats sent from the primary it will think there is a failure and assume the MASTER role.

                Exactly. My thought is that there is MAC address confusion at the switching level, hence the need to verify that there are no incorrect configs. They'd be very hard to spot, given that CARP packets aren't sent from the NIC's real MAC address.

                –A.

                • MrPete

                  @awebster Ah HA! The key to IPv6 CARP is that you create TWO CARP Virtual IPs :)

                  • MrPete @Derelict

                    @derelict Understood. What's so strange is that most VLANs are working just fine and DO see the heartbeats.

                    I'm digging in on it further...

                    • awebster @MrPete

                      @mrpete Maybe try changing the VHID on the problematic VLAN on both sides to see if that makes a difference, since we know this will cause the source MAC address to change.

                      –A.

                      • Derelict LAYER 8 Netgate @awebster

                        @awebster pfSense's tcpdump groks CARP. If you pcap for it you can generally tell primary from secondary advertisements by the advskew (0 and 100 respectively by default).

                        • MrPete @awebster

                          @awebster and @Derelict My problem: secondary does not see ANY packets from primary on that VLAN, period. This presumably has nothing to do with CARP??

                          Quite confusing to me, how a single VLAN on a trunked ethernet wire can be nonfunctional like that.

                          I'll soon rip into this at a more detailed level. Have a monitoring switch or two I can use to observe ... something... in the wire. ;)
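
For the "observe the wire" step, one option is to count frames per 802.1Q VLAN ID on a mirror port and see whether VLAN 19 shows up at all in the direction that fails. Below is a small Python sketch of just the tag parsing. `vlan_id` and `count_by_vlan` are hypothetical names, and the frames are assumed to be raw Ethernet bytes obtained by some capture method not shown here. Note that some NICs and drivers strip the VLAN tag in hardware before the capture sees it, in which case this would report frames as untagged.

```python
import struct
from typing import Optional

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q tag


def vlan_id(frame: bytes) -> Optional[int]:
    """Return the 802.1Q VLAN ID of a raw Ethernet frame, or None if untagged or too short."""
    if len(frame) < 18:
        return None
    # Bytes 12-13 hold the TPID, bytes 14-15 the tag control information (TCI).
    tpid, tci = struct.unpack("!HH", frame[12:16])
    if tpid != TPID_8021Q:
        return None
    return tci & 0x0FFF  # the low 12 bits of the TCI are the VLAN ID


def count_by_vlan(frames) -> dict:
    """Count frames per VLAN ID; untagged frames are counted under None."""
    counts = {}
    for frame in frames:
        vid = vlan_id(frame)
        counts[vid] = counts.get(vid, 0) + 1
    return counts


if __name__ == "__main__":
    # Hand-built header: CARP multicast destination, VHID 19 source MAC, VLAN 19 tag, IPv4.
    tagged = bytes.fromhex("01005e000012" "00005e000113" "8100" "0013" "0800")
    print(count_by_vlan([tagged]))
```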

                          • Derelict LAYER 8 Netgate @MrPete

                            @mrpete It must be something on that VLAN. Blocking multicast. Something.

                            • RobertK 1 @Derelict

                              Maybe your STP topology is different in that VLAN, so traffic takes an unexpected path.

                              • MrPete

                                Thanks all for the suggestions. Digging into it...

                                Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.