Netgate Discussion Forum

Backup node taking over CARP Virtual IP

HA/CARP/VIPs
11 Posts 2 Posters 1.4k Views
  • D
    Derelict LAYER 8 Netgate @jypsilantis
    last edited by Derelict Mar 17, 2021, 3:51 AM Mar 17, 2021, 3:50 AM

    @jypsilantis CARP/pfSense HA is incompatible with dynamic addresses like those obtained via DHCP. I would say if it ever worked it was a fluke.

    In a normal CARP setup the only reason a VIP in the BACKUP state would go MASTER is if that interface stopped receiving "better" advertisements from the MASTER node.
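The takeover condition described above can be sketched in a few lines. This is a simplified model and not the FreeBSD implementation: the `Advert` class and both function names are made up for illustration, and the timeout (three base intervals plus the skew fraction) is an approximation of how a BACKUP decides the MASTER has gone silent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Advert:
    """Advertisement parameters for one VHID (hypothetical helper type)."""
    advbase: int  # base advertisement interval, in seconds
    advskew: int  # skew, 0-254; a lower value means a higher priority

    def interval(self) -> float:
        # Advertisements are sent roughly every advbase + advskew/256 seconds.
        return self.advbase + self.advskew / 256.0


def is_better(a: Advert, b: Advert) -> bool:
    """True if `a` advertises more frequently than `b`, so `a` should win."""
    return a.interval() < b.interval()


def backup_should_take_over(own: Advert, seconds_since_better_advert: float) -> bool:
    """A BACKUP promotes itself only after hearing no better advertisement
    for about 3 * advbase + advskew/256 seconds (simplifying assumption)."""
    master_down_timeout = 3 * own.advbase + own.advskew / 256.0
    return seconds_since_better_advert > master_down_timeout
```

With a primary at skew 0 and a secondary at skew 100 (assumed values), `is_better(Advert(1, 0), Advert(1, 100))` is true, so the secondary should stay BACKUP for as long as the primary's advertisements keep reaching it.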

    Chattanooga, Tennessee, USA
    A comprehensive network diagram is worth 10,000 words and 15 conference calls.
    DO NOT set a source address/port in a port forward or firewall rule unless you KNOW you need it!
    Do Not Chat For Help! NO_WAN_EGRESS(TM)

    • J
      jypsilantis @Derelict
      last edited by jypsilantis Mar 17, 2021, 3:59 AM Mar 17, 2021, 3:57 AM

      @derelict thank you for the quick reply.

      My LAN NICs are set to static IP addresses and the same is happening on these as well. I can try changing over to statics on the WAN side as well, but I think it won't make much difference.

      The strange thing is that the backup is still showing "backup" even though it has control of the VIP.

      It is almost as if there is some kind of load balancing happening - the backup appears to be slightly less loaded for the most part compared to the primary.

      Everything seems to be working properly so I am not too concerned on that part, just a bit confusing when you try to log onto the active master and end up on the backup.

      • D
        Derelict LAYER 8 Netgate
        last edited by Mar 17, 2021, 4:01 AM

        @jypsilantis In general, unless the primary node is in maintenance mode, all CARP VIPs on the primary should be MASTER and all CARP VIPs on the secondary should be BACKUP.

        If that is not the case the problem is generally a layer 2 / multicast/broadcast domain problem in the path between the nodes on that network.
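As a rough illustration of that rule, a check like the following could flag any VIP whose state does not match the node's role. The function name and the dictionary layout are assumptions for the example, not anything pfSense provides.

```python
def unexpected_vip_states(primary_vips, secondary_vips, maintenance=False):
    """Return (node, vip, state) tuples that break the normal HA pattern.

    primary_vips / secondary_vips map a VIP label to "MASTER" or "BACKUP".
    With the primary in maintenance mode the expected roles are swapped.
    """
    expected = {
        "primary": "BACKUP" if maintenance else "MASTER",
        "secondary": "MASTER" if maintenance else "BACKUP",
    }
    problems = []
    for node, vips in (("primary", primary_vips), ("secondary", secondary_vips)):
        for vip, state in sorted(vips.items()):
            if state != expected[node]:
                problems.append((node, vip, state))
    return problems
```

An empty result means the states look normal; anything else points back at the layer 2 path on the network that VIP lives on.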

        There is a sticky at the top of this category in which I attempted to explain the various parts of an HA cluster.

        • J
          jypsilantis @Derelict
          last edited by Mar 17, 2021, 4:10 AM

          @derelict thanks for this.

          I looked at the persistent article that you mentioned. The symptoms in my case are different - I do not have a master/master situation, so it looks like the nodes are correctly resolving and establishing priority orders.

          I have a managed switch on the LAN side, and the modem has handled CARP without missing a beat for at least 3 years now. The problem appears to have occurred concurrently with the upgrade to the latest version of pfSense that I installed a few days ago.

          • D
            Derelict LAYER 8 Netgate @jypsilantis
            last edited by Mar 17, 2021, 4:14 AM

            @jypsilantis Like I said, CARP is not compatible with interfaces that obtain their addressing from DHCP and never has been. I am probably misunderstanding what you actually have there.

            • J
              jypsilantis @Derelict
              last edited by Mar 17, 2021, 4:25 AM

              @derelict, one pair of interfaces (WAN) is on DHCP and the others (LAN) are truly statically addressed (not DHCP pseudo static). The problem occurs on both.

              If DHCP were the issue on the WAN, I would see, for example:

              Interface          CARP state   Ownership of VIP
              Primary LAN NIC    PRIMARY      Yes
              Backup LAN NIC     BACKUP       No
              Primary WAN NIC    BACKUP       No
              Backup WAN NIC     PRIMARY      Yes

              What I am actually seeing:

              Interface          CARP state   Ownership of VIP
              Primary LAN NIC    PRIMARY      No
              Backup LAN NIC     BACKUP       Yes
              Primary WAN NIC    PRIMARY      No
              Backup WAN NIC     BACKUP       Yes

              • D
                Derelict LAYER 8 Netgate @jypsilantis
                last edited by Mar 17, 2021, 4:28 AM

                @jypsilantis That doesn't make much sense. You might want to just post screen shots of the CARP status pages or, better, output from both nodes of ifconfig -vvvvma

                Some terminology so everyone is on the same page: Nodes are primary/secondary, VIPs are MASTER/BACKUP.
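For comparing the two nodes side by side, the per-VIP state can be pulled out of the ifconfig output with a short script. This is only a sketch: it assumes the `carp: <STATE> vhid N advbase N advskew N` line that FreeBSD's ifconfig prints for a CARP address, and the sample text is hand-written for the example (interface name and VHID are made up), not output from this thread.

```python
import re

# Matches the per-VHID status line, e.g. "carp: MASTER vhid 1 advbase 1 advskew 0".
CARP_LINE = re.compile(
    r"^\s+carp:\s+(?P<state>\w+)\s+vhid\s+(?P<vhid>\d+)"
    r"\s+advbase\s+(?P<advbase>\d+)\s+advskew\s+(?P<advskew>\d+)"
)
# Matches the first line of an interface stanza, e.g. "em0: flags=...".
IFACE_LINE = re.compile(r"^(?P<name>[^\s:]+):\s+flags=")


def parse_carp(ifconfig_text):
    """Return one dict per CARP VHID found in the ifconfig output."""
    results = []
    iface = None
    for line in ifconfig_text.splitlines():
        m = IFACE_LINE.match(line)
        if m:
            iface = m.group("name")
            continue
        m = CARP_LINE.match(line)
        if m:
            results.append({
                "interface": iface,
                "state": m.group("state"),
                "vhid": int(m.group("vhid")),
                "advbase": int(m.group("advbase")),
                "advskew": int(m.group("advskew")),
            })
    return results


# Hand-written sample in the assumed format; "em0" and vhid 1 are placeholders.
SAMPLE = "\n".join([
    "em0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500",
    "\tinet 10.0.0.1 netmask 0xffff0000 broadcast 10.0.255.255",
    "\tinet 10.0.0.3 netmask 0xffff0000 broadcast 10.0.255.255 vhid 1",
    "\tcarp: MASTER vhid 1 advbase 1 advskew 0",
])
```

Running `parse_carp` on the saved output of each node gives a list that makes it easy to see which node reports MASTER for each VHID, and with which advbase/advskew.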

                • J
                  jypsilantis @Derelict
                  last edited by jypsilantis Mar 17, 2021, 7:37 AM Mar 17, 2021, 7:07 AM

                  @derelict thanks for this.

                  Here are some screenshots.

                  fw1.local is the primary member of the HA cluster, and fw2.local is the backup

                  The WAN-side NICs share address 10.1.0.10, which is presented by the active member to the broadband modem/router. The modem/router assigns "primary" IP addresses to each member via DHCP: 10.1.0.97 for fw1 and 10.1.0.85 for fw2

                  The LAN-side NICs share address 10.0.0.3. fw1 has an intrinsic static IP of 10.0.0.1 and fw2 has 10.0.0.2.

                  From the screenshots, you can see that fw2 is running backup CARP on both of its NICs, and conversely fw1 is running MASTER for both of its interfaces. As such, there is no split master/backup or dual master/master. These statuses appear to persist.

                  However, when I access the web interface via 10.0.0.3, I land on fw2. Similarly, the modem/router reports that fw2 has control of 10.1.0.10.

                  If I reboot the backup, then fw1 takes over 10.0.0.3 and I get to its web interface via this address. However, several minutes after fw2 comes back up, it resumes control of 10.0.0.3 and the status as per the screenshots returns.

                  This is counter intuitive, but strangely everything seems to be working fine in all other respects.

                  [edit: just noticed that the net mask for the LAN side CARP is wrong - should be /16. I have made the changes. However, no effect to the above behaviour, fw2 took over 10.0.0.3 shortly after reboot]

                  Screen Shot 2021-03-17 at 3.30.16 pm.png Screen Shot 2021-03-17 at 3.30.41 pm.png Screen Shot 2021-03-17 at 3.31.09 pm.png

                  • D
                    Derelict LAYER 8 Netgate @jypsilantis
                    last edited by Mar 17, 2021, 12:36 PM

                    @jypsilantis You'll need to look at layer 2 and see what is happening with the CARP MAC address. Everything there looks fine. Be sure you're also not doing something like port forwarding the webgui connections around.
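When checking layer 2, it helps to know which MAC address to look for. CARP VIPs answer from a virtual MAC of the form 00:00:5e:00:01:XX, where XX is the VHID in hex. The helper names below are made up for the example; the idea is to compare that address with what a LAN client's ARP table, or the switch's MAC table, shows for the VIP.

```python
def carp_virtual_mac(vhid: int) -> str:
    """Virtual MAC used by a CARP VIP: 00:00:5e:00:01:<vhid in hex>."""
    if not 1 <= vhid <= 255:
        raise ValueError("VHID must be between 1 and 255")
    return f"00:00:5e:00:01:{vhid:02x}"


def normalize_mac(mac: str) -> str:
    """Lower-case and zero-pad each octet so '0:0:5E:0:1:1' compares equal."""
    return ":".join(f"{int(part, 16):02x}" for part in mac.split(":"))


def is_carp_mac(observed_mac: str, vhid: int) -> bool:
    """True if the MAC seen for the VIP is the CARP virtual MAC for this VHID."""
    return normalize_mac(observed_mac) == carp_virtual_mac(vhid)
```

If the VIP resolves to the CARP MAC, the next question is which switch port that MAC was last learned on. If it resolves to a physical NIC address instead, something other than CARP is answering for the VIP.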

                    • J
                      jypsilantis @Derelict
                      last edited by Mar 24, 2021, 3:24 AM

                      @derelict I may have found the problem. Possibly a corrupt or failing disk.

                      I replaced the disk on the backup node today, rebuilt it, and restored the config from a previous (recent) backup file. Everything looks fine now.

                      I will keep monitoring in case the problem reoccurs, but it may be something as simple as this.

                      A really strange symptom if it is in fact a failing disk. SMART status was OK, so perhaps some corruption from the recent power outage that took out my primary firewall disk.

                      For anyone else who may experience this issue, try rebooting with the disk repair option, and/or change out the disk and rebuild/restore.

                      Thanks for your help and guidance.
