    Netgate Discussion Forum

    Backup node taking over CARP Virtual IP

    HA/CARP/VIPs
    • J
      jypsilantis last edited by jypsilantis

      I run an HA cluster with one primary and one standby node. I recently rebuilt the primary node following a disk failure. Replication and services all look OK and the cluster works as expected, but I have noticed that the backup occasionally takes over the LAN CARP IP while still showing that it is in backup mode from CARP's perspective. When I try to access the web interface via the CARP VIP address, I am directed to the backup.

      Of particular interest, my OpenVPN endpoint still appears to work, even though the service is not running on the backup (when it holds the CARP VIP). So it looks as if the primary still has control of the VIP, but the behaviour of the web interface is counterintuitive.

      XMLRPC replication is set up from the primary to the backup only - no reverse configuration as per the documentation.

      I am running the latest stable release of the software on both nodes: "2.5.0-RELEASE"

      Any pointers or assistance would be really appreciated.

      (Ed: I have checked the WAN CARP VIP, which is DHCP-allocated by my broadband modem. The modem reports that the WAN VIP is held by the backup as well. Nevertheless, the backup node is showing "backup" for both LAN and WAN CARP statuses.)

      • Derelict
        Derelict LAYER 8 Netgate @jypsilantis last edited by Derelict

        @jypsilantis CARP/pfSense HA is incompatible with dynamic addresses like those obtained via DHCP. I would say if it ever worked it was a fluke.

        In a normal CARP setup the only reason a VIP in the BACKUP state would go MASTER is if that interface stopped receiving "better" advertisements from the MASTER node.
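
A minimal Python sketch of that rule, for illustration only. It assumes the commonly documented CARP timing, in which a BACKUP waits roughly 3 × advbase + advskew/256 seconds without hearing a better advertisement before promoting itself. The class and function names are hypothetical, the skew values 0 and 100 are example settings, and this is a simplification rather than the actual pfSense or FreeBSD implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Advert:
    """The two timing fields of a CARP advertisement (simplified)."""
    advbase: int  # base advertisement interval, in seconds
    advskew: int  # skew, 0-254; a lower value means higher priority


def interval(ad: Advert) -> float:
    """Advertisement interval in seconds; a smaller value is 'better'."""
    return ad.advbase + ad.advskew / 256


def is_better(received: Advert, own: Advert) -> bool:
    """True when the received advertisement outranks our own settings."""
    return interval(received) < interval(own)


def master_down_timeout(own: Advert) -> float:
    """Seconds a BACKUP waits without better adverts (assumed formula)."""
    return 3 * own.advbase + own.advskew / 256


def backup_should_take_over(own: Advert, seconds_since_better_advert: float) -> bool:
    """A BACKUP goes MASTER only once better advertisements stop arriving."""
    return seconds_since_better_advert > master_down_timeout(own)
```

With example settings of advskew 0 on the primary and 100 on the secondary, the secondary would stay BACKUP as long as it hears the primary at least every 3.39 seconds or so.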

        • J
          jypsilantis @Derelict last edited by jypsilantis

          @derelict thank you for the quick reply.

          My LAN NICs are set to static IP addresses and the same is happening on these as well. I can try changing over to statics on the WAN side as well, but I think it won't make much difference.

          The strange thing is that the backup is still showing "backup" even though it has control of the VIP.

          It is almost as if there is some kind of load balancing happening - the backup appears to be slightly less loaded than the primary for the most part.

          Everything seems to be working properly so I am not too concerned on that part, just a bit confusing when you try to log onto the active master and end up on the backup.

          • Derelict
            Derelict LAYER 8 Netgate last edited by

            @jypsilantis In general, unless the primary node is in maintenance mode, all CARP VIPs on the primary should be MASTER and all CARP VIPs on the secondary should be BACKUP.

            If that is not the case the problem is generally a layer 2 / multicast/broadcast domain problem in the path between the nodes on that network.

            There is a sticky at the top of this category in which I attempted to explain the various parts of an HA cluster.

            • J
              jypsilantis @Derelict last edited by

              @derelict thanks for this.

              I looked at the persistent article that you mentioned. The symptoms in my case are different - I do not have a master/master situation, so it looks like the nodes are correctly resolving and establishing priority orders.

              I have a managed switch on the LAN side, and the modem has handled CARP without missing a beat for at least 3 years now. The problem appears to have started with the upgrade to the latest version of pfSense, which I installed a few days ago.

              • Derelict
                Derelict LAYER 8 Netgate @jypsilantis last edited by

                @jypsilantis Like I said, CARP is not compatible with interfaces that obtain their addressing from DHCP and never has been. I am probably misunderstanding what you actually have there.

                • J
                  jypsilantis @Derelict last edited by

                  @derelict, one pair of interfaces (WAN) is on DHCP and the other (LAN) is truly statically addressed (not DHCP pseudo-static). The problem occurs on both.

                  If DHCP were the issue on the WAN, I would see, for example

                  Interface         CARP state   Ownership of VIP
                  Primary LAN NIC   PRIMARY      Yes
                  Backup LAN NIC    BACKUP       No
                  Primary WAN NIC   BACKUP       No
                  Backup WAN NIC    PRIMARY      Yes

                  What I am actually seeing:

                  Interface         CARP state   Ownership of VIP
                  Primary LAN NIC   PRIMARY      No
                  Backup LAN NIC    BACKUP       Yes
                  Primary WAN NIC   PRIMARY      No
                  Backup WAN NIC    BACKUP       Yes

                  • Derelict
                    Derelict LAYER 8 Netgate @jypsilantis last edited by

                    @jypsilantis That doesn't make much sense. You might want to just post screenshots of the CARP status pages or, better, the output of ifconfig -vvvvma from both nodes.

                    Some terminology so everyone's on the same page: nodes are primary/secondary, VIPs are MASTER/BACKUP.
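
When comparing ifconfig output from two nodes, it can help to pull out just the CARP lines. The sketch below is a hedged example in Python: it assumes the FreeBSD-style line format `carp: <STATE> vhid <n> advbase <n> advskew <n>`, and the sample text, including the interface name `em0`, is made up for illustration and is not output from the nodes in this thread.

```python
import re

# Assumed format of the per-VHID status line printed by FreeBSD ifconfig.
CARP_LINE = re.compile(
    r"^\s*carp:\s+(?P<state>MASTER|BACKUP|INIT)\s+vhid\s+(?P<vhid>\d+)"
    r"\s+advbase\s+(?P<advbase>\d+)\s+advskew\s+(?P<advskew>\d+)"
)
# Interface header lines start in column 0, e.g. "em0: flags=...".
IFACE_LINE = re.compile(r"^(?P<name>[A-Za-z0-9_.]+):\s+flags=")


def carp_states(ifconfig_output: str) -> list:
    """Return one dict per CARP line, tagged with the owning interface."""
    rows = []
    iface = None
    for line in ifconfig_output.splitlines():
        header = IFACE_LINE.match(line)
        if header:
            iface = header.group("name")
            continue
        carp = CARP_LINE.match(line)
        if carp:
            rows.append({
                "iface": iface,
                "state": carp.group("state"),
                "vhid": int(carp.group("vhid")),
                "advbase": int(carp.group("advbase")),
                "advskew": int(carp.group("advskew")),
            })
    return rows


# Illustrative sample only (hypothetical interface name and flags).
SAMPLE = """\
em0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
\tinet 10.0.0.2 netmask 0xffff0000 broadcast 10.0.255.255
\tinet 10.0.0.3 netmask 0xffff0000 broadcast 10.0.255.255 vhid 1
\tcarp: BACKUP vhid 1 advbase 1 advskew 100
"""

if __name__ == "__main__":
    for row in carp_states(SAMPLE):
        print(row)
```

Running the same function over the output from each node gives a compact per-VHID view of which side reports MASTER and which reports BACKUP.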

                    • J
                      jypsilantis @Derelict last edited by jypsilantis

                      @derelict thanks for this.

                      Here are some screenshots.

                      fw1.local is the primary member of the HA cluster, and fw2.local is the backup

                      The WAN-side NICs share address 10.1.0.10, which is presented by the active member to the broadband modem/router. The modem/router assigns "primary" IP addresses to each member via DHCP: 10.1.0.97 for fw1 and 10.1.0.85 for fw2

                      The LAN-side NICs share address 10.0.0.3. fw1 has an intrinsic static IP of 10.0.0.1 and fw2 has 10.0.0.2.

                      From the screenshots, you can see that fw2 is running backup CARP on both of its NICs, and conversely fw1 is running MASTER for both of its interfaces. As such, there is no split master/backup or dual master/master. These statuses appear to persist.

                      However, when I access the web interface via 10.0.0.3, I land on fw2. Similarly, the modem/router reports that fw2 has control of 10.1.0.10.

                      If I reboot the backup, then fw1 takes over 10.0.0.3 and I get to its web interface via this address. However, several minutes after fw2 comes back up, it resumes control of 10.0.0.3 and the status as per the screenshots returns.

                      This is counterintuitive, but strangely everything seems to be working fine in all other respects.

                      [edit: just noticed that the netmask for the LAN-side CARP VIP is wrong - it should be /16. I have made the change. However, it had no effect on the above behaviour: fw2 took over 10.0.0.3 shortly after reboot]

                      Screen Shot 2021-03-17 at 3.30.16 pm.png Screen Shot 2021-03-17 at 3.30.41 pm.png Screen Shot 2021-03-17 at 3.31.09 pm.png

                      • Derelict
                        Derelict LAYER 8 Netgate @jypsilantis last edited by

                        @jypsilantis You'll need to look at layer 2 and see what is happening with the CARP MAC address. Everything there looks fine. Be sure you're also not doing something like port forwarding the webgui connections around.
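
One way to approach the layer 2 check: a CARP VIP is answered from a virtual MAC of the form 00:00:5e:00:01:XX, where XX is the VHID in hexadecimal, so a client's ARP entry for the VIP should show that address rather than either node's physical NIC MAC. The Python sketch below only does the comparison; the helper names are hypothetical, and since the thread does not state the VHIDs, VHID 1 in the usage note is a placeholder.

```python
def carp_virtual_mac(vhid: int) -> str:
    """Virtual MAC used by a CARP VHID: 00:00:5e:00:01:<vhid in hex>."""
    if not 1 <= vhid <= 255:
        raise ValueError("VHID must be between 1 and 255")
    return f"00:00:5e:00:01:{vhid:02x}"


def normalize_mac(mac: str) -> str:
    """Lower-case and zero-pad each octet so different tools compare equal."""
    parts = mac.replace("-", ":").split(":")
    if len(parts) != 6:
        raise ValueError(f"not a MAC address: {mac!r}")
    return ":".join(f"{int(part, 16):02x}" for part in parts)


def vip_resolves_to_carp_mac(arp_mac: str, vhid: int) -> bool:
    """True when the MAC seen in an ARP table matches the VHID's virtual MAC."""
    return normalize_mac(arp_mac) == carp_virtual_mac(vhid)
```

For example, if a LAN client's ARP table lists the VIP against a MAC for which `vip_resolves_to_carp_mac(mac, 1)` is false, something other than CARP (a stale entry, a duplicate address, or a forwarded connection) is answering for that address.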

                        • J
                          jypsilantis @Derelict last edited by

                          @derelict I may have found the problem. Possibly a corrupt or failing disk.

                          I replaced the disk on the backup node today, rebuilt it, and restored the config from a recent backup file. Everything looks fine now.

                          I will keep monitoring in case the problem recurs, but it may be something as simple as this.

                          A really strange symptom if it is in fact a failing disk. SMART status was OK, so perhaps some corruption from the recent power outage that took out my primary firewall disk.

                          For anyone else who may experience this issue, try rebooting with the disk repair option, and/or change out the disk and rebuild/restore.

                          Thanks for your help and guidance.
