Netgate Discussion Forum

    CARP triggered on the BACKUP only without obvious reason

    HA/CARP/VIPs
    13 Posts 2 Posters 4.2k Views
    • awebster

      In this thread https://forum.pfsense.org/index.php?topic=90758.0, Jimp gives some good advice, which I had also stumbled upon by trial and error.

      Set ADVBASE higher on the backup.

      In fact, there is a clue to this in the help text for the DHCP server Failover peer configuration:

      Leave blank to disable. Enter the interface IP address of the other machine. 
      Machines must be using CARP. Interface's advskew determines whether the DHCPd process is Primary or Secondary. 
      Ensure one machine's advskew<20 (and the other is >20).
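
For anyone following along, here is a rough sketch of why these two values matter. CARP sends an advertisement every advbase + advskew/256 seconds, and the node that advertises most often takes the MASTER role, so raising either value on the backup makes it less preferred. The `dhcp_failover_role` helper is only a reading of the help text quoted above (which does not say what happens at exactly 20), and both function names are made up for illustration; they are not pfSense APIs.

```python
def advert_interval(advbase: int, advskew: int) -> float:
    """Seconds between CARP advertisements: advbase + advskew/256."""
    return advbase + advskew / 256


def dhcp_failover_role(advskew: int) -> str:
    """Assumed reading of the quoted help text: advskew < 20 means Primary.

    The help text leaves advskew == 20 unspecified; this sketch treats it
    as Secondary.
    """
    return "primary" if advskew < 20 else "secondary"


if __name__ == "__main__":
    # Hypothetical pair: master keeps the defaults, backup uses a higher skew.
    for name, base, skew in [("master", 1, 0), ("backup", 1, 100)]:
        print(name, advert_interval(base, skew), dhcp_failover_role(skew))
```

With these example numbers the backup advertises noticeably less often than the master and falls on the Secondary side of the DHCP threshold.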
      

      –A.

      • bennyc

        Hmm, interesting tip. When I raise the Base setting on the BACKUP for a given CARP VIP to anything higher (like "22"), after a while (at the next sync, I assume?) it returns to "1".
        This also leaves my dhcpd failover states in error until I change it back. Then after a while it goes back to "1" again (manual loop  ::)

        So either I'm doing it wrong, or there is something wrong?

        4x XG-7100 (2xHA), 1x SG-4860, 1x SG-2100
        1x PC Engines APU2C4, 1x PC Engines APU1C4

        • awebster

          @bennyc:

          Hmm, interesting tip. When I raise the Base setting on the BACKUP for a given CARP VIP to anything higher (like "22"), after a while (at the next sync, I assume?) it returns to "1".
          This also leaves my dhcpd failover states in error until I change it back. Then after a while it goes back to "1" again (manual loop  ::)

          So either I'm doing it wrong, or there is something wrong?

          Sorry, I forgot one critical detail.

          You must uncheck Synchronize Virtual IPs under System -> High Avail. Sync, otherwise the MASTER will keep overwriting the ADVBASE value.
          This also means the VIP addresses must be maintained manually on each box: check Synchronize Virtual IPs for the initial setup, then uncheck it before going into production, and never check it again.

          I guess this is a bug, because ADVBASE and SKEW should not be overwritten on sync.

          –A.

          • bennyc

            First of all, thanks for your hints here!

            OK, that makes sense. In the meantime, this is driving me nuts.
            I disabled the sync for dhcpd, then modified each CARP IP on the backup to an Adv. of 22. After that I had one failover group that, for no obvious reason, remained in unknown state. I rebooted, went back through the settings, and compared dhcpd.conf from each member; they seemed OK.

            I restarted dhcpd on the backup: all groups on my Master reported recover, and on my backup all but one are normal  :(
            Then I restarted dhcpd on the master, and now all groups on my Master seem to be stuck in recover  >:(

            sigh… ::)

            • awebster

              Are you certain there are no duplicate CARP IDs, or HSRP/VRRP sessions using the same IDs?
              From a shell, run: tcpdump -s 0 -n -i <interface> -T carp proto 112
              Check each interface individually to be sure.

              –A.

              • bennyc

                I went back to my initial state (full sync including VIPs) and even removed the failover IPs from the DHCP servers.

                Then I removed the sync for CARP again and set the VIP CARP Adv. to 22 on the backup.

                I started the packet capture as requested and set the failover IP again, on the problematic interface only. The DHCP failover group shows up on the backup immediately, but also immediately in state "recover" with peer-state "unknown-state".

                I went over my config: I only have 7 CARP IPs, and each has a unique vhid. I did a capture on both nodes, on the interface that is giving me trouble, and I only see CARPv2-advertise 36 for the configured vhid:
                22:50:58.137661 IP 10.100.200.251 > 224.0.0.18: CARPv2-advertise 36: vhid=200 advbase=1 advskew=0 authlen=7 counter=10858211938054085703
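
If the capture is long, eyeballing it for a second sender gets tedious. Below is a minimal sketch that parses lines in the format shown above and groups the source addresses by vhid. The regex is written against this one sample line only, so treat it as an assumption about tcpdump's output rather than a guaranteed format; the function names are made up for illustration. More than one source for a vhid is worth a closer look, though both nodes can legitimately show up briefly during a failover.

```python
import re
from collections import defaultdict

# Pattern derived from the single sample line in this thread (assumption).
ADVERT_RE = re.compile(
    r"IP (?P<src>\d+\.\d+\.\d+\.\d+) > 224\.0\.0\.18: CARPv2-advertise \d+: "
    r"vhid=(?P<vhid>\d+) advbase=(?P<advbase>\d+) advskew=(?P<advskew>\d+)"
)


def parse_advert(line):
    """Return src/vhid/advbase/advskew for one capture line, or None."""
    m = ADVERT_RE.search(line)
    if m is None:
        return None
    return {
        "src": m["src"],
        "vhid": int(m["vhid"]),
        "advbase": int(m["advbase"]),
        "advskew": int(m["advskew"]),
    }


def sources_by_vhid(lines):
    """Map each vhid to the set of source IPs seen advertising it."""
    seen = defaultdict(set)
    for line in lines:
        advert = parse_advert(line)
        if advert is not None:
            seen[advert["vhid"]].add(advert["src"])
    return dict(seen)


if __name__ == "__main__":
    import sys

    # Usage: pipe saved tcpdump text output into this script.
    for vhid, sources in sorted(sources_by_vhid(sys.stdin).items()):
        flag = "  <-- more than one sender" if len(sources) > 1 else ""
        print(f"vhid {vhid}: {', '.join(sorted(sources))}{flag}")
```

Lines that do not match the CARPv2 pattern (for example VRRP or HSRP traffic decoded differently) are simply skipped, so this only answers the question of who is advertising which vhid.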

                I changed the vhid just to be sure and repeated the test (remove failover, add failover), with the same result.

                I must add, it just dawned on me that this interface happens to be the one connected to a Cisco stack with HSRP configured on it. Reviewing its config now seems like a good idea…

                • bennyc

                  @awebster

                  Thanks for all the hints here, I've solved the issue in the meantime.
                  It required a reboot of the Master to get things running again, plus fixing a config error on my part for that last DHCP group.

                  • bennyc

                    @awebster:

                    I guess this is a bug, because ADVBASE, and SKEW should not be overwritten on sync.

                    Perhaps a silly question, but did you report this? It does seem like unwanted behavior that could be fixed as a bug…

                    • awebster

                      Thanks, here it is: https://redmine.pfsense.org/issues/5528

                      –A.

                      • bennyc

                        Great, and thanks! I would have reported it otherwise, but credit goes to you…

                        Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.