Netgate Discussion Forum

    Feature request: on startup make CARP start initially for 60s in temporary maintenance mode

    Routing and Multi WAN
    6 Posts 3 Posters 337 Views
    • nzkiwi68

      The problem:
      With an HA failover pfSense cluster:

      1. if the primary firewall reboots
      2. the backup firewall takes over quickly (good)
      3. But when the primary comes back, CARP starts up early in the startup sequence and usurps the backup fw (bad)
      4. The failback takes quite a while because the firewall is still starting packages like FRR, and they take a while to come up, much longer than if the firewall were already running

      Suggested Feature
      Allow us to set a timer option such as:

      Always start CARP in temporary maintenance mode for xxx seconds

      How would this work?
      It would work well, because if only the master or only the backup firewall was running and no other firewall was present, then even though CARP would come up in temporary maintenance mode during a reboot, that firewall would still instantly become CARP master, because no other CARP node was present.

      If the backup is present and running, the primary then has xxx seconds (I think that would be set to 60 or even longer, 300/600/900 seconds) to come up and stabilize before it takes over CARP.
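The proposed rule can be written down as a small decision function. This is a hypothetical Python model, not pfSense code: the names `should_take_master`, `ever_takes_over`, `uptime_s`, `peer_advertising` and `hold_s` are made up for illustration, and it assumes the primary reclaims master as soon as the hold window ends.

```python
"""Hypothetical model of the proposed CARP startup hold-down option.

Nothing here is pfSense code; it only illustrates the decision rule
described in the feature request.
"""
from typing import Sequence


def should_take_master(uptime_s: float, peer_advertising: bool, hold_s: float) -> bool:
    """Decide whether a freshly booted node should claim CARP master.

    uptime_s         -- seconds since this node booted (hypothetical input)
    peer_advertising -- True if another CARP node is currently advertising
    hold_s           -- the proposed "temporary maintenance mode" duration
    """
    if not peer_advertising:
        # No other CARP node present: become master at once, even while
        # still inside the hold window.
        return True
    # A peer is already master: stay backup until the hold window expires.
    return uptime_s >= hold_s


def ever_takes_over(run_lengths_s: Sequence[float], hold_s: float) -> bool:
    """Unstable-hardware case: each boot runs for some seconds, then crashes.

    Every reboot resets uptime, so the node only reclaims master if one
    run lasts at least hold_s (assuming the peer stays up throughout).
    """
    return any(run >= hold_s for run in run_lengths_s)


if __name__ == "__main__":
    print(should_take_master(5, peer_advertising=False, hold_s=60))   # True
    print(should_take_master(5, peer_advertising=True, hold_s=60))    # False
    print(should_take_master(61, peer_advertising=True, hold_s=60))   # True
    # Crashes every ~2 minutes with a 600 s hold: never takes over.
    print(ever_takes_over([120, 120, 120], hold_s=600))               # False
```

Because a crash and reboot resets uptime in this model, a node that keeps crashing before the hold time elapses never reclaims master, which is the behaviour asked for with a 600 or 900 second timer.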

      Notes
      I know that for a "structured reboot" you put the primary into persistent CARP maintenance mode, restart it, and then, once it is up and stable, leave persistent CARP maintenance mode and fail back cleanly and quite fast.

      In a way, this feature mostly automates that.

      It protects against a number of scenarios that I have encountered:

      • Rebooting the primary without putting it into persistent CARP maintenance mode (oops)
      • HA primary unintended reboot (hardware or power failure)
      • Unstable firewall with a hardware fault: it reboots, runs for 2 minutes, crashes, and reboots again (a longer timer would catch an unstable HA firewall and not allow it to take over; you could set that timer quite high, at 600 or 900 seconds)

      Advanced Idea
      You could also have a failback time option:

      Exit CARP temporary maintenance mode between hours xxxx - yyyy

      That way, there would be no interruption until the middle of the night. But if the backup failed, since no other CARP node would be present, the primary would take over instantly.
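The failback window has one subtle case: a window such as 23:00 to 03:00 wraps past midnight. A minimal Python sketch of the check, assuming the window is given as two wall-clock times; `in_failback_window` and `may_leave_maintenance` are hypothetical names, not part of pfSense.

```python
from datetime import time


def in_failback_window(now: time, start: time, end: time) -> bool:
    """Return True if `now` lies in [start, end).

    Supports windows that wrap past midnight (e.g. 23:00-03:00).
    A window with start == end is treated as empty.
    """
    if start <= end:
        return start <= now < end
    # Wrapping window: late evening OR early morning.
    return now >= start or now < end


def may_leave_maintenance(now: time, start: time, end: time, peer_advertising: bool) -> bool:
    """Hypothetical rule for the proposed "exit between hours xxxx - yyyy" option."""
    if not peer_advertising:
        # The other node is gone, so take over immediately regardless of time.
        return True
    return in_failback_window(now, start, end)


if __name__ == "__main__":
    window = (time(1, 0), time(4, 0))  # example values only, 01:00-04:00
    print(may_leave_maintenance(time(14, 30), *window, peer_advertising=True))   # False
    print(may_leave_maintenance(time(2, 15), *window, peer_advertising=True))    # True
    print(may_leave_maintenance(time(14, 30), *window, peer_advertising=False))  # True
```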

      • JeGr LAYER 8 Moderator

        @nzkiwi68 said in Feature request: on startup make CARP start initially for 60s in temporary maintenance mode:

        The failback takes quite a while because the firewall is still starting packages like FRR, and they take a while to come up, much longer than if the firewall were already running

        I don't exactly know what your problem is with 3 and 4. Do you have problems when the master takes back its services? We are running a big cluster setup with a multitude of VPN tunnels and OVPN servers for remote access, as well as RADIUS etc., and have never had a problem with node1 (master) taking over after rebooting. What's your scenario in which a timed takeover would be so much faster/better?

        Don't forget to upvote 👍 those who kindly offered their time and brainpower to help you!

        If you're interested, I'm available to discuss details of German-speaking paid support (for companies) if needed.

        • dotdash

          If your problem is with package behavior (FRR?), perhaps it is better addressed as a feature request for FRR. Like JeGr, I haven't seen any problem with multi-WAN clusters running standard functions such as IPsec and OpenVPN.

          • nzkiwi68 @JeGr

            @JeGr

            Scenario 1 - primary and backup already running
            Reboot primary - backup takes over quickly

            Scenario 2 - primary starting up and backup already running
            Primary comes up, CARP takes over from the backup firewall, then it takes as long as 1-2 minutes for the primary to take over FRR routing, VPNs, etc.

            Perhaps it is an FRR type issue...

            • JeGr LAYER 8 Moderator

              @nzkiwi68 said in Feature request: on startup make CARP start initially for 60s in temporary maintenance mode:

              Primary comes up, CARP takes over from the backup firewall, then it takes as long as 1-2 minutes for the primary to take over FRR routing, VPNs, etc.

              Nope, nothing of the sort here, and we run multiple packages on the nodes, but no FRR. OpenVPN, IPSEC, FreeRadius etc. have no problem whatsoever with the primary coming back and taking over; seconds later the first VPN clients authenticating via FreeRadius are already connected again. So I think this could very well point to FRR. As FRR (OSPF?) can take a while to sort out its peers, exchange routes etc., that could be the culprit, or something else slowing the process down.


              • nzkiwi68

                I still have this issue, and now I can replicate it easily at 2 completely separate sites, both on 2.4.4_p3 and both using:

                • FRR and OSPF
                • HA pair
                • IPSEC VTI tunnels bound to a CARP IP address
                • FRR set to follow the LAN CARP address (so FRR is off on the backup firewall)

                Here's a continuous ping across the VPN from site A to site B.

                Reply from 10.10.40.1: bytes=32 time=4ms TTL=253
                Reply from 10.10.40.1: bytes=32 time=7ms TTL=253
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Reply from 10.10.40.1: bytes=32 time=4ms TTL=253
                Reply from 10.10.40.1: bytes=32 time=3ms TTL=253
                Reply from 10.10.40.1: bytes=32 time=4ms TTL=253
                Reply from 10.10.40.1: bytes=32 time=3ms TTL=253

                The first timeout is the primary firewall being rebooted: 4 pings are lost and the backup completely takes over. Very acceptable. Excellent.

                Now the slow bit... The primary comes up, CARP takes over, and it takes ages for things to settle and come back online.

                Reply from 10.10.40.1: bytes=32 time=3ms TTL=253
                Reply from 10.10.40.1: bytes=32 time=17ms TTL=253
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Request timed out.
                Reply from 10.10.40.1: bytes=32 time=3ms TTL=253
                Reply from 10.10.40.1: bytes=32 time=4ms TTL=253
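Counting the two captures above gives 4 consecutive timeouts during failover and 71 during failback. When repeating the failback test, a small helper can pull that number out of a saved Windows-style continuous ping log instead of counting by hand. This is a hypothetical Python sketch, not a pfSense tool; it only counts lines, and how many seconds each timeout represents depends on ping's interval and timeout settings.

```python
def longest_timeout_run(ping_output: str) -> int:
    """Length of the longest run of consecutive "Request timed out." lines.

    Blank lines are ignored; any other line (such as a "Reply from ..." line)
    ends the current run.
    """
    longest = current = 0
    for raw in ping_output.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "Request timed out.":
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


if __name__ == "__main__":
    reply = "Reply from 10.10.40.1: bytes=32 time=4ms TTL=253"
    failover = "\n".join([reply] * 2 + ["Request timed out."] * 4 + [reply] * 4)
    failback = "\n".join([reply] * 2 + ["Request timed out."] * 71 + [reply] * 2)
    print(longest_timeout_run(failover))  # 4
    print(longest_timeout_run(failback))  # 71
```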

                After digging, I think the cause is the IPSEC VPN: it's just not getting released from the backup firewall in a timely manner. The backup seems to hold on and on and on and keeps running the IPSEC VPN tunnels. I can speed up the failback by logging on to the backup firewall and stopping the IPSEC tunnels from the IPSEC status page.

                I wonder if the issue is because my IPSEC tunnels are using a CARP IP address?

                Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.