Netgate Discussion Forum

    HA randomly BACKUP goes to MASTER state

    HA/CARP/VIPs
    21 Posts 4 Posters 4.1k Views
    • B
      B_IT @m4rek11
      last edited by B_IT

      @m4rek11 In the earlier version 2.4.5 I sometimes (not always) saw that saving firewall rules caused one of several CARP interfaces to change from MASTER to BACKUP, but it lasted a fraction of a second and did not cause any additional problems. This was the only, small issue with CARP I had with that older version.
      This Saturday I installed version 2.6.0 CE and noticed that just restarting one of the firewalls causes several minutes of "agreeing" on who is MASTER and who is BACKUP. I got logs similar to yours and @Przemyslaw85's.
      At first I thought it happened only when restarting one of the two firewalls, because they had been working fine together for more than a day, but halfway through a production day it happened again and I don't know why.

      Our firewall consists of two physical machines connected via two switches.

      • P
        Przemyslaw85 @m4rek11
        last edited by Przemyslaw85

        @m4rek11 I have Internet connection monitoring configured. As soon as the backup router's role changes, the link status is lost.

        @b_it Restarting the backup router also causes such problems for me.

        My pfSense box w HA:
        Master: HP DL360G8 1x E5-2670, 64GB ECC RAM, 8x NIC (17x VLan)
        Slave: HP DL360G5, 2x E5410, 64GB ECC RAM, 6x NIC (17x VLan)

        • B
          B_IT @Przemyslaw85
          last edited by

          @przemyslaw85 I also have Internet connection monitoring enabled. I found this discussion about a CARP storm: https://redmine.pfsense.org/issues/12961. I will try this solution this Saturday and see what happens.

          • M
            m4rek11 @B_IT
            last edited by

            @Przemyslaw85, I'm using a monitoring system too, and when the problem occurs, lots of hosts (VLANs) are not visible in it for a while.

            @b_it, thank you for the link. Please let us know about your test.

            • B
              B_IT @m4rek11
              last edited by

              @m4rek11 @Przemyslaw85 Sure, I will. I hope to be back with good news. Stay tuned.

              • B
                B_IT @B_IT
                last edited by

                Last Saturday I applied the two patches I mentioned in my previous comment, and so far it looks much better. I no longer see many unnecessary messages, and both firewalls have been stable for these few days. Here are all the patches I added directly to mitigate the CARP issue:

                Fix CARP event storm when leaving persistent CARP maintenance mode 1/2
                https://github.com/pfsense/pfsense/commit/8a906fba5e42d391227dfc39311d02b570576d50.patch

                Fix CARP event storm when leaving persistent CARP maintenance mode 2/2
                https://github.com/pfsense/pfsense/commit/3c15b353c6968801cfffb7d3b30a7069d2330a3e.patch

                During Saturday's patching I also manually added this one:
                Fix Clicking Save & Force Update on a Dynamic DNS entry results in a GUI timeout
                https://github.com/pfsense/pfsense/commit/bdffb77d1aa21770b23ef408ad9fba79d0825ec5.patch

                and I applied these three patches from the recommended section:
                Disable pf counter data preservation to temporarily work around latency when reloading large rulesets (Redmine #12827)

                Fix Captive Portal handling of non-TCP traffic after login (Redmine #12834)

                Fix OpenVPN dashboard widget client termination (Redmine #12817)

                To sum up: for now I will stay on version 2.6.0 with patches.
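
The patch links in the post above all follow the same shape: the repository URL, `/commit/`, the commit ID, and a `.patch` suffix. A minimal Python sketch of that pattern, using only the commit IDs quoted above; the helper name `patch_url` and the short labels are hypothetical and exist only for illustration:

```python
# Repository and commit IDs are copied from the links in the post above.
REPO = "https://github.com/pfsense/pfsense"

# Labels are shortened for readability; they are not official patch names.
PATCHES = [
    ("CARP event storm 1/2", "8a906fba5e42d391227dfc39311d02b570576d50"),
    ("CARP event storm 2/2", "3c15b353c6968801cfffb7d3b30a7069d2330a3e"),
    ("Dynamic DNS GUI timeout", "bdffb77d1aa21770b23ef408ad9fba79d0825ec5"),
]


def patch_url(commit_id):
    """Build the .patch URL for a commit (hypothetical helper)."""
    return f"{REPO}/commit/{commit_id}.patch"


if __name__ == "__main__":
    # Print the patches in the order they were applied in the post.
    for label, commit_id in PATCHES:
        print(f"{label}: {patch_url(commit_id)}")
```

Keeping the list in application order matters here, since the two CARP patches are applied as a pair.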

                • P
                  Przemyslaw85 @B_IT
                  last edited by

                  @b_it As I understand it, I need to make the changes for "mode 1/2" and "mode 2/2".
                  For "mode 1/2", do I have to perform the steps on server 1 only, or on both?

                  • B
                    B_IT @Przemyslaw85
                    last edited by

                    @przemyslaw85 I think that every node should have the same set of patches, so I patched the first node and then the second node.

                    These names are just my own naming convention:

                    Fix CARP event storm when leaving persistent CARP maintenance mode 1/2
                    Fix CARP event storm when leaving persistent CARP maintenance mode 2/2

                    For the CARP issue, the second patch will not apply without the first one. This is the view from one node (the second has the same set of patches):
                    [Screenshot of the patch list on one node: 4bc2fdee-0f66-45d5-a658-dfb4ca325c88-obraz.png]
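
The two rules from this post (both nodes carry the same patch set, and the 2/2 patch needs 1/2 applied first) can be sketched as a small check. This is only an illustration in Python: the patch lists are typed in by hand, and `check_nodes` and `REQUIRES` are hypothetical names, not part of pfSense or its System Patches package.

```python
# Patch labels follow B_IT's own naming convention from this thread.
CARP_1 = "Fix CARP event storm when leaving persistent CARP maintenance mode 1/2"
CARP_2 = "Fix CARP event storm when leaving persistent CARP maintenance mode 2/2"

# Hypothetical dependency map: patch -> patch that must be applied before it.
REQUIRES = {CARP_2: CARP_1}


def check_nodes(node_a, node_b):
    """Return a list of problems; an empty list means the nodes look consistent.

    node_a and node_b are lists of patch labels in the order they were applied.
    """
    problems = []
    for label in sorted(set(node_a) - set(node_b)):
        problems.append(f"only on node A: {label}")
    for label in sorted(set(node_b) - set(node_a)):
        problems.append(f"only on node B: {label}")
    for name, patches in (("node A", node_a), ("node B", node_b)):
        for patch, prereq in REQUIRES.items():
            if patch not in patches:
                continue
            if prereq not in patches:
                problems.append(f"{name}: '{patch}' is present without '{prereq}'")
            elif patches.index(prereq) > patches.index(patch):
                problems.append(f"{name}: '{prereq}' must be applied before '{patch}'")
    return problems


if __name__ == "__main__":
    for problem in check_nodes([CARP_1, CARP_2], [CARP_1]):
        print(problem)
```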

                    • P
                      Przemyslaw85 @B_IT
                      last edited by

                      @b_it I confirm that the patches work.
                      Yesterday I made a few changes to the original files using the file editor. I didn't know there was such a module as patches. I had to revert to the original files from a copy made before editing.
                      After I added the 1/2 and 2/2 patches and the Dynamic DNS one, I did not notice any improvement. Only after I added patches #12827, #12834, #12816 and #12817 can I say that the system now works as it should.

                      • B
                        B_IT @Przemyslaw85
                        last edited by

                        @przemyslaw85 It seemed to me that once I had applied the CARP patches, the firewall was more responsive while I made the later changes (further patching), but I didn't wait long: I just rebooted both nodes to be sure that all selected patches were fully applied.
                        I have to admit that I started more thorough testing only after rebooting the firewalls (with the patch set mentioned above), so I can't be sure what really helped or how much.
                        BTW, the patching mechanism was introduced around version 2.5, and I learned from its beginning that I have to be careful when selecting patches.

                        • M
                          m4rek11 @B_IT
                          last edited by

                          @Przemyslaw85, @B_IT, after those changes, did you still see a CARP storm in the logs and the brief MASTER -> BACKUP, BACKUP -> MASTER changes?

                          • B
                            B_IT @m4rek11
                            last edited by

                            @m4rek11 Looking at the logs, I see some entries from while the patches were being applied, but after patching I see only a few, and they all look as they should (at least to me) and have a clear reason (e.g. a rebooted node). I wouldn't call them a storm, and I definitely don't see flipping MASTER - BACKUP entries now.
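
To tell "a few expected entries" from a storm, one option is to count state transitions per VHID and interface in an exported system log. The Python sketch below is only an illustration: the log line shape in `SAMPLE` is an assumption (real lines may differ between pfSense and FreeBSD versions), the sample lines are made up, and `count_transitions`, `flapping` and the threshold of 4 are hypothetical choices.

```python
import re
from collections import Counter

# Assumed line shape: "carp: [VHID ]<vhid>@<iface>: <OLD> -> <NEW> (reason)".
# Adjust the pattern if your log lines look different.
TRANSITION = re.compile(
    r"carp: (?:VHID )?(?P<vhid>\d+)@(?P<iface>\S+): "
    r"(?P<old>INIT|BACKUP|MASTER) -> (?P<new>INIT|BACKUP|MASTER)"
)


def count_transitions(lines):
    """Count CARP state transitions per (vhid, interface)."""
    counts = Counter()
    for line in lines:
        match = TRANSITION.search(line)
        if match:
            counts[(match["vhid"], match["iface"])] += 1
    return counts


def flapping(lines, threshold=4):
    """Return the (vhid, interface) pairs with at least `threshold` transitions."""
    return {key: n for key, n in count_transitions(lines).items() if n >= threshold}


# Made-up sample lines for illustration only; not taken from a real system.
SAMPLE = [
    "Apr  2 10:00:01 fw1 kernel: carp: 1@igb0: BACKUP -> MASTER (master timed out)",
    "Apr  2 10:00:02 fw1 kernel: carp: 1@igb0: MASTER -> BACKUP (more frequent advertisement received)",
    "Apr  2 10:00:03 fw1 kernel: carp: 1@igb0: BACKUP -> MASTER (master timed out)",
    "Apr  2 10:00:04 fw1 kernel: carp: 1@igb0: MASTER -> BACKUP (more frequent advertisement received)",
    "Apr  2 10:05:00 fw1 kernel: carp: VHID 2@igb1: INIT -> BACKUP (initialization complete)",
]

if __name__ == "__main__":
    for (vhid, iface), n in flapping(SAMPLE).items():
        print(f"VHID {vhid} on {iface}: {n} transitions")
```

A single transition after a reboot stays below the threshold, while repeated MASTER/BACKUP flips on the same VHID are reported.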

                            • P
                              Przemyslaw85 @m4rek11
                              last edited by Przemyslaw85

                              @m4rek11 After applying the patches, I have not seen the routers change roles (Master -> Backup, Backup -> Master).
                              The problems that used to appear whenever I changed rules, DNS or DHCP settings went away as well.

                              Earlier I had also found a configuration error of my own: for some unknown reason, I had set the same VHID on the Virtual IPs of two different networks. But the problems were still there. After applying the patches, the problem was gone.
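
The duplicate-VHID mistake described above can be looked for in a configuration backup. The Python sketch below is a rough illustration under stated assumptions: it assumes CARP VIPs are stored as `<vip>` elements with `<mode>`, `<interface>`, `<vhid>` and `<subnet>` children, the sample XML is made up (documentation-range addresses), and `duplicate_vhids` is a hypothetical helper. Verify the element names against your own backup before relying on it.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Made-up configuration fragment; element names are an assumption about the
# backup file layout and should be checked against a real export.
SAMPLE_CONFIG = """
<pfsense>
  <virtualip>
    <vip><mode>carp</mode><interface>lan</interface><vhid>1</vhid><subnet>192.0.2.1</subnet></vip>
    <vip><mode>carp</mode><interface>opt1</interface><vhid>1</vhid><subnet>198.51.100.1</subnet></vip>
    <vip><mode>carp</mode><interface>opt2</interface><vhid>3</vhid><subnet>203.0.113.1</subnet></vip>
    <vip><mode>ipalias</mode><interface>lan</interface><subnet>192.0.2.5</subnet></vip>
  </virtualip>
</pfsense>
"""


def duplicate_vhids(config_xml):
    """Map each VHID used by more than one CARP VIP to its (interface, address) pairs."""
    root = ET.fromstring(config_xml)
    by_vhid = defaultdict(list)
    for vip in root.iter("vip"):
        if vip.findtext("mode") != "carp":
            continue  # skip IP aliases and other non-CARP entries
        by_vhid[vip.findtext("vhid")].append(
            (vip.findtext("interface"), vip.findtext("subnet"))
        )
    return {vhid: vips for vhid, vips in by_vhid.items() if len(vips) > 1}


if __name__ == "__main__":
    for vhid, vips in duplicate_vhids(SAMPLE_CONFIG).items():
        print(f"VHID {vhid} is used by: {vips}")
```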

                              Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.