Netgate Discussion Forum

    CARP switch Master/Backup every 15 minutes

    HA/CARP/VIPs
      marcolefo

      Hi

      We have the same problem as described here:
      Topic 172700

      (pfSense 2.6.0)
      We received these emails every 15 minutes from the slave pfSense (none from the master):

      21:45:12 HA cluster member "(X.X.13.254@lagg0.1113): (V1113)" has resumed CARP state "MASTER" for vhid 16
      21:45:12 HA cluster member "(X.X.11.254@lagg0.1411): (V1411)" has resumed CARP state "BACKUP" for vhid 64
      21:45:12 HA cluster member "(X.X.166.254@lagg0.1266): (V1266)" has resumed CARP state "MASTER" for vhid 35
      21:45:12 HA cluster member "(X.X.50.254@lagg0.1429): (V1429)" has resumed CARP state "MASTER" for vhid 34
      21:45:12 HA cluster member "(X.X.18.254@lagg0.1118): (V1118)" has resumed CARP state "BACKUP" for vhid 21
      21:45:12 HA cluster member "(X.X.18.254@lagg0.1118): (V1118)" has resumed CARP state "MASTER" for vhid 21
      21:45:12 HA cluster member "(X.X.39.254@lagg0.1139): (V1139)" has resumed CARP state "MASTER" for vhid 68
      21:45:12 HA cluster member "(X.X.22.254@lagg0.1122): (V1122)" has resumed CARP state "MASTER" for vhid 25
      21:45:12 HA cluster member "(X.X.16.254@lagg0.1116): (V1116)" has resumed CARP state "MASTER" for vhid 19
      21:45:12 HA cluster member "(X.X.2.254@lagg0.1402): (V1402)" has resumed CARP state "MASTER" for vhid 31
      21:45:12 HA cluster member "(X.X.20.254@lagg0.1120): (V1120)" has resumed CARP state "MASTER" for vhid 23
      21:45:12 HA cluster member "(X.X.1.254@lagg0.1401): (V1401)" has resumed CARP state "MASTER" for vhid 30
      21:45:12 HA cluster member "(X.X.10.254@lagg0.1410): (V1410)" has resumed CARP state "MASTER" for vhid 61
      21:45:12 HA cluster member "(X.X.1.254@lagg0.1101): (V1101)" has resumed CARP state "MASTER" for vhid 4
      21:45:12 HA cluster member "(X.X.7.254@lagg0.1407): (V1407)" has resumed CARP state "MASTER" for vhid 42
      21:45:12 HA cluster member "(X.X.166.254@lagg0.1266): (V1266)" has resumed CARP state "BACKUP" for vhid 35
      21:45:12 HA cluster member "(X.X.20.254@lagg0.1120): (V1120)" has resumed CARP state "BACKUP" for vhid 23
      21:45:12 HA cluster member "(X.X.2.254@lagg0.1402): (V1402)" has resumed CARP state "BACKUP" for vhid 31
      21:45:12 HA cluster member "(X.X.10.254@lagg0.1410): (V1410)" has resumed CARP state "BACKUP" for vhid 61
      21:45:12 HA cluster member "(X.X.22.254@lagg0.1122): (V1122)" has resumed CARP state "BACKUP" for vhid 25
      21:45:12 HA cluster member "(X.X.1.254@lagg0.1401): (V1401)" has resumed CARP state "BACKUP" for vhid 30
      21:45:12 HA cluster member "(X.X.1.254@lagg0.1101): (V1101)" has resumed CARP state "BACKUP" for vhid 4
      21:45:12 HA cluster member "(X.X.7.254@lagg0.1407): (V1407)" has resumed CARP state "BACKUP" for vhid 42
      21:45:12 HA cluster member "(X.X.50.254@lagg0.1429): (V1429)" has resumed CARP state "BACKUP" for vhid 34
      21:45:12 HA cluster member "(X.X.39.254@lagg0.1139): (V1139)" has resumed CARP state "BACKUP" for vhid 68
      21:45:12 HA cluster member "(X.X.13.254@lagg0.1113): (V1113)" has resumed CARP state "BACKUP" for vhid 16
      21:45:12 HA cluster member "(X.X.16.254@lagg0.1116): (V1116)" has resumed CARP state "BACKUP" for vhid 19
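Grouping these notifications by vhid makes the pattern easier to read: they all carry the same timestamp, and most vhids report MASTER and then BACKUP within that single second. Below is a minimal Python sketch, assuming the exact message format quoted above; `CARP_RE`, `transitions_by_vhid` and `flapped` are hypothetical helpers, not part of pfSense.

```python
import re
from collections import defaultdict

# Written against the notification format quoted above (hypothetical helper,
# not part of pfSense). Captures time, address, interface, description,
# the reported CARP state and the vhid.
CARP_RE = re.compile(
    r'(?P<time>\d{2}:\d{2}:\d{2}) HA cluster member "\((?P<addr>[^@]+)@(?P<iface>[^)]+)\): '
    r'\((?P<descr>[^)]*)\)" has resumed CARP state "(?P<state>MASTER|BACKUP)" for vhid (?P<vhid>\d+)'
)

def transitions_by_vhid(lines):
    """Group (time, state) pairs by vhid, keeping the order they were reported in."""
    result = defaultdict(list)
    for line in lines:
        m = CARP_RE.search(line)
        if m:
            result[int(m["vhid"])].append((m["time"], m["state"]))
    return dict(result)

def flapped(transitions):
    """Return the vhids that reported both MASTER and BACKUP."""
    return sorted(vhid for vhid, events in transitions.items()
                  if {state for _, state in events} == {"MASTER", "BACKUP"})

# A few lines copied from the notifications above.
sample = [
    '21:45:12 HA cluster member "(X.X.13.254@lagg0.1113): (V1113)" has resumed CARP state "MASTER" for vhid 16',
    '21:45:12 HA cluster member "(X.X.11.254@lagg0.1411): (V1411)" has resumed CARP state "BACKUP" for vhid 64',
    '21:45:12 HA cluster member "(X.X.18.254@lagg0.1118): (V1118)" has resumed CARP state "BACKUP" for vhid 21',
    '21:45:12 HA cluster member "(X.X.18.254@lagg0.1118): (V1118)" has resumed CARP state "MASTER" for vhid 21',
    '21:45:12 HA cluster member "(X.X.13.254@lagg0.1113): (V1113)" has resumed CARP state "BACKUP" for vhid 16',
]
t = transitions_by_vhid(sample)
print(flapped(t))  # [16, 21]
print({ts for events in t.values() for ts, _ in events})  # {'21:45:12'}
```

Feeding it the full list above would show every vhid except 64 switching state within one second, which looks more like a short connectivity break than a genuine failover.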
      

      We found nothing in the slave's logs at 21:45.
      On the master, at that time, we found only:

      php-fpm.log:Nov 24 21:45:12 X.X.X.X php-fpm[31372]: /rc.carpmaster: HA cluster member "(X.X.22.254@lagg0.1122): (V1122)" has resumed CARP state "MASTER" for vhid 25
      

      And we lose about 4 or 5 pings (enough to freeze a video session).

      lagg0 is an LACP lagg connected to an HPE 5700. We also tried an HP A5500, with the same result.
      We tried changing the LACP hash from the default to destination-ip source-ip.

      We have upgraded the BIOS and network firmware of our Dell R440s.

      We have 122 CARP interfaces, with 52 IP aliases on one of them.
      The CARP interfaces are the gateways for the VLANs and the outbound NAT addresses. The IP aliases are public addresses for servers.

      We are going to reinstall the slave first, then the master.
      But perhaps someone has an idea to help us before that ;)?

        marcolefo @marcolefo

        So, we reinstalled the BACKUP.
        Then we shut down the MASTER.

        The BACKUP became MASTER with no problem.

        But the issue is still present. Every 15 minutes we lose the network for 4 or 5 pings, and then the network comes back.
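To confirm that the outages really line up with the quarter hours, a ping log can be scanned for runs of consecutive losses. The sketch below uses hypothetical helpers (`outages`, `on_quarter_hour`) and assumes one ping per second recorded as (timestamp, success) pairs; the 3-ping threshold and 60-second tolerance are arbitrary, and the sample data is synthetic, not taken from this thread.

```python
from datetime import datetime, timedelta

def outages(samples, min_lost=3):
    """Return (first_lost, last_lost, count) for each run of at least
    min_lost consecutive failed pings.

    samples: chronological list of (datetime, ok) tuples, one per ping.
    """
    runs = []
    run = []  # timestamps of the current run of failed pings
    for ts, ok in samples:
        if not ok:
            run.append(ts)
            continue
        if len(run) >= min_lost:
            runs.append((run[0], run[-1], len(run)))
        run = []
    if len(run) >= min_lost:  # a run still open at the end of the data
        runs.append((run[0], run[-1], len(run)))
    return runs

def on_quarter_hour(ts, tolerance_s=60):
    """True if ts falls within tolerance_s seconds after :00, :15, :30 or :45."""
    offset = (ts.minute % 15) * 60 + ts.second
    return offset <= tolerance_s

# Synthetic data (not from the thread): one ping per second around 21:45,
# with five consecutive losses starting at 21:45:12.
base = datetime(2022, 11, 24, 21, 44, 50)
samples = [(base + timedelta(seconds=i), not 22 <= i <= 26) for i in range(40)]

for first, last, count in outages(samples):
    print(first.strftime("%H:%M:%S"), count, on_quarter_hour(first))
# prints: 21:45:12 5 True
```

If every detected run starts just after :00, :15, :30 or :45, that points at something scheduled rather than at the hardware.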

        So I don't think it's an HA issue.

        In the pfSense logs, we see nothing.

        We are lost.

          SteveITS Galactic Empire @marcolefo

          @marcolefo So in your initial description they are both master? Sounds like a connectivity loss between them every 15 minutes. Switch problem?

          Pre-2.7.2/23.09: Only install packages for your version, or risk breaking it. Select your branch in System/Update/Update Settings.
          When upgrading, allow 10-15 minutes to restart, or more depending on packages and device speed.
          Upvote 👍 helpful posts!

            marcolefo @SteveITS

            @steveits No, the backup becomes master (as the log says), but in the GUI it's still BACKUP.

            And note that the network loss also happens when the MASTER has been shut down.

            I don't know if it's related, but there is a cron job every 15 minutes that launches /etc/rc.filter_configure_sync/rc.filter_configure_sync

              marcolefo @marcolefo

              We have powered on MASTER.
              Since the BACKUP reinstall, we get XMLRPC error notifications with "Operation timed out":

              A communications error occurred while attempting to call XMLRPC method host_firmware_version: Unable to connect to tls://X.X.X.252:443. Error: Operation timed out @ 2022-12-01 17:33:24
              A communications error occurred while attempting to call XMLRPC method exec_php: Unable to connect to tls://X.X.X.252:443. Error: Operation timed out @ 2022-12-01 17:33:26
              A communications error occurred while attempting to call XMLRPC method exec_php: Unable to connect to tls://X.X.X.252:443. Error: Operation timed out @ 2022-12-01 17:34:10
              A communications error occurred while attempting to call XMLRPC method exec_php: Unable to connect to tls://X.X.X.252:443. Error: Operation timed out @ 2022-12-01 17:34:53
              A communications error occurred while attempting to call XMLRPC method host_firmware_version: Unable to connect to tls://X.X.X.252:443. Error: Operation timed out @ 2022-12-01 17:35:04
              A communications error occurred while attempting to call XMLRPC method host_firmware_version: Unable to connect to tls://X.X.X.252:443. Error: Operation timed out @ 2022-12-01 17:35:05
              A communications error occurred while attempting to call XMLRPC method exec_php: Unable to connect to tls://X.X.X.252:443. Error: Operation timed out @ 2022-12-01 17:35:08
              A communications error occurred while attempting to call XMLRPC method exec_php: Unable to connect to tls://X.X.X.252:443. Error: Operation timed out @ 2022-12-01 17:35:37
              A communications error occurred while attempting to call XMLRPC method exec_php: Unable to connect to tls://X.X.X.252:443. Error: Operation timed out @ 2022-12-01 17:35:51
              A communications error occurred while attempting to call XMLRPC method exec_php: Unable to connect to tls://X.X.X.252:443. Error: Operation timed out @ 2022-12-01 17:36:21
              A communications error occurred while attempting to call XMLRPC method exec_php: Unable to connect to tls://X.X.X.252:443. Error: Operation timed out @ 2022-12-01 17:36:34
              A communications error occurred while attempting to call XMLRPC method merge_installedpackages_section: Unable to connect to tls://X.X.X.252:443. Error: Operation timed out @ 2022-12-01 17:37:04
              A communications error occurred while attempting to call XMLRPC method exec_php: Unable to connect to tls://X.X.X.252:443. Error: Operation timed out @ 2022-12-01 17:37:18
              A communications error occurred while attempting to call XMLRPC method exec_php: Unable to connect to tls://X.X.X.252:443. Error: Operation timed out @ 2022-12-01 17:37:48
              A communications error occurred while attempting to call XMLRPC method exec_php: Unable to connect to tls://X.X.X.252:443. Error: Operation timed out @ 2022-12-01 17:38:01
              A communications error occurred while attempting to call XMLRPC method exec_php: Unable to connect to tls://X.X.X.252:443. Error: Operation timed out @ 2022-12-01 17:38:31
              A communications error occurred while attempting to call XMLRPC method merge_installedpackages_section: Unable to connect to tls://X.X.X.252:443. Error: Operation timed out @ 2022-12-01 17:38:45
              A communications error occurred while attempting to call XMLRPC method exec_php: Unable to connect to tls://X.X.X.252:443. Error: Operation timed out @ 2022-12-01 17:39:28
              A communications error occurred while attempting to call XMLRPC method exec_php: Unable to connect to tls://X.X.X.252:443. Error: Operation timed out @ 2022-12-01 17:40:12 
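Every one of these notifications reports the same error against the same peer (X.X.X.252:443), only for different XMLRPC methods, which suggests the peer's GUI port is unreachable rather than one particular call failing. A small Python sketch to tally such notifications, assuming the exact text format quoted above (`XMLRPC_RE` and `summarize` are hypothetical names):

```python
import re
from collections import Counter
from datetime import datetime

# Written against the notification text quoted above; captures the XMLRPC
# method, the error string and the trailing timestamp.
XMLRPC_RE = re.compile(
    r"XMLRPC method (?P<method>\w+): .*? Error: (?P<error>.+?) @ "
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})"
)

def summarize(lines):
    """Count failures per method and per error, and return the time span covered."""
    methods, errors, times = Counter(), Counter(), []
    for line in lines:
        m = XMLRPC_RE.search(line)
        if m is None:
            continue
        methods[m["method"]] += 1
        errors[m["error"]] += 1
        times.append(datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S"))
    span = (min(times), max(times)) if times else None
    return methods, errors, span

# The first three notifications from the list above.
notices = [
    "A communications error occurred while attempting to call XMLRPC method host_firmware_version: Unable to connect to tls://X.X.X.252:443. Error: Operation timed out @ 2022-12-01 17:33:24",
    "A communications error occurred while attempting to call XMLRPC method exec_php: Unable to connect to tls://X.X.X.252:443. Error: Operation timed out @ 2022-12-01 17:33:26",
    "A communications error occurred while attempting to call XMLRPC method exec_php: Unable to connect to tls://X.X.X.252:443. Error: Operation timed out @ 2022-12-01 17:34:10",
]
methods, errors, span = summarize(notices)
print(dict(methods))  # {'host_firmware_version': 1, 'exec_php': 2}
print(dict(errors))   # {'Operation timed out': 3}
```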
              
                SteveITS Galactic Empire

                @marcolefo said in CARP switch Master/Backup every 15 minutes:

                no the backup become master (as the log says)

                But then there should be a log on the primary that it became backup? If you're saying it successfully/correctly moves?

                but on the GUI it's still BACKUP

                The log entries you posted are all within the same second so it would be tough to catch in the GUI.

                /etc/rc.filter_configure_sync/rc.filter_configure_sync

                Not /etc/rc.filter_configure_sync? That shorter path is a normal file and is from time based rules: https://forum.netgate.com/topic/137911/cron-job-etc-rc-filter_configure_sync. If that causes a connectivity break that could cause the flapping.

                Do you have a lot of rules? The System Patches package has a patch for https://redmine.pfsense.org/issues/12827.


                  marcolefo @SteveITS

                  @steveits We replaced the switch, with no change.

                  Perhaps an LACP problem?

                    marcolefo @SteveITS

                    @steveits said in CARP switch Master/Backup every 15 minutes:

                    /etc/rc.filter_configure_sync/rc.filter_configure_sync

                    Not /etc/rc.filter_configure_sync? That shorter path is a normal file and is from time based rules: https://forum.netgate.com/topic/137911/cron-job-etc-rc-filter_configure_sync. If that causes a connectivity break that could cause the flapping.

                    Yes, that was a typo on my part ;). The cron entry is exactly:

                    0,15,30,45 	* 	* 	* 	* 	root 	/etc/rc.filter_configure_sync
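The minute field `0,15,30,45` is what produces the 15-minute rhythm; the entry also has a user field (`root`) because it is in the system crontab format. As a quick check that the 21:45:12 CARP notifications from the first post coincide with this entry, here is a minimal cron-field expander in Python. It is a simplified sketch, not FreeBSD cron's own parser: it handles `*`, numbers, comma lists, `a-b` ranges and `/step` on `*` or a range, and `expand_field` is a hypothetical name.

```python
from datetime import time

def expand_field(field, lo, hi):
    """Expand one cron field into a sorted list of values.

    Handles '*', single numbers, comma lists, 'a-b' ranges, and a '/step'
    suffix on '*' or a range. Names such as 'mon' are not supported.
    """
    values = set()
    for part in field.split(","):
        base, _, step_s = part.partition("/")
        step = int(step_s) if step_s else 1
        if base == "*":
            start, end = lo, hi
        elif "-" in base:
            a, b = base.split("-", 1)
            start, end = int(a), int(b)
        elif step_s:
            raise ValueError(f"step needs '*' or a range: {part!r}")
        else:
            start = end = int(base)
        values.update(range(start, end + 1, step))
    return sorted(values)

# The entry quoted above: five time fields, then user, then command.
entry = "0,15,30,45\t*\t*\t*\t*\troot\t/etc/rc.filter_configure_sync"
minute_f, hour_f, dom_f, month_f, dow_f, user, command = entry.split()

minutes = expand_field(minute_f, 0, 59)  # [0, 15, 30, 45]
hours = expand_field(hour_f, 0, 23)      # 0..23

flap = time(21, 45, 12)  # timestamp of the CARP notifications in the first post
# Day of month, month and weekday are all '*' here, so minute and hour decide it.
print(flap.hour in hours and flap.minute in minutes)  # True
```

The same check returns False for a minute such as 52, so any outage found outside the quarter hours would point somewhere other than this cron job.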
                    

                    Do you have a lot of rules? The System Patches package has a patch for https://redmine.pfsense.org/issues/12827.

                    Yes, we have a lot of rules. I will take a look at this link now.

                      marcolefo @SteveITS

                      @steveits said in CARP switch Master/Backup every 15 minutes:

                      Do you have a lot of rules? The System Patches package has a patch for https://redmine.pfsense.org/issues/12827.

                      I am quite a noob... I can't find how to install the patch...

                        marcolefo @marcolefo

                        @marcolefo OK, RTFM: https://docs.netgate.com/pfsense/en/latest/development/system-patches.html

                        I need to sleep, sorry for the flood.

                          marcolefo @marcolefo

                          @steveits thanks a lot !

                          The patch works. No more lost pings, no more Zoom freezes. Happiness.

                          I have another problem: MASTER and BACKUP no longer sync. I will open another thread ;).

                          Thanks again.

                            marcolefo

                            OK, everything works now.
                            The sync problem was caused by a bad rule on the pfsync interface.
                            Thanks again for your help and have a nice weekend.

                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.