Netgate Discussion Forum

    [SOLVED] VIP fails over to slave but does not go back to master

    HA/CARP/VIPs
      darkconz

      I've searched the forum and the internet for two days and still cannot find a solution to my problem. This setup was working on 2.2; it was later upgraded to 2.3 and is now on 2.3.4.

      I have two VMs (master and slave), each with three interfaces (WAN/SYNC/LAN). This setup was working but recently broke, and I don't know why. The interface order matches on both VMs, all interface settings match, and net.inet.carp.demotion = 0.

      When both VMs are up, the master holds all VIPs and all traffic goes through it. When I run "tcpdump -i <ifname> -ttt -n proto CARP", I can clearly see the master sending CARP advertisements for its VIPs.
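
For reference, the fields tcpdump shows for each CARP advertisement (vhid, advbase, advskew, authlen, counter) come from a fixed 36-byte header carried directly in IP protocol 112. The sketch below decodes that header from raw bytes. It is a minimal illustration, assuming the layout of FreeBSD's `struct carp_header` (version and type nibbles, vhid, advskew, authlen, a pad byte, advbase, checksum, a two-word counter, and a 20-byte SHA1 HMAC); `parse_carp_header` and `CarpHeader` are hypothetical names, and the checksum and HMAC are not verified.

```python
import struct
from dataclasses import dataclass

CARP_HEADER_LEN = 36  # 16 bytes of fixed fields + 20-byte SHA1 HMAC


@dataclass
class CarpHeader:
    version: int   # high nibble of byte 0 (2 for CARPv2)
    type: int      # low nibble of byte 0 (1 = advertisement)
    vhid: int      # virtual host ID
    advskew: int   # advertisement skew (units of 1/256 s)
    authlen: int   # length of counter + HMAC in 32-bit words
    advbase: int   # base advertisement interval in seconds
    cksum: int     # header checksum (not verified here)
    counter: int   # two 32-bit words, read here as one 64-bit value
    md: bytes      # SHA1 HMAC (not verified here)


def parse_carp_header(payload: bytes) -> CarpHeader:
    """Decode a CARP header; `payload` starts right after the IP header."""
    if len(payload) < CARP_HEADER_LEN:
        raise ValueError(f"need {CARP_HEADER_LEN} bytes, got {len(payload)}")
    # Network byte order: 6 single bytes, a 16-bit checksum, a 64-bit counter.
    ver_type, vhid, advskew, authlen, _pad, advbase, cksum, counter = struct.unpack(
        "!6BHQ", payload[:16]
    )
    return CarpHeader(
        version=ver_type >> 4,
        type=ver_type & 0x0F,
        vhid=vhid,
        advskew=advskew,
        authlen=authlen,
        advbase=advbase,
        cksum=cksum,
        counter=counter,
        md=payload[16:CARP_HEADER_LEN],
    )


# Example: an advertisement for VHID 1 with advbase 1 and advskew 0 (values assumed)
sample = bytes([0x21, 1, 0, 7, 0, 1]) + b"\x00\x00" + (5).to_bytes(8, "big") + bytes(20)
hdr = parse_carp_header(sample)
print(hdr.vhid, hdr.advbase, hdr.advskew)  # prints: 1 1 0
```

An authlen of 7 corresponds to the 8-byte counter plus the 20-byte HMAC (28 bytes, seven 32-bit words).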

      When I hit the reboot button on the master, everything fails over to the slave perfectly. However, when the master comes back online, there is no internet connectivity. tcpdump shows the master advertising again, and there are no errors in the system log. In the system log I see the slave changing its status back to BACKUP and the master's CARP status changing back to MASTER with: carp: VHID 130@hn0: BACKUP -> MASTER (preempting a slower master).
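
The "preempting a slower master" message comes from CARP's election rule. The sketch below is a simplified model based on our reading of FreeBSD's CARP input logic, not on pfSense code: each node advertises every advbase + advskew/256 seconds (with `net.inet.carp.demotion` added to advskew and the total capped at 240), a MASTER that hears a strictly more frequent advertiser steps down, and a BACKUP with preemption enabled (`net.inet.carp.preempt`, which pfSense normally turns on) takes over from a slower master. The function names and the skew values 0 and 100 are assumptions for illustration; timers, tie handling, and authentication are left out.

```python
CARP_MAXSKEW = 240  # cap on advskew + demotion


def advert_interval(advbase: int, advskew: int, demotion: int = 0) -> float:
    """Seconds between advertisements: advbase + (advskew + demotion) / 256, skew capped."""
    skew = min(advskew + demotion, CARP_MAXSKEW)
    return advbase + skew / 256


def on_advertisement(state: str, own: float, peer: float, preempt: bool = True) -> str:
    """Next state of a node that hears a peer's advertisement (simplified, ties ignored)."""
    if state == "MASTER" and peer < own:
        # A strictly more frequent master exists: step down.
        return "BACKUP"
    if state == "BACKUP" and preempt and own < peer:
        # We advertise more often than the current master: take over.
        return "MASTER"
    return state


master_iv = advert_interval(advbase=1, advskew=0)    # assumed master settings
slave_iv = advert_interval(advbase=1, advskew=100)   # assumed slave settings

# After failover the slave is MASTER; the rebooted master starts as BACKUP.
print(on_advertisement("BACKUP", own=master_iv, peer=slave_iv))  # prints: MASTER
print(on_advertisement("MASTER", own=slave_iv, peer=master_iv))  # prints: BACKUP
```

With these assumed values the two calls reproduce the transitions described above: the master preempts the slower slave, and the slave returns to BACKUP. A nonzero demotion raises a node's effective skew, which is why net.inet.carp.demotion = 0 matters.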

      The only way to get everything back in order is to turn off BOTH the master and the slave, then turn on the master first, followed by the slave, so the master has enough time to claim master status.

      In most of the cases I've found here, failover itself doesn't work. In my case, the failing master does pass the baton to the slave. However, when the master returns, it claims master status and the slave goes back to backup, but the VIP is not reachable.

      I think this only affects my LAN VIP, not my WAN VIPs: I can still connect to the VPN, I just can't get anywhere because the LAN VIP (which acts as the gateway for the whole internal network) is not responding.

        darkconz

        Just an update: I did a "wait" test, and there seems to be a 15-minute timer somewhere. After I reboot the master and lose internet connectivity, it resumes after 15 minutes.

          darkconz

          Another thing I tried: after connectivity resumed at the 15-minute mark, I changed something on the master and hit Apply, which caused the master VM to freeze. tcpdump on the slave VM showed nothing being broadcast.

            darkconz

            Here are the system logs from when I applied the new settings and caused both systems to panic:

            Master VM:

            
            06/01/2017 8:19	php-fpm	99885	/rc.filter_synchronize: Beginning XMLRPC sync to https://"SYNC.INTERFACE":443.
            06/01/2017 8:19	check_reload_status		Syncing firewall
            06/01/2017 8:20	check_reload_status		Syncing firewall
            06/01/2017 8:20	check_reload_status		Syncing firewall
            06/01/2017 8:20	php-fpm	7014	/rc.filter_synchronize: Beginning XMLRPC sync to https://"SYNC.INTERFACE":443.
            06/01/2017 8:20	php-fpm	15977	/rc.filter_synchronize: Beginning XMLRPC sync to https://"SYNC.INTERFACE":443.
            06/01/2017 8:20	php-fpm	16151	/rc.filter_synchronize: Beginning XMLRPC sync to https://"SYNC.INTERFACE":443.
            06/01/2017 8:20	php-fpm	99885	/rc.filter_synchronize: XMLRPC sync successfully completed with https://"SYNC.INTERFACE":443.
            06/01/2017 8:20	php-fpm	7014	/rc.filter_synchronize: XMLRPC sync successfully completed with https://"SYNC.INTERFACE":443.
            06/01/2017 8:21	php-fpm	15977	/rc.filter_synchronize: XML_RPC_Client: RPC server did not send response before timeout. 103
            06/01/2017 8:21	php-fpm	15977	/rc.filter_synchronize: A communications error occurred while attempting XMLRPC sync with username admin https://"SYNC.INTERFACE":443.
            06/01/2017 8:21	php-fpm	15977	/rc.filter_synchronize: New alert found: A communications error occurred while attempting XMLRPC sync with username admin https://"SYNC.INTERFACE":443.
            06/01/2017 8:21	php-fpm	15977	/rc.filter_synchronize: Beginning XMLRPC sync to https://"SYNC.INTERFACE":443.
            06/01/2017 8:21	php-fpm	16151	/rc.filter_synchronize: XML_RPC_Client: RPC server did not send response before timeout. 103
            06/01/2017 8:21	php-fpm	16151	/rc.filter_synchronize: A communications error occurred while attempting XMLRPC sync with username admin https://"SYNC.INTERFACE":443.
            06/01/2017 8:21	php-fpm	16151	/rc.filter_synchronize: New alert found: A communications error occurred while attempting XMLRPC sync with username admin https://"SYNC.INTERFACE":443.
            06/01/2017 8:21	php-fpm	16151	/rc.filter_synchronize: Beginning XMLRPC sync to https://"SYNC.INTERFACE":443.
            06/01/2017 8:21	kernel		hn1: promiscuous mode disabled
            06/01/2017 8:21	check_reload_status		Carp backup event
            06/01/2017 8:21	php-fpm	99885	/rc.filter_synchronize: XML_RPC_Client: RPC server did not send response before timeout. 103
            06/01/2017 8:21	php-fpm	99885	/rc.filter_synchronize: A communications error occurred while attempting Filter sync with username admin https://"SYNC.INTERFACE":443.
            06/01/2017 8:21	php-fpm	99885	/rc.filter_synchronize: New alert found: A communications error occurred while attempting Filter sync with username admin https://"SYNC.INTERFACE":443.
            
            

            Slave VM:

            
            06/01/2017 8:20	check_reload_status		Syncing firewall
            06/01/2017 8:20	check_reload_status		Carp backup event
            06/01/2017 8:20	kernel		ifa_del_loopback_route: deletion failed: 3
            06/01/2017 8:20	kernel		carp: VHID 1@hn1: INIT -> BACKUP
            06/01/2017 8:20	check_reload_status		Carp backup event
            06/01/2017 8:20	check_reload_status		Carp backup event
            06/01/2017 8:20	kernel		ifa_del_loopback_route: deletion failed: 3
            06/01/2017 8:20	kernel		hn1: promiscuous mode disabled
            06/01/2017 8:20	php-fpm	44358	/rc.carpbackup: HA cluster member "("LAN CARP VIP"@hn1): (LAN)" has resumed CARP state "BACKUP" for vhid 1
            06/01/2017 8:20	php-fpm	44358	/rc.carpbackup: HA cluster member "("LAN CARP VIP"@hn1): (LAN)" has resumed CARP state "BACKUP" for vhid 1
            06/01/2017 8:20	php-fpm	48909	/xmlrpc.php: waiting for pfsync...
            06/01/2017 8:21	php-fpm	48909	/xmlrpc.php: pfsync done in 30 seconds.
            06/01/2017 8:21	php-fpm	48909	/xmlrpc.php: Configuring CARP settings finalize...
            06/01/2017 8:21	check_reload_status		Syncing firewall
            06/01/2017 8:21	kernel		ifa_del_loopback_route: deletion failed: 3
            06/01/2017 8:21	kernel		carp: VHID 130@hn0: INIT -> BACKUP
            06/01/2017 8:21	kernel		hn1: promiscuous mode enabled
            06/01/2017 8:21	kernel		carp: VHID 1@hn1: INIT -> BACKUP
            06/01/2017 8:21	kernel		ifa_del_loopback_route: deletion failed: 3
            06/01/2017 8:21	kernel		ifa_del_loopback_route: deletion failed: 3
            06/01/2017 8:21	kernel		hn1: promiscuous mode disabled
            06/01/2017 8:21	check_reload_status		Carp backup event
            06/01/2017 8:21	check_reload_status		Carp backup event
            06/01/2017 8:21	check_reload_status		Carp backup event
            06/01/2017 8:21	check_reload_status		Carp backup event
            06/01/2017 8:21	check_reload_status		Carp backup event
            06/01/2017 8:21	php-fpm	55680	/xmlrpc.php: waiting for pfsync...
            06/01/2017 8:21	php-fpm	55680	/xmlrpc.php: pfsync done in 30 seconds.
            06/01/2017 8:21	php-fpm	55680	/xmlrpc.php: Configuring CARP settings finalize...
            06/01/2017 8:21	check_reload_status		Syncing firewall
            06/01/2017 8:21	kernel		carp: VHID 130@hn0: INIT -> BACKUP
            06/01/2017 8:21	kernel		hn1: promiscuous mode enabled
            06/01/2017 8:21	kernel		carp: VHID 1@hn1: INIT -> BACKUP
            06/01/2017 8:21	kernel		ifa_del_loopback_route: deletion failed: 3
            06/01/2017 8:21	kernel		carp: VHID 135@hn0: INIT -> BACKUP
            06/01/2017 8:21	check_reload_status		Carp backup event
            06/01/2017 8:21	kernel		ifa_del_loopback_route: deletion failed: 3
            06/01/2017 8:21	check_reload_status		Carp backup event
            06/01/2017 8:21	kernel		ifa_del_loopback_route: deletion failed: 3
            06/01/2017 8:21	check_reload_status		Carp backup event
            06/01/2017 8:21	kernel		hn1: promiscuous mode disabled
            06/01/2017 8:21	kernel		ifa_del_loopback_route: deletion failed: 3
            06/01/2017 8:21	check_reload_status		Carp backup event
            06/01/2017 8:21	check_reload_status		Carp backup event
            06/01/2017 8:21	check_reload_status		Carp backup event
            06/01/2017 8:21	check_reload_status		Carp backup event
            06/01/2017 8:21	php-fpm	55680	/rc.carpbackup: HA cluster member "(xxx.xxx.xxx.130@hn0): (WAN)" has resumed CARP state "BACKUP" for vhid 130
            06/01/2017 8:21	php-fpm	55680	/rc.carpbackup: HA cluster member "("LAN CARP VIP"@hn1): (LAN)" has resumed CARP state "BACKUP" for vhid 1
            06/01/2017 8:21	php-fpm	55680	/rc.carpbackup: HA cluster member "(xxx.xxx.xxx.130@hn0): (WAN)" has resumed CARP state "BACKUP" for vhid 130
            06/01/2017 8:21	php-fpm	55680	/rc.carpbackup: HA cluster member "("LAN CARP VIP"@hn1): (LAN)" has resumed CARP state "BACKUP" for vhid 1
            06/01/2017 8:22	php-fpm	63143	/xmlrpc.php: waiting for pfsync...
            06/01/2017 8:22	kernel		carp: VHID 97@hn0: BACKUP -> MASTER (master down)
            06/01/2017 8:22	check_reload_status		Carp master event
            06/01/2017 8:22	php-fpm	63143	/xmlrpc.php: pfsync done in 30 seconds.
            06/01/2017 8:22	php-fpm	63143	/xmlrpc.php: Configuring CARP settings finalize...
            06/01/2017 8:22	check_reload_status		Syncing firewall
            06/01/2017 8:22	kernel		carp: VHID 130@hn0: INIT -> BACKUP
            06/01/2017 8:22	kernel		hn1: promiscuous mode enabled
            06/01/2017 8:22	kernel		carp: VHID 1@hn1: INIT -> BACKUP
            06/01/2017 8:22	kernel		carp: VHID 135@hn0: INIT -> BACKUP
            06/01/2017 8:22	check_reload_status		Carp backup event
            06/01/2017 8:22	check_reload_status		Carp backup event
            06/01/2017 8:22	check_reload_status		Carp backup event
            06/01/2017 8:22	kernel		carp: VHID 140@hn0: INIT -> BACKUP
            06/01/2017 8:22	check_reload_status		Carp backup event
            06/01/2017 8:22	check_reload_status		Carp backup event
            06/01/2017 8:22	check_reload_status		Carp backup event
            06/01/2017 8:22	kernel		ifa_del_loopback_route: deletion failed: 3
            06/01/2017 8:22	check_reload_status		Carp backup event
            06/01/2017 8:22	kernel		ifa_del_loopback_route: deletion failed: 3
            06/01/2017 8:22	check_reload_status		Carp backup event
            06/01/2017 8:22	kernel		hn1: promiscuous mode disabled
            06/01/2017 8:22	kernel		ifa_del_loopback_route: deletion failed: 3
            06/01/2017 8:22	check_reload_status		Carp backup event
            06/01/2017 8:22	kernel		ifa_del_loopback_route: deletion failed: 3
            06/01/2017 8:22	php-fpm	63143	/rc.carpbackup: HA cluster member "(xxx.xxx.xxx.130@hn0): (WAN)" has resumed CARP state "BACKUP" for vhid 130
            06/01/2017 8:22	kernel		hn0: promiscuous mode disabled
            06/01/2017 8:22	php-fpm	63143	/rc.carpbackup: HA cluster member "("LAN CARP VIP"@hn1): (LAN)" has resumed CARP state "BACKUP" for vhid 1
            06/01/2017 8:22	php-fpm	63143	/rc.carpbackup: HA cluster member "(xxx.xxx.xxx.135@hn0): (WAN)" has resumed CARP state "BACKUP" for vhid 135
            06/01/2017 8:22	php-fpm	63143	/rc.carpbackup: HA cluster member "(xxx.xxx.xxx.130@hn0): (WAN)" has resumed CARP state "BACKUP" for vhid 130
            06/01/2017 8:22	php-fpm	63143	/rc.carpbackup: HA cluster member "("LAN CARP VIP"@hn1): (LAN)" has resumed CARP state "BACKUP" for vhid 1
            06/01/2017 8:22	php-fpm	63143	/rc.carpbackup: HA cluster member "(xxx.xxx.xxx.135@hn0): (WAN)" has resumed CARP state "BACKUP" for vhid 135
            06/01/2017 8:22	php-fpm	63143	/rc.carpbackup: HA cluster member "(xxx.xxx.xxx.130@hn0): (WAN)" has resumed CARP state "BACKUP" for vhid 130
            06/01/2017 8:22	php-fpm	63143	/rc.carpbackup: HA cluster member "("LAN CARP VIP"@hn1): (LAN)" has resumed CARP state "BACKUP" for vhid 1
            06/01/2017 8:22	php-fpm	63143	/rc.carpbackup: HA cluster member "(xxx.xxx.xxx.135@hn0): (WAN)" has resumed CARP state "BACKUP" for vhid 135
            06/01/2017 8:22	php-fpm	63143	/rc.carpbackup: HA cluster member "(xxx.xxx.xxx.140@hn0): (WAN)" has resumed CARP state "BACKUP" for vhid 140
            06/01/2017 8:22	php-fpm	63143	/rc.carpbackup: HA cluster member "(xxx.xxx.xxx.130@hn0): (WAN)" has resumed CARP state "BACKUP" for vhid 130
            06/01/2017 8:22	php-fpm	63143	/rc.carpbackup: HA cluster member "("LAN CARP VIP"@hn1): (LAN)" has resumed CARP state "BACKUP" for vhid 1
            06/01/2017 8:22	php-fpm	63143	/rc.carpbackup: HA cluster member "(xxx.xxx.xxx.135@hn0): (WAN)" has resumed CARP state "BACKUP" for vhid 135
            06/01/2017 8:22	php-fpm	63143	/rc.carpbackup: HA cluster member "(xxx.xxx.xxx.140@hn0): (WAN)" has resumed CARP state "BACKUP" for vhid 140
            06/01/2017 8:22	php-fpm	44358	/xmlrpc.php: waiting for pfsync...
            06/01/2017 8:22	php-fpm	44358	/xmlrpc.php: pfsync done in 0 seconds.
            06/01/2017 8:22	php-fpm	44358	/xmlrpc.php: Configuring CARP settings finalize...
            06/01/2017 8:22	php-fpm	48909	/xmlrpc.php: ROUTING: setting default route to xxx.xxx.xxx.129
            06/01/2017 8:22	php-fpm	48909	/xmlrpc.php: Resyncing OpenVPN instances.
            06/01/2017 8:22	check_reload_status		Reloading filter
            06/01/2017 8:23	kernel		hn1: promiscuous mode enabled
            06/01/2017 8:26	kernel		hn1: promiscuous mode disabled
            06/01/2017 8:26	php-cgi		rc.initial.halt: Stopping all packages.
            06/01/2017 8:26	shutdown		power-down by root:
            
            

            hn0 = WAN
            hn1 = LAN
            hn2 = SYNC

            I noticed the interfaces repeatedly switching promiscuous mode on and off.
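
With this many CARP events it is easy to lose track of which node ends up owning which VHID. The sketch below reduces the kernel's `carp:` lines to the most recent state per VHID and interface, labeling each interface with the role from the hn0/hn1/hn2 list above. It is a minimal sketch, assuming the log is exported as plain text in the format quoted here; `last_carp_states`, `CARP_RE`, and `ROLES` are hypothetical names.

```python
import re

# Matches kernel messages such as "carp: VHID 97@hn0: BACKUP -> MASTER (master down)"
CARP_RE = re.compile(
    r"carp: VHID (?P<vhid>\d+)@(?P<iface>\w+): (?P<old>\w+) -> (?P<new>\w+)"
    r"(?: \((?P<reason>[^)]*)\))?"
)

# Interface roles as listed in this post
ROLES = {"hn0": "WAN", "hn1": "LAN", "hn2": "SYNC"}


def last_carp_states(lines):
    """Return {(vhid, iface): (state, reason)} for the latest transition of each VHID."""
    states = {}
    for line in lines:
        m = CARP_RE.search(line)
        if m:
            # Later lines overwrite earlier ones, so the final entry wins.
            states[(int(m["vhid"]), m["iface"])] = (m["new"], m["reason"])
    return states


# A few lines copied from the slave log above; non-CARP lines are ignored.
log_lines = [
    "06/01/2017 8:22\tkernel\t\tcarp: VHID 97@hn0: BACKUP -> MASTER (master down)",
    "06/01/2017 8:22\tkernel\t\tcarp: VHID 130@hn0: INIT -> BACKUP",
    "06/01/2017 8:22\tkernel\t\tcarp: VHID 1@hn1: INIT -> BACKUP",
    "06/01/2017 8:22\tkernel\t\thn1: promiscuous mode disabled",
]

for (vhid, iface), (state, reason) in sorted(last_carp_states(log_lines).items()):
    print(vhid, iface, ROLES.get(iface, "?"), state, reason or "")
```

Running the same function over the full logs from both VMs gives a quick side-by-side view of which member believes it is MASTER for each VIP.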

              dotdash

              You may want to check in the Virtualization forum to confirm your hypervisor settings are correct.

                darkconz

                Just updating the status of the issue here. As I remember it, when this was working, both VMs ran on hosts with the same version of Hyper-V. However, a few months ago I moved the slave to a new host, so the two VMs ended up on different Hyper-V host versions (one 2016 and one 2012 R2). I moved the master to the new host, and now failover and failback both work fine.

                I'm still not sure whether it works now because both VMs reside on the same host or because they are no longer on different host versions. I will upgrade one of my older nodes to 2016 and run this test.

                Will report back once I have an update.

                  darkconz

                  Just reporting back my success.

                  I upgraded a temporary node to the same version as my other node, moved the slave VM over, and tested failover. It worked in both directions.

                  I later checked the event logs on my host and saw that the VM's integration software was incompatible with the host.

                  All this trouble, and it was because the VM on that node didn't have the right version of the integration software.

                  I hope this helps others too. If you are running pfSense in a VM, check the integration software and make sure the correct version is installed. When you migrate VMs back and forth, it's easy to lose track of the software version, and it may not be compatible with the host's version.

                  Thank you.

                  Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.