Netgate Discussion Forum
    • Categories
    • Recent
    • Tags
    • Popular
    • Users
    • Search
    • Register
    • Login

    Long Delays in CARP WAN Failover

    Scheduled Pinned Locked Moved HA/CARP/VIPs
    17 Posts 4 Posters 3.7k Views
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • R
      Ranch
      last edited by

      Sent PM with CAP

      1 Reply Last reply Reply Quote 0
      • DerelictD
        Derelict LAYER 8 Netgate
        last edited by

        What version are you running that it isn't decoding CARP as CARP  but VRRP in the packet capture?

        Please do this:

        On the SECONDARY node in BACKUP status, start a packet capture. Using Protocol, select CARP. Set the Count to something like 10000 so it doesn't stop.

        Note the time to the second if possible and unplug WAN on the primary.

        Let it fail over completely.

        Note the time to the second and reconnect WAN on Primary.

        You should see:

        CARP from the primary

        CARP from the primary should stop immediately after you unplug.

        CARP from the secondary should start nearly immediately after that.

        CARP from the primary should start nearly immediately after you plug it back in.

        CARP from the secondary should stop immediately after that.

        Problems like this are almost always something with the switch.

        Also include the system log on both nodes for the same time period.

        Happy to look at the details in a PM. The one you provided didn't really show anything other than CARP from the secondary for 10 seconds.

        It should look something like this:

        07:06:58.037963 IP 172.25.228.18 > 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=0 authlen=7 counter=17557958983656044908
        07:06:59.039006 IP 172.25.228.18 > 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=0 authlen=7 counter=17557958983656044908
        07:07:00.040021 IP 172.25.228.18 > 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=0 authlen=7 counter=17557958983656044908
        07:07:01.040995 IP 172.25.228.18 > 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=0 authlen=7 counter=17557958983656044908
        07:07:02.041957 IP 172.25.228.18 > 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=0 authlen=7 counter=17557958983656044908
        07:07:03.043097 IP 172.25.228.18 > 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=0 authlen=7 counter=17557958983656044908
        07:07:06.434500 IP 172.25.228.19 > 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=100 authlen=7 counter=15385611900255070985
        07:07:07.826450 IP 172.25.228.19 > 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=100 authlen=7 counter=15385611900255070985
        07:07:09.218469 IP 172.25.228.19 > 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=100 authlen=7 counter=15385611900255070985
        07:07:10.610462 IP 172.25.228.19 > 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=100 authlen=7 counter=15385611900255070985
        07:07:12.002465 IP 172.25.228.19 > 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=100 authlen=7 counter=15385611900255070985
        07:07:13.394449 IP 172.25.228.19 > 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=100 authlen=7 counter=15385611900255070985
        07:07:14.786470 IP 172.25.228.19 > 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=100 authlen=7 counter=15385611900255070985
        07:07:16.178465 IP 172.25.228.19 > 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=100 authlen=7 counter=15385611900255070985
        07:07:17.570444 IP 172.25.228.19 > 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=100 authlen=7 counter=15385611900255070985
        07:07:17.571449 IP 172.25.228.18 > 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=0 authlen=7 counter=7980853803627081614
        07:07:18.571896 IP 172.25.228.18 > 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=0 authlen=7 counter=7980853803627081614
        07:07:19.572983 IP 172.25.228.18 > 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=0 authlen=7 counter=7980853803627081614
        07:07:20.573897 IP 172.25.228.18 > 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=0 authlen=7 counter=7980853803627081614
        07:07:21.575048 IP 172.25.228.18 > 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=0 authlen=7 counter=7980853803627081614

        Chattanooga, Tennessee, USA
        A comprehensive network diagram is worth 10,000 words and 15 conference calls.
        DO NOT set a source address/port in a port forward or firewall rule unless you KNOW you need it!
        Do Not Chat For Help! NO_WAN_EGRESS(TM)

        1 Reply Last reply Reply Quote 0
        • R
          Ranch
          last edited by

          @Derelict:

          What version are you running that it isn't decoding CARP as CARP  but VRRP in the packet capture?

          2.3.2-RELEASE-p1

          I will do the test steps today during lunch and let you know.  What is the best way to get you the log files?

          1 Reply Last reply Reply Quote 0
          • R
            Ranch
            last edited by

            Pulled the WAN from Node1 for 30 seconds and interestingly enough it failed over in about 20 seconds which is WAY less time than usual.

            Things I noted from the test Node 1 System log nor the CAP seems to even register the CARP event.

            Attached is the log files.

            CAPS_LOGS.txt

            1 Reply Last reply Reply Quote 0
            • DerelictD
              Derelict LAYER 8 Netgate
              last edited by

              The logs from the two nodes aren't from the same time period.

              Chattanooga, Tennessee, USA
              A comprehensive network diagram is worth 10,000 words and 15 conference calls.
              DO NOT set a source address/port in a port forward or firewall rule unless you KNOW you need it!
              Do Not Chat For Help! NO_WAN_EGRESS(TM)

              1 Reply Last reply Reply Quote 0
              • R
                Ranch
                last edited by

                Sorry didn't catch that Node1 was wrong it is listed below now for 11:14:59-11:22:09

                The others were right, right?
                Node2 Log 11:20:39-11:21:28
                Node1 CAP 11:20:28-11:21:54
                Node2 CAP 11:20:26-11:21:52

                Nov 10 11:22:09 check_reload_status updating dyndns WANGW
                Nov 10 11:21:41 php-fpm 26622 /rc.carpmaster: HA cluster member "(10.41.41.1@em0_vlan41): (CU_STUDENT_VLAN41)" has resumed CARP state "MASTER" for vhid 13
                Nov 10 11:21:41 php-fpm 26622 /rc.carpmaster: Message sent to jwilhite@raschools.com OK
                Nov 10 11:21:39 php-fpm 18153 /rc.carpmaster: HA cluster member "(67.239.73.180@em1): (WAN)" has resumed CARP state "MASTER" for vhid 10
                Nov 10 11:21:39 php-fpm 18153 /rc.carpmaster: Message sent to jwilhite@raschools.com OK
                Nov 10 11:21:39 php-fpm 26987 /rc.carpmaster: HA cluster member "(10.41.31.1@em0_vlan31): (CU_STAFF_VLAN31)" has resumed CARP state "MASTER" for vhid 12
                Nov 10 11:21:39 php-fpm 26987 /rc.carpmaster: Message sent to jwilhite@raschools.com OK
                Nov 10 11:21:37 php-fpm 24092 /rc.carpmaster: HA cluster member "(10.41.43.1@em0): (LAN)" has resumed CARP state "MASTER" for vhid 11
                Nov 10 11:21:37 php-fpm 24092 /rc.carpmaster: Message sent to jwilhite@raschools.com OK
                Nov 10 11:21:35 xinetd 17858 Reconfigured: new=0 old=23 dropped=0 (services)
                Nov 10 11:21:35 xinetd 17858 readjusting service 19010-udp
                Nov 10 11:21:35 xinetd 17858 readjusting service 19010-tcp
                Nov 10 11:21:35 xinetd 17858 readjusting service 19009-udp
                Nov 10 11:21:35 xinetd 17858 readjusting service 19009-tcp
                Nov 10 11:21:35 xinetd 17858 readjusting service 19008-udp
                Nov 10 11:21:35 xinetd 17858 readjusting service 19008-tcp
                Nov 10 11:21:35 xinetd 17858 readjusting service 19007-udp
                Nov 10 11:21:35 xinetd 17858 readjusting service 19007-tcp
                Nov 10 11:21:35 xinetd 17858 readjusting service 19006-udp
                Nov 10 11:21:35 xinetd 17858 readjusting service 19006-tcp
                Nov 10 11:21:35 xinetd 17858 readjusting service 19005-udp
                Nov 10 11:21:35 xinetd 17858 readjusting service 19005-tcp
                Nov 10 11:21:35 xinetd 17858 readjusting service 19004-udp
                Nov 10 11:21:35 xinetd 17858 readjusting service 19004-tcp
                Nov 10 11:21:35 xinetd 17858 readjusting service 19003-udp
                Nov 10 11:21:35 xinetd 17858 readjusting service 19003-tcp
                Nov 10 11:21:35 xinetd 17858 readjusting service 19002-udp
                Nov 10 11:21:35 xinetd 17858 readjusting service 19002-tcp
                Nov 10 11:21:35 xinetd 17858 readjusting service 19001-udp
                Nov 10 11:21:35 xinetd 17858 readjusting service 19001-tcp
                Nov 10 11:21:35 xinetd 17858 readjusting service 19000-udp
                Nov 10 11:21:35 xinetd 17858 readjusting service 19000-tcp
                Nov 10 11:21:35 xinetd 17858 readjusting service 6969-udp
                Nov 10 11:21:35 xinetd 17858 Swapping defaults
                Nov 10 11:21:35 xinetd 17858 Starting reconfiguration
                Nov 10 11:21:35 check_reload_status Reloading filter
                Nov 10 11:21:35 php-fpm 18153 /rc.newwanip: Configuring CARP settings finalize…
                Nov 10 11:21:35 php-fpm 18153 /rc.newwanip: pfsync done in 0 seconds.
                Nov 10 11:21:35 php-fpm 18153 /rc.newwanip: waiting for pfsync...
                Nov 10 11:21:34 php-fpm 18153 /rc.newwanip: rc.newwanip: on (IP address: 67.239.73.182) (interface: WAN[wan]) (real interface: em1).
                Nov 10 11:21:34 php-fpm 18153 /rc.newwanip: rc.newwanip: Info: starting on em1.
                Nov 10 11:21:34 php-fpm 18153 /rc.carpbackup: HA cluster member "(67.239.73.180@em1): (WAN)" has resumed CARP state "BACKUP" for vhid 10
                Nov 10 11:21:34 php-fpm 18153 /rc.carpbackup: Message sent to jwilhite@raschools.com OK
                Nov 10 11:21:29 kernel ifa_add_loopback_route: insertion failed: 17
                Nov 10 11:21:29 kernel carp: VHID 10@em1: BACKUP -> MASTER (master down)
                Nov 10 11:21:29 check_reload_status Carp master event
                Nov 10 11:21:28 xinetd 17858 Reconfigured: new=0 old=23 dropped=0 (services)
                Nov 10 11:21:27 xinetd 17858 readjusting service 19010-udp
                Nov 10 11:21:27 xinetd 17858 readjusting service 19010-tcp
                Nov 10 11:21:27 xinetd 17858 readjusting service 19009-udp
                Nov 10 11:21:27 xinetd 17858 readjusting service 19009-tcp
                Nov 10 11:21:27 xinetd 17858 readjusting service 19008-udp
                Nov 10 11:21:27 xinetd 17858 readjusting service 19008-tcp
                Nov 10 11:21:27 xinetd 17858 readjusting service 19007-udp
                Nov 10 11:21:27 xinetd 17858 readjusting service 19007-tcp
                Nov 10 11:21:27 xinetd 17858 readjusting service 19006-udp
                Nov 10 11:21:27 xinetd 17858 readjusting service 19006-tcp
                Nov 10 11:21:27 xinetd 17858 readjusting service 19005-udp
                Nov 10 11:21:27 xinetd 17858 readjusting service 19005-tcp
                Nov 10 11:21:27 xinetd 17858 readjusting service 19004-udp
                Nov 10 11:21:27 xinetd 17858 readjusting service 19004-tcp
                Nov 10 11:21:27 xinetd 17858 readjusting service 19003-udp
                Nov 10 11:21:27 xinetd 17858 readjusting service 19003-tcp
                Nov 10 11:21:27 xinetd 17858 readjusting service 19002-udp
                Nov 10 11:21:27 xinetd 17858 readjusting service 19002-tcp
                Nov 10 11:21:27 xinetd 17858 readjusting service 19001-udp
                Nov 10 11:21:27 xinetd 17858 readjusting service 19001-tcp
                Nov 10 11:21:27 xinetd 17858 readjusting service 19000-udp
                Nov 10 11:21:27 xinetd 17858 readjusting service 19000-tcp
                Nov 10 11:21:27 xinetd 17858 readjusting service 6969-udp
                Nov 10 11:21:27 xinetd 17858 Swapping defaults
                Nov 10 11:21:27 xinetd 17858 Starting reconfiguration
                Nov 10 11:21:27 check_reload_status Reloading filter
                Nov 10 11:21:27 check_reload_status rc.newwanip starting em1
                Nov 10 11:21:27 php-fpm 24092 /rc.linkup: Hotplug event detected for WAN(wan) static IP (67.239.73.182 )
                Nov 10 11:21:27 check_reload_status Carp master event
                Nov 10 11:21:27 check_reload_status Carp master event
                Nov 10 11:21:27 check_reload_status Carp master event
                Nov 10 11:21:27 kernel carp: VHID 12@em0_vlan31: BACKUP -> MASTER (preempting a slower master)
                Nov 10 11:21:27 kernel carp: VHID 13@em0_vlan41: BACKUP -> MASTER (preempting a slower master)
                Nov 10 11:21:27 kernel carp: VHID 11@em0: BACKUP -> MASTER (preempting a slower master)
                Nov 10 11:21:26 check_reload_status Reloading filter
                Nov 10 11:21:26 check_reload_status Restarting OpenVPN tunnels/interfaces
                Nov 10 11:21:26 check_reload_status Restarting ipsec tunnels
                Nov 10 11:21:26 check_reload_status updating dyndns WANGW
                Nov 10 11:21:26 check_reload_status Linkup starting em1
                Nov 10 11:21:26 kernel em1: link state changed to UP
                Nov 10 11:21:26 kernel carp: demoted by -240 to 0 (interface up)
                Nov 10 11:21:26 kernel carp: VHID 10@em1: INIT -> BACKUP
                Nov 10 11:21:26 check_reload_status Carp backup event
                Nov 10 11:21:10 php-fpm 17632 /rc.carpbackup: HA cluster member "(10.41.43.1@em0): (LAN)" has resumed CARP state "BACKUP" for vhid 11
                Nov 10 11:21:10 php-fpm 18153 /rc.carpbackup: HA cluster member "(10.41.31.1@em0_vlan31): (CU_STAFF_VLAN31)" has resumed CARP state "BACKUP" for vhid 12
                Nov 10 11:21:10 php-fpm 17941 /rc.carpbackup: HA cluster member "(10.41.41.1@em0_vlan41): (CU_STUDENT_VLAN41)" has resumed CARP state "BACKUP" for vhid 13
                Nov 10 11:21:10 php-fpm 17632 /rc.carpbackup: Message sent to jwilhite@raschools.com OK
                Nov 10 11:21:10 php-fpm 18153 /rc.carpbackup: Message sent to jwilhite@raschools.com OK
                Nov 10 11:21:10 php-fpm 17941 /rc.carpbackup: Message sent to jwilhite@raschools.com OK
                Nov 10 11:21:07 xinetd 17858 Reconfigured: new=0 old=23 dropped=0 (services)
                Nov 10 11:21:07 xinetd 17858 readjusting service 19010-udp
                Nov 10 11:21:07 xinetd 17858 readjusting service 19010-tcp
                Nov 10 11:21:07 xinetd 17858 readjusting service 19009-udp
                Nov 10 11:21:07 xinetd 17858 readjusting service 19009-tcp
                Nov 10 11:21:07 xinetd 17858 readjusting service 19008-udp
                Nov 10 11:21:07 xinetd 17858 readjusting service 19008-tcp
                Nov 10 11:21:07 xinetd 17858 readjusting service 19007-udp
                Nov 10 11:21:07 xinetd 17858 readjusting service 19007-tcp
                Nov 10 11:21:07 xinetd 17858 readjusting service 19006-udp
                Nov 10 11:21:07 xinetd 17858 readjusting service 19006-tcp
                Nov 10 11:21:07 xinetd 17858 readjusting service 19005-udp
                Nov 10 11:21:07 xinetd 17858 readjusting service 19005-tcp
                Nov 10 11:21:07 xinetd 17858 readjusting service 19004-udp
                Nov 10 11:21:07 xinetd 17858 readjusting service 19004-tcp
                Nov 10 11:21:07 xinetd 17858 readjusting service 19003-udp
                Nov 10 11:21:07 xinetd 17858 readjusting service 19003-tcp
                Nov 10 11:21:07 xinetd 17858 readjusting service 19002-udp
                Nov 10 11:21:07 xinetd 17858 readjusting service 19002-tcp
                Nov 10 11:21:07 xinetd 17858 readjusting service 19001-udp
                Nov 10 11:21:07 xinetd 17858 readjusting service 19001-tcp
                Nov 10 11:21:07 xinetd 17858 readjusting service 19000-udp
                Nov 10 11:21:07 xinetd 17858 readjusting service 19000-tcp
                Nov 10 11:21:07 xinetd 17858 readjusting service 6969-udp
                Nov 10 11:21:07 xinetd 17858 Swapping defaults
                Nov 10 11:21:07 xinetd 17858 Starting reconfiguration
                Nov 10 11:21:07 php-fpm 15607 /rc.carpbackup: HA cluster member "(67.239.73.180@em1): (WAN)" has resumed CARP state "BACKUP" for vhid 10
                Nov 10 11:21:07 php-fpm 15607 /rc.carpbackup: Message sent to jwilhite@raschools.com OK
                Nov 10 11:21:00 check_reload_status Reloading filter
                Nov 10 11:21:00 php-fpm 17632 /rc.linkup: Hotplug event detected for WAN(wan) static IP (67.239.73.182 )
                Nov 10 11:20:59 check_reload_status Carp backup event
                Nov 10 11:20:59 check_reload_status Carp backup event
                Nov 10 11:20:59 check_reload_status Carp backup event
                Nov 10 11:20:59 check_reload_status Linkup starting em1
                Nov 10 11:20:59 check_reload_status Carp backup event
                Nov 10 11:20:59 kernel carp: VHID 12@em0_vlan31: MASTER -> BACKUP (more frequent advertisement received)
                Nov 10 11:20:59 kernel carp: VHID 13@em0_vlan41: MASTER -> BACKUP (more frequent advertisement received)
                Nov 10 11:20:59 kernel carp: VHID 11@em0: MASTER -> BACKUP (more frequent advertisement received)
                Nov 10 11:20:59 kernel em1: link state changed to DOWN
                Nov 10 11:20:59 kernel carp: demoted by 240 to 240 (interface down)
                Nov 10 11:20:59 kernel em1: RX Next to Refresh = 348
                Nov 10 11:20:59 kernel em1: RX Next to Check = 349
                Nov 10 11:20:59 kernel em1: RX discarded packets = 0
                Nov 10 11:20:59 kernel em1: hw rdh = 349, hw rdt = 348
                Nov 10 11:20:59 kernel em1: RX Queue 0 –----
                Nov 10 11:20:59 kernel em1: Tx Descriptors avail failure = 2320
                Nov 10 11:20:59 kernel em1: TX descriptors avail = 40
                Nov 10 11:20:59 kernel em1: Tx Queue Status = -2147483648
                Nov 10 11:20:59 kernel em1: hw tdh = 30, hw tdt = 1014
                Nov 10 11:20:59 kernel em1: TX Queue 0 ------
                Nov 10 11:20:59 kernel Interface is RUNNING and ACTIVE
                Nov 10 11:20:59 kernel em1: Watchdog timeout Queue[0]– resetting
                Nov 10 11:20:52 xinetd 17858 Reconfigured: new=0 old=23 dropped=0 (services)
                Nov 10 11:20:52 xinetd 17858 readjusting service 19010-udp
                Nov 10 11:20:52 xinetd 17858 readjusting service 19010-tcp
                Nov 10 11:20:52 xinetd 17858 readjusting service 19009-udp
                Nov 10 11:20:52 xinetd 17858 readjusting service 19009-tcp
                Nov 10 11:20:52 xinetd 17858 readjusting service 19008-udp
                Nov 10 11:20:52 xinetd 17858 readjusting service 19008-tcp
                Nov 10 11:20:52 xinetd 17858 readjusting service 19007-udp
                Nov 10 11:20:52 xinetd 17858 readjusting service 19007-tcp
                Nov 10 11:20:52 xinetd 17858 readjusting service 19006-udp
                Nov 10 11:20:52 xinetd 17858 readjusting service 19006-tcp
                Nov 10 11:20:52 xinetd 17858 readjusting service 19005-udp
                Nov 10 11:20:52 xinetd 17858 readjusting service 19005-tcp
                Nov 10 11:20:52 xinetd 17858 readjusting service 19004-udp
                Nov 10 11:20:52 xinetd 17858 readjusting service 19004-tcp
                Nov 10 11:20:52 xinetd 17858 readjusting service 19003-udp
                Nov 10 11:20:52 xinetd 17858 readjusting service 19003-tcp
                Nov 10 11:20:52 xinetd 17858 readjusting service 19002-udp
                Nov 10 11:20:52 xinetd 17858 readjusting service 19002-tcp
                Nov 10 11:20:52 xinetd 17858 readjusting service 19001-udp
                Nov 10 11:20:52 xinetd 17858 readjusting service 19001-tcp
                Nov 10 11:20:52 xinetd 17858 readjusting service 19000-udp
                Nov 10 11:20:52 xinetd 17858 readjusting service 19000-tcp
                Nov 10 11:20:52 xinetd 17858 readjusting service 6969-udp
                Nov 10 11:20:52 xinetd 17858 Swapping defaults
                Nov 10 11:20:52 xinetd 17858 Starting reconfiguration
                Nov 10 11:20:51 check_reload_status Reloading filter
                Nov 10 11:20:51 check_reload_status Restarting OpenVPN tunnels/interfaces
                Nov 10 11:20:51 check_reload_status Restarting ipsec tunnels
                Nov 10 11:20:51 check_reload_status updating dyndns WANGW
                Nov 10 11:14:59 php-fpm 63346 /widgets/widgets/ipsec.widget.php: Unable to find IPsec daemon leases file. Could not display mobile user stats!

                1 Reply Last reply Reply Quote 0
                • DerelictD
                  Derelict LAYER 8 Netgate
                  last edited by

                  You sure you don't need something like spanning-tree portfast on your switch ports?

                  This stuff is - seriously - almost always something with the switch.

                  Chattanooga, Tennessee, USA
                  A comprehensive network diagram is worth 10,000 words and 15 conference calls.
                  DO NOT set a source address/port in a port forward or firewall rule unless you KNOW you need it!
                  Do Not Chat For Help! NO_WAN_EGRESS(TM)

                  1 Reply Last reply Reply Quote 0
                  • DerelictD
                    Derelict LAYER 8 Netgate
                    last edited by

                    Looks like it's taking quite some time for the system to detect the cable unplug event.

                    The secondary loses CARP from the primary an 11:20:39

                    Then 20 seconds later you get this on the primary:

                    Nov 10 11:20:59  kernel      em1: Watchdog timeout Queue[0]– resetting

                    At 11:21:26 the link comes back up. Failing back to master is immediate.

                    This is also strange:

                    Nov 10 11:21:34  php-fpm  18153  /rc.carpbackup: HA cluster member "(67.239.73.180@em1): (WAN)" has resumed CARP state "BACKUP" for vhid 10
                    Nov 10 11:21:34  php-fpm  18153  /rc.carpbackup: Message sent to jwilhite@raschools.com OK
                    Nov 10 11:21:29  kernel      ifa_add_loopback_route: insertion failed: 17
                    Nov 10 11:21:29  kernel      carp: VHID 10@em1: BACKUP -> MASTER (master down)

                    Probably want to capture CARP on primary to see what you see there.

                    Chattanooga, Tennessee, USA
                    A comprehensive network diagram is worth 10,000 words and 15 conference calls.
                    DO NOT set a source address/port in a port forward or firewall rule unless you KNOW you need it!
                    Do Not Chat For Help! NO_WAN_EGRESS(TM)

                    1 Reply Last reply Reply Quote 0
                    • dotdashD
                      dotdash
                      last edited by

                      Do you have a lot of people unplugging your WAN cables? I've never had a cable failure on a CARP node.
                      I've seen- power supply failure, hard disk failure, OS crash… I know you want to test things, but I don't think your test is very likely to mirror an actual failure. Try putting the master in maintenance mode, or powering it off. If that works it should work in a real failure. If you really want to dig into the cable thing, watch the CARP status on each node when you unplug the WAN, preemption should trigger the rest of the CARP VIPs to go into backup after the WAN goes.

                      1 Reply Last reply Reply Quote 0
                      • R
                        Ranch
                        last edited by

                        @Derelict:

                        Looks like it's taking quite some time for the system to detect the cable unplug event.

                        The secondary loses CARP from the primary an 11:20:39

                        Then 20 seconds later you get this on the primary:

                        Nov 10 11:20:59  kernel      em1: Watchdog timeout Queue[0]– resetting

                        At 11:21:26 the link comes back up. Failing back to master is immediate.

                        This is also strange:

                        Nov 10 11:21:34  php-fpm  18153  /rc.carpbackup: HA cluster member "(67.239.73.180@em1): (WAN)" has resumed CARP state "BACKUP" for vhid 10
                        Nov 10 11:21:34  php-fpm  18153  /rc.carpbackup: Message sent to jwilhite@raschools.com OK
                        Nov 10 11:21:29  kernel      ifa_add_loopback_route: insertion failed: 17
                        Nov 10 11:21:29  kernel      carp: VHID 10@em1: BACKUP -> MASTER (master down)

                        Probably want to capture CARP on primary to see what you see there.

                        I uploaded the CAP for CARP from the Primary and Secondary WAN, are you needing it from another PORT?

                        Thanks for noting the strange delay, is there a way to reduce the watchdog time?  I am pretty sure my config is pretty standard, I haven't done any tweaking to this install.

                        @dotdash:

                        Do you have a lot of people unplugging your WAN cables? I've never had a cable failure on a CARP node.
                        I've seen- power supply failure, hard disk failure, OS crash… I know you want to test things, but I don't think your test is very likely to mirror an actual failure. Try putting the master in maintenance mode, or powering it off. If that works it should work in a real failure. If you really want to dig into the cable thing, watch the CARP status on each node when you unplug the WAN, preemption should trigger the rest of the CARP VIPs to go into backup after the WAN goes.

                        Although I have had a unplug problem ONCE by accident that is not the norm.  My assumption was that from the perspective of the nodes I was testing a single Switch/Port Failure.

                        I will try a power failure of Node1 after hours and then the switch power failure Node1 is on.

                        1 Reply Last reply Reply Quote 0
                        • DerelictD
                          Derelict LAYER 8 Netgate
                          last edited by

                          He has already stated that everything works fine going into and out of maintenance mode.

                          I was going to also say that this is a very odd failure test. With that hardware that might be the best that it will do. Unplugging a cable should not result in a watchdog timeout. It should result in a link down. Not sure why that is happening.

                          Chattanooga, Tennessee, USA
                          A comprehensive network diagram is worth 10,000 words and 15 conference calls.
                          DO NOT set a source address/port in a port forward or firewall rule unless you KNOW you need it!
                          Do Not Chat For Help! NO_WAN_EGRESS(TM)

                          1 Reply Last reply Reply Quote 0
                          • R
                            Ranch
                            last edited by

                            Couldn't do a switch fail too many people left on the network.  However the power fail for the Primary Node1 worked flawlessly, never missed a beat on pings.  I will have to test this weekend or next week for the switch fail test.

                            Probably a dumb question, why does pulling the cord not simulate the same failure?

                            Below is the CAP and Log from Node2 during the Node1 fail test.

                            Node2
                            15:50:02.942083 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                            15:50:02.942125 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                            15:50:04.333891 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                            15:50:04.333904 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                            15:50:05.725931 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                            15:50:05.725943 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                            15:50:07.117895 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                            15:50:07.117909 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                            15:50:08.509933 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                            15:50:08.509945 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                            15:50:09.901886 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                            15:50:09.901898 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                            15:50:11.293933 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                            15:50:11.293945 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                            15:50:12.696925 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                            15:50:12.696937 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                            15:50:14.095942 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                            15:50:14.095955 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                            15:50:15.487932 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                            15:50:15.487944 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                            15:50:16.894938 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                            15:50:16.894950 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                            15:50:18.286944 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                            15:50:18.286956 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                            15:50:19.678938 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                            15:50:19.678950 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                            15:50:21.070930 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                            15:50:21.070941 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                            15:50:22.481941 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                            15:50:22.481953 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                            15:50:23.886938 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                            15:50:23.886955 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                            15:50:25.287937 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                            15:50:25.287948 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                            15:50:26.690950 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                            15:50:26.690969 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                            15:50:28.095943 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                            15:50:28.095960 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                            15:50:29.493929 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                            15:50:29.493941 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                            15:50:30.898938 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                            15:50:30.898951 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                            15:50:32.304932 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                            15:50:32.304945 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                            15:50:33.703947 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                            15:50:33.703961 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                            15:50:35.101930 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                            15:50:35.101942 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                            15:50:36.506943 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                            15:50:36.506955 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                            15:50:37.913927 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                            15:50:37.913939 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                            15:50:39.318943 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                            15:50:39.318955 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                            15:50:40.717936 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                            15:50:40.717954 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                            15:50:42.109945 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                            15:50:42.109957 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                            15:50:43.528942 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                            15:50:43.528960 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                            15:50:44.926941 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                            15:50:44.926953 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                            15:50:46.325926 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                            15:50:46.325938 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                            15:50:47.717949 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                            15:50:47.717966 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                            15:50:49.109942 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                            15:50:49.109959 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                            15:50:50.507937 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                            15:50:50.507949 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                            15:50:51.913932 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                            15:50:51.913944 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                            15:50:53.317941 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                            15:50:53.317953 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                            15:50:54.717942 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                            15:50:54.717959 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866

                            Nov 10 15:51:30 check_reload_status Carp backup event
                            Nov 10 15:51:30 kernel carp: VHID 9@em1: MASTER -> BACKUP (more frequent advertisement received)
                            Nov 10 15:51:30 kernel carp: VHID 10@em1: MASTER -> BACKUP (more frequent advertisement received)
                            Nov 10 15:51:30 check_reload_status Carp backup event
                            Nov 10 15:51:26 check_reload_status Carp backup event
                            Nov 10 15:51:26 check_reload_status Carp backup event
                            Nov 10 15:51:26 kernel carp: VHID 12@em0_vlan31: MASTER -> BACKUP (more frequent advertisement received)
                            Nov 10 15:51:26 kernel carp: VHID 13@em0_vlan41: MASTER -> BACKUP (more frequent advertisement received)
                            Nov 10 15:51:26 kernel carp: VHID 11@em0: MASTER -> BACKUP (more frequent advertisement received)
                            Nov 10 15:51:26 check_reload_status Carp backup event
                            Nov 10 15:50:10 php-fpm 30196 /rc.carpmaster: HA cluster member "(67.239.73.180@em1): (WAN)" has resumed CARP state "MASTER" for vhid 10
                            Nov 10 15:50:10 php-fpm 30196 /rc.carpmaster: Message sent to admin@raschools.com OK
                            Nov 10 15:50:10 php-fpm 91222 /rc.carpmaster: HA cluster member "(67.239.73.178@em1): (WAN)" has resumed CARP state "MASTER" for vhid 9
                            Nov 10 15:50:10 php-fpm 91222 /rc.carpmaster: Message sent to admin@raschools.com OK
                            Nov 10 15:50:10 php-fpm 29856 /rc.carpmaster: HA cluster member "(10.41.31.1@em0_vlan31): (VLAN31_STAFF)" has resumed CARP state "MASTER" for vhid 12
                            Nov 10 15:50:10 php-fpm 29546 /rc.carpmaster: HA cluster member "(10.41.41.1@em0_vlan41): (VLAN41_CHROMEBOOKS)" has resumed CARP state "MASTER" for vhid 13
                            Nov 10 15:50:10 php-fpm 29856 /rc.carpmaster: Message sent to admin@raschools.com OK
                            Nov 10 15:50:10 php-fpm 29546 /rc.carpmaster: Message sent to admin@raschools.com OK
                            Nov 10 15:50:07 php-fpm 91222 /rc.carpmaster: HA cluster member "(10.41.43.1@em0): (LAN)" has resumed CARP state "MASTER" for vhid 11
                            Nov 10 15:50:07 php-fpm 91222 /rc.carpmaster: Message sent to admin@raschools.com OK
                            Nov 10 15:50:02 check_reload_status Carp master event
                            Nov 10 15:50:02 check_reload_status Carp master event
                            Nov 10 15:50:02 check_reload_status Carp master event
                            Nov 10 15:50:02 check_reload_status Carp master event
                            Nov 10 15:50:02 kernel carp: VHID 9@em1: BACKUP -> MASTER (master down)
                            Nov 10 15:50:02 kernel carp: VHID 10@em1: BACKUP -> MASTER (master down)
                            Nov 10 15:50:02 kernel carp: VHID 12@em0_vlan31: BACKUP -> MASTER (master down)
                            Nov 10 15:50:02 kernel carp: VHID 13@em0_vlan41: BACKUP -> MASTER (master down)
                            Nov 10 15:50:02 kernel carp: VHID 11@em0: BACKUP -> MASTER (master down)
                            Nov 10 15:50:02 check_reload_status Carp master event

                            1 Reply Last reply Reply Quote 0
                            • L
                              lexaz77757
                              last edited by

                              Hello, I have the same problem with my Pfsense CARP.
                              When I pull out the WAN cable, it takes very much time, before PfSense detects the cable failure.
                              Are there any ways to solve the problem?
                              Thanks

                              1 Reply Last reply Reply Quote 0
                              • R
                                Ranch
                                last edited by

                                If it was the exact same issue I had then the real problem ended up being my testing method.  Although I never understood why yanking a cable isn't a valid test.  When I simply unplug POWER from the primary router the failover is nearly instantaneous.  I never posted a follow up on simulating a switch failure but that also failed over as expected.

                                On another note, I have had to enter maintenance mode 5 times recently so that I could change out some other equipment inline with these.  Three out of five the switchover was nearly 30seconds and the WebInterface on both routers locks up completely which switching over.  Looking at top via ssh during the switchover doesn't show anything locked up nor maxing out RAM/CPU so not sure what the problem was there if it is related at all.

                                1 Reply Last reply Reply Quote 0
                                • First post
                                  Last post
                                Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.