Netgate Discussion Forum
    • Categories
    • Recent
    • Tags
    • Popular
    • Users
    • Search
    • Register
    • Login

    Long Delays in CARP WAN Failover

    Scheduled Pinned Locked Moved HA/CARP/VIPs
    17 Posts 4 Posters 3.7k Views
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • R
      Ranch
      last edited by

      What is the delay I should expect on a WAN Port Fail?

      When I unplug the LAN from the Primary Node it takes about 10 seconds for the backup Node to resume answering pings on the LAN VIP addresses.  It takes another 3.5mins for the Backup Node to start responding to the WAN VIP.  This was not my expectation can anyone confirm if this is normal?

      Also if I unplug the WAN port only from the Primary Node it takes 70 seconds for the Backup Node to start responding to the WAN VIP.

      Basic config

      em1
      WAN VIP .180
      WAN PF1 .181
      WAN PF2 .182
      Skew PF1    0
      Skew PF2 100

      em0
      LAN VIP  10.41.43.1
      LAN PF1 10.41.43.11
      LAN PF2 10.41.43.12
      Skew PF1    0
      Skew PF2 100

      em0_vlan31
      LAN VIP  10.41.31.1
      LAN PF1 10.41.31.2
      LAN PF2 10.41.31.3
      Skew PF1    0
      Skew PF2 100

      em0_vlan41
      LAN VIP  10.41.41.1
      LAN PF1 10.41.41.2
      LAN PF2 10.41.41.3
      Skew PF1    0
      Skew PF2 100

      em3 - Plugged Directly from one box to another
      HA PF1 10.10.10.1
      HA PF2 10.10.10.2

      2.3.2-RELEASE-p1 (amd64)
      built on Tue Sep 27 12:13:07 CDT 2016
      FreeBSD 10.3-RELEASE-p9

      1 Reply Last reply Reply Quote 0
      • DerelictD
        Derelict LAYER 8 Netgate
        last edited by

        Definitely not normal.

        Are those typos on the IP addresses on em0_vlan31 and em0_vlan41?

        What happens when you just enter CARP maintenance mode on the primary in Status > CARP instead of yanking cables?

        I would packet capture on WAN on the secondary during a failover test and recovery and see what's really going on.

        This is a failover and a fail back on a bench WAN setup:

        $ ping 172.25.228.17
        PING 172.25.228.17 (172.25.228.17): 56 data bytes
        64 bytes from 172.25.228.17: icmp_seq=0 ttl=63 time=0.687 ms
        64 bytes from 172.25.228.17: icmp_seq=1 ttl=63 time=0.529 ms
        64 bytes from 172.25.228.17: icmp_seq=2 ttl=63 time=0.407 ms
        64 bytes from 172.25.228.17: icmp_seq=3 ttl=63 time=0.376 ms
        Request timeout for icmp_seq 4
        Request timeout for icmp_seq 5
        64 bytes from 172.25.228.17: icmp_seq=6 ttl=63 time=0.723 ms
        64 bytes from 172.25.228.17: icmp_seq=7 ttl=63 time=0.566 ms
        64 bytes from 172.25.228.17: icmp_seq=8 ttl=63 time=0.562 ms
        64 bytes from 172.25.228.17: icmp_seq=9 ttl=63 time=0.570 ms
        64 bytes from 172.25.228.17: icmp_seq=10 ttl=63 time=0.742 ms
        64 bytes from 172.25.228.17: icmp_seq=11 ttl=63 time=0.459 ms
        64 bytes from 172.25.228.17: icmp_seq=12 ttl=63 time=0.421 ms
        64 bytes from 172.25.228.17: icmp_seq=13 ttl=63 time=0.496 ms
        64 bytes from 172.25.228.17: icmp_seq=14 ttl=63 time=0.465 ms
        64 bytes from 172.25.228.17: icmp_seq=15 ttl=63 time=0.433 ms
        Request timeout for icmp_seq 16
        64 bytes from 172.25.228.17: icmp_seq=17 ttl=63 time=0.590 ms
        64 bytes from 172.25.228.17: icmp_seq=18 ttl=63 time=0.449 ms
        64 bytes from 172.25.228.17: icmp_seq=19 ttl=63 time=0.427 ms
        64 bytes from 172.25.228.17: icmp_seq=20 ttl=63 time=0.440 ms
        64 bytes from 172.25.228.17: icmp_seq=21 ttl=63 time=0.473 ms
        64 bytes from 172.25.228.17: icmp_seq=22 ttl=63 time=0.403 ms
        ^C
        –- 172.25.228.17 ping statistics ---
        23 packets transmitted, 20 packets received, 13.0% packet loss
        round-trip min/avg/max/stddev = 0.376/0.511/0.742/0.105 ms

        Chattanooga, Tennessee, USA
        A comprehensive network diagram is worth 10,000 words and 15 conference calls.
        DO NOT set a source address/port in a port forward or firewall rule unless you KNOW you need it!
        Do Not Chat For Help! NO_WAN_EGRESS(TM)

        1 Reply Last reply Reply Quote 0
        • R
          Ranch
          last edited by

          @Derelict:

          Are those typos on the IP addresses on em0_vlan31 and em0_vlan41?

          Yes, they were, I corrected them in my original post now.

          @Derelict:

          What happens when you just enter CARP maintenance mode on the primary in Status > CARP instead of yanking cables?

          It never misses a beat, same thing if I restart the Primary

          @Derelict:

          I would packet capture on WAN on the secondary during a failover test and recovery and see what's really going on.

          I will have to wait until this evening to do this on our live network.

          Any other thoughts or suggestions to look at while I testing further?

          1 Reply Last reply Reply Quote 0
          • R
            Ranch
            last edited by

            Sent PM with CAP

            1 Reply Last reply Reply Quote 0
            • DerelictD
              Derelict LAYER 8 Netgate
              last edited by

              What version are you running that it isn't decoding CARP as CARP  but VRRP in the packet capture?

              Please do this:

              On the SECONDARY node in BACKUP status, start a packet capture. Using Protocol, select CARP. Set the Count to something like 10000 so it doesn't stop.

              Note the time to the second if possible and unplug WAN on the primary.

              Let it fail over completely.

              Note the time to the second and reconnect WAN on Primary.

              You should see:

              CARP from the primary

              CARP from the primary should stop immediately after you unplug.

              CARP from the secondary should start nearly immediately after that.

              CARP from the primary should start nearly immediately after you plug it back in.

              CARP from the secondary should stop immediately after that.

              Problems like this are almost always something with the switch.

              Also include the system log on both nodes for the same time period.

              Happy to look at the details in a PM. The one you provided didn't really show anything other than CARP from the secondary for 10 seconds.

              It should look something like this:

              07:06:58.037963 IP 172.25.228.18 > 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=0 authlen=7 counter=17557958983656044908
              07:06:59.039006 IP 172.25.228.18 > 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=0 authlen=7 counter=17557958983656044908
              07:07:00.040021 IP 172.25.228.18 > 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=0 authlen=7 counter=17557958983656044908
              07:07:01.040995 IP 172.25.228.18 > 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=0 authlen=7 counter=17557958983656044908
              07:07:02.041957 IP 172.25.228.18 > 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=0 authlen=7 counter=17557958983656044908
              07:07:03.043097 IP 172.25.228.18 > 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=0 authlen=7 counter=17557958983656044908
              07:07:06.434500 IP 172.25.228.19 > 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=100 authlen=7 counter=15385611900255070985
              07:07:07.826450 IP 172.25.228.19 > 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=100 authlen=7 counter=15385611900255070985
              07:07:09.218469 IP 172.25.228.19 > 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=100 authlen=7 counter=15385611900255070985
              07:07:10.610462 IP 172.25.228.19 > 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=100 authlen=7 counter=15385611900255070985
              07:07:12.002465 IP 172.25.228.19 > 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=100 authlen=7 counter=15385611900255070985
              07:07:13.394449 IP 172.25.228.19 > 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=100 authlen=7 counter=15385611900255070985
              07:07:14.786470 IP 172.25.228.19 > 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=100 authlen=7 counter=15385611900255070985
              07:07:16.178465 IP 172.25.228.19 > 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=100 authlen=7 counter=15385611900255070985
              07:07:17.570444 IP 172.25.228.19 > 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=100 authlen=7 counter=15385611900255070985
              07:07:17.571449 IP 172.25.228.18 > 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=0 authlen=7 counter=7980853803627081614
              07:07:18.571896 IP 172.25.228.18 > 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=0 authlen=7 counter=7980853803627081614
              07:07:19.572983 IP 172.25.228.18 > 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=0 authlen=7 counter=7980853803627081614
              07:07:20.573897 IP 172.25.228.18 > 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=0 authlen=7 counter=7980853803627081614
              07:07:21.575048 IP 172.25.228.18 > 224.0.0.18: CARPv2-advertise 36: vhid=1 advbase=1 advskew=0 authlen=7 counter=7980853803627081614

              Chattanooga, Tennessee, USA
              A comprehensive network diagram is worth 10,000 words and 15 conference calls.
              DO NOT set a source address/port in a port forward or firewall rule unless you KNOW you need it!
              Do Not Chat For Help! NO_WAN_EGRESS(TM)

              1 Reply Last reply Reply Quote 0
              • R
                Ranch
                last edited by

                @Derelict:

                What version are you running that it isn't decoding CARP as CARP  but VRRP in the packet capture?

                2.3.2-RELEASE-p1

                I will do the test steps today during lunch and let you know.  What is the best way to get you the log files?

                1 Reply Last reply Reply Quote 0
                • R
                  Ranch
                  last edited by

                  Pulled the WAN from Node1 for 30 seconds and interestingly enough it failed over in about 20 seconds which is WAY less time than usual.

                  Things I noted from the test Node 1 System log nor the CAP seems to even register the CARP event.

                  Attached is the log files.

                  CAPS_LOGS.txt

                  1 Reply Last reply Reply Quote 0
                  • DerelictD
                    Derelict LAYER 8 Netgate
                    last edited by

                    The logs from the two nodes aren't from the same time period.

                    Chattanooga, Tennessee, USA
                    A comprehensive network diagram is worth 10,000 words and 15 conference calls.
                    DO NOT set a source address/port in a port forward or firewall rule unless you KNOW you need it!
                    Do Not Chat For Help! NO_WAN_EGRESS(TM)

                    1 Reply Last reply Reply Quote 0
                    • R
                      Ranch
                      last edited by

                      Sorry didn't catch that Node1 was wrong it is listed below now for 11:14:59-11:22:09

                      The others were right, right?
                      Node2 Log 11:20:39-11:21:28
                      Node1 CAP 11:20:28-11:21:54
                      Node2 CAP 11:20:26-11:21:52

                      Nov 10 11:22:09 check_reload_status updating dyndns WANGW
                      Nov 10 11:21:41 php-fpm 26622 /rc.carpmaster: HA cluster member "(10.41.41.1@em0_vlan41): (CU_STUDENT_VLAN41)" has resumed CARP state "MASTER" for vhid 13
                      Nov 10 11:21:41 php-fpm 26622 /rc.carpmaster: Message sent to jwilhite@raschools.com OK
                      Nov 10 11:21:39 php-fpm 18153 /rc.carpmaster: HA cluster member "(67.239.73.180@em1): (WAN)" has resumed CARP state "MASTER" for vhid 10
                      Nov 10 11:21:39 php-fpm 18153 /rc.carpmaster: Message sent to jwilhite@raschools.com OK
                      Nov 10 11:21:39 php-fpm 26987 /rc.carpmaster: HA cluster member "(10.41.31.1@em0_vlan31): (CU_STAFF_VLAN31)" has resumed CARP state "MASTER" for vhid 12
                      Nov 10 11:21:39 php-fpm 26987 /rc.carpmaster: Message sent to jwilhite@raschools.com OK
                      Nov 10 11:21:37 php-fpm 24092 /rc.carpmaster: HA cluster member "(10.41.43.1@em0): (LAN)" has resumed CARP state "MASTER" for vhid 11
                      Nov 10 11:21:37 php-fpm 24092 /rc.carpmaster: Message sent to jwilhite@raschools.com OK
                      Nov 10 11:21:35 xinetd 17858 Reconfigured: new=0 old=23 dropped=0 (services)
                      Nov 10 11:21:35 xinetd 17858 readjusting service 19010-udp
                      Nov 10 11:21:35 xinetd 17858 readjusting service 19010-tcp
                      Nov 10 11:21:35 xinetd 17858 readjusting service 19009-udp
                      Nov 10 11:21:35 xinetd 17858 readjusting service 19009-tcp
                      Nov 10 11:21:35 xinetd 17858 readjusting service 19008-udp
                      Nov 10 11:21:35 xinetd 17858 readjusting service 19008-tcp
                      Nov 10 11:21:35 xinetd 17858 readjusting service 19007-udp
                      Nov 10 11:21:35 xinetd 17858 readjusting service 19007-tcp
                      Nov 10 11:21:35 xinetd 17858 readjusting service 19006-udp
                      Nov 10 11:21:35 xinetd 17858 readjusting service 19006-tcp
                      Nov 10 11:21:35 xinetd 17858 readjusting service 19005-udp
                      Nov 10 11:21:35 xinetd 17858 readjusting service 19005-tcp
                      Nov 10 11:21:35 xinetd 17858 readjusting service 19004-udp
                      Nov 10 11:21:35 xinetd 17858 readjusting service 19004-tcp
                      Nov 10 11:21:35 xinetd 17858 readjusting service 19003-udp
                      Nov 10 11:21:35 xinetd 17858 readjusting service 19003-tcp
                      Nov 10 11:21:35 xinetd 17858 readjusting service 19002-udp
                      Nov 10 11:21:35 xinetd 17858 readjusting service 19002-tcp
                      Nov 10 11:21:35 xinetd 17858 readjusting service 19001-udp
                      Nov 10 11:21:35 xinetd 17858 readjusting service 19001-tcp
                      Nov 10 11:21:35 xinetd 17858 readjusting service 19000-udp
                      Nov 10 11:21:35 xinetd 17858 readjusting service 19000-tcp
                      Nov 10 11:21:35 xinetd 17858 readjusting service 6969-udp
                      Nov 10 11:21:35 xinetd 17858 Swapping defaults
                      Nov 10 11:21:35 xinetd 17858 Starting reconfiguration
                      Nov 10 11:21:35 check_reload_status Reloading filter
                      Nov 10 11:21:35 php-fpm 18153 /rc.newwanip: Configuring CARP settings finalize…
                      Nov 10 11:21:35 php-fpm 18153 /rc.newwanip: pfsync done in 0 seconds.
                      Nov 10 11:21:35 php-fpm 18153 /rc.newwanip: waiting for pfsync...
                      Nov 10 11:21:34 php-fpm 18153 /rc.newwanip: rc.newwanip: on (IP address: 67.239.73.182) (interface: WAN[wan]) (real interface: em1).
                      Nov 10 11:21:34 php-fpm 18153 /rc.newwanip: rc.newwanip: Info: starting on em1.
                      Nov 10 11:21:34 php-fpm 18153 /rc.carpbackup: HA cluster member "(67.239.73.180@em1): (WAN)" has resumed CARP state "BACKUP" for vhid 10
                      Nov 10 11:21:34 php-fpm 18153 /rc.carpbackup: Message sent to jwilhite@raschools.com OK
                      Nov 10 11:21:29 kernel ifa_add_loopback_route: insertion failed: 17
                      Nov 10 11:21:29 kernel carp: VHID 10@em1: BACKUP -> MASTER (master down)
                      Nov 10 11:21:29 check_reload_status Carp master event
                      Nov 10 11:21:28 xinetd 17858 Reconfigured: new=0 old=23 dropped=0 (services)
                      Nov 10 11:21:27 xinetd 17858 readjusting service 19010-udp
                      Nov 10 11:21:27 xinetd 17858 readjusting service 19010-tcp
                      Nov 10 11:21:27 xinetd 17858 readjusting service 19009-udp
                      Nov 10 11:21:27 xinetd 17858 readjusting service 19009-tcp
                      Nov 10 11:21:27 xinetd 17858 readjusting service 19008-udp
                      Nov 10 11:21:27 xinetd 17858 readjusting service 19008-tcp
                      Nov 10 11:21:27 xinetd 17858 readjusting service 19007-udp
                      Nov 10 11:21:27 xinetd 17858 readjusting service 19007-tcp
                      Nov 10 11:21:27 xinetd 17858 readjusting service 19006-udp
                      Nov 10 11:21:27 xinetd 17858 readjusting service 19006-tcp
                      Nov 10 11:21:27 xinetd 17858 readjusting service 19005-udp
                      Nov 10 11:21:27 xinetd 17858 readjusting service 19005-tcp
                      Nov 10 11:21:27 xinetd 17858 readjusting service 19004-udp
                      Nov 10 11:21:27 xinetd 17858 readjusting service 19004-tcp
                      Nov 10 11:21:27 xinetd 17858 readjusting service 19003-udp
                      Nov 10 11:21:27 xinetd 17858 readjusting service 19003-tcp
                      Nov 10 11:21:27 xinetd 17858 readjusting service 19002-udp
                      Nov 10 11:21:27 xinetd 17858 readjusting service 19002-tcp
                      Nov 10 11:21:27 xinetd 17858 readjusting service 19001-udp
                      Nov 10 11:21:27 xinetd 17858 readjusting service 19001-tcp
                      Nov 10 11:21:27 xinetd 17858 readjusting service 19000-udp
                      Nov 10 11:21:27 xinetd 17858 readjusting service 19000-tcp
                      Nov 10 11:21:27 xinetd 17858 readjusting service 6969-udp
                      Nov 10 11:21:27 xinetd 17858 Swapping defaults
                      Nov 10 11:21:27 xinetd 17858 Starting reconfiguration
                      Nov 10 11:21:27 check_reload_status Reloading filter
                      Nov 10 11:21:27 check_reload_status rc.newwanip starting em1
                      Nov 10 11:21:27 php-fpm 24092 /rc.linkup: Hotplug event detected for WAN(wan) static IP (67.239.73.182 )
                      Nov 10 11:21:27 check_reload_status Carp master event
                      Nov 10 11:21:27 check_reload_status Carp master event
                      Nov 10 11:21:27 check_reload_status Carp master event
                      Nov 10 11:21:27 kernel carp: VHID 12@em0_vlan31: BACKUP -> MASTER (preempting a slower master)
                      Nov 10 11:21:27 kernel carp: VHID 13@em0_vlan41: BACKUP -> MASTER (preempting a slower master)
                      Nov 10 11:21:27 kernel carp: VHID 11@em0: BACKUP -> MASTER (preempting a slower master)
                      Nov 10 11:21:26 check_reload_status Reloading filter
                      Nov 10 11:21:26 check_reload_status Restarting OpenVPN tunnels/interfaces
                      Nov 10 11:21:26 check_reload_status Restarting ipsec tunnels
                      Nov 10 11:21:26 check_reload_status updating dyndns WANGW
                      Nov 10 11:21:26 check_reload_status Linkup starting em1
                      Nov 10 11:21:26 kernel em1: link state changed to UP
                      Nov 10 11:21:26 kernel carp: demoted by -240 to 0 (interface up)
                      Nov 10 11:21:26 kernel carp: VHID 10@em1: INIT -> BACKUP
                      Nov 10 11:21:26 check_reload_status Carp backup event
                      Nov 10 11:21:10 php-fpm 17632 /rc.carpbackup: HA cluster member "(10.41.43.1@em0): (LAN)" has resumed CARP state "BACKUP" for vhid 11
                      Nov 10 11:21:10 php-fpm 18153 /rc.carpbackup: HA cluster member "(10.41.31.1@em0_vlan31): (CU_STAFF_VLAN31)" has resumed CARP state "BACKUP" for vhid 12
                      Nov 10 11:21:10 php-fpm 17941 /rc.carpbackup: HA cluster member "(10.41.41.1@em0_vlan41): (CU_STUDENT_VLAN41)" has resumed CARP state "BACKUP" for vhid 13
                      Nov 10 11:21:10 php-fpm 17632 /rc.carpbackup: Message sent to jwilhite@raschools.com OK
                      Nov 10 11:21:10 php-fpm 18153 /rc.carpbackup: Message sent to jwilhite@raschools.com OK
                      Nov 10 11:21:10 php-fpm 17941 /rc.carpbackup: Message sent to jwilhite@raschools.com OK
                      Nov 10 11:21:07 xinetd 17858 Reconfigured: new=0 old=23 dropped=0 (services)
                      Nov 10 11:21:07 xinetd 17858 readjusting service 19010-udp
                      Nov 10 11:21:07 xinetd 17858 readjusting service 19010-tcp
                      Nov 10 11:21:07 xinetd 17858 readjusting service 19009-udp
                      Nov 10 11:21:07 xinetd 17858 readjusting service 19009-tcp
                      Nov 10 11:21:07 xinetd 17858 readjusting service 19008-udp
                      Nov 10 11:21:07 xinetd 17858 readjusting service 19008-tcp
                      Nov 10 11:21:07 xinetd 17858 readjusting service 19007-udp
                      Nov 10 11:21:07 xinetd 17858 readjusting service 19007-tcp
                      Nov 10 11:21:07 xinetd 17858 readjusting service 19006-udp
                      Nov 10 11:21:07 xinetd 17858 readjusting service 19006-tcp
                      Nov 10 11:21:07 xinetd 17858 readjusting service 19005-udp
                      Nov 10 11:21:07 xinetd 17858 readjusting service 19005-tcp
                      Nov 10 11:21:07 xinetd 17858 readjusting service 19004-udp
                      Nov 10 11:21:07 xinetd 17858 readjusting service 19004-tcp
                      Nov 10 11:21:07 xinetd 17858 readjusting service 19003-udp
                      Nov 10 11:21:07 xinetd 17858 readjusting service 19003-tcp
                      Nov 10 11:21:07 xinetd 17858 readjusting service 19002-udp
                      Nov 10 11:21:07 xinetd 17858 readjusting service 19002-tcp
                      Nov 10 11:21:07 xinetd 17858 readjusting service 19001-udp
                      Nov 10 11:21:07 xinetd 17858 readjusting service 19001-tcp
                      Nov 10 11:21:07 xinetd 17858 readjusting service 19000-udp
                      Nov 10 11:21:07 xinetd 17858 readjusting service 19000-tcp
                      Nov 10 11:21:07 xinetd 17858 readjusting service 6969-udp
                      Nov 10 11:21:07 xinetd 17858 Swapping defaults
                      Nov 10 11:21:07 xinetd 17858 Starting reconfiguration
                      Nov 10 11:21:07 php-fpm 15607 /rc.carpbackup: HA cluster member "(67.239.73.180@em1): (WAN)" has resumed CARP state "BACKUP" for vhid 10
                      Nov 10 11:21:07 php-fpm 15607 /rc.carpbackup: Message sent to jwilhite@raschools.com OK
                      Nov 10 11:21:00 check_reload_status Reloading filter
                      Nov 10 11:21:00 php-fpm 17632 /rc.linkup: Hotplug event detected for WAN(wan) static IP (67.239.73.182 )
                      Nov 10 11:20:59 check_reload_status Carp backup event
                      Nov 10 11:20:59 check_reload_status Carp backup event
                      Nov 10 11:20:59 check_reload_status Carp backup event
                      Nov 10 11:20:59 check_reload_status Linkup starting em1
                      Nov 10 11:20:59 check_reload_status Carp backup event
                      Nov 10 11:20:59 kernel carp: VHID 12@em0_vlan31: MASTER -> BACKUP (more frequent advertisement received)
                      Nov 10 11:20:59 kernel carp: VHID 13@em0_vlan41: MASTER -> BACKUP (more frequent advertisement received)
                      Nov 10 11:20:59 kernel carp: VHID 11@em0: MASTER -> BACKUP (more frequent advertisement received)
                      Nov 10 11:20:59 kernel em1: link state changed to DOWN
                      Nov 10 11:20:59 kernel carp: demoted by 240 to 240 (interface down)
                      Nov 10 11:20:59 kernel em1: RX Next to Refresh = 348
                      Nov 10 11:20:59 kernel em1: RX Next to Check = 349
                      Nov 10 11:20:59 kernel em1: RX discarded packets = 0
                      Nov 10 11:20:59 kernel em1: hw rdh = 349, hw rdt = 348
                      Nov 10 11:20:59 kernel em1: RX Queue 0 –----
                      Nov 10 11:20:59 kernel em1: Tx Descriptors avail failure = 2320
                      Nov 10 11:20:59 kernel em1: TX descriptors avail = 40
                      Nov 10 11:20:59 kernel em1: Tx Queue Status = -2147483648
                      Nov 10 11:20:59 kernel em1: hw tdh = 30, hw tdt = 1014
                      Nov 10 11:20:59 kernel em1: TX Queue 0 ------
                      Nov 10 11:20:59 kernel Interface is RUNNING and ACTIVE
                      Nov 10 11:20:59 kernel em1: Watchdog timeout Queue[0]– resetting
                      Nov 10 11:20:52 xinetd 17858 Reconfigured: new=0 old=23 dropped=0 (services)
                      Nov 10 11:20:52 xinetd 17858 readjusting service 19010-udp
                      Nov 10 11:20:52 xinetd 17858 readjusting service 19010-tcp
                      Nov 10 11:20:52 xinetd 17858 readjusting service 19009-udp
                      Nov 10 11:20:52 xinetd 17858 readjusting service 19009-tcp
                      Nov 10 11:20:52 xinetd 17858 readjusting service 19008-udp
                      Nov 10 11:20:52 xinetd 17858 readjusting service 19008-tcp
                      Nov 10 11:20:52 xinetd 17858 readjusting service 19007-udp
                      Nov 10 11:20:52 xinetd 17858 readjusting service 19007-tcp
                      Nov 10 11:20:52 xinetd 17858 readjusting service 19006-udp
                      Nov 10 11:20:52 xinetd 17858 readjusting service 19006-tcp
                      Nov 10 11:20:52 xinetd 17858 readjusting service 19005-udp
                      Nov 10 11:20:52 xinetd 17858 readjusting service 19005-tcp
                      Nov 10 11:20:52 xinetd 17858 readjusting service 19004-udp
                      Nov 10 11:20:52 xinetd 17858 readjusting service 19004-tcp
                      Nov 10 11:20:52 xinetd 17858 readjusting service 19003-udp
                      Nov 10 11:20:52 xinetd 17858 readjusting service 19003-tcp
                      Nov 10 11:20:52 xinetd 17858 readjusting service 19002-udp
                      Nov 10 11:20:52 xinetd 17858 readjusting service 19002-tcp
                      Nov 10 11:20:52 xinetd 17858 readjusting service 19001-udp
                      Nov 10 11:20:52 xinetd 17858 readjusting service 19001-tcp
                      Nov 10 11:20:52 xinetd 17858 readjusting service 19000-udp
                      Nov 10 11:20:52 xinetd 17858 readjusting service 19000-tcp
                      Nov 10 11:20:52 xinetd 17858 readjusting service 6969-udp
                      Nov 10 11:20:52 xinetd 17858 Swapping defaults
                      Nov 10 11:20:52 xinetd 17858 Starting reconfiguration
                      Nov 10 11:20:51 check_reload_status Reloading filter
                      Nov 10 11:20:51 check_reload_status Restarting OpenVPN tunnels/interfaces
                      Nov 10 11:20:51 check_reload_status Restarting ipsec tunnels
                      Nov 10 11:20:51 check_reload_status updating dyndns WANGW
                      Nov 10 11:14:59 php-fpm 63346 /widgets/widgets/ipsec.widget.php: Unable to find IPsec daemon leases file. Could not display mobile user stats!

                      1 Reply Last reply Reply Quote 0
                      • DerelictD
                        Derelict LAYER 8 Netgate
                        last edited by

                        You sure you don't need something like spanning-tree portfast on your switch ports?

                        This stuff is - seriously - almost always something with the switch.

                        Chattanooga, Tennessee, USA
                        A comprehensive network diagram is worth 10,000 words and 15 conference calls.
                        DO NOT set a source address/port in a port forward or firewall rule unless you KNOW you need it!
                        Do Not Chat For Help! NO_WAN_EGRESS(TM)

                        1 Reply Last reply Reply Quote 0
                        • DerelictD
                          Derelict LAYER 8 Netgate
                          last edited by

                          Looks like it's taking quite some time for the system to detect the cable unplug event.

                          The secondary loses CARP from the primary an 11:20:39

                          Then 20 seconds later you get this on the primary:

                          Nov 10 11:20:59  kernel      em1: Watchdog timeout Queue[0]– resetting

                          At 11:21:26 the link comes back up. Failing back to master is immediate.

                          This is also strange:

                          Nov 10 11:21:34  php-fpm  18153  /rc.carpbackup: HA cluster member "(67.239.73.180@em1): (WAN)" has resumed CARP state "BACKUP" for vhid 10
                          Nov 10 11:21:34  php-fpm  18153  /rc.carpbackup: Message sent to jwilhite@raschools.com OK
                          Nov 10 11:21:29  kernel      ifa_add_loopback_route: insertion failed: 17
                          Nov 10 11:21:29  kernel      carp: VHID 10@em1: BACKUP -> MASTER (master down)

                          Probably want to capture CARP on primary to see what you see there.

                          Chattanooga, Tennessee, USA
                          A comprehensive network diagram is worth 10,000 words and 15 conference calls.
                          DO NOT set a source address/port in a port forward or firewall rule unless you KNOW you need it!
                          Do Not Chat For Help! NO_WAN_EGRESS(TM)

                          1 Reply Last reply Reply Quote 0
                          • dotdashD
                            dotdash
                            last edited by

                            Do you have a lot of people unplugging your WAN cables? I've never had a cable failure on a CARP node.
                            I've seen- power supply failure, hard disk failure, OS crash… I know you want to test things, but I don't think your test is very likely to mirror an actual failure. Try putting the master in maintenance mode, or powering it off. If that works it should work in a real failure. If you really want to dig into the cable thing, watch the CARP status on each node when you unplug the WAN, preemption should trigger the rest of the CARP VIPs to go into backup after the WAN goes.

                            1 Reply Last reply Reply Quote 0
                            • R
                              Ranch
                              last edited by

                              @Derelict:

                              Looks like it's taking quite some time for the system to detect the cable unplug event.

                              The secondary loses CARP from the primary an 11:20:39

                              Then 20 seconds later you get this on the primary:

                              Nov 10 11:20:59  kernel      em1: Watchdog timeout Queue[0]– resetting

                              At 11:21:26 the link comes back up. Failing back to master is immediate.

                              This is also strange:

                              Nov 10 11:21:34  php-fpm  18153  /rc.carpbackup: HA cluster member "(67.239.73.180@em1): (WAN)" has resumed CARP state "BACKUP" for vhid 10
                              Nov 10 11:21:34  php-fpm  18153  /rc.carpbackup: Message sent to jwilhite@raschools.com OK
                              Nov 10 11:21:29  kernel      ifa_add_loopback_route: insertion failed: 17
                              Nov 10 11:21:29  kernel      carp: VHID 10@em1: BACKUP -> MASTER (master down)

                              Probably want to capture CARP on primary to see what you see there.

                              I uploaded the CAP for CARP from the Primary and Secondary WAN, are you needing it from another PORT?

                              Thanks for noting the strange delay, is there a way to reduce the watchdog time?  I am pretty sure my config is pretty standard, I haven't done any tweaking to this install.

                              @dotdash:

                              Do you have a lot of people unplugging your WAN cables? I've never had a cable failure on a CARP node.
                              I've seen- power supply failure, hard disk failure, OS crash… I know you want to test things, but I don't think your test is very likely to mirror an actual failure. Try putting the master in maintenance mode, or powering it off. If that works it should work in a real failure. If you really want to dig into the cable thing, watch the CARP status on each node when you unplug the WAN, preemption should trigger the rest of the CARP VIPs to go into backup after the WAN goes.

                              Although I have had a unplug problem ONCE by accident that is not the norm.  My assumption was that from the perspective of the nodes I was testing a single Switch/Port Failure.

                              I will try a power failure of Node1 after hours and then the switch power failure Node1 is on.

                              1 Reply Last reply Reply Quote 0
                              • DerelictD
                                Derelict LAYER 8 Netgate
                                last edited by

                                He has already stated that everything works fine going into and out of maintenance mode.

                                I was going to also say that this is a very odd failure test. With that hardware that might be the best that it will do. Unplugging a cable should not result in a watchdog timeout. It should result in a link down. Not sure why that is happening.

                                Chattanooga, Tennessee, USA
                                A comprehensive network diagram is worth 10,000 words and 15 conference calls.
                                DO NOT set a source address/port in a port forward or firewall rule unless you KNOW you need it!
                                Do Not Chat For Help! NO_WAN_EGRESS(TM)

                                1 Reply Last reply Reply Quote 0
                                • R
                                  Ranch
                                  last edited by

                                  Couldn't do a switch fail too many people left on the network.  However the power fail for the Primary Node1 worked flawlessly, never missed a beat on pings.  I will have to test this weekend or next week for the switch fail test.

                                  Probably a dumb question, why does pulling the cord not simulate the same failure?

                                  Below is the CAP and Log from Node2 during the Node1 fail test.

                                  Node2
                                  15:50:02.942083 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                                  15:50:02.942125 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                                  15:50:04.333891 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                                  15:50:04.333904 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                                  15:50:05.725931 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                                  15:50:05.725943 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                                  15:50:07.117895 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                                  15:50:07.117909 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                                  15:50:08.509933 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                                  15:50:08.509945 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                                  15:50:09.901886 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                                  15:50:09.901898 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                                  15:50:11.293933 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                                  15:50:11.293945 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                                  15:50:12.696925 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                                  15:50:12.696937 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                                  15:50:14.095942 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                                  15:50:14.095955 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                                  15:50:15.487932 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                                  15:50:15.487944 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                                  15:50:16.894938 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                                  15:50:16.894950 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                                  15:50:18.286944 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                                  15:50:18.286956 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                                  15:50:19.678938 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                                  15:50:19.678950 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                                  15:50:21.070930 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                                  15:50:21.070941 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                                  15:50:22.481941 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                                  15:50:22.481953 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                                  15:50:23.886938 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                                  15:50:23.886955 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                                  15:50:25.287937 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                                  15:50:25.287948 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                                  15:50:26.690950 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                                  15:50:26.690969 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                                  15:50:28.095943 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                                  15:50:28.095960 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                                  15:50:29.493929 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                                  15:50:29.493941 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                                  15:50:30.898938 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                                  15:50:30.898951 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                                  15:50:32.304932 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                                  15:50:32.304945 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                                  15:50:33.703947 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                                  15:50:33.703961 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                                  15:50:35.101930 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                                  15:50:35.101942 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                                  15:50:36.506943 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                                  15:50:36.506955 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                                  15:50:37.913927 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                                  15:50:37.913939 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                                  15:50:39.318943 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                                  15:50:39.318955 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                                  15:50:40.717936 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                                  15:50:40.717954 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                                  15:50:42.109945 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                                  15:50:42.109957 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                                  15:50:43.528942 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                                  15:50:43.528960 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                                  15:50:44.926941 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                                  15:50:44.926953 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                                  15:50:46.325926 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                                  15:50:46.325938 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                                  15:50:47.717949 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                                  15:50:47.717966 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                                  15:50:49.109942 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                                  15:50:49.109959 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                                  15:50:50.507937 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                                  15:50:50.507949 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                                  15:50:51.913932 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                                  15:50:51.913944 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                                  15:50:53.317941 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866
                                  15:50:53.317953 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                                  15:50:54.717942 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=9 advbase=1 advskew=100 authlen=7 counter=10048284322299381040
                                  15:50:54.717959 IP 67.239.73.181 > 224.0.0.18: CARPv2-advertise 36: vhid=10 advbase=1 advskew=100 authlen=7 counter=9060617201846148866

                                  Nov 10 15:51:30 check_reload_status Carp backup event
                                  Nov 10 15:51:30 kernel carp: VHID 9@em1: MASTER -> BACKUP (more frequent advertisement received)
                                  Nov 10 15:51:30 kernel carp: VHID 10@em1: MASTER -> BACKUP (more frequent advertisement received)
                                  Nov 10 15:51:30 check_reload_status Carp backup event
                                  Nov 10 15:51:26 check_reload_status Carp backup event
                                  Nov 10 15:51:26 check_reload_status Carp backup event
                                  Nov 10 15:51:26 kernel carp: VHID 12@em0_vlan31: MASTER -> BACKUP (more frequent advertisement received)
                                  Nov 10 15:51:26 kernel carp: VHID 13@em0_vlan41: MASTER -> BACKUP (more frequent advertisement received)
                                  Nov 10 15:51:26 kernel carp: VHID 11@em0: MASTER -> BACKUP (more frequent advertisement received)
                                  Nov 10 15:51:26 check_reload_status Carp backup event
                                  Nov 10 15:50:10 php-fpm 30196 /rc.carpmaster: HA cluster member "(67.239.73.180@em1): (WAN)" has resumed CARP state "MASTER" for vhid 10
                                  Nov 10 15:50:10 php-fpm 30196 /rc.carpmaster: Message sent to admin@raschools.com OK
                                  Nov 10 15:50:10 php-fpm 91222 /rc.carpmaster: HA cluster member "(67.239.73.178@em1): (WAN)" has resumed CARP state "MASTER" for vhid 9
                                  Nov 10 15:50:10 php-fpm 91222 /rc.carpmaster: Message sent to admin@raschools.com OK
                                  Nov 10 15:50:10 php-fpm 29856 /rc.carpmaster: HA cluster member "(10.41.31.1@em0_vlan31): (VLAN31_STAFF)" has resumed CARP state "MASTER" for vhid 12
                                  Nov 10 15:50:10 php-fpm 29546 /rc.carpmaster: HA cluster member "(10.41.41.1@em0_vlan41): (VLAN41_CHROMEBOOKS)" has resumed CARP state "MASTER" for vhid 13
                                  Nov 10 15:50:10 php-fpm 29856 /rc.carpmaster: Message sent to admin@raschools.com OK
                                  Nov 10 15:50:10 php-fpm 29546 /rc.carpmaster: Message sent to admin@raschools.com OK
                                  Nov 10 15:50:07 php-fpm 91222 /rc.carpmaster: HA cluster member "(10.41.43.1@em0): (LAN)" has resumed CARP state "MASTER" for vhid 11
                                  Nov 10 15:50:07 php-fpm 91222 /rc.carpmaster: Message sent to admin@raschools.com OK
                                  Nov 10 15:50:02 check_reload_status Carp master event
                                  Nov 10 15:50:02 check_reload_status Carp master event
                                  Nov 10 15:50:02 check_reload_status Carp master event
                                  Nov 10 15:50:02 check_reload_status Carp master event
                                  Nov 10 15:50:02 kernel carp: VHID 9@em1: BACKUP -> MASTER (master down)
                                  Nov 10 15:50:02 kernel carp: VHID 10@em1: BACKUP -> MASTER (master down)
                                  Nov 10 15:50:02 kernel carp: VHID 12@em0_vlan31: BACKUP -> MASTER (master down)
                                  Nov 10 15:50:02 kernel carp: VHID 13@em0_vlan41: BACKUP -> MASTER (master down)
                                  Nov 10 15:50:02 kernel carp: VHID 11@em0: BACKUP -> MASTER (master down)
                                  Nov 10 15:50:02 check_reload_status Carp master event

                                  1 Reply Last reply Reply Quote 0
                                  • L
                                    lexaz77757
                                    last edited by

                                    Hello, I have the same problem with my Pfsense CARP.
                                    When I pull out the WAN cable, it takes very much time, before PfSense detects the cable failure.
                                    Are there any ways to solve the problem?
                                    Thanks

                                    1 Reply Last reply Reply Quote 0
                                    • R
                                      Ranch
                                      last edited by

                                      If it was the exact same issue I had then the real problem ended up being my testing method.  Although I never understood why yanking a cable isn't a valid test.  When I simply unplug POWER from the primary router the failover is nearly instantaneous.  I never posted a follow up on simulating a switch failure but that also failed over as expected.

                                      On another note, I have had to enter maintenance mode 5 times recently so that I could change out some other equipment inline with these.  Three out of five the switchover was nearly 30seconds and the WebInterface on both routers locks up completely which switching over.  Looking at top via ssh during the switchover doesn't show anything locked up nor maxing out RAM/CPU so not sure what the problem was there if it is related at all.

                                      1 Reply Last reply Reply Quote 0
                                      • First post
                                        Last post
                                      Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.