Netgate Discussion Forum
    • Categories
    • Recent
    • Tags
    • Popular
    • Users
    • Search
    • Register
    • Login

    Master stops sending CARP advertisements after HTTP-based load test

    Scheduled Pinned Locked Moved HA/CARP/VIPs
    2 Posts 1 Posters 1.4k Views
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • E
      echoranger
      last edited by

      pfSense  2.0.3-RELEASE (i386)
      built on Fri Apr 12 10:22:57 EDT 2013
      FreeBSD 8.1-RELEASE-p13

      We have 2 pfSense boxes (both Dell 1750s) configured in an active/passive scenario with CARP used on all interfaces (except the pfsync interface). Each box has 3 physical interfaces:

      em0: pfsync
      bge0: WAN, 100Mb link connected to a fiber converter
      bge1: VLAN trunk with 3 VLANs attached, 1, 10 and 20

      Ran into a situation today where it appears that the primary has stopped sending CARP advertisements on all interfaces. This happened about the same time we started a load test via BrowserMob to one of our HTTP servers behind a CARP-based 1:1 NAT on the WAN interface. At this time the load test has long since finished but our secondary pfSense box (pfsense02) is still showing as "MASTER" on all CARP interfaces while the primary (pfsense01) is showing as "BACKUP". I've verified the advskew is still correctly set on both boxes (set to 0 on primary, 100 on secondary). While doing a tcpdump on both pfSense boxes across all networks I now only see the secondary sending CARP advertisements:

      (pfsense01 is 192.168.1.11, pfsense02 is 192.168.1.12):

      [2.0.3-RELEASE][admin@pfsense01]/root(23): tcpdump -nlibge1_vlan1 vrrp
      tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
      listening on bge1_vlan1, link-type EN10MB (Ethernet), capture size 96 bytes
      17:23:46.571651 IP 192.168.1.12 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 100, authtype none, intvl 1s, length 36
      17:23:47.963499 IP 192.168.1.12 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 100, authtype none, intvl 1s, length 36
      17:23:49.355248 IP 192.168.1.12 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 100, authtype none, intvl 1s, length 36
      
      [2.0.3-RELEASE][admin@pfsense02]/root(6): tcpdump -nlibge1_vlan1 vrrp
      tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
      listening on bge1_vlan1, link-type EN10MB (Ethernet), capture size 96 bytes
      17:24:00.358484 IP 192.168.1.12 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 100, authtype none, intvl 1s, length 36
      17:24:01.750057 IP 192.168.1.12 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 100, authtype none, intvl 1s, length 36
      17:24:03.141663 IP 192.168.1.12 > 224.0.0.18: VRRPv2, Advertisement, vrid 2, prio 100, authtype none, intvl 1s, length 36
      
      

      Any idea why this may be happening and what I can do besides a reboot of the master to get it to advertise again? We didn't observe any network outage during the CARP failover (which is good) but I'm now concerned if I lose the backup we'll lose connectivity. Ideally I'd like to stop this from happening at all if we undergo another load-test.

      1 Reply Last reply Reply Quote 0
      • E
        echoranger
        last edited by

        Doing some more digging it appears a watchdog timeout may have been the initial culprit:

        pfsense01:/var/log/system.log:

        Jun 10 13:28:05 pfsense01 kernel: vip36: MASTER -> BACKUP (more frequent advertisement received)
        Jun 10 13:28:05 pfsense01 kernel: vip36: link state changed to DOWN
        Jun 10 13:28:05 pfsense01 kernel: vip4: MASTER -> BACKUP (more frequent advertisement received)
        Jun 10 13:28:05 pfsense01 kernel: vip4: link state changed to DOWN
        Jun 10 13:28:05 pfsense01 kernel: vip3: MASTER -> BACKUP (more frequent advertisement received)
        Jun 10 13:28:05 pfsense01 kernel: vip3: link state changed to DOWN
        Jun 10 13:28:05 pfsense01 kernel: vip11: MASTER -> BACKUP (more frequent advertisement received)
        Jun 10 13:28:05 pfsense01 kernel: vip11: link state changed to DOWN
        Jun 10 13:28:05 pfsense01 kernel: vip2: MASTER -> BACKUP (more frequent advertisement received)
        Jun 10 13:28:05 pfsense01 kernel: vip2: link state changed to DOWN
        Jun 10 13:28:08 pfsense01 kernel: bge0: watchdog timeout -- resetting
        Jun 10 13:28:08 pfsense01 check_reload_status: Linkup starting bge0
        Jun 10 13:28:08 pfsense01 kernel: vip1: INIT -> BACKUP
        Jun 10 13:28:08 pfsense01 kernel: vip5: INIT -> BACKUP
        Jun 10 13:28:08 pfsense01 kernel: vip6: INIT -> BACKUP
        Jun 10 13:28:08 pfsense01 kernel: vip7: INIT -> BACKUP
        Jun 10 13:28:08 pfsense01 kernel: vip8: INIT -> BACKUP
        Jun 10 13:28:08 pfsense01 kernel: vip9: INIT -> BACKUP
        Jun 10 13:28:08 pfsense01 kernel: vip12: INIT -> BACKUP
        Jun 10 13:28:08 pfsense01 kernel: vip14: INIT -> BACKUP
        Jun 10 13:28:08 pfsense01 kernel: vip15: INIT -> BACKUP
        Jun 10 13:28:08 pfsense01 kernel: vip16: INIT -> BACKUP
        Jun 10 13:28:08 pfsense01 kernel: vip17: INIT -> BACKUP
        Jun 10 13:28:08 pfsense01 kernel: vip18: INIT -> BACKUP
        Jun 10 13:28:08 pfsense01 kernel: vip20: INIT -> BACKUP
        Jun 10 13:28:08 pfsense01 kernel: vip21: INIT -> BACKUP
        Jun 10 13:28:08 pfsense01 kernel: vip22: INIT -> BACKUP
        Jun 10 13:28:08 pfsense01 kernel: vip23: INIT -> BACKUP
        Jun 10 13:28:08 pfsense01 kernel: vip24: INIT -> BACKUP
        Jun 10 13:28:08 pfsense01 kernel: vip25: INIT -> BACKUP
        Jun 10 13:28:08 pfsense01 kernel: vip26: INIT -> BACKUP
        Jun 10 13:28:08 pfsense01 kernel: vip27: INIT -> BACKUP
        Jun 10 13:28:08 pfsense01 kernel: vip28: INIT -> BACKUP
        Jun 10 13:28:08 pfsense01 kernel: vip29: INIT -> BACKUP
        Jun 10 13:28:08 pfsense01 kernel: vip31: INIT -> BACKUP
        Jun 10 13:28:08 pfsense01 kernel: vip32: INIT -> BACKUP
        Jun 10 13:28:08 pfsense01 kernel: vip33: INIT -> BACKUP
        Jun 10 13:28:08 pfsense01 kernel: vip34: INIT -> BACKUP
        Jun 10 13:28:08 pfsense01 kernel: vip35: INIT -> BACKUP
        Jun 10 13:28:08 pfsense01 kernel: vip37: INIT -> BACKUP
        Jun 10 13:28:08 pfsense01 kernel: vip38: INIT -> BACKUP
        
        

        Any ideas? I know watchdog timeouts are about as nebulous as ever on FreeBSD but I'm hoping there's a sysctl or something that I can tweak to help prevent this. I've already verified hw.bge.allow_asf is set to 0:

        [2.0.3-RELEASE][admin@pfsense01]/var/log(38): sysctl hw.bge
        hw.bge.allow_asf: 0
        [2.0.3-RELEASE][admin@pfsense01]/var/log(35): sysctl -a | grep bge
        <5>bge1: link state changed to UP
        <5>bge1_vlan1: link state changed to UP
        <5>bge1_vlan100: link state changed to UP
        <5>bge1_vlan20: link state changed to UP
        <5>bge1_vlan10: link state changed to UP
        bge0: watchdog timeout -- resetting
        bge0: 2 link states coalesced
        <5>bge0: link state changed to UP
        bge1: watchdog timeout -- resetting
        <5>bge1: link state changed to DOWN
        <5>bge1_vlan1: link state changed to DOWN
        <5>bge1_vlan100: link state changed to DOWN
        <5>bge1_vlan20: link state changed to DOWN
        <5>bge1_vlan10: link state changed to DOWN
        <5>bge1: link state changed to UP
        <5>bge1_vlan1: link state changed to UP
        <5>bge1_vlan100: link state changed to UP
        <5>bge1_vlan20: link state changed to UP
        <5>bge1_vlan10: link state changed to UP
        bge0: watchdog timeout -- resetting
        bge0: 2 link states coalesced
        <5>bge0: link state changed to UP
        bge1: watchdog timeout -- resetting
        <5>bge1: link state changed to DOWN
        <5>bge1_vlan1: link state changed to DOWN
        <5>bge1_vlan100: link state changed to DOWN
        <5>bge1_vlan20: link state changed to DOWN
        <5>bge1_vlan10: link state changed to DOWN
        <5>bge1: link state changed to UP
        <5>bge1_vlan1: link state changed to UP
        <5>bge1_vlan100: link state changed to UP
        <5>bge1_vlan20: link state changed to UP
        <5>bge1_vlan10: link state changed to UP
        bge1: watchdog timeout -- resetting
        <5>bge1: link state changed to DOWN
        <5>bge1_vlan1: link state changed to DOWN
        <5>bge1_vlan100: link state changed to DOWN
        <5>bge1_vlan20: link state changed to DOWN
        <5>bge1_vlan10: link state changed to DOWN
        <5>bge1: link state changed to UP
        <5>bge1_vlan1: link state changed to UP
        <5>bge1_vlan100: link state changed to UP
        <5>bge1_vlan20: link state changed to UP
        <5>bge1_vlan10: link state changed to UP
        bge1: watchdog timeout -- resetting
        <5>bge1: link state changed to DOWN
        <5>bge1_vlan1: link state changed to DOWN
        <5>bge1_vlan100: link state changed to DOWN
        <5>bge1_vlan20: link state changed to DOWN
        <5>bge1_vlan10: link state changed to DOWN
        <5>bge1: link state changed to UP
        <5>bge1_vlan1: link state changed to UP
        <5>bge1_vlan100: link state changed to UP
        <5>bge1_vlan20: link state changed to UP
        <5>bge1_vlan10: link state changed to UP
        bge1: watchdog timeout -- resetting
        <5>bge1: link state changed to DOWN
        <5>bge1_vlan1: link state changed to DOWN
        <5>bge1_vlan100: link state changed to DOWN
        <5>bge1_vlan20: link state changed to DOWN
        <5>bge1_vlan10: link state changed to DOWN
        <5>bge1: link state changed to UP
        <5>bge1_vlan1: link state changed to UP
        <5>bge1_vlan100: link state changed to UP
        <5>bge1_vlan20: link state changed to UP
        <5>bge1_vlan10: link state changed to UP
        <6>arp: 192.168.1.64 moved from 1c:e6:2b:5f:e2:dd to 20:16:d8:5d:c2:34 on bge1_vlan1
        <6>arp: 192.168.1.64 moved from 20:16:d8:5d:c2:34 to 1c:e6:2b:5f:e2:dd on bge1_vlan1
        <6>arp: 192.168.1.38 moved from 74:2f:68:e9:0b:ba to 54:04:a6:4a:38:b5 on bge1_vlan1
        <6>arp: 192.168.1.78 moved from 8c:70:5a:7b:39:1c to c4:85:08:1b:3b:a4 on bge1_vlan1
        <6>arp: 192.168.1.78 moved from c4:85:08:1b:3b:a4 to 8c:70:5a:7b:39:1c on bge1_vlan1
        bge1: watchdog timeout -- resetting
        <5>bge1: link state changed to DOWN
        <5>bge1_vlan1: link state changed to DOWN
        <5>bge1_vlan100: link state changed to DOWN
        <5>bge1_vlan20: link state changed to DOWN
        <5>bge1_vlan10: link state changed to DOWN
        <5>bge1: link state changed to UP
        <5>bge1_vlan1: link state changed to UP
        <5>bge1_vlan100: link state changed to UP
        <5>bge1_vlan20: link state changed to UP
        <5>bge1_vlan10: link state changed to UP
        bge1: watchdog timeout -- resetting
        <5>bge1: link state changed to DOWN
        <5>bge1_vlan1: link state changed to DOWN
        <5>bge1_vlan100: link state changed to DOWN
        <5>bge1_vlan20: link state changed to DOWN
        <5>bge1_vlan10: link state changed to DOWN
        <5>bge1: link state changed to UP
        <5>bge1_vlan1: link state changed to UP
        <5>bge1_vlan100: link state changed to UP
        <5>bge1_vlan20: link state changed to UP
        <5>bge1_vlan10: link state changed to UP
        bge0: watchdog timeout -- resetting
        bge0: 2 link states coalesced
        <5>bge0: link state changed to UP
        <6>arp: 192.168.1.78 moved from 8c:70:5a:7b:39:1c to c4:85:08:1b:3b:a4 on bge1_vlan1
        <6>arp: 192.168.1.78 moved from c4:85:08:1b:3b:a4 to 8c:70:5a:7b:39:1c on bge1_vlan1
        bge0: watchdog timeout -- resetting
        bge0: 2 link states coalesced
        <5>bge0: link state changed to UP
        hw.bge.allow_asf: 0
        dev.bge.0.%desc: Broadcom NetXtreme Gigabit Ethernet Controller, ASIC rev. 0x002002
        dev.bge.0.%driver: bge
        dev.bge.0.%location: slot=0 function=0
        dev.bge.0.%pnpinfo: vendor=0x14e4 device=0x1648 subvendor=0x1028 subdevice=0x014a class=0x020000
        dev.bge.0.%parent: pci2
        dev.bge.0.forced_collapse: 0
        dev.bge.0.stats.FramesDroppedDueToFilters: 0
        dev.bge.0.stats.DmaWriteQueueFull: 0
        dev.bge.0.stats.DmaWriteHighPriQueueFull: 0
        dev.bge.0.stats.NoMoreRxBDs: 0
        dev.bge.0.stats.InputDiscards: 0
        dev.bge.0.stats.InputErrors: 1
        dev.bge.0.stats.RecvThresholdHit: 325594
        dev.bge.0.stats.DmaReadQueueFull: 505
        dev.bge.0.stats.DmaReadHighPriQueueFull: 0
        dev.bge.0.stats.SendDataCompQueueFull: 428
        dev.bge.0.stats.RingSetSendProdIndex: 250750
        dev.bge.0.stats.RingStatusUpdate: 467063
        dev.bge.0.stats.Interrupts: 467063
        dev.bge.0.stats.AvoidedInterrupts: 0
        dev.bge.0.stats.SendThresholdHit: 0
        dev.bge.0.stats.rx.Octets: 77286738
        dev.bge.0.stats.rx.Fragments: 0
        dev.bge.0.stats.rx.UcastPkts: 250315
        dev.bge.0.stats.rx.MulticastPkts: 324548
        dev.bge.0.stats.rx.FCSErrors: 1
        dev.bge.0.stats.rx.AlignmentErrors: 0
        dev.bge.0.stats.rx.xonPauseFramesReceived: 0
        dev.bge.0.stats.rx.xoffPauseFramesReceived: 0
        dev.bge.0.stats.rx.ControlFramesReceived: 0
        dev.bge.0.stats.rx.xoffStateEntered: 0
        dev.bge.0.stats.rx.FramesTooLong: 0
        dev.bge.0.stats.rx.Jabbers: 0
        dev.bge.0.stats.rx.UndersizePkts: 0
        dev.bge.0.stats.rx.inRangeLengthError: 0
        dev.bge.0.stats.rx.outRangeLengthError: 0
        dev.bge.0.stats.tx.Octets: 21715826
        dev.bge.0.stats.tx.Collisions: 0
        dev.bge.0.stats.tx.XonSent: 0
        dev.bge.0.stats.tx.XoffSent: 0
        dev.bge.0.stats.tx.flowControlDone: 0
        dev.bge.0.stats.tx.InternalMacTransmitErrors: 0
        dev.bge.0.stats.tx.SingleCollisionFrames: 0
        dev.bge.0.stats.tx.MultipleCollisionFrames: 0
        dev.bge.0.stats.tx.DeferredTransmissions: 0
        dev.bge.0.stats.tx.ExcessiveCollisions: 0
        dev.bge.0.stats.tx.LateCollisions: 0
        dev.bge.0.stats.tx.UcastPkts: 251179
        dev.bge.0.stats.tx.MulticastPkts: 88
        dev.bge.0.stats.tx.BroadcastPkts: 14
        dev.bge.0.stats.tx.CarrierSenseErrors: 0
        dev.bge.0.stats.tx.Discards: 0
        dev.bge.0.stats.tx.Errors: 0
        dev.bge.1.%desc: Broadcom NetXtreme Gigabit Ethernet Controller, ASIC rev. 0x002002
        dev.bge.1.%driver: bge
        dev.bge.1.%location: slot=0 function=1
        dev.bge.1.%pnpinfo: vendor=0x14e4 device=0x1648 subvendor=0x1028 subdevice=0x014a class=0x020000
        dev.bge.1.%parent: pci2
        dev.bge.1.forced_collapse: 0
        dev.bge.1.stats.FramesDroppedDueToFilters: 0
        dev.bge.1.stats.DmaWriteQueueFull: 288263
        dev.bge.1.stats.DmaWriteHighPriQueueFull: 0
        dev.bge.1.stats.NoMoreRxBDs: 0
        dev.bge.1.stats.InputDiscards: 5229
        dev.bge.1.stats.InputErrors: 647
        dev.bge.1.stats.RecvThresholdHit: 494125434
        dev.bge.1.stats.DmaReadQueueFull: 37978
        dev.bge.1.stats.DmaReadHighPriQueueFull: 2
        dev.bge.1.stats.SendDataCompQueueFull: 21
        dev.bge.1.stats.RingSetSendProdIndex: 860426035
        dev.bge.1.stats.RingStatusUpdate: 783311166
        dev.bge.1.stats.Interrupts: 783311166
        dev.bge.1.stats.AvoidedInterrupts: 0
        dev.bge.1.stats.SendThresholdHit: 0
        dev.bge.1.stats.rx.Octets: 1583585081
        dev.bge.1.stats.rx.Fragments: 73
        dev.bge.1.stats.rx.UcastPkts: 786142776
        dev.bge.1.stats.rx.MulticastPkts: 395883
        dev.bge.1.stats.rx.FCSErrors: 577
        dev.bge.1.stats.rx.AlignmentErrors: 0
        dev.bge.1.stats.rx.xonPauseFramesReceived: 3438
        dev.bge.1.stats.rx.xoffPauseFramesReceived: 147597
        dev.bge.1.stats.rx.ControlFramesReceived: 0
        dev.bge.1.stats.rx.xoffStateEntered: 0
        dev.bge.1.stats.rx.FramesTooLong: 0
        dev.bge.1.stats.rx.Jabbers: 0
        dev.bge.1.stats.rx.UndersizePkts: 0
        dev.bge.1.stats.rx.inRangeLengthError: 0
        dev.bge.1.stats.rx.outRangeLengthError: 0
        dev.bge.1.stats.tx.Octets: 181809985
        dev.bge.1.stats.tx.Collisions: 0
        dev.bge.1.stats.tx.XonSent: 0
        dev.bge.1.stats.tx.XoffSent: 0
        dev.bge.1.stats.tx.flowControlDone: 0
        dev.bge.1.stats.tx.InternalMacTransmitErrors: 0
        dev.bge.1.stats.tx.SingleCollisionFrames: 0
        dev.bge.1.stats.tx.MultipleCollisionFrames: 0
        dev.bge.1.stats.tx.DeferredTransmissions: 0
        dev.bge.1.stats.tx.ExcessiveCollisions: 0
        dev.bge.1.stats.tx.LateCollisions: 0
        dev.bge.1.stats.tx.UcastPkts: 857839350
        dev.bge.1.stats.tx.MulticastPkts: 2032515
        dev.bge.1.stats.tx.BroadcastPkts: 21415
        dev.bge.1.stats.tx.CarrierSenseErrors: 0
        dev.bge.1.stats.tx.Discards: 0
        dev.bge.1.stats.tx.Errors: 0
        dev.miibus.0.%parent: bge0
        dev.miibus.1.%parent: bge1
        
        
        1 Reply Last reply Reply Quote 0
        • First post
          Last post
        Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.