Netgate Discussion Forum

    Primary node is not willing to become master. Keeps falling back to backup state

    HA/CARP/VIPs
      2wheels

      I have two pfSense 2.2.6 boxes, each with an IP address on the LAN interface, and a VIP of type CARP defined.
      The backup node (10.0.0.3) works as expected: it stays in the backup state.
      The primary node (10.0.0.2), however, is not willing to become master of the VIP (10.0.0.1).

      In system.log I find the following messages:

      Feb  5 15:49:12 pfsense-box1 kernel: carp: VHID 3@vmx1: BACKUP -> MASTER (master down)
      Feb  5 15:49:12 pfsense-box1 kernel: carp: VHID 3@vmx1: MASTER -> BACKUP (more frequent advertisement received)
      Feb  5 15:49:12 pfsense-box1 check_reload_status: Carp master event
      Feb  5 15:49:12 pfsense-box1 check_reload_status: Carp backup event
      Feb  5 15:49:13 pfsense-box1 php-fpm[81635]: /rc.carpmaster: Carp cluster member "10.0.0.1 -  (3@vmx1)" has resumed the state "MASTER" for vhid 3@vmx1
      Feb  5 15:49:13 pfsense-box1 php-fpm[81635]: /rc.carpbackup: Carp cluster member "10.0.0.1 -  (3@vmx1)" has resumed the state "BACKUP" for vhid 3@vmx1
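When this pattern repeats, it can help to scan the whole log for promotions that are undone almost immediately. The sketch below is a minimal, hypothetical helper (`carp_transitions` and `find_flaps` are illustrative names, not pfSense tools). It assumes the kernel lines look exactly like the ones above (`kernel: carp: VHID n@if: OLD -> NEW (reason)`) and, because syslog timestamps only have one-second resolution, treats "same timestamp" as "immediately reverted".

```python
import re

# Matches kernel CARP transition lines such as:
# "Feb  5 15:49:12 pfsense-box1 kernel: carp: VHID 3@vmx1: BACKUP -> MASTER (master down)"
# Assumption: the log format matches the sample lines in this post.
CARP_RE = re.compile(
    r"(?P<ts>\w{3}\s+\d+\s+[\d:]{8}) \S+ kernel: carp: "
    r"VHID (?P<vhid>\d+)@(?P<ifname>\S+): "
    r"(?P<old>\w+) -> (?P<new>\w+) \((?P<reason>[^)]*)\)"
)


def carp_transitions(lines):
    """Yield (timestamp, 'vhid@if', old_state, new_state, reason) per CARP transition."""
    for line in lines:
        m = CARP_RE.search(line)
        if m:
            yield (m["ts"], f"{m['vhid']}@{m['ifname']}",
                   m["old"], m["new"], m["reason"])


def find_flaps(lines):
    """Return promotions to MASTER that were reverted to BACKUP for the same
    VHID within the same syslog second (one-second timestamp resolution)."""
    events = list(carp_transitions(lines))
    flaps = []
    for prev, cur in zip(events, events[1:]):
        if (prev[0] == cur[0] and prev[1] == cur[1]
                and prev[3] == "MASTER" and cur[3] == "BACKUP"):
            flaps.append((cur[0], cur[1], cur[4]))
    return flaps


# Usage with lines copied from the log above.
sample = [
    "Feb  5 15:49:12 pfsense-box1 kernel: carp: VHID 3@vmx1: BACKUP -> MASTER (master down)",
    "Feb  5 15:49:12 pfsense-box1 kernel: carp: VHID 3@vmx1: MASTER -> BACKUP (more frequent advertisement received)",
    "Feb  5 15:49:12 pfsense-box1 check_reload_status: Carp master event",
]
for ts, vhid, reason in find_flaps(sample):
    print(f"{ts} {vhid}: MASTER lost in the same second ({reason})")
```

Non-kernel lines (such as the `check_reload_status` and `php-fpm` messages) are simply skipped, so the whole system.log can be fed in as-is.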
      

      From what I understand from this log, the node detects that the master is down and promotes itself to master.
      But within the same second it logs "more frequent advertisement received", which sends it straight back to the backup state.
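The "more frequent advertisement received" message comes from CARP's election rule. Per the carp(4) man page, each node advertises every `advbase + advskew/256` seconds, and the node with the shorter interval is preferred. The sketch below is a simplified model of that comparison, not the kernel code. One assumption worth labeling: my understanding is that on FreeBSD 10 (which pfSense 2.2 is built on) the `net.inet.carp.demotion` sysctl value is added to the advertised skew, so a primary with a non-zero demotion counter can lose to its backup. The advskew and demotion values used below are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CarpAdvert:
    """Timing fields of one CARP advertisement (simplified model, not kernel code)."""
    advbase: int       # seconds; carp(4) allows 1..255
    advskew: int       # carp(4) allows 0..254; lower skew = preferred master
    demotion: int = 0  # assumption: FreeBSD 10+ adds net.inet.carp.demotion to the sent skew

    def interval(self) -> float:
        # carp(4): advertisement interval = advbase + advskew / 256 seconds
        return self.advbase + (self.advskew + self.demotion) / 256


def master_steps_down(mine: CarpAdvert, received: CarpAdvert) -> bool:
    """In this model a MASTER yields only to a strictly more frequent advert."""
    return received.interval() < mine.interval()


primary = CarpAdvert(advbase=1, advskew=0)    # illustrative values
backup = CarpAdvert(advbase=1, advskew=100)   # illustrative values
print(master_steps_down(primary, backup))     # → False (backup advertises less often)

demoted = CarpAdvert(advbase=1, advskew=0, demotion=240)  # hypothetical demotion level
print(master_steps_down(demoted, backup))     # → True (demotion pushes primary past backup)
```

In this model an advertisement with an identical interval does not trigger a step-down; only a strictly shorter one does. On the box itself, `sysctl net.inet.carp.demotion` shows the current demotion level.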

      I have already isolated both nodes, so the LAN interfaces of the two pfSense boxes are the only remaining members of that LAN.
      Both boxes run as virtual machines on a VMware server, and the options "Promiscuous Mode", "MAC Address Changes" and "Forged Transmits" are set to "Accept".
      In a tcpdump capture I can see the multicast advertisement from the primary node arriving at the secondary node.
      The trace might hold some clues to the issue, and I am hoping someone will have an "aha" moment on this…
      What I see in the trace:

      | Source | Destination | Protocol | Info |
      |---|---|---|---|
      | 10.0.0.2 | 224.0.0.18 | VRRP | Announcement (Current master has stopped participating in VRRP) |
      | Vmware_mac_prim | Broadcast | ARP | Gratuitous ARP for 10.0.0.1 (Request) |
      | VMware_mac_back | IETF-VRRP-VRID_03 | ARP | Gratuitous ARP for 10.0.0.1 (Reply) (duplicate use of 10.0.0.1 detected!) |

      As you can see, there is a VRRP announcement from the primary box (10.0.0.2).
      Then the primary box sends a gratuitous ARP request for 10.0.0.1, which the backup node answers with an ARP reply; the trace flags this as a duplicate use of the VIP 10.0.0.1. Maybe this is the root cause of the issue… but the question is "why".
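The "duplicate use … detected!" note is an annotation added by the packet analyzer, not a field carried in the ARP reply: the analyzer flags it when it sees the same IP address associated with more than one MAC address in the capture. A minimal sketch of that kind of check, assuming (IP, MAC) pairs have been exported from the capture; `duplicate_ip_claims` is a hypothetical helper, and the MAC names are the placeholders from the trace above. In this model, a hit means more than one node answered for the VIP at that moment.

```python
def duplicate_ip_claims(arp_frames):
    """arp_frames: iterable of (claimed_ip, sender_mac) pairs taken from ARP packets.
    Returns {ip: [mac, ...]} for every IP claimed by more than one MAC,
    keeping the MACs in the order they were first seen."""
    seen = {}
    for ip, mac in arp_frames:
        macs = seen.setdefault(ip, [])
        if mac not in macs:
            macs.append(mac)
    return {ip: macs for ip, macs in seen.items() if len(macs) > 1}


trace = [
    ("10.0.0.1", "Vmware_mac_prim"),  # gratuitous ARP request from the primary
    ("10.0.0.1", "VMware_mac_back"),  # ARP reply from the backup
]
print(duplicate_ip_claims(trace))
# → {'10.0.0.1': ['Vmware_mac_prim', 'VMware_mac_back']}
```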

      So the main question is: how do I get the primary to become master of the VIP?

        Gloom

        A couple of questions.
        Is the VM host load-balancing over multiple NICs? If so, make sure it is set to IP hash, with the physical switch configured accordingly.
        Have you created separate port groups on the virtual switch, with promiscuous mode enabled only on the group that carries the VRRP traffic? Port groups are probably the way forward.

        CARP, VRRP, etc. are notoriously idiosyncratic on VMware.

        Never underestimate the power of human stupidity

        Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.