Both CARP IPs in BACKUP state after failback



  • Hi,

    hopefully the last of my CARP issues after the upgrade to 2.0-BETA4 (i386) built on Mon Dec 20 22:18:43 EST 2010:

    After solving the problem with CARP VIPs not leaving the INIT state (see http://forum.pfsense.org/index.php/topic,31352.msg161989.html), I now have the problem that sometimes, after a failback from slave to master, some VIPs remain in BACKUP state on both nodes.
    This happens consistently for one of my CARP IPs and has now occurred with a different one as well.
    All VIPs are on VLANs defined on the same LAGG interface.

    I double-checked /cf/conf/config.xml on both nodes; the vhid password is identical.

    Is there a way to force failover/failback for a CARP IP on the console?

    cheers

    Martin


    ifconfig output (see vip6 and vip12):

    On master:
    vip1: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
            inet 82.149.225.102 netmask 0xfffffff0
            carp: MASTER vhid 1 advbase 1 advskew 0
    vip2: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
            inet 172.16.16.62 netmask 0xffffffc0
            carp: MASTER vhid 2 advbase 1 advskew 0
    vip3: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
            inet 172.16.16.126 netmask 0xffffffc0
            carp: MASTER vhid 3 advbase 1 advskew 0
    vip4: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
            inet 172.16.20.250 netmask 0xffffff00
            carp: MASTER vhid 4 advbase 1 advskew 0
    vip5: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
            inet 172.16.21.250 netmask 0xffffff00
            carp: MASTER vhid 5 advbase 1 advskew 0
    vip6: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
            inet 172.16.22.250 netmask 0xffffff00
            carp: BACKUP vhid 6 advbase 1 advskew 0
    vip7: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
            inet 172.16.24.250 netmask 0xffffff00
            carp: MASTER vhid 7 advbase 1 advskew 0
    vip8: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
            inet 172.16.25.250 netmask 0xffffff00
            carp: MASTER vhid 8 advbase 1 advskew 0
    vip9: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
            inet 172.16.26.250 netmask 0xffffff00
            carp: MASTER vhid 9 advbase 1 advskew 0
    vip10: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
            inet 172.16.27.250 netmask 0xffffff00
            carp: MASTER vhid 10 advbase 1 advskew 0
    vip11: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
            inet 172.16.28.250 netmask 0xffffff00
            carp: MASTER vhid 11 advbase 1 advskew 0
    vip12: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
            inet 172.16.29.250 netmask 0xffffff00
            carp: BACKUP vhid 12 advbase 1 advskew 0

    On slave:

    vip1: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
            inet 82.149.225.102 netmask 0xfffffff0
            carp: BACKUP vhid 1 advbase 2 advskew 100
    vip2: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
            inet 172.16.16.62 netmask 0xffffffc0
            carp: BACKUP vhid 2 advbase 2 advskew 100
    vip3: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
            inet 172.16.16.126 netmask 0xffffffc0
            carp: BACKUP vhid 3 advbase 2 advskew 100
    vip4: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
            inet 172.16.20.250 netmask 0xffffff00
            carp: BACKUP vhid 4 advbase 2 advskew 100
    vip5: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
            inet 172.16.21.250 netmask 0xffffff00
            carp: BACKUP vhid 5 advbase 2 advskew 100
    vip6: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
            inet 172.16.22.250 netmask 0xffffff00
            carp: BACKUP vhid 6 advbase 2 advskew 100
    vip7: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
            inet 172.16.24.250 netmask 0xffffff00
            carp: BACKUP vhid 7 advbase 2 advskew 100
    vip8: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
            inet 172.16.25.250 netmask 0xffffff00
            carp: BACKUP vhid 8 advbase 2 advskew 100
    vip9: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
            inet 172.16.26.250 netmask 0xffffff00
            carp: BACKUP vhid 9 advbase 2 advskew 100
    vip10: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
            inet 172.16.27.250 netmask 0xffffff00
            carp: BACKUP vhid 10 advbase 2 advskew 100
    vip11: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
            inet 172.16.28.250 netmask 0xffffff00
            carp: BACKUP vhid 11 advbase 2 advskew 100
    vip12: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
            inet 172.16.29.250 netmask 0xffffff00
            carp: BACKUP vhid 12 advbase 2 advskew 100
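
With a dozen VIPs per node, the affected ones are easy to miss by eye. Below is a minimal Python sketch that compares the two ifconfig dumps and lists the VIPs that are BACKUP on both nodes. The function names and the shortened sample strings are made up for illustration; only the line format is taken from the output above.

```python
import re

def carp_states(ifconfig_text):
    """Map interface name -> CARP state, parsed from ifconfig-style text."""
    states = {}
    current = None
    for line in ifconfig_text.splitlines():
        header = re.match(r"\s*(\w+): flags=", line)
        if header:
            current = header.group(1)  # e.g. "vip6"
            continue
        carp = re.match(r"\s*carp: (\w+) vhid \d+", line)
        if carp and current:
            states[current] = carp.group(1)  # e.g. "MASTER" or "BACKUP"
    return states

def stuck_in_backup(master_text, slave_text):
    """Return VIP names that report BACKUP on both nodes, in input order."""
    master = carp_states(master_text)
    slave = carp_states(slave_text)
    return [
        name for name in master
        if master[name] == "BACKUP" and slave.get(name) == "BACKUP"
    ]

# Shortened samples (hypothetical excerpt of the output above).
MASTER_SAMPLE = """\
vip5: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
        inet 172.16.21.250 netmask 0xffffff00
        carp: MASTER vhid 5 advbase 1 advskew 0
vip6: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
        inet 172.16.22.250 netmask 0xffffff00
        carp: BACKUP vhid 6 advbase 1 advskew 0
vip12: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
        inet 172.16.29.250 netmask 0xffffff00
        carp: BACKUP vhid 12 advbase 1 advskew 0
"""

SLAVE_SAMPLE = """\
vip5: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
        inet 172.16.21.250 netmask 0xffffff00
        carp: BACKUP vhid 5 advbase 2 advskew 100
vip6: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
        inet 172.16.22.250 netmask 0xffffff00
        carp: BACKUP vhid 6 advbase 2 advskew 100
vip12: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
        inet 172.16.29.250 netmask 0xffffff00
        carp: BACKUP vhid 12 advbase 2 advskew 100
"""

if __name__ == "__main__":
    print(stuck_in_backup(MASTER_SAMPLE, SLAVE_SAMPLE))
```

Feeding it the full output from both nodes instead of the samples should report the same two interfaces, vip6 and vip12.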



  • Hmm, the logs on the master say (repeatedly):

    Dec 22 14:06:01 kernel: vip6: MASTER -> BACKUP (more frequent advertisement received)
    Dec 22 14:06:01 kernel: vip6: 2 link states coalesced
    Dec 22 14:06:01 kernel: vip6: link state changed to DOWN
    Dec 22 14:06:01 kernel: vip12: MASTER -> BACKUP (more frequent advertisement received)
    Dec 22 14:06:01 kernel: vip12: 2 link states coalesced
    Dec 22 14:06:01 kernel: vip12: link state changed to DOWN

    This would explain it, but I do not understand the reason for it. All other CARP VIPs are fine; only these two have problems.

    cheers

    Martin
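
A rough way to read the "more frequent advertisement received" message: as far as carp(4) documents it, advskew is measured in 1/256 of a second, so a node advertises roughly every advbase + advskew/256 seconds. The sketch below plugs in the values from the ifconfig output earlier in the thread. The helper name is made up, and the kernel's exact comparison may differ in detail.

```python
def advert_interval(advbase, advskew):
    """Approximate seconds between CARP advertisements.

    Assumes the carp(4) convention that advskew is in 1/256 s units.
    """
    return advbase + advskew / 256.0

# Values taken from the ifconfig output in this thread.
master = advert_interval(advbase=1, advskew=0)    # 1.0 s
slave = advert_interval(advbase=2, advskew=100)   # 2.390625 s

if __name__ == "__main__":
    print(f"master advertises every {master} s")
    print(f"slave advertises every {slave} s")
    print("slave is more frequent than master:", slave < master)
```

With these settings the slave advertises less often than the master, so a correctly configured slave should not be the source of a "more frequent" advertisement. That points at something else on the segment delivering advertisements to the master, which fits the Layer 2 loop found at the end of the thread.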



  • From what I can tell, that means the VLANs are going up and down.
    You can check whether there are interface errors for those specific VIPs.

    Also, if you can, create only one CARP VIP and add all the other IPs as IP aliases on that VIP.



  • Sorry, false alarm. Problem was caused by braindead switches which forgot that they had been interconnected with two bonded LACP links, causing a Layer 2 loop.

    M.

