CARP on Switch ports without portfast leading to double master-master problems ?



  • Any ideas how this issue (which has come up several times in the OpenBSD mailing-lists) can be handled under FreeBSD/pfSense ?

    http://marc.info/?l=openbsd-misc&m=137414729621596&w=2
    List:      openbsd-misc
    Subject:    CARP on Switch ports without port fast leading to double master-master problems
    From:      Andy <andy () brandwatch ! com>
    Date:      2013-07-18 11:34:11
    Message-ID: 51E7D2B3.8050006 () brandwatch ! com

    Others have discussed our problem but I cannot see that this has been
    implemented (I cannot find a man page referring to this).
    http://openbsd.7691.n7.nabble.com/carp-init-delay-td226187.html

    I.e. When a firewall boots up, the connected switch port starts STP and
    is initially blocked, causing the newly booting firewall to think it is
    master, the port then starts forwarding and I have double master.

    This causes issues with other daemons too which monitor the CARP state
    like sasyncd, bgpd etc.

    I have enabled port fast where I can. However I cannot guarantee this
    everywhere, and the providers of the WAN connections to our data centre
    network do not want to enable port fast. This means I have to set a high
    advbase, but this ruins the response time.

    I could add "!sleep 5" to the top of carp interfaces as suggested in the
    link above but this really belongs in the kernel as this only helps with
    the firewall reboot condition and not all the other possible network
    state changes etc like the removal of a NIC and reconnection (which
    restarts STP etc).

    Has this been done? :)
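
To put rough numbers on the advbase trade-off described above, here is a small sketch. It rests on two assumptions that are not stated in the thread: classic 802.1D spanning tree with the default 15 s forward delay (so a port spends about 30 s in listening and learning before it forwards), and a CARP backup that declares the master down after roughly 3 * advbase + advskew/256 seconds without advertisements. Check both against your switch configuration and carp(4) before relying on the figures.

```python
# Assumed values, not taken from the thread.
STP_FORWARD_DELAY_S = 15                  # classic 802.1D default forward delay
STP_BLOCKED_S = 2 * STP_FORWARD_DELAY_S   # listening + learning states


def master_down_timeout(advbase: int, advskew: int = 0) -> float:
    """Approximate seconds without adverts before a backup promotes itself.

    Assumed formula: 3 * advbase + advskew / 256.
    """
    return 3 * advbase + advskew / 256


def min_advbase_to_outlast(blocked_s: float, advskew: int = 0) -> int:
    """Smallest advbase whose master-down timeout exceeds the blocked period."""
    advbase = 1
    while master_down_timeout(advbase, advskew) <= blocked_s:
        advbase += 1
    return advbase


needed = min_advbase_to_outlast(STP_BLOCKED_S)
print(f"port blocked for ~{STP_BLOCKED_S} s")
print(f"advbase needed to wait it out: {needed}")
print(f"failover detection at advbase 1: ~{master_down_timeout(1):.0f} s")
print(f"failover detection at advbase {needed}: ~{master_down_timeout(needed):.0f} s")
```

Under these assumptions advbase has to rise from 1 to 11, which stretches failure detection from about 3 s to about 33 s. That is the response-time cost of working around a blocked port with advbase alone.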

    http://marc.info/?l=openbsd-misc&m=137459269317077&w=2
    List:      openbsd-misc
    Subject:    Re: CARP on Switch ports without port fast leading to double master-master problems
    From:      Andy <andy () brandwatch ! com>
    Date:      2013-07-23 15:17:33
    Message-ID: 51EE9E8D.7050608 () brandwatch ! com

    Fantastic,

    Thanks Stuart, That was really helpful!

    Without even knowing it, your thoughts (suggesting manipulating
    carpdemote) have also just helped me to resolve /another/ CARP issue I
    have been battling with when using a direct crossover cable between the
    firewalls.

    Same issue as:
    http://old.nabble.com/Unexpected-carp-failovers-when-using-crossover-cable-as-pfsync-syncdev-in-5.1-p33921868.html

    When the backup is rebooted, pfsync interface goes down, which causes
    carpdemote to increment on the primary;
    stfw1 kernel: carp: pfsync0 demoted group carp by 1 to 1 (pfsync link state down)
    stfw1 kernel: carp: pfsync0 demoted group pfsync by 1 to 1 (pfsync link state down)

    When the backup is rebooting the pfsync interface goes up and down a few
    times during POST'ing and NIC BIOS etc, before OpenBSD starts to load.
    This seems to cause the Primary to start the process of attempting a
    bulk update 'carp interlock' before the backup is ready.

    When the backup finally comes up it requests a bulk update, even though
    (I think) the primary is still attempting a bulk update in the opposite
    direction with the CARP interlock in place. That request fails, and the
    backup goes master because the Primary has carpdemote=1 while the backup
    has carpdemote=0, hence multiple masters.

    On the Primary we saw;
    carp: pfsync0 demoted group carp by 1 to 1 (pfsync link state down)
    carp: pfsync0 demoted group pfsync by 1 to 1 (pfsync link state down)
    carp0: state transition: MASTER -> BACKUP                <- Due to multi-master!
    carp1: state transition: MASTER -> BACKUP                <- Due to multi-master!
    carp: pfsync0 demoted group carp by -1 to 0 (pfsync link state up)
    carp: pfsync0 demoted group pfsync by -1 to 0 (pfsync link state up)
    carp0: state transition: BACKUP -> MASTER                <- Later corrects itself
    carp1: state transition: BACKUP -> MASTER                <- Later corrects itself

    We can see the Primary firewall had to quickly drop to 'backup', as the
    secondary firewall made itself master.
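
Why a difference of one in carpdemote is enough to flip the roles can be sketched with a toy model. This is a simplified reading of the CARP election rules, not the kernel's actual logic: it assumes preemption is enabled, that a master advertising a higher demotion counter is always taken over, and that ties are broken by advertisement interval. The `Node` class, the host name `stfw2` and the advskew of 100 are hypothetical; only `stfw1` and the carpdemote values come from the messages above.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    demote: int    # carpdemote counter this node advertises
    advbase: int
    advskew: int

    def interval(self) -> float:
        """Advertisement interval in seconds."""
        return self.advbase + self.advskew / 256


def backup_takes_over(backup: Node, master: Node, preempt: bool = True) -> bool:
    """Simplified rule: would `backup` push aside the current `master`?"""
    # A master advertising a worse (higher) demotion counter is taken over.
    if master.demote > backup.demote:
        return True
    # With preemption, a faster advertiser wins if its counter is not worse.
    if preempt and backup.interval() < master.interval():
        return master.demote >= backup.demote
    return False


primary = Node("stfw1", demote=1, advbase=1, advskew=0)      # pfsync link down
secondary = Node("stfw2", demote=0, advbase=1, advskew=100)  # hypothetical peer

# 1. Secondary boots with counter 0 while the primary is still demoted.
step1 = backup_takes_over(secondary, primary)

# 2. pfsync link comes back and the primary's counter returns to 0.
primary.demote = 0
step2 = backup_takes_over(primary, secondary)

# 3. With equal counters the slower-advertising secondary stays backup.
step3 = backup_takes_over(secondary, primary)

print(step1, step2, step3)
```

In this model the secondary takes over in step 1, the primary reclaims master in step 2 once its counter drops back to 0, and the secondary then stays backup, which matches the MASTER -> BACKUP -> MASTER sequence logged on the primary.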

    On the secondary we saw;
    carp: carp1 demoted group carp by 1 to 149 (carpdev)
    carp: pfsync0 demoted group carp by 32 to 181 (pfsync init)
    carp: pfsync0 demoted group pfsync by 32 to 32 (pfsync init)
    carp: pfsync0 demoted group carp by 1 to 182 (pfsync bulk start)
    carp: pfsync0 demoted group pfsync by 1 to 33 (pfsync bulk start)
    carp: carp1 demoted group carp by -1 to 181 (carpdev)
    carp: pfsync0 demoted group carp by -1 to 180 (pfsync bulk done)
    carp: pfsync0 demoted group pfsync by -1 to 32 (pfsync bulk done)
    carp: pfsync0 demoted group carp by -32 to 148 (pfsync init)
    carp: pfsync0 demoted group pfsync by -32 to 0 (pfsync init)
    carp0: state transition: BACKUP -> MASTER
    carp1: state transition: BACKUP -> MASTER
    carp0: state transition: MASTER -> BACKUP
    carp1: state transition: MASTER -> BACKUP
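
The counter arithmetic in the secondary's log can be replayed with a short script. This is only a sketch: the regular expression is inferred from the log lines quoted above rather than from any documented format, and `track_demotion` is a hypothetical helper name.

```python
import re

# Matches lines such as:
#   carp: pfsync0 demoted group carp by 32 to 181 (pfsync init)
DEMOTE_RE = re.compile(
    r"carp: (?P<src>\S+) demoted group (?P<group>\S+) "
    r"by (?P<delta>-?\d+) to (?P<total>\d+) \((?P<reason>[^)]*)\)"
)


def track_demotion(lines):
    """Return the last reported counter per group.

    Raises ValueError if a line's delta does not match the previous total.
    """
    totals = {}
    for line in lines:
        m = DEMOTE_RE.search(line)
        if m is None:
            continue  # not a demotion line, e.g. a state transition
        group = m["group"]
        delta = int(m["delta"])
        total = int(m["total"])
        prev = totals.get(group)
        if prev is not None and prev + delta != total:
            raise ValueError(f"inconsistent counter in: {line!r}")
        totals[group] = total
    return totals


# Log lines copied from the secondary firewall's output above.
SECONDARY_LOG = """\
carp: carp1 demoted group carp by 1 to 149 (carpdev)
carp: pfsync0 demoted group carp by 32 to 181 (pfsync init)
carp: pfsync0 demoted group pfsync by 32 to 32 (pfsync init)
carp: pfsync0 demoted group carp by 1 to 182 (pfsync bulk start)
carp: pfsync0 demoted group pfsync by 1 to 33 (pfsync bulk start)
carp: carp1 demoted group carp by -1 to 181 (carpdev)
carp: pfsync0 demoted group carp by -1 to 180 (pfsync bulk done)
carp: pfsync0 demoted group pfsync by -1 to 32 (pfsync bulk done)
carp: pfsync0 demoted group carp by -32 to 148 (pfsync init)
carp: pfsync0 demoted group pfsync by -32 to 0 (pfsync init)
carp0: state transition: BACKUP -> MASTER
""".splitlines()

final = track_demotion(SECONDARY_LOG)
print(final)
```

Every delta in the quoted log is consistent with the running total, and the state transition to MASTER follows immediately after the last "pfsync init" demotion is released.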

    This was fixed by adding;
    !ifconfig -g carp carpdemote 1
    !ifconfig -g pfsync carpdemote 1

    To each physical interface 'hostname.if', and then adding

    sleep 120
    ifconfig -g carp -carpdemote 3
    ifconfig -g pfsync -carpdemote 3

    NB; There are 3 physical interfaces (INT, EXT, and PFSYNC's physical
    interface).

    Completely stabilises a flapping pfsync interface during reboots :)

    Cheers, Andy.

    On 22/07/13 22:26, Stuart Henderson wrote:

    On 2013-07-22, Andy <andy@brandwatch.com> wrote:

    For example we are connected to various providers in various
    locations (we have many OpenBSD firewalls and this is only a problem in
    some locations) where they won't enable port fast/configure as static
    access ports.

    I would think this is the minority, and that most places are either on switches
    not smart enough for STP, or where the admins can configure them appropriately
    for the connected devices, in either case the extra delay would be unwanted..
    (and how long would you delay for anyway? it depends on switch configuration).

    BTW an alternative to "sleep" in the network scripts would be to use
    "!ifconfig -g carp carpdemote" in a hostname.if file, then in rc.local
    maybe a sleep and then "ifconfig -g carp -carpdemote".. However neither of
    these account for the situation where you lose and re-gain link after boot.