MLAG switch reboot freaks out LACP & CARP
-
I have an HA FW pair connected via LACP LAG in fast mode to a pair of EdgeCore switches in an MCLAG/MLAG configuration.
If I pull FWA cable ix0 or ix1, LACP sees the link go down and CARP also notices and logs some info, but FWB does not become CARP master because the port channel does not go down. This is the behavior I expect. If I cold boot switch ECA (simulating a HW or power failure), one port goes down right away and is logged by pfSense. ~15 seconds later the other port stops distributing, the lagg goes down, and FWB becomes the master for all the CARPs even though ix1 never goes down.
Lagg comes up without issue and is stable until ECA is rebooted. Also odd that I can reboot ECB and FWA is not impacted other than seeing ix0 transitions down/up.
I have another piece of gear (HA controller storage) on another port channel which does not react and fail over. I'm still trying to figure out whether things recover before it would initiate a failover. I suspect pfSense has an issue, but I'm not fluent at this level and not sure of this. I ran tcpdump on ix1 and loaded the capture into Wireshark, but I can't make heads or tails of it.
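For reading the ix1 capture, it may help to look at the Actor State and Partner State octets in each LACPDU, since those carry the Collecting/Distributing flags that the kernel messages refer to. The sketch below decodes such an octet using the bit layout from IEEE 802.1AX. The function name and the example values are illustrative only and are not taken from the capture.

```python
# Bit positions of the LACP state octet, lowest bit first (IEEE 802.1AX).
LACP_STATE_BITS = [
    "Activity",         # bit 0: 1 = active LACP, 0 = passive
    "Timeout",          # bit 1: 1 = short (fast) timeout, 0 = long
    "Aggregation",      # bit 2: link may be aggregated
    "Synchronization",  # bit 3: port is in sync with its aggregator
    "Collecting",       # bit 4: port accepts incoming frames
    "Distributing",     # bit 5: port sends outgoing frames
    "Defaulted",        # bit 6: using default partner info (no LACPDU received)
    "Expired",          # bit 7: receive state machine has expired
]


def decode_lacp_state(state):
    """Return the names of the flags set in a LACP state octet."""
    return [name for bit, name in enumerate(LACP_STATE_BITS) if state & (1 << bit)]


if __name__ == "__main__":
    # Hypothetical example values, not read from the actual capture.
    for value in (0x3F, 0xC7):
        print(hex(value), decode_lacp_state(value))
```

A port that is healthy in fast mode would normally show both Collecting and Distributing set on both sides. Frames where the switch side drops Synchronization, or where the firewall side shows Defaulted or Expired, are probably the ones worth lining up against the log timestamps.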
FWA---------FWB
| \ / |
| (ix1) / |
| \ / |
(ix0) |
| / \ |
| / \ |
ECA----------ECB

Trying to determine if there is a switch issue or something to tune on the firewalls. Here is some debug output from when I cold boot the ECA switch. Right away FWA sees something happening on the port channel. ~15 secs later the second interface reacts, the lagg goes down, and CARP master moves to FWB.
Latest release on NetGate/Supermicro.
Super Micro 1541
Version 23.05.1-RELEASE (amd64)
built on Wed Jun 28 03:57:27 UTC 2023
FreeBSD 14.0-CURRENT

Anyone know if it's pfSense, and if so how to fix or debug this?
I was really hoping to keep LACP for the extra BW as opposed to going with Active/Passive Failover mode.
Jul 25 12:55:25 FWA kernel: ix0: Interface stopped DISTRIBUTING, possible flapping
Jul 25 12:55:40 FWA kernel: ix1: Interface stopped DISTRIBUTING, possible flapping
Jul 25 12:55:40 FWA kernel: lagg0: link state changed to DOWN
Jul 25 12:55:40 FWA kernel: carp: 24@lagg0.51: MASTER -> INIT (hardware interface down)
Jul 25 12:55:40 FWA kernel: carp: demoted by 240 to 240 (interface down)
Jul 25 12:55:40 FWA kernel: carp: 25@lagg0.51: MASTER -> INIT (hardware interface down)
Jul 25 12:55:40 FWA kernel: carp: demoted by 240 to 480 (interface down)
Jul 25 12:55:40 FWA kernel: lagg0.51: link state changed to DOWN
Jul 25 12:55:40 FWA kernel: carp: 71@lagg0.71: MASTER -> INIT (hardware interface down)
Jul 25 12:55:40 FWA kernel: carp: demoted by 240 to 720 (interface down)
Jul 25 12:55:40 FWA kernel: lagg0.71: link state changed to DOWN
Jul 25 12:55:40 FWA kernel: carp: 97@lagg0.11: MASTER -> INIT (hardware interface down)
Jul 25 12:55:40 FWA kernel: carp: demoted by 240 to 960 (interface down)
Jul 25 12:55:40 FWA kernel: lagg0.11: link state changed to DOWN
Jul 25 12:55:40 FWA kernel: carp: 68@igb4: MASTER -> BACKUP (more frequent advertisement received)
Jul 25 12:55:40 FWA kernel: carp: 208@igb5: MASTER -> BACKUP (more frequent advertisement received)
Jul 25 12:55:40 FWA check_reload_status[831]: Linkup starting lagg0
Jul 25 12:55:40 FWA check_reload_status[831]: Carp backup event
Jul 25 12:55:40 FWA check_reload_status[831]: Carp backup event
Jul 25 12:55:40 FWA check_reload_status[831]: Linkup starting lagg0.51
Jul 25 12:55:40 FWA check_reload_status[831]: Carp backup event
Jul 25 12:55:40 FWA check_reload_status[831]: Linkup starting lagg0.71
Jul 25 12:55:40 FWA check_reload_status[831]: Carp backup event
Jul 25 12:55:40 FWA check_reload_status[831]: Linkup starting lagg0.11
Jul 25 12:55:40 FWA check_reload_status[831]: Carp backup event
Jul 25 12:55:40 FWA check_reload_status[831]: Carp backup event
Jul 25 12:55:41 FWA php-fpm[84838]: /rc.linkup: Hotplug event detected for D1(opt1) static IP address (4: a.b.c.2)
Jul 25 12:55:41 FWA check_reload_status[831]: Reloading filter
Jul 25 12:55:41 FWA check_reload_status[831]: Reloading filter
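
To compare runs (for example rebooting ECA versus ECB), a small script can pull the timing out of the kernel lines above. This is only a sketch: it assumes the syslog format shown in this post, assumes the year is 2023 (syslog lines carry no year), and the function names are made up for illustration.

```python
import re
from datetime import datetime

# Matches kernel lines such as:
#   Jul 25 12:55:25 FWA kernel: ix0: Interface stopped DISTRIBUTING, possible flapping
LINE_RE = re.compile(r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2}) (\S+) kernel: (\S+): (.*)$")

SAMPLE = [
    "Jul 25 12:55:25 FWA kernel: ix0: Interface stopped DISTRIBUTING, possible flapping",
    "Jul 25 12:55:40 FWA kernel: ix1: Interface stopped DISTRIBUTING, possible flapping",
    "Jul 25 12:55:40 FWA kernel: lagg0: link state changed to DOWN",
]


def parse_events(lines, year=2023):
    """Return (timestamp, source, message) tuples for kernel log lines."""
    events = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip non-kernel lines (php-fpm, check_reload_status, ...)
        stamp, _host, source, message = match.groups()
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        events.append((when, source, message))
    return events


def seconds_until_lagg_down(events):
    """Seconds from the first 'stopped DISTRIBUTING' to the first lagg DOWN."""
    first_loss = min(
        when for when, _src, msg in events if "stopped DISTRIBUTING" in msg
    )
    lagg_down = min(
        when
        for when, src, msg in events
        if src.startswith("lagg") and "link state changed to DOWN" in msg
    )
    return (lagg_down - first_loss).total_seconds()


if __name__ == "__main__":
    print(seconds_until_lagg_down(parse_events(SAMPLE)))
```

For the three sample lines this reports a 15 second gap between ix0 leaving the bundle and lagg0 going down, which matches the ~15 secs described above.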