Issues with LACP after upgrade
-
Last Friday I upgraded our firewall to 2.4.5-RELEASE-p1 and cleaned out the dust. Since the upgrade, the LACP link to our switch (Ubiquiti EdgeSwitch 16 XG) no longer works correctly: the second interface won't come up. It stays down even when I move the cable from the working port to the second one, or move the transceiver and fiber cable from the working port to the second port on the firewall. If I move the fiber cable on the switch side to the transceiver in the second port of the switch, it comes right up.
Also, since the upgrade, the message "ix0: Interface stopped DISTRIBUTING, possible flapping" appears in the log a few times a day, and each time the firewall is completely unreachable for around 20-30 seconds. ix0 is the working port of the LACP set.
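For anyone who wants to see how often this happens per interface, here is a rough sketch that counts these messages. The function name `count_lacp_flaps` is made up for illustration, and it only assumes that the interface name is the word directly before "Interface" in each log line, as in the message quoted above.

```shell
#!/bin/sh
# Count "stopped DISTRIBUTING" messages per interface.
# Reads log text on stdin and prints "<interface> <count>", one per line.
# count_lacp_flaps is a hypothetical helper name, not a pfSense tool.
count_lacp_flaps() {
    awk '/Interface stopped DISTRIBUTING, possible flapping/ {
        for (i = 2; i <= NF; i++) {
            if ($i == "Interface") {
                name = $(i - 1)        # e.g. "ix0:"
                sub(/:$/, "", name)    # drop the trailing colon
                count[name]++
                break
            }
        }
    }
    END { for (name in count) print name, count[name] }' | sort
}
```

As far as I know, pfSense 2.4.x keeps the system log in a circular file that is read with `clog`, so on the firewall the call would look something like `clog /var/log/system.log | count_lacp_flaps`. That log path is an assumption on my part.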
Before the upgrade, I created a backup of the pfSense config and of the switch config. I've compared both backups with the current settings and they are identical to what was in place before the upgrade. On the switch side the LAG is not set to static; the switch reports it as dynamic.
The motherboard is an X10SDV-TP8F and its two 10Gb interfaces are connected to the switch. I've already tried another transceiver and fiber cable on both ends, but it made no difference.
The weird thing is that the interface in question doesn't show up in the logs when I pull out the transceiver or do something similar, probably because the port is already down.
If I run "pciconf -lv", the interface is listed just like the other one. When I run "dmesg | grep ix1" there are a few mentions of the port going up and down, but no dates or times are shown, so those may be old messages.
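Another way to look at the state of each member port is the `laggport:` lines in the `ifconfig` output for the lagg interface. A minimal sketch, assuming the lagg is called `lagg0` (the name is not given in this thread) and that the lines have the usual FreeBSD form `laggport: <name> flags=...`. The helper name `lagg_port_flags` is made up.

```shell
#!/bin/sh
# Print "<port> <flags>" for every member port of a lagg.
# Reads `ifconfig <lagg>` output on stdin.
# lagg_port_flags is a hypothetical helper name.
lagg_port_flags() {
    awk '$1 == "laggport:" { print $2, $3 }'
}

# Usage on the firewall (lagg0 is an assumed interface name):
#   ifconfig lagg0 | lagg_port_flags
```

A healthy LACP member normally shows ACTIVE, COLLECTING and DISTRIBUTING in its flags, so a port with empty flags would be the one to look at.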
I haven't tried setting the system tunable "net.link.lagg.lacp.default_strict_mode" to 0 yet, but shouldn't the interface at least show some signs of life without it?
I already tried a reboot last Friday, but I haven't yet tried a complete power-down with the plug pulled.
Does anyone have an idea what the issue could be? It is strange that everything worked fine before the upgrade and the problems started directly after it.
PS: English isn't my native language, so if something isn't clear just let me know.
-
I think I've solved both problems.
The first problem, the second port of the LACP group not working, was resolved by removing that interface from the LACP group and then re-adding it. After I did that, the port started working immediately.
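For reference, the same remove and re-add can be sketched from a shell with `ifconfig`. This is only an illustration and not a record of how the change was made here. The names `lagg0` and `ix1` are assumptions, `readd_laggport` is a made-up helper, and changes made this way are not written to the pfSense configuration.

```shell
#!/bin/sh
# Remove a member port from a lagg and add it back.
# With DRY_RUN=1 the ifconfig commands are printed instead of executed.
# readd_laggport is a hypothetical helper name.
readd_laggport() {
    lagg=$1
    port=$2
    for action in -laggport laggport; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "ifconfig $lagg $action $port"
        else
            ifconfig "$lagg" "$action" "$port" || return 1
        fi
    done
}

# Usage (as root; lagg0 and ix1 are assumed names):
#   DRY_RUN=1 readd_laggport lagg0 ix1   # show what would be run
#   readd_laggport lagg0 ix1             # actually do it
```

The dry-run switch is there so the two commands can be checked before touching a link that carries production traffic.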
After that, the "Interface stopped DISTRIBUTING, possible flapping" messages still appeared, but now on both interfaces of the LACP group. To resolve this I added the system tunable mentioned in my first post ("net.link.lagg.lacp.default_strict_mode" with value 0) and restarted the firewall. From that moment (last Saturday evening) until the time of writing, there have been zero log entries with that error and the link hasn't gone down either.
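In case it helps someone, this is roughly what the tunable corresponds to on the command line. The sketch only prints the commands so they can be reviewed first. `lagg0` is an assumed interface name and `lacp_relax_cmds` is a made-up helper. The `-lacp_strict` option is, as far as I know, the FreeBSD 11+ `ifconfig` way to change strict mode on a lagg that already exists, while the sysctl only sets the default used when a lagg is created.

```shell
#!/bin/sh
# Print the commands that relax LACP strict mode; nothing is executed here.
# lacp_relax_cmds is a hypothetical helper name; lagg0 is an assumed default.
lacp_relax_cmds() {
    lagg=${1:-lagg0}
    # Default for laggs created from now on (the tunable from the post).
    echo "sysctl net.link.lagg.lacp.default_strict_mode=0"
    # Existing lagg, without a reboot.
    echo "ifconfig ${lagg} -lacp_strict"
}

# Usage (as root): review the output, then pipe it to sh to run it.
#   lacp_relax_cmds lagg0
#   lacp_relax_cmds lagg0 | sh
```

Neither command survives a reboot on its own, which is why I used an entry under System Tunables in the GUI to make it permanent.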