Unable to reassign network port
-
@stephenw10 Alrighty, now I am very confident I know what the problem is. The issue is with the LAGG itself.
Create a LAGG, assign your VLANs and attempt to make modifications. That will leave you with an outage.
I have now moved the VLANs over to igc2 [really any interface] and there is no issue.
Move them back to the LAGG and the interface bounces and VLAN description changes don't apply. It's a total outage.
I have managed to reproduce this so far on two devices, specifically 6100s.
This explains why I can throw my config.xml on another 6100 and the problem follows. It's the LAGG not playing nice somehow. Driver related? Unsure.
Not trying to throw shade here, but I think that's why this wasn't tested at all, even though my Redmine and TAC case was closed with an 'unable to reproduce' message. The testing methodology isn't good, even though I did point out that the LAGG is part of the implementation.
Throw the interfaces in a LAGG and my suspicion is that you will see the trouble I'm seeing with VLAN changes.
-
I'm testing using a VLAN on a lagg of igc NICs on a 6100. And I can't reproduce it.
Specifically:
    <vlans>
        <vlan>
            <if>lagg0</if>
            <tag>100</tag>
            <pcp></pcp>
            <descr><![CDATA[Not test]]></descr>
            <vlanif>lagg0.100</vlanif>
        </vlan>
    </vlans>
    <laggs>
        <lagg>
            <members>igc2,igc3</members>
            <descr><![CDATA[lagg0]]></descr>
            <laggif>lagg0</laggif>
            <proto>lacp</proto>
            <lacptimeout>slow</lacptimeout>
            <lagghash>l2,l3,l4</lagghash>
        </lagg>
    </laggs>
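For anyone comparing their own setup against this one, here is a minimal Python sketch that lists which VLANs ride on which LAGG. It assumes the fragment is wrapped in a single root element (`<pfsense>` here, added only so the fragment parses as one document); the function name `vlans_by_lagg` is made up for illustration and is not part of pfSense.

```python
import xml.etree.ElementTree as ET

# The fragment from this post, wrapped in a root element (an assumption)
# so that it parses as a single XML document.
CONFIG = """<pfsense>
  <vlans>
    <vlan>
      <if>lagg0</if>
      <tag>100</tag>
      <pcp></pcp>
      <descr><![CDATA[Not test]]></descr>
      <vlanif>lagg0.100</vlanif>
    </vlan>
  </vlans>
  <laggs>
    <lagg>
      <members>igc2,igc3</members>
      <descr><![CDATA[lagg0]]></descr>
      <laggif>lagg0</laggif>
      <proto>lacp</proto>
      <lacptimeout>slow</lacptimeout>
      <lagghash>l2,l3,l4</lagghash>
    </lagg>
  </laggs>
</pfsense>"""


def vlans_by_lagg(xml_text):
    """Map each LAGG interface to its member NICs and the VLANs stacked on it."""
    root = ET.fromstring(xml_text)
    result = {}
    for lagg in root.findall("./laggs/lagg"):
        name = lagg.findtext("laggif")
        members = (lagg.findtext("members") or "").split(",")
        result[name] = {"members": members, "vlans": []}
    for vlan in root.findall("./vlans/vlan"):
        parent = vlan.findtext("if")
        if parent in result:  # only VLANs whose parent is a LAGG
            result[parent]["vlans"].append(
                (int(vlan.findtext("tag")), vlan.findtext("descr"))
            )
    return result


print(vlans_by_lagg(CONFIG))
```

Running the same function over two configs makes it easy to spot differences in LAGG membership or VLAN parents between a box that shows the problem and one that does not.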
I agree the lagg likely is important there. And if it is some property that can't be applied to the lagg members the NIC type would also be important.
I did also consider whether having the lagg interface itself assigned would make a difference since the first test I ran did have that and you don't. But unassigning it did not change anything. You might try assigning it to see if that changes anything for you.
I'll test your config directly and see if that triggers anything ....
-
@stephenw10 If this really can't be replicated on your end then I honestly don't know.
Any changes related to a LAGG, such as interface descriptions, shouldn't bounce the LAGG, at least in practice. Is this what you are seeing on your end? Is that part of my problem description accurate?
-
Ok, first I have to apologise because I failed to notice your config had spoofed the WAN MAC. So for a very brief period after I uploaded it onto a 6100 here the block rule I had added was for the wrong IP and it had outbound access. So I hope it didn't cause you any issues if it updated your dyndns with my IP for example.
But, having corrected that, I'm still unable to replicate what you're seeing. I changed the description of VLAN 17 and it applied as expected:
--- /conf/backup/config-1675903864.xml 2023-02-08 20:10:42.339035000 -0500
+++ /conf/config.xml 2023-02-08 20:10:42.358714000 -0500
@@ -4274,7 +4274,7 @@
 <if>lagg0</if>
 <tag>17</tag>
 <pcp></pcp>
- <descr><![CDATA[Test]]></descr>
+ <descr><![CDATA[Test2]]></descr>
 <vlanif>lagg0.17</vlanif>
 </vlan>
 <vlan>
@@ -4294,9 +4294,9 @@
 </vlans>
 <qinqs></qinqs>
 <revision>
- <time>1675903864</time>
- <description><![CDATA[admin@192.168.50.200 (Local Database Fallback): VLAN interface added]]></description>
- <username><![CDATA[admin@192.168.50.200 (Local Database Fallback)]]></username>
+ <time>1675905042</time>
+ <description><![CDATA[admin@172.21.16.8 (Local Database Fallback): VLAN interface added]]></description>
+ <username><![CDATA[admin@172.21.16.8 (Local Database Fallback)]]></username>
 </revision>
 <gateways>
 <defaultgw4>WAN_DHCP</defaultgw4>
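To run the same check on your own box, a rough Python sketch along these lines compares a backup against the live config and ignores the `<revision>` bookkeeping block, so only the real change shows up. The function name `changed_lines` and the shortened sample fragments are illustrative assumptions, not the full files from the diff.

```python
import difflib


def changed_lines(old_text, new_text, ignore_block="revision"):
    """Return (removed, added) lines between two configs, skipping one block."""

    def strip_block(text):
        kept, skipping = [], False
        for line in text.splitlines():
            s = line.strip()
            if s == f"<{ignore_block}>":
                skipping = True
            if not skipping:
                kept.append(s)
            if s == f"</{ignore_block}>":
                skipping = False
        return kept

    old, new = strip_block(old_text), strip_block(new_text)
    removed, added = [], []
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            removed.extend(old[i1:i2])
            added.extend(new[j1:j2])
    return removed, added


# Shortened, illustrative fragments modelled on the diff in this post.
OLD = """<vlan>
  <if>lagg0</if>
  <tag>17</tag>
  <descr><![CDATA[Test]]></descr>
  <vlanif>lagg0.17</vlanif>
</vlan>
<revision>
  <time>1675903864</time>
</revision>"""

NEW = """<vlan>
  <if>lagg0</if>
  <tag>17</tag>
  <descr><![CDATA[Test2]]></descr>
  <vlanif>lagg0.17</vlanif>
</vlan>
<revision>
  <time>1675905042</time>
</revision>"""

print(changed_lines(OLD, NEW))
```

If the description change applied, the only difference reported should be the `<descr>` line. An empty result would mean the edit never made it into config.xml.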
However, because I have blocked the firewall access to an upstream connection there are several differences.
It is unable to access packages so none are installed and you have a lot of packages in that config.
It cannot access ACB and the ruleset takes a very long time to load, like 5 mins, because of the unpopulated aliases from pfBlocker.
I suspect what you're seeing is somehow caused by interaction with one or more packages you have. I'd have to guess WireGuard or pfBlocker as they both produce config changes when they see interface changes.
Are you able to test disabling or removing those?
-
@stephenw10 said in Unable to reassign network port:
Ok, first I have to apologise because I failed to notice your config had spoofed the WAN MAC. So for a very brief period after I uploaded it onto a 6100 here the block rule I had added was for the wrong IP and it had outbound access. So I hope it didn't cause you any issues if it updated your dyndns with my IP for example.
That was you?!?!? Just kidding, I didn't notice any outage on my end.
I'm going to disable pfBlocker first. If I bring down the WireGuard tunnels I lose access to my VPC.
I'll test and let you know.
-
@michmoor Disabled both packages and tested. No change.
Another thing I have found through my testing: if I disable the interface and then go to the VLAN and change the description, the change sticks in the GUI.
-
I suppose we should close this out here, and these are my conclusions. Just to note: this can be replicated on another 6100, but others cannot replicate it, so the issue varies between devices.
- VLAN description changes cannot be applied while the interface is enabled. I first have to disable the interface, go back to the VLAN assignments, change the description there, and then enable the interface again.
- Any change to the LAGG [adding a VLAN, deleting a VLAN, changing a description] causes the LAGG to bounce. Any connectivity riding that LAGG will experience an outage. It shouldn't be like this; other vendors do not behave this way, so this really should be looked at. Is it possible that other vendors write proprietary code for LAGGs to avoid this issue?
- Late last night I discovered that if you do not use a LAGG but instead use a single trunked interface, any changes made to that trunk interface will also result in a link bounce. Again, it shouldn't be this way, but I managed to replicate it on another device.
Overall, using trunked interfaces on pf is very risky in a production environment, as any change to them results in a momentary blip. I am curious whether this would happen to me on a 4100, as that has different NIC drivers, I believe. I still strongly suspect the NIC is the issue.
-
-
The 4100 uses identical NICs to the 6100 (and 8200).
The config not applying has to be a result of one of the packages you have installed. If you removed all the packages you would be at (almost) the same point I was testing from, where it does not happen.
Steve
-
@stephenw10 I'm really beginning to dislike the fact you are always right.
I removed pfBlockerNG.
I then went in to change the VLAN description and wouldn't you know it....... Applied right away.
There is still an outage though.
So this leaves me with two questions:
- Where do we go from here? Do I reach out to the package maintainer to describe the problem and see if there's a fix?
- Is the port flapping just what happens, and that's the nature of how lagg is implemented?
-
Ah, that's interesting...hmm, let me think about that. It could simply be the time it takes to reload everything somehow. I assume you do not see it take 5 mins to reload the filter when pfBlocker is populated correctly?
Applying changes to a VLAN on the LAGG is likely to bounce the LAGG because it reapplies the interfaces and the properties are inherited. I'm not sure if that can be prevented entirely.
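One way to put a number on the bounce when comparing configs or releases is to log reachability of a host behind the LAGG (say once per second) while applying the change, then pull out the gaps. A minimal Python sketch, assuming the samples were collected separately, for example from a ping loop; the function name `outage_windows` and the sample format are hypothetical:

```python
def outage_windows(samples):
    """Find outage windows in time-ordered (timestamp_seconds, reachable) samples.

    Each window is (first_failed_timestamp, first_recovered_timestamp).
    An outage still in progress at the end is reported with end=None.
    """
    windows = []
    start = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts  # outage begins at the first failed probe
        elif ok and start is not None:
            windows.append((start, ts))  # outage ends at the first good probe
            start = None
    if start is not None:
        windows.append((start, None))
    return windows


# Illustrative data only: probes at 1 s intervals, two of them failed.
samples = [(0, True), (1, True), (2, False), (3, False), (4, True), (5, True)]
for start, end in outage_windows(samples):
    print(f"down from t={start}s until t={end}s")
```

The resolution is limited by the probe interval, so a bounce shorter than one interval can be missed entirely.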
-
@stephenw10 Do you mean does it take a long time for pfBlocker to update? No, it's quick. Maybe a minute or 2 at the most.
Can you load pfBlocker on your test device and replicate?
-
No, I mean the filter load. If you go to Status > Filter Reload and initiate the load, how long does it take? It should be seconds.
-
@stephenw10 Ah. Right now, without pfBlockerNG installed: seconds. Max 5s or so.
-
@stephenw10 You are not going to believe this [or maybe you will]. Upgraded to 23.01. Installed pfBlocker [not the NG version].
All problems have now gone away. I am able to change VLAN descriptions with pfBlocker installed AND the LAGG doesn't bounce when you do it....
At the very least there was clearly some buggy/inconsistent behavior on 22.05, but on 23.01 all the problems noted above are now gone. The LAGG is stable regardless of changes made.
You have no idea how relieved I am about all this. I do deploy LAGGs and it's just terrifying that I couldn't make a change unless I was in a maintenance window.
-
Nice! Yeah the changes to 23.01 are large and extensive. It doesn't surprise me that the issue has changed. It would have been nice to know exactly what was happening there but it's probably not worth digging much further unless it reappears.
Steve
-
@stephenw10 Appreciate you, Stephen. Thanks for all the support!