Unable to reassign network port
-
@marcosm There is quite a bit to restore manually.
So what I am thinking about trying is the following:
Re-install pfSense without restoring the original configuration. Create VLANs and attempt to change a description, noting the results. Afterward I will create lagg0 and see how that goes.
-
Assuming the problem cannot be reproduced from step 1, I will restore my configuration. I will then blow away all VLANs and LAGG interfaces, recreate the VLAN IDs and LAGGs, and attempt to modify them.
Depending on how things go, it could very well be something funky in the configuration.
The mystery is why changes to interfaces and VLANs made through the GUI are not saved, but if I edit config.xml directly the interface changes are saved.
How does the GUI talk to the system files? I assume there is some commit check that takes place. If there is a log for that, it could reveal a lot of what's going on behind the scenes.
-
One more tidbit to kind of prove my point about the outages...
igc0 is my LAN. It is not in a VLAN, and its traffic is not routed across the LAGG.
I change the VLAN description and I have a continuous, uninterrupted ping to google.com.
Reply from 172.217.13.14: bytes=32 time=6ms TTL=115
Reply from 172.217.13.14: bytes=32 time=2ms TTL=115
Reply from 172.217.13.14: bytes=32 time=2ms TTL=115
Ping statistics for 172.217.13.14:
    Packets: Sent = 33, Received = 33, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 2ms, Maximum = 17ms, Average = 6ms
Now I set up an extended ping to another VLAN that is on the LAGG. I make a VLAN description change and there is an outage.
ping -t 192.168.17.2
Pinging 192.168.17.2 with 32 bytes of data:
Reply from 192.168.17.2: bytes=32 time<1ms TTL=127
Reply from 192.168.17.2: bytes=32 time<1ms TTL=127
Reply from 192.168.17.2: bytes=32 time<1ms TTL=127
Reply from 192.168.17.2: bytes=32 time=1ms TTL=127
Reply from 192.168.17.2: bytes=32 time<1ms TTL=127
Reply from 192.168.17.2: bytes=32 time<1ms TTL=127
Reply from 192.168.17.2: bytes=32 time<1ms TTL=127
Reply from 192.168.50.254: Destination host unreachable.
Reply from 192.168.50.254: Destination host unreachable.
Request timed out.
Request timed out.
Reply from 192.168.17.2: bytes=32 time=2ms TTL=127
Reply from 192.168.17.2: bytes=32 time=14ms TTL=127
Reply from 192.168.17.2: bytes=32 time=1ms TTL=127
Reply from 192.168.17.2: bytes=32 time=1ms TTL=127
Ping statistics for 192.168.17.2:
    Packets: Sent = 15, Received = 13, Lost = 2 (13% loss),
Approximate round trip times in milli-seconds:
    Minimum = 0ms, Maximum = 14ms, Average = 1ms
@stephenw10 I swear I'm not crazy :)
-
Update:
Anything that has to do with the LAGG triggers an outage on the LAGG. All I did was add a VLAN tonight and the results are below.
Pings start on igc0 [192.168.50.221], which is not a member of the LAGG and not part of any VLAN.
ping -t 192.168.17.2
Pinging 192.168.17.2 with 32 bytes of data:
Reply from 192.168.17.2: bytes=32 time<1ms TTL=127
Reply from 192.168.17.2: bytes=32 time=1ms TTL=127
Reply from 192.168.17.2: bytes=32 time<1ms TTL=127
Reply from 192.168.17.2: bytes=32 time<1ms TTL=127
Reply from 192.168.50.254: Destination host unreachable.
Request timed out.
Reply from 192.168.50.254: Destination host unreachable.
Reply from 192.168.50.254: Destination host unreachable.
Request timed out.
Reply from 192.168.17.2: bytes=32 time<1ms TTL=127
Reply from 192.168.17.2: bytes=32 time<1ms TTL=127
Reply from 192.168.17.2: bytes=32 time=1ms TTL=127
Reply from 192.168.17.2: bytes=32 time=1ms TTL=127
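For anyone wanting to quantify the outage from a capture like the ones above, here is a minimal Python sketch that classifies each line of Windows `ping -t` output and reports the longest run of consecutive failed probes. The function names are made up for illustration, and the patterns assume the English-language Windows output format shown in this thread.

```python
import re

# A successful echo reply, e.g. "Reply from 192.168.17.2: bytes=32 time<1ms TTL=127"
REPLY_OK = re.compile(r"^Reply from (\S+): bytes=\d+ time[<=]\d+ms TTL=\d+$")


def classify(line):
    """Return 'ok', 'fail', or None for one line of Windows ping output."""
    line = line.strip()
    if REPLY_OK.match(line):
        return "ok"
    # "Destination host unreachable" also starts with "Reply from",
    # but it is an ICMP error from a router, not an answer from the target.
    if line == "Request timed out." or line.endswith("Destination host unreachable."):
        return "fail"
    return None  # header, statistics, or blank line


def summarize(lines):
    """Count good and failed probes and the longest consecutive failure run."""
    ok = fail = run = longest = 0
    for line in lines:
        kind = classify(line)
        if kind == "ok":
            ok += 1
            run = 0
        elif kind == "fail":
            fail += 1
            run += 1
            longest = max(longest, run)
    return {"ok": ok, "fail": fail, "longest_outage": longest}
```

Counting this way treats the "Destination host unreachable" lines as failures. The Windows summary appears to count them as received, which would explain why the first 192.168.17.2 capture shows four failed probes in a row but reports only Lost = 2.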
-
There are a few things that could be happening here:
Input validation in the GUI is preventing you making the changes because some existing setting it tries to apply at the same time is invalid. However, if that were true I would expect it to throw an error in the GUI when you tried to save it. And there wouldn't actually be anything applied to the interfaces, so you wouldn't see the lagg bounce.
It creates an invalid config, generating a bad config file, and pfSense falls back to the last valid config. If that was happening I would expect to see a bunch of logs indicating it.
The fact it bumps the lagg implies changes are being applied to the VLAN and it's trying to propagate those to its parent interface, lagg0.
I haven't been able to replicate it, even using a VLAN on a lagg of igc NICs exactly as you have. Yet.
When you save the description change do you see that shown in Diag > Backup > Config History?
-
@stephenw10 Good question.
I just modified a VLAN description. The change didn't stick.
-
Hmm, and that also fails?
What does the config diff show if you just try to change the description of an existing VLAN?
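One way to check this outside the GUI is to compare the VLAN descriptions in two saved copies of config.xml, for example the current file and an earlier revision from Config History. A minimal Python sketch, assuming each `<vlan>` entry carries `<vlanif>` and `<descr>` children as in the config fragments posted in this thread. The function names are hypothetical, and this is meant to be run on downloaded copies rather than on the firewall itself.

```python
import xml.etree.ElementTree as ET


def vlan_descriptions(config_xml):
    """Map each VLAN interface name (e.g. 'lagg0.17') to its description."""
    root = ET.fromstring(config_xml)
    return {
        vlan.findtext("vlanif"): (vlan.findtext("descr") or "")
        for vlan in root.iter("vlan")
    }


def changed_descriptions(old_xml, new_xml):
    """Return {vlanif: (old_descr, new_descr)} for VLANs whose description differs.

    VLANs that exist only in the new config show up with old_descr = None.
    VLANs removed from the new config are not reported.
    """
    old = vlan_descriptions(old_xml)
    new = vlan_descriptions(new_xml)
    return {
        name: (old.get(name), descr)
        for name, descr in new.items()
        if old.get(name) != descr
    }
```

If the GUI save really wrote the new description, the changed VLAN appears in the result. An empty result means the two files carry identical VLAN descriptions, whatever the revision entry says.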
-
@stephenw10 Updated the screenshot. I wrote a test description. You see a config change but nothing in the GUI.
-
But that's a previous config change? It doesn't include the VLAN changes.
Or is that timestamp when you actually made the change?
-
@stephenw10 I just made a VLAN change, at 11:16.
-
Hmm, what was the actual change you made? What was the new description you tried to set?
-
@stephenw10 I just wrote the word 'Test'. I wish I could show you this in real time. I also went to Services / Auto Configuration Backup / Revision Information.
Look at the VLAN hierarchy and it's unchanged.
-
Some good news.
As a first step I decided to delete all sub-interfaces, VLANs and lagg0. All that is left is WAN and LAN.
First I re-created lagg0.
Second, I re-created my VLAN tags and assigned them all to parent interface lagg0.
I changed VLAN descriptions multiple times and each time the change is reflected in the GUI.
Because the interfaces were deleted and I had to create them again, when I attempt to restore firewall rules from the backup config I receive the following error message:
Fatal error: Uncaught Exception: XML error: SSHDATA at line 10148 cannot occur more than once in /etc/inc/xmlparse.inc:89
Stack trace:
#0 [internal function]: startElement(Resource id #13, 'SSHDATA', Array)
#1 /etc/inc/xmlparse.inc(188): xml_parse(Resource id #13, 'mldata>\n\t\t
In the back of my mind, I had a suspicion that the interface mappings were messed up somehow. That looks to be the case for sure.
My other question: is it normal for the LAGG to flap when changing the description of a VLAN tag? The LEDs on the Netgate do go off and come back on. There is ping loss and the system logs do show an UP/DOWN event, so I'm not making it up.
-
That's a known issue: https://redmine.pfsense.org/issues/13132
Looks like restoring a partial config created two sshdata sections. You should be able to manually remove one.
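To find the duplicate before editing, a plain line scan of the config file is enough. A minimal Python sketch, assuming each section opens with a bare `<sshdata>` tag (no attributes, not self-closing); `section_lines` is a hypothetical helper, not something that ships with pfSense, and the sample document is a stand-in with the section contents left out.

```python
def section_lines(config_text, tag):
    """Return the 1-based line numbers on which <tag> opens."""
    opener = f"<{tag}>"
    return [
        number
        for number, line in enumerate(config_text.splitlines(), start=1)
        if opener in line
    ]


# Stand-in for a config.xml that ended up with the section twice.
sample = """<pfsense>
<sshdata>
<!-- contents omitted -->
</sshdata>
<sshdata>
<!-- contents omitted -->
</sshdata>
</pfsense>"""

print(section_lines(sample, "sshdata"))
```

More than one line number in the result means the section is duplicated. Which copy to keep still has to be decided by comparing their contents.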
When you save the VLAN config to change the description it recreates the VLAN, and doing that attempts to apply values to/from the parent interface. So, yes, I might expect to see the lagg bounce. You might not see the NICs re-link if nothing gets reconfigured.
Steve
-
@stephenw10 Alrighty, now I am very confident I know what the problem is. The issue is with the LAGG itself.
Create a LAGG, assign your VLANs to it and attempt to make modifications. That will leave you with an outage.
I have now moved the VLANs over to igc2 [really any interface] and there is no issue.
Move them back to the LAGG, and then the interface bounces and there are no VLAN description changes. It's a total outage.
I have managed to reproduce this so far on two devices, specifically 6100s.
This explains why I can throw my config.xml on another 6100 and the problem follows. It's the LAGG not playing nice somehow. Driver related? Unsure.
Not trying to throw shade here, but I think that's why this wasn't tested at all, even though my Redmine issue and TAC case were closed with an 'unable to reproduce' message. The testing methodology isn't good, even though I did point out that the LAGG is part of the implementation.
Throw the interfaces in a LAGG and my suspicion is that you will see the trouble I'm seeing with VLAN changes.
-
I'm testing using a VLAN on a lagg of igc NICs on a 6100. And I can't reproduce it.
Specifically:
<vlans>
    <vlan>
        <if>lagg0</if>
        <tag>100</tag>
        <pcp></pcp>
        <descr><![CDATA[Not test]]></descr>
        <vlanif>lagg0.100</vlanif>
    </vlan>
</vlans>
<laggs>
    <lagg>
        <members>igc2,igc3</members>
        <descr><![CDATA[lagg0]]></descr>
        <laggif>lagg0</laggif>
        <proto>lacp</proto>
        <lacptimeout>slow</lacptimeout>
        <lagghash>l2,l3,l4</lagghash>
    </lagg>
</laggs>
I agree the lagg likely is important there. And if it is some property that can't be applied to the lagg members the NIC type would also be important.
I did also consider whether having the lagg interface itself assigned would make a difference since the first test I ran did have that and you don't. But unassigning it did not change anything. You might try assigning it to see if that changes anything for you.
I'll test your config directly and see if that triggers anything ....
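When comparing two boxes that behave differently, it can help to reduce each config.xml to just its lagg and VLAN layout. A minimal Python sketch, assuming the `<laggs>` and `<vlans>` structure shown in the fragment above; `lagg_topology` is a hypothetical name, and the `<pfsense>` wrapper in the example is only there so the fragment parses as one document.

```python
import xml.etree.ElementTree as ET


def lagg_topology(config_xml):
    """Return {laggif: {'members': [...], 'proto': str, 'vlans': [tag, ...]}}."""
    root = ET.fromstring(config_xml)
    topology = {}
    for lagg in root.iter("lagg"):
        name = lagg.findtext("laggif")
        members = (lagg.findtext("members") or "").split(",")
        topology[name] = {
            "members": [m for m in members if m],
            "proto": lagg.findtext("proto"),
            "vlans": [],
        }
    # Attach each VLAN tag to its parent lagg; VLANs on plain NICs are skipped.
    for vlan in root.iter("vlan"):
        parent = vlan.findtext("if")
        if parent in topology:
            topology[parent]["vlans"].append(vlan.findtext("tag"))
    return topology


fragment = """<pfsense>
<vlans><vlan><if>lagg0</if><tag>100</tag><vlanif>lagg0.100</vlanif></vlan></vlans>
<laggs><lagg><members>igc2,igc3</members><laggif>lagg0</laggif><proto>lacp</proto></lagg></laggs>
</pfsense>"""

print(lagg_topology(fragment))
```

Running it against the config from a box that shows the problem and one that does not gives two small dictionaries that are easier to compare than full config files.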
-
@stephenw10 If this really can't be replicated on your end then I honestly don't know.
Any changes related to a LAGG, such as interface descriptions, shouldn't bounce a LAGG, at least in practice. Is this what you are seeing on your end? Is that part of my problem description accurate?
-
Ok, first I have to apologise because I failed to notice your config had spoofed the WAN MAC. So for a very brief period after I uploaded it onto a 6100 here the block rule I had added was for the wrong IP and it had outbound access. So I hope it didn't cause you any issues if it updated your dyndns with my IP for example.
But, having corrected that, I'm still unable to replicate what you're seeing. I changed the description of VLAN 17 and it applied as expected:
--- /conf/backup/config-1675903864.xml 2023-02-08 20:10:42.339035000 -0500
+++ /conf/config.xml 2023-02-08 20:10:42.358714000 -0500
@@ -4274,7 +4274,7 @@
    <if>lagg0</if>
    <tag>17</tag>
    <pcp></pcp>
-   <descr><![CDATA[Test]]></descr>
+   <descr><![CDATA[Test2]]></descr>
    <vlanif>lagg0.17</vlanif>
  </vlan>
  <vlan>
@@ -4294,9 +4294,9 @@
  </vlans>
  <qinqs></qinqs>
  <revision>
-   <time>1675903864</time>
-   <description><![CDATA[admin@192.168.50.200 (Local Database Fallback): VLAN interface added]]></description>
-   <username><![CDATA[admin@192.168.50.200 (Local Database Fallback)]]></username>
+   <time>1675905042</time>
+   <description><![CDATA[admin@172.21.16.8 (Local Database Fallback): VLAN interface added]]></description>
+   <username><![CDATA[admin@172.21.16.8 (Local Database Fallback)]]></username>
  </revision>
  <gateways>
    <defaultgw4>WAN_DHCP</defaultgw4>
However, because I have blocked the firewall access to an upstream connection there are several differences.
It is unable to access packages so none are installed, and you have a lot of packages in that config.
It cannot access ACB, and the ruleset takes a very long time to load, around 5 minutes, because of the unpopulated aliases from pfBlocker.
I suspect what you're seeing is somehow caused by interaction with one or more packages you have. I'd have to guess WireGuard or pfBlocker as they both produce config changes when they see interface changes.
Are you able to test disabling or removing those?
-
@stephenw10 said in Unable to reassign network port:
Ok, first I have to apologise because I failed to notice your config had spoofed the WAN MAC. So for a very brief period after I uploaded it onto a 6100 here the block rule I had added was for the wrong IP and it had outbound access. So I hope it didn't cause you any issues if it updated your dyndns with my IP for example.
That was you?!?!? Just kidding, I didn't notice any outage on my end.
I'm going to disable pfBlocker first. If I bring down the WireGuard tunnels I lose access to my VPC.
I'll test and let you know.
-
@michmoor Disabled both packages and tested. No change.
Another thing I have found through my testing: if I disable the interface and then go to the VLAN and change the description, the change sticks in the GUI.
-
I suppose we should close this out here, and these are my conclusions. Just to note, I can replicate this on another 6100 but others cannot replicate it, so this issue varies between devices.
-
VLAN description changes cannot be made while the interface is enabled. I first have to disable the interface, go back to the VLAN assignments and change the description there, then enable the interface again.
-
Any change to the LAGG [adding a VLAN, deleting a VLAN, changing a description] causes the LAGG to bounce. Any connectivity riding that LAGG will experience an outage. It shouldn't be like this, as other vendors do not behave this way, so this really should be looked at. Is it possible that other vendors write proprietary code for LAGGs to avoid this issue?
-
Late last night I discovered that if you do not use a LAGG but instead use a single trunked interface, any changes made to that trunk interface will also result in a link bounce. Again, it shouldn't be this way, but I managed to replicate it on another device.
Overall, using trunked interfaces on pfSense is very risky in a production environment, as any change to them results in a momentary blip. I am curious whether this would happen to me on a 4100, as I believe that has different NIC drivers. I still strongly suspect the NIC is the issue.
-