pfSense 22.05 breaks VLANS, restoring pfSense 22.01 fixes the issue
-
@stephenw10 said in pfSense 22.05 breaks VLANS, restoring pfSense 22.01 fixes the issue:
@nrgia said in pfSense 22.05 breaks VLANS, restoring pfSense 22.01 fixes the issue:
I will redo the test as you suggested, except that the low setting was the default one.
Ah, interesting. Well, an alternative might be to leave switch port 15 set as 'high' and set the prio tag in pfSense to 7. I would expect that to then pass.
This is what I got with VLAN 20 priority set to 7 in pfSense and High priority set on port 15 in the switch.
high_priority.txt
-
@stephenw10 OK, I fired up my new AP and connected it to my GS108E on port 8. Using the 802.1p setting for QoS, I have zero issues connecting to any of the VLANs.
I switched it to port-based QoS, and still no issues. The only thing is that my GS108 switch is downstream of my Cisco SG300; if the switch is adding some odd VLAN 0 tag, maybe the SG300 is stripping it?
I have a port on my 4860 I could connect the Netgear switch to, but that would mean creating new VLANs on my pfSense so as not to disrupt my network, etc. Happy to do so if you feel that could help in any way.
-
Maybe you can set up a mirror port and make sure it's seeing all the traffic on the trunk/uplink?
-
@nrgia Hmm, still no ARP replies shown. I assume with that setting Guest wifi clients fail?
This is slightly odd:
01:31:36.279050 28:6d:97:7f:bb:0c (oui Unknown) > ac:1f:6b:45:fa:8a (oui Unknown), ethertype 802.1Q (0x8100), length 74: vlan 20, p 0, ethertype IPv4, 172.19.15.156.51950 > 192.168.10.1.domain: 54418+ A? gOoGle.cOM. (28)
Why is that client in the 172.19.15.X subnet using 192.168.10.1 as its DNS server?
-
@stephenw10 said in pfSense 22.05 breaks VLANS, restoring pfSense 22.01 fixes the issue:
Why is that client in the 172.19.15.X subnet using 192.168.10.1 as its DNS server?
That is the smart hub, which should have been on the 192.168.10.0 subnet, not on the 172.18.0.0 subnet.
From DHCP leases:
172.19.15.156  28:6d:97:7f:bb:0c (Samjin)  hubv3-4011027753  2022/07/06 22:30:47  2022/07/07 00:30:47  online  active
192.168.10.62  28:6d:97:7f:bb:0c (Samjin)  192.168.10.62  2022/07/06 22:08:29  2022/07/06 22:30:41  offline  expired
The above happened after I tried what you suggested.
After doing the above tests, it only gets the proper IP from the proper subnet once I restart pfSense; otherwise it never happens.
172.19.15.156  28:6d:97:7f:bb:0c (Samjin)  172.19.15.156  2022/07/06 22:30:47  2022/07/06 23:08:28  offline  expired
192.168.10.62  28:6d:97:7f:bb:0c (Samjin)  hubv3-4011027753  2022/07/06 23:08:28  2022/07/07 01:08:28  online  active
Please mind the timezone :)
-
Hmm, I guess the switch removed the tags at some point during the config changes.
That sort of thing is why running tagged and untagged traffic on the same link is better avoided. Though if you were doing that here, the LAN would also have failed in 22.05.
Steve
-
@stephenw10 said in pfSense 22.05 breaks VLANS, restoring pfSense 22.01 fixes the issue:
Hmm, I guess the switch removed the tags at some point during the config changes.
That sort of thing is why running tagged and untagged traffic on the same link is better avoided. Though if you were doing that here the LAN would also have failed in 22.05.
Steve
So because the same port carries both tagged and untagged traffic, maybe the VLAN tags get stripped, and only the traffic for VLAN 0 (1) remains?
In my case, since it's a home network, I used the default VLAN 1, which is untagged, for management. So port 15, where the pfSense LAN side connects, carries untagged traffic from VLAN 1, tagged traffic from VLAN 20, and tagged traffic from VLAN 30.
In pfSense I have VLAN 20 and VLAN 30, and the management LAN is not part of any VLAN. I'm writing this out in case you spot a mistake on my part.
So maybe, since it's not part of any VLAN, the traffic is sent to VLAN 0, because it doesn't know which VLAN to send the traffic to, so it sends it to native VLAN 0?
-
@nrgia said in pfSense 22.05 breaks VLANS, restoring pfSense 22.01 fixes the issue:
So maybe, since it's not part of any VLAN, the traffic is sent to VLAN 0
That shouldn't ever be able to happen. I have never seen a switch where there wasn't a PVID on an interface. So untagged traffic would be on whatever the PVID is, which normally would be the switch's default VLAN 1.
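The ingress rule described above, sketched in Python under simple assumptions: an 802.1Q port has one PVID and a set of member VLANs, and ingress filtering is on. `Port` and `classify_ingress` are hypothetical names for illustration, not any switch vendor's API. Untagged frames (and, per 802.1Q, priority-tagged VID 0 frames) are placed on the PVID; tagged frames keep their VID if the port is a member of that VLAN and are dropped otherwise.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Port:
    """Hypothetical model of an 802.1Q switch port (illustration only)."""
    pvid: int = 1                                           # VLAN for untagged ingress frames
    members: Set[int] = field(default_factory=lambda: {1})  # VLANs the port belongs to
    ingress_filtering: bool = True                          # drop tagged frames for non-member VLANs


def classify_ingress(port: Port, tag_vid: Optional[int]) -> Optional[int]:
    """Return the VLAN a received frame is placed on, or None if it is dropped.

    tag_vid is None for an untagged frame, otherwise the VID from its 802.1Q tag.
    """
    # Untagged and priority-tagged (VID 0) frames both land on the PVID.
    if tag_vid is None or tag_vid == 0:
        return port.pvid
    # Only VIDs 1..4094 are usable; 4095 is reserved.
    if not 1 <= tag_vid <= 4094:
        return None
    if port.ingress_filtering and tag_vid not in port.members:
        return None
    return tag_vid


# Port 15 as described in this thread: VLAN 1 untagged, VLANs 20 and 30 tagged.
port15 = Port(pvid=1, members={1, 20, 30})
for vid in (None, 0, 20, 40):
    print(vid, "->", classify_ingress(port15, vid))
```

The point of the model is that "no VLAN" never exists on a port: anything arriving without a usable VID is mapped to the PVID.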
-
@johnpoz said in pfSense 22.05 breaks VLANS, restoring pfSense 22.01 fixes the issue:
That shouldn't ever be able to happen. I have never seen a switch where there wasn't a PVID on an interface. So untagged traffic would be on whatever the PVID is, which normally would be the switch's default VLAN 1.
So the PVID is set to 1 by default if it's not set to anything else. OK, then what is VLAN 0? From what I've read, some say VLAN 1 is the default, others that VLAN 0 is. As you saw, my switch can only tag from VLAN 1 upward... so I'm trying to understand which device in the network sets the VLAN 0 tag.
-
@nrgia said in pfSense 22.05 breaks VLANS, restoring pfSense 22.01 fixes the issue:
then what is vlan 0
That is a special-use VLAN; it is not commonly used. It's normally used to set priority when the actual VLAN is not known, or when whatever sends the frame cannot set the priority on the actual VLAN.
When a switch sees a VLAN 0 tag, it should apply the priority from that tag and place the frame on whatever the default/native VLAN for that port is, i.e. the PVID, which normally is 1 on any switch unless it has been changed by the operator.
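For reference, this is what a priority tag looks like on the wire. An 802.1Q tag is the TPID 0x8100 followed by a 16-bit TCI: a 3-bit PCP (the priority), a 1-bit DEI and a 12-bit VID. A "VLAN 0" tag is just a TCI whose VID bits are zero, so it carries a priority and nothing else. The Python sketch below (helper names `pack_tag`/`unpack_tag` are hypothetical) packs and unpacks the tag, comparing priority 7 on VLAN 20, as tried earlier in this thread, with a priority-only tag.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q tag


def pack_tag(vid: int, pcp: int = 0, dei: int = 0) -> bytes:
    """Build the 4-byte 802.1Q tag: TPID followed by the TCI."""
    if not (0 <= pcp <= 7 and dei in (0, 1) and 0 <= vid <= 0xFFF):
        raise ValueError("field out of range")
    tci = (pcp << 13) | (dei << 12) | vid  # PCP: bits 15-13, DEI: bit 12, VID: bits 11-0
    return struct.pack("!HH", TPID_8021Q, tci)


def unpack_tag(tag: bytes) -> dict:
    """Split a 4-byte 802.1Q tag back into its fields."""
    tpid, tci = struct.unpack("!HH", tag)
    if tpid != TPID_8021Q:
        raise ValueError(f"not an 802.1Q tag: TPID 0x{tpid:04x}")
    return {"pcp": tci >> 13, "dei": (tci >> 12) & 1, "vid": tci & 0x0FFF}


# Priority 7 on VLAN 20 vs. a priority-only (VLAN 0) tag with priority 7.
print(pack_tag(20, pcp=7).hex())              # 8100e014
print(pack_tag(0, pcp=7).hex())               # 8100e000
print(unpack_tag(bytes.fromhex("8100e000")))  # {'pcp': 7, 'dei': 0, 'vid': 0}
```

In tcpdump output such as "vlan 20, p 0" above, the two numbers are exactly the VID and PCP fields of this TCI.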
Here is how uncommon it is in the normal enterprise: in 30-some years working in the business, I have never had the need or want to set it on any switch or router, and I have worked with lots and lots of them over the years. Nor have I ever seen it in the field in any pcaps, including pcaps sent to me from multiple customers and locations; I work for a major player and have gotten pcaps for things they want help with from really all over the globe. VLAN 0 has never been part of any discussion or troubleshooting I have been involved in. Is it possible it was there and the pcaps sent to me didn't have it? OK, sure. But in my personal and professional opinion, it's not a commonly used thing.
That you're seeing them is odd for sure.
-
@johnpoz said in pfSense 22.05 breaks VLANS, restoring pfSense 22.01 fixes the issue:
That you're seeing them is odd for sure.
Pfff... the deeper we go, the weirder it gets.
The point is, if it were a package like Suricata or something, I could live without it... but this is something that prevents me from using pfSense. Sure, 22.01 is sound for a few months, but after that... I mean, you guys are way more experienced than me, and you've run out of ideas...
Can I at least ask both of you, @johnpoz and @stephenw10, to just drop me a message if you stumble upon the same issue in other users' posts and find a solution?
@stephenw10, if you find out that something changed upstream in FreeBSD, could you let me know so I can test again? Thank you guys for your time; I won't take up any more of it on this. If you have any news or ideas, please let me know.
Thank you again.
-
@nrgia said in pfSense 22.05 breaks VLANS, restoring pfSense 22.01 fixes the issue:
So because the same port carries both tagged and untagged traffic, maybe the VLAN tags get stripped, and only the traffic for VLAN 0 (1) remains?
I was suggesting that what might have happened to get that client a dhcp lease on LAN is that while making changes to the QoS settings at some point the switch removed the VLAN tags from the port long enough for the DHCP sequence to complete.
Whereas if you had your LAN assigned as ix2.10, for example, untagged traffic arriving at pfSense would simply be dropped. That's a completely separate issue from the inexplicable VLAN 0 tags we see in the pcaps, though.
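A toy Python model of that point, assuming the parent NIC hands every received frame to a lookup keyed on its VLAN ID (None for untagged). The names `frame_vid`, `deliver` and the interface labels are made up for illustration; this is not how FreeBSD's vlan(4) code is structured. The model also does not special-case VID 0, so a priority-tagged frame simply finds no interface here; how the real stack treats VID 0 is exactly what this thread is trying to pin down, so don't read the model as pfSense's behavior.

```python
import struct
from typing import Optional

TPID_8021Q = 0x8100


def frame_vid(frame: bytes) -> Optional[int]:
    """Return the VID of an Ethernet frame, or None if it is untagged."""
    (ethertype,) = struct.unpack_from("!H", frame, 12)  # right after dst+src MACs
    if ethertype != TPID_8021Q:
        return None
    (tci,) = struct.unpack_from("!H", frame, 14)
    return tci & 0x0FFF


def deliver(frame: bytes, assigned: dict) -> Optional[str]:
    """Pick the logical interface for a frame received on the parent NIC.

    `assigned` maps a VID (None = untagged, i.e. the parent itself) to an
    interface name; frames with no matching entry are dropped (None).
    """
    return assigned.get(frame_vid(frame))


dst = bytes.fromhex("ac1f6b45fa8a")
src = bytes.fromhex("286d977fbb0c")
untagged = dst + src + struct.pack("!H", 0x0800) + bytes(46)
vlan20 = dst + src + struct.pack("!HHH", TPID_8021Q, 20, 0x0800) + bytes(46)

mixed = {None: "LAN (ix2)", 20: "ix2.20", 30: "ix2.30"}          # LAN untagged on the parent
all_tagged = {10: "LAN (ix2.10)", 20: "ix2.20", 30: "ix2.30"}    # hypothetical: LAN moved to VLAN 10

print(deliver(untagged, mixed))       # LAN (ix2)
print(deliver(untagged, all_tagged))  # None: untagged frame dropped
print(deliver(vlan20, all_tagged))    # ix2.20
```

With every network tagged, a switch that briefly stops tagging can no longer leak clients onto the LAN; their frames just have nowhere to go.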
And I agree VLAN 0 (priority) tagged traffic is rare. I've only seen it in DHCP traffic arriving from an ISP.
Steve
-
@stephenw10 said in pfSense 22.05 breaks VLANS, restoring pfSense 22.01 fixes the issue:
get a mirror port working that could see all the tagged traffic on your Netgear switch?
No, I haven't; I got tied up with real work stuff. I also have to look into sniffing VLAN traffic on a Windows machine beforehand anyway. Or it might be easier to just mirror the traffic to a spare interface on pfSense ;) That will be much easier, I think.
-
@stephenw10 said in pfSense 22.05 breaks VLANS, restoring pfSense 22.01 fixes the issue:
@nrgia Do you have any other interfaces you can use on that box? Or could you add any?
Steve
Yep, I have 4; only 2 are used.
It's this board:
https://www.supermicro.com/en/products/motherboard/A2SDi-4C-HLN4F
I tried to find something that resembles Netgate hardware and isn't a no-name board, but that's another discussion.
-
@stephenw10 said in pfSense 22.05 breaks VLANS, restoring pfSense 22.01 fixes the issue:
I was suggesting that what might have happened to get that client a dhcp lease on LAN is that while making changes to the QoS settings at some point the switch removed the VLAN tags from the port long enough for the DHCP sequence to complete.
Would it help to know that I saw the word "Incomplete" in the DHCP leases table? I only saw it during the last tests.
-
Mmm, so only the 4 ix NICs on board. Is it in a case where you can use the PCIe slot to add another type of NIC?
That would be an easy way to prove out driver/hardware vs config/network.
-
@stephenw10 said in pfSense 22.05 breaks VLANS, restoring pfSense 22.01 fixes the issue:
That would be an easy way to prove out driver/hardware vs config/network.
I thought of that; unfortunately I don't have any low-profile NICs to test with. I'd need to buy one.
The chassis looks like this: https://www.supermicro.com/en/products/system/Mini-ITX/SYS-E300-9A-4C.cfm
In the PCIe slot I have a riser card, into which a low-profile NIC must be inserted.
I also suspect the driver, but you also tested with a board that uses the ix driver. What is your NIC model? Mine is an X553. Maybe the model matters too?
What if we try a little hack: build if_ix.ko on pfSense 22.01, then install pfSense 22.05 and load the .ko from 22.01? Do you think that's a good test?
-
Yeah, it seems unlikely to be the driver, because that's the same SoC with the same NICs we use in the 6100 and 7100. And the 7100 uses VLANs on it by default.
Kernel modules from 22.01 will not load in the 22.05 kernel. At least most won't, especially something like that. You would have to build the 22.01 driver against 22.05, and it probably won't build without some work. You might be able to just revert the patch we think went in and created the issue in your network, and compile that. However, if that works, it only proves the VLAN 0 handling was bad.
We need to see a pcap from a mirror port showing all the traffic on the wire going into the port, with working two-way connectivity.
It's almost impossible to believe the driver could add those tags to incoming packets. Incorrectly removing them in 22.01 is far more likely.
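Once a mirror-port capture like that exists, a quick tally of which tags are actually on the wire answers the question directly. A minimal Python sketch, assuming the capture is saved in the classic libpcap format with Ethernet link type; `vlan_summary` is a hypothetical helper, and pcapng files are not handled.

```python
import struct
from collections import Counter

TPID_8021Q = 0x8100
LINKTYPE_ETHERNET = 1


def vlan_summary(pcap: bytes) -> Counter:
    """Count frames in a classic libpcap capture by 802.1Q tag.

    Keys are "untagged" or (vid, pcp); a key with vid 0 is a priority-only tag.
    """
    magic = pcap[:4]
    if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):    # usec / nsec, little-endian
        endian = "<"
    elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):  # usec / nsec, big-endian
        endian = ">"
    else:
        raise ValueError("not a classic pcap file")
    (linktype,) = struct.unpack_from(endian + "I", pcap, 20)
    if linktype != LINKTYPE_ETHERNET:
        raise ValueError(f"unsupported link type {linktype}")

    counts = Counter()
    offset = 24  # size of the global header
    while offset + 16 <= len(pcap):
        # Record header: ts_sec, ts_usec, captured length, original length.
        _, _, incl_len, _ = struct.unpack_from(endian + "IIII", pcap, offset)
        frame = pcap[offset + 16 : offset + 16 + incl_len]
        offset += 16 + incl_len
        if len(frame) < 14:
            continue  # too short to hold an Ethernet header
        (ethertype,) = struct.unpack_from("!H", frame, 12)
        if ethertype != TPID_8021Q:
            counts["untagged"] += 1
        elif len(frame) >= 16:
            (tci,) = struct.unpack_from("!H", frame, 14)
            counts[(tci & 0x0FFF, tci >> 13)] += 1  # (VID, PCP)
    return counts
```

Usage would look like `vlan_summary(open("mirror.pcap", "rb").read())`, where `mirror.pcap` is a hypothetical file saved from the mirror port; a pcapng capture can first be converted with `editcap -F pcap in.pcapng out.pcap`. Any `(0, p)` keys in the result are VLAN 0 priority tags present on the wire itself.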
Steve
-
@stephenw10 I just don't see how the driver would add them either; that just makes no sense. And if that were the case, as you mention, you have the same NICs on a bunch of devices sold by Netgate. It's also hard to believe he is the only one running this specific motherboard, etc. But the driver would be the same, so why would it add something on his, but not on all the others?
So what would be different on his hardware, such that the driver wrongly adds the VLAN 0 tag there but nowhere else? I would think lots of people are running 22.05 with ix interfaces, so why isn't the board aflame with people saying their VLANs are not working?
Something pretty unique in this setup is causing it; we're just missing what that something is in trying to solve the puzzle.
btw: I will fire up the mirror port on my Netgear tomorrow. Unless something breaks or catches fire, I have a pretty empty calendar tomorrow as far as real-life work goes ;)