VLANs stop working after upgrading from 24.11 (for both 25.07.1 + 25.11)
-
I found today that clients on non-native VLANs can't even ping the VLAN gateway. Yet the clients receive DHCP?
On the VLAN firewall rules, I even set an ANY ANY rule in the #1 position, and still nothing.
Client on PVID with pass rule can ping both VLAN gateways.
Firewall Logs on the VLAN interfaces say passing traffic (including ICMP to gateway), no blocking.
I did a config compare and found no tangible differences either?
I am testing 25.11 today, and it is the same behaviour. Firewall logs show "PASS" ICMP attempt to VLAN gateway, but client gets timed-out??
I've got 2 VLANs I need in particular, and both have this problem of not handling traffic any longer after upgrading?
Running on HP t730, NIC = NetXtreme II 1000 Express (QLogic NetXtreme II BCM5709) quad port.
I have match rules in-place but only for logging purposes.
-
Per https://redmine.pfsense.org/issues/16581
It inspired me to consider System Tunables. Perhaps one of these is relevant to this issue; I think they are all custom entries added over the years:
net.inet.ip.fastforwarding = 1
net.link.ether.inet.log_arp_movements = 0
net.inet.ip.intr_queue_maxlen = 1000
net.link.ether.inet.max_age = 1200
net.isr.dispatch = deferred
net.inet.tcp.nolocaltimewait = 1
net.inet.rss.enabled = 1
net.inet.rss.bits = 6
net.inet.tcp.soreceive_stream = 1
net.inet.udp.checksum = 1
net.inet.ip.portrange.first = 1024
net.inet.tcp.drop_synfin = 1
net.inet.tcp.recvspace = 65228
net.inet.tcp.sendspace = 65228
net.inet.udp.maxdgram = 57344
kern.ipc.maxsockbuf = 4262144
net.raw.recvspace = 65536
net.raw.sendspace = 65536
net.inet.raw.recvspace = 131072
net.inet.raw.maxdgram = 131072
-
Hmm, so clients in those VLANs receive DHCP leases from pfSense in the correct subnets?
That implies a functioning layer 2.
What firewall rules do you have on those? They show states and traffic?
If you run a packet capture on the parent NIC you see the incoming pings but no replies?
The VLANs here are all on the bce NICs I assume?
Check the hardware offloading options in use with ifconfig -vm bce0.
-
@petrt3522 said in VLANs stop working after upgrading from 24.11 (for both 25.07.1 + 25.11):
I have match rules in-place but only for logging purposes.
Do they have "Quick" set on the match rules? If so, uncheck that.
-
@petrt3522 Seems to be an issue with Kea DHCP. I have the same issue, and reverting to ISC DHCP (Deprecated) seems to resolve it. When DHCP is set to Kea, my VLANs do not show up in the DHCP leases; after reverting to ISC DHCP (Deprecated) they show up and IPs start to get assigned again.
-
@Kayvil Got it. I have already been using ISC, and just confirmed that it didn't get changed automatically during the upgrade.
-
@petrt3522 You are correct. I thought it had solved my issue. My devices now get IP addresses, but have no connection to the internet.
-
@stephenw10 Yes, Clients do receive correct leases.
On those interfaces themselves, none. I've also tried inserting ANY ANY rules, and no change in connectivity.
Packet Capture on BCE1 (LAN) does show the ping, but records "Echo (ping) request id=0xec42, seq=1/256, ttl=255 (no response found!)"
You got it, BCE.
I have turned-off hardware offloading, and no change. However, ifconfig -vm bce0 still shows "VLAN_HWCSUM" in the active options. I did disable in runtime with "ifconfig bce0 -vlanhwcsum". Also, no change.
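For anyone repeating this check, here is a minimal Python sketch that lists which offload-related flags are still active in saved `ifconfig -vm` output. The sample text, its hex value, and the set of flag names treated as "offload" are illustrative assumptions, not output from this system.

```python
import re

# Flag names commonly shown in FreeBSD ifconfig "options=" lines that relate
# to hardware offloading. This set is an assumption for illustration.
OFFLOAD_FLAGS = {
    "RXCSUM", "TXCSUM", "TSO4", "TSO6", "LRO",
    "VLAN_HWTAGGING", "VLAN_HWCSUM", "VLAN_HWTSO", "VLAN_HWFILTER",
}

def active_offloads(ifconfig_text):
    """Return the sorted offload flags found on the 'options=' line."""
    match = re.search(r"^\s*options=[0-9a-fA-F]+<([^>]*)>",
                      ifconfig_text, re.MULTILINE)
    if match is None:
        return []
    flags = [f for f in match.group(1).split(",") if f]
    return sorted(f for f in flags if f in OFFLOAD_FLAGS)

# Hypothetical sample, NOT captured from the poster's machine.
SAMPLE = (
    "bce0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500\n"
    "\toptions=c01bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,VLAN_HWTSO,LINKSTATE>\n"
)

print(active_offloads(SAMPLE))
```

Running the same function on output saved before and after a change such as `ifconfig bce0 -vlanhwcsum` shows whether the flag actually dropped out of the active options.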
-
@Derelict It looks like that was resolved with this 25.11 version, but I still went through and unchecked "quick" on all the match rules, even the disabled ones. No change, still no Internet for the VLANs.
-
@petrt3522 VLANs work for everybody else so...
-
@Bob.Dig That's brilliant. Thanks for contributing something helpful in this help forum.
-
Yes, sorry that's why I asked about match rule in the bug but I missed your response here. There was a bug that is now fixed but some users may have been relying on the broken behaviour.
Previously the quick flag on match rules was not being honored, so traffic would go on to be passed by other rules. That is no longer the case. So if you have match-quick rules, traffic matching them will correctly not be passed.
That may apply to your situation. But it would apply to tagged and untagged traffic equally unless you only had match rules on VLAN interfaces.
Since DHCP is working correctly you do have the VLAN layer 2 functioning. Since traffic no longer passes it seems like a firewall rule problem.
Do you actually see firewall logs showing passed traffic, or just a lack of blocked traffic logs?
Check the state table while pinging from a client on the VLAN. Do you see states created?
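The match-quick change described in this post can be illustrated with a toy model in Python. This is a deliberately simplified sketch, not pf's actual evaluation code: it assumes every rule in the list matches the packet, that the last matching pass/block rule decides unless a quick rule stops evaluation, and that the default is to block. The `Rule` class and the `honor_quick_on_match` switch are hypothetical names used only for illustration.

```python
from dataclasses import dataclass

@dataclass
class Rule:
    action: str          # "pass", "block", or "match"
    quick: bool = False  # stop evaluating further rules when set

def evaluate(rules, honor_quick_on_match=True, default="block"):
    """Return the final decision for a packet that matches every rule."""
    decision = default
    for rule in rules:
        # A match rule never changes the pass/block decision by itself.
        if rule.action != "match":
            decision = rule.action
        # Quick ends evaluation; the old behaviour ignored it on match rules.
        if rule.quick and (rule.action != "match" or honor_quick_on_match):
            break
    return decision

ruleset = [Rule("match", quick=True), Rule("pass", quick=True)]

print(evaluate(ruleset, honor_quick_on_match=False))  # old behaviour
print(evaluate(ruleset, honor_quick_on_match=True))   # fixed behaviour
```

In this model the same ruleset passes traffic when quick is ignored on the match rule and falls back to the default block when it is honored, which is why unchecking "Quick" on match rules is the suggested test.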
-
@stephenw10 It is showing actual Passed entries in the firewall logs; as in, a green check mark matching the rule I have in place to allow traffic out. That includes an ICMP rule from 'VLAN Subnet' -> 'VLAN Address'.
Confirmed States are being created; I have attached:
-
Ok so 192.168.4.226 is the pfSense interface in the VLAN? And whilst it shows two-way traffic, or pings with replies, the client at 192.168.4.232 never actually sees the replies?
The state on the WAN is interesting since it shows double the packet count in one direction. Implying maybe reflected packets or maybe mis-routed replies are somehow being sent from there.
So try running packet captures on those interfaces to see if those replies are actually leaving the LAN side interface.
-
@stephenw10 You're quite right, added the oddity to Imgur post.
Single-packet ping, one per interface capture, to "69.162.81.155".
I added to bottom of https://imgur.com/a/0qEmFFJ
3rd img: WAN
4th img: VLAN
5th img: WAN #2 test
-
Ok interesting. So the reply comes back but never makes it back onto the VLAN even though the state shows it succeeding.
What is in that ICMP unreachable packet?
It seems like it's maybe not being translated on the way back somehow.
Can we see the firewall rule(s) you have passing that ping?
-
@stephenw10 Diving deeper gets me a bit lost in seeing what might be relevant.
From the 1st of the 2 WAN captures:
https://pastebin.com/UYBZkTkq
and this is the data:
https://pastebin.com/Q8RD0itb
From the 2nd of the 2 WAN captures:
https://pastebin.com/fKrHe8Gw
and this is the data:
https://pastebin.com/8QtUMzhF
For the rules, it is the 2nd active rule, see here:
https://imgur.com/SWDg0ER
https://imgur.com/6iRSW6e
https://imgur.com/VURrZfl
Let me know if you need me to go further down the nesting of the aliases. These are supposed to encompass local networks, as well as multiple VPN tunnels.
-
Hmm, curious. The packet it's replying to as 'Destination unreachable' is to its own IP address and in both cases as code 1 'host unreachable' administratively prohibited.
So it looks like there is no state for it, so the firewall is rejecting it. Though it's rejecting rather than just dropping, which would not be the default.
You say that it should be the 2nd active rule there, the ICMP pass rule, but that is showing zero states or packets. It looks like it's actually being passed by the 4th rule.
This still feels a lot like the previously mentioned match-quick issue. That behaviour definitely changed in 25.11 and it can produce some odd things if traffic is no longer being parsed after the match.
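To double-check what a captured 'Destination unreachable' message refers to, a short Python sketch like the following can decode the ICMP code and the addresses of the original packet quoted inside it. The sample bytes are synthetic (documentation addresses, zeroed checksums), not taken from the pastebin captures, and the function name is hypothetical.

```python
import socket
import struct

# ICMP type 3 (Destination Unreachable) codes, per RFC 792 and RFC 1812.
UNREACH_CODES = {
    0: "net unreachable",
    1: "host unreachable",
    2: "protocol unreachable",
    3: "port unreachable",
    9: "network administratively prohibited",
    10: "host administratively prohibited",
    13: "communication administratively prohibited",
}

def describe_unreachable(icmp_bytes):
    """Decode an ICMP Destination Unreachable message (starting at the ICMP header)."""
    icmp_type, code = struct.unpack("!BB", icmp_bytes[:2])
    if icmp_type != 3:
        raise ValueError("not a Destination Unreachable message")
    inner = icmp_bytes[8:]  # quoted IP header of the packet that was rejected
    return {
        "code": code,
        "code_name": UNREACH_CODES.get(code, "other"),
        "original_src": socket.inet_ntoa(inner[12:16]),
        "original_dst": socket.inet_ntoa(inner[16:20]),
    }

# Synthetic sample: ICMP header (type 3, code 1) followed by a minimal IPv4 header.
inner_ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 84, 0, 0, 64, 1, 0,
                       socket.inet_aton("192.0.2.10"),
                       socket.inet_aton("198.51.100.20"))
sample = bytes([3, 1, 0, 0, 0, 0, 0, 0]) + inner_ip

print(describe_unreachable(sample))
```

Comparing `original_src` and `original_dst` against the firewall's own addresses shows which packet the unreachable message is answering, and therefore which direction is being rejected.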
What floating rules do you still have?
-
@stephenw10 It must have been the reordering of rules I did just prior to the screenshot that erased the values (moving the disabled rule up to the top to make the page easier to read).
I just booted back into Env and pinged, it is reflected in the 2nd rule for each ping.
There shouldn't be any applicable rules remaining. I have included the screenshot here; warning, there are many. I have tried to place a red circle on the rules that could apply to the VLAN at all.
https://imgur.com/BiDFtFC
-
Hmm, try disabling those buffer-bloat match rules as a test. That's about the only thing I could imagine affecting this.
Unless you have any rules set to pass without creating a state?