VLAN routing failing under stress
-
Sorry for the long post. Trying to put in all the relevant detail.
I'm experimenting with VLANs (prior to deploying them in the main environment) and have run into an issue I'm not sure how to troubleshoot.
I installed pfSense on an old mini-desktop with three 1-gigabit ports. One goes back to the main network (aka the internet) and the other two go to a small managed gigabit switch; I configured LACP between those two ports on the pfSense box and two ports on the switch.
I then created another interface on a VLAN (50) using a completely separate subnet, and set up DHCP on it.
I configured the switch to send VLAN 1 packets to pfSense untagged and VLAN 50 packets tagged. I assigned some switch ports to VLAN 1 and some to VLAN 50.
The VLAN interface has the following firewall rules:
Pass - TCP/UDP - From Any - To VLAN50 address - port 53 (DNS)
Block - Any - From Any - To LAN subnets - port any
Pass - Any - From Any - To Any - port any
The LAN interface only has pass rules.
Everything works until I stress the connection.
I connected a TrueNAS server to a VLAN 50 port and started iperf3 -s on it. Then, from a Windows machine on VLAN 1, I have one cmd prompt pinging the TrueNAS box every few seconds, and in a second prompt I run iperf3; that works and I get around 900 Mbps. But if I try iperf3 --bidir, only some of the data actually transfers and the ping commands start timing out. If I switch the Windows machine over to a VLAN 50 switch port, everything works perfectly.
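For reference, the test sequence was roughly as follows (the 192.168.50.10 address is a placeholder for the TrueNAS box's VLAN 50 address, not the actual one from my setup):

```shell
# On the TrueNAS box (VLAN 50 port): start the iperf3 server
iperf3 -s

# On the Windows machine (VLAN 1 port), first cmd prompt:
# continuous ping so drops are visible while the test runs
ping -t 192.168.50.10

# Second cmd prompt: one-way test (this works, ~900 Mbps)
iperf3 -c 192.168.50.10

# Second cmd prompt: bidirectional test (this is where transfers
# stall and the pings in the first prompt start timing out)
iperf3 -c 192.168.50.10 --bidir
```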
CPU load on pfSense never went above 5%; RAM is hovering around 7%.
I also tried iperf to something out the WAN port and that works fine (a little slow, since it's a single gigabit connection there and not a LAG).
-
@davep1328
A little more info.
I removed the LAG and am still seeing the same loss of connection. This time I did see CPU usage jump to 25%; this is a 4-core machine, so it sounds like it maxed out a core. It also completely stopped routing traffic from LAN to anywhere (including WAN and the UI). I was able to get in via another physical link (I set up a cheap USB Ethernet dongle as a completely separate "all else fails, get me back in" interface when I was setting up the LAG that I have since removed).
I can't have this dropout occur in the production environment.
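If anyone wants to dig further, a few FreeBSD commands from the pfSense shell can narrow down whether this is a single saturated core or the NIC itself dropping packets (standard FreeBSD tools; interpretation in the comments is my assumption about what to look for):

```shell
# Per-CPU view instead of an aggregate: a single pegged core during
# the --bidir test suggests all interrupt/queue load landing on one CPU
top -P

# Per-interface error counters: Ierrs/Oerrs or Drop climbing while
# the test runs points at the NIC or its driver rather than the rules
netstat -i

# Interrupt rates per device: shows which NIC is generating the load
vmstat -i
```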
-
OK, I think I figured it out. I tried another piece of hardware and that is working a lot better. I checked: the original machine used some cheap no-name Chinese NICs, while the new one is all Intel.
At this point I'm calling it a hardware issue.