Inter-vlan traffic is rate limited as VM
-
I'm seeking advice from anyone else who has seen this odd behaviour.
By way of background, my project is to move a bare-metal installation to a VM running on ESXi. During more than two months of testing in the lab it operated flawlessly. However, moving and reconfiguring it as part of the vCenter cluster presented problems, most notably that any traffic crossing the configured VLANs appears to be throttled; I've seen a constant 32KBps on SCP transfers, for example. There is an overall sluggishness that is reproducible when using services across the VLANs. By way of testing, no such issues occur when operating on the same VLAN (i.e. within L2).
The VM has been configured with 4 vCPUs and 16GB RAM, and runs over an iSCSI link to shared storage which contains the other VMs. For the networking, the VM has its own Distributed Switch, trunked appropriately (0-4095) for VGT, and pfSense handles all the VLANs identically to how it did on bare metal. The physical switches (site-wide UniFi) have two 10GbE LACP ports which are configured as trunks with jumbo frames. Port group security has also been configured to allow promiscuous mode, forged transmits and MAC address changes in the usual way.
There is indeed connectivity across the VLANs (with FW rules both open for testing and also configured as they should be for production use) but I cannot understand why the traffic is being rate-limited in this way. Has anyone seen this before?
pfSense is v2.7 CE with ESXi 7.0.
Cheers.
-
So in this setup all traffic between VLANs is routed and filtered by pfSense?
Do you see the same throttling for traffic routed from a VLAN to a WAN? Assuming that is possible.
Do you have any traffic shaping configured?
Steve
-
@stephenw10 hi
Correct, the router is connected to the core switch and manages all VLAN traffic. I am not seeing any such restrictions on endpoints that are traversing the WAN; they are operating over the WAN at the link's speed. No traffic shaping configured either.
It appears to be affecting traffic traversing the VLANs and not traffic going out through the WAN. Curiously, any inbound VPN traffic to those VLANs also appears to be affected, and I can see spikes in the ping replies from the VLAN devices when accessing any running HTTP service.
Pops
-
@Popolou said in Inter-vlan traffic is rate limited as VM:
Curiously, any inbound VPN traffic to those VLANS also appears to be affected
How is that routed? From external clients?
-
@stephenw10 Via an OpenVPN instance configured and routed within pfSense.
-
But I mean it's external OpenVPN clients accessing resources on one of the VLANs?
Do you see the throttling in both directions?
-
@stephenw10 Correct, yes, and simple ping responses which should be in the low tens of milliseconds are coming back at several hundred milliseconds. The behaviour does appear to be in both directions.
It’s got me stumped.
-
You're using VMX NICs in ESXi?
Did you apply the recommended tuning?
https://docs.netgate.com/pfsense/en/latest/hardware/tune.html#vmware-vmx-4-interfaces
-
@stephenw10, evening. Thanks, and yes, it is set for both vmx0 and vmx1: one is the interface carrying the nine VLANs and the other is the WAN.
Pops
-
They were already set or you just set them now? Probably need to reboot to apply if you did.
-
@stephenw10 No, they have been set since the VM transition a week ago.
-
Hmm, do you see the same throttling if you test to or from the firewall directly?
-
@stephenw10 Good question, and no: it works normally, as expected. There are no traffic issues or any signs of throttling on the management interface or on other devices in the same management VLAN. But traverse beyond the L2 domain into another VLAN and wham, the problem occurs.
-
Is it a 'hard' limit? If you look at the traffic graphs is it flat or spikes?
It 'feels' like it could be an asymmetric routing issue. If so it would be very spiky.
-
@stephenw10 Hi, very spiky. The snapshot below is of a single device in a DMZ (with everything else shut down) transferring a 1GB file via SMB from a VM in the management VLAN:
The traffic path is simply VM target -> pfSense -> VM recipient. All VMs are on the same host and use the same aggregated LACP connection. In future, I could separate the VMs into an isolated port group so that they do not go over the physical network, but that is beside the point at the moment.
Lows of <1MBps and maxing out at best 4MBps. Very unexpected behaviour.
Thanks
pops
-
Hmm, I think I'd grab a pcap of that and see what's happening. I'd expect a bunch of retransmits. Could reveal an MTU issue.
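For anyone following along, here is a rough sketch of what "a bunch of retransmits" means once the capture is reduced to per-segment records. It is a simplified heuristic, not what Wireshark or any other analyser actually does: it flags a payload-carrying segment as a retransmit when the same sequence number and length have already been seen in that direction of the flow. The `Segment` type, the `count_retransmits` helper and the addresses in the sample are all made up for illustration.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Segment(NamedTuple):
    """One captured TCP segment, reduced to the fields this heuristic needs."""
    src: str     # "ip:port" of the sender (hypothetical format)
    dst: str     # "ip:port" of the receiver
    seq: int     # TCP sequence number of the first payload byte
    length: int  # payload length in bytes (0 for pure ACKs)


def count_retransmits(segments: Iterable[Segment]) -> dict:
    """Count segments that repeat an already-seen (seq, length) pair
    within the same one-directional flow."""
    seen = defaultdict(set)
    retransmits = defaultdict(int)
    for seg in segments:
        if seg.length == 0:
            continue  # pure ACKs legitimately repeat sequence numbers
        flow = (seg.src, seg.dst)
        key = (seg.seq, seg.length)
        if key in seen[flow]:
            retransmits[flow] += 1
        else:
            seen[flow].add(key)
    return dict(retransmits)


# Made-up sample: one data segment is sent twice.
capture = [
    Segment("10.0.10.5:445", "10.0.20.7:50000", 1000, 1460),
    Segment("10.0.10.5:445", "10.0.20.7:50000", 2460, 1460),
    Segment("10.0.20.7:50000", "10.0.10.5:445", 500, 0),
    Segment("10.0.10.5:445", "10.0.20.7:50000", 2460, 1460),  # repeated data
    Segment("10.0.20.7:50000", "10.0.10.5:445", 500, 0),
]
result = count_retransmits(capture)
print(result)
```

A high count relative to the total number of data segments in one direction would be consistent with the spiky graph, though it does not say why the segments are being lost.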
-
@stephenw10 Thanks, and yes, that did show retransmissions, but the solution turned out to be disabling hardware large receive offload and hardware checksum offload. That is not something I have had to disable for VMs before, but it is occasionally the fix. Clearly there is something about the hardware I need to investigate.
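As a way to confirm the offloads really are off after changing the setting, a small sketch along these lines could check the `options=` line that FreeBSD's `ifconfig` prints for an interface. The `enabled_offloads` helper is hypothetical, and the two sample strings are invented text in the general shape of `ifconfig` output (the hex values are arbitrary), not output taken from this system.

```python
import re

# Capability names that ifconfig uses for the offloads discussed here.
OFFLOAD_FLAGS = {
    "RXCSUM", "TXCSUM", "RXCSUM_IPV6", "TXCSUM_IPV6", "TSO4", "TSO6", "LRO",
}


def enabled_offloads(ifconfig_output: str) -> set:
    """Return the offload capabilities listed on the 'options=' line."""
    match = re.search(r"options=[0-9a-fA-F]+<([^>]*)>", ifconfig_output)
    if match is None:
        return set()
    flags = set(match.group(1).split(","))
    return flags & OFFLOAD_FLAGS


# Invented samples; the hex values are arbitrary and are not interpreted.
before = "vmx0: flags=8863<UP,RUNNING> mtu 9000\n\toptions=1<RXCSUM,TXCSUM,VLAN_MTU,TSO4,LRO>"
after = "vmx0: flags=8863<UP,RUNNING> mtu 9000\n\toptions=2<VLAN_MTU,JUMBO_MTU>"

print(sorted(enabled_offloads(before)))
print(sorted(enabled_offloads(after)))
```

In pfSense the corresponding checkboxes live under System > Advanced > Networking; an empty result for the VLAN-carrying interface would suggest the change has taken effect.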
Thanks again for your efforts.
Pops
-
Ah, nice catch!