Large Receive Offload (LRO): Big Trouble with VMXNET3 on HP ProLiant Gen8/Gen9
-
We are running ProLiant DL360 Gen8 and Gen9 servers with the 331 Gigabit Ethernet controller, based on the Broadcom BCM5719 chipset; we use 1 or 2 of its 4 RJ45 ports. VMware ESXi versions range from 6.0 to 6.5.0, with all Hewlett-Packard drivers installed (the newest ESXi we have used is the HPE Customized Image ESXi 6.5.0 version 650.9.6.5.27, released in May 2017 and based on ESXi 6.5.0 VMkernel Release Build 5146846).
We have tried pfSense versions from 2.3.0-RELEASE to 2.4.0-BETA (built on Fri May 26 19:15:04 CDT 2017), with the Open-VM-Tools package 10.1.0,1.
pfSense is used as a NAT router for the virtual machines on the same host. All virtual machines have 1-2 VMXNET3 adapters.
If I uncheck the “Disable hardware large receive offload” option to enable hardware LRO, the virtual machines routed via pfSense get very low upload speed (about 1/500th of normal) or drop connections. To restore normal speed, I have to check this option again.
The other hardware offload options cause no problems: I leave them unchecked to enable hardware offload of checksums and TCP segmentation.
The Broadcom BCM5719 chipset, which supports Large Receive Offload (LRO), is cheap and ubiquitous, released in 2013. VMware also added hardware LRO support to VMXNET3 in 2013. Windows has supported LRO (under the name Receive Segment Coalescing, RSC) since Windows Server 2012 and Windows 8. FreeBSD has supported it since version 8 (2009), and Linux supports hardware LRO as well (I don't know since which version).
Here is what the pfSense developers say about hardware offload capabilities (https://doc.pfsense.org/index.php/Tuning_and_Troubleshooting_Network_Cards): “The settings for Hardware TCP Segmentation Offload (TSO) and Hardware Large Receive Offload (LRO) under System > Advanced on the Networking tab default to checked (disabled) for good reason. Nearly all hardware/drivers have issues with these settings, and they can lead to throughput issues. Ensure the options are checked. Sometimes disabling via sysctl is also necessary.”
What a pity that pfSense prefers to simply disable them rather than work with the respective developers to resolve the incompatibilities!
There were some bugs in the FreeBSD drivers for the E1000 adapter; FreeBSD developers fixed them for E1000, but not for VMXNET3. See https://forum.pfsense.org/index.php?topic=96325.0 and https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=199174
I saw in the forums that people discourage using VMXNET in favour of E1000 (https://forum.pfsense.org/index.php?topic=98309.0): “We saw much better performance from the E1000 than VMXnet2 and 3”.
There is a VMware blog post on the benefits of LRO for Linux and Windows: https://blogs.vmware.com/performance/2015/06/vmxnet3-lro.html. According to this post, LRO saves valuable CPU cycles and is also very beneficial for local VM-to-VM traffic, where VMs on the same host communicate with each other through a virtual switch.
I suspect that the problem is on the FreeBSD (pfSense) side rather than the VMware side, because the Windows machines on our servers, whether connected to the Internet directly or via pfSense, have LRO enabled and show no performance degradation.
How can we get this issue investigated and fixed?
Update:
VMware Tools for FreeBSD are available from VMware as a separate ISO download. The FreeBSD tools are dated 17 May 2017 (VMware-Tools-10.1.7-other-5541682). I wanted to install them, but they require compat6x-amd64, which is not available via "pkg install" from the pfSense repository.
It would be great to have the pfSense VMware Tools package updated to the latest version: VMware Tools is now at 10.1.7 and Open VM Tools at 10.1.5.
But the version available via the pfSense GUI is 10.1.0. Maybe the LRO issue is already fixed in 10.1.5 or 10.1.7; either way, I can test it.
-
TSO and LRO are meant for workstations and servers/appliances, NOT firewalls or routers. Do not uncheck those.
It isn't a FreeBSD thing or a pfSense thing; the fundamental design of LRO is not compatible with routing/firewall roles.
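One way to see why LRO clashes with forwarding is a toy model: the receive path merges consecutive, in-order segments of one flow into a single large packet, but a router then has to send that packet out on a link whose MTU is still 1500 bytes. The Python sketch below is a simplified illustration, not the FreeBSD vmx(4) or tcp_lro code; the merge conditions, the 52-byte header size, and the names `Segment`, `lro_coalesce` and `fits_mtu` are assumptions made up for this example.

```python
from dataclasses import dataclass

HDRS = 52  # assumed: 20-byte IPv4 header + 32-byte TCP header (timestamps option)


@dataclass
class Segment:
    flow: tuple        # (src IP, src port, dst IP, dst port)
    seq: int           # sequence number of the first payload byte
    payload: int       # payload length in bytes
    psh: bool = False  # PSH flag; this toy model stops merging after it


def lro_coalesce(segments, max_packet=65535):
    """Toy LRO: merge consecutive, in-order segments of the same flow."""
    out = []
    for seg in segments:
        last = out[-1] if out else None
        if (last is not None
                and last.flow == seg.flow
                and last.seq + last.payload == seg.seq   # contiguous data
                and not last.psh
                and HDRS + last.payload + seg.payload <= max_packet):
            last.payload += seg.payload                   # grow the aggregate
            last.psh = seg.psh
        else:
            # Copy so the caller's input list is never modified.
            out.append(Segment(seg.flow, seg.seq, seg.payload, seg.psh))
    return out


def fits_mtu(seg, mtu=1500):
    """True if the packet can leave on a link with this MTU unmodified."""
    return HDRS + seg.payload <= mtu


if __name__ == "__main__":
    flow = ("10.0.0.2", 50000, "192.0.2.10", 443)
    wire = [Segment(flow, 1000 + i * 1448, 1448) for i in range(20)]
    merged = lro_coalesce(wire)
    print(len(wire), all(fits_mtu(s) for s in wire))            # 20 True
    print(len(merged), merged[0].payload, fits_mtu(merged[0]))  # 1 28960 False
```

In this model, twenty 1500-byte packets leave the LRO stage as one packet of about 29 KB that no longer fits the egress MTU. A forwarding device would have to re-segment or fragment it (or drop it if Don't Fragment is set), and per-segment header details that were merged away cannot be reproduced exactly. That is why LRO is a receive-side optimisation for end hosts rather than for routers.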
-
Thank you! I have set them to "Checked" on the router. Still, the performance difference is so large (and connections drop) that I conclude there is a bug.
-
Hello,
I came across these articles while looking for more information on LRO/TSO:
https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2055140
https://kb.vmware.com/selfservice/search.do?cmd=displayKC&docType=kc&docTypeID=DT_KB_1_1&externalId=1027511
So is ticking the checkbox in pfSense sufficient, or do I also need to change the setting on the ESXi host?
I am currently using VMXNET3.
-
TSO and LRO are meant for workstations and servers/appliances, NOT firewalls or routers. Do not uncheck those.
It isn't a FreeBSD thing or a pfSense thing; the fundamental design of LRO is not compatible with routing/firewall roles.
-
What I meant was: do I also need to disable those options in ESXi, since I am using the VMXNET3 adapter?
Or should I just switch to the E1000 adapter?
-
Just leave them at the defaults.
-
Hmmm, that's good to know. I had no idea; I've always had those two boxes unchecked. For whatever reason I haven't had any issues with it, but it seems it isn't the best or most efficient setup. Are there any implications other than potentially reduced performance? That is, since I didn't have any performance issues, is there any other negative impact from having used these settings, such as increased CPU load?
Also, is there any difference in how these work on a virtual vs. a physical machine? Or is it purely a matter of whether the machine is a client or a router?
@jimp, would it be possible to reword the GUI text for these options in 2.4? I was also thrown off by "broken in some hardware drivers, and may impact performance with some specific NICs". In my opinion this is quite misleading after reading your post above: it makes it sound like you might have trouble only if you happen to have a certain NIC, when in fact you are misconfiguring your device if you use it as a router, which almost everyone here does.
I would suggest something generally along these lines, replacing "broken in some hardware drivers, and may impact performance with some specific NICs":
Checking this option will disable hardware TCP segmentation offloading (TSO, TSO4, TSO6). This offloading is intended for machines configured as clients, NOT routers. This will take effect after a machine reboot or re-configure of each interface.
Checking this option will disable hardware large receive offloading (LRO). This offloading is intended for machines configured as clients, NOT routers. This will take effect after a machine reboot or re-configure of each interface.