Open-VM-Tools not operating correctly on VMs with >16 NICs
I've been having some trouble setting up VMware Tools or Open-VM-Tools on a pfSense 2.0.1 virtual machine. It turns out there is a bug that is present both in the current official VMware Tools and in the Open-VM-Tools package as supplied by pfSense. (I have tried version 8.7.0.3046 (build-313025) of the Open-VM-Tools package.)
I have opened a support ticket with VMware regarding this issue. I am including the text of the support case here, which contains my analysis of the problem as performed against Open-VM-Tools. Note that the error message listed in the support case references a line number in the official VMware Tools source tree, of which I do not have a copy. The corresponding line in Open-VM-Tools is guestInfoPosix.c:291.
Product: VMware vSphere ESXi 5.x
Product version: 5
Severity: 3 - Medium
Support level: Production Support Agreement
Issue category: Fault/Crash
Issue description:
VMware Tools falls over with this error message after about a minute of operation:
[ error] [vmsvc] MEM_ALLOC /build/mts/release/bora-65272/bora-vmsoft/services/plugins/guestInfo/getlib/getInfoPosix.c:279
I initially tried to run Open-VM-Tools rather than VMware Tools, and I got the exact same error with that toolkit (with a different line number, but in the same file). Since they are based on the same codebase, I decided to investigate the source code of Open-VM-Tools. It is safe to assume that the root cause is the same in both the official VMware Tools (for which I am requesting support) and Open-VM-Tools (which is not supported). My analysis, however, has been done with the Open-VM-Tools source code (version 2010.10.18-313025, which is the build I was trying previously).
The error happens inside the function ReadInterfaceDetails. The tools fall over on the ASSERT_MEM_ALLOC that follows a GuestInfoAddNicEntry call, so the assertion fails whenever GuestInfoAddNicEntry returns NULL.
Thus I investigated how GuestInfoAddNicEntry works. It would seem that GuestInfoAddNicEntry returns NULL if there are more than NICINFO_MAX_NICS interfaces (this seems to be 16 in the default build configuration). We are surpassing this limit because in our case the machine is a router/firewall with several VLANs and several OpenVPN tunnels configured, each of which brings up a new network interface.
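To make the failure mode above concrete, here is a minimal sketch of the pattern, not the actual Open-VM-Tools code. The names MAX_NICS, nic_list, add_nic_entry and count_accepted are hypothetical stand-ins, and the limit of 16 is assumed from the default build configuration mentioned above. The point is that the NULL return means "limit reached", not "out of memory", so a caller that asserts on it as an allocation failure will abort as soon as a 17th interface shows up.

```c
#include <stddef.h>

/* Hypothetical stand-ins: these mirror the description above,
 * not the real Open-VM-Tools definitions. */
#define MAX_NICS 16   /* assumed to match NICINFO_MAX_NICS in the default build */

struct nic_entry { char name[16]; };

struct nic_list {
    struct nic_entry nics[MAX_NICS];
    size_t len;
};

/* Returns a pointer to the new slot, or NULL once the list is full.
 * NULL here means "limit reached", not "out of memory". */
struct nic_entry *add_nic_entry(struct nic_list *list)
{
    if (list->len == MAX_NICS) {
        return NULL;
    }
    return &list->nics[list->len++];
}

/* Counts how many of `wanted` interfaces get a slot before NULL comes back. */
size_t count_accepted(size_t wanted)
{
    struct nic_list list = { .len = 0 };
    size_t accepted = 0;

    for (size_t i = 0; i < wanted; i++) {
        if (add_nic_entry(&list) == NULL) {
            break;   /* an assertion at this point would crash the process */
        }
        accepted++;
    }
    return accepted;
}
```

With 20 interfaces requested, only the first 16 are accepted in this sketch; the 17th call is the one that returns NULL.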
I will attach the source files from the version of Open-VM-Tools I have been investigating as attachments to the case, since the case manager won't allow me to paste them in-line.
Now, of course, I could attempt to patch this myself. I would probably crank up NICINFO_MAX_NICS and then patch the bug in ReadInterfaceDetails, which performs an erroneous assertion: the correct response to GuestInfoAddNicEntry returning NULL is to terminate the loop and stop adding interfaces, not to crash the program. (The memory allocation is already checked inside GuestInfoAddNicEntry.) But I would rather not do so, since this is a production VM that is critical for our entire environment, and I don't know what the consequences of increasing NICINFO_MAX_NICS might be (I guess the limit is there for a reason). It would also be helpful to get information about bug 605821 in your internal bug tracker.
So I'd rather go through official support to get some kind of fix for the official VMware Tools. Are you able to help me with this?
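For reference, the loop-termination fix suggested above could look roughly like the following. This is only a sketch under my own assumptions: nic_table, table_add and record_interfaces are hypothetical names, and the real ReadInterfaceDetails is structured differently. It shows the enumeration stopping cleanly when the adder reports that the limit has been reached, instead of asserting.

```c
#include <stddef.h>

#define MAX_NICS 16   /* hypothetical limit, standing in for NICINFO_MAX_NICS */

struct nic_table {
    const char *names[MAX_NICS];
    size_t len;
};

/* Hypothetical adder: returns the new slot, or NULL when the table is full. */
const char **table_add(struct nic_table *t)
{
    return (t->len < MAX_NICS) ? &t->names[t->len++] : NULL;
}

/* Records interfaces until the table is full, then stops instead of asserting.
 * Returns the number of interfaces that had to be skipped. */
size_t record_interfaces(struct nic_table *t,
                         const char *const *ifnames, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        const char **slot = table_add(t);
        if (slot == NULL) {
            return count - i;   /* limit reached: skip the remaining interfaces */
        }
        *slot = ifnames[i];
    }
    return 0;
}
```

Returning the number of skipped interfaces lets the caller log a warning about the truncated list while the rest of the tools keep running.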
Files attached:
/./Users/pvz/RDP Share/vmtoolsd.core.gz.gz
A core file from the crash, which may be helpful for debugging.
/Users/pvz/Downloads/open-vm-tools-2010.10.18-313025.tar.gz.gz
The source code for the version of Open-VM-Tools I have been investigating. I am aware that this is an old version of Open-VM-Tools, but it seems to contain the exact same bug as the official VMware Tools, which likely contain the same or very similar code in this case and which are the product I am requesting support for here.
Support Request Updates
Date
2012-06-21 14:56:07
I just realised I never actually mentioned the guest OS in question. The Guest OS is pfSense 2.0.1 (which is basically just FreeBSD 8.1 with some extra stuff on top).
2012-06-21 14:43:08
The build number of my version of ESXi was not available in the drop-down box. My ESXi is version 5.0.0, build 721882. The VMware Tools I have been attempting to use are version 8.6.5.11852 (build-652272).
With some luck there may be an upstream patch for this issue soon. It would be nice if the Open-VM-Tools package in pfSense were updated to that version once it is released, since this seems like a bug that pfSense users could easily trip over (it happens when many network interfaces are configured, which is common in a router application).
Edit: I should add that I have checked in the source for the latest upstream Open-VM-Tools (open-vm-tools-2012.05.21-724730.tar.gz) and the bug is present in that version also.