VMware vmxnet3 NIC vs. e1000 vs. hardware install - throughput performance
-
So… I've been seeing all these posts saying the vmxnet3 NIC driver in VMware will give you better performance than the e1000 driver, but that you would get the best performance by installing pfSense directly on your hardware.
Tonight I set out to test these claims.
My test setup:
A Shuttle XG41 with dual NICs, an Intel Core 2 Duo E6400 and 4096 MB RAM, plus two laptops with lots of CPU power and gigabit NICs. The laptops are running Knoppix, and all bandwidth tests were done using iperf.
I used pfSense 2.1 beta0 in my test. (I've done the same test with pfSense 2.0.1; the results are the same.)
The first test I did was to connect the laptops with a crossover cable and test their maximum speed. Result: 939 Mbit/s.
I then installed VMware ESXi 5.0 on the Shuttle and created a virtual 64-bit FreeBSD machine with 2 vCPUs, 1024 MB RAM and 2 e1000 NICs. The WAN interface was given 10.0.0.15/24 and the LAN interface 192.168.1.1/24. The laptops were then configured with an IP in their respective ranges, and an any-any rule was created.
In the tests below laptop1 was connected straight to nic1 of the shuttle and laptop2 was connected straight to nic2.
pfSense with e1000 NICs. Result: 850 Mbit/s, CPU at 100%.
I then installed the VMware drivers using the guide from this post: http://forum.pfsense.org/index.php/topic,34043.0.htm (the post from pfSense.User.1138)
pfSense with vmxnet3 NICs. Result: 852 Mbit/s, CPU at 100%.
Then I took a pfSense 2.1 live CD image, put it on a USB key, booted the Shuttle from it and ran another test. Result: 939 Mbit/s, CPU at 30% according to the dashboard.
The main reason I'm writing this post is that I've had trouble finding information like this in the forums. Hopefully this will help someone else.
I've attached some screenshots of the CPU usage in VMware while the iperf test was running, with the e1000 and vmxnet3 drivers.
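
To put the numbers above side by side, here is a minimal Python sketch that expresses each result as a share of the 939 Mbit/s direct-cable baseline. The figures are the ones reported in this post; the dictionary layout and the helper name `percent_of_baseline` are only illustrative choices.

```python
# Figures as reported in this post (throughput in Mbit/s, CPU in percent).
BASELINE_MBIT = 939  # laptops connected directly with a crossover cable

results = {
    "ESXi 5.0, e1000": (850, 100),
    "ESXi 5.0, vmxnet3": (852, 100),
    "bare metal (live CD)": (939, 30),  # CPU figure taken from the dashboard
}


def percent_of_baseline(mbit, baseline=BASELINE_MBIT):
    """Return a throughput figure as a percentage of the direct-cable baseline."""
    return 100.0 * mbit / baseline


for name, (mbit, cpu) in results.items():
    share = percent_of_baseline(mbit)
    print(f"{name:22s} {mbit:4d} Mbit/s  {share:5.1f}% of baseline  CPU {cpu}%")
```

Both virtual NIC types land at roughly nine tenths of the baseline, which is the point of the comparison.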
-
some screenshots
-
Very nice. Thanks for doing that.
-
Interesting test, thanks for sharing.
I wonder what the numbers would be with pf disabled (no nat, no packet filtering).
-
Are you talking about a VM setup with that disabled?
-
Yes, although it's pretty safe to assume that the bottleneck is due to VM Ethernet driver …
-
Great post.
Like an oasis of fact in a desert of speculation! :)
Steve
-
Would the results be different when the hardware supports virtualization technology (Intel VT, AMD-V)?
I wonder if the vmxnet drivers would benefit from those technologies.
Thanks for the tests by the way :)
-
Hi,
i've recently started using pfsense again and it's running as a VM on my NAS.
i am curious to know, if one would get the same results using VT-d. i can pass the NICs directly to the pfsense VM. The reason why i haven't done this yet is because i have another VM that is a heavy downloader (WAN-speed is 128 Mbit). My thoughts were: with both VMs using the same controller, the traffic would stay within the hypervisor. If i dedicate the NICs to the pfsense VM only, i assume that traffic would have to leave the ESXi-Host and travel back through the switch.
Am i guessing correctly? Would that extra traffic be negligible compared to the stress i save the CPU?
Thanks.
-
Alright, so I disabled pf under "system - advanced - firewall/nat". I then ran the test using the e1000 driver and the vmxnet3 driver. The results are similar: 100% CPU in the VMware graphs, 850 Mbit/s throughput.
-
I've also tried enabling/disabling TSO, powerd, fast TCP forwarding etc., but so far I haven't been able to get above the 850 Mbit/s mark.
-
Notice your media is set at 10GbaseT. I'm using open vmtools and all my Intel gigabit cards with vmxnet3 are only reported as 1000baseT.
-
I'm using the vendor-supplied vmtools, and 10 Gbit is their default speed. Source: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1013083
-
Thanks for doing this testing. Could you see what CPU usage you get when you're not maxing out the pipe?
Put a switch in between and set the interface on one box to 100full while pfSense stays at gigabit, so the max you're going to see is 1/10 of what the NIC can do. Is the CPU usage lower in this mode with vmxnet3?
This would be more like a setup you might see in normal usage: the ISP is not always a gig connection, and you rarely see 100% saturation of the line etc.
-
ah, i knew there was a reason why i didn't pursue this further at the time. i was using vlans and the vmxnet driver didn't support that at the time. there is a patch but i don't want to apply it at the moment on 2.1_x64 and it doesn't appear that the nics are detected as i've added a new one with the vmxnet3 driver and pfsense isn't seeing it at the moment.
miloman, seeing as your interfaces have been found under the vmxnet3 driver, could you try and see if you can add a vlan for that interface? if so, i might push it a little further to get them working.
-
I see where you're going. I'll be doing this test later today.
-
VLANs are indeed supported under pfSense 2.1_beta0 with the vmxnet3 NIC. You don't need to use the patch.
For me the vmxnet3 NIC was essential for getting better performance/throughput, but it was useless in a production setup since VLAN tagging wasn't supported. After my tests, I don't see why I should bother installing the driver and introducing a potential VMware tools/driver crash when the performance of the e1000 is pretty much the same.
-
thanks for the testing. does make me wonder why my vmxnet3 interfaces ain't showing. i'm using 2.1_x64 beta0 and when setting the driver to vmxnet3, pfsense doesn't see the additional interfaces.
vmxnet.ko is loaded and has correct permissions but the interfaces still don't show.
-
So for example my internet connection is about 16MBps sustained - sure it boosts to like 25, but on say a sustained download it levels off at about 16MBps – so maybe in this scenario e1000 causes 40% cpu while vmxnet3 only uses 30% ?
-
Here ya go…
Throughput capped to 100mbit using a switch.
Test with computers connected to each other only by using a switch = 96.5Mbit (this number is used for reference as to which speeds are possible without any firewalling)
Test with firewall in between doing the routing/firewalling = 94.5 Mbit
You can see the CPU usage in the screenshot I've attached. In this test the vmxnet3 driver uses a bit less CPU than the e1000. But I'm not impressed.
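
A rough way to quantify what the firewall costs in each scenario, sketched in Python. It uses only figures reported in this thread; the function name `overhead_percent` is made up for illustration, and since each figure comes from a single run the small differences should be read loosely.

```python
def overhead_percent(reference_mbit, measured_mbit):
    """Throughput lost relative to the reference path, as a percentage."""
    return 100.0 * (reference_mbit - measured_mbit) / reference_mbit


# (reference without firewall, measured through the pfSense VM), both in Mbit/s
scenarios = {
    "100 Mbit cap, via switch": (96.5, 94.5),
    "gigabit, e1000": (939, 850),
    "gigabit, vmxnet3": (939, 852),
}

for name, (reference, measured) in scenarios.items():
    lost = reference - measured
    pct = overhead_percent(reference, measured)
    print(f"{name:26s} lost {lost:5.1f} Mbit/s ({pct:.1f}%)")
```

At the 100 Mbit cap the VM gives up about 2% of the reference throughput; at gigabit it gives up a little over 9%.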
-
In this thread I've linked to the guide I used to install the VMware tools supplied with ESXi 5.0. Those worked for me.
-
Is that with ESXi, correct? I would imagine that a direct install would be closer to the direct-connect speed, like it was in the gigabit tests.
-
My bad… I should've written that the computers were connected to each other using a switch. I've edited my post.
-
Hi,
I just tested this - I have a pfSense VM running with 2 NICs passed through using IOMMU. I decided to run an iperf server on the pfSense and an iperf client on my laptop, connected to the pfSense via a gigabit switch. I found that
- pfSense CPU usage went to 10% due to iperf
- Throughput around 94 Mbit/s
I am slightly confused as to why this is happening - as far as I am aware all devices on my network are gigabit capable, so I'd have to look into this, but indications are that one can expect to see full gigabit throughput when using IOMMU passthrough.
Regarding traffic staying in hypervisor or leaving to the switch, it would be a case of using an extra port on your switch (like I have). Most switches worth their salt have enough internal bandwidth to shift traffic along all interfaces without slowing. So the only cost would be an extra port used on your switch.
To comment on my particular situation, I was irritated by the increased power consumption caused by a shared device (for the LAN; WAN was still passed through). This was caused by CPU consumption rising dramatically. I calculated I would only have been capable of 50-100 Mbit/s of throughput LAN<->WAN (no good if my WAN speed increases, but fine for now). As a result, I also passed through the LAN device. This resulted in stable power consumption when data was passing LAN<->WAN and no noticeable CPU usage. Unfortunately, because the drivers are not as good in FreeBSD as they are in Linux (Xen hypervisor), the idle power consumption of the system as a whole has risen by a couple of watts due to two NICs being controlled by pfSense. My choices are mostly motivated by power consumption concerns.
Hope this was helpful.
Regards,
Yax
EDIT:
100Mbit speed was caused by my cat5e not allowing gigabit speeds. Cat6 solved that problem. Now I see approx 550Mbit/s from client-server through the switch, with pfSense using approx 40% CPU (but only 5% is reported as iperf). Testing with a direct connection to pfSense shows a throughput of 550Mbit/s also. Not really sure why.
-
"But i'm not impressed."
Yeah it doesn't look like much there, but when talking CPU cycles on a VM - even if small difference, over time that adds up.
Again thanks for taking the time to actually test these drivers - I run the vmxnet3 on all my other vms other than pfsense. With vmxnet3 drivers I could not vpn into my work from client behind pfsense. With e1000 connects no problem - strange. Good to see it's not all that much of difference in performance.
-
"But i'm not impressed."
On all my Windows servers etc. I'm using the vmxnet3 NIC. The performance of the NIC is great, especially when the traffic stays within the hypervisor. :)
These tests were done to find out if you would gain performance by using the vmxnet3 adapter instead of the e1000 on pfSense. And the answer to that, according to my tests, is no.
-
Stupid Cables. Anyway, it sounds like a bus limitation (PCI perhaps) on that max of 550MBits/sec.
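
As a back-of-the-envelope check on the bus idea, here is a small Python sketch. It assumes a conventional 32-bit, 33 MHz PCI bus, which nobody in this thread has confirmed for the hardware in question, and it only computes the theoretical peak (bus width times clock); real-world throughput on a shared bus is lower. The function name is illustrative.

```python
def pci_peak_mbit(bus_width_bits=32, clock_mhz=33.33):
    """Theoretical peak of a parallel PCI bus in Mbit/s (bus width x clock)."""
    return bus_width_bits * clock_mhz


peak = pci_peak_mbit()
print(f"32-bit/33 MHz PCI peak: ~{peak:.0f} Mbit/s (~{peak / 8:.0f} MB/s), "
      "shared by every device on the bus")
print(f"550 Mbit/s is about {100 * 550 / peak:.0f}% of that theoretical peak")
```

So a ceiling around 550 Mbit/s would not be surprising on such a bus once protocol overhead and other devices are taken into account, though that remains a guess here.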
-
Btw, a Cat5E cable can do Gigabit just fine as long as all wires are connected (all 4 pairs, not just the 2 data pairs) and there are no faults in the cable. A decent straight Cat5 should even be able to do Gigabit in short runs, like 50 feet or shorter depending on the quality of the cable and external interference.
There are cables marked as Cat5 that have only the 2 data pairs connected; these will only do 100Mb, as Gb requires all 4 pairs to be connected correctly (maintain twists across the correct pairs, etc.). Many crossover cables only connect the 2 data pairs and only connect at 100Mb.
The "Cat"egories of cables specify electrical specifications, such as crosstalk, inductance, capacitance, etc, but not always the number of wires in the cable itself.
Cat6 would be required for 10Gb in short runs (again, depends on the interference and such), Cat6A supports 10Gb up to the full Ethernet Segment length of 100 meters.
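
The pair-count point can be written down as a tiny Python lookup. This is only a sketch: it models nothing but the number of intact wire pairs (100BASE-TX needs two, 1000BASE-T needs four) and ignores cable category, length, interference and what the NICs on each end support. The names are made up for illustration.

```python
# Wire pairs needed by the two copper Ethernet standards discussed above.
PAIRS_REQUIRED = {
    "100BASE-TX": 2,  # Fast Ethernet uses two pairs
    "1000BASE-T": 4,  # Gigabit Ethernet uses all four pairs
}
SPEED_MBIT = {"100BASE-TX": 100, "1000BASE-T": 1000}


def best_link_speed(pairs_connected):
    """Fastest speed in Mbit/s that a cable with this many intact pairs allows, or 0."""
    usable = [
        SPEED_MBIT[standard]
        for standard, needed in PAIRS_REQUIRED.items()
        if pairs_connected >= needed
    ]
    return max(usable, default=0)


for pairs in (4, 2, 1):
    print(f"{pairs} pair(s) connected -> up to {best_link_speed(pairs)} Mbit/s")
```

A cable that links at 100 Mbit between two gigabit ports is therefore worth checking for missing or damaged pairs before blaming its category rating.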
-
Yup, I know. I have 5e, I thought I tested it to run gigabit, but it was limiting me in these tests, and I have more cat6 lying around than cat5e so I just switched it.
Good call on bus bandwidth, I had just resigned myself to not knowing the cause (admittedly having not tried very hard to figure it out). When I first read the comment, I got a little scared because I thought my pfSense box was crap (after all the time I invested), but then I had the sense to run an iperf test from 1 virtual machine to the pfSense (different interfaces, going through the switch, 1 virtual bridge connected to the VM, 1 directly connected to pfSense). This time I caught 942Mbits/s, and around 60% CPU usage by pfSense (50% interrupt, 7% system, 3% iperf). Not an expert, but I assume the interrupt time is completely unrelated to virtualisation (passed through NIC), and we can conclude that passthrough is indeed just as good as running baremetal?
-
…and we can conclude that passthrough is indeed just as good as running baremetal?
If you have 1000mbit, and you're using 942mbit, then yes. You won't be able to push any more data through that pipe. But when you're running a virtual box, be it a firewall or a server, it will never be as fast as running it baremetal.
-
What is a "virtual box" in this context? My pfSense is a virtual machine, but with exclusive control over the NICs. It is not running baremetal, but it achieved the same throughput as a baremetal server.
-
virtual box = everything you're running virtualized. Vmware, Hyper-V, KVM….
when you have to go through a layer of virtualization, you will lose some performance. that's just the way it is.
-
What kind of performance loss are you thinking? I would argue with modern technologies and a correct setup, the loss is negligible. For example, pfSense doesn't need to do much disk writing, therefore the performance lost on emulating disk I/O is low. But network traffic is what pfSense is all about - here, with pfSense given exclusive control via VT-d of network interfaces, although not running baremetal, no network throughput is lost. So, apart from a little CPU-time for emulated disk I/O, what performance has been lost?
-
All packets going through your pfSense need to be inspected. For this pfSense uses your CPU… It's not always about the NIC.
If you don't know how big a performance hit you're facing, then make a post like i did. Do some cpu/bandwidth performance tests and post the results. I would be happy to see what you would come up with. :)
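
For anyone who wants to collect numbers this way, here is a small Python sketch for pulling the bandwidth figure out of iperf report lines and summarising several runs. It assumes the classic iperf 2 text output, where each report line ends in something like "933 Mbits/sec"; the sample lines are made up for illustration and are not measurements from this thread.

```python
import re
from statistics import mean

# Matches e.g. "933 Mbits/sec" or "1.05 Gbits/sec" at the end of a report line.
_RATE = re.compile(r"([\d.]+)\s+([KMG]?)bits/sec")
_TO_MBIT = {"": 1e-6, "K": 1e-3, "M": 1.0, "G": 1e3}


def parse_mbits(line):
    """Return the bandwidth on one iperf report line in Mbit/s, or None if absent."""
    match = _RATE.search(line)
    if match is None:
        return None
    return float(match.group(1)) * _TO_MBIT[match.group(2)]


# Made-up sample lines in the style of iperf 2 output (NOT data from this thread).
sample_lines = [
    "[  3]  0.0-20.0 sec  2.17 GBytes   931 Mbits/sec",
    "[  3]  0.0-20.0 sec  2.17 GBytes   933 Mbits/sec",
]

rates = [r for r in map(parse_mbits, sample_lines) if r is not None]
print(f"runs: {len(rates)}  min: {min(rates):.0f}  max: {max(rates):.0f}  "
      f"mean: {mean(rates):.1f} Mbit/s")
```

Reporting the minimum, maximum and mean over a handful of runs, together with the CPU figure seen during each run, makes results from different setups much easier to compare.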
-
Ok so here's the setup.
pfSense, as a HVM Xen guest with two NICs passed through.
A Windows machine, running as the iperf server.
The Xen Domain-0, with 1 NIC shared in bridged mode and paravirtual drivers.
Since pfSense is my gateway machine, I reconfigured only the WAN interface to be static on 10.0.0.1 with gateway 10.0.0.2 - the Windows machine was given 10.0.0.2 and plugged into the WAN port of the VM box. The LAN interface that pfSense is using is plugged into a gigabit switch. The shared LAN interface is plugged into the switch also, so traffic is going like so
Client -> switch -> pfSense LAN -> pfSense WAN -> server
Results:
Network throughput was measured several times with a 20s window; it varied from 929 to 934 Mbit/s across 5 runs. pfSense CPU usage was monitored crudely via top; it was roughly at 40%-50% usage, with 35%-40% interrupt, and 5%-10% system time.
-
The test you've done is kinda meh, unless you post some numbers with a baremetal install on the same hardware.
You could even make your own thread with some pictures of graphs, the hardware specs and stuff. I would be most delighted to read it! :)
-
How is that? What he is testing is the difference between 2 virtual drivers, the e1000 vs the vmxnet3. Bare metal performance has little to do with it as far as I can see.
-
Without the VM, I only have the server box and my desktop for testing. My laptop has a bus limitation, so is not useful. Therefore I conduct tests with pfSense+iperf server, and desktop iperf client.
With a pfSense in VM (1 VCPU, 256MB), traffic averaged 933 Mbit/s. CPU time monitored via top was 50-60%.
With a pfSense baremetal (8 VCPU, 16GB), traffic averaged 939 Mbit/s. CPU time monitored via top was 25-30%.
Please note, I did not take the time to make the firewall rules the same.
Looks like there is some CPU performance hit there, but a PCI-passthrough NIC should improve your ability to reach maximum network throughput, which is what this thread is about.
Interestingly, I noted that running pfSense baremetal is less power efficient than running in my VM setup. My idle power consumption is 20W (at the wall), and 27W power is consumed during iperf tests. By contrast, pfSense baremetal runs at 30W idle, and 35W during iperf tests.
What I think can be concluded is that if using a VM, and hardware that is IOMMU capable, throughput can be increased over virtual network drivers by assigning exclusive control of NICs to pfSense. Perhaps someone with vmware and the time to test can provide some quick test results?
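
To make the power figures a bit more tangible, here is a minimal Python sketch that turns the wattages reported above into yearly energy at idle. It assumes the box runs around the clock at the idle figure, which is a simplification; the names are illustrative.

```python
HOURS_PER_YEAR = 24 * 365

# Wall-socket power in watts, as reported above.
power_w = {
    "pfSense in VM": {"idle": 20, "iperf": 27},
    "pfSense bare metal": {"idle": 30, "iperf": 35},
}


def kwh_per_year(watts):
    """Energy used in one year at a constant power draw, in kWh."""
    return watts * HOURS_PER_YEAR / 1000


for name, p in power_w.items():
    extra = p["iperf"] - p["idle"]
    print(f"{name:20s} idle {p['idle']} W, iperf {p['iperf']} W (+{extra} W), "
          f"~{kwh_per_year(p['idle']):.1f} kWh/year at idle")

idle_gap = power_w["pfSense bare metal"]["idle"] - power_w["pfSense in VM"]["idle"]
print(f"Idle gap: {idle_gap} W, roughly {kwh_per_year(idle_gap):.1f} kWh/year")
```

The load figures are not strictly comparable, since the firewall rules and the CPU count differed between the two runs.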
-
You need another test with a virtual pfSense NOT using PCI passthrough. Then you'll be able to compare the results.
Another thing is that your virtual pfSense has 1 CPU, but the bare-metal install has 8… That might have an impact on the energy consumption as well.
-
I'm not going to conduct a formal test with no passthrough. I enabled passthrough on the WAN interface so that there was no possibility of traffic going anywhere except through pfSense. The LAN was a shared interface and I had already determined I wasn't going to get more than 100mbits throughput. As a result, I moved the LAN to having a dedicated device as well.
Regarding power consumption, yes, pfSense has 1 CPU but the whole rig has 8 still. All cpu are controlled by the host (using linux, with better cpufreq drivers). As a result, idling in the VM uses less power than idling in baremetal pfSense.