VMware vmxnet3 NIC vs. e1000 vs. hardware install - throughput performance

miloman

thanks for the testing. does make me wonder why my vmxnet3 interfaces ain't showing. i'm using 2.1_x64 beta0 and when setting the driver to vmxnet3, pfsense doesn't see the additional interfaces.
vmxnet.ko is loaded and has the correct permissions, but the interfaces still don't show.

In this thread i've linked to the guide i used to install the vmware tools supplied with esxi 5.0. Those worked for me.

podilarius

Is that with ESXi, correct? I would imagine that a direct install would be closer to direct-connect speed, i.e. gigabit.

miloman

@podilarius:

Is that with ESXi, correct? I would imagine that a direct install would be closer to direct-connect speed, i.e. gigabit.

My bad… I should've written that the computers were connected to each other using a switch. I've edited my post.

yaxattax

@gesshoku:

Hi,

i've recently started using pfsense again and it's running as a VM on my NAS.

i am curious to know, if one would get the same results using VT-d. i can pass the NICs directly to the pfsense VM. The reason why i haven't done this yet is because i have another VM that is a heavy downloader (WAN-speed is 128 Mbit). My thoughts were: with both VMs using the same controller, the traffic would stay within the hypervisor. If i dedicate the NICs to the pfsense VM only, i assume that traffic would have to leave the ESXi-Host and travel back through the switch.

Am i guessing correctly? Would that extra traffic be negligible compared to the stress i save the CPU?

Thanks.

Hi,

I just tested this - I have a pfSense VM running with 2 NICs passed through using IOMMU. I decided to run an iperf server on the pfSense and an iperf client on my laptop, connected to the pfSense via a gigabit switch. I found that

pfSense CPU usage went to 10% due to iperf
Throughput around 94mbits/s

I am slightly confused as to why this is happening - as far as I am aware all devices on my network are gigabit capable, so I'd have to look into this, but indications are that one can expect to see full gigabit throughput when using IOMMU passthrough.

Regarding traffic staying in hypervisor or leaving to the switch, it would be a case of using an extra port on your switch (like I have). Most switches worth their salt have enough internal bandwidth to shift traffic along all interfaces without slowing. So the only cost would be an extra port used on your switch.

To comment on my particular situation, I was irritated by the increased power consumption with a shared device (for the LAN; the WAN was still passed through). This was caused by CPU consumption rising dramatically. I calculated I would only have been capable of 50-100 Mbit/s throughput from LAN<->WAN (no good if my WAN speed increases, but fine for now). As a result, I also passed through the LAN device. This resulted in stable power consumption when data was passing LAN<->WAN and no noticeable CPU usage. Unfortunately, because the drivers are not as good in FreeBSD as they are in Linux (Xen hypervisor), the idle power consumption of the system as a whole has risen by a couple of watts due to two NICs being controlled by pfSense. My choices are mostly motivated by power consumption concerns.

Hope this was helpful.
Regards,

Yax

EDIT:

100Mbit speed was caused by my cat5e not allowing gigabit speeds. Cat6 solved that problem. Now I see approx 550Mbit/s from client-server through the switch, with pfSense using approx 40% CPU (but only 5% is reported as iperf). Testing with a direct connection to pfSense shows a throughput of 550Mbits/s also. Not really sure why.

johnpoz

"But i'm not impressed."

Yeah it doesn't look like much there, but when talking CPU cycles on a VM - even if small difference, over time that adds up.

Again thanks for taking the time to actually test these drivers - I run the vmxnet3 on all my other vms other than pfsense. With vmxnet3 drivers I could not vpn into my work from client behind pfsense. With e1000 it connects no problem - strange. Good to see it's not all that much of a difference in performance.

miloman

@johnpoz:

"But i'm not impressed."

Yeah it doesn't look like much there, but when talking CPU cycles on a VM - even if small difference, over time that adds up.

Again thanks for taking the time to actually test these drivers - I run the vmxnet3 on all my other vms other than pfsense. With vmxnet3 drivers I could not vpn into my work from client behind pfsense. With e1000 it connects no problem - strange. Good to see it's not all that much of a difference in performance.

On all my windows servers etc. i'm using the vmxnet3 nic. The performance on the nic is great, especially when the traffic stays within the hypervisor. :)

These tests were done to find out if you would gain performance by using the vmxnet3 adapter instead of the e1000 on pfsense. And the answer to that, according to my tests, is no.

podilarius

@yaxattax:

100Mbit speed was caused by my cat5e not allowing gigabit speeds. Cat6 solved that problem. Now I see approx 550Mbit/s from client-server through the switch, with pfSense using approx 40% CPU (but only 5% is reported as iperf). Testing with a direct connection to pfSense shows a throughput of 550Mbits/s also. Not really sure why.

Stupid Cables. Anyway, it sounds like a bus limitation (PCI perhaps) on that max of 550MBits/sec.
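
To put a number on the bus idea: a rough sketch of the theoretical peak of a classic parallel PCI bus, assuming the 32-bit / 33 MHz variant. Whether the machine in question actually has its NIC on such a bus is an assumption, not something stated in the thread, and `pci_peak_mbit` is just an illustrative helper name.

```python
def pci_peak_mbit(bus_width_bits: int = 32, clock_mhz: float = 33.33) -> float:
    """Theoretical peak of a parallel PCI bus in Mbit/s (one transfer per clock)."""
    return bus_width_bits * clock_mhz


observed_mbit = 550.0  # throughput reported in the quoted post
peak = pci_peak_mbit()  # assumes 32-bit / 33 MHz PCI

print(f"PCI 32-bit/33 MHz peak: {peak:.0f} Mbit/s")
print(f"Observed share of peak: {observed_mbit / peak:.0%}")
```

That peak is shared by every device on the bus and is never fully reachable in practice, so a NIC topping out well below gigabit on plain PCI would not be surprising.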

matguy

Btw, a Cat5E cable can do Gigabit just fine as long as all wires are connected (all 4 pairs, not just the 2 data pairs) and there are no faults in the cable. A decent straight Cat5 should even be able to do Gigabit in short runs, like 50 feet or shorter depending on the quality of the cable and external interference.

There are cables marked as Cat5 that only have the 2 data pairs connected. These will only do 100Mb, as Gb requires all 4 pairs to be connected correctly (maintaining twists across the correct pairs, etc.). Many crossover cables only connect the 2 data pairs and only link at 100Mb.

The "Cat"egories of cables specify electrical specifications, such as crosstalk, inductance, capacitance, etc, but not always the number of wires in the cable itself.

Cat6 would be required for 10Gb in short runs (again, depends on the interference and such), Cat6A supports 10Gb up to the full Ethernet Segment length of 100 meters.

yaxattax

Yup, I know. I have 5e, I thought I tested it to run gigabit, but it was limiting me in these tests, and I have more cat6 lying around than cat5e so I just switched it.

Good call on bus bandwidth, I had just resigned myself to not knowing the cause (admittedly having not tried very hard to figure it out). When I first read the comment, I got a little scared because I thought my pfSense box was crap (after all the time I invested), but then I had the sense to run an iperf test from 1 virtual machine to the pfSense (different interfaces, going through the switch, 1 virtual bridge connected to the VM, 1 directly connected to pfSense). This time I caught 942Mbits/s, and around 60% CPU usage by pfSense (50% interrupt, 7% system, 3% iperf). Not an expert, but I assume the interrupt time is completely unrelated to virtualisation (passed through NIC), and we can conclude that passthrough is indeed just as good as running baremetal?

miloman

@yaxattax:

…and we can conclude that passthrough is indeed just as good as running baremetal?

If you have 1000mbit, and you're using 942mbit, then yes. You won't be able to push any more data through that pipe. But when you're running a virtual box, be it a firewall or a server, it will never be as fast as running it baremetal.
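
For reference, a figure around 942 Mbit/s is about what TCP can carry over gigabit Ethernet once framing and header overhead are subtracted. A minimal sketch of that arithmetic, assuming a 1500-byte MTU, standard Ethernet framing, and IPv4 with the TCP timestamp option enabled (none of which is stated in the thread):

```python
# Per-frame Ethernet overhead on the wire, in bytes.
PREAMBLE_SFD = 8
ETH_HEADER = 14
FCS = 4
INTERFRAME_GAP = 12

MTU = 1500           # assumption: standard MTU, no jumbo frames
IP_HEADER = 20       # IPv4 without options
TCP_HEADER = 20
TCP_TIMESTAMPS = 12  # assumption: timestamp option in every segment


def tcp_goodput_mbit(line_rate_mbit: float = 1000.0, timestamps: bool = True) -> float:
    """Upper bound on TCP payload throughput for full-size frames."""
    options = TCP_TIMESTAMPS if timestamps else 0
    payload = MTU - IP_HEADER - TCP_HEADER - options
    on_wire = MTU + PREAMBLE_SFD + ETH_HEADER + FCS + INTERFRAME_GAP
    return line_rate_mbit * payload / on_wire


print(round(tcp_goodput_mbit(), 1))                  # 941.5
print(round(tcp_goodput_mbit(timestamps=False), 1))  # 949.3
```

So a measured 930-940 Mbit/s means the link itself is essentially saturated; any remaining difference between setups will show up in CPU usage rather than in throughput.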

yaxattax

What is a "virtual box" in this context? My pfSense is a virtual machine, but with exclusive control over the NICs. It is not running baremetal, but it achieved the same throughput as a baremetal server.

miloman

virtual box = everything you're running virtualized. Vmware, Hyper-V, KVM….

when you have to go through a layer of virtualization, you will lose some performance. that's just the way it is.

yaxattax

What kind of performance loss are you thinking? I would argue with modern technologies and a correct setup, the loss is negligible. For example, pfSense doesn't need to do much disk writing, therefore the performance lost on emulating disk I/O is low. But network traffic is what pfSense is all about - here, with pfSense given exclusive control via VT-d of network interfaces, although not running baremetal, no network throughput is lost. So, apart from a little CPU-time for emulated disk I/O, what performance has been lost?

miloman

All packets going through your pfsense need to be inspected. For this pfsense uses your cpu… It's not always about the NIC.

If you don't know how big a performance hit you're facing, then make a post like i did. Do some cpu/bandwidth performance tests and post the results. I would be happy to see what you would come up with. :)

yaxattax

Ok, so here's the setup.

pfSense, as a HVM Xen guest with two NICs passed through.
A windows machine, running as the iperf server.
The xen Domain-0, with 1 NIC shared in bridged mode and paravirtual drivers.

Since pfSense is my gateway machine, I reconfigured only the WAN interface to be static on 10.0.0.1 with gateway 10.0.0.2 - the Windows machine was given 10.0.0.2 and plugged into the WAN port of the VM box. The LAN interface that pfSense is using is plugged into a gigabit switch. The shared LAN interface is plugged into the switch also, so traffic is going like so:
Client -> switch -> pfSense LAN -> pfSense WAN -> server

Results:
Network throughput was measured several times with a 20s window; it varied from 929 to 934 Mbit/s across 5 runs. pfSense CPU usage was monitored crudely via top; it was roughly at 40%-50% usage, with 35%-40% interrupt and 5%-10% system time.

miloman

The test you've done is kinda meh, unless you post some numbers with a baremetal install on the same hardware.

You could even make your own thread with some pictures of graphs, the hardware specs and stuff. I would be most delighted to read it! :)

johnpoz

How is that? What he is testing is the difference between 2 virtual drivers, the e1000 vs the vmxnet3. Bare metal performance has little to do with it as far as I can see.

yaxattax

Without the VM, I only have the server box and my desktop for testing. My laptop has a bus limitation, so is not useful. Therefore I conduct tests with pfSense+iperf server, and desktop iperf client.

With a pfSense in VM (1 VCPU, 256MB), traffic averaged 933Mbit/s. CPU time monitored via top was 50-60%.
With a pfSense baremetal (8 CPU, 16GB), traffic averaged 939MBit/s. CPU time monitored via top was 25-30%.

Please note, I did not take the time to make the firewall rules the same.
Looks like there is some CPU performance hit there, but a PCI-passthrough NIC should improve your ability to reach maximum network throughput, which is what this thread is about.

Interestingly, I noted that running pfSense baremetal is less power efficient than running in my VM setup. My idle power consumption is 20W (at the wall), and 27W power is consumed during iperf tests. By contrast, pfSense baremetal runs at 30W Idle, and 35W during iperf tests.

What I think can be concluded is that if using a VM, and hardware that is IOMMU capable, throughput can be increased over virtual network drivers by assigning exclusive control of NICs to pfSense. Perhaps someone with vmware and the time to test can provide some quick test results?
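
The power figures above can be put side by side with a few lines of arithmetic. This is only a sketch using the numbers quoted in this post; the `Run` class and its field names are made up for illustration, and the two runs did not use identical firewall rules or CPU counts.

```python
from dataclasses import dataclass


@dataclass
class Run:
    name: str
    throughput_mbit: float  # average iperf throughput
    idle_watts: float       # measured at the wall
    load_watts: float       # measured at the wall during iperf

    @property
    def load_delta_watts(self) -> float:
        """Extra power drawn while pushing traffic."""
        return self.load_watts - self.idle_watts

    @property
    def watts_per_gbit(self) -> float:
        """Total wall power per Gbit/s of throughput under load."""
        return self.load_watts / (self.throughput_mbit / 1000.0)


runs = [
    Run("VM with passthrough", 933, 20, 27),
    Run("baremetal", 939, 30, 35),
]

for r in runs:
    print(f"{r.name}: +{r.load_delta_watts:.0f} W under load, "
          f"{r.watts_per_gbit:.1f} W per Gbit/s")
```

The CPU percentages are left out on purpose: top on a 1-VCPU guest and top on an 8-CPU host are not measuring against the same total, so comparing them directly would be misleading.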

miloman

@yaxattax:

Without the VM, I only have the server box and my desktop for testing. My laptop has a bus limitation, so is not useful. Therefore I conduct tests with pfSense+iperf server, and desktop iperf client.

With a pfSense in VM (1 VCPU, 256MB), traffic averaged 933Mbit/s. CPU time monitored via top was 50-60%.
With a pfSense baremetal (8 CPU, 16GB), traffic averaged 939MBit/s. CPU time monitored via top was 25-30%.

Please note, I did not take the time to make the firewall rules the same.
Looks like there is some CPU performance hit there, but a PCI-passthrough NIC should improve your ability to reach maximum network throughput, which is what this thread is about.

Interestingly, I noted that running pfSense baremetal is less power efficient than running in my VM setup. My idle power consumption is 20W (at the wall), and 27W power is consumed during iperf tests. By contrast, pfSense baremetal runs at 30W Idle, and 35W during iperf tests.

What I think can be concluded is that if using a VM, and hardware that is IOMMU capable, throughput can be increased over virtual network drivers by assigning exclusive control of NICs to pfSense. Perhaps someone with vmware and the time to test can provide some quick test results?

You need another test with a virtual pfsense NOT using PCI-passthrough. Then you'll be able to compare the results.

Another thing is that your virtual pfsense has 1 cpu, but the baremetal install has 8… That might have an impact on the energy consumption as well.

yaxattax

I'm not going to conduct a formal test with no passthrough. I enabled passthrough on the WAN interface so that there was no possibility of traffic going anywhere except through pfSense. The LAN was a shared interface and I had already determined I wasn't going to get more than 100mbits throughput. As a result, I moved the LAN to having a dedicated device as well.

Regarding power consumption, yes, pfSense has 1 CPU but the whole rig still has 8. All CPUs are controlled by the host (using Linux, with better cpufreq drivers). As a result, idling in the VM uses less power than idling in baremetal pfSense.