VMware vmxnet3 NIC vs. e1000 vs. hardware install - throughput performance

yaxattax

What is a "virtual box" in this context? My pfSense is a virtual machine, but with exclusive control over the NICs. It is not running baremetal, but it achieved the same throughput as a baremetal server.

miloman

Virtual box = anything you're running virtualized: VMware, Hyper-V, KVM…

When you have to go through a layer of virtualization, you will lose some performance. That's just the way it is.

yaxattax

What kind of performance loss are you thinking? I would argue with modern technologies and a correct setup, the loss is negligible. For example, pfSense doesn't need to do much disk writing, therefore the performance lost on emulating disk I/O is low. But network traffic is what pfSense is all about - here, with pfSense given exclusive control via VT-d of network interfaces, although not running baremetal, no network throughput is lost. So, apart from a little CPU-time for emulated disk I/O, what performance has been lost?

miloman

All packets going through your pfSense need to be inspected. For this, pfSense uses your CPU… It's not always about the NIC.

If you don't know how big a performance hit you're facing, then make a post like I did. Do some CPU/bandwidth performance tests and post the results. I would be happy to see what you come up with. :)

yaxattax

OK, so here's the setup.

pfSense, as an HVM Xen guest with two NICs passed through.
A Windows machine, running as the iperf server.
The Xen Domain-0, with 1 NIC shared in bridged mode and paravirtual drivers.

Since pfSense is my gateway machine, I reconfigured only the WAN interface to be static on 10.0.0.1 with gateway 10.0.0.2. The Windows machine was given 10.0.0.2 and plugged into the WAN port of the VM box. The LAN interface that pfSense is using is plugged into a gigabit switch. The shared LAN interface is plugged into the switch as well, so traffic flows like so:
Client -> switch -> pfSense LAN -> pfSense WAN -> server

Results:
Network throughput was measured several times with a 20 s window; it varied from 929 to 934 Mbit/s across 5 runs. pfSense CPU usage was monitored crudely via top: roughly 40%-50% usage, with 35%-40% interrupt and 5%-10% system time.

miloman

The test you've done is kinda meh, unless you post some numbers with a baremetal install on the same hardware.

You could even make your own thread with some pictures of graphs, the hardware specs and stuff. I would be most delighted to read it! :)

johnpoz

How is that? What he is testing is the difference between 2 virtual drivers, the e1000 vs. the vmxnet3. Bare metal performance has little to do with it as far as I can see.

yaxattax

Without the VM, I only have the server box and my desktop for testing. My laptop has a bus limitation, so is not useful. Therefore I conduct tests with pfSense+iperf server, and desktop iperf client.

With pfSense in a VM (1 VCPU, 256MB), traffic averaged 933Mbit/s. CPU time monitored via top was 50-60%.
With pfSense baremetal (8 CPUs, 16GB), traffic averaged 939Mbit/s. CPU time monitored via top was 25-30%.

Please note, I did not take the time to make the firewall rules the same.
Looks like there is some CPU performance hit there, but a PCI-passthrough NIC should improve your ability to reach maximum network throughput, which is what this thread is about.

Interestingly, I noted that running pfSense baremetal is less power efficient than running in my VM setup. My idle power consumption is 20W (at the wall), and 27W power is consumed during iperf tests. By contrast, pfSense baremetal runs at 30W Idle, and 35W during iperf tests.

What I think can be concluded is that if using a VM, and hardware that is IOMMU capable, throughput can be increased over virtual network drivers by assigning exclusive control of NICs to pfSense. Perhaps someone with vmware and the time to test can provide some quick test results?
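To put the figures above side by side, here is a rough Python sketch. It only uses the numbers quoted in this post; the variable names are made up for illustration. Since the firewall rules were not identical between the two runs and the CPU counts differ, treat the output as indicative only.

```python
# Figures quoted in the post above (VM = Xen guest with passthrough NICs).
vm = {"mbit": 933, "idle_w": 20, "load_w": 27}
bare = {"mbit": 939, "idle_w": 30, "load_w": 35}


def pct_less(smaller: float, larger: float) -> float:
    """How many percent 'smaller' falls below 'larger'."""
    return (larger - smaller) / larger * 100.0


# Throughput given up by virtualizing (relative to baremetal).
throughput_loss = pct_less(vm["mbit"], bare["mbit"])

# Power saved at the wall by running in the VM setup.
idle_saving = pct_less(vm["idle_w"], bare["idle_w"])
load_saving = pct_less(vm["load_w"], bare["load_w"])

print(f"throughput loss: {throughput_loss:.2f}%")
print(f"idle power saving: {idle_saving:.1f}%")
print(f"load power saving: {load_saving:.1f}%")
```

The CPU percentages are deliberately left out: 50-60% of one VCPU and 25-30% on an 8-CPU machine are not measured on the same scale.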

miloman

@yaxattax:

You need another test with a virtual pfsense NOT using PCI-passthrough. Then you'll be able to compare the results.

Another thing is that your virtual pfSense has 1 CPU, but the baremetal install has 8… That might have an impact on the energy consumption as well.

yaxattax

I'm not going to conduct a formal test with no passthrough. I enabled passthrough on the WAN interface so that there was no possibility of traffic going anywhere except through pfSense. The LAN was a shared interface and I had already determined I wasn't going to get more than 100 Mbit/s throughput. As a result, I moved the LAN to a dedicated device as well.

Regarding power consumption, yes, pfSense has 1 CPU but the whole rig still has 8. All CPUs are controlled by the host (running Linux, with better cpufreq drivers). As a result, idling in the VM uses less power than idling in baremetal pfSense.

miloman

@yaxattax:

That's fine.. But it's impossible to conclude anything from your tests then.

FauxShow

I have a few questions:

Why run VLAN tagging with pfSense? Why not leave the tagging up to ESX? This way pfSense passes everything untagged and ESX will tag it as it leaves. This method works fine for me, and avoids having to configure the ESX for trunking.
Any benefit from running multiple physical NICs on the vSwitch? I'm running three 1-Gb NICs using "Route based on IP hash" load balancing, and I'm wondering if there is a benefit to running VMXNET3 or E1000. The VM still has one adapter per network though, so it's up to the ESX server to load balance and is transparent to the pfSense VM.

matguy

@FauxShow:

1st question: For people with a single VM host (no clusters, or, at least, no vMotion) that idea is pretty much six of one, half a dozen of the other. It probably doesn't matter much; do whichever you're more familiar with. Personally, I'd do them at the ESX(i) host level as well, but feel free to do it either way. Unless you have the situation you talk about next…

2nd question: I believe so. For an ESX(i) host with multiple uplinks from its vSwitch, a VMXNET3 may help a lot. A VMXNET3 is presented to the VM as a 10Gb adapter, so it can pass more than 1Gb of traffic to a VM. If you have multiple 1Gb connections you won't get more than 1Gb to any one destination, but you might be able to leverage more than 1Gb total to your VM, although I would not expect anywhere near a full 2Gb (mostly because the load balancing isn't based on load, so your two heavy destinations could easily end up on the same NIC).
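To illustrate why "Route based on IP hash" can put two heavy destinations on the same uplink, here is a simplified Python model. It assumes the uplink is chosen by XOR-ing the source and destination IPv4 addresses and taking the result modulo the number of uplinks; the real ESX implementation may differ in detail, and the addresses and the `uplink_for` helper are hypothetical.

```python
import ipaddress


def uplink_for(src: str, dst: str, n_uplinks: int) -> int:
    """Simplified IP-hash model: XOR both addresses, modulo the uplink count."""
    s = int(ipaddress.IPv4Address(src))
    d = int(ipaddress.IPv4Address(dst))
    return (s ^ d) % n_uplinks


# Hypothetical VM address and three hypothetical destinations, 3 uplinks.
vm_ip = "192.168.1.10"
destinations = ["192.168.1.20", "192.168.1.17", "192.168.1.21"]

for dst in destinations:
    print(dst, "-> uplink", uplink_for(vm_ip, dst, 3))
```

In this model a given source/destination pair always maps to the same uplink, which is why one flow never exceeds a single 1Gb link, and the choice ignores how busy each uplink is, so two busy destinations can collide while a third uplink sits idle.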

Although, I do remember something about some vNICs being able to communicate at "bus" speed, ignoring their stated connection speed and transferring as fast as possible. That was back in my VCP testing, so I don't exactly remember. But, it would be easy to test. Run 2 VM's on the same host, both with VMXNET3 vNICs, throw data around and see how fast it goes (generated data, though, files will be dependent on disk speed.) Note CPU usage, though, remember, the vNICs are virtualized, they take CPU to run; if I recall correctly, this is the original idea behind the VMXNET NICs.

You may notice I said "may help a lot" earlier. I did hear somewhere that the "speed" of your vSwitch may be "set" by the fastest physical NIC connected to it. Again, memory, not always as good as I'd like it to be; as well as my quick search google-fu.

FauxShow

@matguy:

I think it's much easier to have ESX handle the VLAN tagging; it's one less thing to do on the VM to configure it properly, and it also prevents a user (authorized or not) from switching the network the VM is connected to from the VM.
I think you are correct about the bus speed; I just ran iperf between two KNOPPIX VMs on different VLANs and got 1.6Gb/s using the E1000 adapter and 1.3Gb/s using the VMXNET3 adapter. This speed was also confirmed on the pfSense live Traffic Graph. It's odd that the E1000 adapters ran quicker than the VMXNET3, although the pfSense VM is also running E1000 adapters, if that makes a difference. The pfSense VM has 8 CPUs (single virtual socket, eight virtual cores) and most were barely registering during that test; one hit 50% and another hit 100% briefly. The ESX server is a beast though, with dual Xeons running at 2.4GHz, each with 6 cores, plus hyper-threading for 24 logical processors.

matguy

@FauxShow:

Do you notice any performance benefit from giving your pfSense VM so many cores? I would imagine 2 or 3 being the max that pfSense can really utilize. (Serious question, not saying you're doing anything wrong.)

As for the transfer, I can imagine that going between like vNICs on the same host could be faster than going between different types. It might be interesting to see VMXNET3 to VMXNET3. (I don't have time to test today.) For CPU usage, was that the pfSense OS reporting CPU usage, or VMware?

Supermule

Can you create a VLAN without a VLAN tag in pfSense?

matguy

@Supermule:

Are you looking to have mixed mode interface, with both a native, untagged network and tagged VLANs on the same interface?

If this is in a VM, there really wouldn't be a need unless you really don't want to configure multiple vNIC interfaces in ESX(i). If this is physical, remember, your switch will have to support VLANs anyway, so I'm not sure what benefit you'd be getting out of mixing the modes.

It's a good academic question, and if I understood the quick look at the VLAN documents for pfSense, I think it does. But, what are you trying to accomplish?

FauxShow

@matguy:

I've had stability issues with pfSense, so I'm making sure the VM has enough resources to keep running. I also switched back to 32-bit. Even if it can only fully utilize 2 there's no harm in giving 8, though if anyone can confirm a number I'd change it.

I ran the test again between a VM running the E1000 adapter on one VLAN and iperf'ing to a user on another VLAN and was able to get 634Mb/s. So that's from the VM, through pfSense, to ESX, through a gigabit switch, and then to the user. Because I'm more concerned about stability than performance, and 634Mb/s is fine with me, I'm not going to switch to VMXNET2/3 adapters.

matguy

@FauxShow:

With a lot of cores per VM you can run into scheduling issues, because the host has to try to schedule all the vCPUs at the same time. Of course, with 24 cores at your disposal, this might not be an issue… yet. If you start putting a lot of VMs (especially multi-core) on that host, watch your CPU Ready metrics; they'll tell you if you're having CPU scheduling issues.
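If the CPU Ready counter is shown as a summation in milliseconds, it can be turned into a percentage. The Python sketch below assumes the commonly cited conversion (ready time divided by the sample interval) and a 20-second real-time sample interval; dividing by the vCPU count to get a per-vCPU average is an extra assumption, and the function name is made up for illustration.

```python
def cpu_ready_percent(ready_ms: float, interval_s: float = 20.0, vcpus: int = 1) -> float:
    """Convert a CPU Ready summation (ms) into a percentage of the sample interval.

    Assumes the value covers one sample of interval_s seconds; with vcpus > 1
    the result is averaged per vCPU.
    """
    return ready_ms * 100.0 / (interval_s * 1000.0 * vcpus)


# Hypothetical reading: 1000 ms of ready time in one 20 s sample.
print(cpu_ready_percent(1000))           # single-vCPU VM
print(cpu_ready_percent(1000, vcpus=8))  # same reading on an 8-vCPU VM
```

What counts as "too high" depends on the workload, so it is more useful to watch the trend as VMs are added than to look for one fixed threshold.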

miloman

@FauxShow:

I'm letting pfSense handle the VLAN tagging because I have more VLANs than I have physical adapters to assign.