Performance penalty of virtualized pfSense



  • What is the performance difference of running pfSense on ESXi 4.1?

    We are using it to route and protect traffic for hundreds of Cloud Servers, but we are always concerned that we made the wrong choice performance-wise by going with pfSense in a virtual environment. We don't have any data to compare our performance against a bare-metal install, so maybe some of you could advise us.

    Is there any performance difference in network latency / throughput?
    Is it really noticeable?
    Are we talking 1 or 2%, or several milliseconds?
    Should we still be able to saturate 1 Gb links using VMware ESXi, or are we likely to see a lot less than that?

    Anything else you can offer would be great.

    Just so you know, we use a 1:1 relationship of one pfSense 2.0.1 install on one HP DL360 G5 with 4 processors, 6 GB RAM, SAS drives, and Intel Pro/1000 PT NICs.

    The reason we chose virtualization was to get high availability of firewall hardware without the difficulty of configuring CARP and bridges in pfSense.

    Thanks,



  • Virtualization is a nice, widely used option for a lot of reasons, but top-end scalability isn't among them. The best performance I've seen in ESX is around 400 Mbps, on hardware that could do 5+ Gbps (depending on packet size) on bare metal. I haven't had a chance to analyze why in great depth, but if you need to push 400+ Mbps, do it on bare metal. For lower throughput than that, it doesn't matter either way. This forum sits behind virtual firewalls in ESX; all the servers behind them do millions of page loads a month in total, plus a wide variety of other things, running under 5% utilization.



  • Out of interest, I ran an iperf test between two Win 7 machines just to see what the throughput was: one, the iperf server, is a VM on a virtual (i.e., no physical NIC) DMZ, passing through a pfSense VM to a physical Win 7 machine on the LAN. Result without any tweaks: a lousy 200 Mb/s.

    Then I jacked iperf's receive window up to 640k from the default of 8k and got 765 Mb/s.   :)
    Unfortunately, there's no way to set RWIN on the Win 7 stack these days.  :(
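    The effect of that window tweak follows from the TCP bandwidth-delay product: a single stream can't exceed roughly window / RTT. A quick sketch (the 1 ms RTT is an assumed figure for a LAN path through a VM, not a measured one):

```shell
# TCP window-limited throughput ceiling: window_bytes * 8 / RTT.
# 8192 bytes is iperf's old default window; the 1 ms RTT is an assumption.
awk -v w=8192 -v rtt_ms=1 \
    'BEGIN { printf "%.1f Mbit/s\n", (w * 8) / (rtt_ms / 1000) / 1e6 }'
# -> 65.5 Mbit/s ceiling with the 8k default;
#    the same math with w=640000 gives ~5120 Mbit/s, so the NIC becomes the limit.
```

    In other words, with the default 8k window the endpoints, not pfSense, were the bottleneck.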

    The ESXi server is an HP dc7900 with a Core2 E7600 at ~3.0 GHz and an Intel Pro/1000 MT NIC on LAN.  The pfSense and Win 7 VMs all use E1000 interfaces.  The physical Win 7 machine is another of the same model but with an E8500 processor at 3.16 GHz and a Pro/1000 GT NIC.

    I've just built another ESXi server on a machine identical to the physical Win 7 machine above and will try to set up a test passing through pfSense from LAN to WAN.

    I should mention that some believe iperf is a flawed and unreliable test.



  • Just remembered that the Pro/1000 GT in the physical Win 7 machine is a PCI card.

    Swapped to a Pro/1000 CT and ran the test again with the same 640k window tweak.  Throughput went up to 878 Mb/s.

    C:\Users\ME>D:\Install\iperf\iperf.exe -c 192.168.11.2 -w640000
    ------------------------------------------------------------
    Client connecting to 192.168.11.2, TCP port 5001
    TCP window size:  625 KByte
    ------------------------------------------------------------
    [128] local 192.168.111.148 port 49179 connected with 192.168.11.2 port 5001
    [ ID] Interval       Transfer     Bandwidth
    [128]  0.0-10.0 sec  1.02 GBytes   878 Mbits/sec
    


  • That's solid. The scenarios I was referring to were all virtual to virtual, and the VMs themselves could well have played into it. I didn't have the opportunity to really mess with it. Sounds like in your case it's fast enough that you can do gig wire speed no problem, though I suspect if you had 10G you'd see a considerably lower limit in the VM than on the bare metal. If gigabit is all you need, you should be in good shape there.



  • I've managed 943 Mbit/s on a Dell R310 with an Intel server NIC running ESXi 4.1.

    I've yet to see what happens when you trunk two physical interfaces with lagg (2 × 1 Gbit). Would the speed also double, or is this maxing out the VM? The RRD graphs show a slight increase in CPU, but not a lot.
    Unfortunately, I don't have a second device with multiple NICs connected to a managed switch readily available. ;)



  • In general you should be able to get at least 10 Gbit/s throughput with paravirtualized drivers on virtualized setups, pure point-to-point. But as long as pfSense is running on emulated drivers like the E1000, it will never reach good performance, at least not without using a lot of CPU on the physical host.

    I get about 450 Mbit/s on KVM from pfSense (non-routed) with 100% CPU utilization (a single CPU on the physical host). On Linux, as a drop-in replacement for pfSense, I get 19.6 Gbit/s, because Linux supports the paravirtualized drivers, which give much better performance. But this is VM to VM.
    ESXi has better performance than KVM when running emulated network devices, so you should get more than 450 Mbit/s without problems on a good ESXi setup, even when using emulated devices.
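    For reference, attaching a paravirtualized virtio NIC to a KVM guest (e.g. the Linux firewall VM above) looks roughly like this. This is a sketch of the raw qemu-kvm command-line form; the bridge name br0, image name, and MAC are assumptions, and most setups would configure this through libvirt instead:

```shell
# Paravirtualized NIC: virtio-net-pci instead of the emulated e1000.
# br0 is an assumed host bridge; adjust names to your setup.
qemu-system-x86_64 \
  -enable-kvm -m 1024 \
  -drive file=guest.img,if=virtio \
  -netdev bridge,id=net0,br=br0 \
  -device virtio-net-pci,netdev=net0,mac=52:54:00:12:34:56

# The emulated equivalent the pfSense numbers above were measured with:
#   -device e1000,netdev=net0
```

    Swapping that one `-device` line is the whole difference between the emulated and paravirtualized cases being compared here.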

    Full gigabit performance is no problem for any good virtualized firewall when paravirtualized drivers are used and VT-x is fully supported.
    To continue using my own setup as an example: the physical host uses only 13% of a single CPU when the KVM-virtualized Linux VM pushes full gigabit speed (939 Mbit/s) to another host on the physical network.

    So the conclusion: gigabit performance on virtualized firewalls is absolutely no problem at all, but you need to use paravirtualized drivers, which FreeBSD, and hence pfSense, sadly has poor support for. Without these drivers you get much higher CPU usage and much lower performance.

    On ESXi, VMXNET2/3 (not the Intel E1000) are the paravirtualized network devices you should strive to use for best performance, but I don't think pfSense has stable OS-level support for these paravirtualized devices yet, the way modern Linux does. The same goes for storage devices.

    My iperf performance numbers are based on iperf's default OS values. I did not use extraordinarily large TCP window sizes to get the illusion of better performance, or tweaks of any kind (jumbo frames, kernel parameters, etc.).



    I did not use extraordinarily large TCP window sizes to get the illusion of better performance, or tweaks of any kind (jumbo frames, kernel parameters etc)

    I don't disagree with anything you've said about the benefits of paravirtualized versus emulated drivers, but I feel the statement above requires me to explain my reasons for increasing the window size.

    My intention, in doing the testing that I did, was to see what a virtualized pfSense could push between two different interfaces (LAN and DMZ) using E1000 drivers.

    The reason for using Windows 7 endpoints (virtualized in the DMZ and physical on the LAN) was simply that those machines already existed.

    Increasing the receive window was just a way of trying to ensure that the iperf server and client endpoints, the Win 7 machines, were not the limiting factor. It was not to gain any "illusion of better performance".



  • @marsboer:

    I get about 450 Mbit/s on KVM from pfSense (non-routed) with 100% CPU utilization (single CPU on physical host). On Linux I get 19.6 Gbit/s as a drop-in replacement for pfSense, but it does support the paravirtualized drivers which gives much better performance. But this is VM to VM.

    I seriously doubt that was a truly equivalent test; Linux with the firewall enabled is almost certainly not going to push 20 Gbps (maybe with all jumbo frames). Routing, maybe: you can get as much as 10x the traffic through pfSense if you disable the packet filter, because there is a lot of overhead in filtering. I'm sure Linux does have higher top-end performance because of the PV drivers, but it's not likely 450 Mb vs. 20 Gb under completely identical circumstances.



  • My intention was not to make a Linux vs pfSense point, just to state how much you lose when not using paravirtualized devices.

    The two setups were configured as follows, just to give a little more insight:

    Minimally configured default install of pfSense 2.0.1:

    • No DHCP server running
    • NAT turned completely off (all rules removed)
    • Allow any traffic on all interfaces (all other rules removed)
    • Run ospf on WAN interface
    • KVM NICs: emulated Intel gigabit NIC

    Minimal netinstall of Linux, Debian Wheezy (testing), kernel 3.1.0-1

    • IP forwarding enabled
    • iptables policy set to ACCEPT on INPUT, OUTPUT, FORWARD (the default)
    • Run ospf on WAN interface (quagga)
    • KVM NICs: virtio
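    The Linux-side configuration listed above boils down to a couple of commands (a sketch on Debian; the quagga/ospfd configuration is omitted):

```shell
# Enable IPv4 forwarding (persist via /etc/sysctl.conf if desired)
sysctl -w net.ipv4.ip_forward=1

# Default ACCEPT policies on the filter table (this is the stock state)
iptables -P INPUT ACCEPT
iptables -P OUTPUT ACCEPT
iptables -P FORWARD ACCEPT
```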

    Clients:
    Linux vms on same host, also Debian machines (test partners for vm to vm tests)
    Linux on other directly connected physical host (test real physical throughput and CPU usage)

    Results, non-routed:
    VM to pfSense on same virtual LAN: ~450 Mbit/s
    VM to Linux on same virtual LAN: ~19.6 Gbit/s

    Results, routed:
    VM to VM on other virtual LAN via pfSense: ~210 Mbit/s
    VM to VM on other virtual LAN via Linux: ~9.6 Gbit/s

    I don't see anything wrong here that would make pfSense do a lot more work than its Linux counterpart through a configuration mismatch.

    If you know of something glaringly obvious here, I would gladly reconfigure pfSense and retest. My understanding is that iptables is active on Linux even when the policy is set to ACCEPT, but I could also test with the iptables default policy set to DROP and then add a rule that allows anything. A bit artificial, but just to ensure that this is not the main differentiator here.

    EDIT:
    I configured the default policy to DROP and added rules to allow anything on INPUT, OUTPUT and FORWARD, just to make sure that Linux has its packet filtering active.
    The result when testing VM to Linux firewall VM performance was now 19.9-20 Gbit/s.

    By looking at iptables -L -v I can see the packet and byte counters increase for the rules during testing, so I can definitively confirm that packet filtering is active on Linux; this is a valid and real comparison.
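    The DROP-policy retest described in the edit corresponds to roughly this (allow-all rules so every packet actually traverses the filter and bumps the rule counters):

```shell
# Default-deny, then explicit allow-everything rules
iptables -P INPUT DROP
iptables -P OUTPUT DROP
iptables -P FORWARD DROP
iptables -A INPUT -j ACCEPT
iptables -A OUTPUT -j ACCEPT
iptables -A FORWARD -j ACCEPT

# Watch the per-rule packet/byte counters climb during the iperf run
iptables -L FORWARD -v -n
```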



  • @biggsy:

    Increasing the receive window was just a way of trying to ensure that the performance of the iperf server and client end points, the Win 7 machines, were not the limiting factor.  It was not to gain any "illusion of better performance".

    My remark was actually aimed at my own numbers, as around 20 Gbit/s for the Linux firewall seems rather incredible if one is used to non-paravirtualized performance with pfSense.



  • I can't speak to overall throughput limitations.

    My Zacate E350 with a dual-port Intel server NIC is able to max out my 30 Mbit down / 25 Mbit up WAN connection in every test I can think of.

    I have, however, suspected that the virtualization overhead adds some latency, at least when you map traffic through ESXi's virtual switches. My theory is that this goes away if you direct-I/O-map the NICs to the pfSense VM using Intel's VT-d or AMD's IOMMU, but I have not been able to confirm this yet, as my hardware is not VT-d or IOMMU compatible.

    I can, however, confirm that ESXi 5.0 has a terrible impact on USB speeds for devices plugged into the local host. ESXi does not support my eSATA controller, so I temporarily have my external array hooked up over USB 2.0. In my experience USB 2.0 tops out at ~26 MB/s natively, but mapped through ESXi my external array is frustratingly limited to about 3.5 MB/s.



  • VMXNET 2 (Enhanced) and VMXNET 3 NICs are not supported yet?



  • @pf2.0nyc:

    VMXNET 2 (Enhanced) and VMXNET 3 NICs are not supported yet?

    Actually, there is a way to get VMXNET3 working under ESXi 5, but it requires some tweaking.
    I have not been able to get it to work myself, but you can give the procedure described here a shot:
    [How-To] Using VMXNET2/3 NICs in pfSense 2.0


  • I am getting close to 650 Mbit/s throughput on pfSense using speedtest.net.

    Using a normally functioning firewall with rules in place etc. Nothing disabled, and not running Squid to cache anything.

    ESXi 4.1 on IBM x3550 M3 hardware with Intel quad-port server NICs.



  • @pf2.0nyc:

    VMXNET 2 (Enhanced) and VMXNET 3 NICs are not supported yet?

    With the Open VM Tools package, you can get VMXNET2 drivers to work easily.



  • I am able to saturate 1 Gbps in and 1 Gbps out with LAN routing in pfSense on two NICs that are shared with two other VMs, using the emulated E1000 drivers on pfSense and VMXNET3 on the other VMs. The 1 Gbps I/O is just the pfSense traffic.

