Performance penalty of virtualized pfSense
-
I did not use extraordinarily large TCP window sizes to get the illusion of better performance, or tweaks of any kind (jumbo frames, kernel parameters, etc.)
I don't disagree with anything you've said about the benefits of paravirtualized versus emulated drivers, but I feel the statement above requires me to qualify my reasons for increasing the window size.
My intention with the testing I did was to see what a virtualized pfSense could push between two different interfaces (LAN and DMZ) using E1000 drivers.
The reason for using Windows 7 endpoints (virtualized in the DMZ and physical on the LAN) was simply that those machines already existed.
Increasing the receive window was just a way of trying to ensure that the performance of the iperf server and client endpoints, the Win 7 machines, was not the limiting factor. It was not to gain any "illusion of better performance".
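For reference, the invocations were along these lines (iperf 2 syntax; the 512 KB window and the address are placeholders, not my exact values):

    # server side, on one of the Win 7 endpoints, with an enlarged receive window
    iperf -s -w 512K

    # client side on the other endpoint, sending through the pfSense VM for 60 seconds
    iperf -c 192.0.2.10 -w 512K -t 60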
-
I get about 450 Mbit/s on KVM from pfSense (non-routed) with 100% CPU utilization (single CPU on the physical host). On Linux, set up as a drop-in replacement for pfSense, I get 19.6 Gbit/s; it does support the paravirtualized drivers, which give much better performance. But this is VM to VM.
I seriously doubt that was a truly equivalent test; Linux with the firewall enabled is almost certainly not going to push 20 Gbps (maybe with jumbo frames everywhere). Pure routing, maybe; you can get as much as 10x the traffic through pfSense if you disable the packet filter, since there is a lot of overhead in filtering. I'm sure Linux does have higher top-end performance because of the PV drivers, but it's not likely 450 Mbit/s vs. 20 Gbit/s under completely identical circumstances.
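For what it's worth, if anyone wants to repeat the filter-disabled comparison, pf can be toggled from a pfSense shell with the standard pfctl switches (there is also a "disable all packet filtering" checkbox in the webGUI under the advanced settings, if I recall correctly). Re-enable it afterwards:

    # turn the pf packet filter off entirely (filtering and NAT both stop)
    pfctl -d
    # turn it back on when the test is done
    pfctl -e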
-
My intention was not to make a Linux vs. pfSense point, just to state how much you lose when not using paravirtualized devices.
The two setups were configured as follows, just to give a little more insight:
Minimally configured default install of pfSense 2.0.1:
- No DHCP server running
- NAT turned completely off (all rules removed)
- Allow any traffic on all interfaces (all other rules removed)
- Run OSPF on WAN interface
- KVM NICs: Intel gigabit NIC
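If you manage the KVM guests through libvirt, the NIC model is just the model element in the domain XML; roughly like this for the pfSense VM (the bridge name is an example, and this is the generic libvirt form rather than my exact domain definition):

    <!-- pfSense VM: emulated Intel gigabit NIC (e1000) -->
    <interface type='bridge'>
      <source bridge='br0'/>
      <model type='e1000'/>
    </interface>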
Minimal netinstall of Linux (Debian Wheezy/testing), kernel 3.1.0-1:
- IP forwarding enabled
- iptables policy set to ACCEPT on INPUT, OUTPUT, FORWARD (the default)
- Run OSPF on WAN interface (Quagga)
- KVM NICs: virtio
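The virtio NICs are the same kind of libvirt stanza as above, just with <model type='virtio'/>. On the guest side the forwarding and iptables state amounts to roughly this (standard Debian commands, not a literal transcript of my setup):

    # enable IPv4 forwarding for the current boot (persist it in /etc/sysctl.conf)
    sysctl -w net.ipv4.ip_forward=1

    # Debian's default: empty rule set with ACCEPT policies on all built-in chains
    iptables -P INPUT ACCEPT
    iptables -P OUTPUT ACCEPT
    iptables -P FORWARD ACCEPT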
Clients:
Linux VMs on the same host, also Debian machines (test partners for the VM-to-VM tests)
Linux on another, directly connected physical host (to test real physical throughput and CPU usage)
Results, non-routed:
VM to pfSense on the same virtual LAN: ~450 Mbit/s
VM to Linux on the same virtual LAN: ~19.6 Gbit/s
Results, routed:
VM to VM on another virtual LAN via pfSense: ~210 Mbit/s
VM to VM on another virtual LAN via Linux: ~9.6 Gbit/s
I do not see that I am doing anything here that would make pfSense do a lot more work than its Linux counterpart through a configuration mismatch.
If you know of something glaringly obvious here, I would gladly reconfigure pfSense and retest. My understanding is that iptables is active on Linux even when the policy is set to ACCEPT, but I could also test with the iptables default policy set to DROP and then add a rule that allows everything. A bit artificial, but just to ensure that this is not the main differentiator here.
EDIT:
I configured the default policy to DROP and added rules that allow everything on INPUT, OUTPUT and FORWARD, just to make sure that Linux has its packet filtering active.
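In terms of commands it was roughly this, with nothing else in the rule set:

    # restrictive default policies
    iptables -P INPUT DROP
    iptables -P OUTPUT DROP
    iptables -P FORWARD DROP

    # explicit allow-everything rules, so every packet is matched by the filter
    iptables -A INPUT -j ACCEPT
    iptables -A OUTPUT -j ACCEPT
    iptables -A FORWARD -j ACCEPT

    # verify: per-rule packet and byte counters
    iptables -L -v -n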
The result when testing VM-to-Linux-firewall-VM performance was now 19.9-20 Gbit/s. By looking at iptables -L -v I can see the packet and byte counters increase for the rules while testing, so I can definitively confirm that packet filtering is active on Linux and that this is a valid, real comparison.
-
Increasing the receive window was just a way of trying to ensure that the performance of the iperf server and client endpoints, the Win 7 machines, was not the limiting factor. It was not to gain any "illusion of better performance".
My remark was actually aimed at my own numbers, as around 20 Gbit/s for the Linux firewall seems rather incredible if one is used to non-paravirtualized performance with pfSense.
-
I can't speak to overall throughput limitations.
My Zacate E350 with a dual Intel server NIC is able to max out my 30 Mbit/s down / 25 Mbit/s up WAN connection in every test I can think of.
I have, however, suspected that the virtualization overhead adds some latency, at least when traffic is mapped through ESXi's virtual switches. My theory is that this goes away if you pass the NICs through directly to the pfSense VM using Intel's VT-d or AMD's IOMMU, but I have not been able to confirm this yet, as my hardware is not VT-d or IOMMU capable.
I can, however, confirm that ESXi 5.0 has a terrible impact on USB speeds for devices plugged into the local host. ESXi does not support my eSATA controller, so I temporarily have my external array hooked up over USB 2.0. In my experience USB 2.0 tops out at ~26 MB/s natively, but mapped through ESXi my external array is frustratingly limited to about 3.5 MB/s.
-
VMXNET 2 (Enhanced) and VMXNET 3 NICs are not supported yet?
-
@pf2.0nyc:
VMXNET 2 (Enhanced) and VMXNET 3 NICs are not supported yet?
Actually, there is a way to get VMXNET 3 working under ESXi 5, but it requires some tweaking.
I have not been able to get it to work, but you can give the procedure described here a shot:
[How-To] Using VMXNET2/3 NICs in pfSense 2.0 -
I am getting close to 650 Mbit/s throughput on pfSense using speedtest.net.
This is with a normally functioning firewall with rules in place, etc. Nothing disabled, and not running Squid to cache anything.
ESXi 4.1 on IBM x3550 M3 hardware with Intel quad-port server NICs.
-
@pf2.0nyc:
VMXNET 2 (Enhanced) and VMXNET 3 NICs are not supported yet?
With the Open-VM-Tools package, you can get the vmxnet2 drivers to work easily.
-
I am able to saturate 1 Gbps in and 1 Gbps out with LAN routing in pfSense, on two NICs that are shared with two other VMs, using the emulated E1000 drivers on pfSense and VMXNET 3 on the other VMs. The 1 Gbps in/out is just the pfSense traffic.
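In case it helps anyone set up something similar: the adapter type is chosen when adding the NIC in the vSphere client, and typically ends up as the virtual device entry in each VM's .vmx file, roughly like this (the ethernet0 index is just an example):

On the pfSense VM:
    ethernet0.virtualDev = "e1000"
On the other VMs:
    ethernet0.virtualDev = "vmxnet3"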