Performance Measurements with VirtIO + Offloading on Atom C2358 [Updated]
[Please see the updated 01/2017 post below for more up-to-date information. This post contains the original assertions.]
I just wanted to post an experience that seems to run contrary to the prevailing wisdom that you should disable hardware checksum and other offloading options when using the VirtIO network drivers with pfSense. I just finished testing a number of combinations of virtio settings mixed with various offloading settings, and wound up seeing a 4x performance improvement when offloading was enabled everywhere.
Hardware: Lanner FW-7551 w/ Intel Atom C2358 (2 cores @ 1.74GHz), 8GB RAM, and a 128 GB SSD
Hypervisor: KVM/QEMU/libvirt running atop Ubuntu 16.04.1 server (4.4.0-47-generic kernel)
pfSense: 2.3.2-RELEASE-p1 64-bit running on a 2 CPU, 2GB RAM VM w/ virtio drivers in macvtap mode
I ran a series of tests with checksum offloading toggled at both the hypervisor and pfSense levels, and with segmentation offloading and LRO toggled at both levels as well.
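To make the test matrix concrete, here is a minimal Python sketch that enumerates the eight on/off combinations of checksum offload, TSO, and LRO and prints the matching FreeBSD `ifconfig` invocation for each. This is not how the tests above were actually run (the settings were toggled through the pfSense checkboxes); the interface name `vtnet0` and the helper names are assumptions made for illustration.

```python
from itertools import product

def ifconfig_args(chk, tso, lro):
    """Return ifconfig capability flags; a leading '-' disables a feature."""
    groups = (
        (chk, ("rxcsum", "txcsum")),  # checksum offload, both directions
        (tso, ("tso",)),              # TCP segmentation offload
        (lro, ("lro",)),              # large receive offload
    )
    args = []
    for enabled, names in groups:
        for name in names:
            args.append(name if enabled else "-" + name)
    return args

def test_matrix(iface="vtnet0"):
    """Build one ifconfig command per chk/tso/lro combination (8 total)."""
    commands = []
    for chk, tso, lro in product((True, False), repeat=3):
        commands.append("ifconfig %s %s" % (iface, " ".join(ifconfig_args(chk, tso, lro))))
    return commands

if __name__ == "__main__":
    for command in test_matrix():
        print(command)
```

The script only prints the commands; running them on a live firewall would be a separate, deliberate step.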
In every configuration where offloading was disabled, either at the hypervisor level, the pfSense level, or both, iperf tests between pfSense and another machine across the gigabit network would top out between 150 Mbps and 250 Mbps, depending on the number of parallel connections being run. In these cases, the VM CPU would quickly peg at 100%, indicating a CPU packet-processing bottleneck.
When I enable all offloading options (checksum offloading, segmentation offloading, and LRO) at both the hypervisor level and the pfSense level, however, my performance immediately jumps up to the 750 Mbps to 850 Mbps range – a 4x increase over the non-offloading configuration.
With all the hardware offloading enabled, I can even hit ~1.5 Gbps between pfSense and the hypervisor itself (i.e. across the macvtap soft-switch).
All of these tests were performed with the default MTU (1500), with VLAN tagging happening at the hypervisor layer, and while using an LACP bonded pair of the built-in gigabit Intel NICs. Other than the offloading options, both the pfSense 2.3 and Ubuntu 16.04 installs are fairly stock configurations.
I have noticed no packet loss or other adverse effects with hardware offloading enabled. I'll keep playing around with this and report back here if I encounter any issues, but at least with the latest Ubuntu and pfSense software and Atom C2358 hardware, there seems to be a significant performance increase from using hardware offloading coupled with the virtio drivers.
I also have an 8-core Supermicro C2378 Atom board I'll try to repeat these tests on and report back here as well.
For reference, here's the QEMU line from my running VM (as generated by virt-manager and libvirt):
$ qemu-system-x86_64 -enable-kvm -name pf1 -S -machine pc-i440fx-xenial,accel=kvm,usb=off -cpu Westmere,+erms,+smep,+3dnowprefetch,+rdtscp,+rdrand,+tsc-deadline,+movbe,+pdcm,+xtpr,+tm2,+est,+vmx,+ds_cpl,+monitor,+dtes64,+pclmuldq,+pbe,+tm,+ht,+ss,+acpi,+ds,+vme -m 2048 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid eabbad55-89c5-4892-8be4-47e44ac3608b -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-pf1/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x5.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x5 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x5.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x5.0x2 -device ahci,id=sata0,bus=pci.0,addr=0x4 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6 -drive file=/dev/libvirt-ssd-1/pf1-root,format=raw,if=none,id=drive-virtio-disk0 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=25,id=hostnet0,vhost=on,vhostfd=27 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=XX:XX:XX:XX:XX:XX,bus=pci.0,addr=0x3 -netdev tap,fd=28,id=hostnet1,vhost=on,vhostfd=29 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=XX:XX:XX:XX:XX:XX,bus=pci.0,addr=0x9 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -spice port=5900,addr=127.0.0.1,disable-ticketing,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vgamem_mb=16,bus=pci.0,addr=0x2 -chardev spicevmc,id=charredir0,name=usbredir -device usb-redir,chardev=charredir0,id=redir0 -chardev spicevmc,id=charredir1,name=usbredir -device usb-redir,chardev=charredir1,id=redir1 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -msg timestamp=on
As far as I can tell we ran into problems with Hardware TCP Segmentation Offloading (TSO) enabled, not on the LAN side but on the WAN side (VDSL).
pfSense is installed as a VM on an ESXi host. LAN and WAN both run on an Intel i350-T4 card.
With TSO enabled I could no longer transfer "larger" files, e.g. we could no longer send emails with an attachment via Outlook.
A follow-up to my original post. I did some more testing, and while offloading does seem to significantly increase throughput from a host to pfSense itself, it can also have the opposite effect on traffic that merely traverses pfSense, being filtered or forwarded elsewhere. This aligns with previously reported issues with offloading + virtio.
Here are some benchmarks for various configurations using pfSense 2.3.2-RELEASE-p1 on a Lanner FW-7551 w/ Intel Atom C2358 board and the following network configuration:
[Host A] –- [ (Network A) | pf1 | (Network B) ] –- [Host B]
chk tso lro | A -> B       | B -> A       | A -> pf1     | pf1 -> A    | B -> pf1    | pf1 -> B    |
------------|--------------|--------------|--------------|-------------|-------------|-------------|
 Y   Y   Y  | 5.18 +- 3.26 | 4.91 +- 1.92 | 831 +- 71.4  | 242 +- 11.1 | 808 +- 86.9 | 262 +- 13.6 |
 N   Y   Y  | 322 +- 13.1  | 290 +- 11.4  | 165 +- 7.51  | 248 +- 17.3 | 173 +- 12.6 | 260 +- 9.41 |
 N   Y   N  | 307 +- 17.6  | 281 +- 9.07  | 167 +- 9.71  | 232 +- 5.58 | 153 +- 9.42 | 261 +- 11.1 |
 N   N   Y  | 287 +- 16.2  | 275 +- 10.4  | 160 +- 7.08  | 172 +- 3.85 | 157 +- 8.67 | 165 +- 2.67 |
 N   N   N  | 305 +- 19.9  | 287 +- 9.30  | 168 +- 6.10  | 174 +- 4.75 | 157 +- 5.65 | 173 +- 4.30 |
All measurements are in Mbps with 95% confidence intervals (assuming a normal distribution). In each case, the three pfSense offloading checkbox settings (chk = checksum offloading, tso = TCP segmentation offloading, lro = large receive offloading) are shown on the left: Y = unchecked (feature is enabled), N = checked (feature is disabled). In each test, the Linux hypervisor offload settings for the underlying physical network cards were set to the default values (which generally means all supported hypervisor offloading was enabled).
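For anyone wanting to report their own numbers in the same "mean +- interval" form, here is a minimal Python sketch of one common way to compute a 95% confidence interval under a normal assumption. The post does not say how many runs were made or exactly how the intervals were derived, so the formula (1.96 times the standard error) and the sample values below are assumptions, not the method used for the table.

```python
import math
import statistics

def mean_ci95(samples):
    """Return (mean, half_width) of a 95% CI, assuming normally distributed runs."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to estimate spread")
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)          # sample standard deviation (n - 1)
    half_width = 1.96 * stdev / math.sqrt(n)   # 1.96 = z-value for 95% coverage
    return mean, half_width

if __name__ == "__main__":
    # Hypothetical iperf results in Mbps, not taken from the tests above.
    runs = [300, 310, 320, 330, 340]
    mean, half = mean_ci95(runs)
    print("%.0f +- %.1f Mbps" % (mean, half))
```

With only a handful of runs, a Student's t multiplier would give a wider and more honest interval than 1.96.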
As you can see, when all pfSense offloading is enabled, host -> pfSense speeds increase by ~5x. Unfortunately, pfSense routing/filtering speeds collapse at the same time, dropping by roughly 60x (from about 300 Mbps to about 5 Mbps). Thus, in line with previous reports, it seems disabling at least checksum offloading is required to get decent cross-pfSense throughput when using virtio drivers. Neither LRO nor TSO seems to have a significant effect on cross-pfSense throughput. Enabling TSO does increase pfSense -> host throughput (the pf1 -> A and pf1 -> B columns) by ~50%.
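The ratios can be recomputed directly from the mean values in the table. The short Python sketch below does that for three of the rows; the dictionary layout and the `ratio` helper are just illustrative names, and only the table's mean values are used (the confidence intervals are ignored).

```python
# Mean throughput in Mbps, copied from the table (rows keyed by chk/tso/lro).
RESULTS = {
    "YYY": {"A->B": 5.18, "A->pf1": 831, "pf1->A": 242},
    "NYY": {"A->B": 322, "A->pf1": 165, "pf1->A": 248},
    "NNN": {"A->B": 305, "A->pf1": 168, "pf1->A": 174},
}

def ratio(numerator_row, denominator_row, column):
    """Throughput of one configuration relative to another for a given path."""
    return RESULTS[numerator_row][column] / RESULTS[denominator_row][column]

if __name__ == "__main__":
    # Host -> pfSense gain from enabling checksum offload (TSO and LRO on in both rows).
    print("A->pf1, chk on vs off:  %.1fx" % ratio("YYY", "NYY", "A->pf1"))
    # Cross-pfSense slowdown caused by enabling checksum offload.
    print("A->B,   chk off vs on:  %.0fx" % ratio("NYY", "YYY", "A->B"))
    # Effect of TSO and LRO together on pfSense -> host, checksum offload off.
    print("pf1->A, tso+lro vs none: %.2fx" % ratio("NYY", "NNN", "pf1->A"))
```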
This does make me think that if FreeBSD ever adds proper support for virtio checksumming (i.e. by ignoring unchecksummed values in pf), it may lead to significant performance gains. But for now, disabling checksum offloading seems necessary, even though it significantly diminishes the performance of host -> pfSense traffic.
Thank you for posting this, it's extremely helpful. I'm hopeful that changes in 2.4 will benefit virtio performance. I'm not in a position where I can just pass through a NIC dedicated to pfSense, and so am at the mercy of virtio.