About virtualization and very high throughput



  • Hello

    I have a question about firewall virtualization. I need to build a virtual firewall with 2 nodes (I would probably like to use Citrix XenServer, the cheapest solution with support) and create 10 contexts (virtual firewalls). The total throughput on one node, for firewalling only (no VPN, IPS etc.), should be at least 7.5 Gb/s of real application traffic (15 Gb/s on the 2-node firewall with active-active connections). I would like to use an IBM 1U server with 2 physical latest-generation Xeon E5 processors (above 3 GHz, 8 cores each), 16 GB of RAM and 2 dual-port 10 Gb/s NIC cards (for example IBM server cards).
    Is it possible for such a machine to handle this throughput on a virtual platform?
    Has anyone used this approach between the internal network and the data-center network?
    Or do I need more than 2 nodes?

    Best regards,
    Marcin
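To put the requirement above in packet terms, here is a rough sketch that converts the 7.5 Gb/s per-node and 15 Gb/s two-node targets into frames per second. It assumes standard Ethernet framing, where every frame also occupies 20 extra bytes on the wire (8 bytes of preamble and start-of-frame delimiter, 12 bytes of inter-frame gap); the frame sizes are illustrative choices, not figures from the question.

```python
# Rough line-rate arithmetic for the firewall requirement above.
# Assumption: standard Ethernet, where each frame also costs 20 bytes on
# the wire (8 bytes preamble + SFD, 12 bytes inter-frame gap).
WIRE_OVERHEAD_BYTES = 20


def frames_per_second(rate_bps: float, frame_bytes: int) -> float:
    """Frames per second needed to carry rate_bps using frames of frame_bytes."""
    return rate_bps / ((frame_bytes + WIRE_OVERHEAD_BYTES) * 8)


PER_NODE_BPS = 7.5e9  # required throughput per node
CLUSTER_BPS = 15e9    # two nodes, active-active

for size in (64, 512, 1518):  # minimum, mid-size and maximum standard frames
    node = frames_per_second(PER_NODE_BPS, size)
    cluster = frames_per_second(CLUSTER_BPS, size)
    print(f"{size:>5} B frames: {node / 1e6:6.2f} Mpps per node, "
          f"{cluster / 1e6:6.2f} Mpps for the cluster")
```

The point of the exercise is that small packets, not raw bandwidth, usually decide whether a software firewall keeps up: the same 7.5 Gb/s is roughly 18 times more packets with 64-byte frames than with 1518-byte frames.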



  • When you need high throughput, it is never ideal to run virtualized. Depending on the supported network card drivers and their throughput, you might hit the limit of what they can process quite soon.
    Perhaps VM passthrough (VMDirectPath) is possible if the NICs are PCIe? (I've never tested whether there is ANY performance gain.)

    I'm pretty confident that this would run flawlessly on bare metal; personally, I have never tried more than 2 Gbit/s on a virtual machine (ESXi 5) with pfSense.



  • FreeBSD (on which pfSense is based) has rather limited virtualized performance, at least compared to Linux, on the popular VM platforms (Xen, KVM, VMware etc.).

    There is some development under way to improve FreeBSD in that regard, but it's uncertain when it will bear fruit.



  • @heper:

    I agree. I'm running virtualized on ESXi 5, but I only have ~250 Mbps of bandwidth combined across my three inbound connections. Locally on the LAN I have no issues running sustained gigabit speeds, and CPU use is minimal.

    You don't need 2x 8-core CPUs to run sustained ~10GbE unless you will have a lot of other VMs running on the same hardware. For reference, I'm running older Dell hardware with 2x Xeon X5482 CPUs (two quad-core 3.2 GHz), and I've only allocated 2 cores and 4.0 GB of RAM to my pfSense VM. The highest I've ever seen was ~58% CPU use and ~28% memory use, and that was while we were load-testing the disk I/O of the RAID array inside the ESXi host.

    If space is a concern, there are many small passive appliance-type boxes that would fit in the back of the rack without taking up a full 1U. Two appliances with an Intel Atom or i3 CPU would be plenty to handle a 10GbE connection if built and configured properly, and would give you the redundancy you need.
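As a rough sanity check on how many cores a 10GbE firewall needs, the sketch below estimates the CPU cycles available per packet. The inputs are assumptions for illustration: 2 cores at 3.2 GHz (the VM allocation described above), a fully loaded 10 Gb/s link, and the 64-byte and 1518-byte extremes of standard Ethernet frames. The actual cost of forwarding a packet depends on the NIC driver, the virtualization path and the ruleset, so this only frames the problem; it does not show that any particular box will sustain the rate.

```python
# Cycle budget per packet: how much CPU time each frame can consume if the
# given cores must keep up with a fully loaded link.
# Assumption: standard Ethernet framing with 20 bytes of per-frame wire
# overhead (preamble + SFD and inter-frame gap).
WIRE_OVERHEAD_BYTES = 20


def cycles_per_packet(cores: int, clock_hz: float, rate_bps: float,
                      frame_bytes: int) -> float:
    """CPU cycles available per frame if all cores forward at rate_bps."""
    pps = rate_bps / ((frame_bytes + WIRE_OVERHEAD_BYTES) * 8)
    return cores * clock_hz / pps


for size in (64, 1518):
    budget = cycles_per_packet(cores=2, clock_hz=3.2e9, rate_bps=10e9,
                               frame_bytes=size)
    print(f"{size:>5} B frames: ~{budget:,.0f} cycles per packet")
```

With large frames the budget is generous, but with minimum-size frames only a few hundred cycles remain per packet, which is why small-packet traffic is usually the case to test before trusting a given CPU with 10GbE.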



  • Use the PCI passthrough feature. It comes at a cost: higher power consumption, because FreeBSD NIC drivers appear to draw more power than their Linux counterparts. I am running Xen with a pfSense VM, and I found that CPU time went up when moving traffic through my LAN interface, which was the shared interface (the WAN interface already had a passthrough NIC). Because all traffic that came in on the LAN interface was inevitably destined for the WAN, I didn't hit a transfer cap, but I estimate I would have been capped somewhere between 50 and 100 MB/s. No good. So I installed a third NIC and passed that through as the LAN interface as well. Power consumption went up by 2 W, but CPU use no longer rises during network transfers.

    The reason is that (in Linux + Xen, anyway), when running a pure HVM virtual machine (required, since BSD with paravirtual drivers doesn't really work yet), qemu-dm is used to emulate the device. This process uses a lot of CPU time (read: it's crap) and is a major cap on network and disk I/O. Disk I/O still suffers the same limitation, but a firewall isn't expected to do enough disk I/O for this to be a serious concern, unless you have lots of logging (in which case use a remote log server, I guess?). A Linux virtual machine doesn't have this limitation because paravirtual drivers do work there: the guest reaches the I/O device indirectly through front-end drivers in the guest and back-end drivers in the host, without relying on device emulation such as qemu-dm.

    So basically, if you want a high-throughput firewall system, it's absolutely possible. You'll probably need a remote log server; your hardware must support VT-d (or the AMD equivalent IOMMU, AMD-Vi), and it's essential that both the motherboard and the CPU support it properly; and your hypervisor should support using the IOMMU (I imagine all paid hypervisors do by now; Xen, and by extension Citrix XenServer, most certainly does).
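The passthrough advice in the post above rests on an estimated 50 to 100 MB/s cap for an emulated interface, while the original requirement is stated in gigabits per second. This small conversion, assuming decimal units (1 MB = 10^6 bytes, 1 Gb = 10^9 bits), puts both in the same unit and compares them with the 7.5 Gb/s per-node target.

```python
# Convert the estimated emulated-NIC cap (MB/s) to Gb/s and compare it with
# the per-node target. Assumption: decimal units, 1 MB = 10**6 bytes.
TARGET_GBPS = 7.5  # per-node requirement from the original question


def mbytes_to_gbits(mb_per_s: float) -> float:
    """Convert megabytes per second to gigabits per second."""
    return mb_per_s * 8 / 1000


for cap in (50, 100):  # lower and upper ends of the estimated cap
    gbps = mbytes_to_gbits(cap)
    print(f"{cap} MB/s = {gbps:.1f} Gb/s ({gbps / TARGET_GBPS:.0%} of target)")
```

Even the upper estimate is roughly a tenth of the per-node target, which is consistent with treating passthrough as a requirement rather than an optimization at these rates.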
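For plain Xen with the `xl` toolstack, a passed-through NIC is listed in the guest's domain configuration with the `pci` option, after the device has been made assignable on the host (for example with `xl pci-assignable-add`). The sketch below only builds that config line from a list of PCI addresses and checks that each looks like a `DDDD:BB:DD.F` address. The addresses are hypothetical placeholders, the helper name is made up for illustration, and Citrix XenServer manages passthrough through its own `xe` tooling, which is not shown here.

```python
import re

# A PCI address in DDDD:BB:DD.F form (domain:bus:device.function).
# Device numbers run 00-1f and function numbers 0-7.
BDF_RE = re.compile(r"[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-1][0-9a-fA-F]\.[0-7]")


def xl_pci_line(bdfs: list[str]) -> str:
    """Build an xl domain-config `pci = [...]` line (hypothetical helper)."""
    for bdf in bdfs:
        if not BDF_RE.fullmatch(bdf):
            raise ValueError(f"not a DDDD:BB:DD.F PCI address: {bdf!r}")
    quoted = ", ".join(f"'{bdf}'" for bdf in bdfs)
    return f"pci = [ {quoted} ]"


# Hypothetical addresses for the WAN and LAN NICs; look up the real ones
# on the host (e.g. with lspci) before using anything like this.
print(xl_pci_line(["0000:03:00.0", "0000:04:00.0"]))
```

Validating the addresses up front is a design choice for the sketch: a typo in a device address is easier to catch before the guest is started than after.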

