Bundled CPU performance

mrgenie

I'm running a Layer 2 (TAP) oVPN across the continent, spanning 172.22.56.0 to 172.22.70.0 as one very large network.

Currently deployed are WRT3200ACM and WRT1900ACS routers as well as Intel Xeon and Intel i5 hardware.

The oVPN speed running TAP with AES-256 encryption ranges from 80 Mbps on the routers to 160 Mbps between the Intel systems.

The reason oVPN maxes out at 160 Mbps is simple: "using only 1 core".

So although the CPUs might be able to do 500-550 Mbps, about 90% of their capacity sits idle because only one thread can handle oVPN, and throughput maxes out at roughly 160 Mbps.
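The "90% idle" figure follows from simple arithmetic: one saturated thread on a CPU with n hardware threads shows up as only 1/n of total CPU utilization. A minimal sketch; the 10-thread CPU is a hypothetical example, not a figure from this post, and "headroom" naively assumes the 500-550 Mbps estimate would be reachable if the work could be spread across cores.

```python
def busy_fraction(hardware_threads: int) -> float:
    """Overall CPU utilization when exactly one hardware thread is saturated and the rest idle."""
    if hardware_threads < 1:
        raise ValueError("need at least one hardware thread")
    return 1 / hardware_threads


def unused_headroom_mbps(measured_mbps: float, cpu_capable_mbps: float) -> float:
    """Throughput left unused, assuming (naively) the CPU's estimated capacity
    could be reached if the tunnel work were spread across all cores."""
    return max(cpu_capable_mbps - measured_mbps, 0.0)


if __name__ == "__main__":
    threads = 10  # hypothetical CPU with 10 hardware threads
    busy = busy_fraction(threads)
    print(f"one busy thread = {busy:.0%} utilization, {1 - busy:.0%} idle")
    print(f"unused headroom: {unused_headroom_mbps(160, 550):.0f} Mbps")
```

On a 4-thread i5 the same single busy thread would show as 25% utilization, so the idle share depends on how many hardware threads the box has.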

Now I've read that pfSense's implementation "bundles" the power of a multi-core/multi-thread environment.

How should I look at that? Since it's mentioned specifically, does that mean the OS presents those multiple cores to its software as if they were one unit? And thus oVPN can use them all for higher performance?

Or does it simply mean that pfSense previously took advantage of only one core and can now use all cores, while single-threaded software like oVPN will still be limited to one core/thread at a time?

So how does pfSense with oVPN perform against Server 2012 or Ubuntu with oVPN?

doktornotor

OpenVPN 2.x is single threaded.
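In other words, the OS does not merge cores into one big core; a program only uses several cores if it splits its own work across threads or processes. The Python analogy below is a sketch, not how OpenVPN or pfSense are implemented: `process(packets, 1)` is a single-threaded loop (like one OpenVPN 2.x instance), while a thread pool hands independent packets to several cores. It relies on hashlib releasing the GIL for inputs larger than 2047 bytes; the 64 KiB "packet" size is an assumption chosen to make that happen, not a VPN packet size.

```python
import hashlib
import os
import time
from concurrent.futures import ThreadPoolExecutor


def make_packets(count: int, size: int = 64 * 1024) -> list[bytes]:
    """Random payloads standing in for tunnel packets (sizes are illustrative)."""
    return [os.urandom(size) for _ in range(count)]


def digest_packet(packet: bytes) -> bytes:
    # hashlib releases the GIL for data larger than 2047 bytes,
    # so several of these calls can run on different cores at once.
    return hashlib.sha256(packet).digest()


def process(packets: list[bytes], workers: int) -> tuple[list[bytes], float]:
    """Digest every packet; return the digests and the elapsed wall time."""
    start = time.perf_counter()
    if workers == 1:
        # Single-threaded: one core does all the work, the others stay idle.
        digests = [digest_packet(p) for p in packets]
    else:
        # Work is explicitly split across a pool; the OS schedules the threads on cores.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            digests = list(pool.map(digest_packet, packets))
    return digests, time.perf_counter() - start


if __name__ == "__main__":
    packets = make_packets(256)
    serial, t_serial = process(packets, 1)
    parallel, t_parallel = process(packets, os.cpu_count() or 1)
    assert serial == parallel  # same results, only the scheduling differs
    print(f"1 worker: {t_serial:.3f} s, {os.cpu_count()} workers: {t_parallel:.3f} s")
```

The speedup you see depends on the machine; the point is that it only appears because the program itself was written to split the work.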

Pippin

The reason oVPN maxes out at 160 Mbps is simple: "using only 1 core"

Take a look here because there is a little more nuance to that:
https://community.openvpn.net/openvpn/wiki/Gigabit_Networks_Linux

VAMike

On AES-NI hardware, OpenVPN is currently bottlenecked on the HMAC calculation (SHA-1, SHA-256 or whatever is chosen). Without changing the architecture, OpenVPN will see a massive speedup when AES-GCM is enabled, since GCM authenticates with its own GHASH instead of a separate HMAC pass. This is most clearly seen by changing the cipher in OpenVPN from AES-256 to AES-128: currently it shows almost no difference, even though the AES-NI hardware processes the latter significantly faster than the former. Flip the auth algorithm from SHA-1 to SHA-256 and you'll see a dramatic difference, because that's where the bottleneck is. I wouldn't be surprised to see average performance more than double with OpenVPN 2.4 and AES-GCM on AES-NI hardware.
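One way to see how much the choice of auth digest costs is to time HMAC alone over packet-sized buffers. A minimal sketch using only the Python standard library: it measures just the HMAC half (the stdlib has no AES), it includes Python's per-call overhead so absolute numbers will not match OpenVPN's, and the 1400-byte payload is an assumed packet size rather than an OpenVPN default. For the cipher side on the actual box, `openssl speed -evp aes-128-cbc` and `openssl speed -evp aes-256-cbc` give the comparable AES numbers.

```python
import hmac
import os
import time


def hmac_mb_per_s(digest_name: str, packet_size: int = 1400, packets: int = 20_000) -> float:
    """Time HMAC over `packets` copies of a random payload of `packet_size` bytes; return MB/s.

    The 1400-byte default is an assumed tunnel payload size, not an OpenVPN setting.
    """
    key = os.urandom(64)
    payload = os.urandom(packet_size)
    start = time.perf_counter()
    for _ in range(packets):
        hmac.digest(key, payload, digest_name)  # one-shot HMAC, Python 3.7+
    elapsed = time.perf_counter() - start
    return packet_size * packets / elapsed / 1e6


if __name__ == "__main__":
    for name in ("sha1", "sha256", "sha512"):
        print(f"HMAC-{name.upper():6s} {hmac_mb_per_s(name):8.1f} MB/s")
```

Comparing the relative rates across digests is more meaningful here than the absolute figures.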

Pippin

On my board, some time ago, I did some throughput testing in a routed client-server-client scenario, meaning OpenVPN/OpenSSL has twice the work of fragmenting/defragmenting, encrypting/decrypting, hashing… (not using the client-to-client directive in the server config).
cipher AES-256-CBC
auth SHA512
prng SHA512 32 (the default nonce secret length is 16)
Throughput was ~160 Mb/s.

Disabling crypto and hashing resulted in ~270 Mb/s.

With pure routing the board does ~950 Mb/s, which is effectively 1 Gb/s once overhead is counted.

Now you can roughly see what impact fragmenting/defragmenting, jumping to the kernel and back, and ??? have.
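The three measurements can be turned into a rough per-byte time budget by converting each throughput to nanoseconds per byte and taking differences. A sketch under a loud assumption: the stages run serially on one core and their costs simply add up, which fits a single-threaded OpenVPN but is a simplification, not a measurement. The figures are the end-to-end rates of the client-server-client test, so each byte already includes both passes through the server.

```python
def ns_per_byte(mbps: float) -> float:
    """Time spent per byte, in nanoseconds, at a throughput of `mbps` megabits/s."""
    return 8 * 1e9 / (mbps * 1e6)  # 8 bits per byte


# Throughput figures from the test above, in Mb/s.
FULL_TUNNEL = 160.0    # AES-256-CBC + SHA512 HMAC
NO_CRYPTO = 270.0      # same tunnel with crypto and hashing disabled
PLAIN_ROUTING = 950.0  # no tunnel at all


def cost_breakdown() -> dict[str, float]:
    """Split the per-byte cost into stages, assuming serial, additive costs."""
    full = ns_per_byte(FULL_TUNNEL)
    no_crypto = ns_per_byte(NO_CRYPTO)
    routing = ns_per_byte(PLAIN_ROUTING)
    return {
        "crypto_hmac": full - no_crypto,         # what encryption + hashing add
        "tunnel_overhead": no_crypto - routing,  # fragmenting, kernel/userspace hops, ...
        "routing": routing,                      # forwarding alone
    }


if __name__ == "__main__":
    for stage, ns in cost_breakdown().items():
        print(f"{stage:16s} {ns:5.1f} ns/byte")
```

With these numbers the split comes out to roughly 20 ns/byte for crypto and HMAC, 21 ns/byte for the tunnel's own overhead and 8 ns/byte for routing, so the non-crypto part of the tunnel costs about as much as the crypto itself.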

with OpenVPN 2.4 and AES-GCM on AES-NI hardware

Even without AES-NI-capable hardware it will improve, I would think.

VAMike

@Pippin:

with OpenVPN 2.4 and AES-GCM on AES-NI hardware

Even without AES-NI-capable hardware it will improve, I would think.

It'll improve, but the difference won't be as dramatic as for the AES-NI hardware (because you're not replacing a software MAC with a hardware-assisted MAC; you're replacing one software MAC with a somewhat more efficient software MAC). And really, I'm using AES-NI as a more familiar shortcut here: the real differentiator is the PCLMULQDQ (carry-less multiply) instruction, which is only on CPUs with AES-NI. But there are AES-NI CPUs (like the Avotons/Rangeleys) which lack PCLMULQDQ and aren't as efficient for AES-GCM on an instructions-per-byte basis.
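PCLMULQDQ computes a carry-less multiplication: long multiplication in which partial products are combined with XOR instead of ordinary addition. GCM's GHASH authenticator is built on this operation (multiplication in GF(2^128) modulo x^128 + x^7 + x^2 + x + 1), which is why the instruction matters so much for AES-GCM. The sketch below shows the arithmetic in plain Python; it uses ordinary bit order, whereas the GCM specification uses a bit-reflected convention, so it illustrates the field multiplication rather than being a drop-in GHASH.

```python
# GCM's field polynomial x^128 + x^7 + x^2 + x + 1, in ordinary (non-reflected) bit order.
GCM_POLY = (1 << 128) | 0x87


def clmul(a: int, b: int) -> int:
    """Carry-less multiply: shift-and-XOR, the operation PCLMULQDQ does in hardware."""
    result = 0
    while b:
        if b & 1:
            result ^= a  # XOR instead of +: no carries propagate between bits
        a <<= 1
        b >>= 1
    return result


def reduce_mod(value: int, poly: int = GCM_POLY) -> int:
    """Reduce a polynomial over GF(2) modulo `poly` by repeated shifted XORs."""
    degree = poly.bit_length() - 1
    while value.bit_length() - 1 >= degree:
        value ^= poly << (value.bit_length() - 1 - degree)
    return value


def gf128_mul(a: int, b: int) -> int:
    """Multiply two field elements: carry-less product, then reduction."""
    return reduce_mod(clmul(a, b))


if __name__ == "__main__":
    print(clmul(0b11, 0b11), 3 * 3)      # carry-less vs. ordinary product
    print(hex(gf128_mul(1 << 127, 2)))   # x^127 * x wraps around to x^7 + x^2 + x + 1
```

Without PCLMULQDQ, a CPU has to emulate `clmul` with table lookups or shift-and-XOR loops like the one above, which is where the extra instructions per byte come from.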