Gigabit VPN Router
-
Why isn't AES-NI hardware crypto acceleration mentioned here? My understanding is that both i3 and i5 support this instruction set.
Because OpenVPN is bottlenecked long before you hit the theoretical limits for AES-NI; you'll go faster with it than without it, but it's not enough to let you hit gigabit VPN rates. It's also a fair bet that anyone trying to do gigabit VPN is already using a processor with AES-NI (which is basically any higher-end processor from the past 5 years).
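A rough way to see why faster crypto alone doesn't get you to gigabit: treat each packet as costing a fixed overhead (context switches, tun handling) plus a crypto cost that scales with its size. The sketch below is a toy model only. The 20 µs overhead and the two cipher rates are made-up numbers chosen for illustration, not measurements of OpenVPN or of any CPU.

```python
def throughput_mbps(packet_bytes, fixed_overhead_us, crypto_mb_per_s):
    """Toy model: time per packet = fixed overhead + crypto time for the payload."""
    # bytes / (MB/s) gives microseconds when 1 MB is taken as 1e6 bytes
    crypto_us = packet_bytes / crypto_mb_per_s
    per_packet_us = fixed_overhead_us + crypto_us
    # bits per microsecond is the same thing as Mbit/s
    return packet_bytes * 8 / per_packet_us


# Hypothetical numbers, for illustration only
PACKET_BYTES = 1500        # typical Ethernet-sized packet
FIXED_OVERHEAD_US = 20.0   # assumed per-packet cost that crypto speed cannot reduce

slow = throughput_mbps(PACKET_BYTES, FIXED_OVERHEAD_US, 100.0)    # assumed software AES rate
fast = throughput_mbps(PACKET_BYTES, FIXED_OVERHEAD_US, 1000.0)   # assumed 10x faster cipher

print(f"slow cipher: {slow:.0f} Mbit/s")
print(f"fast cipher: {fast:.0f} Mbit/s")
print(f"speedup from 10x faster crypto: {fast / slow:.2f}x")
```

With these assumed inputs, a 10x faster cipher gives well under a 2x gain and still lands below gigabit, because the fixed per-packet cost dominates.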
-
I am trying to understand what limit is being hit that prevents gigabit VPN.
-
It's algorithms that need to be executed faster than a CPU can.
-
In general, it is the context switching between user and kernel/system mode and the tun driver that gets in the way of faster OpenVPN performance.
-
Also, it wasn't until recently that IPSec could be done really fast on commodity hardware, it used to need crypto accelerators. OpenVPN has different issues, but will need a comparable order of magnitude improvement before you can use it for high speeds.
-
@johnkeates:
It's algorithms that need to be executed faster than a CPU can.
CPU is holding it back? Didn't someone say it was something else besides the CPU?
In general, it is the context switching between user and kernel/system mode and the tun driver that gets in the way of faster OpenVPN performance.
In a hardware context, is it having to swap out the data in its cache when it switches to kernel/system mode? Does it not have enough cache to do both, so that swapping out the cache is not happening fast enough due to a bus speed limitation?
Basically, I am trying to understand it to the point where I can look at a piece of hardware and understand how fast it can go.
-
OpenVPN will always be slower than IPsec can be, because each packet that is sent requires more instructions to be processed due to switching context between kernel and user mode. In its current form at least.
Steve
-
I know OpenVPN is slower. What I am trying to figure out is what hardware limitation is OpenVPN running into. Somewhere on the hardware, something is being maxed out and that is why it is not going any faster.
-
This might help you understand the limitations of OpenVPN, it certainly helped me :)
https://community.openvpn.net/openvpn/wiki/Gigabit_Networks_Linux
Summary:
1. First bottleneck: the OpenSSL encryption/decryption routines perform better with larger packet sizes. Larger packets also reduce the context switching between user space and kernel space, since more data is fed in per packet and fewer switches are needed.
2. Second is AES-NI acceleration on the CPU.
3. Encryption itself. Without encryption they managed to hit almost gigabit speeds with jumbo frames in the TUN
With the above settings I am hitting about 300 Mbps from my Digital Ocean web server to my gigabit connection at home. CPU utilisation on the Digital Ocean Ubuntu box is about 90% on the OpenVPN process, so it could be the virtual CPU limiting me, or the network stack/virtualisation drivers they are using. On my personal devices I use IPsec, where I get a comfortable 400-500 Mbps throughput.
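To put numbers on the packet-size point, here is a small back-of-the-envelope sketch of how many packets per second a saturated gigabit link implies, and how much time that leaves per packet. It assumes the whole packet counts as payload and ignores Ethernet, IP and VPN header overhead, so treat the results as approximate.

```python
def packets_per_second(link_bits_per_s, packet_bytes):
    """Packets needed each second to fill the link (headers ignored)."""
    return link_bits_per_s / (packet_bytes * 8)


def per_packet_budget_us(link_bits_per_s, packet_bytes):
    """Time available to handle one packet, in microseconds."""
    return 1e6 / packets_per_second(link_bits_per_s, packet_bytes)


GIGABIT = 1_000_000_000

for mtu in (1500, 9000):  # standard frame vs. jumbo frame
    pps = packets_per_second(GIGABIT, mtu)
    budget = per_packet_budget_us(GIGABIT, mtu)
    print(f"MTU {mtu}: {pps:,.0f} packets/s, {budget:.0f} us per packet")
```

At 1500 bytes, everything (tun read, context switches, encryption, socket write) has to fit in about 12 µs per packet; jumbo frames stretch that to about 72 µs, which is why larger packets help.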
-
You are simply not listening. http://www.linfo.org/context_switch.html
Context and Mode switching.
-
Basically, unless OpenVPN could be implemented like IPsec or implemented like for example OpenVSwitch which does initial matching, flow creation and setup etc. in user space, and then all packets/frames after that can be handled in the kernel, it will not get 'faster'.
-
Why isn't AES-NI hardware crypto acceleration mentioned here? My understanding is that both i3 and i5 support this instruction set.
Because IPsec is what gains the most from using that instruction set.
I wouldn't expect gigabit speeds over VPN out of any commodity hardware. And no, I'm not aware of any situation where Optane would improve pfSense performance. I could be wrong, though. The technology is very new.
That said, your hardware list does look like it would provide very good performance.
A VPN is a structure with two ends! Most users forget this, thinking they have powerful hardware in play "on their side", but the other side of the VPN must also be strong enough to deliver the desired speed.
I am trying to understand what is this limit being hit that prevents gigabit vpn.
Gigabit VPN is no big secret and is certainly reachable, but not with OpenVPN, not with the hardware we discuss here, and not with anything based on the tun/tap design or the current OpenVPN code.
Also, it wasn't until recently that IPSec could be done really fast on commodity hardware, it used to need crypto accelerators. OpenVPN has different issues, but will need a comparable order of magnitude improvement before you can use it for high speeds.
@gonzopancho has a 1 Gbit/s symmetric Internet connection and uses the mid-range SG-4860 appliance from pfSense. With AES-NI and IPsec he gets around ~470 Mbit/s over IPsec, and counting the VPN and TCP/IP overhead on top of this it is nearly ~500 Mbit/s. Once again, this is a small 4-core Intel Atom CPU! Reddit: SG-4860 vs SG-8860
I know OpenVPN is slower. What I am trying to figure out is what hardware limitation is OpenVPN running into. Somewhere on the hardware, something is being maxed out and that is why it is not going any faster.
Perhaps if OpenVPN were rewritten to use multiple CPU cores it would scale better, or, depending on the tun/tap interface, other mechanisms might turn out to be a better fit.
-
Exactly. Also, making OpenVPN multicore won't immediately get you an Ncores performance increase, as ctx switches will still happen, but now Ncores times more, as well as possible IPC. On top of that, imagine having to do: packet-ctx(to kernel)-packet-ctx(to user mode)-ipc-ctx(back to kernel)-packet before a packet in multicore mode can be processed if multiple threads or processes need to swap out information on certain packets. The horror.
I think a split user-kernel design would help a lot more, but I have no idea how that would be implemented since key material should probably not be stored in two places in memory, and unless you can do session setup and control in user space, and raw packet processing in the other, it would probably require such a big redesign that it won't be compatible with existing versions (i.e. needing a protocol change to allow separate control and data flows). At the same time, the fact that there is a daemon mode with control interface and a client for that means that they might have been thinking about that. Oh well, I should really dig in more before talking about all of this.
-
Info:
https://community.openvpn.net/openvpn/wiki/RoadMap#OpenVPN3.0
-
Info:
https://community.openvpn.net/openvpn/wiki/RoadMap#OpenVPN3.0
Nice! Looks like my speculation wasn't too far off. I guess anyone with more questions about OpenVPN speed should just be directed there, as it explains both the current limitations and the solutions, answering their questions just fine.
-
Thank you. Those links were really useful.
From what this seems to be saying, the bottleneck is how fast the CPU swaps processes. My understanding is that the hardware bottleneck is how long it takes to send and receive data between the L1 cache and the RAM. This leads me to believe that higher-GHz RAM will mean a faster VPN router, since it is the RAM access speed that is slowing it down. Is this understanding correct? When OpenVPN context switches, is it dumping its L1 cache and CPU state into the RAM, or is it just being dumped into L2/L3 cache? Is RAM access the bottleneck that makes context switches take so long?
-
It's not that. The speed (bandwidth) isn't the issue; it's the response time, or latency, of 'expensive' operations (i.e. many CPU cycles are wasted getting from one task to another). It's probably more comparable to the SSD vs. HDD thing, where an SSD isn't necessarily faster (in terms of bandwidth) but is always faster in terms of access time, which is what users experience.
There is no magical fix here, mostly because of architectural and x86 reasons, neither of which can be changed anytime soon. OpenVPN 3 might help with some OpenVPN architectural changes, so that is your best bet. Putting 'better' hardware in a box only does a little for VPN speeds. Spending 10x more money might get you only 5% more speed, and it gets worse as you go higher.
-
Should I be looking at cache response time (if context switches only happen in cache) or should I look into RAM response time( if there is RAM interaction in this context switch)?
-
There is no single component that does it all, and there also is no guarantee on CPU behaviour with specific programs. That's the whole issue here: it's not just some specific action on a specific port on a specific device that makes OpenVPN either slow or fast. It's everything. Unless you are into low level system architecture and design, there is very little you can do to either fix it in code or in hardware in this case.
If you really really really really want more information, just hook into an OpenVPN process with a debugger/tracer and start recording call performance etc. Not sure what OS you'll be doing it on, but check things like strace, dtrace, gdb, valgrind etc.
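If you just want a feel for per-call overhead before reaching for a tracer, a small experiment along these lines might help. To be clear, this does not trace OpenVPN at all: it is a stand-in that moves the same number of bytes across the user/kernel boundary with unbuffered reads of different sizes. It assumes a Unix-like system with /dev/zero, and the chunk sizes are arbitrary choices for illustration.

```python
import os
import time


def read_in_chunks(total_bytes, chunk_size, path="/dev/zero"):
    """Read total_bytes from path with unbuffered os.read calls.

    Each os.read is one system call, i.e. one trip into the kernel and back.
    Returns (number_of_read_calls, elapsed_seconds).
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        remaining = total_bytes
        calls = 0
        start = time.perf_counter()
        while remaining > 0:
            data = os.read(fd, min(chunk_size, remaining))
            if not data:  # end of file; not expected for /dev/zero
                break
            remaining -= len(data)
            calls += 1
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return calls, elapsed


if __name__ == "__main__":
    total = 1500 * 20000  # same amount of data for every run
    for chunk in (1500, 9000, 65536):
        calls, secs = read_in_chunks(total, chunk)
        print(f"chunk {chunk:>6}: {calls:>6} calls, {secs * 1e3:.2f} ms")
```

The absolute timings will vary from machine to machine, but the same data generally moves in less time when it takes fewer calls, which is the effect being described in this thread.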
-
I agree it reads like you're looking for an answer that doesn't exist here.
If you want the highest OpenVPN speeds you can have, get the CPU with the fastest single-thread performance you can afford. Though, as said above, spending twice as much will probably not result in twice the throughput.
Steve