Gigabit VPN Router
-
Info:
https://community.openvpn.net/openvpn/wiki/RoadMap#OpenVPN3.0

Nice! Looks like my speculation wasn't too far off. I guess anyone with more questions about OpenVPN speed should just be directed there, as it explains both the current limitations and the solutions, answering their questions just fine.
-
OpenVPN will always be slower than IPSec can be, because each packet sent requires extra instructions to switch context between kernel and user mode. In its current form, at least.
Steve
I know OpenVPN is slower. What I am trying to figure out is which hardware limitation OpenVPN is running into. Somewhere in the hardware, something is being maxed out, and that is why it is not going any faster.
This might help you understand the limitations of OpenVPN, it certainly helped me :)
https://community.openvpn.net/openvpn/wiki/Gigabit_Networks_Linux
Summary:
1. The first bottleneck is that the OpenSSL encryption/decryption routines perform better with larger packet sizes. Larger packets also reduce context switching between user space and kernel space: more data is carried per call, so less switching overhead is paid per byte.
2. The second is AES-NI acceleration on the CPU.
3. Encryption itself. Without encryption, they managed to hit almost gigabit speeds with jumbo frames in the TUN.

With the above settings I am hitting about 300 Mbps from my Digital Ocean web server to my gigabit connection at home. CPU utilisation on the Digital Ocean Ubuntu box is about 90% on the OpenVPN process, so it could be the virtual CPU limiting me, or the network stack/virtualisation drivers they are using. On my personal devices I use IPSec, where I get a comfortable 400-500 Mbps throughput.
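The packet-size effect in point 1 is easy to demonstrate outside OpenVPN: per-call overhead dominates when data is processed in small chunks. A minimal sketch, using SHA-256 from Python's hashlib as a stand-in for OpenSSL's cipher routines (the stdlib has no AES, but the amortisation effect is the same):

```python
import hashlib
import time

def chunked_hash_time(chunk_size: int, total: int = 8 * 1024 * 1024) -> float:
    """Hash `total` bytes in `chunk_size` pieces; return seconds taken."""
    chunk = b"\x00" * chunk_size
    calls = total // chunk_size
    start = time.perf_counter()
    for _ in range(calls):
        hashlib.sha256(chunk).digest()  # one call per "packet"
    return time.perf_counter() - start

# Same total bytes either way; small "packets" pay the per-call
# setup cost thousands of times more often.
t_small = chunked_hash_time(64)         # 64 B chunks, like tiny VPN packets
t_large = chunked_hash_time(64 * 1024)  # 64 KiB chunks, like jumbo-frame batches
print(f"64 B chunks:   {t_small:.3f} s")
print(f"64 KiB chunks: {t_large:.3f} s")
```

The small-chunk run is typically several times slower even though the same number of bytes is processed, which is the same reason OpenVPN benefits from larger packets on the tunnel.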
You are simply not listening. http://www.linfo.org/context_switch.html
Context and Mode switching.
Info:
https://community.openvpn.net/openvpn/wiki/RoadMap#OpenVPN3.0

Thank you. Those links were really useful.
From what this seems to be saying, the bottleneck is how fast the CPU swaps processes. My understanding is that the hardware bottleneck is how long it takes to move data between the L1 cache and RAM. This leads me to believe that higher-GHz RAM will mean a faster VPN router, since it is the RAM access speed that is slowing it down. Is this understanding correct? When OpenVPN context-switches, is it dumping its L1 cache and CPU state into RAM, or just into the L2/L3 cache? Is RAM access the bottleneck making context switches take so long?
-
It's not that; the speed (bandwidth) isn't the issue. It's response time, or latency, or 'expensive' operations (i.e. wasting many CPU cycles getting from one task to another). It's probably more comparable to the SSD vs. HDD thing, where an SSD isn't necessarily faster in terms of bandwidth but is always faster in terms of access time, which is what users experience.
There is no magical fix here, mostly for architectural and x86 reasons, neither of which can be changed anytime soon. OpenVPN 3 might help with some OpenVPN architectural changes, so that is your best bet. Putting 'better' hardware in a box does only a little for VPN speeds. Spending 10x more money might get you only 5% more speed, and it gets worse as you go higher.
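The cost of bouncing between tasks can be measured directly. A rough sketch (POSIX-only, assuming os.fork is available): two processes bounce one byte back and forth over a pair of pipes, so each round trip forces at least two context switches plus the syscall overhead:

```python
import os
import time

def pingpong_roundtrip(rounds: int = 10_000) -> float:
    """Average seconds per parent<->child round trip over two pipes."""
    p2c_r, p2c_w = os.pipe()  # parent -> child
    c2p_r, c2p_w = os.pipe()  # child -> parent
    pid = os.fork()
    if pid == 0:  # child: echo every byte straight back
        for _ in range(rounds):
            os.write(c2p_w, os.read(p2c_r, 1))
        os._exit(0)
    start = time.perf_counter()
    for _ in range(rounds):
        os.write(p2c_w, b"x")  # wakes the child, puts parent to sleep
        os.read(c2p_r, 1)      # wakes the parent again
    elapsed = time.perf_counter() - start
    os.waitpid(pid, 0)
    return elapsed / rounds

print(f"~{pingpong_roundtrip() * 1e6:.1f} microseconds per round trip")
```

Each packet crossing the kernel/user boundary pays a cost of this order regardless of how fast the RAM is, which is why fewer, larger packets help more than faster memory.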
-
@johnkeates:
It's not that; the speed (bandwidth) isn't the issue. It's response time, or latency, or 'expensive' operations (i.e. wasting many CPU cycles getting from one task to another). It's probably more comparable to the SSD vs. HDD thing, where an SSD isn't necessarily faster in terms of bandwidth but is always faster in terms of access time, which is what users experience.
There is no magical fix here, mostly for architectural and x86 reasons, neither of which can be changed anytime soon. OpenVPN 3 might help with some OpenVPN architectural changes, so that is your best bet. Putting 'better' hardware in a box does only a little for VPN speeds. Spending 10x more money might get you only 5% more speed, and it gets worse as you go higher.
Should I be looking at cache response time (if context switches happen only in cache), or should I look into RAM response time (if there is RAM interaction in a context switch)?
-
@johnkeates:
It's not that; the speed (bandwidth) isn't the issue. It's response time, or latency, or 'expensive' operations (i.e. wasting many CPU cycles getting from one task to another). It's probably more comparable to the SSD vs. HDD thing, where an SSD isn't necessarily faster in terms of bandwidth but is always faster in terms of access time, which is what users experience.
There is no magical fix here, mostly for architectural and x86 reasons, neither of which can be changed anytime soon. OpenVPN 3 might help with some OpenVPN architectural changes, so that is your best bet. Putting 'better' hardware in a box does only a little for VPN speeds. Spending 10x more money might get you only 5% more speed, and it gets worse as you go higher.
Should I be looking at cache response time (if context switches happen only in cache), or should I look into RAM response time (if there is RAM interaction in a context switch)?
There is no single component that does it all, and there is also no guarantee on CPU behaviour with specific programs. That's the whole issue here: it's not just some specific action on a specific port on a specific device that makes OpenVPN either slow or fast. It's everything. Unless you are into low-level system architecture and design, there is very little you can do to fix it either in code or in hardware in this case.
If you really really really really want more information, just hook into an OpenVPN process with a debugger/tracer and start recording call performance etc. Not sure which OS you'll be doing it on, but check tools like strace, dtrace, gdb, valgrind, etc.
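Before reaching for a full tracer, a cheaper first look (Linux-only; the field names below come from /proc/&lt;pid&gt;/status, nothing OpenVPN-specific) is the kernel's per-process context-switch counters. Sampled twice a second apart against a busy OpenVPN PID, they give a feel for how often the process is bouncing in and out of the kernel:

```python
import os

def ctxt_switches(pid: int) -> dict:
    """Read voluntary/nonvoluntary context-switch counts from /proc (Linux)."""
    counts = {}
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith(("voluntary_ctxt_switches",
                                "nonvoluntary_ctxt_switches")):
                key, value = line.split(":")
                counts[key] = int(value)
    return counts

# Example: inspect this process itself; substitute the OpenVPN PID instead.
print(ctxt_switches(os.getpid()))
```

A rapidly climbing voluntary count on the OpenVPN process is consistent with the per-packet kernel/user switching described above; strace -c on the same PID would then show which syscalls the time goes into.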
-
I agree it reads like you're looking for an answer that doesn't exist here.
If you want the highest OpenVPN speeds you can have, get the CPU with the fastest single-thread performance you can afford. Though, as said above, spending twice as much will probably not result in twice the throughput.
Steve