IPSec VPN performance slow…
-
I have dozens of tunnels throughout the world, all using pfSense, and I love it! However, I never seem to be able to get the performance I want out of the VPN connections, no matter what I do. Please allow me to provide as much detail as possible to help troubleshoot this problem. I appreciate, in advance, anyone taking the time to respond and assist with this issue. ;D
I'll break this down as much as possible. Here goes…
::::::::::::::::::
:: HARDWARE ::
::::::::::::::::::
HQ Office: Core 2 Duo 2.0 GHz | 2 GB RAM | 4 x 1 Gbps Intel | 4 GB CF | VPN Accelerator (Hifn 7955) | pfSense 2.0.2 | SHDSL 16/16 Mbps Internet
Remote Office: AMD Geode 500 MHz | 256 MB RAM | 3 x 10/100 Mbps VIA | 4 GB CF | No accel card | pfSense 2.0.2 | Cable 50/10 Mbps Internet

:::::::::::::::::::::::::
:: CURRENT CONFIG ::
:::::::::::::::::::::::::
IPSec Site-to-Site
PH1:
Auth: Mutual PSK
Neg: main
Policy: Default
Proposal: Default
Enc: 3DES
Hash: MD5
DH: 2
PH2:
Proto: ESP
Enc: 3DES
Hash: MD5
PFS: Off

When I have the link configured this way, I get the following via iperf:
Client connecting to 192.168.4.4, TCP port 5001
TCP window size: 64.0 KByte (default)
[156] local 192.168.1.198 port 5599 connected with 192.168.4.4 port 5001
[ ID] Interval       Transfer     Bandwidth
[156]  0.0-10.3 sec  3.12 MBytes  2.55 Mbits/sec

UDP shows the following:
------------------------------------------------------------
Client connecting to 192.168.4.4, UDP port 5001
Sending 1470 byte datagrams
UDP buffer size: 64.0 KByte (default)
[156] local 192.168.1.198 port 51293 connected with 192.168.4.4 port 5001
[ ID] Interval       Transfer     Bandwidth
[156]  0.0-10.0 sec  1.25 MBytes  1.05 Mbits/sec
[156] Server Report:
[156]  0.0-10.0 sec  1.25 MBytes  1.05 Mbits/sec  8.630 ms    1/  893 (0.11%)
[156] Sent 893 datagrams

Obviously, I'm doing these tests when both connections are idle and nobody else is using the Internet, so they have the best chance of performing well.
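The iperf figures above can be cross-checked by hand. A minimal Python sketch, assuming iperf 2's conventions: "MBytes" in the Transfer column are binary (2^20 bytes), while "Mbits/sec" in the Bandwidth column are decimal (10^6 bits). One thing worth knowing: iperf 2 sends UDP at a default target of 1 Mbit/s unless `-b` is given, so the 1.05 Mbits/sec UDP result most likely reflects that default rather than the tunnel's capacity.

```python
# Cross-check iperf 2 output by hand.
# Assumption: Transfer "MBytes" = 2**20 bytes, Bandwidth "Mbits/sec" = 10**6 bit/s.
MIB = 2 ** 20


def mbit_per_s(transfer_mbytes: float, seconds: float) -> float:
    """Convert an iperf Transfer value (MBytes) over an interval to Mbit/s."""
    return transfer_mbytes * MIB * 8 / seconds / 1e6


def udp_mbit_per_s(datagrams: int, datagram_bytes: int, seconds: float) -> float:
    """Rate implied by a datagram count, e.g. 'Sent 893 datagrams' of 1470 bytes."""
    return datagrams * datagram_bytes * 8 / seconds / 1e6


if __name__ == "__main__":
    # TCP run: 3.12 MBytes in 10.3 s (iperf reported 2.55 Mbits/sec).
    print(f"TCP: {mbit_per_s(3.12, 10.3):.2f} Mbit/s")
    # UDP run: 893 datagrams of 1470 bytes in 10.0 s (iperf reported 1.05 Mbits/sec).
    print(f"UDP: {udp_mbit_per_s(893, 1470, 10.0):.2f} Mbit/s")
```

Running it prints `TCP: 2.54 Mbit/s` and `UDP: 1.05 Mbit/s`; the small TCP difference from iperf's 2.55 comes from the rounding of 3.12 and 10.3 in the printed output.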
Well, the first thing I did to troubleshoot the issue was to figure out which algorithms my processors could crunch the fastest. So I ran openssl speed on each device and got the following results:
HQ Office:
The 'numbers' are in 1000s of bytes per second processed.
type                 16 bytes    64 bytes   256 bytes  1024 bytes  8192 bytes
md2                  1659.82k    3480.80k    4840.05k    5335.25k    5504.70k
mdc2                 5319.60k    6059.71k    6271.54k    6313.53k    6330.05k
md4                 17240.30k   61751.73k  182953.16k  359497.68k  499266.77k
md5                 14282.69k   49024.11k  134284.62k  237846.57k  307724.72k
hmac(md5)           17198.61k   57369.41k  149283.13k  249123.43k  309769.22k
sha1                12477.83k   36818.88k   81640.94k  117601.57k  134872.33k
rmd160              11511.13k   33534.44k   72386.00k  101824.57k  115529.89k
rc4                215933.90k  268120.96k  283980.69k  291399.65k  290060.12k
des cbc             44939.77k   46998.29k   47431.21k   47599.64k   47656.99k
des ede3            17001.83k   17173.74k   17233.57k   17243.63k   17241.12k
idea cbc                 0.00        0.00        0.00        0.00        0.00
seed cbc                 0.00        0.00        0.00        0.00        0.00
rc2 cbc             19782.37k   20331.20k   20485.82k   20582.68k   20465.19k
rc5-32/12 cbc      136187.42k  152410.81k  157617.24k  158551.36k  159041.45k
blowfish cbc        72896.90k   76868.12k   77971.14k   78347.78k   78362.01k
cast cbc            70242.55k   74448.71k   75896.64k   75750.25k   75835.41k
aes-128 cbc         49219.02k   50609.53k   51101.22k   51247.47k   51068.72k
aes-192 cbc         42385.80k   43573.45k   43955.34k   43921.37k   43831.72k
aes-256 cbc         37324.43k   38069.19k   38452.41k   38451.08k   38332.24k
camellia-128 cbc    48012.21k   49951.89k   50456.26k   50615.27k   50535.68k
camellia-192 cbc    37194.79k   38297.02k   38630.58k   38670.14k   38673.31k
camellia-256 cbc    37231.60k   38331.17k   38499.48k   38640.83k   38632.70k
sha256               8397.97k   20822.54k   37781.77k   48714.03k   52936.68k
sha512               3449.34k   13739.54k   20885.03k   29089.56k   32931.82k
aes-128 ige         49748.12k   52399.09k   53206.97k   53414.35k   53404.16k
aes-192 ige         42801.25k   44821.20k   45391.82k   45519.34k   45488.85k
aes-256 ige         37698.81k   39056.68k   39505.62k   39616.58k   39566.29k
                    sign    verify   sign/s verify/s
rsa 512 bits   0.000682s 0.000075s   1465.2  13392.6
rsa 1024 bits  0.003240s 0.000180s    308.6   5561.9
rsa 2048 bits  0.018860s 0.000561s     53.0   1782.7
rsa 4096 bits  0.125113s 0.001934s      8.0    517.1
                    sign    verify   sign/s verify/s
dsa 512 bits   0.000550s 0.000627s   1819.4   1595.0
dsa 1024 bits  0.001577s 0.001824s    634.2    548.3
dsa 2048 bits  0.005055s 0.006057s    197.8    165.1

This device has an accelerator card, so we can see some good numbers here. The remote site's device, on the other hand, is only a 500 MHz AMD with no accelerator, so it obviously won't perform as well. It produced the following results:
The 'numbers' are in 1000s of bytes per second processed.
type                 16 bytes    64 bytes   256 bytes  1024 bytes  8192 bytes
md2                   334.01k     710.90k    1005.26k    1108.05k    1151.45k
mdc2                  561.47k     634.79k     659.04k     666.16k     673.44k
md4                  2284.33k    7985.06k   22237.18k   40621.68k   53257.10k
md5                  1737.32k    5777.80k   15250.69k   26202.77k   32755.54k
hmac(md5)            2020.69k    6529.12k   16518.14k   27100.72k   32963.17k
sha1                 1485.90k    4109.91k    8470.16k   11565.00k   12991.89k
rmd160               1485.02k    4131.16k    8613.36k   11930.47k   13347.61k
rc4                 22104.75k   26471.34k   27729.55k   28097.17k   28028.23k
des cbc              5942.92k    6286.65k    6385.52k    6426.94k    6396.98k
des ede3             2154.33k    2203.92k    2229.73k    2225.74k    2200.75k
idea cbc                 0.00        0.00        0.00        0.00        0.00
seed cbc                 0.00        0.00        0.00        0.00        0.00
rc2 cbc              2857.53k    2974.34k    3015.45k    3020.43k    3021.39k
rc5-32/12 cbc       16781.33k   19511.91k   20359.36k   20582.86k   20626.70k
blowfish cbc         9983.13k   11051.41k   11291.04k   11360.61k   11369.82k
cast cbc             8726.10k    9416.21k    9636.47k    9695.25k    9703.13k
aes-128 cbc          5468.26k    5715.25k    5817.59k    5843.74k    5836.81k
aes-192 cbc          4732.20k    4964.46k    5037.87k    5102.09k    5086.74k
aes-256 cbc          4291.03k    4450.14k    4510.26k    4511.12k    4521.81k
camellia-128 cbc     5800.16k    6158.16k    6281.17k    6320.09k    6295.45k
camellia-192 cbc     4615.07k    4867.09k    4911.12k    4968.32k    4943.24k
camellia-256 cbc     4561.92k    4826.93k    4914.36k    4873.11k    4878.62k
sha256                993.95k    2260.63k    3922.08k    4833.08k    5153.44k
sha512                395.29k    1578.48k    2368.90k    3301.85k    3715.76k
aes-128 ige          5565.77k    5908.00k    6017.24k    6058.39k    6056.14k
aes-192 ige          4809.31k    5086.57k    5220.05k    5208.61k    5217.06k
aes-256 ige          4288.23k    4511.80k    4602.72k    4624.59k    4605.74k
                    sign    verify   sign/s verify/s
rsa 512 bits   0.006852s 0.000667s    145.9   1500.3
rsa 1024 bits  0.031129s 0.001624s     32.1    615.9
rsa 2048 bits  0.176131s 0.004923s      5.7    203.1
rsa 4096 bits  1.100097s 0.016854s      0.9     59.3
                    sign    verify   sign/s verify/s
dsa 512 bits   0.005221s 0.005985s    191.5    167.1
dsa 1024 bits  0.014252s 0.016788s     70.2     59.6
dsa 2048 bits  0.044936s 0.054130s     22.3     18.5

Judging from this data, and using 1024 bits, we could presume that when creating an IPSec tunnel we would want to use Blowfish, as it can pull just over 11 Mbps on that 500 MHz processor, and MD5, which outperformed SHA1 by more than 2x.
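Note the header on each table: the numbers are in 1000s of bytes per second, and the column headings (16 to 8192) are the sizes of the data blocks openssl feeds to each algorithm. Converting a value to Mbit/s therefore means multiplying by 1000 and by 8; if I'm reading the units right, Blowfish's 11360.61k at 1024-byte blocks is roughly 91 Mbit/s rather than 11 Mbit/s. A small Python sketch that parses one row (the row format is assumed from the output pasted above) and does that conversion:

```python
from __future__ import annotations

# Parse one row of `openssl speed` output and convert it to Mbit/s.
# Assumption: rows look like the ones pasted above: an algorithm name
# (which may contain spaces) followed by five values, each in 1000s of
# bytes per second, for 16/64/256/1024/8192-byte blocks.
BLOCK_SIZES = (16, 64, 256, 1024, 8192)


def parse_speed_row(line: str) -> tuple[str, dict[int, float]]:
    tokens = line.split()
    values = tokens[-len(BLOCK_SIZES):]
    name = " ".join(tokens[: -len(BLOCK_SIZES)])
    # "0.00" rows have no "k" suffix, so strip it only when present.
    kbytes = [float(v.rstrip("k")) for v in values]
    return name, dict(zip(BLOCK_SIZES, kbytes))


def k_to_mbit(kbytes_per_s: float) -> float:
    """1000s of bytes per second -> megabits per second."""
    return kbytes_per_s * 1000 * 8 / 1e6


if __name__ == "__main__":
    rows = [  # remote-site (Geode) rows copied from the table above
        "blowfish cbc 9983.13k 11051.41k 11291.04k 11360.61k 11369.82k",
        "md5 1737.32k 5777.80k 15250.69k 26202.77k 32755.54k",
        "des ede3 2154.33k 2203.92k 2229.73k 2225.74k 2200.75k",
    ]
    for row in rows:
        name, speeds = parse_speed_row(row)
        print(f"{name}: {k_to_mbit(speeds[1024]):.1f} Mbit/s at 1024-byte blocks")
```

Under that reading, Blowfish (about 90.9 Mbit/s), MD5 (about 209.6 Mbit/s) and even 3DES (about 17.8 Mbit/s) all come out above the few Mbit/s measured through the tunnel. Keep in mind that openssl speed measures userland crypto only, not the kernel IPsec path, so these figures are an upper bound rather than a prediction.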
So I went back and changed only the encryption and authentication algorithms, to Blowfish 128 and MD5. After doing that and restarting racoon on the remote site (just for good measure), the tunnel reconnected and I reran my iperf tests, which gave me the following:
------------------------------------------------------------
Client connecting to 192.168.4.4, TCP port 5001
TCP window size: 64.0 KByte (default)
[156] local 192.168.1.198 port 7681 connected with 192.168.4.4 port 5001
[ ID] Interval       Transfer     Bandwidth
[156]  0.0-10.1 sec  6.88 MBytes  5.72 Mbits/sec

As you can see, the tunnel improved a bit. But I'm still not seeing anything near the 11 Mbps that I should be able to achieve with these settings. So my next step was to leave P1 at Blowfish 128 & MD5 but switch the tunnel's P2 protocol to AH (no encryption) with MD5. After doing that, I got nearly the same results as with encryption on. I would think that with AH I should get much higher speeds!?
What am I doing wrong that makes my speeds so slow? Does anyone have any guidance for me on this? Thanks again in advance.
-
Cable 50/10 Mbps Internet
That means you have a MAX of 10 Mbps uplink at that cable site! You did not explain whether the cable site is the downloading site or the uploading site. AND what kind of results do you get from the ISP network in general (speedtest.net)? If you have other traffic going on, you of course lose some usable bandwidth in the IPsec tunnel.
-
Clouseau:
Thank you for your response. It seems to be the same in either direction when it comes to file transfers. However, for our purposes, let's say I'm located at the cable site (remote site) and I want to download a single 3 GB file from the HQ site. Since my download is 50 Mbps and the uploading site (HQ) can pump out a reliable 16 Mbps all day long, I should, theoretically and not counting basic overhead, be able to pull at least 10 Mbps and even upwards of 14+ Mbps most days.
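The reasoning here is that a single transfer can go no faster than the slower of the sender's uplink and the receiver's downlink. A trivial Python sketch of that bound using the line speeds quoted in this thread (the helper name is hypothetical, and tunnel/TCP overhead is ignored):

```python
def path_ceiling_mbps(sender_uplink: float, receiver_downlink: float) -> float:
    """Upper bound for one transfer, before IPsec/TCP overhead."""
    return min(sender_uplink, receiver_downlink)


# HQ (SHDSL 16/16) uploading to the remote cable site (50/10)
print(path_ceiling_mbps(16, 50))  # 16
# Remote cable site uploading to HQ
print(path_ceiling_mbps(10, 16))  # 10
```

So for the HQ-to-remote download described above, the ceiling is the 16 Mbps HQ uplink, which is why 10 to 14+ Mbps is a reasonable expectation.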
Both ISPs have provided the advertised speeds very reliably over the last couple of years, so we can rule them out in this particular issue. Again, I've performed these tests after hours (and watched both sides' routers to check the traffic levels, which were nearly zero at the time of the test), so I have near-perfect conditions for getting the best possible speeds. So I have to assume the problem lies in the protocols, in hardware limitations at the router level, or in some other issue, and that's what I'm trying to troubleshoot.
Again, thanks for the response. Hopefully my answers here can help you to help me. ;)
-
I think it might be your window size. TCP window size: 64.0 KByte (default).
According to your tables, the remote MD5 is capable of only 5777.80k/s at that size. I think the problem is on the remote side, with the AMD Geode 500 MHz.
-
Podilarious:
Thank you for your response as well. I was a little confused about how the openssl speed test lays out its table. I assumed the header referred to the encryption level I set, not to the size of the TCP window. But now that I see those speeds almost exactly matching what you posted, it makes much more sense.
That raises the question, then: how do I change the TCP window to get to 1024 so I can get the 26 Mbps instead of the 6 Mbps I'm seeing now? Is that something I can do on pfSense, or is it a tweak I need to make on the Windows SMB clients/servers? This is great progress, thanks Podilarious!
-
What kind of latency does the IPsec tunnel have? If it's very high, you'd have to tune your TCP window size accordingly.
I would also start by checking the performance of the AMD Geode 500 MHz (btw, if it's an ALIX, I seem to remember it has VPN acceleration for AES-128, so try using that after enabling it in the webGUI under System -> Advanced -> Miscellaneous -> Crypto HW). Also try testing the IPsec tunnel with a more powerful system.
Finally, on my pfSense 2.1-BETA system the latest OpenSSL 1.0.1 performs much better (30% to 100% faster) than the old OpenSSL 0.9.8, and on pfSense 2.1, ipsec-tools is compiled against the new OpenSSL 1.0.1e. That might help people with heavily loaded VPN servers.
On 2.1 try:
/usr/bin/openssl speed aes-128-cbc
/usr/local/bin/openssl speed aes-128-cbc
-
I think you want -w. See http://doc.pfsense.org/index.php/Iperf_man_page for more details.
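iperf's `-w` option sets the socket buffer size, which in turn bounds the TCP window the receiving host can advertise. Any program can do the same through `setsockopt`; a minimal Python sketch (the 256 KiB value is an arbitrary example, and the operating system may round, double, or cap what it actually grants; Linux, for instance, reports double the requested size):

```python
import socket

REQUESTED = 256 * 1024  # example value only; size it from bandwidth x RTT

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Set the buffers before connect()/listen() so the TCP handshake can
# negotiate a window scale large enough to use them.
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, REQUESTED)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, REQUESTED)
granted_rcv = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
granted_snd = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
print(f"requested {REQUESTED} bytes, granted rcv={granted_rcv} snd={granted_snd}")
sock.close()
```

Since pfSense only forwards the tunnel traffic, the window that matters is the one negotiated by the end hosts (the Windows machines, or iperf with `-w`). As far as I know, tunables such as net.inet.tcp.recvspace on pfSense only affect TCP connections that terminate on the firewall itself.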
Dhatz is right; I have seen better performance since moving to 2.1.
I think on most systems the window size is set automatically; I've never really had to change or troubleshoot it before.
-
Dhatz:
With the tunnel saturated (currently pushing about 6 Mbps through it), I get an average ping of 79 ms, which isn't too bad. There are 14 hops between us, and pinging the router's WAN IP outside the tunnel gives me an average of 70 ms, so the tunnel has little effect on my ping, which is great.
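That latency figure lets us estimate the window ceiling. A single TCP connection can move at most one window of data per round trip, so throughput ≤ window / RTT, and the window needed for a target rate is rate × RTT (the bandwidth-delay product). A Python sketch using the 64.0 KByte default window from the iperf output (treated as 65536 bytes, as iperf counts KBytes in 1024s) and the 79 ms measured here:

```python
def max_tcp_mbps(window_bytes: int, rtt_s: float) -> float:
    """Ceiling for one TCP connection: one full window per round trip."""
    return window_bytes * 8 / rtt_s / 1e6


def window_for_rate(target_mbps: float, rtt_s: float) -> int:
    """Bandwidth-delay product: bytes in flight needed to sustain target_mbps."""
    return round(target_mbps * 1e6 * rtt_s / 8)


if __name__ == "__main__":
    rtt = 0.079  # 79 ms average ping through the saturated tunnel
    print(f"{max_tcp_mbps(64 * 1024, rtt):.2f} Mbit/s with a 64 KByte window")
    print(f"{window_for_rate(16, rtt)} bytes needed to fill 16 Mbit/s")
```

It prints a ceiling of about 6.64 Mbit/s, close to the ~6 Mbps the tunnel is pushing, which fits the window-size theory (though it doesn't rule out the Geode's CPU as a second limit). Filling the 16 Mbps HQ uplink would need a window of roughly 158,000 bytes (about 154 KiB).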
The remote site is using an Alix.2D13 (http://store.netgate.com/-P40.aspx) board. Now that you mention it, that site does say it comes with an OCF encryption accelerator. I enabled the Crypto option as you suggested and ran the test again (mind that the tunnel is active, so the results will be a little skewed) and got this:
type                 16 bytes    64 bytes   256 bytes  1024 bytes  8192 bytes
md2                   329.17k     708.05k     996.65k    1118.89k    1148.68k
mdc2                  549.04k     637.85k     659.26k     663.64k     655.26k
md4                  2289.13k    7854.97k   21920.32k   40305.54k   52503.86k
md5                  1753.12k    5772.70k   15143.99k   26220.17k   32645.97k
hmac(md5)            2004.90k    6419.02k   16479.07k   26775.02k   33190.16k
sha1                 1475.25k    4074.76k    8455.58k   11490.71k   12844.19k
rmd160               1468.63k    4096.69k    8569.98k   11790.03k   13403.30k
rc4                 22092.32k   26689.23k   27538.92k   27871.09k   27791.23k
des cbc              5988.65k    6292.11k    6409.92k    6530.34k    6423.43k
des ede3             2176.81k    2194.97k    2241.66k    2237.70k    2205.02k
idea cbc                 0.00        0.00        0.00        0.00        0.00
seed cbc                 0.00        0.00        0.00        0.00        0.00
rc2 cbc              2854.43k    2940.76k    3002.53k    2941.11k    2999.06k
rc5-32/12 cbc       16558.64k   19628.17k   20311.70k   20462.82k   20403.23k
blowfish cbc         9999.56k   10856.40k   11422.71k   11376.40k   11248.24k
cast cbc             8665.50k    9402.30k    9916.11k    9865.39k    9650.14k
aes-128 cbc          5381.94k    5666.46k    5714.98k    5762.96k    5767.86k
aes-192 cbc          4734.72k    4987.94k    4974.26k    5053.12k    5030.78k
aes-256 cbc          4266.21k    4379.63k    4440.91k    4463.24k    4461.63k
camellia-128 cbc     5725.98k    6261.62k    6313.85k    6278.99k    6223.47k
camellia-192 cbc     4604.60k    4865.82k    4862.31k    4924.04k    4892.54k
camellia-256 cbc     4502.63k    4857.29k    4870.93k    4862.82k    4926.56k
sha256               1007.89k    2288.96k    3873.00k    4783.14k    5079.48k
sha512                390.13k    1567.99k    2360.24k    3260.45k    3649.77k
aes-128 ige          5449.03k    5863.30k    6074.94k    6049.77k    6101.76k
aes-192 ige          4723.68k    5036.47k    5225.38k    5217.74k    5220.90k
aes-256 ige          4214.93k    4501.45k    4583.63k    4629.43k    4645.62k
                    sign    verify   sign/s verify/s
rsa 512 bits   0.006918s 0.000674s    144.6   1484.0
rsa 1024 bits  0.031551s 0.001653s     31.7    605.0
rsa 2048 bits  0.179939s 0.004950s      5.6    202.0
rsa 4096 bits  1.113613s 0.016874s      0.9     59.3
                    sign    verify   sign/s verify/s
dsa 512 bits   0.005288s 0.006044s    189.1    165.5
dsa 1024 bits  0.014283s 0.016861s     70.0     59.3
dsa 2048 bits  0.045229s 0.053605s     22.1     18.7

I don't see much of an improvement, at least in that test. Next, I switched the tunnel over to the following:
IPSec Site-to-Site
PH1:
Auth: Mutual PSK
Neg: main
Policy: Default
Proposal: Default
Enc: AES (128 bits)
Hash: SHA1
DH: 2
PH2:
Proto: ESP
Enc: AES (128 bits)
Hash: SHA1

I'm not seeing much of a difference in the tunnel. Is this something I would change under ADVANCED -> SYSTEM TUNABLES (the two entries below), and if so, what values should I try? Also, would I change this on both sides or just on the remote side (I have VPNs to other sites as well that I don't want to affect yet)?
net.inet.tcp.recvspace Maximum incoming/outgoing TCP datagram size (receive) default (65228)
net.inet.tcp.sendspace Maximum incoming/outgoing TCP datagram size (send) default (65228)
I really appreciate everyone's help, and I'll do my best to provide the data you need to help me. I hope this helps others in the future as well!