Site-to-Site IKEv2 Slows To Crawl Until Re-established
-
I have a Site-to-Site tunnel set up between an MBT-2220/2.6.0 CE and an SG-1100/22.05.
My 1000/1000 fiber circuits speedtest at about 400/400 on the SG-1100 side and about 600/600 on the MBT side, because I'm currently router-limited.
Normally the tunnel works great, and I can iperf up to 130 Mbps if there's no other traffic going on between the sites.
But periodically it just slows to a crawl:
./iperf3 -c 192.168.1.3
Connecting to host 192.168.1.3, port 5201
[  4] local 192.168.0.213 port 52028 connected to 192.168.1.3 port 5201
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec   138 KBytes  1.13 Mbits/sec
[  4]   1.00-2.00   sec  8.55 KBytes  70.0 Kbits/sec
[  4]   2.00-3.00   sec  2.24 MBytes  18.8 Mbits/sec
[  4]   3.00-4.00   sec  97.0 KBytes   792 Kbits/sec
[  4]   4.00-5.00   sec  0.00 Bytes   0.00 bits/sec
[  4]   5.00-6.00   sec  0.00 Bytes   0.00 bits/sec
[  4]   6.00-7.00   sec  0.00 Bytes   0.00 bits/sec
[  4]   7.00-8.00   sec  0.00 Bytes   0.00 bits/sec
[  4]   8.00-9.00   sec  0.00 Bytes   0.00 bits/sec
[  4]   9.00-10.00  sec  0.00 Bytes   0.00 bits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-10.00  sec  2.48 MBytes  2.08 Mbits/sec  sender
[  4]   0.00-10.00  sec  2.25 MBytes  1.89 Mbits/sec  receiver
iperf Done.
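For anyone who wants to catch this automatically, here is a rough Python sketch that scans iperf3's text output for a run of near-zero intervals like the one above. The function names, the 1 Mbit/sec threshold and the three-interval window are my own arbitrary choices, not anything iperf3 or pfSense provides, and the sample is a shortened copy of the run above.

```python
import re

# Matches per-interval lines such as:
#   [  4]   4.00-5.00   sec  0.00 Bytes   0.00 bits/sec
INTERVAL_RE = re.compile(
    r"\[\s*\d+\]\s+(\d+\.\d+)-(\d+\.\d+)\s+sec\s+"
    r"[\d.]+\s+\w*Bytes\s+([\d.]+)\s+([KMG]?)bits/sec"
)

UNIT = {"": 1.0, "K": 1e3, "M": 1e6, "G": 1e9}


def interval_rates(text):
    """Return [(start, end, bits_per_sec)] for the per-second intervals."""
    rates = []
    for m in INTERVAL_RE.finditer(text):
        start, end = float(m.group(1)), float(m.group(2))
        if end - start > 1.5:  # skip the 0.00-10.00 summary lines
            continue
        rates.append((start, end, float(m.group(3)) * UNIT[m.group(4)]))
    return rates


def stalled(text, threshold_bps=1e6, min_intervals=3):
    """True if min_intervals consecutive intervals are below threshold_bps.

    Both defaults are assumptions picked for illustration.
    """
    run = 0
    for _, _, bps in interval_rates(text):
        run = run + 1 if bps < threshold_bps else 0
        if run >= min_intervals:
            return True
    return False


# Shortened sample taken from the slow run in this post.
SAMPLE = """
[  4]   2.00-3.00   sec  2.24 MBytes  18.8 Mbits/sec
[  4]   3.00-4.00   sec  97.0 KBytes   792 Kbits/sec
[  4]   4.00-5.00   sec  0.00 Bytes   0.00 bits/sec
[  4]   5.00-6.00   sec  0.00 Bytes   0.00 bits/sec
[  4]   0.00-10.00  sec  2.48 MBytes  2.08 Mbits/sec  sender
"""

print("stalled" if stalled(SAMPLE) else "ok")
```

Feeding it the saved output of a scheduled iperf3 run would at least timestamp when the tunnel goes bad, which should make it easier to line up with CPU and RAM readings.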
Status: IPSec shows that the tunnel is up.
Status: Traffic Graph does not show any traffic that would be saturating either the tunnel or untunneled traffic out to the internet:
If I Disconnect the tunnel, it will spontaneously reconnect, and sometimes the throughput goes back up to a normal > 100 Mbps:
./iperf3 -c 192.168.0.13
Connecting to host 192.168.0.13, port 5201
[  4] local 192.168.1.100 port 51464 connected to 192.168.0.13 port 5201
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec  15.3 MBytes   128 Mbits/sec
[  4]   1.00-2.00   sec  15.7 MBytes   131 Mbits/sec
[  4]   2.00-3.00   sec  15.6 MBytes   131 Mbits/sec
[  4]   3.00-4.00   sec  15.9 MBytes   133 Mbits/sec
[  4]   4.00-5.00   sec  16.2 MBytes   136 Mbits/sec
[  4]   5.00-6.00   sec  16.4 MBytes   137 Mbits/sec
[  4]   6.00-7.00   sec  14.9 MBytes   125 Mbits/sec
[  4]   7.00-8.00   sec  15.0 MBytes   125 Mbits/sec
[  4]   8.00-9.00   sec  16.1 MBytes   135 Mbits/sec
[  4]   9.00-10.00  sec  15.5 MBytes   130 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-10.00  sec   156 MBytes   131 Mbits/sec  sender
[  4]   0.00-10.00  sec   156 MBytes   131 Mbits/sec  receiver
iperf Done.
If I restart the IPSec service on the SG-1100 side, the throughput will go back up to > 100 Mbps.
I did not think to log the CPU or RAM utilization when the tunnel was misbehaving, but here it is on the SG-1100 side after IPSec restart:
Any known issues with IPSec IKEv2 tunnels slowing down?
-
@thewaterbug There is an issue (not officially confirmed, but widely reported) with using AES-GCM encryption on the ARM boxes with SafeXcel. The symptoms are exactly as you describe.
It happens at random times, and seems to be more prevalent if you have more than one Phase 2 in your tunnel.
Try changing your encryption on Phase 1 and Phase 2 to AES (not AES-GCM), and your issue should go away.
-
Thanks for the tip. Anyone know if this is a software bug that can be fixed? AES-CBC reduces my throughput by more than 60%:
./iperf3 -c 192.168.0.13
Connecting to host 192.168.0.13, port 5201
[  4] local 192.168.1.235 port 63728 connected to 192.168.0.13 port 5201
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec  4.94 MBytes  41.4 Mbits/sec
[  4]   1.00-2.00   sec  5.92 MBytes  49.4 Mbits/sec
[  4]   2.00-3.00   sec  5.49 MBytes  46.3 Mbits/sec
[  4]   3.00-4.00   sec  5.91 MBytes  49.6 Mbits/sec
[  4]   4.00-5.00   sec  4.84 MBytes  40.6 Mbits/sec
[  4]   5.00-6.00   sec  5.32 MBytes  44.6 Mbits/sec
[  4]   6.00-7.00   sec  6.05 MBytes  50.8 Mbits/sec
[  4]   7.00-8.00   sec  6.64 MBytes  55.7 Mbits/sec
[  4]   8.00-9.00   sec  6.04 MBytes  50.6 Mbits/sec
[  4]   9.00-10.00  sec  6.92 MBytes  58.1 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-10.00  sec  58.1 MBytes  48.7 Mbits/sec  sender
[  4]   0.00-10.00  sec  58.0 MBytes  48.7 Mbits/sec  receiver
iperf Done.
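To show where the "more than 60%" comes from, here is the arithmetic as a small Python sketch. It uses the two sender summary figures posted in this thread (131 Mbits/sec with AES-GCM after the reconnect, 48.7 Mbits/sec with AES-CBC); the function name is just for illustration.

```python
def percent_drop(before_mbps, after_mbps):
    """Relative throughput loss, as a percentage of the 'before' figure."""
    return (before_mbps - after_mbps) / before_mbps * 100.0


gcm_mbps = 131.0  # sender summary with AES-GCM, from the earlier run
cbc_mbps = 48.7   # sender summary with AES-CBC, from the run above

print(f"{percent_drop(gcm_mbps, cbc_mbps):.1f}% lower")  # prints "62.8% lower"
```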
-
@thewaterbug Whooaa, that was a major difference in throughput :-(
Like I said, it's not an officially acknowledged bug, and so far it has only produced sporadic Redmine reports that all describe different symptoms. It seems to be a pretty rare issue, but I suffer from it as well (though without the measurable buffer problem).
See this thread on the issue:
https://forum.netgate.com/topic/174562/aes-cgm-and-stalling-ipsec
Here’s one Redmine bug report, but as you can see, it’s not being worked on.
https://redmine.pfsense.org/issues/13074
-
@keyser The gist of it is that GCM causes the ARM boxes to stall seemingly randomly - and all with different settings and symptoms.
One guy had the measurable mbuf_clusters symptom before it stalls. He seems to be able to avoid it by disabling SafeXcel.
Another guy seems to have the issue often if IPv6 is enabled on the ARM box, and very rarely if not.
For me it starts happening quite often if I have more than one Phase 2 in a tunnel. It will still happen with a single Phase 2, but very rarely.
Common for all these cases - and several more located here on the forum - is that you can avoid the issue by NOT using AES-GCM. And it has only been happening for users with ARM boxes on at least one end of the tunnel. Also: it is the ARM box that you need to restart, or restart IPSec on, to wake up the tunnel again.
-
@keyser
Thanks! I just realized that I ran my latest iperf tests over WiFi instead of Ethernet, so they were invalid. Here are some updated runs over Ethernet. Each is the best of 3 runs while the office is closed, so there is no other load on the router:
AES-GCM/SafeXcel On:
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-10.00  sec   171 MBytes   143 Mbits/sec  sender
[  4]   0.00-10.00  sec   171 MBytes   143 Mbits/sec  receiver
AES-GCM/SafeXcel Off:
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-10.00  sec  66.9 MBytes  56.1 Mbits/sec  sender
[  4]   0.00-10.00  sec  66.8 MBytes  56.1 Mbits/sec  receiver
AES-CBC-256/SafeXcel Off:
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-10.00  sec  89.9 MBytes  75.4 Mbits/sec  sender
[  4]   0.00-10.00  sec  89.8 MBytes  75.4 Mbits/sec  receiver
AES-CBC-256/SafeXcel On:
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-10.00  sec   114 MBytes  96.0 Mbits/sec  sender
[  4]   0.00-10.00  sec   114 MBytes  95.9 Mbits/sec  receiver
AES-CBC-128/SafeXcel On:
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-10.00  sec   122 MBytes   103 Mbits/sec  sender
[  4]   0.00-10.00  sec   122 MBytes   102 Mbits/sec  receiver
SHA256 for all AES-CBC configurations. DH Group 14 for all.
So AES-CBC/SafeXcel seems to be a reasonable alternative until this gets fixed. It's still a pretty big performance hit, percentage-wise, but as long as I'm around ~100 Mbps absolute, it'll be tolerable.
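To put a number on the "percentage-wise" hit, here is a small Python sketch that expresses each Ethernet result as a fraction of the AES-GCM/SafeXcel On baseline. The figures are the sender summaries from the runs above; the dictionary keys and the function name are just labels made up for the example.

```python
# Sender summary figures in Mbits/sec, copied from the Ethernet runs above.
results_mbps = {
    "AES-GCM / SafeXcel on": 143.0,
    "AES-GCM / SafeXcel off": 56.1,
    "AES-CBC-256 / SafeXcel off": 75.4,
    "AES-CBC-256 / SafeXcel on": 96.0,
    "AES-CBC-128 / SafeXcel on": 103.0,
}


def relative_to(baseline_name, results):
    """Map each configuration to its throughput as a fraction of the baseline."""
    baseline = results[baseline_name]
    return {name: mbps / baseline for name, mbps in results.items()}


fractions = relative_to("AES-GCM / SafeXcel on", results_mbps)

# Print fastest first, with the share of baseline throughput that remains.
for name, frac in sorted(fractions.items(), key=lambda kv: -kv[1]):
    print(f"{name:<28} {results_mbps[name]:6.1f} Mbits/sec  {frac:6.1%} of baseline")
```

By this measure AES-CBC-128 with SafeXcel keeps roughly 72% of the AES-GCM throughput, and AES-CBC-256 with SafeXcel roughly 67%.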
Is there a rough equivalence metric for security between AES-GCM and AES-CBC-128, or -256? Or are they all sufficiently secure that the choice doesn't really matter?
-
@thewaterbug Excellent tests and performance comparison :-)
Last time I checked, both AES-128 and AES-256 (in both CBC and GCM mode) were considered safe, since brute-forcing the key would need HEAVY supercomputer time.
A 128-bit key is only “possible” to brute-force in theory: even with modern supercomputers it would take far longer than any usable timeframe. A 256-bit key is not practically brute-forceable at all (we are looking at vastly more supercomputer time still).
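For a sense of scale, here is a back-of-the-envelope Python sketch of brute-force time. The attacker speed of 10^18 key trials per second is purely an assumption for illustration (not a measured figure for any real machine), and the estimate only covers exhaustive key search, on average half the key space.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# ASSUMPTION: a hypothetical attacker testing 10**18 keys per second.
ASSUMED_KEYS_PER_SECOND = 1e18


def years_to_brute_force(key_bits, keys_per_second=ASSUMED_KEYS_PER_SECOND):
    """Expected years to find a key by exhaustive search.

    On average the key is found after trying half of the 2**key_bits keys.
    """
    expected_trials = 2 ** (key_bits - 1)
    return expected_trials / keys_per_second / SECONDS_PER_YEAR


for bits in (128, 256):
    print(f"AES-{bits}: about {years_to_brute_force(bits):.1e} years")
```

Under that assumption AES-128 already comes out at trillions of years, so for brute force alone the choice between 128 and 256 bits should not matter in practice.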