
    Site-to-Site IKEv2 Slows To Crawl Until Re-established

      TheWaterbug

      I have a Site-to-Site tunnel set up between an MBT-2220/2.6.0 CE and an SG-1100/22.05.

      My 1000/1000 fiber circuits speedtest at about 400/400 Mbps on the SG-1100 side and about 600/600 Mbps on the MBT side, because I'm currently router-limited.

      Normally the tunnel works great, and I can iperf up to 130 Mbps if there's no other traffic going on between the sites.

      But periodically it just slows to a crawl:

      ./iperf3 -c 192.168.1.3
      Connecting to host 192.168.1.3, port 5201
      [  4] local 192.168.0.213 port 52028 connected to 192.168.1.3 port 5201
      [ ID] Interval           Transfer     Bandwidth
      [  4]   0.00-1.00   sec   138 KBytes  1.13 Mbits/sec                  
      [  4]   1.00-2.00   sec  8.55 KBytes  70.0 Kbits/sec                  
      [  4]   2.00-3.00   sec  2.24 MBytes  18.8 Mbits/sec                  
      [  4]   3.00-4.00   sec  97.0 KBytes   792 Kbits/sec                  
      [  4]   4.00-5.00   sec  0.00 Bytes  0.00 bits/sec                  
      [  4]   5.00-6.00   sec  0.00 Bytes  0.00 bits/sec                  
      [  4]   6.00-7.00   sec  0.00 Bytes  0.00 bits/sec                  
      [  4]   7.00-8.00   sec  0.00 Bytes  0.00 bits/sec                  
      [  4]   8.00-9.00   sec  0.00 Bytes  0.00 bits/sec                  
      [  4]   9.00-10.00  sec  0.00 Bytes  0.00 bits/sec                  
      - - - - - - - - - - - - - - - - - - - - - - - - -
      [ ID] Interval           Transfer     Bandwidth
      [  4]   0.00-10.00  sec  2.48 MBytes  2.08 Mbits/sec                  sender
      [  4]   0.00-10.00  sec  2.25 MBytes  1.89 Mbits/sec                  receiver
      
      iperf Done.
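
The stall in the run above (several consecutive intervals at 0 bits/sec while the tunnel still shows as up) is the kind of thing a small script could flag automatically. Below is a minimal sketch in Python, assuming iperf3's JSON output (`-J`) with per-interval rates under `intervals[].sum.bits_per_second`. The function names, the 1 Mbit/s floor, and the three-interval threshold are illustrative assumptions, not anything from pfSense or this thread.

```python
import json
import subprocess

def interval_rates(report):
    """Return per-interval throughput in bits/sec from a parsed iperf3 JSON report."""
    return [iv["sum"]["bits_per_second"] for iv in report.get("intervals", [])]

def looks_stalled(report, floor_bps=1_000_000, min_consecutive=3):
    """True if at least min_consecutive back-to-back intervals fall below floor_bps."""
    run = 0
    for rate in interval_rates(report):
        # Extend the current run of slow intervals, or reset it on a healthy one.
        run = run + 1 if rate < floor_bps else 0
        if run >= min_consecutive:
            return True
    return False

def run_iperf(server, seconds=10):
    """Run `iperf3 -c <server> -t <seconds> -J` and return the parsed JSON report."""
    out = subprocess.run(
        ["iperf3", "-c", server, "-t", str(seconds), "-J"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)
```

Something like `looks_stalled(run_iperf("192.168.1.3"))` could then be run periodically from a host behind one end of the tunnel, so that CPU and RAM figures can be captured while the tunnel is actually misbehaving.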
      

      Status: IPSec shows that the tunnel is up.

      Status: Traffic Graph does not show any traffic that would be saturating either the tunnel or untunneled traffic out to the internet:

      https://www.kan.org/pictures/TrafficGraph.png

      If I Disconnect the tunnel, it will spontaneously reconnect, and sometimes the throughput goes back up to a normal > 100 Mbps:

      ./iperf3 -c 192.168.0.13
      Connecting to host 192.168.0.13, port 5201
      [  4] local 192.168.1.100 port 51464 connected to 192.168.0.13 port 5201
      [ ID] Interval           Transfer     Bandwidth
      [  4]   0.00-1.00   sec  15.3 MBytes   128 Mbits/sec                  
      [  4]   1.00-2.00   sec  15.7 MBytes   131 Mbits/sec                  
      [  4]   2.00-3.00   sec  15.6 MBytes   131 Mbits/sec                  
      [  4]   3.00-4.00   sec  15.9 MBytes   133 Mbits/sec                  
      [  4]   4.00-5.00   sec  16.2 MBytes   136 Mbits/sec                  
      [  4]   5.00-6.00   sec  16.4 MBytes   137 Mbits/sec                  
      [  4]   6.00-7.00   sec  14.9 MBytes   125 Mbits/sec                  
      [  4]   7.00-8.00   sec  15.0 MBytes   125 Mbits/sec                  
      [  4]   8.00-9.00   sec  16.1 MBytes   135 Mbits/sec                  
      [  4]   9.00-10.00  sec  15.5 MBytes   130 Mbits/sec                  
      - - - - - - - - - - - - - - - - - - - - - - - - -
      [ ID] Interval           Transfer     Bandwidth
      [  4]   0.00-10.00  sec   156 MBytes   131 Mbits/sec                  sender
      [  4]   0.00-10.00  sec   156 MBytes   131 Mbits/sec                  receiver
      
      iperf Done.
      

      If I restart the IPSec service on the SG-1100 side, the throughput will go back up to > 100 Mbps.

      I did not think to log the CPU or RAM utilization when the tunnel was misbehaving, but here it is on the SG-1100 side after IPSec restart:

      https://www.kan.org/pictures/CPURAM.png

      Any known issues with IPSec IKEv2 tunnels slowing down?

        keyser Rebel Alliance @TheWaterbug

        @thewaterbug There is an issue (not officially confirmed, but widely reported) with using AES-GCM encryption on the ARM boxes with SafeXcel. The symptoms are exactly as you describe.

        It happens at random times, and seems to be more prevalent if you have more than one Phase 2 in your tunnel.

        Try changing your encryption on Phase 1 and Phase 2 to AES (not AES-GCM), and the issue should go away.

        Love the no fuss of using the official appliances :-)

          TheWaterbug

          @keyser

          Thanks for the tip. Anyone know if this is a software bug that can be fixed? AES-CBC reduces my throughput by more than 60%:

          ./iperf3 -c 192.168.0.13
          Connecting to host 192.168.0.13, port 5201
          [  4] local 192.168.1.235 port 63728 connected to 192.168.0.13 port 5201
          [ ID] Interval           Transfer     Bandwidth
          [  4]   0.00-1.00   sec  4.94 MBytes  41.4 Mbits/sec                  
          [  4]   1.00-2.00   sec  5.92 MBytes  49.4 Mbits/sec                  
          [  4]   2.00-3.00   sec  5.49 MBytes  46.3 Mbits/sec                  
          [  4]   3.00-4.00   sec  5.91 MBytes  49.6 Mbits/sec                  
          [  4]   4.00-5.00   sec  4.84 MBytes  40.6 Mbits/sec                  
          [  4]   5.00-6.00   sec  5.32 MBytes  44.6 Mbits/sec                  
          [  4]   6.00-7.00   sec  6.05 MBytes  50.8 Mbits/sec                  
          [  4]   7.00-8.00   sec  6.64 MBytes  55.7 Mbits/sec                  
          [  4]   8.00-9.00   sec  6.04 MBytes  50.6 Mbits/sec                  
          [  4]   9.00-10.00  sec  6.92 MBytes  58.1 Mbits/sec                  
          - - - - - - - - - - - - - - - - - - - - - - - - -
          [ ID] Interval           Transfer     Bandwidth
          [  4]   0.00-10.00  sec  58.1 MBytes  48.7 Mbits/sec                  sender
          [  4]   0.00-10.00  sec  58.0 MBytes  48.7 Mbits/sec                  receiver
          
          iperf Done.
          
          
            keyser Rebel Alliance @TheWaterbug

            @thewaterbug Whooaa, that's a major difference in throughput :-(

            Like I said, it's not an officially acknowledged bug, and so far it has only produced sporadic Redmine reports that all describe different symptoms. It seems to be a pretty rare issue, but I suffer from it as well (though without the measurable buffer problem).

            See this thread on the issue:
            https://forum.netgate.com/topic/174562/aes-cgm-and-stalling-ipsec

            Here’s one Redmine bug report, but as you can see, it’s not being worked on.

            https://redmine.pfsense.org/issues/13074


              keyser Rebel Alliance @keyser

              @keyser The gist of it is that AES-GCM causes the ARM boxes to stall seemingly at random, and each report involves different settings and symptoms.

              One guy had the measurable mbuf_clusters symptom before it stalled. He seems to be able to avoid it by disabling SafeXcel.
              Another guy seems to hit the issue often if IPv6 is enabled on the ARM box, and very rarely if not.
              For me it starts happening quite often if I have more than one Phase 2 in a tunnel, though it will still happen, very rarely, with a single Phase 2.
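
For anyone wanting to watch for the mbuf_clusters symptom mentioned above, here is a rough Python sketch that pulls the cluster counters out of `netstat -m` output. It assumes the usual FreeBSD line shape, `current/cache/total/max mbuf clusters in use`; the function names and the choice of total/max as the "pressure" figure are assumptions made for illustration, so check them against the output on your own box.

```python
import re

# Assumed shape of the FreeBSD `netstat -m` line of interest:
#   "1016/3514/4530/1000000 mbuf clusters in use (current/cache/total/max)"
_CLUSTER_RE = re.compile(
    r"^\s*(\d+)/(\d+)/(\d+)/(\d+) mbuf clusters in use", re.MULTILINE
)

def mbuf_cluster_usage(netstat_m_output):
    """Return (current, cache, total, max) mbuf cluster counts, or None if not found."""
    m = _CLUSTER_RE.search(netstat_m_output)
    return tuple(int(g) for g in m.groups()) if m else None

def cluster_pressure(netstat_m_output):
    """Fraction of the cluster limit in use (total / max), or None if unparsable."""
    usage = mbuf_cluster_usage(netstat_m_output)
    if usage is None or usage[3] == 0:
        return None
    return usage[2] / usage[3]
```

Logging `cluster_pressure(...)` every minute or so would show whether the counters climb before a stall, which is the pattern described in the first case.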

              Common to all these cases, and several more here on the forum, is that you can avoid the issue by NOT using AES-GCM. It has also only been happening for users with an ARM box on at least one end of the tunnel. Also: it is the ARM box you need to reboot, or restart IPsec on, to wake the tunnel up again.


                TheWaterbug @keyser

                @keyser
                Thanks! I just realized that I did my latest iperfing via WiFi instead of Ethernet, so it was invalid. Here are some updated runs over Ethernet, where each is the best of 3 runs while the office is closed, so no other load on the router:

                AES-GCM/SafeXcel On:

                [ ID] Interval           Transfer     Bandwidth
                [  4]   0.00-10.00  sec   171 MBytes   143 Mbits/sec                  sender
                [  4]   0.00-10.00  sec   171 MBytes   143 Mbits/sec                  receiver
                

                AES-GCM/SafeXcel Off:

                [ ID] Interval           Transfer     Bandwidth
                [  4]   0.00-10.00  sec  66.9 MBytes  56.1 Mbits/sec                  sender
                [  4]   0.00-10.00  sec  66.8 MBytes  56.1 Mbits/sec                  receiver
                

                AES-CBC-256/SafeXcel Off:

                [ ID] Interval           Transfer     Bandwidth
                [  4]   0.00-10.00  sec  89.9 MBytes  75.4 Mbits/sec                  sender
                [  4]   0.00-10.00  sec  89.8 MBytes  75.4 Mbits/sec                  receiver
                

                AES-CBC-256/SafeXcel On:

                [ ID] Interval           Transfer     Bandwidth
                [  4]   0.00-10.00  sec   114 MBytes  96.0 Mbits/sec                  sender
                [  4]   0.00-10.00  sec   114 MBytes  95.9 Mbits/sec                  receiver
                

                AES-CBC-128/SafeXcel On:

                [ ID] Interval           Transfer     Bandwidth
                [  4]   0.00-10.00  sec   122 MBytes   103 Mbits/sec                  sender
                [  4]   0.00-10.00  sec   122 MBytes   102 Mbits/sec                  receiver
                

                SHA256 for all AES-CBC configurations. DH Group 14 for all.

                So AES-CBC/SafeXcel seems to be a reasonable alternative until this gets fixed. It's still a pretty big performance hit, percentage-wise, but as long as I'm around ~100 Mbps absolute, it'll be tolerable.
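
To put a number on the "pretty big performance hit", the receiver-side figures from the Ethernet runs above can be compared against the AES-GCM/SafeXcel On baseline. This is just a small Python sketch of the arithmetic; the dictionary keys and the helper name are made up for illustration, and the throughput values are copied from the summaries in this post.

```python
# Receiver-side results (Mbits/sec) copied from the Ethernet runs above.
RESULTS_MBPS = {
    "AES-GCM / SafeXcel on": 143.0,
    "AES-GCM / SafeXcel off": 56.1,
    "AES-CBC-256 / SafeXcel off": 75.4,
    "AES-CBC-256 / SafeXcel on": 95.9,
    "AES-CBC-128 / SafeXcel on": 102.0,
}

def relative_to(baseline, results):
    """Return {config: percent change vs. the baseline config}, rounded to 0.1."""
    base = results[baseline]
    return {name: round((mbps - base) / base * 100, 1) for name, mbps in results.items()}

# Print each configuration with its throughput and change vs. the baseline.
for name, pct in relative_to("AES-GCM / SafeXcel on", RESULTS_MBPS).items():
    print(f"{name:28s} {RESULTS_MBPS[name]:6.1f} Mbit/s  {pct:+6.1f}%")
```

By this measure AES-CBC-128 with SafeXcel on lands a little under 30% below the AES-GCM/SafeXcel baseline, while turning SafeXcel off with AES-GCM costs about 60%.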

                Is there a rough equivalence metric for security between AES-GCM and AES-CBC-128, or -256? Or are they all sufficiently secure that the choice doesn't really matter?

                  keyser Rebel Alliance @TheWaterbug

                  @thewaterbug Excellent tests and performance comparison :-)

                  Last time I checked, both AES-128 and AES-256 (in both CBC and GCM mode) were considered safe, since brute-forcing the key needs HEAVY supercomputer time.
                  A 128-bit key is “possible” to brute-force in theory, but not in anywhere close to a usable timeframe, even with modern supercomputers. A 256-bit key is not practically breakable (we are looking at many, many years of supercomputer time).
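
As a rough sense of scale for the brute-force timeframes, here is a back-of-the-envelope sketch in Python. The guess rate of 10**18 keys per second is a purely hypothetical attacker speed chosen for illustration, not a measured figure, and the estimate only covers exhaustive key search (it says nothing about CBC versus GCM as modes).

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def expected_bruteforce_years(key_bits, guesses_per_second):
    """Years to search half the keyspace (the average case) at a fixed guess rate."""
    expected_guesses = 2 ** (key_bits - 1)
    return expected_guesses / guesses_per_second / SECONDS_PER_YEAR

# Hypothetical attacker speed: 10**18 key trials per second (an assumption for scale).
RATE = 10 ** 18
for bits in (128, 256):
    print(f"AES-{bits}: ~{expected_bruteforce_years(bits, RATE):.2e} years")
```

Even under that generous assumption, AES-128 comes out in the trillions of years, so for a site-to-site tunnel the choice between 128 and 256 bits is mostly a performance question.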


                  Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.