Netgate Discussion Forum

    60+ tunnels. IKEv2 OK. All IKEv1 randomly drops traffic. Tunnel status stays ok

    IPsec
      BCSE

      I hope someone can point us in the right direction. I think it all started when we upgraded boxes to 2.2.x and started using IKEv2 for IPsec on several connections.

      The IPsec connections we have using IKEv2 are all OK. All the other IPsec connections were OK for years, but they have all been dropping traffic for several weeks/months now.

      We first thought it could be hardware, but we have since upgraded from an ALIX APU to a C2558 system.

      The other side are all ALIX.2D13 boxes or ALIX.APU boxes. Running pfSense 2.0.1 to 2.1.5. The 2.2.x boxes running on IKEv2 are ok.

      The tunnel status is OK on both ends, but all traffic is dead. If I disconnect the tunnel on our side, it starts reconnecting and after a few seconds everything is working again.

      I tried to take a look at the log on our pfSense, but with over 60 tunnels there is a lot of logging. I also don't understand why it worked for years and now seems to fail because we are running IKEv1 and IKEv2 tunnels on the same system. Or is it because we use strongSwan on our end and racoon on the other end?

      A quick look at the log on some of the boxes on the other end shows 'IPsec-SA expired'. It looks like traffic is dead after that, but I'm not quite sure.

      I hope it's a simple setting that will fix the problem  :D.

        jsvg

        On a side note, are these IKEv2 interconnects between pfSense boxen, or are they third-party clients?

          BCSE

          The other side are all ALIX.2D13 boxes or ALIX.APU boxes. Running pfSense 2.0.1 to 2.1.5. The 2.2.x boxes running on IKEv2 are ok.

          Only using pfSense  ;)

            cmb

            If you're not on 2.2.6 already on the 2.2.x side, upgrade. Some of the issues fixed in the latest strongswan version could be the source of occasional IKEv1 rekeying issues.

              BCSE

              All updates are installed as soon as possible, so the box is on 2.2.6.

                cmb

                Ok good.

                So where you have one side logging that its SA is expired, and the other end shows it's still up, you almost certainly have a lifetime mismatch (assuming after the expired log there isn't a log that it renegotiated). If there are no logs on the racoon side beyond the expiry, then there isn't any traffic being initiated from that end across the tunnel, and the other side is still using an old SA that is valid as far as it's concerned.

                In that circumstance, sounds like you're probably not using DPD. Granted on some of the really old versions, at least 2.0.x and earlier, it likely doesn't work reliably. Should have DPD enabled on both sides on at least 2.1.x and newer versions, as that would even recover from a config mismatch, plus a variety of other possible reasons one end might drop an SA and the other end wouldn't.
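The lifetime-mismatch scenario described in this post can be illustrated with a little arithmetic. This is only a rough sketch under stated assumptions: both ends install the SA at the same moment, nothing rekeys early, and DPD is off. The function name and the example lifetimes are hypothetical and are not taken from any box in this thread.

```python
def describe_mismatch(name_a, lifetime_a, name_b, lifetime_b):
    """Return which side expires its SA first and for how long the other
    side would keep using the stale SA, or None if the lifetimes match.

    Simplified model: ignores rekey margins, renegotiation and DPD.
    """
    if lifetime_a == lifetime_b:
        return None
    if lifetime_a < lifetime_b:
        first, shorter, longer = name_a, lifetime_a, lifetime_b
    else:
        first, shorter, longer = name_b, lifetime_b, lifetime_a
    return {"expires_first": first, "stale_seconds": longer - shorter}


# Hypothetical example: one end uses 3600 s, the other 28800 s.
print(describe_mismatch("racoon side", 3600, "strongSwan side", 28800))
# Matching lifetimes (as reported for the defaults in this thread) give None.
print(describe_mismatch("racoon side", 3600, "strongSwan side", 3600))
```

In this simplified model, the longer the gap between the two lifetimes, the longer one end can keep sending into an SA the other end has already dropped.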

                  BCSE

                  Just checked 1 of the failing tunnels. 2.2.6 <-> 2.1.5 says it's up on both sides. No traffic.

                  2.1.5 log showing
                  Dec 24 19:43:32 racoon: [xx]: INFO: IPsec-SA expired: ESP x.x.x.x[500]->x.x.x.x[500] spi=3435406966(0xccc42676)
                  Dec 24 19:43:32 racoon: [xx]: INFO: IPsec-SA expired: ESP/Tunnel x.x.x.x[500]->x.x.x.x[500] spi=266953082(0xfe9617a)

                  DPD is ON in all configurations. I even tried switching it off to see what happens. No luck.
                  Lifetime on phase 1 28800 seconds - default on all the boxes
                  Lifetime on phase 2 3600 seconds - default on all the boxes
                  I'm pretty sure the configurations match, because I checked most of them and they worked for years until now. The mismatched configurations we had were corrected when 2.2 became available (a long time ago). Also, 1 or 2 mistakes are possible, but 30+ tunnels are failing.

                  Eventually the tunnel will come back up, but not within a reasonable time. With over 30 tunnels failing it's pretty annoying.
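With this many tunnels, the racoon 'IPsec-SA expired' entries quoted in this post can be pulled out of a busy log with a short script. The following is a minimal sketch: the regular expression is written against the two log lines shown above only, so other racoon versions or syslog formats may need adjustments, and the function name is hypothetical.

```python
import re

# Written against lines like:
# Dec 24 19:43:32 racoon: [xx]: INFO: IPsec-SA expired: ESP/Tunnel x.x.x.x[500]->x.x.x.x[500] spi=266953082(0xfe9617a)
EXPIRED_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+[\d:]{8})\s+racoon:.*?IPsec-SA expired:\s+"
    r"(?P<proto>\S+)\s+(?P<src>\S+?)\[\d+\]->(?P<dst>\S+?)\[\d+\]\s+"
    r"spi=(?P<spi_dec>\d+)\(0x(?P<spi_hex>[0-9a-fA-F]+)\)"
)


def parse_expired(lines):
    """Return one dict per 'IPsec-SA expired' line; other lines are skipped."""
    events = []
    for line in lines:
        match = EXPIRED_RE.search(line.strip())
        if match:
            events.append(match.groupdict())
    return events


sample = [
    "Dec 24 19:43:32 racoon: [xx]: INFO: IPsec-SA expired: ESP x.x.x.x[500]->x.x.x.x[500] spi=3435406966(0xccc42676)",
    "Dec 24 19:43:32 racoon: [xx]: INFO: IPsec-SA expired: ESP/Tunnel x.x.x.x[500]->x.x.x.x[500] spi=266953082(0xfe9617a)",
    "Dec 24 19:43:40 racoon: some unrelated line",
]
for event in parse_expired(sample):
    print(event["ts"], event["src"], "->", event["dst"], "spi", event["spi_hex"])
```

The extracted SPIs can then be checked against the SAD on the other end to see whether that side is still using an SA this side has already expired.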

                    cmb

                    Ok if they both think they're up, that's different, sounded like from the description one side was expired and down. Do you have the "Prefer old SAs" option enabled on the 2.0.x/2.1.x sides? That's under System>Advanced. That checkbox no longer exists in 2.2.x versions, and was rarely if ever desirable on older versions but got enabled more often than it should have been. It should be disabled.

                      BCSE

                      Checked about 10 boxes. Some of them had the option enabled, so I unchecked it for now. The ones that already had it unchecked are also failing, so I don't think it will help, but it's worth a try.

                        cmb

                        Also compare the SPIs between them, Status>IPsec, SAD tab. More than one pair? Should be one entry in each direction. Where there are multiple pairs, and one end was set to prefer old, that would have been a problem up until the old one that end was using expired. The ones where prefer old was enabled, that's certainly been a problem at some point.

                        Comparing the SPIs when both show up but traffic isn't passing would be telling as well.
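The SPI comparison suggested in this post can be sketched as below, assuming the SAD entries from each end have been copied by hand from Status>IPsec, SAD tab into lists of (source, destination, SPI) tuples. The data layout, function names, addresses and SPI values are all hypothetical; this is not a pfSense API.

```python
from collections import Counter


def extra_pairs(sad):
    """Return directions (src, dst) that have more than one SA entry."""
    per_direction = Counter((src, dst) for src, dst, _spi in sad)
    return {direction: n for direction, n in per_direction.items() if n > 1}


def compare_sads(side_a, side_b):
    """Return the SAD entries that exist on only one of the two ends."""
    a, b = set(side_a), set(side_b)
    return {"only_a": sorted(a - b), "only_b": sorted(b - a)}


# Hypothetical SAD contents using documentation addresses.
side_a = [
    ("192.0.2.1", "198.51.100.1", "c0000001"),
    ("198.51.100.1", "192.0.2.1", "c0000002"),
]
side_b = [
    ("192.0.2.1", "198.51.100.1", "c0000001"),
    ("198.51.100.1", "192.0.2.1", "c0000002"),
    ("198.51.100.1", "192.0.2.1", "c0000003"),  # extra SA on this end only
]
print(extra_pairs(side_b))
print(compare_sads(side_a, side_b))
```

An entry that shows up on only one end, or more than one entry per direction, marks the tunnels worth looking at first.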

                          BCSE

                          After upgrading 2 remote-side boxes to the latest version of pfSense, the tunnels to them are no longer dropping. The IPsec settings (on IKEv1) didn't change during or after the upgrade.

                          Before the upgrade, tunnels to these boxes were dropping multiple times a day. So I was wondering whether this could all be a problem between racoon and strongSwan?

                          Checked the settings (again) on other (not updated) boxes: "dpd enabled", "prefer old SAs" disabled, and matching phase 1 and phase 2 settings. Tunnels are still dropping... Didn't have time to dig into the log.

                            timboau

                            I have had a similar issue:

                            Connections think they are up
                            'Restarting IPSEC' doesn't seem to fix the links
                            Stopping and starting often does

                            I do see in the logs:
                            rc.newipsecdns: IPSEC: One or more IPsec tunnel endpoints has changed its IP. Refreshing.  (ALL ARE STATIC)

                            A lot of these errors:
                            no matching CHILD_SA config found  (Starting and stopping IPSEC without altering anything works)
                            Generating QUICK_MODE request 2190518743 [ HASH SA No ID ID ]

                            When trying to stop a single connection and restart from either end:
                            Unable to delete SAD entry with SPI c57b432d: No such file or directory (2)
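To see how often each of the messages quoted in this post occurs, a log can be scanned for the literal strings. This is a minimal sketch: the symptom labels are made up for illustration, and the search strings are copied from the messages above, so they may not match other versions exactly.

```python
from collections import Counter

# Labels are hypothetical; the search strings come from the messages quoted above.
SYMPTOMS = {
    "endpoint_ip_changed": "One or more IPsec tunnel endpoints has changed its IP",
    "no_child_sa_config": "no matching CHILD_SA config found",
    "sad_delete_failed": "Unable to delete SAD entry with SPI",
}


def count_symptoms(lines):
    """Count how many log lines contain each known symptom string."""
    counts = Counter()
    for line in lines:
        for label, needle in SYMPTOMS.items():
            if needle in line:
                counts[label] += 1
    return counts


sample_log = [
    "rc.newipsecdns: IPSEC: One or more IPsec tunnel endpoints has changed its IP. Refreshing.",
    "no matching CHILD_SA config found",
    "no matching CHILD_SA config found",
    "Unable to delete SAD entry with SPI c57b432d: No such file or directory (2)",
]
print(count_symptoms(sample_log))
```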

                              BCSE

                              It looks like DPD is the problem. I disabled it on 15 tunnels (both sides), and all 15 connections have been stable for at least a day now.

                              DPD is still active on the strongSwan boxes, and I'm not having any problems with them.

                              Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.