Netgate Discussion Forum
    • Categories
    • Recent
    • Tags
    • Popular
    • Users
    • Search
    • Register
    • Login

    Rekey of Child SA Subnet A creates Child SA of Subnet B rendering tunnel unusable

    IPsec
    1
    3
    198
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • U
      unico-dm
      last edited by unico-dm

      Short version
      Site to site VPN. The (expected) rekeying of one Child SA Subnet A creates a new Child SA of Subnet B (instead of refreshing Child SA of the very same Subnet A!) and then deletes the Child SA of Subnet A. Leaving 2 Child SAs of Subnet B and - after a few seconds - a new Child SA of Subnet A. This goes on randomly and creates Child SA duplicates. (Once I saw over 20 Child SAs.) Impact: The Connection with the duplicates doesn't work anymore.

      What goes wrong?
      :Correct process of Child SA rekeying in theory according to my understanding. (Desired):

      • con5 #413 is the old Child SA for 172.17.0.0/23 and will be replaced by con5 #441, the new Child SA for 172.17.0.0/23
      • con5 #299 is the old Child SA for 172.20.0.0/16 and will be replaced by con5 #443, the new Child SA for 172.20.0.0/16

      Result: There is one Child SA for each of our Subnets (OK)

      :Process according to log. (Not desired.):

      • con5 #299 is the old Child SA for 172.20.0.0/16 and will be replaced by #441, the new Child SA for 172.17.0.0/23
      • con5 #443 is the new Child SA for 172.20.0.0/16 and is created a few seconds later because otherwise Child SA for 172.20.0.0/16 is missing
      • con5 #413 is the old Child SA for 172.17.0.0/23 and remains

      Result: There is one Child SA more. One of the Subnets got a duplicate (not OK)

      :Commented details with Log:

      A rekey job is initiated (probably by pfsense) for Child SA con5{299} (172.20.0.0/16). Some data is collected and sent to the peer. Meanwhile a new Child SA con5{441} is created. But there's the problem this new Child SA is of the wrong Subnet (172.17.0.0/23).

      2023-07-11 08:56:51.130151+02:00 charon 54934 08[KNL] creating rekey job for CHILD_SA ESP/0x911b1be3/178.174.xx.xx
      2023-07-11 08:56:51.130216+02:00 charon 54934 06[IKE] <con5|1300> queueing CHILD_REKEY task
      2023-07-11 08:56:51.130236+02:00 charon 54934 06[IKE] <con5|1300> activating new tasks
      2023-07-11 08:56:51.130245+02:00 charon 54934 06[IKE] <con5|1300> activating CHILD_REKEY task
      2023-07-11 08:56:51.130300+02:00 charon 54934 06[CFG] <con5|1300> proposing traffic selectors for us:
      2023-07-11 08:56:51.130309+02:00 charon 54934 06[CFG] <con5|1300> 172.17.0.0/23|/0
      2023-07-11 08:56:51.130316+02:00 charon 54934 06[CFG] <con5|1300> 172.20.0.0/16|/0
      2023-07-11 08:56:51.130328+02:00 charon 54934 06[CFG] <con5|1300> proposing traffic selectors for other:
      2023-07-11 08:56:51.130335+02:00 charon 54934 06[CFG] <con5|1300> 192.168.200.0/24|/0
      2023-07-11 08:56:51.130341+02:00 charon 54934 06[CFG] <con5|1300> 192.168.200.0/24|/0
      2023-07-11 08:56:51.130368+02:00 charon 54934 06[CFG] <con5|1300> configured proposals: ESP:AES_CBC_256/HMAC_SHA2_512_256/MODP_2048/NO_EXT_SEQ
      2023-07-11 08:56:51.130380+02:00 charon 54934 06[IKE] <con5|1300> establishing CHILD_SA con5{441} reqid 13
      2023-07-11 08:56:51.132612+02:00 charon 54934 06[CHD] <con5|1300> CHILD_SA con5{299} state change: INSTALLED => REKEYING
      2023-07-11 08:56:51.132659+02:00 charon 54934 06[ENC] <con5|1300> generating CREATE_CHILD_SA request 62 [ N(REKEY_SA) N(ESP_TFC_PAD_N) SA No KE TSi TSr ]
      2023-07-11 08:56:51.132733+02:00 charon 54934 06[NET] <con5|1300> sending packet: from 194.165.xx.xx[500] to 178.174.xx.xx[500] (528 bytes)
      

      With this (wrong) choice made, con5{441} (172.17.0.0/23) is established basically correctly. And afterwards con5{299} (172.20.0.0/16) is removed.

      2023-07-11 08:56:51.758356+02:00 charon 54934 06[CHD] <con5|1300> CHILD_SA con5{441} state change: INSTALLING => INSTALLED
      2023-07-11 08:56:51.758373+02:00 charon 54934 06[CHD] <con5|1300> adding outbound ESP SA
      2023-07-11 08:56:51.758380+02:00 charon 54934 06[CHD] <con5|1300> SPI 0xe04c955d, src 194.165.xx.xx dst 178.174.xx.xx
      2023-07-11 08:56:51.758425+02:00 charon 54934 06[IKE] <con5|1300> outbound CHILD_SA con5{441} established with SPIs c6e325fc_i e04c955d_o and TS 172.17.0.0/23|/0 === 192.168.200.0/24|/0
      2023-07-11 08:56:51.758434+02:00 charon 54934 06[CHD] <con5|1300> CHILD_SA con5{299} state change: REKEYING => REKEYED
      2023-07-11 08:56:51.758479+02:00 charon 54934 06[IKE] <con5|1300> reinitiating already active tasks
      2023-07-11 08:56:51.758485+02:00 charon 54934 06[IKE] <con5|1300> CHILD_REKEY task
      2023-07-11 08:56:51.758509+02:00 charon 54934 06[IKE] <con5|1300> closing CHILD_SA con5{299} with SPIs c10cf427_i (12538 bytes) 911b1be3_o (1418520 bytes) and TS 172.20.0.0/16|/0 === 192.168.200.0/24|/0
      2023-07-11 08:56:51.758518+02:00 charon 54934 06[IKE] <con5|1300> sending DELETE for ESP CHILD_SA with SPI c10cf427
      2023-07-11 08:56:51.758525+02:00 charon 54934 06[CHD] <con5|1300> CHILD_SA con5{299} state change: REKEYED => DELETING
      2023-07-11 08:56:51.758540+02:00 charon 54934 06[ENC] <con5|1300> generating INFORMATIONAL request 63 [ D ]
      2023-07-11 08:56:51.758589+02:00 charon 54934 06[NET] <con5|1300> sending packet: from 194.165.xx.xx[500] to 178.174.xx.xx[500] (96 bytes)
      2023-07-11 08:56:51.781184+02:00 charon 54934 13[NET] <con5|1300> received packet: from 178.174.xx.xx[500] to 194.165.xx.xx[500] (96 bytes)
      2023-07-11 08:56:51.781276+02:00 charon 54934 13[ENC] <con5|1300> parsed INFORMATIONAL response 63 [ ]
      2023-07-11 08:56:51.781309+02:00 charon 54934 13[IKE] <con5|1300> CHILD_SA closed
      2023-07-11 08:56:51.781337+02:00 charon 54934 13[CHD] <con5|1300> CHILD_SA con5{299} state change: DELETING => DELETED
      2023-07-11 08:56:51.781401+02:00 charon 54934 13[IKE] <con5|1300> activating new tasks
      2023-07-11 08:56:51.781414+02:00 charon 54934 13[IKE] <con5|1300> nothing to initiate
      2023-07-11 08:56:56.781498+02:00 charon 54934 01[IKE] <con5|1300> queueing CHILD_DELETE task
      2023-07-11 08:56:56.781546+02:00 charon 54934 01[IKE] <con5|1300> activating new tasks
      2023-07-11 08:56:56.781562+02:00 charon 54934 01[IKE] <con5|1300> activating CHILD_DELETE task
      2023-07-11 08:56:56.781595+02:00 charon 54934 01[CHD] <con5|1300> CHILD_SA con5{299} state change: DELETED => DESTROYING
      2023-07-11 08:56:56.781745+02:00 charon 54934 01[IKE] <con5|1300> activating new tasks
      2023-07-11 08:56:56.781761+02:00 charon 54934 01[IKE] <con5|1300> nothing to initiate
      

      Last but not least. At some point IPsec seems to realize, something's missing and establishes a new Child SA for 172.20.0.0/16.

      2023-07-11 08:57:15.114638+02:00 charon 54934 06[CHD] <con5|1300> CHILD_SA con5{443} state change: CREATED => INSTALLING
      2023-07-11 08:57:15.117260+02:00 charon 54934 06[CHD] <con5|1300> CHILD_SA con5{443} state change: INSTALLING => INSTALLED
      2023-07-11 08:57:15.117252+02:00 charon 54934 06[IKE] <con5|1300> CHILD_SA con5{443} established with SPIs cd8810b8_i 9d002767_o and TS 172.20.0.0/16|/0 === 192.168.200.0/24|/0
      

      Workarounds

      • Remove/Disconnect all Child SAs. But duplicates will be created again after certain time
      • I've found that setting option "Split connections" helps. (Still verifying! It seems that I have to rebuild the tunnel completely. I can't just set the option and save.)

      Conclusion
      So this is no rekey conflict (I think). Lifetime, Timing or other Child SA settings (like Responder only etc) do not matter. Rekeying seems to go on as expected (well - basically). Rekeying simply seems to make a wrong decision by chosing the wrong combination of Local subnet/Remote subnet randomly.

      I know that sometimes it is necessary to set "Split connections" so IPsec works at all (Cisco ASA as peer). But in my use case it seems that I have to set this option in general. No matter what vendor sits on the other side.

      My questions: Am I missing something obvious? Is this a bug? If this looks like a bug: Where would I register it? (Strongswan or pfSense?)

      Thanks for your opinion on this!

      U 1 Reply Last reply Reply Quote 0
      • U
        unico-dm @unico-dm
        last edited by

        TL;DR
        If you see Child SA duplicates, check Split connections in Phase 1 options. Reason: pfSense may create a Child SA of the wrong traffic selector when rekeying.

        Test scenarios
        As a test I've divided all IPsecs in three groups. For each group I've implemented one of these measures

        • Child SA behaviour to best practice (responder only on one side etc.)
        • Disable DPD (desperate act)
        • Set "Split connections" (SC) in Phase 1

        Test report
        On day one all groups had duplicates during the day. I again removed all duplicates. On day two (overnight) only the first two groups had duplicates. The SC-group was alright. So I changed some IPsecs of the other groups to SC as well. On day three I could confirm that the new SC-IPsecs were duplicate-free as well. On the other hand the ones without SC created 2-5 duplicates overnight.

        Other conclusions/observations
        I think which Child SA is created is random, because

        • duplicates within one IPsec are mostly of the same network. But not always.
        • Child SA id and its order in the configuration has no relation to of which network a duplicate is created

        I say there is no rekey conflict because

        • pfSenses Rekey- (and other) randomization is working well. (I can see this in the status page.) There is no way a conflict could happen so often
        • I see the duplicates in IPsecs that have huge lifetime (24h/8h) and very short lifetime (15min/15min)
        • There are duplicates even if I'm strict an am responder only.

        It doesn't matter what vendor the peer is. There are Fortigate, ZyWall, Sophos, Cisco. Only Cisco worked from the beginning - because Cisco ASA doesn't work without Split Connections:)

        U 1 Reply Last reply Reply Quote 0
        • U
          unico-dm @unico-dm
          last edited by

          No duplicates appeared over the weekend. Seems split connections is the way to go.

          1 Reply Last reply Reply Quote 0
          • First post
            Last post
          Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.