2.2.5: stalled site-to-site connection due to rekeying (missing SAD entries)



  • Hi guys,

    I'm running a site-to-site IPsec tunnel (IKEv2) between two pfSense 2.2.5 instances. Both ends of the tunnel are virtualized using CARP, with two physical machines per site, all of them with static WAN IPs. What I'm observing is that the connection seems to stall during rekeying. For instance, a running MySQL query is interrupted. I noticed that the SAD contains more entries than expected, and the log contains entries like "unable to query SAD entry with SPI foobar: No such file or directory (2)". I tried setting one site to "Responder only", but it didn't help. Here's an example of such a case:

    Site A:

    • WAN: aaa.aaa.aaa.aaa
    • Local subnet 1: bbb.bbb.bbb.bbb
    • Local subnet 2: ccc.ccc.ccc.ccc
    
    Nov 20 05:48:39 charon: 09[KNL] creating rekey job for CHILD_SA ESP/0xcc22a26b/aaa.aaa.aaa.aaa
    Nov 20 05:48:39 charon: 09[KNL] creating rekey job for CHILD_SA ESP/0xcc22a26b/aaa.aaa.aaa.aaa
    Nov 20 05:48:39 charon: 09[IKE] establishing CHILD_SA con1{2}
    Nov 20 05:48:39 charon: 09[IKE] <con1|28>establishing CHILD_SA con1{2}
    Nov 20 05:48:40 charon: 12[IKE] CHILD_SA con1{240} established with SPIs cf623e10_i cbc8a6c7_o and TS ccc.ccc.ccc.ccc/32|/0 bbb.bbb.bbb.bbb/24|/0 === yyy.yyy.yyy.yyy/24|/0
    Nov 20 05:48:40 charon: 12[IKE] <con1|28>CHILD_SA con1{240} established with SPIs cf623e10_i cbc8a6c7_o and TS ccc.ccc.ccc.ccc/32|/0 bbb.bbb.bbb.bbb/24|/0 === yyy.yyy.yyy.yyy/24|/0
    Nov 20 05:48:40 charon: 12[KNL] unable to query SAD entry with SPI c1cb505d: No such file or directory (2)
    Nov 20 05:48:40 charon: 12[KNL] <con1|28>unable to query SAD entry with SPI c1cb505d: No such file or directory (2)
    Nov 20 05:48:40 charon: 12[KNL] unable to query SAD entry with SPI c1cb505d: No such file or directory (2)
    Nov 20 05:48:40 charon: 12[KNL] <con1|28>unable to query SAD entry with SPI c1cb505d: No such file or directory (2)
    Nov 20 05:48:40 charon: 12[IKE] closing CHILD_SA con1{238} with SPIs cc22a26b_i (0 bytes) c1cb505d_o (0 bytes) and TS ccc.ccc.ccc.ccc/32|/0 bbb.bbb.bbb.bbb/24|/0 === yyy.yyy.yyy.yyy/24|/0
    Nov 20 05:48:40 charon: 12[IKE] <con1|28>closing CHILD_SA con1{238} with SPIs cc22a26b_i (0 bytes) c1cb505d_o (0 bytes) and TS ccc.ccc.ccc.ccc/32|/0 bbb.bbb.bbb.bbb/24|/0 === yyy.yyy.yyy.yyy/24|/0
    Nov 20 05:48:40 charon: 12[KNL] unable to delete SAD entry with SPI c1cb505d: No such file or directory (2)
    Nov 20 05:48:40 charon: 12[KNL] <con1|28>unable to delete SAD entry with SPI c1cb505d: No such file or directory (2)
    Nov 20 05:52:42 charon: 09[IKE] CHILD_SA con1{241} established with SPIs c21c1a57_i cd708c49_o and TS ccc.ccc.ccc.ccc/32|/0 bbb.bbb.bbb.bbb/24|/0 === yyy.yyy.yyy.yyy/24|/0
    Nov 20 05:52:42 charon: 09[IKE] <con1|28>CHILD_SA con1{241} established with SPIs c21c1a57_i cd708c49_o and TS ccc.ccc.ccc.ccc/32|/0 bbb.bbb.bbb.bbb/24|/0 === yyy.yyy.yyy.yyy/24|/0
    Nov 20 05:52:42 charon: 09[KNL] unable to query SAD entry with SPI ca3698a7: No such file or directory (2)
    Nov 20 05:52:42 charon: 09[KNL] <con1|28>unable to query SAD entry with SPI ca3698a7: No such file or directory (2)
    Nov 20 05:52:42 charon: 09[KNL] unable to query SAD entry with SPI ca3698a7: No such file or directory (2)
    Nov 20 05:52:42 charon: 09[KNL] <con1|28>unable to query SAD entry with SPI ca3698a7: No such file or directory (2)
    Nov 20 05:52:42 charon: 09[IKE] closing CHILD_SA con1{239} with SPIs c5dd07c1_i (26968029 bytes) ca3698a7_o (0 bytes) and TS ccc.ccc.ccc.ccc/32|/0 bbb.bbb.bbb.bbb/24|/0 === yyy.yyy.yyy.yyy/24|/0
    Nov 20 05:52:42 charon: 09[IKE] <con1|28>closing CHILD_SA con1{239} with SPIs c5dd07c1_i (26968029 bytes) ca3698a7_o (0 bytes) and TS ccc.ccc.ccc.ccc/32|/0 bbb.bbb.bbb.bbb/24|/0 === yyy.yyy.yyy.yyy/24|/0
    Nov 20 05:52:42 charon: 09[KNL] unable to delete SAD entry with SPI ca3698a7: No such file or directory (2)
    Nov 20 05:52:42 charon: 09[KNL] <con1|28>unable to delete SAD entry with SPI ca3698a7: No such file or directory (2)
    

    Site B (responder only):

    
    Nov 20 05:48:40 charon: 04[IKE] CHILD_SA con1{228} established with SPIs cbc8a6c7_i cf623e10_o and TS yyy.yyy.yyy.yyy/24|/0 === ccc.ccc.ccc.ccc/32|/0 bbb.bbb.bbb.bbb/24|/0 
    Nov 20 05:48:40 charon: 04[IKE] <con1|15>CHILD_SA con1{228} established with SPIs cbc8a6c7_i cf623e10_o and TS yyy.yyy.yyy.yyy/24|/0 === ccc.ccc.ccc.ccc/32|/0 bbb.bbb.bbb.bbb/24|/0 
    Nov 20 05:48:40 charon: 04[KNL] unable to query SAD entry with SPI cc22a26b: No such file or directory (2)
    Nov 20 05:48:40 charon: 04[KNL] <con1|15>unable to query SAD entry with SPI cc22a26b: No such file or directory (2)
    Nov 20 05:48:40 charon: 04[KNL] unable to query SAD entry with SPI cc22a26b: No such file or directory (2)
    Nov 20 05:48:40 charon: 04[KNL] <con1|15>unable to query SAD entry with SPI cc22a26b: No such file or directory (2)
    Nov 20 05:48:40 charon: 04[IKE] closing CHILD_SA con1{226} with SPIs c1cb505d_i (312 bytes) cc22a26b_o (0 bytes) and TS yyy.yyy.yyy.yyy/24|/0 === ccc.ccc.ccc.ccc/32|/0 bbb.bbb.bbb.bbb/24|/0 
    Nov 20 05:48:40 charon: 04[IKE] <con1|15>closing CHILD_SA con1{226} with SPIs c1cb505d_i (312 bytes) cc22a26b_o (0 bytes) and TS yyy.yyy.yyy.yyy/24|/0 === ccc.ccc.ccc.ccc/32|/0 bbb.bbb.bbb.bbb/24|/0 
    Nov 20 05:48:40 charon: 04[KNL] unable to delete SAD entry with SPI cc22a26b: No such file or directory (2)
    Nov 20 05:48:40 charon: 04[KNL] <con1|15>unable to delete SAD entry with SPI cc22a26b: No such file or directory (2)
    Nov 20 05:52:42 charon: 12[KNL] creating rekey job for CHILD_SA ESP/0xca3698a7/xxx.xxx.xxx.xxx
    Nov 20 05:52:42 charon: 12[KNL] creating rekey job for CHILD_SA ESP/0xca3698a7/xxx.xxx.xxx.xxx
    Nov 20 05:52:42 charon: 06[IKE] establishing CHILD_SA con1{6}
    Nov 20 05:52:42 charon: 06[IKE] <con1|15>establishing CHILD_SA con1{6}
    Nov 20 05:52:42 charon: 12[IKE] CHILD_SA con1{229} established with SPIs cd708c49_i c21c1a57_o and TS yyy.yyy.yyy.yyy/24|/0 === ccc.ccc.ccc.ccc/32|/0 bbb.bbb.bbb.bbb/24|/0 
    Nov 20 05:52:42 charon: 12[IKE] <con1|15>CHILD_SA con1{229} established with SPIs cd708c49_i c21c1a57_o and TS yyy.yyy.yyy.yyy/24|/0 === ccc.ccc.ccc.ccc/32|/0 bbb.bbb.bbb.bbb/24|/0 
    Nov 20 05:52:42 charon: 12[KNL] unable to query SAD entry with SPI c5dd07c1: No such file or directory (2)
    Nov 20 05:52:42 charon: 12[KNL] <con1|15>unable to query SAD entry with SPI c5dd07c1: No such file or directory (2)
    Nov 20 05:52:42 charon: 12[KNL] unable to query SAD entry with SPI c5dd07c1: No such file or directory (2)
    Nov 20 05:52:42 charon: 12[KNL] <con1|15>unable to query SAD entry with SPI c5dd07c1: No such file or directory (2)
    Nov 20 05:52:42 charon: 12[IKE] closing CHILD_SA con1{227} with SPIs ca3698a7_i (12883637 bytes) c5dd07c1_o (0 bytes) and TS yyy.yyy.yyy.yyy/24|/0 === ccc.ccc.ccc.ccc/32|/0 bbb.bbb.bbb.bbb/24|/0 
    Nov 20 05:52:42 charon: 12[IKE] <con1|15>closing CHILD_SA con1{227} with SPIs ca3698a7_i (12883637 bytes) c5dd07c1_o (0 bytes) and TS yyy.yyy.yyy.yyy/24|/0 === ccc.ccc.ccc.ccc/32|/0 bbb.bbb.bbb.bbb/24|/0 
    Nov 20 05:52:42 charon: 12[KNL] unable to delete SAD entry with SPI c5dd07c1: No such file or directory (2)
    Nov 20 05:52:42 charon: 12[KNL] <con1|15>unable to delete SAD entry with SPI c5dd07c1: No such file or directory (2)
    

    Any idea?

    In general: should a site-to-site tunnel have a fixed initiator/responder setup, IOW, should one site be set to "Responder Only" in phase 1? If one site is set to "Responder Only", should that site have DPD disabled? And related to that: is it expected that the responder site also initiates rekeying as seen in the log above?

    Thanks!



  • I noticed that the SAD contains more entries than expected

    A few more data points:

    • the extra SAD entries always have the respective remote as source and can be seen on both endpoints

    • both endpoints have the same number of (extra) SAD entries

    • the extra SAD entries' SPI don't match any SPIs at the respective remote site

    • some extra SAD entries have 0 bytes of data

    Hope this helps…

    Update: the first two changed last night after I tried setting net.key.preferred_oldsa=0, just in case (obviously didn't help). Now one site has yet another extra local source entry compared to the other site.
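
    The comparison behind these data points can be automated. A minimal sketch, assuming the charon log format quoted in this thread and assuming the SPIs currently present in the kernel SAD have already been collected by other means (for example by reading them off the SAD status page); the kernel SPI "0badcafe" is made up for illustration:

```python
import re

# Patterns modeled on the charon log lines quoted in this thread.
ESTABLISHED = re.compile(
    r"CHILD_SA \S+ established with SPIs ([0-9a-f]+)_i ([0-9a-f]+)_o")
CLOSING = re.compile(
    r"closing CHILD_SA \S+ with SPIs ([0-9a-f]+)_i \(\d+ bytes\) ([0-9a-f]+)_o")

def active_spis(log_lines):
    """Return the SPIs charon has established and not yet closed."""
    active = set()
    for line in log_lines:
        m = ESTABLISHED.search(line)
        if m:
            active.update(m.groups())
        m = CLOSING.search(line)
        if m:
            active.difference_update(m.groups())
    return active

def extra_sad_entries(kernel_spis, log_lines):
    """Return SPIs present in the kernel SAD that charon does not consider active."""
    return set(kernel_spis) - active_spis(log_lines)

log = [
    "Nov 20 05:48:40 charon: 12[IKE] CHILD_SA con1{240} established with SPIs cf623e10_i cbc8a6c7_o and TS ccc.ccc.ccc.ccc/32|/0 bbb.bbb.bbb.bbb/24|/0 === yyy.yyy.yyy.yyy/24|/0",
    "Nov 20 05:48:40 charon: 12[IKE] closing CHILD_SA con1{238} with SPIs cc22a26b_i (0 bytes) c1cb505d_o (0 bytes) and TS ccc.ccc.ccc.ccc/32|/0 bbb.bbb.bbb.bbb/24|/0 === yyy.yyy.yyy.yyy/24|/0",
]
# Hypothetical kernel SAD contents: the two active SPIs plus one stray entry.
kernel = ["cf623e10", "cbc8a6c7", "0badcafe"]
print(extra_sad_entries(kernel, log))  # {'0badcafe'}
```

    Running this on both endpoints and comparing the results would show whether the stray entries are really unknown to the remote side as well.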



  • Darn, I think I found the issue… mea culpa... one of site A's phase 2 entries had the entire LAN network configured as local subnet, not the intended LAN subnet configured at site B. So it was most likely just a simple phase 2 subnet mask mismatch after all!
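
    A mismatch like this is easy to overlook in the GUI. A minimal sketch of a mirror check, assuming each site's phase 2 entries are written down as (local, remote) CIDR pairs; all addresses below are hypothetical stand-ins for the masked ones in this thread:

```python
import ipaddress

def normalize(pairs):
    """Parse (local, remote) CIDR strings into a set of network pairs.

    strict=False lets a host address with a mask (10.1.2.3/24) stand for
    its network (10.1.2.0/24).
    """
    return {
        (ipaddress.ip_network(local, strict=False),
         ipaddress.ip_network(remote, strict=False))
        for local, remote in pairs
    }

def unmirrored(site_a, site_b):
    """Return phase 2 pairs, seen from site A, that lack a mirrored twin."""
    a = normalize(site_a)
    # Swap local/remote so site B's entries are expressed from A's side.
    b_seen_from_a = {(remote, local) for local, remote in normalize(site_b)}
    return sorted((str(l), str(r)) for l, r in a ^ b_seen_from_a)

# Hypothetical example: site A uses its whole /9 LAN, site B expects a /24.
site_a = [("10.0.0.0/9", "192.168.50.0/24")]
site_b = [("192.168.50.0/24", "10.1.2.0/24")]
print(unmirrored(site_a, site_b))
# [('10.0.0.0/9', '192.168.50.0/24'), ('10.1.2.0/24', '192.168.50.0/24')]
```

    An empty list means every phase 2 entry on one side has an exact counterpart on the other.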



  • Too bad, that didn't fix it! Here's what I see now:

    • Rekey initiated by the remote site

    • Local site logs "unable to query SAD entry with SPI foobar: No such file or directory (2)" - note that only the "_o" (outgoing?) SPIs seem to be affected

    • No extra SAD entries at either site so far (edit: sigh, they're back)

    • Local client loses connection through tunnel, presumably via the SA that got lost

    Site A (local, initiator):

    • WAN: aaa.aaa.aaa.aaa
    • Local subnet 1: bbb.bbb.bbb.bbb
    • Local subnet 2: ccc.ccc.ccc.ccc
    
    Nov 24 10:14:15 charon: 07[IKE] <con1|4>IKE_SA con1[4] established between aaa.aaa.aaa.aaa[aaa.aaa.aaa.aaa]...xxx.xxx.xxx.xxx[xxx.xxx.xxx.xxx]
    Nov 24 10:14:15 charon: 07[IKE] <con1|4>CHILD_SA con1{6} established with SPIs c7f7c055_i cc09d138_o and TS ccc.ccc.ccc.ccc/32|/0 bbb.bbb.bbb.bbb/24|/0 === yyy.yyy.yyy.yyy/24|/0
    
    Nov 24 10:56:32 charon: 09[KNL] creating rekey job for CHILD_SA ESP/0xcc09d138/xxx.xxx.xxx.xxx
    Nov 24 10:56:32 charon: 03[IKE] <con1|4>establishing CHILD_SA con1{2}
    Nov 24 10:56:32 charon: 11[IKE] <con1|4>CHILD_SA con1{7} established with SPIs c13e7a48_i c029e068_o and TS ccc.ccc.ccc.ccc/32|/0 bbb.bbb.bbb.bbb/24|/0 === yyy.yyy.yyy.yyy/24|/0
    Nov 24 10:56:32 charon: 11[IKE] <con1|4>closing CHILD_SA con1{6} with SPIs c7f7c055_i (13728 bytes) cc09d138_o (19536 bytes) and TS ccc.ccc.ccc.ccc/32|/0 bbb.bbb.bbb.bbb/24|/0 === yyy.yyy.yyy.yyy/24|/0
    
    Nov 24 11:39:23 charon: 14[IKE] <con1|4>CHILD_SA con1{8} established with SPIs c7159f2e_i c1e986b7_o and TS ccc.ccc.ccc.ccc/32|/0 bbb.bbb.bbb.bbb/24|/0 === yyy.yyy.yyy.yyy/24|/0
    Nov 24 11:39:24 charon: 14[IKE] <con1|4>closing CHILD_SA con1{7} with SPIs c13e7a48_i (20492129 bytes) c029e068_o (14363428 bytes) and TS ccc.ccc.ccc.ccc/32|/0 bbb.bbb.bbb.bbb/24|/0 === yyy.yyy.yyy.yyy/24|/0
    
    Nov 24 12:21:25 charon: 09[IKE] <con1|4>CHILD_SA con1{9} established with SPIs c93d48db_i caa24cba_o and TS ccc.ccc.ccc.ccc/32|/0 bbb.bbb.bbb.bbb/24|/0 === yyy.yyy.yyy.yyy/24|/0
    Nov 24 12:21:26 charon: 06[KNL] <con1|4>unable to query SAD entry with SPI c1e986b7: No such file or directory (2)
    Nov 24 12:21:26 charon: 06[IKE] <con1|4>closing CHILD_SA con1{8} with SPIs c7159f2e_i (27820560 bytes) c1e986b7_o (20367820 bytes) and TS ccc.ccc.ccc.ccc/32|/0 bbb.bbb.bbb.bbb/24|/0 === yyy.yyy.yyy.yyy/24|/0
    Nov 24 12:21:26 charon: 06[KNL] <con1|4>unable to delete SAD entry with SPI c1e986b7: No such file or directory (2)
    
    Nov 24 13:03:25 charon: 06[KNL] creating rekey job for CHILD_SA ESP/0xcaa24cba/xxx.xxx.xxx.xxx
    Nov 24 13:03:25 charon: 14[IKE] <con1|4>establishing CHILD_SA con1{2}
    Nov 24 13:03:26 charon: 14[IKE] <con1|4>CHILD_SA con1{10} established with SPIs c1534ca4_i c4273af8_o and TS ccc.ccc.ccc.ccc/32|/0 bbb.bbb.bbb.bbb/24|/0 === yyy.yyy.yyy.yyy/24|/0
    Nov 24 13:03:26 charon: 14[IKE] <con1|4>closing CHILD_SA con1{9} with SPIs c93d48db_i (21361250 bytes) caa24cba_o (17993000 bytes) and TS ccc.ccc.ccc.ccc/32|/0 bbb.bbb.bbb.bbb/24|/0 === yyy.yyy.yyy.yyy/24|/0
    
    Nov 24 13:49:39 charon: 13[KNL] creating rekey job for CHILD_SA ESP/0xc1534ca4/xxx.xxx.xxx.xxx
    Nov 24 13:49:39 charon: 03[IKE] <con1|4>establishing CHILD_SA con1{2}
    Nov 24 13:49:40 charon: 13[IKE] <con1|4>CHILD_SA con1{11} established with SPIs c8e22b6a_i c1be74b8_o and TS ccc.ccc.ccc.ccc/32|/0 bbb.bbb.bbb.bbb/24|/0 === yyy.yyy.yyy.yyy/24|/0
    Nov 24 13:49:40 charon: 13[IKE] <con1|4>closing CHILD_SA con1{10} with SPIs c1534ca4_i (34015339 bytes) c4273af8_o (20891432 bytes) and TS ccc.ccc.ccc.ccc/32|/0 bbb.bbb.bbb.bbb/24|/0 === yyy.yyy.yyy.yyy/24|/0
    

    Site B (remote):

    
    Nov 24 10:14:15 charon: 14[IKE] <con1|4>IKE_SA con1[4] established between xxx.xxx.xxx.xxx[xxx.xxx.xxx.xxx]...aaa.aaa.aaa.aaa[aaa.aaa.aaa.aaa]
    Nov 24 10:14:15 charon: 14[IKE] <con1|4>CHILD_SA con1{6} established with SPIs cc09d138_i c7f7c055_o and TS yyy.yyy.yyy.yyy/24|/0 === ccc.ccc.ccc.ccc/32|/0 bbb.bbb.bbb.bbb/24|/0
    
    Nov 24 10:56:33 charon: 07[IKE] <con1|4>CHILD_SA con1{7} established with SPIs c029e068_i c13e7a48_o and TS yyy.yyy.yyy.yyy/24|/0 === ccc.ccc.ccc.ccc/32|/0 bbb.bbb.bbb.bbb/24|/0
    Nov 24 10:56:33 charon: 07[IKE] <con1|4>closing CHILD_SA con1{6} with SPIs cc09d138_i (13728 bytes) c7f7c055_o (19536 bytes) and TS yyy.yyy.yyy.yyy/24|/0 === ccc.ccc.ccc.ccc/32|/0 bbb.bbb.bbb.bbb/24|/0
    
    Nov 24 11:39:23 charon: 07[KNL] creating rekey job for CHILD_SA ESP/0xc13e7a48/aaa.aaa.aaa.aaa
    Nov 24 11:39:23 charon: 07[IKE] <con1|4>establishing CHILD_SA con1{2}
    Nov 24 11:39:24 charon: 12[IKE] <con1|4>CHILD_SA con1{8} established with SPIs c1e986b7_i c7159f2e_o and TS yyy.yyy.yyy.yyy/24|/0 === ccc.ccc.ccc.ccc/32|/0 bbb.bbb.bbb.bbb/24|/0
    Nov 24 11:39:24 charon: 12[IKE] <con1|4>closing CHILD_SA con1{7} with SPIs c029e068_i (13049321 bytes) c13e7a48_o (21674744 bytes) and TS yyy.yyy.yyy.yyy/24|/0 === ccc.ccc.ccc.ccc/32|/0 bbb.bbb.bbb.bbb/24|/0
    
    Nov 24 12:21:25 charon: 15[KNL] creating rekey job for CHILD_SA ESP/0xc7159f2e/aaa.aaa.aaa.aaa
    Nov 24 12:21:25 charon: 08[IKE] <con1|4>establishing CHILD_SA con1{2}
    Nov 24 12:21:26 charon: 07[IKE] <con1|4>CHILD_SA con1{9} established with SPIs caa24cba_i c93d48db_o and TS yyy.yyy.yyy.yyy/24|/0 === ccc.ccc.ccc.ccc/32|/0 bbb.bbb.bbb.bbb/24|/0
    Nov 24 12:21:26 charon: 07[IKE] <con1|4>closing CHILD_SA con1{8} with SPIs c1e986b7_i (18524208 bytes) c7159f2e_o (29385416 bytes) and TS yyy.yyy.yyy.yyy/24|/0 === ccc.ccc.ccc.ccc/32|/0 bbb.bbb.bbb.bbb/24|/0
    
    Nov 24 13:03:26 charon: 15[IKE] <con1|4>CHILD_SA con1{10} established with SPIs c4273af8_i c1534ca4_o and TS yyy.yyy.yyy.yyy/24|/0 === ccc.ccc.ccc.ccc/32|/0 bbb.bbb.bbb.bbb/24|/0
    Nov 24 13:03:26 charon: 15[KNL] <con1|4>unable to query SAD entry with SPI c93d48db: No such file or directory (2)
    Nov 24 13:03:26 charon: 15[IKE] <con1|4>closing CHILD_SA con1{9} with SPIs caa24cba_i (16243253 bytes) c93d48db_o (22851496 bytes) and TS yyy.yyy.yyy.yyy/24|/0 === ccc.ccc.ccc.ccc/32|/0 bbb.bbb.bbb.bbb/24|/0
    Nov 24 13:03:26 charon: 15[KNL] <con1|4>unable to delete SAD entry with SPI c93d48db: No such file or directory (2)
    
    Nov 24 13:49:40 charon: 06[IKE] <con1|4>CHILD_SA con1{11} established with SPIs c1be74b8_i c8e22b6a_o and TS yyy.yyy.yyy.yyy/24|/0 === ccc.ccc.ccc.ccc/32|/0 bbb.bbb.bbb.bbb/24|/0
    Nov 24 13:49:40 charon: 06[IKE] <con1|4>closing CHILD_SA con1{10} with SPIs c4273af8_i (18852299 bytes) c1534ca4_o (35899664 bytes) and TS yyy.yyy.yyy.yyy/24|/0 === ccc.ccc.ccc.ccc/32|/0 bbb.bbb.bbb.bbb/24|/0
    

    Observations (with a clear pattern):

    • The first and second rekeyings (1st by initiator, 2nd by responder) work fine

    • The third and fourth rekeyings (3rd by responder, 4th by initiator) fail to close the previous outgoing ("_o") SAD entry at the end that's "responding" to (not initiating) the rekeying

    • The fifth rekeying worked just fine again, so it's a transient issue…
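
    The "_o only" pattern can be checked mechanically rather than by eye. A minimal sketch, assuming the charon log format quoted above: it looks up every SPI from an "unable to query/delete" line in the "closing CHILD_SA" lines and reports whether it was the inbound or outbound SPI of the SA being closed.

```python
import re

# Patterns modeled on the charon log lines quoted in this thread.
CLOSING = re.compile(
    r"closing CHILD_SA \S+ with SPIs ([0-9a-f]+)_i \(\d+ bytes\) ([0-9a-f]+)_o")
FAILED = re.compile(r"unable to (?:query|delete) SAD entry with SPI ([0-9a-f]+)")

def classify_failed_spis(log_lines):
    """Map each SPI that could not be queried/deleted to 'in', 'out' or 'unknown'."""
    direction = {}
    failed = []  # keeps first-seen order, without duplicates
    for line in log_lines:
        m = CLOSING.search(line)
        if m:
            direction[m.group(1)] = "in"
            direction[m.group(2)] = "out"
        m = FAILED.search(line)
        if m and m.group(1) not in failed:
            failed.append(m.group(1))
    # Resolve directions at the end: the failure is logged before the closing line.
    return {spi: direction.get(spi, "unknown") for spi in failed}

sample = [
    "Nov 24 12:21:26 charon: 06[KNL] <con1|4>unable to query SAD entry with SPI c1e986b7: No such file or directory (2)",
    "Nov 24 12:21:26 charon: 06[IKE] <con1|4>closing CHILD_SA con1{8} with SPIs c7159f2e_i (27820560 bytes) c1e986b7_o (20367820 bytes) and TS ccc.ccc.ccc.ccc/32|/0 bbb.bbb.bbb.bbb/24|/0 === yyy.yyy.yyy.yyy/24|/0",
    "Nov 24 12:21:26 charon: 06[KNL] <con1|4>unable to delete SAD entry with SPI c1e986b7: No such file or directory (2)",
]
print(classify_failed_spis(sample))  # {'c1e986b7': 'out'}
```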

    That brings me back to my original question above:

    In general: should a site-to-site tunnel have a fixed initiator/responder setup, IOW, should one site be set to "Responder Only" in phase 1? If one site is set to "Responder Only", should that site have DPD disabled? And related to that: is it expected that the responder site also initiates rekeying as seen in the log above?

    Right now both ends are again allowed to initiate the connection…

    Any idea?



  • Update: today I even got an extra SAD entry on each side again, accompanied by stalled connections and the other usual symptoms. So, we're back to square one and the whole thing is still unusable…

    Any idea? Anyone?


  • You can keep banging your head against the wall for a couple more weeks with the Weakswan thing, or switch to OpenVPN.



  • Yeah, that's what I figured considering the feedback here so far (none) and the number of similar rekeying issues reported in the forum. It's really unfortunate. Am I really the only one experiencing this? My setup isn't really that exotic, right…?

    I'd really be interested in getting IPSec to work reliably but without any guidance it's hard to debug this any further. And, I really need to move this thing to production soon, so I'm going to look into OpenVPN as an alternative - hopefully it works well in conjunction with CARP. In the meantime, please keep your ideas coming. I'm all ears...

    Cheers


  • @brevilo:

    Am I really the only one experiencing this? My setup isn't really that exotic, right…?

    No. Had pretty much the same experience with the brand new IPsec implementation. Disaster happening on rekey, or just tunnels losing connectivity altogether for no reason, only fixable by reboot. Don't have time to debug such nonsense, nor wade through the tons of psychopathic crap in the logs; whoever designed the logging of this strongswan thing must have been smoking something seriously nasty.



  • Hi,

    Funny, but I have no trouble with strongswan. I have 4 different IPsec tunnels up and running:
    1x IKEv1 (2x pfSense)
    1x IKEv2 (2x pfSense)
    1x IKEv1 (pfSense / MikroTik)
    1x IKEv2 roadwarrior (Android strongSwan app)

    Maybe there is a mismatch in Phase 2 (Lifetime?) or the time of the IPsec endpoints is not the same.
    With the MikroTik, getting a working and matching setup was a little bit strange.

    regards

    max



  • @MaxHeadroom:

    Maybe there is a mismatch in Phase 2 (Lifetime?)

    Checked n times, and just once more. They match perfectly.

    or the time of the IPsec endpoints is not the same.

    Nope, all set to UTC and synced by NTP, to the second.

    Thanks



  • Do you have two phase 2 entries on only one site?

    ccc.ccc.ccc.ccc/32 <– just a single host
    (hopefully not a subset of bbb.bbb.bbb.bbb/24)



  • @MaxHeadroom:

    Do you have two phase 2 entries on only one site?

    Nope, the sites mirror each other.

    ccc.ccc.ccc.ccc/32 <– just a single host
    (hopefully not a subset of bbb.bbb.bbb.bbb/24)

    bbb.bbb.bbb.bbb/24 and ccc.ccc.ccc.ccc/32 (yes, a single host) are subnets of the actual site A LAN which is a /9 network. The subnets don't overlap.
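
    For anyone who wants to double-check the same thing on their own setup, here is a minimal sketch using Python's ipaddress module. The addresses are hypothetical stand-ins for the masked ones above (a /9 LAN, a /24 and a single /32 host):

```python
import ipaddress
from itertools import combinations

def overlapping_pairs(networks):
    """Return every pair of phase 2 networks that share at least one address."""
    nets = [ipaddress.ip_network(n) for n in networks]
    return [(str(a), str(b)) for a, b in combinations(nets, 2) if a.overlaps(b)]

def all_within(networks, lan):
    """Check that every phase 2 network lies inside the LAN network."""
    lan_net = ipaddress.ip_network(lan)
    return all(ipaddress.ip_network(n).subnet_of(lan_net) for n in networks)

# Hypothetical values: the /32 host sits outside the /24.
selectors = ["10.1.2.0/24", "10.3.4.5/32"]
print(overlapping_pairs(selectors))          # []
print(all_within(selectors, "10.0.0.0/9"))   # True

# Counter-example: a host inside the /24 is reported as overlapping.
print(overlapping_pairs(["10.1.2.0/24", "10.1.2.7/32"]))
# [('10.1.2.0/24', '10.1.2.7/32')]
```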

    Cheers



  • FYI, I found the following automatic outbound NAT mappings at site B:

    Interface  Source              Source Port  Destination  Destination Port  NAT Address  NAT Port  Static Port  Description
    WAN        127.0.0.0/8         *            *            500               WAN address  *         YES          Auto created rule for ISAKMP - localhost to WAN
    WAN        127.0.0.0/8         *            *            *                 WAN address  *         NO           Auto created rule - localhost to WAN
    WAN        yyy.yyy.yyy.yyy/24  *            *            500               WAN address  *         YES          Auto created rule for ISAKMP - LAN to WAN
    WAN        yyy.yyy.yyy.yyy/24  *            *            *                 WAN address  *         NO           Auto created rule - LAN to WAN
    WAN        ###.###.###.###/24  *            *            500               WAN address  *         YES          Auto created rule for ISAKMP - OPT1 to WAN
    WAN        ###.###.###.###/24  *            *            *                 WAN address  *         NO           Auto created rule - OPT1 to WAN

    (OPT1 is used for pfsync and XMLRPC Sync)

    Neither do I know how they came about, nor why they appeared only at site B, nor why they would be needed at all - in short, I removed them. After that the tunnel was stable for about 24 hours, without any traffic being sent through it though! When I started to route traffic through it things went south again rather quickly. That tells me that a) those automatic NAT mappings aren't needed and b) they didn't cause the issue either. In fact, the tunnel seems to get unstable when you use it and "works" fine if left alone. Go figure. Anyhow, I also tried to remove the single-host phase 2 child to make things easier but to no avail…

    I now moved on (in fact, back) to OpenVPN - for the time being :P



  • @doktornotor:

    You can keep banging your head against the wall for a couple more weeks with the Weakswan thing, or switch to OpenVPN.

    You can stop claiming issues fixed multiple releases and several months ago are actually still a problem any time now.

    @MaxHeadroom:

    Funny, but I have no trouble with strongswan. I have 4 different IPsec tunnels up and running

    Nor do at least 99.999% of others. strongswan had some issues in some cases in early 2.2.x versions, which is why doktornotor is still griping (though he hasn't used it in months, so how would he know about anything current), but nearly all of those were fixed several months ago. At this point it's no more problematic than racoon was; the issues are largely the typical misconfigurations, connectivity problems, or something else entirely unrelated to the keying daemon.

    @brevilo:

    FYI, I found the following automatic outbound NAT mappings at site B:

    If those y.y.y.y or #.#.#.# matched your public IP, you were NATing traffic initiated by the firewall itself, which will cause problems with IPsec.
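
    That condition can be tested directly. A minimal sketch, assuming the outbound NAT source networks are available as CIDR strings; the WAN address and networks below are hypothetical documentation and private ranges, not the masked values from this thread:

```python
import ipaddress

def nat_sources_covering(wan_ip, nat_sources):
    """Return the outbound NAT source networks that contain the WAN address."""
    ip = ipaddress.ip_address(wan_ip)
    return [s for s in nat_sources
            if ip in ipaddress.ip_network(s, strict=False)]

# Hypothetical values: localhost, LAN and sync networks do not cover the WAN IP.
sources = ["127.0.0.0/8", "192.168.50.0/24", "172.16.0.0/24"]
print(nat_sources_covering("203.0.113.10", sources))  # []

# A rule whose source includes the WAN address would be flagged.
print(nat_sources_covering("203.0.113.10", sources + ["203.0.113.0/24"]))
# ['203.0.113.0/24']
```

    An empty result means none of the listed rules has a source network that contains the firewall's own public address.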



  • @cmb:

    If those y.y.y.y or #.#.#.# matched your public IP

    Of course not. Those are just the LAN and HA-sync networks (see above).

    But since you chimed in, can you shed some light on where these automatic rules came from at (just) one of the two sites?



  • It was switched to manual outbound NAT mode at some point, and those were the source networks that resided on non-Internet connections at that time.



  • Of course, that wasn't my question. I wanted to know how these specific automatic rules came about in the first place. What creates them and why?



  • Exactly what I said. When you switch to manual outbound NAT, the auto-added rules are added as manual entries.



  • I know how automatic rules turn into manual ones. My question is what created the automatic rules in the first place (IOW, what's their root cause?), in particular since they only appeared at one site, without a difference between the sites that could explain them (to me).

