Multiple/rogue SADs



  • Hi there.

    I have a problem with an IPsec tunnel between a pfSense 2.3.2 ("router on a stick" running on a first generation NUC) and a Sophos XG virtual appliance (SFOS 16.01.1).

    The tunnel goes up nicely, no problem with that.

    However, sometimes (not on every rekey) a second SA connection appears: instead of two lines, one per direction, in /status_ipsec_sac.php, there are three of them.
    When this happens, a kind of loop starts too: the Sophos decides the connection needs to come down, closes it, creates a new one, and so on.
    The result for users is that every 7-8 seconds there is packet loss (because the tunnel goes down and comes back up).
    Some users even reported "we can not reach anything" (from branch office to datacenter) while we (admins) were able to connect to the pfSense webUI from the datacenter.
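    One way to spot the "three lines instead of two" situation is to group the SA entries by tunnel subnets and direction and flag any group with more than one entry. The sketch below is a minimal illustration, assuming the SA list has already been read into a hypothetical `ChildSA` tuple. The field names, subnets and SPIs are made up for the example and are not the columns pfSense actually displays.

```python
from collections import Counter
from typing import NamedTuple


class ChildSA(NamedTuple):
    # Hypothetical fields, not the exact columns of the pfSense status page.
    local_subnet: str   # tunnel's local subnet (pfSense side)
    remote_subnet: str  # tunnel's remote subnet (Sophos side)
    direction: str      # "in" or "out"
    spi: str


def duplicated_selectors(sas):
    """Return the (local, remote, direction) keys that have more than one SA."""
    counts = Counter((sa.local_subnet, sa.remote_subnet, sa.direction) for sa in sas)
    return sorted(key for key, n in counts.items() if n > 1)


# Hypothetical snapshot: one outbound SA and two inbound SAs for the same tunnel
sas = [
    ChildSA("192.168.10.0/24", "10.0.0.0/24", "out", "c1a2b3c4"),
    ChildSA("192.168.10.0/24", "10.0.0.0/24", "in", "d5e6f7a8"),
    ChildSA("192.168.10.0/24", "10.0.0.0/24", "in", "0badcafe"),
]
print(duplicated_selectors(sas))  # [('192.168.10.0/24', '10.0.0.0/24', 'in')]
```

    A duplicate only tells you that two SAs share the same selectors; by itself it does not say which device negotiated the extra one.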

    To fix things, we have to either disable the tunnel on the Sophos side and wait a couple of minutes (so all the SADs drop) before re-enabling it, or find the bad SAD on the pfSense side and delete it (though I'm not sure this really solves the issue on the Sophos side).

    The problem happened 5-6 times yesterday morning (between 8am and 1pm)…
    But it has not happened since 1pm yesterday (nearly 24 hours running OK).

    I've found several threads from people with the same problem. The usual answer seems to be "not the same parameters on each side of the tunnel" (or sometimes "the two sides are running different IPsec daemon versions that don't get along well", mostly around DPD or the "Prefer old IPsec SAs" option, which I didn't find in 2.3).

    https://forum.pfsense.org/index.php?topic=48259.0 (older pfSense version)
    https://forum.pfsense.org/index.php?topic=109044.0
    https://forum.pfsense.org/index.php?topic=32385.0 (older pfSense version)
    https://forum.pfsense.org/index.php?topic=35889.0 (older pfSense version)

    Both sides use main mode (not aggressive), with PSK authentication.
    The Sophos side (static IP, in the datacenter) acts as "respond only", and the pfSense side (branch office) is supposed to start the tunnel.

    Here are the parameters on the Sophos side:
    Phase 1
    Encryption                          AES128/MD5
    DH Group                            2
    Key Life                            28800 seconds
    Re-Key margin                      360 seconds
    Randomize Re-Keying Margin by      100%
    DPD                                enable
    Check Peer After Every              30 seconds
    Wait for Response Upto              120 seconds

    Phase 2
    Encryption                          AES128/MD5
    DH Group                            2
    Key life                            3600 seconds
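    To see when the Sophos side would try to rekey Phase 1 with these values, here is a small calculation. It assumes (unconfirmed for SFOS) that "Re-Key margin" and "Randomize Re-Keying Margin by" behave like strongSwan's ipsec.conf `margintime` and `rekeyfuzz`, where the rekey time is lifetime - (margin + random(0, margin * fuzz)).

```python
def rekey_window(lifetime_s: int, margin_s: int, fuzz_percent: int) -> tuple[int, int]:
    """Earliest and latest rekey time, in seconds after the SA is established.

    Assumes strongSwan ipsec.conf semantics:
    rekey = lifetime - (margin + random(0, margin * fuzz / 100)).
    """
    latest = lifetime_s - margin_s  # random part is 0
    earliest = lifetime_s - margin_s - margin_s * fuzz_percent // 100  # random part is maximal
    return earliest, latest


# Sophos Phase 1 values from the list above
print(rekey_window(28800, 360, 100))  # (28080, 28440)
```

    Under that assumption, a Phase 1 rekey from the Sophos side would fall somewhere between 7.8 and 7.9 hours after the SA comes up.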

    And on pfSense side:
    Phase 1
    Encryption                                      AES128/MD5
    DH Group                                        2
    Lifetime                                        28800 seconds
    DPD                                            enable
    Delay between requesting peer acknowledgement  10 seconds

    Phase 2
    Encryption                                      AES128/MD5
    DH Group                                        2
    Lifetime                                        3600 seconds
    Automatic ping host                            enabled to an IP on the other subnet
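    Since the usual answer is "not the same parameters on each side", here is a quick side-by-side check of the two lists above. The dictionary keys are hypothetical labels, not field names used by either product. The values are transcribed from the post.

```python
# Values copied from the Sophos and pfSense lists above; keys are made-up labels.
SOPHOS = {
    "p1_encryption": "AES128/MD5",
    "p1_dh_group": 2,
    "p1_lifetime_s": 28800,
    "p2_encryption": "AES128/MD5",
    "p2_dh_group": 2,
    "p2_lifetime_s": 3600,
}
PFSENSE = {
    "p1_encryption": "AES128/MD5",
    "p1_dh_group": 2,
    "p1_lifetime_s": 28800,
    "p2_encryption": "AES128/MD5",
    "p2_dh_group": 2,
    "p2_lifetime_s": 3600,
}


def mismatches(a, b):
    """Return {key: (value_a, value_b)} for every key whose values differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


print(mismatches(SOPHOS, PFSENSE))  # {}
```

    DPD intervals are left out on purpose: each peer generally runs its own DPD timers, so different values on each side are not by themselves a mismatch.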

    Do you have any hint on what I could try?


  • Banned

    Have you tried playing with the "Make before Break" checkbox in the advanced settings?



  • Unfortunately, we're running IKEv1 (Sophos only handles IKEv1).

    This parameter seems to be IKEv2-related.


  • Banned

    Resolved - Sophos suxxx. ;D There's also the "Configure Unique IDs" option to play with. Otherwise, post the logs and maybe someone can decipher some useful info from that mess (certainly not me, I'd have the guys who designed strongSwan logging executed instantly).



  • Problem solved.

    In the remote branch there was another device (Sophos XG105) connected to the internet with a buggy 4G connection…
    This device was set up with the same parameters (IPsec initiator) as the pfSense box and was sometimes (no idea when or why) connecting to the main XG appliance.

    The message in the log (main XG appliance) is: "System received a P2 connexion request whose Localsubnet-Remotesubnet configuration conflicts with that of an already established connexion "XXXX-1". System is terminate connection "XXXX-1" to honor the incoming request."

    That message led me to think there was an issue on the pfSense box, with it trying to start several tunnels.
    That was (obviously) not the case; it was another device...
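    The log message describes a responder that allows only one Phase 2 per local/remote subnet pair and drops the existing tunnel when another request with the same subnets comes in. Here is a toy model of that behaviour. It is hypothetical and only mirrors the log text, not Sophos's actual implementation. The peer addresses are documentation examples and the subnets are made up. It shows why two initiators with identical subnets keep knocking each other off.

```python
from dataclasses import dataclass, field


@dataclass
class Responder:
    """Toy responder: one Phase 2 per (local, remote) subnet pair (hypothetical)."""
    active: dict = field(default_factory=dict)  # (local, remote) -> peer address
    terminations: int = 0

    def phase2_request(self, peer: str, local: str, remote: str) -> None:
        key = (local, remote)
        current = self.active.get(key)
        if current is not None and current != peer:
            # Same subnets from a different peer: terminate the old tunnel.
            self.terminations += 1
        self.active[key] = peer


r = Responder()
# pfSense and the 4G device (hypothetical addresses) reconnecting in turn
for peer in ["198.51.100.10", "203.0.113.20", "198.51.100.10", "203.0.113.20"]:
    r.phase2_request(peer, "10.0.0.0/24", "192.168.10.0/24")
print(r.terminations)  # 3
```

    In this model, every time one device reconnects, the other's tunnel is torn down and that device re-initiates. That is consistent with the up/down loop described at the top of the thread.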

    Once that other device was shut down, the problem was solved.