pfSense <-> pfSense: IPsec Tunnels Losing Connectivity



  • Hey guys,
    I'm wondering if anyone has experienced anything like this. I have a number of pfSense routers for our organization (about 7), one at each of our physical sites, with inter-site VPNs between all of them. Some run as physical machines and others as virtual machines (under ESXi). On several of them the IPsec tunnels say they are up (green arrow under diagnostics) but won't pass any traffic. This affects only some of the tunnels but is quite annoying. Simply unticking and re-ticking the "Enable IPsec" checkbox solves the problem; the tunnels come back up and pass traffic within 2-3 seconds. Has anyone experienced anything like this?

    On this router I have the following in the log. Please note that connectivity was lost between 07:00 and 08:00 (I restarted IPsec at 08:00).

    Jan 10 07:07:17 racoon: INFO: received Vendor ID: DPD
    Jan 10 07:07:17 racoon: [northcote.example.co.nz]: [60.234.X.X] INFO: Selected NAT-T version: RFC 3947
    Jan 10 07:07:17 racoon: [Self]: [116.90.136.91] INFO: Hashing 116.90.136.91[500] with algo #2
    Jan 10 07:07:17 racoon: INFO: NAT-D payload #-1 verified
    Jan 10 07:07:17 racoon: [northcote.example.co.nz]: [60.234.X.X] INFO: Hashing 60.234.X.X[500] with algo #2
    Jan 10 07:07:17 racoon: INFO: NAT-D payload #0 verified
    Jan 10 07:07:17 racoon: INFO: NAT not detected
    Jan 10 07:07:17 racoon: [northcote.example.co.nz]: [60.234.X.X] NOTIFY: couldn't find the proper pskey, try to get one by the peer's address.
    Jan 10 07:07:17 racoon: INFO: Adding remote and local NAT-D payloads.
    Jan 10 07:07:17 racoon: [northcote.example.co.nz]: [60.234.X.X] INFO: Hashing 60.234.X.X[500] with algo #2
    Jan 10 07:07:17 racoon: [Self]: [116.90.136.91] INFO: Hashing 116.90.136.91[500] with algo #2
    Jan 10 07:07:17 racoon: [northcote.example.co.nz]: INFO: ISAKMP-SA established 116.90.136.91[500]-60.234.X.X[500] spi:966bd13e75469a53:a53165f1591f7419
    Jan 10 08:03:31 racoon: INFO: @(#)ipsec-tools 0.8.0 (http://ipsec-tools.sourceforge.net)
    Jan 10 08:03:31 racoon: INFO: @(#)This product linked OpenSSL 0.9.8n 24 Mar 2010 (http://www.openssl.org/)
    Jan 10 08:03:31 racoon: INFO: Reading configuration from "/var/etc/racoon.conf"
    Jan 10 08:03:31 racoon: [Self]: INFO: 116.90.136.91[4500] used for NAT-T
    Jan 10 08:03:31 racoon: [Self]: INFO: 116.90.136.91[4500] used as isakmp port (fd=14)
    Jan 10 08:03:31 racoon: [Self]: INFO: 116.90.136.91[500] used for NAT-T
    Jan 10 08:03:31 racoon: [Self]: INFO: 116.90.136.91[500] used as isakmp port (fd=15)
    Jan 10 08:03:31 racoon: INFO: unsupported PF_KEY message REGISTER
    Jan 10 08:03:31 racoon: NOTIFY: no in-bound policy found: 60.234.74.32/29[0] 192.168.1.0/24[0] proto=any dir=in
    Jan 10 08:03:31 racoon: [northcote.example.co.nz]: INFO: IPsec-SA request for 60.234.X.X queued due to no phase1 found.
    Jan 10 08:03:31 racoon: [northcote.example.co.nz]: INFO: initiate new phase 1 negotiation: 116.90.136.91[500]<=>60.234.X.X[500]
    Jan 10 08:03:31 racoon: INFO: begin Aggressive mode.
    Jan 10 08:03:31 racoon: INFO: received Vendor ID: RFC 3947
    Jan 10 08:03:31 racoon: INFO: received broken Microsoft ID: FRAGMENTATION
    Jan 10 08:03:31 racoon: INFO: received Vendor ID: DPD
    Jan 10 08:03:31 racoon: [northcote.example.co.nz]: [60.234.X.X] INFO: Selected NAT-T version: RFC 3947
    Jan 10 08:03:31 racoon: [Self]: [116.90.136.91] INFO: Hashing 116.90.136.91[500] with algo #2
    Jan 10 08:03:31 racoon: INFO: NAT-D payload #-1 verified
    Jan 10 08:03:31 racoon: [northcote.example.co.nz]: [60.234.X.X] INFO: Hashing 60.234.X.X[500] with algo #2
    Jan 10 08:03:31 racoon: INFO: NAT-D payload #0 verified
    Jan 10 08:03:31 racoon: INFO: NAT not detected
    Jan 10 08:03:31 racoon: [northcote.example.co.nz]: [60.234.X.X] NOTIFY: couldn't find the proper pskey, try to get one by the peer's address.
    Jan 10 08:03:31 racoon: INFO: Adding remote and local NAT-D payloads.
    Jan 10 08:03:31 racoon: [northcote.example.co.nz]: [60.234.X.X] INFO: Hashing 60.234.X.X[500] with algo #2
    Jan 10 08:03:31 racoon: [Self]: [116.90.136.91] INFO: Hashing 116.90.136.91[500] with algo #2
    Jan 10 08:03:31 racoon: [northcote.example.co.nz]: INFO: ISAKMP-SA established 116.90.136.91[500]-60.234.X.X[500] spi:d7aaa4f1bb667250:a25dfd17c6719ac3
    Jan 10 08:03:32 racoon: [northcote.example.co.nz]: INFO: initiate new phase 2 negotiation: 116.90.136.91[500]<=>60.234.X.X[500]
    Jan 10 08:03:32 racoon: [northcote.example.co.nz]: INFO: IPsec-SA established: ESP 116.90.136.91[500]->60.234.X.X[500] spi=161069962(0x999bb8a)
    Jan 10 08:03:32 racoon: [northcote.example.co.nz]: INFO: IPsec-SA established: ESP 116.90.136.91[500]->60.234.X.X[500] spi=131905850(0x7dcb93a)
    Jan 10 08:03:33 racoon: [northcote.example.co.nz]: [60.234.X.X] ERROR: unknown Informational exchange received.
    Jan 10 08:03:33 racoon: [northcote.example.co.nz]: [60.234.X.X] ERROR: unknown Informational exchange received.
    Jan 10 08:03:38 racoon: [northcote.example.co.nz]: [60.234.X.X] ERROR: unknown Informational exchange received.
    Jan 10 08:03:38 racoon: [northcote.example.co.nz]: [60.234.X.X] ERROR: unknown Informational exchange received.
    Jan 10 08:03:43 racoon: [northcote.example.co.nz]: [60.234.X.X] ERROR: unknown Informational exchange received.
    Jan 10 08:03:43 racoon: [northcote.example.co.nz]: [60.234.X.X] ERROR: unknown Informational exchange received.
    Jan 10 08:03:48 racoon: [northcote.example.co.nz]: [60.234.X.X] ERROR: unknown Informational exchange received.
    Jan 10 08:03:48 racoon: [northcote.example.co.nz]: [60.234.X.X] ERROR: unknown Informational exchange received.
    Jan 10 08:03:53 racoon: [northcote.example.co.nz]: [60.234.X.X] ERROR: unknown Informational exchange received.



  • Oh, and just to clarify: none of the pfSense boxes are behind other firewalls, and all have public IPs on the WAN interface.



  • We are experiencing the same problem. We have one pfSense in our datacenter, one pfSense in our office, and a third-party IPsec VPN device at a customer's site. Both the tunnel from our office to the datacenter and the tunnel from the customer to the datacenter show this problem.
    Sometimes the tunnels stay up for a couple of days; in other cases we have to restart IPsec several times a day.

    We are using version 2.0.1 on both pfSense boxes. We have experimented a bit with pinging a host to keep the tunnel open, and with enabling/disabling DPD. So far we still have the problem.

    This is very annoying, as it makes the VPN unusable.

    Has anyone found a solution?

    Lex



  • Same problem here. It seems all the issues we are having are related to the 2.0 and 2.0.1 versions.

    IPsec tunnels show as OK, but no data traffic passes.

    In our office we run a 1.2.3 box with 30+ IPsec connections. Timeouts so far occur only on connections to endpoints running 2.0 or above. When I look at the log, all the traffic failures happen at exactly the same intervals. There are no schedules or anything similar running on the pfSense boxes.

    Could this be an issue with the racoon service?

    It's a very strange issue that has been occurring lately. If there is a solution I would really like to know.

    edit…

    Here is the interval from our monitoring service:



  • I am having the same issue since the upgrade to v2.0. I tried 2.0.1, but that did not fix the issue. I have to restart racoon every couple of hours.



  • Same issue here. I have 2.0 on the main IPsec endpoint and 1.2.3 on 8 different machines, and they randomly stop sending data across the tunnel. I have to restart racoon to get things working again. This only started happening after I upgraded to 2.0. I hope somebody can isolate this issue.



  • Hey guys,
    Just to let you all know I'm going to try what was suggested in this thread:
    http://forum.pfsense.org/index.php/topic,41617.0.html

    So I'll disable NAT traversal (NAT-T) and dead peer detection (DPD) and see how that goes.
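    For anyone curious what those two options look like at the racoon level (pfSense writes the config to /var/etc/racoon.conf, as the log above shows), here is a hypothetical phase 1 fragment with both disabled. The peer address and proposal values are placeholders, not taken from any real config:

```
# hypothetical racoon.conf phase 1 block with NAT-T and DPD disabled
remote 203.0.113.1 {
        exchange_mode aggressive;
        nat_traversal off;   # NAT-T off: both endpoints have public IPs
        dpd_delay 0;         # 0 disables dead peer detection probes entirely
        proposal {
                encryption_algorithm 3des;
                hash_algorithm sha1;
                authentication_method pre_shared_key;
                dh_group 2;
        }
        lifetime time 28800 sec;
}
```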



  • Hi all.

    Same frustrating problem here with 2 VPNs using pfSense 2.0.1 on all sides.

    I read in some posts that this only happens from version 2.0 up, so I might downgrade to 1.2.3, as this issue makes the VPN connection unusable.

    Hope this is fixed soon.

    Regards,
    Jesus



  • @jmarquez:

    I read in some posts that this only happens from version 2.0 up, so I might downgrade to 1.2.3, as this issue makes the VPN connection unusable.

    That's not true; it happens on occasion with every IPsec implementation on every device in the world. 2.0.x does not have any general IPsec problems. It's almost always related to misconfiguration - most commonly mismatched lifetimes on P1 and/or P2 for the symptoms described here, though at times it can be circumstances where you need DPD enabled.

    There isn't enough info here on any of the reported issues to troubleshoot, and every issue is likely a different cause, so if you're having issues please start your own thread with specifics - IPsec logs from both sides in particular.

    Zeon - this one's your thread, post your IPsec logs from the other end. The bit shown here just shows one end renegotiated successfully.



  • Don't get me wrong cmb.

    I'm really happy using pfSense. I think it is a great piece of code.
    I agree with you that every person's IPsec issue is likely different. My issue is similar to the ones reported in this thread only in that the tunnels drop randomly.

    In my particular case, I followed the steps described in Zeon's post (http://forum.pfsense.org/index.php/topic,41617.0.html) and the tunnel has not dropped so far.

    All the best.



  • @cmb:

    @jmarquez:

    I read in some posts that this only happens from version 2.0 up, so I might downgrade to 1.2.3, as this issue makes the VPN connection unusable.

    That's not true; it happens on occasion with every IPsec implementation on every device in the world. 2.0.x does not have any general IPsec problems. It's almost always related to misconfiguration - most commonly mismatched lifetimes on P1 and/or P2 for the symptoms described here, though at times it can be circumstances where you need DPD enabled.

    There isn't enough info here on any of the reported issues to troubleshoot, and every issue is likely a different cause, so if you're having issues please start your own thread with specifics - IPsec logs from both sides in particular.

    Zeon - this one's your thread, post your IPsec logs from the other end. The bit shown here just shows one end renegotiated successfully.

    Hi CMB,
    Firstly, I can say that after a few days with DPD and NAT-T disabled I have had no further dropouts and couldn't be happier. This is true across 6 separate tunnels, with some having latency of 1 ms and others as high as 30 ms (throughput of the internet connections is anywhere between 30 Mbps and 100 Mbps).

    Unfortunately I don't have the logs of the problem anymore, but I will try to recreate them one weekend for the benefit of the other users on here.

    Out of interest, when is DPD needed? I have had situations where I knocked a cable out for up to 10 seconds and the tunnel still worked fine once I plugged it back in.



  • @Zeon:

    Firstly, I can say that after a few days with DPD and NAT-T disabled I have had no further dropouts and couldn't be happier. This is true across 6 separate tunnels, with some having latency of 1 ms and others as high as 30 ms (throughput of the internet connections is anywhere between 30 Mbps and 100 Mbps).

    Disabling NAT-T where you don't need it is a good thing to do. For DPD, as long as it's enabled on both sides with the same settings, you should be good. That's what we use on all of ours internally.

    @Zeon:

    Unfortunately I don't have the logs of the problem anymore, but I will try to recreate them one weekend for the benefit of the other users on here.

    Out of interest, when is DPD needed? I have had situations where I knocked a cable out for up to 10 seconds and the tunnel still worked fine once I plugged it back in.

    Circumstances where one end drops an SA and the other doesn't recognize that the SA is no longer valid are where DPD saves you from having to force-restart one or both ends. That may be a reboot on one side or the other (primarily an unplanned one like a power outage or a yanked plug; an orderly reboot should tell the other end to clear it), or an IP change on one of the sides where there are dynamic WANs. Those are the two most common cases I can think of offhand. Just knocking a cable out for a few seconds, or even minutes, is no big deal, unless you happen to get a new IP when it's reconnected (with dynamic WANs, the link coming up will force a reconnect to your ISP, which with some ISPs will get you a new IP). If you still have the same IP, the existing SA is still valid and will work fine.
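    To make "enabled on both sides with the same settings" concrete at the racoon level: DPD is controlled by three timers in the phase 1 block, and they would need to match on both peers. A hypothetical fragment (peer address and values are illustrative, not a recommendation):

```
# hypothetical: matching DPD timers, to be mirrored on both peers
remote 203.0.113.1 {
        exchange_mode aggressive;
        dpd_delay 10;     # send a DPD probe after 10 s of silence from the peer
        dpd_retry 5;      # wait 5 s before retransmitting an unanswered probe
        dpd_maxfail 5;    # declare the peer dead after 5 failed probes
        # ... proposal and lifetime as usual ...
}
```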



  • Stumbling across this thread reminds me of the same issue I had a while ago as well, quite annoying, including against Astaro 8.2.
    I wouldn't vouch for this, but cross-checking my config now, one of the configuration leftovers from the performance tests we did quite a while ago (pre-2.0.1) is that we're using Blowfish in Phase 1 now. It never happened again, so I completely forgot about this. I'm now running my 2.0.1 box (dynamic IP) against both a pfSense 2.0.1 (also dynamic IP) and an Astaro V8.3 (fixed IP):

    • All have public IPs (no NAT involved, NAT Traversal disabled)
    • Default Mutual PSK, Main mode (btw, I thought this couldn't work by definition? well done!!! :)), My & Peer IP Address, default Policy Generation and Proposal Checking.
    • Phase1:
      Encryption algorithm:  Blowfish 256
      Hash algorithm: SHA1
      DH key group: 5   and  Lifetime: 86400
      DPD: Enabled, 10 Detection and 5 retries
    • Phase2:
      Encryption algorithms: AES 256  (Only this, no other proposal)
      Hash algorithms: MD5  (Only this, no other proposal)
      PFS key group: 5 and  Lifetime: 86400.  Auto Ping remote Host is set

    Yes, the encryption and hashing are not the same in Phase 1 and Phase 2, but even the tunnel with two Phase 2 entries runs stably now. Sorry, I can't provide more details.

    I'll let you guys know if I encounter a 'stalled' VPN again.

    cheers
    Josh
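    For reference, Josh's Phase 1/Phase 2 settings above would come out roughly like the following in a generated racoon.conf. This is a hypothetical reconstruction: the peer address and subnets are placeholders, and the real pfSense-generated file will differ in detail:

```
# hypothetical reconstruction of the settings listed above
remote 203.0.113.1 {
        exchange_mode main;
        nat_traversal off;      # no NAT involved
        dpd_delay 10;           # DPD enabled, 10 s detection
        dpd_maxfail 5;          # give up after 5 failed probes
        proposal {
                encryption_algorithm blowfish 256;   # Phase 1: Blowfish 256
                hash_algorithm sha1;
                authentication_method pre_shared_key;
                dh_group 5;
        }
        lifetime time 86400 sec;
}

# Phase 2 (one sainfo per local/remote subnet pair; subnets are placeholders)
sainfo address 192.168.1.0/24 any address 192.168.2.0/24 any {
        encryption_algorithm aes 256;
        authentication_algorithm hmac_md5;
        pfs_group 5;
        lifetime time 86400 sec;
}
```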



  • On the pfSense side, try setting the P1 Policy Generation to "unique".

    I was having similar issues with subsequent reconnects from the Shrew Soft client, where restarting the pfSense IPsec process would clear the issue.

    I did NOT need to disable NAT-T or DPD; changing the P1 Policy Generation setting from "default" to "unique" was the only change I made.
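    In the generated racoon.conf, that GUI setting maps to racoon's generate_policy directive, where "unique" makes racoon install a distinct generated SPD policy per connecting client instead of reusing one. A hypothetical mobile-clients fragment:

```
# hypothetical: P1 Policy Generation = "unique" for mobile (e.g. Shrew Soft) clients
remote anonymous {
        exchange_mode aggressive;
        generate_policy unique;   # one generated SPD policy per client connection
        passive on;               # wait for clients to initiate
        # ... proposal, authentication, etc. ...
}
```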



  • It seems that several people are reporting IPsec VPN issues with pfSense 2.x (which includes the recent ipsec-tools 0.8.0). While some problems may be due to misconfiguration (e.g. the racoon/mpd conflict), the pfSense <-> pfSense VPN scenario should be trouble-free.

    As most of the problems posted here seem to be related to rekeying, I've been searching the ipsec-tools-devel mailing lists for clues. Check the following discussions:

    http://old.nabble.com/why-is-SA-lifetime-kilobyte-limit-disabled-in-racoon--td31648198.html

    Even if Node-A thinks the IPsec-SA is expired at this time, Node-B doesn't
    think so, i.e. the states of the IPsec-SA are mismatched.

    Understood – similar things already happen with time-based
    lifetimes if there is a clock skew between the two boxes.
    (This is particularly bad if the oldest available SA is used
    by the kernel.)

    Racoon's strategy for rekeying is "the initiator does it." If Node-B
    is the responder, Node-A doesn't start rekeying even if the IPsec-SA
    is expired.

    That sounds like a bug in racoon. It seems that if either end is
    unsatisfied with the SA, that end should trigger a new one.

    I'd also call this a shortcoming, at least. The standards are
    weak, and one doesn't know how other implementations behave.
    It would be safer if both sides cared about renegotiations.

    But the key question is what the other implementations do, and what the
    standard says.

    I've just tried OpenBSD's isakmpd (the oldish version in pkgsrc).
    It initiates a Phase 2 exchange if the soft timeout on its
    side expires, even if it was the responder initially. (It randomizes
    the soft timeouts to minimize the chance that both sides start
    the exchange simultaneously.)
    RFC 2409 says that both sides can initiate rekeying. "Can" --
    this is not much of a guideline for implementors.

    I can see the argument that especially with a 24h or less
    lifetime, AES doesn't need volume-based rekeying.

    OK, I was more concerned about interoperability. What if
    the other side insists on some volume limit?

    I've just tried OpenBSD's isakmpd (the oldish version in pkgsrc).
    It initiates a Phase 2 exchange if the soft timeout on its
    side expires, even if it was the responder initially. (It randomizes
    the soft timeouts to minimize the chance that both sides start
    the exchange simultaneously.)
    RFC 2409 says that both sides can initiate rekeying. "Can" --
    this is not much of a guideline for implementors.

    True, but it seems the original responder initiating a renegotiation is
    the only reasonable behavior.

    At the very least, it would appear to suggest that if the original
    initiator rejects an attempt on the part of the original responder to
    rekey, that's a bug.

    True, but it seems the original responder initiating a renegotiation is
    the only reasonable behavior.

    If both sides start rekeying at the same time, there is/was a problem of
    SA selection.

    The two rekeying sessions make two pairs of IPsec-SAs. racoon can
    do this, and IPsec implementations (kernel side) do one of the following:

    a. Use the oldest IPsec-SA to send and keep all IPsec-SAs to receive (KAME)
    b. Use the newest IPsec-SA to send and keep all IPsec-SAs to receive (Fast IPsec)
    c. Use the newest IPsec-SA to send/receive and purge older IPsec-SAs

    Of course, c. is bad behavior, but small implementations (kernel side)
    may handle only one session and one key pair at a time.
    The standards don't prohibit this. This problem exists between the IKE
    standards and the IPsec standards. It seems IKEv2 makes this cleaner.

    Today, most implementations select b., or have a configuration option for it.
    And racoon isn't used on anything other than KAME, Fast IPsec, or Linux (a. or b.).
    I think your logic actually works fine. But racoon is an old product,
    so it hasn't caught up with recent trends.

    http://marc.info/?l=ipsec-tools-devel&m=129905181832157&w=2
    http://marc.info/?l=ipsec-tools-devel&m=129916127621017&w=2

    Let me revive the discussion on active negotiation,
    as opposed to a passive daemon. Until recently my use
    of IPsec was tied to isakmpd, ipsecctl, and OpenBSD,
    and my views are conditioned by this fact. There the
    IPsec daemon is normally active in initiating its
    negotiations at startup, unless told to configure
    a passive listener for a particular tunnel/transport.
    At the other extreme there is even a so-called
    active-only setting.

    The implicit and default setting in racoon-0.7.3 is
    "passive off", but this still waits for a demand to be
    detected. Thus the mode is better described as "passive
    until harshly bugged to get going"! The need to ping
    and wait for a ridiculously long delay should not be
    acceptable in most circumstances. Forgive me the
    criticism, but to me this is a design flaw. It is a
    question of dependability and of trust to erect the
    desired IPsec tunnels already at boot time.

    Funny: when we tried to switch from racoon to isakmpd at work, a long,
    long time ago, this is one of the things we noticed on our TODO list:
    patch isakmpd to negotiate SAs only when traffic comes to the tunnel :-)

    And this is how things should (can?) be done according to RFC 2367,
    which provides the SADB_ACQUIRE PF_KEY message….

    Now, doing some comparative browsing in the 0.7.3
    and 0.8 sources, the actual use of the PASSIVE variable in
    "struct remoteconf" has indeed expanded somewhat.
    Is the code progressing or maturing into a state
    that allows an actively negotiating daemon, i.e.
    without waiting for traffic demand before commencing?

    Not AFAIK.
    Feel free to provide a patch for that; it would not be so
    complicated to parse the whole config and start negotiation for the
    needed tunnels, but there are also setups where we want tunnels
    negotiated only when needed (i.e. when traffic comes to the tunnel), so
    a patch would need to provide this feature as optional.
    The best would be a peer-based (or sainfo-based?) token for
    that.

    Please also note that it is quite easy to generate dummy
    traffic for the needed tunnels when you activate the configuration, if
    you want.
    And of course to generate dummy traffic from time to time to ensure the
    tunnel will always stay up.
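    pfSense exposes this idea as the "Automatically ping host" option on a Phase 2 entry (mentioned earlier in the thread as "Auto Ping remote Host"). The same "dummy traffic from time to time" approach can also be done by hand with a cron job that pings something on the far LAN, sourced from an address inside the local Phase 2 network so the packet matches the tunnel policy. The addresses below are placeholders:

```
# hypothetical crontab entry: one ping across the tunnel every 5 minutes,
# sourced from the local LAN address so it matches the phase 2 policy
*/5 * * * * /sbin/ping -c 1 -S 192.168.1.1 192.168.2.1 > /dev/null 2>&1
```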

