Netgate Discussion Forum

    PFSense <–> PFsense: IPSEC Tunnels Losing Connectivity

    IPsec
    15 Posts 10 Posters 38.6k Views
      Zeon

      Hey guys,
      I'm wondering if anyone has experienced anything like this. Our organization has about 7 pfSense routers, one at each of our physical sites, with site-to-site VPN on all of them. Some run on physical machines and others as virtual machines (under ESXi). On a number of them the IPsec tunnels show as up (green arrow under diagnostics) but won't pass any traffic. This affects only some of the tunnels but is quite annoying. Simply unticking and reticking the "Enable IPsec" checkbox solves the problem: the tunnels come back up and pass traffic within 2-3 seconds. Has anyone experienced anything like this?

      On this router I have the following in the log. Note that connectivity was lost between 07:00 and 08:00 (I restarted IPsec at 08:00).

      Jan 10 07:07:17 racoon: INFO: received Vendor ID: DPD
      Jan 10 07:07:17 racoon: [northcote.example.co.nz]: [60.234.X.X] INFO: Selected NAT-T version: RFC 3947
      Jan 10 07:07:17 racoon: [Self]: [116.90.136.91] INFO: Hashing 116.90.136.91[500] with algo #2
      Jan 10 07:07:17 racoon: INFO: NAT-D payload #-1 verified
      Jan 10 07:07:17 racoon: [northcote.example.co.nz]: [60.234.X.X] INFO: Hashing 60.234.X.X[500] with algo #2
      Jan 10 07:07:17 racoon: INFO: NAT-D payload #0 verified
      Jan 10 07:07:17 racoon: INFO: NAT not detected
      Jan 10 07:07:17 racoon: [northcote.example.co.nz]: [60.234.X.X] NOTIFY: couldn't find the proper pskey, try to get one by the peer's address.
      Jan 10 07:07:17 racoon: INFO: Adding remote and local NAT-D payloads.
      Jan 10 07:07:17 racoon: [northcote.example.co.nz]: [60.234.X.X] INFO: Hashing 60.234.X.X[500] with algo #2
      Jan 10 07:07:17 racoon: [Self]: [116.90.136.91] INFO: Hashing 116.90.136.91[500] with algo #2
      Jan 10 07:07:17 racoon: [northcote.example.co.nz]: INFO: ISAKMP-SA established 116.90.136.91[500]-60.234.X.X[500] spi:966bd13e75469a53:a53165f1591f7419
      Jan 10 08:03:31 racoon: INFO: @(#)ipsec-tools 0.8.0 (http://ipsec-tools.sourceforge.net)
      Jan 10 08:03:31 racoon: INFO: @(#)This product linked OpenSSL 0.9.8n 24 Mar 2010 (http://www.openssl.org/)
      Jan 10 08:03:31 racoon: INFO: Reading configuration from "/var/etc/racoon.conf"
      Jan 10 08:03:31 racoon: [Self]: INFO: 116.90.136.91[4500] used for NAT-T
      Jan 10 08:03:31 racoon: [Self]: INFO: 116.90.136.91[4500] used as isakmp port (fd=14)
      Jan 10 08:03:31 racoon: [Self]: INFO: 116.90.136.91[500] used for NAT-T
      Jan 10 08:03:31 racoon: [Self]: INFO: 116.90.136.91[500] used as isakmp port (fd=15)
      Jan 10 08:03:31 racoon: INFO: unsupported PF_KEY message REGISTER
      Jan 10 08:03:31 racoon: NOTIFY: no in-bound policy found: 60.234.74.32/29[0] 192.168.1.0/24[0] proto=any dir=in
      Jan 10 08:03:31 racoon: [northcote.example.co.nz]: INFO: IPsec-SA request for 60.234.X.X queued due to no phase1 found.
      Jan 10 08:03:31 racoon: [northcote.example.co.nz]: INFO: initiate new phase 1 negotiation: 116.90.136.91[500]<=>60.234.X.X[500]
      Jan 10 08:03:31 racoon: INFO: begin Aggressive mode.
      Jan 10 08:03:31 racoon: INFO: received Vendor ID: RFC 3947
      Jan 10 08:03:31 racoon: INFO: received broken Microsoft ID: FRAGMENTATION
      Jan 10 08:03:31 racoon: INFO: received Vendor ID: DPD
      Jan 10 08:03:31 racoon: [northcote.example.co.nz]: [60.234.X.X] INFO: Selected NAT-T version: RFC 3947
      Jan 10 08:03:31 racoon: [Self]: [116.90.136.91] INFO: Hashing 116.90.136.91[500] with algo #2
      Jan 10 08:03:31 racoon: INFO: NAT-D payload #-1 verified
      Jan 10 08:03:31 racoon: [northcote.example.co.nz]: [60.234.X.X] INFO: Hashing 60.234.X.X[500] with algo #2
      Jan 10 08:03:31 racoon: INFO: NAT-D payload #0 verified
      Jan 10 08:03:31 racoon: INFO: NAT not detected
      Jan 10 08:03:31 racoon: [northcote.example.co.nz]: [60.234.X.X] NOTIFY: couldn't find the proper pskey, try to get one by the peer's address.
      Jan 10 08:03:31 racoon: INFO: Adding remote and local NAT-D payloads.
      Jan 10 08:03:31 racoon: [northcote.example.co.nz]: [60.234.X.X] INFO: Hashing 60.234.X.X[500] with algo #2
      Jan 10 08:03:31 racoon: [Self]: [116.90.136.91] INFO: Hashing 116.90.136.91[500] with algo #2
      Jan 10 08:03:31 racoon: [northcote.example.co.nz]: INFO: ISAKMP-SA established 116.90.136.91[500]-60.234.X.X[500] spi:d7aaa4f1bb667250:a25dfd17c6719ac3
      Jan 10 08:03:32 racoon: [northcote.example.co.nz]: INFO: initiate new phase 2 negotiation: 116.90.136.91[500]<=>60.234.X.X[500]
      Jan 10 08:03:32 racoon: [northcote.example.co.nz]: INFO: IPsec-SA established: ESP 116.90.136.91[500]->60.234.X.X[500] spi=161069962(0x999bb8a)
      Jan 10 08:03:32 racoon: [northcote.example.co.nz]: INFO: IPsec-SA established: ESP 116.90.136.91[500]->60.234.X.X[500] spi=131905850(0x7dcb93a)
      Jan 10 08:03:33 racoon: [northcote.example.co.nz]: [60.234.X.X] ERROR: unknown Informational exchange received.
      Jan 10 08:03:33 racoon: [northcote.example.co.nz]: [60.234.X.X] ERROR: unknown Informational exchange received.
      Jan 10 08:03:38 racoon: [northcote.example.co.nz]: [60.234.X.X] ERROR: unknown Informational exchange received.
      Jan 10 08:03:38 racoon: [northcote.example.co.nz]: [60.234.X.X] ERROR: unknown Informational exchange received.
      Jan 10 08:03:43 racoon: [northcote.example.co.nz]: [60.234.X.X] ERROR: unknown Informational exchange received.
      Jan 10 08:03:43 racoon: [northcote.example.co.nz]: [60.234.X.X] ERROR: unknown Informational exchange received.
      Jan 10 08:03:48 racoon: [northcote.example.co.nz]: [60.234.X.X] ERROR: unknown Informational exchange received.
      Jan 10 08:03:48 racoon: [northcote.example.co.nz]: [60.234.X.X] ERROR: unknown Informational exchange received.
      Jan 10 08:03:53 racoon: [northcote.example.co.nz]: [60.234.X.X] ERROR: unknown Informational exchange received.
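A quick way to read an excerpt like this is to pull out only the SA-establishment events. The Python sketch below is an illustration, not a pfSense tool: the regular expression is written against the racoon lines shown above and may need adjusting for other ipsec-tools versions. Applied to this excerpt, it flags the 07:07:17 ISAKMP-SA (phase 1) as having no IPsec-SA (phase 2) after it until the 08:03 restart. That alone does not prove a fault, since a phase 1 rekey need not be followed by a new phase 2, but it narrows down where to look in the other end's log.

```python
import re

# Matches racoon lines such as
#   "Jan 10 08:03:32 racoon: [peer]: INFO: IPsec-SA established: ESP ..."
# Written against the excerpt above; other racoon builds may format lines differently.
SA_EVENT = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d{2}:\d{2}:\d{2}) racoon: .*?INFO: "
    r"(?P<kind>ISAKMP-SA|IPsec-SA) established"
)


def sa_events(lines):
    """Return (timestamp, kind) for every phase 1 or phase 2 SA established."""
    events = []
    for line in lines:
        match = SA_EVENT.search(line.strip())
        if match:
            events.append((match.group("ts"), match.group("kind")))
    return events


def phase1_without_phase2(events):
    """Timestamps of ISAKMP-SAs not followed by any IPsec-SA before the next ISAKMP-SA."""
    orphans = []
    pending = None
    for ts, kind in events:
        if kind == "ISAKMP-SA":
            if pending is not None:
                orphans.append(pending)
            pending = ts
        else:
            pending = None
    if pending is not None:
        orphans.append(pending)
    return orphans
```

Feed it the lines of a saved log excerpt from each end and compare the results side by side.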

        Zeon

        Oh, and just to clarify: none of the pfSense boxes are behind other firewalls, and all have public IPs on the WAN interface.

          lexl

          We are experiencing the same problem. We have one pfSense in our datacenter, one pfSense in our office, and a third-party IPsec VPN at a customer's site. Both the tunnel from our office to the datacenter and the tunnel from the customer to the datacenter show this problem.
          Sometimes the tunnels stay up for a couple of days; in some cases we have to restart IPsec several times a day.

          We are using version 2.0.1 on both pfSense boxes. We have experimented a bit with pinging a host to keep the tunnel open, and with enabling/disabling DPD. So far we still have the problem.

          This is very annoying as it makes the VPN unusable.

          Has anyone found a solution ?

          Lex

            pfsensedummie

            Same problem here. It seems all the issues we are having are related to the 2.00 and 2.01 versions.

            IPsec tunnels OK but no data traffic.

            In our office we run a 1.2.3 box with 30+ IPsec connections. So far, timeouts occur only on connections whose other end runs 2.00 or above. When I look in the log, all the traffic failures happen at exactly the same intervals. There are no schedules or anything similar running on the pfSense boxes.

            Could this be an issue with the racoon service?

            It's a very strange issue which is occurring lately. If there is a solution I would really like to know.

            edit…

            Here is the interval from our monitoring service:

              katdrvr

              I have been having the same issue since the upgrade to v2.0. I tried 2.0.1, but that did not fix it. I have to restart racoon every couple of hours.

                csnf

                Same issue here. I have 2.0 at the main end of the IPsec tunnels and 1.2.3 on 8 different machines, and the tunnels randomly stop sending data. I have to restart racoon to get things working again. This only started happening after I upgraded to 2.0. I hope somebody can isolate this issue.

                  Zeon

                  Hey guys,
                  Just to let you all know I'm going to try what was suggested in this thread:
                  http://forum.pfsense.org/index.php/topic,41617.0.html

                  So: disable NAT traversal (NAT-T) and dead peer detection (DPD) and see how that goes.

                    jmarquez

                    Hi all.

                    Same frustrating problem here with 2 VPNs using pfSense 2.0.1 on all sides.

                    I read in some post that this only happens from version 2.0 up, so I might downgrade to 1.2.3, as this issue makes the VPN connection unusable.

                    Hope this is fixed soon.

                    Regards,
                    Jesus

                      cmb

                      @jmarquez:

                      I read in some post that this only happens from version 2.0 up, so I might downgrade to 1.2.3, as this issue makes the VPN connection unusable.

                      That's not true; it happens on occasion with every IPsec implementation on every device in the world. 2.0.x does not have any general IPsec problems. It's almost always related to misconfigurations. Most commonly, mismatched lifetimes on P1 and/or P2 for the symptoms described here, though at times it can be circumstances where you need DPD enabled.
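To make the "mismatched lifetimes" check concrete, the sketch below compares one side's settings with the other's and lists the fields that differ. The `TunnelEnd` record and its field names are hypothetical (they are not pfSense config keys), and the lifetime values are made-up examples; in practice you would read them from each firewall's phase 1 and phase 2 settings.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass(frozen=True)
class TunnelEnd:
    """Hypothetical summary of one side's tunnel settings (illustrative field names)."""
    p1_lifetime: int   # Phase 1 lifetime, seconds
    p2_lifetime: int   # Phase 2 lifetime, seconds
    dpd_enabled: bool  # Dead Peer Detection on/off
    nat_t: bool        # NAT traversal on/off


def find_mismatches(local: TunnelEnd, remote: TunnelEnd) -> list[str]:
    """Return the names of the settings that differ between the two ends."""
    return [
        f.name
        for f in fields(TunnelEnd)
        if getattr(local, f.name) != getattr(remote, f.name)
    ]


site_a = TunnelEnd(p1_lifetime=28800, p2_lifetime=3600, dpd_enabled=True, nat_t=False)
site_b = TunnelEnd(p1_lifetime=28800, p2_lifetime=86400, dpd_enabled=True, nat_t=False)
print(find_mismatches(site_a, site_b))  # ['p2_lifetime']
```

An empty list only means the fields you chose to model agree; proposals, PFS groups and policy generation still need checking by hand.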

                      There isn't enough info here on any of the reported issues to troubleshoot, and every issue is likely a different cause, so if you're having issues please start your own thread with specifics - IPsec logs from both sides in particular.

                      Zeon - this one's your thread, post your IPsec logs from the other end. The bit shown here just shows one end renegotiated successfully.

                        jmarquez

                        Don't get me wrong cmb.

                        I'm really happy using pfSense. I think it is a great piece of code.
                        I agree with you that every person's IPsec issue is different. My issue is similar to the ones reported in this thread only in that the tunnels drop randomly.

                        In my case, I followed the steps from the thread Zeon linked (http://forum.pfsense.org/index.php/topic,41617.0.html) and the tunnels have not dropped so far.

                        All the best.

                          Zeon

                          Hi CMB,
                          Firstly, I can say that after a few days with DPD and NAT-T disabled I have had no further dropouts and couldn't be happier. This is true across 6 separate tunnels, with latency ranging from 1ms to 30ms (throughput of the internet connections is anywhere between 30 Mbps and 100 Mbps).

                          Unfortunately I don't have the logs of the problem anymore, but I will try to recreate them one weekend for the benefit of the other users here.

                          Out of interest, when is DPD needed? I have had situations where I knocked a cable out for up to 10 seconds and the tunnel still seemed to work fine once I plugged it back in.

                            cmb

                            @Zeon:

                            Firstly, I can say that after a few days with DPD and NAT-T disabled I have had no further dropouts and couldn't be happier. This is true across 6 separate tunnels, with latency ranging from 1ms to 30ms (throughput of the internet connections is anywhere between 30 Mbps and 100 Mbps).

                            Disabling NAT-T where you don't need it is a good thing to do. For DPD, as long as it's enabled on both sides with the same settings you should be good. That's what we use on all ours internally.

                            @Zeon:

                            Unfortunately I don't have the logs of the problem anymore, but I will try to recreate them one weekend for the benefit of the other users here.

                            Out of interest, when is DPD needed? I have had situations where I knocked a cable out for up to 10 seconds and the tunnel still seemed to work fine once I plugged it back in.

                            DPD helps in circumstances where one end drops an SA and the other end doesn't recognize that the SA is no longer valid; without it you'd have to force a restart of one or both ends. That may be a reboot on one side or the other (primarily an unplanned one like a power outage or yanking the plug; an orderly reboot should tell the other end to clear it), or an IP change on one of the sides where there are dynamic WANs. Those are the two most common cases I can think of offhand. Just knocking a cable out for a few seconds, or even minutes, is no big deal, unless you happen to get a new IP when it's reconnected (with dynamic WANs, the link coming up will force a reconnect to your ISP, which with some ISPs will get you a new IP). If you still have the same IP, the existing SA is still valid and will work fine.
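A rough model of the behaviour described above, assuming RFC 3706-style DPD in which a peer is declared dead only after a configured number of consecutive unanswered R-U-THERE probes. The function names are hypothetical, and the timing formula is an approximation, not racoon's exact scheduling. Using the values from a later post in this thread (10 s detection delay, 5 failures) and an assumed 5 s retry interval, the worst case comes out around 35 seconds, which is consistent with a 10-second cable pull not tearing the tunnel down on its own.

```python
from __future__ import annotations


def declared_dead_at(acks: list[bool], max_failures: int) -> int | None:
    """Return the index of the probe at which the peer is declared dead, or None.

    acks[i] is True if R-U-THERE probe i was answered with an R-U-THERE-ACK.
    The peer is only declared dead after `max_failures` consecutive misses.
    """
    consecutive = 0
    for i, acked in enumerate(acks):
        consecutive = 0 if acked else consecutive + 1
        if consecutive >= max_failures:
            return i
    return None


def rough_detection_time(delay: int, retry_interval: int, max_failures: int) -> int:
    """Approximate worst case in seconds: idle delay before the first probe,
    plus one retry interval per allowed failure (an assumption, not racoon's timer code)."""
    return delay + retry_interval * max_failures


# A short outage: two probes go unanswered, then the link comes back.
print(declared_dead_at([False, False, True], max_failures=5))  # None
print(rough_detection_time(delay=10, retry_interval=5, max_failures=5))  # 35
```

The consecutive-miss counter is the key design point: a single lost probe resets nothing permanent, so only a sustained outage (or a peer that really lost its SA) triggers a teardown.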

                              maldex

                              Stumbling across this thread reminds me of the same issue I had a while ago, including against Astaro 8.2; quite annoying.
                              I wouldn't swear to this, but cross-checking my config now, one of the configuration leftovers from the performance tests we did quite a while ago (< v2.01) is that we're using Blowfish in Phase 1 now. It never happened again, so I completely forgot about it. I'm now using my 2.01 (dyn-IP) against both pfSense 2.01 (also dyn-IP) and Astaro V8.3 (fixed IP):

                              • All have public IPs (no NAT involved, NAT Traversal disabled)
                              • Default Mutual PSK, Main mode (btw, I thought this couldn't work with IPsec by definition? Well done!!! :)), My & Peer IP Address, Default Policy Gen. and Proposal Checking.
                              • Phase1:
                                – Encryption algorithm:  Blowfish 256
                                – Hash algorithm: SHA1
                                – DH key group: 5   and  Lifetime: 86400
                                – DPD: Enabled, 10 Detection and 5 retries
                              • Phase2:
                                – Encryption algorithms: AES 256  (Only this, no other proposal)
                                – Hash algorithms: MD5  (Only this, no other proposal)
                                – PFS key group: 5 and  Lifetime: 86400.  Auto Ping remote Host is set

                              Yes, phase 1 and phase 2 don't use the same encryption and hashing, but even the tunnel with two phase 2 entries is stable now. Sorry, I can't provide more details.

                              I'll let you guys know if i encounter a 'stalled' vpn again.

                              cheers
                              Josh

                                boogieshafer

                                On the pfSense side, try setting the P1 Policy Generation to "unique".

                                I was having similar issues with subsequent reconnects from the Shrew client, where restarting the pfSense IPsec process would clear the issue.

                                I did NOT need to disable NAT-T or DPD; changing the P1 Policy Generation setting from "default" to "unique" was the only change I made.

                                  dhatz

                                  It seems that several people are reporting IPsec VPN issues with pfsense 2.x (note: which includes the recent ipsec-tools 0.8.0). While some problems may be due to misconfiguration (e.g. the racoon / mpd conflict), the pfsense<->pfsense VPN scenario should be trouble-free.

                                  As most of the problems posted here seem to be related to rekeying, I've been searching the ipsec-tools-devel mailing list archives for clues. Check the following discussions:

                                  http://old.nabble.com/why-is-SA-lifetime-kilobyte-limit-disabled-in-racoon–td31648198.html

                                  Even if Node-A thinks the IPsec-SA is expired at this time, Node-B doesn't
                                  think so, i.e. the IPsec-SA states are mismatched.

                                  Understand – similar things already happen with time-based
                                  lifetimes if there is a clock skew between the two boxes.
                                  (This is particulary bad if the oldest available SA is used
                                  by the kernel.)

                                  Racoon's strategy of rekeying is "Initiator do it." If Node-B
                                  is responder, Node-A doesn't start rekeying even if IPsec-SA is
                                  expired.
                                  That sounds like a bug in racoon.  It seems that if either end is
                                  unsatisfied with the SA, that end should trigger a new one.

                                  I'd also call this a shortcoming at least. The standards are
                                  weak, and one doesn't know how other implementations behave.
                                  It would be safer if both sides did care about renegotiations.

                                  But the key
                                  question is what the other implementions do, and what the standard says.

                                  I've just tried OpenBSD's isakmpd (the oldish version in pkgsrc).
                                  It initiates a Phase 2 exchange if the soft timeout on its
                                  side expires, even if it was responder initially. (It randomizes
                                  the soft timeouts to minimize the chance that both sides start
                                  the exchange simultaneously.)
                                  RFC 2409 says that both sides can initiate rekeying. "Can" --
                                  this is not much of a guideline for implementors.

                                  I can see the argument that especially with a 24h or less
                                  lifetime, AES doesn't need volume-based rekeying.

                                  OK, I was more concerned about interoperability. What if
                                  the other side insists on some volume limit?

                                  True, but it seems the original responder initiating a renegotiation is
                                  the only reasonable behavior.

                                  At the very least, it would appear to suggest that if the original
                                  initiator rejects an attempt on the part of the original responder to
                                  rekey, that's a bug.
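The disagreement above boils down to who is allowed to start a rekey. The sketch below contrasts racoon's "Initiator do it" strategy with an isakmpd-style one where either side rekeys when its own, randomized soft lifetime expires. It is a toy model: the 80-90% soft-lifetime range is an assumption, and the mismatched hard lifetimes (3600 s vs 1800 s) are a hypothetical example of the P1/P2 lifetime mismatch mentioned earlier in the thread.

```python
from __future__ import annotations

import random


def first_rekey(hard_lifetimes: dict[str, float], initiator: str,
                either_side: bool, rng: random.Random,
                soft_range: tuple[float, float] = (0.80, 0.90)) -> tuple[str, float]:
    """Return (side, time) of the first rekey attempt for an SA created at t=0.

    Each side's soft lifetime is a random fraction of its own hard lifetime
    (the range is an assumption). With either_side=False only the original
    initiator may rekey, mirroring racoon's "Initiator do it".
    """
    lo, hi = soft_range
    soft = {side: life * rng.uniform(lo, hi) for side, life in hard_lifetimes.items()}
    candidates = soft if either_side else {initiator: soft[initiator]}
    side = min(candidates, key=candidates.get)
    return side, candidates[side]


def has_gap(hard_lifetimes: dict[str, float], rekey_time: float) -> bool:
    """True if either side's SA hard-expires before the first rekey starts."""
    return min(hard_lifetimes.values()) < rekey_time


lifetimes = {"A": 3600.0, "B": 1800.0}  # hypothetical mismatch; A was the initiator
rng = random.Random(42)
for either in (False, True):
    side, t = first_rekey(lifetimes, "A", either, rng)
    print(either, side, round(t), has_gap(lifetimes, t))
```

With initiator-only rekeying, B's SA hard-expires at 1800 s while A waits until roughly 2900-3200 s to rekey, leaving a window in which A still sends on an SA that B has already discarded. Letting either side rekey closes that window in this model.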

                                  If both sides start rekeying at the same time, there is/was a problem of
                                  SA selection.

                                  The two rekeying sessions make two pairs of IPsec-SAs. racoon can
                                  do this, and IPsec implementations (kernel side) do one of the following:

                                  a. Use oldest IPsec-SA to send and keep all IPsec-SAs to receive(KAME)
                                  b. Use newest IPsec-SA to send and keep all IPsec-SAs to receive(Fast IPsec)
                                  c. Use newest IPsec-SA to send/receive and purge older IPsec-SAs

                                  Of course, c. is bad behavior, but small implementations (kernel side)
                                  may handle only one session and one key pair at a time.
                                  Standards don't prohibit this. This problem exists between the IKE
                                  standards and the IPsec standards. It seems IKEv2 makes this cleaner.

                                  Today, most implementations select b. or have configuration for it.
                                  And racoon isn't used on other than KAME, Fast IPsec, or Linux(a. or b.)
                                  I think your logic actually works fine. But racoon is an old product,
                                  so it doesn't keep up with recent trends.
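The three kernel behaviours (a, b, c) listed above can be modelled in a few lines. In this simplified sketch both peers hold the same SAs in the same order, and one SPI stands for one SA as seen by both ends; real SAs are unidirectional, so this only illustrates why option c can drop traffic after a rekey collision. The names are hypothetical.

```python
from __future__ import annotations

from enum import Enum


class Selection(Enum):
    OLDEST = "a"       # send on oldest SA, accept on all (KAME)
    NEWEST = "b"       # send on newest SA, accept on all (Fast IPsec)
    NEWEST_ONLY = "c"  # send/receive on newest SA, purge the older ones


def outbound_spi(sas: list[int], sel: Selection) -> int:
    """SPI used to send, given SAs ordered oldest to newest."""
    return sas[0] if sel is Selection.OLDEST else sas[-1]


def accepted_spis(sas: list[int], sel: Selection) -> set[int]:
    """SPIs still accepted for inbound traffic."""
    return {sas[-1]} if sel is Selection.NEWEST_ONLY else set(sas)


def traffic_passes(sender: Selection, receiver: Selection, sas: list[int]) -> bool:
    """Does the SA the sender picks still exist on the receiver?"""
    return outbound_spi(sas, sender) in accepted_spis(sas, receiver)


sas = [100, 200, 300]  # hypothetical SPIs: the old SA plus two from a simultaneous rekey
print(traffic_passes(Selection.OLDEST, Selection.NEWEST_ONLY, sas))  # False
print(traffic_passes(Selection.NEWEST, Selection.NEWEST_ONLY, sas))  # True
```

A KAME-style sender paired with a purge-on-rekey receiver is the combination that silently blackholes traffic while both ends still report an established tunnel.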

                                  http://marc.info/?l=ipsec-tools-devel&m=129905181832157&w=2
                                  http://marc.info/?l=ipsec-tools-devel&m=129916127621017&w=2

                                  let me revive the discussion on an active negotiation,
                                  as opposed to a passive daemon. Until recently my use
                                  of IPsec was tied to isakmpd, ipsecctl, and OpenBSD
                                  and my views are conditioned by this fact. There the
                                  IPsec daemon is normally active in initiating its
                                  negotiations at startup, unless told to configure
                                  a passive listener for a particular tunnel/transport.
                                  At the other extreme there is even a so called
                                  active-only setting.

                                  The implicit and default setting in racoon-0.7.3 is
                                  "passive off", but this still waits for a demand to be
                                  detected. Thus the mode is better described as "passive
                                  until harshly bugged to get going"! The need to ping
                                  and wait for a ridiculously long delay should not be
                                  acceptable in most circumstances. Forgive me for the
                                  criticism, but to me this is a design flaw. It is a
                                  question of dependability and of trust to erect the
                                  desired IPsec tunnels already at booting time.

                                  Funny: when we tried to switch from racoon to isakmpd at work, a long
                                  long time ago, this is one of the things we noticed on our TODO list:
                                  patch isakmpd to negotiate SAs only when traffic comes to the tunnel :-)

                                  And this is how things should (can?) be done according to RFC 2367,
                                  which provides the SADB_ACQUIRE PF_KEY message.

                                  Now, doing comparative browsing in the sources 0.7.3
                                  and 0.8, the actual use of the variable PASSIVE in
                                  "struct remoteconf" has indeed expanded somewhat.
                                  Is the code progressing or maturing into a state
                                  that allows an actively negotiating daemon? I.e.,
                                  without waiting for traffic demand before commencing?

                                  Not afaik.
                                  Feel free to provide a patch for that; it would not be so
                                  complicated to parse all the config and start negotiation for the needed
                                  tunnels, but there are also setups where we want tunnels
                                  negotiated only when needed (i.e. when traffic comes to the tunnel), so
                                  a patch will need to provide this feature as an option.
                                  The best would be to have a peer-based (or sainfo based ?) token for
                                  that.
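A toy model of the per-peer token suggested above: peers flagged for active negotiation start phase 1 at startup, while the others wait until the kernel signals outbound traffic with no SA (the RFC 2367 SADB_ACQUIRE message). `Peer`, `IkeDaemon` and `initiate_at_startup` are hypothetical names for illustration, not racoon options, and the addresses are documentation examples.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Peer:
    address: str
    initiate_at_startup: bool  # hypothetical per-peer token, as proposed above


@dataclass
class IkeDaemon:
    """Toy model, not racoon's code: which peers get a phase 1 started, and when."""
    peers: list[Peer]
    negotiating: set[str] = field(default_factory=set)

    def start(self) -> None:
        # Active peers: begin phase 1 immediately, without waiting for traffic.
        for peer in self.peers:
            if peer.initiate_at_startup:
                self.negotiating.add(peer.address)

    def on_acquire(self, address: str) -> bool:
        """Kernel saw outbound traffic with no SA (an SADB_ACQUIRE).

        Returns True if this starts a new negotiation.
        """
        if address in self.negotiating:
            return False
        self.negotiating.add(address)
        return True


daemon = IkeDaemon([Peer("192.0.2.1", True), Peer("198.51.100.1", False)])
daemon.start()
print(sorted(daemon.negotiating))  # ['192.0.2.1']
```

Keeping the on-demand path as the default and making active startup opt-in per peer matches the constraint in the reply: setups that want tunnels only when traffic appears keep working unchanged.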

                                  Please also note that it is quite easy to generate dummy
                                  traffic for the needed tunnels when you activate the configuration, if
                                  you want.
                                  And of course generate dummy traffic from time to time to ensure the
                                  tunnel will always be up.
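The dummy-traffic workaround can be sketched as a small sender that periodically fires a UDP datagram at a host on the far side of the tunnel, in the same spirit as the "Auto Ping remote Host" setting mentioned earlier in the thread. This assumes the script runs on a machine inside the local LAN so that its packets match the tunnel's phase 2 selectors; the target host, port, count and interval are placeholders to choose yourself.

```python
import socket
import time


def send_keepalives(host: str, port: int, count: int, interval: float) -> int:
    """Send `count` small UDP datagrams to host:port, `interval` seconds apart.

    Run this from a machine on the local LAN aimed at a host on the remote LAN,
    so the packets match the tunnel's phase 2 selectors and either trigger
    negotiation or keep the existing SA in use. Returns the number sent.
    """
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for i in range(count):
            sock.sendto(b"keepalive %d" % i, (host, port))
            sent += 1
            if i < count - 1:
                time.sleep(interval)
    return sent
```

For example, `send_keepalives("192.168.1.10", 9, count=60, interval=60.0)` would send one datagram a minute for an hour to a hypothetical remote-LAN host on the discard port; no reply is needed, because it is the outbound packet that exercises the tunnel.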

                                  Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.