
    IPsec fails to renegotiate after loss of a peer

    IPsec
    • jimp
      jimp Rebel Alliance Developer Netgate last edited by

      I'm starting this thread to consolidate the information available about IPsec tunnels that fail and won't reestablish until racoon (or the router) is restarted.

      This happens on 1.2.2 and 1.2.3 (and 2.0), and with ipsec-tools 0.7.1 and 0.7.2. This seems to happen regardless of the 'DPD' option for most people, and it happens when connecting to all manner of devices (Other pfSense boxes, Cisco VPN concentrators, SonicWALLs, Fireboxes, &c).

      What appears to happen is that racoon sees the peer go away, and removes the ISAKMP-SA/Phase 1 information for the tunnel, leaving the Phase 2 info present. Once that happens, it never attempts to reestablish Phase 1.

      Here is a debug log from racoon that shows the DPD failure, the SAD entries that are still present afterward, and the orphaned Phase 2 entries as racoon is stopped:

      .40 is the pfSense 1.2.3 box, .49 is a Cisco VPN concentrator.

      2009-05-14 15:42:47: DEBUG: DPD R-U-There sent (0)
      2009-05-14 15:42:47: DEBUG: rescheduling send_r_u (5).
      2009-05-14 15:42:52: DEBUG: DPD monitoring....
      2009-05-14 15:42:52: INFO: DPD: remote (ISAKMP-SA spi=4e71454083f1bb01:cb7350a8a507655d) seems to be dead.
      2009-05-14 15:42:52: DEBUG: purging ISAKMP-SA spi=4e71454083f1bb01:cb7350a8a507655d.
      2009-05-14 15:42:52: DEBUG2: getph1byaddr: start
      2009-05-14 15:42:52: DEBUG2: local: x.x.x.40[500]
      2009-05-14 15:42:52: DEBUG2: remote: x.x.x.49[500]
      2009-05-14 15:42:52: DEBUG2: no match
      2009-05-14 15:42:52: DEBUG: call pfkey_send_dump
      2009-05-14 15:42:52: DEBUG: pk_recv: retry[0] recv() 
      2009-05-14 15:42:52: DEBUG: pk_recv: retry[0] recv() 
      2009-05-14 15:42:52: DEBUG: purged ISAKMP-SA spi=4e71454083f1bb01:cb7350a8a507655d.
      2009-05-14 15:42:53: INFO: ISAKMP-SA deleted x.x.x.40[500]-x.x.x.49[500] spi:4e71454083f1bb01:cb7350a8a507655d
      2009-05-14 15:42:53: DEBUG: IV freed
      2009-05-14 15:44:37: DEBUG: msg 1 not interesting
      2009-05-14 15:57:40: DEBUG: msg 1 not interesting
      2009-05-14 16:01:54: DEBUG: msg 1 not interesting
      (In another terminal)
      # setkey -D
      x.x.x.40 x.x.x.49 
              esp mode=any spi=1523277686(0x5acb5f76) reqid=16399(0x0000400f)
              E: 3des-cbc  1b07ff4e 1ee98430 affd28f5 95acadb2 b5aed137 acb9e956
              A: hmac-sha1  72f4ee83 ed2e3bc4 0dd28778 317afdbf 780a4074
              seq=0x0000000d replay=4 flags=0x00000000 state=mature 
              created: May 14 15:41:27 2009   current: May 14 15:45:39 2009
              diff: 252(s)    hard: 86400(s)  soft: 69120(s)
              last: May 14 15:43:06 2009      hard: 0(s)      soft: 0(s)
              current: 1768(bytes)    hard: 0(bytes)  soft: 0(bytes)
              allocated: 13   hard: 0 soft: 0
              sadb_seq=1 pid=25748 refcnt=2
      x.x.x.49 x.x.x.40 
              esp mode=tunnel spi=124083230(0x07655c1e) reqid=16400(0x00004010)
              E: 3des-cbc  cd212cc9 ca1da6a2 e3190c8d 35d32612 e6a34822 c57f0258
              A: hmac-sha1  c8e57b53 2a12b167 10ecf94f 278ff656 6c92872c
              seq=0x00000005 replay=4 flags=0x00000000 state=mature 
              created: May 14 15:41:27 2009   current: May 14 15:45:39 2009
              diff: 252(s)    hard: 86400(s)  soft: 69120(s)
              last: May 14 15:41:37 2009      hard: 0(s)      soft: 0(s)
              current: 520(bytes)     hard: 0(bytes)  soft: 0(bytes)
              allocated: 5    hard: 0 soft: 0
              sadb_seq=0 pid=25748 refcnt=1
      (Back to the log)
      2009-05-14 16:10:48: INFO: caught signal 2
      2009-05-14 16:10:48: DEBUG: pk_recv: retry[0] recv() 
      2009-05-14 16:10:48: DEBUG: get pfkey FLUSH message
      2009-05-14 16:10:48: DEBUG2: 
      02090000 02000000 00000000 f9600000
      2009-05-14 16:10:48: DEBUG2: flushing all ph2 handlers...
      2009-05-14 16:10:48: DEBUG2: got a ph2 handler to flush...
      2009-05-14 16:10:48: DEBUG2: getph1byaddr: start
      2009-05-14 16:10:48: DEBUG2: local: x.x.x.40[500]
      2009-05-14 16:10:48: DEBUG2: remote: x.x.x.49[500]
      2009-05-14 16:10:48: DEBUG2: no match
      2009-05-14 16:10:48: DEBUG2: No ph1 handler found, could not send DELETE_SA
      2009-05-14 16:10:48: DEBUG: an undead schedule has been deleted.
      2009-05-14 16:10:48: DEBUG: IV freed
      2009-05-14 16:10:49: DEBUG: call pfkey_send_dump
      2009-05-14 16:10:49: DEBUG: pk_recv: retry[0] recv() 
      2009-05-14 16:10:49: INFO: racoon shutdown
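
To spot this pattern in a longer log, a minimal Python sketch follows. It assumes the message wording seen in the excerpts in this thread ("purging ISAKMP-SA", "purged IPsec-SA", "purged ISAKMP-SA"); other racoon versions or log levels may word things differently, and the function name is purely illustrative.

```python
import re

# Patterns are based only on the log excerpts quoted in this thread (assumption).
PURGING_PH1 = re.compile(r"purging ISAKMP-SA spi=([0-9a-f]+:[0-9a-f]+)")
PURGED_PH1 = re.compile(r"purged ISAKMP-SA spi=([0-9a-f]+:[0-9a-f]+)")
PURGED_PH2 = re.compile(r"purged IPsec-SA spi=(\d+)")


def find_orphaning_purges(log_lines):
    """Return ISAKMP-SA SPIs whose purge removed no IPsec-SA (Phase 2) entries."""
    orphaned = []
    ph2_purged = {}  # ISAKMP-SA SPI -> IPsec-SA purges seen while it was being purged
    for line in log_lines:
        started = PURGING_PH1.search(line)
        finished = PURGED_PH1.search(line)
        if started:
            ph2_purged[started.group(1)] = 0
        elif PURGED_PH2.search(line):
            for spi in ph2_purged:
                ph2_purged[spi] += 1
        elif finished:
            # A purge that completed without touching any Phase 2 entry is suspect.
            if ph2_purged.pop(finished.group(1), 0) == 0:
                orphaned.append(finished.group(1))
    return orphaned
```

Fed the DPD portion of the log above, this would report `4e71454083f1bb01:cb7350a8a507655d`, since no IPsec-SA purge appears between the "purging" and "purged" lines.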
      

      Also noteworthy is that a tcpdump of a session in the "failed" state where pfSense believes it is still active does show that the remote side is sending an "invalid SPI" reply any time traffic attempts to traverse the tunnel. It seems like this invalid SPI message is either ignored by, or not seen by, racoon.

      I have heard that there is discussion of a possible fix on the ipsec-tools-devel list, but nothing concrete yet.

      If you have more information or similar experiences to share, feel free to post them.

      Here are some related forum threads (I'm sure there are more):
      http://forum.pfsense.org/index.php/topic,15703.0.html
      http://forum.pfsense.org/index.php/topic,15678.0.html
      http://forum.pfsense.org/index.php/topic,10371.0.html
      http://forum.pfsense.org/index.php/topic,11851.0.html
      http://forum.pfsense.org/index.php/topic,15638.0.html
      http://forum.pfsense.org/index.php/topic,16086.0.html

      Remember: Upvote with the 👍 button for any user/post you find to be helpful, informative, or deserving of recognition!

      Need help fast? Netgate Global Support!

      Do not Chat/PM for help!

      • F
        fastcon68 last edited by

        I have seen something similar with one of my VPN tunnels.  It worked for two months with no problem.  The other end rebooted and so did I and now we can't connect.  I get phase 1 time issues with this one brand of firewall.  I can't remember what the brand is.

        It's a pain since I help them with their XenServer.
        RC

        • jimp
          jimp Rebel Alliance Developer Netgate last edited by

          @fastcon68:

          I have seen something similar with one of my VPN tunnels.  It worked for two months with no problem.  The other end rebooted and so did I and now we can't connect.  I get phase 1 time issues with this one brand of firewall.  I can't remember what the brand is.

          That may be a different problem. The problem I am seeing clears up once the pfSense side that did not drop is restarted, or racoon on that side is restarted.

          • F
            focalguy last edited by

            I'm not sure if the issues I've been having fit with this issue. Mine will come back up eventually, but it's hard to tell when. Maybe 2 minutes, maybe 8 hours later. I cannot restart the one at the remote location, so I have to just wait for it to re-establish the connection with my main router. They are both pfSense 1.2.2.

            It's kinda funny that just when I read this thread I went to check the last time that connectivity went down and it was down right at that moment for about 2 minutes to this specific site.

            • E
              Eugene last edited by

              It seems the SAD entries you have (setkey -D) after the tunnel fails are different from those that were used when this tunnel was established (your log), no? I see different SPIs.
              I suspect the tunnel goes up again as soon as you setkey -F ?

              http://ru.doc.pfsense.org

              • jimp
                jimp Rebel Alliance Developer Netgate last edited by

                @Eugene:

                It seems the SAD entries you have (setkey -D) after the tunnel fails are different from those that were used when this tunnel was established (your log), no? I see different SPIs.
                I suspect the tunnel goes up again as soon as you setkey -F ?

                The SPIs shown in the SAD entries are different than the SPI shown for the ISAKMP-SA which racoon deletes. They're not the same thing, as you can see from the log when it connects:

                May 20 12:53:15 pfsense-123test racoon: INFO: IPsec-SA request for x.x.x.49 queued due to no phase1 found.
                May 20 12:53:15 pfsense-123test racoon: INFO: initiate new phase 1 negotiation: x.x.x.40[500]<=>x.x.x.49[500]
                May 20 12:53:15 pfsense-123test racoon: INFO: begin Aggressive mode.
                May 20 12:53:16 pfsense-123test racoon: INFO: received Vendor ID: CISCO-UNITY
                May 20 12:53:16 pfsense-123test racoon: INFO: received Vendor ID: draft-ietf-ipsra-isakmp-xauth-06.txt
                May 20 12:53:16 pfsense-123test racoon: INFO: received Vendor ID: DPD
                May 20 12:53:16 pfsense-123test racoon: NOTIFY: couldn't find the proper pskey, try to get one by the peer's address.
                May 20 12:53:16 pfsense-123test racoon: INFO: ISAKMP-SA established x.x.x.40[500]-x.x.x.49[500] spi:1a985545ad4bdc71:345805a19d6bbb0e
                May 20 12:53:16 pfsense-123test racoon: INFO: initiate new phase 2 negotiation: x.x.x.40[500]<=>x.x.x.49[500]
                May 20 12:53:16 pfsense-123test racoon: ERROR: unknown Informational exchange received.
                May 20 12:53:16 pfsense-123test racoon: ERROR: unknown Informational exchange received.
                May 20 12:53:16 pfsense-123test racoon: WARNING: ignore RESPONDER-LIFETIME notification.
                May 20 12:53:16 pfsense-123test racoon: INFO: IPsec-SA established: ESP x.x.x.49[0]->x.x.x.40[0] spi=208784171(0xc71cb2b)
                May 20 12:53:16 pfsense-123test racoon: INFO: IPsec-SA established: ESP x.x.x.40[500]->x.x.x.49[500] spi=28110911(0x1acf03f)
                

                Note that the ISAKMP-SA has one set of SPIs and the IPsec-SA has another, different set. If I check setkey right now, the IPsec-SA SPIs match up:

                pfsense-123test# setkey -D
                x.x.x.40 x.x.x.49 
                        esp mode=any spi=28110911(0x01acf03f) reqid=16389(0x00004005)
                        E: 3des-cbc  dc09605c 0317db13 830e984a 533e9c2c edfbf6dc 8695f896
                        A: hmac-sha1  dddfdfef 52667b41 2174ebb4 217ae230 d458a5a7
                        seq=0x00000026 replay=4 flags=0x00000000 state=mature 
                        created: May 20 12:53:16 2009   current: May 20 12:56:21 2009
                        diff: 185(s)    hard: 86400(s)  soft: 69120(s)
                        last: May 20 12:54:51 2009      hard: 0(s)      soft: 0(s)
                        current: 5168(bytes)    hard: 0(bytes)  soft: 0(bytes)
                        allocated: 38   hard: 0 soft: 0
                        sadb_seq=1 pid=14572 refcnt=2
                x.x.x.49 x.x.x.40 
                        esp mode=tunnel spi=208784171(0x0c71cb2b) reqid=16390(0x00004006)
                        E: 3des-cbc  2c130655 2b2d2a16 0333993b dff3264d f7251968 f2c3565b
                        A: hmac-sha1  34236785 b00b9344 19b671bd a5d710f9 ff1222f6
                        seq=0x00000026 replay=4 flags=0x00000000 state=mature 
                        created: May 20 12:53:16 2009   current: May 20 12:56:21 2009
                        diff: 185(s)    hard: 86400(s)  soft: 69120(s)
                        last: May 20 12:54:51 2009      hard: 0(s)      soft: 0(s)
                        current: 3952(bytes)    hard: 0(bytes)  soft: 0(bytes)
                        allocated: 38   hard: 0 soft: 0
                        sadb_seq=0 pid=14572 refcnt=1
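
The comparison above can also be done mechanically. The following is a rough Python sketch, assuming the `spi=<decimal>(0x<hex>)` form that appears in both the racoon log and the `setkey -D` output quoted here; the function names are made up for illustration. It compares the decimal values, because the hex form is zero-padded in one place and not in the other (`0xc71cb2b` vs `0x0c71cb2b`).

```python
import re

# "spi=<decimal>(0x<hex>)" as seen in the excerpts in this thread (assumption).
# ISAKMP-SA SPIs ("spi:<hex>:<hex>") deliberately do not match this pattern.
SPI = re.compile(r"\bspi=(\d+)\(0x[0-9a-fA-F]+\)")


def ipsec_spis_from_log(log_lines):
    """Collect IPsec-SA (Phase 2) SPIs from 'IPsec-SA established' log lines."""
    spis = set()
    for line in log_lines:
        if "IPsec-SA established" in line:
            match = SPI.search(line)
            if match:
                spis.add(int(match.group(1)))
    return spis


def spis_from_sad_dump(dump_lines):
    """Collect SPIs from the 'esp mode=... spi=...' lines of a setkey -D dump."""
    spis = set()
    for line in dump_lines:
        match = SPI.search(line)
        if match:
            spis.add(int(match.group(1)))
    return spis
```

If `ipsec_spis_from_log(log) == spis_from_sad_dump(dump)`, the SAD holds exactly the SAs racoon last negotiated; SPIs present only in the dump would be candidates for stale entries.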
                

                • jimp
                  jimp Rebel Alliance Developer Netgate last edited by

                  @Eugene:

                  I suspect the tunnel goes up again as soon as you setkey -F ?

                  That does bring the tunnel back up, but it's still a manual process for something that should be happening automatically… It's easier than restarting things though.
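
For anyone who wants to script that workaround in the meantime, here is a minimal Python sketch. It assumes the caller has already worked out whether a Phase 1 exists and how many SAD entries are present (for example from the racoon log and `setkey -D`); `flush_if_orphaned` and its parameters are hypothetical names, not part of pfSense or ipsec-tools. Keep in mind that `setkey -F` flushes every SAD entry on the box, not just the ones for the broken tunnel.

```python
import subprocess


def flush_if_orphaned(phase1_up, sad_entry_count, run=subprocess.run):
    """Flush the SAD when IPsec-SAs exist but no Phase 1 does.

    phase1_up and sad_entry_count must be supplied by the caller; how they are
    obtained is outside this sketch. Returns True if a flush was issued.
    """
    if phase1_up or sad_entry_count == 0:
        return False  # nothing orphaned, leave the SAD alone
    # setkey -F flushes ALL SAD entries, so other tunnels renegotiate too.
    run(["setkey", "-F"], check=True)
    return True
```

The `run` parameter exists only so the function can be exercised without touching a live system; calling it with the default runs the real `setkey` and needs root.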

                  • K
                    kapara last edited by

                    This is the version I am running on my Alix Box.

                    1.2.2
                    built on Thu Jan 8 23:09:11 EST 2009

                    Skype ID:  Marinhd

                    • jimp
                      jimp Rebel Alliance Developer Netgate last edited by

                      Great news everybody, it looks like this has been fixed in the development version of ipsec-tools!

                      I just got ipsec-tools 0.8-alpha20090525+natt to work on my 2.0 test box and things are working as they should.

                      Previously, DPD was removing the ISAKMP-SA, but not the IPsec-SA that went along with it. Now it appears to be clearing them all out.

                      Now this is what I'm seeing:

                      The connection establishes:

                      2009-05-28 12:51:17: INFO: respond new phase 1 negotiation: x.x.x.41[500]<=>x.x.x.40[500]
                      2009-05-28 12:51:17: INFO: begin Aggressive mode.
                      2009-05-28 12:51:17: INFO: received broken Microsoft ID: FRAGMENTATION
                      2009-05-28 12:51:17: INFO: received Vendor ID: DPD
                      2009-05-28 12:51:17: NOTIFY: couldn't find the proper pskey, try to get one by the peer's address.
                      2009-05-28 12:51:17: INFO: ISAKMP-SA established x.x.x.41[500]-x.x.x.40[500] spi:d75d671612ae7e75:07456176d8b6652c
                      2009-05-28 12:51:17: INFO: received INITIAL-CONTACT
                      2009-05-28 12:51:18: INFO: respond new phase 2 negotiation: x.x.x.41[500]<=>x.x.x.40[500]
                      2009-05-28 12:51:18: INFO: IPsec-SA established: ESP x.x.x.41[500]->x.x.x.40[500] spi=118325718(0x70d81d6)
                      2009-05-28 12:51:18: INFO: IPsec-SA established: ESP x.x.x.41[500]->x.x.x.40[500] spi=224293038(0xd5e70ae)
                      

                      And then when I unplug the cable:

                      2009-05-28 12:52:22: INFO: DPD: remote (ISAKMP-SA spi=d75d671612ae7e75:07456176d8b6652c) seems to be dead.
                      2009-05-28 12:52:22: INFO: purging ISAKMP-SA spi=d75d671612ae7e75:07456176d8b6652c.
                      2009-05-28 12:52:22: INFO: purged IPsec-SA spi=224293038.
                      2009-05-28 12:52:22: INFO: purged IPsec-SA spi=118325718.
                      2009-05-28 12:52:22: INFO: purged ISAKMP-SA spi=d75d671612ae7e75:07456176d8b6652c.
                      2009-05-28 12:52:23: INFO: ISAKMP-SA deleted x.x.x.41[500]-x.x.x.40[500] spi:d75d671612ae7e75:07456176d8b6652c
                      

                      And at that point, setkey -D shows nothing in the SA database, which is miles ahead of what I saw previously.

                      Since I compiled it on an 8-CURRENT box I can't get that same set of ipsec-tools binaries to run on a 1.2.3-RC system. Once I get that going I can confirm it works on my other test cases. I'll have to hunt down another box to (ab)use for more testing, but this looks very promising.

                      • F
                        focalguy last edited by

                        Great news jimp! Thanks for continuing to work on this.

                        • D
                          drees last edited by

                          Awesome!  Hopefully this fix can make it into the 1.2.3 release…

                          • jimp
                            jimp Rebel Alliance Developer Netgate last edited by

                            The test version of ipsec-tools should be making its way into the snapshots fairly soon.

                            I'll try to post again once I'm sure it's working in a snapshot so that others can try, too.

                            • jimp
                              jimp Rebel Alliance Developer Netgate last edited by

                              The new ipsec-tools is in the snapshot for 1.2.3-RC1 on FreeBSD 7.2 that can be found here:

                              http://snapshots.pfsense.org/FreeBSD_RELENG_7_2/pfSense_RELENG_1_2/updates/

                              So far with the testing I have been able to perform it reestablishes dropped tunnels perfectly with DPD.

                              I have tested this Full Update:
                              http://snapshots.pfsense.org/FreeBSD_RELENG_7_2/pfSense_RELENG_1_2/updates/pfSense-Full-Update-1.2.3-20090528-2046.tgz

                              And this ISO:
                              http://snapshots.pfsense.org/FreeBSD_RELENG_7_2/pfSense_RELENG_1_2/livecd_installer/pfSense-1.2.3-20090528-2038.iso.gz

                              And the proper ipsec-tools is in both, and appears to work.

                              It should be working its way into the 1.2.3-RC1 based on FreeBSD 7.1 overnight as well.

                              I would appreciate as much testing as anyone can give this. I know this particular bug is fixed but with any change like this there is the potential to break other things. Please test and report back any issues (Especially people who can test NAT-T). Don't be afraid to report apocalyptic failures or anything else that happens.

                              • J
                                Jonb last edited by

                                Whoooooooooo, so glad this has been looked at. This caused me serious problems. Just need the traffic shaper fix now.

                                Hosted desktops and servers with support without complication.
                                www.blueskysystems.co.uk

                                • F
                                  fastcon68 last edited by

                                  Jimp,
                                  Some news for you: if the partner VPN starts a ping, the tunnel reconnects. But if it drops from my end, it will not establish. Thought I'd let you know. I'll try to get some information on the type of firewall they are using.
                                  RC

                                  • jimp
                                    jimp Rebel Alliance Developer Netgate last edited by

                                    @fastcon68:

                                    Jimp,
                                    Some news for you: if the partner VPN starts a ping, the tunnel reconnects. But if it drops from my end, it will not establish. Thought I'd let you know. I'll try to get some information on the type of firewall they are using.
                                    RC

                                    At the moment we're in a holding pattern waiting on ipsec-tools 0.8 to get some fixes in, and I believe that's the way things are going to go.

                                    There are too many of these issues to fix in ipsec-tools 0.7.2 with patches, unfortunately.

                                    • F
                                      fastcon68 last edited by

                                      That's cool. I'm not updating at the moment. I'm more concerned with a stable environment. I can't afford any more downtime.

                                      I am in the process of moving and have a ton of stuff going on.

                                      RC

                                      • R
                                        Rockets last edited by

                                        @jimp:

                                        At the moment we're in a holding pattern waiting on ipsec-tools 0.8 to get some fixes in, and I believe that's the way things are going to go.

                                        There are too many of these issues to fix in ipsec-tools 0.7.2 with patches, unfortunately.

                                        Now that ipsec-tools 0.7.3 is out, any thoughts on using it?

                                        • jimp
                                          jimp Rebel Alliance Developer Netgate last edited by

                                          It was even worse off than 0.8 in some respects. We had to stay at 0.7.2 but drop NAT-T to get some semblance of stability.

                                          • R
                                            Rockets last edited by

                                            Is IPsec renegotiating properly now in 1.2.3-RC3?

                                            • jimp
                                              jimp Rebel Alliance Developer Netgate last edited by

                                              Yes, it is working now as far as all my tests have shown, both in deliberate tests and in running it at home while having some Internet stability issues. It seems to work fine as far as I can tell.

                                              • R
                                                Rockets last edited by

                                                Jimp, where would I find RC3? It's not on the official mirrors - only RC1. Or is RC3 a current snapshot? I'm using embedded.

                                                • jimp
                                                  jimp Rebel Alliance Developer Netgate last edited by

                                                  @Rockets:

                                                  Jimp, where would I find RC3? It's not on the official mirrors - only RC1. Or is RC3 a current snapshot? I'm using embedded.

                                                  It's only in snapshots at the moment, but there will probably be an "official" cut of RC3 (or perhaps RC4?) before release.

                                                  Here are the NanoBSD (new embedded system) snapshots:
                                                  http://snapshots.pfsense.org/FreeBSD_RELENG_7_2/pfSense_RELENG_1_2/nanobsd/?C=M;O=D

                                                  • B
                                                    bkm last edited by

                                                    I am seeing an issue that seems to be the same. I am testing RC 1.2.3 20090924. I have tunnels set up to two separate sites, with a Netopia router on the other end. The tunnels are working when I leave work in the evening. When I get to work in the morning, they are not working. The IPsec status page (SAD) shows that the tunnels are up. If I restart racoon, the tunnel status goes down. I then ping a site and everything gets renegotiated and it works again.
                                                    I currently have a 28800 lifetime for phase 1 and 86400 for phase 2.
                                                    I am willing to test a couple things for a day or two if someone has a suggestion. After that I will need to put my pfsense box into production without the tunnels and I will be limited in what I can try.

                                                    • jimp
                                                      jimp Rebel Alliance Developer Netgate last edited by

                                                      @bkm:

                                                      I am seeing an issue that seems to be the same. I am testing RC 1.2.3 20090924. I have tunnels set up to two separate sites, with a Netopia router on the other end. The tunnels are working when I leave work in the evening. When I get to work in the morning, they are not working. The IPsec status page (SAD) shows that the tunnels are up. If I restart racoon, the tunnel status goes down. I then ping a site and everything gets renegotiated and it works again.
                                                      I currently have a 28800 lifetime for phase 1 and 86400 for phase 2.
                                                      I am willing to test a couple things for a day or two if someone has a suggestion. After that I will need to put my pfsense box into production without the tunnels and I will be limited in what I can try.

                                                      It might help to know more about these tunnels, at least this much: Are they static tunnels or mobile clients? Are they using main mode or aggressive mode? Do you have DPD enabled? Keep Alive? What shows up in the logs when the tunnels are broken?

                                                      And anything else you can think of.

                                                      • F
                                                        fairchild last edited by

                                                        I'm using the latest 2.0 snapshot. Do you recommend leaving DPD enabled? I don't have access to the logs right now so I can't post them, but it appears that when a tunnel goes down because of the Internet connection on my end or the other end, I have to restart the racoon service on both ends for the tunnel to reestablish. This is between two pfSense boxes… I don't even want to get started on my Linksys VPN tunnel issues.

                                                        • C
                                                          cmb last edited by

                                                          @fairchild:

                                                          I'm using the latest 2.0 snapshot

                                                          Don't. That's not going to be stable. Pretty sure the 7.2/2.0 builds still use NAT-T which has renegotiation issues, and the 8 snapshots likely don't have a proper ipsec-tools either.

                                                          • F
                                                            fairchild last edited by

                                                            @cmb:

                                                            @fairchild:

                                                            I'm using the latest 2.0 snapshot

                                                            Don't. That's not going to be stable. Pretty sure the 7.2/2.0 builds still use NAT-T which has renegotiation issues, and the 8 snapshots likely don't have a proper ipsec-tools either.

                                                            Damn… well, now that I think of it, I really only switched to check out the new GUI and the multiple DynDNS accounts for each of my WANs. So IPsec is much more stable and up-to-date in the RC versions? I would have figured newer code with more features was included in 2.0 for VPNs, specifically IPsec.

                                                            • C
                                                              cmb last edited by

                                                              @fairchild:

                                                              So IPsec is much more stable and up-to-date in the RC versions?

                                                              yes

                                                              • B
                                                                bkm last edited by

                                                                OK, I enabled DPD (60 sec) and I believe that this fixed part of the problem. A couple of my tunnels stayed up overnight, or at least reconnected. One of the tunnels wouldn't reconnect, though, until I deleted the numerous SAD entries and restarted racoon. (Actually, a second tunnel stopped after restarting racoon, and I again had to delete the SAD entries, but not restart racoon.) It appears that some of the SAD entries were not getting dropped properly. The IPsec status showed that the tunnels were up. It may have coincided with the lifetime setting. I think I may shorten my lifetimes so that I can try to duplicate the behavior. Below are some of the error messages.

                                                                racoon: ERROR: fatal INVALID-SPI notify messsage, phase1 should be deleted.
                                                                Sep 29 12:23:18 racoon: ERROR: fatal INVALID-SPI notify messsage, phase1 should be deleted.
                                                                Sep 29 12:22:48 racoon: ERROR: fatal INVALID-SPI notify messsage, phase1 should be deleted.
                                                                Sep 29 12:22:41 racoon: [VPN Name]: INFO: IPsec-SA established: ESP MyWanIPxx.xx[0]->RemoteIPxx.xx[0] spi=2670541922(0x9f2d3c62)
                                                                Sep 29 12:22:41 racoon: [VPN Name]: INFO: IPsec-SA established: ESP RemoteIPxx.xx[0]->MyWanIPxx.xx[0] spi=37322644(0x2397f94)
                                                                Sep 29 12:22:40 racoon: [VPN Name]: INFO: respond new phase 2 negotiation: MyWanIPxx.xx[0]<=>RemoteIPxx.xx[0]

                                                                Background: The tunnels are static tunnels to Netopia routers. One tunnel to each site or router.
                                                                Aggressive mode is used. For the Keep Alive, I am using the remote LAN address of the Netopia router.

                                                                After I duplicate the problem, I will provide more log info. Most was overwritten before I thought of copying it.

                                                                • jimp
                                                                  jimp Rebel Alliance Developer Netgate last edited by

                                                                  @fairchild:

                                                                  I would have figured newer code with more features was included in 2.0 for vpns specifically ipsec

                                                                  Not in this case. Even so, newer code and more features don't usually translate to more stability, especially in an alpha release :)

                                                                  Remember: Upvote with the 👍 button for any user/post you find to be helpful, informative, or deserving of recognition!

                                                                  Need help fast? Netgate Global Support!

                                                                  Do not Chat/PM for help!

                                                                  • jimp
                                                                    jimp Rebel Alliance Developer Netgate last edited by

                                                                    @bkm:

                                                                    OK, I enabled DPD (60 sec) and I believe that this fixed part of the problem. A couple of my tunnels stayed up overnight, or at least reconnected. One of the tunnels wouldn't reconnect, though, until I deleted the numerous SAD entries and restarted racoon. (Actually, a second tunnel stopped after restarting racoon, and I again had to delete the SAD entries, but not restart racoon.) It appears that some of the SAD entries were not getting dropped properly. The IPsec status showed that the tunnels were up. It may have coincided with the lifetime setting. I think I may shorten my lifetimes so that I can try to duplicate the behavior. Below are some of the error messages.

                                                                    racoon: ERROR: fatal INVALID-SPI notify messsage, phase1 should be deleted.
                                                                    Sep 29 12:23:18 racoon: ERROR: fatal INVALID-SPI notify messsage, phase1 should be deleted.
                                                                    Sep 29 12:22:48 racoon: ERROR: fatal INVALID-SPI notify messsage, phase1 should be deleted.
                                                                    Sep 29 12:22:41 racoon: [VPN Name]: INFO: IPsec-SA established: ESP MyWanIPxx.xx[0]->RemoteIPxx.xx[0] spi=2670541922(0x9f2d3c62)
                                                                    Sep 29 12:22:41 racoon: [VPN Name]: INFO: IPsec-SA established: ESP RemoteIPxx.xx[0]->MyWanIPxx.xx[0] spi=37322644(0x2397f94)
                                                                    Sep 29 12:22:40 racoon: [VPN Name]: INFO: respond new phase 2 negotiation: MyWanIPxx.xx[0]<=>RemoteIPxx.xx[0]

                                                                    Background: The tunnels are static tunnels to Netopia routers. One tunnel to each site or router.
                                                                    Aggressive mode is used. For the Keep Alive, I am using the remote LAN address of the Netopia router.

                                                                    After I duplicate the problem, I will provide more log info. Most was overwritten before I thought of copying it.

                                                                    The full log is in /var/log/ipsec.log, and you can view it by executing the command: clog /var/log/ipsec.log

                                                                    It may have just been gone from the GUI, which only shows a limited number of lines. Also, it would help to show the logs in normal order, not reverse order. If you have the reverse order box checked on Status > System Logs, Settings tab, uncheck it and save, then copy/paste the logs. There was an old bug that caused the IPsec logs to ignore this setting, but it was fixed before the snapshot you said you were running. You may also want to update to the most recent snapshot to be sure you really have the most current updates.
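
For anyone digging through the full log, here is a minimal sketch of a filter that keeps only the lines that have mattered so far in this thread (errors, DPD events, purged ISAKMP-SAs, and SA establishment). The function name `ipsec_log_filter` is made up for illustration, and the pattern list is only a guess at what is useful; adjust it to taste.

```shell
# Hypothetical helper (the name is an assumption, not a pfSense command).
# Reads racoon log text on stdin and keeps only the lines that usually
# matter when a tunnel refuses to renegotiate. The patterns are taken
# from messages quoted earlier in this thread.
ipsec_log_filter() {
    grep -E 'ERROR|DPD|purging ISAKMP-SA|IPsec-SA established|phase 2 negotiation'
}
```

Typical use would be `clog /var/log/ipsec.log | ipsec_log_filter`, which prints the matching lines in the order they were logged.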

                                                                    One more thing: When testing these settings, be sure to stop and restart racoon after making your changes, to be on the safe side and to be sure the SAD and SPD are clear.

                                                                    • F
                                                                      fairchild last edited by

                                                                      @jimp:

                                                                      One more thing: When testing these settings, be sure to stop and restart racoon after making your changes, to be on the safe side and to be sure the SAD and SPD are clear.

                                                                      Sorry if this sounds ignorant, but do you mean delete all the SPD and SAD entries?

                                                                      • jimp
                                                                        jimp Rebel Alliance Developer Netgate last edited by

                                                                        @fairchild:

                                                                        @jimp:

                                                                        One more thing: When testing these settings, be sure to stop and restart racoon after making your changes, to be on the safe side and to be sure the SAD and SPD are clear.

                                                                        Sorry if this sounds ignorant, but do you mean delete all the SPD and SAD entries?

                                                                        Stopping and restarting racoon should do this. If you want to do it by hand, run:
                                                                        setkey -F
                                                                        setkey -F -P
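
If you want a record of what was in the tables before clearing them, a rough sketch along these lines may help. It saves the SAD and SPD with `setkey -D` and `setkey -D -P`, then runs the same two flush commands as above. The function name `flush_ipsec_state` and the snapshot file names are assumptions for illustration only, and on a real box this needs to run as root.

```shell
# Hypothetical wrapper (name and snapshot file names are assumptions).
# Saves the current SAD and SPD for later inspection, then flushes both
# using the two setkey commands given above.
flush_ipsec_state() {
    snapdir="${1:-/tmp}"                       # where to keep the dumps
    stamp="$(date +%Y%m%d-%H%M%S)"
    setkey -D    > "$snapdir/sad-$stamp.txt"   # dump SAD entries
    setkey -D -P > "$snapdir/spd-$stamp.txt"   # dump SPD entries
    setkey -F                                  # flush the SAD
    setkey -F -P                               # flush the SPD
}
```

Calling `flush_ipsec_state /root` would leave two timestamped dump files in `/root` to compare against the tables after the tunnel comes back up.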

                                                                        • B
                                                                          bkm last edited by

                                                                          Thanks for the advice. I knew the logs had to be somewhere. I will also restart racoon from now on after changes are made.

                                                                          "Sorry if this sounds ignorant, but do you mean delete all the SPD and SAD entries?"

                                                                          In the GUI, I would go to Status > IPsec, then the SAD tab, and delete the entries that corresponded to the tunnel I was having problems with. Normally, I see two entries for each tunnel: MyWanIP to RemoteWanIP and RemoteWanIP to MyWanIP. When I was having problems, this area would fill up with entries from its connection attempts. There is an X beside each entry that deletes it. Restarting racoon also deletes them, though. (I was experimenting a little when deleting manually.) I did not change anything on the SPD tab.

                                                                          I did change the lifetime settings to 1800 and 3600 on two of the tunnels so that I could see whether they came back up after an expiration without waiting until tomorrow. They did come up this time. If a tunnel is down in the morning, I will post a few log entries.

                                                                          • B
                                                                            bkm last edited by

                                                                            One of the tunnels went down again last night. Attached is my IPsec log. The tunnel was working at 17:00 on 9/29/09 and was not working at the end of the log on 9/30/09. The log was modified to remove the IP addresses. My WAN IP address is listed as MyWanIP. The remote WAN IP of the tunnel that went down is listed as RemoteWanIPTunnel1. Any advice is welcome. Thanks.

                                                                            tunnel1down.txt

                                                                            • B
                                                                              bkm last edited by

                                                                              Well, the tunnel came back up without any action from me, other than that I tried to ping the site twice. The first ping timed out. An hour later I tried again and got an immediate reply. For the Keep Alive address, I am using the remote LAN IP. Should I be using the remote WAN IP instead?

                                                                              • B
                                                                                bkm last edited by

                                                                                I ran a ping test on my tunnels last night to see how long and how often they were down. The test ran for about 15 hours. Tunnel 1 went down 6 times averaging 30-35 minutes each time. Tunnel 2 went down 3 times also averaging 30-35 minutes each time. Tunnel 3 stayed up the whole time. I didn't count the times where the tunnel was only down for about a minute, however this would also be a problem for our VPNs. I cannot verify that times under a couple minutes were not due to a line problem since I alternated my pings between sites. The range for the downtime was between 25 and 45 minutes. Most occurrences appeared to be at 30-35 minutes.
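
In case it helps anyone repeat this kind of test, here is a minimal sketch of how the outage lengths could be tallied. It assumes (my assumption, not necessarily how the test above was run) a log with one line per ping attempt in the form `<epoch-seconds> up` or `<epoch-seconds> down`, and the function name `summarize_outages` is made up. Outages shorter than the threshold (120 seconds by default) are skipped, matching the "don't count the one-minute blips" approach described above.

```shell
# Summarise outages from lines of "<epoch-seconds> <up|down>" on stdin.
# The log format and the function name are assumptions for this sketch.
# Prints one line per outage lasting at least the given number of
# seconds (default 120). An outage still in progress at the end of the
# input is not reported.
summarize_outages() {
    awk -v min="${1:-120}" '
        $2 == "down" && start == "" { start = $1 }   # outage begins
        $2 == "up"   && start != "" {                # outage ends
            len = $1 - start
            if (len >= min) printf "down at %d for %d s\n", start, len
            start = ""
        }
    '
}
```

For example, `summarize_outages 300 < tunnel1-ping.log` (file name hypothetical) would list only the outages of five minutes or more, which makes it easy to compare them against the 1800-second and 3600-second lifetimes.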

                                                                                My lifetime settings for tunnel 1 and 2 are at 1800 (30 min) for phase 1 and 3600 (60 min) for phase 2. My lifetimes for tunnel 3 are 28800 and 86400. Tunnel 3's phase 2 lifetime did not expire during the test.

                                                                                Attached is a partial IPSec log. Tunnel 1 went down sometime between 5:56 and 6:01. It came back up between 6:41 and 6:47 in case someone wants to look. The clocks between the two machines could be slightly off.

                                                                                I will try to upgrade again in the next day or so and test again, but I am not very hopeful at this point.
                                                                                Any suggestions are welcome.
                                                                                Thanks

                                                                                IPSec100109.txt

                                                                                • A
                                                                                  althornin last edited by

                                                                                  I have noticed similar behavior with a pfSense <-> Linksys VPN.
                                                                                  Basically, at some unspecified time, the tunnel appears to die, and it stays down for the duration of the Phase 2 lifetime (for me).
                                                                                  I've seen the same behavior without pfSense in the mix, though, using just the Linksys VPN routers (which I am slowly replacing with pfSense boxes).

                                                                                  • B
                                                                                    bkm last edited by

                                                                                    I'm making some progress on my problem, but I am not going to trust it for production yet. I did have a tunnel stay up all night. I am posting this in case it helps someone else. I don't profess to completely understand all of the intricacies of IPSec, but I do have a basic understanding of what happens.
                                                                                    One of the problems with IPSec is different understandings and implementations of the standard. Because the standard is not clear on some issues such as re-keying, different vendors choose different methods.

                                                                                    One of my problems was related to "dangling phase 2 SAs". The option on the other end of my tunnels was set to not allow dangling Phase 2 SAs. There is no GUI option for this in pfSense, nor is there any option for choosing between using the newest SAs immediately and waiting until the old ones expire. Either of these can cause a tunnel to go down after the lifetime expires.

                                                                                    pfSense is not wrong in its method; the documentation just has not made clear which method it works with.

                                                                                    I changed one of my tunnels to "Allow Dangling Phase 2 SAs" and it has made a huge difference. This will not help those with a pfSense-to-pfSense tunnel, but those who have this option on one end of their tunnel should experiment with it.

                                                                                    If either of these options is available in a config file somewhere, I would be interested in knowing.

                                                                                    Thanks
