Netgate Discussion Forum

    IPsec fails to renegotiate after loss of a peer

    71 Posts 15 Posters 66.3k Views
    • B Offline
      bkm
      last edited by

      I'm making some progress on my problem, but I am not going to trust it for production yet. I did have a tunnel stay up all night. I am posting this in case it helps someone else. I don't profess to completely understand all of the intricacies of IPSec, but I do have a basic understanding of what happens.
      One of the problems with IPSec is different understandings and implementations of the standard. Because the standard is not clear on some issues such as re-keying, different vendors choose different methods.

      One of my problems was related to "dangling phase 2 SAs". The option on the other end of my tunnels was set to not allow dangling phase 2 SAs. There is no GUI option for this in pfSense, nor is there any option for choosing between using the newest SAs immediately and waiting until the old ones expire. Either of these can cause a tunnel to go down after the timeout expires.

      pfSense is not wrong in its method; the documentation just does not make clear which method works for it.

      I changed one of my tunnels to "Allow Dangling Phase 2 SAs" and it has made a huge difference. This will not help those with a pfSense-to-pfSense tunnel, but those who have this option on one end of their tunnel should experiment with it.

      If either of these options is available in a config file somewhere, I would be interested in knowing.

      Thanks

      • B Offline
        bkm
        last edited by

        Actually, there is an option for using the newest or older SAs. I haven't found anything related to dangling phase 2 SAs though.

        • F Offline
          focalguy
          last edited by

          Has there been any progress with this issue in 1.2.3-RC3? I am setting up a new router and heard on the mailing list that not much will change between RC3 and the final release. I am currently using 1.2.2 and everything works great on most of my installations. The major feature I want from 1.2.3 is the ability to make changes to one IPsec tunnel without restarting each of the 30 running tunnels.

          • jimpJ Offline
            jimp Rebel Alliance Developer Netgate
            last edited by

            @focalguy:

            Has there been any progress with this issue in 1.2.3-RC3? I am setting up a new router and heard on the mailing list that not much will change between RC3 and the final release. I am currently using 1.2.2 and everything works great on most of my installations. The major feature I want from 1.2.3 is the ability to make changes to one IPsec tunnel without restarting each of the 30 running tunnels.

            It appears to be resolved now. I've not had any trouble with my usually-volatile tunnels that used to give me no end of grief making me restart racoon all the time. It's been working like a dream with RC3.

            Remember: Upvote with the 👍 button for any user/post you find to be helpful, informative, or deserving of recognition!

            Need help fast? Netgate Global Support!

            Do not Chat/PM for help!

            • C Offline
              cmb
              last edited by

              @focalguy:

              Has there been any progress with this issue in 1.2.3-RC3?

              It's been resolved since NAT-T was removed over a month ago.

              • F Offline
                focalguy
                last edited by

                Thanks to both of you for the information. I already put RC3 on the new router but I was starting to second guess myself because of some other issues that may be unrelated. Hearing that these problems are at the very least not nearly as common gives me the confidence to continue with the project.

                • N Offline
                  netmethods
                  last edited by

                  I'm not sure if I would consider this resolved… I'm running the current 1.2.3-RC3 build and having plenty of IPsec issues that fit this bill. So far, enabling DPD (30 sec) and "Prefer old IPsec SAs" seems to have resolved it. It's been ~20 hours and no tunnels have gone down, but I'll give it another day before I feel like it is stable. In any case, this is more of a workaround than a resolution.

                  2x Nexcom 1088n8 in HA config
                  2.4 GHz Quad Core / 4GB DDR2 / SATAII 160GB / 4x1GB Intel module

                  • C Offline
                    cmb
                    last edited by

                    Depending on your scenario, you may need DPD. It's not a workaround; it's a desirable thing to have enabled most of the time. It would be sensible to always use it where the other end supports it.

                    There are no known issues; all our major deployments are running RC3 with no problems.

                    • N Offline
                      netmethods
                      last edited by

                      Well, I've been running my ipsec tunnels on RC1 for several months in a production environment without any issues until I upgraded to RC3, so I'd say that's an issue. No? The configuration worked fine without DPD, until the upgrade. If you're interested, here is my post on my issue: http://forum.pfsense.org/index.php/topic,20043.0.html

                      I'd love to have some additional feedback on my issue.

                      • F Offline
                        focalguy
                        last edited by

                        I'm actually wondering how bkm is faring now, because I am having the exact same issue with one location that is still using a Linksys router.

                        In the GUI, I would go to Status -> IPsec, then the SAD tab, and delete the entries that corresponded to the tunnel that I was having problems with. Normally, I see two entries for each tunnel: MyWanIP to RemoteWanIP and RemoteWanIP to MyWanIP. When I was having problems, this area would fill up with entries where it was trying to connect. There is an X beside each one that deletes it. Restarting racoon also deletes them, though. (I was experimenting a little when deleting manually.) I did not change anything on the SPD tab.

                        Manually deleting these SAD items brings the site right back up… but only for the lifetime of phase 2, which is 3600 seconds at this time. I still have another router with 1.2.2 connecting to the same Linksys BEFVP41 routers, and it does not have this problem.
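
The "two entries per tunnel" check described above could be scripted rather than read off the GUI. This is only a minimal sketch under assumptions: it assumes the SAD dump (for example from `setkey -D`) lists each SA as an unindented `source destination` line followed by indented detail lines, the addresses are documentation placeholders, and `count_sas` and `extra_sas` are hypothetical helper names, not part of pfSense.

```python
from collections import Counter

# Hypothetical sample in the assumed layout: one unindented "src dst" line
# per SA, followed by indented detail lines. Addresses are placeholders.
SAMPLE_SAD = """\
192.0.2.1 198.51.100.2
\tesp mode=tunnel spi=9536865(0x00918561)
198.51.100.2 192.0.2.1
\tesp mode=tunnel spi=3936270732(0xea9eb98c)
192.0.2.1 198.51.100.2
\tesp mode=tunnel spi=250476046(0x0eedf60e)
"""


def count_sas(sad_text):
    """Count SAs per (source, destination) pair."""
    counts = Counter()
    for line in sad_text.splitlines():
        # Detail lines are indented; SA header lines are not.
        if not line or line[0].isspace():
            continue
        fields = line.split()
        if len(fields) == 2:
            counts[(fields[0], fields[1])] += 1
    return counts


def extra_sas(counts, expected=1):
    """Return the pairs that hold more SAs than expected per direction."""
    return {pair: n for pair, n in counts.items() if n > expected}


counts = count_sas(SAMPLE_SAD)
suspect = extra_sas(counts)
print(suspect)
```

More than one SA per direction can also show up briefly around a rekey, so a non-empty result is a hint to look closer, not proof of a stuck tunnel.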

                        • B Offline
                          bkm
                          last edited by

                          Well, I'm still experimenting and trying to find the best settings for my situation. I currently have nine tunnels in production. I have slightly different settings on a few of them to try to find out what works best on my equipment. The tunnels had been up for 4 1/2 days (some a little longer) but they all went down today for some reason. I had one other occurrence of this happening after they were up for a few days, but I thought it was a fluke.
                          My tunnels have been renegotiating after the lifetime is up most of the time. It seems like pfSense has some trouble starting a larger number of tunnels after they all go down or when racoon is restarted (maybe I am not waiting long enough). I usually have to start disabling some of the tunnels before any will start working. Once they start, I can enable the others and they seem to work fine.
                          Since my tunnels are staying up longer, it takes longer to test each change. Many of my tweaks are being done on the far router side (non-pfSense), so they will probably not be very helpful to most users. I think the main thing that users should know is that if you are using a non-pfSense box on one side of a tunnel, do not assume that everyone else's settings are the ones that you should use. They are a good starting point, and if they work for you then that's great. If not, you will need to test various settings.
                          One setting that I believe has helped me was disabling PFS. This does reduce security though. If I can get my tunnels to stay up for a few weeks, I may try to enable it again. I have also set the lifetimes on each of my tunnels to a slightly different time so that they do not all expire at the same time. I don't really know if that helps.
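
The staggering idea in the last paragraph is simple arithmetic, so it can be worked out up front. A rough sketch, assuming a base lifetime of 28800 seconds, nine tunnels, and a 60-second step; all three values are illustrative picks and not bkm's actual settings, and `staggered_lifetimes` is a hypothetical helper.

```python
def staggered_lifetimes(base_seconds, tunnel_count, step_seconds=60):
    """Return one lifetime per tunnel, each step_seconds longer than the previous."""
    return [base_seconds + i * step_seconds for i in range(tunnel_count)]


# Illustrative values only: nine tunnels, base lifetime 28800 s, 60 s apart.
lifetimes = staggered_lifetimes(28800, 9)
for number, seconds in enumerate(lifetimes, start=1):
    print(f"tunnel {number}: lifetime {seconds} s")
```

Whether spreading the expirations actually helps is the open question from the post; this only produces the numbers to type into each tunnel.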

                          Does anyone know when tunnels are actually renegotiated? For instance, if the lifetime is set for 28800 seconds, does a renegotiation start at half that time or a certain number of seconds before expiration?

                          If I get everything to work completely stable, I will post my settings. I am currently using the released version of RC3. I have not updated to any of the snapshots since RC3 was released.

                          • B Offline
                            bkm
                            last edited by

                            I would like to see others post a message if they are having IPSec problems after upgrading to RC3. I am not as interested in seeing posts from those who have never set up a tunnel before, but from people like netmethods who had a stable system before RC3.
                            The developers are not going to fix anything if they believe that everything has been resolved. This is not meant as a criticism of any developer. I think all of you are great for putting your time into this project. I would just like to know if my problems are unique because of the equipment I am using or if problems are still widespread.
                            Thanks

                            • N Offline
                              netmethods
                              last edited by

                              I posted an update on the thread I started, if anyone is interested. It might help some people that are still having issues.
                              http://forum.pfsense.org/index.php/topic,20043.0.html

                              bkm, what error messages are you getting? I think you also posted somewhere about the tunnels taking a long time to come back up? I've noticed that it is taking much longer now to bring them up as well. It used to be pretty quick, but now it takes several minutes (5-10) to bring 6 tunnels up if I restart racoon.

                              • F Offline
                                focalguy
                                last edited by

                                I'm interested in what other devices everyone is using at the other end of the tunnels. netmethods, I see that in your other thread you say:

                                All ipsec tunnels are to sonicwalls with standard and enhanced os and one watchguard

                                I currently have a pfSense 1.2.2 box connecting to about 28 remote locations which are all using a mixture of pfSense 1.2.2 (1 - 1.2.3-PRE-testing but it's working so I don't want to upgrade just yet) and Linksys BEFVP41 and Linksys BEFSX41. These are stable for the most part, meaning they rarely go down and if they do it is usually something other than the equipment that causes it.

                                I am setting up a new box now because we are changing over to a new ISP and doing it slowly starting with our site-to-site VPNs. I built the box using 1.2.3-RC3 and moved 3 of the remote locations to it. 2 of these are pfSense (1.2.2 and the 1.2.3-PRE-testing) and one is a BEFVP41. The BEFVP41 has been going down every hour when the phase 2 expires. I could increase the lifetime but I want to know sooner when my changes don't work so I can try something else. I'm going to try the "prefer old SAs" later tonight now that the office is closed and see what happens.

                                • jimpJ Offline
                                  jimp Rebel Alliance Developer Netgate
                                  last edited by

                                  I just updated one of my routers to 1.2.3-RC3 which has several tunnels to different places, a pfSense 1.2.3-RC3 box, two pfSense 1.2.2 boxes, and a Linksys BEFSX41.

                                  All my tunnels came up immediately when it booted, but I'll try to keep track of how well it does overnight/tomorrow.

                                  • C Offline
                                    cmb
                                    last edited by

                                    Those who are having problems with renegotiation, try going to Diagnostics -> Command and running:

                                    fetch -o /etc/inc/vpn.inc http://cvs.pfsense.org/~cmb/rekeyforcevpn.inc

                                    Then restart the racoon service, or reboot, and see what happens.

                                    There is a known issue in the latest stable racoon in some circumstances that this may fix. This sets rekey to force.

                                    rekey - Enable automatic renegotiation of expired phase1 when
                                            there are non-dying phase2 SAs. Possible values are:
                                            force   Rekeying is done unconditionally.
                                            on      Rekeying is done only if DPD monitoring is
                                                    active. This is the default.
                                            off     No automatic rekeying. Do note that turning off
                                                    automatic rekeying will result in inaccurate DPD
                                                    monitoring.

                                    • F Offline
                                      focalguy
                                      last edited by

                                      Thanks cmb. I don't think I'll be able to test that just now because I seem to have found a setting that works. I enabled the "Prefer old IPsec SAs" option and the tunnel that was going down every hour (the phase 2 lifetime) has been up for about 12 hours now. I wasn't sure whether that option would negatively affect the two other tunnels on the box, which connect to pfSense boxes, but they still seem to be working fine with it. I'm going to move another Linksys BEFVP41 router to this box and see if the setting keeps that tunnel working as well.

                                      • B Offline
                                        bkm
                                        last edited by

                                        I cannot say for certain that I am still having issues with renegotiation. My tunnels had stayed up for 4-5 days. I was out of the office yesterday when they went down. The IPSec log did not show anything at all when this happened until I restarted racoon. This latest problem does not look like a renegotiation issue. I am guessing that everything was fine until after 12:51. At 14:17 I restarted racoon.

                                        Oct 27 12:51:03 pffw racoon: INFO: IPsec-SA established: ESP tunnelIP[0]->WANIP[0] spi=9536865(0x918561)
                                        Oct 27 12:51:03 pffw racoon: INFO: IPsec-SA established: ESP WANIP[0]->tunnelIP[0] spi=3936270732(0xea9eb98c)
                                        Oct 27 14:17:17 pffw racoon: INFO: @(#)ipsec-tools 0.7.2 (http://ipsec-tools.sourceforge.net)
                                        Oct 27 14:17:17 pffw racoon: INFO: @(#)This product linked OpenSSL 0.9.8e 23 Feb 2007 (http://www.openssl.org/)
                                        Oct 27 14:17:17 pffw racoon: INFO: Reading configuration from "/var/etc/racoon.conf"
                                        Oct 27 14:17:17 pffw racoon: INFO: 127.0.0.1[500] used as isakmp port (fd=14)
                                        Oct 27 14:17:17 pffw racoon: INFO: LanIP[500] used as isakmp port (fd=15)
                                        Oct 27 14:17:17 pffw racoon: INFO: OPT1IP[500] used as isakmp port (fd=16)
                                        Oct 27 14:17:17 pffw racoon: INFO: WanIP[500] used as isakmp port (fd=17)
                                        Oct 27 14:17:17 pffw racoon: INFO: unsupported PF_KEY message REGISTER
                                        Oct 27 14:17:17 pffw racoon: ERROR: such policy already exists. anyway replace it: …
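
The quiet stretch between 12:51 and 14:17 can be found mechanically instead of by eye. A small sketch, assuming standard syslog lines that begin with a 15-character `Mon DD HH:MM:SS` stamp; syslog omits the year, so the `year` default below is a placeholder assumption, and `largest_gap` is a hypothetical helper.

```python
from datetime import datetime

# A few of the lines quoted above, in their original order.
LOG_LINES = [
    "Oct 27 12:51:03 pffw racoon: INFO: IPsec-SA established: ESP tunnelIP[0]->WANIP[0] spi=9536865(0x918561)",
    "Oct 27 12:51:03 pffw racoon: INFO: IPsec-SA established: ESP WANIP[0]->tunnelIP[0] spi=3936270732(0xea9eb98c)",
    "Oct 27 14:17:17 pffw racoon: INFO: @(#)ipsec-tools 0.7.2 (http://ipsec-tools.sourceforge.net)",
    'Oct 27 14:17:17 pffw racoon: INFO: Reading configuration from "/var/etc/racoon.conf"',
]


def parse_stamp(line, year=2009):
    """Parse the leading syslog stamp; the year is assumed, not read from the log."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def largest_gap(lines, year=2009):
    """Return (gap_seconds, line_before, line_after) for the longest silence."""
    stamps = [parse_stamp(line, year) for line in lines]
    best = max(range(1, len(lines)), key=lambda i: stamps[i] - stamps[i - 1])
    gap = (stamps[best] - stamps[best - 1]).total_seconds()
    return gap, lines[best - 1], lines[best]


gap_seconds, before, after = largest_gap(LOG_LINES)
print(f"{gap_seconds:.0f} s of silence before: {after[:15]}")
```

A long silence only shows that racoon logged nothing; it does not say whether traffic was still passing during that time.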

                                        Below are some error messages that have shown up in the logs. During the time that these messages occurred, I was not notified that there were any problems and I did nothing to resolve them. A tunnel may or may not have gone down. These are just random entries and not specific to any tunnel. I put a "Reg" in front of errors that seem to appear regularly.

                                        Reg -Oct 23 16:00:35 pffw racoon: ERROR: fatal INVALID-SPI notify messsage, phase1 should be deleted.
                                        Reg -Oct 23 16:00:35 pffw last message repeated 2 times

                                        Reg -Oct 26 01:37:08 pffw racoon: WARNING: ignore RESPONDER-LIFETIME notification.

                                        Reg -Oct 23 16:06:51 pffw racoon: ERROR: none message must be encrypted
                                        Reg -Oct 23 16:07:11 pffw last message repeated 2 times
                                        Reg -Oct 23 16:07:13 pffw racoon: INFO: unsupported PF_KEY message REGISTER
                                        Reg -Oct 23 16:07:13 pffw racoon: ERROR: no iph2 found: ESP tunnelIP[0]->WANIP[0] spi=250476046(0xeedf60e)
                                        Reg -Oct 23 16:07:13 pffw racoon: ERROR: no iph2 found: ESP tunnelIP[0]->WANIP[0] spi=257766112(0xf5d32e0)
                                        (None since last change) -Oct 23 16:07:13 pffw racoon: ERROR: pfkey DELETE received: ESP WANIP[0]->tunnelIP[0] spi=53406546(0x32eeb52)

                                        (None since last change) -Oct 23 16:08:02 pffw racoon: ERROR: tunnelIP give up to get IPsec-SA due to time up to wait.

                                        Reg -Oct 23 16:08:19 pffw racoon: NOTIFY: couldn't find the proper pskey, try to get one by the peer's address.

                                        Oct 24 11:13:01 pffw racoon: ERROR: no policy found for spid:59.
                                        Oct 24 11:13:01 pffw racoon: ERROR: failed to get ID.
                                        Oct 24 11:13:01 pffw racoon: ERROR: failed to start post getspi.

                                        Reg -Oct 26 03:15:00 pffw racoon: ERROR: unknown Informational exchange received.

                                        Reg -Oct 26 04:53:39 pffw racoon: ERROR: couldn't find configuration.
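
Marking entries "Reg" by hand can be replaced with a tally of how often each message occurs. A minimal sketch using a few of the lines above as input; it assumes every racoon entry has the form `racoon: LEVEL: message`, and the `spi=*` normalization and the `tally` helper are illustrative choices, not anything racoon or pfSense provides.

```python
import re
from collections import Counter

# A handful of the entries quoted above, without the hand-added "Reg -" prefix.
LOG_LINES = [
    "Oct 23 16:00:35 pffw racoon: ERROR: fatal INVALID-SPI notify messsage, phase1 should be deleted.",
    "Oct 23 16:07:13 pffw racoon: ERROR: no iph2 found: ESP tunnelIP[0]->WANIP[0] spi=250476046(0xeedf60e)",
    "Oct 23 16:07:13 pffw racoon: ERROR: no iph2 found: ESP tunnelIP[0]->WANIP[0] spi=257766112(0xf5d32e0)",
    "Oct 26 01:37:08 pffw racoon: WARNING: ignore RESPONDER-LIFETIME notification.",
    "Oct 23 16:07:13 pffw racoon: INFO: unsupported PF_KEY message REGISTER",
]

LINE_RE = re.compile(r"racoon: (ERROR|WARNING|NOTIFY|INFO): (.*)$")
SPI_RE = re.compile(r"spi=\d+\(0x[0-9a-f]+\)")


def tally(lines):
    """Count messages per (level, text), with SPI values masked so repeats group together."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # e.g. "last message repeated 2 times" lines are skipped
        level, message = match.groups()
        counts[(level, SPI_RE.sub("spi=*", message))] += 1
    return counts


counts = tally(LOG_LINES)
for (level, message), n in counts.most_common():
    print(f"{n:3d}  {level:7s}  {message}")
```

Lines such as "last message repeated 2 times" are ignored here, so the counts are a lower bound on how often a message really appeared.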

                                        -------------
                                        cmb - what is the command to change the rekey back to the default if I later change it to force as below:

                                        fetch -o /etc/inc/vpn.inc http://cvs.pfsense.org/~cmb/rekeyforcevpn.inc

                                        @netmethods:

                                        I think you also posted somewhere about the tunnels taking a long time to come back up?

                                        I will try to restart racoon some evening when I have time to wait for everything to come back up on its own. When everything goes down, I get a little impatient.

                                        @focalguy:

                                        I'm interested in what other devices everyone is using at the other end of the tunnels.

                                        I am currently using a bunch of Netopia routers that I inherited at this job. Tunnels between these devices worked fine. I switched to pfSense for the multi-wan feature and the ability to have more than 15 tunnels (a Netopia limitation).

                                        Thanks to everyone for their input.

                                        • C Offline
                                          cmb
                                          last edited by

                                          @bkm:

                                          cmb - what is the command to change the rekey back to the default if I later change it to force as below:

                                          Edit /etc/inc/vpn.inc and remove the line:

                                          rekey force;
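
For anyone who would rather script that removal than edit by hand, here is a minimal sketch. It works on text in memory; the sample fragment is made up for illustration, since the real contents of /etc/inc/vpn.inc are not shown in this thread, and `drop_rekey_force` is a hypothetical helper.

```python
def drop_rekey_force(text):
    """Return text without any line that consists solely of 'rekey force;'."""
    kept = [
        line
        for line in text.splitlines(keepends=True)
        if line.strip() != "rekey force;"
    ]
    return "".join(kept)


# Made-up fragment for illustration; not the actual contents of vpn.inc.
SAMPLE = "\texchange_mode main;\n\trekey force;\n\tlifetime time 28800 secs;\n"

cleaned = drop_rekey_force(SAMPLE)
print(cleaned, end="")
```

To apply it to the real file, read the file, pass its text through the function, and write the result back after saving a backup copy; racoon would presumably then need a restart, as with the original change.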

                                          • B Offline
                                            bkm
                                            last edited by

                                            Thanks cmb.
                                            The rekey option looks similar to an option on my Netopias that I am testing on a few tunnels. I haven't been able to tell yet if it has helped. I may try the rekey option in pfSense in a couple weeks if I am still having issues.

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.