Netgate Discussion Forum

    My IPSEC service hangs

    • auroramus

      I do not have a support package in place.

      • glreed735 @auroramus

        @auroramus - Not yet, the first pass through the logs highlighted some issues, but they wanted a larger sample of data to work from pending the next failure.

        • auroramus

          I am no coding expert, but it seems that once the logs reach maximum capacity, the IPsec service crashes rather than overwriting the logs.

          That's what it looks like to me.

          No matter what setting I change it to, whether a low log count or a high one, the log maxes out and then kills the service, and it will not work again unless you restart.

          • auroramus

            Once I clear the logs, I manage to get past the "collecting IPsec status info" screen I mentioned above and can see my connections, but when you hit connect it attempts to connect, then stops and does nothing. The only way to get them connected again is a restart.

            • auroramus

              I also found this post:

              This might be entirely normal behaviour; IPSec and many other forms of VPN tunnels connect only when there is traffic to transmit.
              Take for example you have an 8 hour lifetime on the IKE (Phase 1) tunnel. The tunnel will connect upon some traffic being transmitted down the tunnel and will always terminate as soon as 8 hours has passed since it came up. Only if packets are still trying to be sent down the tunnel will the tunnel come back up again and continue transmitting traffic for another 8 hours. The down and up happens very quickly and packets may not even be lost. This is for security reasons to refresh the security associations.
              Some people choose to run a ping or similar constantly down the tunnels so it always looks to be connected except for the brief milliseconds to reassociate. I find this to be generally unnecessary.

              • ablizno @auroramus

                @auroramus were you ever able to run netstat -Lan and provide the output when all your tunnels are down?
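For anyone collecting that output, here is a rough sketch of how it could be checked automatically. It assumes the usual FreeBSD `netstat -L` layout (a `qlen/incqlen/maxqlen` column followed by the local address); the function names and the "at or above maxqlen" threshold are my own choices for illustration, not anything from Netgate.

```python
import re

# Matches rows such as "unix  4/0/3   /var/run/charon.vici".
# The column layout is an assumption based on FreeBSD's `netstat -L` output.
ROW = re.compile(r"^(\S+)\s+(\d+)/(\d+)/(\d+)\s+(\S.*)$")


def parse_listen_queues(netstat_output):
    """Return (proto, qlen, incqlen, maxqlen, address) for each listen-queue row."""
    rows = []
    for line in netstat_output.splitlines():
        m = ROW.match(line.strip())
        if m:
            proto, qlen, incqlen, maxqlen, addr = m.groups()
            rows.append((proto, int(qlen), int(incqlen), int(maxqlen), addr.strip()))
    return rows


def saturated(rows, needle="vici"):
    """Rows whose address mentions `needle` and whose queue is at or over its limit."""
    return [r for r in rows if needle in r[4].lower() and r[1] >= r[3]]
```

Feed it the text captured from `netstat -Lan` while the tunnels are down; a non-empty result from `saturated(...)` would be a hint that the listen queue on the vici socket is full.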

                • gassyantelope @auroramus

                  @auroramus The behavior occurring is definitely not normal. I understand what that post is saying and completely agree that is normal IPsec behavior. The issue here is completely different though. The tunnels will never come back up once they all go down. I can ping, send data another way, etc., and they won't ever come back up until a restart is performed.

                  I've had multiple cases where I had active connections over the tunnel (sending data the whole time) and then the issue occurs and all tunnels go down. This has occurred way before the default 8 hour life span (sometimes within an hour or two).

                  • auroramus

                    @gassyantelope Yes, 100%, the behaviour is wrong.

                    It seems to crash the service, and this shouldn't happen.

                    • mr.ortizx

                      I just paid for Enterprise support and I was told the following:

                      "Hello,

                      Unfortunately, this is a somewhat rare issue that has not been solved yet. It is much less prevalent in pfSense CE 2.5.2, 2.7, and pfSense Plus 22.05. There aren't any workarounds currently, so rolling back or upgrading are the only steps you can currently take to mitigate the issue. You may track the issue here:

                      https://redmine.pfsense.org/issues/13014
                      "
                      I hope this helps you guys. Even though Redmine says all tunnels continue to operate normally, Netgate support mentioned that they also see instances where all tunnels drop, which is the case for all of us.

                      • auroramus @mr.ortizx

                        @mr-ortizx really appreciate you letting us know.

                        • auroramus

                          I have updated to 2.7. I will keep you guys updated.

                          • gassyantelope @mr.ortizx

                            @mr-ortizx Thanks man! At least we finally got an official response from them. I'm gonna do what @auroramus did and update to 2.7 as well to see if it helps at all. It can't hurt at this point.

                            • mr.ortizx @gassyantelope

                              @gassyantelope @auroramus Please let me know how it goes after upgrading to version 2.7.

                              • auroramus

                                Hi guys,

                                So far so good with 2.7. I have not had a single drop in the tunnels for days now, so give it a go and let me know.

                                • auroramus @auroramus

                                  I have been running 2.7 since 30th June and I have not had a single blip.

                                  Let me know how you guys get on.

                                  • gassyantelope @auroramus

                                    @auroramus I updated to 2.7 yesterday. It's only been 24 hours, but I haven't had the issue yet. That's already an improvement for me, seeing as I had to reboot the firewall once or twice a day when on 2.6. I'll provide another update in a few days. I'm crossing my fingers.

                                    • gassyantelope @gassyantelope

                                      @gassyantelope I spoke too soon. I just had the issue occur on 2.7.

                                      Disclosure: Potentially justifiable rant below :)

                                      Investigating and fixing this issue really needs to be a higher priority at this point. There are reports about the issue from 5+ years ago, yet it still exists. The latest redmine issue report (from 3 months ago) hasn't had much traction, as far as someone actually investigating the problem. It just keeps having its target version pushed back over and over.

                                      I get that there are other issues that need to be fixed as well, but this is an issue that, essentially, makes pfSense a nonviable option to use as a firewall in a production environment. Netgate states it to be a "somewhat rare" issue, yet there are many threads and redmine reports, spanning years, that show that this issue is more common than they make it out to be.

                                      My company has primarily used WatchGuard firewalls for years, which are decent enough, but their capabilities are lacking in various areas (I'd prefer to move away from them, personally). We started installing some Netgate/pfSense devices for some "smaller" networks that only have 5-10 IPsec tunnels, and found pfSense to run stably and have far superior capabilities. We were ready to purchase ~30 Netgate firewalls to replace all of the WatchGuards, but wanted to test pfSense on a "larger" network (50+ IPsec tunnels) to make sure there were no issues before we pulled the trigger. That large network test led us to where we are today, exposing this issue that completely breaks IPsec VPNs constantly.

                                      As much as I like pfSense (which I'll continue to use for my home lab) and really want to move away from WatchGuard and transition to Netgate/pfSense firewalls, that can't be done for as long as this issue continues to exist. A firewall with lackluster capabilities, but fully working IPsec VPNs, is better than a very capable firewall that has to be rebooted 1-2 times per day to get IPsec VPNs, which I'd consider a core feature of all firewalls, to stay up and work properly.

                                      I'll be putting the WatchGuards back in place for now. I'll continue to monitor this thread and the redmine issue page for updates. I'm still willing to swap the pfSense firewall back in to assist with the testing of possible solutions, as I'd like to see this problem fixed some day. I just can't have pfSense be our day to day, primary, firewall in its current state.

                                      Rant over.

                                      • ablizno @mr.ortizx

                                        @mr-ortizx Updated to latest 2.7 dev build, issue still occurs with the same frequency as before.

                                        • mr.ortizx

                                          Netgate technical support asked me to upgrade to pfSense Plus 22.05.
                                          The issue persisted. I will continue working with support.

                                          • ablizno @mr.ortizx

                                            @mr-ortizx I wish there was some way to help point them towards the root of the issue. We know it's due to the vici socket getting overwhelmed/locked up. When it happens, if you run sockstat | grep -i vici you can see charon is overwhelmed. It started at about once a week for me, and now it seems to happen every ~12 hours. Tunnels expire every 8 hours, so it doesn't appear to be directly related to the tunnels reconnecting. Opening Command Prompt, running pgrep -f charon to get the PIDs, then kill -9 [pid] [pid] seems to fix it, as long as you restart the IPsec service twice afterwards (not sure why it needs to be restarted twice). We know what the problem is, and I'd be willing to provide any logs that help, as I understand it is some sort of "rare" issue.
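The manual steps described here could be wrapped in a small script along these lines. This is only a sketch: the helper names are made up for illustration, it defaults to a dry run, and it deliberately leaves the double restart of the IPsec service to the GUI, since the exact restart command is not given in this thread.

```python
import os
import signal
import subprocess


def vici_lines(sockstat_output):
    """Equivalent of `grep -i vici`: keep the lines that mention the vici socket."""
    return [line for line in sockstat_output.splitlines() if "vici" in line.lower()]


def parse_pids(pgrep_output):
    """Turn pgrep's one-PID-per-line output into a list of ints."""
    return [int(tok) for tok in pgrep_output.split() if tok.isdigit()]


def run(cmd):
    """Run a command and return its stdout, or an empty string if it is not installed."""
    try:
        return subprocess.run(cmd, capture_output=True, text=True, check=False).stdout
    except FileNotFoundError:
        return ""


def kill_charon(dry_run=True):
    """Find PIDs via `pgrep -f charon` and SIGKILL them unless dry_run is set."""
    pids = parse_pids(run(["pgrep", "-f", "charon"]))
    if not dry_run:
        for pid in pids:
            os.kill(pid, signal.SIGKILL)  # same effect as `kill -9 <pid>`
    return pids
```

Calling `vici_lines(run(["sockstat"]))` gives the same view as the sockstat check, and `kill_charon()` only lists the PIDs it would kill until you pass `dry_run=False`. Note that `pgrep -f` matches on the full command line, so check the PID list before killing anything.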

                                            If anyone from Netgate sees this, I'd be more than willing to assist in getting this resolved.

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.