Netgate Discussion Forum

    My IPSEC service hangs

    76 Posts 15 Posters 27.1k Views 17 Watching
      keyser Rebel Alliance @david11717

      @david11717 I don’t know if this applies to you, but I have discovered an IPSec issue today when deploying a bunch of SG-2100 boxes (ARM64 CPU boxes).
      If I use AES-GCM for encryption (both 128-bit and 256-bit), as guided by Netgate, the boxes stall/become unresponsive after a while if more than one Phase 2 tunnel is active in the tunnel. Boxes with only one Phase 2 tunnel do not seem to suffer from the issue.

      Disabling SafeXcel (HW acceleration) does not mitigate the issue.
      But changing the cipher on the tunnel to AES256 (not GCM; I believe it is really AES256-CBC) resolves the issue.

      I have a lot of testing to do still, but it’s quite evident the change of cipher resolves the issue.

      Love the no fuss of using the official appliances :-)

        david11717 @keyser

        @keyser 99% of my IPSec VPNs use AES256. I have one that uses AES256-GCM and I changed it to AES256 in the testing process (a month or two ago). The issue persisted. That being said, the devices I'm using are x86_64 and not ARM based.

            david11717 @slimjim2321

            @slimjim2321 said in My IPSEC service hangs:

            @david11717 Upgrading to 2.7 didn't fix my issue either. Thank you so much for the script.

            Before the holiday weekend I greatly increased the log size before rotation (since I have hundreds of free GB on that device), disabled IPsec logging entirely, and disabled split connections on one of my bigger tunnels. One of these three changes seems to have kept it from happening over the weekend. I have a feeling it's being triggered by log rotation.

            Did any of these steps completely resolve your problem since then? My script is still working great, but I'd love to fix the issue passively rather than have a cron job running every minute.
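
The script mentioned above is not reproduced in this part of the thread. Purely as an illustration, a per-minute cron watchdog of the kind described might look like the Python sketch below. It assumes Python 3 is available on the firewall, that `swanctl --list-sas` is a usable liveness probe, and that `pfSsh.php playback svc restart ipsec` restarts the service. Both commands and both timeouts are assumptions, not taken from the original script.

```python
import subprocess

# Assumed commands (not from the thread); adjust them for your own system.
STATUS_CMD = ["swanctl", "--list-sas"]
RESTART_CMD = ["pfSsh.php", "playback", "svc", "restart", "ipsec"]


def ipsec_responds(status_cmd=STATUS_CMD, timeout_s=20, run=subprocess.run):
    """Return True if the status query exits with code 0 within timeout_s."""
    try:
        result = run(status_cmd, capture_output=True, timeout=timeout_s)
    except (subprocess.TimeoutExpired, OSError):
        # A hung daemon typically shows up as a timeout here.
        return False
    return result.returncode == 0


def check_and_restart(status_cmd=STATUS_CMD, restart_cmd=RESTART_CMD,
                      timeout_s=20, run=subprocess.run):
    """Restart the service only when the status query hangs or fails."""
    if ipsec_responds(status_cmd, timeout_s, run):
        return "ok"
    run(restart_cmd, capture_output=True, timeout=120)
    return "restarted"
```

Calling `check_and_restart()` from a one-minute cron entry would give roughly the "fixed within about a minute" behaviour described in the thread. The `run` parameter exists only so the logic can be exercised without touching a real service.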

              Flukester

              Hi all

              We have been seeing something similar. We have around 25 IPsec site-to-site VPNs and they have been very unstable. Sometimes the VPNs function for weeks, then they all drop, we cannot log into the management UI, and we need a reboot to fix it. Sometimes they drop twice in one day.

              Been working with TAC enterprise support on this...

              We applied an IPsec kernel hotfix but it still died...

              Currently trying a few things to fix it:

              Increased the IPsec log file size to 1MB
              Put in a 6hr cron job to remove IPsec log file archives
              Disabled VPNs that are not finished yet so they are not spamming the logs
              Removed the IPsec widget from the dashboard

              Now just waiting to see if we have any stability. If it is stable, it could be down to any one of these changes. Touch wood.
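
The 6-hour cron job for clearing IPsec log archives could be sketched roughly as follows in Python. The directory `/var/log`, the base name `ipsec.log`, and the rotated-file naming (`ipsec.log.0`, `ipsec.log.1.bz2`, and so on) are assumptions for illustration and should be checked against the actual system before anything is deleted.

```python
from pathlib import Path


def remove_log_archives(log_dir="/var/log", base_name="ipsec.log"):
    """Delete rotated copies (base_name plus a suffix) and keep the live log.

    The default directory and file name are assumptions, not confirmed paths.
    """
    removed = []
    # The pattern requires a dot after base_name, so the live log never matches.
    for path in sorted(Path(log_dir).glob(base_name + ".*")):
        if path.is_file():
            path.unlink()
            removed.append(path.name)
    return removed
```

The function returns the names it removed, which makes it easy to log what each cron run actually did.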

                Flukester

                New changes on top of those I made before...

                Reduced all the 'chatter' clogging up the IPsec logs, which also makes them much more readable for diagnostic purposes.

                VPN / IPsec / Advanced Settings / IPsec Logging

                IKE SA - Diag > Audit
                IKE Child SA - Diag > Audit
                Networking - Control > Audit
                Message encoding - Control > Audit
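
To make the size of that reduction concrete, the changes can be written out as numeric levels. The sketch below reads each "X > Y" entry as "changed from X to Y" and assumes the GUI labels map onto strongSwan's numeric log levels (-1 to 4) as shown. That label mapping is an assumption, not something stated in the thread.

```python
# Assumed mapping of the GUI labels to strongSwan's numeric log levels.
LEVELS = {"silent": -1, "audit": 0, "control": 1, "diag": 2, "raw": 3, "highest": 4}

# (old level, new level) for each category listed in the post.
CHANGES = {
    "IKE SA": ("diag", "audit"),
    "IKE Child SA": ("diag", "audit"),
    "Networking": ("control", "audit"),
    "Message encoding": ("control", "audit"),
}


def summarize(changes=CHANGES, levels=LEVELS):
    """Return {category: (old numeric level, new numeric level)}."""
    return {name: (levels[old], levels[new]) for name, (old, new) in changes.items()}


for category, (old, new) in summarize().items():
    print(f"{category}: {old} -> {new}")
```

Under this reading, all four categories end up at the lowest non-silent level, which fits the stated goal of cutting log chatter.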

                  slimjim2321 @david11717

                  @david11717 Since my last post 18 days ago I haven't experienced the issue. I'm unsure which of the fixes I put in place actually solved it for me, though. I have since re-enabled the "rc.periodic daily" cron jobs that are in pfSense by default and it hasn't returned, so that's not it. So my issue was solved either by completely disabling IPsec logging (which isn't ideal) along with greatly increasing the log rotation size (I have it set to 100 MB, default is 500 KB), or by disabling Split Connections for my largest tunnel. I still have Split Connections enabled for all of my other tunnels, so I don't think that was the cause either.
                  My best guess is that it was being triggered when the logs rotated, and that drastically cutting the source of most of my logs (IPsec) simply slowed it down. If I'm right, it'll eventually happen again in a few months' time, given the slow rate at which the logs fill up now.
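
If the rotation hypothesis is right, the time between possible triggers scales with rotation size divided by log volume. A small worked example in Python, using the two rotation sizes from the post and a purely hypothetical log rate of 500 KB per day:

```python
KB = 1000
MB = 1000 * KB


def days_between_rotations(rotation_size_bytes, log_bytes_per_day):
    """Estimate how many days it takes for a log to reach its rotation size."""
    if log_bytes_per_day <= 0:
        raise ValueError("log_bytes_per_day must be positive")
    return rotation_size_bytes / log_bytes_per_day


# Hypothetical rate, chosen only to illustrate the ratio between the two sizes.
rate = 500 * KB

print(days_between_rotations(500 * KB, rate))  # default-sized log
print(days_between_rotations(100 * MB, rate))  # enlarged log
```

At any fixed rate the 100 MB setting stretches the interval by a factor of 200 compared with the 500 KB default, and disabling IPsec logging lowers the rate further. That would be consistent with the problem going quiet for a long time without actually being fixed.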

                    david11717 @slimjim2321

                    @slimjim2321 I guess I'll try each of those separately for a few days and see how it goes. Thanks for the update!

                      Flukester

                      Just an update: since the changes I made, we have now been up for 7 days. Too early to say for sure, but things are looking good.

                        david11717 @Flukester

                        @flukester I made the same changes you detailed above and the issue still happens for me. At this point I've stopped playing with it and my cron job fixes the issue within about 60 seconds of it happening. It's kind of ridiculous, honestly.

                          romczak

                          I have the same problem with the firewall, so I purchased Enterprise support. They pointed me to https://redmine.pfsense.org/issues/13014 and said to follow what is there, without any explanation. Working with Palo Alto and Cisco daily, I was quite stunned to get a typical online forum response.
                          I guess you get what you pay for, but if anyone is considering buying the subscription, I DO NOT recommend it.

                          Right now I am running the script above, except that the service restarts don't fix IPsec, so I replaced them with a system reboot. I am using the box for IPsec only, so when IPsec goes down I already have an outage.

                            gassyantelope @romczak

                            @romczak Yeah, it's been pretty concerning how this whole situation has been handled. It's one thing if there are problems when using pfSense Community Edition, since that's always been free and open source and you can't expect everything to be fixed ASAP. It's another thing when Netgate sells devices, support, and their paid edition of pfSense, and markets them as being stable and robust, yet can't even do IPsec reliably.

                            I'd expect such common/core functionality to work properly when buying their devices and/or their "enterprise" pfSense edition. We're now going on 8+ months since the issue was submitted on Redmine, but there were other similar issues submitted over a year or two ago. It's a really bad look for them to take people's money for a product that lacks reliable IPsec functionality, something that I've never seen any other firewall software/company struggle with. It's just as bad that you can pay for support, but they don't seem to do much more than point you to the public issue tracker (Redmine, which is free for everyone) or the community forum.

                            The good thing is that the issue on Redmine is finally getting some responses from some developers. Hopefully that means we'll have a real fix soon. Though, it still makes me worry about how future problems may be handled. If there's another issue with core functionality in the future, is it going to be another 8+ month wait to get that fixed as well?

                              Topogigio

                              I'm leaving pfSense; this is the only solution. The product (considering it from all points of view) is simply too bad now to count on for production environments and real life.

                                michmoor LAYER 8 Rebel Alliance @Topogigio

                                @topogigio
                                I feel your frustration. I will say this: I've had network-breaking bugs from high-profile vendors on products that do simple tasks, i.e. switching or firewalling.
                                If you are in the Palo Alto space then I will tell you the 10.X release has been troublesome, and that's putting it lightly. In the end, after 5 months, we got a specially developed hotfix. Juniper supports switch stacking (VC), and it's fun when multicast stops working in a crucial deployment. To this day my S1 ticket still has no solution from the development team. This is a pretty important function (switching) that apparently is not being taken seriously enough, even though their internal PRs show numerous customers impacted.

                                Just trying to illustrate that all of these products from high-profile vendors each have issues, but at least with Netgate I would argue that the pros outweigh the cons.

                                Firewall: NetGate,Palo Alto-VM,Juniper SRX
                                Routing: Juniper, Arista, Cisco
                                Switching: Juniper, Arista, Cisco
                                Wireless: Unifi, Aruba IAP
                                JNCIP,CCNP Enterprise

                                  Topogigio @michmoor

                                  @michmoor Fortinet also asks for a lot of money and has a lot of bugs (the 7.2.x version is hell) and a bad support service.

                                  Still I think that so many months without a basic function as IPSEC working is not acceptable. This is not some secondary feature.

                                    michmoor LAYER 8 Rebel Alliance @Topogigio

                                    @topogigio said in My IPSEC service hangs:

                                    Still I think that so many months without a basic function as IPSEC working is not acceptable. This is not some secondary feature.

                                    I don't think IPsec is a secondary feature. I don't think any feature supported by a vendor is secondary. If you pay for TAC then you are within your rights to hammer them every single day until a resolution is found; otherwise you will have to move to another vendor.
                                    I'm only saying that Netgate is not alone in this. Leave Netgate and you may find that vendor X has some game-breaking bug, and the cycle repeats.
                                    Ultimately it's down to the business. If the risk is worth it then you have to find another solution.

                                      romczak @gassyantelope

                                      @gassyantelope A bug is one thing, but the response from support is unacceptable. And I selected the Enterprise-level subscription.
                                      This was the response to the ticket I submitted:
                                      """
                                      It looks like you are hitting the bug: https://redmine.pfsense.org/issues/13014

                                      Please refer to that page for updates.

                                      Thanks,
                                      """

                                        michmoor LAYER 8 Rebel Alliance @romczak

                                        @romczak yeah… that’s not a good way of responding to a customer issue. Can’t defend that.

                                          jdemmer

                                          Any updates on this? I am running 4 other pfSense instances in Azure on 2.4.5-RELEASE with zero IPsec service issues. My 5th instance is running pfSense Plus 22.05-RELEASE, as this seems to be the only version of pfSense available in the Azure Marketplace. I wanted to just deploy regular pfSense 2.4.5, but it is no longer an option. So I am stuck with pfSense Plus 22.05, and the IPsec service hangs exactly as others have described in this thread; the only fix is to restart the VM. Not seeing much progress on Redmine.

                                            romczak @jdemmer

                                            @jdemmer From my observation it looks like the problem occurs when the interesting traffic is unable to bring the VPN up.
                                            I have disabled all keepalives, and since we don't have any traffic towards remotes, the IPSec service stopped failing. It has been up for a month already.

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.