Netgate Discussion Forum

    IPSEC suddenly stops working

      Paulk201270

      I need a little help or advice if possible. I currently have 4 sites that were all running 2.4.5p1 pfSense with IPSEC connecting all 4 together without any major issues.

      Internal IPs in /24s using 172.16.0.x, 172.16.1.x, 172.16.2.x and 172.16.3.x.

      With the release of 2.5.0 I ran the upgrade on 172.16.0.x (which is ideally a test-lab location), which kinda screwed up (I know, I should have done a clean install…). That environment was running on a Lanner box with an older Atom processor that is pretty much end-of-life, so I have some Watchguard Firebox XTM 5s with C2D processors and 4 GB RAM. These were my short-term upgrade path for heavier use of IDS, as the Atom ran too high on utilization when doing a lot…

      Built the XTM5, restored a configuration and after a lot of tweaking got it running with all packages and IPSEC tunnels. No biggie, just took longer and a little more complex than I had hoped.

      Herein lies the issue… After running for a while, the IPSEC on that location just appears to stop: the VPN is offline, and clicking connect from there or from one of the other sites doesn’t resolve anything. Clicking stop in the GUI doesn’t stop it, and restart also seems to do nothing. I am unable to run ‘swanctl --list-conns’ or ‘swanctl --load-all --file /var/etc/ipsec/swanctl.conf --debug 1’, as neither responds with anything.

      If I reboot, all is good for a while until the same happens again.
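A hang like the one described (commands such as `swanctl --list-conns` never returning) can be told apart from a normal failure by running the command under a timeout. The following is a minimal sketch, not something from the thread: the function name `probe` and the 10-second default are assumptions chosen for illustration.

```python
import subprocess

def probe(command, timeout=10.0):
    """Run a command and report whether it responded, hung, or is missing.

    `command` is a list such as ["swanctl", "--list-conns"].
    The timeout value is an arbitrary assumption, not a recommended setting.
    """
    try:
        subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
            check=False,  # a non-zero exit code still counts as a response
        )
    except subprocess.TimeoutExpired:
        return "hung"
    except FileNotFoundError:
        return "missing"
    return "responded"
```

Calling `probe(["swanctl", "--list-conns"])` periodically and recording the first time it returns `"hung"` would give a timestamp to compare against the IPsec log.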

      Believing the issue is with 2.5.0, I just rebuilt that system to 2.4.5p1 and restored some config to keep my IPSEC tunnels, interfaces, NAT, firewall rules and so forth. The system was up and running from midnight.

      Just realized a short while ago that the tunnel is now not responding again. Internet is not dropping, as I have remote access to computers at that location. I logged into the firewall and checked Status > IPSEC, which shows the usual "collecting information" and then nothing. From the shell I cannot issue swanctl commands, just like before. Checking the IPSEC log from the shell shows corruption occurring @ 10:52 -
      Apr 15 10:52:04 FCU-Group-FW charon: 11[IKE] <con1000|5> activatCLOG^A^@^@^@\xc2\xf2^A^@\xec\xcd^G^@^@^@^@^@
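Lines like the one above, where readable text runs into caret sequences and `\x..` escapes, can be located mechanically. The sketch below is an illustration only; the function name `find_suspect_lines` and the pattern are assumptions, and it flags binary-looking residue without saying anything about its cause. One possibility worth ruling out (an assumption, not confirmed anywhere in this thread) is that the `CLOG` marker comes from the circular log file format being read raw rather than from damaged data.

```python
import re

# Matches raw control bytes (except tab and newline), caret notation
# such as ^A or ^@, and literal \xNN escape sequences.
SUSPECT = re.compile(
    r"[\x00-\x08\x0b-\x1f\x7f]|\^[@A-Z\[\\\]^_]|\\x[0-9a-fA-F]{2}"
)

def find_suspect_lines(lines):
    """Return (line_number, text) pairs for lines with binary-looking residue."""
    hits = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if SUSPECT.search(text):
            hits.append((number, text))
    return hits
```

Passing an open log file to `find_suspect_lines` gives the line numbers to inspect, which makes it easier to see whether the residue appears once or repeatedly.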

      Can this ACTUALLY be hardware related to the XTM5 or am I missing something absolutely obvious??? I mean, I put it back to 2.4.5p1 so same version as the others etc…

      Obviously I can’t change the others to 2.5.0 or 2.5.1 until I know for sure what is the root cause and ensure stability…

      Any help would be greatly appreciated…!

        lst_hoe @Paulk201270

        @paulk201270 Looks like bad RAM to me. Do you have ECC memory?

          Paulk201270 @lst_hoe

          @lst_hoe Nope, regular RAM. I had changed it as a test, but the same kept happening. In the interim I have rebuilt to the new 2.5.1 image and, aside from Unbound crashing (I added a watch to restart it), the IPSEC appears to be working better.

            Paulk201270 @Paulk201270

            @paulk201270 Still having the same issue, even having switched RAM.

            Another firewall (different hardware) exhibiting the same issue. Both running 2.5.1, both built clean and reconfigured manually to remove any doubt of upgrade issues. Both built on Watchguard hardware XTM5s.

            If I select Stop for IPSEC on the services page, it never stops. Rebooting the firewall normalizes things and it works for a day or so, then stops again.

            Log shows the following and then nothing for days till rebooted...

            May 7 00:16:57 charon 59608 12[ENC] <con100000|63> generating INFORMATIONAL response 716 [ ]
            May 7 00:16:57 charon 59608 12[NET] <con100000|63> sending packet: from XXX.XXX.XXX.XXX[500] to XXX.XXX.XXX.XX[500] (57 bytes)
            May 7 00:17:00 newsyslog 25803 logfile turned over due to size>500K
            May 7 00:17:00 newsyslog 25803 logfile turned over due to size>500K
            May 7 00:17:06 charon 59608 15[NET] <con300000|66> received packet: from XXX.XXX.XXX.XX[500] to XXX.XXX.XX.XX[500] (57 bytes)
            May 7 00:17:06 charon 59608 15[ENC] <con300000|66> parsed INFORMATIONAL request 344 [ ]
            May 7 00:17:06 charon 59608 15[ENC] <con300000|66> generating INFORMATIONAL response 344 [ ]
            May 7 00:17:06 charon 59608 15[NET] <con300000|66> sending packet: from XXX.XXX.XX.XXX[500] to XXX.XXX.XXX.XX[500] (57 bytes)
            May 7 00:28:45 charon 59608 03[KNL] creating rekey job for CHILD_SA ESP/0xc4427143/XXX.XXX.XXX.XXX
            May 7 00:29:32 charon 59608 03[KNL] creating rekey job for CHILD_SA ESP/0xc3cd1301/XXX.XXX.XXX.XXX
            May 7 00:35:33 charon 59608 03[KNL] creating rekey job for CHILD_SA ESP/0xc2535822/XXX.XXX.XXX.XXX
            May 7 00:37:14 charon 59608 03[KNL] creating rekey job for CHILD_SA ESP/0xc6823624/XXX.XXX.XXX.XXX
            May 7 00:38:50 charon 59608 03[KNL] creating delete job for CHILD_SA ESP/0xc4427143/XXX.XXX.XXX.XXX
            May 7 00:38:50 charon 59608 03[KNL] creating delete job for CHILD_SA ESP/0xc3cd1301/XXX.XXX.XXX.XXX
            May 7 00:46:02 charon 59608 03[KNL] creating delete job for CHILD_SA ESP/0xc2535822/XXX.XXX.XXX.XXX
            May 7 00:46:02 charon 59608 03[KNL] creating delete job for CHILD_SA ESP/0xc6823624/XXX.XXX.XXX.XXX
            May 7 00:51:12 charon 59608 03[KNL] creating rekey job for CHILD_SA ESP/0xc12d5134/XXX.XXX.XXX.XXX
            May 7 00:54:35 charon 59608 03[KNL] creating rekey job for CHILD_SA ESP/0xc2f81b76/XXX.XXX.XXX.XXX
            May 7 01:02:12 charon 59608 03[KNL] creating delete job for CHILD_SA ESP/0xc12d5134/XXX.XXX.XXX.XXX
            May 7 01:02:12 charon 59608 03[KNL] creating delete job for CHILD_SA ESP/0xc2f81b76/XXX.XXX.XXX.XXX
            May 11 21:56:19 charon 59608 03[KNL] interface pppoe0 activated
            May 11 21:56:19 charon 59608 03[KNL] XXX.XXX.XXX.XXX disappeared from pppoe0
            May 11 21:56:19 charon 59608 03[KNL] interface pppoe0 deactivated
            May 11 21:56:34 charon 59608 03[KNL] XXX.XXX.XXX.XXX appeared on pppoe0
            May 12 12:41:43 charon 59608 00[DMN] SIGTERM received, shutting down
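The silence in the log above (nothing between May 7 01:02:12 and May 11 21:56:19) is the kind of gap that can be found automatically in a longer file. This is a rough sketch under stated assumptions: the function name `find_gaps` and the one-hour threshold are hypothetical, and because these syslog lines carry no year, the `year` argument is a guess supplied by the caller.

```python
from datetime import datetime, timedelta

def find_gaps(lines, threshold=timedelta(hours=1), year=2021):
    """Return (before, after) timestamp pairs separated by more than `threshold`.

    Expects lines that start like "May 7 00:16:57 charon ...".
    `year` is an assumption, since the log format omits it.
    """
    gaps = []
    previous = None
    for line in lines:
        parts = line.split(None, 3)
        if len(parts) < 3:
            continue
        stamp_text = "{} {}".format(year, " ".join(parts[:3]))
        try:
            stamp = datetime.strptime(stamp_text, "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        if previous is not None and stamp - previous > threshold:
            gaps.append((previous, stamp))
        previous = stamp
    return gaps
```

The last entry before each gap is the point where the daemon stopped logging, so it is the natural place to start reading.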

            Even after selecting stop multiple times, nothing is added to or changes in the log.

            If it were happening on just one machine I could understand something being iffy, but it's the same type of hardware at multiple locations.

            Would appreciate if anyone has some suggestions as to how to proceed...

            Many thanks
            Paul.

              Paddy @Paulk201270

              @paulk201270

              I'm seeing this exact same behaviour. It started after upgrading to 2.5.1.
              The tunnels all stop working. The IPSEC status page shows no tunnels. The IPSEC widget on the Dashboard has the spinning cog permanently. The CPU widget also just displays the spinning cog.

              The stop and restart IPSEC service buttons do nothing and sometimes even kill the web GUI.

              A reboot sorts the problem for a while but it always returns.

              My logs are pretty much the same as above.

                Paulk201270 @Paddy

                @paddy Are you using similar/same hardware???

                  Paddy @Paulk201270

                  @paulk201270 Yes I'm using the Watchguard xtm5.

                  I've gone back to the previous 2.4 pfsense and so far so good.

                    Paulk201270 @Paddy

                    @paddy Ah, thanks for that confirmation. In the interim I've just put the first site back on my older Lanner solution to see if it is specific to the hardware, which I think it is in some way. The older hardware is not as powerful, but I'm not sure how to get Netgate to investigate the specifics of the XTM 5 series - perhaps someone can advise what to request...

                      Paulk201270 @Paulk201270

                      I mistakenly posted this to Redmine as a 'potential' bug, but was told that they do not support this particular hardware. I would appreciate it if anyone else could reproduce this or add info that might make further investigation possible...

                      Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.