Netgate Discussion Forum

    IPSec tunnels work for several hours to days but then stop routing traffic

    IPsec
    10 Posts 3 Posters 2.5k Views
    • N
      nbegley
      last edited by

      Hi,
      We have been using pfSense for several years. For the past 3-4 months, however, we have had an issue where the tunnels suddenly stop passing traffic even though they are established, have SPD and SAD entries, and the firewall shows no blocked traffic. Dropping the VPN on either side causes the tunnel to rebuild, but it still doesn't pass traffic. The only solution is to reboot the pfSense server.

      The Draytek routers are set to dial out only, using the following Phase 1 settings: IKEv1 Main Mode, Always on, AES128/SHA1/G1, with shared keys. There are then two Phase 2 tunnels using AES128/SHA1/no PFS, and the option to create a separate SA per subnet is enabled.

      pfSense is configured with the same settings, but has the "Disable Rekey", "Responder Only" and "Enable DPD" options enabled.

      I've intentionally dropped the VPN and taken a sample of the logs, which I'll post shortly, but they seem to show the tunnel building fine, and the SAD and SPD entries come back.

      Would anyone have any suggestions on what could be causing this? There are multiple sites connecting using Draytek routers, and they all show the same symptoms eventually.

      • N
        nbegley
        last edited by nbegley

        Here are the logs from the Draytek side from when I dropped the VPN tunnel (newest entries first).

        2020-02-13 15:02:39 [L2L][UP][IPsec][@1:SITENAME]
        2020-02-13 15:02:39 Delete exist flowstate of static route 0A150000/FFFF0000 ...
        2020-02-13 15:02:39 sent QI2, IPsec SA established with xx.xx.xxx.xx (pfsenseIP). In/Out Index: 0/-1
        2020-02-13 15:02:39 IPsec SA #1016 will be replaced after 2996 seconds
        2020-02-13 15:02:39 IKE ==>, Next Payload=ISAKMP_NEXT_HASH, Exchange Type = 0x20, Message ID = 0x6ef620c4
        2020-02-13 15:02:39 Accept ESP prorosal ENCR ESP_AES, HASH AUTH_ALGORITHM_HMAC_SHA2_256
        2020-02-13 15:02:39 IKE <==, Next Payload=ISAKMP_NEXT_HASH, Exchange Type = 0x20, Message ID = 0x6ef620c4
        2020-02-13 15:02:39 [IPSEC/IKE][L2L][1:SITENAME][@xx.xx.xxx.xx (pfsenseIP)] quick_outI1: match network
        2020-02-13 15:02:39 Client L2L remote network setting is 10.21.0.0/16
        2020-02-13 15:02:39 IKE ==>, Next Payload=ISAKMP_NEXT_HASH, Exchange Type = 0x20, Message ID = 0x6ef620c4
        2020-02-13 15:02:38 Start IKE Quick Mode to xx.xx.xxx.xx (pfsenseIP)
        2020-02-13 15:02:38 Dialing Node1 (SITENAME) :
        2020-02-13 15:02:38 [L2L][UP][IPsec][@1:SITENAME]
        2020-02-13 15:02:38 [IPsec] Dial-out for 'More route' of L2L[0]
        2020-02-13 15:02:38 Delete exist flowstate of static route AC150000/FFFF0000 ...
        2020-02-13 15:02:38 sent QI2, IPsec SA established with xx.xx.xxx.xx (pfsenseIP). In/Out Index: 0/-1
        2020-02-13 15:02:38 IPsec SA #1015 will be replaced after 2850 seconds
        2020-02-13 15:02:38 IKE ==>, Next Payload=ISAKMP_NEXT_HASH, Exchange Type = 0x20, Message ID = 0x5b369a20
        2020-02-13 15:02:38 Accept ESP prorosal ENCR ESP_AES, HASH AUTH_ALGORITHM_HMAC_SHA2_256
        2020-02-13 15:02:38 IKE <==, Next Payload=ISAKMP_NEXT_HASH, Exchange Type = 0x20, Message ID = 0x5b369a20
        2020-02-13 15:02:38 [IPSEC/IKE][L2L][1:SITENAME][@xx.xx.xxx.xx (pfsenseIP)] quick_outI1: match network
        2020-02-13 15:02:38 Client L2L remote network setting is 172.21.0.0/16
        2020-02-13 15:02:38 IKE ==>, Next Payload=ISAKMP_NEXT_HASH, Exchange Type = 0x20, Message ID = 0x5b369a20
        2020-02-13 15:02:38 Start IKE Quick Mode to xx.xx.xxx.xx (pfsenseIP)
        2020-02-13 15:02:38 ISAKMP SA established with xx.xx.xxx.xx (pfsenseIP). In/Out Index: 0/-1
        2020-02-13 15:02:38 ISAKMP SA #1014 will be replaced after 21375 seconds
        2020-02-13 15:02:38 IKE <==, Next Payload=ISAKMP_NEXT_ID, Exchange Type = 0x2, Message ID = 0x0
        2020-02-13 15:02:38 IKE ==>, Next Payload=ISAKMP_NEXT_ID, Exchange Type = 0x2, Message ID = 0x0
        2020-02-13 15:02:38 NAT-Traversal: Using RFC 3947, no NAT detected
        2020-02-13 15:02:38 IKE <==, Next Payload=ISAKMP_NEXT_KE, Exchange Type = 0x2, Message ID = 0x0
        2020-02-13 15:02:38 IKE ==>, Next Payload=ISAKMP_NEXT_KE, Exchange Type = 0x2, Message ID = 0x0
        2020-02-13 15:02:38 Accept Phase1 prorosals : ENCR OAKLEY_AES_CBC, HASH OAKLEY_SHA2_256
        2020-02-13 15:02:38 IKE <==, Next Payload=ISAKMP_NEXT_SA, Exchange Type = 0x2, Message ID = 0x0
        2020-02-13 15:02:38 IKE ==>, Next Payload=ISAKMP_NEXT_SA, Exchange Type = 0x2, Message ID = 0x0
        2020-02-13 15:02:38 [IPSEC/IKE][L2L][1:SITENAME][@xx.xx.xxx.xx (pfsenseIP)] Initiating IKE Main Mode
        2020-02-13 15:02:38 Initiating IKE Main Mode to xx.xx.xxx.xx (pfsenseIP)
        2020-02-13 15:02:38 Dialing Node1 (SITENAME) :
        2020-02-13 15:02:37 IKE_RELEASE VPN : L2L Dial-out, Profile index = 1, Name = SITENAME, ifno = 11
        2020-02-13 15:02:37 Delete exist flowstate of VPN ifno: 11 ....
        2020-02-13 15:02:37 [L2L][DOWN][IPsec][@1:SITENAME]
        2020-02-13 15:02:37 DropVPN() VPN : L2L Dial-out, Profile index = 1, Name = SITENAME, ifno = 11
        2020-02-13 15:02:37 IKE_RELEASE VPN : L2L Dial-out, Profile index = 1, Name = SITENAME, ifno = 10
        2020-02-13 15:02:37 IKE ==>, Next Payload=ISAKMP_NEXT_HASH, Exchange Type = 0x5, Message ID = 0x61712547
        2020-02-13 15:02:37 IKE ==>, Next Payload=ISAKMP_NEXT_HASH, Exchange Type = 0x5, Message ID = 0xf446b1b4
        2020-02-13 15:02:37 Delete exist flowstate of VPN ifno: 10 ....
        2020-02-13 15:02:37 [L2L][DOWN][IPsec][@1:SITENAME]
        2020-02-13 15:02:37 DropVPN() VPN : L2L Dial-out, Profile index = 1, Name = SITENAME, ifno = 10
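The log above can be summarised mechanically. The following is a minimal sketch, assuming the line formats stay exactly as pasted; the function names (`sa_replacements`, `decode_route`, `static_routes`) are hypothetical helpers, not part of any Draytek or pfSense tooling. It pulls out the "will be replaced after N seconds" announcements and decodes the hexadecimal static routes into dotted notation.

```python
import ipaddress
import re

# Matches e.g. "2020-02-13 15:02:39 IPsec SA #1016 will be replaced after 2996 seconds"
REPLACE_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<kind>ISAKMP|IPsec) SA #(?P<sa_id>\d+) "
    r"will be replaced after (?P<seconds>\d+) seconds"
)

# Matches e.g. "static route 0A150000/FFFF0000" (address/netmask as 8 hex digits each)
ROUTE_RE = re.compile(r"static route (?P<addr>[0-9A-Fa-f]{8})/(?P<mask>[0-9A-Fa-f]{8})")


def sa_replacements(log_text):
    """Return (timestamp, kind, sa_id, seconds) for each replacement announcement."""
    found = []
    for line in log_text.splitlines():
        m = REPLACE_RE.search(line)
        if m:
            found.append((m["ts"], m["kind"], int(m["sa_id"]), int(m["seconds"])))
    return found


def decode_route(hex_addr, hex_mask):
    """Turn a hex address/netmask pair into an IPv4Network."""
    addr = ipaddress.IPv4Address(int(hex_addr, 16))
    mask = ipaddress.IPv4Address(int(hex_mask, 16))
    return ipaddress.IPv4Network(f"{addr}/{mask}")


def static_routes(log_text):
    """Return every static route mentioned in the log, decoded."""
    return [decode_route(m["addr"], m["mask"]) for m in ROUTE_RE.finditer(log_text)]


# Lines copied from the Draytek log in this post.
SAMPLE = """\
2020-02-13 15:02:39 Delete exist flowstate of static route 0A150000/FFFF0000 ...
2020-02-13 15:02:39 IPsec SA #1016 will be replaced after 2996 seconds
2020-02-13 15:02:38 Delete exist flowstate of static route AC150000/FFFF0000 ...
2020-02-13 15:02:38 IPsec SA #1015 will be replaced after 2850 seconds
2020-02-13 15:02:38 ISAKMP SA #1014 will be replaced after 21375 seconds
"""

for ts, kind, sa_id, seconds in sa_replacements(SAMPLE):
    print(f"{ts} {kind} SA #{sa_id}: replaced after {seconds}s")
for net in static_routes(SAMPLE):
    print("static route:", net)
```

The decoded routes (10.21.0.0/16 and 172.21.0.0/16) match the two "Client L2L remote network setting" lines in the same log, so the two Phase 2 tunnels each map to one of those subnets.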

        • N
          nbegley
          last edited by

          I don't seem to be able to post the pfSense log, as it gets detected as spam when I post, so here's an image of it instead:

          6d820dc8-146f-40b2-83f8-036424479dd9-image.png

          • K
            Konstanti @nbegley
            last edited by

            @nbegley

            Hi
            It is possible that your problem is described here

            https://forum.netgate.com/topic/148857/ipsec-ikev2-error-trap-not-found-unable-to-acquire-reqid

            • N
              nbegley @Konstanti
              last edited by

              @Konstanti Thanks for the reply. I've added a cron job as you suggested, since I'm sure I'll run into that limitation at some point.

              I think my issue may be slightly different though, since restarting the IPsec service doesn't fix it, and sometimes the issue occurs after only a few hours.

              I'll check the reqid next time it happens though.

              • D
                dusan @nbegley
                last edited by

                @nbegley Are you sure you disabled PFS on both sides?

                From the log on pfSense side, I don't think so.

                • N
                  nbegley @dusan
                  last edited by

                  @dusan The default configuration we are using has PFS on Phase 1 but no PFS on Phase 2.

                  I have tried using PFS on both phases on a few sites to see if that improved things, which it didn't. I've just had a look at the particular site those logs are from, and that site is using AES128/SHA256/PFS14 for both phases, so yes, it has PFS on both sides for both phases.

                  • D
                    dusan @nbegley
                    last edited by dusan

                    @nbegley I had a similar problem many years ago with IPv4, and I'm having a similar problem now with IPv6. Both are similar to yours in that a non-pfSense device is trying to connect to pfSense. Both differ from yours in that IKEv1 Phase 1 is indicated as "connected" on one side, or on both sides, but there is no traffic, i.e. no ESP packets are delivered to the other end of the tunnel in the first place (not only after several hours). Both cases trace back to a mis-routing problem at the ISP on the non-pfSense side (in both of my cases the connection was between different ISPs).

                    Until the ISPs resolve their own problem, the temporary solution (successful in my case) was to set a short margin time, disable Responder Only, enable Automatically ping host on pfSense and, if Phase 1 is "connected" on only one side, disable the Internet connection on the pfSense side for a while.

                    Automatically ping host obviously helps. A short margin time and disabling Responder Only help because otherwise pfSense would time out first (in most cases). Disabling the Internet connection helps because it clears the mis-route in the route cache of the ISP's router (possibly a border router).

                    • N
                      nbegley @dusan
                      last edited by

                      @dusan I've made the changes to reduce the tunnel lifetime etc., and this does seem to have helped quite a bit. It also now seems more consistent about when it stops, e.g. it has taken around 2 days on several occasions now. I can also confirm that I do see the ReqID errors every time the issue occurs. However, I've now got a cron job restarting the IPsec service twice a day, yet I still get ReqID trap errors. If I manually restart the service when I get the ReqID errors, this seems to stop any further errors being logged, but it doesn't get the failing connections to start; I have to reboot for that.

                      I'm thinking of using cron to restart pfSense daily, as that seems to be the only thing that gets it going again.

                      • D
                        dusan @nbegley
                        last edited by

                        @nbegley I'm not sure why you disable PFS, enable Disable Rekey or Disable Reauth, or set Responder Only. The more changes you make to pfSense's default settings, the less chance you have of keeping tunnels connected. According to my test (10 years ago), Draytek is compatible with pfSense, but I suggest you do your own interoperability test.

                        -- Set margin time = 30s.
                        -- Set short lifetime, like 30m Phase 1 and 15m Phase 2.
                        -- Do not set Responder Only. Don't Disable Reauth, Disable Rekey or turn off PFS.
                        -- (Just for the purpose of testing) Use a different cipher suite for Phase 1 and Phase 2 (say, DH group 15 and 14 respectively).

                        If the tunnel can't be established or stops working after 1h, problem is yours. If it stops after 2 days, go after your ISP.
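To see what the suggested test values mean in practice, here is a rough sketch of when rekeying would start. It assumes the formula documented for strongSwan's legacy `ipsec.conf` options, `rekeytime = lifetime - (margintime + random(0, margintime * rekeyfuzz))`, and assumes `rekeyfuzz` is left at its documented default of 100%. Whether a given pfSense version passes exactly these values to strongSwan is an assumption here, and `rekey_window` is a hypothetical helper name.

```python
def rekey_window(lifetime_s, margin_s, rekeyfuzz=1.0):
    """Return (earliest, latest) rekey start in seconds after the SA is set up.

    Assumed formula (strongSwan ipsec.conf):
        rekeytime = lifetime - (margintime + random(0, margintime * rekeyfuzz))
    The random term ranges from 0 to margintime * rekeyfuzz.
    """
    earliest = lifetime_s - margin_s * (1 + rekeyfuzz)
    latest = lifetime_s - margin_s
    return earliest, latest


# Values suggested in the post: margin 30s, Phase 1 lifetime 30m, Phase 2 lifetime 15m.
MARGIN_S = 30
for name, lifetime_s in (("Phase 1", 30 * 60), ("Phase 2", 15 * 60)):
    earliest, latest = rekey_window(lifetime_s, MARGIN_S)
    print(f"{name}: rekey starts between {earliest:.0f}s and {latest:.0f}s")
```

With these short lifetimes a rekey happens every few minutes, so a tunnel that survives rekeying in a one-hour test has gone through several Phase 1 and Phase 2 renewals, which is what makes the 1h versus 2 days distinction above a useful test.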

                        Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.