Netgate Discussion Forum

    2.4.5 <-> 2.4.4-p3 IPsec tunnel stops passing traffic after ~48 hours

    • M
      monotypeTattoo
      last edited by

      Hello,

      We have three sites with pfSense firewalls in redundant pairs. We rely on a combination of OpenVPN and IPsec for the site-to-site links*.
      We have started to roll out pfSense 2.4.5. However, we've found that after a period of time (usually 48 hours, give or take an hour), traffic stops flowing over the IPsec tunnel.

      This is the IPsec configuration from the site which has been upgraded to pfSense 2.4.5 (0.50.50.50):

      conn con2000
              fragmentation = yes
              keyexchange = ikev2
              reauth = yes
              forceencaps = no
              mobike = no
      
              rekey = no
              installpolicy = yes
              type = tunnel
              dpdaction = restart
              dpddelay = 10s
              dpdtimeout = 60s
      
              auto = route
              left = 0.50.50.50
              right = 0.40.40.40
              leftid = 0.50.50.50
              ikelifetime = 28800s
              lifetime = 3600s
              ike = aes256gcm128-sha256-modp2048!
              esp = aes256gcm128-sha256-modp2048,aes256gcm96-sha256-modp2048,aes256gcm64-sha256-modp2048!
              leftauth = psk
              rightauth = psk
              rightid = 0.40.40.40
              rightsubnet = 172.16.0.0/21,172.16.16.0/21
              leftsubnet = 172.16.8.0/21,172.16.24.0/21
      

      This is the IPsec configuration from the site which is still running pfSense 2.4.4-p3 (0.40.40.40):

      conn con1000
              fragmentation = yes
              keyexchange = ikev2
              reauth = yes
              forceencaps = no
              mobike = no
      
              rekey = no
              installpolicy = yes
              type = tunnel
              dpdaction = restart
              dpddelay = 10s
              dpdtimeout = 60s
              auto = route
              left = 0.40.40.40
              right = 0.50.50.50
              leftid = 0.40.40.40
              ikelifetime = 28800s
              lifetime = 3600s
              ike = aes256gcm128-sha256-modp2048!
              esp = aes256gcm128-sha256-modp2048,aes256gcm96-sha256-modp2048,aes256gcm64-sha256-modp2048,aes256gcm128-sha256-modp2048,aes256gcm96-sha256-modp2048,aes256gcm64-sha256-modp2048,aes256gcm128-sha256-modp2048,aes256gcm96-sha256-modp2048,aes256gcm64-sha256-modp2048,aes256gcm128-sha256-modp2048,aes256gcm96-sha256-modp2048,aes256gcm64-sha256-modp2048!
              leftauth = psk
              rightauth = psk
              rightid = 0.50.50.50
              rightsubnet = 172.16.8.0/21,172.16.24.0/21
              leftsubnet = 172.16.0.0/21,172.16.16.0/21
      

      The first thing that stands out is that the ESP ciphers (for the P2 proposal) appear to be repeated four times in the second site's configuration.
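
      Those twelve entries collapse to the same three unique proposals as on the 2.4.5 side, which can be sanity-checked with a quick sketch (this is illustrative Python, not anything pfSense runs):

```python
# Sketch: collapse duplicated entries in a strongSwan proposal string,
# like the quadruplicated 'esp' line above.
def dedupe_proposals(proposals: str) -> str:
    strict = proposals.endswith("!")           # '!' marks a strict proposal list
    items = proposals.rstrip("!").split(",")
    unique = list(dict.fromkeys(items))        # dedupe, preserving order
    return ",".join(unique) + ("!" if strict else "")

esp = ",".join(["aes256gcm128-sha256-modp2048",
                "aes256gcm96-sha256-modp2048",
                "aes256gcm64-sha256-modp2048"] * 4) + "!"
print(dedupe_proposals(esp))
# -> aes256gcm128-sha256-modp2048,aes256gcm96-sha256-modp2048,aes256gcm64-sha256-modp2048!
```

      strongSwan should treat the duplicated list the same as the deduplicated one, so this is probably cosmetic rather than the cause, but it shows the two sides' P2 proposals are effectively identical.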

      When the problem occurs, there is little indication of an issue in the logs: DPD requests and responses are still being transmitted over the tunnel:

      Apr 19 00:31:57 	charon 		15[ENC] <con2000|63> parsed INFORMATIONAL response 54 [ ]
      Apr 19 00:31:57 	charon 		15[NET] <con2000|63> received packet: from 0.40.40.40[500] to 0.50.50.50[500] (57 bytes)
      Apr 19 00:31:57 	charon 		15[NET] <con2000|63> sending packet: from 0.50.50.50[500] to 0.40.40.40[500] (57 bytes)
      Apr 19 00:31:57 	charon 		15[ENC] <con2000|63> generating INFORMATIONAL request 54 [ ]
      Apr 19 00:31:57 	charon 		15[IKE] <con2000|63> sending DPD request
      Apr 19 00:31:46 	charon 		05[ENC] <con2000|63> parsed INFORMATIONAL response 53 [ ]
      Apr 19 00:31:46 	charon 		05[NET] <con2000|63> received packet: from 0.40.40.40[500] to 0.50.50.50[500] (57 bytes)
      Apr 19 00:31:46 	charon 		05[NET] <con2000|63> sending packet: from 0.50.50.50[500] to 0.40.40.40[500] (57 bytes)
      Apr 19 00:31:46 	charon 		05[ENC] <con2000|63> generating INFORMATIONAL request 53 [ ]
      Apr 19 00:31:46 	charon 		05[IKE] <con2000|63> sending DPD request
      
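      For what it's worth, the cadence of those DPD probes can be checked against dpddelay = 10s by pulling the timestamps out of the log. A quick sketch (the parsing assumes the syslog layout of the excerpt above; adjust for your log format):

```python
# Sketch: measure the interval between successive 'sending DPD request'
# lines in a charon log excerpt. Should roughly match dpddelay = 10s.
from datetime import datetime

log = """\
Apr 19 00:31:57 charon 15[IKE] <con2000|63> sending DPD request
Apr 19 00:31:46 charon 05[IKE] <con2000|63> sending DPD request
"""

times = sorted(
    datetime.strptime(" ".join(line.split()[:3]), "%b %d %H:%M:%S")
    for line in log.splitlines()
    if "sending DPD request" in line
)
deltas = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
print(deltas)
# -> [11.0]
```

      The probes above are ~11 seconds apart, which is consistent with the configured 10-second delay plus a little jitter, so DPD itself looks healthy.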

      These are the equivalent logs from the other firewall (0.40.40.40):

      Apr 19 00:29:46 	charon 		05[NET] <con1000|119> sending packet: from 0.40.40.40[500] to 0.50.50.50[500] (57 bytes)
      Apr 19 00:29:46 	charon 		05[ENC] <con1000|119> generating INFORMATIONAL response 41 [ ]
      Apr 19 00:29:46 	charon 		05[ENC] <con1000|119> parsed INFORMATIONAL request 41 [ ]
      Apr 19 00:29:46 	charon 		05[NET] <con1000|119> received packet: from 0.50.50.50[500] to 0.40.40.40[500] (57 bytes)
      Apr 19 00:29:36 	charon 		05[NET] <con1000|119> sending packet: from 0.40.40.40[500] to 0.50.50.50[500] (57 bytes)
      Apr 19 00:29:36 	charon 		05[ENC] <con1000|119> generating INFORMATIONAL response 40 [ ]
      Apr 19 00:29:36 	charon 		05[ENC] <con1000|119> parsed INFORMATIONAL request 40 [ ]
      Apr 19 00:29:36 	charon 		05[NET] <con1000|119> received packet: from 0.50.50.50[500] to 0.40.40.40[500] (57 bytes)
      

      After the above instance, we increased the logging level to diag for some aspects of IPsec. In this example, traffic stopped flowing over the tunnel between 00:00:06 and 00:00:12:

      Apr 21 00:00:06 PF101 charon: 11[MGR] checkout IKEv2 SA with SPIs 8d092fb85ad68e18_i 3d019220e36fccd3_r
      Apr 21 00:00:06 PF101 charon: 11[MGR] checkout IKEv2 SA with SPIs 8d092fb85ad68e18_i 3d019220e36fccd3_r
      Apr 21 00:00:06 PF101 charon: 11[MGR] IKE_SA con2000[193] successfully checked out
      Apr 21 00:00:06 PF101 charon: 11[MGR] IKE_SA con2000[193] successfully checked out
      Apr 21 00:00:06 PF101 charon: 11[MGR] checkin IKE_SA con2000[193]
      Apr 21 00:00:06 PF101 charon: 11[MGR] <con2000|193> checkin IKE_SA con2000[193]
      Apr 21 00:00:06 PF101 charon: 11[MGR] checkin of IKE_SA successful
      Apr 21 00:00:06 PF101 charon: 11[MGR] <con2000|193> checkin of IKE_SA successful
      Apr 21 00:00:12 PF101 charon: 11[MGR] checkout IKEv2 SA with SPIs 8d092fb85ad68e18_i 3d019220e36fccd3_r
      Apr 21 00:00:12 PF101 charon: 11[MGR] checkout IKEv2 SA with SPIs 8d092fb85ad68e18_i 3d019220e36fccd3_r
      Apr 21 00:00:12 PF101 charon: 11[MGR] IKE_SA con2000[193] successfully checked out
      Apr 21 00:00:12 PF101 charon: 11[MGR] IKE_SA con2000[193] successfully checked out
      Apr 21 00:00:12 PF101 charon: 11[IKE] sending DPD request
      Apr 21 00:00:12 PF101 charon: 11[IKE] <con2000|193> sending DPD request
      Apr 21 00:00:12 PF101 charon: 11[IKE] queueing IKE_DPD task
      Apr 21 00:00:12 PF101 charon: 11[IKE] <con2000|193> queueing IKE_DPD task
      Apr 21 00:00:12 PF101 charon: 11[IKE] activating new tasks
      Apr 21 00:00:12 PF101 charon: 11[IKE] <con2000|193> activating new tasks
      Apr 21 00:00:12 PF101 charon: 11[IKE]   activating IKE_DPD task
      Apr 21 00:00:12 PF101 charon: 11[IKE] <con2000|193>   activating IKE_DPD task
      Apr 21 00:00:12 PF101 charon: 11[ENC] <con2000|193> generating INFORMATIONAL request 0 [ ]
      Apr 21 00:00:12 PF101 charon: 11[NET] <con2000|193> sending packet: from 0.50.50.50[500] to 0.40.40.40[500] (57 bytes)
      Apr 21 00:00:12 PF101 charon: 11[MGR] checkin IKE_SA con2000[193]
      Apr 21 00:00:12 PF101 charon: 11[MGR] <con2000|193> checkin IKE_SA con2000[193]
      Apr 21 00:00:12 PF101 charon: 11[MGR] checkin of IKE_SA successful
      Apr 21 00:00:12 PF101 charon: 11[MGR] <con2000|193> checkin of IKE_SA successful
      Apr 21 00:00:12 PF101 charon: 11[MGR] checkout IKEv2 SA by message with SPIs 8d092fb85ad68e18_i 3d019220e36fccd3_r
      Apr 21 00:00:12 PF101 charon: 11[MGR] checkout IKEv2 SA by message with SPIs 8d092fb85ad68e18_i 3d019220e36fccd3_r
      Apr 21 00:00:12 PF101 charon: 11[MGR] IKE_SA con2000[193] successfully checked out
      Apr 21 00:00:12 PF101 charon: 11[MGR] IKE_SA con2000[193] successfully checked out
      Apr 21 00:00:12 PF101 charon: 11[NET] <con2000|193> received packet: from 0.40.40.40[500] to 0.50.50.50[500] (57 bytes)
      Apr 21 00:00:12 PF101 charon: 11[ENC] <con2000|193> parsed INFORMATIONAL response 0 [ ]
      Apr 21 00:00:12 PF101 charon: 11[IKE] activating new tasks
      Apr 21 00:00:12 PF101 charon: 11[IKE] <con2000|193> activating new tasks
      Apr 21 00:00:12 PF101 charon: 11[IKE] nothing to initiate
      Apr 21 00:00:12 PF101 charon: 11[IKE] <con2000|193> nothing to initiate
      Apr 21 00:00:12 PF101 charon: 11[MGR] checkin IKE_SA con2000[193]
      

      The notable difference between the working and broken states is the 'activating new tasks' and 'nothing to initiate' log messages. These are not logged on the 2.4.4-p3 firewall when the problem happens.

      We have now set the log level to 'diag' for more components and will hopefully have more information when the problem occurs again. Are there any commands we can run, or particular things we should look out for, the next time we hit this situation? We are trying to decide whether to press on with the pfSense 2.4.5 upgrade or roll back the firewalls that have already received it.

      Thanks

      *We recently found, upon upgrading to 2.4.4-p3, that OpenVPN was CPU-bound on our virtualised instances, and we will be swapping it out for IPsec shortly after this issue is resolved.

      • R
        rmccall2k16
        last edited by rmccall2k16

        Why do you have rekeying disabled? Are you 100% sure that your MTUs are configured correctly? PMTUD has never worked for me in any VPN scenario.

        Also, why is your lifetime so low? It's almost a fifth of the value that pfSense sets by default. Try going back to that.

        In the IPsec phase 2 of each host, have you set it to automatically ping the other side (under "Advanced Options")? There should be pings going back and forth between the hosts at all times.

        • M
          monotypeTattoo
          last edited by

          Thank you for your reply.

          @rmccall2k16 said in 2.4.5 <-> 2.4.4-p3 IPsec tunnel stops passing traffic after ~48 hours:

          Why do you have rekeying disabled? Are you 100% sure that your MTUs are configured correctly? PMTUD has never worked for me in any VPN scenario.

          No idea why rekeying was disabled. We have enabled it now.
          I don't believe MTUs are a problem here.

          Also, why is your lifetime so low? It's almost a fifth of the value that pfSense sets by default. Try going back to that.

          The P1/IKE lifetime and P2/SA lifetime values are the same as the ones I see when we set up a new connection from scratch. Do you think changing them would affect the issue we're seeing?

          In the IPsec phase 2 of each host, have you set it to automatically ping the other side (under "Advanced Options")? There should be pings going back and forth between the hosts at all times.

          Yes. However, this tunnel is always carrying traffic in both directions.

          Thanks again
          MT

          • M
            marcquark
            last edited by marcquark

            Just to clarify: are you seeing the tunnels as up (both P1 and P2), but no traffic passing from one side to the other?
            I'm seeing that on 2.4.4-p3 too, though using VTIs. I posted about it here: https://forum.netgate.com/topic/149043/gateway-monitoring-gets-stuck-in-infinite-loop-when-using-multiple-vtis-on-sg-3100
            Back then I thought I had drilled down to the root cause in gateway monitoring, but after reading some more reports in this forum, I believe I was chasing a symptom rather than the cause.

            Some more threads:
            https://forum.netgate.com/topic/150508/ipsec-tunnels-work-for-several-hours-to-days-but-then-stop-routing-traffic/10
            https://forum.netgate.com/topic/148857/ipsec-ikev2-error-trap-not-found-unable-to-acquire-reqid

            It might all be fixed by the strongSwan patch mentioned in the last thread. Time will tell when it gets implemented.

            • M
              monotypeTattoo
              last edited by

              @marcquark said in 2.4.5 <-> 2.4.4-p3 IPsec tunnel stops passing traffic after ~48 hours:

              Just to clarify, are you seeing the tunnels as up (both P1 and P2), but no traffic passing from one side to the other?

              I'm sorry, I'm not sure it is the same issue. We started the IPsec tunnel and everything was fine until around 48 hours later, at which point traffic stopped flowing over the tunnel, save for the DPD requests and responses, which suggest the tunnel itself is fine.

              I think we have now resolved the problem. Again, we have no idea why rekeying was disabled on the P1s, but since enabling it the tunnels have been working faultlessly for just over 10 days.
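
              For anyone landing here with the same symptoms: the fix amounted to the generated strongSwan conn gaining rekey = yes. A sketch of the relevant lines only (the rest of the conn is unchanged from the configs quoted above):

```
conn con2000
        keyexchange = ikev2
        reauth = yes
        rekey = yes        # previously 'no'; re-enabled via the P1 settings
        ikelifetime = 28800s
        lifetime = 3600s
```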

              Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.