Netgate Discussion Forum
    IPSEC VTI tunnels lost packets

    26 Posts 5 Posters 3.6k Views
    • J
      Jose Carlos S
      last edited by

      Hi everybody,

      We have an installation with about 20 pfSense firewalls connected via IPsec, running with "routed" P2s, and we are migrating them to VTI-type P2s. We are finding that when we change a P2 from "routed" type to "VTI" type we see constant packet loss that varies widely, from 5% to 30%, and the loss is sustained over time.

      To narrow down the problem we have set up several probes: one monitors the public IP of the pfSense and the other monitors the internal IP of the VTI tunnel. The probe monitoring the public IP shows no packet loss at all, while the probe monitoring the internal tunnel IP loses packets. We attach a diagram and some screenshots:

      Captura de pantalla 2020-10-29 a las 14.52.43.png

      Gateways status:

      Captura de pantalla 2020-10-29 a las 14.54.23.png

      Note that the red boxes refer to the same link, monitored via the external IP versus through the VTI tunnel: 39% packet loss versus 0.0%.

      Graph across the IPsec VTI tunnel:

      Captura de pantalla 2020-10-29 a las 14.59.27.png

      External IP graph (WAN Monitoring):

      Captura de pantalla 2020-10-29 a las 15.00.18.png

      pfSense version: 2.4.5-RELEASE-p1 (amd64)
      Hardware: various (APU2D4, APU3D4, VMs, etc.)

      IPSEC P1:
      -IKE v2
      -PSK
      -AES-256-GCM
      -128 bits
      -SHA256
      -DH 2048 bits
      -Dead Peer Detection: Disabled

      IPSEC P2:
      -VTI
      -AES-256-GCM
      -128 bits
      -SHA256
      -DH 2048 bits

      Please note that the installation was previously working perfectly, without any packet loss, in "routed" mode; we simply changed it to "VTI" mode while keeping the same level of encryption and security, so we have ruled out any hardware or P1 IPsec configuration problem.

      Additional notes:
      - VTI interfaces configured as Netgate recommends.
      - Gateways at default configuration (monitoring enabled).
      - The packet loss is not the same between all firewalls; for example, pfsense-1 can lose 35% of packets to pfsense-2, around 10% to pfsense-3, and pfsense-3 loses 17% to pfsense-1... while there is no packet loss between the WANs.

      What could be happening?

      Thanks in advance.

      • P
        pete35
        last edited by

        You may try to set "MTU clamping" on all tunnels, and you can search for "ipsec mtu" in this forum for more hints. It is probably the most common problem with IPsec tunnels.
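
        A quick way to check whether MTU is the culprit (a sketch, assuming the stock FreeBSD ping from the pfSense shell and one of your tunnel addresses):

        # -D sets the Don't Fragment bit, -s the ICMP payload size.
        # 1472 bytes payload + 28 bytes ICMP/IP headers = a full 1500-byte packet.
        ping -D -s 1472 10.0.0.6
        # If that fails but a clearly smaller packet gets through, MTU is in play:
        ping -D -s 1300 10.0.0.6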


        • B
          bbrendon
          last edited by

          I'm a bit confused, because my pfSense shows "Routed (VTI)". There is no separate "routed" vs. "VTI" option.

          Also, VTI has been problematic since day 1. It is much better now, but we still have problems with it on occasion, where the only solution seems to be a reboot.

          Maybe check the Async Crypto setting and make sure it's unchecked?

          Sorry I know none of this is specific to your exact issue.

          • J
            Jose Carlos S @pete35
            last edited by

            @pete35 Thank you, but ICMP echo packets are normally only 64 bytes, far below any MTU limit, don't you agree? And an MTU problem should be deterministic: either it always works (the packet size is always the same) or it never works. The problem here is packet loss that is completely random.

            If I remember correctly, the packet size limit is a global setting in the IPsec configuration, so if it were misconfigured, a P2 in "Tunnel" mode should show the same problem.

            Thanks for your help.

            • J
              Jose Carlos S @bbrendon
              last edited by

              @bbrendon Yes, you are right; where I said "routed" I meant tunnel mode. I agree that VTI P2s have always given problems, but they offer many advantages, and we want them in order to enable OSPF. Honestly, it seems this area still needs work. For example, a pfSense with VTI tunnels takes a long time to restart, because during boot it configures the VTI interfaces before the rest of the interfaces, and consequently before IPsec itself, which is something I don't quite understand.

              We are going to disable async crypto and will report back.

              Thanks for your help.

              • P
                pete35
                last edited by

                It may be a dpinger problem. You may try disabling gateway monitoring, transferring some real data, and watching for transfer problems.
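
                For example, something like this would do it (a sketch, assuming the iperf3 package is installed on both firewalls; 10.0.0.6 stands for the tunnel peer):

                # On the remote side, start a server:
                iperf3 -s
                # On the local side, push UDP through the VTI and read the loss report:
                iperf3 -c 10.0.0.6 -u -b 50M -t 30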


                • J
                  Jose Carlos S @pete35
                  last edited by

                  @pete35 Manual ping inside the VTI tunnel (from the console):

                  PING 10.0.0.6 (10.0.0.6) from 10.0.0.5: 56 data bytes
                  64 bytes from 10.0.0.6: icmp_seq=0 ttl=64 time=8.693 ms
                  64 bytes from 10.0.0.6: icmp_seq=1 ttl=64 time=9.009 ms
                  64 bytes from 10.0.0.6: icmp_seq=2 ttl=64 time=8.533 ms
                  64 bytes from 10.0.0.6: icmp_seq=3 ttl=64 time=8.127 ms
                  64 bytes from 10.0.0.6: icmp_seq=4 ttl=64 time=8.372 ms
                  64 bytes from 10.0.0.6: icmp_seq=5 ttl=64 time=7.789 ms
                  64 bytes from 10.0.0.6: icmp_seq=6 ttl=64 time=9.185 ms
                  64 bytes from 10.0.0.6: icmp_seq=7 ttl=64 time=9.128 ms
                  64 bytes from 10.0.0.6: icmp_seq=8 ttl=64 time=8.564 ms
                  64 bytes from 10.0.0.6: icmp_seq=9 ttl=64 time=8.986 ms
                  64 bytes from 10.0.0.6: icmp_seq=10 ttl=64 time=8.397 ms
                  64 bytes from 10.0.0.6: icmp_seq=11 ttl=64 time=8.365 ms
                  64 bytes from 10.0.0.6: icmp_seq=12 ttl=64 time=8.565 ms
                  64 bytes from 10.0.0.6: icmp_seq=13 ttl=64 time=8.018 ms
                  64 bytes from 10.0.0.6: icmp_seq=14 ttl=64 time=8.314 ms
                  64 bytes from 10.0.0.6: icmp_seq=15 ttl=64 time=8.468 ms
                  64 bytes from 10.0.0.6: icmp_seq=16 ttl=64 time=7.918 ms
                  64 bytes from 10.0.0.6: icmp_seq=40 ttl=64 time=8.476 ms
                  64 bytes from 10.0.0.6: icmp_seq=42 ttl=64 time=8.745 ms
                  64 bytes from 10.0.0.6: icmp_seq=44 ttl=64 time=9.085 ms
                  64 bytes from 10.0.0.6: icmp_seq=45 ttl=64 time=8.446 ms
                  64 bytes from 10.0.0.6: icmp_seq=46 ttl=64 time=8.421 ms
                  64 bytes from 10.0.0.6: icmp_seq=47 ttl=64 time=7.987 ms
                  64 bytes from 10.0.0.6: icmp_seq=48 ttl=64 time=8.730 ms
                  64 bytes from 10.0.0.6: icmp_seq=49 ttl=64 time=8.659 ms
                  64 bytes from 10.0.0.6: icmp_seq=50 ttl=64 time=7.910 ms
                  64 bytes from 10.0.0.6: icmp_seq=51 ttl=64 time=8.086 ms
                  64 bytes from 10.0.0.6: icmp_seq=52 ttl=64 time=8.591 ms
                  64 bytes from 10.0.0.6: icmp_seq=53 ttl=64 time=8.262 ms
                  64 bytes from 10.0.0.6: icmp_seq=54 ttl=64 time=8.945 ms
                  64 bytes from 10.0.0.6: icmp_seq=55 ttl=64 time=9.079 ms
                  64 bytes from 10.0.0.6: icmp_seq=56 ttl=64 time=9.513 ms
                  64 bytes from 10.0.0.6: icmp_seq=57 ttl=64 time=9.241 ms
                  64 bytes from 10.0.0.6: icmp_seq=58 ttl=64 time=8.343 ms
                  64 bytes from 10.0.0.6: icmp_seq=65 ttl=64 time=8.575 ms
                  64 bytes from 10.0.0.6: icmp_seq=66 ttl=64 time=7.544 ms
                  64 bytes from 10.0.0.6: icmp_seq=67 ttl=64 time=7.647 ms
                  64 bytes from 10.0.0.6: icmp_seq=69 ttl=64 time=8.109 ms
                  64 bytes from 10.0.0.6: icmp_seq=70 ttl=64 time=8.974 ms
                  64 bytes from 10.0.0.6: icmp_seq=71 ttl=64 time=8.377 ms
                  64 bytes from 10.0.0.6: icmp_seq=72 ttl=64 time=8.489 ms
                  64 bytes from 10.0.0.6: icmp_seq=73 ttl=64 time=8.321 ms
                  64 bytes from 10.0.0.6: icmp_seq=74 ttl=64 time=8.157 ms
                  64 bytes from 10.0.0.6: icmp_seq=75 ttl=64 time=8.351 ms
                  64 bytes from 10.0.0.6: icmp_seq=76 ttl=64 time=7.531 ms

                  As you can see from the sequence numbers, packets are being lost.

                  But the latency of the packets just before and just after each gap is very similar, so I would say that rather than packets being lost on the path, they are being discarded...
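
                  One way we could check where they are discarded (a sketch; igb0 is only a placeholder for the WAN NIC) is to compare the enc0 pseudo-interface, which shows IPsec traffic before encryption and after decryption, against the ESP packets on the WAN:

                  # Decrypted/pre-encryption IPsec traffic as the kernel sees it:
                  tcpdump -ni enc0 icmp
                  # ESP packets actually arriving from the peer (protocol 50 = ESP):
                  tcpdump -ni igb0 ip proto 50
                  # If ESP arrives but the echo never shows up on enc0, the drop is
                  # local (decryption/filtering) rather than on the path.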

                  • P
                    pete35
                    last edited by

                    Have you checked these:

                    'Disable hardware checksum offload'
                    'Disable hardware TCP segmentation offload'
                    'Disable hardware large receive offload'

                    All of them should be checked.
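
                    You can also confirm on the NIC itself that the flags are really gone (igb0 is just an example interface name):

                    # The options field should no longer list TSO4/TSO6, LRO, RXCSUM or TXCSUM:
                    ifconfig igb0 | grep options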


                    • J
                      Jose Carlos S @pete35
                      last edited by

                      @pete35 Yes, everywhere

                      Captura de pantalla 2020-10-30 a las 17.08.26.png

                      • P
                        pete35
                        last edited by pete35

                        The packet loss in the ping log lasts about 14 seconds... You disabled DPD (try configuring it)... Is the tunnel being renewed during that time? Do you see multiple child SA entries for the tunnel at that moment?
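
                        You can watch that from the shell (a sketch, assuming the strongSwan stroke tooling shipped with pfSense 2.4.x):

                        # List IKE SAs and child SAs with their remaining lifetimes:
                        ipsec statusall
                        # Run it again a minute later: duplicate child SA entries for the
                        # same connection, or lifetimes that reset every few seconds,
                        # point to over-frequent rekeying.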


                        • J
                          Jose Carlos S @pete35
                          last edited by

                          @pete35 Hi, we have also enabled DPD and we have the same problem.

                          As you know, multiple child SAs are fairly "normal"; even with tunnel-type P2s we sometimes see multiple child SAs.

                          Captura de pantalla 2020-10-30 a las 18.01.38.png

                          But how can we fix this?

                          • J
                            Jose Carlos S @pete35
                            last edited by

                            @pete35 Timeouts:
                            P1: 28800 s
                            P2: 3600 s

                            • P
                              pete35
                              last edited by

                              It may be that this frequent rekeying (every few seconds) is causing the packet loss.

                              You need to reconfigure the rekey/reauth parameters to overcome that:

                              Side 1: IKEv2, Rekey configured, Reauth disabled, child SA close action set to restart/reconnect
                              Side 2: IKEv2, Rekey configured, Reauth disabled, responder only set, child SA close action left at default (clear)

                              look at this: https://redmine.pfsense.org/issues/10176
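
                              For reference, those GUI settings correspond to strongSwan options along these lines (a sketch in ipsec.conf syntax; the conn name and margintime value are illustrative, not necessarily what pfSense writes):

                              conn con1000
                                  ikelifetime=28800s   # your P1 lifetime
                                  lifetime=3600s       # your P2 lifetime
                                  margintime=540s      # start rekeying well before expiry
                                  rekey=yes            # Rekey configured
                                  reauth=no            # Reauth disabled
                                  closeaction=restart  # side 1; side 2 keeps the default (clear)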


                              • P
                                pete35
                                last edited by

                                You wrote that you want to implement OSPF over those VTI tunnels. Be aware that there are multiple issues with the OSPF integration in pfSense. If you have fewer than 50 routes, the effort to implement it isn't worth the outcome. And on top of that, it is no smooth sailing afterwards, because there are frequent interruptions in data traffic whenever tunnels are unstable, even on redundant routes.


                                • J
                                  Jose Carlos S @pete35
                                  last edited by

                                  @pete35 Hi Pete, I have applied the parameters you suggested and, for now, it keeps losing packets.

                                  Anyway, if it were a renegotiation problem it would happen every 3600 seconds, right? And our loss graph is practically constant, don't you think?

                                  Look at the timeline:

                                  Captura de pantalla 2020-10-30 a las 18.45.46.png

                                  • J
                                    Jose Carlos S @pete35
                                    last edited by

                                    @pete35 I agree; this is why I am chasing link stability first, because implementing OSPF over unstable links might be the closest thing to hell...

                                    • P
                                      pete35
                                      last edited by

                                      The timestamps on your multiple SAs show a reinstall/rekey every 20 seconds. 3600 seconds would be fine, but that is not what it is doing. Please check your SA situation again.
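
                                      You can confirm the rekey frequency in the logs as well (on 2.4.x the IPsec log is a circular clog file):

                                      # Watch CHILD_SA establish/rekey events live and note the timestamps:
                                      clog -f /var/log/ipsec.log | grep CHILD_SA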


                                      • P
                                        pete35
                                        last edited by

                                        About OSPF: it is not only unstable links that cause interruptions. Even simply adding or removing a route causes an interruption in the whole OSPF system; all pfSense routing devices will rebuild their routing tables on each restart, and not only the OSPF routes will disappear but also any other routes in the routing table. The interruption time depends on the settings, but around 30 seconds is quite usual. If your applications can survive that, you are lucky.
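
                                        If you go ahead anyway, the FRR package at least lets you throttle how aggressively SPF is recomputed (a sketch in FRR ospfd syntax; the values are examples, in milliseconds):

                                        router ospf
                                         timers throttle spf 200 1000 10000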


                                        • J
                                          Jose Carlos S @pete35
                                          last edited by

                                          @pete35 Perfect

                                          Captura de pantalla 2020-10-30 a las 19.35.16.png

                                          • J
                                            Jose Carlos S @pete35
                                            last edited by

                                            @pete35 Yes, this is an issue. To minimize these situations, our intention is to increase the number of areas and tune the dead intervals and other timers to keep the problems to a minimum.

                                            The main objective of the project is to offer high availability between two IPsec tunnels: we would like to have one IPsec tunnel on WAN1 and another (to the same destination) on WAN2. There are more reasons why we would like to implement dynamic routing, but that could be a separate topic.
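
                                            Something like this is what we have in mind (a hypothetical FRR ospfd sketch; interface names, costs and timers are only examples): prefer the WAN1 tunnel by cost and fail over quickly via short hello/dead intervals.

                                            interface ipsec1
                                             ip ospf cost 10
                                             ip ospf hello-interval 1
                                             ip ospf dead-interval 5
                                            interface ipsec2
                                             ip ospf cost 100
                                             ip ospf hello-interval 1
                                             ip ospf dead-interval 5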

                                            Do you recommend another solution?
