Netgate Discussion Forum

    FRR seeing IPsec tunnels disappearing

    General pfSense Questions
    29 Posts 5 Posters 1.6k Views
    • M
      michmoor LAYER 8 Rebel Alliance
      last edited by michmoor

      I have an IPsec VTI deployment with FRR (BGP) running.
      Everything works as expected, but I have noticed that all of my IPsec tunnels running BGP fail at exactly the same time.

      1. These are tunnels to diverse locations.
      2. Gateway monitoring is healthy, so I don't believe it's a local issue.
      3. As an experiment I switched one IPsec tunnel to WireGuard. When there is a routing flap, only the IPsec tunnels are impacted, so I now believe it is something specific to IPsec.

      Combing through the system logs, something stuck out:
      [VCGF0-X62M1][EC 100663301] INTERFACE_STATE: Cannot find IF ipsec5 in VRF 0
      [VCGF0-X62M1][EC 100663301] INTERFACE_STATE: Cannot find IF ipsec4 in VRF 0
      [VCGF0-X62M1][EC 100663301] INTERFACE_STATE: Cannot find IF ipsec3 in VRF 0
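To check whether every VTI interface is affected in the same event, the messages above can be reduced to a list of interface names. This is only a minimal sketch: the regular expression is written against the three lines quoted above, and the function name `missing_interfaces` is hypothetical, not part of FRR or pfSense.

```python
import re

# Matches the message format quoted above, e.g.
# "[VCGF0-X62M1][EC 100663301] INTERFACE_STATE: Cannot find IF ipsec5 in VRF 0"
MISSING_IF = re.compile(r"INTERFACE_STATE: Cannot find IF (\S+) in VRF (\d+)")


def missing_interfaces(lines):
    """Return the sorted, de-duplicated interface names reported as missing."""
    found = set()
    for line in lines:
        match = MISSING_IF.search(line)
        if match:
            found.add(match.group(1))
    return sorted(found)


sample = [
    "[VCGF0-X62M1][EC 100663301] INTERFACE_STATE: Cannot find IF ipsec5 in VRF 0",
    "[VCGF0-X62M1][EC 100663301] INTERFACE_STATE: Cannot find IF ipsec4 in VRF 0",
    "[VCGF0-X62M1][EC 100663301] INTERFACE_STATE: Cannot find IF ipsec3 in VRF 0",
]
print(missing_interfaces(sample))
```

Comparing that list against the configured VTI interfaces shows whether a single event removed all of them or only some.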

      There is no regular timing that I can see, which I would expect if a scheduled clean-up job were responsible.
      9124541a-608b-47bc-bbeb-6775e4d62be2-image.png

      I suspect that IPsec (charon) is re-applying the IPsec configuration, removing and re-adding the interfaces, as this looks closely related to Redmine 14483. I would need additional help troubleshooting it. The difference is that the Redmine issue is reproducible by making ANY change in IPsec (e.g. updating a description), which causes a traffic outage for all IPsec tunnels, whereas this requires no administrator involvement, so I'm not sure how to reproduce it.

      If so, it makes sense why Wireguard tunnels with FRR are never impacted.

      edit: The IPsec implementation is deeply broken on some level, and I'm surprised it was never noticed until I opened the Redmine. These types of issues shouldn't be possible, and it is scary that today's issue requires no change to the IPsec configuration to trigger the outage.

      Firewall: NetGate,Palo Alto-VM,Juniper SRX
      Routing: Juniper, Arista, Cisco
      Switching: Juniper, Arista, Cisco
      Wireless: Unifi, Aruba IAP
      JNCIP,CCNP Enterprise

      • stephenw10S
        stephenw10 Netgate Administrator
        last edited by

        Nothing is logged at that time in the IPSec or System logs?

        If the interfaces really do appear to go AWOL to FRR I'd expect to see something. And for the interfaces to be removed entirely it would have to be something significant, like IPSec restarting.

        • M
          michmoor LAYER 8 Rebel Alliance @stephenw10
          last edited by

          @stephenw10
          Nothing is sticking out in the IPsec logs. What should I change in the logging to see more? There are a lot of options, but I'm not sure which ones would be relevant.

          • stephenw10S
            stephenw10 Netgate Administrator
            last edited by

            Anything that would cause the interface to disappear should be shown by the default IPSec log settings.

            I'd expect it to be in the system log too. Nothing else in the routing log either? Just FRR suddenly unable to find the interfaces?

            • M
              michmoor LAYER 8 Rebel Alliance @stephenw10
              last edited by

              @stephenw10

              IPsec acknowledges that the interface went away.

              4af1fa54-5c14-4726-bae2-a892ebbd328f-image.png

              It's even saying that it disappeared.

              da7957c7-b257-454f-8867-e6a3dc4659b7-image.png

              Deactivated and Disappeared are red flags to me.

              • M
                michmoor LAYER 8 Rebel Alliance @michmoor
                last edited by michmoor

                @stephenw10

                I think I found the restart event and its cause:
                ipsecdns

                a22ee80f-60d7-4a15-8d39-c2ac243e2771-image.png

                edit: Oh man... so I do have a few IPsec tunnels that use the DNS name of the remote gateway instead of an IPv4 address. My theory is that the IP changes, the ipsecdns process picks it up, and it restarts all tunnels. I really hope that's not the case, but if so that's really bad.

                • stephenw10S
                  stephenw10 Netgate Administrator
                  last edited by

                  Mmm, indeed. IIRC there's something specific about the way FRR interacts with it there.

                  Looks like it may be related to this:
                  https://redmine.pfsense.org/issues/10503

                  Though in your case no gateway actually goes down?

                  • M
                    michmoor LAYER 8 Rebel Alliance @stephenw10
                    last edited by

                    @stephenw10 No gateways go down.

                    The incident happened again just now, and the good thing is that I now know what to look for.

                    7c582b6e-cab2-43c1-b1d6-a6a979528a6e-image.png

                    Is there a way to find out which gateway is changing its IP?

                    Also, should I open a Redmine?
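One way to find out which remote endpoint is changing its IP is to resolve each FQDN periodically and record when the answer set changes. The following is a rough sketch under stated assumptions: `resolve_ipv4` and `watch` are hypothetical helper names, the script uses whatever resolver the host it runs on is configured with, and it does not reproduce how pfSense itself decides that an endpoint has changed.

```python
import socket
import time


def resolve_ipv4(host):
    """Return the set of IPv4 addresses the system resolver gives for host."""
    infos = socket.getaddrinfo(host, None, socket.AF_INET)
    return {info[4][0] for info in infos}


def watch(hosts, resolver=resolve_ipv4, rounds=1, interval=0.0):
    """Resolve each host `rounds` times and return the observed changes.

    Each change is a tuple (round_index, host, old_addresses, new_addresses).
    """
    last = {}
    changes = []
    for n in range(rounds):
        for host in hosts:
            try:
                current = frozenset(resolver(host))
            except OSError:
                continue  # resolution failed; keep the previous answer
            previous = last.get(host)
            if previous is not None and current != previous:
                changes.append((n, host, sorted(previous), sorted(current)))
            last[host] = current
        if n + 1 < rounds:
            time.sleep(interval)
    return changes


# Example (performs real DNS lookups, so it is left commented out):
# for change in watch(["vpn.example.com"], rounds=12, interval=300):
#     print(change)
```

Because answers are compared as sets, a name that always returns the same records in a different order is not reported as a change; a name that hands out one record at a time from a larger pool is.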

                    • stephenw10S
                      stephenw10 Netgate Administrator
                      last edited by

                      Hmm, if it's actually a gateway I'd expect to see that logged there and in the gateways log.

                      If it's just a remote IPSec node that changed IP that's probably in the resolver log.

                      • M
                        michmoor LAYER 8 Rebel Alliance @stephenw10
                        last edited by

                        @stephenw10

                        The only entries in the gateway log are the following: packet loss, but nothing showing a complete loss. Considering that all VPN tunnels bounce, these messages make sense.

                        b1e8de8c-8332-457c-ba05-c9e040afd8c5-image.png

                        The resolver log shows nothing useful. I see pfSense checking the local cache for DNS, but I don't see any related errors.

                        • stephenw10S
                          stephenw10 Netgate Administrator
                          last edited by

                          Hmm, it does seem to have triggered something at 14:19:13 though. Was there anything in the system log leading up to that? I can just about see there was a newipsecdns call then.

                          • M
                            michmoor LAYER 8 Rebel Alliance @stephenw10
                            last edited by

                            @stephenw10 Yep a restart event

                            2e51a1dc-e6d5-4d16-875b-05b528046a90-image.png

                            • stephenw10S
                              stephenw10 Netgate Administrator
                              last edited by

                              Hmm, so in both cases the first thing logged is 'Restarting IPsec tunnels' ?

                              That would normally be triggered by something else. Were any tunnels being renewed at that point?

                              • M
                                michmoor LAYER 8 Rebel Alliance @stephenw10
                                last edited by

                                @stephenw10 That is correct, that is the first thing logged.

                                • stephenw10S
                                  stephenw10 Netgate Administrator
                                  last edited by

                                  Is it possible that coincided with the renew time for the tunnel using an FQDN remote endpoint?

                                  • M
                                    michmoor LAYER 8 Rebel Alliance @stephenw10
                                    last edited by michmoor

                                    @stephenw10

                                    I believe it does, for both incidents, even though the IP change and the restart are a few minutes apart, so it doesn't seem to happen right away.

                                    Incident one: the restart event was around 09:38

                                    ./pfblockerng/dns_reply.log:DNS-reply,Oct 7 09:32:25,resolver,A,A,300,vpn.server4u.in,127.0.0.1,124.123.66.69,IN
                                    ./pfblockerng/dns_reply.log:DNS-reply,Oct 7 09:37:25,resolver,A,A,300,vpn.server4u.in,127.0.0.1,103.127.188.125,IN
                                    ./pfblockerng/dns_reply.log:DNS-reply,Oct 7 09:42:41,resolver,A,A,300,vpn.server4u.in,127.0.0.1,124.123.66.69,IN
                                    

                                    Incident two: 14:18

                                    ./dns_reply.log:DNS-reply,Oct 7 14:14:03,resolver,A,A,300,vpn.networkzz.co.in,127.0.0.1,210.89.55.63,IN  <---
                                    ./dns_reply.log:DNS-reply,Oct 7 14:18:33,resolver,A,A,300,vpn.networkzz.co.in,127.0.0.1,202.88.209.151,IN
                                    

                                    I'm happy that we found something that is reproducible.
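The same comparison can be done over the whole log rather than by eye. The sketch below assumes every record looks like the excerpts above, with the queried name in the seventh comma-separated field and the answer in the ninth; that layout is inferred from those lines only, and `ip_changes` is a hypothetical helper name.

```python
def ip_changes(lines):
    """Return (timestamp, name, old_ip, new_ip) each time a name's answer changes."""
    last = {}
    out = []
    for line in lines:
        # Skip any grep-style "path:" prefix by starting at the record marker.
        start = line.find("DNS-reply,")
        if start < 0:
            continue
        fields = line[start:].split(",")
        if len(fields) < 10:
            continue  # not the layout assumed above
        stamp, name, answer = fields[1], fields[6], fields[8]
        previous = last.get(name)
        if previous is not None and previous != answer:
            out.append((stamp, name, previous, answer))
        last[name] = answer
    return out


sample = [
    "./pfblockerng/dns_reply.log:DNS-reply,Oct 7 09:32:25,resolver,A,A,300,vpn.server4u.in,127.0.0.1,124.123.66.69,IN",
    "./pfblockerng/dns_reply.log:DNS-reply,Oct 7 09:37:25,resolver,A,A,300,vpn.server4u.in,127.0.0.1,103.127.188.125,IN",
    "./pfblockerng/dns_reply.log:DNS-reply,Oct 7 09:42:41,resolver,A,A,300,vpn.server4u.in,127.0.0.1,124.123.66.69,IN",
    "./dns_reply.log:DNS-reply,Oct 7 14:14:03,resolver,A,A,300,vpn.networkzz.co.in,127.0.0.1,210.89.55.63,IN  <---",
    "./dns_reply.log:DNS-reply,Oct 7 14:18:33,resolver,A,A,300,vpn.networkzz.co.in,127.0.0.1,202.88.209.151,IN",
]
for change in ip_changes(sample):
    print(change)
```

Lining up the reported timestamps with the "Restarting IPsec tunnels" entries in the system log would show whether every restart follows an answer change.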

                                    • stephenw10S
                                      stephenw10 Netgate Administrator
                                      last edited by

                                      Hmm, OK so did those endpoints actually change? Are they FQDNs that resolve to several IPs?

                                      I'd guess there is some timeout there that has to add up over those 4 minutes.

                                      Either way I agree it should not affect all IPSec tunnels.

                                      • M
                                        michmoor LAYER 8 Rebel Alliance @stephenw10
                                        last edited by

                                        @stephenw10 said in FRR seeing IPsec tunnels disappearing:

                                        Hmm, OK so did those endpoints actually change? Are they FQDNs that resolve to several IPs?

                                        Yep, those endpoints do resolve to several IPs. I know that for sure about one of them because I remember setting it up recently.

                                        I did open a Redmine for it for tracking purposes. I don't think there is any workaround for this other than getting into the weeds of how IPsec is configured/built.

                                        • T
                                          tedquade @michmoor
                                          last edited by

                                          @michmoor "...getting into the weeds..."

                                          I love that!

                                          Ted

                                          • stephenw10S
                                            stephenw10 Netgate Administrator
                                            last edited by

                                            Hmm, do they all resolve to multiple IPs? Conversely, do you have any that resolve to only one IP and don't cause this?

                                            Like, is this being triggered because it's resolving to a different IP address every time, or just because it is re-resolving at all?

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.