Netgate Discussion Forum

    IPSec Down after Upgrade to 2.3

    72 Posts 30 Posters 42.6k Views
      obrienmd

      @owczi:

      …OK, some progress.

      ...

      Started ipsec via GUI which re-generated the configs, started openbgpd. Guess what - tunnels came back up, I can see SADs and SPDs again, and some of the BGP sessions are up again (those to Cisco, funny enough). I have now rebooted all pfSense instances and will see how long they will last.

      Thanks,
      owczi

      Still up?

        owczi

        @obrienmd:

        Still up?

        Nope - shat itself after about 24 hours. HOWEVER, I don't have to reboot to get the tunnels and BGP sessions back up. The setting I added to charon config may not have anything to do with it. I will keep trying various combinations to get a sensible answer: on some of the pfSense instances I did not have to restart IPSec at all, only bgpd, but it could have been that they had BGP down because of the other peers, and an ipsec restart is still required. I have no time to investigate right now.

        Basically:
        ```
        /usr/local/etc/rc.d/bgpd.sh stop; ipsec stop; sleep 1; ipsec start; sleep 2; /usr/local/etc/rc.d/bgpd.sh start
        ```

        I need to write a monitoring script that will do this when all tunnels go down. For now I will just make it a cron job every few hours, maybe even every hour, offset so it doesn't happen on all instances at the same time. This will at least keep me going.
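
One rough way to get that offset is to derive the cron minute from the host name, so each instance lands on a different slot. This is only a sketch: the function name `cron_minute_for` and the script path in the printed crontab line are made up for illustration, and the line uses the `/etc/crontab` layout with a user field.

```shell
#!/bin/sh
# Hypothetical helper: map a host name to a stable minute (0-59) so that
# each pfSense instance runs its restart job at a different time.
cron_minute_for() {
    # cksum prints "<crc> <bytes>"; keep the CRC and reduce it modulo 60.
    printf '%s' "$1" | cksum | awk '{ print $1 % 60 }'
}

minute=$(cron_minute_for "$(uname -n)")
# Example hourly crontab line (the script path is an assumption):
echo "$minute * * * * root /root/bounce_ipsec_bgpd.sh"
```

The same host always gets the same minute, but two hosts can still collide, so with only a handful of instances it may be simpler to pick the minutes by hand.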
          owczi

          EDIT: full path for ipsec - required when invoked from cron; do not reset ipsec / bgpd if there are no connections.
          EDIT2: fixed to correctly pick up connections when nothing is up and check for buffer errors
          Crude as can be, but will do the job… I run this every 5 minutes via a cron job:

          
          ```
          #!/bin/sh
          # Full path to ipsec; required when invoked from cron.
          ipsecpath=/usr/local/sbin/ipsec

          estabcount=0
          p2count=0
          totalcount=0
          buferr=0

          # Stop bgpd, restart ipsec, then start bgpd again.
          bounceall() {
              /usr/local/etc/rc.d/bgpd.sh stop
              sleep 1
              $ipsecpath stop
              sleep 1
              $ipsecpath start
              sleep 3
              /usr/local/etc/rc.d/bgpd.sh start
          }

          echo "=== started at `date` ==="

          # Connection names come from the "ipsec status" lines that contain "[".
          for con in `$ipsecpath status | grep "\[" | sed 's/\[.*//g' | sort | uniq` ; do
              echo $con
              estab=0
              p2=0

              $ipsecpath status $con | grep ESTAB >/dev/null 2>&1 && estab=1
              $ipsecpath status $con | grep INSTALLED >/dev/null 2>&1 && p2=1

              [ $estab -eq 1 ] && {
                  echo $con p1 up
                  estabcount=$(( $estabcount + 1 ))
                  # Phase 1 is up but phase 2 is not: bounce just this tunnel.
                  [ $p2 -eq 0 ] && {
                      echo $con p2 down, restarting
                      echo stopping $con...
                      $ipsecpath down $con >/dev/null 2>&1
                      sleep 1
                      echo starting $con...
                      $ipsecpath up $con | grep error | grep "buffer space" >/dev/null 2>&1 && { echo "PF_KEY buffer error while starting $con"; buferr=$(( $buferr + 1 )); }
                  }
              }
              [ $estab -eq 0 ] && { echo $con p1 down; }
              [ $p2 -eq 1 ] && { echo $con p2 up; p2count=$(( $p2count + 1 )); }
              totalcount=$(( $totalcount + 1 ))
          done

          echo
          echo ===
          echo estab $estabcount / $totalcount
          echo p2 $p2count / $totalcount
          echo buf_err $buferr / $totalcount
          echo ===
          echo

          # Any PF_KEY buffer error: restart everything.
          [ $totalcount -gt 0 ] && [ $buferr -gt 0 ] && {
              echo $buferr connections show buffer space errors - bouncing openbgpd and ipsec
              bounceall
              exit
          }

          # No connection has phase 1 up: restart everything.
          [ $totalcount -gt 0 ] && [ $estabcount -eq 0 ] && {
              echo no connections have p1 up - bouncing openbgpd and ipsec
              bounceall
              exit
          }

          # Every connection has phase 1 up but none has phase 2 up: restart everything.
          [ $totalcount -gt 0 ] && [ $estabcount -eq $totalcount ] && [ $p2count -eq 0 ] && {
              echo all connections have p1 up but no connections have p2 up - bouncing openbgpd and ipsec
              bounceall
              exit
          }
          ```
          
          

          It will bounce all tunnels which have phase 2 down, and if no tunnels have p1 it will bounce ipsec and bgpd. We'll see how long this will last.
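
For anyone who wants to check the full-restart conditions without touching a live box, here is a dry-run sketch of just that final decision. It takes the four counters as arguments and prints what would happen; the function name `decide` and the words it prints are illustrative and not part of the script.

```shell
#!/bin/sh
# Sketch of the script's final decision only; it prints the outcome
# instead of touching ipsec or bgpd. Arguments mirror the script's counters.
decide() {
    total=$1 estab=$2 p2=$3 buferr=$4
    if [ "$total" -eq 0 ]; then
        echo "nothing"            # no connections listed: leave services alone
    elif [ "$buferr" -gt 0 ]; then
        echo "bounce"             # PF_KEY buffer errors seen
    elif [ "$estab" -eq 0 ]; then
        echo "bounce"             # no phase 1 up anywhere
    elif [ "$estab" -eq "$total" ] && [ "$p2" -eq 0 ]; then
        echo "bounce"             # all phase 1 up, no phase 2 up
    else
        echo "nothing"            # partial failures are handled per tunnel
    fi
}

decide 4 4 3 0   # prints "nothing": one tunnel lacks phase 2, no full restart
decide 4 0 0 0   # prints "bounce": no phase 1 up at all
```

Note that a mixed state (some P1 up, some down, no buffer errors) never triggers the full restart; only the per-tunnel down/up applies there.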

            obrienmd

            How has this worked for you?

              owczi

              I have just edited the post with the latest version of the script. The earlier version didn't work when everything was down, and didn't react correctly when all P1 was up but no P2 was up. With these modifications, this has been working fine for me for the past two weeks or so. It still crashes, but at least it recovers now.

              So: The charon setting I added does not fix the problem, but allows you to recover by stopping bgpd, stopping ipsec, starting ipsec and then starting bgpd again - which did not seem possible before. Better than nothing…

                owczi

                Still going strong - up to 7 days of BGP session uptime, after which IPSec and BGP get restarted once charon starts showing the buffer space errors while attempting to restart tunnels. My setup has got redundancy so this is reliable; obviously a single connection will be impacted, albeit not for long.

                EDIT: Some two months later and I have pretty much forgotten about IPSec + OpenBGPD issues. Tunnels occasionally die, but with redundancy I have not experienced a single outage since putting my solution in place. BGP session duration varies, but it is quietly restored whenever the buffer problem rears its ugly head.

                  obrienmd

                  Has anyone tried to repro with 2.4 beta?

                    tmuxr

                    I believe I have the same / similar problem regarding strongswan + openbgpd for AWS VPC VPNs.

                    I saw the discussion on kernel tunables and the link to Redmine bug #6223.

                    I saw owczi's shell script. It sounds like an effective workaround but not an actual resolution of the problem.

                    I have not yet implemented it, but will give it a close look.

                    My status is as follows:

                    First time pfSense user

                    Installed on one appliance this week from pfSense-CE-memstick-serial-2.3.2-RELEASE-amd64 image

                    Appliance has a static public IP on WAN and a static RFC1918 IP on LAN

                    Stood up two strongswan tunnels to AWS VPC per their instructions http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpn-connections.html

                    Installed openbgpd, configured a neighbor group with two neighbors for AWS routes; also configured a third neighbor on LAN segment for internal routing

                    With only one tunnel enabled, service starts properly and BGP routes propagate.

                    Second tunnel subsequently enabled (and changes applied): Tunnel comes up, remote endpoint pingable, BGP looks good

                    With both tunnels enabled, service does not start properly. Remote endpoints not pingable, BGP routes not propagated

                    Seems to go into the weeds when left alone. When I logged out yesterday both tunnels were working, and when I logged in today they were not.

                    Google search on charon "error sending to PF_KEY socket: No buffer space available" log event led to this thread.

                    Have already verified that the code fix committed by Chris Buechler on Apr 14 is present in /etc/inc/vpn.inc.

                    If I am unable to stabilize this deployment via the documented workaround, I will be available to collaborate.

                    Any time or assistance the community can offer will be greatly appreciated.

                    Regards,

                      mraymond

                      Looks like the pfsense OpenBGPD package has an update available that showed up within the past couple months. (0.11_7)
                      Does anyone know if this resolves the PF_KEY buffer issue? I haven't tried it.

                        frgiaws

                        @mraymond:

                        Looks like the pfsense OpenBGPD package has an update available that showed up within the past couple months. (0.11_7)
                        Does anyone know if this resolves the PF_KEY buffer issue? I haven't tried it.

                        I'm running OpenBGPD 0.11_9 and see the same problem. I have not updated pfSense to 2.3.2_1 (still on 2.3), but adding

                        kernel-pfkey {
                            events_buffer_size = 1048576
                        }

                        to the charon plugins section in /etc/inc/vpn.inc lets me recover the tunnel by stopping and starting ipsec and openbgpd.
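
To confirm that the setting actually made it into the generated strongSwan config, something like the following could work. It is a sketch under assumptions: the function name `pfkey_buffer_size` is made up, and the path `/var/etc/ipsec/strongswan.conf` is my guess at where pfSense writes the generated file, so adjust it if yours differs.

```shell
#!/bin/sh
# Hypothetical check: print the value of events_buffer_size from a
# strongSwan config file, or nothing if the setting is absent.
pfkey_buffer_size() {
    # Print the number after "events_buffer_size =", first match only.
    sed -n 's/^[[:space:]]*events_buffer_size[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

# Assumed location of the generated config; not taken from this thread.
conf=/var/etc/ipsec/strongswan.conf
if [ -r "$conf" ]; then
    size=$(pfkey_buffer_size "$conf")
    echo "events_buffer_size: ${size:-not set}"
fi
```

This only matches on the key name, so it does not verify that the setting sits inside the `kernel-pfkey` block.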

                        With 2.4.0 (FreeBSD pfSense.localdomain 11.0-RELEASE-p7 FreeBSD 11.0-RELEASE-p7 #29 6317ca7fc42(RELENG_2_4): Sun Feb 12 08:31:05 CST 2017    root@buildbot2.netgate.com:/builder/ce/tmp/obj/builder/ce/tmp/FreeBSD-src/sys/pfSense  amd64) it's even worse: much more unstable, and openbgpd dies constantly. It doesn't require intervention, but BGP sessions just die.

                        169.254.41.77 = AWS BGP peer, reachable via IPSec

                        Feb 12 17:21:46 bgpd 72880 neighbor 169.254.41.77 (VPC): state change Connect -> OpenSent, reason: Connection opened
                        Feb 12 17:21:46 bgpd 72880 neighbor 169.254.41.77 (VPC): state change OpenSent -> OpenConfirm, reason: OPEN message received
                        Feb 12 17:21:46 bgpd 72880 neighbor 169.254.41.77 (VPC): state change OpenConfirm -> Established, reason: KEEPALIVE message received
                        Feb 12 17:21:47 bgpd 72601 nexthop 169.254.41.77 now valid: via XXX.XXX.XXX.XXX (my public IP)
                        Feb 12 17:22:26 bgpd 72880 neighbor 169.254.41.77 (VPC): write error: Permission denied
                        Feb 12 17:22:26 bgpd 72880 neighbor 169.254.41.77 (VPC): state change Established -> Idle, reason: Fatal error
                        Feb 12 17:23:05 bgpd 72611 route decision engine exiting
                        Feb 12 17:23:05 bgpd 72880 session engine exiting
                        Feb 12 17:23:05 bgpd 72601 dispatch_imsg in main: pipe closed
                        Feb 12 17:23:05 bgpd 72601 dispatch_imsg in main: pipe closed
                        Feb 12 17:23:05 bgpd 72601 Lost child: session engine exited
                        Feb 12 17:23:05 bgpd 72601 Lost child: route decision engine exited
                        Feb 12 17:23:05 bgpd 72601 kernel routing table 0 (Loc-RIB) decoupled
                        Feb 12 17:23:05 bgpd 72601 Terminating

                        [2.4.0-BETA][admin@pfSense.localdomain]/usr/local/etc/rc.d: /usr/local/sbin/bgpd -d -f /var/etc/openbgpd/bgpd.conf -v
                        startup
                        rereading config
                        route decision engine ready
                        new ktable rdomain_0 for rtableid 0
                        RDE reconfigured
                        session engine ready
                        listening on 169.254.41.78
                        SE reconfigured
                        neighbor 169.254.41.77 (VPC): state change None -> Idle, reason: None
                        neighbor 169.254.41.77 (VPC): state change Idle -> Connect, reason: Start
                        neighbor 169.254.41.77 (VPC): state change Connect -> OpenSent, reason: Connection opened
                        neighbor 169.254.41.77 (VPC): state change OpenSent -> OpenConfirm, reason: OPEN message received
                        neighbor 169.254.41.77 (VPC): state change OpenConfirm -> Established, reason: KEEPALIVE message received
                        nexthop 169.254.41.77 now valid: via XXX.XXX.XXX.XXX
                        Traffic stops here
                        neighbor 169.254.41.77 (VPC): write error: Permission denied
                        neighbor 169.254.41.77 (VPC): state change Established -> Idle, reason: Fatal error
                        neighbor 169.254.41.77 (VPC): state change Idle -> Connect, reason: Start
                        neighbor 169.254.41.77 (VPC): state change Connect -> OpenSent, reason: Connection opened
                        neighbor 169.254.41.77 (VPC): state change OpenSent -> OpenConfirm, reason: OPEN message received
                        neighbor 169.254.41.77 (VPC): state change OpenConfirm -> Established, reason: KEEPALIVE message received
                        nexthop 169.254.41.77 now valid: via XXX.XXX.XXX.XXX
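
To put a number on how often sessions drop like this, a small filter over the bgpd log text can count the Established -> Idle transitions per neighbor. This is a sketch: the function name `count_session_drops` is made up, and it reads log lines from stdin rather than assuming any particular log file location.

```shell
#!/bin/sh
# Hypothetical helper: count bgpd "Established -> Idle" transitions per
# neighbor in log text read from stdin. Output is "<neighbor> <count>".
count_session_drops() {
    grep 'state change Established -> Idle' |
        sed 's/.*neighbor \([^ ]*\) .*/\1/' |
        sort | uniq -c | awk '{ print $2, $1 }'
}

# Two lines taken from the log above; only the second is a session drop.
count_session_drops <<'EOF'
Feb 12 17:22:26 bgpd 72880 neighbor 169.254.41.77 (VPC): write error: Permission denied
Feb 12 17:22:26 bgpd 72880 neighbor 169.254.41.77 (VPC): state change Established -> Idle, reason: Fatal error
EOF
```

For the two sample lines this prints `169.254.41.77 1`.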

                          aarodynamics

                          Would the pfSense developers possibly consider moving to Quagga instead of OpenBGPD?

                          It seems like Quagga with a proper pfSense GUI front-end would solve two problems:

                          1. It would solve this current instability/conflict with strongSwan
                          2. Quagga would allow for simultaneous use of OSPF & BGP

                          From a 50,000-foot perspective, this seems like the best choice for the platform.

                            Ruddimaster

                            Does this problem still exist in 2.3.4 p1?

                              BEB Consulting

                              For me yes it does…

                              https://forum.pfsense.org/index.php?topic=134925

                                TimmZahn

                                owczi - did you ever find a more elegant solution to the issue, or are you still running this script via cron?

                                  cradulescu

                                  @timmzahn said in IPSec Down after Upgrade to 2.3:

                                  did you ever find a more elegant solution to the issue, or are you still running this script via cron?

                                  I know this topic is old, but since I found it via google I will post my solution.

                                  I replaced OpenBGPD with FRR BGP. That restored my IPsec tunnels to AWS, and I can also use the BGP services on pfSense for my needs.

                                  Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.