IPSec Down after Upgrade to 2.3



  • @obrienmd:

    Same issue, pair of SG-8860s with CARP failover, dual IPSec tunnels to a Verizon Private Network with OpenBGPd required for their routing. Exact same errors, even changing both tunables:

    net.inet.raw.maxdgram="131072"
    net.inet.raw.recvspace="131072"

    It's not just those two. Add:

    net.raw.recvspace=65535
    net.raw.sendspace=65535



  • FWIW, still seeing this problem here.  Yesterday I updated to 2.3.1 and also set these:

    @cmb:

    net.inet.raw.maxdgram="131072"
    net.inet.raw.recvspace="131072"
    net.raw.recvspace=65535
    net.raw.sendspace=65535

    I just bumped those up higher hoping it would help, but at least for us neither the 2.3.1 update nor those specific values fixed it.  Does it matter if they're set at System > Advanced > System Tunables rather than in loader.conf.local?
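
Wherever the tunables are set, one way to see whether they took effect is to compare the running values with the intended ones. A minimal sketch, assuming a FreeBSD-style `sysctl -n`; the helper name `check_tunable` and its optional third argument are made up for illustration, and the value in the example comes from this thread:

```shell
#!/bin/sh
# check_tunable NAME WANT [GOT]
# Compares the running value of a sysctl against the value you intended.
# GOT is optional and only exists so the function can be exercised without
# touching the live system; by default it is read with `sysctl -n`.
check_tunable() {
    name=$1
    want=$2
    got=${3:-$(sysctl -n "$name" 2>/dev/null)}
    if [ "$got" = "$want" ]; then
        echo "$name ok ($got)"
    else
        echo "$name MISMATCH: running=${got:-unknown} wanted=$want"
        return 1
    fi
}

# When called with arguments, check that one tunable, e.g.:
#   sh check_tunable.sh net.inet.raw.maxdgram 131072
if [ $# -ge 2 ]; then
    check_tunable "$1" "$2"
fi
```

Running it once per tunable after a reboot shows whether the value set in the GUI is the one the kernel is actually using.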



  • We've now been up for over a week with these settings (set in System > Advanced > System Tunables):

    net.inet.raw.maxdgram 131072
    net.inet.raw.recvspace 1048576
    net.raw.recvspace 1048576
    net.raw.sendspace 1048576

    Edit:  up over 2 weeks now, still no problem



  • Hi, I'm new here and have a problem with my pfSense IPsec connection.

    The environment:
    Location A: pfSense 2.3.1_1
    Location B: pfSense 2.3.1_1

    Connected via site-to-site IPsec.

    I tried all the tips from this thread, unfortunately without success, including changing:

    net.inet.raw.maxdgram  131072
    net.inet.raw.recvspace  1048576
    net.raw.recvspace  1048576
    net.raw.sendspace  1048576

    Accessing site B via RMTC works without problems.
    However, when I send a print job from site B to site A, the connection drops and restarts.

    Does anybody have any idea?

    I'm a bit desperate.

    Thank you very much

    I forgot to say that it worked perfectly before I updated my pfSense …



  • Hi, it's me again. I tried using OpenVPN instead of IPsec.
    I have the same problem, and my pfSense reboots again after 2 minutes.

    Has anyone else seen this?



  • Hello everybody!

    I have read that thread but unfortunately I have the same issue. We use PfSense 2.3.1 with OpenBGPD+IPsec to Amazon AWS.

    We have set that:

    
    net.inet.raw.maxdgram="131072"
    net.inet.raw.recvspace="131072"
    net.raw.recvspace=65535
    net.raw.sendspace=65535
    
    

    Our IPsec tunnels disconnect every couple of hours. When I check the IPsec status it looks OK, but I cannot pass any packets. I don't have to reboot the firewalls; I only stop OpenBGPD and IPsec, then start them again, and everything works for the next couple of hours.

    Do you have any idea what else I can check? I haven't tried the fix from GitHub yet. Do you think that could be it?

    Thank you for any help or answer.

    Best,
    Kamyk



  • I run a couple of pfsense boxes to link my house to a few neighbors (so hardly mission critical).
    Since the upgrade to 2.3, then 2.3.1, then 2.3.1p1, my IPsec tunnels haven't worked.
    I don't run OpenBGP (at least I don't think I do) and I tried applying the System Tunables that jnorell suggested.
    I also tried purging all my VPN configurations, and recreating them. Still no love :(
    What's odd (at least to me), is that all the tunnels come up in the web interface, but they don't pass traffic.
    It's not the end of the world, as I moved to OpenVPN in the interim, however I'd prefer to get back to IPsec.
    Thanks in advance



  • @Kamyk:

    I have read that thread but unfortunately I have the same issue. We use PfSense 2.3.1 with OpenBGPD+IPsec to Amazon AWS.

    Known issue: https://redmine.pfsense.org/issues/6223



  • @olobley:

    Since the upgrade to 2.3, then 2.3.1, then 2.3.1p1, my IPsec tunnels haven't worked.

    What's odd (at least to me), is that all the tunnels come up in the web interface, but they don't pass traffic.

    It sounds like you have a different problem (try enabling Cisco extensions in the IPsec advanced settings). This one is indicated by 'No buffer space available' errors in the logs.
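
To tell the two problems apart, it can help to count how often that error actually shows up. A small sketch; the log path and the use of `clog` to read it are assumptions about a stock 2.3 install, and `count_pfkey_errors` is just an illustrative name:

```shell
#!/bin/sh
# Count occurrences of the PF_KEY buffer error in log text read from stdin.
count_pfkey_errors() {
    # grep -c prints 0 (and exits non-zero) when nothing matches,
    # so mask the exit status to keep callers using `set -e` happy.
    grep -c 'error sending to PF_KEY socket: No buffer space available' || true
}

# Assumed usage on the firewall (path and clog are assumptions):
#   clog /var/log/ipsec.log | count_pfkey_errors
```

A count of zero with tunnels that are up but pass no traffic would point at a different problem than the one in this thread.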



  • Today, after almost 29 days uptime, we're getting 'error sending to PF_KEY socket: No buffer space available' again .. I'm bumping settings up some more:

    
    net.inet.raw.maxdgram = 131072
    net.inet.raw.recvspace = 1048576
    net.raw.recvspace = 1048576
    net.raw.sendspace = 2097152
    
    


  • Hello, new to this forum. Just throwing my hat in the ring for this issue as well. Plagued by "error sending to PF_KEY socket: No buffer space available".
    I'm using three IPsec tunnels. One to AWS (with BGP), one to Azure, one to a mikrotik router at a remote office.

    Is there a way to effectively restart IPsec and flush that buffer without rebooting?
    Restarting the service via the GUI, or manually killing charon and starter and restarting ipsec via terminal does not do it.

    EDIT: Of course I should mention this problem started happening after upgrading from 2.2.(6?) to 2.3.1_1
    I have increased
    net.inet.raw.maxdgram
    net.inet.raw.recvspace
    net.raw.recvspace
    net.raw.sendspace

    to recommended values, but have not rebooted since. I will reboot late tonight.



  • Same issue with upgrade to 2.3.1_5, any idea if this will be resolved in 2.3.2 or 2.4.x (FreeBSD 11, right?)



  • Not something we're going to have time for in 2.3.2 (release next week), hopefully it's either resolved already in FreeBSD 11, so 2.4 will be fine, or someone can track down the root cause and get it fixed (my last day here is in two weeks).

    2.4 snapshots should be out soon. Help testing then would be appreciated.



  • I faced the same problem when trying to establish a new site-to-site IPsec VPN between two branches, with pfSense on firmware 2.2.6. I solved it by setting,
    in the phase 1 proposal (authentication), the real IP of my peer, as it was behind NAT:

    My identifier ===== choose IP address ======= then put your real IP address

    For the peer identifier, you should put the private IP of the other side, if they do the same:

    Peer identifier ======== IP address ========= then put your private IP address



  • @cmb:

    Not something we're going to have time for in 2.3.2 (release next week), hopefully it's either resolved already in FreeBSD 11, so 2.4 will be fine, or someone can track down the root cause and get it fixed (my last day here is in two weeks).

    2.4 snapshots should be out soon. Help testing then would be appreciated.

    Sure thing, will test as soon as 2.4 snapshots are available! Good luck on your next adventure, and thanks for all the hard work on pfSense :)



  • Is anybody aware of any progress on this? Bumping the buffer sizes only extends the issue from a few hours to about two days but that is it.

    Also is there any news regarding the root cause? I am struggling to understand the interaction between IPSec (+GRE) and OpenBGPd. Surely the same would happen with any TCP-based application, or is it something that OpenBGPd specifically repeatedly calls on the sockets that causes IPSec to eventually die?

    I run a number of tunnels with IPSec + GRE + BGP (pfSense to pfSense and pfSense to Cisco) and since 2.1 they were never really stable. All the way up to 2.3 I had to monitor the GRE tunnels and bounce them after any IPSec re-key or tunnel flap because OpenBGPd was seeing them as invalid next hops. This went away in 2.3, but now IPSec is basically unusable. Doesn't matter why, it makes for an incomplete product. Nobody really runs static routing over non-trivial topologies, and with non-functional BGP, IPSec is only usable for mobile clients. I'm going to give BIRD a try - and migrating the whole network to OSPF is not really an option here, although I will consider it.

    Failing that, after many years with pfSense I am going to start looking for alternatives. pfSense is a fantastic platform, and thanks for all the hard work guys, but constant IPSec issues have just about killed it for me.



  • …OK, some progress.

    Looking up the PF_KEY rcvbuf error led me to a change and a setting introduced in strongSwan 5.3.0 that allows the event socket buffer to be tuned.

    Once all IPSec tunnels were dead, I stopped ipsec, stopped openbgpd, then I opened /etc/inc/vpn.inc, searched for the charon { plugins { section and added the following:

    ....
                    kernel-pfkey {
                            events_buffer_size = 1048576
                    }

    Started ipsec via GUI which re-generated the configs, started openbgpd. Guess what - tunnels came back up, I can see SADs and SPDs again, and some of the BGP sessions are up again (those to Cisco, funny enough). I have now rebooted all pfSense instances and will see how long they will last.
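
Since the GUI regenerates the config from vpn.inc, it may be worth confirming that the setting really landed in the generated file. A rough sketch; the default path below is an assumption about where pfSense writes the generated strongswan.conf, and `has_events_buffer` is an illustrative name:

```shell
#!/bin/sh
# has_events_buffer FILE
# Succeeds if FILE sets events_buffer_size (commented-out lines are ignored).
has_events_buffer() {
    grep -Eq '^[[:space:]]*events_buffer_size[[:space:]]*=' "$1"
}

# The generated config location below is an assumption; pass another path
# as the first argument if yours differs.
conf=${1:-/var/etc/ipsec/strongswan.conf}
if [ -r "$conf" ]; then
    if has_events_buffer "$conf"; then
        echo "events_buffer_size is present in $conf"
    else
        echo "events_buffer_size is missing from $conf"
    fi
fi
```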

    Thanks,
    owczi



  • @owczi:

    …OK, some progress.

    ...

    Started ipsec via GUI which re-generated the configs, started openbgpd. Guess what - tunnels came back up, I can see SADs and SPDs again, and some of the BGP sessions are up again (those to Cisco, funny enough). I have now rebooted all pfSense instances and will see how long they will last.

    Thanks,
    owczi

    Still up?



  • @obrienmd:

    Still up?

    Nope - shat itself after about 24 hours. HOWEVER, I don't have to reboot to get the tunnels and BGP sessions back up. The setting I added to charon config may not have anything to do with it. I will keep trying various combinations to get a sensible answer: on some of the pfSense instances I did not have to restart IPSec at all, only bgpd, but it could have been that they had BGP down because of the other peers, and an ipsec restart is still required. I have no time to investigate right now.

    Basically:

    /usr/local/etc/rc.d/bgpd.sh stop; ipsec stop; sleep 1; ipsec start; sleep 2; /usr/local/etc/rc.d/bgpd.sh start

    
    I need to write a monitoring script that will do this when all tunnels go down. For now I will just make it a cron job every few hours, maybe even every hour, offset so it doesn't happen on all instances at the same time. This will at least keep me going.
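
One possible way to get that offset without hand-picking a minute per box is to derive it from the hostname. This is only a sketch: `cron_minute_offset` is a made-up helper, and using `cksum` as the hash is an arbitrary choice, not something from this thread:

```shell
#!/bin/sh
# cron_minute_offset HOSTNAME [PERIOD]
# Maps a hostname to a stable number in 0..PERIOD-1 (default 60), so each
# box can run its restart job at a different minute of the hour.
cron_minute_offset() {
    period=${2:-60}
    # cksum prints "<checksum> <byte count>"; keep only the checksum.
    sum=$(printf '%s' "$1" | cksum | cut -d ' ' -f 1)
    echo $(( sum % period ))
}

# Print the offset for this machine.
cron_minute_offset "$(uname -n)"
```

The printed number can then go into the minute field of the cron entry on each instance. Two hosts can still land on the same minute, so it is worth checking the values once.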


  • EDIT: full path for ipsec - required when invoked from cron; do not reset ipsec / bgpd if there are no connections.
    EDIT2: fixed to correctly pick up connections when nothing is up and check for buffer errors
    Crude as can be, but will do the job… I run this every 5 minutes via a cron job:

    
    #!/bin/sh
    estabcount=0
    p2count=0
    totalcount=0
    buferr=0
    
    bounceall() {
    /usr/local/etc/rc.d/bgpd.sh stop
    sleep 1
    $ipsecpath stop
    sleep 1
    $ipsecpath start
    sleep 3
    /usr/local/etc/rc.d/bgpd.sh start
    }
    
    ipsecpath=/usr/local/sbin/ipsec
    
    echo "=== started at `date` ==="
    
    # list unique connection names: keep lines containing "[" and strip from the "[" onward
    for con in `$ipsecpath status | grep "\[" | sed 's/\[.*//g' | sort | uniq` ; do 
    echo $con
    estab=0
    p2=0
    
    $ipsecpath status $con | grep ESTAB >/dev/null 2>&1 && estab=1
    $ipsecpath status $con | grep INSTALLED >/dev/null 2>&1 && p2=1
    
    [ $estab -eq 1 ] && { 
    	echo $con p1 up
    	estabcount=$(( $estabcount + 1 ))
    	[ $p2 -eq 0 ] && {
    	 	echo $con p2 down, restarting
                    echo stopping $con...
    		$ipsecpath down $con >/dev/null 2>&1
    		sleep 1
                    echo starting $con...
    		$ipsecpath up $con | grep error | grep "buffer space" >/dev/null 2>&1  && { echo "PF_KEY buffer error while starting $con"; buferr=$(( $buferr + 1 )); }
    	}
    
    }
    [ $estab -eq 0 ] && { echo $con p1 down; }
    [ $p2 -eq 1 ] && { echo $con p2 up; p2count=$(( $p2count + 1 )); }
    totalcount=$(( $totalcount + 1 ))
    done
    
    echo
    echo ===
    echo estab $estabcount / $totalcount
    echo p2 $p2count / $totalcount
    echo buf_err $buferr / $totalcount
    echo ===
    echo
    
    [ $totalcount -gt 0 ] && [ $buferr -gt 0 ] && {
    echo $buferr connections show buffer space errors - bouncing openbgpd and ipsec
    bounceall
    exit
    }
    
    [ $totalcount -gt 0 ] && [ $estabcount -eq 0 ] && {
    echo no connections have p1 up - bouncing openbgpd and ipsec
    bounceall
    exit
    }
    
    [ $totalcount -gt 0 ] && [ $estabcount -eq $totalcount ] && [ $p2count -eq 0 ] && {
    echo all connections have p1 up but no connections have p2 up - bouncing openbgpd and ipsec
    bounceall
    exit
    }
    
    

    It will bounce all tunnels which have phase 2 down, and if no tunnels have p1 it will bounce ipsec and bgpd. We'll see how long this will last.
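
The end-of-run decision in the script above can be read as a single function of the four counters. The sketch below restates those three checks in isolation; `decide_action` is an illustrative name and not part of the original script:

```shell
#!/bin/sh
# decide_action TOTAL ESTAB P2 BUFERR
# Prints "bounce" when the whole ipsec + bgpd stack should be restarted,
# otherwise "none". Mirrors the three checks at the end of the script.
decide_action() {
    total=$1; estab=$2; p2=$3; buferr=$4
    if [ "$total" -eq 0 ]; then
        echo none            # no connections listed at all: nothing to judge
    elif [ "$buferr" -gt 0 ]; then
        echo bounce          # PF_KEY buffer errors while bringing a tunnel up
    elif [ "$estab" -eq 0 ]; then
        echo bounce          # no connection has phase 1 up
    elif [ "$estab" -eq "$total" ] && [ "$p2" -eq 0 ]; then
        echo bounce          # all phase 1 up, but no phase 2 anywhere
    else
        echo none
    fi
}

decide_action 3 3 2 0   # healthy enough: prints "none"
decide_action 3 0 0 0   # no phase 1 anywhere: prints "bounce"
```

Note that a mixed state (some phase 1 up, no phase 2) does not trigger a full bounce; in that case only the per-connection down/up inside the loop applies.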



  • How has this worked for you?



  • I have just edited the post with the latest version of the script. It didn't work when everything was down, and didn't correctly react to the situation where all P1 is up but no P2 is up. With these modifications, this has been working fine for me for the past two weeks or so. It still crashes, but at least it recovers now.

    So: The charon setting I added does not fix the problem, but allows you to recover by stopping bgpd, stopping ipsec, starting ipsec and then starting bgpd again - which did not seem possible before. Better than nothing…



  • Still going strong - up to 7 days of BGP session uptime, after which IPSec and BGP get restarted once charon starts showing the buffer space errors while attempting to restart tunnels. My setup has redundancy, so this is reliable; obviously a single connection will be impacted, albeit not for long.

    EDIT: Some two months later and I have pretty much forgotten about IPSec + OpenBGPD issues. Tunnels occasionally die, but with redundancy I have not experienced a single outage since putting my solution in place. BGP session duration varies, but it is quietly restored whenever the buffer problem rears its ugly head.



  • Has anyone tried to repro with 2.4 beta?



  • I believe I have the same / similar problem regarding strongswan + openbgpd for AWS VPC VPNs.

    I saw the discussion on kernel tunables and the link to Redmine bug #6223.

    I saw owczi's shell script. It sounds like an effective workaround but not an actual problem resolution.

    I have not yet implemented it, but will give it a close look.

    My status is as follows:

    First time pfSense user

    Installed on one appliance this week from pfSense-CE-memstick-serial-2.3.2-RELEASE-amd64 image

    Appliance has a static public IP on WAN and a static RFC1918 IP on LAN

    Stood up two strongswan tunnels to AWS VPC per their instructions http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpn-connections.html

    Installed openbgpd, configured a neighbor group with two neighbors for AWS routes; also configured a third neighbor on LAN segment for internal routing

    With only one tunnel enabled, service starts properly and BGP routes propagate.

    Second tunnel subsequently enabled (and changes applied): Tunnel comes up, remote endpoint pingable, BGP looks good

    With both tunnels enabled, service does not start properly. Remote endpoints not pingable, BGP routes not propagated

    Seems to go into the weeds when left alone. When I logged out yesterday both tunnels were working, and when I logged in today they were not.

    Google search on charon "error sending to PF_KEY socket: No buffer space available" log event led to this thread.

    Have already verified that the code fix, committed by Chris Buechler on Apr 14, is present in /etc/inc/vpn.inc.

    If I am unable to stabilize this deployment via the documented workaround, I will be available to collaborate.

    Any time or assistance the community can offer will be greatly appreciated.

    Regards,



  • Looks like the pfsense OpenBGPD package has an update available that showed up within the past couple months. (0.11_7)
    Does anyone know if this resolves the PF_KEY buffer issue? I haven't tried it.



  • @mraymond:

    Looks like the pfsense OpenBGPD package has an update available that showed up within the past couple months. (0.11_7)
    Does anyone know if this resolves the PF_KEY buffer issue? I haven't tried it.

    I'm running OpenBGPD 0.11_9 and see the same problem. I haven't updated pfSense to 2.3.2_1 (still on 2.3), but adding

    kernel-pfkey {
        events_buffer_size = 1048576
    }

    to the charon plugins section in /etc/inc/vpn.inc

    lets me resume the tunnel by stopping and starting ipsec and openbgpd.

    With 2.4.0 (FreeBSD pfSense.localdomain 11.0-RELEASE-p7 FreeBSD 11.0-RELEASE-p7 #29 6317ca7fc42(RELENG_2_4): Sun Feb 12 08:31:05 CST 2017    root@buildbot2.netgate.com:/builder/ce/tmp/obj/builder/ce/tmp/FreeBSD-src/sys/pfSense  amd64)

    It's even worse: much more unstable, and openbgpd dies constantly. It doesn't require intervention, but the BGP sessions just die.

    169.254.41.77 = AWS BGP peer, reachable via IPSec

    Feb 12 17:21:46 bgpd 72880 neighbor 169.254.41.77 (VPC): state change Connect -> OpenSent, reason: Connection opened
    Feb 12 17:21:46 bgpd 72880 neighbor 169.254.41.77 (VPC): state change OpenSent -> OpenConfirm, reason: OPEN message received
    Feb 12 17:21:46 bgpd 72880 neighbor 169.254.41.77 (VPC): state change OpenConfirm -> Established, reason: KEEPALIVE message received
    Feb 12 17:21:47 bgpd 72601 nexthop 169.254.41.77 now valid: via XXX.XXX.XXX.XXX (my public IP)
    Feb 12 17:22:26 bgpd 72880 neighbor 169.254.41.77 (VPC): write error: Permission denied
    Feb 12 17:22:26 bgpd 72880 neighbor 169.254.41.77 (VPC): state change Established -> Idle, reason: Fatal error
    Feb 12 17:23:05 bgpd 72611 route decision engine exiting
    Feb 12 17:23:05 bgpd 72880 session engine exiting
    Feb 12 17:23:05 bgpd 72601 dispatch_imsg in main: pipe closed
    Feb 12 17:23:05 bgpd 72601 dispatch_imsg in main: pipe closed
    Feb 12 17:23:05 bgpd 72601 Lost child: session engine exited
    Feb 12 17:23:05 bgpd 72601 Lost child: route decision engine exited
    Feb 12 17:23:05 bgpd 72601 kernel routing table 0 (Loc-RIB) decoupled
    Feb 12 17:23:05 bgpd 72601 Terminating

    [2.4.0-BETA][admin@pfSense.localdomain]/usr/local/etc/rc.d: /usr/local/sbin/bgpd -d -f /var/etc/openbgpd/bgpd.conf -v
    startup
    rereading config
    route decision engine ready
    new ktable rdomain_0 for rtableid 0
    RDE reconfigured
    session engine ready
    listening on 169.254.41.78
    SE reconfigured
    neighbor 169.254.41.77 (VPC): state change None -> Idle, reason: None
    neighbor 169.254.41.77 (VPC): state change Idle -> Connect, reason: Start
    neighbor 169.254.41.77 (VPC): state change Connect -> OpenSent, reason: Connection opened
    neighbor 169.254.41.77 (VPC): state change OpenSent -> OpenConfirm, reason: OPEN message received
    neighbor 169.254.41.77 (VPC): state change OpenConfirm -> Established, reason: KEEPALIVE message received
    nexthop 169.254.41.77 now valid: via XXX.XXX.XXX.XXX
    Traffic stops here
    neighbor 169.254.41.77 (VPC): write error: Permission denied
    neighbor 169.254.41.77 (VPC): state change Established -> Idle, reason: Fatal error
    neighbor 169.254.41.77 (VPC): state change Idle -> Connect, reason: Start
    neighbor 169.254.41.77 (VPC): state change Connect -> OpenSent, reason: Connection opened
    neighbor 169.254.41.77 (VPC): state change OpenSent -> OpenConfirm, reason: OPEN message received
    neighbor 169.254.41.77 (VPC): state change OpenConfirm -> Established, reason: KEEPALIVE message received
    nexthop 169.254.41.77 now valid: via XXX.XXX.XXX.XXX
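
To put a number on how often the session flaps in output like this, the drops can be counted per neighbor. A small sketch that reads bgpd log text on stdin; `bgp_drops_by_neighbor` is an illustrative name, and the sample line is copied from the log above:

```shell
#!/bin/sh
# Count "Established -> Idle" transitions per neighbor in bgpd log text
# read from stdin. Output: one "neighbor count" pair per line.
bgp_drops_by_neighbor() {
    awk '/state change Established -> Idle/ {
        # the neighbor address is the field right after the word "neighbor"
        for (i = 1; i < NF; i++)
            if ($i == "neighbor") { drops[$(i + 1)]++; break }
    }
    END { for (n in drops) print n, drops[n] }'
}

# Example with one line from the log above; prints "169.254.41.77 1".
bgp_drops_by_neighbor <<'EOF'
Feb 12 17:22:26 bgpd 72880 neighbor 169.254.41.77 (VPC): state change Established -> Idle, reason: Fatal error
EOF
```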



  • Would the pfSense developers possibly consider moving to Quagga instead of OpenBGPd?

    It seems like Quagga with a proper pfSense GUI front-end would solve two problems:

    1. It would solve this current instability/conflict with strongSwan
    2. Quagga would allow for simultaneous use of OSPF & BGP

    From 50k foot perspective, this seems like the best choice for the platform.



  • Does this problem still exist in 2.3.4 p1?





  • owczi - did you ever find a more elegant solution to the issue, or are you still running this script via cron?



  • @timmzahn said in IPSec Down after Upgrade to 2.3:

    did you ever find a more elegant solution to the issue, or are you still running this script via cron?

    I know this topic is old, but since I found it via google I will post my solution.

    I replaced OpenBGPD with FRR BGP. I have been able to restore my IPsec tunnels to AWS and also use the BGP services on pfSense for my needs.

