IPSec Down after Upgrade to 2.3
-
Still going strong: up to 7 days of BGP session uptime, after which IPSec and BGP get restarted once charon starts showing the buffer space errors while attempting to restart tunnels. My setup has redundancy, so this is reliable; a single connection will obviously be impacted, albeit not for long.
EDIT: Some two months later and I have pretty much forgotten about IPSec + OpenBGPD issues. Tunnels occasionally die, but with redundancy I have not experienced a single outage since putting my solution in place. BGP session duration varies, but it is quietly restored whenever the buffer problem rears its ugly head.
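For anyone wanting to try the same approach, here is a minimal sketch of such a watchdog in Python. It is not the script used in this thread; the restart commands are placeholders (an assumption) and must be replaced with whatever restarts IPSec and OpenBGPD on your install. The caller is expected to pass only the log lines written since the previous check, otherwise every run would trigger a restart.

```python
import subprocess
from typing import Callable, Iterable, List, Sequence

# Error text as quoted from the charon log in this thread.
PF_KEY_ERROR = "error sending to PF_KEY socket: No buffer space available"

# ASSUMPTION: placeholder commands only. Substitute the commands that
# actually restart IPSec and OpenBGPD on your system.
RESTART_COMMANDS: Sequence[Sequence[str]] = (
    ("echo", "restart ipsec here"),
    ("echo", "restart openbgpd here"),
)


def needs_restart(log_lines: Iterable[str]) -> bool:
    """Return True if any log line contains the PF_KEY buffer error."""
    return any(PF_KEY_ERROR in line for line in log_lines)


def run_command(command: List[str]) -> None:
    """Default runner: execute one command and raise if it fails."""
    subprocess.run(command, check=True)


def run_watchdog(
    log_lines: Iterable[str],
    run: Callable[[List[str]], None] = run_command,
) -> bool:
    """Run the restart commands if the error is present.

    Returns True when a restart was triggered, False otherwise.
    """
    if not needs_restart(log_lines):
        return False
    for command in RESTART_COMMANDS:
        run(list(command))
    return True
```

Run from cron every few minutes against the new portion of the IPSec log, this restores the tunnels and the BGP session without manual intervention, at the cost of a short interruption on the affected tunnel.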
-
Has anyone tried to repro with 2.4 beta?
-
I believe I have the same / similar problem regarding strongswan + openbgpd for AWS VPC VPNs.
I saw the discussion on kernel tunables and the link to Redmine bug #6223.
I saw owczi's bash script. It sounds like an effective workaround, but not an actual resolution of the problem.
I have not yet implemented it, but will give it a close look.
My status is as follows:
First time pfSense user
Installed on one appliance this week from pfSense-CE-memstick-serial-2.3.2-RELEASE-amd64 image
Appliance has a static public IP on WAN and a static RFC1918 IP on LAN
Stood up two strongswan tunnels to AWS VPC per their instructions http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpn-connections.html
Installed openbgpd, configured a neighbor group with two neighbors for AWS routes; also configured a third neighbor on LAN segment for internal routing
With only one tunnel enabled, the service starts properly and BGP routes propagate.
Second tunnel subsequently enabled (and changes applied): Tunnel comes up, remote endpoint pingable, BGP looks good
With both tunnels enabled, service does not start properly. Remote endpoints not pingable, BGP routes not propagated
Seems to go into the weeds when left alone. When I logged out yesterday both tunnels were working, and when I logged in today they were not.
Google search on charon "error sending to PF_KEY socket: No buffer space available" log event led to this thread.
Have already verified that the code fix committed by Chris Buechler on Apr 14 is present in /etc/inc/vpn.inc.
If I am unable to stabilize this deployment via the documented workaround, I will be available to collaborate.
Any time or assistance the community can offer will be greatly appreciated.
Regards,
-
Looks like the pfsense OpenBGPD package has an update available that showed up within the past couple months. (0.11_7)
Does anyone know if this resolves the PF_KEY buffer issue? I haven't tried it.
-
I am running 0.11_9 of OpenBGPD and see the same problem. I have not updated pfSense to 2.3.2_1 (still on 2.3), but adding
kernel-pfkey {
    events_buffer_size = 1048576
}
to the charon plugins section in /etc/inc/vpn.inc at least lets me bring the tunnel back by stopping and starting ipsec and openbgpd.
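To confirm that the setting actually ended up in the configuration, something like the following Python sketch can help. It only inspects text that you pass in; which file to read is left to you, and the function name is made up for illustration.

```python
import re
from typing import Optional

# Matches a kernel-pfkey { ... } block and captures events_buffer_size.
_PFKEY_BLOCK = re.compile(
    r"kernel-pfkey\s*\{[^}]*?events_buffer_size\s*=\s*(\d+)",
    re.DOTALL,
)


def pfkey_events_buffer_size(config_text: str) -> Optional[int]:
    """Return the configured events_buffer_size, or None if it is absent."""
    match = _PFKEY_BLOCK.search(config_text)
    return int(match.group(1)) if match else None
```

A result of None means the kernel-pfkey block or the setting is missing from the text that was checked.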
With 2.4.0 (FreeBSD pfSense.localdomain 11.0-RELEASE-p7 FreeBSD 11.0-RELEASE-p7 #29 6317ca7fc42(RELENG_2_4): Sun Feb 12 08:31:05 CST 2017 root@buildbot2.netgate.com:/builder/ce/tmp/obj/builder/ce/tmp/FreeBSD-src/sys/pfSense amd64)
it's even worse: much more unstable, and openbgpd dies constantly. It doesn't require intervention, but the BGP sessions just die.
169.254.41.77 = AWS BGP peer, reachable via IPSec
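To pick the session drops out of bgpd output like the excerpts that follow, a small Python sketch along these lines may be useful. It assumes the "state change A -> B, reason: ..." line format seen in this post; the function name is made up for illustration.

```python
import re
from typing import Iterable, List, Tuple

# Matches bgpd lines such as:
#   neighbor 169.254.41.77 (VPC): state change Established -> Idle, reason: Fatal error
_STATE_CHANGE = re.compile(
    r"neighbor (\S+) \([^)]*\): state change (\S+) -> (\S+), reason: (.+)$"
)


def session_drops(log_lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (neighbor, reason) for every transition out of Established."""
    drops = []
    for line in log_lines:
        match = _STATE_CHANGE.search(line.rstrip())
        if match and match.group(2) == "Established":
            drops.append((match.group(1), match.group(4).strip()))
    return drops
```

Because it uses a search rather than a full-line match, it works both on syslog lines with a timestamp prefix and on the bare output of bgpd run in the foreground.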
Feb 12 17:21:46 bgpd 72880 neighbor 169.254.41.77 (VPC): state change Connect -> OpenSent, reason: Connection opened
Feb 12 17:21:46 bgpd 72880 neighbor 169.254.41.77 (VPC): state change OpenSent -> OpenConfirm, reason: OPEN message received
Feb 12 17:21:46 bgpd 72880 neighbor 169.254.41.77 (VPC): state change OpenConfirm -> Established, reason: KEEPALIVE message received
Feb 12 17:21:47 bgpd 72601 nexthop 169.254.41.77 now valid: via XXX.XXX.XXX.XXX (my public IP)
Feb 12 17:22:26 bgpd 72880 neighbor 169.254.41.77 (VPC): write error: Permission denied
Feb 12 17:22:26 bgpd 72880 neighbor 169.254.41.77 (VPC): state change Established -> Idle, reason: Fatal error
Feb 12 17:23:05 bgpd 72611 route decision engine exiting
Feb 12 17:23:05 bgpd 72880 session engine exiting
Feb 12 17:23:05 bgpd 72601 dispatch_imsg in main: pipe closed
Feb 12 17:23:05 bgpd 72601 dispatch_imsg in main: pipe closed
Feb 12 17:23:05 bgpd 72601 Lost child: session engine exited
Feb 12 17:23:05 bgpd 72601 Lost child: route decision engine exited
Feb 12 17:23:05 bgpd 72601 kernel routing table 0 (Loc-RIB) decoupled
Feb 12 17:23:05 bgpd 72601 Terminating
[2.4.0-BETA][admin@pfSense.localdomain]/usr/local/etc/rc.d: /usr/local/sbin/bgpd -d -f /var/etc/openbgpd/bgpd.conf -v
startup
rereading config
route decision engine ready
new ktable rdomain_0 for rtableid 0
RDE reconfigured
session engine ready
listening on 169.254.41.78
SE reconfigured
neighbor 169.254.41.77 (VPC): state change None -> Idle, reason: None
neighbor 169.254.41.77 (VPC): state change Idle -> Connect, reason: Start
neighbor 169.254.41.77 (VPC): state change Connect -> OpenSent, reason: Connection opened
neighbor 169.254.41.77 (VPC): state change OpenSent -> OpenConfirm, reason: OPEN message received
neighbor 169.254.41.77 (VPC): state change OpenConfirm -> Established, reason: KEEPALIVE message received
nexthop 169.254.41.77 now valid: via XXX.XXX.XXX.XXX
Traffic stops here
neighbor 169.254.41.77 (VPC): write error: Permission denied
neighbor 169.254.41.77 (VPC): state change Established -> Idle, reason: Fatal error
neighbor 169.254.41.77 (VPC): state change Idle -> Connect, reason: Start
neighbor 169.254.41.77 (VPC): state change Connect -> OpenSent, reason: Connection opened
neighbor 169.254.41.77 (VPC): state change OpenSent -> OpenConfirm, reason: OPEN message received
neighbor 169.254.41.77 (VPC): state change OpenConfirm -> Established, reason: KEEPALIVE message received
nexthop 169.254.41.77 now valid: via XXX.XXX.XXX.XXX
-
Would the pfSense developers possibly consider moving to Quagga instead of OpenBGPd?
It seems like Quagga with a proper pfSense GUI front-end would solve two problems:
- It would solve this current instability/conflict with strongSwan
- Quagga would allow for simultaneous use of OSPF & BGP
From 50k foot perspective, this seems like the best choice for the platform.
-
Does this problem still exist in 2.3.4 p1?
-
For me yes it does…
https://forum.pfsense.org/index.php?topic=134925
-
owczi - did you ever find a more elegant solution to the issue, or are you still running this script via cron?
-
@timmzahn said in IPSec Down after Upgrade to 2.3:
did you ever find a more elegant solution to the issue, or are you still running this script via cron?
I know this topic is old, but since I found it via google I will post my solution.
I replaced OpenBGP with FrrBGP. Since then I have been able to restore my IPSec tunnels to AWS and also use the BGP services on pfSense for my needs.