IPSec/IKEV2 error "trap not found, unable to acquire reqid"

geovaneg

Hi,

We have an IPsec/IKEv2 server running on pfSense 2.4.4-RELEASE-p3 (amd64).
The VPN server serves an average of 40 concurrent mobile clients.
Each phase 1 tunnel created has three phase 2 tunnels.
When the "reqid" variable reaches the value "16384", the "trap not found" error shown in the logs below occurs: users can still connect but cannot pass traffic over the VPN.
In my environment this value is reached approximately every four months.
To resolve the issue, I need to stop the VPN service and start it again for the variable to be reset.
Is this a StrongSwan bug or is it a configurable variable?

Aug 18 20:12:10 vpn2 charon: 02[KNL] creating acquire job for policy serverIP/32|/0 === clientIP/32|/0 with reqid {16384}
Aug 18 20:12:10 vpn2 charon: 13[CFG] trap not found, unable to acquire reqid 16384

Dec 11 11:34:34 vpn2 charon: 14[KNL] creating acquire job for policy serverIP/32|/0 === clientIP/32|/0 with reqid {16384}
Dec 11 11:34:34 vpn2 charon: 01[CFG] trap not found, unable to acquire reqid 16384
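
To keep an eye on the counter, the reqid can be pulled out of charon log lines like the ones above. This is a minimal Python sketch that assumes the exact log format shown in this post; the function name `max_reqid` and the regular expression are illustrative and not part of strongSwan or pfSense. It only sees reqids that appear in "with reqid {N}" lines.

```python
import re

# Matches the "with reqid {N}" part of the KNL acquire lines shown above.
REQID_RE = re.compile(r"with reqid \{(\d+)\}")


def max_reqid(lines):
    """Return the highest reqid found in the given log lines, or None."""
    found = [int(m.group(1)) for line in lines for m in REQID_RE.finditer(line)]
    return max(found) if found else None


sample = [
    "Aug 18 20:12:10 vpn2 charon: 02[KNL] creating acquire job for policy "
    "serverIP/32|/0 === clientIP/32|/0 with reqid {16384}",
    "Aug 18 20:12:10 vpn2 charon: 13[CFG] trap not found, unable to acquire reqid 16384",
]
print(max_reqid(sample))
```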

Konstanti

@geovaneg
This is a limitation of the FreeBSD kernel.

Strongswan developer response

That's because of IPSEC_MANUAL_REQID_MAX (0x3fff == 16383), which is a strangely low limit (at least for keying daemons like strongSwan that manage reqids themselves) since reqids are 32-bit numbers.

reqids are currently allocated sequentially using a static counter (source:src/libcharon/kernel/kernel_interface.c#L328). The code that allocates them does not know anything about the limit above (it doesn't even know or care that it runs on a FreeBSD kernel).
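
The mismatch described above can be illustrated with a small Python sketch. This is not the strongSwan code: it only models a counter that always increments and a range check against the 0x3fff limit. The names `SequentialReqidAllocator` and `kernel_accepts` are hypothetical, and the starting value of 1 is an assumption.

```python
IPSEC_MANUAL_REQID_MAX = 0x3FFF  # 16383, the FreeBSD limit quoted above


class SequentialReqidAllocator:
    """Hands out 1, 2, 3, ... and never reuses a value (illustration only)."""

    def __init__(self):
        self._next = 1  # assumed starting value for this sketch

    def allocate(self):
        reqid = self._next
        self._next += 1
        return reqid


def kernel_accepts(reqid):
    """Stand-in for a kernel-side range check; an assumption for this sketch."""
    return 0 < reqid <= IPSEC_MANUAL_REQID_MAX


alloc = SequentialReqidAllocator()
# Keep allocating until the first value falls outside the accepted range.
first_rejected = next(r for r in iter(alloc.allocate, None) if not kernel_accepts(r))
print(first_rejected)  # 16384
```

The first rejected value is 16384, which matches the reqid in the log lines at the top of the thread.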

geovaneg

@Konstanti,

Thanks for the reply.
Is there a workaround other than restarting the service?
What is the best solution? Do you think it is more effective to change the strongSwan code to reuse reqids, as suggested in this topic (https://wiki.strongswan.org/issues/2315), or to propose a change to the FreeBSD kernel code? Or both?
Do you know if there is any request in progress for the FreeBSD developers to raise the limit?

Geovane
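
For readers unfamiliar with the "reuse reqids" idea, here is a minimal Python sketch of the general technique: released IDs go back into a pool and are handed out again before the counter grows. It is not the change discussed in the strongSwan issue, and the class and method names are hypothetical.

```python
import heapq


class ReusingReqidAllocator:
    """Prefers the lowest released reqid; grows the counter only when none is free."""

    def __init__(self):
        self._next = 1   # assumed starting value for this sketch
        self._free = []  # min-heap of released reqids

    def allocate(self):
        if self._free:
            return heapq.heappop(self._free)
        reqid = self._next
        self._next += 1
        return reqid

    def release(self, reqid):
        heapq.heappush(self._free, reqid)


alloc = ReusingReqidAllocator()
a, b, c = alloc.allocate(), alloc.allocate(), alloc.allocate()  # 1, 2, 3
alloc.release(b)
reused = alloc.allocate()  # gets 2 back instead of 4
print(reused)
```

With this scheme the highest ID in use tracks the peak number of concurrent tunnels rather than the total number ever created.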

geovaneg

Hi,
Today, by reviewing the daily use of "reqid's" in our environment, I got a much more pessimistic perspective of when the 16384 value will be reached: approximately every 30 days.
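
The allocation rates implied by these two estimates can be worked out with simple arithmetic. A small Python sketch, assuming the 0x3fff limit quoted earlier in the thread and treating "four months" as roughly 120 days; the function name is illustrative:

```python
REQID_LIMIT = 0x3FFF  # 16383, the limit quoted earlier in the thread


def reqids_per_day(days_to_limit, limit=REQID_LIMIT):
    """Average number of reqids consumed per day if the limit is hit after N days."""
    return limit / days_to_limit


print(round(reqids_per_day(120)))  # 137, the earlier "four months" estimate
print(round(reqids_per_day(30)))   # 546, the revised "30 days" estimate
```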

geovaneg

@Konstanti

I thought it was important to report the case to the FreeBSD kernel development team in this post:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=242606

I already got an answer from crest@rlwinm.de :

"FreeBSD already contains a suitable allocator in "sys/kern/subr_unit.c"."

What do you think about it?

Thanks,

Geovane

Konstanti

@geovaneg

Hello
Alternatively, you can use cron to run the command ipsec restart every 20-30 days; then all the reqid counters will be reset.

geovaneg

@Konstanti

Hi,

Yes, last night, as a workaround, I put this line in cron to stop and start the service as often as necessary:

"/usr/local/sbin/pfSsh.php playback svc stop ipsec; /usr/local/sbin/pfSsh.php playback svc start ipsec"

Geovane

sblinov

@geovaneg

Currently I use cron with this line: /usr/local/sbin/ipsec restart. It works for me.
Are there any other solutions to fix this problem?

paraffin

It appears that strongSwan put out a patch today for this issue on the thread posted by @geovaneg. I guess they realised that the FreeBSD fix would take a long time to implement.

https://wiki.strongswan.org/issues/2315

How quickly would fixes like this be implemented in PFSense as I had this exact issue today?

jimp

@paraffin said in IPSec/IKEV2 error "trap not found, unable to acquire reqid":

How quickly would fixes like this be implemented in PFSense as I had this exact issue today?

It needs to make it into a strongSwan release first. And once it's in a strongSwan release, pfSense would need to include that new version in a release.

Given that it just went into strongSwan master two days ago, it's too late for 2.4.5. There is still a possibility it might be fixed for 2.5.0, though, but it all depends on when strongSwan puts out a release with it.

geovaneg

@jimp Due to the coronavirus pandemic and the migration of many workers to the remote work mode, the 16k limit can be reached more quickly and become an important limiter in this moment of crisis.
I think it would be important if we could work in an integrated way with the StrongSwan team and take the bug fix as soon as possible to the stable version of PFSense.

jimp

We can't just pull it in without vetting it first. Rushing is how things get broken worse.

strongSwan says they expect a new release shortly:

https://wiki.strongswan.org/projects/strongswan/roadmap

5.8.3
Due in 6 days (18.03.2020)

It's almost certainly too late for that to be in 2.4.5, but we have done OOB updates for some packages before.

marcquark

@jimp i think this thread might also describe the root cause of my problem over here https://forum.netgate.com/topic/149043/gateway-monitoring-gets-stuck-in-infinite-loop-when-using-multiple-vtis-on-sg-3100
I probably went down the wrong path trying to figure out where things break. Gateway Monitoring was apparently never to blame, it probably just accelerated the trigger. My problem was caused by this https://redmine.pfsense.org/issues/10176

FWIW my log messages always read "trap not found, unable to acquire reqid 0" when the problem arises

Anyway, a strongswan hotfix would be much, much appreciated. We use multiple Routed IPSec tunnels to interconnect Office spaces, Data Centers and AWS. These dying VPN tunnels are haunting us. Is there anything we from the community can do to help get this into a stable release?

/e i'm going to try and spin up a couple of VMs, maybe i can reliably reproduce the problem.

jimp

We'll need to be careful with it and test it on 2.5.0 first. Here's an example of why:

strongSwan 5.8.3 came out on March 25th, one day before pfSense 2.4.5. If we had rushed to include it in 2.4.5, we'd have people hitting issues that were discovered in strongSwan 5.8.3 which necessitated a new version, 5.8.4, released today.

So as with most other things, it will be included when it's deemed ready for inclusion.

sblinov

I have updated pfSense to the 2.5 dev release and I found some connection issues: IKEv2 IPsec mobile users can't connect. They use PSK authentication. I tried different settings and it was not successful for me.
@geovaneg @jimp please double check it

jimp

@sblinov said in IPSec/IKEV2 error "trap not found, unable to acquire reqid":

I have updated pfSense to the 2.5 dev release and I found some connection issues: IKEv2 IPsec mobile users can't connect. They use PSK authentication. I tried different settings and it was not successful for me.
@geovaneg @jimp please double check it

Start your own thread for that, it isn't related to the topic of this thread.