IPSec/IKEV2 error "trap not found, unable to acquire reqid"



  • Hi,

    We have an IPsec/IKEv2 server running on pfSense 2.4.4-RELEASE-p3 (amd64).
    The VPN server serves an average of 40 concurrent mobile clients.
    Each phase 1 tunnel created has three phase 2 tunnels.
    When the "reqid" value reaches 16384, the "trap not found" error shown in the logs below occurs: users can connect but cannot pass traffic over the VPN.
    In my environment this value is reached approximately every four months.
    To resolve the issue, I need to stop the VPN service and start it again for the variable to be reset.
    Is this a StrongSwan bug or is it a configurable variable?

    Aug 18 20:12:10 vpn2 charon: 02[KNL] creating acquire job for policy serverIP/32|/0 === clientIP/32|/0 with reqid {16384}
    Aug 18 20:12:10 vpn2 charon: 13[CFG] trap not found, unable to acquire reqid 16384

    Dec 11 11:34:34 vpn2 charon: 14[KNL] creating acquire job for policy serverIP/32|/0 === clientIP/32|/0 with reqid {16384}
    Dec 11 11:34:34 vpn2 charon: 01[CFG] trap not found, unable to acquire reqid 16384



  • @geovaneg
    This is a limitation of the FreeBSD kernel.

    strongSwan developer response:

    That's because of IPSEC_MANUAL_REQID_MAX (0x3fff == 16383), which is a strangely low limit (at least for keying daemons like strongSwan that manage reqids themselves), since reqids are 32-bit numbers.

    reqids are currently allocated sequentially using a static counter (source:src/libcharon/kernel/kernel_interface.c#L328). The code that allocates them does not know anything about the limit above (it doesn't even know or care that it runs on a FreeBSD kernel).
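
To make the failure mode described in the developer response concrete, here is a minimal Python sketch of a counter that hands out reqids sequentially with no knowledge of the kernel limit. It is only an illustration, not strongSwan or FreeBSD code: the class `SequentialReqidAllocator` and the function `kernel_accepts` are hypothetical names, and the model simply assumes that any reqid above IPSEC_MANUAL_REQID_MAX is unusable.

```python
IPSEC_MANUAL_REQID_MAX = 0x3FFF  # 16383, the FreeBSD limit quoted above


class SequentialReqidAllocator:
    """Hypothetical stand-in for a daemon that hands out reqids from a counter."""

    def __init__(self):
        self._counter = 0

    def allocate(self):
        # Always returns the next integer; released reqids are never reused.
        self._counter += 1
        return self._counter


def kernel_accepts(reqid):
    """Hypothetical check: assume reqids above the limit are unusable."""
    return 1 <= reqid <= IPSEC_MANUAL_REQID_MAX


def first_rejected_reqid():
    """Allocate until the assumed kernel check fails and return that reqid."""
    allocator = SequentialReqidAllocator()
    while True:
        reqid = allocator.allocate()
        if not kernel_accepts(reqid):
            return reqid


print(first_rejected_reqid())  # 16384, the value seen in the log lines
```

Under these assumptions the first unusable value is 16384, which matches the reqid in the "trap not found" messages.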



  • @Konstanti,

    Thanks for the reply.
    Is there a workaround other than restarting the service?
    What is the best solution? Would it be more effective to change the strongSwan code to reuse reqids, as suggested in this issue (https://wiki.strongswan.org/issues/2315), to change the FreeBSD kernel code, or both?
    Do you know if there is any request in progress asking the FreeBSD developers to raise the limit?

    Geovane



  • Hi,
    Today, after reviewing the daily reqid usage in our environment, I arrived at a much more pessimistic estimate of when the value 16384 will be reached: approximately every 30 days.
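
The estimate above is simple arithmetic on the 16383 limit. A rough Python sketch, assuming reqids are consumed at a steady daily rate (the function names are hypothetical, and the rate is whatever you measure in your own logs):

```python
IPSEC_MANUAL_REQID_MAX = 0x3FFF  # 16383


def days_until_exhaustion(reqids_per_day, already_used=0):
    """Days left before the counter passes the limit at a steady rate."""
    remaining = IPSEC_MANUAL_REQID_MAX - already_used
    return remaining / reqids_per_day


def implied_daily_rate(days):
    """Average reqids per day implied by exhausting the range in `days` days."""
    return IPSEC_MANUAL_REQID_MAX / days


# A 30-day cycle implies roughly this many new reqids per day.
print(round(implied_daily_rate(30)))  # 546
```

The same helper can be used to pick a restart interval: feed `days_until_exhaustion` the rate observed on your system and schedule the restart comfortably before that many days have passed.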



  • @Konstanti

    I thought it was important to report the case to the FreeBSD kernel development team in this post:

    https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=242606

    I already got an answer from crest@rlwinm.de :

    "FreeBSD already contains a suitable allocator in "sys/kern/subr_unit.c"."

    What do you think about it?

    Thanks,

    Geovane
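
For readers wondering what "reusing reqids" would change, here is a minimal Python sketch of an allocator that returns released IDs to a pool and hands out the lowest free one first. It only illustrates the idea; it is not the algorithm in sys/kern/subr_unit.c nor the strongSwan patch, and the class name `ReusingReqidAllocator` is hypothetical. The figure of 120 concurrent reqids is an assumption derived from the 40 clients with three phase 2 tunnels each mentioned at the top of the thread.

```python
import heapq
from collections import deque

IPSEC_MANUAL_REQID_MAX = 0x3FFF  # 16383


class ReusingReqidAllocator:
    """Hypothetical allocator that recycles released reqids, lowest first."""

    def __init__(self, limit=IPSEC_MANUAL_REQID_MAX):
        self._limit = limit
        self._next_fresh = 1   # next never-used reqid
        self._released = []    # min-heap of reqids that were handed back
        self._in_use = set()

    def allocate(self):
        if self._released:
            reqid = heapq.heappop(self._released)
        elif self._next_fresh <= self._limit:
            reqid = self._next_fresh
            self._next_fresh += 1
        else:
            raise RuntimeError("no free reqid below the limit")
        self._in_use.add(reqid)
        return reqid

    def release(self, reqid):
        if reqid not in self._in_use:
            raise ValueError(f"reqid {reqid} is not allocated")
        self._in_use.remove(reqid)
        heapq.heappush(self._released, reqid)


# Simulate 120 concurrent reqids (assumed: 40 clients x 3 phase 2 tunnels)
# with 100,000 disconnect/reconnect cycles.
allocator = ReusingReqidAllocator()
active = deque(allocator.allocate() for _ in range(120))
highest = max(active)
for _ in range(100_000):
    allocator.release(active.popleft())  # oldest tunnel goes away
    new_reqid = allocator.allocate()     # a new tunnel takes its place
    active.append(new_reqid)
    highest = max(highest, new_reqid)

print(highest)  # 120: the range in use tracks concurrency, not total history
```

With reuse, the highest reqid depends on how many tunnels are up at once rather than on how many have ever been created, so under these assumptions the limit would only matter at more than 16383 simultaneous reqids.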



  • @geovaneg

    Hello
    Alternatively, you can use cron to run the command ipsec restart every 20-30 days; all the reqid counters will then be reset.



  • @Konstanti

    Hi,

    Yes. Last night, as a workaround, I added this line to cron to stop and start the service as often as necessary:

    "/usr/local/sbin/pfSsh.php playback svc stop ipsec; /usr/local/sbin/pfSsh.php playback svc start ipsec"

    Geovane



  • @geovaneg

    Currently I use cron with this line: /usr/local/sbin/ipsec restart. It works for me.
    Are there any other solutions to fix this problem?



  • It appears that strongSwan put out a patch for this issue today, in the thread posted by @geovaneg. I guess they realised that the FreeBSD fix would take a long time to implement.

    https://wiki.strongswan.org/issues/2315

    How quickly would fixes like this be implemented in PFSense as I had this exact issue today?


  • Rebel Alliance Developer Netgate

    @paraffin said in IPSec/IKEV2 error "trap not found, unable to acquire reqid":

    How quickly would fixes like this be implemented in PFSense as I had this exact issue today?

    It needs to make it into a strongSwan release first. And once it's in a strongSwan release, pfSense would need to include that new version in a release.

    Given that it just went into strongSwan master two days ago, it's too late for 2.4.5. There is still a possibility it might be fixed for 2.5.0, though, but it all depends on when strongSwan puts out a release with it.



  • @jimp Due to the coronavirus pandemic and the migration of many workers to remote work, the 16k limit can be reached more quickly and become a serious constraint in this moment of crisis.
    I think it would be important to work closely with the strongSwan team and bring the bug fix into the stable version of pfSense as soon as possible.


  • Rebel Alliance Developer Netgate

    We can't just pull it in without vetting it first. Rushing is how things get broken worse.

    strongSwan says they expect a new release shortly:

    https://wiki.strongswan.org/projects/strongswan/roadmap

    5.8.3
    Due in 6 days (18.03.2020)

    It's almost certainly too late for that to be in 2.4.5, but we have done OOB updates for some packages before.



  • @jimp I think this thread might also describe the root cause of my problem over here: https://forum.netgate.com/topic/149043/gateway-monitoring-gets-stuck-in-infinite-loop-when-using-multiple-vtis-on-sg-3100
    I probably went down the wrong path trying to figure out where things break. Gateway Monitoring was apparently never to blame; it probably just accelerated the trigger.
    FWIW, my log messages always read "trap not found, unable to acquire reqid 0" when the problem arises.

    Anyway, a strongswan hotfix would be much, much appreciated. We use multiple Routed IPSec tunnels to interconnect Office spaces, Data Centers and AWS. These dying VPN tunnels are haunting us. Is there anything we from the community can do to help get this into a stable release?

    /e i'm going to try and spin up a couple of VMs, maybe i can reliably reproduce the problem.


  • Rebel Alliance Developer Netgate

    We'll need to be careful with it and test it on 2.5.0 first. Here's an example of why:

    strongSwan 5.8.3 came out on March 25th, one day before pfSense 2.4.5. If we had rushed to include it in 2.4.5, we'd have people hitting issues that were discovered in strongSwan 5.8.3 which necessitated a new version, 5.8.4, released today.

    So as with most other things, it will be included when it's deemed ready for inclusion.

