Netgate Discussion Forum
    • Categories
    • Recent
    • Tags
    • Popular
    • Users
    • Search
    • Register
    • Login

    Charon becoming unresponsive

    Scheduled Pinned Locked Moved IPsec
    37 Posts 6 Posters 6.9k Views
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • jimpJ
      jimp Rebel Alliance Developer Netgate
      last edited by

      I wouldn't put them up that high, or the log would be full of debug messages that are far too verbose to be helpful in most cases.

      IIRC the VICI log entries are all under "Configuration" though so that's a good place to focus.

      Remember: Upvote with the šŸ‘ button for any user/post you find to be helpful, informative, or deserving of recognition!

      Need help fast? Netgate Global Support!

      Do not Chat/PM for help!

      W 1 Reply Last reply Reply Quote 1
      • W
        wickeren @jimp
        last edited by wickeren

        @jimp

        So far not able to reproduce on current 2.6.0 snaphot, while a couple of days ago I could make it happen almost immediately after a reboot. Puzzled about that. Don’t see any recent changes in Redmine that could explain this difference in behaviour.

        Wonder if I should keep 2.6.0 running. Doing only some basic routing on this machine, besides quite a bunch of ipsec tunnels and all seems to be working well so far. The improvements in ipsec status only might make it worthwhile and I don’t see any open urgent or high impact issued that could hit me here.
        Try to reproduce again tomorrow on 2.6.0

        W 1 Reply Last reply Reply Quote 0
        • W
          wickeren @wickeren
          last edited by wickeren

          @jimp

          It has happened again this morning, on 2.6.0, after manually trying to bring up the only single connection that was down at 9:43:24

          Aug 23 09:43:24 pfSense4 charon[76421]: 10[IKE] <con10|35> sending DPD request
          Aug 23 09:43:24 pfSense4 charon[76421]: 10[ENC] <con10|35> generating INFORMATIONAL request 1084 [ ]
          Aug 23 09:43:24 pfSense4 charon[76421]: 10[NET] <con10|35> sending packet: from xxx.xxx.xxx.xxx[500] to xxx.xxx.xxx.xxx[500] (80 bytes)
          Aug 23 09:43:24 pfSense4 charon[76421]: 10[NET] <con21|42> received packet: from xxx.xxx.xxx.xxx[500] to xxx.xxx.xxx.xxx[500] (80 bytes)
          Aug 23 09:43:24 pfSense4 charon[76421]: 10[ENC] <con21|42> parsed INFORMATIONAL request 5554 [ ]
          Aug 23 09:43:24 pfSense4 charon[76421]: 10[ENC] <con21|42> generating INFORMATIONAL response 5554 [ ]
          Aug 23 09:43:24 pfSense4 charon[76421]: 10[NET] <con21|42> sending packet: from xxx.xxx.xxx.xxx[500] to xxx.xxx.xxx.xxx[500] (80 bytes)
          Aug 23 09:43:24 pfSense4 charon[76421]: 10[NET] <con10|35> received packet: from xxx.xxx.xxx.xxx[500] to xxx.xxx.xxx.xxx[500] (80 bytes)
          Aug 23 09:43:24 pfSense4 charon[76421]: 10[ENC] <con10|35> parsed INFORMATIONAL response 1084 [ ]
          Aug 23 09:43:24 pfSense4 charon[76421]: 13[IKE] <con26|48> sending DPD request
          Aug 23 09:43:24 pfSense4 charon[76421]: 13[ENC] <con26|48> generating INFORMATIONAL request 5576 [ ]
          Aug 23 09:43:24 pfSense4 charon[76421]: 13[NET] <con26|48> sending packet: from xxx.xxx.xxx.xxx[500] to xxx.xxx.xxx.xxx[500] (80 bytes)
          Aug 23 09:43:24 pfSense4 charon[76421]: 13[NET] <con26|48> received packet: from xxx.xxx.xxx.xxx[500] to xxx.xxx.xxx.xxx[500] (80 bytes)
          Aug 23 09:43:24 pfSense4 charon[76421]: 13[ENC] <con26|48> parsed INFORMATIONAL response 5576 [ ]
          Aug 23 09:43:24 pfSense4 charon[76421]: 07[CFG] vici client 411 connected
          Aug 23 09:43:24 pfSense4 charon[76421]: 11[CFG] vici client 411 registered for: list-sa
          Aug 23 09:43:24 pfSense4 charon[76421]: 07[CFG] vici client 411 requests: list-sas
          Aug 23 09:43:24 pfSense4 charon[76421]: 11[CFG] vici client 411 disconnected
          Aug 23 09:43:24 pfSense4 charon[76421]: 07[CFG] vici client 412 connected
          Aug 23 09:43:24 pfSense4 charon[76421]: 11[CFG] vici client 412 registered for: control-log
          Aug 23 09:43:24 pfSense4 charon[76421]: 14[CFG] vici client 412 requests: initiate
          Aug 23 09:43:24 pfSense4 charon[76421]: 14[CFG] vici initiate CHILD_SA 'con15_105'
          Aug 23 09:43:24 pfSense4 charon[76421]: 11[IKE] <con15|93> initiating IKE_SA con15[93] to xxx.xxx.xxx.xxx
          Aug 23 09:43:24 pfSense4 charon[76421]: 06[CFG] vici client 413 connected
          Aug 23 09:43:24 pfSense4 charon[76421]: 05[CFG] vici client 413 registered for: control-log
          Aug 23 09:43:24 pfSense4 charon[76421]: 11[CFG] <con15|93> configured proposals: IKE:AES_CBC_256/HMAC_SHA2_256_128/PRF_HMAC_SHA2_256/MODP_2048
          Aug 23 09:43:24 pfSense4 charon[76421]: 11[CFG] <con15|93> sending supported signature hash algorithms: sha256 sha384 sha512 identity
          Aug 23 09:43:24 pfSense4 charon[76421]: 11[ENC] <con15|93> generating IKE_SA_INIT request 0 [ SA KE No N(NATD_S_IP) N(NATD_D_IP) N(FRAG_SUP) N(HASH_ALG) N(REDIR_SUP) ]
          Aug 23 09:43:24 pfSense4 charon[76421]: 09[CFG] vici client 413 requests: initiate
          Aug 23 09:43:24 pfSense4 charon[76421]: 09[CFG] vici initiate CHILD_SA 'con15_104'
          Aug 23 09:43:24 pfSense4 charon[76421]: 11[NET] <con15|93> sending packet: from xxx.xxx.xxx.xxx[500] to xxx.xxx.xxx.xxx[500] (464 bytes)
          Aug 23 09:43:24 pfSense4 charon[76421]: 11[NET] <con15|93> received packet: from xxx.xxx.xxx.xxx[500] to xxx.xxx.xxx.xxx[500] (639 bytes)
          Aug 23 09:43:24 pfSense4 charon[76421]: 11[ENC] <con15|93> parsed IKE_SA_INIT response 0 [ SA KE No V V N(NATD_S_IP) N(NATD_D_IP) CERTREQ N(FRAG_SUP) V ]
          Aug 23 09:43:24 pfSense4 charon[76421]: 11[IKE] <con15|93> received Cisco Delete Reason vendor ID
          Aug 23 09:43:24 pfSense4 charon[76421]: 11[IKE] <con15|93> received Cisco Copyright (c) 2009 vendor ID
          Aug 23 09:43:24 pfSense4 charon[76421]: 11[IKE] <con15|93> received FRAGMENTATION vendor ID
          Aug 23 09:43:24 pfSense4 charon[76421]: 11[CFG] <con15|93> selecting proposal:
          Aug 23 09:43:24 pfSense4 charon[76421]: 11[CFG] <con15|93>   proposal matches
          Aug 23 09:43:24 pfSense4 charon[76421]: 11[CFG] <con15|93> received proposals: IKE:AES_CBC_256/HMAC_SHA2_256_128/PRF_HMAC_SHA2_256/MODP_2048
          Aug 23 09:43:24 pfSense4 charon[76421]: 11[CFG] <con15|93> configured proposals: IKE:AES_CBC_256/HMAC_SHA2_256_128/PRF_HMAC_SHA2_256/MODP_2048
          Aug 23 09:43:24 pfSense4 charon[76421]: 11[CFG] <con15|93> selected proposal: IKE:AES_CBC_256/HMAC_SHA2_256_128/PRF_HMAC_SHA2_256/MODP_2048
          Aug 23 09:43:24 pfSense4 charon[76421]: 11[IKE] <con15|93> received 3 cert requests for an unknown ca
          Aug 23 09:43:24 pfSense4 charon[76421]: 11[IKE] <con15|93> authentication of 'xxx.xxx.xxx.xxx' (myself) with pre-shared key
          Aug 23 09:43:24 pfSense4 charon[76421]: 11[CFG] <con15|93> proposing traffic selectors for us:
          Aug 23 09:43:24 pfSense4 charon[76421]: 11[CFG] <con15|93>  xxx.xxx.xxx.xxx/24|/0
          Aug 23 09:43:24 pfSense4 charon[76421]: 11[CFG] <con15|93> proposing traffic selectors for other:
          Aug 23 09:43:24 pfSense4 charon[76421]: 11[CFG] <con15|93>  xxx.xxx.xxx.xxx/24|/0
          Aug 23 09:43:24 pfSense4 charon[76421]: 11[CFG] <con15|93> configured proposals: ESP:AES_CBC_256/HMAC_SHA2_256_128/NO_EXT_SEQ
          Aug 23 09:43:24 pfSense4 charon[76421]: 11[IKE] <con15|93> establishing CHILD_SA con15_105{2977}
          Aug 23 09:43:24 pfSense4 charon[76421]: 11[ENC] <con15|93> generating IKE_AUTH request 1 [ IDi N(INIT_CONTACT) IDr AUTH N(ESP_TFC_PAD_N) SA TSi TSr N(EAP_ONLY) N(MSG_ID_SYN_SUP) ]
          Aug 23 09:43:24 pfSense4 charon[76421]: 11[NET] <con15|93> sending packet: from xxx.xxx.xxx.xxx[500] to xxx.xxx.xxx.xxx[500] (272 bytes)
          Aug 23 09:43:24 pfSense4 charon[76421]: 15[NET] <con15|93> received packet: from xxx.xxx.xxx.xxx[500] to xxx.xxx.xxx.xxx[500] (272 bytes)
          Aug 23 09:43:24 pfSense4 charon[76421]: 15[ENC] <con15|93> parsed IKE_AUTH response 1 [ V IDr AUTH SA TSi TSr N(ESP_TFC_PAD_N) N(NON_FIRST_FRAG) ]
          Aug 23 09:43:24 pfSense4 charon[76421]: 15[IKE] <con15|93> authentication of 'customer.domain.nl' with pre-shared key successful
          Aug 23 09:43:24 pfSense4 charon[76421]: 15[IKE] <con15|93> IKE_SA con15[93] established between xxx.xxx.xxx.xxx[xxx.xxx.xxx.xxx]...xxx.xxx.xxx.xxx[customer.domain.nl]
          Aug 23 09:43:24 pfSense4 charon[76421]: 15[IKE] <con15|93> scheduling rekeying in 74785s
          Aug 23 09:43:24 pfSense4 charon[76421]: 15[IKE] <con15|93> maximum IKE_SA lifetime 83425s
          Aug 23 09:43:24 pfSense4 charon[76421]: 15[IKE] <con15|93> received ESP_TFC_PADDING_NOT_SUPPORTED, not using ESPv3 TFC padding
          Aug 23 09:43:24 pfSense4 charon[76421]: 15[CFG] <con15|93> selecting proposal:
          Aug 23 09:43:24 pfSense4 charon[76421]: 15[CFG] <con15|93>   proposal matches
          Aug 23 09:43:24 pfSense4 charon[76421]: 15[CFG] <con15|93> received proposals: ESP:AES_CBC_256/HMAC_SHA2_256_128/NO_EXT_SEQ
          Aug 23 09:43:24 pfSense4 charon[76421]: 15[CFG] <con15|93> configured proposals: ESP:AES_CBC_256/HMAC_SHA2_256_128/MODP_2048/NO_EXT_SEQ
          Aug 23 09:43:24 pfSense4 charon[76421]: 15[CFG] <con15|93> selected proposal: ESP:AES_CBC_256/HMAC_SHA2_256_128/NO_EXT_SEQ
          Aug 23 09:43:24 pfSense4 charon[76421]: 15[CFG] <con15|93> selecting traffic selectors for us:
          Aug 23 09:43:24 pfSense4 charon[76421]: 15[CFG] <con15|93>  config: xxx.xxx.xxx.xxx/24|/0, received: xxx.xxx.xxx.xxx/24|/0 => match: xxx.xxx.xxx.xxx/24|/0
          Aug 23 09:43:24 pfSense4 charon[76421]: 15[CFG] <con15|93> selecting traffic selectors for other:
          Aug 23 09:43:24 pfSense4 charon[76421]: 15[CFG] <con15|93>  config: xxx.xxx.xxx.xxx/24|/0, received: xxx.xxx.xxx.xxx/24|/0 => match: xxx.xxx.xxx.xxx/24|/0
          Aug 23 09:43:24 pfSense4 charon[76421]: 15[IKE] <con15|93> CHILD_SA con15_105{2977} established with SPIs c2fe8f8f_i 1b28265f_o and TS xxx.xxx.xxx.xxx/24|/0 === xxx.xxx.xxx.xxx/24|/0
          Aug 23 09:43:24 pfSense4 charon[76421]: 15[CFG] <con15|93> proposing traffic selectors for us:
          Aug 23 09:43:24 pfSense4 charon[76421]: 15[CFG] <con15|93>  xxx.xxx.xxx.xxx/24|/0
          Aug 23 09:43:24 pfSense4 charon[76421]: 15[CFG] <con15|93> proposing traffic selectors for other:
          Aug 23 09:43:24 pfSense4 charon[76421]: 15[CFG] <con15|93>  xxx.xxx.xxx.xxx/24|/0
          Aug 23 09:43:24 pfSense4 charon[76421]: 15[CFG] <con15|93> configured proposals: ESP:AES_CBC_256/HMAC_SHA2_256_128/MODP_2048/NO_EXT_SEQ
          Aug 23 10:12:01 pfSense4 charon[99301]: 00[DMN] Starting IKE charon daemon (strongSwan 5.9.2, FreeBSD 12.2-STABLE, amd64)
          Aug 23 10:12:01 pfSense4 charon[99301]: 00[CFG] PKCS11 module '<name>' lacks library path
          Aug 23 10:12:01 pfSense4 charon[99301]: 00[KNL] unable to set UDP_ENCAP: Invalid argument
          Aug 23 10:12:01 pfSense4 charon[99301]: 00[NET] enabling UDP decapsulation for IPv6 on port 4500 failed
          Aug 23 10:12:01 pfSense4 charon[99301]: 00[CFG] loading unbound resolver config from '/etc/resolv.conf'
          Aug 23 10:12:01 pfSense4 charon[99301]: 00[CFG] loading unbound trust anchors from '/usr/local/etc/ipsec.d/dnssec.keys'
          Aug 23 10:12:01 pfSense4 charon[99301]: 00[CFG] ipseckey plugin is disabled
          Aug 23 10:12:01 pfSense4 charon[99301]: 00[CFG] loading ca certificates from '/usr/local/etc/ipsec.d/cacerts'
          Aug 23 10:12:01 pfSense4 charon[99301]: 00[CFG] loading aa certificates from '/usr/local/etc/ipsec.d/aacerts'
          Aug 23 10:12:01 pfSense4 charon[99301]: 00[CFG] loading ocsp signer certificates from '/usr/local/etc/ipsec.d/ocspcerts'
          Aug 23 10:12:01 pfSense4 charon[99301]: 00[CFG] loading attribute certificates from '/usr/local/etc/ipsec.d/acerts'
          Aug 23 10:12:01 pfSense4 charon[99301]: 00[CFG] loading crls from '/usr/local/etc/ipsec.d/crls'
          Aug 23 10:12:01 pfSense4 charon[99301]: 00[CFG] loading secrets from '/usr/local/etc/ipsec.secrets'
          Aug 23 10:12:01 pfSense4 charon[99301]: 00[CFG] opening triplet file /usr/local/etc/ipsec.d/triplets.dat failed: No such file or directory
          Aug 23 10:12:01 pfSense4 charon[99301]: 00[CFG] loaded 0 RADIUS server configurations
          Aug 23 10:12:01 pfSense4 charon[99301]: 00[LIB] loaded plugins: charon unbound pkcs11 aes des blowfish rc2 sha2 sha1 md4 md5 random nonce x509 revocation constraints pubkey pkcs1 pkcs7 pkcs8 pkcs12 pgp dnskey sshkey ipseckey pem openssl fips-prf curve25519 xcbc cmac hmac drbg curl attr kernel-pfkey kernel-pfroute resolve socket-default stroke vici updown eap-identity eap-sim eap-md5 eap-mschapv2 eap-dynamic eap-radius eap-tls eap-ttls eap-peap xauth-generic xauth-eap xauth-pam whitelist addrblock counters
          Aug 23 10:12:01 pfSense4 charon[99301]: 00[JOB] spawning 16 worker threads
          Aug 23 10:12:02 pfSense4 charon[99301]: 04[NET] <1> received packet: from xxx.xxx.xxx.xxx[500] to xxx.xxx.xxx.xxx[500] (894 bytes)
          Aug 23 10:12:02 pfSense4 charon[99301]: 04[ENC] <1> parsed IKE_SA_INIT request 0 [ SA KE No N(NATD_S_IP) N(NATD_D_IP) N(FRAG_SUP) N(HASH_ALG) N(REDIR_SUP) ]
          Aug 23 10:12:02 pfSense4 charon[99301]: 04[CFG] <1> looking for an IKEv2 config for xxx.xxx.xxx.xxx...xxx.xxx.xxx.xxx
          Aug 23 10:12:02 pfSense4 charon[99301]: 04[IKE] <1> no IKE config found for xxx.xxx.xxx.xxx...xxx.xxx.xxx.xxx, sending NO_PROPOSAL_CHOSEN
          Aug 23 10:12:02 pfSense4 charon[99301]: 04[ENC] <1> generating IKE_SA_INIT response 0 [ N(NO_PROP) ]
          Aug 23 10:12:02 pfSense4 charon[99301]: 04[NET] <1> sending packet: from xxx.xxx.xxx.xxx[500] to xxx.xxx.xxx.xxx[500] (36 bytes)
          Aug 23 10:12:02 pfSense4 charon[99301]: 15[CFG] vici client 1 connected
          Aug 23 10:12:02 pfSense4 charon[99301]: 15[CFG] vici client 1 requests: get-keys
          Aug 23 10:12:02 pfSense4 charon[99301]: 15[CFG] vici client 1 requests: get-shared
          Aug 23 10:12:02 pfSense4 charon[99301]: 15[CFG] vici client 1 requests: load-shared
          

          Scrolling back through the logs show no obvious reason or clues to me. It just looks like it completely stalls right in the middle of the P2 negotiation phase, without any previous warning or sign this is going to happen.
          I have brought the connections up and down numerous times during the weekend without being able to reproduce and now all of a sudden it happens again. The only thing I can say for sure right now that so far it always has been triggered by manually bringing up a connection from the status page.

          As all logging also stop it's quite hard to tell what is going on here. Could it be https://github.com/strongswan/strongswan/issues/268, which is a 5.9.2 issue? Any plan for 5.9.3, or is it not ported to FreeBSD yet?

          Julian

          1 Reply Last reply Reply Quote 0
          • jimpJ
            jimp Rebel Alliance Developer Netgate
            last edited by

            I see strongSwan 5.9.3 in FreeBSD ports but we haven't pulled it in yet. It is possible that bug is related though that may not be the same situation you are hitting.

            Is there anything odd about the negotiation in the remote logs when that happened?

            I don't see anything there that stands out as being a problem, but I know the swanctl commands to initiate and terminate usually end up staying active until the negotiation is complete, which may hold onto a VICI slot longer than expected.

            The status page should probably run it with --force or a -timeout value now that I'm looking at the code again.

            Remember: Upvote with the šŸ‘ button for any user/post you find to be helpful, informative, or deserving of recognition!

            Need help fast? Netgate Global Support!

            Do not Chat/PM for help!

            W 1 Reply Last reply Reply Quote 1
            • jimpJ
              jimp Rebel Alliance Developer Netgate
              last edited by

              https://redmine.pfsense.org/issues/12298

              https://github.com/pfsense/pfsense/commit/953aba88ede593dba2d05fefed879acce5dfde83

              Remember: Upvote with the šŸ‘ button for any user/post you find to be helpful, informative, or deserving of recognition!

              Need help fast? Netgate Global Support!

              Do not Chat/PM for help!

              1 Reply Last reply Reply Quote 1
              • W
                wickeren @jimp
                last edited by

                Is there anything odd about the negotiation in the remote logs when that happened?

                Not that I’m aware of, but I have to ask if they are willing to provide any logging if they have it, as I don’t manage the remote end. But as I have seen this happening on multiple PFsense boxes already with all kinds of connection with different brands on the other side, I doubt if this is coming from the other side or the logging from the other side will make it more clear.
                It's worth a try though.

                I don't see anything there that stands out as being a problem, but I know the swanctl commands to initiate and terminate usually end up staying active until the negotiation is complete, which may hold onto a VICI slot longer than expected.

                The status page should probably run it with --force or a -timeout value now that I'm looking at the code again.

                If that would prevent this situation from happening it would be great. Still wonder in wat state the service is. It is still running (unfortunately, otherwise service watchdog would have catched it) but is not responsive in any way. Do you think a --force or -timeout would prevent this state?

                Anything other for me to do now? More logging or higher log levels won’t be at any help, I think?

                1 Reply Last reply Reply Quote 0
                • jimpJ
                  jimp Rebel Alliance Developer Netgate
                  last edited by

                  The new version of strongSwan will be in builds maybe as early as tomorrow, along with the code change I just put in for the initiate timeout and forced terminate. So I'd keep tracking snapshots this week and see if you can still replicate it in a couple days.

                  You could try increasing the log level on the configuration section but it's still not clear what, if anything, is happening there that could be adding to the problem.

                  Remember: Upvote with the šŸ‘ button for any user/post you find to be helpful, informative, or deserving of recognition!

                  Need help fast? Netgate Global Support!

                  Do not Chat/PM for help!

                  W 1 Reply Last reply Reply Quote 1
                  • W
                    wickeren @jimp
                    last edited by wickeren

                    @jimp

                    Can I track progress on Redmine? Or is it not yet in there?
                    I see. Tnx a lot for your effort!

                    Best regards,

                    Julian

                    W 1 Reply Last reply Reply Quote 0
                    • W
                      wickeren @wickeren
                      last edited by

                      Is there a way to see how many VICI sockets are currently in use? Limit is 3, right?

                      1 Reply Last reply Reply Quote 0
                      • jimpJ
                        jimp Rebel Alliance Developer Netgate
                        last edited by

                        You can kind of get an idea by looking at:

                        : sockstat | grep -i vici
                        root     swanctl    60061 7  stream -> /var/run/charon.vici
                        

                        There will be some other entries from charon of its own but those are for serving the info not requesting.

                        You should also be able to check for a backlog of requests by looking at this:

                        : netstat -LAn | grep vici
                        unix  0/0/3                            /var/run/charon.vici
                        

                        Remember: Upvote with the šŸ‘ button for any user/post you find to be helpful, informative, or deserving of recognition!

                        Need help fast? Netgate Global Support!

                        Do not Chat/PM for help!

                        W 1 Reply Last reply Reply Quote 1
                        • W
                          wickeren @jimp
                          last edited by

                          @jimp

                          Thank you!

                          so 0/0/3 means 3 sockets available, 0 in use, 0 in queue, I guess.

                          I just now realise the Listen queue overflow messages in my start post actually might be the VICI connections. IIRC the listen queue is defined as ~ 1.5 * number of allowed connections, so the 5 already in queue awaiting acceptance makes sense now.

                          jimpJ 1 Reply Last reply Reply Quote 0
                          • jimpJ
                            jimp Rebel Alliance Developer Netgate @wickeren
                            last edited by

                            @wickeren said in Charon becoming unresponsive:

                            @jimp

                            Thank you!

                            so 0/0/3 means 3 sockets available, 0 in use, 0 in queue, I guess.

                                         -L      Show the size of the various listen queues.  The first
                                                 count shows the number of unaccepted connections, the
                                                 second count shows the amount of unaccepted incomplete
                                                 connections, and the third count is the maximum number of
                                                 queued connections.
                            

                            I just now realise the Listen queue overflow messages in my start post actually might be the VICI connections. IIRC the listen queue is defined as ~ 1.5 * number of allowed connections, so the 5 already in queue awaiting acceptance makes sense now.

                            Exactly, that's what started my line of questioning above.

                            Remember: Upvote with the šŸ‘ button for any user/post you find to be helpful, informative, or deserving of recognition!

                            Need help fast? Netgate Global Support!

                            Do not Chat/PM for help!

                            W 1 Reply Last reply Reply Quote 0
                            • W
                              wickeren @jimp
                              last edited by wickeren

                              @jimp

                              Have not been able to reproduce after using your fix in ipsec.inc, at least so far. In the meantime I'm using zabbix now to check the ipsec.log timestamp to see if logging has stopped and alarm me.
                              Is the upgrade to Strongswan 5.9.3 visible in Redmine or do I just have to check he nightly builds?

                              Best regards,

                              Julian

                              1 Reply Last reply Reply Quote 0
                              • jimpJ
                                jimp Rebel Alliance Developer Netgate
                                last edited by

                                It is there now. I checked a vm running 2.6.0.a.20210825.0500 and it has strongswan-5.9.3

                                Remember: Upvote with the šŸ‘ button for any user/post you find to be helpful, informative, or deserving of recognition!

                                Need help fast? Netgate Global Support!

                                Do not Chat/PM for help!

                                W 1 Reply Last reply Reply Quote 0
                                • W
                                  wickeren @jimp
                                  last edited by

                                  @jimp

                                  Great! Tnx a lot!

                                  Gonna try tonight and stress it a bit more to see if it's reproducible or not anymore.

                                  W 1 Reply Last reply Reply Quote 0
                                  • W
                                    wickeren @wickeren
                                    last edited by wickeren

                                    For some reason easy reproducible now, even with Strongswan 5.9.3 on latest snapshot.
                                    Fresh reboot, login, Ipsec status, bring some tunnels up and it just stops. Have to say it crashes on bringing up the exact same connection as last time. Might be a coincidence, but seeing others sides logs might be interesting. Still no answer from there though.
                                    Sometimes bringing up an down this particular connection is no problem at all.

                                    When it was this state I have run the advised commands:

                                    sockstat | grep -i vici

                                    
                                    root Ā  Ā  swanctl Ā  Ā 28047 7 Ā stream -> /var/run/charon.vici
                                    root Ā  Ā  swanctl Ā  Ā 19770 7 Ā stream -> /var/run/charon.vici
                                    root Ā  Ā  swanctl Ā  Ā 19222 7 Ā stream -> /var/run/charon.vici
                                    root Ā  Ā  php-fpm Ā  Ā 22436 14 stream -> /var/run/charon.vici
                                    root Ā  Ā  charon Ā  Ā  60508 22 stream /var/run/charon.vici
                                    root Ā  Ā  charon Ā  Ā  60508 25 stream /var/run/charon.vici
                                    root Ā  Ā  charon Ā  Ā  60508 26 stream /var/run/charon.vici
                                    root Ā  Ā  php-fpm Ā  Ā 328 Ā  14 stream -> /var/run/charon.vici
                                    ? Ā  Ā  Ā  Ā ? Ā  Ā  Ā  Ā  Ā ? Ā  Ā  ? Ā stream /var/run/charon.vici
                                    ? Ā  Ā  Ā  Ā ? Ā  Ā  Ā  Ā  Ā ? Ā  Ā  ? Ā stream /var/run/charon.vici
                                    ? Ā  Ā  Ā  Ā ? Ā  Ā  Ā  Ā  Ā ? Ā  Ā  ? Ā stream /var/run/charon.vici
                                    
                                    

                                    netstat -LAn | grep vici

                                    
                                    unix Ā 3/0/3 Ā  Ā  Ā  Ā  Ā  Ā  Ā  Ā  Ā  Ā  Ā  Ā  Ā  Ā /var/run/charon.vici
                                    
                                    

                                    Can provide complete IPSEC logs if needed, but it stalls at the same point in P2 initiate as last provided log snippet and then it all ends until rebooting.

                                    Any other thing I could do to collect some more info? I just can’t believe to be the only one being able to reproduce this, on even multiple boxes. It all are KVM VPS boxes if that might matter at all.

                                    Best regards,

                                    Julian

                                    1 Reply Last reply Reply Quote 0
                                    • jimpJ
                                      jimp Rebel Alliance Developer Netgate
                                      last edited by jimp

                                      I've got about 20 systems interconnected with IPsec tunnels that I setup when working on IPsec for 21.09/2.6.0 and thus far I haven't been able to reproduce it on any of them, no matter how much I click around on the status page. They are a mix of Plus and CE both on hardware and VMs (Proxmox/KVM and ESX)

                                      Only thing I can think of is maybe something is in the middle of making a request when IPsec is reloaded at boot time, but even when I try to do that manually I can't get it to break here.

                                      There must be some timing, environmental, or procedural component we haven't figured out yet that is contributing.

                                      Remember: Upvote with the šŸ‘ button for any user/post you find to be helpful, informative, or deserving of recognition!

                                      Need help fast? Netgate Global Support!

                                      Do not Chat/PM for help!

                                      W 1 Reply Last reply Reply Quote 0
                                      • W
                                        wickeren @jimp
                                        last edited by

                                        @jimp

                                        some more info:

                                        ipsec status shows this:

                                        con15_105{139}: ROUTED, TUNNEL, reqid 139
                                        con15_105{139}: xxx.xxx.xxx.0/24|/0 === yyy.yyy.yyy.0/24|/0
                                        con15_104{138}: ROUTED, TUNNEL, reqid 138
                                        con15_104{138}: xxx.xxx.xxx.0/24|/0 === yyy.yyy.yyy.0/24|/0

                                        However, con15 show disconnected on the status screen. Do I misunderstand the info from ipsec status?

                                        Trying to connect from status screen sometimes triggers the situation, especially shorly after boot.

                                        W 1 Reply Last reply Reply Quote 0
                                        • W
                                          wickeren @wickeren
                                          last edited by

                                          On Strongswan github they suggest:

                                          Configure logging, install debug symbols (or don't strip the binaries), and attach a debugger when it happens. Then provide stacktraces for all threads.
                                          Then we can say something.

                                          I have to say that I haven’t done this before, let alone on freebsd. I think I need to run pkg install pfSense-kernel-debug for debug symbols but I’m unsure what to do next to get the stacktraces.

                                          1 Reply Last reply Reply Quote 0
                                          • jimpJ
                                            jimp Rebel Alliance Developer Netgate
                                            last edited by

                                            Debugging is probably a bit beyond what you'd be able to do since it would generate tons and tons of data and it would be nearly impossible to sift through it all to tell what the interesting bits are.

                                            You could try truss -fp xxxxx where xxxxx is the PID of the charon process to see what happens at the time.

                                            Or truss -f swanctl <blah> when attempting a manual connect/disconnect to see if it shows anything useful.

                                            I kind of doubt swanctl would be helpful in this case since it's probably just blocked waiting on a request to vici to do something and never getting a response so it piles up.

                                            Remember: Upvote with the šŸ‘ button for any user/post you find to be helpful, informative, or deserving of recognition!

                                            Need help fast? Netgate Global Support!

                                            Do not Chat/PM for help!

                                            W 1 Reply Last reply Reply Quote 0
                                            • First post
                                              Last post
                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.