Charon becoming unresponsive
-
It has happened again.
What I did after login was just opening the status page and click connect on the one single connection that was down (or at least seemed to be down). Then the status page hung with "Collecting IPSEC status information". All IPSEC traffic stalled. The last message in the log was 14:13:19 16[CFG] vici initiate CHILD_SA 'con1500000' and then it all went silent untill a reboot of the box. The 8 min gap is obvious and never happens during normal operation, lots of logging all the time if all tunnels are connected.
Aug 17 14:13:18 pfSense4 charon[34687]: 07[ENC] <con21000|152315> parsed INFORMATIONAL request 549 [ ] Aug 17 14:13:18 pfSense4 charon[34687]: 07[ENC] <con21000|152315> generating INFORMATIONAL response 549 [ ] Aug 17 14:13:18 pfSense4 charon[34687]: 07[NET] <con21000|152315> sending packet: from xxx.xxx.xxx.xxx[500] to xxx.xxx.xxx.xxx[500] (80 bytes) Aug 17 14:13:19 pfSense4 charon[34687]: 09[CFG] vici terminate CHILD_SA 'con1500000' Aug 17 14:13:19 pfSense4 charon[34687]: 09[CFG] vici terminate CHILD_SA 'con1500000' Aug 17 14:13:19 pfSense4 charon[34687]: 09[CFG] vici initiate CHILD_SA 'con1500000' Aug 17 14:13:19 pfSense4 charon[34687]: 06[IKE] <con1500000|152321> initiating IKE_SA con1500000[152321] to xxx.xxx.xxx.xxx Aug 17 14:13:19 pfSense4 charon[34687]: 16[CFG] vici initiate CHILD_SA 'con1500000' Aug 17 14:21:42 pfSense4 charon[57854]: 00[DMN] Starting IKE charon daemon (strongSwan 5.9.2, FreeBSD 12.2-STABLE, amd64) Aug 17 14:21:42 pfSense4 charon[57854]: 00[CFG] PKCS11 module '<name>' lacks library path Aug 17 14:21:42 pfSense4 charon[57854]: 00[CFG] loaded PKCS#11 v2.20 library 'opensc' (/usr/local/lib/opensc-pkcs11.so) Aug 17 14:21:42 pfSense4 charon[57854]: 00[CFG] OpenSC Project: OpenSC smartcard framework v0.21 Aug 17 14:21:42 pfSense4 charon[57854]: 00[KNL] unable to set UDP_ENCAP: Invalid argument Aug 17 14:21:42 pfSense4 charon[57854]: 00[NET] enabling UDP decapsulation for IPv6 on port 4500 failed Aug 17 14:21:42 pfSense4 charon[57854]: 00[CFG] ipseckey plugin is disabled Aug 17 14:21:42 pfSense4 charon[57854]: 00[CFG] loading ca certificates from '/usr/local/etc/ipsec.d/cacerts' Aug 17 14:21:42 pfSense4 charon[57854]: 00[CFG] loading aa certificates from '/usr/local/etc/ipsec.d/aacerts' Aug 17 14:21:42 pfSense4 charon[57854]: 00[CFG] loading ocsp signer certificates from '/usr/local/etc/ipsec.d/ocspcerts' Aug 17 14:21:42 pfSense4 charon[57854]: 00[CFG] loading attribute certificates from '/usr/local/etc/ipsec.d/acerts' Aug 17 14:21:42 pfSense4 charon[57854]: 00[CFG] loading crls from '/usr/local/etc/ipsec.d/crls' Aug 17 14:21:42 pfSense4 charon[57854]: 00[CFG] loading secrets from '/usr/local/etc/ipsec.secrets' Aug 17 14:21:42 pfSense4 charon[57854]: 00[CFG] opening triplet file /usr/local/etc/ipsec.d/triplets.dat failed: No such file or directory Aug 17 14:21:42 pfSense4 charon[57854]: 00[CFG] loaded 0 RADIUS server configurations Aug 17 14:21:42 pfSense4 charon[57854]: 00[LIB] loaded plugins: charon unbound pkcs11 aes des blowfish rc2 sha2 sha1 md4 md5 random nonce x509 revocation constraints pubkey pkcs1 pkcs7 pkcs8 pkcs12 pgp dnskey sshkey ipseckey pem openssl fips-prf curve25519 xcbc cmac hmac drbg curl attr kernel-pfkey kernel-pfroute resolve socket-default stroke vici updown eap-identity eap-sim eap-md5 eap-mschapv2 eap-dynamic eap-radius eap-tls eap-ttls eap-peap xauth-generic xauth-eap xauth-pam whitelist addrblock counters Aug 17 14:21:42 pfSense4 charon[57854]: 00[JOB] spawning 16 worker threads
There was a core dump in root, but itโs not from this occurrence but an earlier one so I wonder if that is at any help.
-
Whatever happened must be before that point. It probably was stuck on the status there because the commands to terminate the child SA and initiate it again got stuck, consuming all the VICI sockets.
But it's not clear why those commands didn't succeed.
We've made numerous changes to IPsec on snapshots, though, so it's possible it's already fixed. But without being able to reproduce it reliably / on demand it's tough to say for certain without testing it in your environment.
-
I'll try to reproduce it again and see if I can find a pattern in it so I can do it reliably on demand.
Any suggestions to set some more log level in ipsec.log, in order to gather more info on this?
Best regards,
Julian
-
Actually quite easy to reproduce on latest 2.6.0 snapshot. Just click connect on one or more connections showing down on the status page and it all stops responding. Was able reproduce it a couple of times, even easier than on 2.5.2.
Only difference is the status page doesnโt keep spinning forever but just shows all connections as down. Might be just to the new logic introduced in the status page. Bye the way, looks very nice and is fast, if charon is running and still doing well
Log file a little different, it all stopped a bit later in process of bringing up the connection, but that might be just random.If there is anything I can do to help in this case Iโm more than willing to do so. At night a can test whatever I want without anyone noticing. Went back to 2.5.2 for now.
-
On 2.5.2 the status page and buttons were likely broken which may be why it was harder to do there. But that still doesn't explain why it is stopping.
On all my test systems when I click those buttons it does the intended action immediately. I've clicked and clicked and clicked all over the status page and widget on different tunnels and it always works for me.
Something is causing the VICI interface on charon to wedge on your environment but unfortunately it isn't clear what. There may be something a lot farther back in the logs.
-
@jimp
Strange, I've seen it on multiple boxes, but so far almost randomly.I was thinking about doing this at night:
- upgrade the same box as last time to 2.6.0 again.
- stop IPSEC service, clear logging and adjust IPSEC log level.
- start IPSEC so we have a fresh set of log files to start with
- try to reproduce the issue and look for clues in the log.
What log settings do you suggest? Demaon and Configuration backend both to Diag? Or even higher? Or other categories as well?
-
I wouldn't put them up that high, or the log would be full of debug messages that are far too verbose to be helpful in most cases.
IIRC the VICI log entries are all under "Configuration" though so that's a good place to focus.
-
So far not able to reproduce on current 2.6.0 snaphot, while a couple of days ago I could make it happen almost immediately after a reboot. Puzzled about that. Donโt see any recent changes in Redmine that could explain this difference in behaviour.
Wonder if I should keep 2.6.0 running. Doing only some basic routing on this machine, besides quite a bunch of ipsec tunnels and all seems to be working well so far. The improvements in ipsec status only might make it worthwhile and I donโt see any open urgent or high impact issued that could hit me here.
Try to reproduce again tomorrow on 2.6.0 -
It has happened again this morning, on 2.6.0, after manually trying to bring up the only single connection that was down at 9:43:24
Aug 23 09:43:24 pfSense4 charon[76421]: 10[IKE] <con10|35> sending DPD request Aug 23 09:43:24 pfSense4 charon[76421]: 10[ENC] <con10|35> generating INFORMATIONAL request 1084 [ ] Aug 23 09:43:24 pfSense4 charon[76421]: 10[NET] <con10|35> sending packet: from xxx.xxx.xxx.xxx[500] to xxx.xxx.xxx.xxx[500] (80 bytes) Aug 23 09:43:24 pfSense4 charon[76421]: 10[NET] <con21|42> received packet: from xxx.xxx.xxx.xxx[500] to xxx.xxx.xxx.xxx[500] (80 bytes) Aug 23 09:43:24 pfSense4 charon[76421]: 10[ENC] <con21|42> parsed INFORMATIONAL request 5554 [ ] Aug 23 09:43:24 pfSense4 charon[76421]: 10[ENC] <con21|42> generating INFORMATIONAL response 5554 [ ] Aug 23 09:43:24 pfSense4 charon[76421]: 10[NET] <con21|42> sending packet: from xxx.xxx.xxx.xxx[500] to xxx.xxx.xxx.xxx[500] (80 bytes) Aug 23 09:43:24 pfSense4 charon[76421]: 10[NET] <con10|35> received packet: from xxx.xxx.xxx.xxx[500] to xxx.xxx.xxx.xxx[500] (80 bytes) Aug 23 09:43:24 pfSense4 charon[76421]: 10[ENC] <con10|35> parsed INFORMATIONAL response 1084 [ ] Aug 23 09:43:24 pfSense4 charon[76421]: 13[IKE] <con26|48> sending DPD request Aug 23 09:43:24 pfSense4 charon[76421]: 13[ENC] <con26|48> generating INFORMATIONAL request 5576 [ ] Aug 23 09:43:24 pfSense4 charon[76421]: 13[NET] <con26|48> sending packet: from xxx.xxx.xxx.xxx[500] to xxx.xxx.xxx.xxx[500] (80 bytes) Aug 23 09:43:24 pfSense4 charon[76421]: 13[NET] <con26|48> received packet: from xxx.xxx.xxx.xxx[500] to xxx.xxx.xxx.xxx[500] (80 bytes) Aug 23 09:43:24 pfSense4 charon[76421]: 13[ENC] <con26|48> parsed INFORMATIONAL response 5576 [ ] Aug 23 09:43:24 pfSense4 charon[76421]: 07[CFG] vici client 411 connected Aug 23 09:43:24 pfSense4 charon[76421]: 11[CFG] vici client 411 registered for: list-sa Aug 23 09:43:24 pfSense4 charon[76421]: 07[CFG] vici client 411 requests: list-sas Aug 23 09:43:24 pfSense4 charon[76421]: 11[CFG] vici client 411 disconnected Aug 23 09:43:24 pfSense4 charon[76421]: 07[CFG] vici client 412 connected Aug 23 09:43:24 pfSense4 charon[76421]: 11[CFG] vici client 412 registered for: control-log Aug 23 09:43:24 pfSense4 charon[76421]: 14[CFG] vici client 412 requests: initiate Aug 23 09:43:24 pfSense4 charon[76421]: 14[CFG] vici initiate CHILD_SA 'con15_105' Aug 23 09:43:24 pfSense4 charon[76421]: 11[IKE] <con15|93> initiating IKE_SA con15[93] to xxx.xxx.xxx.xxx Aug 23 09:43:24 pfSense4 charon[76421]: 06[CFG] vici client 413 connected Aug 23 09:43:24 pfSense4 charon[76421]: 05[CFG] vici client 413 registered for: control-log Aug 23 09:43:24 pfSense4 charon[76421]: 11[CFG] <con15|93> configured proposals: IKE:AES_CBC_256/HMAC_SHA2_256_128/PRF_HMAC_SHA2_256/MODP_2048 Aug 23 09:43:24 pfSense4 charon[76421]: 11[CFG] <con15|93> sending supported signature hash algorithms: sha256 sha384 sha512 identity Aug 23 09:43:24 pfSense4 charon[76421]: 11[ENC] <con15|93> generating IKE_SA_INIT request 0 [ SA KE No N(NATD_S_IP) N(NATD_D_IP) N(FRAG_SUP) N(HASH_ALG) N(REDIR_SUP) ] Aug 23 09:43:24 pfSense4 charon[76421]: 09[CFG] vici client 413 requests: initiate Aug 23 09:43:24 pfSense4 charon[76421]: 09[CFG] vici initiate CHILD_SA 'con15_104' Aug 23 09:43:24 pfSense4 charon[76421]: 11[NET] <con15|93> sending packet: from xxx.xxx.xxx.xxx[500] to xxx.xxx.xxx.xxx[500] (464 bytes) Aug 23 09:43:24 pfSense4 charon[76421]: 11[NET] <con15|93> received packet: from xxx.xxx.xxx.xxx[500] to xxx.xxx.xxx.xxx[500] (639 bytes) Aug 23 09:43:24 pfSense4 charon[76421]: 11[ENC] <con15|93> parsed IKE_SA_INIT response 0 [ SA KE No V V N(NATD_S_IP) N(NATD_D_IP) CERTREQ N(FRAG_SUP) V ] Aug 23 09:43:24 pfSense4 charon[76421]: 11[IKE] <con15|93> received Cisco Delete Reason vendor ID Aug 23 09:43:24 pfSense4 charon[76421]: 11[IKE] <con15|93> received Cisco Copyright (c) 2009 vendor ID Aug 23 09:43:24 pfSense4 charon[76421]: 11[IKE] <con15|93> received FRAGMENTATION vendor ID Aug 23 09:43:24 pfSense4 charon[76421]: 11[CFG] <con15|93> selecting proposal: Aug 23 09:43:24 pfSense4 charon[76421]: 11[CFG] <con15|93> proposal matches Aug 23 09:43:24 pfSense4 charon[76421]: 11[CFG] <con15|93> received proposals: IKE:AES_CBC_256/HMAC_SHA2_256_128/PRF_HMAC_SHA2_256/MODP_2048 Aug 23 09:43:24 pfSense4 charon[76421]: 11[CFG] <con15|93> configured proposals: IKE:AES_CBC_256/HMAC_SHA2_256_128/PRF_HMAC_SHA2_256/MODP_2048 Aug 23 09:43:24 pfSense4 charon[76421]: 11[CFG] <con15|93> selected proposal: IKE:AES_CBC_256/HMAC_SHA2_256_128/PRF_HMAC_SHA2_256/MODP_2048 Aug 23 09:43:24 pfSense4 charon[76421]: 11[IKE] <con15|93> received 3 cert requests for an unknown ca Aug 23 09:43:24 pfSense4 charon[76421]: 11[IKE] <con15|93> authentication of 'xxx.xxx.xxx.xxx' (myself) with pre-shared key Aug 23 09:43:24 pfSense4 charon[76421]: 11[CFG] <con15|93> proposing traffic selectors for us: Aug 23 09:43:24 pfSense4 charon[76421]: 11[CFG] <con15|93> xxx.xxx.xxx.xxx/24|/0 Aug 23 09:43:24 pfSense4 charon[76421]: 11[CFG] <con15|93> proposing traffic selectors for other: Aug 23 09:43:24 pfSense4 charon[76421]: 11[CFG] <con15|93> xxx.xxx.xxx.xxx/24|/0 Aug 23 09:43:24 pfSense4 charon[76421]: 11[CFG] <con15|93> configured proposals: ESP:AES_CBC_256/HMAC_SHA2_256_128/NO_EXT_SEQ Aug 23 09:43:24 pfSense4 charon[76421]: 11[IKE] <con15|93> establishing CHILD_SA con15_105{2977} Aug 23 09:43:24 pfSense4 charon[76421]: 11[ENC] <con15|93> generating IKE_AUTH request 1 [ IDi N(INIT_CONTACT) IDr AUTH N(ESP_TFC_PAD_N) SA TSi TSr N(EAP_ONLY) N(MSG_ID_SYN_SUP) ] Aug 23 09:43:24 pfSense4 charon[76421]: 11[NET] <con15|93> sending packet: from xxx.xxx.xxx.xxx[500] to xxx.xxx.xxx.xxx[500] (272 bytes) Aug 23 09:43:24 pfSense4 charon[76421]: 15[NET] <con15|93> received packet: from xxx.xxx.xxx.xxx[500] to xxx.xxx.xxx.xxx[500] (272 bytes) Aug 23 09:43:24 pfSense4 charon[76421]: 15[ENC] <con15|93> parsed IKE_AUTH response 1 [ V IDr AUTH SA TSi TSr N(ESP_TFC_PAD_N) N(NON_FIRST_FRAG) ] Aug 23 09:43:24 pfSense4 charon[76421]: 15[IKE] <con15|93> authentication of 'customer.domain.nl' with pre-shared key successful Aug 23 09:43:24 pfSense4 charon[76421]: 15[IKE] <con15|93> IKE_SA con15[93] established between xxx.xxx.xxx.xxx[xxx.xxx.xxx.xxx]...xxx.xxx.xxx.xxx[customer.domain.nl] Aug 23 09:43:24 pfSense4 charon[76421]: 15[IKE] <con15|93> scheduling rekeying in 74785s Aug 23 09:43:24 pfSense4 charon[76421]: 15[IKE] <con15|93> maximum IKE_SA lifetime 83425s Aug 23 09:43:24 pfSense4 charon[76421]: 15[IKE] <con15|93> received ESP_TFC_PADDING_NOT_SUPPORTED, not using ESPv3 TFC padding Aug 23 09:43:24 pfSense4 charon[76421]: 15[CFG] <con15|93> selecting proposal: Aug 23 09:43:24 pfSense4 charon[76421]: 15[CFG] <con15|93> proposal matches Aug 23 09:43:24 pfSense4 charon[76421]: 15[CFG] <con15|93> received proposals: ESP:AES_CBC_256/HMAC_SHA2_256_128/NO_EXT_SEQ Aug 23 09:43:24 pfSense4 charon[76421]: 15[CFG] <con15|93> configured proposals: ESP:AES_CBC_256/HMAC_SHA2_256_128/MODP_2048/NO_EXT_SEQ Aug 23 09:43:24 pfSense4 charon[76421]: 15[CFG] <con15|93> selected proposal: ESP:AES_CBC_256/HMAC_SHA2_256_128/NO_EXT_SEQ Aug 23 09:43:24 pfSense4 charon[76421]: 15[CFG] <con15|93> selecting traffic selectors for us: Aug 23 09:43:24 pfSense4 charon[76421]: 15[CFG] <con15|93> config: xxx.xxx.xxx.xxx/24|/0, received: xxx.xxx.xxx.xxx/24|/0 => match: xxx.xxx.xxx.xxx/24|/0 Aug 23 09:43:24 pfSense4 charon[76421]: 15[CFG] <con15|93> selecting traffic selectors for other: Aug 23 09:43:24 pfSense4 charon[76421]: 15[CFG] <con15|93> config: xxx.xxx.xxx.xxx/24|/0, received: xxx.xxx.xxx.xxx/24|/0 => match: xxx.xxx.xxx.xxx/24|/0 Aug 23 09:43:24 pfSense4 charon[76421]: 15[IKE] <con15|93> CHILD_SA con15_105{2977} established with SPIs c2fe8f8f_i 1b28265f_o and TS xxx.xxx.xxx.xxx/24|/0 === xxx.xxx.xxx.xxx/24|/0 Aug 23 09:43:24 pfSense4 charon[76421]: 15[CFG] <con15|93> proposing traffic selectors for us: Aug 23 09:43:24 pfSense4 charon[76421]: 15[CFG] <con15|93> xxx.xxx.xxx.xxx/24|/0 Aug 23 09:43:24 pfSense4 charon[76421]: 15[CFG] <con15|93> proposing traffic selectors for other: Aug 23 09:43:24 pfSense4 charon[76421]: 15[CFG] <con15|93> xxx.xxx.xxx.xxx/24|/0 Aug 23 09:43:24 pfSense4 charon[76421]: 15[CFG] <con15|93> configured proposals: ESP:AES_CBC_256/HMAC_SHA2_256_128/MODP_2048/NO_EXT_SEQ Aug 23 10:12:01 pfSense4 charon[99301]: 00[DMN] Starting IKE charon daemon (strongSwan 5.9.2, FreeBSD 12.2-STABLE, amd64) Aug 23 10:12:01 pfSense4 charon[99301]: 00[CFG] PKCS11 module '<name>' lacks library path Aug 23 10:12:01 pfSense4 charon[99301]: 00[KNL] unable to set UDP_ENCAP: Invalid argument Aug 23 10:12:01 pfSense4 charon[99301]: 00[NET] enabling UDP decapsulation for IPv6 on port 4500 failed Aug 23 10:12:01 pfSense4 charon[99301]: 00[CFG] loading unbound resolver config from '/etc/resolv.conf' Aug 23 10:12:01 pfSense4 charon[99301]: 00[CFG] loading unbound trust anchors from '/usr/local/etc/ipsec.d/dnssec.keys' Aug 23 10:12:01 pfSense4 charon[99301]: 00[CFG] ipseckey plugin is disabled Aug 23 10:12:01 pfSense4 charon[99301]: 00[CFG] loading ca certificates from '/usr/local/etc/ipsec.d/cacerts' Aug 23 10:12:01 pfSense4 charon[99301]: 00[CFG] loading aa certificates from '/usr/local/etc/ipsec.d/aacerts' Aug 23 10:12:01 pfSense4 charon[99301]: 00[CFG] loading ocsp signer certificates from '/usr/local/etc/ipsec.d/ocspcerts' Aug 23 10:12:01 pfSense4 charon[99301]: 00[CFG] loading attribute certificates from '/usr/local/etc/ipsec.d/acerts' Aug 23 10:12:01 pfSense4 charon[99301]: 00[CFG] loading crls from '/usr/local/etc/ipsec.d/crls' Aug 23 10:12:01 pfSense4 charon[99301]: 00[CFG] loading secrets from '/usr/local/etc/ipsec.secrets' Aug 23 10:12:01 pfSense4 charon[99301]: 00[CFG] opening triplet file /usr/local/etc/ipsec.d/triplets.dat failed: No such file or directory Aug 23 10:12:01 pfSense4 charon[99301]: 00[CFG] loaded 0 RADIUS server configurations Aug 23 10:12:01 pfSense4 charon[99301]: 00[LIB] loaded plugins: charon unbound pkcs11 aes des blowfish rc2 sha2 sha1 md4 md5 random nonce x509 revocation constraints pubkey pkcs1 pkcs7 pkcs8 pkcs12 pgp dnskey sshkey ipseckey pem openssl fips-prf curve25519 xcbc cmac hmac drbg curl attr kernel-pfkey kernel-pfroute resolve socket-default stroke vici updown eap-identity eap-sim eap-md5 eap-mschapv2 eap-dynamic eap-radius eap-tls eap-ttls eap-peap xauth-generic xauth-eap xauth-pam whitelist addrblock counters Aug 23 10:12:01 pfSense4 charon[99301]: 00[JOB] spawning 16 worker threads Aug 23 10:12:02 pfSense4 charon[99301]: 04[NET] <1> received packet: from xxx.xxx.xxx.xxx[500] to xxx.xxx.xxx.xxx[500] (894 bytes) Aug 23 10:12:02 pfSense4 charon[99301]: 04[ENC] <1> parsed IKE_SA_INIT request 0 [ SA KE No N(NATD_S_IP) N(NATD_D_IP) N(FRAG_SUP) N(HASH_ALG) N(REDIR_SUP) ] Aug 23 10:12:02 pfSense4 charon[99301]: 04[CFG] <1> looking for an IKEv2 config for xxx.xxx.xxx.xxx...xxx.xxx.xxx.xxx Aug 23 10:12:02 pfSense4 charon[99301]: 04[IKE] <1> no IKE config found for xxx.xxx.xxx.xxx...xxx.xxx.xxx.xxx, sending NO_PROPOSAL_CHOSEN Aug 23 10:12:02 pfSense4 charon[99301]: 04[ENC] <1> generating IKE_SA_INIT response 0 [ N(NO_PROP) ] Aug 23 10:12:02 pfSense4 charon[99301]: 04[NET] <1> sending packet: from xxx.xxx.xxx.xxx[500] to xxx.xxx.xxx.xxx[500] (36 bytes) Aug 23 10:12:02 pfSense4 charon[99301]: 15[CFG] vici client 1 connected Aug 23 10:12:02 pfSense4 charon[99301]: 15[CFG] vici client 1 requests: get-keys Aug 23 10:12:02 pfSense4 charon[99301]: 15[CFG] vici client 1 requests: get-shared Aug 23 10:12:02 pfSense4 charon[99301]: 15[CFG] vici client 1 requests: load-shared
Scrolling back through the logs show no obvious reason or clues to me. It just looks like it completely stalls right in the middle of the P2 negotiation phase, without any previous warning or sign this is going to happen.
I have brought the connections up and down numerous times during the weekend without being able to reproduce and now all of a sudden it happens again. The only thing I can say for sure right now that so far it always has been triggered by manually bringing up a connection from the status page.As all logging also stop it's quite hard to tell what is going on here. Could it be https://github.com/strongswan/strongswan/issues/268, which is a 5.9.2 issue? Any plan for 5.9.3, or is it not ported to FreeBSD yet?
Julian
-
I see strongSwan 5.9.3 in FreeBSD ports but we haven't pulled it in yet. It is possible that bug is related though that may not be the same situation you are hitting.
Is there anything odd about the negotiation in the remote logs when that happened?
I don't see anything there that stands out as being a problem, but I know the
swanctl
commands to initiate and terminate usually end up staying active until the negotiation is complete, which may hold onto a VICI slot longer than expected.The status page should probably run it with
--force
or a-timeout
value now that I'm looking at the code again. -
-
Is there anything odd about the negotiation in the remote logs when that happened?
Not that Iโm aware of, but I have to ask if they are willing to provide any logging if they have it, as I donโt manage the remote end. But as I have seen this happening on multiple PFsense boxes already with all kinds of connection with different brands on the other side, I doubt if this is coming from the other side or the logging from the other side will make it more clear.
It's worth a try though.I don't see anything there that stands out as being a problem, but I know the
swanctl
commands to initiate and terminate usually end up staying active until the negotiation is complete, which may hold onto a VICI slot longer than expected.The status page should probably run it with
--force
or a-timeout
value now that I'm looking at the code again.If that would prevent this situation from happening it would be great. Still wonder in wat state the service is. It is still running (unfortunately, otherwise service watchdog would have catched it) but is not responsive in any way. Do you think a --force or -timeout would prevent this state?
Anything other for me to do now? More logging or higher log levels wonโt be at any help, I think?
-
The new version of strongSwan will be in builds maybe as early as tomorrow, along with the code change I just put in for the initiate timeout and forced terminate. So I'd keep tracking snapshots this week and see if you can still replicate it in a couple days.
You could try increasing the log level on the configuration section but it's still not clear what, if anything, is happening there that could be adding to the problem.
-
Can I track progress on Redmine? Or is it not yet in there?
I see. Tnx a lot for your effort!Best regards,
Julian
-
Is there a way to see how many VICI sockets are currently in use? Limit is 3, right?
-
You can kind of get an idea by looking at:
: sockstat | grep -i vici root swanctl 60061 7 stream -> /var/run/charon.vici
There will be some other entries from
charon
of its own but those are for serving the info not requesting.You should also be able to check for a backlog of requests by looking at this:
: netstat -LAn | grep vici unix 0/0/3 /var/run/charon.vici
-
Thank you!
so 0/0/3 means 3 sockets available, 0 in use, 0 in queue, I guess.
I just now realise the Listen queue overflow messages in my start post actually might be the VICI connections. IIRC the listen queue is defined as ~ 1.5 * number of allowed connections, so the 5 already in queue awaiting acceptance makes sense now.
-
@wickeren said in Charon becoming unresponsive:
Thank you!
so 0/0/3 means 3 sockets available, 0 in use, 0 in queue, I guess.
-L Show the size of the various listen queues. The first count shows the number of unaccepted connections, the second count shows the amount of unaccepted incomplete connections, and the third count is the maximum number of queued connections.
I just now realise the Listen queue overflow messages in my start post actually might be the VICI connections. IIRC the listen queue is defined as ~ 1.5 * number of allowed connections, so the 5 already in queue awaiting acceptance makes sense now.
Exactly, that's what started my line of questioning above.
-
Have not been able to reproduce after using your fix in ipsec.inc, at least so far. In the meantime I'm using zabbix now to check the ipsec.log timestamp to see if logging has stopped and alarm me.
Is the upgrade to Strongswan 5.9.3 visible in Redmine or do I just have to check he nightly builds?Best regards,
Julian
-
It is there now. I checked a vm running
2.6.0.a.20210825.0500
and it hasstrongswan-5.9.3