Charon crashing

wickeren

On multiple 2.2.4 installs I see charon crashes; on one site with only 2 tunnels it crashes almost every hour.

kernel: pid XXXXX (charon), uid 0: exited on signal 6 (core dumped)
Most of the time the tunnels then have to be started manually.

Any ideas? What are my troubleshooting options in this case? What extra logging would be helpful to turn on?

Guest

Are you talking about the IPsec VPN? Sorry, but there have been many failures reported recently, and many users
have moved away from IPsec to the OpenVPN option.

cmb

What's in the IPsec log right before it crashes? Is there a core file left behind somewhere? 32 or 64 bit? What kind of hardware? What kind of configuration are you running?

@BlueKobold:

Are you talking about the IPsec VPN? Sorry, but there have been many failures reported recently, and many users
have moved away from IPsec to the OpenVPN option.

That's generally not true. While the first couple of 2.2.x releases were bumpy for a number of people, and there are always edge cases (and always have been, in every device ever created that does IPsec), things work well for nearly everyone at this point. charon crashing definitely isn't a common issue.

Guest

That's generally not true. While the first couple of 2.2.x releases were bumpy for a number of people, and there are always edge cases

Fine to hear about that.

(and always have been, in every device ever created that does IPsec), things work
well for nearly everyone at this point. charon crashing definitely isn't a common issue.

That's also good to hear; I will therefore give IPsec another try in the near future.

HHR

I have the same issue (https://forum.pfsense.org/index.php?topic=97347.0).
Nearly every day charon crashes at my site. My log file was too small to retain the IPsec log from the time of the crash, but I've now increased the log size. I should have the log when charon crashes tomorrow.

doktornotor

@HHR:

Nearly every day charon crashes at my site.

How dare you! I never crash, let alone at your bloody site!!!

cmb

strongswan 5.3.3 fixed a crash in rekeying with mismatched DH group.
https://wiki.strongswan.org/projects/strongswan/wiki/Changelog53

That would be possible to hit on connections that otherwise work in some circumstances, like if the remote iterates through multiple DH group options.

Should have 2.2.5 snapshots up with 5.3.3 in the next day or so if it checks out fine.

wickeren

@cmb:

What's in the IPsec log right before it crashing? Is there a core file left behind somewhere? 32 or 64 bit? What kind of hardware? What kind of configuration are you running?

All installs are 64-bit full installs, some of them on KVM, some on bare metal. I can't find any core dumps. I'm going to have a look at the full IPsec log; it could very well be a rekeying issue. Crashing almost every hour with a phase 2 lifetime of 3600 points in that direction.
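One way to check the "almost every hour" hunch against the log is to measure the time between crashes. The following is a minimal sketch, not a pfSense tool: it assumes crashes show up as "thread N received M" lines from charon (as in the excerpts in this thread), and the function name `crash_gaps` and the `year` parameter are made up for illustration, since syslog timestamps carry no year.

```python
import re
from datetime import datetime

# Assumption: charon logs "thread N received M" when it catches a fatal signal.
CRASH_RE = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) .*charon: .*thread \d+ received \d+"
)

def crash_gaps(lines, year=2015):
    """Return the number of seconds between consecutive crash lines."""
    times = []
    for line in lines:
        match = CRASH_RE.match(line)
        if match:
            # Syslog stamps have no year, so a hypothetical one is prepended.
            stamp = f"{year} {match.group(1)}"
            times.append(datetime.strptime(stamp, "%Y %b %d %H:%M:%S"))
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]
```

Gaps that cluster around or just under the configured lifetime (3600 seconds here) would be consistent with a rekeying problem; widely scattered gaps would point elsewhere.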

wickeren

From the ipsec.log:
Sep 18 07:35:48 pfSense2 charon: 10[ENC] <con3|2> parsed CREATE_CHILD_SA request 166 [ SA No KE ]
Sep 18 07:35:48 pfSense2 charon: 10[IKE] <con3|2> x.x.x.x is initiating an IKE_SA
Sep 18 07:35:48 pfSense2 charon: 10[IKE] <con3|2> x.x.x.x is initiating an IKE_SA
Sep 18 07:35:48 pfSense2 charon: 10[IKE] <con3|2> DH group MODP_1536 inacceptable, requesting MODP_1024
Sep 18 07:35:48 pfSense2 charon: 10[IKE] <con3|2> DH group MODP_1536 inacceptable, requesting MODP_1024
Sep 18 07:35:48 pfSense2 charon: 10[DMN] thread 10 received 11
Sep 18 07:35:48 pfSense2 charon: 10[LIB] dumping 2 stack frame addresses:
Sep 18 07:35:48 pfSense2 charon: 10[LIB] /lib/libthr.so.3 @ 0x80114f000 (_swapcontext+0x15a) [0x80115d47a]
Sep 18 07:35:48 pfSense2 charon: 10[LIB] ->
Sep 18 07:35:48 pfSensCLOG

Indeed, it seems to go wrong during rekeying when the other side tries incorrect DH groups, so I'm hit by the bug described here:
https://wiki.strongswan.org/projects/strongswan/wiki/Changelog53

wickeren

Yesterday's snapshot seems to include strongswan 5.3.3, and so far charon hasn't crashed anymore :)

cmb

@wickeren:

Yesterday's snapshot seems to include strongswan 5.3.3, and so far charon hasn't crashed anymore :)

Yes it's been updated. Glad that took care of it. The problem you were seeing there was definitely that mismatched DH group crash.

HHR

This is my ipsec.log when charon crashes:

Sep 22 15:38:26 pfsense charon: 14[NET] <con13|1953> sending packet: from a.a.a.a[500] to b.b.b.b[500] (76 bytes)
Sep 22 15:38:26 pfsense charon: 14[NET] <con13|1953> received packet: from b.b.b.b[500] to a.a.a.a[500] (76 bytes)
Sep 22 15:38:26 pfsense charon: 14[ENC] <con13|1953> parsed INFORMATIONAL response 112 [ ]
Sep 22 15:38:26 pfsense charon: 14[IKE] <con10|1994> retransmit 1 of request with message ID 2
Sep 22 15:38:26 pfsense charon: 14[IKE] <con10|1994> retransmit 1 of request with message ID 2
Sep 22 15:38:26 pfsense charon: 14[NET] <con10|1994> sending packet: from a.a.a.a[500] to c.c.c.c[500] (444 bytes)
Sep 22 15:38:26 pfsense charon: 14[NET] <con10|1994> received packet: from c.c.c.c[500] to a.a.a.a[500] (36 bytes)
Sep 22 15:38:26 pfsense charon: 14[ENC] <con10|1994> parsed IKE_SA_INIT response 0 [ N(INVAL_MID) ]
Sep 22 15:38:26 pfsense charon: 14[IKE] <con10|1994> received message ID 0, expected 2. Ignored
Sep 22 15:38:26 pfsense charon: 14[IKE] <con10|1994> received message ID 0, expected 2. Ignored
Sep 22 15:38:27 pfsense charon: 14[IKE] <con9|1989> giving up after 5 retransmits
Sep 22 15:38:27 pfsense charon: 14[IKE] <con9|1989> giving up after 5 retransmits
Sep 22 15:38:27 pfsense charon: 14[IKE] <con9|1989> peer not responding, trying again (2/3)
Sep 22 15:38:27 pfsense charon: 14[IKE] <con9|1989> peer not responding, trying again (2/3)
Sep 22 15:38:27 pfsense charon: 14[DMN] <con9|1989> thread 14 received 11
Sep 22 15:38:27 pfsense charon: 14[LIB] <con9|1989>  dumping 2 stack frame addresses:
Sep 22 15:38:27 pfsense charon: 14[LIB] <con9|1989>   /lib/libthr.so.3 @ 0x80114f000 (_swapcontext+0x15a) [0x80115d47a]
Sep 22 15:38:27 pfsense charon: 14[LIB] <con9|1989>     -> 
Sep 22 15:38:27 pfsense charon: 14[LIB] <con9|1989>   /lib/libthr.so.3 @ 0x80114f000 (sigaction+0x342) [0x80115d062]
Sep 22 15:38:27 pfsense charon: 14[LIB] <con9|1989>     -> 
Sep 22 15:38:27 pfsense charon: 14[DMN] <con9|1989> killing ourself, received critical signal
Sep 22 15:38:36 pfsense ipsec_starter[84815]: charon has died -- restart scheduled (5sec)
Sep 22 15:38:41 pfsense charon: 00[DMN] Starting IKE charon daemon (strongSwan 5.3.2, FreeBSD 10.1-RELEASE-p15, amd64)
Sep 22 15:38:41 pfsense charon: 00[KNL] unable to set UDP_ENCAP: Invalid argument
Sep 22 15:38:41 pfsense charon: 00[NET] enabling UDP decapsulation for IPv6 on port 4500 failed
Sep 22 15:38:41 pfsense charon: 00[CFG] ipseckey plugin is disabled

I guess the DH group is not my problem. :(

cmb

@HHR:

I guess the DH group is not my problem. :(

Not with the exact same symptoms, but that fix is described as "If the responder declines our KE payload during a CHILD_SA rekeying migrate() is called to reuse the child-create task. But the child-rekey task then calls the same method again.".

You're getting an INVAL_MID (invalid message ID) error in return during rekeying right before it crashes, so that could be something similar with the same root cause.

If that's a crash you encounter regularly, upgrading to latest 2.2.5 from snapshots to get the latest strongswan 5.3.3 is the best next step. It's possible it's a different symptom of the same problem, or a different problem that's been fixed already. If it's not, then knowing whether you're seeing the INVAL_MID right before each crash would be telling.
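To see whether INVAL_MID (or the DH group complaint from earlier in the thread) shows up right before each crash, something like the following could be run over the IPsec log. It is only a sketch: the crash pattern and marker strings are taken from the excerpts posted here, while the function name `markers_before_crashes` and the `window` size are illustrative assumptions.

```python
import re
from collections import deque

# Assumption: a crash is logged as "thread N received M", as in this thread.
CRASH_RE = re.compile(r"thread \d+ received \d+")
# Strings seen shortly before the crashes in the posted excerpts.
MARKERS = ("INVAL_MID", "inacceptable")

def markers_before_crashes(lines, window=10):
    """For each crash line, report which markers occur in the preceding lines.

    Returns a list of (timestamp-and-host prefix, [markers found]) tuples.
    """
    recent = deque(maxlen=window)  # sliding window of lines before the crash
    results = []
    for line in lines:
        if CRASH_RE.search(line):
            found = [m for m in MARKERS if any(m in prev for prev in recent)]
            results.append((line.split(" charon:")[0], found))
            recent.clear()  # start a fresh window after each crash
        else:
            recent.append(line)
    return results
```

A crash with an empty marker list would suggest a different trigger than the two discussed above.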

HHR

Short update: After 2 weeks of testing pfSense 2.2.5 with no crashes, I updated the production system on Friday. And what can I say: no crashes so far. Thank you.

geocast

I'm having similar issues.

https://forum.pfsense.org/index.php?topic=100779.0

I've just updated to the latest 2.2.5 as advised here.

We'll see if it helps. Loading diag_ipsec.php still takes some time.