IPsec tunnels no longer available after a few days
We have run into an issue with our pfSense firewall. We have 40 IPsec tunnels between pfSense and SnapGear appliances. The tunnels ran for 2 weeks without any problem (the SnapGear clients have dynamic IPs with DynDNS, and the IPs change every day). This morning we lost the connection on every IPsec tunnel at the same time, and the appliances were unable to reconnect. PPTP clients, NAT, and other features were still working correctly.
Rebooting the pfSense box solved the problem. As you know, pfSense clears the logs at reboot by default, so I don't have the logs anymore.
Do you have any idea what could have caused the problem?
Thanks for your help!
Without the logs, there is no way to know.
In the future, you can try just resetting the racoon service under Status > Services. This should completely reset the IPsec daemon and allow the tunnels to reestablish. It would also be wise to check the IPsec logs (Status > System Logs, IPsec tab) before and after the reset.
If you have the capability, it may also be nice to redirect your logs to another internal system via syslog, so the logs can be retained long-term. The options for this are in the GUI for the logs on the settings tab, and I think there is a doc wiki article about setting up a remote syslog server. I know I wrote it up in the book, but I thought I also had some bits about it in the wiki, too.
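If no syslog server is available on the internal network yet, the receiving side can be sketched in a few lines of Python. This is only a minimal illustration, not a replacement for a real syslog daemon: it assumes the firewall sends plain UDP syslog datagrams, and the file name `pfsense-remote.log` and the helper names are made up for the example.

```python
import socketserver
from datetime import datetime

LOG_PATH = "pfsense-remote.log"  # hypothetical output file, pick any path


def format_entry(source_ip: str, payload: bytes) -> str:
    """Turn one raw syslog datagram into a single line of text."""
    text = payload.decode("utf-8", errors="replace").strip()
    return f"{source_ip} {text}"


class SyslogHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # For UDP servers, self.request is a (data, socket) pair.
        data, _sock = self.request
        line = format_entry(self.client_address[0], data)
        # Append with a local receive timestamp so entries survive a
        # reboot of the firewall.
        with open(LOG_PATH, "a", encoding="utf-8") as log:
            log.write(f"{datetime.now().isoformat()} {line}\n")


def serve(host: str = "0.0.0.0", port: int = 514) -> None:
    """Listen for syslog datagrams until interrupted.

    Port 514 is the standard syslog port; binding to it usually
    requires root privileges.
    """
    with socketserver.UDPServer((host, port), SyslogHandler) as server:
        server.serve_forever()
```

Calling `serve()` on the internal machine and pointing the remote syslog setting on the firewall at that machine's address should be enough to keep a copy of the IPsec log across reboots.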
I've had a crash again, and I think this is the problem:
Mar 16 15:47:46 pfsense php: : IPSEC: One or more IPSEC tunnel endpoints has changed IP. Refreshing.
Mar 16 15:47:47 pfsense php: : Reloading IPsec tunnel 'mignault'. Previous IP '184.108.40.206', current IP '220.127.116.11'. Reloading policy
Mar 16 15:47:47 pfsense php: : IPSEC: Send a reload signal to the IPsec process
Mar 16 15:47:47 pfsense php: : The command '/usr/local/sbin/racoonctl -s /var/db/racoon/racoon.sock reload-config' returned exit code '1', the output was ''
Mar 16 15:47:48 pfsense kernel: pid 6135 (racoon), uid 0: exited on signal 11 (core dumped)
Do you have any idea?
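To check how often this pattern occurs in a retained log, a small script can pair each `reload-config` line with the next racoon crash line. This is a rough sketch that assumes the log lines look like the excerpt above; the function name and the matching rules are made up for the example.

```python
import re

# Matches the kernel line logged when a process dies on a signal,
# in the format seen in the excerpt above.
CRASH_RE = re.compile(
    r"kernel: pid (?P<pid>\d+) \((?P<name>[^)]+)\), uid \d+: "
    r"exited on signal (?P<signal>\d+)"
)
RELOAD_MARKER = "reload-config"


def find_crashes_after_reload(lines, process="racoon"):
    """Return (reload_line, crash_line) pairs.

    A pair is recorded when a crash of `process` is the first such
    crash seen after a line containing the reload marker.
    """
    pairs = []
    last_reload = None
    for line in lines:
        if RELOAD_MARKER in line:
            last_reload = line
            continue
        match = CRASH_RE.search(line)
        if match and match.group("name") == process and last_reload is not None:
            pairs.append((last_reload, line))
            last_reload = None  # count each reload at most once
    return pairs
```

Feeding it the lines of a saved system log returns one pair per reload that was followed by a crash, which shows whether the crashes always follow a config reload.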
Signal 11 errors are almost always hardware errors. Typically memory or heating, but often they are PSU or MB/CPU related as well.
If you have a dynamic IP address for an endpoint, you might consider using a dynamic DNS hostname instead of the IP. This way it should update itself when the address changes. I have a dynamic IP on my home cable connection and use a DynDNS hostname, and the tunnel has never dropped, even when my IP has changed.
We are already using DynDNS, and it's running fine. The problem is the racoon service crashing.
I've now reinstalled pfSense on 2 other servers. I've checked the RAM using memtest86+ V4: no errors. On both computers, racoon still crashes with signal 11. The same problem on 3 completely different sets of hardware means the problem must be elsewhere.
Do you have any idea what we can do?
Just a clarification: in my first post I said we have about 40 tunnels. I should add that the pfSense box is the only one with a fixed IP; every other endpoint uses DynDNS, so the racoon config is reloaded quite often (the IPs expire every 24h). Generally the IP update process works fine, but sometimes it crashes, as you can see.
Could the hostname itself be the cause? I don't know exactly what your PHP script does with the racoon config. I suppose it replaces the old IP with the new one, so it should not be the cause.