New IP on Dynamic DNS IPSEC killed racoon
-
I haven't seen that happen before, and I have a couple sites that rely on dyndns hostnames for IPsec to function on their site-to-site tunnels.
-
Thanks for the reply, jimp. I was hoping this was just a one-off issue, but last night it happened again. I'm not 100% sure this is related to the dynamic DNS IPsec tunnel, but the issue only began after it was added, and the change in IP address looks like the last thing to happen before things go pear-shaped :s . I've attached the logs - hopefully someone can see something I've missed.
Would there be any way the core dump could be used to get some more information about this problem? Maybe there's a way I can add a cron job to run every minute to restart racoon if it crashes again? I'm willing to try anything to tackle this problem.
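A minimal sketch of the cron-job idea, assuming a Python 3 interpreter is available on the box (it may not be on a stock install). It only checks for a process named exactly `racoon` with `pgrep -x` and runs a restart command if none is found. The restart command itself is left as a parameter, because the right one depends on the pfSense version and is not confirmed anywhere in this thread; `restart_cmd` and the function names are made up for illustration.

```python
import subprocess


def racoon_running(run=subprocess.run):
    """Return True if pgrep finds a process named exactly 'racoon'."""
    # pgrep exits with 0 when at least one process matches, non-zero otherwise.
    result = run(["pgrep", "-x", "racoon"], capture_output=True)
    return result.returncode == 0


def watchdog(restart_cmd, run=subprocess.run):
    """Run restart_cmd only when racoon is not running.

    restart_cmd is a hypothetical argument list supplied by the caller;
    this sketch does not know the correct restart command for any
    particular pfSense release.
    Returns True if a restart was attempted, False otherwise.
    """
    if racoon_running(run):
        return False
    run(restart_cmd, capture_output=True)
    return True
```

Saved as a script that calls `watchdog([...])` with the real restart command, this could be scheduled with a `* * * * *` cron entry to run once a minute. It would only paper over the crash, not explain it.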
I found a possibly related issue here - http://forum.pfsense.org/index.php?topic=18330.0 . Loki seemed to be having trouble with the dnswatch command, and his last post in the linked thread showed an error in the logs similar to the one I'm getting.
Thanks in advance for any input on this problem - Skirmish.
Edit: Just thought I'd add that this isn't happening every time the IP changes - the IP changes a couple of times a day and racoon is stopping every few weeks.
.0.251.250.1 2010-10-17 20:56:06 kernel: pid 55846 (racoon), uid 0: exited on signal 11 (core dumped) kern info
10.251.250.1 2010-10-17 20:56:05 php: : IPSEC: Send a reload signal to the IPsec process user warning
10.251.250.1 2010-10-17 20:56:05 php: : The command '/usr/local/sbin/racoonctl -s /var/db/racoon/racoon.sock reload-config' returned exit code '1', the output was '' user warning
10.251.250.1 2010-10-17 20:56:05 racoon: INFO: 10.251.250.1[500] used as isakmp port (fd=42) d info
10.251.250.1 2010-10-17 20:56:05 racoon: INFO: 192.168.0.254[500] used as isakmp port (fd=41) d info
10.251.250.1 2010-10-17 20:56:05 racoon: INFO: 10.20.251.1[500] used as isakmp port (fd=39) d info
10.251.250.1 2010-10-17 20:56:05 racoon: INFO: 127.0.0.1[500] used as isakmp port (fd=40) d info
10.251.250.1 2010-10-17 20:56:05 racoon: INFO: IPsec-SA request for 211.27.66.101 queued due to no phase1 found. d info
10.251.250.1 2010-10-17 20:56:05 racoon: INFO: xxx.70.164.94[500] used as isakmp port (fd=43) d info
10.251.250.1 2010-10-17 20:56:05 racoon: INFO: begin Identity Protection mode. d info
10.251.250.1 2010-10-17 20:56:05 racoon: INFO: initiate new phase 1 negotiation: xxx.70.164.94[500]<=>211.27.66.101[500] d info
10.251.250.1 2010-10-17 20:56:04 php: : IPSEC: One or more IPSEC tunnel endpoints has changed IP. Refreshing. user warning
10.251.250.1 2010-10-17 20:56:04 racoon: INFO: unsupported PF_KEY message REGISTER d info
10.251.250.1 2010-10-17 20:56:04 php: : Reloading IPsec tunnel 'Sydenham'. Previous IP '203.134.22.98', current IP '211.27.66.101'. Reloading policy user warning
10.251.250.1 2010-10-17 20:56:03 racoon: INFO: phase2 sa deleted xxx.70.164.94-203.134.22.98 d info
10.251.250.1 2010-10-17 20:56:02 racoon: INFO: phase2 sa expired xxx.70.164.94-203.134.22.98 d info
10.251.250.1 2010-10-17 20:56:02 racoon: INFO: request for establishing IPsec-SA was queued due to no phase1 found. d info
10.251.250.1 2010-10-17 20:55:54 racoon: INFO: phase2 sa deleted xxx.70.164.94-203.134.22.98 d info
10.251.250.1 2010-10-17 20:55:53 racoon: INFO: phase2 sa expired xxx.70.164.94-203.134.22.98 d info
10.251.250.1 2010-10-17 20:55:53 racoon: INFO: request for establishing IPsec-SA was queued due to no phase1 found. d info
10.251.250.1 2010-10-17 20:55:42 racoon: INFO: phase2 sa deleted xxx.70.164.94-203.134.22.98 d info
10.251.250.1 2010-10-17 20:55:41 racoon: INFO: phase2 sa expired xxx.70.164.94-203.134.22.98 d info
10.251.250.1 2010-10-17 20:55:41 racoon: INFO: request for establishing IPsec-SA was queued due to no phase1 found. d info
10.251.250.1 2010-10-17 20:55:37 racoon: INFO: phase2 sa deleted xxx.70.164.94-203.134.22.98 d info
10.251.250.1 2010-10-17 20:55:36 racoon: INFO: IPsec-SA request for 203.134.22.98 queued due to no phase1 found. d info
10.251.250.1 2010-10-17 20:55:36 racoon: INFO: phase2 sa expired xxx.70.164.94-203.134.22.98 d info
10.251.250.1 2010-10-17 20:55:36 racoon: INFO: initiate new phase 1 negotiation: xxx.70.164.94[500]<=>203.134.22.98[500] d info
10.251.250.1 2010-10-17 20:55:36 racoon: INFO: begin Identity Protection mode. d info
10.251.250.1 2010-10-17 20:55:24 racoon: ERROR: phase1 negotiation failed due to time up. 2e7ee71c7e7a7be9:0000000000000000 d info
10.251.250.1 2010-10-17 20:55:19 racoon: INFO: phase2 sa deleted xxx.70.164.94-203.134.22.98 d info
10.251.250.1 2010-10-17 20:55:18 racoon: INFO: phase2 sa expired xxx.70.164.94-203.134.22.98 d info
10.251.250.1 2010-10-17 20:55:18 racoon: INFO: request for establishing IPsec-SA was queued due to no phase1 found. d info
10.251.250.1 2010-10-17 20:55:15 racoon: INFO: phase2 sa deleted xxx.70.164.94-203.134.22.98 d info
10.251.250.1 2010-10-17 20:55:14 racoon: INFO: phase2 sa expired xxx.70.164.94-203.134.22.98 d info
10.251.250.1 2010-10-17 20:55:14 racoon: INFO: request for establishing IPsec-SA was queued due to no phase1 found. d info
10.251.250.1 2010-10-17 20:55:05 racoon: INFO: phase2 sa deleted xxx.70.164.94-203.134.22.98 d info
10.251.250.1 2010-10-17 20:55:04 racoon: INFO: request for establishing IPsec-SA was queued due to no phase1 found. d info
10.251.250.1 2010-10-17 20:55:04 racoon: INFO: phase2 sa expired xxx.70.164.94-203.134.22.98
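Since the crash only follows some IP changes, it may help to check a longer stretch of exported log mechanically. The following is a rough Python sketch, not a pfSense tool: it looks for the two message shapes seen in the log above (the kernel "exited on signal" line and the php "Reloading IPsec tunnel" line) and reports crashes that occur within a chosen window after an endpoint change. The function name and the 60-second default window are arbitrary choices for illustration.

```python
import re
from datetime import datetime

TS = r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})"
# Patterns copied from the message text in the log posted above.
CRASH = re.compile(TS + r" kernel: pid \d+ \(racoon\), uid \d+: exited on signal (\d+)")
CHANGE = re.compile(
    TS + r" php: : Reloading IPsec tunnel '([^']*)'\. "
    r"Previous IP '([^']*)', current IP '([^']*)'"
)


def _ts(text):
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


def crashes_after_ip_change(log_text, window_seconds=60):
    """Return (tunnel, old_ip, new_ip, signal, delay_seconds) for each racoon
    crash that happened within window_seconds after an endpoint IP change."""
    changes = [
        (_ts(m.group(1)), m.group(2), m.group(3), m.group(4))
        for m in CHANGE.finditer(log_text)
    ]
    hits = []
    for m in CRASH.finditer(log_text):
        crashed_at = _ts(m.group(1))
        for changed_at, tunnel, old_ip, new_ip in changes:
            delay = (crashed_at - changed_at).total_seconds()
            if 0 <= delay <= window_seconds:
                hits.append((tunnel, old_ip, new_ip, int(m.group(2)), delay))
    return hits


# Two entries taken from the log above.
sample = (
    "10.251.250.1 2010-10-17 20:56:06 kernel: pid 55846 (racoon), uid 0: "
    "exited on signal 11 (core dumped) kern info\n"
    "10.251.250.1 2010-10-17 20:56:04 php: : Reloading IPsec tunnel 'Sydenham'. "
    "Previous IP '203.134.22.98', current IP '211.27.66.101'. Reloading policy user warning\n"
)
print(crashes_after_ip_change(sample))
# [('Sydenham', '203.134.22.98', '211.27.66.101', 11, 2.0)]
```

The patterns search the whole text rather than line by line, so they work whether the export has one entry per line or runs entries together. Counting the IP changes that are not followed by a crash would need the same `CHANGE` matches compared against the hits.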
-
The timing is suspicious, but it's still hard to say. In that other thread it was a bad pre-release snapshot build that was the problem. If you're on 1.2.3-RELEASE that wouldn't be the issue.
You could back up your config and try a build of 2.0 beta to see if the behavior is better there.
-
Thanks for the reply jimp, I'll try & get a backup done of the current system - if I can find a way to quickly switch back to 1.2.3 in case of an emergency then I'll happily try 2.0 to see if the behaviour improves.
-
If you have a 1.2.3 CD and a USB stick handy, use this:
http://doc.pfsense.org/index.php/Automatically_Restore_During_Install
-
Thanks for that, it's just what I was after. Racoon has begun going down on every IP change now so I'll most likely try this tonight after business closes, I'll let you know how things go. Wish me luck!
Edit: This may be a silly question, but is there a particular build of 2.0 Beta that's more stable than others?
-
Except on extremely rare occasions, the most recent beta build can be assumed to be the most stable.
-
Thanks submicron - I'll try the latest build.
-
OK, I'm running the latest build of pfSense 2.0 Beta 4 for i386 and the results are in: racoon no longer goes down completely when the remote site gets a new IP, but the tunnel to that site still does. That's definitely a positive, apart from the other issues I'm having with the beta. The tunnel to the site with dynamic DNS still goes down on an IP change, and no amount of rebooting the router on the other end brings it back up. The only way I've found to bring up the tunnel is to restart racoon.
-
This may be a shot in the dark, but are you sure you have exactly the same timeouts configured for both P1 and P2 on each end of the tunnel? This almost sounds like a security policy still exists on one side of the tunnel when the IP changes. When this happens again, check to see if you still have SPDs in place referencing the broken tunnel. If you do, kill them and re-init the tunnel. I think this may be the bug Jimp was talking about, and it was definitely fixed in 1.2.3-release, but it doesn't hurt to double-check.
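For the SPD check, `setkey -DP` dumps the security policy database on systems using ipsec-tools. A small Python sketch of picking out entries that still reference the old endpoint is below. It assumes the usual dump layout (an unindented selector line followed by indented detail lines, with the tunnel endpoints in a `esp/tunnel/A-B/...` line). The function name is made up, and the subnets and the local address 198.51.100.7 in the sample are invented; only 203.134.22.98 comes from the log earlier in this thread.

```python
import re

# Matches the "…/tunnel/<src>-<dst>/…" part of a policy's detail line.
TUNNEL = re.compile(r"/tunnel/([^/\s-]+)-([^/\s-]+)/")


def stale_policies(spd_dump, old_ip):
    """Return the selector lines of policies whose tunnel endpoints include old_ip.

    spd_dump is the text output of `setkey -DP`; the layout handled here is an
    assumption about that output, not something taken from this thread.
    """
    stale = []
    header = None
    for line in spd_dump.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # Unindented line: start of a new policy entry.
            header = line.strip()
            continue
        match = TUNNEL.search(line)
        if match and header is not None and old_ip in match.groups():
            stale.append(header)
    return stale


# Invented sample in the assumed layout.
sample_dump = (
    "192.168.1.0/24[any] 10.20.0.0/24[any] any\n"
    "\tout ipsec\n"
    "\tesp/tunnel/198.51.100.7-203.134.22.98/unique#16385\n"
    "\tspid=2 seq=1 pid=1234\n"
    "10.20.0.0/24[any] 192.168.1.0/24[any] any\n"
    "\tin ipsec\n"
    "\tesp/tunnel/203.134.22.98-198.51.100.7/unique#16386\n"
    "\tspid=1 seq=0 pid=1234\n"
)
for selector in stale_policies(sample_dump, "203.134.22.98"):
    print(selector)
```

If stale entries do show up, `setkey -FP` flushes the whole SPD, which also removes the policies of tunnels that are working, so deleting only the affected entries is the gentler option.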
-
2.0 still has an issue where DPD will fail to tear down a tunnel completely when the other end dies, despite printing messages to log saying it has removed the tunnel. I have to kill racoon on my 2.0 box with some regularity to regain connectivity to a remote site if it goes offline.
-
Thanks for the input, guys. I've checked the lifetimes on both sides: phase 1 and phase 2 are both currently set to 3600 on each side. The only setting that differs slightly is DPD, as the routers have different fields for it. I've set pfSense to a 10-second delay with 12 retries and the Snapgear to a 10-second delay with a 120-second timeout.
Ideally I'd be back on 1.2.3 release - I've had some issues with 2.0's stability - but it might be worth sticking with it if only one tunnel goes down instead of several. It's still a mystery to me why I'm having a problem with 1.2.3 release; I feel like pfSense and Snapgears don't get along too well… Maybe it's just the DPD thing jimp mentioned in 2.0b.
Thanks again for working with me on this.
-
Hey jimp, I noticed something this morning - the IPsec tunnel to the dynamic DNS site had been down for a few hours. The thing is, the lifetime on both phases was set to 1 hour. Is this still part of the bug in 2.0? I could be wrong, but I thought that if the SA lifetime was set to 1 hour then pfSense would try to re-establish the connection after the lifetime expired, even with the DPD bug.