Unreliable IPSec recovery after reboot
I am experiencing a lot of unreliability with IPSec recovery after reboot in 2.1 beta.
I frequently have to go into the IPSec Overview and jumpstart the connection.
I have also noticed that pfSense -> pfSense over IPv4 tends to be the worst:
pfSense 2.1 -> Linksys RV082 (location1), IPv4: fairly stable recovery, approx. 90-95% successful
pfSense 2.1 -> pfSense 2.1 (location2), IPv6: fairly stable recovery, approx. 80-85% successful
pfSense 2.1 -> pfSense 2.1 (location2), IPv4: fails to recover very often, almost every time. I have to jump-start it frequently.
The setup is very similar to the one with the RV082.
With normal logging I could not see why it doesn't try to start the session.
There is nothing in the racoon log at all, as far as I can see.
I will try to capture a debug log.
I'm seeing something similar with the last few snapshots of 2.1 when linking via IPv4 IPsec to pfSense 2.0.2: the automatic connection sometimes fails, but a few forced restarts seem to clear it. The only log entries are about no response.
The 2.0.2 box connects to another 2.0.2 box faultlessly.
Have you been running 2.1-BETA for some time (months) and if so, did the problems come up only recently?
I'm asking because the latest ipsec-tools (v0.8.1) were added to 2.1-BETA only a couple of weeks ago; see http://forum.pfsense.org/index.php/topic,58179.0.html
I have run the 2.1 beta for several months, but the second VPN, to another 2.1 installation, was set up only recently,
so I don't have a long track record with the previous ipsec-tools.
However, I have done some traces, and this is what I see:
1. The Linksys router tries to set up the connection much more frequently, sending ISAKMP messages.
For the second connection, to the other pfSense 2.1, I only see this for a short period; then it gives up and stops sending ISAKMP.
2. When I have manually started the connection so both ends are up and running OK, and then reboot,
my installation does not try to bring the connection up after the reboot has finished.
It just sits silent and does NOT send an ISAKMP Aggressive Mode message to the destination, even though the WAN link is up.
Sometimes it starts, but it takes more than 4 minutes before it sends the first ISAKMP message. I don't know what it is waiting for.
3. Sometimes the tunnel goes down and then stays down indefinitely until I manually jump-start it.
4. The Linksys IPSec connection is pretty much rock-solid, but much of this is because
it tries to initiate a session much more frequently, and when it does, pfSense answers
and completes the setup.
5. The IPv6 tunnel to the same pfSense installation also tends to be more stable (comes up and stays up)
than its IPv4 counterpart. However, the IPv6 traffic goes through a GIF tunnel to Tunnelbroker.net (in the US) and from
there back to the second pfSense through a similar GIF tunnel terminated on the second pfSense.
The second pfSense is approx. 170 km away.
The IPv4 tunnel goes directly WAN -> WAN.
**To summarize, I think the biggest problem is that pfSense is very slow to initiate a connection.
Sometimes it does, but it takes a long time (several minutes), and sometimes it doesn't do it at all.
I have no clue why, but maybe it is a tuning issue in ipsec-tools?
This gets even worse when you have two pfSense 2.1 installations behaving the same way, where
both are waiting for the other to start the ISAKMP Aggressive Mode exchange.
In my Linksys <-> pfSense setup, the Linksys most likely initiates in 99.9% of the cases, as it is
more "aggressive" in sending ISAKMP startup messages. This is why I hadn't seen the problem until now,
when I added a second VPN.**
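The deadlock described in the summary can be illustrated with a toy Python model: each peer either initiates after some delay or stays a passive responder (`None`), and the tunnel can only come up once somebody sends the first ISAKMP message. The function name and the example delays are hypothetical, not measured pfSense or Linksys timings, and the model assumes the responder answers immediately.

```python
from typing import Optional


def time_until_tunnel_up(
    initiate_delay_a: Optional[float],
    initiate_delay_b: Optional[float],
) -> Optional[float]:
    """Toy model (hypothetical): seconds after both WAN links are up until
    the first ISAKMP initiation, or None if neither side ever initiates.

    Each argument is an assumed per-peer delay before that peer sends its
    first ISAKMP message; None models a peer that only waits to respond.
    """
    delays = [d for d in (initiate_delay_a, initiate_delay_b) if d is not None]
    if not delays:
        # Both peers are passive responders: nobody makes the first move.
        return None
    # Whichever side initiates first triggers the negotiation; the other side
    # is assumed to answer immediately as the responder.
    return min(delays)


if __name__ == "__main__":
    print(time_until_tunnel_up(5.0, None))    # eager initiator + passive peer
    print(time_until_tunnel_up(None, None))   # two passive peers
    print(time_until_tunnel_up(240.0, None))  # slow initiator + passive peer
```

In this model, `time_until_tunnel_up(5.0, None)` returns `5.0` (an eager initiator paired with a passive pfSense), while `time_until_tunnel_up(None, None)` returns `None`: two passive boxes, each waiting for the other.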
UPDATE: To be clear, I have DPD enabled with the default values: 10 s, 5 retries.
IPsec is a complex protocol (not to mention interoperability issues between different vendors' IPsec implementations), and there can be many reasons why you're seeing these problems. You'd need to post the full configs (/var/etc/ipsec/*.conf) and a racoon debug log.
Anyway, IMHO racoon does have certain issues with how re-keying works; see e.g. http://forum.pfsense.org/index.php?topic=47969.0 for some related discussion.
I will do all this.
However, I still cannot explain or understand why the racoon process just waits, doing nothing,
even though it detects that the WAN is up. It gets incoming ISAKMP messages for the first, working VPN
connection, and that one is set up OK (but only because the other end initiated it).
The second one is just silent and does not send out any messages at all.
The problem is when you have two machines acting like this. It's like being at a party
where everyone is too afraid to make the first move. :-)
I will try to catch some more details on this.
No clues from the logs, but picking up on some previous comments: in my scenario it appears that DPD is not working in terms of forcing a disconnect when the link is dead/inactive.
The 2.0.2 box shows the VPN as inactive and its logs are full of no response/timeout errors, while the 2.1 beta box sits there reporting the link as active, with the last activity some hours ago! DPD is set to 10 s and 5 retries at both ends.
If the 2.0.2 box is restarted, it tries hard to bring the link up, but it is completely ignored by the 2.1 beta box, apparently because it still considers the link active. Do the reverse and force the 2.1 box to restart, and the 2.0.2 box detects the missing peer and initiates a new link perfectly.
So is DPD broken in 2.1?
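To put the reported symptom in perspective, here is a deliberately simplified Python model of dead peer detection (the R-U-THERE probing idea from RFC 3706): the local side probes every `delay` seconds and declares the peer dead after `max_failures` consecutive unanswered probes. The function name, the fixed probe schedule, and the mapping of "10 s, 5 retries" onto `delay=10` and `max_failures=5` are assumptions; racoon's actual DPD timers are not reproduced here.

```python
def dpd_detection_time(
    peer_dies_at: float, delay: float = 10.0, max_failures: int = 5
) -> float:
    """Return the time at which a simplified DPD check declares the peer dead.

    Simplified model (an assumption, not racoon's exact algorithm): a probe is
    sent every `delay` seconds, every probe sent before `peer_dies_at` is
    acknowledged, and every probe from then on goes unanswered.
    """
    if delay <= 0 or max_failures < 1:
        raise ValueError("delay must be > 0 and max_failures >= 1")
    failures = 0
    probe = 0
    while True:
        probe += 1
        t = probe * delay  # time at which this probe is sent
        if t < peer_dies_at:
            failures = 0  # answered: reset the consecutive-failure counter
        else:
            failures += 1  # unanswered probe
            if failures >= max_failures:
                return t  # peer declared dead; the SA would be torn down here


if __name__ == "__main__":
    print(dpd_detection_time(25.0))  # peer goes silent at t = 25 s
```

Under these assumptions, a peer that goes silent at t = 25 s is declared dead at t = 70 s, i.e. within about a minute, which is hard to reconcile with a tunnel still shown as active hours after the other side timed out.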