pfSense <-> pfSense: IPsec Tunnels Losing Connectivity

csnf

Same issue here. I have 2.0 on the main IPsec tunnel endpoint and 1.2.3 on 8 different machines, and the tunnels randomly stop sending data. I have to restart racoon to get things working again. This only started happening when I upgraded to 2.0. I hope somebody can isolate this issue.

Zeon

Hey guys,
Just to let you all know I'm going to try what was suggested in this thread:
http://forum.pfsense.org/index.php/topic,41617.0.html

So I'll disable NAT traversal (NAT-T) and dead peer detection (DPD) and see how that goes.

jmarquez

Hi all.

Same frustrating problem here with 2 VPNs, using pfSense 2.0.1 on all sides.

I read in some post that this only happens from version 2.0 up, so I might downgrade to 1.2.3, as this issue makes the VPN connection unusable.

Hope this is fixed soon.

Regards,
Jesus

cmb

@jmarquez:

I read in some post that this only happens from version 2.0 up, so I might downgrade to 1.2.3, as this issue makes the VPN connection unusable.

That's not true, it happens on occasion with every IPsec implementation on every device in the world. 2.0.x does not have any general IPsec problems. It's almost always related to misconfiguration. For the symptoms described here, the most common cause is mismatched lifetimes on P1 and/or P2, though at times it can be circumstances where you need DPD enabled.

There isn't enough info here on any of the reported issues to troubleshoot, and every issue is likely a different cause, so if you're having issues please start your own thread with specifics - IPsec logs from both sides in particular.

Zeon - this one's your thread, post your IPsec logs from the other end. The bit shown here just shows one end renegotiated successfully.
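
For anyone checking the mismatched-lifetime case mentioned above, here is a minimal Python sketch of the comparison. The `TunnelEnd` type, the site names, and the lifetime values are made up for illustration; nothing here reads a real pfSense config, so you would fill in the numbers by hand from each side's P1/P2 settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TunnelEnd:
    """Lifetimes (in seconds) as configured on one end of a tunnel (hypothetical type)."""
    name: str
    p1_lifetime: int
    p2_lifetime: int


def lifetime_mismatches(a: TunnelEnd, b: TunnelEnd) -> list:
    """Return a human-readable line for every phase whose lifetime differs."""
    problems = []
    phases = (
        ("P1", a.p1_lifetime, b.p1_lifetime),
        ("P2", a.p2_lifetime, b.p2_lifetime),
    )
    for phase, life_a, life_b in phases:
        if life_a != life_b:
            problems.append(
                f"{phase} lifetime differs: {a.name}={life_a}s, {b.name}={life_b}s"
            )
    return problems


# Example values only: P1 differs, P2 matches.
site_a = TunnelEnd("site-a", p1_lifetime=28800, p2_lifetime=3600)
site_b = TunnelEnd("site-b", p1_lifetime=86400, p2_lifetime=3600)

for line in lifetime_mismatches(site_a, site_b):
    print(line)
```

An empty result only means the lifetimes agree; it says nothing about the other proposal settings, which still need checking on both ends.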

jmarquez

Don't get me wrong cmb.

I'm really happy using pfSense. I think that it is a great piece of code.
I agree with you that every person's IPsec issue is different. My issue is similar to the ones reported in this thread only in the fact that tunnels drop randomly.

In my particular case, I followed the steps described in Zeon's post (http://forum.pfsense.org/index.php/topic,41617.0.html) and the tunnel has not dropped so far.

All the best.

Zeon

@cmb:

@jmarquez:

I read in some post that this only happens from version 2.0 up, so I might downgrade to 1.2.3, as this issue makes the VPN connection unusable.

That's not true, it happens on occasion with every IPsec implementation on every device in the world. 2.0.x does not have any general IPsec problems. It's almost always related to misconfiguration. For the symptoms described here, the most common cause is mismatched lifetimes on P1 and/or P2, though at times it can be circumstances where you need DPD enabled.

There isn't enough info here on any of the reported issues to troubleshoot, and every issue is likely a different cause, so if you're having issues please start your own thread with specifics - IPsec logs from both sides in particular.

Zeon - this one's your thread, post your IPsec logs from the other end. The bit shown here just shows one end renegotiated successfully.

Hi CMB,
Firstly, I can say after a few days with DPD and NAT-T disabled that I have had no further dropouts and couldn't be happier. This is true across 6 separate tunnels, some with latency of 1 ms and others as high as 30 ms (throughput of the internet connections is anywhere between 30 Mbps and 100 Mbps).

Unfortunately I don't have the logs of the problem anymore, but I will try to recreate them one weekend for the benefit of the other users on here.

Out of interest, when is DPD needed? I have had situations where I have knocked a cable out for up to 10 seconds and the tunnel still seems to work fine once I plug it back in.

cmb

@Zeon:

Firstly, I can say after a few days with DPD and NAT-T disabled that I have had no further dropouts and couldn't be happier. This is true across 6 separate tunnels, some with latency of 1 ms and others as high as 30 ms (throughput of the internet connections is anywhere between 30 Mbps and 100 Mbps).

Disabling NAT-T where you don't need it is a good thing to do. For DPD, as long as it's enabled on both sides with the same settings you should be good. That's what we use on all of ours internally.

@Zeon:

Unfortunately I don't have the logs of the problem anymore, but I will try to recreate them one weekend for the benefit of the other users on here.

Out of interest, when is DPD needed? I have had situations where I have knocked a cable out for up to 10 seconds and the tunnel still seems to work fine once I plug it back in.

DPD helps in circumstances where one end drops an SA and the other doesn't recognize that the SA is no longer valid; without it you have to force a restart on one or both ends. That may be a reboot on one side or the other (primarily an unplanned one like a power outage or yanking the plug; an orderly reboot should tell the other end to clear it), or an IP change on one of the sides where there are dynamic WANs. Those are the two most common that I can think of offhand. Just knocking a cable out for a few seconds, or even minutes, is no big deal unless you happen to get a new IP when it's reconnected (with dynamic WANs, the link coming up will force a reconnect to your ISP, which with some ISPs will get you a new IP). If you still have the same IP, the existing SA is still valid and will work fine.
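
The unplanned-reboot case can be illustrated with a toy model in Python. This is a deliberately simplified sketch, not how racoon implements DPD: the `Peer` class, the SPI value, and the rule "the remote answers a probe only while it still holds the SA" are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    """Toy model of one tunnel end (hypothetical, for illustration only)."""
    name: str
    sas: set = field(default_factory=set)  # SPIs this peer currently holds
    dpd_enabled: bool = False
    dpd_max_failures: int = 5
    missed_probes: int = 0

    def reboot_unclean(self) -> None:
        """Power loss: every SA vanishes and no delete notification is sent."""
        self.sas.clear()


def dpd_tick(local: Peer, remote: Peer, spi: int) -> None:
    """One DPD interval: probe the remote, drop the SA after too many misses."""
    if not local.dpd_enabled or spi not in local.sas:
        return
    if spi in remote.sas:
        local.missed_probes = 0  # remote answered, SA is fine
        return
    local.missed_probes += 1
    if local.missed_probes >= local.dpd_max_failures:
        local.sas.discard(spi)  # stale SA removed, a fresh negotiation can start
        local.missed_probes = 0


def sa_still_held_after(ticks: int, dpd_enabled: bool) -> bool:
    """Does side A still hold the stale SA some ticks after B lost power?"""
    spi = 0x1001
    a = Peer("A", sas={spi}, dpd_enabled=dpd_enabled)
    b = Peer("B", sas={spi}, dpd_enabled=dpd_enabled)
    b.reboot_unclean()
    for _ in range(ticks):
        dpd_tick(a, b, spi)
    return spi in a.sas


print("no DPD, 10 ticks:", sa_still_held_after(10, dpd_enabled=False))
print("DPD,    10 ticks:", sa_still_held_after(10, dpd_enabled=True))
```

Without DPD, side A keeps the dead SA indefinitely, which matches the "have to restart one end" symptom. With DPD, A drops it once the failure count is reached.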

maldex

Stumbling across this thread reminds me of the same issue I had a while ago, which was quite annoying, including against Astaro 8.2.
I wouldn't swear to this, but cross-checking my config now, one of the configuration change leftovers from the performance tests we did quite a while ago (<v2.01) is that we're using Blowfish in Phase 1 now. It never happened again, so I completely forgot about this. I'm using my 2.01 (dyn-IP) now against both pfSense 2.01 (also dyn-IP) and Astaro V8.3 (fixed IP):

All have public IPs (no NAT involved, NAT traversal disabled)
Default Mutual PSK, Main mode (btw I thought this cannot work with IPsec by definition? well done!!! :)), My & Peer IP Address, default Policy Generation and Proposal Checking.
Phase1:
– Encryption algorithm: Blowfish 256
– Hash algorithm: SHA1
– DH key group: 5 and Lifetime: 86400
– DPD: Enabled, 10 Detection and 5 retries
Phase2:
– Encryption algorithms: AES 256 (Only this, no other proposal)
– Hash algorithms: MD5 (Only this, no other proposal)
– PFS key group: 5 and Lifetime: 86400. Auto Ping remote Host is set

Yes, the encryption and hashing are not the same in Phase 1 and Phase 2, but even the tunnel with two Phase 2 entries is stable now. Sorry, I can't provide more details.

I'll let you guys know if I encounter a 'stalled' VPN again.

cheers
Josh

boogieshafer

on the pfsense side, try setting the P1 Policy Generation to "unique"

i was having similar issues on subsequent reconnects from the Shrew client, where restarting the pfsense ipsec process would clear the issue

i did NOT need to disable NAT-T or DPD, just changing the P1 Policy Generation setting from "default" to "unique" was the only change i made

dhatz

It seems that several people are reporting IPsec VPN issues with pfsense 2.x (note: which includes the recent ipsec-tools 0.8.0). While some problems may be due to misconfiguration (e.g. the racoon / mpd conflict), the pfsense<->pfsense VPN scenario should be trouble-free.

As most of the problems posted here seem to be related to rekeying, I've been searching the ipsec-tools-devel mailing lists for clues. Check the following discussions:

http://old.nabble.com/why-is-SA-lifetime-kilobyte-limit-disabled-in-racoon–td31648198.html

Even if Node-A thinks the IPsec-SA is expired at this time, Node-B doesn't
think so, i.e. the state of the IPsec-SA is mismatched.

Understand – similar things already happen with time-based
lifetimes if there is a clock skew between the two boxes.
(This is particularly bad if the oldest available SA is used
by the kernel.)

Racoon's strategy of rekeying is "Initiator do it." If Node-B
is responder, Node-A doesn't start rekeying even if IPsec-SA is
expired.

That sounds like a bug in racoon. It seems that if either end is
unsatisfied with the SA, that end should trigger a new one.

I'd also call this a shortcoming at least. The standards are
weak, and one doesn't know how other implementations behave.
It would be safer if both sides did care about renegotiations.

But the key question is what the other implementations do, and what the standard says.

I've just tried OpenBSD's isakmpd (the oldish version in pkgsrc).
It initiates a Phase 2 exchange if the soft timeout on its
side expires, even if it was responder initially. (It randomizes
the soft timeouts to minimize the chance that both sides start
the exchange simultaneously.)
RFC 2409 says that both sides can initiate rekeying. "Can" --
this is not much of a guideline for implementors.
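
The randomized soft timeout described above can be sketched in a few lines of Python. The 80%-95% window is an assumption picked for the example, not the range isakmpd actually uses, and `soft_timeout` and `first_to_rekey` are hypothetical helper names.

```python
import random


def soft_timeout(hard_lifetime: int, rng: random.Random,
                 low: float = 0.80, high: float = 0.95) -> float:
    """Pick a rekey time at a random fraction of the hard lifetime.

    The low/high fractions are illustrative assumptions.
    """
    return hard_lifetime * rng.uniform(low, high)


def first_to_rekey(hard_lifetime: int, rng: random.Random):
    """Each side draws its own soft timeout; whoever expires first initiates."""
    side_a = soft_timeout(hard_lifetime, rng)
    side_b = soft_timeout(hard_lifetime, rng)
    return ("A" if side_a < side_b else "B"), min(side_a, side_b)


rng = random.Random()  # pass a seeded Random for repeatable runs
who, when = first_to_rekey(86400, rng)
print(f"side {who} starts the Phase 2 rekey after about {when:.0f}s")
```

Because both ends draw independently, either the original initiator or the original responder may go first, and the chance of both starting in the same instant is small.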

I can see the argument that especially with a 24h or less
lifetime, AES doesn't need volume-based rekeying.

OK, I was more concerned about interoperability. What if
the other side insists on some volume limit?

I've just tried OpenBSD's isakmpd (the oldish version in pkgsrc).
It initiates a Phase 2 exchange if the soft timeout on its
side expires, even if it was responder initially. (It randomizes
the soft timeouts to minimize the chance that both sides start
the exchange simultaneously.)
RFC 2409 says that both sides can initiate rekeying. "Can" --
this is not much of a guideline for implementors.

True, but it seems the original responder initiating a renegotiation is
the only reasonable behavior.

At the very least, it would appear to suggest that if the original
initiator rejects an attempt on the part of the original responder to
rekey, that's a bug.

True, but it seems the original responder initiating a renegotiation is
the only reasonable behavior.

If both sides start rekeying at the same time, there is/was a problem of
SA selection.

The two rekeying sessions make two pairs of IPsec-SAs. racoon can
do this, and IPsec implementations (kernel side) do one of the following:

a. Use oldest IPsec-SA to send and keep all IPsec-SAs to receive (KAME)
b. Use newest IPsec-SA to send and keep all IPsec-SAs to receive (Fast IPsec)
c. Use newest IPsec-SA to send/receive and purge older IPsec-SAs

Of course, c. is bad behavior, but small implementations (kernel side)
may handle only one session and one key pair at a time.
Standards don't prohibit this. This problem exists between the IKE
standards and the IPsec standards. It seems IKEv2 makes this cleaner.

Today, most implementations select b. or have a configuration option for it.
And racoon isn't used on anything other than KAME, Fast IPsec, or Linux (a. or b.).
I think your logic actually works fine. But racoon is an old product,
so it hasn't caught up with recent trends.
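
To make the a./b./c. behaviors above concrete, here is a small Python sketch of why mixing them can black-hole traffic after a simultaneous rekey. The `SA` type, the policy names, and the SPI values are invented for the example; real kernels track far more state than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SA:
    """Minimal stand-in for an IPsec-SA (hypothetical type)."""
    spi: int
    created: float  # creation time in seconds


def pick_outbound(sas: list, policy: str) -> SA:
    """Choose the SA used for sending."""
    if policy == "oldest":   # behavior a. (KAME)
        return min(sas, key=lambda sa: sa.created)
    if policy == "newest":   # behaviors b. (Fast IPsec) and c.
        return max(sas, key=lambda sa: sa.created)
    raise ValueError(f"unknown policy: {policy}")


def accepted_inbound(sas: list, purge_older: bool) -> set:
    """SPIs the receiver will still accept."""
    if purge_older:          # behavior c.: only the newest SA survives
        return {pick_outbound(sas, "newest").spi}
    return {sa.spi for sa in sas}  # behaviors a. and b.: keep everything


def delivered(sender_policy: str, receiver_purges: bool, sas: list) -> bool:
    """True if the SA the sender picks is one the receiver still accepts."""
    chosen = pick_outbound(sas, sender_policy)
    return chosen.spi in accepted_inbound(sas, receiver_purges)


# Two SA pairs created by a simultaneous rekey, 5 seconds apart.
sas = [SA(spi=0x100, created=0.0), SA(spi=0x200, created=5.0)]

print("a. sender -> c. receiver:", delivered("oldest", True, sas))
print("b. sender -> c. receiver:", delivered("newest", True, sas))
print("a. sender -> a./b. receiver:", delivered("oldest", False, sas))
```

Only the first combination fails: a sender that prefers the oldest SA talking to a receiver that has already purged it.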

http://marc.info/?l=ipsec-tools-devel&m=129905181832157&w=2
http://marc.info/?l=ipsec-tools-devel&m=129916127621017&w=2

let me revive the discussion on an active negotiation,
as opposed to a passive daemon. Until recently my use
of IPsec was tied to isakmpd, ipsecctl, and OpenBSD
and my views are conditioned by this fact. There the
IPsec daemon is normally active in initiating its
negotiations at startup, unless told to configure
a passive listener for a particular tunnel/transport.
At the other extreme there is even a so called
active-only setting.

The implicit and default setting in racoon-0.7.3 is
"passive off", but this still waits for a demand to be
detected. Thus the mode is better described as "passive
until harshly bugged to get going"! The need to ping
and wait for a ridiculously long delay should not be
acceptable in most circumstances. Forgive me for the
criticism, but to me this is a design flaw. It is a
question of dependability and of trust to erect the
desired IPsec tunnels already at booting time.

Funny: when we tried to switch from racoon to isakmpd at work, a long
long time ago, this is one of the things we noticed on our TODO list:
patch isakmpd to negotiate SAs only when traffic comes to the tunnel :-)

And this is how things should (can?) be done according to RFC 2367,
which provides the SADB_ACQUIRE PF_KEY message….

Now, doing comparative browsing in the sources 0.7.3
and 0.8, the actual use of the variable PASSIVE in
"struct remoteconf" has indeed expanded somewhat.
Is the code progressing or maturing into a state
that allows an actively negotiating daemon? I.e.,
without waiting for traffic demand before commencing?

Not afaik.
Feel free to provide a patch for that. It would not be so
complicated to parse all the config and start negotiation for the needed
tunnels, but there are also setups where we want to have tunnels
negotiated only when needed (so when traffic comes to the tunnel), so
a patch will need to provide this feature as optional.
The best would be to have a peer-based (or sainfo-based?) token for
that.

Please also note that it is quite easy to generate dummy
traffic for the needed tunnels when you activate the configuration, if
you want.
And of course you can generate dummy traffic from time to time to ensure the
tunnel will always be up.
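
A rough Python sketch of the dummy-traffic idea follows. It assumes a FreeBSD-style `ping` where `-S` sets the source address (on Linux the equivalent is `-I`), and the addresses in the usage note are placeholders. `ping_command` and `keep_alive` are hypothetical helper names, and the `run` and `sleep` parameters exist only so the loop can be exercised without sending real packets.

```python
import subprocess
import time


def ping_command(remote_host: str, source_ip: str) -> list:
    """Build a single-packet ping sourced from the local tunnel-side address.

    Assumes FreeBSD ping flags: -c count, -S source address.
    """
    return ["ping", "-c", "1", "-S", source_ip, remote_host]


def keep_alive(remote_host: str, source_ip: str, rounds: int,
               interval: float = 10.0,
               run=subprocess.call, sleep=time.sleep) -> int:
    """Send one ping per round; return how many rounds got a reply."""
    successes = 0
    for i in range(rounds):
        # A zero exit status from ping means a reply was received.
        if run(ping_command(remote_host, source_ip)) == 0:
            successes += 1
        if i < rounds - 1:
            sleep(interval)  # wait before the next round
    return successes
```

Calling, for example, `keep_alive("10.0.2.1", "10.0.1.1", rounds=6)` from a host behind one end would source the pings from the local protected subnet, so they match the tunnel policy and give racoon a reason to negotiate. This is roughly what the "Auto Ping remote Host" setting mentioned earlier in the thread is for.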