Fix the VPN IPSEC Dead Peer Detection in 1.2.2 or 1.2.3 {$200}
-
You may also want to include any other relevant config from the Cisco if you can, things that might alter the way IPSec is handled.
-
Just remembered…Already posted the screenshots...see the post below.
http://forum.pfsense.org/index.php/topic,15678.msg81991.html#msg81991
-
Ok. Doesn't look like anything exotic… I'll see if my VPN concentrator still boots, we acquired it used quite some time ago and never needed it. :)
-
Just a little update: I have at least managed to reproduce the problem. I can get the tunnel to come up, ping, etc. Then if I reboot the VPN Concentrator, the tunnel is down on the VC side, and the pfSense side still believes the tunnel is active.
I have to bounce racoon to bring it back to life.
In this scenario, .40 is the pfSense box and .49 is the Cisco VPN Concentrator # You can see the DPD heartbeat every 10 seconds 15:31:54.836050 IP (tos 0x0, ttl 128, id 645, offset 0, flags [none], proto UDP (17), length 112) x.x.x.49.500 > x.x.x.40.500: isakmp 1.0 msgid cookie ->: phase 2/others ? inf[E]: [encrypted hash] 15:31:54.836829 IP (tos 0x0, ttl 64, id 7023, offset 0, flags [none], proto UDP (17), length 120) x.x.x.40.500 > x.x.x.49.500: isakmp 1.0 msgid cookie ->: phase 2/others ? inf[E]: [encrypted hash] 15:32:04.839087 IP (tos 0x0, ttl 128, id 646, offset 0, flags [none], proto UDP (17), length 112) x.x.x.49.500 > x.x.x.40.500: isakmp 1.0 msgid cookie ->: phase 2/others ? inf[E]: [encrypted hash] 15:32:04.839919 IP (tos 0x0, ttl 64, id 12303, offset 0, flags [none], proto UDP (17), length 120) x.x.x.40.500 > x.x.x.49.500: isakmp 1.0 msgid cookie ->: phase 2/others ? inf[E]: [encrypted hash] 15:32:14.842125 IP (tos 0x0, ttl 128, id 658, offset 0, flags [none], proto UDP (17), length 112) x.x.x.49.500 > x.x.x.40.500: isakmp 1.0 msgid cookie ->: phase 2/others ? inf[E]: [encrypted hash] 15:32:14.842992 IP (tos 0x0, ttl 64, id 2415, offset 0, flags [none], proto UDP (17), length 120) x.x.x.40.500 > x.x.x.49.500: isakmp 1.0 msgid cookie ->: phase 2/others ? inf[E]: [encrypted hash] 15:32:24.845161 IP (tos 0x0, ttl 128, id 685, offset 0, flags [none], proto UDP (17), length 112) x.x.x.49.500 > x.x.x.40.500: isakmp 1.0 msgid cookie ->: phase 2/others ? inf[E]: [encrypted hash] 15:32:24.845898 IP (tos 0x0, ttl 64, id 58238, offset 0, flags [none], proto UDP (17), length 120) x.x.x.40.500 > x.x.x.49.500: isakmp 1.0 msgid cookie ->: phase 2/others ? inf[E]: [encrypted hash] 15:32:34.848195 IP (tos 0x0, ttl 128, id 720, offset 0, flags [none], proto UDP (17), length 112) x.x.x.49.500 > x.x.x.40.500: isakmp 1.0 msgid cookie ->: phase 2/others ? inf[E]: [encrypted hash] 15:32:34.848982 IP (tos 0x0, ttl 64, id 2157, offset 0, flags [none], proto UDP (17), length 120) x.x.x.40.500 > x.x.x.49.500: isakmp 1.0 msgid cookie ->: phase 2/others ? inf[E]: [encrypted hash] 15:32:44.851233 IP (tos 0x0, ttl 128, id 751, offset 0, flags [none], proto UDP (17), length 112) x.x.x.49.500 > x.x.x.40.500: isakmp 1.0 msgid cookie ->: phase 2/others ? inf[E]: [encrypted hash] 15:32:44.852033 IP (tos 0x0, ttl 64, id 7995, offset 0, flags [none], proto UDP (17), length 120) x.x.x.40.500 > x.x.x.49.500: isakmp 1.0 msgid cookie ->: phase 2/others ? inf[E]: [encrypted hash] # VPN Concentrator Rebooted Here 15:33:46.287064 IP (tos 0x0, ttl 64, id 29287, offset 0, flags [none], proto ESP (50), length 136) x.x.x.40 > x.x.x.49: ESP(spi=0x08d978ba,seq=0x17), length 116 15:33:47.287790 IP (tos 0x0, ttl 64, id 3429, offset 0, flags [none], proto ESP (50), length 136) x.x.x.40 > x.x.x.49: ESP(spi=0x08d978ba,seq=0x18), length 116 15:33:48.288783 IP (tos 0x0, ttl 64, id 58683, offset 0, flags [none], proto ESP (50), length 136) x.x.x.40 > x.x.x.49: ESP(spi=0x08d978ba,seq=0x19), length 116 15:33:59.306212 IP (tos 0x0, ttl 64, id 23344, offset 0, flags [none], proto ESP (50), length 136) x.x.x.40 > x.x.x.49: ESP(spi=0x08d978ba,seq=0x1a), length 116 15:33:59.306992 arp who-has x.x.x.40 tell x.x.x.49 15:33:59.307063 arp reply x.x.x.40 is-at 00:d0:b7:xx:xx:xx 15:33:59.307392 IP (tos 0x0, ttl 128, id 1, offset 0, flags [none], proto UDP (17), length 72) x.x.x.49.500 > x.x.x.40.500: [udp sum ok] isakmp 1.0 msgid cookie ->: phase 1 ? inf: (n: doi=ipsec proto=#0 type=INVALID-SPI orig=([|isakmp])) # Any more traffic that tries to traverse the tunnel gets back an invalid SPI reply from the Cisco, but racoon seems to ignore that(?) 15:38:12.199544 IP (tos 0x0, ttl 64, id 44124, offset 0, flags [none], proto ESP (50), length 136) x.x.x.40 > x.x.x.49: ESP(spi=0x08d978ba,seq=0x1b), length 116 15:38:12.200149 IP (tos 0x0, ttl 128, id 2, offset 0, flags [none], proto UDP (17), length 72) x.x.x.49.500 > x.x.x.40.500: [udp sum ok] isakmp 1.0 msgid cookie ->: phase 1 ? inf: (n: doi=ipsec proto=#0 type=INVALID-SPI orig=([|isakmp])) 15:38:13.200220 IP (tos 0x0, ttl 64, id 32774, offset 0, flags [none], proto ESP (50), length 136) x.x.x.40 > x.x.x.49: ESP(spi=0x08d978ba,seq=0x1c), length 116 15:38:13.200783 IP (tos 0x0, ttl 128, id 3, offset 0, flags [none], proto UDP (17), length 72) x.x.x.49.500 > x.x.x.40.500: [udp sum ok] isakmp 1.0 msgid cookie ->: phase 1 ? inf: (n: doi=ipsec proto=#0 type=INVALID-SPI orig=([|isakmp])) 15:38:14.201177 IP (tos 0x0, ttl 64, id 12600, offset 0, flags [none], proto ESP (50), length 136) x.x.x.40 > x.x.x.49: ESP(spi=0x08d978ba,seq=0x1d), length 116 15:38:14.201741 IP (tos 0x0, ttl 128, id 4, offset 0, flags [none], proto UDP (17), length 72) x.x.x.49.500 > x.x.x.40.500: [udp sum ok] isakmp 1.0 msgid cookie ->: phase 1 ? inf: (n: doi=ipsec proto=#0 type=INVALID-SPI orig=([|isakmp])) # Here is where I stopped/started racoon and the traffic started to flow again (just some pings) 15:41:44.297629 IP (tos 0x0, ttl 64, id 52940, offset 0, flags [none], proto UDP (17), length 152) x.x.x.40.500 > x.x.x.49.500: isakmp 1.0 msgid cookie ->: phase 1 I ident: [|sa] 15:41:44.395658 IP (tos 0x0, ttl 128, id 12, offset 0, flags [none], proto UDP (17), length 108) x.x.x.49.500 > x.x.x.40.500: isakmp 1.0 msgid cookie ->: phase 1 R ident: [|sa] 15:41:44.424531 IP (tos 0x0, ttl 64, id 22746, offset 0, flags [none], proto UDP (17), length 208) x.x.x.40.500 > x.x.x.49.500: isakmp 1.0 msgid cookie ->: phase 1 I ident: [|ke] 15:41:44.532342 IP (tos 0x0, ttl 128, id 13, offset 0, flags [none], proto UDP (17), length 284) x.x.x.49.500 > x.x.x.40.500: isakmp 1.0 msgid cookie ->: phase 1 R ident: [|ke] 15:41:44.561447 IP (tos 0x0, ttl 64, id 61626, offset 0, flags [none], proto UDP (17), length 96) x.x.x.40.500 > x.x.x.49.500: isakmp 1.0 msgid cookie ->: phase 1 I ident[E]: [encrypted id] 15:41:44.655766 IP (tos 0x0, ttl 128, id 14, offset 0, flags [none], proto UDP (17), length 112) x.x.x.49.500 > x.x.x.40.500: isakmp 1.0 msgid cookie ->: phase 1 R ident[E]: [encrypted id] 15:41:44.656514 IP (tos 0x0, ttl 64, id 14790, offset 0, flags [none], proto UDP (17), length 112) x.x.x.40.500 > x.x.x.49.500: isakmp 1.0 msgid cookie ->: phase 2/others I inf[E]: [encrypted hash] 15:41:45.658171 IP (tos 0x0, ttl 64, id 15246, offset 0, flags [none], proto UDP (17), length 192) x.x.x.40.500 > x.x.x.49.500: isakmp 1.0 msgid cookie ->: phase 2/others I oakley-quick[E]: [encrypted hash] 15:41:45.664940 IP (tos 0x0, ttl 128, id 15, offset 0, flags [none], proto UDP (17), length 216) x.x.x.49.500 > x.x.x.40.500: isakmp 1.0 msgid cookie ->: phase 2/others R oakley-quick[E]: [encrypted hash] 15:41:45.665747 IP (tos 0x0, ttl 64, id 56534, offset 0, flags [none], proto UDP (17), length 88) x.x.x.40.500 > x.x.x.49.500: isakmp 1.0 msgid cookie ->: phase 2/others I oakley-quick[E]: [encrypted hash] 15:41:46.297514 IP (tos 0x0, ttl 64, id 53208, offset 0, flags [none], proto ESP (50), length 136) x.x.x.40 > x.x.x.49: ESP(spi=0x65050928,seq=0x1), length 116 15:41:46.299785 IP (tos 0x0, ttl 64, id 34354, offset 0, flags [none], proto ESP (50), length 136) x.x.x.49 > x.x.x.40: ESP(spi=0x06fc83b9,seq=0x1), length 116 15:41:47.299312 IP (tos 0x0, ttl 64, id 62344, offset 0, flags [none], proto ESP (50), length 136) x.x.x.40 > x.x.x.49: ESP(spi=0x65050928,seq=0x2), length 116 15:41:47.301208 IP (tos 0x0, ttl 64, id 34355, offset 0, flags [none], proto ESP (50), length 136) x.x.x.49 > x.x.x.40: ESP(spi=0x06fc83b9,seq=0x2), length 116 15:41:48.300263 IP (tos 0x0, ttl 64, id 22492, offset 0, flags [none], proto ESP (50), length 136) x.x.x.40 > x.x.x.49: ESP(spi=0x65050928,seq=0x3), length 116 15:41:48.302170 IP (tos 0x0, ttl 64, id 34356, offset 0, flags [none], proto ESP (50), length 136) x.x.x.49 > x.x.x.40: ESP(spi=0x06fc83b9,seq=0x3), length 116
-
Exactly!!!!!!!!!!!!!!!! ??? ??? ??? :'( :'( :'(
-
I did notice something weird, if you look at the times on the DPD packets in my tcpdump, they were being initiated by the VPN concentrator, not pfSense.
Perhaps that is why it isn't working? Even though pfSense is set for DPD, it has negotiated it with Cisco but is only replying and not initiating DPD checks?
Just a guess… probably needs more experimentation.
-
What I find odd is none of the Devs have not responded to this post nor the other posts I have seen regarding this issue at least telling us it is a bug or if it is unable to be fixed.
I will probably end up buying Cisco Linksys RV042's instead of implementing pfSense at remote locations until this can be fixed as I need to deploy about 7 locations with remote VPN's and needing to manually intervene each times is just a little too much for me.
-
Actually, if you use the term "dev" loosely, I am one :-)
(I am a committer on 2.0/HEAD and for packages, but I'm not a part of the core team)
This still feels more like a racoon bug than a pfSense bug. If it does turn out to be a pfSense bug, it may just be in terms of how DPD is being configured by the WebGUI. There isn't much to go wrong there, though.
I can try to build a tunnel to a 2.0 box and see if it behaves the same way.
-
As far as I know both should be able to initiate DPD. I think this is part of the problem as DPD is not available in 1.2.2. I have tried modifying the file in pfsense to enable it but it does nothing once I have edited the file.
In 1.2.3 people are having the same problem. I wish someone would shed some light on the issue.
-
I should mention that I am testing this with 1.2.3, so DPD should be working, in theory anyhow.
-
Could it be something to do with the version of racoon? I could not find much info about it besides a project called Kame..but that did not tell me much.
-
I have another tunnel between this 1.2.2 and another 1.2.2 and the tunnel has no problem coming back up. But others have complained about it so there is no consistency.
-
Could it be something to do with the version of racoon? I could not find much info about it besides a project called Kame..but that did not tell me much.
That's possible, which is why I want to try it against 2.0 as well. They are both running the same version of ipsec-tools (http://ipsec-tools.sourceforge.net/) though:
from 1.2.3:
May 11 12:33:07 pfsense-123test racoon: INFO: @(#)ipsec-tools 0.7.1 (http://ipsec-tools.sourceforge.net)from 2.0:
May 11 11:32:58 pfSense-20test racoon: INFO: @(#)ipsec-tools 0.7.1 (http://ipsec-tools.sourceforge.net)There is a slightly newer version of ipsec-tools (0.7.2) out, but I don't see any details release notes on the web site that state what bugs were fixed.
-
It appears that what I thought was the DPD packet may have been Cisco's IPSec keep-alive. I disabled that, and left DPD enabled on the pfSense side, and I see no regular traffic on the tunnel.
That makes me think that either DPD isn't being turned on, or it isn't being negotiated properly for the tunnel.
With Cisco's Keep-Alive turned on, I get:
May 11 15:41:44 pfsense-123test racoon: INFO: received Vendor ID: CISCO-UNITY
May 11 15:41:44 pfsense-123test racoon: INFO: received Vendor ID: draft-ietf-ipsra-isakmp-xauth-06.txt
May 11 15:41:44 pfsense-123test racoon: INFO: received Vendor ID: DPDWith it disabled, I get:
May 11 16:26:21 pfsense-123test racoon: INFO: received Vendor ID: CISCO-UNITY
May 11 16:26:21 pfsense-123test racoon: INFO: received Vendor ID: draft-ietf-ipsra-isakmp-xauth-06.txtThat makes me wonder if, perhaps, the Cisco side really doesn't support DPD as well as it claims to. I'm not sure if there is some other mechanism to signal racoon when something else fails (i.e. the keepalive ping) to reload a specific tunnel. That would probably be more reliable than DPD since that must be supported by both sides.
Why this works for some vendors/devices and not others is also puzzling…
-
Then why is it that when I reboot my concentrator the pfSense does not respond as seeing the tunnel go down at all but stays green?
-
Then why is it that when I reboot my concentrator the pfSense does not respond as seeing the tunnel go down at all but stays green?
That is what I'm trying to figure out… :-)
As you said, when both sides are pfSense, it seems to work. I'm wondering if the "invalid SPI" reply generated by the Cisco is broken in some way (or racoon's parsing thereof) such that it doesn't pick up on the fact that the tunnel traffic is being rejected.
-
Version history: –-------------- 0.7.1 - 23 July 2008 o Fixes a memory leak when invalid proposal received o Some fixes in DPD o do not set default gss id if xauth is used o fixed hybrid enabled builds o fixed compilation on FreeBSD8 o cleanup in network port value manipulation o gets ports from SADB_X_EXT_NAT_T_[SD]PORT if present in purge_ipsec_spi() o Generates a log if cert validation has been disabled by configuration o better handling for pfkey socket read errors o Fixes in yacc / bison stuff o new plog() macro (reduced CPU usage when logging is disabled) o Try to works better with huge SPD/SAD o Corrected modecfg option syntax o Many other various fixes…
-
That is all foreign to me but maybe it has something to do with the problem.
-
I'll have to look at it more tomorrow, but I might be able to see if bumping ipsec-tools to 0.7.2 might help things along.
Can't promise anything though. -
Those aren't the changes in 0.7.2, but the changes in 0.7.1. Here are the changes in 0.7.2:
0.7.2 - 22 April 2009
-
Fix a remote crash in fragmentation code
-
Phase2 message identities are phase1 specific (Vista compatibility)
-
Autogenerate ChangeLog from cvs metadata
-
Fix mode config pool resizing
-
NAT-T fixes related to purging of IPsec SA:s and retransmission
-
Remove phase1 handler immediately if first exchange is bad
-
A bunch of memory leak and possible memory corruptions (triggerable
by bad configuration or startup parameters)
Seems like an update that is worth upgrading to given how many crash fixes there are in it.
-