Fix the VPN IPSEC Dead Peer Detection in 1.2.2 or 1.2.3 {$200}
-
This problem is making me have to log into my firewall 10 times a day to delete the SA's for this tunnel.
-
Are you sure there isn't something else going on?
I have IPSec tunnels to several locations, and I don't have any problems at all. My connection at home has been very unstable lately, but every time it has reconnected the tunnel came back up without issue. Same with my other remote sites.
However, most of these are PPPoE based, not sure if that makes a difference. Some more detail in that respect may help work toward a solution.
-
Connections are static IP. The device is a Cisco VPN Concentrator 3005 and the other end is 1.2.2 embedded on ALIX.
-
Maybe this will help:
From 1.2.2 Release on Alix
May 6 18:58:13 racoon: [MSP Monitor]: INFO: ISAKMP-SA deleted 12.238.x.x[500]-69.12.x.x[500] spi:5b6b39080cbe0b85:29d8f870eded6416
May 6 18:58:12 racoon: [MSP Monitor]: INFO: ISAKMP-SA expired 12.238.x.x[500]-69.12.x.x[500] spi:5b6b39080cbe0b85:29d8f870eded6416
May 6 18:30:21 racoon: ERROR: libipsec failed pfkey align (Invalid sadb message)
May 6 18:30:21 racoon: ERROR: libipsec failed pfkey align (Invalid sadb message)
May 6 12:06:23 racoon: [MSP Monitor]: ERROR: no iph2 found: ESP 69.12.x.x[0]->12.238.x.x[0] spi=34828622(0x213714e)
May 6 12:06:23 racoon: INFO: unsupported PF_KEY message REGISTER
May 6 12:06:20 racoon: [MSP Monitor]: INFO: IPsec-SA established: ESP 12.238.x.x[0]->69.12.x.x[0] spi=1492648816(0x58f80370)
May 6 12:06:20 racoon: [MSP Monitor]: INFO: IPsec-SA established: ESP 69.12.x.x[0]->12.238.x.x[0] spi=233661810(0xded6572)
May 6 12:06:20 racoon: WARNING: ignore INITIAL-CONTACT notification, because it is only accepted after phase1.
May 6 12:06:20 racoon: [MSP Monitor]: INFO: respond new phase 2 negotiation: 12.238.x.x[0]<=>69.12.x.x[0]
May 6 12:06:20 racoon: [MSP Monitor]: INFO: ISAKMP-SA established 12.238.x.x[500]-69.12.x.x[500] spi:6605245b1011d662:5bac53ce09319aa9
May 6 12:06:20 racoon: WARNING: No ID match.
May 6 12:06:20 racoon: INFO: received Vendor ID: DPD
May 6 12:06:20 racoon: INFO: received Vendor ID: draft-ietf-ipsra-isakmp-xauth-06.txt
May 6 12:06:20 racoon: INFO: received Vendor ID: CISCO-UNITY
May 6 12:06:20 racoon: INFO: received broken Microsoft ID: FRAGMENTATION
May 6 12:06:20 racoon: INFO: received Vendor ID: draft-ietf-ipsec-nat-t-ike-03
May 6 12:06:20 racoon: INFO: received Vendor ID: draft-ietf-ipsec-nat-t-ike-02
May 6 12:06:20 racoon: INFO: begin Identity Protection mode.
May 6 12:06:20 racoon: [MSP Monitor]: INFO: respond new phase 1 negotiation: 12.238.x.x[500]<=>69.12.x.x[500]
May 6 12:06:20 racoon: [MSP Monitor]: INFO: ISAKMP-SA deleted 12.238.x.x[500]-69.12.x.x[500] spi:26cc8a5bb4d644b5:5244b6db8d48c8dd
May 6 12:06:19 racoon: [MSP Monitor]: INFO: ISAKMP-SA expired 12.238.x.x[500]-69.12.x.x[500] spi:26cc8a5bb4d644b5:5244b6db8d48c8dd
May 6 12:06:19 racoon: [MSP Monitor]: ERROR: pfkey DELETE received: ESP 12.238.x.x[0]->69.12.x.x[0] spi=1907603162(0x71b3b6da)
May 6 12:06:19 racoon: INFO: unsupported PF_KEY message REGISTER
May 6 11:43:26 racoon: [MSP Monitor]: ERROR: no iph2 found: ESP 69.12.x.x[0]->12.238.x.x[0] spi=101591129(0x60e2859)
May 6 11:43:26 racoon: INFO: unsupported PF_KEY message REGISTER
May 6 11:43:24 racoon: [MSP Monitor]: INFO: IPsec-SA established: ESP 12.238.x.x[0]->69.12.x.x[0] spi=1907603162(0x71b3b6da)
May 6 11:43:24 racoon: [MSP Monitor]: INFO: IPsec-SA established: ESP 69.12.x.x[0]->12.238.x.x[0] spi=34828622(0x213714e)
May 6 11:43:23 racoon: WARNING: ignore INITIAL-CONTACT notification, because it is only accepted after phase1.
May 6 11:43:23 racoon: [MSP Monitor]: INFO: respond new phase 2 negotiation: 12.238.x.x[0]<=>69.12.x.x[0]
May 6 11:43:22 racoon: [MSP Monitor]: INFO: ISAKMP-SA established 12.238.x.x[500]-69.12.x.x[500] spi:26cc8a5bb4d644b5:5244b6db8d48c8dd
May 6 11:43:22 racoon: WARNING: No ID match.
May 6 11:43:22 racoon: INFO: received Vendor ID: DPD
May 6 11:43:21 racoon: INFO: received Vendor ID: draft-ietf-ipsra-isakmp-xauth-06.txt
May 6 11:43:21 racoon: INFO: received Vendor ID: CISCO-UNITY
May 6 11:43:20 racoon: INFO: received broken Microsoft ID: FRAGMENTATION
May 6 11:43:20 racoon: INFO: received Vendor ID: draft-ietf-ipsec-nat-t-ike-03
May 6 11:43:20 racoon: INFO: received Vendor ID: draft-ietf-ipsec-nat-t-ike-02
May 6 11:43:20 racoon: INFO: begin Identity Protection mode.
May 6 11:43:20 racoon: [MSP Monitor]: INFO: respond new phase 1 negotiation: 12.238.x.x[500]<=>69.12.x.x[500]
May 6 11:43:15 racoon: [MSP Monitor]: ERROR: pfkey DELETE received: ESP 12.238.x.x[0]->69.12.x.x[0] spi=210205826(0xc877c82)
May 6 11:43:15 racoon: INFO: unsupported PF_KEY message REGISTER
May 6 11:32:21 racoon: [MSP Monitor]: INFO: IPsec-SA established: ESP 12.238.x.x[0]->69.12.x.x[0] spi=210205826(0xc877c82)
May 6 11:32:21 racoon: [MSP Monitor]: INFO: IPsec-SA established: ESP 69.12.x.x[0]->12.238.x.x[0] spi=101591129(0x60e2859)
May 6 11:32:21 racoon: WARNING: ignore INITIAL-CONTACT notification, because it is only accepted after phase1.
May 6 11:32:21 racoon: [MSP Monitor]: INFO: respond new phase 2 negotiation: 12.238.x.x[0]<=>69.12.x.x[0]
May 6 11:32:21 racoon: [MSP Monitor]: INFO: ISAKMP-SA established 12.238.x.x[500]-69.12.x.x[500] spi:e8110c06e832ad67:59e186430da18beb
May 6 11:32:21 racoon: WARNING: No ID match.
May 6 11:32:21 racoon: INFO: received Vendor ID: DPD
May 6 11:32:21 racoon: INFO: received Vendor ID: draft-ietf-ipsra-isakmp-xauth-06.txt -
I have one of those VPN concentrators sitting here collecting dust, but I'm not sure I'd have time to replicate a test environment.
Would it be possible for you to post sanitized versions of the tunnel config from both sides?
-
Sure thing. I will do it tonight.
-
You may also want to include any other relevant config from the Cisco if you can, things that might alter the way IPSec is handled.
-
Just remembered…Already posted the screenshots...see the post below.
http://forum.pfsense.org/index.php/topic,15678.msg81991.html#msg81991
-
Ok. Doesn't look like anything exotic… I'll see if my VPN concentrator still boots, we acquired it used quite some time ago and never needed it. :)
-
Just a little update: I have at least managed to reproduce the problem. I can get the tunnel to come up, ping, etc. Then if I reboot the VPN Concentrator, the tunnel is down on the VC side, and the pfSense side still believes the tunnel is active.
I have to bounce racoon to bring it back to life.
In this scenario, .40 is the pfSense box and .49 is the Cisco VPN Concentrator # You can see the DPD heartbeat every 10 seconds 15:31:54.836050 IP (tos 0x0, ttl 128, id 645, offset 0, flags [none], proto UDP (17), length 112) x.x.x.49.500 > x.x.x.40.500: isakmp 1.0 msgid cookie ->: phase 2/others ? inf[E]: [encrypted hash] 15:31:54.836829 IP (tos 0x0, ttl 64, id 7023, offset 0, flags [none], proto UDP (17), length 120) x.x.x.40.500 > x.x.x.49.500: isakmp 1.0 msgid cookie ->: phase 2/others ? inf[E]: [encrypted hash] 15:32:04.839087 IP (tos 0x0, ttl 128, id 646, offset 0, flags [none], proto UDP (17), length 112) x.x.x.49.500 > x.x.x.40.500: isakmp 1.0 msgid cookie ->: phase 2/others ? inf[E]: [encrypted hash] 15:32:04.839919 IP (tos 0x0, ttl 64, id 12303, offset 0, flags [none], proto UDP (17), length 120) x.x.x.40.500 > x.x.x.49.500: isakmp 1.0 msgid cookie ->: phase 2/others ? inf[E]: [encrypted hash] 15:32:14.842125 IP (tos 0x0, ttl 128, id 658, offset 0, flags [none], proto UDP (17), length 112) x.x.x.49.500 > x.x.x.40.500: isakmp 1.0 msgid cookie ->: phase 2/others ? inf[E]: [encrypted hash] 15:32:14.842992 IP (tos 0x0, ttl 64, id 2415, offset 0, flags [none], proto UDP (17), length 120) x.x.x.40.500 > x.x.x.49.500: isakmp 1.0 msgid cookie ->: phase 2/others ? inf[E]: [encrypted hash] 15:32:24.845161 IP (tos 0x0, ttl 128, id 685, offset 0, flags [none], proto UDP (17), length 112) x.x.x.49.500 > x.x.x.40.500: isakmp 1.0 msgid cookie ->: phase 2/others ? inf[E]: [encrypted hash] 15:32:24.845898 IP (tos 0x0, ttl 64, id 58238, offset 0, flags [none], proto UDP (17), length 120) x.x.x.40.500 > x.x.x.49.500: isakmp 1.0 msgid cookie ->: phase 2/others ? inf[E]: [encrypted hash] 15:32:34.848195 IP (tos 0x0, ttl 128, id 720, offset 0, flags [none], proto UDP (17), length 112) x.x.x.49.500 > x.x.x.40.500: isakmp 1.0 msgid cookie ->: phase 2/others ? inf[E]: [encrypted hash] 15:32:34.848982 IP (tos 0x0, ttl 64, id 2157, offset 0, flags [none], proto UDP (17), length 120) x.x.x.40.500 > x.x.x.49.500: isakmp 1.0 msgid cookie ->: phase 2/others ? inf[E]: [encrypted hash] 15:32:44.851233 IP (tos 0x0, ttl 128, id 751, offset 0, flags [none], proto UDP (17), length 112) x.x.x.49.500 > x.x.x.40.500: isakmp 1.0 msgid cookie ->: phase 2/others ? inf[E]: [encrypted hash] 15:32:44.852033 IP (tos 0x0, ttl 64, id 7995, offset 0, flags [none], proto UDP (17), length 120) x.x.x.40.500 > x.x.x.49.500: isakmp 1.0 msgid cookie ->: phase 2/others ? inf[E]: [encrypted hash] # VPN Concentrator Rebooted Here 15:33:46.287064 IP (tos 0x0, ttl 64, id 29287, offset 0, flags [none], proto ESP (50), length 136) x.x.x.40 > x.x.x.49: ESP(spi=0x08d978ba,seq=0x17), length 116 15:33:47.287790 IP (tos 0x0, ttl 64, id 3429, offset 0, flags [none], proto ESP (50), length 136) x.x.x.40 > x.x.x.49: ESP(spi=0x08d978ba,seq=0x18), length 116 15:33:48.288783 IP (tos 0x0, ttl 64, id 58683, offset 0, flags [none], proto ESP (50), length 136) x.x.x.40 > x.x.x.49: ESP(spi=0x08d978ba,seq=0x19), length 116 15:33:59.306212 IP (tos 0x0, ttl 64, id 23344, offset 0, flags [none], proto ESP (50), length 136) x.x.x.40 > x.x.x.49: ESP(spi=0x08d978ba,seq=0x1a), length 116 15:33:59.306992 arp who-has x.x.x.40 tell x.x.x.49 15:33:59.307063 arp reply x.x.x.40 is-at 00:d0:b7:xx:xx:xx 15:33:59.307392 IP (tos 0x0, ttl 128, id 1, offset 0, flags [none], proto UDP (17), length 72) x.x.x.49.500 > x.x.x.40.500: [udp sum ok] isakmp 1.0 msgid cookie ->: phase 1 ? inf: (n: doi=ipsec proto=#0 type=INVALID-SPI orig=([|isakmp])) # Any more traffic that tries to traverse the tunnel gets back an invalid SPI reply from the Cisco, but racoon seems to ignore that(?) 15:38:12.199544 IP (tos 0x0, ttl 64, id 44124, offset 0, flags [none], proto ESP (50), length 136) x.x.x.40 > x.x.x.49: ESP(spi=0x08d978ba,seq=0x1b), length 116 15:38:12.200149 IP (tos 0x0, ttl 128, id 2, offset 0, flags [none], proto UDP (17), length 72) x.x.x.49.500 > x.x.x.40.500: [udp sum ok] isakmp 1.0 msgid cookie ->: phase 1 ? inf: (n: doi=ipsec proto=#0 type=INVALID-SPI orig=([|isakmp])) 15:38:13.200220 IP (tos 0x0, ttl 64, id 32774, offset 0, flags [none], proto ESP (50), length 136) x.x.x.40 > x.x.x.49: ESP(spi=0x08d978ba,seq=0x1c), length 116 15:38:13.200783 IP (tos 0x0, ttl 128, id 3, offset 0, flags [none], proto UDP (17), length 72) x.x.x.49.500 > x.x.x.40.500: [udp sum ok] isakmp 1.0 msgid cookie ->: phase 1 ? inf: (n: doi=ipsec proto=#0 type=INVALID-SPI orig=([|isakmp])) 15:38:14.201177 IP (tos 0x0, ttl 64, id 12600, offset 0, flags [none], proto ESP (50), length 136) x.x.x.40 > x.x.x.49: ESP(spi=0x08d978ba,seq=0x1d), length 116 15:38:14.201741 IP (tos 0x0, ttl 128, id 4, offset 0, flags [none], proto UDP (17), length 72) x.x.x.49.500 > x.x.x.40.500: [udp sum ok] isakmp 1.0 msgid cookie ->: phase 1 ? inf: (n: doi=ipsec proto=#0 type=INVALID-SPI orig=([|isakmp])) # Here is where I stopped/started racoon and the traffic started to flow again (just some pings) 15:41:44.297629 IP (tos 0x0, ttl 64, id 52940, offset 0, flags [none], proto UDP (17), length 152) x.x.x.40.500 > x.x.x.49.500: isakmp 1.0 msgid cookie ->: phase 1 I ident: [|sa] 15:41:44.395658 IP (tos 0x0, ttl 128, id 12, offset 0, flags [none], proto UDP (17), length 108) x.x.x.49.500 > x.x.x.40.500: isakmp 1.0 msgid cookie ->: phase 1 R ident: [|sa] 15:41:44.424531 IP (tos 0x0, ttl 64, id 22746, offset 0, flags [none], proto UDP (17), length 208) x.x.x.40.500 > x.x.x.49.500: isakmp 1.0 msgid cookie ->: phase 1 I ident: [|ke] 15:41:44.532342 IP (tos 0x0, ttl 128, id 13, offset 0, flags [none], proto UDP (17), length 284) x.x.x.49.500 > x.x.x.40.500: isakmp 1.0 msgid cookie ->: phase 1 R ident: [|ke] 15:41:44.561447 IP (tos 0x0, ttl 64, id 61626, offset 0, flags [none], proto UDP (17), length 96) x.x.x.40.500 > x.x.x.49.500: isakmp 1.0 msgid cookie ->: phase 1 I ident[E]: [encrypted id] 15:41:44.655766 IP (tos 0x0, ttl 128, id 14, offset 0, flags [none], proto UDP (17), length 112) x.x.x.49.500 > x.x.x.40.500: isakmp 1.0 msgid cookie ->: phase 1 R ident[E]: [encrypted id] 15:41:44.656514 IP (tos 0x0, ttl 64, id 14790, offset 0, flags [none], proto UDP (17), length 112) x.x.x.40.500 > x.x.x.49.500: isakmp 1.0 msgid cookie ->: phase 2/others I inf[E]: [encrypted hash] 15:41:45.658171 IP (tos 0x0, ttl 64, id 15246, offset 0, flags [none], proto UDP (17), length 192) x.x.x.40.500 > x.x.x.49.500: isakmp 1.0 msgid cookie ->: phase 2/others I oakley-quick[E]: [encrypted hash] 15:41:45.664940 IP (tos 0x0, ttl 128, id 15, offset 0, flags [none], proto UDP (17), length 216) x.x.x.49.500 > x.x.x.40.500: isakmp 1.0 msgid cookie ->: phase 2/others R oakley-quick[E]: [encrypted hash] 15:41:45.665747 IP (tos 0x0, ttl 64, id 56534, offset 0, flags [none], proto UDP (17), length 88) x.x.x.40.500 > x.x.x.49.500: isakmp 1.0 msgid cookie ->: phase 2/others I oakley-quick[E]: [encrypted hash] 15:41:46.297514 IP (tos 0x0, ttl 64, id 53208, offset 0, flags [none], proto ESP (50), length 136) x.x.x.40 > x.x.x.49: ESP(spi=0x65050928,seq=0x1), length 116 15:41:46.299785 IP (tos 0x0, ttl 64, id 34354, offset 0, flags [none], proto ESP (50), length 136) x.x.x.49 > x.x.x.40: ESP(spi=0x06fc83b9,seq=0x1), length 116 15:41:47.299312 IP (tos 0x0, ttl 64, id 62344, offset 0, flags [none], proto ESP (50), length 136) x.x.x.40 > x.x.x.49: ESP(spi=0x65050928,seq=0x2), length 116 15:41:47.301208 IP (tos 0x0, ttl 64, id 34355, offset 0, flags [none], proto ESP (50), length 136) x.x.x.49 > x.x.x.40: ESP(spi=0x06fc83b9,seq=0x2), length 116 15:41:48.300263 IP (tos 0x0, ttl 64, id 22492, offset 0, flags [none], proto ESP (50), length 136) x.x.x.40 > x.x.x.49: ESP(spi=0x65050928,seq=0x3), length 116 15:41:48.302170 IP (tos 0x0, ttl 64, id 34356, offset 0, flags [none], proto ESP (50), length 136) x.x.x.49 > x.x.x.40: ESP(spi=0x06fc83b9,seq=0x3), length 116
-
Exactly!!!!!!!!!!!!!!!! ??? ??? ??? :'( :'( :'(
-
I did notice something weird, if you look at the times on the DPD packets in my tcpdump, they were being initiated by the VPN concentrator, not pfSense.
Perhaps that is why it isn't working? Even though pfSense is set for DPD, it has negotiated it with Cisco but is only replying and not initiating DPD checks?
Just a guess… probably needs more experimentation.
-
What I find odd is none of the Devs have not responded to this post nor the other posts I have seen regarding this issue at least telling us it is a bug or if it is unable to be fixed.
I will probably end up buying Cisco Linksys RV042's instead of implementing pfSense at remote locations until this can be fixed as I need to deploy about 7 locations with remote VPN's and needing to manually intervene each times is just a little too much for me.
-
Actually, if you use the term "dev" loosely, I am one :-)
(I am a committer on 2.0/HEAD and for packages, but I'm not a part of the core team)
This still feels more like a racoon bug than a pfSense bug. If it does turn out to be a pfSense bug, it may just be in terms of how DPD is being configured by the WebGUI. There isn't much to go wrong there, though.
I can try to build a tunnel to a 2.0 box and see if it behaves the same way.
-
As far as I know both should be able to initiate DPD. I think this is part of the problem as DPD is not available in 1.2.2. I have tried modifying the file in pfsense to enable it but it does nothing once I have edited the file.
In 1.2.3 people are having the same problem. I wish someone would shed some light on the issue.
-
I should mention that I am testing this with 1.2.3, so DPD should be working, in theory anyhow.
-
Could it be something to do with the version of racoon? I could not find much info about it besides a project called Kame..but that did not tell me much.
-
I have another tunnel between this 1.2.2 and another 1.2.2 and the tunnel has no problem coming back up. But others have complained about it so there is no consistency.
-
Could it be something to do with the version of racoon? I could not find much info about it besides a project called Kame..but that did not tell me much.
That's possible, which is why I want to try it against 2.0 as well. They are both running the same version of ipsec-tools (http://ipsec-tools.sourceforge.net/) though:
from 1.2.3:
May 11 12:33:07 pfsense-123test racoon: INFO: @(#)ipsec-tools 0.7.1 (http://ipsec-tools.sourceforge.net)from 2.0:
May 11 11:32:58 pfSense-20test racoon: INFO: @(#)ipsec-tools 0.7.1 (http://ipsec-tools.sourceforge.net)There is a slightly newer version of ipsec-tools (0.7.2) out, but I don't see any details release notes on the web site that state what bugs were fixed.
-
It appears that what I thought was the DPD packet may have been Cisco's IPSec keep-alive. I disabled that, and left DPD enabled on the pfSense side, and I see no regular traffic on the tunnel.
That makes me think that either DPD isn't being turned on, or it isn't being negotiated properly for the tunnel.
With Cisco's Keep-Alive turned on, I get:
May 11 15:41:44 pfsense-123test racoon: INFO: received Vendor ID: CISCO-UNITY
May 11 15:41:44 pfsense-123test racoon: INFO: received Vendor ID: draft-ietf-ipsra-isakmp-xauth-06.txt
May 11 15:41:44 pfsense-123test racoon: INFO: received Vendor ID: DPDWith it disabled, I get:
May 11 16:26:21 pfsense-123test racoon: INFO: received Vendor ID: CISCO-UNITY
May 11 16:26:21 pfsense-123test racoon: INFO: received Vendor ID: draft-ietf-ipsra-isakmp-xauth-06.txtThat makes me wonder if, perhaps, the Cisco side really doesn't support DPD as well as it claims to. I'm not sure if there is some other mechanism to signal racoon when something else fails (i.e. the keepalive ping) to reload a specific tunnel. That would probably be more reliable than DPD since that must be supported by both sides.
Why this works for some vendors/devices and not others is also puzzling…