Pf2.1 IPSEC between 4 sites, 1 suddenly failing
-
I've been successfully maintaining a mesh network using IPSEC between 4 sites running pfSense 2.1 for months now (years total on various versions of pfSense) so when I say that I'm familiar with the basics, know that I am. That said, the most critical of the sites (and the one I manage everything from) will no longer bring any of the tunnels up on either of its 2 wan links (6 tunnels total are normally up). NO changes were made anywhere near the time this happened. Though the timing with GoDaddy going down is an erie coincidence; I don't see how that could still be an issue at this point in time. I'm using very flexible settings for Phase 1 because 3 of the sites have dynamic IPs so the setup isn't sensitive in that respect. I am getting log entries that don't seem right to me and I'm betting someone who's knows how to read them correctly would see the issue so here's hoping :) I'm replacing my real IPs with placeholders to give them a bit more meaning and give me a bit of security.
10.0.0.1-10.0.0.3 are my remote sites
11.0.0.1-11.0.0.2 are my wan IPsSep 11 07:00:57 racoon: INFO: delete phase 2 handler. Sep 11 07:00:57 racoon: [remoteSite3]: [10.0.0.3] ERROR: phase2 negotiation failed due to time up waiting for phase1 [Remote Side not responding]. ESP 10.0.0.3[0]->11.0.0.2[0] Sep 11 07:00:56 racoon: INFO: begin Aggressive mode. Sep 11 07:00:56 racoon: [remoteSite1]: INFO: initiate new phase 1 negotiation: 11.0.0.2[500]<=>10.0.0.1[500] Sep 11 07:00:56 racoon: [remoteSite1]: INFO: IPsec-SA request for 10.0.0.1 queued due to no phase1 found. Sep 11 07:00:53 racoon: INFO: delete phase 2 handler. Sep 11 07:00:53 racoon: [remoteSite3]: [10.0.0.3] ERROR: phase2 negotiation failed due to time up waiting for phase1 [Remote Side not responding]. ESP 10.0.0.3[0]->11.0.0.2[0] Sep 11 07:00:49 racoon: [remoteSite1]: [10.0.0.1] INFO: request for establishing IPsec-SA was queued due to no phase1 found. Sep 11 07:00:32 racoon: INFO: delete phase 2 handler. Sep 11 07:00:32 racoon: [remoteSite1]: [10.0.0.1] ERROR: phase2 negotiation failed due to time up waiting for phase1 [Remote Side not responding]. ESP 10.0.0.1[0]->11.0.0.2[0] Sep 11 07:00:31 racoon: INFO: begin Aggressive mode. Sep 11 07:00:31 racoon: [remoteSite1]: INFO: initiate new phase 1 negotiation: 11.0.0.2[500]<=>10.0.0.1[500] Sep 11 07:00:31 racoon: [remoteSite1]: INFO: IPsec-SA request for 10.0.0.1 queued due to no phase1 found. Sep 11 07:00:30 racoon: INFO: delete phase 2 handler. Sep 11 07:00:30 racoon: [remoteSite1]: [10.0.0.1] ERROR: phase2 negotiation failed due to time up waiting for phase1 [Remote Side not responding]. ESP 10.0.0.1[0]->11.0.0.2[0] Sep 11 07:00:28 racoon: ERROR: phase1 negotiation failed due to time up. 2e596c0c084ceaeb:0000000000000000 Sep 11 07:00:25 racoon: [remoteSite3]: [10.0.0.3] INFO: request for establishing IPsec-SA was queued due to no phase1 found. Sep 11 07:00:22 racoon: [remoteSite3]: [10.0.0.3] INFO: request for establishing IPsec-SA was queued due to no phase1 found. Sep 11 07:00:16 racoon: ERROR: phase1 negotiation failed due to time up. 07679e04e2140a7f:0000000000000000 Sep 11 07:00:11 racoon: INFO: delete phase 2 handler. Sep 11 07:00:11 racoon: [remoteSite1]: [10.0.0.1] ERROR: phase2 negotiation failed due to time up waiting for phase1 [Remote Side not responding]. ESP 10.0.0.1[0]->11.0.0.2[0] Sep 11 07:00:10 racoon: INFO: delete phase 2 handler. Sep 11 07:00:10 racoon: [remoteSite3]: [10.0.0.3] ERROR: phase2 negotiation failed due to time up waiting for phase1 [Remote Side not responding]. ESP 10.0.0.3[0]->11.0.0.2[0] Sep 11 07:00:03 racoon: INFO: delete phase 2 handler. Sep 11 07:00:03 racoon: [remoteSite1]: [10.0.0.1] ERROR: phase2 negotiation failed due to time up waiting for phase1 [Remote Side not responding]. ESP 10.0.0.1[0]->11.0.0.2[0] Sep 11 07:00:00 racoon: [remoteSite1]: [10.0.0.1] INFO: request for establishing IPsec-SA was queued due to no phase1 found. Sep 11 06:59:58 racoon: [remoteSite1]: [10.0.0.1] INFO: request for establishing IPsec-SA was queued due to no phase1 found. Sep 11 06:59:57 racoon: INFO: delete phase 2 handler. Sep 11 06:59:57 racoon: [remoteSite1]: [10.0.0.1] ERROR: phase2 negotiation failed due to time up waiting for phase1 [Remote Side not responding]. ESP 10.0.0.1[0]->11.0.0.2[0] Sep 11 06:59:51 racoon: ERROR: phase1 negotiation failed due to time up. 16bce630b4c620cc:0000000000000000 Sep 11 06:59:40 racoon: [remoteSite1]: [10.0.0.1] INFO: request for establishing IPsec-SA was queued due to no phase1 found. Sep 11 06:59:38 racoon: INFO: begin Aggressive mode. Sep 11 06:59:38 racoon: [remoteSite3]: INFO: initiate new phase 1 negotiation: 11.0.0.2[500]<=>10.0.0.3[500]
-
All your systems are running pfs 2.1-BETA0 ?
The logfile doesn't tell much, post (sanitized) versions of
/var/etc/racoon.conf
/var/etc/spd.conf -
Thanks for looking at my issue.
I can't recall which are running beta0 but I know at least one is and likely a 2nd. None have had configuration changes or updates in weeks though and I'm as sure of that as I can be shy of wondering if pfSense is doing something new behind the scenes that it's never done before to update/change/reboot/etc.
Racoon.conf
# This file is automatically generated. Do not edit path pre_shared_key "/var/etc/psk.txt"; path certificate "/var/etc"; listen { adminsock "/var/db/racoon/racoon.sock" "root" "wheel" 0660; isakmp 11.0.0.2 [500]; isakmp_natt 11.0.0.2 [4500]; } mode_cfg { auth_source system; group_source system; } remote 10.0.0.1 { ph1id 1; exchange_mode aggressive; my_identifier address 11.0.0.2; peers_identifier address 10.0.0.1; ike_frag on; generate_policy = off; initial_contact = on; nat_traversal = on; dpd_delay = 10; dpd_maxfail = 5; support_proxy on; proposal_check claim; proposal { authentication_method pre_shared_key; encryption_algorithm blowfish 128; hash_algorithm sha1; dh_group 2; lifetime time 28800 secs; } } remote 10.0.0.2 { ph1id 5; exchange_mode aggressive; my_identifier address 11.0.0.2; peers_identifier address 10.0.0.2; ike_frag on; generate_policy = off; initial_contact = on; nat_traversal = on; dpd_delay = 10; dpd_maxfail = 5; support_proxy on; proposal_check claim; proposal { authentication_method pre_shared_key; encryption_algorithm blowfish 128; hash_algorithm sha1; dh_group 2; lifetime time 28800 secs; } } remote 10.0.0.3 { ph1id 4; exchange_mode aggressive; my_identifier address 11.0.0.2; peers_identifier address 10.0.0.3; ike_frag on; generate_policy = off; initial_contact = on; nat_traversal = on; dpd_delay = 10; dpd_maxfail = 5; support_proxy on; proposal_check claim; proposal { authentication_method pre_shared_key; encryption_algorithm blowfish 128; hash_algorithm sha1; dh_group 2; lifetime time 28800 secs; } } sainfo subnet 192.168.10.0/24 any subnet 192.168.1.0/24 any { remoteid 1; encryption_algorithm blowfish 128; authentication_algorithm hmac_sha1; pfs_group 1; lifetime time 3600 secs; compression_algorithm deflate; } sainfo subnet 192.168.10.0/24 any subnet 192.168.5.0/24 any { remoteid 5; encryption_algorithm blowfish 128; authentication_algorithm hmac_sha1; pfs_group 1; lifetime time 3600 secs; compression_algorithm deflate; } sainfo subnet 192.168.10.0/24 any subnet 192.168.6.0/24 any { remoteid 4; encryption_algorithm blowfish 128; authentication_algorithm hmac_sha1; pfs_group 1; lifetime time 3600 secs; compression_algorithm deflate; } sainfo subnet 192.168.11.0/24 any subnet 192.168.6.0/24 any { remoteid 4; encryption_algorithm blowfish 128; authentication_algorithm hmac_sha1; pfs_group 1; lifetime time 3600 secs; compression_algorithm deflate; } sainfo subnet 192.168.11.0/24 any subnet 192.168.5.0/24 any { remoteid 5; encryption_algorithm blowfish 128; authentication_algorithm hmac_sha1; pfs_group 1; lifetime time 3600 secs; compression_algorithm deflate; } sainfo subnet 192.168.11.0/24 any subnet 192.168.1.0/24 any { remoteid 1; encryption_algorithm blowfish 128; authentication_algorithm hmac_sha1; pfs_group 1; lifetime time 3600 secs; compression_algorithm deflate; }
spd.conf
spdadd -4 192.168.10.254/32 192.168.10.0/24 any -P out none; spdadd -4 192.168.10.0/24 192.168.10.254/32 any -P in none; spdadd -4 192.168.10.0/24 192.168.1.0/24 any -P out ipsec esp/tunnel/11.0.0.2-10.0.0.1/unique; spdadd -4 192.168.1.0/24 192.168.10.0/24 any -P in ipsec esp/tunnel/10.0.0.1-11.0.0.2/unique; spdadd -4 192.168.10.0/24 192.168.5.0/24 any -P out ipsec esp/tunnel/11.0.0.2-10.0.0.2/unique; spdadd -4 192.168.5.0/24 192.168.10.0/24 any -P in ipsec esp/tunnel/10.0.0.2-11.0.0.2/unique; spdadd -4 192.168.10.0/24 192.168.6.0/24 any -P out ipsec esp/tunnel/11.0.0.2-10.0.0.3/unique; spdadd -4 192.168.6.0/24 192.168.10.0/24 any -P in ipsec esp/tunnel/10.0.0.3-11.0.0.2/unique; spdadd -4 192.168.11.0/24 192.168.6.0/24 any -P out ipsec esp/tunnel/11.0.0.2-10.0.0.3/unique; spdadd -4 192.168.6.0/24 192.168.11.0/24 any -P in ipsec esp/tunnel/10.0.0.3-11.0.0.2/unique; spdadd -4 192.168.11.0/24 192.168.5.0/24 any -P out ipsec esp/tunnel/11.0.0.2-10.0.0.2/unique; spdadd -4 192.168.5.0/24 192.168.11.0/24 any -P in ipsec esp/tunnel/10.0.0.2-11.0.0.2/unique; spdadd -4 192.168.11.0/24 192.168.1.0/24 any -P out ipsec esp/tunnel/11.0.0.2-10.0.0.1/unique; spdadd -4 192.168.1.0/24 192.168.11.0/24 any -P in ipsec esp/tunnel/10.0.0.1-11.0.0.2/unique;
-
The config files seem OK.
Since 3 of the 4 sites use dynamic IPs, I'd start with verifying that the branch routers correctly update their DynDNS record and that at the main site pfsense's filterdns daemon picks up and updates the IPs in the respective conf files.
-
I'd also do a packet capture on WAN filtering on port 500 and make sure you have bidirectional communication between the sites on the ISAKMP (make sure everything one site is sending is received by the remote and vice versa). Modems or other things in line between the firewalls and the Internet can break that connectivity, and at times you'll lose the ability to communicate between site A and site B on the Internet in general even though the Internet at both sites otherwise works perfectly fine (that happens far more than I would have believed a few years back before our commercial support took off).