Strongswan on 2.2.3 ignoring rightid and setting it to %any
After upgrading from 2.2.2 to 2.2.3 I have 2 of 4 IPSec tunnels (site-to-site)
down. I know about the aesni bug (mine is off), but this seems to be something else.
These 2 that are down have one thing in common: the right id is not the peer
address. One is a fqdn, the other is some other IP address (because of nat). I
see the rightid is correctly set in /var/etc/ipsec/ipsec.conf, but ipsec status shows %any in its place.
That %any … should not be there. My 2.2.2 installs with tunnels for the same
remote peer show the fqdn instead of %any.
The connection then fails pretty much as if my PSK were wrong:
invalid NOTIFY_V1 payload length, decryption failed?
could not decrypt payloads
message parsing failed
I guess it isn't locating the right PSK because of the %any in there. Tried to
set %any in ipsec.secrets just to see if it would change anything. It didn't,
but I'm not sure it could work that way anyway.
And more … it seems this does not always happen. I still have to check, but
it seems sometimes an ipsec restart solves this. These VPNs were up for a while
after the upgrade and some ipsec restarting. This is a production firewall and
I can't restart ipsec at will right now. Will do that later today.
Is anybody else seeing this too?
Here is the generated config for one of the troubled VPNs, if anybody asks
(with mangled IPs and names):
uniqueids = yes
charondebug="dmn 1,ike 2,chd 2,enc 2"
fragmentation = yes
keyexchange = ikev1
reauth = no
forceencaps = no
mobike = no
rekey = yes
installpolicy = yes
type = transport
dpdaction = restart
dpddelay = 10s
dpdtimeout = 60s
auto = route
left = 22.214.171.124
right = 126.96.36.199
leftid = 188.8.131.52
ikelifetime = 86400s
lifetime = 86400s
ike = aes256-sha1-modp1536!
esp = aes256-sha1-modp1536!
leftauth = psk
rightauth = psk
rightid = fqdn:peer.router.com
aggressive = no
rightsubnet = 184.108.40.206
leftsubnet = 220.127.116.11
And from ipsec.secrets:
18.104.22.168 @peer.router.com : PSK LongWhateverEncodedString==
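For anyone wanting to check which of their own tunnels fall into the same category (rightid different from the peer address), here is a minimal Python sketch. It assumes the usual ipsec.conf layout of `conn <name>` sections followed by `key = value` lines; the function name and the connection names in the usage note are made up for illustration, and it only compares the two strings literally.

```python
def conns_with_custom_rightid(conf_text):
    """Return names of conn sections whose rightid differs from right."""
    conns = {}
    current = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("conn "):
            current = line.split(None, 1)[1]
            conns[current] = {}
        elif line.startswith("config "):
            current = None  # "config setup" is not a connection
        elif current is not None and "=" in line:
            key, value = line.split("=", 1)
            conns[current][key.strip()] = value.strip()

    affected = []
    for name, opts in conns.items():
        rightid = opts.get("rightid")
        # Literal comparison: "fqdn:peer.router.com" never equals an address.
        if rightid and rightid != opts.get("right"):
            affected.append(name)
    return affected
```

Feeding it the contents of /var/etc/ipsec/ipsec.conf should list the tunnels that have a fqdn or a NAT-related address as the remote identifier, which are the ones that went down here.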
The change that impacts that is likely in ipsec.secrets. Pre-2.2.3, you would have had:
%any @peer.router.com : PSK LongWhateverEncodedString==
rather than the leading 22.214.171.124. If you edit ipsec.secrets to make it match that, then run 'ipsec stop && ipsec start', does it work as before?
I'm going through every possible combination of identifiers now to verify they all work as they should. We fixed some things there in 2.2.3, but broke others at the same time. Making sure we have a test suite that covers all the many possible combinations of options rather than just a bunch of them.
Well, late last night I was about to tell you it didn't make any difference, but then... I changed ALL entries in ipsec.secrets to match that. Then it worked :D
Some more detail:
When I got back to the machine some hours after my last tries, one of my troubled tunnels was online again. Apparently it came back by itself. I checked that the remote FQDN was reported on the tunnel status line, instead of %any, as I expected.
Then I did an ipsec stop && ipsec start. That got me back to where I was in my previous post.
I tried your suggestion, changing only the ipsec.secrets line that referred to this tunnel. Stop/start. No difference.
Then I did a lot of restarting, mostly with ipsec restart. I wanted to see if it would eventually work on some restart. It never did.
Changed all ipsec.secrets lines replacing my identifier with %any. ipsec restart. All 4 tunnels worked!
Double checks: went back to only one line with %any, got the problem back. Next, tried changing just the two lines for my two troubled tunnels. Still the exact same problematic behavior.
So for me it only works with all ipsec.secrets lines changed back to the old 2.2.2 style. Can you or anybody else reproduce this?
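The manual edit described above (replacing the leading local identifier with %any on every PSK line) could be scripted roughly like this. It is only a sketch: it assumes each PSK line has exactly two selectors with the local one first, as in the ipsec.secrets line quoted in this thread, and the function name is hypothetical.

```python
import re

# "<selectors> : PSK <secret>"; lines starting with '#' do not match.
PSK_LINE = re.compile(r"^(?P<selectors>[^#]*?)\s+:\s+PSK\s+(?P<secret>.+)$")


def use_any_as_local_selector(secrets_text):
    """Rewrite two-selector PSK lines so the first selector is %any."""
    out = []
    for line in secrets_text.splitlines():
        m = PSK_LINE.match(line)
        if m:
            selectors = m.group("selectors").split()
            # Assumption: first selector is the local ID, second the peer ID.
            if len(selectors) == 2 and selectors[0] != "%any":
                selectors[0] = "%any"
                line = f"{' '.join(selectors)} : PSK {m.group('secret')}"
        out.append(line)
    return "\n".join(out) + "\n"
```

Lines that already start with %any are left alone, so running it twice gives the same result. After writing the output back to ipsec.secrets you would still need to restart ipsec, as described in the posts above.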
cmb, thank you very much for your reply. If you need any further testing or assistance from here, just let me know.
Could you confirm that works now on the latest 2.2.4 snapshot from https://snapshots.pfsense.org? Some circumstances may regress, but I believe this one will be fixed (back to 2.2.2 behavior for this case).
Ok, I will test that on a separate instance and get back to you.
Meanwhile, is there any way I can stop pfSense from reverting my manual changes in the ipsec.secrets file? It seems to be doing that automatically from time to time.
Anything that updates the IPsec config will overwrite ipsec.secrets. Hacking the source in /etc/inc/vpn.inc is the only way to make it not do that.
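Since the file gets regenerated whenever the IPsec config is updated, one low-tech option is to at least detect when that has happened. A small Python sketch, assuming the same "selectors : PSK secret" line layout seen in this thread (the function name is made up):

```python
def lines_not_using_any(secrets_text):
    """Return PSK lines whose first selector is something other than %any."""
    flagged = []
    for line in secrets_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # ignore comments
        selectors, sep, rest = line.partition(" : ")
        if not sep or not rest.lstrip().startswith("PSK"):
            continue  # not a PSK entry
        parts = selectors.split()
        if parts and parts[0] != "%any":
            flagged.append(line)
    return flagged
```

A non-empty result on the current ipsec.secrets would suggest the manual %any edits have been overwritten and need to be reapplied.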
Quick update just to let you know I'm not gone. Sorry for the delay with this test.
I am not yet able to reproduce the problem on a 2.2.3 I have for testing, so there is no point in testing the 2.2.4 snapshot yet. There are several differences from my real setup, and I will try to reduce them on Monday to see if I can reproduce this.
Ok cmb, I was finally able to reproduce the problem on my 2.2.3 test environment. Tried latest 2.2.4 snapshot (20150715-1754).
I confirm it does solve my problem. I see that it uses %any as left identifier on ipsec.secrets. All tunnels up with it.
Some notes on reproducing the problem:
Last Monday I reworked my test environment to look a bit more like the real case. I set up two pfSense 2.2.2 VMs, A and B, with 2 WAN links each. All WAN networks are different and are routed between the two by a third pfSense VM, P. I had 2 IPsec tunnels on each, WAN1A-WAN1B and WAN2A-WAN2B. One of the tunnels used a distinguished name identifier on side B. With all that working, I updated B to 2.2.3, but everything still worked fine.
Today I added a third tunnel to B, against a Cisco router (same as my real case). Then the problem showed up. It seems to depend on the interface I attach the tunnel to: one works, the other doesn't. It's the same thing I described in my original post, and the workaround of changing the ipsec.secrets file manually also worked.