2.2.1 Embedded to Barracuda Rekey Issue


  • I appear to have the same problem that others have reported: tunnels that go down after some time and must be manually restarted.  Unfortunately I upgraded 3 of my 9 pfSense firewalls to 2.2.1 before I noticed the problem.  I have applied the old SA fix and even rebooted all three 2.2.1 firewalls, with no luck.

    When I came in this morning, my connection to NY (main office) was down, and clicking restart on the tunnel fixed it, as it always has.  Here is the current status of the firewall in my office: http://take.ms/O96e5

    After doing some research, I don't believe I can downgrade to 2.1.X because I don't have a backup of the config from that version.

    What other information can I provide to troubleshoot this issue?  My NY endpoint is 64.190.10.2.  I searched yesterday's logs for 'racoon' (I can expand the search if that is too narrow), and the results are here: http://pastebin.com/mH4TQa7v

    Edit: I just noticed those log files were from January.  Apparently when I upgraded the remote server to Ubuntu 14.04, I failed to notice the change to syslog-ng, and it hasn't been loading my logging configuration.  I have fixed that and verified that logging is working properly for the other 2 firewalls running 2.2.1.  If/when one of them loses its connection to NY, I'll try to capture some new logs and post them.

    Edit 2: My firewall lost connection, and I caught it quickly enough that I was able to extract a small set of logs.  Again, the tunnel went down, showed as down in the pfSense UI and simply clicking the start tunnel button restored it.  Logs are here: http://pastebin.com/38XBmSRC
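
    For anyone pulling a similar small excerpt around an outage, here is a minimal Python sketch. It assumes the remote syslog-ng server writes lines in the classic BSD syslog format (`Mon DD HH:MM:SS host program: message`), and the function name `extract_window` is hypothetical. On 2.2.x the IKE daemon is strongSwan's `charon` rather than `racoon`, so filter on whichever program name actually appears in your logs.

```python
import re
from datetime import datetime, timedelta

# Assumed classic BSD syslog prefix, e.g.
#   "Apr 16 10:08:01 fw1 charon: 09[IKE] ..."
# Adjust the pattern if your syslog-ng template writes a different format.
SYSLOG_RE = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<prog>[^:\[\s]+)"
)


def extract_window(lines, program, center, minutes=10):
    """Yield log lines from `program` (e.g. "racoon" or "charon") whose
    timestamp lies within +/- `minutes` of `center`.

    BSD syslog timestamps carry no year, so the year of `center` is assumed.
    """
    lo = center - timedelta(minutes=minutes)
    hi = center + timedelta(minutes=minutes)
    for line in lines:
        m = SYSLOG_RE.match(line)
        if not m or m.group("prog") != program:
            continue  # not syslog-shaped, or from another daemon
        ts = datetime.strptime(f"{center.year} {m.group('ts')}", "%Y %b %d %H:%M:%S")
        if lo <= ts <= hi:
            yield line.rstrip("\n")
```

    For example, `extract_window(open(logfile), "charon", outage_time, minutes=10)` yields only the IPsec lines from ten minutes either side of the outage, where `logfile` and `outage_time` are placeholders for your own log path and the time the tunnel dropped.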

    Thank you in advance
    Tony Nelson

    Edit:  I forgot to mention that we use RADIUS authentication from our wireless access points to our NY server, across the VPN tunnel.  That also seemed to stop working after the upgrade to 2.2.1.  Unfortunately I don't have much data that would show the cause.  I did some packet captures on the RADIUS server but didn't save them.  At the time I blamed it on the access point, but I have since replaced the access point and have the exact same problem.


  • I hate to bump this, but having to restart these tunnels several times a day is wearing me down.

    Anyone have any suggestion on how I get these tunnels to rekey properly?

    If there is no other resolution, has anyone tried downgrading a 2.2.1 config to 2.1.5?  I have similar configs, so I could take a backup from another firewall and then change the IP addresses, etc., if needed.

  • Banned

    @hhubris:

    Anyone have any suggestion on how I get these tunnels to rekey properly?

    Yeah. OpenVPN rekeys just fine; I've switched everything over since the 2.2 release. The BSD port of strongswan is a heap of shit, to put it mildly. I've been pretty patient, but this POS IPsec implementation will take years to fix.


  • Do you have any sense if this is an issue w/ strongswan?  Or possibly just a configuration issue?  pfSense running 2.1.5 has been rock solid for years.

    That said, I will investigate switching to OpenVPN.

    Thanks for the suggestion.

  • Banned

    @hhubris:

    Do you have any sense if this is an issue w/ strongswan?

    The Linux strongswan implementation has barely 1/10 of the issues encountered on FreeBSD. Hence, I'm saying the BSD port is a heap of shit.


  • I spoke with Barracuda, and they do not support SSL VPN for peer-to-peer connections.

    Am I best off hoping that 2.2.2 will fix this problem?  If so, any idea how far away that is?

    Or should I start trying to figure out what I will have to do to drop back to 2.1.5?

    Thanks again,
    Tony

  • Banned

    No fixes for this in 2.2.2, and in general, if you expect any outcome before 2.3.x, good luck. :P You'd better explore IPsec alternatives; strongswan is not production-ready on BSD, sorry.


  • It worked well enough in 2.1.5.  I'll figure out how to roll back, and be very careful about trying any new 2.2 versions going forward.

  • Banned

    @hhubris:

    It worked well enough in 2.1.5.

    There was no strongswan on 2.1.x; completely different implementation.


  • Right, I see they changed from racoon to strongswan.  In retrospect, maybe that was an error.

    I just read the 2.2.2 release notes, and they hit the nail on the head.  The tunnel I'm having issues with has 3 phase 2 entries.  I wonder if I could solve the problem by just creating 3 separate connections.

    At any rate, I'm going to upgrade one of my firewalls to 2.2.2 and see if it makes any difference.  If it makes it worse, I'll come in early and rebuild from scratch on 2.1.5.  My config isn't that bad.

    FWIW, I'm happy to supply any logs, or anything else that might be useful.

    Thanks


  • I noticed that in 2.2.2 strongSwan was upgraded to 5.3.0; hopefully that version fixes a couple of bugs. I've already upgraded a couple of my remote ends that were on 2.2.1, and the tunnels came back up, so I don't think it's going to be any worse than 2.2.1.


  • @doktornotor:

    The BSD port of strongswan is a heap of shit, to put it mildly. I've been pretty patient, but this POS IPsec implementation will take years to fix.

    It's been bumpier than I would have liked for sure, but that's an extreme misrepresentation of the status. All the most common uses are very stable in 2.2.2, and almost all have been since 2.2.0, apart from the bumps where racoon was doing something wrong and people needed to fix configs that never should have worked to begin with. There were some hiccups in 2.2.1 from the prefer old SAs option.

    I'm not aware of any general issues with IPsec in 2.2.2. There are a few edge cases. I'm working on a problem with IKEv2 and multiple subnets to a Cisco ASA, where the ASA actually seems to be the one behaving incorrectly, but I'm still investigating.

    Some edge case issues will take some time with 2.2.2 before we'll know. This specific case is one of them.

    @hhubris:

    FWIW, I'm happy to supply any logs, or anything else that might be useful.

    If it's still an issue with 2.2.2, let me know. I'll take it on myself as a free support incident.

  • Banned

    The only result I am able to achieve absolutely reliably with BSD + strongswan is screenshotted here and in many other threads. The SA entries keep accumulating and no traffic passes. I never had "prefer old SA" enabled anywhere. I seriously don't have time to babysit tens of tunnels and keep restarting them (even that does not work; you need to stop and start instead) whenever people complain that it yet again does not work. This is simply NOT usable in production!!!

    The site-to-site OpenVPN tunnels last for days and days without a hiccup, and when the connection occasionally drops, they fix themselves instantly once the link is up again. This was never an issue with racoon, and while I definitely agree that it is lacking in the features department, what are the fancy features good for when it's just totally unstable?

    It's your business, guys. Take a vanilla BSD kernel and test this. Either your kernel patches are killing this, or the entire BSD port is just garbage. It's your business to debug. I simply cannot take a vanilla kernel and plop it on a pfSense box to test whether the strongswan port is shit, the BSD kernel is shit, or your kernel patches are shit. Plus, I don't have time for things like that anyway. Even the logs are an absolute nightmare to work with; I have never ever seen a similar amount of insane, useless noise from any other daemon with the log level set as quiet as possible. All I can say is that it certainly does not behave this way on Linux.

    :( >:( >:(


  • @cmb:

    Some edge case issues will take some time with 2.2.2 before we'll know.

    One positive report thus far, though I'd really like to know what changed in that case, as OS X and iOS have always worked fine for me.
    https://forum.pfsense.org/index.php?topic=92417.0

    @doktornotor:

    This is simply NOT usable in production!!!

    I'd love to have a specific circumstance where that's replicable on 2.2.2. You can't take "doesn't work for me, in some unusual edge case, where it works for many thousands of other systems" and say it's not usable in production. It's not usable in your specific case for some reason, definitely. If it were easy to replicate that, we'd have had it fixed already.

  • Banned

    Dunno, for me, the "specific circumstances" here are:

    • set up two pfSense boxes
    • set up an IPsec tunnel with one or more P2s in it
    • watch it stop passing traffic sooner or later. Everything looks cool, the GUI tells you it's all green and up, there's nothing unusual in the logs beyond the insane amount of noise, and the traffic happily goes into a blackhole.

    All sites have static public IPs. One is WiMax, one is VDSL, one is cable... another, I cannot recall at the moment. It simply happens everywhere. Plain IPv4 tunnel, mutual RSA, IKEv1 or IKEv2, it does not matter. Usually AES 128-bit, SHA1. That's all. No NAT/BINAT in between, nothing fancy. If you cannot make this work with pfSense on both ends, I don't know how you can use it in production.

    I can give it one final try on 2.2.2; after that, the entire IPsec thing goes out the door forever. Waste of time.


  • It's too early to pop champagne corks, but the office that I upgraded yesterday has been stable since then.  I have upgraded the other 2 offices to 2.2.2, and so far so good.

    If this continues through tomorrow, I'll be able to enjoy my weekend much more.

    A huge thank you to all of the folks who worked on 2.2.2 to fix this problem.

    Since the prefer old sa "fix" didn't seem to help me, should I remove it?  Or just leave well enough alone since things seem to be working?

    Thanks again
    Tony


  • @hhubris:

    Since the prefer old sa "fix" didn't seem to help me, should I remove it?  Or just leave well enough alone since things seem to be working?

    You can leave it be. If you remove it, it'll just fall back to the now built-in tunable, which sets the same value, so the result is the same either way.


  • Unfortunately I'm still having drops.  Just had another one with 2.2.2.

    Is there any info from the log files I can provide to help?  I have the output from the firewall that dropped the connection; 13 minutes of logs is 3,600 lines, so not bad.  Nagios notified me at 10:08 that the connection went down, and I restarted the tunnel as quickly as I could.
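
    Because a tunnel can show as up in the GUI while its traffic is being blackholed (as reported earlier in this thread), a check that sends real traffic through the tunnel is a more reliable trigger than SA status. Below is a minimal sketch of such a Nagios-style check in Python. It assumes some TCP service on the far side is reachable only through the tunnel (which host and port to probe is up to you), and the function names are hypothetical. It follows the standard Nagios plugin convention of exit code 0 for OK, 2 for CRITICAL, and 3 for UNKNOWN.

```python
import socket

# Standard Nagios plugin exit codes.
OK, CRITICAL, UNKNOWN = 0, 2, 3


def check_tunnel(host, port, timeout=5.0):
    """Try a TCP connection to a host that should only be reachable through
    the IPsec tunnel. Returns (exit_code, message).

    A successful connect means traffic really passes, whatever the SA
    status page says.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return OK, f"OK - {host}:{port} reachable through tunnel"
    except OSError as exc:  # refused, unreachable, or timed out
        return CRITICAL, f"CRITICAL - {host}:{port} unreachable ({exc})"


def main(argv):
    """Entry point for a Nagios command: argv is [HOST, PORT]."""
    if len(argv) != 2:
        print("usage: check_tunnel HOST PORT")
        return UNKNOWN
    code, msg = check_tunnel(argv[0], int(argv[1]))
    print(msg)
    return code
```

    To run it as a Nagios command, call `sys.exit(main(sys.argv[1:]))` from a small wrapper script so the exit code reaches Nagios.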

    2.2.2 does seem better than 2.2.1, but 2.1.5 is still far better than either.

    Thanks
    Tony


  • Logs would be useful.


  • @doktornotor:

    I can give it one final try on 2.2.2; after that, the entire IPsec thing goes out the door forever. Waste of time.

    Same problem in 2.2.2. You do not need to test it... :-[
    Congratulations, you are in the fortunate position of being able to switch to OpenVPN.
    I am not able to switch to SSL VPN, because I have a lot of third-party firewalls on the other side (ASA, Sophos, Juniper, ...).
    Unstable VPN tunnels: an awkward situation for me.

    I am really disappointed.