1.2.3-RC1 upgrade to 1.2.3-RC3 IPSEC issues



  • I upgraded from 1.2.3-RC1 to 1.2.3-RC3 last night and everything looked like it was working OK, but at about 9 AM today pfSense stopped sending traffic over the VPN tunnels. They showed as up on both ends, but no data was passing. I rebooted, still nothing. I made a change to each tunnel, then changed it back, and still nothing. I played with disable/enable on each tunnel with no luck. Then I thought it might be something with snort, so I uninstalled it and rebooted. I still couldn't get traffic to go over any of the tunnels. Then I remembered that the new release had some IPSec changes, so I tried deleting each tunnel and recreating it on the pfSense side, and everything started working.

    I'm not sure if there was a better way to resolve this, but this seems to be a bug in RC3, and I haven't seen anyone else post about it yet, so I figured I would.



  • Are you using CARP addresses on the tunnels? When I upgraded, mine were changed and I had to reset the CARP addresses that the tunnels terminate on. Since then I've not had any issues with tunnels to other pfSense or SonicWALL boxes.

    Andy



  • Yes, I am. I also noticed HA didn't seem to be working but haven't been able to take the network down to test it. I was going to look into this more over the weekend.

    What do you mean by resetting the CARP addresses? Do you mean delete and recreate them?

    NAT seems to be working fine over the CARP IP's, even when IPSec was having issues.



  • My VIPs on one set were changed, i.e. I had one set to .123 and it had become .132. On the other it was correct; I just changed it, applied it, and changed it back. At least on my box that fixed it.

    Andy



  • This issue has now come back 4 times since the upgrade. Each time I've messed with the VIPs, rebooted, and messed with the tunnels until they finally start sending traffic again. The tunnels all show as up, but they are not passing traffic. I don't get it. Does anyone have any idea?



  • You might want to check out this thread.
    http://forum.pfsense.org/index.php/topic,16274.0.html
    If all of your tunnels are down, I would reset the states, and then restart racoon.

    If some of your tunnels are still up and you don't want to kill them, you could disable the tunnels that are down and then delete the SAD entries for those tunnels. Enable the tunnels again one at a time, and they should come up. It can take a couple of minutes and you may need to send some traffic like a ping. If that doesn't work you may need to go with the first option to restart racoon.

    As far as an answer to why the tunnels are breaking and not coming back up on their own, there is no standard solution.
    Some people have had success with changing the DPD setting.
    Some users have to generate traffic from the far end of a tunnel (non-pfsense equipment) to get the tunnel back up.
    Changing the lifetimes for the phase 1 and phase 2 portion of the tunnels so that the values are not the same has helped other users. For instance, do not set the lifetime on phase 1 to 28800 and the lifetime on phase 2 to 28800.
    Another option is to disable PFS on phase 2. This reduces security, but may help.
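
    The suggestions above could be turned into a quick checklist for each tunnel. This is only a rough sketch: the `Tunnel` class and its field names are made up for illustration and are not a pfSense or racoon data structure.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Tunnel:
    """Hypothetical summary of one tunnel's settings (not a pfSense structure)."""
    name: str
    phase1_lifetime: int          # seconds
    phase2_lifetime: int          # seconds
    dpd_interval: Optional[int]   # seconds; None means the DPD field is blank
    pfs_enabled: bool


def review(tunnel: Tunnel) -> List[str]:
    """Return which of the suggestions listed above apply to this tunnel."""
    notes = []
    if tunnel.phase1_lifetime == tunnel.phase2_lifetime:
        notes.append("phase 1 and phase 2 lifetimes are identical; "
                     "using different values has helped some users")
    if tunnel.dpd_interval is None:
        notes.append("DPD is blank; changing the DPD setting has helped some users")
    if tunnel.pfs_enabled:
        notes.append("PFS is enabled on phase 2; disabling it may help "
                     "but reduces security")
    return notes


if __name__ == "__main__":
    # Example values only: equal lifetimes, DPD blank, PFS disabled.
    example = Tunnel("site-a", 28800, 28800, None, False)
    for note in review(example):
        print(f"{example.name}: {note}")
```

    None of these checks is a guaranteed fix; they only flag the settings that other users reported changing.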

    If you provide more info on your configuration, someone else may be able to provide other possible solutions.



  • Thanks, I'll check the thread out. I've been running 1.2.3-RC1 for about 2 months now with no issues at all. I upgraded to RC3 and now I'm having IPSec issues. Restarting racoon doesn't seem to help, but if I change the interface and put it back, everything is fine for a little while. What additional information should I post that might help?

    All tunnels are static site-to-site tunnels, mostly to SonicWALLs, using main mode. The DPD setting was not set, but I'm playing with that now. PFS has been disabled, and the phase 1 and phase 2 lifetimes are both 28800. The only error I can find in the logs is 'failed to get sainfo' for one of the tunnels, but I think that's just a peer network setting on the remote side. Keep Alive is enabled on both ends. Not sure if this makes a difference, but before the upgrade I'd typically see 1-3% CPU usage, and since the upgrade I've been seeing it bounce between 2% and 10% consistently. After doing some digging, I found this error: racoon: INFO: unsupported PF_KEY message REGISTER.

    If there is anything else I can provide that might help, please let me know.

    -J



  • I resolved the 'failed to get sainfo' error and messed with the DPD settings, and then just had another tunnel go down. The only error message I see is racoon: INFO: unsupported PF_KEY message REGISTER. I tried searching for it, but didn't find much.

    Next I'm trying to mess with deleting the PAD/SAD entries and checked the box to prefer old IPsec SA's. Hopefully, this will help.

    Any other ideas?



  • I haven't been able to make sense of the error messages in IPSec. I get some of these messages when my tunnels are working fine and no errors when they go down. :)
    I believe that the "unsupported PF_KEY message REGISTER" message can be ignored.
    I'm getting tempted to try RC1. I don't really have time to monitor my tunnels every day.



  • What are the devices on the other end of the tunnels that have problems?



  • Well, it's been two days now and the tunnels are staying up. I'm seeing phase 2 renegotiate without any connectivity issues on all of my tunnels. Before, they would go down every time phase 2 expired. I am still going to give it some more time before I consider this stable. In the meantime, I'm checking things every few hours or so. All IPSec tunnels are to SonicWALLs with standard and enhanced OS, plus one Watchguard. (I hate Watchguard.)

    What seemed to stabilize things was changing the DPD setting to 30 seconds instead of leaving it blank, and enabling 'Prefer old IPSec SA's' under System>Advanced, although I think just enabling DPD resolved it. While on a Watchguard x500 (I think that's what it was) today, I noticed that DPD was enabled by default on the IPSec tunnels, at 20 seconds. The x500 I was in isn't connected to my pfSense devices, but I thought it was interesting that it was enabled. I had never noticed before.

    The rest of my tunnel settings are as follows:
    Interface: CARP IP
    DPD: 30
    Local subnet: LocalLAN
    Remote subnet: RemoteLAN
    Remote gateway: Remote WAN IP

    Phase 1
    Negotiation Mode: Main
    IKE ID: My CARP IP
    3DES/SHA1/DH2/28800

    Phase 2
    ESP/3DES/SHA1/28800
    Enable Keep alive
    Disable PFS

    I have found that these settings will work with most firewalls. I manage several SonicWALL, Cisco, Fortinet and Watchguard devices, and these settings always work for me when setting up an IPSec tunnel. I usually just leave DPD at the default on whichever firewall I'm using, so I find it odd that I now seem to need it enabled on pfSense. To me, if phase 2 is not starting without it, there is an issue with racoon somewhere.



  • It sounds like what you're seeing is just a change in behavior with the newer racoon, where we now have DPD. Where the other end is using it, pfSense has to have a value there.



  • cmb, interesting…  if that's the case I could see this being an issue for a lot of people. Would it be a good idea to have the package modified to have it enabled by default since it seems like that's the standard?

    I just checked on both SonicWALL enhanced and standard, and DPD is enabled (under Firewall>Advanced, if anyone needs it). Odd that RC1 had the option for it but did not need it. I noticed that SonicWALLs use a DPD setting of 60 seconds. Does it matter that it's different from what I have in pfSense? If it's enabled on one side, does it have to be enabled on both?



  • Ah, no, you do not want DPD toggled on by default.

    I have 2 tunnels to devices that will repeatedly drop if I toggle DPD on for them.

    So no value in that field means that DPD is disabled and racoon will not negotiate support for it. The other endpoint will see this and correctly refrain from sending DPD messages.
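
    As a toy illustration of that rule, here is a minimal sketch. It only models the behavior described in this post, not racoon's actual code, and the function and parameter names are hypothetical.

```python
from typing import Optional


def dpd_in_use(local_interval: Optional[int], peer_supports_dpd: bool) -> bool:
    """Model of the rule above: DPD runs only if both ends negotiate it.

    local_interval: the local DPD field in seconds, or None when left blank.
    peer_supports_dpd: whether the remote endpoint offers DPD.
    """
    # A blank field means the local side does not negotiate DPD at all.
    local_negotiates = local_interval is not None
    return local_negotiates and peer_supports_dpd


if __name__ == "__main__":
    print(dpd_in_use(None, True))   # blank locally: peer should not send DPD
    print(dpd_in_use(30, True))     # both ends negotiate DPD
    print(dpd_in_use(30, False))    # peer does not offer DPD
```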



  • The only reason I suggested adding a default for the DPD setting is that most firewalls seem to have it enabled by default.

