Is it a bug? Too many IPsec child SA entries, old ones not deleted



  • Hi. I have an IPsec site-to-site VPN between our head office and a remote office. The configuration is the same on both sides. When I click "Show child SA entries" I see that new entries are being created but the old ones are not deleted. Is this normal, or is it a bug?

    Phase 1 details:
    Lifetime (Seconds) - 28800
    Disable rekey - checked
    Responder Only - unchecked
    NAT Traversal - Auto
    Dead Peer Detection - checked
    Delay - 10
    Max failures - 5

    Phase 2 details:
    Lifetime - 3600



  • Netgate Administrator

    Looks like this: https://redmine.pfsense.org/issues/8364
    Which should be fixed in 2.4.3p1. What version are you running?

    Steve



  • I am running pfSense version 2.4.3-p1. Is there any way to solve this problem before the next pfSense release?


  • Netgate Administrator

    You can try a 2.4.4 snapshot. However, we just moved to PHP 7, and whilst we fixed a lot of issues internally before that, there will inevitably be more to find. If you can wait a week or so for the snapshots to stabilise, it would be worth doing that.
    If you're still seeing that issue in 2.4.4 that bug may need to be re-opened or a new one created.

    Steve


  • Netgate

    Is it impacting traffic flow or is it just a status display issue?



  • I don't have any issue with traffic flow; it is only a status display issue. I have another favour to ask of you. What does the "Disable rekey" option in IPsec do? I thought it meant the IPsec connection should stop when the lifetime expires. Is that right? It is checked by default, but when the lifetime expires the connection doesn't stop; it starts again. If I uncheck this option, I get a "Margintime" field, which sets how long before connection expiry or keying-channel expiry the attempt to negotiate a replacement should begin. Could you please clarify this for me?


  • Netgate

    It is not checked by default for new tunnels in the latest version 2.4.3_1.

    https://redmine.pfsense.org/issues/8540

    I would advise unchecking that and using reasonable lifetimes for the tunnels, like 86400 for the P1 and 28800 for the P2s. Coordinate with the other side and use the same values they do.



  • Disable rekey is checked by default when creating a new IPsec phase 1.


    In my IPsec configuration, the phase 1 lifetime is 86400 and the phase 2 lifetime is 28800.
    Our remote IPsec peer is a Cisco ASA. The configuration is the same on both sides, but our IPsec connections last only 30 minutes (1800 seconds). I unchecked Disable rekey and entered 300 for Margintime, but it didn't work.


  • Netgate

    Ah yeah that's fixed in 2.4.4, not 2.4.3_1.

    Maybe the ASA side has shorter lifetimes programmed?

    Like I said, this needs to be coordinated with both sides. You shouldn't need to mess around with the margin time.

    The IPsec logs tell you what is going on.



  • Lifetimes are the same on both sides. The connection lasts for 30 minutes and doesn't reconnect until I click Connect. But could you please let me know exactly what Disable rekey does and when I might need it? And what is the recommended Margintime?


  • Netgate

    That checkbox disables the rekey entirely and can result in broken connections when one side has it and the other doesn't. I know I have seen it break connections to an ASA. I have only seen it break connections not fix them. It should probably never have been the default on new connections and that's why it has been changed.

    Again, the IPsec logs are where you need to be looking. They will tell you what is happening there.

    The recommended margintime is to leave it alone and use the default.

    Everything you need to know and probably then some about what is happening there: https://wiki.strongswan.org/projects/strongswan/wiki/expiryrekey



  • I'm having what seems to be the same problem. I've found others experiencing it as well. Here

    Unfortunately, this problem cannot be triggered on demand. I just have to wait for it to happen. When it happens to me, I lose connectivity until I delete the old P2s.

    I've read the "Expiry and Replacement of IKE and IPsec SAs" article at least 10 times looking for something I'm missing.



  • One thing I did notice is that when the P1 reauthenticates, the new P1 is created and installed, but the old P1 is the one that adopts the existing P2's. Then the old P1 gets destroyed. Seemed odd to me, unless the description in the log is misleading.

    Forgive my misuse of terms, I'm no IPSec guru.


  • Netgate

    If you lose connectivity something else is going on.

    If you can pinpoint the time of connectivity loss and post the IPsec logs for that connection surrounding that, something might stand out.



  • I'm experiencing the same issue: multiple P2s, which cause the tunnel to get stuck because the old P2s have not been removed. I have to manually disconnect the stuck P2s, and then traffic starts to flow through the tunnel again.

    https://forum.netgate.com/topic/132900/ipsec-phase-2-duplicate-causes-vpn-tunnel-to-get-stuck



  • Derelict, what is the default value of the Margintime field if it is left blank with IKEv1?


  • Netgate

    Again, the IPsec logs show you exactly what is going on. If there are multiple P2 tunnels and the "wrong one" is selected, it is probably because the other side has deleted the one you are trying to use, which should be the newest one created.

    Do any of you seeing this have the Disable rekey checkbox checked on the P1 in question?

    IKEv1 or IKEv2?



  • @derelict Disable rekey is unchecked, IKEv1.

    Mine does affect traffic: I cannot ping through the VPN to the main site. Which commands can I run, and which logs should I look at, so I can post them here?

    https://forum.netgate.com/topic/132900/ipsec-phase-2-duplicate-causes-vpn-tunnel-to-get-stuck


  • Netgate

    Do you have more than one IPsec tunnel? If only one, Status > System Logs, IPsec.

    In this case you will want to look at the logs surrounding the time the traffic stopped flowing.

    If you can't decipher them, you'll want to sanitize them if you're sensitive to that and post them here.

    You can look at the traffic selectors in the SPDs tab in Status > IPsec. Find the one that matches the traffic in question.

    You can evaluate the counters in the P2s in Status > IPsec. That might show something interesting.

    You can go to Diagnostics > Command Prompt and execute ipsec statusall there. See what that shows and whether anything looks "strange."
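
    If the statusall output is long, a small helper can summarise it. This is only a sketch: it assumes each child SA appears on a line shaped like "con1000{7}:  INSTALLED, ...", which may differ between strongSwan versions. The function name count_child_sas is made up for this example, and the sample lines are invented and trimmed, not real output.

```shell
#!/bin/sh
# count_child_sas: read `ipsec statusall` text on stdin and print, per
# connection name, how many child SA lines are marked INSTALLED.
# Assumption: child SA lines look like "con1000{7}:  INSTALLED, TUNNEL, ...".
count_child_sas() {
    grep -E '[{][0-9]+[}]: +INSTALLED' \
        | sed -E 's/^[[:space:]]*//; s/[{].*//' \
        | sort | uniq -c \
        | awk '{ print $2, $1 }'
}

# Real use on the firewall would be:  ipsec statusall | count_child_sas
# Demo on invented, trimmed sample lines:
count_child_sas <<'EOF'
     con1000[5]: ESTABLISHED 2 hours ago
     con1000{7}:  INSTALLED, TUNNEL, reqid 1
     con1000{8}:  INSTALLED, TUNNEL, reqid 1
     con2000{3}:  INSTALLED, TUNNEL, reqid 2
EOF
```

    For the sample above this prints "con1000 2" and "con2000 1". A count that keeps growing between checks is the kind of thing worth lining up against the logs.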

    There was an issue with Status > IPsec that was patched. I cannot remember exactly what it was. Perhaps not showing all information necessary if split tunneling was enabled. But one of the by-products of the fix was to display ALL active tunnels. I believe these connections were always there but just filtered for display and just the newest one displayed.

    The general behavior is to re-key at a random interval prior to the lifetime expiry. In order to provide seamless traffic flow while this is happening, the old SPD hangs around so that if any traffic from the other side arrives on it, it can be decrypted and forwarded. When the full lifetime expires, the SPD is deleted. If either side requests deletion, it is deleted immediately. So the key to figuring out why yours is hanging is to look at the logs surrounding these re-keys and figure out where the breakdown is. Then, once you know what it is, see if there is a way to fix it.
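
    As a rough illustration of the re-key timing described above, the sketch below computes the earliest and latest moment a re-key could start. It assumes the formula given on the strongSwan expiry/rekey page linked earlier in this thread, rekeytime = lifetime - (margintime + random(0, margintime * rekeyfuzz)), and it assumes the strongSwan ipsec.conf defaults of 9 minutes for margintime and 100% for rekeyfuzz. pfSense may write different values into its generated configuration, so check yours. The function name rekey_window is made up for this example.

```shell
#!/bin/sh
# rekey_window LIFETIME MARGINTIME FUZZ_PERCENT
# Prints "EARLIEST LATEST": the range (in seconds after the SA is set up)
# in which a re-key would start, under the assumed formula
#   rekeytime = lifetime - (margintime + random(0, margintime * rekeyfuzz))
rekey_window() {
    lifetime=$1
    margin=$2
    fuzz=$3
    max_extra=$(( margin * fuzz / 100 ))          # largest random part
    earliest=$(( lifetime - margin - max_extra )) # random part at its maximum
    latest=$(( lifetime - margin ))               # random part at zero
    echo "$earliest $latest"
}

# P2 lifetime from the first post, with the assumed defaults (540 s, 100%).
rekey_window 3600 540 100     # prints: 2520 3060
# P1 lifetime from the first post, same assumed defaults.
rekey_window 28800 540 100    # prints: 27720 28260
```

    Under these assumptions a 3600 second P2 would start re-keying somewhere between 2520 and 3060 seconds in, so seeing an old and a new entry side by side for a while is not by itself a sign of trouble.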

    Chasing the multiple displayed P2 entries might be sort of a red herring.



  • @derelict Thank you for the response. I have split tunnel on the one which is being affected causing loss of service.

    When it happens again tomorrow I will take a screenshot of everything and if possible send it over to you for a review?

    I will run the shell command as advised and check the tabs on the IPsec status page. What's very strange is that when I manually delete the P2 on which traffic appears to have stopped flowing, service comes back up, almost as if the old P2s are not being deleted.

    EDIT: I've just remoted in and checked, and I think you are right. When the issue occurs I will check the IPsec logs and system logs. Obviously, when the timer has finished the old P2 should be removed, and I've just seen this happen on another VPN.

    Excited to test tomorrow!



  • @Derelict In my case, I have an IPsec tunnel that disconnects at random times, so I need to go to Status > IPsec and disconnect and reconnect the P1 manually. Even with the status shown as "Established", it is not respecting the 8h rekey of the P1. After this action I'm able to ping the remote hosts.

    Both sides have the same P1 and P2 key lifetimes. Disable rekey is unchecked, it is IKEv1, and today I set Margintime to 300 seconds.

    Since the disconnection happens at random times, what is the command to save all logs to a file so I can analyze them?


  • Netgate

    The best thing to do is log to a remote log server.

    If adjusting the number of log entries visible using the filter in that view is insufficient, you can use this command to save all IPsec logs:

    clog /var/log/ipsec.log > /tmp/ipsec.log.txt

    Execute that in Diagnostics > Command Prompt.

    Then, on that same page, Download File /tmp/ipsec.log.txt
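
    Once the file is saved, it can help to cut it down to the minutes around the outage. The following is a minimal sketch, assuming each line starts with a syslog-style timestamp such as "Jul 13 10:00:01" and that the window does not cross midnight. The function name log_window is made up for this example.

```shell
#!/bin/sh
# log_window FILE MONTH DAY FROM TO
# Print the lines of FILE whose timestamp falls on MONTH DAY between the
# times FROM and TO (HH:MM:SS, inclusive).
# Assumption: lines begin with "Mon DD HH:MM:SS", as in classic syslog output.
log_window() {
    awk -v mon="$2" -v day="$3" -v from="$4" -v to="$5" \
        '$1 == mon && $2 == day && $3 >= from && $3 <= to' "$1"
}

# Example use (the date and times here are placeholders):
#   log_window /tmp/ipsec.log.txt Jul 13 09:55:00 10:05:00
```

    Comparing HH:MM:SS as plain strings works because the fields are fixed-width and zero-padded.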

    The logs kept on the firewall are circular, however, meaning old entries are overwritten by newer entries. The amount of logging kept is set in Status > System Logs, Settings, Log file size (Bytes). What you can do there depends on your disk size. I have mine set to 50000000 (50MB) on a system with a 30GB mSATA and it is still 90% free (about 3GB used; the page reports "Disk space currently used by log files is: 1.2G" and "Remaining disk space for log files: 22G"). You have to reset all logs further down on that page for this to take effect.

    You can save a lot of the system state in a status output file. That is taken by navigating to https://firewall.address/status.php and downloading the resulting file. On busy firewalls that might take a moment to run. For IPsec issues the logs saved there are often insufficient, so the status output should be coupled with an ipsec.log.txt file as described above.

    If you have more than one tunnel it is often beneficial to get the conXXXX number of the tunnel from ipsec statusall so you can filter on it (and filter out other tunnel logs) using grep, etc.
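
    The filtering step could look something like the sketch below. It assumes the connection name appears verbatim in each relevant log line (for example as "<con1000|3>" or "con1000{12}"). The function name filter_tunnel is made up for this example, and the sample lines are invented, not real charon output.

```shell
#!/bin/sh
# filter_tunnel FILE CONN: print only the lines that mention connection CONN.
# The trailing ([^0-9]|$) keeps "con1000" from also matching "con10000".
filter_tunnel() {
    grep -E -- "$2([^0-9]|\$)" "$1"
}

# Demo on invented sample lines (not real charon output).
sample=$(mktemp /tmp/ipsec-sample.XXXXXX)
cat > "$sample" <<'EOF'
Jul 13 10:00:01 charon: <con1000|3> sample message for tunnel one
Jul 13 10:00:02 charon: <con2000|4> sample message for tunnel two
Jul 13 10:00:03 charon: con1000{12} another sample message for tunnel one
Jul 13 10:00:04 charon: <con10000|8> sample message for some other tunnel
EOF
filter_tunnel "$sample" con1000   # prints the first and third lines
rm -f "$sample"
```

    On the saved file from above, that would be filter_tunnel /tmp/ipsec.log.txt con1000, with con1000 replaced by whatever number ipsec statusall shows for your tunnel.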