IPSec PMTU



  • Hiya,

    I've been running an IPsec tunnel for a while between two Ubuntu boxes using strongSwan. I have now replaced one side with a pfSense box, but I'm running into a problem.

    The Ubuntu box calculates the tunnel overhead properly and sends a "fragmentation required" ICMP message when a packet does not fit into the tunnel. The pfSense box does not do this; it just fragments the inner payload and reassembles it on the other side. I do not want this because the performance is quite bad.

    I'm now using MSS clamping as a workaround but my UDP connections (rsync backups and such) run at about 50% of the line speed. A tcpdump shows a lot of fragmentation.
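
    In raw pf terms the clamping is essentially a max-mss scrub rule; the interface name and the 1382-byte value here are examples (1422-byte tunnel MTU minus 40 bytes of TCP/IP headers):

    ```
    # pf.conf sketch: clamp the TCP MSS so encapsulated packets fit the tunnel
    scrub on em0 all max-mss 1382
    ```

    This only rewrites the MSS option in TCP handshakes, which is why the UDP traffic still ends up fragmented.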

    Ubuntu 15.04 + Linux strongSwan U5.3.3/K3.19.0-28-generic:

    root@hlv-us00:/home/user# ping -M do -s 1394 192.168.10.10
    PING 192.168.10.10 (192.168.10.10) 1394(1422) bytes of data.
    1402 bytes from 192.168.10.10: icmp_seq=1 ttl=63 time=26.1 ms
    
    root@hlv-us00:/home/user# ping -M do -s 1395 192.168.10.10
    PING 192.168.10.10 (192.168.10.10) 1395(1423) bytes of data.
    ping: local error: Message too long, mtu=1422
    

    As you can see, Linux + strongSwan calculates the overhead correctly and simply sends a FRAG_REQ when the inner packet no longer fits the tunnel. pfSense, however, shows different behaviour:
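
    For reference, the arithmetic behind the Ubuntu numbers (assuming a 1500-byte WAN MTU on my side):

    ```shell
    wan_mtu=1500
    tunnel_mtu=1422                            # mtu=1422 reported by "ping -M do"
    echo "ESP/encap overhead: $((wan_mtu - tunnel_mtu)) bytes"   # 78 bytes
    echo "max ICMP payload:   $((tunnel_mtu - 20 - 8)) bytes"    # 1394 = MTU - IP hdr - ICMP hdr
    ```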

    PfSense 2.2.4 amd64:

    C:\Users\pakjebakmeel>ping -f -l 1394 192.168.178.202
    
    Pinging 192.168.178.202 with 1394 bytes of data:
    Reply from 192.168.178.202: bytes=1394 time=65ms TTL=63
    Reply from 192.168.178.202: bytes=1394 time=43ms TTL=63
    Reply from 192.168.178.202: bytes=1394 time=48ms TTL=63
    Reply from 192.168.178.202: bytes=1394 time=46ms TTL=63
    
    C:\Users\pakjebakmeel>ping -f -l 1472 192.168.178.202
    
    Pinging 192.168.178.202 with 1472 bytes of data:
    Reply from 192.168.178.202: bytes=1472 time=71ms TTL=63
    Reply from 192.168.178.202: bytes=1472 time=86ms TTL=63
    Reply from 192.168.178.202: bytes=1472 time=212ms TTL=63
    Reply from 192.168.178.202: bytes=1472 time=80ms TTL=63
    
    

    WHY U NO SEND FRAG_REQ??

    I have "Clear invalid DF bits instead of dropping the packets" DISABLED.

    How can I enable this behaviour on pfSense, like it works on Linux + strongSwan? This out-of-the-box behaviour breaks PMTU discovery over IPsec, which is bad. I consider MSS clamping a workaround for broken PMTUD, and it doesn't work for any protocol other than TCP.

    Is this a bug?
    Is this a limitation of BSD?
    Is this intentional?

    What gives? Anyone got any ideas? Thanks.



  • I have now migrated my tunnels back to a strongSwan installation on an Ubuntu 15.04 virtual machine. PMTU discovery is now working as expected:

    C:\Users\wsmeltekop>ping 192.168.178.1 -l 1394 -f
    
    Pinging 192.168.178.1 with 1394 bytes of data:
    Reply from 192.168.178.1: bytes=1394 time=19ms TTL=61
    Reply from 192.168.178.1: bytes=1394 time=19ms TTL=61
    Reply from 192.168.178.1: bytes=1394 time=17ms TTL=61
    Reply from 192.168.178.1: bytes=1394 time=17ms TTL=61
    
    Ping statistics for 192.168.178.1:
        Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
    Approximate round trip times in milli-seconds:
        Minimum = 17ms, Maximum = 19ms, Average = 18ms
    
    C:\Users\wsmeltekop>ping 192.168.178.1 -l 1395 -f
    
    Pinging 192.168.178.1 with 1395 bytes of data:
    Packet needs to be fragmented but DF set.
    Packet needs to be fragmented but DF set.
    Packet needs to be fragmented but DF set.
    Packet needs to be fragmented but DF set.
    
    Ping statistics for 192.168.178.1:
        Packets: Sent = 4, Received = 0, Lost = 4 (100% loss),
    

    I have done some reading, and it seems that strongSwan only sets up the tunnels; the kernel is responsible for encryption and for 'grabbing' the packets for encapsulation. The kernel should also calculate the MTU overhead and handle it properly. When fragmentation=no is set in the config, it should return an ICMP FRAG_REQ when the resulting ESP packet would be bigger than the WAN MTU.

    Why does this work fine on Ubuntu whilst FreeBSD seems to have an issue with this? Am I doing something wrong? Is this working for anyone else?



  • I would very much like to move my IPsec tunnels back to pfSense; they are currently terminated on the Ubuntu box behind NAT-T.

    Has anyone got any ideas about this? I found that pfSense adds 'fragmentation=yes' to ipsec.conf by default. I have modified the vpn.inc file to disable fragmentation, and I can confirm the option is negated in the generated ipsec.conf.
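
    For reference, the change in the generated ipsec.conf amounts to a single conn option (the conn name here is a placeholder):

    ```
    # ipsec.conf sketch
    conn mytunnel
            fragmentation = no    # pfSense writes 'yes' here by default
    ```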

    Now it drops packets that are too big, but still no FRAG_REQ ICMP packet.

    1. Why does pfSense not send this? It is vital to a functional connection with non-default MTUs.
    2. Am I the only one bothered by this behaviour? Can anyone confirm it?



    The fragmentation setting in ipsec.conf only affects IKE fragmentation, i.e. the negotiation, not anything that goes inside the tunnel. Something else must have changed if you saw a change in the behaviour of traffic in the tunnel after changing that.

    There does seem to be an issue in FreeBSD in that it'll fragment traffic with DF set if it's traversing IPsec. Or at least that seems to be the case at a quick review. Needs more investigation.


  • Netgate

    cmb should open a bug after verification of the issue.


  • Netgate

    Can you try:

    sysctl -w net.inet.ipsec.dfbit=1

    on both boxes, and report back?



  • Thanks for the suggestion; the tunnel is currently terminated on an Ubuntu VM. I'll need to find time this evening to move it back to pfSense and give this a try.

    Currently the value is:

    net.inet.ipsec.dfbit = 0

    The meaning of the values:

    0 = the DF bit on the outer IPv4 header is cleared
    1 = the outer DF bit is set regardless of the inner DF bit
    2 = the DF bit is copied from the inner header to the outer one

    I would suggest setting the value to '2'. But yes, with it currently set to 0, the DF bit on my pings gets stripped on the outer layer, which would indeed explain the problems I'm observing.
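
    Applying and persisting that would look roughly like this (a sketch; on pfSense, a System Tunables entry under System > Advanced is the usual way to make a sysctl survive reboots):

    ```shell
    # Copy the inner DF bit to the outer ESP header (value 2), on both endpoints
    sysctl net.inet.ipsec.dfbit=2

    # Verify the current value
    sysctl -n net.inet.ipsec.dfbit

    # Persist across reboots on stock FreeBSD
    echo 'net.inet.ipsec.dfbit=2' >> /etc/sysctl.conf
    ```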


  • Netgate

    We've seen some trouble when setting it to 2.

    It's on the list to investigate, as both cmb and I think '2' makes the most sense.



  • I have moved the tunnel back to pfSense. The tunnel is up and passing data. The DF bit is being cleared as expected and the traffic is getting fragmented.

    As soon as I set the value to 1, the traffic starts dropping. When I set it to 2 there is no change; traffic is still fragmenting.

    sysctl -w net.inet.ipsec.dfbit=0

    net.inet.ipsec.dfbit: 0 -> 0

    Result:

    ping 192.168.178.202 -f -l 1472
    
    Pinging 192.168.178.202 with 1472 bytes of data:
    Reply from 192.168.178.202: bytes=1472 time=17ms TTL=63
    Reply from 192.168.178.202: bytes=1472 time=16ms TTL=63
    Reply from 192.168.178.202: bytes=1472 time=17ms TTL=63
    Reply from 192.168.178.202: bytes=1472 time=16ms TTL=63
    
    Ping statistics for 192.168.178.202:
        Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
    Approximate round trip times in milli-seconds:
        Minimum = 16ms, Maximum = 17ms, Average = 16ms
    

    sysctl -w net.inet.ipsec.dfbit=1

    net.inet.ipsec.dfbit: 0 -> 1

    Result:

    ping 192.168.178.202 -f -l 1472
    
    Pinging 192.168.178.202 with 1472 bytes of data:
    Request timed out.
    Request timed out.
    Request timed out.
    Request timed out.
    
    Ping statistics for 192.168.178.202:
        Packets: Sent = 4, Received = 0, Lost = 4 (100% loss)
    

    sysctl -w net.inet.ipsec.dfbit=2

    net.inet.ipsec.dfbit: 1 -> 2

    Result:

    ping 192.168.178.202 -f -l 1472
    
    Pinging 192.168.178.202 with 1472 bytes of data:
    Reply from 192.168.178.202: bytes=1472 time=17ms TTL=63
    Reply from 192.168.178.202: bytes=1472 time=31ms TTL=63
    Reply from 192.168.178.202: bytes=1472 time=15ms TTL=63
    Reply from 192.168.178.202: bytes=1472 time=25ms TTL=63
    
    Ping statistics for 192.168.178.202:
        Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
    Approximate round trip times in milli-seconds:
        Minimum = 15ms, Maximum = 31ms, Average = 22ms
    

    So, unfortunately, not the expected results.


  • Netgate

    So many things could be wrong with your setup (blocked ICMP 'frag needed' messages, needing to set dfbit on both ends, etc.).

    We'll work it out in the lab, likely after 2.3.



  • Are there any updates on this issue? I can reproduce it on 2.3.

    Also, I can't find any related bug filed on Redmine.



  • There's some kind of issue there in FreeBSD. It needs to be duplicated on stock 11-CURRENT, quantified, and reported upstream. Still on my to do list.



  • I can confirm this issue still exists in 2.3.1-Release-p5.

    After placing pfSense/strongSwan in place of Ubuntu/strongSwan, the IPsec tunnel accepts pings of up to length 1472 (1500 bytes overall) instead of the "correct" non-fragmenting size of 1410 (1438 overall) that strongSwan on Ubuntu 14.04 calculates.

    The issue has probably gone unnoticed by many users, as the fragmentation is transparent to the application using the tunnel, so it takes a fair amount of networking expertise to even detect that it is occurring.

    I do not see a bug for this in upstream 11. Given that 11-ALPHA4 is out now, this needs to be addressed quickly if it is to be fixed in 11-RELEASE. cmb, perhaps you have some leverage with the upstream dev team and could kick this along into their bug tracker?

    I'm building up a vanilla 11-ALPHA4 with the latest strongSwan to test functionality independently of pfSense, and on the latest possible version, to confirm the issue really is an upstream one. I'll report back here in the next few days with the results. I'm having a little trouble, though: the default kernel doesn't include the IPSEC option that strongSwan needs, so learning how to compile a FreeBSD kernel became a prerequisite to testing strongSwan!
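
    For anyone attempting the same, the rebuild is roughly the standard custom-kernel procedure (MYKERNEL is an arbitrary config name):

    ```shell
    # Add IPsec support to a custom FreeBSD 11 kernel config
    cd /usr/src/sys/amd64/conf
    cp GENERIC MYKERNEL
    printf 'options IPSEC\ndevice crypto\n' >> MYKERNEL

    # Build and install, then reboot into the new kernel
    cd /usr/src
    make buildkernel KERNCONF=MYKERNEL
    make installkernel KERNCONF=MYKERNEL
    ```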

    Thanks!

    -Ben



  • Just finished testing strongSwan 5.4.0 on FreeBSD 11-ALPHA4 to check whether the "upstream" components have the same issue. And they do… Same results as the OP's, and as my own experience on pfSense 2.3.1_5. So this is clearly not a pfSense-specific issue, and it hasn't been fixed for the upcoming FreeBSD 11 release either.

    Stuck with Ubuntu/StrongSwan for a bit longer...