IPSec PMTU
-
Thanks for the suggestion. The tunnel is currently terminated on an Ubuntu VM, so I'll need to find time this evening to move it back to pfSense and give this a try.
Currently the value is:
net.inet.ipsec.dfbit = 0
If set to 0, the DF bit on the outer IPv4 header is cleared; 1 means the outer DF bit is set regardless of the inner DF bit; and 2 means the DF bit is copied from the inner header to the outer one.
I would suggest setting the value to '2'. But yes, with it currently set to 0, the DF bit on my ping gets stripped on the outer layer, which would cause the problems observed. It would indeed explain what I am seeing.
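For reference, the value can be read and changed from a shell on pfSense (and made persistent via System > Advanced > System Tunables), for example:
sysctl net.inet.ipsec.dfbit
sysctl -w net.inet.ipsec.dfbit=2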
-
We've seen some trouble when setting it to 2.
It's on the list to investigate, as both cmb and I think '2' makes the most sense.
-
I have moved the tunnel back to pfSense. The tunnel is up and passing data. The DF bit is being cleared as expected and the traffic is getting fragmented.
As soon as I set the value to 1, the traffic starts dropping. When I set it to 2, there is no change: traffic is still being fragmented.
sysctl -w net.inet.ipsec.dfbit=0
net.inet.ipsec.dfbit: 0 -> 0
Result:
ping 192.168.178.202 -f -l 1472
Pinging 192.168.178.202 with 1472 bytes of data:
Reply from 192.168.178.202: bytes=1472 time=17ms TTL=63
Reply from 192.168.178.202: bytes=1472 time=16ms TTL=63
Reply from 192.168.178.202: bytes=1472 time=17ms TTL=63
Reply from 192.168.178.202: bytes=1472 time=16ms TTL=63
Ping statistics for 192.168.178.202:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 16ms, Maximum = 17ms, Average = 16ms
sysctl -w net.inet.ipsec.dfbit=1
net.inet.ipsec.dfbit: 0 -> 1
Result:
ping 192.168.178.202 -f -l 1472
Pinging 192.168.178.202 with 1472 bytes of data:
Request timed out.
Request timed out.
Request timed out.
Request timed out.
Ping statistics for 192.168.178.202:
    Packets: Sent = 4, Received = 0, Lost = 4 (100% loss)
sysctl -w net.inet.ipsec.dfbit=2
net.inet.ipsec.dfbit: 1 -> 2
Result:
ping 192.168.178.202 -f -l 1472
Pinging 192.168.178.202 with 1472 bytes of data:
Reply from 192.168.178.202: bytes=1472 time=17ms TTL=63
Reply from 192.168.178.202: bytes=1472 time=31ms TTL=63
Reply from 192.168.178.202: bytes=1472 time=15ms TTL=63
Reply from 192.168.178.202: bytes=1472 time=25ms TTL=63
Ping statistics for 192.168.178.202:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 15ms, Maximum = 31ms, Average = 22ms
So, unfortunately, not the expected results.
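For anyone re-testing this: a quick way to confirm whether the ESP packets really leave fragmented is to capture fragments on the WAN interface while the ping runs, with something like the following (em0 is just a placeholder for the WAN NIC):
tcpdump -ni em0 'ip[6:2] & 0x3fff != 0'
That filter matches any IPv4 packet with the more-fragments flag set or a non-zero fragment offset, i.e. fragments.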
-
So many things could be wrong with your setup (blocking of ICMP 'frag needed' messages, needing to set dfbit on both ends, etc.).
We'll work it out in the lab, likely after 2.3.
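On the 'blocked ICMP' point specifically, the far end (and anything in between) has to allow ICMP unreachable/fragmentation-needed messages back in. In raw pf terms that would be a rule along the lines of the one below; pfSense builds its ruleset from the GUI, so this is only an illustration of what needs to be permitted somewhere in the path:
pass in inet proto icmp icmp-type unreach code needfrag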
-
Are there any updates on this issue? I can reproduce on 2.3.
Also, I can't find any related bug filed on Redmine
-
There's some kind of issue there in FreeBSD. It needs to be duplicated on stock 11-CURRENT, quantified, and reported upstream. Still on my to-do list.
-
I can confirm this issue still exists in 2.3.1-Release-p5.
After placing pfSense/strongSwan in place of Ubuntu/strongSwan, the IPsec tunnel accepts pings with payloads up to 1472 bytes (1500 bytes overall) instead of the "correct" non-fragmenting size of 1410 (1438 overall) that strongSwan on Ubuntu 14.04 calculates.
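For what it's worth, the 1410/1438 numbers make sense once the ESP tunnel overhead is counted. Assuming a typical AES-CBC-128/SHA1 proposal without NAT-T, the math works out roughly as:
1410 (ICMP payload) + 8 (ICMP header) + 20 (IP header) = 1438 bytes of inner packet
20 (outer IP) + 8 (ESP header) + 16 (AES-CBC IV) + 1438 (inner packet) + 2 (ESP trailer) + 12 (SHA1 ICV) = 1496 bytes on the wire
Anything larger inside gets padded up to the next 16-byte AES block and would push the outer packet past a 1500-byte MTU, so 1438 really is the largest inner packet that fits without fragmentation.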
The issue has probably gone unnoticed by many users, as the fragmentation is transparent to the application using the tunnel, so it takes a fair amount of networking expertise to even detect that it is occurring.
I do not see a bug for this in upstream FreeBSD 11. Given that 11-Alpha4 is out now, this needs to be addressed quickly if it is to be fixed in 11-Release. cmb, perhaps you have some leverage with the upstream dev team and could kick this along into their bug tracker?
I'm building up a vanilla 11-Alpha4 with the latest strongSwan to test functionality independent of pfSense, and on the latest possible version, to confirm the issue really is an "upstream" one. I'll report back here in the next few days with the results. I'm having a little trouble because the default kernel doesn't include the IPSEC option that strongSwan needs, so learning how to compile a kernel in FreeBSD became a prerequisite to testing strongSwan!
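For anyone else going down this road, a minimal custom kernel config for IPsec on FreeBSD 11 looks roughly like this (the config name is arbitrary; assumes the source tree is in /usr/src):
# /usr/src/sys/amd64/conf/IPSEC
include GENERIC
ident IPSEC
options IPSEC
device crypto
Then build and install it with:
cd /usr/src
make buildkernel KERNCONF=IPSEC
make installkernel KERNCONF=IPSEC
reboot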
Thanks!
-Ben
-
Just finished testing StrongSwan 5.4.0 on FreeBSD-11-Alpha4 to check whether the "upstream" components have the same issue. And they do… Same results as the OP's, and as my experience on pfSense 2.3.1_5. So this is clearly not a pfSense-specific issue, and it hasn't been fixed in the upcoming FreeBSD 11 release either.
Stuck with Ubuntu/StrongSwan for a bit longer...
-
Hi,
I have a site-to-site IPSec VPN tunnel between two pfSense 4.2.2p2 instances and have run into the same issue described by the OP in this old thread.
My VPN tunnel has the following properties:
Phase 1:
Encryption Algorithm: AES-128
Authentication: SHA1
PFS: DH Group 2 (1024 bit)
Phase 2:
Protocol: ESP
Encryption Algorithm: AES-128
Hash Algorithm: SHA1
PFS: DH Group 2 (1024 bit)
If the net.inet.ipsec.dfbit parameter is set to 0 in pfSense, when I issue ICMP echo requests with the DF flag set (DF=1) and the payload size up to and including 1472 bytes to a virtual IP on the pfSense box on the other end of the site-to-site tunnel, I get ICMP replies. This means that the near-end pfSense instance happily encrypts ICMP packets whose length exceeds 1500 bytes (including the IPSec overhead) and then the near-end pfSense box fragments the encrypted IPSec packets. This behavior is certainly undesirable and one of the worst ways to handle oversized packets.
Instead, I need the pfSense box to drop packets that have the DF bit set (DF=1) and reply with a ICMP Destination Unreachable (type=3, code=4) back to the end host informing the end host of the next-hop MTU. The end host should then lower its Path MTU (PMTU) and re-send the packet with the newly set PMTU. The re-sent packet length should accommodate the IPSec overhead, so that when the near-end pfSense encapsulates the packet in IPSec, the packet size does not exceed the MTU on the pfSense egress interface, so no fragmentation of IPSec-encrypted packets is needed.
This process is called Path MTU Discovery (PMTUD).
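As a side note, this is easy to exercise from a Linux host behind the tunnel (substitute the real far-end address for the one borrowed from the OP's example); -M do forbids fragmentation, and tracepath walks the path reporting the PMTU it discovers:
ping -M do -s 1472 192.168.178.202
tracepath -n 192.168.178.202
With working PMTUD the ping should come back with a "frag needed" / "message too long" indication rather than simply timing out.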
So, I assume the reason PMTUD is not working by default is that net.inet.ipsec.dfbit=0, which is the default in pfSense. I don't see any way to change this in the pfSense GUI, so I changed the value of this parameter to 1 from the pfSense shell:
sysctl -w net.inet.ipsec.dfbit=1
Then I repeated the ICMP echo request test (with the DF bit set) to the virtual interface on the far-end pfSense across the site-to-site VPN tunnel. This time, if I specify an ICMP payload size up to and including 1410 bytes, I get the ICMP echo replies back. As soon as the payload size exceeds 1410 bytes, I see the Request timeout message instead of Message too long.
Therefore, with the parameter net.inet.ipsec.dfbit=1, pfSense drops the packets that exceed the egress interface MTU once the IPSec encapsulation is factored in, but pfSense does not send the ICMP Unreachable (type=3, code=4) message back to the host that sent the echo request. Hence, PMTUD is not functioning.
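One way to double-check that conclusion is to capture on the LAN-side interface of the near-end pfSense while repeating the oversized ping; if PMTUD were working, the type 3/code 4 replies would show up there (the interface name is just a placeholder):
tcpdump -ni igb1 'icmp[icmptype] == icmp-unreach and icmp[icmpcode] == 4'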
I also see a mention in this thread of the parameter net.inet.ipsec.dfbit value set to 2, which I also tried, but there is no difference from its value being set to 1. I don't know what the expected behavior is if this parameter value is set to 2, but I assume it may be "copy DF bit from the inner IP header to the outer IP header".
It's been a few years since this issue was reported by the OP, so I would like to understand why it is still not fixed. Is this an upstream issue? It's hard to believe that such an obvious bug has not been squashed yet.
This PMTUD behavior is only broken when the destination is reached through the IPSec tunnel. When the destination is outside the IPSec tunnel, PMTUD works properly. For example, if I ping google.com with the DF bit set (DF=1) and an ICMP payload of 1473 bytes, I get the Message too long response, which means that pfSense drops the packet and sends an ICMP Destination Unreachable (type=3, code=4) back to the host that issued the ping.
Thank you.
-
There has been a bug open for a long time on this issue now:
https://redmine.pfsense.org/issues/7801
-
Is this still an open bug?
-
TLDR: PMTUD (path MTU discovery) is indeed still completely broken in pfSense when using IPsec tunnels.
The closest thing to a fix right now is to set net.inet.ipsec.dfbit = 2, but on pfSense/FreeBSD this makes things worse, because the ICMP "packet too big" response is not sent for the discarded packets that exceed the kernel-calculated max MTU.
The default in pfSense is "0", which just clears the do-not-fragment bit. That works OK, but it introduces delay and modifies the sent packets. "2" means "copy the do-not-fragment bit from the inner packet to the outer packet", which is the desired behavior...
But setting it to "2" exposes the underlying issue: no "ICMP packet too big" is sent back to the sender the way it should be. The packet is just discarded without the ICMP response.
This results in pfSense becoming a "black hole" router, as packets are SILENTLY discarded without an ICMP too-big response.
The FreeBSD kernel does seem to calculate the correct MTU for the interface with the IPsec overhead included, as it at least starts throwing packets away at the right size.
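For anyone else chasing this on pfSense, capturing on the enc0 pseudo-interface is handy: it shows the inner (pre-encryption) packets entering and leaving the tunnel, so you can watch the oversized packet go in and confirm that no ICMP ever comes back. For example:
tcpdump -ni enc0 icmp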
Like the OP, I've found that Linux (Ubuntu 20.04) + strongSwan works perfectly by default. It behaves like dfbit = 2 out of the box (the DF bit is copied to the outer header) and correctly sends ICMP too big when the MTU is exceeded, resulting in correct operation of any app because PMTUD works as designed. No MSS clamping or exotic tricks required. PMTUD accounts for all of that, as long as you don't have black-hole routers in the path between communicating devices.
-
@carl2187 thank you. Your statement confirms the behaviour I am witnessing. I guess my problem (trying to make EAP-TLS work) is compounded by the oversized UDP packets that the (Cisco) AP tries to send during the RADIUS handshake.
I guess I'll be sticking with DrayTek for the foreseeable future then - a shame, as there is so much to like about pfSense.
-
I see that in 2.6 there are some new check-boxes specifically for VPN traffic ("These settings will affect IPsec, OpenVPN and PPPoE Server network traffic"):
IP Do-Not-Fragment compatibility
IP Fragment Reassemble
Is there any official word on whether these options are designed to circumvent the PMTUD black-hole issue, or indeed if there is any sign of the fundamental issue being resolved?
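One way to find out what those checkboxes actually do is to toggle them and compare the ruleset pfSense generates in /tmp/rules.debug; presumably they translate into pf scrub options (no-df, fragment reassemble) on the VPN-facing traffic rather than fixing PMTUD itself:
grep scrub /tmp/rules.debug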
-
@carl2187 It appears that using a route-based IPsec tunnel (VTI) resolves the PMTUD issue, without the need for any other workarounds.
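That fits: with a VTI the tunnel shows up as a real ipsecN interface with its own MTU, so the kernel can do normal per-interface MTU handling and emit the "fragmentation needed" ICMP itself. If anyone wants to check or pin the MTU (the interface number here is just an example):
ifconfig ipsec1
ifconfig ipsec1 mtu 1400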
-
I am bumping up against this exact same issue.
We have a S2S IPsec tunnel between two pfSense routers. Running iperf3 between the two sites only nets about half the bandwidth. However, if I reduce the TCP MSS to 1406, or the UDP buffer length to 1418, I get about 90% of line bandwidth:
iperf3 -c testserver -R -M 1406
iperf3 -c testserver -R -u -b 320M -l 1418
Anything more gets fragmented into a full 1514-byte frame and a tiny 60-byte frame, as verified with a packet capture on the WAN interface. This causes the bandwidth to be halved.
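For what it's worth, those two numbers are consistent with each other:
1406 (TCP MSS) + 20 (TCP header) + 20 (IP header) = 1446 bytes of inner packet
1418 (UDP payload) + 8 (UDP header) + 20 (IP header) = 1446 bytes of inner packet
1446 plus roughly 54 bytes of ESP tunnel overhead lands on the 1500-byte WAN MTU. The exact overhead depends on the cipher in use, but that is why 1406/1418 is the sweet spot here.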
MSS clamping at 1406 on pfSense does seem to help for both UDP and TCP traffic, though it may not work for all sites or all types of traffic. That's why PMTUD is so important.
Are there any plans to fix this?
-
@ltctech have you tried using a route-based IPSec tunnel?
-
@rolytheflycatcher Really interesting - or rather sad - that this bug/issue has been there for so many years.
It suggests that FreeBSD is seeing less and less use in large installations/organisations - or that the FreeBSD community is starved for people with the knowledge to fix core issues like this.
Such a fundamental problem does not go unnoticed in bigger installations, so it would seem policy-based IPsec tunneling sees very little use when based on FreeBSD.