IPSec PMTU

iz

Hi,

I have a site-to-site IPSec VPN tunnel between two pfSense 4.2.2p2 instances and have run into the same issue described by the OP in this old thread.

My VPN tunnel has the following properties:
Phase 1:
Encryption Algorithm: AES-128
Authentication: SHA1
PFS: DH Group 2 (1024 bit)

Phase 2:
Protocol: ESP
Encryption Algorithm: AES-128
Hash Algorithm: SHA1
PFS: DH Group 2 (1024 bit)

With the net.inet.ipsec.dfbit parameter set to 0 in pfSense, I get ICMP replies when I send ICMP echo requests with the DF flag set (DF=1) and a payload size of up to and including 1472 bytes to a virtual IP on the pfSense box on the other end of the site-to-site tunnel. This means that the near-end pfSense instance happily encrypts ICMP packets whose length exceeds 1500 bytes once the IPSec overhead is included, and then fragments the encrypted IPSec packets. This behavior is certainly undesirable and is one of the worst ways to handle oversized packets.

Instead, I need the pfSense box to drop oversized packets that have the DF bit set (DF=1) and send an ICMP Destination Unreachable (type=3, code=4) back to the end host, informing it of the next-hop MTU. The end host should then lower its Path MTU (PMTU) and re-send the packet using the newly set PMTU. The re-sent packet length should accommodate the IPSec overhead, so that when the near-end pfSense encapsulates the packet in IPSec, the packet size does not exceed the MTU on the pfSense egress interface and no fragmentation of IPSec-encrypted packets is needed.

This process is called Path MTU Discovery (PMTUD).
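
To make the exchange concrete, here is a minimal Python sketch of the PMTUD loop described above. It is a toy model, not pfSense or FreeBSD code: the function names, the sends_icmp switch and the retry limit are all invented for illustration. With sends_icmp=False the router behaves like the black hole discussed later in this thread.

```python
from typing import Optional, Tuple


def router_forward(packet_len: int, df: bool, next_hop_mtu: int,
                   sends_icmp: bool = True) -> Tuple[str, Optional[int]]:
    """Toy router: decide what happens to one IPv4 packet (hypothetical model)."""
    if packet_len <= next_hop_mtu:
        return ("forwarded", None)
    if not df:
        # DF clear: the router is allowed to fragment the packet.
        return ("fragmented", None)
    if sends_icmp:
        # DF set: drop and report the next-hop MTU (ICMP type 3, code 4).
        return ("frag_needed", next_hop_mtu)
    # DF set but no ICMP error sent: a silent drop (black hole).
    return ("dropped", None)


def discover_pmtu(initial_mtu: int, next_hop_mtu: int,
                  sends_icmp: bool = True, max_tries: int = 5) -> Optional[int]:
    """Toy sender: lower the path MTU whenever the router reports a smaller one."""
    pmtu = initial_mtu
    for _ in range(max_tries):
        outcome, reported = router_forward(pmtu, True, next_hop_mtu, sends_icmp)
        if outcome == "forwarded":
            return pmtu
        if outcome == "frag_needed":
            pmtu = reported  # retry with the MTU named in the ICMP error
        # "dropped": nothing was learned, so the same size is retried
    return None  # never got a packet through


if __name__ == "__main__":
    print(discover_pmtu(1500, 1438))                    # router sends ICMP errors
    print(discover_pmtu(1500, 1438, sends_icmp=False))  # black hole router
```

In the first call the sender converges on the reported MTU after one ICMP error; in the second it keeps retrying the same size and gives up, which is what the end host sees as a timeout.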

So, I assume the reason PMTUD is not working by default is that net.inet.ipsec.dfbit=0, which is the default in pfSense. I don't see any way to change this in the pfSense GUI, so I changed the value of this parameter to 1 from the pfSense shell:
sysctl -w net.inet.ipsec.dfbit=1

Then I repeated the ICMP echo request test (with the DF bit set) to the virtual interface on the far-end pfSense across the site-to-site VPN tunnel. This time, if I specify an ICMP payload size of up to and including 1410 bytes, I get the ICMP echo replies back. As soon as the payload size exceeds 1410 bytes, I see the "Request timeout" message instead of "Message too long".

Therefore, with the parameter net.inet.ipsec.dfbit=1, pfSense drops the packets that exceed the egress interface MTU when the IPSec encapsulation is factored in, but pfSense does not send the ICMP Unreachable (type=3, code=4) message back to the host that sends the echo request. Hence, PMTUD is not functioning.
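
The 1410-byte boundary is consistent with the ESP overhead for the Phase 2 settings listed above. The sketch below works through the arithmetic under several assumptions that are not stated in the post: AES-128 means AES-128-CBC (16-byte IV, 16-byte block), SHA1 means HMAC-SHA1-96 (12-byte ICV), tunnel mode over IPv4 with no IP options, no NAT-T UDP encapsulation, and a 1500-byte WAN MTU. The function names are illustrative only.

```python
OUTER_IP_HEADER = 20   # outer IPv4 header, no options (assumption)
ESP_HEADER = 8         # SPI + sequence number
ESP_TRAILER = 2        # pad length + next header fields
INNER_IP_HEADER = 20   # inner IPv4 header of the ping packet
ICMP_HEADER = 8        # ICMP echo header


def esp_tunnel_packet_size(inner_len: int, iv_len: int = 16,
                           block: int = 16, icv_len: int = 12) -> int:
    """Size of the outer IPv4 packet after ESP tunnel-mode encapsulation.

    Defaults assume AES-128-CBC with HMAC-SHA1-96 (an assumption).
    """
    # Inner packet plus trailer is padded up to a multiple of the cipher block.
    padded = -(-(inner_len + ESP_TRAILER) // block) * block
    return OUTER_IP_HEADER + ESP_HEADER + iv_len + padded + icv_len


def max_icmp_payload(wan_mtu: int = 1500) -> int:
    """Largest ping payload whose encrypted packet still fits in wan_mtu."""
    for payload in range(wan_mtu - INNER_IP_HEADER - ICMP_HEADER, -1, -1):
        inner_len = payload + INNER_IP_HEADER + ICMP_HEADER
        if esp_tunnel_packet_size(inner_len) <= wan_mtu:
            return payload
    raise ValueError("MTU too small for any ESP packet")


if __name__ == "__main__":
    print(esp_tunnel_packet_size(1410 + 28))  # 1410-byte ping payload
    print(esp_tunnel_packet_size(1411 + 28))  # one byte more
    print(max_icmp_payload(1500))
```

Under these assumptions a 1410-byte payload produces a 1496-byte ESP packet, while 1411 bytes spills into another cipher block and produces 1512 bytes, which no longer fits in a 1500-byte MTU. Other ciphers (AES-GCM, for example) or NAT-T would give different numbers.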

I also see a mention in this thread of setting net.inet.ipsec.dfbit to 2, which I also tried, but I saw no difference from setting it to 1. I don't know what the expected behavior is when this parameter is set to 2, but I assume it may be "copy DF bit from the inner IP header to the outer IP header".

It's been a few years since this issue was reported by the OP, so I would like to understand why this is still not fixed. Is this an upstream issue? It's hard to believe that such an obvious bug has not been squashed yet.

This PMTUD behavior is only broken when the destination is through the IPSec tunnel. When the destination is outside the IPSec tunnel, PMTUD is working properly. For example, if I ping google.com with the DF bit set (DF=1) and the ICMP payload of 1473, I get the Message too long response, which means that pfSense drops the packet and sends ICMP Destination Unreachable (type=3, code=4) back to the host that issues the ping.

Thank you.

monster4000

There has been a bug open for a long time on this issue now:

https://redmine.pfsense.org/issues/7801

rolytheflycatcher

Is this still an open bug?

carl2187

@rolytheflycatcher @jwt

TLDR: PMTUD (path mtu discovery) is indeed still completely broken in pfsense when using ipsec tunnels.

The closest thing to a fix right now is to set net.inet.ipsec.dfbit = 2. But on pfsense/freebsd, this makes things worse because the ICMP packet too big response is not sent for the discarded packets that exceed the kernel calculated max MTU.

The default is "0" in pfsense, which just clears the do-not-fragment bit. This works OK, but introduces delay and modification of the sent packets. "2" says "copy the do-not-fragment bit from the inner packet to the outer packet". This is the desired behavior...

But setting it to "2" exposes the underlying issue: no "ICMP Packet too big" is sent back to the sender the way it should be. pfsense just discards the packet without the ICMP response.

This results in pfsense becoming a "black hole" router, as packets are SILENTLY discarded without an ICMP too big response.
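
For reference, the dfbit values discussed in this thread can be summarized as a small Python function. The mapping (0 = clear, 1 = set, 2 = copy from the inner header) follows the FreeBSD ipsec(4) description as best understood here; treat it as an assumption and check it against your release. The function name outer_df is made up for this sketch.

```python
def outer_df(dfbit: int, inner_df: bool) -> bool:
    """Return the DF bit of the outer IPv4 header for a given dfbit setting.

    Assumed semantics of net.inet.ipsec.dfbit:
      0 -> outer DF is always cleared
      1 -> outer DF is always set
      2 -> outer DF is copied from the inner header
    """
    if dfbit == 0:
        return False
    if dfbit == 1:
        return True
    if dfbit == 2:
        return inner_df
    raise ValueError("dfbit must be 0, 1 or 2")


if __name__ == "__main__":
    for setting in (0, 1, 2):
        for inner in (False, True):
            print(setting, inner, outer_df(setting, inner))
```

Note that this only describes how the outer header is built. Whether an ICMP error goes back to the sender when the packet is too big is a separate step, and that is the part reported as missing in this thread.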

The freebsd kernel seems to calculate the correct MTU for the interface with the ipsec overhead, as it starts throwing packets away at the right MTU at least.

Like the OP, I've found Linux (Ubuntu 20.04) + strongSwan works perfectly by default. Its default behavior is the equivalent of net.inet.ipsec.dfbit = 2, and it correctly sends ICMP too big when the MTU is exceeded. The result is correct operation of any app, because PMTUD works as designed. No MSS clamping or exotic tricks required. PMTUD accounts for all of that, so long as you don't have black hole routers in the path between communicating devices.

rolytheflycatcher

@carl2187 thank you. Your statement confirms the behaviour I am witnessing. I guess my problem (trying to make EAP-TLS work) is compounded by the oversized UDP packets that the (Cisco) AP tries to send during the RADIUS handshake.

I guess I'll be sticking with Draytek for the foreseeable then - a shame as there is so much to like about pfSense.

rolytheflycatcher

I see that in 2.6 there are some new check-boxes specifically for VPN ("These setting will affect IPsec, OpenVPN and PPPoE Server network traffic")

IP Do-Not-Fragment compatibility
IP Fragment Reassemble

Is there any official word on whether these options are designed to circumvent the PMTUD black hole issue, or indeed if there is any sign of the fundamental issue being resolved?

rolytheflycatcher

@carl2187 It appears that using a route-based IPSec tunnel (VTI) resolves the PMTUD issue, without the need for any other workarounds.

ltctech

I am bumping up against this exact same issue.

We have an S2S IPsec tunnel between two pfSense routers. Running iperf3 between the two sites only nets about half the bandwidth. However, if I reduce the MSS for TCP to 1406 or the buffer length for UDP to 1418, it gives me 90% of line bandwidth:

iperf3 -c testserver -R -M 1406
iperf3 -c testserver -R -u -b 320M -l 1418

Anything more gets fragmented into a full 1514-byte frame and a tiny 60-byte frame, as verified with a packet capture on the WAN interface. This causes the bandwidth to be halved.

MSS clamping at 1406 on pfSense does seem to help for both UDP and TCP traffic, though it may not work for all sites or all types of traffic. That's why PMTUD is so important.
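
As a quick sanity check on the two iperf3 values above, this Python sketch shows that they correspond to the same inner IP packet size. It assumes IPv4 with no IP options and a 20-byte base TCP header; the helper names are made up for illustration.

```python
IPV4_HEADER = 20  # no IP options (assumption)
TCP_HEADER = 20   # base TCP header; options come out of the MSS budget
UDP_HEADER = 8


def inner_packet_from_mss(mss: int) -> int:
    """Largest inner IPv4 packet produced by a TCP flow with this MSS."""
    return mss + IPV4_HEADER + TCP_HEADER


def inner_packet_from_udp_payload(payload_len: int) -> int:
    """Inner IPv4 packet size for a UDP datagram with this payload length."""
    return payload_len + IPV4_HEADER + UDP_HEADER


def mss_for_inner_packet(inner_len: int) -> int:
    """MSS to clamp to so TCP packets do not exceed inner_len bytes."""
    return inner_len - IPV4_HEADER - TCP_HEADER


if __name__ == "__main__":
    print(inner_packet_from_mss(1406))          # iperf3 -M 1406
    print(inner_packet_from_udp_payload(1418))  # iperf3 -u -l 1418
    print(mss_for_inner_packet(1446))
```

Both settings work out to a 1446-byte inner packet, so that appears to be the largest size this particular tunnel carries without fragmenting. The exact figure depends on the tunnel's cipher and encapsulation, so a clamp value has to be worked out per tunnel.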

Are there any plans to fix this?

rolytheflycatcher

@ltctech have you tried using a route-based IPSec tunnel?

keyser

@rolytheflycatcher Really interesting - or rather sad - that this bug/issue has been there for so many years.
Suggests that FreeBSD is seeing less and less use in large installations/organisations - or that the FreeBSD community is starved for people with knowledge on how to fix core issues like this.

Such a fundamental problem does not go unnoticed in bigger installations, so it would seem policy based IPsec tunneling sees very little use when based on FreeBSD.