Problems with MTU and dropped packets

Derelict

Note how if the traffic comes over IPsec reassembled it works fine. But that is up to the other side.

That is probably why we don't see this come up more - because pfSense scrubs/reassembles before sending it over IPsec so pfSense-to-pfSense IPsec works in this case.

JKnott

That's a violation of how it's supposed to work. Only the destination is supposed to reassemble the fragments. Otherwise, you'd have a router reassemble a packet, only to have another router further on fragment it again. That's on top of wasting router CPU time. In fact, this is a big reason why IPv6 routers are not allowed to fragment a packet. If a packet is too big, the router drops it and sends back an ICMP error message. IPv4 is moving to that as well. If you look at the traffic, you'll see the do not fragment (DF) flag set on everything from Linux and on TCP from Windows 10. I have no idea why UDP and ICMP don't use the DF flag in Windows.

Harvy66

Maybe a violation for a router, but a firewall has an obligation to inspect the entire packet before forwarding it. There are attack vectors if a firewall does not reassemble the packet. I remember one year there were all kinds of crazy security issues in the tech news because firewalls were allowing fragments through without inspecting the entire packet.

JKnott

Does pfSense do deep packet inspection? As I recall, firewalls work by remembering the outgoing connections and accepting incoming packets for those connections (stateful firewall). This requires only looking at the headers. Is it actually inspecting the entire packet, looking at the contents etc.? If not, then it doesn't need to reassemble the entire packet. Regardless, if path MTU discovery (PMTUD) is used, as is often the case these days, then fragmentation shouldn't happen. With PMTUD, an oversize packet will be discarded, not fragmented.

Harvy66

Remember, the packet gets fragmented at the IP layer, which means the TCP/UDP port info is only in the first fragment.

https://en.wikipedia.org/wiki/IP_fragmentation_attack

Proper firewalling could be done without reassembly, but it is very complex, and complexity is the enemy of security. Forcing the firewall to reassemble also prevents well-known DoS attacks.
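
To illustrate the point about port info, here is a rough Python sketch of IPv4-style fragmentation. The function names, the 3000-byte payload, and the port 5060 are made up for the example, and it only models the payload split, not a real IP stack. It shows that only the fragment at offset 0 carries the UDP header, so a filter looking at later fragments on their own has no ports to match on.

```python
import struct

def fragment_payload(ip_payload: bytes, mtu: int, ip_header_len: int = 20):
    """Split an IP payload into (offset, more_fragments, data) tuples."""
    # Every fragment except the last carries a multiple of 8 bytes,
    # because the IPv4 fragment offset field counts 8-byte units.
    max_data = (mtu - ip_header_len) // 8 * 8
    fragments = []
    offset = 0
    while offset < len(ip_payload):
        chunk = ip_payload[offset:offset + max_data]
        more = offset + len(chunk) < len(ip_payload)
        fragments.append((offset, more, chunk))
        offset += len(chunk)
    return fragments

def ports_visible(offset: int, chunk: bytes):
    """Return (src_port, dst_port) if this fragment holds the UDP header."""
    if offset != 0 or len(chunk) < 4:
        return None  # later fragments start in the middle of the data
    return struct.unpack("!HH", chunk[:4])

# Hypothetical UDP datagram: 8-byte header plus 3000 bytes of data.
data = b"x" * 3000
udp_header = struct.pack("!HHHH", 5060, 5060, 8 + len(data), 0)
fragments = fragment_payload(udp_header + data, mtu=1500)

for offset, more, chunk in fragments:
    print(offset, len(chunk), more, ports_visible(offset, chunk))
```

With a 1500 byte MTU this yields three fragments, and only the first one exposes the ports.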

JKnott

Perhaps it's time to review the situation. You say you have a FreeBSD host connected to pfSense, both with a 1500 byte MTU.
Later you say you have a VPN that the packets are coming in on. What is the MTU on the VPN?

You also say:

But strangely, when I change the mtu to 1200, i get packet loss between 30% and 100%, sometimes fragmenting the packets, sometimes not…

I assume this is the VPN? Or is it on the FreeBSD host?

This is where Wireshark can come in handy. Can you run it on the FreeBSD host?

Ruu

just to sum it all up:

the setup was
Router ---(IPSec)--- PFSense ---(LAN)--- FreeBSD

every interface is set to MTU 1500.

ping -s 1472 (or smaller) from Router to FreeBSD -> ok (packet loss 0%)

ping -s 1473 (or larger) from Router to FreeBSD:
IPSec: fragmented, LAN: not fragmented -> dropped by FreeBSD (packet loss 100%)

reducing MTU on LAN i.e. to 1200:
IPSec: fragmented, LAN: sometimes fragmented -> packet loss between 30% and 100%

already filed as bug some months ago, posted by Derelict (https://redmine.pfsense.org/issues/7801 and probably related https://redmine.pfsense.org/issues/7779)
so nothing to do except waiting for a fix, hopefully in 2.4.3

just switched the affected application (sip server) to tcp until there is a fix available...
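
For anyone wondering why the cutoff sits exactly between 1472 and 1473: the ping payload gets an 8 byte ICMP echo header and a 20 byte IPv4 header (assuming no IP options), so 1472 bytes of payload is exactly a 1500 byte packet. A small Python sketch of that arithmetic, with helper names invented for the example:

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, ICMP echo header

def ping_packet_size(payload: int) -> int:
    """Total IPv4 packet size for 'ping -s payload'."""
    return payload + ICMP_HEADER + IPV4_HEADER

def needs_fragmentation(payload: int, mtu: int = 1500) -> bool:
    """True if the resulting packet does not fit the MTU in one piece."""
    return ping_packet_size(payload) > mtu

def max_ping_payload(mtu: int) -> int:
    """Largest 'ping -s' value that still fits in a single packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER

print(ping_packet_size(1472), needs_fragmentation(1472))
print(ping_packet_size(1473), needs_fragmentation(1473))
print(max_ping_payload(1200))
```

By the same arithmetic, with the LAN MTU at 1200 anything above ping -s 1172 would no longer fit in one packet on that link.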

JKnott

Those packets seem large for SIP. Typically, a SIP packet is small, as it only contains 20 ms or so of audio. Regardless, what is that server running on? Normally path MTU discovery should prevent oversize packets from being used, but Windows, at least in W10, does not set the do not fragment flag on UDP or ICMP packets. Linux sets it on everything. Without the DF flag, PMTUD won't work and fragmentation will occur with oversize packets.
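
As a rough sketch of why the DF flag matters here, this is the decision an IPv4 router makes when a packet is larger than the next-hop MTU. The Packet class and the forward function are hypothetical names for illustration, not anything from pfSense or FreeBSD.

```python
from dataclasses import dataclass

@dataclass
class Packet:
    size: int            # total IP packet size in bytes
    dont_fragment: bool  # state of the DF flag in the IPv4 header

def forward(packet: Packet, next_hop_mtu: int) -> str:
    """Return what an IPv4 router does with the packet."""
    if packet.size <= next_hop_mtu:
        return "forward"
    if packet.dont_fragment:
        # The ICMP error (type 3, code 4) tells the sender the usable MTU,
        # which is what lets PMTUD shrink later packets.
        return "drop + ICMP fragmentation needed"
    # Without DF the sender never learns the path MTU.
    return "fragment"

print(forward(Packet(1501, dont_fragment=True), 1500))
print(forward(Packet(1501, dont_fragment=False), 1500))
```

Only the DF case gives the sender any feedback, which is why UDP without DF keeps producing fragments.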

Derelict

The investigation which triggered that bug report was due to SIP packets.

Some SIP packets can be large enough to prompt fragmentation.

It sounds like you are describing RTP, not SIP.

JKnott

^^^^
I am aware of the difference between SIP & RTP. However, many people use SIP to mean the entire connection, not separating functions. Also, while SIP can be carried over UDP, it's typically on TCP. Either way, I'm still curious as to why PMTUD wasn't used to avoid the problem, as the world is moving to it, to avoid the issues with fragmentation. As I mentioned, Linux uses it for everything, but Windows appears to use it for just TCP.

lst_hoe

I suspect we are suffering from the same issue. Our VoIP PBX does not work in some cases with VPN-connected clients because "oversized" SIP UDP datagrams never actually reach the client software. We have both the server and the client on Windows and a pfSense box as the IPsec VPN gateway. The problem arises from two sides IMHO:

Windows does not set the DF bit on UDP traffic, so PMTUD never kicks in.
It looks like pfSense reassembles fragmented UDP datagrams and passes them on as "oversized" UDP inside fragmented ESP.

The receiving end decrypts the ESP fragments but silently throws away the oversized UDP datagram, because it is bigger than the MTU of the interface it should leave on and has no DF bit set. According to https://tools.ietf.org/html/rfc4303#section-3.3.4 I suspect that reassembly before encryption is only necessary for AH (transport mode), not for ESP tunnel mode. So maybe the smartest move would be to pass fragments as-is in tunnel mode and to fragment the ESP packets as needed too, no?

DEHAAS

@lst_hoe Sorry for replying to such an old thread, but did you ever find a solution here? I suspect I am seeing the same issue on 23.01. (https://forum.netgate.com/topic/180105/fragmentation-issue-on-ipsec-vti-tunnel)

stephenw10

VTI didn't exist 5 years ago so any solution that might have been appropriate then cannot apply to your situation.

Steve