Problems with MTU and dropped packets

Ruu

I have a FreeBSD host directly connected to pfSense. On both sides the MTU is set to 1500, but when a larger UDP packet is sent through pfSense, it is not fragmented; it is just sent as-is and then silently dropped by the receiving host.

This is a packet that goes through: IP 192.168.178.12.5095 > 10.11.5.18.5095: UDP, length 1472 (wireshark: 192.168.178.12 10.11.5.18 SIP/SDP 1514 Status: 200 OK)
This one gets dropped by the FreeBSD host: IP 192.168.178.12.5095 > 10.11.5.18.5095: UDP, length 1473 (wireshark: 192.168.178.12 10.11.5.18 SIP/SDP 1515 Status: 200 OK)

I expected any packet that's too large for the host to be fragmented, because the same MTU is set on both interfaces. What could cause this, and how can I tell pfSense to fragment any packet larger than the MTU?

Actually pfsense even receives it as a fragmented packet on another interface, reassembles it and sends the too large packet that gets dropped…

JKnott

1473 should be accepted by a computer with a 1500 byte MTU. What happens if you send different packet sizes from pfSense? You can do this with ping, using the -l option on Windows or -s on Linux to set the ping size.

Ruu

That the 1473-byte packet gets dropped is correct (1473 + 28 bytes of IP and UDP header overhead = 1501 bytes total). But instead of just sending the 1501-byte packet, pfSense should split it into two fragments (the "don't fragment" flag is not set).
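
The size arithmetic in this post can be checked with a few lines of Python. This is only a sketch: it assumes a 20-byte IPv4 header without options and the standard 8-byte UDP header, and the function names are made up for illustration.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
UDP_HEADER = 8    # bytes

def ip_packet_size(udp_payload: int) -> int:
    """Total IPv4 packet size for a UDP datagram with the given payload length."""
    return udp_payload + UDP_HEADER + IPV4_HEADER

def fits_mtu(udp_payload: int, mtu: int = 1500) -> bool:
    """True if the packet can leave the interface without being fragmented."""
    return ip_packet_size(udp_payload) <= mtu

for payload in (1472, 1473):
    print(payload, ip_packet_size(payload), fits_mtu(payload))
```

The same 28-byte overhead applies to ping, because the ICMP echo header is also 8 bytes long, so `ping -s 1472` produces a 1500-byte IP packet.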

I tried ping with different packet sizes > 1472, with the same result (100% packet loss). But strangely, when I change the MTU to 1200, I get packet loss between 30% and 100%; sometimes the packets are fragmented, sometimes not…

JKnott

@Ruu:

That the 1473-byte packet gets dropped is correct (1473 + 28 bytes of IP and UDP header overhead = 1501 bytes total). But instead of just sending the 1501-byte packet, pfSense should split it into two fragments (the "don't fragment" flag is not set).

The MTU refers to the payload of the Ethernet frame. So the total frame size is 1500 bytes plus the Ethernet header and CRC.

Here's info:
https://en.wikipedia.org/wiki/Maximum_transmission_unit#MTUs_for_common_media

It must also not be confused with the maximum size for the physically transmitted frame. In the case of an Ethernet frame this adds an overhead of 18 bytes, or 22 bytes with an IEEE 802.1Q tag for VLAN or quality of service.
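
To relate the MTU to the frame sizes seen on the wire, here is a small sketch of the overhead described in that quote. It assumes a standard Ethernet II header (14 bytes) plus a 4-byte FCS, and it ignores minimum-frame padding; the helper name is hypothetical. Captures usually do not include the FCS, which would explain the 1514 and 1515 values Wireshark reported earlier in the thread.

```python
ETH_HEADER = 14  # destination MAC + source MAC + EtherType
ETH_FCS = 4      # frame check sequence (CRC)
VLAN_TAG = 4     # IEEE 802.1Q tag

def frame_size(ip_packet: int, vlan: bool = False, include_fcs: bool = True) -> int:
    """Ethernet frame size for an IP packet of the given length (no padding)."""
    size = ip_packet + ETH_HEADER
    if vlan:
        size += VLAN_TAG
    if include_fcs:
        size += ETH_FCS
    return size

print(frame_size(1500))                     # full frame on the wire
print(frame_size(1500, vlan=True))          # with an 802.1Q tag
print(frame_size(1500, include_fcs=False))  # as a capture without FCS shows it
print(frame_size(1501, include_fcs=False))  # the oversized packet from the first post
```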

Ruu

Well, the problem remains the same: small packets go through, bigger ones get sent out and dropped instead of being fragmented by pfSense. So far I haven't found out what might be wrong in the configuration to cause this behaviour…

ifconfig shows
igb2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=6400bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>

JKnott

@Ruu:

Well, the problem remains the same: small packets go through, bigger ones get sent out and dropped instead of being fragmented by pfSense.

If it's not larger than what pfSense is configured for, then it won't fragment, as it doesn't know there's a problem. Also, these days, fragmentation is often not used, as the do not fragment bit is set, causing pfSense to discard the packet and send back an ICMP message, advising the proper MTU for the next hop. In fact, this could happen, if there are multiple routers, each with a smaller next hop.
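
The decision described here can be written down as a tiny function. This is a sketch of textbook IPv4 forwarding behaviour, not of what pfSense actually does internally; the names are invented for the example.

```python
from enum import Enum

class Action(Enum):
    FORWARD = "forward unchanged"
    FRAGMENT = "fragment, then forward"
    DROP_ICMP = "drop and send ICMP 'fragmentation needed'"

def forwarding_decision(packet_size: int, egress_mtu: int, df_set: bool) -> Action:
    """What an IPv4 router is expected to do with a packet on the outgoing link."""
    if packet_size <= egress_mtu:
        return Action.FORWARD      # fits, nothing to do
    if df_set:
        return Action.DROP_ICMP    # too big and the sender forbids fragmentation
    return Action.FRAGMENT         # too big, fragmentation is allowed

print(forwarding_decision(1500, 1500, df_set=False))
print(forwarding_decision(1501, 1500, df_set=False))
print(forwarding_decision(1501, 1500, df_set=True))
```

By this logic a 1501-byte packet without DF on a 1500-byte link should come out as two fragments, which is the behaviour the original poster expected.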

Bottom line, if pfSense is configured for 1500, then those packets should be passed without problem. Are you certain the destination is configured for 1500?

What happens if you ping with various size packets from pfSense to that computer? Or from another computer on the same network?
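
A size sweep like this is easy to script. The sketch below only builds the command lines and does not send anything. It assumes the usual ping options (`-s` on FreeBSD and Linux, `-l` on Windows, and `-D`, `-M do` and `-f` respectively for setting the DF bit); please check the local man page before relying on them. The helper names and the default sizes are made up.

```python
def ping_command(host, payload, flavor="freebsd", dont_fragment=False, count=3):
    """Build a ping command line for the given payload size (nothing is executed)."""
    if flavor == "windows":
        cmd = ["ping", "-n", str(count), "-l", str(payload)]
        if dont_fragment:
            cmd.append("-f")
    elif flavor in ("freebsd", "linux"):
        cmd = ["ping", "-c", str(count), "-s", str(payload)]
        if dont_fragment:
            cmd += ["-D"] if flavor == "freebsd" else ["-M", "do"]
    else:
        raise ValueError(f"unknown flavor: {flavor}")
    return cmd + [host]

def sweep(host, sizes=(1400, 1472, 1473, 1600), **kwargs):
    """One command per payload size, straddling the 1472-byte boundary."""
    return [ping_command(host, size, **kwargs) for size in sizes]

for cmd in sweep("10.11.5.18"):
    print(" ".join(cmd))
```

Each list can be handed to `subprocess.run` on the machine doing the test; comparing results with and without the DF option shows whether fragmentation or the path MTU is the problem.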

Derelict

What pfSense version?

That sounds similar to this:

https://redmine.pfsense.org/issues/7779

And along the same lines as this:

https://redmine.pfsense.org/issues/7801

JKnott

@Derelict:

Did the OP mention a VPN? Regardless, if both pfSense interfaces have a 1500 MTU, it won't fragment or reject those packets.

Ruu

@Derelict:

And along the same lines as this:

https://redmine.pfsense.org/issues/7801

Thank you Derelict, this must be the same issue; the test traffic is actually coming from an IPsec tunnel.
I failed to mention this because everything seemed to be OK on the IPsec interface.

The pfSense version was originally 2.3.4, then updated and retested with 2.4.1.

Derelict

Note how if the traffic comes over IPsec reassembled it works fine. But that is up to the other side.

That is probably why we don't see this come up more - because pfSense scrubs/reassembles before sending it over IPsec so pfSense-to-pfSense IPsec works in this case.

JKnott

@Derelict:

That's a violation of how it's supposed to work. Only the destination is supposed to reassemble the fragments. Otherwise, you'd have a router reassemble a packet, only to have another router further on fragment it again. That's on top of wasting the router CPU time. In fact, this is a big reason why IPv6 routers are not allowed to fragment a packet. If it's too big, drop it and send back an ICMP error message. IPv4 is moving to that as well. If you look at the traffic you'll see the do not fragment flag set on everything from Linux and TCP from Windows 10. I have no idea why UDP and ICMP don't use the DF flag in Windows.
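
For anyone who wants to check the flag in a capture by hand, the DF and MF bits and the fragment offset all sit in bytes 6 and 7 of the IPv4 header. Here is a minimal parsing sketch; the sample header is built by hand purely for illustration (the addresses are the ones from the first post, the ID and TTL are arbitrary).

```python
import socket
import struct

def parse_ipv4_fragment_fields(header: bytes) -> dict:
    """Extract length, ID, DF/MF flags and fragment offset from an IPv4 header."""
    if len(header) < 20:
        raise ValueError("an IPv4 header is at least 20 bytes long")
    total_length, identification, flags_frag = struct.unpack("!HHH", header[2:8])
    return {
        "total_length": total_length,
        "id": identification,
        "df": bool(flags_frag & 0x4000),              # don't fragment
        "mf": bool(flags_frag & 0x2000),              # more fragments follow
        "offset_bytes": (flags_frag & 0x1FFF) * 8,    # offset is stored in 8-byte units
    }

# Hand-built example header: version/IHL, TOS, total length, ID, flags+offset,
# TTL, protocol (17 = UDP), checksum (left at 0), source, destination.
sample = struct.pack(
    "!BBHHHBBH4s4s",
    0x45, 0, 1500, 0x1234, 0x4000, 64, 17, 0,
    socket.inet_aton("192.168.178.12"),
    socket.inet_aton("10.11.5.18"),
)
print(parse_ipv4_fragment_fields(sample))
```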

Harvy66

@JKnott:

Maybe a violation for a router, but a firewall has an obligation to inspect the entire packet before forwarding it. There are attack vectors if a firewall does not reassemble the packet. I remember one year there were all kinds of crazy security issues in the tech news because firewalls were allowing fragments through without inspecting the entire packet.

JKnott

@Harvy66:

Does pfSense do deep packet inspection? As I recall, firewalls work by remembering the outgoing connections and accepting incoming packets for those connections (stateful firewall). This requires only looking at the headers. Is it actually inspecting the entire packet, looking at the contents etc.? If not, then it doesn't need to reassemble the entire packet. Regardless, if path MTU discovery (PMTUD) is used, as is often the case these days, then fragmentation shouldn't happen. With PMTUD, an oversize packet will be discarded, not fragmented.

Harvy66

Remember, the packet gets fragmented at the IP layer, which means the TCP/UDP port info is only in the first fragment.
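
A small sketch of how an IP payload is split makes this visible. It assumes a 20-byte IP header and the rule that every fragment except the last carries a multiple of 8 bytes; the function name is hypothetical.

```python
UDP_HEADER = 8  # bytes, sits at the very start of the IP payload

def fragment(payload_len: int, mtu: int = 1500, ip_header: int = 20):
    """Return (offset_bytes, data_length, more_fragments) for each fragment."""
    max_data = (mtu - ip_header) // 8 * 8  # non-final fragments: multiple of 8 bytes
    fragments = []
    offset = 0
    while offset < payload_len:
        length = min(max_data, payload_len - offset)
        more = offset + length < payload_len
        fragments.append((offset, length, more))
        offset += length
    return fragments

# The 1473-byte UDP payload from the first post, plus its UDP header.
for offset, length, more in fragment(UDP_HEADER + 1473):
    has_ports = offset == 0  # only the fragment at offset 0 contains the UDP header
    print(offset, length, more, has_ports)
```

With a 1500-byte MTU the datagram becomes one fragment with 1480 bytes of data and a second one with a single byte, and only the first of them carries the port numbers a firewall rule would match on.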

https://en.wikipedia.org/wiki/IP_fragmentation_attack

Proper firewalling could be done without reassembly, but it is very complex, and complexity is the enemy of security. Forcing the firewall to reassemble also prevents well-known DoS attacks.

JKnott

Perhaps it's time to review the situation. You say you have a FreeBSD host connected to pfSense, both with a 1500 byte MTU.
Later you say you have a VPN that the packets are coming in on. What is the MTU on the VPN?

You also say:

But strangely, when I change the mtu to 1200, i get packet loss between 30% and 100%, sometimes fragmenting the packets, sometimes not…

I assume this is the VPN? Or is it on the FreeBSD host?

This is where Wireshark can come in handy. Can you run it on the FreeBSD host?

Ruu

@JKnott:

just to sum it all up:

the setup was
Router ---(IPsec)--- pfSense ---(LAN)--- FreeBSD

every interface is set to MTU 1500.

ping -s 1472 (or smaller) from Router to FreeBSD -> ok (packet loss 0%)

ping -s 1473 (or larger) from Router to FreeBSD:
IPSec: fragmented, LAN: not fragmented -> dropped by FreeBSD (packet loss 100%)

reducing the MTU on the LAN, e.g. to 1200:
IPSec: fragmented, LAN: sometimes fragmented -> packet loss between 30% and 100%

This was already filed as a bug some months ago, posted by Derelict (https://redmine.pfsense.org/issues/7801 and the probably related https://redmine.pfsense.org/issues/7779),
so there is nothing to do except wait for a fix, hopefully in 2.4.3.

I just switched the affected application (SIP server) to TCP until a fix is available...

JKnott

Those packets seem large for SIP. Typically, a SIP packet is small, as it only contains 20 ms or so of audio. Regardless, what is that server running on? Normally path MTU discovery should prevent oversize packets from being used, but Windows, at least in W10, does not set the do not fragment flag on UDP or ICMP packets. Linux sets it on everything. Without the DF flag, PMTUD won't work and fragmentation will occur with oversize packets.
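
How PMTUD converges can be shown with a toy simulation. This is a sketch under simplifying assumptions: the sender always sets DF, every router reports its next-hop MTU in the ICMP error, and no ICMP message gets lost. The path MTU values below are invented for the example.

```python
def first_bottleneck(packet_size, path_mtus):
    """MTU of the first hop that cannot carry the packet, or None if it gets through."""
    for mtu in path_mtus:
        if packet_size > mtu:
            return mtu  # this router drops the packet and reports its next-hop MTU
    return None

def discover_path_mtu(initial_size, path_mtus):
    """Shrink the packet size until it passes every hop; return (size, attempts)."""
    size = initial_size
    attempts = [size]
    while True:
        reported = first_bottleneck(size, path_mtus)
        if reported is None:
            return size, attempts
        size = reported  # retry with the MTU from the ICMP error
        attempts.append(size)

print(discover_path_mtu(1500, [1500, 1400, 1280]))
```

Without the DF bit none of those ICMP errors are generated, so the sender never learns the smaller size and the oversize packets have to be fragmented somewhere along the path instead.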

Derelict

The investigation which triggered that bug report was due to SIP packets.

Some SIP packets can be large enough to prompt fragmentation.

It sounds like you are describing RTP, not SIP.

JKnott

^^^^
I am aware of the difference between SIP & RTP. However, many people use SIP to mean the entire connection, not separating functions. Also, while SIP can be carried over UDP, it's typically on TCP. Either way, I'm still curious as to why PMTUD wasn't used to avoid the problem, as the world is moving to it, to avoid the issues with fragmentation. As I mentioned, Linux uses it for everything, but Windows appears to use it for just TCP.

lst_hoe

I suspect we are suffering from the same issue. Our VoIP PBX does not work in some cases with VPN-connected clients because an "oversized" SIP UDP datagram never actually reaches the client software. We have both the server and the client on Windows and a pfSense box as the IPsec VPN gateway. As I see it, the problem arises from two sides:

Windows does not set the DF bit on UDP traffic, so PMTUD never kicks in.
It looks like pfSense reassembles fragmented UDP datagrams and passes them on as one "oversized" UDP datagram inside fragmented ESP.

The receiving end decrypts the ESP fragments but silently throws away the oversized UDP datagram, because it is bigger than the MTU of the interface it should leave on and has no DF bit set. According to https://tools.ietf.org/html/rfc4303#section-3.3.4 I suspect that reassembly before encryption is only necessary for AH (transport mode), not for ESP tunnel mode. So maybe the smartest move would be to pass fragments as-is in tunnel mode and also fragment the ESP packets as needed, no?