IPv6 outbound fragmentation issue - appears to be pf related

  • Hello

    I'm having an odd issue with IPv6 fragmentation. My LAN connections are all Ethernet, and my WAN connection is via PPPoE with an MTU of 1492 (and MSS clamping set to 1452 as per my ISP's instructions).
    When I try to ping an external host with

    ping -6 [host] -s 1444

    everything works fine (as expected), and wireshark reports that the outgoing ICMPv6 echo request has a length of 1506 bytes, i.e. 1492 after the 14 byte ethernet header is accounted for. When I increase the size to above 1444 I receive an ICMPv6 Packet Too Big message from the internal pfSense interface (again, as expected), and the subsequent ICMPv6 echo request packet is fragmented into one with a length of 1502 bytes and one with a length of 76 bytes - however now I get a more generic ICMPv6 Destination Unreachable error (bbfc is my computer, d4ee is pfSense, 4ad3 is a remote Linode instance).
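    The frame lengths above follow from the header sizes. As a minimal sketch (assuming a 40-byte IPv6 header, an 8-byte fragment header, an 8-byte ICMPv6 echo header, and a sender that fills each non-final fragment to the largest multiple of 8 bytes that fits in the MTU; the function name is made up for illustration):

```python
ETHERNET_HEADER = 14      # bytes, included in the length Wireshark reports
IPV6_HEADER = 40
FRAGMENT_HEADER = 8       # IPv6 fragment extension header
ICMPV6_ECHO_HEADER = 8


def frame_lengths(ping_size, mtu=1492):
    """Return the Ethernet frame lengths for an echo request carrying ping_size data bytes."""
    icmp_len = ICMPV6_ECHO_HEADER + ping_size
    if IPV6_HEADER + icmp_len <= mtu:
        # Fits in one packet: no fragment header is needed.
        return [ETHERNET_HEADER + IPV6_HEADER + icmp_len]
    # Every fragment except the last must carry a multiple of 8 bytes.
    per_fragment = (mtu - IPV6_HEADER - FRAGMENT_HEADER) // 8 * 8
    frames = []
    offset = 0
    while offset < icmp_len:
        chunk = min(per_fragment, icmp_len - offset)
        frames.append(ETHERNET_HEADER + IPV6_HEADER + FRAGMENT_HEADER + chunk)
        offset += chunk
    return frames


print(frame_lengths(1444))  # [1506]
print(frame_lengths(1446))  # [1502, 76]
```

    Under these assumptions, a 1502-byte frame followed by a 76-byte frame corresponds to a ping size of 1446, and each fragment on its own is below the 1492-byte MTU.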
    I tried again with scrub disabled in the advanced options - this did not help.
    I then (temporarily) disabled pf - and the problem went away.

    After re-enabling pf the problem returned. I made sure that the default deny rule was logged, and then set all of my blocking rules to log. I validated this by running

    grep block /tmp/rules.debug | grep -v log

    (i.e. lines which contain block but not log) in the command prompt and the only lines returned were comments. I looked at the firewall log after testing again (with an ICMPv6 packet size > 1444) and the packets showed up as being passed.

    I then modified /etc/inc/filter.inc, firstly to create a blank rules.debug (no traffic was passed, as expected), and then to create only "pass out all" (or something similar - I can't remember exactly) and all traffic was passed (as expected), including IPv6 fragments.

    This led me to believe that pf is the culprit somehow, as it doesn't appear to be an MTU or PMTUD issue if it works when pf is disabled.

    I then restored /etc/inc/filter.inc to its normal state, and then tried two more tests:
    I pinged (again with a packet size large enough to cause fragmentation) a host (named ns1 for future reference) on another VLAN within my local network, i.e. the MTUs of both links are 1500 bytes. This was successful (the fragments were forwarded).
    I then artificially lowered the MTU on the interface for the subnet that ns1 is on to 1492 on the pfSense router and tried the test again. This time it didn't work - and I didn't even receive an ICMPv6 Packet Too Big error. I checked the firewall log and it was blocked this time - and all I'd changed was the MTU?
    I then pinged ns1 from outside my network (the remote Linode instance). It received an ICMPv6 Packet Too Big message and set its fragment sizes accordingly. In this case the firewall passed the fragments in, but blocked the outbound echo reply, despite the packet lengths being identical for the request and reply fragments (LAN side shown here - 2c15 is ns1).
    On the WAN side the sizes for the request fragments were 10 bytes lower.
    I thought this was odd, but I looked at the raw output from the pfSense packet capture and nothing seemed unusual there:

    LAN - IP addresses replaced with a [name]:
    20:20:27.947594 IP6 [linode_server] > [ns1]: frag (0|1440) ICMP6, echo request, seq 5, length 1440
    20:20:27.947599 IP6 [linode_server] > [ns1]: frag (1440|20)
    20:20:27.947805 IP6 [ns1] > [linode_server]: frag (0|1440) ICMP6, echo reply, seq 5, length 1440
    20:20:27.947809 IP6 [ns1] > [linode_server]: frag (1440|20)
    20:20:27.947826 IP6 [pfSense] > [ns1]: ICMP6, packet too big, mtu 1492, length 1240

    WAN (captured after the LAN capture) - IP addresses replaced with a [name]:
    20:20:57.643614 IP6 [linode_server] > [ns1]: frag (0|1440) ICMP6, echo request, seq 34, length 1440
    20:20:57.643619 IP6 [linode_server] > [ns1]: frag (1440|20)

    But in any case I know that if wireshark reports a length of 1502 bytes then it will be successfully forwarded as it can forward packets up to a wireshark-reported length of 1506 bytes.
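    The sizes in the capture can be checked with a short sketch. It assumes tcpdump's `frag (offset|length)` notation, where the length is the payload carried after the fragment header, and a 40-byte IPv6 header plus an 8-byte fragment header; the helper name is hypothetical:

```python
import re

# The two reply fragments from the LAN capture above.
CAPTURE = """\
20:20:27.947805 IP6 [ns1] > [linode_server]: frag (0|1440) ICMP6, echo reply, seq 5, length 1440
20:20:27.947809 IP6 [ns1] > [linode_server]: frag (1440|20)
"""

FRAG = re.compile(r"frag \((\d+)\|(\d+)\)")
IPV6_HEADER = 40
FRAGMENT_HEADER = 8


def fragment_spans(capture):
    """Return (offset, length) pairs for every fragment line in the capture text."""
    return [(int(m.group(1)), int(m.group(2))) for m in FRAG.finditer(capture)]


spans = fragment_spans(CAPTURE)
# Size of each fragment as an IPv6 packet (without the Ethernet header).
fragment_packets = [IPV6_HEADER + FRAGMENT_HEADER + length for _, length in spans]
# Size the packet would have if the fragments were reassembled into one.
reassembled_packet = IPV6_HEADER + max(offset + length for offset, length in spans)

print(fragment_packets)    # [1488, 68]
print(reassembled_packet)  # 1500
```

    Each fragment on its own fits within the 1492-byte MTU (1488 + 14 = 1502 on the wire), while a reassembled packet would be 1500 bytes and would not. This only restates the arithmetic; it does not show what pf actually does with the fragments.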

    My conclusion thus far is that pf appears to be blocking outbound fragments, but I can't figure out why. I feel like I'm missing something obvious.

    Any help would be appreciated.

    Thank you

  • @whitburnlg

    A while ago, a similar issue was discussed WRT UDP, IIRC. I don't recall the details, but it might have been due to the way TCP vs other is handled. With TCP, when a packet is fragmented, the position and count are changed to reflect the fragmentation. That doesn't happen with anything else. So, there is nothing to tie the fragments together. The first gets through, but the 2nd doesn't as the packet filter doesn't know how to handle it. Take a look at the header of the 2nd fragment to see what it contains re ICMP message type and compare with the first.

    Also, you really don't need to hide IPv6 addresses the way you might with IPv4. With SLAAC, you have a different address every day for outgoing connections and you can generate new ones just by rebooting. This means an attacker has to find a working address somewhere among the 18.4 billion billion addresses of a /64 prefix. That will take quite a bit of time to scan through.

    BTW, why don't people ever have even issues? 😉

  • @JKnott
    Thank you for your reply.
    I did see some things like this and this. I tried pinging the remote host from pfSense with a large packet size and that didn't work unless I disabled pf.
    My issue does seem very similar to 8165 (in that no fragments are being passed, not even the one with the ICMPv6 header) except that was apparently fixed in 2.4.4 (and I'm running 2.4.5-RELEASE-p1). Maybe it could be related to 7801?
    Plus, I was under the impression that the scrub feature in pf attempts to reassemble fragments into a whole packet before checking it against the firewall rules. Maybe something is trying to forward the (too large) reassembled packet for some reason?
    I've repeated the ping from WAN to LAN test (with a packet size of 4096 this time) and have included the captures here (WAN) and here (LAN) (filtered using !tcp && !tls && !openvpn && !udp && !icmp in wireshark to filter out the irrelevant traffic - although the WAN one still has the dpinger traffic).
    The ICMPv6 header only appears in one fragment (for each ICMPv6 request / reply), but each fragment has the Identification field in the fragment header set correctly (i.e. the same for each fragment).
    The servers in question have AAAA records thus the IP addresses don't change, but I suppose since they appear in DNS I don't really need to redact them then.
    If I had an "even" issue I probably would have fixed it by now ☺

  • @whitburnlg

    I just tried pinging ipv6.google.com with 1500 bytes and got the fragmentation and no reply, as expected. However, I did not see the destination unreachable messages on either the LAN or WAN side. I also didn't see any response to the ping on the WAN side, which means that the ping is likely being dropped elsewhere.

    One thing to bear in mind is you're unlikely to see such a thing in practice, as most non-TCP traffic wouldn't be long enough to cause this. About the only exception I can think of would be something like TFTP, which is typically used within a LAN for things like booting up diskless systems. Even VoIP phones wouldn't require much data.

  • @JKnott
    When I ping ipv6.google.com I get a reply but only ever up to 68 data bytes (no matter the amount of data I send). I'm assuming that's a bandwidth saving feature from Google though...
    The reason I was trying to get it working is because of DNSSEC responses from a name server on the LAN side of my firewall. I have a workaround in place (lowering the max-udp-size and edns-udp-size in bind); I just wanted to see if I could get it working properly.
    I'll keep trying and update this if I have any success. Thanks again for your help.
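
    For the workaround, the relevant bound is the largest UDP payload that still fits in a single IPv6 packet on the link. A minimal sketch, assuming a 40-byte IPv6 header, an 8-byte UDP header and no extension headers (the function name is made up, and the values actually configured in bind are not stated in this thread):

```python
IPV6_HEADER = 40
UDP_HEADER = 8


def max_unfragmented_udp_payload(mtu):
    """Largest UDP payload that fits in one IPv6 packet of the given MTU."""
    return mtu - IPV6_HEADER - UDP_HEADER


print(max_unfragmented_udp_payload(1492))  # 1444, the PPPoE link in this thread
print(max_unfragmented_udp_payload(1280))  # 1232, the IPv6 minimum MTU
```

    A DNS response at or below the first figure should not need fragmenting on the 1492-byte WAN link; the second figure is the conservative choice for paths whose MTU is unknown.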
