Issues with IKEv2, MSCHAPv2, Windows 10, and UDP packet size



  • Hi,
    I've been having some weird issues getting IKEv2 set up. It looks like pfSense is dropping reassembled UDP packets with a length of 1620?

    I followed the guide at https://doc.pfsense.org/index.php/IKEv2_with_EAP-MSCHAPv2#Set_up_Mobile_IPsec_for_IKEv2.2BEAP-MSCHAPv2

    The WAN interface is a tagged VLAN attached to an L2 switch. I also observed this behavior with the interface configured as untagged, and tested with scrubbing both on and off.

    Packet logs are below:

    On Windows, I'm getting error 809, but before that I see "connection successfully established".
    On pfSense, in the IPsec logs, I see (IPs removed):

    Oct 15 16:18:29 charon 02[JOB] next event in 29s 999ms, waiting
    Oct 15 16:18:29 charon 01[NET] sending packet: from xxx.xxx.xxx.wan [500] to yyy.yyy.yyy.client[500]
    Oct 15 16:18:29 charon 10[MGR] <25> checkin of IKE_SA successful
    Oct 15 16:18:29 charon 10[MGR] <25> checkin IKE_SA (unnamed)[25]
    Oct 15 16:18:29 charon 10[NET] <25> sending packet: from xxx.xxx.xxx.wan[500] to yyy.yyy.yyy.client[500] (337 bytes)
    Oct 15 16:18:29 charon 10[ENC] <25> generating IKE_SA_INIT response 0 [ SA KE No N(NATD_S_IP) N(NATD_D_IP) CERTREQ N(MULT_AUTH) ]

    On my local machine, I see the response arrive in Wireshark. My system then sends the IKE_AUTH request three times, and none of them appear to be received.

    No.    Time          Source                Destination          Protocol Length Info
        217 19.477167      yyy.yyy.yyy.client        xxx.xxx.xxx.wan      ISAKMP  922    IKE_SA_INIT MID=00 Initiator Request
        218 19.499955      xxx.xxx.xxx.wan      yyy.yyy.yyy.client        ISAKMP  379    IKE_SA_INIT MID=00 Responder Response
        219 19.503516      yyy.yyy.yyy.client        xxx.xxx.xxx.wan      IPv4    1514  Fragmented IP protocol (proto=UDP 17, off=0, ID=646f) [Reassembled in #220]
        220 19.503540      yyy.yyy.yyy.client        xxx.xxx.xxx.wan      ISAKMP  182    IKE_AUTH MID=01 Initiator Request
        226 20.502970      yyy.yyy.yyy.client        xxx.xxx.xxx.wan      IPv4    1514  Fragmented IP protocol (proto=UDP 17, off=0, ID=6470) [Reassembled in #227]
        227 20.503028      yyy.yyy.yyy.client        xxx.xxx.xxx.wan      ISAKMP  182    IKE_AUTH MID=01 Initiator Request
        231 21.507730      yyy.yyy.yyy.client        xxx.xxx.xxx.wan      IPv4    1514  Fragmented IP protocol (proto=UDP 17, off=0, ID=6471) [Reassembled in #232]
        232 21.507782      yyy.yyy.yyy.client        xxx.xxx.xxx.wan      ISAKMP  182    IKE_AUTH MID=01 Initiator Request

    However, I do see them arrive in a packet capture on the firewall, where they seem to be dropped. Possibly because the UDP length is too long?

    16:18:29.221836 IP yyy.yyy.yyy.client_nat.500 > xxx.xxx.xxx.wan.500: UDP, length 880
    16:18:29.225505 IP xxx.xxx.xxx.wan.500 > yyy.yyy.yyy.client_nat.500: UDP, length 337
    16:18:29.253890 IP yyy.yyy.yyy.client_nat > xxx.xxx.xxx.wan: ip-proto-17
    16:18:29.253930 IP yyy.yyy.yyy.client_nat.4500 > xxx.xxx.xxx.wan.4500: UDP, bad length 1620 > 1472
    16:18:30.252108 IP yyy.yyy.yyy.client_nat.4500 > xxx.xxx.xxx.wan.4500: UDP, bad length 1620 > 1472
    16:18:30.252149 IP yyy.yyy.yyy.client_nat > xxx.xxx.xxx.wan: ip-proto-17
    16:18:31.252835 IP yyy.yyy.yyy.client_nat.4500 > xxx.xxx.xxx.wan.4500: UDP, bad length 1620 > 1472
    16:18:31.257487 IP yyy.yyy.yyy.client_nat > xxx.xxx.xxx.wan: ip-proto-17
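
The sizes in the two captures above can be cross-checked with a little arithmetic. This is only a sketch, assuming a standard 1500-byte MTU, a 14-byte Ethernet header (which Wireshark includes in its Length column), and a 20-byte IPv4 header without options. The function name is made up for illustration. Under those assumptions the 1620-byte UDP payload splits into exactly the 1514-byte and 182-byte frames Wireshark shows, and the "1620 > 1472" in the firewall capture looks consistent with tcpdump comparing the declared UDP length against the payload present in the first fragment alone.

```python
ETH_HEADER = 14   # Ethernet II header, counted in Wireshark's Length column
IP_HEADER = 20    # IPv4 header without options (assumption)
UDP_HEADER = 8
MTU = 1500        # standard Ethernet MTU (assumption)


def fragment_frame_sizes(udp_payload, mtu=MTU):
    """Return the on-wire Ethernet frame size of each IPv4 fragment."""
    remaining = UDP_HEADER + udp_payload   # bytes the IP layer has to carry
    # Every fragment except the last must carry a multiple of 8 bytes.
    max_chunk = (mtu - IP_HEADER) // 8 * 8
    frames = []
    while remaining > 0:
        chunk = min(remaining, max_chunk)
        frames.append(ETH_HEADER + IP_HEADER + chunk)
        remaining -= chunk
    return frames


# Largest UDP payload that fits in a single unfragmented packet.
max_unfragmented_payload = MTU - IP_HEADER - UDP_HEADER

print(max_unfragmented_payload)      # 1472
print(fragment_frame_sizes(880))     # [922]        IKE_SA_INIT request
print(fragment_frame_sizes(337))     # [379]        IKE_SA_INIT response
print(fragment_frame_sizes(1620))    # [1514, 182]  IKE_AUTH request
```

So the captures agree with each other: the IKE_AUTH request is about 150 bytes too large for one packet and leaves the client as two IP fragments.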



  • Update:

    The UDP fragmentation seems to be caused by Windows 10's lack of support for IKE fragmentation. https://wiki.strongswan.org/projects/strongswan/wiki/FAQ#Public-key-authentication-fails-with-retransmissions-2
    I also investigated decreasing the packet size as per https://forum.pfsense.org/index.php?topic=128520.0 , but we're currently using MSCHAPv2 instead of client certs, so this was of limited benefit.

    The real issue is that pfSense wasn't reassembling the "too large" packets. They only show up in a packet capture at all when it runs in promiscuous mode.
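
For anyone checking their own firewall capture for the same symptom, here is a minimal sketch that scans tcpdump text output for "bad length" lines like the ones quoted earlier in this thread. The function name and the sample list are only for illustration, and the regular expression assumes the exact wording tcpdump printed in my capture.

```python
import re

# Matches tcpdump text lines such as:
#   "... > xxx.xxx.xxx.wan.4500: UDP, bad length 1620 > 1472"
BAD_LENGTH = re.compile(r"UDP, bad length (\d+) > (\d+)")


def oversized_datagrams(lines):
    """Yield (declared_udp_length, bytes_present) for each 'bad length' line."""
    for line in lines:
        match = BAD_LENGTH.search(line)
        if match:
            yield int(match.group(1)), int(match.group(2))


# Two lines copied from the capture in the first post.
sample = [
    "16:18:29.221836 IP yyy.yyy.yyy.client_nat.500 > xxx.xxx.xxx.wan.500: UDP, length 880",
    "16:18:29.253930 IP yyy.yyy.yyy.client_nat.4500 > xxx.xxx.xxx.wan.4500: UDP, bad length 1620 > 1472",
]

hits = list(oversized_datagrams(sample))
print(hits)  # [(1620, 1472)]
```

If this finds hits on port 500 or 4500 but the IPsec log never shows the matching IKE_AUTH request, the fragments are probably arriving without being reassembled, which is what I was seeing.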

    I ended up disabling Hardware Checksum Offloading, and the packets are now reassembled properly.

    For reference, this is on a Supermicro X10SDV-8C-TLN4F board, with a Chelsio T520-SO-CR network card serving the WAN interface.



  • Another update:
    I reinstalled the firewall from scratch, and everything works fine. For about 10 minutes. Then I observe the symptoms from https://forum.pfsense.org/index.php?topic=117827.15
    I also see the state table for the IPsec interface full of nonsensical entries.
    This seems to affect ONLY TCP replies to an IPsec mobile client. ICMP and UDP are unaffected, as is downlink TCP.
    Testing with iperf, I observe 200mb/s down, and a single packet up.

    Edit:
      I've resolved this.

    My current configuration uses RADIUS and MSCHAPv2 credentials, so the same user connects from multiple devices with identical credentials.
    These were getting mapped to the same SA, apparently causing forwarding weirdness?

    The fix was to set peer identifier to peer IP, and replace SA to never.
    Finally, to get Windows 10 working, I needed to disable hardware checksum offloading. This is with a Chelsio T520-SO-CR, WAN on a VLAN, on an LACP lagg, so I may be poking an edge case. It reported bad UDP checksums on the fragments, and pfSense didn't even see them when not in promiscuous mode.

    Is there a wiki or something where I can contribute troubleshooting steps and known working settings? The failure modes were not what I expected, which made this take much longer to troubleshoot.
    I expected that either only one client would work or they all would, not that all of them would work for download while state tracking broke.