Issues with IKEv2, MSCHAPv2, Windows 10, and UDP packet size
I've been having some weird issues getting IKEv2 set up. It looks like pfSense is dropping reassembled UDP packets with a length of 1620?
The WAN interface is a tagged VLAN, attached to an L2 switch. I also observed this behavior with the interface configured as untagged, and tested with scrubbing both on and off.
Packet logs are below:
On Windows, I'm getting error 809, but before that I see "connection successfully established".
On pfSense, in the IPsec logs, I see: (IPs removed)
Oct 15 16:18:29 charon 02[JOB] next event in 29s 999ms, waiting
Oct 15 16:18:29 charon 01[NET] sending packet: from xxx.xxx.xxx.wan  to yyy.yyy.yyy.client
Oct 15 16:18:29 charon 10[MGR] <25> checkin of IKE_SA successful
Oct 15 16:18:29 charon 10[MGR] <25> checkin IKE_SA (unnamed)
Oct 15 16:18:29 charon 10[NET] <25> sending packet: from xxx.xxx.xxx.wan to yyy.yyy.yyy.client (337 bytes)
Oct 15 16:18:29 charon 10[ENC] <25> generating IKE_SA_INIT response 0 [ SA KE No N(NATD_S_IP) N(NATD_D_IP) CERTREQ N(MULT_AUTH) ]
On my local machine, I see the packet arrive in Wireshark. My system then sends its response three times, and none of them appear to be received.
No. Time Source Destination Protocol Length Info
217 19.477167 yyy.yyy.yyy.client xxx.xxx.xxx.wan ISAKMP 922 IKE_SA_INIT MID=00 Initiator Request
218 19.499955 xxx.xxx.xxx.wan yyy.yyy.yyy.client ISAKMP 379 IKE_SA_INIT MID=00 Responder Response
219 19.503516 yyy.yyy.yyy.client xxx.xxx.xxx.wan IPv4 1514 Fragmented IP protocol (proto=UDP 17, off=0, ID=646f) [Reassembled in #220]
220 19.503540 yyy.yyy.yyy.client xxx.xxx.xxx.wan ISAKMP 182 IKE_AUTH MID=01 Initiator Request
226 20.502970 yyy.yyy.yyy.client xxx.xxx.xxx.wan IPv4 1514 Fragmented IP protocol (proto=UDP 17, off=0, ID=6470) [Reassembled in #227]
227 20.503028 yyy.yyy.yyy.client xxx.xxx.xxx.wan ISAKMP 182 IKE_AUTH MID=01 Initiator Request
231 21.507730 yyy.yyy.yyy.client xxx.xxx.xxx.wan IPv4 1514 Fragmented IP protocol (proto=UDP 17, off=0, ID=6471) [Reassembled in #232]
232 21.507782 yyy.yyy.yyy.client xxx.xxx.xxx.wan ISAKMP 182 IKE_AUTH MID=01 Initiator Request
However, I do see them arrive in a packet capture on the firewall. They then seem to be dropped, possibly because the UDP length is too long?
16:18:29.221836 IP yyy.yyy.yyy.client_nat.500 > xxx.xxx.xxx.wan.500: UDP, length 880
16:18:29.225505 IP xxx.xxx.xxx.wan.500 > yyy.yyy.yyy.client_nat.500: UDP, length 337
16:18:29.253890 IP yyy.yyy.yyy.client_nat > xxx.xxx.xxx.wan: ip-proto-17
16:18:29.253930 IP yyy.yyy.yyy.client_nat.4500 > xxx.xxx.xxx.wan.4500: UDP, bad length 1620 > 1472
16:18:30.252108 IP yyy.yyy.yyy.client_nat.4500 > xxx.xxx.xxx.wan.4500: UDP, bad length 1620 > 1472
16:18:30.252149 IP yyy.yyy.yyy.client_nat > xxx.xxx.xxx.wan: ip-proto-17
16:18:31.252835 IP yyy.yyy.yyy.client_nat.4500 > xxx.xxx.xxx.wan.4500: UDP, bad length 1620 > 1472
16:18:31.257487 IP yyy.yyy.yyy.client_nat > xxx.xxx.xxx.wan: ip-proto-17
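For anyone trying to line up the two captures, here is a quick arithmetic sketch in Python. It assumes a standard 1500-byte Ethernet MTU, a 20-byte IPv4 header with no options, and a 14-byte Ethernet header (the function names are just mine for illustration). Under those assumptions, the 1472 limit in the tcpdump output and the 1514/182 frame lengths in Wireshark both fall out of a single 1620-byte UDP payload being split into two IP fragments.

```python
# Assumed values: standard Ethernet MTU and header sizes, IPv4 without options.
MTU = 1500
IP_HEADER = 20
UDP_HEADER = 8
ETH_HEADER = 14


def fragment_sizes(udp_payload, mtu=MTU):
    """Return the IP payload bytes carried by each fragment of one UDP datagram."""
    remaining = udp_payload + UDP_HEADER
    # Every fragment except the last must carry a multiple of 8 bytes.
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    sizes = []
    while remaining > 0:
        chunk = min(per_fragment, remaining)
        sizes.append(chunk)
        remaining -= chunk
    return sizes


def frame_lengths(udp_payload, mtu=MTU):
    """Return the captured Ethernet frame length of each fragment (no FCS)."""
    return [size + IP_HEADER + ETH_HEADER for size in fragment_sizes(udp_payload, mtu)]


# Largest UDP payload that fits in one unfragmented packet.
max_unfragmented = MTU - IP_HEADER - UDP_HEADER

print(max_unfragmented)      # 1472
print(fragment_sizes(1620))  # [1480, 148]
print(frame_lengths(1620))   # [1514, 182]
```

So the IKE_AUTH request is leaving the client as expected, and "bad length 1620 > 1472" is tcpdump looking at the first fragment on its own, not necessarily a sign of a corrupt packet.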
The IP fragmentation of these UDP packets seems to be caused by Windows 10's lack of support for IKE fragmentation. https://wiki.strongswan.org/projects/strongswan/wiki/FAQ#Public-key-authentication-fails-with-retransmissions-2
I also investigated decreasing the packet size as per https://forum.pfsense.org/index.php?topic=128520.0 , but we're currently using MSCHAPv2 instead of client certs, so this was of limited benefit.
The real issue is that pfSense wasn't reassembling the "too large" packets. They only even appear in a packet dump in promiscuous mode.
I ended up disabling Hardware Checksum Offloading, and the packets are now reassembled properly.
For reference, this is on a Supermicro X10SDV-8C-TLN4F board, with a Chelsio T520-SO-CR network card serving the WAN interface.
Reinstalled the firewall from scratch, and everything works fine.
For about 10 minutes. Then I observe the symptoms from https://forum.pfsense.org/index.php?topic=117827.15
I see the state table for the IPsec interface full of nonsensical entries as well.
This seems to affect ONLY TCP replies to an IPsec mobile client. ICMP and UDP are unaffected, as is downlink TCP.
Testing with iperf, I observe 200mb/s down, and one packet up.
I've resolved this.
My current configuration uses RADIUS and MSCHAPv2 credentials, so there are multiple devices for the same user, with identical credentials.
These were getting mapped to the same SA, apparently causing forwarding weirdness?
The fix was to set the peer identifier to the peer IP, and to set replace SA to never.
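To show what I think was going on, here is a toy Python model. It is NOT how strongSwan or pfSense actually stores SAs; the table, the `build_sa_table` helper, and the sample addresses are all made up for illustration. It only shows why two devices sharing one identity collide when the lookup key is the identity alone, and stop colliding once the key is the peer IP.

```python
def build_sa_table(connections, key):
    """Toy SA table: later connections overwrite earlier ones with the same key."""
    table = {}
    for conn in connections:
        table[key(conn)] = conn["device"]
    return table


# Hypothetical example: one user, same credentials, two devices.
connections = [
    {"identity": "alice", "peer_ip": "198.51.100.10", "device": "laptop"},
    {"identity": "alice", "peer_ip": "198.51.100.20", "device": "phone"},
]

by_identity = build_sa_table(connections, key=lambda c: c["identity"])
by_peer_ip = build_sa_table(connections, key=lambda c: c["peer_ip"])

print(len(by_identity))  # 1 (the phone entry replaced the laptop entry)
print(len(by_peer_ip))   # 2 (each device keeps its own entry)
```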
Finally, to get Windows 10 working, I needed to disable hardware checksum offloading. This is with a Chelsio T520-SO-CR, with the WAN on a VLAN, on an LACP lagg, so I may be poking an edge case. The packet capture reported bad UDP checksums on the fragments, and pfSense didn't even see them when not in promiscuous mode.
Is there a wiki or something where I can contribute troubleshooting steps and known working settings? The failure modes were not what I expected, which made this take much longer to troubleshoot.
I expected that either only one client would work, or they all would. I did not expect all of them to work for download while state tracking broke.