Another update:
Reinstalled the firewall from scratch, and everything works fine.
For about 10 minutes. Then I observe the symptoms from https://forum.pfsense.org/index.php?topic=117827.15
I also see the state table for the IPsec interface full of nonsensical entries.
This seems to affect ONLY TCP replies to an IPsec mobile client. ICMP and UDP are unaffected, as is downlink TCP.
Testing with iperf, I observe 200 Mb/s down, and a single packet up.
Edit:
I've resolved this.
My current configuration uses RADIUS with MSCHAPv2 credentials, so the same user can have multiple devices connected with identical credentials.
These were getting mapped to the same SA, apparently causing the forwarding weirdness?
The fix was to set the peer identifier to peer IP, and to set SA replacement to never.
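To illustrate why both settings matter, here is a rough toy model in Python. It is not strongSwan or pfSense code, and it does not reproduce the half-working forwarding I actually saw. It only assumes that SAs are tracked by peer identity and that the policy names behave roughly like strongSwan's documented `uniqueids` values ("replace", "keep", "never"). The class, function, user name, and IP addresses are all made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class SaTable:
    """Toy table of IKE SAs keyed by peer identity (hypothetical model)."""
    policy: str = "replace"  # "replace", "keep", or "never"
    sas: list = field(default_factory=list)  # entries are (identity, peer_ip)

    def connect(self, identity: str, peer_ip: str) -> bool:
        duplicates = [sa for sa in self.sas if sa[0] == identity]
        if duplicates and self.policy == "replace":
            # The new SA evicts every older SA that uses the same identity.
            self.sas = [sa for sa in self.sas if sa[0] != identity]
        elif duplicates and self.policy == "keep":
            # The older SA wins and the new connection is rejected.
            return False
        # With "never", duplicates are simply allowed to coexist.
        self.sas.append((identity, peer_ip))
        return True


def active_peers(policy, use_ip_as_identity, clients):
    """Return the peer IPs that still have an SA after all clients connect."""
    table = SaTable(policy=policy)
    for user, ip in clients:
        identity = ip if use_ip_as_identity else user
        table.connect(identity, ip)
    return sorted(ip for _, ip in table.sas)


# Two devices of the same (made-up) user, sharing one set of credentials.
clients = [("alice", "198.51.100.10"), ("alice", "198.51.100.20")]

print(active_peers("replace", False, clients))  # shared identity, replace
print(active_peers("never", False, clients))    # shared identity, never
print(active_peers("replace", True, clients))   # peer IP used as identity
```

In this model, only the last device keeps an SA when the identity is shared and the policy is "replace". Both devices keep one when the policy is "never" or when the peer IP is used as the identity.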
Finally, to get Windows 10 working, I needed to disable hardware checksum offloading. This is with a Chelsio T520-SO-CR, with WAN on a VLAN on an LACP LAGG, so I may be poking an edge case. It reported bad UDP checksums on the fragments, and pfSense didn't even see them when not in promiscuous mode.
Is there a wiki or something where I can contribute troubleshooting steps and known working settings? The failure modes were not what I expected, which made this take much longer to troubleshoot.
I expected that either only one client would work or all of them would, not that all of them would work for downloads while state tracking broke.