Site to Site IPsec IKEv2 MTU/MSS clarification



  • I have two SG-3100s, both under my control (labeled pfSense below). Headquarters is behind NAT with the edge router forwarding UDP 500 & 4500, and the SG-3100 is the edge router at the remote site. I am at the remote site.
    pf_netmap.JPG
    P1 and P2 seem happy, no errors. End workstations can ping each other fine. A ping test shows that 1472 bytes can traverse the tunnel; 1473 fails with "needs to be fragmented but DF set". Without DF set, I can watch a giant 60 KB ping fragment successfully and get a response.
    Config ("remote" at top)
    Inkedpf_config_LI.jpg
    Status ("remote" at top)
    InkedInkedpf_status_LI.jpg
    However, I am having crushing packet loss across the tunnel once I try to actually do anything, either direction. Loading the login screen of a remote switch often won't complete, and takes 20 seconds when it works. It's bad. SMB is a fail fest of duplicate ACKs, retransmissions, and all kinds of garbage in Wireshark. Some traffic trickles through. (Seems like a lot of 566 byte packets at the remote workstation if that means anything) From PC 10.0.0.12, trying to load the switch web interface from 192.168.100.2 exhibits this packet loss.

    More config / testing notes:

    • All adapters are currently at default MTU/MSS
    • MSS clamping in IPSEC | Advanced -- have tried values as low as 1200 with no improvement.
    • Set an HQ server to 1300 MTU; it did not change anything when trying to access it from remote.
    • Have tried "Disable hardware checksum offload" on both ends, doesn't seem to change anything.
    • Have tried all manner of MTU/MSS settings, in IPsec | Advanced, as well as on LAN and WAN adapters. Was particularly disappointed that reducing the WAN MTU on the NAT-ted HQ didn't solve it.

    Guidance on how I can end this misery and get back to a productive life?  A couple pointed questions:

    • I've read that poking at the IPSEC config too much can cause problems that only a factory reset will cure. True or urban legend? (It would be a bit of a hassle to do this on the HQ side since I'm not there)
    • At the shell, the SG-3100 cannot ping anything on the other side of the tunnel - is this as expected? Same for both. Does this indicate some misconfig?
    • Since HQ is behind a router, it would seem I'd have to reduce MTU on pfSense WAN(?) so as not to have fragmentation on the little transport network between pfSense and the edge router? Right?
    • I have not tried VTI. Would it make any difference?
    • Any reason to try values lower than 1200 on MSS?
    • I do not control the USG Pro - could there be some config missing / wrong there? Any particularly easy to overlook config in setting up this forwarding?
    • It seems to me that the MTU of some interface would definitely need to be reduced. Which?
    • Could "IP Do-Not-Fragment compatibility" or "Disable Firewall Scrub" be useful here?
    • I do not understand the use case for "NAT/BINAT translation".. could that be needed here?

    Thank you so much to anyone who wants to try to help. I've been at this for days, and I know you know this is a special kind of suffering, and I feel like I'm at that point where I might/must be overlooking an obvious solution. Happy to provide other info, configs, logs.
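
As a sanity check on the ping numbers in the post above, here is a minimal sketch of the size arithmetic. It assumes IPv4 with no IP options and a standard ICMP echo header; the function names are made up for illustration.

```python
IPV4_HEADER = 20  # bytes, assuming an IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo request/reply header


def ping_packet_size(payload: int) -> int:
    """Total IPv4 packet size produced by a ping with the given payload."""
    return payload + ICMP_HEADER + IPV4_HEADER


def largest_ping_payload(mtu: int) -> int:
    """Largest ping payload that fits in one unfragmented packet at this MTU."""
    return mtu - IPV4_HEADER - ICMP_HEADER


print(ping_packet_size(1472))      # 1500
print(ping_packet_size(1473))      # 1501
print(largest_ping_payload(1500))  # 1472
```

A 1472-byte payload is exactly a 1500-byte packet, so the 1472/1473 boundary may simply reflect the workstation's own 1500-byte interface MTU rather than the effective MTU of the tunnel.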



  • Update:

    • I tried a VTI. NAT stopped working.
    • I spun up a mobile client and it works fine (getting close to wire speed too, at about 200 Mbps. Unfortunately it's 3DES since it's the Windows client. Any tips on a better / faster cipher appreciated.)

    That last one seems important. Maybe I have a routing problem and not a fragmentation problem on my S2S? Any tips appreciated.



  • @tjcooks4829 said in Site to Site IPsec IKEv2 MTU/MSS clarification:

    Guidance on how I can end this misery and get back to a productive life?  A couple pointed questions:

    I've read that poking at the IPSEC config too much can cause problems that only a factory reset will cure. True or urban legend? (It would be a bit of a hassle to do this on the HQ side since I'm not there)

    The factory reset is an urban legend, but I have found that rebooting the box does cure some strangeness after making many changes to the configuration, particularly in earlier versions.

    At bash shell, SG-3100 can not ping anything on the other side of the tunnel - is this as expected? Same for both. Does this indicate some misconfig?

    You need to explicitly set the source interface to your LAN side when you do this or use the GUI, otherwise it might be using the WAN interface to source the packets.

    Since HQ is behind a router, it would seem I'd have to reduce MTU on pfSense WAN(?) so as not to have fragmentation on the little transport network between pfSense and the edge router? Right?

    It depends... Cable modems tend to run 1500 MTU while DSL and FTTH services tend to be PPPoE encapsulated so 1492.
    A better test would be to run an MTU test against an outside test system independently from each end to see if there is a mismatch.

    I have not tried VTI. Would it make any difference?

    VTI is a routed approach, so the tunnel only knows about the IP address of each end of the tunnel, then you need to route the appropriate subnets into the tunnel.

    Any reason to try values lower than 1200 on MSS?

    Probably not.

    I do not control the USG Pro - could there be some config missing / wrong there? Any particularly easy to overlook config in setting up this forwarding?

    IPsec needs to be running with NAT-T, since the USG Pro is doing NAT.

    It seems to me that the MTU of some interface would definitely need to be reduced. Which?

    Only if MTU discovery is blocked, in which case someone has blocked ICMP packets.
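
The per-end MTU test suggested in the reply above can be automated as a binary search over ping payload sizes. This is only a sketch: `probe` is a hypothetical callable that should return True when a don't-fragment ping of the given payload size gets a reply (on Windows that would wrap something like `ping -f -l <size> <host>`). Here it is replaced by a simulated path so the logic can be checked without a network.

```python
from typing import Callable, Optional


def largest_passing_payload(probe: Callable[[int], bool],
                            low: int = 0,
                            high: int = 1472) -> Optional[int]:
    """Binary-search the largest payload for which probe(size) succeeds.

    Assumes probe is monotonic: if a size passes, every smaller size passes.
    Returns None when even the smallest size fails.
    """
    if not probe(low):
        return None
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low


def simulated_probe(path_mtu: int) -> Callable[[int], bool]:
    """Stand-in for a real DF ping: passes if payload + 28 header bytes fit."""
    return lambda payload: payload + 28 <= path_mtu


# Simulated PPPoE-style path with a 1492-byte MTU.
payload = largest_passing_payload(simulated_probe(1492))
print(payload, payload + 28)  # 1464 1492
```

Running the same search from each site against an outside host would show whether the two ends see different path MTUs.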



  • @awebster Thank you so much, great info.

    I've abandoned S2S for now, as I've spent way too much time on it and have to deal with a bunch of stuff that has piled up in the meantime.

    Mobile client is working (almost) perfectly, and I'm super pleased with the throughput.

    A couple responses:

    • Oh boy, have I rebooted. The managed switch is telling me 140 link state changes since the last time I rebooted the switch. :-) Mostly because I've read some messages with confusion about how to properly restart IPsec, and a reboot means it restarted for sure.
    • MTU... no packet loss using the mobile client, all defaults. My cable modem (remote side) is 1500. HQ is a fiber connection that I don't manage, but between pfSense and the USG Pro I have confirmed that it's 1500. Even so, it seems like I should have to account for encapsulation overhead... but it seems to be working. I mean, maybe the USG is just handling the fragmentation well, but I feel like I would not have the performance that I'm getting if so.

    Cheers
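
On the encapsulation overhead question, here is a rough sketch of the arithmetic for ESP in tunnel mode. The cipher parameters are assumptions, since the thread's actual Phase 2 settings are only in the screenshots: AES-CBC (16-byte IV, 16-byte block), a 16-byte truncated ICV as with HMAC-SHA-256-128, and NAT-T UDP encapsulation because HQ is behind NAT. The function names are illustrative only.

```python
def max_inner_packet(outer_mtu: int,
                     iv_len: int = 16,     # assumed: AES-CBC IV
                     block: int = 16,      # assumed: AES block size
                     icv_len: int = 16,    # assumed: HMAC-SHA-256-128 ICV
                     nat_t: bool = True) -> int:
    """Largest inner IPv4 packet whose ESP tunnel packet fits in outer_mtu."""
    outer_ip = 20                   # outer IPv4 header, no options
    udp = 8 if nat_t else 0         # NAT-T wraps ESP in UDP 4500
    esp_header = 8                  # SPI + sequence number
    fixed = outer_ip + udp + esp_header + iv_len + icv_len
    room = outer_mtu - fixed        # space left for the encrypted part
    room -= room % block            # ciphertext must be whole cipher blocks
    return room - 2                 # minus pad-length and next-header bytes


def tcp_mss(ip_mtu: int) -> int:
    """TCP MSS for an IPv4 packet size, assuming 20-byte IP and TCP headers."""
    return ip_mtu - 40


inner = max_inner_packet(1500)
print(inner, tcp_mss(inner))  # 1422 1382
```

Under these assumptions a 1500-byte path leaves room for inner packets of about 1422 bytes, so an MSS clamp around 1380 would avoid fragmentation, and a clamp of 1200 is already well below that. Different ciphers shift the numbers by a few bytes.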

