SSH connections over IPSec hang: how to configure MTU for IPSec properly?

vinc.pii

We have an IPSec tunnel configured to reach a remote network.

Both phase 1 and phase 2 are established correctly. However, when I try to SSH to a remote machine, SSH hangs after showing these messages:


debug1: Local version string SSH-2.0-OpenSSH_7.2 FreeBSD-20160310
debug1: Remote protocol version 2.0, remote software version OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.8
debug1: match: OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.8 pat OpenSSH_6.6.1* compat 0x04000000
debug1: Authenticating to XXX as YYY
debug1: SSH2_MSG_KEXINIT sent

This behavior is 100% reproducible.

The MTU of the NIC is 1500.
If I try to set it to 1400, then SSH works perfectly with no issues.
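One likely reason 1400 works while 1500 hangs is the encapsulation overhead: in tunnel mode, ESP wraps the whole inner packet in a new IP header plus the ESP header, IV, padding, trailer and ICV, so a full 1500-byte inner packet cannot fit in a 1500-byte outer packet. The sketch below estimates the largest inner packet and TCP MSS that fit. The cipher (AES-CBC), integrity algorithm (HMAC-SHA1-96) and NAT-T setting are assumptions, since the thread does not say which phase 2 proposal is in use.

```python
# Rough ESP tunnel-mode overhead estimate. ASSUMPTIONS (not from the thread):
# IPv4 outer header, AES-CBC (16-byte IV and block), HMAC-SHA1-96 (12-byte ICV),
# optional NAT-T UDP encapsulation. Real overhead depends on the negotiated proposal.

OUTER_IPV4 = 20
ESP_HEADER = 8      # SPI + sequence number
ESP_TRAILER = 2     # pad length + next header
NATT_UDP = 8        # UDP header added by NAT traversal


def max_inner_packet(outer_mtu: int, iv: int = 16, block: int = 16,
                     icv: int = 12, nat_t: bool = False) -> int:
    """Largest inner IP packet that fits in one outer packet of outer_mtu bytes."""
    room = outer_mtu - OUTER_IPV4 - (NATT_UDP if nat_t else 0)
    room -= ESP_HEADER + iv + icv
    # The encrypted part (inner packet + padding + trailer) must be a
    # multiple of the cipher block size, so round down.
    encrypted = (room // block) * block
    return encrypted - ESP_TRAILER


def max_inner_mss(outer_mtu: int, **kw) -> int:
    # Subtract the inner IPv4 header (20) and a TCP header without options (20).
    return max_inner_packet(outer_mtu, **kw) - 40


if __name__ == "__main__":
    for nat_t in (False, True):
        pkt = max_inner_packet(1500, nat_t=nat_t)
        print(f"NAT-T={nat_t}: inner packet <= {pkt}, MSS <= {max_inner_mss(1500, nat_t=nat_t)}")
    # prints:
    # NAT-T=False: inner packet <= 1438, MSS <= 1398
    # NAT-T=True: inner packet <= 1422, MSS <= 1382
```

Under these assumptions an inner MTU of 1400 leaves some headroom, which would be consistent with the behaviour described above.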

Unfortunately, changing the NIC MTU on all the machines that should connect to the remote endpoint is not feasible.
Changing it on the interface we use on pfSense to establish phase 1 is also not a viable option (and I am not sure it would work anyway).

I need a permanent fix.
I hoped that MSS clamping in the IPSec settings would help, but it doesn't (we tried values as low as 1200).

We are running pfSense 2.3.1.

What other options can I try to solve this problem? I don't have control of the configuration on the remote side, but I can ask for changes there.

Some additional information:

1. The remote end of the IPSec tunnel is a Cisco device with an interface MTU of 1500.

2. Ping with large packets works in both directions:


/root: ping -s 3000 -S <phase2_ip> XX.XX.XX.XX
PING XX.XX.XX.XX (XX.XX.XX.XX) from <phase2_ip>: 3000 data bytes
3008 bytes from XX.XX.XX.XX: icmp_seq=0 ttl=60 time=225.563 ms
3008 bytes from XX.XX.XX.XX: icmp_seq=1 ttl=60 time=225.007 ms
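Note that `ping -s 3000` produces a 3028-byte IPv4 datagram (3000 bytes of payload plus the 8-byte ICMP and 20-byte IP headers), which the sender fragments because FreeBSD's ping does not set Don't Fragment (DF) unless `-D` is given. A successful reply therefore shows that fragments cross the tunnel, not that a full-size unfragmented packet does. TCP stacks normally set DF for path MTU discovery, so a closer test is a DF ping sweep. A minimal sketch that binary-searches the largest payload answered with DF set, assuming FreeBSD's ping(8) (`-D` sets DF, `-c 1` sends one echo, `-S` picks the source address as in the command above); `make_ping_probe`, `largest_df_payload` and `path_mtu` are hypothetical helper names:

```python
import subprocess
from typing import Callable, Optional

ICMP_IPV4_OVERHEAD = 28  # 20-byte IPv4 header + 8-byte ICMP echo header


def make_ping_probe(host: str, source: Optional[str] = None) -> Callable[[int], bool]:
    """Build a probe around FreeBSD ping(8): True if one DF echo gets a reply."""
    def probe(payload: int) -> bool:
        cmd = ["ping", "-D", "-c", "1", "-s", str(payload)]
        if source:
            cmd += ["-S", source]  # source inside the phase 2 network
        cmd.append(host)
        # FreeBSD ping exits with 0 when at least one response was received.
        return subprocess.run(cmd, capture_output=True).returncode == 0
    return probe


def largest_df_payload(probe: Callable[[int], bool], lo: int = 0, hi: int = 1472) -> int:
    """Binary-search the largest payload that succeeds.
    Assumes probe(lo) succeeds and that success is monotonic in size."""
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if probe(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo


def path_mtu(probe: Callable[[int], bool], **kw) -> int:
    """Largest IP packet size (payload + headers) that got through with DF set."""
    return largest_df_payload(probe, **kw) + ICMP_IPV4_OVERHEAD
```

For example, `path_mtu(make_ping_probe("XX.XX.XX.XX", source="<phase2_ip>"))`, with the placeholders replaced by the real addresses. Injecting the probe as a function keeps the search testable without a network.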

3. If we try to establish multiple phase 2 entries from the same endpoint to different remote endpoints (one phase 2 per remote endpoint), only one of them works at a time (this may be completely unrelated).

4. If I connect to my company's network via VPN (e.g., from home), SSH works (probably because OpenVPN uses a lower MTU).

jimp

All you should have to do is activate the MSS clamping option under the advanced IPsec options.

That should perform MSS clamping on all traffic entering and exiting the VPN to/from the phase 2 networks. So unless the traffic somehow didn't match the MSS clamping rules, it should have worked.

You can look for "scrub" in /tmp/rules.debug to confirm that the MSS clamping rules are there, they should look like:


scrub from any to <vpn_networks> max-mss 1400
scrub from <vpn_networks> to any max-mss 1400
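The `max-mss` scrub rules act on TCP's MSS option, which is carried only in SYN and SYN-ACK segments: each host announces the largest segment it is willing to receive, and lowering that value in transit caps what the other end will send. A rough sketch of that rewrite on a raw TCP options field (MSS is option kind 2, length 4). It illustrates the technique only, it is not pf's code, and it omits the TCP checksum update that a real implementation must perform:

```python
def clamp_mss_option(options: bytes, max_mss: int) -> bytes:
    """Return TCP options with any MSS option lowered to at most max_mss.
    Sketch only: the caller would still have to fix the TCP checksum."""
    out = bytearray(options)
    i = 0
    while i < len(out):
        kind = out[i]
        if kind == 0:            # End of option list
            break
        if kind == 1:            # NOP: single byte, no length field
            i += 1
            continue
        if i + 1 >= len(out):
            break                # truncated option, leave untouched
        length = out[i + 1]
        if length < 2 or i + length > len(out):
            break                # malformed option, leave the rest untouched
        if kind == 2 and length == 4:
            mss = int.from_bytes(out[i + 2:i + 4], "big")
            if mss > max_mss:    # only ever lower the advertised MSS
                out[i + 2:i + 4] = max_mss.to_bytes(2, "big")
        i += length
    return bytes(out)
```

For a typical SYN advertising MSS 1460 followed by NOP, NOP and SACK-permitted, `clamp_mss_option(bytes([2, 4, 0x05, 0xB4, 1, 1, 4, 2]), 1400)` rewrites only the two MSS bytes to `0x05 0x78` (1400).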

vinc.pii

Thanks for the reply!

I can see the rules in the rules.debug file, so MSS clamping is configured.

To debug this further, I ran tcpdump on both SSH endpoints: the client on the pfSense machine and the server on a remote machine.

I can see that after some initial packets are exchanged correctly, when the server starts generating large packets (2948 bytes, with TSO enabled on the interface), the client doesn't receive them (hence the hang at "SSH2_MSG_KEXINIT sent").
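With TSO enabled, tcpdump on the sender captures the packet before the NIC splits it, so 2948 bytes is a "super-packet", not what goes on the wire. One plausible reading, assuming TCP timestamps are on (12 option bytes, giving a 32-byte TCP header) and a per-segment payload of 1448 bytes (the 1460-byte MSS of a 1500-byte MTU minus those 12 bytes), is two full-size 1500-byte packets, larger than what fits inside a 1500-byte ESP packet. If that reading is right, it would also suggest the server was not seeing a clamped MSS. A small sketch of the arithmetic; `tso_split` and its defaults are hypothetical:

```python
from typing import List


def tso_split(super_len: int, ip_hdr: int = 20, tcp_hdr: int = 32,
              seg_payload: int = 1448) -> List[int]:
    """IP packet sizes a NIC would emit when segmenting one TSO super-packet.
    ASSUMPTIONS: IPv4, TCP timestamps (32-byte TCP header), 1448-byte segments."""
    payload = super_len - ip_hdr - tcp_hdr
    sizes = []
    while payload > 0:
        seg = min(payload, seg_payload)
        # Each wire packet repeats the IP and TCP headers.
        sizes.append(ip_hdr + tcp_hdr + seg)
        payload -= seg
    return sizes
```

Here `tso_split(2948)` gives `[1500, 1500]`: 2948 - 20 - 32 = 2896 bytes of payload, i.e. exactly two 1448-byte segments.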

Is this an indication that the issue may be outside the pfSense "domain"?
What can be done on pfSense to induce the correct behavior (of course, the MSS on one side cannot control the other direction)?

Derelict

Fragmentation happens on the sending side. If they are sending and you are not receiving, there's nothing you can do on the receiving side to change that.
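The decision Derelict refers to is made where an oversized IPv4 packet is about to be transmitted: the sending host (or a device forwarding it) either splits it into fragments or, if DF is set, drops it and reports ICMP "fragmentation needed" (type 3, code 4) back to the source. The receiver only reassembles what arrives. A simplified sketch of that sender-side step, with a hypothetical `fragment` helper that raises instead of sending the ICMP message:

```python
from typing import List, Tuple


def fragment(total_len: int, mtu: int, ihl: int = 20,
             df: bool = False) -> List[Tuple[int, int, bool]]:
    """Split an IPv4 datagram of total_len bytes for a link with the given MTU.
    Returns (offset in 8-byte units, fragment total length, more-fragments flag)."""
    if total_len <= mtu:
        return [(0, total_len, False)]
    if df:
        # A real stack drops the packet and sends ICMP type 3 code 4.
        raise ValueError("DF set: fragmentation needed")
    data = total_len - ihl
    per_frag = (mtu - ihl) // 8 * 8  # non-final fragments carry multiples of 8 bytes
    if per_frag <= 0:
        raise ValueError("MTU too small to fragment")
    frags = []
    offset = 0
    while offset < data:
        chunk = min(per_frag, data - offset)
        more = offset + chunk < data
        frags.append((offset // 8, ihl + chunk, more))
        offset += chunk
    return frags
```

For the 3028-byte datagram produced by `ping -s 3000` on a 1500-byte link, `fragment(3028, 1500)` yields `[(0, 1500, True), (185, 1500, True), (370, 68, False)]`.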

vinc.pii

In the end we worked around this by changing the MTU on the target machines (the SSH servers), since we can afford an MTU change there (unlike on pfSense).