Just got a Protectli FW4C!

stephenw10

Well, it's above 100 Mbps at least, so it's not something restricting all traffic in the path. It may still be ESP traffic, though.

I would also test setting some MSS values on the tunnel. If you are seeing packet fragmentation it can really hurt throughput.

Steve
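To get a feel for which MSS values should avoid fragmentation, the ESP overhead can be added up by hand. This is a rough sketch, assuming IPv4 inner and outer headers without options, ESP tunnel mode with AES-GCM-128 (8-byte IV, 16-byte ICV) and a 1500-byte WAN MTU. `esp_mss` is a hypothetical helper, not a pfSense command, and the result is an upper bound:

```shell
# Rough upper bound for the TCP MSS that fits inside an IPsec ESP tunnel.
# Assumptions (not taken from pfSense): IPv4 outer and inner headers without
# options, ESP tunnel mode, AES-GCM-128 with an 8-byte IV and a 16-byte ICV.
# Usage: esp_mss WAN_MTU [NATT]   (NATT=1 adds the 8-byte UDP header of NAT-T)
esp_mss() {
  local mtu=$1 natt=${2:-0}
  local outer_ip=20 udp=0 esp_hdr=8 iv=8 icv=16 trailer=2 inner_hdrs=40
  if [ "$natt" -eq 1 ]; then
    udp=8
  fi
  # Space left for the encrypted part: inner packet + padding + 2-byte trailer
  local room=$(( mtu - outer_ip - udp - esp_hdr - iv - icv ))
  # ESP aligns that part to a multiple of 4 bytes, so round down
  room=$(( room - room % 4 ))
  # Remove the trailer, then the inner IPv4 (20) and TCP (20) headers
  echo $(( room - trailer - inner_hdrs ))
}
```

Under these assumptions `esp_mss 1500 0` prints 1406 and `esp_mss 1500 1` prints 1398. Any extra encapsulation on the WAN side (PPPoE, for example, which lowers the MTU to 1492) pushes the real limit lower still.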

michmoor

For what it's worth, I had a similar issue to yours with IPsec throughput. Moving to NAT-T, so that packets were encapsulated in UDP, helped a lot. Something in the path didn't like ESP and was clearly reducing my speed because of it.

TheWaterbug

@stephenw10 said in Just got a Protectli FW4C!:

Well, it's above 100 Mbps at least, so it's not something restricting all traffic in the path. It may still be ESP traffic, though.

I would also test setting some MSS values on the tunnel. If you are seeing packet fragmentation it can really hurt throughput.

Steve

@michmoor said in Just got a Protectli FW4C!:

For what it's worth, I had a similar issue to yours with IPsec throughput. Moving to NAT-T, so that packets were encapsulated in UDP, helped a lot. Something in the path didn't like ESP and was clearly reducing my speed because of it.

Thanks for both of your suggestions.

I turned on MSS clamping with a max value of 1392, and my best throughput increased from ~160 Mbps to ~220 Mbps:

[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-10.00  sec   262 MBytes   220 Mbits/sec                  sender
[  4]   0.00-10.00  sec   259 MBytes   217 Mbits/sec                  receiver
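As a sanity check on those numbers: iperf3 reports Transfer in MBytes of 2^20 bytes but Bandwidth in Mbits/sec of 10^6 bits, which is why 262 MBytes in 10 seconds comes out as 220 Mbits/sec rather than about 210. A small awk-based sketch of the conversion (`mbytes_to_mbps` is a hypothetical helper name):

```shell
# Convert an iperf3 "Transfer" figure (MBytes, 1 MByte = 2^20 bytes) and the
# interval length in seconds into Mbit/s (1 Mbit = 10^6 bits), rounded.
mbytes_to_mbps() {
  awk -v mb="$1" -v secs="$2" 'BEGIN { printf "%.0f\n", mb * 1048576 * 8 / secs / 1e6 }'
}
```

`mbytes_to_mbps 262 10` prints 220 and `mbytes_to_mbps 259 10` prints 217, matching the sender and receiver lines above. Because the MBytes figure is itself rounded, expect agreement only to within about 1 Mbit/s.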

Switching NAT-T from Auto to Force and back again did not change the results.

So it's getting better, but inch by inch.

stephenw10

You might try a much lower value, just to check. I have seen IPSec tunnels that require an MSS as low as 1100 to prevent fragmentation, though not over a route as short as 10 ms.
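Rather than guessing at MSS values, one way to check for fragmentation directly is to find the largest packet that crosses the WAN path with the Don't Fragment bit set. This is a sketch, assuming the pfSense/FreeBSD `ping` syntax run from the firewall shell as root (`-D` sets DF; on Linux the equivalent is `-M do`) and a placeholder `PEER` variable holding the far-end WAN address. `df_ping` and `max_df_payload` are hypothetical helpers:

```shell
# Succeeds if one ICMP echo with a SIZE-byte payload and the DF bit set gets a reply.
# FreeBSD/pfSense syntax; PEER is a placeholder for the remote WAN address.
df_ping() {
  ping -D -c 1 -s "$1" "$PEER" > /dev/null 2>&1
}

# Binary search for the largest payload that the probe command accepts.
# Usage: max_df_payload PROBE LO HI  (LO must be known to pass, HI is an upper bound)
max_df_payload() {
  local probe=$1 lo=$2 hi=$3 mid
  while [ "$lo" -lt "$hi" ]; do
    mid=$(( (lo + hi + 1) / 2 ))
    if "$probe" "$mid"; then
      lo=$mid            # this size got through, search higher
    else
      hi=$(( mid - 1 ))  # too big, search lower
    fi
  done
  echo "$lo"
}
```

For example, `PEER=203.0.113.1; max_df_payload df_ping 0 1472`. Add 28 bytes (20 for IPv4, 8 for ICMP) to the result to get the path MTU; 1472 means the full 1500 bytes get through, and anything lower is worth feeding into the MSS clamp calculation.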

michmoor

@stephenw10 For good measure, I would test another protocol, like WireGuard, if you can. I'm curious whether the low performance follows.

TheWaterbug

@stephenw10 said in Just got a Protectli FW4C!:

You might try a much lower value, just to check. I have seen IPSec tunnels that require an MSS as low as 1100 to prevent fragmentation, though not over a route as short as 10 ms.

Ok, I'll try that tonight. Does the MSS have to be set on both sides of the tunnel? And does the tunnel have to be disconnected and reconnected in order for the new value to take effect?

TheWaterbug

@michmoor said in Just got a Protectli FW4C!:

@stephenw10 For good measure, I would test another protocol, like WireGuard, if you can. I'm curious whether the low performance follows.

The problem with WG is that I don't have a baseline, and Protectli doesn't, either. So if I get some performance number, I won't know if it's higher, lower, or exactly as expected.

I also was not successful in setting it up last time I tried.

Whereas for IPSec, we have a Netgate person letting us know that I'm way under expectations.

But WG testing would be useful down the road, once I have IPSec established and optimized.

stephenw10

It should only need to be set on one side, but it doesn't hurt to set it on both.

michmoor

@thewaterbug Not sure if this was asked, but what Phase 2 parameters are you using?

TheWaterbug

@michmoor

Both Phase 1 and Phase 2 are AES-GCM-128, SHA256, and DH14.

michmoor

@thewaterbug Ahh, there's one more setting that helped a lot for me: PowerD. Enable it and set it to either Maximum or HiAdaptive.

When I was running OPNsense on a Protectli a year ago, I had problems with poor WireGuard performance. The recommendation was to enable this. Once I did, things moved a lot better.

stephenw10

AES-GCM doesn't require a separate hash for authentication; that's one of the reasons it's faster. You can remove that. It should just be ignoring it already, though.

TheWaterbug

@stephenw10

Ah yes. It was selected before, when I was using AES-CBC to work around the SG-1100/SafeXcel problem, and once I deselected AES-CBC and selected AES-GCM, the hash just stayed selected.

TheWaterbug

@michmoor

I'm already set to HiAdaptive on both sides. It doesn't make a difference in my test results.

TheWaterbug

@stephenw10 said in Just got a Protectli FW4C!:

AES-GCM doesn't require a separate hash for authentication; that's one of the reasons it's faster. You can remove that. It should just be ignoring it already, though.

Is this true for both Phase 1 and Phase 2? If so, I'm curious why the Phase 1 setup has a Hash selector when AES-GCM is chosen as the encryption.

stephenw10

It is true, but it doesn't really matter at Phase 1. The Phase 2 config is what actually governs the traffic over the tunnel once it's established.

TheWaterbug

@stephenw10

While I'm mulling over how to improve throughput on the MBT-2220 side, I thought I'd put the two FW4C units on the bench and try them out side by side, with only 6' of cabling between them, well under 1 ms of ping, and no other traffic.


The best I could achieve was 626 Mbps over a 10-hour period.

Things that puzzle me:

- Throughput seems to vary from run to run, despite there being very few variables in the setup.
  - There is no internet traffic, no routing outside of the two units, and not even a switch (I have the two WAN ports connected with a cable at 2500BaseT).
  - Sometimes a 10-second run will achieve ~720 Mbps.
  - Sometimes a 10-second run will achieve only ~300 Mbps.
- CPU utilization on both sender and receiver gets no higher than 80%, and core temperatures no higher than 61 °C, but I'm still getting significantly less than the ~980 Mbps reported by Protectli.
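With results swinging between ~300 and ~720 Mbps on a quiet bench, single 10-second runs are hard to compare. One option is to repeat each test several times and look at the spread rather than the best run. This sketch assumes `iperf3` and `jq` are installed on the client and an `iperf3 -s` server is already listening; `repeat_iperf` and `summarize_runs` are hypothetical helpers, and `.end.sum_received.bits_per_second` is the receiver-side total in iperf3's `-J` (JSON) output:

```shell
# Run iperf3 RUNS times against SERVER and print each run's receiver-side rate in Mbit/s.
# Usage: repeat_iperf SERVER [RUNS]   (assumes iperf3 and jq are on the PATH)
repeat_iperf() {
  local server=$1 runs=${2:-5} i
  for i in $(seq "$runs"); do
    iperf3 -c "$server" -t 10 -J | jq '.end.sum_received.bits_per_second / 1e6'
  done
}

# Read one number per line (Mbit/s) and print count, min, mean and max, rounded.
summarize_runs() {
  awk '
    { v = $1 + 0; sum += v
      if (NR == 1 || v < min) min = v
      if (NR == 1 || v > max) max = v }
    END { if (NR > 0) printf "n=%d min=%.0f mean=%.0f max=%.0f\n", NR, min, sum / NR, max }
  '
}
```

Usage: `repeat_iperf 192.0.2.10 10 | summarize_runs`, where the address is a placeholder for the iperf3 server. Comparing min and mean between configurations is less sensitive to one lucky run than comparing best results.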

Things I fiddled with that made no improvement:

- NAT-T
- MSS clamping
- Connecting the WAN ports through a 1000BaseT switch.
  - This reduced throughput by maybe 5 Mbps, but that might just be sampling error.
- Unchecking all the "Disable . . . " checkboxes in System > Advanced > Networking > Network Interface
- iperf simultaneous connections, e.g. "-P 2" or "-P 4". No improvement, and significant degradation above 4.
- iperf TCP window size, e.g. "-w 2M" or "-w 4M". No improvement.
- iperf direction, e.g. "-R". Performance is the same, and just as variable, in both directions.
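When sweeping knobs like these, it can help to keep every run identical except for one variable. This sketch (hypothetical `sweep_cmds` helper) only prints the iperf3 command lines, so they can be reviewed before running. It adds `-t 30` for longer runs and `-O 3`, which tells iperf3 to omit the first 3 seconds (TCP slow start) from the results. The server address is a placeholder:

```shell
# Print one iperf3 command per combination of parallel streams and direction.
# Usage: sweep_cmds SERVER            (review the list)
#        sweep_cmds SERVER | sh       (run all six tests in sequence)
sweep_cmds() {
  local server=$1 streams dir
  for streams in 1 2 4; do
    for dir in "" "-R"; do
      # ${dir:+ $dir} appends " -R" only for the reverse-direction runs
      echo "iperf3 -c $server -t 30 -O 3 -P $streams${dir:+ $dir}"
    done
  done
}
```

Holding duration and warm-up constant means any difference between lines reflects the stream count or direction, not when slow start happened to finish.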

Are there any other tunables that might improve things in this type of lab scenario?

My real goal is to maximize application throughput in the real world, where I have 2 ISPs, 8 miles, and 10 msec of ping between my two locations, but first I want to optimize in the lab to see what's possible.
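Once the tunnel spans 10 ms instead of well under 1 ms, the TCP window starts to matter. A single stream can carry at most one window of data per round trip, so sustaining a given rate needs a window of at least rate × RTT (the bandwidth-delay product). A quick calculator sketch (hypothetical helper names):

```shell
# Bytes in flight needed to sustain MBPS (Mbit/s) over a path with RTT_MS round-trip time.
# Usage: bdp_bytes MBPS RTT_MS
bdp_bytes() {
  awk -v mbps="$1" -v ms="$2" 'BEGIN { printf "%.0f\n", mbps * 1e6 * ms / 1000 / 8 }'
}

# Upper bound in Mbit/s for one TCP stream limited to WINDOW bytes per round trip.
# Usage: window_limit_mbps WINDOW_BYTES RTT_MS
window_limit_mbps() {
  awk -v w="$1" -v ms="$2" 'BEGIN { printf "%.1f\n", w * 8 * 1000 / ms / 1e6 }'
}
```

`bdp_bytes 1000 10` prints 1250000, so a gigabit over a 10 ms path needs about 1.25 MB in flight, while the same rate on the bench needs only a small fraction of that. Conversely, `window_limit_mbps 65536 10` prints 52.4: a 64 KiB window would cap a single stream at about 52 Mbit/s on the 10 ms path.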

michmoor

@thewaterbug If you do just an iperf test without the VPN, what do you get?
I gotta be honest with you: I've got a Protectli 2-port and a 6-port. Inter-VLAN at one house, I can get around 970 Mbps. Over an IPsec VLAN where the remote site is capped at 200/10, I can saturate that link with no issue. With local speed tests at each site, I can max out the connection with no issue.

Could it be possible that your FW4C is a dud?

TheWaterbug

@michmoor

I didn't test through a port forward, and I took one of the units out of the lab, but I can do that test on Monday. I might also be using iperf incorrectly.

Right now I've returned one of the FW4C units into service at my house, with the tunnel up between it and the MBT-2220 at the main office, and I'm watching a Veeam Backup Copy Job in progress.

This tunnel will iperf from FW4C-->MBT-2220 at ~200 Mbps, and the other way, MBT-2220-->FW4C, at ~135 Mbps. But this Veeam copy job is showing throughput in the "slow" direction (MBT-2220-->FW4C) of up to 250-300 Mbps, roughly twice the iperf speed, and a lot closer to what Netgate has suggested the MBT-2220 is capable of.


So it's possible that I'm not using iperf optimally. When I iperf across the LAN, with no tunnel or port forwarding, I get 940 Mbps between two machines with just:

./iperf3 -s

on the server and

./iperf3 -c iperf.server.ip.address

or

./iperf3 -c iperf.server.ip.address -R

on the client, so I've been using that same command across the tunnel. I've experimented with -P and -w as noted above, but there might be other knobs I can turn to improve the throughput.

After all, I'm not building an iperf tunnel; I'm building a tunnel to do real work, like backup copy jobs, so I need to measure it using the right metrics.

michmoor

@thewaterbug To take the whole iperf thing out of the equation, I use a speed test (OpenSpeedTest) running in a container. Just run it and browse to your server. The end.
https://hub.docker.com/r/openspeedtest/latest