Bandwidth problems between sites
-
Try running an iperf test between the firewalls directly, either using the iperf package in the GUI or just pkg install at the command line. A test like that loads the firewall a lot more, but you should easily see a restriction at 25Mbps if something on the LAN side at either site is imposing one. It's possible there is simply a restriction somewhere in the route between those sites that you are otherwise not hitting.
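If it helps, the whole test is only a couple of commands from the firewall shells. A minimal sketch, assuming the package installs as iperf3 (the GUI package may be listed as just iperf) and using placeholder addresses for your firewalls:

# on the Branch Office firewall: install iperf3 and start a server
pkg install -y iperf3
iperf3 -s

# on the HQ firewall: run the client against the branch firewall's address
pkg install -y iperf3
iperf3 -c 10.0.2.1 -t 60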
Steve
-
@stephenw10 said in Bandwidth problems between sites:
Try running an iperf test between the firewalls directly, either using the iperf package in the GUI or just pkg install at the command line. A test like that loads the firewall a lot more, but you should easily see a restriction at 25Mbps if something on the LAN side at either site is imposing one. It's possible there is simply a restriction somewhere in the route between those sites that you are otherwise not hitting.
Steve
Got some interesting results on this. Just using default settings on the iperf client at HQ against the iperf server at the Branch Office, I got about the same results. Increasing the interval value improved the results, though I only ever really hit 40Mbps at best, not the 90Mbps I expected to hit.
I also tried UDP traffic. This gives about the performance you'd expect: from HQ to the Branch Office, I hit about 90Mbps.
UDP through the VPN tunnel, firewall to firewall, was weird. Basically it refused to work unless I left the UDP bandwidth at the default value. Then it would work sometimes.
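For reference, the invocations were roughly of this shape (the server address is a placeholder; -t/-i set the test length and reporting interval, -u/-b select UDP at a target bitrate):

iperf3 -c 10.0.2.1 -t 30 -i 10   # TCP with a longer interval
iperf3 -c 10.0.2.1 -u -b 90M     # UDP at a 90Mbps target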
-
It hits 90Mbps with minimal packet loss using UDP?
What's the latency between the sites?
-
@stephenw10 said in Bandwidth problems between sites:
It hits 90Mbps with minimal packet loss using UDP?
What's the latency between the sites?
Yes, it hit 90Mbps without issue. Maybe a tiny handful of packets lost, under 0.1%.
Typical latency is around 40ms.
-
Are you able to show us the TCP and UDP results from that test?
What sort of VPN are you using there?
-
@bp81 said in Bandwidth problems between sites:
@stephenw10 said in Bandwidth problems between sites:
It hits 90Mbps with minimal packet loss using UDP?
What's the latency between the sites?
Yes, it hit 90Mbps without issue. Maybe a tiny handful of packets lost, under 0.1%.
Typical latency is around 40ms.
There’s your culprit: latency. TCP in modern operating systems does not scale its congestion/sliding window well when latency is high. Any out-of-order packets or packet loss will cause throughput to drop catastrophically on a high-latency link.
40ms is “very high” latency for TCP implementations and generally will not yield more than 10-30Mbps if there is the slightest out-of-order delivery of packets (as there usually is on high-latency, parallel-processed encryption tunnels). https://accedian.com/blog/measuring-network-performance-latency-throughput-packet-loss/
You can “tweak” this by extending the TCP windows manually on the operating systems at both ends of the pipe.
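To put rough numbers on that (a back-of-the-envelope calculation, not a measurement): sustained TCP throughput is bounded by roughly window size / RTT, and the window needed to fill a link is the bandwidth-delay product:

window needed: 100 Mbit/s x 0.040 s = 4 Mbit ≈ 500 KB
classic 64 KB window: 64 KB / 0.040 s = 1.6 MB/s ≈ 13 Mbit/s

So an endpoint stuck at an unscaled 64KB window tops out around 13Mbps at 40ms, which is right in the range you’re seeing; filling a 100Mbps link at that latency needs roughly 500KB of effective window.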
-
@keyser said in Bandwidth problems between sites:
@bp81 said in Bandwidth problems between sites:
@stephenw10 said in Bandwidth problems between sites:
It hits 90Mbps with minimal packet loss using UDP?
What's the latency between the sites?
Yes, it hit 90Mbps without issue. Maybe a tiny handful of packets lost, under 0.1%.
Typical latency is around 40ms.
There’s your culprit: latency. TCP in modern operating systems does not scale its congestion/sliding window well when latency is high. Any out-of-order packets or packet loss will cause throughput to drop catastrophically on a high-latency link.
40ms is “very high” latency for TCP implementations and generally will not yield more than 10-30Mbps if there is the slightest out-of-order delivery of packets (as there usually is on high-latency, parallel-processed encryption tunnels). https://accedian.com/blog/measuring-network-performance-latency-throughput-packet-loss/
You can “tweak” this by extending the TCP windows manually on the operating systems at both ends of the pipe.
Know of any sources of guidance on how to do this for VMware ESXi and Synology DSM?
-
@bp81 said in Bandwidth problems between sites:
Know of any sources of guidance on how to do this for VMware ESXi and Synology DSM?
No, unfortunately not - I haven’t tried it on those systems.
Another possible solution is to insert a WAN accelerator device/software stack at both ends. An accelerator proxies traffic between the sites and fools the operating systems at both ends with immediate TCP ACKs and whatnot. They also compress traffic and filter unneeded packets off the WAN link. A WAN accelerator can be ENORMOUSLY effective and allow you to utilise the link almost as if it were a LAN.
-
Yup, those can make a huge difference on high latency links. Pretty much required on Sat links for example.
However, 25Mbps still 'feels' low to me for 40ms. I can hit my WAN limit here, ~70Mbps, when downloading from Austin, and that's ~120ms.
Here's a quick test:
[22.05-RELEASE][admin@6100-2.stevew.lan]/root: fetch -o /dev/null https://atxfiles.netgate.com/mirror/downloads/pfSense-CE-2.6.0-RELEASE-amd64.iso.gz
/dev/null                                      416 MB 5085 kBps 01m24s
[22.05-RELEASE][admin@6100-2.stevew.lan]/root: ping -c 2 atxfiles.netgate.com
PING files.atx.netgate.com (208.123.73.81): 56 data bytes
64 bytes from 208.123.73.81: icmp_seq=0 ttl=50 time=110.073 ms
64 bytes from 208.123.73.81: icmp_seq=1 ttl=50 time=109.770 ms
--- files.atx.netgate.com ping statistics ---
2 packets transmitted, 2 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 109.770/109.921/110.073/0.151 ms
Not as good as I have seen, 40Mbps over 110ms, but still. It does take a while for the window to scale up though. You need to test for at least a minute to actually see the maximum.
Steve
-
@keyser said in Bandwidth problems between sites:
@bp81 said in Bandwidth problems between sites:
Know of any sources of guidance on how to do this for VMware ESXi and Synology DSM?
No, unfortunately not - I haven’t tried it on those systems.
Another possible solution is to insert a WAN accelerator device/software stack at both ends. An accelerator proxies traffic between the sites and fools the operating systems at both ends with immediate TCP ACKs and whatnot. They also compress traffic and filter unneeded packets off the WAN link. A WAN accelerator can be ENORMOUSLY effective and allow you to utilise the link almost as if it were a LAN.
Are there any pfSense packages that do this, or do I need to start looking for an appliance?
-
There are not. And, as far as I know, there are no FreeBSD ports either; someone would probably have tested it by now otherwise. I believe there are some FreeBSD-based solutions that do this, but I don't think any of them are freely available. I could be wrong...
-
@stephenw10 said in Bandwidth problems between sites:
There are not. And, as far as I know, there are no FreeBSD ports either; someone would probably have tested it by now otherwise. I believe there are some FreeBSD-based solutions that do this, but I don't think any of them are freely available. I could be wrong...
Is there anything available that wouldn't require me to sell my children?
-
Did some more testing this morning and I think there are at least two, maybe three, things going on here. I changed the iperf settings to a longer test time and longer interval (60 second test, 30 second reporting interval - this seems like it would simulate large file transfers pretty well). I also tested against another branch office (let's call it Branch Office 2). Branch Office 2 also has 100Mbps fiber.
Against Branch Office 2, running iperf outside the VPN tunnel, I can get about a 36Mbps transfer speed. Latency between HQ and Branch Office 2 is 47ms. Against Branch Office 1, I was able to get around 40Mbps using the longer test interval with iperf and not pushing traffic through the tunnel.
At both branch offices, pushing traffic through the VPN tunnel with the longer interval is still garbage: 6-8Mbps at best.
So I do tend to think there is a VPN problem after all, but I don't think it's the only problem. I expect the latency issue is also causing problems, but at the end of the day VPN performance seems to be the biggest immediate problem.
I know that OpenVPN is not renowned for its speed. I am in a position to use something else if necessary, and I can tweak OpenVPN if needed. Would limiting the packet size inside the tunnel help (figuring there could be some fragmentation issues)?
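(For concreteness, the sort of tweaks I have in mind would go in the Custom options box of the OpenVPN server and client on each firewall. These values are untested starting points, not recommendations:

sndbuf 524288;   # enlarge OpenVPN's socket send buffer
rcvbuf 524288;   # and the receive buffer to match
fragment 1400;   # internally fragment tunnel packets above 1400 bytes (UDP tunnels only)
mssfix 1400;     # clamp TCP MSS so packets fit the tunnel without IP fragmentation

fragment and mssfix are aimed at exactly the fragmentation question above; the buffer sizes matter because OpenVPN's own socket buffers can cap throughput on a high-latency path independently of the endpoints' TCP windows.)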
-
Ha, that I don't know. I've looked into this a few times and never found anything practical that we could test. You really would think there would be an open source implementation of this somewhere....
-
@stephenw10 said in Bandwidth problems between sites:
Ha, that I don't know. I've looked into this a few times and never found anything practical that we could test. You really would think there would be an open source implementation of this somewhere....
I found OpenNOP that seems to be an open source solution for this. It's Linux based, I'll give it a test drive and see how it goes.
But from my earlier results this morning, I don't think this is the only problem. I do think there's a problem with my VPN tunnels in addition to latency issues.
-
OK, got some more weirdness with additional testing. Rather than testing with iperf running on each firewall, I went back to testing from a workstation at HQ to a server at the Branch Office. I am testing exclusively traffic passing through the VPN tunnel.
I ran iperf and set my TCP window to the max, about 3 megabytes. In the first couple of seconds I hit some decent bandwidth numbers, around 60Mbps, but it quickly receded back to around 10Mbps and settled there.
I reran iperf with the large TCP window and set the packet size to 576, just in case I was tripping across a fragmentation issue. Same behavior.
I did UDP with iperf and set my bandwidth to 80Mbps, and got huge packet loss on that. Even at a miserable 10Mbps it was still around 1.5%.
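(Roughly these invocations, with a placeholder server address; -w sets the TCP window, -M the MSS, so -M 536 gives 576-byte packets:

iperf3 -c 192.168.20.10 -t 60 -w 3M          # max TCP window
iperf3 -c 192.168.20.10 -t 60 -w 3M -M 536   # small segments, fragmentation check
iperf3 -c 192.168.20.10 -u -b 80M            # UDP at an 80Mbps target
)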
It feels like the routers can't keep up with traffic crossing the VPN tunnel, yet the CPUs are not overworked on the appliances: CPU usage on either end spiked to only around 10%. It feels more like there is a buffer filling up somewhere, but for the life of me I can't find it on my routers. Is there some kind of bandwidth limitation that OpenVPN imposes by default that I'm not aware of?
-
@bp81 The jumping up and down in throughput is a textbook example of how TCP reacts when it tries to scale up the sliding window for faster transfers and then out-of-order packets arrive, or a packet is lost. Whenever that happens, TCP halves its window size.
So if you are seeing lost packets or out-of-order packets, you will get this elevator up and down in actual bandwidth/throughput.
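As a stylized picture of that sawtooth (numbers purely illustrative):

window: 64KB -> 128KB -> 256KB -> loss -> 128KB -> 256KB -> loss -> 128KB -> ...

And because the window only grows a little each round trip, at 40ms it takes a long while to climb back after every cut, so the average throughput ends up far below the link rate.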
-
@keyser said in Bandwidth problems between sites:
@bp81 The jumping up and down in throughput is a textbook example of how TCP reacts when it tries to scale up the sliding window for faster transfers and then out-of-order packets arrive, or a packet is lost. Whenever that happens, TCP halves its window size.
So if you are seeing lost packets or out-of-order packets, you will get this elevator up and down in actual bandwidth/throughput.
The problem is far more acute when traffic passes through the VPN tunnel than when going router to router outside the tunnel, though. I think it's a problem in both places, but it absolutely kills VPN performance.
-
@bp81 said in Bandwidth problems between sites:
The problem is far more acute when traffic passes through the VPN tunnel than when going router to router outside the tunnel, though. I think it's a problem in both places, but it absolutely kills VPN performance.
Your next test should be running the TCP throughput test from your workstation with Wireshark installed and capturing. Wireshark's decoder should fairly easily point out whether packets are lost or received out of order. It will also clearly show you if retransmits are occurring.
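If you have the command-line tools installed, something along these lines (interface name is a placeholder) will print the interesting packets as they happen:

tshark -i eth0 -Y "tcp.analysis.retransmission || tcp.analysis.out_of_order || tcp.analysis.duplicate_ack"

The same expression works as a display filter in the Wireshark GUI, and Statistics -> TCP Stream Graphs gives a nice visual of the window scaling up and collapsing.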