Bandwidth problems between sites
-
@keyser said in Bandwidth problems between sites:
@bp81 said in Bandwidth problems between sites:
@stephenw10 said in Bandwidth problems between sites:
It hits 90Mbps with minimal packet loss using UDP?
What's the latency between the sites?
Yes, it hit 90mbps without issue. Maybe a tiny handful of packets lost, under 0.1%.
Typical latency is around 40ms.
There’s your culprit: latency. TCP in modern operating systems does not scale its congestion/sliding window well when latency is high. Any out-of-order packets or packet loss will cause throughput to drop catastrophically on a high-latency link.
40ms is a “very high” latency for TCP implementations and generally will not yield more than 10 to 30mbps if there is the slightest out-of-order delivery of packets (there usually is on high-latency tunnels with parallel-processed encryption). https://accedian.com/blog/measuring-network-performance-latency-throughput-packet-loss/
You can “tweak this” by extending the TCP windows manually on the operating systems at both ends of the pipe.
Know of any sources of guidance on how to do this for VMware ESXi and Synology DSM?
-
@bp81 said in Bandwidth problems between sites:
Know of any sources of guidance on how to do this for VMware ESXi and Synology DSM?
No, unfortunately not - haven’t tried it on those systems.
Another possible solution is to insert a WAN accelerator device/software stack at both ends. An accelerator proxies traffic between sites and fools the operating systems at both ends with immediate TCP ACKs and whatnot. They also compress traffic and filter unneeded packets from the WAN link. A WAN accelerator can be ENORMOUSLY effective and allow you to utilise the link almost as if it were a LAN.
-
Yup, those can make a huge difference on high latency links. Pretty much required on Sat links for example.
However 25Mbps still 'feels' low to me for 40ms. I can hit my WAN limit here ~70Mbps when downloading from Austin and that's ~120ms.
Here's a quick test:
[22.05-RELEASE][admin@6100-2.stevew.lan]/root: fetch -o /dev/null https://atxfiles.netgate.com/mirror/downloads/pfSense-CE-2.6.0-RELEASE-amd64.iso.gz
/dev/null 416 MB 5085 kBps 01m24s
[22.05-RELEASE][admin@6100-2.stevew.lan]/root: ping -c 2 atxfiles.netgate.com
PING files.atx.netgate.com (208.123.73.81): 56 data bytes
64 bytes from 208.123.73.81: icmp_seq=0 ttl=50 time=110.073 ms
64 bytes from 208.123.73.81: icmp_seq=1 ttl=50 time=109.770 ms
--- files.atx.netgate.com ping statistics ---
2 packets transmitted, 2 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 109.770/109.921/110.073/0.151 ms
Not as good as I have seen, 40Mbps over 110ms, but still. It does take a while for the window to scale up though. You need to test for at least a minute to actually see the maximum.
Steve
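To put rough numbers on the window-scaling point, here is a minimal Python sketch of the bandwidth-delay product arithmetic. The function names are made up for illustration, and it ignores protocol overhead, loss, and congestion control, so treat the results as upper bounds rather than predictions.

```python
def bdp_bytes(bandwidth_mbps: float, rtt_ms: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to keep the link full.

    Mbps * ms * 1000 gives bits in flight; divide by 8 for bytes.
    """
    return bandwidth_mbps * rtt_ms * 1000 / 8


def max_throughput_mbps(window_bytes: float, rtt_ms: float) -> float:
    """Ceiling for one TCP stream: at most one full window per round trip."""
    return window_bytes * 8 / (rtt_ms * 1000)


if __name__ == "__main__":
    # Window needed to fill a 100 Mbps link at 40 ms RTT (the figures from this thread).
    print(f"100 Mbps @ 40 ms needs ~{bdp_bytes(100, 40) / 1000:.0f} kB in flight")
    # Window needed to sustain the 40 Mbps download at 110 ms shown above.
    print(f"40 Mbps @ 110 ms needs ~{bdp_bytes(40, 110) / 1000:.0f} kB in flight")
    # Ceiling if the window never grows past 64 KiB (no window scaling in effect).
    print(f"64 KiB window @ 40 ms caps at ~{max_throughput_mbps(65535, 40):.1f} Mbps")
```

If the window stays near 64 KiB, the ceiling at 40 ms is in the low teens of Mbps, which is why it matters whether the window actually gets the chance to scale up during the test.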
-
@keyser said in Bandwidth problems between sites:
@bp81 said in Bandwidth problems between sites:
Know of any sources of guidance on how to do this for VMware ESXi and Synology DSM?
No, unfortunately not - haven’t tried it on those systems.
Another possible solution is to insert a WAN accelerator device/software stack at both ends. An accelerator proxies traffic between sites and fools the operating systems at both ends with immediate TCP ACKs and whatnot. They also compress traffic and filter unneeded packets from the WAN link. A WAN accelerator can be ENORMOUSLY effective and allow you to utilise the link almost as if it were a LAN.
Are there any pfSense packages that do this, or do I need to start looking for an appliance?
-
There are not. And, as far as I know, there are no FreeBSD ports either. Someone would probably have tested it by now otherwise. I believe there are some FreeBSD-based solutions that do this but I don't think there are any that are freely available. I could be wrong...
-
@stephenw10 said in Bandwidth problems between sites:
There are not. And, as far as I know, there are no FreeBSD ports either. Someone would probably have tested it by now otherwise. I believe there are some FreeBSD-based solutions that do this but I don't think there are any that are freely available. I could be wrong...
Is there anything available that wouldn't require me to sell my children?
-
Did some more testing this morning and I think there are at least two, maybe three, things going on here. I changed the iperf settings to a longer test time and a longer interval (60-second test, 30-second interval; seems like this would simulate large file transfers pretty well). I also tested against another branch office (let's call it Branch Office 2). Branch Office 2 also has 100mbps fiber.
Branch Office 2, if I run iperf outside the vpn tunnel, I can get about 36mbps transfer speed. Latency between HQ and Branch Office 2 is 47ms. Branch Office 1, I was able to get around 40mbps if using the longer test interval with iperf and not pushing traffic through the tunnel.
In both branch offices, pushing traffic through the vpn tunnel with the longer interval is still garbage. 6-8 mbps at best.
So I do tend to think there is a VPN problem after all, but I don't think it's the only problem. I expect the latency issue is also causing problems but, at the end of the day, VPN performance seems to be the biggest immediate problem.
I know that OpenVPN is not renowned for its speed. I am in a position to use something else if necessary, and I can tweak OpenVPN if needed. Would limiting packet size inside the tunnel help (figuring there could be some fragmentation issues)?
-
Ha, that I don't know. I've looked into this a few times and never found anything practical that we could test. You really would think there would be an open source implementation of this somewhere....
-
@stephenw10 said in Bandwidth problems between sites:
Ha, that I don't know. I've looked into this a few times and never found anything practical that we could test. You really would think there would be an open source implementation of this somewhere....
I found OpenNOP that seems to be an open source solution for this. It's Linux based, I'll give it a test drive and see how it goes.
But from my earlier results this morning, I don't think this is the only problem. I do think there's a problem with my VPN tunnels in addition to latency issues.
-
OK, got some more weirdness with additional testing. Rather than testing iperf running on each firewall, I went back to testing from a workstation at HQ to a server at Branch Office. I am testing exclusively traffic passing through the VPN tunnel.
I ran iperf and set my TCP window to the max, about 3 megabytes. In the first couple of seconds I hit some decent bandwidth numbers, around 60mbps, but it quickly recedes back to around 10mbps and settles there.
I reran iperf with the large tcp window and set the packet size to 576, just in case I was tripping across a fragmentation issue. Same behavior.
I did UDP with iperf and set my bandwidth to 80 mbps, got huge packet loss on that. Even at a miserable 10mbps it was still around 1.5%.
It feels like the routers can't keep up with traffic crossing the VPN tunnel. The CPUs are not overworked on the appliances: CPU usage on either end spiked to around 10%. It feels more like there is a buffer somewhere filling up and it can't keep up, but for the life of me I can't see that on my routers. Is there some kind of bandwidth limitation that OpenVPN applies by default that I'm not aware of?
-
@bp81 The jumping up and down in throughput is a textbook example of how TCP reacts when it tries to scale up the sliding window for faster transfers and then out-of-order packets arrive or a packet is lost. Whenever that happens, TCP cuts its window size in half.
So if you are seeing lost packets or out-of-order packets, you will have this elevator up and down on actual bandwidth/throughput.
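As a rough illustration of how hard loss hits at this latency, here is a small Python sketch of the well-known Mathis approximation for steady-state TCP throughput, roughly (MSS / RTT) * (C / sqrt(p)) with C ≈ sqrt(3/2). The function name and the 1460-byte MSS are assumptions for the example. The model describes classic Reno-style halving, so newer congestion control algorithms may do better, and the loss seen in a UDP test is not necessarily the loss a TCP stream would see.

```python
import math


def mathis_throughput_mbps(mss_bytes: int, rtt_ms: float, loss_rate: float) -> float:
    """Approximate steady-state TCP throughput under random loss (Mathis model).

    Assumes Reno-style behaviour where the window is halved on each loss event.
    """
    if not 0 < loss_rate <= 1:
        raise ValueError("loss_rate must be in the range (0, 1]")
    c = math.sqrt(3 / 2)  # constant from the Mathis approximation
    bits_per_second = (mss_bytes * 8) / (rtt_ms / 1000) * c / math.sqrt(loss_rate)
    return bits_per_second / 1_000_000


if __name__ == "__main__":
    # Assumed 1460-byte MSS; 40 ms RTT and the loss figures mentioned in this thread.
    for loss in (0.001, 0.015):
        estimate = mathis_throughput_mbps(1460, 40, loss)
        print(f"loss {loss:.1%} @ 40 ms -> roughly {estimate:.1f} Mbps")
```

Under those assumptions even 0.1% loss keeps a single stream far below 100mbps at 40ms, and 1.5% loss lands in the low single digits, which is broadly in line with the up-and-down behaviour described above.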
-
@keyser said in Bandwidth problems between sites:
@bp81 The jumping up and down in throughput is a textbook example of how TCP reacts when it tries to scale up the sliding window for faster transfers and then out-of-order packets arrive or a packet is lost. Whenever that happens, TCP cuts its window size in half.
So if you are seeing lost packets or out-of-order packets, you will have this elevator up and down on actual bandwidth/throughput.
It's a far more acute problem when traffic passes through the VPN tunnel than when it goes router to router outside the tunnel, though. I think it's a problem in both places, but it absolutely kills VPN performance.
-
@bp81 said in Bandwidth problems between sites:
It's a far more acute problem when traffic passes through the VPN tunnel than when it goes router to router outside the tunnel, though. I think it's a problem in both places, but it absolutely kills VPN performance.
Your next test should be to measure TCP throughput from your workstation with Wireshark installed and running. Wireshark's decoder should fairly easily show you whether packets are lost or received out of order. It will also clearly show you whether retransmits are occurring.
-
What sort of VPN are you using there? How is it configured?
-
@keyser said in Bandwidth problems between sites:
@bp81 said in Bandwidth problems between sites:
It's a far more acute problem when traffic passes through the VPN tunnel than when it goes router to router outside the tunnel, though. I think it's a problem in both places, but it absolutely kills VPN performance.
Your next test should be to measure TCP throughput from your workstation with Wireshark installed and running. Wireshark's decoder should fairly easily show you whether packets are lost or received out of order. It will also clearly show you whether retransmits are occurring.
I’ll do Wireshark next week, but I did a quick and dirty test that confirmed part of my hypothesis.
I replaced the OpenVPN tunnel with IPsec. A default iperf test through the tunnel went from 8mbps to 40mbps. I extended the TCP window to 3MB and I hit max bandwidth (100mbps) and sat on it for the entirety of the test.
I think this demonstrates pretty conclusively that OpenVPN is slow (known issue) and that the latency is also an issue.
I’m probably going to replace my site to site links with ipsec. I’d rather not, as administration of openvpn is simpler, but I have a good reason to do it. I’m not going to switch client to site links with ipsec. OpenVPN just has too much going for it in that role, and the bandwidth isn’t an issue for our limited client to site use.
-
@bp81 Very interesting observation about the VPN type.
And pretty interesting that you can get the throughput up so evenly, considering you had some issues outside the VPN with iperf as well. Anyhow, I never quite understood why so many love OpenVPN for end-user VPN.
I think the OpenVPN client is a pita when it comes to user interface, maintenance and deployment. I much, much prefer the simplicity of the operating system's built-in VPN client (IPsec). The UI is very simple and integrated in the OS. Setup is either a simple manual guide or a simple script/configuration file that needs to be deployed.
And all modern operating systems work beautifully with the Mobile IPsec VPN in pfSense.
-
@keyser said in Bandwidth problems between sites:
@bp81 Very interesting observation about the VPN type.
And pretty interesting that you can get the throughput up so evenly, considering you had some issues outside the VPN with iperf as well. Anyhow, I never quite understood why so many love OpenVPN for end-user VPN.
I think the OpenVPN client is a pita when it comes to user interface, maintenance and deployment. I much, much prefer the simplicity of the operating system's built-in VPN client (IPsec). The UI is very simple and integrated in the OS. Setup is either a simple manual guide or a simple script/configuration file that needs to be deployed.
And all modern operating systems work beautifully with the Mobile IPsec VPN in pfSense.
Our experience with the Windows VPN client has been lackluster. We are also under cybersecurity and compliance obligations to implement "paranoid levels of security", let us say. Authentication for end users is via AD / LDAP authentication AND client certificate. That last bit does not work nicely with IPsec as implemented in Windows; it will authenticate one and only one thing (be that certificate or credentials). It does OK if it's certificate only, but requirements are for two steps of authentication, and that's not negotiable at any level. (Eventually I will add the Azure AD extensions to our NPS servers and use Azure's authenticator app pop-up as our second factor, as we do with other systems. This might be friendlier to IPsec.)
On the flipside, with OpenVPN, all I need to do is issue a client certificate, generate a profile, and deploy that profile via PDQ deploy or simple network file copy. I can deploy to any end user at any time without even touching their workstation as long as they're inside a corporate network.
-
@bp81 said in Bandwidth problems between sites:
@keyser said in Bandwidth problems between sites:
@bp81 Very interesting observation about the VPN type.
And pretty interesting that you can get the throughput up so evenly, considering you had some issues outside the VPN with iperf as well. Anyhow, I never quite understood why so many love OpenVPN for end-user VPN.
I think the OpenVPN client is a pita when it comes to user interface, maintenance and deployment. I much, much prefer the simplicity of the operating system's built-in VPN client (IPsec). The UI is very simple and integrated in the OS. Setup is either a simple manual guide or a simple script/configuration file that needs to be deployed.
And all modern operating systems work beautifully with the Mobile IPsec VPN in pfSense.
Our experience with the Windows VPN client has been lackluster. We are also under cybersecurity and compliance obligations to implement "paranoid levels of security", let us say. Authentication for end users is via AD / LDAP authentication AND client certificate. That last bit does not work nicely with IPsec as implemented in Windows; it will authenticate one and only one thing (be that certificate or credentials). It does OK if it's certificate only, but requirements are for two steps of authentication, and that's not negotiable at any level. (Eventually I will add the Azure AD extensions to our NPS servers and use Azure's authenticator app pop-up as our second factor, as we do with other systems. This might be friendlier to IPsec.)
On the flipside, with OpenVPN, all I need to do is issue a client certificate, generate a profile, and deploy that profile via PDQ deploy or simple network file copy. I can deploy to any end user at any time without even touching their workstation as long as they're inside a corporate network.
It’s true that the Windows IPsec client is less than happy about anything but simple authentication (user or certificate).
But all my clients are using Azure/Office365 anyway, so they all use two-factor auth on VPN with the Azure plugin on the authenticating RADIUS server. This does require the clients to have the Microsoft Authenticator app on a smart device, but it works beautifully :-)
-
@bp81 And then you only have to have an AD GPO that sets up the VPN on all required clients - fully automatic, and it never requires user intervention or manual procedures.