Mediocre download speed for some clients - upload OK
The following behavior drives me crazy:
In a policy routed multi-wan environment, the download speed of network clients is mediocre - we are talking about around 0.3MBit/s.
In contrast to this, the upload is fine and reaches around 10MBit/s.
clients -> local pfsense --openVPN--> remote pfsense (WAN-ip1)
More setup details:
the local pfsense 2.4.5.p1 has 3 GWs:
- openVPN to pfsense 2.4.5.p1 (let's name it WAN-ip1)
- openVPN to opnsense (let's name it WAN-ip2)
- direct WAN (let's name it WAN)
These GWs are also grouped in a failover GW group, ordered WAN-ip2, WAN-ip1, WAN. So WAN is last in Tier order.
Both openVPN connections are configured exactly the same.
Of course the IP configurations are NOT the same ;-)
The WAN-ip2 connection has a normal / expected download.
There is no traffic shaper involved.
relevant FW rules:
- policyHosts to RFC1918 => 'do nothing'
- policyHosts to * => useWAN-ip1 GW
or for my testing client:
- testingClient to RFC1918 => 'do nothing'
- testingClient to * => use GW-Group
- If WAN-ip2 is online, routing works, speed is ok, download and upload wise.
- If WAN-ip2 is down, WAN-ip1 is used from within the GW-Group. download is crippled, upload is fine.
In this case I assume everything works the same, except the GWG chooses another GW.
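To make that assumption concrete, here is a rough sketch of how I understand the testing client's rules and the failover group to interact. This is only a model of the behavior described above, not pfSense code; the function names and the "default routing table" label are made up for illustration.

```python
import ipaddress

# Illustrative model only; none of these names come from pfSense itself.
RFC1918 = [
    ipaddress.ip_network(n)
    for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]

# Failover group from the post: lower tier number = higher priority.
GW_GROUP = {"WAN-ip2": 1, "WAN-ip1": 2, "WAN": 3}


def is_rfc1918(addr: str) -> bool:
    """True if addr lies in one of the RFC1918 private ranges."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in RFC1918)


def pick_gateway(group, online):
    """Return the online gateway with the lowest tier, or None if all are down."""
    candidates = [gw for gw in group if gw in online]
    return min(candidates, key=lambda gw: group[gw], default=None)


def route_test_client(dst, online):
    """First matching rule wins: RFC1918 destinations are not policy routed."""
    if is_rfc1918(dst):
        return "default routing table"
    return pick_gateway(GW_GROUP, online)
```

With all three gateways online, `route_test_client("8.8.8.8", {"WAN-ip2", "WAN-ip1", "WAN"})` picks WAN-ip2; drop WAN-ip2 from the set and the same call picks WAN-ip1. Nothing else in the path changes, which is why I expect the same behavior on both.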
Both openVPN connections work. I say this because:
As WAN-ip1 is also used to route incoming openVPN connections, which do work fine, as long as they exit WAN or WAN-ip2, I can say: WAN-ip1 is able to reach normal speeds.
specialClientVPNs --> remote pfsense (WAN-ip1) --forward traffic to, via openVPN (ip1) --> local pfSense specialClientVPN server.
That said, I don't see it as a tunnel problem, as the tunnel itself generally delivers enough throughput in EACH direction.
Somehow the local pfSense clients won't peak over 30kb/s (stays mostly ~ 16kb/s) in download, when routed over WAN-IP1!
I am out of wisdom and knowledge for how I can debug this better.
Also, this hasn't been the case since the beginning. At some point it worked as expected, but I can't remember when.
I tested the tunnel with iperf and the tunnel (WAN-IP1) is able to deliver around 30MBits/s in both directions.
The WAN-IP1 pfsense knows about the related server within my local pfsense subnet and can reach it directly. This is tested - at least by ping; in both directions.
Also good to know: the only NAT is done outgoing from WAN-IP1.
I try to avoid a double-NAT scenario.
Still, if this server tries to download something, we are at 1/100 of capacity: 0.3MBits/s consistently.
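For anyone who wants to check the arithmetic, a small sketch using only the two figures from this post (30MBits/s from iperf, 0.3MBits/s observed). The helper name is made up, and decimal prefixes (1 Mbit = 1,000,000 bit) are an assumption.

```python
def mbit_to_kbyte(mbit_per_s: float) -> float:
    """Convert Mbit/s to kB/s, assuming decimal prefixes and 8 bits per byte."""
    return mbit_per_s * 1_000_000 / 8 / 1_000


tunnel_mbit = 30.0    # iperf result for the WAN-IP1 tunnel, from the post
download_mbit = 0.3   # observed client download over WAN-IP1, from the post

# Fraction of the measured tunnel capacity that the download actually reaches.
ratio = download_mbit / tunnel_mbit

print(f"download is {ratio:.0%} of tunnel capacity")
print(f"{download_mbit} Mbit/s is about {mbit_to_kbyte(download_mbit):.1f} kB/s")
```

So 0.3MBits/s works out to 37.5 kB/s, i.e. about 1% of what the tunnel delivers under iperf.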
Also there are some re-transmissions in wireguard.
MTR and Traceroute, like Ping, work as expected and without lost packets.
The WAN-IP1 is in nearly mint (unchanged from default) configuration, except for a verbose HAProxy conf and minimal firewall rules to enable HAProxy's duties.
My best workaround to date seems to be to route http(s) traffic over WAN-IP2 and live with the 0.3MBit/s. Which, IMHO, sounds kinda ridiculous.
Any help to pinpoint this behavior is welcome. I even consider a bounty for solving this!
Another test. I made a GWG with WAN-IP1 as Tier1 and WAN as Tier2.
As WAN-IP1 is not able to exist without WAN, PFsense will use WAN as failover, build WAN-IP1 and use this now as Tier1 as expected.
The results are the same as from clients of local PFSense: 16kb/s.
So I can't see it as a routing problem of pfsense local.
PFsense local is able to build a 30MBits/s tunnel, but when using it not for core openVPN but instead as consumer, it also is only able to get 16kb/s (0.3MBits/s), like any network device routed through this.
As said the tunnel with OPNSense on WAN-IP2 doesn't show the behavior.
So the assumption is:
WAN-ip2 (OPNSense) does not show this behavior, the local PFSense is the same in both cases, and the problem appears even if the request originates from the local PFSense itself. So it must be something with the VPN server end, the PFSense at WAN-IP1. I can't pinpoint it any further.
Any thought appreciated!
Found it. Either this thread can be deleted or we can see if others would see the error.
Hint: I would have assumed that with this error, nothing would have worked, but instead it did - even if at mediocre speed ;-)