Version is 2.5.2 on the Azure VM and 21.05-RELEASE (amd64) on the 5100s
OVPN is site-to-site, pre-shared key, UDP on IPV4 only, Layer 3. On the remote server there is a point-to-site server (for use as a remote internet gateway). It's for travel use but nobody's travelling so there are no connections.
Latency is 27-32 ms, WAN Azure to WAN local; 100-130 ms to the other sites from WAN local.
I only have one local device so I haven't tried to replicate here. I could spin up a Hyper-V guest but not now, I am currently working on alternative method, most likely a Linux server on the local LAN, running OpenVPN as a server and NAT port forward Linux server. We are up interactively but backups through the tunnels are an issue.
Not an expert regarding state tables so I wouldn't know what to look for. I can try clearing the state tables after the trouble begins to see if that reset avoids a reboot to restore WAN performance. Would that provide useful information?
We're not running IPSEC now. We were, but IPSEC failed after a recent upgrade. We switched to OpenVPN. I have read that the IPSEC issue has been resolved but haven't switched back.
One more observation. We do have a point-to-site server running locally. There is one user, a Synology raid device that phones home and stays connected 24x7. It is used as an off-site backup device accepting snapshot replication and file share backups. It's been running without issues. It seems to be the site-to-site tunnels that are tripping us up, on the client-side.
Yeah what confused me the most is that router->internet was testing at 750mbps+ with speedtest-cli, and desktop to router (through the same switch I'm normally connected through) was testing at 400+mbps (when downloading a dummy 1GB zip file I put in the web dir on the router... and it probably could have ramped up faster if it hadn't yanked the entire file in ~20 seconds).
That's what got me looking at the router configuration itself, because obviously the router->internet connection was fine, and the PC->router connection was fine -- so it had to be something on the router itself that was bottlenecking things.
UPD: the same issue as described at the beginning of my post is happening when connecting switch to pfSense and RouterA and RouterB to that switch thus hanging two routers on one pfSense port. Seems to be not an issue with virtual switch on pfSense as in this scenario using only one port.
Once separated Port5 and Port6 on pfSense to different private subnets and attaching RouterA and RouterB independently to pfSense box (+NAT with public VIPs) issue is gone. It appeared when both routers are connected to the same bridge or external switch they can't work reliably together. But I would still appreciate if someone can point me to the right direction how to investigate that further and perhaps with some Layer-2 debugging.
@akuma1x Yes it's easy enough to buy some secondhand/commodity hardware.
Anything you can find with enough network ports and an Atom C3XXX, or Intel i3/i5/i7 processors, or even some of the more recent fast Celeron and Xeon processors. Those are all good for a pfsense box. Try to stay away from the laptop-grade mobile processors, and the older Celeron J1900 stuff. Those are going to show their age and weaknesses quicker than the other ones.
HP and Dell made/make some good small form factor stuff. Just make sure you can add at least 1 multi-port INTEL network card in there and you'll be all set with a nice pfsense firewall box.