Peculiar throughput problem pfSense to pfSense
-
@stephenw10 Yup, and there is… It seems I'm suffering from upstream packet loss (from the 2100) as the transmission speed goes up.
Quite interesting that the clients seem to handle that loss with much less consequence for the overall throughput. They are Linux and Windows clients.
I'll be looking into my options for tuning or replacing the GPON SFP...
-
Mmm, the TCP congestion control in pfSense is nothing special, because pfSense is tuned for forwarding rather than for acting as a TCP endpoint. You may be hitting that in some unusually extreme way!
That might also explain why you see problems across the tunnel too since, presumably, the tunnel is also lossy.
-
@stephenw10 Hmm, well I took a look at the packet loss in general (from my monitoring systems), and there is essentially none: < 0.0001%.
The thing is, this site rarely uses its upstream bandwidth, and when it does, it's always from WiFi clients. The site has older WiFi 6 APs with a best-case maximum bandwidth of slightly less than 400 Mbps, which is more or less what the GPON link offers (about 360/360 Mbps).
So now I'm starting to think: is the real issue that the GPON bridge lacks buffers? Since the WiFi speed is more or less the same as the GPON speed, buffer drops rarely happen for wireless clients, whereas pfSense itself thinks it's on a Gbit Ethernet link (SFP), so it initially pushes way too many packets and causes lots of buffer drops?
If so, could I create some limiter/bandwidth-shaping policy to remediate that?
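A rough way to sanity-check that hypothesis: when a sender bursts at Gigabit line rate into a link that drains at about 360 Mbit/s, a FIFO buffer in between fills at the difference of the two rates. The sketch below assumes a simple FIFO and a hypothetical 1 MiB buffer (the real buffer size of the GPON SFP is unknown), and `fill_time_us` is a made-up helper name. Since 1 Mbit/s is exactly 1 bit per microsecond, the excess rate in Mbit/s is also the fill rate in bits per microsecond.

```shell
#!/bin/sh
# Sketch only: how long (in microseconds) a FIFO buffer of BUF_BYTES
# lasts when packets arrive at IN_MBPS but drain at OUT_MBPS.
# Hypothetical helper; integer arithmetic, so the result is truncated.
fill_time_us() {
    in_mbps=$1
    out_mbps=$2
    buf_bytes=$3
    if [ "$in_mbps" -le "$out_mbps" ]; then
        # The drain side keeps up, so the buffer never fills.
        echo "never"
        return 0
    fi
    # 1 Mbit/s == 1 bit/us, so (in - out) is the fill rate in bits/us.
    echo $(( buf_bytes * 8 / (in_mbps - out_mbps) ))
}

# Example, assuming a hypothetical 1 MiB (1048576-byte) buffer:
#   fill_time_us 1000 360 1048576   # GbE burst from pfSense -> 13107 (~13 ms)
#   fill_time_us 400 360 1048576    # WiFi-limited client    -> 209715 (~210 ms)
```

Under these assumptions a line-rate burst from the firewall overruns the buffer roughly 16 times faster than a WiFi-limited client would, which is consistent with the idea that only traffic originating on pfSense sees drops. It is an illustration of the hypothesis, not proof of it.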
-
Yes, you could create some outbound Limiters and use floating rules to capture that traffic. Or just add an ALTQ-based shaper queue as the default on WAN, with a limit on it of less than 360 Mbps.
It would be a good test either way.
Or try testing from a wired client behind the 2100 which should easily hit the issue if it exists.
-
@stephenw10 I will run a test with a wired client later tonight.
Does the 2100 mvneta interface support ALTQ? I cannot seem to locate it in the supported interfaces list.
-
Oh, good point! Yes, it does, so you should be good to test either shaper type.
It is in the list in Plus but not in CE.
-
@stephenw10 Right.... Sooo, the plot thickens :-(
I cannot replicate the issue - nor the packet loss - from internal clients. Any internal client on Site B that connects to the public IP of the Site A 6100 (no IPsec) transfers the file without packet loss. The wired client starts out faster than the wireless clients, but very quickly tapers off and settles at 7 MB/s (about 56 Mbit/s) of throughput, which is similar to the wireless client.
So now I'm at a complete loss... This seems to suggest that something pfSense itself does during heavy WAN access causes the interface or the GPON SFP to drop packets that pfSense believes it has transmitted. But the same thing does not happen to packets it forwards...
Incidentally, the wired client showed that my WAN speed is actually 450 Mbps symmetrically. I consistently get those numbers in different tests from a wired client, so the 360 Mbps I reported was obviously capped by the wireless link.
This is just baffling...
Should I try to implement ALTQ with CoDel to see if it makes a difference?
-
Yes, I would try using CoDel.
But it still 'feels' like a TCP issue from the 2100 directly. Especially since that would also apply to traffic going over the VPN.
You might try setting one of the other congestion control algorithms like:
[25.03-BETA][admin@2100-3.stevew.lan]/root: kldload cc_vegas
[25.03-BETA][admin@2100-3.stevew.lan]/root: sysctl net.inet.tcp.cc
net.inet.tcp.cc.vegas.beta: 3
net.inet.tcp.cc.vegas.alpha: 1
net.inet.tcp.cc.abe_frlossreduce: 0
net.inet.tcp.cc.abe: 0
net.inet.tcp.cc.hystartplusplus.bblogs: 0
net.inet.tcp.cc.hystartplusplus.css_rounds: 5
net.inet.tcp.cc.hystartplusplus.css_growth_div: 4
net.inet.tcp.cc.hystartplusplus.n_rttsamples: 8
net.inet.tcp.cc.hystartplusplus.maxrtt_thresh: 16000
net.inet.tcp.cc.hystartplusplus.minrtt_thresh: 4000
net.inet.tcp.cc.available:
CCmod           D PCB count
cubic           * 30
vegas             0
net.inet.tcp.cc.algorithm: cubic
[25.03-BETA][admin@2100-3.stevew.lan]/root: sysctl net.inet.tcp.cc.algorithm=vegas
net.inet.tcp.cc.algorithm: cubic -> vegas
If that makes any difference at all it would be a good clue.
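If you want the experiment to be easy to undo, the same commands can be wrapped so the previous algorithm is recorded before switching. This is a sketch under assumptions: `cc_names` and `cc_switch` are hypothetical helper names, the parser assumes the `net.inet.tcp.cc.available` layout shown in the session above, and the setting only changes the default for new TCP connections. As far as I know, a runtime `sysctl`/`kldload` like this is not persisted across a reboot unless it is added under System Tunables, so a reboot should also revert it.

```shell
#!/bin/sh
# Sketch only: helpers around the kldload/sysctl commands shown above.
# Function names are hypothetical.

# Print the module names from `sysctl -n net.inet.tcp.cc.available`
# (read on stdin), skipping the "CCmod D PCB count" header and blanks.
cc_names() {
    awk '$1 != "CCmod" && NF > 0 { print $1 }'
}

# Switch the default algorithm for NEW TCP connections and print the
# previous one so it can be restored later.
cc_switch() {
    algo=$1
    prev=$(sysctl -n net.inet.tcp.cc.algorithm) || return 1
    # -n: do not try to load the module if it is already loaded
    kldload -n "cc_${algo}" || return 1
    sysctl "net.inet.tcp.cc.algorithm=${algo}" >/dev/null || return 1
    echo "$prev"
}

# Usage on the firewall (not executed here):
#   sysctl -n net.inet.tcp.cc.available | cc_names   # e.g. cubic
#   prev=$(cc_switch vegas)                           # e.g. cubic
#   sysctl net.inet.tcp.cc.algorithm="$prev"          # revert
```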
-
@stephenw10 I'm leaving the site now, so this might be a tad too experimental to enable when it will be months before I'm back (in case it all goes south). Since I'm not really transferring data in/out of pfSense itself, this is not a major issue right now. I'll have a further look when I return.
-
@stephenw10 But THANK YOU for your invaluable knowledge and desire to help. You really are, indirectly, one of the qualities that make pfSense such a fantastic product.