10GB Lan causing strange performance issues, goes away when switched over to 1GB
-
See: https://docs.netgate.com/pfsense/en/latest/hardware/tune.html#flow-control
For ix NICs, which I assume you have, you can set it globally like that or via a sysctl for each NIC like:
[24.11-RELEASE][admin@6100.stevew.lan]/root: sysctl -d dev.ix.0.fc
dev.ix.0.fc: Set flow control mode using these values:
0 - off
1 - rx pause
2 - tx pause
3 - tx and rx pause
[24.11-RELEASE][admin@6100.stevew.lan]/root: sysctl dev.ix.0.fc=3
dev.ix.0.fc: 0 -> 3
-
@stephenw10 Could I simply add the line below to the System Tunables section, or is that not preferred? I've read that some people have had issues with this setting not being persistent.
Lastly, should I disable flow control on just the WAN NIC, just the LAN NIC, or both? Thanks.
dev.ix.0.fc=0
-
I see the opposite behavior. I have a Netgear XS728T switch (24x10Gb/s and 4 SFP+). All my machines that have 10Gb/s NICs (a mix of Intel x710 and x550) get full internet speed (1.5Gb/s). All machines that have 1Gb/s NICs max out at 300 Mb/s in tests, be it Ookla SpeedTest or an iperf3 test to public servers.
pfSense has an Intel x710 NIC connected via an SFP+ to RJ45 adapter to the Rogers (Canada) XB8 modem, and via Intel SFP+ to a Netgear switch SFP+ port. I tried all possible permutations of switch parameters with regard to flow control, green Ethernet, etc. to no avail.
Any idea how to change flow control for ixl on pfSense? That's about the only thing I have not tried. Would that even make a difference? It works for the faster connections but not for the slower ones, so it doesn't look like a flow control issue. Any other ideas?
iperf3 tests on LAN always max out regardless of 10 or 1 Gb/s NIC.
-
So I first tried disabling flow control on the WAN via the System Tunables and ran the benchmark: same performance issue. I then disabled it on the LAN as well: same issue. I rebooted the firewall: same issue.
I'm starting to think this is not worth the hassle and I should just go back to 1GB on WAN & LAN, but I would really like to understand where my issue lies.
Thanks.
-
I also tried it your way with
sysctl dev.ix.0.fc=0
and
sysctl dev.ix.1.fc=0
via command line.
No change to the benchmark, still only 250Mbps. I also tried changing the LAN MTU to 9000, which made no difference. I have reverted all settings and moved back to 1GB again and boom, speed is back.
I was planning on upgrading to a full 10GB switch for my internal LAN in the near future, so I would love to figure this out. Any other suggestions, or logs I can look at?
-
It may have been disabled already. Some connections require flow control to be enabled to prevent continuously overrunning the buffers.
It also won't do anything (or shouldn't) if the other end has flow control disabled. You want to have both ends set the same.
-
@stephenw10 The output of the command showed the value going from 3 -> 0. I benchmarked it every way (both on, both off, and mixed) with no difference in speed; all terrible. I also confirmed that my Cisco 3650 switch does not support flow control, so it's disabled there already.
Any other suggestions, tests, logs, etc.?
Thanks.
-
Try running some iperf tests locally across that LAN link. See if you can replicate the same low throughput there.
-
I ran some tests. Net result: internal transfer speeds are perfect.
pfSense LAN on 1Gb port, workstation on 1Gb NIC: SFTP file transfer of a 1.5GB ISO file to /Home = 109MiB/s (full speed)
pfSense LAN on 10Gb NIC (SFP+ RJ45), workstation on 1Gb NIC: SFTP file transfer of a 1.5GB ISO file to /Home = 110MiB/s (full speed)
Internet still suffers when the LAN is connected at 10Gb; this should be 930Mbps, not 190Mbps.
What else can we try? Any logs worth pulling or viewing while doing an internet speed test?
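For anyone comparing the numbers above: the SFTP results are in MiB/s while the speed tests report Mbps. A minimal Python sketch of the conversion, assuming 1 MiB = 1,048,576 bytes and decimal megabits (the function name is only illustrative):

```python
def mib_per_s_to_mbps(mib_per_s: float) -> float:
    """Convert MiB/s (binary mebibytes) to Mbps (decimal megabits per second)."""
    bytes_per_s = mib_per_s * 1024 * 1024
    return bytes_per_s * 8 / 1_000_000


# The two SFTP results quoted above
for rate in (109, 110):
    print(f"{rate} MiB/s is about {mib_per_s_to_mbps(rate):.0f} Mbps")
```

By that arithmetic 109 to 110 MiB/s works out to roughly 914 to 923 Mbps of payload, which is effectively line rate for a 1Gb link.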
-
Hmm, you said you tried setting MTU values but this does feel like it could be a fragmentation issue. A packet capture should show that.
Is the speed equally bad in both directions?
-
@stephenw10 I captured a PCAP and nothing is jumping out at me. Is there anything specific I should be filtering or looking for in Wireshark with regard to fragmentation?
-
I tried the following Wireshark filters:
ip.fragment
ip.flags.mf == 1 or ip.frag_offset gt 0
Both return no packets, which leads me to believe there is no fragmentation going on.
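For reference, the second filter tests two fields in the IPv4 header: the "more fragments" flag and the fragment offset. A rough Python sketch of the same check on a raw 20-byte IPv4 header, using only the standard library (the helper name and the sample headers are made up for illustration, not taken from the capture):

```python
import struct


def is_ipv4_fragment(ip_header: bytes) -> bool:
    """Return True if the header belongs to a fragmented IPv4 packet."""
    # Bytes 6-7 hold 3 flag bits followed by a 13-bit fragment offset.
    (flags_and_offset,) = struct.unpack("!H", ip_header[6:8])
    more_fragments = bool(flags_and_offset & 0x2000)  # MF flag
    fragment_offset = flags_and_offset & 0x1FFF       # in units of 8 bytes
    return more_fragments or fragment_offset > 0


def sample_header(flags_and_offset: int) -> bytes:
    """Build a dummy 20-byte header where only bytes 6-7 are meaningful."""
    return bytes(6) + struct.pack("!H", flags_and_offset) + bytes(12)


print(is_ipv4_fragment(sample_header(0x4000)))  # DF set, not a fragment
print(is_ipv4_fragment(sample_header(0x2000)))  # first fragment, MF set
print(is_ipv4_fragment(sample_header(185)))     # later fragment, offset > 0
```

A packet with only the "don't fragment" bit set does not match, which is what normal TCP traffic usually looks like.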
-
@stephenw10 said in 10GB Lan causing strange performance issues, goes away when switched over to 1GB:
Is the speed equally bad in both directions?
This could be telling if it's not.
-
@stephenw10 Are you suggesting that I send a large file from the pfSense side to a target SFTP server on my LAN and see if it can sustain the same level of performance as my other tests?
-
Yes. Or just when you test against fast.com do you also see restricted upload? Assuming your WAN is 1G symmetric.
-
@stephenw10 Ah, sorry, that will not be a good test. I am on cable internet. My download speed is 1Gb but my upload is only 30Mb :( so sadly that test will be of no value.
Anything else we can play with or check in the logs? Again, no fragmentation in the PCAP; it looks clean. It's like pfSense is just tanking.
I also tried enabling all the hardware offloading, which was previously disabled. No difference.
-
This is interesting.
The port on my switch for the client/workstation shows output drops, and the count goes up rapidly when I run a speed test.
But the 10GB uplink port to the firewall shows none.
Perhaps the issue is on the Cisco side?
My understanding of the 3650 is that it does not have true flow control support.
-
To add to this, the total output drops stop once I switch back to the 1Gb LAN connection.
So there is clearly something happening on the Cisco side with the 10GB SFP+ connection, in that all the client ports are registering output drops.
-
Hmm, that is curious. You would not think the 10G link should make any difference there. The total rate is still limited by the incoming WAN to less than the 1G link to the client.
But it does start to look like an issue between the switch and client I agree. Try testing from a different client or different NIC type.
I would also try enabling whatever flow control the switch does have. At least as a test.
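One possible explanation for drops despite a low average rate, offered here as an assumption and not something confirmed in this thread: traffic that arrives from the 10G uplink in bursts can reach the switch faster than a 1G client port drains it, so the egress queue has to absorb the difference. A rough Python sketch of that arithmetic (the burst size is a made-up illustrative number, not a measurement):

```python
def peak_queue_bytes(burst_bytes: int, in_gbps: float, out_gbps: float) -> float:
    """Bytes still queued at the end of a back-to-back burst that arrives at
    in_gbps while the egress port drains at out_gbps."""
    if in_gbps <= out_gbps:
        return 0.0  # the egress port keeps up, nothing accumulates
    # While the burst arrives, out_gbps / in_gbps of it is drained.
    return burst_bytes * (in_gbps - out_gbps) / in_gbps


burst = 1_000_000  # hypothetical 1 MB burst
print(peak_queue_bytes(burst, 10, 1))  # 10G uplink feeding a 1G client port
print(peak_queue_bytes(burst, 1, 1))   # 1G uplink feeding a 1G client port
```

With a 1G uplink the burst is already paced to what the client port can send, so nothing queues. With a 10G uplink about 90% of the same burst has to sit in the port's buffer, and anything beyond the buffer size would show up as output drops.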
-
This article seems to describe my issue.
https://www.cisco.com/c/en/us/support/docs/switches/catalyst-3850-series-switches/200594-Catalyst-3850-Troubleshooting-Output-dr.html
So far I have tried disabling QoS on all ports on the switch and the performance has since doubled; I'm getting 600Mbps now as opposed to 300Mbps. I am still seeing output drops but not as many, so I'm getting closer. I am at least happy and convinced this is purely a Cisco switch issue and not a pfSense bug.
The article is a little confusing, but I will see if what they recommend does the trick.