10GB Lan causing strange performance issues, goes away when switched over to 1GB
-
So I first tried disabling flow control on the WAN via the system tunables and ran the benchmark: same performance issue. I then disabled it on the LAN as well: same issue. I rebooted the firewall: same issue.
Starting to think this is not worth the hassle and I should just go back to 1GB on WAN & LAN, however I would really like to understand where my issue lies.
Thanks.
-
I also tried it your way with
sysctl dev.ix.0.fc=0
and
sysctl dev.ix.1.fc=0
via command line.
No change to benchmark, still only 250Mbps. I also tried changing the LAN MTU to 9000, made no difference. I have reverted all settings and moved back to 1GB again and boom, speed is back.
I was planning on upgrading to a full 10GB switch for my internal LAN in the near future, so I would love to figure this out. Any other suggestions or logs I can look at?
-
It may have been disabled already. Some connections require flow control to be enabled to prevent continuously overrunning the buffers.
It also won't do anything (or shouldn't) if the other end has flow control disabled. You want to have both ends set the same.
-
@stephenw10 The output of the command was that the value went from 3 --> 0. I benchmarked it all ways (both on, both off, and mixed): no difference in speed, all terrible. I also confirmed that my Cisco 3650 switch does not support flow control, so it's disabled there already.
Any other suggestions, tests, logs, etc.?
Thanks.
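For anyone reading the sysctl output above: a minimal sketch of decoding the `dev.ix.N.fc` value. The mode table (0 = off, 1 = rx pause, 2 = tx pause, 3 = full) reflects my understanding of the FreeBSD ix driver's sysctl description and should be treated as an assumption; the helper name `parse_fc_line` is made up for illustration.

```python
import re

# Assumed mapping for the ix driver's dev.ix.N.fc sysctl (see lead-in).
FC_MODES = {0: "off", 1: "rx pause", 2: "tx pause", 3: "full (rx and tx pause)"}

def parse_fc_line(line: str):
    """Parse 'dev.ix.0.fc: 3' or 'dev.ix.0.fc: 3 -> 0' into (interface, old mode, new mode)."""
    m = re.fullmatch(r"dev\.(ix\.\d+)\.fc:\s*(\d+)(?:\s*-+>\s*(\d+))?", line.strip())
    if m is None:
        raise ValueError(f"not a dev.ix.N.fc line: {line!r}")
    iface = m.group(1).replace(".", "")          # "ix.0" -> "ix0"
    old = int(m.group(2))
    # If no "-> new" part is present, the value was only read, not changed.
    new = int(m.group(3)) if m.group(3) is not None else old
    return iface, FC_MODES.get(old, "unknown"), FC_MODES.get(new, "unknown")

print(parse_fc_line("dev.ix.0.fc: 3 -> 0"))
```

Under that assumed mapping, a change from 3 to 0 means the port went from full flow control to none.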
-
Try running some iperf tests locally across that LAN link. See if you can replicate the same low throughput there.
-
I ran some tests, net result = internal transfer speeds are perfect.
pfSense LAN on 1Gb port (workstation on 1Gb NIC) → SFTP file transfer of 1.5GB ISO file to /Home = 109MiB/s (full speed)
pfSense LAN on 10Gb NIC (SFP+ RJ45) (workstation on 1Gb NIC) → SFTP file transfer of 1.5GB ISO file to /Home = 110MiB/s (full speed)
Internet still suffers when the LAN is connected at 10Gb; this should be 930Mbps, not 190Mbps.
What else can we try, any logs worth pulling or viewing while doing an internet speedtest ?
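To compare the SFTP rates above with the speedtest figures, they can be converted to Mbps. A quick sketch, assuming the SFTP client reports binary mebibytes per second:

```python
MIB = 1024 * 1024  # bytes in one mebibyte

def mib_per_s_to_mbps(mib_per_s: float) -> float:
    """Convert a transfer rate in MiB/s to megabits per second (10^6 bits/s)."""
    return mib_per_s * MIB * 8 / 1_000_000

for rate in (109, 110):
    print(f"{rate} MiB/s = {mib_per_s_to_mbps(rate):.0f} Mbps")
```

Both local transfers come out a little above 900 Mbps of payload, which is about what a 1Gb client NIC can carry, so the LAN path itself is not the bottleneck in these tests.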
-
Hmm, you said you tried setting MTU values but this does feel like it could be a fragmentation issue. A packet capture should show that.
Is the speed equally bad in both directions?
-
@stephenw10 I captured a PCAP and nothing is jumping out at me. Is there anything specific I should be filtering or looking for in Wireshark with regard to fragmentation?
-
I tried the following wireshark filters
ip.fragment
ip.flags.mf ==1 or ip.frag_offset gt 0
I get 0 returned data, this is leading me to believe there is no fragmentation going on.
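What the second filter checks can be sketched directly against the IPv4 header: a packet is a fragment if the "more fragments" bit is set or the fragment offset is non-zero. This is an illustrative sketch only; the helper names and the sample addresses are made up, and it reads raw header bytes rather than a capture file.

```python
import struct

MF_FLAG = 0x2000      # "more fragments" bit in the flags/offset field
OFFSET_MASK = 0x1FFF  # 13-bit fragment offset (in units of 8 bytes)

def is_fragment(ipv4_header: bytes) -> bool:
    """True if the IPv4 header belongs to a fragment (MF set or offset > 0)."""
    if len(ipv4_header) < 20 or ipv4_header[0] >> 4 != 4:
        raise ValueError("not an IPv4 header")
    # Bytes 6-7 hold the 3 flag bits followed by the 13-bit fragment offset.
    (flags_and_offset,) = struct.unpack_from("!H", ipv4_header, 6)
    return bool(flags_and_offset & MF_FLAG) or (flags_and_offset & OFFSET_MASK) > 0

def header(flags_and_offset: int) -> bytes:
    """Build a minimal 20-byte IPv4 header (hypothetical test data, checksum left at zero)."""
    return struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 1, flags_and_offset, 64, 6, 0,
                       bytes([192, 0, 2, 1]), bytes([192, 0, 2, 2]))

print(is_fragment(header(0x4000)))  # DF only
print(is_fragment(header(0x2000)))  # MF set
```

Note that packets with only the "don't fragment" bit set do not match, so an empty result means no IP fragments were captured at that point in the path.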
-
@stephenw10 said in 10GB Lan causing strange performance issues, goes away when switched over to 1GB:
Is the speed equally bad in both directions?
This could be telling if it's not.
-
@stephenw10 Are you suggesting that I send a large file from the pfsense side to a target SFTP server on my LAN and see if it can sustain the same level of performance as my other tests ?
-
Yes. Or just when you test against fast.com do you also see restricted upload? Assuming your WAN is 1G symmetric.
-
@stephenw10 Ah, sorry, that will not be a good test. I am on cable internet. My download speed is 1Gb but my upload is only 30Mb :( so sadly that test will be of no value.
Anything else we can play with or check in the logs? Again, no fragmentation in the PCAP; it looks clean. It's like pfSense is just tanking.
I also tried enabling all the hardware offloading, was previously disabled, no difference.
-
This is interesting.
The port on my switch for the client/workstation shows output drops, this rapidly goes up when I run a speed test.
But the 10GB port uplink to the firewall shows none.
Perhaps the issue is on the Cisco side ?
My understanding of the 3650 is that it does not have true flow control support.
-
To add to this, the Total output drops stop once I switch back to the 1Gb Lan connection.
So there is clearly something happening on the Cisco side regarding the 10GB SFP+ connection, in that all the client ports are registering output drops.
-
Hmm, that is curious. You would not think the 10G link should make any difference there. The total rate is still limited by the incoming WAN to less than the 1G link to the client.
But it does start to look like an issue between the switch and client, I agree. Try testing from a different client or a different NIC type.
I would also try enabling whatever flow control the switch does have. At least as a test.
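One possible reason the uplink speed could matter even though the average rate is capped by the WAN: packets leave the firewall in back-to-back bursts at the uplink's line rate, and the switch has to queue whatever the 1G client port cannot drain in that time. A simplified model, not a measurement of the 3650; the burst size is a hypothetical value chosen only for illustration.

```python
def peak_queue_bytes(burst_bytes: float, in_bps: float, out_bps: float) -> float:
    """Peak egress backlog when a back-to-back burst arrives at in_bps and drains at out_bps."""
    if in_bps <= out_bps:
        return 0.0  # the egress port keeps up, so nothing accumulates
    # While the burst arrives, the egress port drains the fraction out_bps / in_bps of it.
    return burst_bytes * (1 - out_bps / in_bps)

BURST = 64 * 1024  # hypothetical burst size in bytes

for label, in_bps in (("1G uplink", 1e9), ("10G uplink", 10e9)):
    backlog = peak_queue_bytes(BURST, in_bps, 1e9)
    print(f"{label}: peak backlog on a 1G client port = {backlog:.0f} bytes")
```

In this model a 1G uplink never builds a backlog on a 1G client port, while a 10G uplink leaves about 90% of each burst sitting in the port's buffer. If that buffer is too small, the excess shows up as output drops on the client port and none on the uplink.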
-
This article seems to describe my issue.
https://www.cisco.com/c/en/us/support/docs/switches/catalyst-3850-series-switches/200594-Catalyst-3850-Troubleshooting-Output-dr.html
So far I have tried disabling QoS on all ports on the switch and the performance has since doubled: I am getting 600Mbps now as opposed to 300Mbps. I am still seeing output drops, but not as many, so I am getting closer. I am at least happy and convinced that this is purely a Cisco switch issue and not a pfSense bug.
The article is a little confusing, but I will see if what they recommend does the trick.
-
Ah, nice. Yeah I would never have suspected that, good catch!
-
@ngr2001 This was discussed 3+ years ago in this thread.
This is a TCP flow control negotiation issue that exists somewhere upstream from the 1GbE LAN client. For me, I am unsure if this is pfSense or the Comcast Cable modem. One way to deal with this is using ethernet flow control but it is an ugly sledgehammer solution.
The Cisco solution is to put this in your 3850 config to increase the buffers for the switch ports that are suffering from output drops:
qos queue-softmax-multiplier 1200
-
Hmm, hard to see how TCP flow control could lead to packet drops from a switch...
Unless the client fills its buffers and can no longer accept packets maybe...