Baffling pfSense 2.6.0 Issue (10G Performance)
-
Hi all,
Since upgrading to 2.6, my LAN1 to LAN2 throughput has been cut in half, from ~9.7 Gbits/sec to 4.3 Gbits/sec. I tested using iperf from a LAN1 client to a server on LAN2. I've also tried iperf from a LAN1 client to an iperf server on pfSense, and throughput is ~2 Gbits/sec. Oddly, iperf from a LAN2 client to a LAN1 server gets 9.7 Gbits/sec. Iperf from a pfSense client to a LAN1 server gets 9.7 Gbits/sec. Actual file transfers (SMB) between LAN1 and LAN2 exhibit the same issue (max throughput from LAN1 to LAN2 is about half of LAN2 to LAN1).
I get full 10G speeds on the same LAN segment with the same clients/servers.
The only tunable that changed in the upgrade to 2.6 is net.link.ifqmaxlen, which was overwritten to net.link.ifqmaxlen=128; see the issue here: https://redmine.pfsense.org/issues/12862. I applied the patch and changed the value in loader.conf.local, but it made no difference in speed (the patch results in a conflict between loader.conf and loader.conf.local, so I reverted).
I would really appreciate any tips or suggestions as to how to diagnose/fix this!
Thank you very much for any help.
-
What did you upgrade from?
What hardware are you running?
Do you see any errors on the NICs?
Since it looks like you are seeing different speeds in each direction try opening the state the other way. I.e. run iperf3 with the '-R' switch where it reverses the test direction but still opens the state the same way. Does the slow speed follow the test direction or the state?
Steve
-
@stephenw10 Hi, I upgraded from 2.5.2.
pfSense is running on bare metal on a Supermicro 5018D-FN8T. The NICs are an Intel X552 (LAN1) and an X710 (LAN2).
I see no errors. I've tried the -R switch. The slow speed follows the test direction. In other words, client to server without -R is 4.3 Gbps but with -R is 9.7 Gbps.
Thank you for your help!
-
@anyn12 said in Baffling pfSense 2.6.0 Issue (10G Performance):
client to server without -R is 4.3 Gbps but with -R is 9.7 Gbps
That's from a client on LAN1 to server on LAN2?
That implies it's slower with the X552 receiving. How many queues are those NICs creating? Half as many on the X552?
What does the per-core CPU usage look like while you're testing? Run top -aSH at the CLI.
I'm not aware of anything that would have changed going to 2.6 that would affect that but there might be some default that is calculated differently on your hardware.
I will say that I'm surprised you were seeing 9.4G in 2.5.2 on that hardware.
Steve
-
Client on LAN1 (x552) sending a file to server on LAN2 (x710) is half the speed (4.3 Gbps) of server on LAN2 sending a file to the client. The issue seems to impact LAN2 (x710) in exactly the same way -- LAN2 as client to pfsense as server is half the speed of pfsense as server to LAN2 client.
Both the x552 and x710 have 4 queues. I've never had a problem reaching really close to 10G line speeds, and even now I can hit 9.7 Gbps in both directions if I use more than one stream in iperf. I should have clarified at the outset, I'm using single stream iperf to diagnose the throughput issue for windows file transfers.
This is top LAN1 to LAN2 (~4.5 Gbps):
[top screenshot not available]
This is top LAN2 to LAN1 (~9.5 Gbps):
[top screenshot not available]
-
Testing to or from pfSense is not really a valid test. Did you actually test with a client on LAN2 to a server on LAN1? And was the result the same with the reverse switch?
We need to determine if this is a hardware or a ruleset issue.
None of those top outputs look like a problem.
Steve
-
Yes the screenshots of top I posted above were taken while running an iperf server on LAN2 (server is running FreeBSD) and a client running windows. Yes the -R switch shows the same behavior (full 10G speeds if test initiated from client with -R).
I have done separate tests using LAN to pfsense to be sure the issue was replicated, and it was.
-
Right but just to be clear running with client on LAN1 and server on LAN2 is not the same as running in reverse with the client on LAN2 and server on LAN1. The states are opened in different directions so you would hit different rules in each case.
If all your rules are just pass all that shouldn't make any difference. So if it does that implies pf is doing something unexpected.
Steve
-
Thank you again for all of your suggestions. I think I understand what you mean. I've run the tests both ways:
Client on LAN1 runs iperf -c while server on LAN2 runs iperf -s. Result is ~4.3 Gbps.
Client on LAN1 runs iperf -c -R while server on LAN2 runs iperf -s. Result is ~9.7 Gbps.
Server on LAN1 runs iperf -s while client on LAN2 runs iperf -c. Result is ~9.7 Gbps.
Server on LAN1 runs iperf -s while client on LAN2 runs iperf -c -R. Result is ~4.3 Gbps.
The behavior is the same even if disabling the firewall.
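
One way to read the four results above is to group them by which segment actually sends the bulk data versus which segment opens the connection (and so the firewall state). The sketch below is illustrative only: the Run class and helper names are made up for this example, the throughput figures are the ones reported above, and it assumes that -R makes the iperf server the sender.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    client: str    # segment that initiates the connection (opens the state)
    server: str    # segment running iperf -s
    reverse: bool  # True when the client was started with -R
    gbps: float    # reported throughput

    @property
    def sender(self) -> str:
        # Assumption: without -R the client sends, with -R the server sends.
        return self.server if self.reverse else self.client


# The four tests reported in this post.
runs = [
    Run("LAN1", "LAN2", False, 4.3),
    Run("LAN1", "LAN2", True, 9.7),
    Run("LAN2", "LAN1", False, 9.7),
    Run("LAN2", "LAN1", True, 4.3),
]


def mean_by(runs, key):
    """Average throughput per group, where key(run) names the group."""
    groups = {}
    for r in runs:
        groups.setdefault(key(r), []).append(r.gbps)
    return {k: sum(v) / len(v) for k, v in groups.items()}


def spread(means):
    """Gap between the fastest and slowest group."""
    return max(means.values()) - min(means.values())


by_sender = mean_by(runs, lambda r: r.sender)
by_initiator = mean_by(runs, lambda r: r.client)

# Whichever grouping separates fast from slow runs explains the slowdown.
verdict = (
    "data direction"
    if spread(by_sender) > spread(by_initiator)
    else "state direction"
)
print(by_sender, by_initiator, verdict)
```

With these numbers the averages differ only when grouped by sender, which suggests the slowdown tracks the direction of the data rather than the side that opened the state.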
-
Ok, great, so it is all about the traffic direction and not about the states or firewall at all.
And the slow direction is with the X552 mostly receiving.
And just to be clear you can get full rate in both directions between those boxes if they are both on the same subnet? I think you said above they could but just to be sure.
Are you able to test between interfaces using the same NIC/driver type?
I would bet between two x700 (ixl) NICs you will see full rate. If so we can dig into why ix is suddenly slower there.
Steve
-
I can get full rate in both directions on the windows client and freebsd (freenas) server if they are on the same subnet (I move the windows client from LAN1 to LAN2 so both freenas server and client are on same subnet).
I've also tested by moving LAN1 from the x552 and putting it on an x710, so that both LAN1 and LAN2 are on x710s. I actually get worse performance - about 4.3Gbps throughput on transmit from LAN1 to LAN2, but ~7.5 Gbps throughput on transmit from LAN2 to LAN1.
That leads me to think it isn't an issue with the x552 alone.
-
Mmm, indeed. Potentially an issue with the x710. Can you test both on x500 NICs?
-
It would be tough to reconfigure to use x552s on both LANs. To try to rule out a specific NIC, I did iperf tests between the x552 and pfsense and the x710 and pfsense. Both showed the same issue: ~2 Gbps from LAN as client to router as server, but 9.7 Gbps from router as client to LAN as server.
I understand your point that a test to the router isn't necessarily indicative of performance.
It seems like something to do maybe with queues not being used on the receive end, or something causing underutilization of CPU on receive?
This person had what looks like a similar issue but not sure if it was resolved:
https://forum.netgate.com/topic/131517/10gbps-performance-issue
-
That thread is not likely related. It's using a much older version and you were seeing correct throughput in 2.5.2. And it's using the bxe driver which is very different to ix/ixl.
-
Hi Stephen,
I think your guess on the x710 may have been right. I enabled RX flow control on the x710, per this thread: https://forum.netgate.com/topic/162333/intel-x710-issues
After enabling flow control, I can get full 10Gbps throughput on iperf from LAN1 (x552) to LAN2 (x710). Real-world file transfers from LAN1 to LAN2 are about 1 GB/s.
But, it is intermittent: every few seconds during iperf test I get a drop back down to 4.3 Gbps, and sometimes the throughput drops back to 4.3 Gbps for an extended time. I've attached an iperf graph.
Do you have any thoughts on what could be happening here?
-
Hmm, well often that is some sort of TCP windowing issue. However that is negotiated between the endpoints and that, I assume, has not changed since you were running 2.5.2?
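
For context on why windowing can cap a single stream: a TCP sender can have at most one window of data in flight per round trip, so throughput is bounded by window size divided by round-trip time. The sketch below only does that arithmetic. The function names are made up, and the window size and RTT are hypothetical values chosen for illustration, not measurements from this setup.

```python
def max_throughput_gbps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on single-stream TCP throughput: one window per round trip."""
    return window_bytes * 8 / rtt_seconds / 1e9


def window_needed_bytes(target_gbps: float, rtt_seconds: float) -> float:
    """Bandwidth-delay product: bytes in flight needed to sustain target_gbps."""
    return target_gbps * 1e9 * rtt_seconds / 8


# Hypothetical inputs: a 64 KiB window and a 100 microsecond round trip.
window = 64 * 1024
rtt = 100e-6

print(f"ceiling with this window: {max_throughput_gbps(window, rtt):.2f} Gbit/s")
print(f"window needed for 9.7 Gbit/s: {window_needed_bytes(9.7, rtt):.0f} bytes")
```

Under those assumed values the ceiling is roughly 5.2 Gbit/s, so anything that shrinks the effective window or stretches the round trip can halve a single stream while several parallel streams still fill the link.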
-
The endpoint configurations have not changed since 2.5.2. And, the endpoints get 10Gbps both directions when on same segment. Do you think the fact that iperf between pfsense and endpoints shows 9.7Gbps from pfsense to endpoints but ~2 Gbps from endpoints to pfsense is an indicator of anything?
I'm not sure if this is helpful, but UDP tests with iperf show 10Gbps in both directions.
-
It shows that pfSense is a bad TCP endpoint but that is known. It's configured to be a router not a server. Testing to or from pfSense directly always gives poor results. It's a useful test to prove it's linked at >1Gbps but not much else.
It also shows the receive side is significantly more processor intensive.
The UDP test shows zero packet loss at 10G line rate?
Steve
-
Packet loss per iperf UDP test is about 0.15% both ways.
Also, enabling flow control looked like a red herring, I'm back at a steady 4.3 Gbps. I also noticed the same thing when enabling powerD - seemed like the problem was fixed but then after a few hours the problem reappeared. Something to do with changing settings and rebooting?
I guess if you don't have any other thoughts I'll reconfigure to use the x552 for both LANs and see what happens -- I'm getting desperate!
-
Yeah, I would try to test using the x500 for both if you can.
If you simply reboot does it pass at full speed for some time?
Steve
-
A reboot doesn't change anything. But, I did notice with the powerD change, and then with the flow control change, throughput was restored for a short time. I've made tons of tunable changes with reboots that have had no impact on throughput.
I'll reconfigure the x500s and report back. Thanks again for helping me.
-
Steve - you were correct, moving both LANs to the x552s solved the problem. I now get 9.8 Gbps both ways on iperf single stream and real-world file transfers are at 10G speeds. I am very grateful for your help!
But, any guess as to why the x710 doesn't perform? I now need to use the x710 for WAN connections so I still have an interest in sorting this out.
-
Hmm, well it's almost certainly some change in the ixl driver.
What exact card are you using? How does it appear in pciconf -lv?
Are you able to test a 2.7 snapshot? It's possible this has already been solved.
Steve
-
In pciconf it appears as "Ethernet Controller X710 for 10GbE SFP+"
I'll give 2.7 a try and report back.
-
Do you have the actual PCI IDs shown? It could be something specific to the chip.
-
We see the same odd performance on several NICs and it's related to cache and writes on the hardware.
-
Hi Steve,
Here is the full output:
ixl0@pci0:6:0:0: class=0x020000 card=0x02581374 chip=0x15728086 rev=0x02 hdr=0x00
    vendor = 'Intel Corporation'
    device = 'Ethernet Controller X710 for 10GbE SFP+'
    class = network
    subclass = ethernet
ixl1@pci0:6:0:1: class=0x020000 card=0x00001374 chip=0x15728086 rev=0x02 hdr=0x00
    vendor = 'Intel Corporation'
    device = 'Ethernet Controller X710 for 10GbE SFP+'
    class = network
    subclass = ethernet
-
Thank you -- can you point me to any online discussion/thread where this is discussed so I can follow along?
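
As background on the PCI IDs requested earlier in the thread: in pciconf -lv output of this form, the chip= field packs the device ID in the upper 16 bits and the vendor ID in the lower 16 bits, and card= does the same for the subsystem IDs. A minimal sketch that splits them, using the ixl0 line posted above; the function name split_pci_ids is hypothetical.

```python
import re


def split_pci_ids(pciconf_line: str) -> dict:
    """Split the packed chip= and card= fields of a pciconf -lv header line."""
    layout = {
        "chip": ("device", "vendor"),
        "card": ("subdevice", "subvendor"),
    }
    ids = {}
    for field, (high_name, low_name) in layout.items():
        match = re.search(rf"\b{field}=0x([0-9a-fA-F]{{8}})\b", pciconf_line)
        if match:
            value = int(match.group(1), 16)
            ids[high_name] = value >> 16    # upper 16 bits
            ids[low_name] = value & 0xFFFF  # lower 16 bits
    return ids


# Header line for ixl0 as posted in this thread.
line = (
    "ixl0@pci0:6:0:0: class=0x020000 card=0x02581374 "
    "chip=0x15728086 rev=0x02 hdr=0x00"
)
ids = split_pci_ids(line)
print({name: f"0x{value:04x}" for name, value in ids.items()})
```

For that line the split gives vendor 0x8086 (Intel) and device 0x1572, which are the values to quote when checking whether a driver or firmware note applies to a specific chip.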