Baffling pfSense 2.6.0 Issue (10G Performance)
-
Yes, the screenshots of top I posted above were taken while running an iperf server on LAN2 (the server runs FreeBSD) and a client running Windows. Yes, the -R switch shows the same behavior (full 10G speeds if the test is initiated from the client with -R).
I have done separate tests from LAN to pfSense to be sure the issue was replicated, and it was.
-
Right, but just to be clear, running with the client on LAN1 and the server on LAN2 is not the same as running in reverse with the client on LAN2 and the server on LAN1. The states are opened in different directions, so you would hit different rules in each case.
If all your rules are just pass all, that shouldn't make any difference. So if it does, that implies pf is doing something unexpected.
Steve
-
Thank you again for all of your suggestions. I think I understand what you mean. I've run the tests both ways:
Client on LAN1 runs iperf -c while server on LAN2 runs iperf -s. Result is ~4.3 Gbps.
Client on LAN1 runs iperf -c -R while server on LAN2 runs iperf -s. Result is ~9.7 Gbps.
Server on LAN1 runs iperf -s while client on LAN2 runs iperf -c. Result is ~9.7 Gbps.
Server on LAN1 runs iperf -s while client on LAN2 runs iperf -c -R. Result is ~4.3 Gbps.
The behavior is the same even with the firewall disabled.
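For what it's worth, the four results above line up once each run is labeled by which way the bulk data flows rather than by which box is the client. Here is a minimal Python sketch, assuming the usual iperf behavior (the client sends by default, and -R makes the server send). The function and variable names are just illustrative:

```python
# Each run: (client LAN, server LAN, used -R?, measured Gbps),
# taken from the four tests listed above.
RUNS = [
    ("LAN1", "LAN2", False, 4.3),
    ("LAN1", "LAN2", True, 9.7),
    ("LAN2", "LAN1", False, 9.7),
    ("LAN2", "LAN1", True, 4.3),
]


def data_direction(client, server, reverse):
    """Return (sender, receiver) for the bulk data.

    Assumes standard iperf semantics: the client sends unless -R is
    given, in which case the server sends.
    """
    return (server, client) if reverse else (client, server)


def group_by_direction(runs):
    """Map each (sender, receiver) pair to the throughputs measured."""
    grouped = {}
    for client, server, reverse, gbps in runs:
        key = data_direction(client, server, reverse)
        grouped.setdefault(key, []).append(gbps)
    return grouped


if __name__ == "__main__":
    for (src, dst), rates in sorted(group_by_direction(RUNS).items()):
        print(f"{src} -> {dst}: {rates} Gbps")
```

Grouped this way, both runs with data flowing LAN1 to LAN2 come out at ~4.3 Gbps and both runs with data flowing LAN2 to LAN1 at ~9.7 Gbps, regardless of which side opened the connection.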
-
OK, great, so it is all about the traffic direction and not about the states or the firewall at all.
And the slow direction is with the X552 mostly receiving.
And just to be clear you can get full rate in both directions between those boxes if they are both on the same subnet? I think you said above they could but just to be sure.
Are you able to test between interfaces using the same NIC/driver type?
I would bet between two x700 (ixl) NICs you will see full rate. If so we can dig into why ix is suddenly slower there.
Steve
-
I can get full rate in both directions between the Windows client and the FreeBSD (FreeNAS) server if they are on the same subnet (I moved the Windows client from LAN1 to LAN2 so that the FreeNAS server and the client are on the same subnet).
I've also tested by moving LAN1 from the x552 and putting it on an x710, so that both LAN1 and LAN2 are on x710s. I actually get worse performance - about 4.3Gbps throughput on transmit from LAN1 to LAN2, but ~7.5 Gbps throughput on transmit from LAN2 to LAN1.
That leads me to think it isn't an issue with the x552 alone.
-
Mmm, indeed. Potentially an issue with the x710. Can you test both on x500 NICs?
-
It would be tough to reconfigure to use x552s on both LANs. To try to rule out a specific NIC, I ran iperf tests between the x552 and pfSense and between the x710 and pfSense. Both showed the same issue: ~2 Gbps from LAN as client to router as server, but 9.7 Gbps from router as client to LAN as server.
I understand your point that a test to the router isn't necessarily indicative of performance.
It seems like it may have something to do with queues not being used on the receive side, or something causing underutilization of the CPU on receive?
This person had what looks like a similar issue but not sure if it was resolved:
https://forum.netgate.com/topic/131517/10gbps-performance-issue
-
That thread is not likely related. It's using a much older version and you were seeing correct throughput in 2.5.2. And it's using the bxe driver which is very different to ix/ixl.
-
Hi Stephen,
I think your guess on the x710 may have been right. I enabled RX flow control on the x710, per this thread: https://forum.netgate.com/topic/162333/intel-x710-issues
After enabling flow control, I can get full 10Gbps throughput on iperf from LAN1 (x552) to LAN2 (x710). Real-world file transfers from LAN1 to LAN2 are about 1 GB/s.
But it is intermittent: every few seconds during the iperf test I get a drop back down to 4.3 Gbps, and sometimes the throughput stays at 4.3 Gbps for an extended time. I've attached an iperf graph.
Do you have any thoughts on what could be happening here?
-
Hmm, well often that is some sort of TCP windowing issue. However that is negotiated between the endpoints and that, I assume, has not changed since you were running 2.5.2?
-
The endpoint configurations have not changed since 2.5.2. And the endpoints get 10Gbps in both directions when on the same segment. Do you think the fact that iperf between pfSense and the endpoints shows 9.7Gbps from pfSense to the endpoints but ~2 Gbps from the endpoints to pfSense is an indicator of anything?
I'm not sure if this is helpful, but UDP tests with iperf show 10Gbps in both directions.
-
It shows that pfSense is a bad TCP endpoint but that is known. It's configured to be a router not a server. Testing to or from pfSense directly always gives poor results. It's a useful test to prove it's linked at >1Gbps but not much else.
It also shows the receive side is significantly more processor intensive.
The UDP test shows zero packet loss at 10G line rate?
Steve
-
Packet loss per iperf UDP test is about 0.15% both ways.
Also, enabling flow control looked like a red herring, I'm back at a steady 4.3 Gbps. I also noticed the same thing when enabling powerD - seemed like the problem was fixed but then after a few hours the problem reappeared. Something to do with changing settings and rebooting?
I guess if you don't have any other thoughts I'll reconfigure to use the x552 for both LANs and see what happens -- I'm getting desperate!
-
Yeah, I would try to test using the x500 for both if you can.
If you simply reboot does it pass at full speed for some time?
Steve
-
A reboot doesn't change anything. But, I did notice with the powerD change, and then with the flow control change, throughput was restored for a short time. I've made tons of tunable changes with reboots that have had no impact on throughput.
I'll reconfigure the x500s and report back. Thanks again for helping me.
-
Steve - you were correct, moving both LANs to the x552s solved the problem. I now get 9.8 Gbps both ways on iperf single stream and real-world file transfers are at 10G speeds. I am very grateful for your help!
But, any guess as to why the x710 doesn't perform? I now need to use the x710 for WAN connections so I still have an interest in sorting this out.
-
Hmm, well it's almost certainly some change in the ixl driver.
What exact card are you using? How does it appear in pciconf -lv?
Are you able to test a 2.7 snapshot? It's possible this has already been solved.
Steve
-
In pciconf it appears as "Ethernet Controller X710 for 10GbE SFP+"
I'll give 2.7 a try and report back.
-
Do you have the actual PCI IDs shown? It could be something specific to the chip.
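In case it helps, the IDs are on the first line of each pciconf -lv entry. Here is a rough Python sketch for pulling them out. It assumes the FreeBSD 12-style header, where chip=0xDDDDVVVV packs the device ID in the upper 16 bits and the vendor ID in the lower 16. The sample line is illustrative only, not output from the box in this thread, and parse_chip_ids is a made-up helper name:

```python
import re

# Illustrative sample only; NOT taken from the system discussed here.
SAMPLE = (
    "ixl0@pci0:1:0:0:\tclass=0x020000 card=0x00008086 "
    "chip=0x15728086 rev=0x01 hdr=0x00"
)


def parse_chip_ids(header_line):
    """Return (driver_unit, vendor_id, device_id) from a pciconf header line.

    Assumes the 'chip=0xDDDDVVVV' format; returns None if that field
    is not present on the line.
    """
    name = header_line.split("@", 1)[0]
    match = re.search(r"\bchip=0x([0-9a-fA-F]{8})\b", header_line)
    if match is None:
        return None
    chip = int(match.group(1), 16)
    vendor_id = chip & 0xFFFF          # low 16 bits
    device_id = (chip >> 16) & 0xFFFF  # high 16 bits
    return name, vendor_id, device_id


if __name__ == "__main__":
    name, vendor, device = parse_chip_ids(SAMPLE)
    print(f"{name}: vendor=0x{vendor:04x} device=0x{device:04x}")
```

On the firewall itself, something like pciconf -lv | grep -A4 '^ixl' should show just the ixl entries, and the chip= and card= values on the header line are the ones to post.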
-
We see the same odd performance on several NICs, and it's related to cache and writes on the hardware.