Multicore forwarding
-
@oudenos What is the system setup? How many sockets, where is the NIC located, clock speed, core counts, memory amount and layout (1 DIMM or 6 DIMMs, memory clock), etc.?
Was your testing of the Intel X710 in the same system or something else?
-
@derelict OK, so I run TNSR inside KVM with PCIe device passthrough for the NIC. The hypervisor itself is a two-socket NUMA system, but I allocated 8 cores from the same NUMA node and 8 GB of RAM to the VM.
The Intel X710 runs inside an identical system on another hypervisor.
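In case it matters, the pinning on the hypervisor was done roughly like this (just a sketch of my setup; the domain name tnsr-vm and host cores 0-7 are placeholders):
# show which host cores and memory belong to which NUMA node
numactl --hardware
# pin the 8 vCPUs one-to-one onto host cores 0-7 (all on node 0 in this example)
for i in $(seq 0 7); do virsh vcpupin tnsr-vm $i $i; done
# keep the guest memory on node 0 as well
virsh numatune tnsr-vm --mode strict --nodeset 0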
-
Did you allocate the cores on the same NUMA as the CX4 resides?
The PCIe slots are connected directly to one of the two CPUs' PCIe controllers. If the NIC is on NUMA1 and TNSR is running on NUMA0, then every PCIe request has to go from Socket 0 across to Socket 1, then to the NIC and back. That will put the hurt on performance.
-
@derelict How can I check that?
-
@oudenos Something like this might help you map your system:
apt-get update
apt-get install hwloc
lstopo --output-format png > ~tnsr/lstopo.png
Then scp that image off and view it with your preferred method.
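If you just want a quick text check instead of the picture, the numa_node attribute in sysfs reports which node owns the PCIe device (the address below is only an example; take yours from lspci):
lspci -D | grep -i ethernet
cat /sys/bus/pci/devices/0000:3b:00.0/numa_node
A value of -1 means the platform isn't reporting locality, in which case the lstopo picture is the way to go.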
-
@derelict Thank you for your help and sorry for the delay.
As you correctly pointed out, the NIC is owned by the "wrong" CPU. However, the same happens with the Intel one. Now I'm getting inconsistent results across reboots, which leads me to think some sort of receive side scaling is involved. I'm working to set up a more realistic testbed with TRex as the traffic generator and will also address NUMA pinning. I will get back to you as soon as I have results worth sharing with the community.
-
I'm following the thread.
I'm looking forward to the results.
Good work!
-
@lukecage unfortunately, there is not much to share.
I noticed that changing the number of queues at runtime often requires rebooting both the VM (TNSR) and the hypervisor (Ubuntu 20.04 + KVM) to work properly, otherwise many packets get lost. This happens with both the Mellanox and the Intel, though the former seems to be "more affected". No idea why; I suspect the only way to correctly reinitialize the NIC is to power-cycle it, or maybe I did something wrong on the KVM side. (*)
Also, I'm pretty sure both NICs use Receive Side Scaling (RSS), and I had to switch my traffic generator to TRex in order to have more entropy. I tried this on an Intel E810 card @ 100 Gbps, but it doesn't scale to more than one CPU core. Again, I believe something queue-related went wrong; perhaps I should try the latest versions of VPP, DPDK, driver and firmware, but AFAIK that requires building a TNSR-like distro from scratch. (*)
If someone is willing to give it a try, it may be worth starting with bare metal rather than KVM: DPDK uses hugepages and KVM by default only supports 2 MB hugepages, not 1 GB.
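For anyone who wants to pick this up: as far as I can tell, the knobs I was experimenting with end up in the cpu and dpdk stanzas of VPP's startup.conf, which TNSR manages from its own dataplane configuration. Roughly like this (just a sketch; the PCI address, core list and queue counts are placeholders):
cpu {
  main-core 0
  corelist-workers 1-4
}
dpdk {
  dev 0000:3b:00.0 {
    num-rx-queues 4
    num-tx-queues 4
  }
}
Running grep Huge /proc/meminfo on both the guest and the hypervisor shows which hugepage sizes were actually set up.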
Eventually, I decided to give up on this for the moment. It's a shame, but I don't have enough time to do all the tests.
Thank you very much to the community and @Derelict for the help they provided.
-
@oudenos Which CPU are you using?
My goal is 38M pps over a GRE tunnel (I can't test more because my upstream doesn't allow it).
Directly on the interface: 98M pps.
And why do you need to run multiple cores? TNSR performance is fine.
-
@lukecage Intel Xeon Silver 4210R
With proper NUMA pinning I can achieve 5.5 Mpps of IP forwarding per core. Splitting into multiple queues should enable multicore processing. How did you reach those numbers without tuning?
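To see whether the extra queues actually land on separate workers, I've been checking the per-thread counters from the dataplane, e.g.:
dataplane shell sudo vppctl show threads
dataplane shell sudo vppctl show runtime
If only one worker shows meaningful vectors/calls in show runtime, the traffic isn't being spread across the queues.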
-
I use it directly on bare metal,
not via VMware or any other virtualization. After all, you are building a router; if you need high capacity, you should install TNSR directly on the server.
My hardware specs:
i9-9900K
32 GB RAM
240 GB SSD

uptime
14:51:44 up 69 days, 5:37, 2 users, load average: 1.00, 1.00, 1.00

I was using the CentOS version of TNSR before; as you can tell from the uptime, I switched to Ubuntu on that date, and I have been running it on Ubuntu without any problems since.
-
@lukecage Please run the following and post the result
dataplane shell sudo vppctl show hardware-interfaces
-
@oudenos Check your private messages.