How to get older version of TNSR (20.02.2) ?
-
@Derelict We used that for reference. We bought a NIC matching this entry:
mlx4 - for NICs based on the following Mellanox 10/40 Gigabit Ethernet controllers: ConnectX-3, ConnectX-3 Pro
since it's listed under Recommended Components, and it's not working in the newest version of TNSR.
So we don't know whether that document has been updated for the newest TNSR version.
We spent a few days troubleshooting why it wasn't working. Then we found this note in the dataplane section of the release notes:
DPDK does not function with Mellanox ConnectX-3 drivers [3781]
(We suspect this is why it isn't working.)
We don't want to buy more cards that may also not work.
If somebody can confirm whether that card is supported, and will stay supported, we would appreciate it.
Thank you.
-
I understand that and have already brought up that inconsistency internally.
Your options right now are to use another card or wait for that to be fixed.
-
@Derelict Thank you for your time. We will order the Intel cards and we will see.
-
@derelict said in How to get older version of TNSR (20.02.2) ?:
I understand that and have already brought up that inconsistency internally.
I haven't been able to get either a ConnectX-3 (MCX311A-XCA) or a ConnectX-5 (MCX512A-ACAT) to work in TNSR version 'tnsr-v20.10.1-2'.
For the ConnectX-5, I tried both the firmware version the docs say to use and the latest version, as advised on the forums.
There isn't much debuggability, but I did catch these errors in /var/log/messages:
Dec 30 23:35:25 pinetree systemd[1]: Started Vector Packet Processing Process.
Dec 30 23:35:25 pinetree vpp[1658]: net_mlx4: cannot load glue library: libmlx4.so.1: cannot open shared object file: No such file or directory
Dec 30 23:35:25 pinetree vpp[1658]: net_mlx4: cannot initialize PMD due to missing run-time dependency on rdma-core libraries (libibverbs, libmlx4)
Dec 30 23:35:25 pinetree vpp[1658]: common_mlx5: Cannot load glue library: libmlx5.so.1: cannot open shared object file: No such file or directory
Dec 30 23:35:25 pinetree vpp[1658]: common_mlx5: Cannot initialize MLX5 common due to missing run-time dependency on rdma-core libraries (libibverbs, libmlx5)
However, I don't know whether this problem has been fixed in the meantime. There are many other things that can go wrong (Mellanox firmware version, mlxconfig settings, virtualization PCI pass-through, TNSR host ownership of the card).
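A quick way to check whether those glue libraries are present at all (a sketch for a generic Linux host; the library names are taken from the log above) is to ask the dynamic linker's cache:

```shell
# Check the linker cache for the rdma-core libraries that the
# net_mlx4 / common_mlx5 PMDs complain about in /var/log/messages.
for lib in libmlx4.so.1 libmlx5.so.1 libibverbs.so.1; do
    if ldconfig -p 2>/dev/null | grep -q "$lib"; then
        echo "$lib: found"
    else
        echo "$lib: MISSING"
    fi
done
```

If any of the three come back MISSING, VPP's DPDK Mellanox drivers have nothing to dlopen and will print exactly those errors.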
I would second jozefrebjak's reluctance to buy yet another card that might not work, because... I have already bought another card, and it didn't work.
I see the documentation this thread points to still recommends both Mellanox cards, ConnectX-3 and ConnectX-5.
What's the current status?
-
@carton
dnf install rdma-core libibverbs
-
@derelict Yeah, that feels close, but I couldn't get past this:
[carton@pinetree ~]$ sudo dnf install libibverbs
[...]
Error:
 Problem: package libibverbs-29.0-3.el8.x86_64 requires rdma-core(x86-64) = 29.0-3.el8, but none of the providers can be installed
  - cannot install the best candidate for the job
  - package rdma-core-29.0-3.el8.x86_64 is filtered out by modular filtering
(try to add '--skip-broken' to skip uninstallable packages or '--nobest' to use not only best candidate packages)
"What is modular filtering?" and "Which module is filtering it, and why?" are probably questions that can be answered, but after spending half a day on it I can't answer them. It looks like a terrible design on Red Hat's part, imho.
BTW, I can't install OFED on the host either (not that that would be a good idea, but just for reference, because the error is the same). It complains that gcc-gfortran is blocked by modular filtering.
I installed a stock CentOS 8 system (to get OFED, just to see if it could resurrect the cards), and the "modular filtering" problems are not present there:
[carton@mlnx ~]$ sudo dnf install rdma-core libibverbs
[sudo] password for carton:
Last metadata expiration check: 0:31:07 ago on Thu 31 Dec 2020 08:12:55 AM GMT.
Package mlnx-ofa_kernel-4.9-OFED.4.9.2.2.4.1.rhel8u2.x86_64 is already installed.
Package libibverbs-41mlnx1-OFED.4.9.0.0.7.49224.x86_64 is already installed.
Dependencies resolved.
Nothing to do.
Complete!
So I assumed it was something TNSR did deliberately, and gave up early.
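For what it's worth, one way to poke at the module state (a sketch assuming a dnf-based host; on RHEL/CentOS 8, packages belonging to a module stream that is not enabled are hidden, i.e. "filtered", from ordinary dependency resolution; the `rdma-core` package name comes from the error above):

```shell
# Inspect dnf module state: packages in a module stream that is not
# enabled are hidden ("modular filtering") from plain 'dnf install'.
if ! command -v dnf >/dev/null 2>&1; then
    echo "dnf not present on this host; skipping"
    exit 0
fi
dnf -q module list 2>/dev/null || true          # modules with [e]nabled stream markers
dnf -q repoquery rdma-core 2>/dev/null || true  # rdma-core builds the repos carry
```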
-
@carton
Unsupported, but if you want to try:
sudo dnf install wget
wget http://mirror.centos.org/centos/8/BaseOS/x86_64/os/Packages/rdma-core-29.0-3.el8.x86_64.rpm
sudo rpm --install rdma-core-29.0-3.el8.x86_64.rpm
sudo dnf install libibverbs
If you want to install it with dnf instead, you probably need to reset BaseOS with
sudo dnf module reset BaseOS
This will unlock a lot of stuff and is probably dangerous. After that you should be able to install it with
sudo dnf install libibverbs rdma-core
I don't know why it's filtered; maybe there are problems with tnsr+rdma-core.
-
@kiokoman said in How to get older version of TNSR (20.02.2) ?:
sudo dnf module reset BaseOS
sudo dnf install libibverbs rdma-core
Thanks! That got rid of the libmlx{4,5}.so.1 errors in /var/log/messages, and I don't see any new errors.
Sadly, the ConnectX-5 still doesn't work for me, though.
-
I got tnsr-v20.10.1-2 working under virtualization with an Intel XXV710-DA2, though it still presented some challenges:
- Upgrading the firmware to 7.30, as suggested here to match TNSR's DPDK version, wouldn't work from within virtualization. I had to put the card in a bare-metal machine; otherwise ./nvmupdate64e found the card but showed "Access Error" on the right-hand side of the card table.
- After downloading Intel's firmware tool, a.k.a. "NVM", for the old 7.30 revision, the tool at first refused to touch the card, reporting "no update available" while showing version "6.128(6.80)". The documented versions of Intel i40e firmware seem to correspond to the "hex" version in parentheses shown by nvmupdate64e, yet, in their typical style, both are needlessly shown. Searching the web showed there are a lot of OEM cards for which Intel tries to force you onto the OEM's payware service plans for updates. But on a close reading of Intel's docs 4.0, 'ethtool -i <device>' reveals the "EtrackID" as the second field of "firmware-version", a hex number like 0x8000xxxx. Adding this number to the REPLACES: field of nvmupdate.cfg for a similar card (good luck!) will force the update through anyway.
- Intel's MAC is picky about SFP+ modules. A module with a Cisco SROM worked; a Dell module that works fine in a ConnectX-3 EN didn't. The updated 7.30 firmware printed a dmesg warning about disliking the module on each insertion, but the older 6.80 firmware silently showed link down, IIRC.
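The EtrackID extraction described above can be scripted. This is a sketch that parses the 'ethtool -i' output format; the sample line below stands in for real output, and its values are made up:

```shell
# Extract the EtrackID (second field of the "firmware-version" value in
# 'ethtool -i <device>' output) to feed into nvmupdate.cfg's REPLACES: line.
# Sample line in place of live 'ethtool -i' output (values are hypothetical):
fw_line='firmware-version: 6.80 0x80003d74 1.2007.0'
etrack_id=$(echo "$fw_line" | awk '{print $3}')   # 3rd awk field = 2nd value field
echo "EtrackID: $etrack_id"
```

On a real host you would replace the sample line with `fw_line=$(ethtool -i <device> | grep firmware-version)`.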
For me, my bias/impression that the Intel parts would be overcomplicated and buggy compared to Mellanox was confirmed. There could be something subtly wrong with my virtualization config, or something I can't even think of, blocking the ConnectX-3 and ConnectX-5 from working; but, partially arguing against that, I can at least confirm that the Intel XXV710-DA2 works with TNSR in a controlled situation where the Mellanox parts don't.