Chelsio cxl0 (SFP link) going UP 5 minutes after booting
-
Hello,
I have a Chelsio T520 card, in a device running pfSense 2.4.1.
When the system boots, the link on the SFP interface stays DOWN, then goes UP about 5 minutes later.
If I disable the cxl0 interface and then re-enable it, it does not come UP immediately, but only at a multiple of 5 minutes after the last startup. So it seems some periodic job (a cron?) runs every 5 minutes once boot is complete, finally realizes the cxl0 interface should be UP, and brings it back to life.
A few system log entries below (newest first):
Nov 4 14:22:59 check_reload_status Reloading filter
Nov 4 14:22:59 check_reload_status rc.newwanip starting cxl0
Nov 4 14:22:59 php-fpm 72100 /rc.linkup: Hotplug event detected for INT1(opt1) static IP (X.X.X.X )
Nov 4 14:22:58 kernel cxl0: link state changed to UP
Nov 4 14:22:58 check_reload_status Linkup starting cxl0
…..
Nov 4 14:18:04 kernel ix0: link state changed to UP
Nov 4 14:18:04 check_reload_status Linkup starting ix0
Nov 4 14:18:03 check_reload_status Updating all dyndns
Nov 4 14:18:03 kernel done.
.....
Nov 4 14:18:00 kernel cxl0: link state changed to DOWN
Nov 4 14:18:00 check_reload_status Linkup starting cxl0
Any idea how to get an (almost) immediate UP on this interface?
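For reference, the gap between the DOWN and UP events in the log above can be computed from the timestamps. This is only a minimal Python sketch: the two kernel lines are copied from the log, the year is an assumption (the log lines carry none), and the helper names are made up for illustration.

```python
from datetime import datetime

# Two kernel lines copied from the log above.
LOG = """\
Nov 4 14:22:58 kernel cxl0: link state changed to UP
Nov 4 14:18:00 kernel cxl0: link state changed to DOWN
"""

ASSUMED_YEAR = 2017  # assumption: the log lines do not include a year


def link_events(log_text, ifname):
    """Return (timestamp, state) pairs for one interface, oldest first."""
    marker = f" kernel {ifname}: link state changed to "
    events = []
    for line in log_text.splitlines():
        if marker not in line:
            continue
        stamp, state = line.split(marker, 1)
        when = datetime.strptime(f"{ASSUMED_YEAR} {stamp.strip()}",
                                 "%Y %b %d %H:%M:%S")
        events.append((when, state.strip()))
    return sorted(events)


def down_to_up_seconds(events):
    """Seconds between the first DOWN and the next UP, or None if not found."""
    down_at = None
    for when, state in events:
        if state == "DOWN" and down_at is None:
            down_at = when
        elif state == "UP" and down_at is not None:
            return int((when - down_at).total_seconds())
    return None


delay = down_to_up_seconds(link_events(LOG, "cxl0"))
print(delay)  # 298 seconds, i.e. 4 min 58 s
```

So the link came back 4 min 58 s after it went DOWN, which is consistent with the "about 5 minutes" observation.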
The SFP+ cable is passive (Twinax), plugged into a Cisco switch (SG350).
Thank you!
Frederic
-
Are the card and switch firmware up to date?
-
Switch firmware : up to date.
Card firmware : I don't know how to check that, I will have a look.
Thank you
Frederic
-
I used FreeNAS to update Chelsio firmware per link:
https://forum.pfsense.org/index.php?topic=133677.msg735116#msg735116
-
OK, here is what I found:
- the firmware version can be checked with the following command : sysctl -n dev.t5nex.0.firmware_version
- my Chelsio firmware is 1.16.45.0, so it seems the correct version according to https://www.freebsd.org/releases/11.1R/relnotes.html
- the firmware can be automatically updated by FreeBSD if hw.cxgbe.fw_install is set to 1 (installation based on heuristics) or 2 (always install newer firmware). Default is 1.
There is a newer firmware for Linux (1.16.63.0), but not for FreeBSD.
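The version check above can be scripted. This is a minimal Python sketch, not a definitive tool: it assumes the sysctl prints a bare dotted version such as 1.16.45.0 (as reported above), and the function names are hypothetical. The sysctl call only works when run on the FreeBSD / pfSense host itself.

```python
import subprocess


def parse_version(text):
    """Turn a dotted version such as '1.16.45.0' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def installed_firmware(sysctl_name="dev.t5nex.0.firmware_version"):
    """Read the firmware version via sysctl (FreeBSD host only).

    Assumption: the sysctl output is a bare dotted version string.
    """
    result = subprocess.run(["sysctl", "-n", sysctl_name],
                            capture_output=True, text=True, check=True)
    return parse_version(result.stdout)


installed = parse_version("1.16.45.0")   # version reported in this post
linux_only = parse_version("1.16.63.0")  # newer Linux release mentioned above
print(installed < linux_only)  # True: the Linux release is newer
```

Comparing tuples of integers avoids the trap of comparing version strings character by character.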
-
Did you try a different cable?
Perhaps the port has not automatically configured to 1G speed. You may have to manually configure the port.
Hope this helps.
-
The device is currently far away, so I don't have easy physical access.
Next steps for me will be :
- Try to set the speed manually on both sides (Chelsio NIC / Cisco switch)
- Replacing the SFP cable
- Trying with an older version of pfSense
-
Here is a follow up. I tried :
- Replacing the SFP+ Twinax cable, but still with a genuine CISCO cable (but another length and version)
- Plugging the other side of the Twinax cable in another SFP+ switch, to see if it's the Cisco that could be the problem
- Setting manual speed on pfSense for the interface
- Downgrading the pfSense version to 2.4 then 2.3.4
=> The problem persists.
Some additional information :
- When the link is UP, if I unplug / plug back the SFP connector on the Chelsio, the interface comes UP immediately. No delay.
- When the link is disabled / enabled (so it stays DOWN for a few minutes), if I go on the Cisco, disable the link then enable it immediately, the status on the pfSense goes UP immediately. It seems this restarts the negotiation process, forcing the pfSense to react.
- If I plug an SFP-to-RJ45 module in the Chelsio card, then plug an RJ45 cable into a regular switch, there is no problem at all: the interface always comes UP immediately after a startup, or after being deactivated / activated.
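To put numbers on the observations above, the time until the link comes back could be measured by polling ifconfig. This is only a rough Python sketch under stated assumptions: it relies on the "status: active" line that FreeBSD's ifconfig prints for an interface with link, the function names are hypothetical, and it has to run on the pfSense box itself.

```python
import subprocess
import time


def link_is_active(ifconfig_output):
    """True if an ifconfig dump contains a 'status: active' line."""
    for line in ifconfig_output.splitlines():
        if line.strip().startswith("status:"):
            return line.split(":", 1)[1].strip() == "active"
    return False


def seconds_until_active(ifname, timeout=600, poll=1.0):
    """Poll `ifconfig <ifname>` until the link is active; None on timeout."""
    start = time.monotonic()
    while time.monotonic() - start < timeout:
        out = subprocess.run(["ifconfig", ifname],
                             capture_output=True, text=True).stdout
        if link_is_active(out):
            return time.monotonic() - start
        time.sleep(poll)
    return None
```

For example, calling seconds_until_active("cxl0") right after re-enabling the interface would show whether the delay really lines up with a 5-minute boundary.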
It really sounds like a driver / firmware issue. I think it could be a good idea to try with a Chelsio or another brand for the Twinax cable, but I don't have any right now. Usually Twinax cables are pretty compatible across brands.
Could someone try to reproduce this issue?
This could be done on an XG-1540 or XG-1541, or any device with a Chelsio T520.
Thank you!
-
I would give version 2.4.0 a try here. In 2.4.1 there are some VLAN-related failures when the VLANs are on a PPP interface on the WAN port; could this match your problem too?
Normally you could also have a look at the NIC tuning settings for pfSense; perhaps the problem is related to those.
Perhaps there are too many queues (num.queues) and/or the mbuf size entries are too big or too small; this could also be a subtle, unnoticed problem. You can try setting num.queue to 1, 2 or 4 and scaling the mbuf size up from 125000 to 1000000, or down from 125000 to 65000 and lower. Normally this is one of the best cards to use with FreeBSD and pfSense! It might be that the Cisco switch is not coping well because it is perhaps not 10 GbE capable, so there is a link speed mismatch.
-
I tried downgrading to 2.4 and 2.3.4 with no success.
The switch is a Cisco SG350XG-24F, so it has only 10Gb ports. When the interface finally goes UP, it synchronizes at 10Gb with no problem.
I cannot try your suggestions about mbuf and num.queue right away, as the device has to go back to production (I will use RJ45 links instead for the moment), but if I can I will of course try them!