Chelsio T520-SO-CR stops running after a short time and console logs “firmware reports adapter error: insufficient airflow”
-
If you add the 'Thermal Sensors' widget to the dashboard it will show you the NIC temperature.
Ignore the red bar: the widget expects a CPU temperature, and the Chelsio cards run hot under normal conditions. The sensor is in the NIC processor and it's not unusual to see >90C. But if it's shutting down, it may be running even hotter.
There should be an additional fan in the C2758 specifically to keep the NIC cool; it may have failed.
Steve
-
Thanks for your reply.
I will set up the temperature monitor, but it does have the additional 40mm fan, and it's working well.
-
Hmm, at what point do you actually see that error? Is it possible to grab a picture of it?
Steve
-
This picture was taken following a warm reboot, after the 10GbE adapter stopped working.
-
@nzkiwi68 said in Chelsio T520-SO-CR stops running after a short time and console logs “firmware reports adapter error: insufficient airflow”:
firmware reports adapter error: insufficient airflow
Hmm well it certainly seems to be from the cxgbe driver so check the reported temps first.
https://github.com/pfsense/FreeBSD-src/blob/dd2966ccebbf44dee773537c980f53b16045c1c3/sys/dev/cxgbe/common/t4_hw.c#L207
Steve
-
Yes, it seems to be overheating, despite the correct and working additional high airflow 40mm cooling fan.
So, if it's overheating, what do you think, faulty adapter?
-
Just out of curiosity, what is the ambient temperature and environment like?
-
@spoggle Air conditioned server room, a/c set for 20 deg C
-
Hard to say, I don't think we've seen that before. We'd have to see what temps it's actually reporting. Compare that with the NIC in the other node.
Steve
-
If it's the temperature, and if the card isn't under warranty, I would check that the heatsink is well seated and/or try replacing the thermal paste.
-
@kiokoman Thanks!
That's a very good tip. I'll be sure to try that too.
-
@stephenw10 Thanks for your help btw.
I have ordered a replacement NIC card (even if it just ends up as a spare) so I'll post once the new card arrives and we figure out the fault.
-
Were you able to get any temperature readings from the card? You can read it directly from the command line via the sysctls if that's easier for you.
Steve
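For reference, a minimal sketch of reading the card temperature from the sysctls. It assumes the T520 (a T5-based card) attaches as t5nex0 and that the cxgbe driver exposes the chip temperature in Celsius as dev.t5nex.0.temperature; check `sysctl dev.t5nex` on your own box for the actual name. From a shell, `sysctl dev.t5nex.0.temperature` alone would be enough. The Python below only wraps that call so the reading can be logged or compared with the other node; the function names are made up for illustration.

```python
import subprocess

# Assumption: the T520 attaches as t5nex0 and the cxgbe driver reports the
# chip temperature (in Celsius) under this OID. Adjust the unit number if
# the card shows up as t5nex1, etc.
TEMP_OID = "dev.t5nex.0.temperature"


def parse_temperature(sysctl_output):
    """Turn sysctl output ('name: value' or a bare 'value') into an int."""
    value = sysctl_output.strip().split(":")[-1].strip()
    return int(value)


def read_temperature(oid=TEMP_OID):
    """Run sysctl for the given OID and return the reported temperature."""
    result = subprocess.run(
        ["sysctl", oid],
        capture_output=True,
        text=True,
        check=True,  # raise if sysctl fails, e.g. the OID does not exist
    )
    return parse_temperature(result.stdout)
```

Calling `read_temperature()` on each HA node and comparing the two numbers would show whether the failing card really runs hotter than the healthy one.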
-
@stephenw10 That C2758 is currently shut down and the customer is running on the backup HA C2758.
I've ordered a new T520 and when that arrives next week, one evening I'll have a look at the temperature etc. outside of production hours.
I will post my findings.
Thanks again for all your help!
-
Replaced the Chelsio T520 card and it's now stable.
I was too busy and time constrained during the shutdown window to try the old card, measure the temperature, etc., but the old card is certainly faulty. The heatsink seems well attached.
Anyway, the replacement card solved the problem, the C2758 is now stable.
Thanks everyone for your helpful posts.