Chelsio card becoming unresponsive on PFSense 2.2.6
-
Hi,
We have tried Chelsio T520 series cards in our PFSense firewall. The first was a T520-LL-CR, the second was a T520-CR. Both cards exhibit the same problem. The second card was the very model that PFSense offers for sale on their store. And, from what I've read, PFSense appliances use Chelsio 10G cards without incident. So, naturally I'm confused when the cards we installed in our hardware become unresponsive, and ultimately require us to reboot our firewall to bring it back up. Specifically, we get the following error in the system logs….
Jun 6 15:02:55 kernel: [zone: mbuf] kern.ipc.nmbufs limit reached
Other than this message, the only thing we have noticed is an increase in interrupts in the RRD graph at about the same time. Research on this message indicates that some system tunables need to be modified. But we've tried a few of the changes mentioned in this forum and the problem still occurs. It happens about every 24 hours. Once the message above appears, everything goes downhill and a reboot is required. Can anyone offer any suggestions on how to resolve this problem? Any replies are appreciated. Thanks.
Eric
-
Jun 6 15:02:55 kernel: [zone: mbuf] kern.ipc.nmbufs limit reached
You should raise the mbuf limit to 1000000 or 2000000 in your case, but only if you have enough RAM installed. Otherwise you could end up in a so-called boot loop because the system has too little RAM. Also, please create a loader.conf.local file and put these changes in it as well.
pfSense NIC tuning guide
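As a rough sketch of what that file could look like, assuming the limit meant here is kern.ipc.nmbclusters (the tunable named in the follow-up reply) and using the lower of the two values suggested above:

```
# /boot/loader.conf.local
# Assumption: kern.ipc.nmbclusters is the tunable being raised.
# Use 2000000 instead only if the system has enough RAM.
kern.ipc.nmbclusters="1000000"
```

Settings in this file are read at boot, so a reboot is needed before a new value takes effect.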
Thanks for your reply. The nmbclusters value is already at 1000000. We can try 2000000. The system has 16GB of RAM. But somehow I don't think it is the mbuf size, as the mbuf usage on the dashboard never goes much above 36k. The dashboard indicator usually says 4% (36230/1000000). Regardless of what the error message says, I believe it may be a result of the initial problem, which could have been caused by the interrupt spikes we noticed in the RRD graph at about the time the error message occurred. If the interrupt spike caused the card to become unresponsive, then the mbuf limit would eventually have been reached. Regardless, we'll try the 2000000 setting. Any other suggestions besides the tuning guide would be helpful, as we have tried almost all of the tuning suggestions in it. Thanks.
Eric
-
Do you have any active cooling for the Chelsio cards?
Those units can run rather hot if you do not have forced airflow over them.
-
Yes. We do have fans keeping a good flow over the card. It stays within the operating temperature laid out in the card specs.
-
Can you share which tunings you've tried?
I believe they'd be mostly related to MSI/MSI-X, but what else have you tried?
-
There's an mbuf leak under some circumstances in the Chelsio driver. Almost certain that's what you're running into, given the description. It's been fixed in newer, not yet released OS versions. The workaround in the meantime is to put the following in /boot/loader.conf.local:
hw.cxgbe.allow_mbufs_in_cluster=0
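To check whether the leak actually stops after the reboot, one option is to log the mbuf count over time and see whether it keeps climbing. The following is only a sketch: the function names and the log path are made up for illustration, and it assumes `netstat -m` prints a line of the form `current/cache/total mbufs in use`, as it does on FreeBSD.

```sh
#!/bin/sh
# Print the "current" mbuf count from `netstat -m` output read on stdin.
# Assumes the first slash-separated field of the "mbufs in use" line
# is the current count.
mbufs_in_use() {
    awk -F/ '/mbufs in use/ { print $1; exit }'
}

# Append one timestamped sample to a log file.
# The default path is hypothetical; pass another path as the first argument.
log_mbufs() {
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" \
        "$(netstat -m | mbufs_in_use)" >> "${1:-/var/log/mbuf_usage.log}"
}
```

Calling `log_mbufs` every few minutes (from cron, for example) gives a series that can be compared against the interrupt graph. After the reboot, `kenv hw.cxgbe.allow_mbufs_in_cluster` should print 0 if the loader picked up the setting.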