[SOLVED] 2.4 + i340 = crash?
-
Fresh 2.4 install on:
AM1-MA, 5350 APU, 8 GB RAM, 120 GB SSD, i340-T4 (4 ports, igb0-3), i210-T1 (igb4)
It randomly crashes and reboots (every 1-2 hours, even with almost no traffic). I can't find any useful info in the logs…
Maybe it could be related to this?
https://redmine.pfsense.org/issues/7149
Note: the same hardware showed no issues at all (in about a month of testing) on 2.3.3.
-
Hello, have you tried the following things? They may help here (a minimal loader.conf.local sketch follows after the list):
- Activate PowerD (Hiadaptive)
  On some systems the CPU clock speed is pinned and does not scale up when more horsepower is needed.
- Raise the mbuf size to 1000000
  If enough RAM is available, this gives the limited kernel buffer space more room in RAM.
- Reduce num_queues to 1 (hw.igb.num_queues="1")
  A queue can be created for each CPU core and each LAN port, which here means 4 cores x 6 LAN ports = 24 queues in total.
  Perhaps too much for the system?
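For illustration only, a minimal /boot/loader.conf.local with the two loader tunables suggested above might look like this (values are the ones suggested in this thread, adjust to your hardware; PowerD itself is enabled in the GUI under System / Advanced, not in this file):
kern.ipc.nmbclusters="1000000"
hw.igb.num_queues="1"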
-
@BlueKobold:
Hello, have you tried the following things? They may help here:
- Activate PowerD (Hiadaptive)
  On some systems the CPU clock speed is pinned and does not scale up when more horsepower is needed.
- Raise the mbuf size to 1000000
  If enough RAM is available, this gives the limited kernel buffer space more room in RAM.
- Reduce num_queues to 1 (hw.igb.num_queues="1")
  A queue can be created for each CPU core and each LAN port, which here means 4 cores x 6 LAN ports = 24 queues in total.
  Perhaps too much for the system?
Hi BlueKobold, and thanks for your reply.
Yes, I have already tuned those parameters. Here is my loader.conf.local file:
kern.ipc.nmbclusters="1000000"
hw.igb.num_queues="1"
The only difference is that if I remove the hw.igb.num_queues setting, the system crashes almost immediately (within 1-2 minutes or so)…
Note that, apart from the crashes, the system is very responsive (it pushes 800-900 Mbps between interfaces),
so the problem does not seem to be related to network load???
-
Hi again LucaTo,
you may also want to adjust or play with those numbers to get closer to the optimum settings for your hardware.
For many people the standard numbers work well, but in some cases it is better to set the mbuf size to a lower number
like 65000 (typically together with 10GbE LAN ports), so you should keep testing to get closer to a system that does not fail.
This is not a set-it-and-forget-it tuning; likewise, the PowerD setting can be changed from Hiadaptive to Maximum,
and then your system may run more smoothly. Some combinations to try:

PowerD (Hiadaptive)
kern.ipc.nmbclusters="1000000"
hw.igb.num_queues="4"

PowerD (Hiadaptive)
kern.ipc.nmbclusters="1000000"
hw.igb.num_queues="8"

PowerD (Hiadaptive)
kern.ipc.nmbclusters="250000"
hw.igb.num_queues="4"

PowerD (Hiadaptive)
kern.ipc.nmbclusters="250000"
hw.igb.num_queues="8"
-
After many, many tests, trying and retrying…
it seems that the most stable combination is:
kern.ipc.nmbclusters="1000000"
and no hw.igb.num_queues at all (which means the number of queues is set automatically, if I'm right).
But… even this way I noticed 2 crashes and reboots in the last 12h… I hope the situation will stabilize in the future (if the issue is related to some unknown bug in the 2.4 development branch or in the FreeBSD release).
-
@LucaTo:
After many, many tests, trying and retrying…
it seems that the most stable combination is:
kern.ipc.nmbclusters="1000000"

That would also be my favourite way at this time; only for some rare 10GbE NICs does 65000 work better.

@LucaTo:
and no hw.igb.num_queues at all (which means the number of queues is set automatically, if I'm right).
But… even this way I noticed 2 crashes and reboots in the last 12h…

To have num_queues set automatically, you should configure it for each driver that is in use (igb, ix and so on) inside loader.conf, and please put the settings in a loader.conf.local file so they are not overwritten later; the default value can also vary per driver:
hw.igb.num_queues="0" means the igb driver configures the number of queues automatically (this is the igb default)
hw.ix.num_queues="8" is the default for the ix driver
So please check first which drivers are actually in use for your two NICs (the 4-port card and the single-port card).
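A quick way to check from a shell (the output below is only an illustration that mirrors the interface names already posted in this thread, not taken from your box): ifconfig -l lists the interface names, and the name prefix tells you the driver (igb, ix, em, …); pciconf -lv also shows which kernel driver attached to each NIC.
[2.4.0-BETA][admin@fw]/boot: ifconfig -l
igb0 igb1 igb2 igb3 igb4 lo0 ...
Since all of your ports show up as igb, hw.igb.num_queues would be the only num_queues tunable that matters here.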
Important: Reboot after this change, then verify:
[2.4.0-BETA][admin@fw]/boot: sysctl -a | grep num_queue
vfs.aio.num_queue_count: 0
hw.ix.num_queues: 1
hw.igb.num_queues: 1
Nearly the same situation as yours:
I have 6 igb (Intel PRO/1000) interfaces (4 on the ASUS mainboard and 2 on an Intel 2-port NIC).
The box randomly crashed without a trace. In many cases there was no information about what happened
and I had to hard-reset the box via IPMI. But it was solved in a completely different way:
reproducing the behavior was easy, I just had to run a speedtest on my WAN connection to trigger the crash/hang.
As soon as I added num_queues=1 to loader.conf.local, the crash was no longer reproducible.
-
At the end… I see the light :-)
This is my stable configuration (no more crashes in the last 48h):
kern.ipc.nmbclusters="1000000"
hw.igb.num_queues="0"
hw.igb.rxd=4096
hw.igb.txd=4096
In System / Advanced / Networking:
"Hardware Large Receive Offloading" checked (the other checkboxes deselected).
This way I can push almost 1 Gbps between the i340-T4 ports, with CPU load around 30% and no warnings/errors in the log :-)