Super Micro C2758 crashes

athurdent

Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 06
fault virtual address = 0x78
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80e33224
stack pointer = 0x28:0xfffffe01ed0977e0
frame pointer = 0x28:0xfffffe01ed097860
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 12 (irq269: igb1:que 3)

I have uploaded the crash info; this is the second time it has happened in the last 24 hours.
Uploading IP ends with .143.1.27

Edit: kern.ipc.nmbclusters="1000000" has been configured

chrismacmahon

Looking at both crash reports (yesterday's and today's), you are getting different call-outs.

Potentially add hw.igb.num_queues=1 to loader.conf.
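
For anyone following along, here is a rough sketch of one way to set this from a shell. It assumes the usual pfSense convention of keeping custom tunables in /boot/loader.conf.local, and the helper name set_igb_queues is made up for illustration:

```sh
#!/bin/sh
# Hypothetical helper: set hw.igb.num_queues in a loader config file,
# replacing any earlier value instead of appending a duplicate.
# Usage: set_igb_queues <file> <queues>
set_igb_queues() {
    conf=$1
    queues=$2
    touch "$conf"
    # Copy every line except a previous hw.igb.num_queues setting.
    # grep exits non-zero when nothing is left to print, which is fine here.
    grep -v '^hw\.igb\.num_queues=' "$conf" > "$conf.tmp" || true
    printf 'hw.igb.num_queues="%s"\n' "$queues" >> "$conf.tmp"
    mv "$conf.tmp" "$conf"
}
```

On the firewall this would be called as `set_igb_queues /boot/loader.conf.local 1`, followed by a reboot, since the value is a boot-time tunable.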

Hope that helps.

athurdent

Thanks, I have added this and rebooted. Will report back.

w0w

If the "safe" hw.igb.num_queues=1 works for you, then you can try
hw.igb.num_queues=2
The C2758 is an 8-core CPU and you have 4 Intel ports (2 cores per port); this follows the FreeBSD tuning guide at https://calomel.org/freebsd_network_tuning.html
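
The arithmetic behind that suggestion, as a small sketch. It assumes the rule of thumb is simply CPU cores divided by NIC ports, with a floor of one queue; queues_per_port is a made-up name:

```sh
#!/bin/sh
# Hypothetical helper: spread CPU cores evenly over NIC ports.
# Usage: queues_per_port <cores> <ports>
queues_per_port() {
    q=$(( $1 / $2 ))
    # Every port needs at least one queue.
    if [ "$q" -lt 1 ]; then
        q=1
    fi
    echo "$q"
}

# C2758: 8 cores, 4 igb ports
queues_per_port 8 4   # prints 2
```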

athurdent

The board has been stable, so I modified loader.conf to use 2 queues. While there I also installed the latest snapshot and rebooted.
On Sunday I upgraded the BIOS to 1.1a (the board was on 1.1 before); I forgot to mention that. I don't know whether anything relevant has changed with this BIOS, as I cannot find any changelog or release notes from Supermicro.

athurdent

OK, no luck with 2 queues.

I can reboot the board with

hping3 -c 100 -d 120 -S -w 64 -p 443 --flood 192.168.x.10

from LAN to my DMZ (produces about 130000 states).
I just sent in the crash report for that.

With only one queue I can even use "--rand-source" until the state table is full, and there is no reboot.
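
To watch the state count while a test like this runs, something along these lines might help. It assumes the `pfctl -si` output contains a "current entries" line under the state table heading, with the count as the third field; current_states is a made-up helper name:

```sh
#!/bin/sh
# Hypothetical helper: read `pfctl -si` output on stdin and print the
# number from the first "current entries" line (the state table section).
current_states() {
    awk '/current entries/ { print $3; exit }'
}
```

Typical use would be `pfctl -si | current_states`, repeated every second or so from a second shell while hping3 is running.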

w0w

I'll do the test on similar hardware later this week.

athurdent

Thanks w0w, looking forward to your results.

BTW the hping3 crash looked different:

panic: bpf_mcopy
cpuid = 6
KDB: enter: panic

Edit: Just found this on Reddit, so I guess only one queue per interface should be OK for everyone just passing traffic through?

https://www.reddit.com/r/PFSENSE/comments/5obhlm/what_are_the_ramifications_of_less_nic_queues/dci77px/

w0w

Tested, but I could not replicate the issue.
How much physical RAM do you have installed?
Do you have any custom tunables enabled, other than those you have mentioned in this topic?
Below are the network stack tunables I have enabled:
#some magic numbers :)
kern.ipc.nmbjumbo9="20000"
kern.ipc.nmbclusters="1000000"
kern.ipc.maxsockbuf="256000000"
#some more igb tune for GIG links
hw.igb.rxd="4096"
hw.igb.txd="4096"
net.inet.tcp.syncache.hashsize=1024
net.inet.tcp.syncache.bucketlimit=100
net.isr.defaultqlimit=4096
net.link.ifqmaxlen=10240
hw.igb.rx_process_limit="-1"
hw.igb.num_queues=2
#disable flow control on all igb interfaces
dev.igb.0.fc=0
dev.igb.1.fc=0
dev.igb.2.fc=0
dev.igb.3.fc=0
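
The four flow-control lines above follow one pattern, so for boards with a different port count they could be generated rather than typed. A small sketch; igb_fc_off is a made-up helper name:

```sh
#!/bin/sh
# Hypothetical helper: print a dev.igb.N.fc=0 line for ports 0..count-1.
# Usage: igb_fc_off <count>
igb_fc_off() {
    i=0
    while [ "$i" -lt "$1" ]; do
        echo "dev.igb.$i.fc=0"
        i=$((i + 1))
    done
}

igb_fc_off 4
```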
I'll do some tests with all tunables disabled and will try to utilize all the bandwidth I have on WAN…

athurdent

Thanks for looking into this. I tend to stick with the defaults until I hit a problem. So I only have nmbclusters defined. I have also disabled Hardware Checksum Offloading, can't remember why, though.
Amongst others I'm using CARP, VLANs, the on-board SNMP and Snort. Those could also be responsible for problems, I guess.

chrcoluk

Did you try with 1 igb queue? If not, please test with it.

--edit--

I see you did. Thanks for reporting the findings; it does seem igb on FreeBSD 11 currently has issues with multi queue.

athurdent

@chrcoluk:

I see you did. Thanks for reporting the findings; it does seem igb on FreeBSD 11 currently has issues with multi queue.

Thanks, do you have any FreeBSD Bugtracker or forum reference for this?

chrcoluk

Sadly I did not bookmark it, but I will have a look later and post here if I find it.

w0w

Tested with over 196015 states, but nothing happened. However, I have found that my test setup is a little bit wrong. The problem is that my testing machine has a PPPoE link enabled on the WAN test interface, which means that WAN uses only one queue by FreeBSD design.
Sorry, I cannot test it any other way, but it looks like the igb(4) drivers are not so good, though at least they are not as bad as the stock Realtek FreeBSD drivers ;)
I have not researched it yet, but there are some custom patches to be found in the FreeBSD community.
But if your Supermicro C2758 is doing its job and you have no performance issues, then just leave it alone :)

chrcoluk

The problem seems to be triggered by one of these two things or perhaps even both.

FreeBSD 11 changes regarding the igb driver:

RSS awareness has been added to the igb(4) driver (r268028)
Automatic disabling of multi queue if ALTQ is enabled in kernel.

I think the latter is the more likely cause, as it seems it is not fully disabling multi queue. This is evidenced by the fact that the loader tunable defaults to the number of CPU cores, and the problem goes away when it is forced to 1.
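
One way to check how many queues each port actually ended up with is to count the per-queue interrupt lines. A rough sketch, assuming `vmstat -i` names them like the "irq269: igb1:que 3" seen in the panic output earlier in this thread; igb_queue_counts is a made-up helper name:

```sh
#!/bin/sh
# Hypothetical helper: read `vmstat -i` output on stdin and print, per igb
# interface, how many "igbN:que M" interrupt lines it has.
igb_queue_counts() {
    awk '/igb[0-9]+:que/ {
        match($0, /igb[0-9]+/)
        count[substr($0, RSTART, RLENGTH)]++
    }
    END { for (ifname in count) print ifname, count[ifname] }' | sort
}
```

Run as `vmstat -i | igb_queue_counts`; with hw.igb.num_queues=1 in effect, each port would be expected to show a count of 1.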

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=212413

That problem report also links to a couple of others.

athurdent

@chrcoluk: Thank you!

athurdent

Might be fixed now: https://redmine.pfsense.org/issues/7149