Kernel lockup
-
Had a kernel lockup this morning; extract from the logs below (note the log is set to show the newest line on top).
Jan 4 10:46:17 kernel Fatal trap 12: page fault while in kernel mode
Jan 4 10:46:17 kernel kernel trap 12 with interrupts disabled
Jan 4 10:46:17 kernel current process = 97781 (sh)
Jan 4 10:46:17 kernel processor eflags = resume, IOPL = 0
Jan 4 10:46:17 kernel = DPL 0, pres 1, long 1, def32 0, gran 1
Jan 4 10:46:17 kernel code segment = base 0x0, limit 0xfffff, type 0x1b
Jan 4 10:46:17 kernel frame pointer = 0x28:0xfffffe01181b2480
Jan 4 10:46:17 kernel stack pointer = 0x28:0xfffffe01181b2450
Jan 4 10:46:17 kernel instruction pointer = 0x20:0xffffffff80cb908c
Jan 4 10:46:17 kernel fault code = supervisor read data, page not present
Jan 4 10:46:17 kernel fault virtual address = 0x30
Jan 4 10:46:17 kernel cpuid = 0; apic id = 00
Had no lockups on 2.2, but I also hadn't added load to the box until the past week.
I undid some tunables I had set for the NIC in loader.conf.custom and disabled MSI-X. Will see if it stabilises; if I get more issues, I will test the RAM.
If needed I can swap the RAM, as it's the exact same SODIMM I put in my laptop last year, so I can take the one from the laptop and put it in the pfSense unit if I find the RAM faulty.
-
Was there no other data from the crash? No crash report or backtrace?
There is not enough information in those log messages to make any guesses about a cause.
-
Hi Jimp
Did you have any thoughts about the one I had and posted, which was after the VPN fix?
Thanks
-
jimp, where do I look for the backtrace? I will check /var/crash if it's the same as default FreeBSD behaviour.
Right, I read this:
https://doc.pfsense.org/index.php/Unexpected_Reboot_Troubleshooting
Yeah, I did disable the swap partition (it does exist though), so I will enable it again in case I get another panic.
-
The firewall should present a box on top of the dashboard with a link to see the crash report if it was able to collect one.
-
Edited my post; sorry, I edited as you replied.
-
Well it looks like you are the second person having this issue.
I posted a thread earlier and got NO reply.
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=213856
That is the bug and I have sent multiple crash dumps already.
-
Just because you have a "Fatal Trap 12" doesn't mean it's the same bug. The stack trace doesn't look anywhere near the same.
-
Well it looks like you are the second person having this issue.
I posted a thread earlier and got NO reply.
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=213856
That is the bug and I have sent multiple crash dumps already.
A panic can be caused by many things, which is why I didn't jump in claiming it's a bug; it could be hardware or software.
My fault: I had disabled the swap partition (I had configured a swap file instead, for SSD TRIM reasons), so the lack of a backtrace is because of that. FreeBSD has historically needed issues resolving in a .0 release, and this could be one of those issues, but it could be hardware as well.
Also, you posted directly to the FreeBSD developers, not as a pfSense bug report.
-
No more panics, but given the number of people who reported panics on FreeBSD 11 and said they were fixed by setting the igb queue count to 1 or 2, I adjusted it to 2 for safety. I have enabled MSI-X again.
-
What is the "it" that has been fixed? Is there a ticket in the FreeBSD community about this problem?
-
Yeah, there is a ticket, but I don't have the URL to hand, sorry. Everyone on the ticket reported the panics being fixed by setting it to 1.
Only one guy said it was also stable on 2, though, and he posted that on here somewhere. The default is to match the core count, so in my case the default is 4.
hw.igb.num_queues=2
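For anyone wanting to try the same thing, here is a rough sketch of how the igb tunables might sit in a loader file. The path /boot/loader.conf.local and the hw.igb.enable_msix line are assumptions based on the stock FreeBSD igb driver; check them against your own install before relying on them.

```shell
# /boot/loader.conf.local (assumed path; read at boot, so a reboot is needed)

# Limit each igb NIC to 2 queues instead of one per CPU core.
hw.igb.num_queues=2

# Assumption: 1 leaves MSI-X enabled, 0 disables it.
hw.igb.enable_msix=1
```

After a reboot, running sysctl hw.igb.num_queues from a shell should echo the value back if the tunable was picked up.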
- 21 days later
-
I reinstalled pfSense 2.4 two days ago using ZFS. I made sure I have loader.conf.local in place to reduce the igb queues to 1, but I had a panic late last night. Again no dump was made, so no backtrace, even though a swap partition was enabled this time.
Will scan my ram.
Will disable adaptive mode for the CPU so it stays at one speed only, as CPU clock fluctuations can cause instabilities.
Will check the BIOS to see if there is anything in there that could cause compatibility issues. I do notice that the boot log says it's not using any dump device, so I will try manually specifying it in rc.conf.local to see if the boot log then reports that one is set.
I wonder if it may be a good idea to track 11-STABLE instead of 11.0 as .0 releases of FreeBSD can be problematic.
-
The stable branch is much more of a moving target than the releng branch; it would be a hell of a lot more work to keep up with the changes.
-
Have done the two RAM tests I planned.
Also have the dump device correctly showing during the boot process now, so if I have another panic I should get a dump to backtrace.
I have disabled turbo mode and EIST on the CPU and dynamic voltage on the RAM.
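For reference, a minimal sketch of what a dump device entry could look like, assuming the stock FreeBSD rc.conf variables are honoured here. The file path is an assumption, and "AUTO" is used so that no device name has to be guessed.

```shell
# /etc/rc.conf.local (assumed path; stock FreeBSD rc.conf variables)

# "AUTO" picks the first suitable swap device listed in /etc/fstab.
dumpdev="AUTO"

# Directory where savecore(8) writes the dump on the next boot.
dumpdir="/var/crash"

# To check from a shell after boot (not part of the config file):
#   dumpon -l          # lists the currently configured dump device
#   ls -l /var/crash   # shows any dumps that were saved
```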
-
-
That is unacceptable usage. You must say "What hardware are you running pfSense software on?"
;D ;D ::) ::)
SCNR. Won't further comment on this new round of legal bleep. I hoped the unfortunate episode ~2+ years ago was enough of a lesson. Apparently not.
-
chrcoluk
Did you add kern.ipc.nmbclusters="1000000"?
What is your WAN bandwidth?
-
I don't know the exact hardware, as I purchased a compact unit from Amazon which is simply labelled as a BRASWELL N3150 unit, which tells you I have a Celeron N3150 on a Braswell chipset.
I think I am going to adjust the BIOS again to enable EIST and turbo mode, but also configure powerd to never let the clocks drop below the stock max speed. That means I get turbo mode back for extra performance whilst not letting it drop to idle voltages. I will keep dynamic voltage disabled on the RAM.
Also, nmbclusters has been increased in loader.conf from almost day one of using pfSense.
The WAN was idle at the time of the crash; the absolute max bandwidth usage possible is about 70 Mbit down and 20 Mbit up.
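A rough sketch of the powerd idea, in case it helps anyone else. The helper name list_freqs is made up for illustration, the 1600 figure is only a placeholder for whatever the stock max turns out to be, and it assumes powerd can be started by hand with its own flags rather than only through the GUI.

```shell
# Print the frequencies (MHz) from a dev.cpu.0.freq_levels style string,
# highest first. Input is read from stdin as "freq/power freq/power ...".
# list_freqs is a hypothetical helper name, not a system command.
list_freqs() {
    tr ' ' '\n' | cut -d/ -f1 | sort -rn
}

# On the firewall itself (not run here):
#   sysctl -n dev.cpu.0.freq_levels | list_freqs
#   powerd -a hiadaptive -m 1600   # 1600 is a placeholder minimum, in MHz
```

The -m flag sets the lowest frequency powerd will throttle down to, so picking the stock max from the list should keep the clocks from dropping to idle while still leaving any turbo level above it available.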
-
OK, try bumping the memory voltage by +0.1 V.