Random kernel panic once or more a day :( fresh build 2.4.3
-
greetings everyone!
i used: pfSense-CE-memstick-2.4.3-RELEASE-amd64.img
to install on a fresh bare metal box: Asus Q170T CSM running latest BIOS in UEFI mode
i5-6600T
32GB RAM
Samsung 830 SSD
after fresh install i'm getting a persistent crash, logs uploaded, looks like some sort of kernel panic around this:
Crash report details:
No PHP errors found.
Filename: /var/crash/bounds
1
Filename: /var/crash/info.0
Dump header from device: /dev/gptid/f90e4105-42e3-11e8-ae07-708bcdbdf2bc
Architecture: amd64
Architecture Version: 1
Dump Length: 72704
Blocksize: 512
Dumptime: Fri Apr 20 10:48:23 2018
Hostname: router.localdomain
Magic: FreeBSD Text Dump
Version String: FreeBSD 11.1-RELEASE-p7 #10 r313908+986837ba7e9(RELENG_2_4): Mon Mar 26 18:08:25 CDT 2018
root@buildbot2.netgate.com:/builder/ce-243/tmp/obj/builder/ce-243/tmp/FreeBSD-src/sys/pfSense
Panic String: spin lock held too long
Dump Parity: 4159356250
Bounds: 0
Dump Status: good
Filename: /var/crash/info.last
Dump header from device: /dev/gptid/f90e4105-42e3-11e8-ae07-708bcdbdf2bc
Architecture: amd64
Architecture Version: 1
Dump Length: 72704
Blocksize: 512
Dumptime: Fri Apr 20 10:48:23 2018
Hostname: router.localdomain
Magic: FreeBSD Text Dump
Version String: FreeBSD 11.1-RELEASE-p7 #10 r313908+986837ba7e9(RELENG_2_4): Mon Mar 26 18:08:25 CDT 2018
root@buildbot2.netgate.com:/builder/ce-243/tmp/obj/builder/ce-243/tmp/FreeBSD-src/sys/pfSense
Panic String: spin lock held too long
Dump Parity: 4159356250
Bounds: 0
Dump Status: good
….
MCA: Bank 6, Status 0xbe00000000801152
spin lock 0xffffffff82a16f98 (mca) held by 0xfffff8000bf6a000 (tid 100072) too long
spin lock 0xffffffff82a16f98 (mca) held by 0xfffff8000bf6a000 (tid 100072) too long
spin lock 0xffffffff82a16f98 (mca) held by 0xfffff8000bf6a000 (tid 100072) too long
spin lock 0xffffffff82a3d780 (callout) held by 0xfffff800081415c0 (tid 100008) too long
panic: spin lock held too long
cpuid = 0
KDB: enter: panic
panic.txt: spin lock held too long
version.txt: FreeBSD 11.1-RELEASE-p7 #10 r313908+986837ba7e9(RELENG_2_4): Mon Mar 26 18:08:25 CDT 2018
root@buildbot2.netgate.com:/builder/ce-243/tmp/obj/builder/ce-243/tmp/FreeBSD-src/sys/pfSense
...
what am i missing here... i did look up this:
https://forum.pfsense.org/index.php?topic=42890.0
I did also have epu power saving mode enabled... but i just disabled it and rebooted... hopefully that's it?
anyone else see anything in the logs i'm missing here?
20180420crashreporter.zip
-
MCA: Bank 6, Status 0xbe00000000801152
An MCA/MCE can only ever be a hardware problem. Usually there is more to the MCA messages than just that; the panic may have come from the hardware failing to even report the entire error message.
tl;dr version is that your BIOS detected a hardware problem and tried to inform the OS of a fault, and the OS is relaying that message to you.
Could be anything from bad RAM to bad power to a flaky MB/CPU. Need to run diags on the hardware to find out.
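For anyone who wants to pick that status word apart, here is a rough Python sketch. It assumes the architectural IA32_MCi_STATUS layout from the Intel SDM (flag bits 63 down to 57, model-specific code in bits 31:16, MCA error code in bits 15:0). The function name `decode_mci_status` is made up for illustration, and the sketch does not try to interpret the model-specific fields or the error code itself.

```python
# Architectural flag bits of IA32_MCi_STATUS (assumed from the Intel SDM
# machine-check chapter; model-specific fields are not interpreted here).
FLAG_BITS = [
    (63, "VAL"),    # register holds a valid error
    (62, "OVER"),   # an earlier error was overwritten
    (61, "UC"),     # error was not corrected by hardware
    (60, "EN"),     # reporting of this error was enabled
    (59, "MISCV"),  # MCi_MISC holds extra information
    (58, "ADDRV"),  # MCi_ADDR holds an address
    (57, "PCC"),    # processor context may be corrupt
]


def decode_mci_status(status):
    """Split a 64-bit MCi_STATUS value into flags and code fields.

    Hypothetical helper for illustration only.
    """
    flags = [name for bit, name in FLAG_BITS if (status >> bit) & 1]
    return {
        "flags": flags,
        "model_specific_code": (status >> 16) & 0xFFFF,
        "mca_error_code": status & 0xFFFF,
    }


if __name__ == "__main__":
    # Status value taken from the "MCA: Bank 6" line in the crash report.
    decoded = decode_mci_status(0xBE00000000801152)
    print("flags:", " ".join(decoded["flags"]))
    print("model-specific code: 0x%04x" % decoded["model_specific_code"])
    print("MCA error code: 0x%04x" % decoded["mca_error_code"])
```

For the value in this report the flags come out as VAL UC EN MISCV ADDRV PCC with MCA error code 0x1152, i.e. an uncorrected error with the processor context flagged as possibly corrupt, which fits a box that panics rather than logging and carrying on.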
-
thank you, it was all brand new =)
i was running w10pro on it for weeks… without any BSOD on windoze side.
the only thing i remembered changing was that epu power saving mode... i've just disabled that... hopefully that was it?
if so... if that is the case... would i need to re-upload a fresh log to compare against the old for possible defect resolution in next build?
i'm not sure if free/community builds look for defect resolution compared to the paid editions.... :)
sorry i'm new to the product.
other than it's freakin brilliant! wished i had gone there sooner!
-
Sorry but running windows means nothing. The OS cannot trigger an MCE/MCA, those come straight from hardware. Brand new also doesn't mean it's good. It might be defective.
-
memtest is clean ran it for about 48hrs
HDD is clean, i've got other CPU/RAM/HDD parts… all swapped and tested and vetted.
seems fine to me... with the exception of that 1 BIOS setting... i cleared the panic for now... but hopefully that one little change was it. perhaps the sw doesn't deal with super low power modes?
-
Again, that type of error cannot be from software. The hardware may not like what the OS set, but that is a pure hardware error. It's not an OS or software issue.
-
is there a list of recommended mobos bare-metal side that pfsense likes to be installed on?
i might just virtualize it then… that way i might be able to make snapshot backups.
-
I see absolutely no reason not to virtualize, since you've got 32 GB of RAM in the machine.
But try to find the defective hardware first.
-
disabling EPU power saving mode didn't resolve the issue btw. every HW component has been replaced… except the AC adapter... gonna try that next.
edit: power brick replaced, that wasn't it.
so FINALLY after all is said and done i figured out what was causing the panic: the case expansion plugs, i.e. the front USB ports, audio plugs and eSATA port. i unplugged all those fancy things and the system has been rock solid since!
yay! just wanted to toss that in here for future reference!