Unrecoverable machine check exception
-
On Netgate 6100:
Crash report begins. Anonymous machine information:
amd64
14.0-CURRENT
FreeBSD 14.0-CURRENT amd64 1400094 #1 plus-RELENG_23_09_1-n256200-3de1e293f3a: Wed Dec 6 21:00:32 UTC 2023 root@freebsd:/var/jenkins/workspace/pfSense-Plus-snapshots-23_09_1-main/obj/amd64/Obhu6gXB/var/jenkins/workspace/pfSense-Plus-snapshots-23_09_1

Crash report details:
No PHP errors found.

Filename: /var/crash/info.0
Dump header from device: /dev/mmcsd0p3
Architecture: amd64
Architecture Version: 4
Dump Length: 269824
Blocksize: 512
Compression: none
Dumptime: 2023-12-17 21:17:10 +0200
Hostname: pfSense6100.home.arpa
Magic: FreeBSD Text Dump
Version String: FreeBSD 14.0-CURRENT amd64 1400094 #1 plus-RELENG_23_09_1-n256200-3de1e293f3a: Wed Dec 6 21:00:32 UTC 2023 root@freebsd:/var/jenkins/workspace/pfSense-Plus-snapshots-23_09_1-main/obj/am
Panic String: Unrecoverable machine check exception
Dump Parity: 1383871310
Bounds: 0
Dump Status: good

db:1:pfs> bt
Tracing pid 11 tid 100006 td 0xfffffe0011fc2e40
kdb_enter() at kdb_enter+0x32/frame 0xfffffe0011dd7d60
vpanic() at vpanic+0x163/frame 0xfffffe0011dd7e90
panic() at panic+0x43/frame 0xfffffe0011dd7ef0
mca_intr() at mca_intr+0xbb/frame 0xfffffe0011dd7f20
mchk_calltrap() at mchk_calltrap+0x8/frame 0xfffffe0011dd7f20
--- trap 0x1c, rip = 0xffffffff8125c96b, rsp = 0xfffffe001079fd70, rbp = 0xfffffe001079fd70 ---
acpi_cpu_idle_mwait() at acpi_cpu_idle_mwait+0x6b/frame 0xfffffe001079fd70
acpi_cpu_idle() at acpi_cpu_idle+0x193/frame 0xfffffe001079fdb0
cpu_idle_acpi() at cpu_idle_acpi+0x46/frame 0xfffffe001079fdd0
cpu_idle() at cpu_idle+0x9d/frame 0xfffffe001079fdf0
sched_idletd() at sched_idletd+0x576/frame 0xfffffe001079fef0
fork_exit() at fork_exit+0x7f/frame 0xfffffe001079ff30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe001079ff30
--- trap 0xfee00083, rip = 0x20ff600083, rsp = 0x20ffc00083, rbp = 0x20fe000083 ---
db:1:pfs> show registers
cs 0x20
ds 0x3b
es 0x3b
fs 0x13
gs 0x1b
ss 0x28
rax 0x12
rcx 0xffffffff814591d9
rdx 0x3f8
rbx 0x100
rsp 0xfffffe0011dd7d60
rbp 0xfffffe0011dd7d60
rsi 0x15f3cdbe
rdi 0x4
r8 0x14e4015f3cdbe
r9 0xfffffe0011fc2e40
r10 0xfffffe0011dd7c40
r11 0xcedfc2df9afff59c
r12 0
r13 0x6e8e4000
r14 0xffffffff813b6e40 tame_mouse.butmapmsc+0xe01
r15 0xfffffe0011fc2e40
rip 0xffffffff80d38d62 kdb_enter+0x32
rflags 0x86
kdb_enter+0x32: movq $0,0x2344aa3(%rip)
db:1:pfs> show pcpu
cpuid = 3
dynamic pcpu = 0xfffffe008efdcf00
curthread = 0xfffffe0011fc2e40: pid 11 tid 100006 critnest 2 "idle: cpu3"
curpcb = 0xfffffe0011fc3360
fpcurthread = none
idlethread = 0xfffffe0011fc2e40: tid 100006 "idle: cpu3"
self = 0xffffffff84013000
curpmap = 0xffffffff83021ab0
tssp = 0xffffffff84013384
rsp0 = 0xfffffe00107a0000
kcr3 = 0xffffffffffffffff
ucr3 = 0xffffffffffffffff
scr3 = 0x0
gs32p = 0xffffffff84013404
ldt = 0xffffffff84013444
tss = 0xffffffff84013434
curvnet = 0
db:1:pfs> run lockinfo
db:2:lockinfo> show locks
No such command; use "help" to list available commands
db:2:lockinfo> show alllocks
No such command; use "help" to list available commands
db:2:lockinfo> show lockedvnods
Locked vnodes
db:1:pfs> acttrace ...
-
You should open a ticket with us:
https://www.netgate.com/tac-support-request

Steve
-
-
@stephenw10 Done, thanks.
-
Support told me that, based on the crash dump, the main culprit would be CPU core 3. I didn't find any references to the C3558 degrading over time and thus causing a machine check condition, so this seems a bit odd to me. At one time I had a QNAP TS-453 Pro which died because its Celeron CPU simply went bad.
Support also suggested running the memtester package, which I have already installed. However, because it cannot check all of the memory (for example, memory occupied by the kernel), a better way to test the memory would be to run a tool from a USB stick with the 6100 not booted into pfSense at all. Is that possible, and how would I accomplish it?
Are there any tools available that can diagnose the 6100 hardware?
-
Reviewing the ticket...
-
Hmm, OK. Did this just spontaneously start?
-
Yes, as far as I can tell. I had a session open to the 6100, but the PC was asleep at the time and we were watching Apple TV.
There's a strange hole in system.log; there seem to be no entries at all for Dec 16:
Dec 15 22:52:38 pfSense6100 php_pfb[12425]: [pfBlockerNG] filterlog daemon stopped
Dec 15 22:52:38 pfSense6100 SuricataStartup[15367]: Suricata STOP for LAN3(8125_igc2)...
Dec 15 22:52:39 pfSense6100 SuricataStartup[17040]: Suricata STOP for LAN1(12401_igc0)...
Dec 17 21:18:59 pfSense6100 syslogd: kernel boot file is /boot/kernel/kernel
Dec 17 21:18:59 pfSense6100 kernel: ---<<BOOT>>---
Dec 17 21:18:59 pfSense6100 kernel: Copyright (c) 1992-2023 The FreeBSD Project.
Dec 17 21:18:59 pfSense6100 kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 D
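A hole like this is easy to miss when scrolling a long log. As a minimal sketch, assuming the classic BSD syslog prefix seen above (`Dec 15 22:52:39 host ...`) and a year supplied by the caller (the prefix carries none), the Python below flags consecutive entries that are more than a chosen threshold apart. The function name `find_gaps` and the six-hour default are hypothetical choices for illustration, not anything pfSense provides.

```python
from datetime import datetime, timedelta


def find_gaps(lines, year, threshold=timedelta(hours=6)):
    """Return (before, after) timestamp pairs separated by more than threshold.

    Assumes BSD-style syslog prefixes ("Dec 15 22:52:39 host ...").
    The year is supplied by the caller; a Dec -> Jan rollover is not handled.
    """
    gaps = []
    prev = None
    for line in lines:
        parts = line.split()  # split() also absorbs the double space in "Dec  6"
        if len(parts) < 3:
            continue
        stamp = " ".join(parts[:3])
        try:
            ts = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # continuation or non-timestamped line
        if prev is not None and ts - prev > threshold:
            gaps.append((prev, ts))
        prev = ts
    return gaps


# Entries taken from the log excerpt above.
sample = [
    "Dec 15 22:52:38 pfSense6100 php_pfb[12425]: [pfBlockerNG] filterlog daemon stopped",
    "Dec 15 22:52:39 pfSense6100 SuricataStartup[17040]: Suricata STOP for LAN1(12401_igc0)...",
    "Dec 17 21:18:59 pfSense6100 kernel: ---<<BOOT>>---",
]

for before, after in find_gaps(sample, 2023):
    print(f"no entries between {before} and {after}")
```

This only reports where entries are missing; it cannot recover anything that was never written to persistent storage before the crash.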
-
It may have lost those logs when it crashed, especially if you have RAM disks enabled.
-
Yes, of course... RAM disks are indeed enabled.