How To Read Crash Report?
-
I have pfSense 2.3.2 installed on a Zotac CI323 Nano that seems to crash about once a day. I have been cutting back to the bare minimum in hopes of finding the package or feature causing the crash. I am down to NAT, the DHCP server and the DNS forwarder, but the system still crashes.
This seems to be the portion of the crash report that describes the problem, but I don't know how to interpret it.
Fatal trap 9: general protection fault while in kernel mode
cpuid = 3; apic id = 06
instruction pointer = 0x20:0xffffffff80f9b926
stack pointer = 0x28:0xfffffe01197b7910
frame pointer = 0x28:0xfffffe01197b79f0
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 3330 (dc)
version.txt
FreeBSD 10.3-RELEASE-p5 #0 7307492(RELENG_2_3_2): Tue Jul 19 13:29:35 CDT 2016 root@ce23-amd64-builder:/builder/pfsense-232/tmp/obj/builder/pfsense-232/tmp/FreeBSD-src/sys/pfSense
I have attached the full crash report for anyone who might lend me a hand.
crash_report.txt
-
To read them you need some knowledge of FreeBSD and how it works.
The most important things are generally the end of the message buffer and the backtrace.
<5>arp: unknown hardware address format (0x0f39) (from e3:6d:4c:08:7c:b4 to f7:af:1b:a7:03:3c)
ugen0.4: <vendor 0x8087> at usbus0 (disconnected)
ugen0.4: <vendor 0x8087> at usbus0
Fatal trap 9: general protection fault while in kernel mode
cpuid = 3; apic id = 06
instruction pointer = 0x20:0xffffffff80f9b926
stack pointer = 0x28:0xfffffe01197b7910
frame pointer = 0x28:0xfffffe01197b79f0
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 3330 (dc)
It says the current process is "dc" but that doesn't necessarily indicate it was dc itself that crashed, only that it was the active process when the crash occurred.
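If you want to pull the key fields out of the trap header without reading it by eye, something like the following Python sketch works on the text quoted above. The function name `parse_trap` and the dictionary keys are my own choices for illustration, not part of any pfSense or FreeBSD tool, and it assumes the header has one field per line as shown.

```python
import re

# Trap header copied from the crash report quoted in this thread.
TRAP_TEXT = """\
Fatal trap 9: general protection fault while in kernel mode
cpuid = 3; apic id = 06
instruction pointer = 0x20:0xffffffff80f9b926
stack pointer = 0x28:0xfffffe01197b7910
frame pointer = 0x28:0xfffffe01197b79f0
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 3330 (dc)
"""


def parse_trap(text):
    """Return the trap number, description, faulting address and active process.

    Hypothetical helper: the key names are illustrative only.
    """
    info = {}
    m = re.search(r"Fatal trap (\d+): (.+)", text)
    if m:
        info["trap"] = int(m.group(1))
        info["description"] = m.group(2).strip()
    # The value is "segment selector:offset"; the offset is the kernel address.
    m = re.search(r"instruction pointer\s*=\s*0x[0-9a-f]+:(0x[0-9a-f]+)", text)
    if m:
        info["instruction_pointer"] = int(m.group(1), 16)
    # This is only the process that was running at the time of the trap.
    m = re.search(r"current process\s*=\s*(\d+) \((.+?)\)", text)
    if m:
        info["pid"] = int(m.group(1))
        info["process"] = m.group(2)
    return info


if __name__ == "__main__":
    for key, value in parse_trap(TRAP_TEXT).items():
        print(key, "=", value)
```

Keep in mind that the `process` field only tells you what was on the CPU when the trap fired, as noted above.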
The ARP error above in the log is interesting, as it could indicate that the NIC driver received corrupted data, or that the data was corrupted somewhere between the NIC and the point where the OS interpreted the ARP packet. (Read: hardware/memory issue, most likely)
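To show why that ARP line looks like garbage, here is a rough Python sketch. An ARP header starts with a 16-bit hardware type in network byte order, and the value assigned to Ethernet is 1, so 0x0f39 is not something an Ethernet host should be sending. The helper names and the two sample headers are made up for illustration; only the hardware type and the two addresses come from the log. The sketch checks for Ethernet only, which is an assumption about this particular network.

```python
import struct

ARPHRD_ETHER = 1  # hardware type value assigned to Ethernet


def arp_hardware_type(arp_header: bytes) -> int:
    """Return the 16-bit hardware type from the first two bytes of an ARP header."""
    (htype,) = struct.unpack("!H", arp_header[:2])
    return htype


def is_ethernet(arp_header: bytes) -> bool:
    """True if the ARP header claims an Ethernet hardware type."""
    return arp_hardware_type(arp_header) == ARPHRD_ETHER


def has_group_bit(mac: str) -> bool:
    """True if the low bit of the first octet is set (multicast/broadcast address)."""
    return (int(mac.split(":")[0], 16) & 1) == 1


# Illustrative headers: htype, ptype (IPv4), hlen, plen, oper (request).
good_header = struct.pack("!HHBBH", ARPHRD_ETHER, 0x0800, 6, 4, 1)
bad_header = struct.pack("!HHBBH", 0x0F39, 0x0800, 6, 4, 1)  # type from the log

if __name__ == "__main__":
    print("good header is Ethernet:", is_ethernet(good_header))
    print("logged type is Ethernet:", is_ethernet(bad_header))
    # Addresses as printed in the log line.
    for mac in ("e3:6d:4c:08:7c:b4", "f7:af:1b:a7:03:3c"):
        print(mac, "group bit set:", has_group_bit(mac))
```

Both addresses in the log line also have the group bit set, which a normal unicast sender address would not, so the whole packet looks scrambled rather than just one field.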
db:0:kdb.enter.default> show pcpu
cpuid = 3
dynamic pcpu = 0xfffffe019147a500
curthread = 0xfffff8010003d000: pid 3330 "dc"
curpcb = 0xfffffe01197b7cc0
fpcurthread = 0xfffff8010003d000: pid 3330 "dc"
idlethread = 0xfffff80003943000: tid 100006 "idle: cpu3"
curpmap = 0xfffff8000394c4b8
tssp = 0xffffffff821135c8
commontssp = 0xffffffff821135c8
rsp0 = 0xfffffe01197b7cc0
gs32p = 0xffffffff82115020
ldt = 0xffffffff82115060
tss = 0xffffffff82115050
db:0:kdb.enter.default> bt
Tracing pid 3330 tid 100147 td 0xfffff8010003d000
pmap_remove_pages() at pmap_remove_pages+0x546/frame 0xfffffe01197b79f0
vmspace_exit() at vmspace_exit+0x9c/frame 0xfffffe01197b7a30
exit1() at exit1+0x65f/frame 0xfffffe01197b7ac0
sys_sys_exit() at sys_sys_exit+0xe/frame 0xfffffe01197b7ad0
amd64_syscall() at amd64_syscall+0x40f/frame 0xfffffe01197b7bf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe01197b7bf0
--- syscall (1, FreeBSD ELF64, sys_sys_exit), rip = 0x800cf214a, rsp = 0x7fffffffec38, rbp = 0x7fffffffec50 ---
The backtrace is quite short, and all of the functions in it are low-level functions that are very unlikely to have a bug or other issue. Further, the functions at the top of the backtrace deal with memory management. That is another point toward possible hardware or memory issues.
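For anyone who wants to list the frames programmatically, a minimal Python sketch follows. It assumes each frame is on its own line in the `name() at name+0xoffset/frame 0xaddress` form seen in this report; the regular expression and the `parse_backtrace` name are mine, not part of ddb or pfSense.

```python
import re

# Frames copied from the "bt" output quoted in this thread.
BACKTRACE = """\
pmap_remove_pages() at pmap_remove_pages+0x546/frame 0xfffffe01197b79f0
vmspace_exit() at vmspace_exit+0x9c/frame 0xfffffe01197b7a30
exit1() at exit1+0x65f/frame 0xfffffe01197b7ac0
sys_sys_exit() at sys_sys_exit+0xe/frame 0xfffffe01197b7ad0
amd64_syscall() at amd64_syscall+0x40f/frame 0xfffffe01197b7bf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe01197b7bf0
"""

# One frame per line: function name, offset into it, frame address.
FRAME_RE = re.compile(
    r"^(\w+)\(\) at \1\+(0x[0-9a-f]+)/frame (0x[0-9a-f]+)$", re.MULTILINE
)


def parse_backtrace(text):
    """Return (function, offset) pairs, innermost frame first."""
    return [(m.group(1), int(m.group(2), 16)) for m in FRAME_RE.finditer(text)]


if __name__ == "__main__":
    for depth, (func, offset) in enumerate(parse_backtrace(BACKTRACE)):
        print(f"#{depth} {func} +{offset:#x}")
```

The first entry is where the fault happened (here `pmap_remove_pages`), and the last is where the kernel was entered, in this case the exit system call made by the process.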
Given the nature of the crash and its frequency, I would be highly suspicious of the hardware. Check its power, cooling, etc. It might be as simple as overheating, or it may be that the board itself is on its last legs.
-
@jimp, thank you for your detailed response. The Zotac is new, but the RAM is old. I think I will change out the RAM and see if the problem recurs.
-
Changed the RAM 5 days ago. pfSense has not crashed once since then, so it looks like the problem has been solved.
I put the suspect RAM in another computer and it's been running fine since then as well.
My guess is that the old RAM is good, but was not seated well. In changing the RAM, I happened to fix the problem by seating the new RAM properly.
Thanks for the help, jimp.