Frequent system halts on 2.5.2
-
Hi all,
My pfSense box has been crashing about once every few days and I haven't been able to find what's causing it. I've rebuilt it and upgraded the version, hoping that an update or fresh install would fix it, but nothing has worked so far. It crashed again not too long ago, but this time it spit out an error log for the first time. At the end is a page fault while in kernel mode that causes a kernel panic and system freeze.
Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 09
fault virtual address = 0xd8
fault code = supervisor write data, page not present
instruction pointer = 0x20:0xffffffff80f712cc
stack pointer = 0x28:0xfffffe004d5b68f0
frame pointer = 0x28:0xfffffe004d5b6900
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 67417 (unbound)
trap number = 12
timeout stopping cpus
panic: page fault
cpuid = 3
time = 1630759059
KDB: enter: panic
I'm just hoping that someone can make more sense of the output than I can and point me to whatever may be causing all my lockups. TY!
-
So the important part there is:
db:0:kdb.enter.default> show pcpu
cpuid = 3
dynamic pcpu = 0xfffffe007f12e380
curthread = 0xfffff8020ef64740: pid 67417 tid 100250 "unbound"
curpcb = 0xfffff8020ef64ce0
fpcurthread = 0xfffff8020ef64740: pid 67417 "unbound"
idlethread = 0xfffff80004340740: tid 100006 "idle: cpu3"
curpmap = 0xfffff8020e6cc138
tssp = 0xffffffff83717758
commontssp = 0xffffffff83717758
rsp0 = 0xfffffe004d5b6cc0
kcr3 = 0xffffffffffffffff
ucr3 = 0xffffffffffffffff
scr3 = 0x0
gs32p = 0xffffffff8371df70
ldt = 0xffffffff8371dfb0
tss = 0xffffffff8371dfa0
tlb gen = 589816
curvnet = 0xfffff8000408ba80
db:0:kdb.enter.default> bt
Tracing pid 67417 tid 100250 td 0xfffff8020ef64740
kdb_enter() at kdb_enter+0x37/frame 0xfffffe004d5b65b0
vpanic() at vpanic+0x197/frame 0xfffffe004d5b6600
panic() at panic+0x43/frame 0xfffffe004d5b6660
trap_fatal() at trap_fatal+0x391/frame 0xfffffe004d5b66c0
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe004d5b6710
trap() at trap+0x286/frame 0xfffffe004d5b6820
calltrap() at calltrap+0x8/frame 0xfffffe004d5b6820
--- trap 0xc, rip = 0xffffffff80f712cc, rsp = 0xfffffe004d5b68f0, rbp = 0xfffffe004d5b6900 ---
in_pcbdetach() at in_pcbdetach+0x3c/frame 0xfffffe004d5b6900
udp_detach() at udp_detach+0x93/frame 0xfffffe004d5b6930
sofree() at sofree+0x245/frame 0xfffffe004d5b6960
soclose() at soclose+0x30d/frame 0xfffffe004d5b69c0
_fdrop() at _fdrop+0x1a/frame 0xfffffe004d5b69e0
closef() at closef+0x23e/frame 0xfffffe004d5b6a70
closefp() at closefp+0xa0/frame 0xfffffe004d5b6ac0
amd64_syscall() at amd64_syscall+0x387/frame 0xfffffe004d5b6bf0
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe004d5b6bf0
--- syscall (6, FreeBSD ELF64, sys_close), rip = 0x800c8f47a, rsp = 0x7fffffffd978, rbp = 0x7fffffffd990 ---
The msgbuf.txt file in your redacted archive appears to be damaged, so I can't check it.
That backtrace is not one I'm familiar with. It would be useful to compare it with the backtraces from your other crashes. If they're close to identical, it's probably a software issue. If it's a hardware problem, the backtraces will tend to be far more random.
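If you end up with several crash reports, a rough way to do that comparison is to pull out just the function names below the "--- trap" line and check whether they match. Here is a minimal Python sketch of the idea. The helper names (faulting_frames, same_crash_path) and the depth of 3 frames are my own choices for illustration, not part of any pfSense tool, and it assumes the frames look like the "name() at name+0x.../frame 0x..." lines in the bt output above. It only compares function names, since offsets and frame addresses can differ between crashes.

```python
import re

# One ddb frame, e.g. "udp_detach() at udp_detach+0x93/frame 0xfffffe004d5b6930".
# Only the function name is captured; offsets and addresses are ignored.
FRAME_RE = re.compile(r"(\w+)\(\) at \1\+0x[0-9a-f]+/frame 0x[0-9a-f]+")


def faulting_frames(bt_text):
    """Return the function names found after the first '--- trap' marker.

    Frames before the marker are the panic handling itself; frames after it
    are what the kernel was doing when the fault hit. If no marker is found,
    every frame in the text is returned.
    """
    _, marker, after = bt_text.partition("--- trap")
    return FRAME_RE.findall(after if marker else bt_text)


def same_crash_path(bt_a, bt_b, depth=3):
    """True if both backtraces share the same top `depth` faulting functions."""
    a = faulting_frames(bt_a)[:depth]
    b = faulting_frames(bt_b)[:depth]
    return bool(a) and a == b
```

Paste the bt text from each crash report into a string and pass two of them to same_crash_path(). If most of your crashes return True against each other, that points toward a repeatable software path rather than flaky hardware, though it is only a hint and not proof either way.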
Steve