Crashes on APU2
-
It seems to crash every two weeks to a month. I have no idea what I'm looking for in the error dump, and can't even work out what to google to find an answer. I've tried reinstalling it and running a self-test on the SSD, and neither has fixed the issue. If anyone has any ideas, please let me know.
-
Current temperatures are 54 °C to 60 °C. It seems to crash when I'm not actively using the internet; it has never crashed on me during use. The device is only a couple of months old. I don't think it was crashing on earlier versions of pfSense.
-
The key parts of that are:
db:0:kdb.enter.default> show pcpu
cpuid = 2
dynamic pcpu = 0xfffffe0197692480
curthread = 0xfffff801034d9620: pid 4609 "sh"
curpcb = 0xfffffe012089fb80
fpcurthread = 0xfffff801034d9620: pid 4609 "sh"
idlethread = 0xfffff80003975000: tid 100005 "idle: cpu2"
curpmap = 0xfffff8007b66f138
tssp = 0xffffffff82bb47e0
commontssp = 0xffffffff82bb47e0
rsp0 = 0xfffffe012089fb80
gs32p = 0xffffffff82bbb038
ldt = 0xffffffff82bbb078
tss = 0xffffffff82bbb068
db:0:kdb.enter.default> bt
Tracing pid 4609 tid 100201 td 0xfffff801034d9620
pmap_remove_pages() at pmap_remove_pages+0x2d7/frame 0xfffffe012089f450
exec_new_vmspace() at exec_new_vmspace+0x1b5/frame 0xfffffe012089f4c0
exec_elf64_imgact() at exec_elf64_imgact+0x931/frame 0xfffffe012089f5b0
kern_execve() at kern_execve+0x77c/frame 0xfffffe012089f900
sys_execve() at sys_execve+0x4a/frame 0xfffffe012089f980
amd64_syscall() at amd64_syscall+0xa38/frame 0xfffffe012089fab0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe012089fab0
--- syscall (59, FreeBSD ELF64, sys_execve), rip = 0x800b4664a, rsp = 0x7fffffffe218, rbp = 0x7fffffffe360 ---
db:0:kdb.enter.default> ps
and
Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address = 0xfffff83df000e028
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff81181117
stack pointer = 0x28:0xfffffe012089f380
frame pointer = 0x28:0xfffffe012089f450
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 4609 (sh)
Unfortunately nothing super conclusive there but it does look similar to this:
https://forum.netgate.com/topic/106192/regular-crash-reports-on-my-apu2-2-3-2
I would boot memtest86+ and run that for a few loops to be sure if you can.
Steve
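For anyone who wants to pull the key values out of a "Fatal trap" summary like the one above without reading the whole dump, here is a minimal Python sketch. The field labels are taken from the dump quoted in this thread; the function name `parse_trap`, the `PATTERNS` table, and the choice of which fields to keep are my own assumptions, not part of any pfSense or FreeBSD tool.

```python
import re

# Patterns for a few fields of a FreeBSD "Fatal trap" summary.
# re.search() is used, so it does not matter whether the dump is one long
# line or one field per line.
PATTERNS = {
    "trap": r"Fatal trap (\d+)",
    "fault_address": r"fault virtual address\s*=\s*(0x[0-9a-fA-F]+)",
    "instruction_pointer": r"instruction pointer\s*=\s*0x[0-9a-fA-F]+:(0x[0-9a-fA-F]+)",
    "process": r"current process\s*=\s*\d+\s*\(([^)]*)\)",
}


def parse_trap(text):
    """Return a dict of the fields above; a missing field maps to None."""
    result = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        result[name] = match.group(1) if match else None
    return result


# Shortened copy of the trap summary posted in this thread.
SAMPLE = (
    "Fatal trap 12: page fault while in kernel mode cpuid = 2; apic id = 02 "
    "fault virtual address = 0xfffff83df000e028 "
    "fault code = supervisor read data, page not present "
    "instruction pointer = 0x20:0xffffffff81181117 "
    "current process = 4609 (sh)"
)

if __name__ == "__main__":
    for key, value in parse_trap(SAMPLE).items():
        print(f"{key}: {value}")
```

Keeping these few values from every crash makes it easier to see at a glance whether the fault address and the faulting process change from one crash to the next.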
-
@cypher100 Apart from running memtest on your APU2, you could also consider upgrading the BIOS to v4.0.24, which enables ECC on APU2 variants with 4GB RAM (e.g. APU2C4). FreeBSD supports ECC and can report errors via MCA, although ECC on the APU2 is relatively recent and so is unproven. Of course I'm not suggesting you continue using marginal HW, but it may add another data point.
It might also be worth checking the power supply: "It seems to crash when I'm not actively using the internet" could well be a PS issue.
-
It would be interesting to compare it to other reports if it crashes regularly. If they are all the same that's usually a pretty big clue.
Steve
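A rough way to do that comparison, as a minimal Python sketch: pull the function names out of each `bt` backtrace and check whether the call chains match. The frame format is the one shown in the dumps in this thread; the names `frames` and `same_crash`, and the shortened sample backtraces, are assumptions made up for illustration.

```python
import re

# A ddb backtrace frame looks like: name() at name+0xOFFSET/frame 0xADDR
# The backreference \1 makes sure both copies of the function name agree.
FRAME_RE = re.compile(r"(\w+)\(\) at \1\+0x[0-9a-fA-F]+/frame")


def frames(bt_text):
    """Return the function names in a backtrace, innermost frame first."""
    return FRAME_RE.findall(bt_text)


def same_crash(bt_a, bt_b):
    """True if two backtraces go through the same chain of functions."""
    return frames(bt_a) == frames(bt_b)


# Shortened copies of the two backtraces posted in this thread.
BT_EXEC = (
    "pmap_remove_pages() at pmap_remove_pages+0x2d7/frame 0xfffffe012089f450 "
    "exec_new_vmspace() at exec_new_vmspace+0x1b5/frame 0xfffffe012089f4c0 "
    "kern_execve() at kern_execve+0x77c/frame 0xfffffe012089f900"
)
BT_ZFS = (
    "lz4_compress() at lz4_compress+0x761/frame 0xfffffe01205358d0 "
    "zio_compress_data() at zio_compress_data+0x8c/frame 0xfffffe0120535910 "
    "zio_execute() at zio_execute+0xac/frame 0xfffffe01205359e0"
)

if __name__ == "__main__":
    print(frames(BT_EXEC))
    print(frames(BT_ZFS))
    print("same crash:", same_crash(BT_EXEC, BT_ZFS))
```

Only function names are compared, not offsets or frame addresses, since those can differ between otherwise identical crashes.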
-
I changed some options around, and the crashes continue. Memtest didn't show anything wrong with the memory. I turned off PowerD to make sure it wasn't a downclocking issue, and it crashed sooner after I turned that off.
I have a universal laptop charger, so I'll test out the PSU theory and report here.
-
I will also give v4.0.24 a shot, and update here if any crashes occur.
-
Today it crashed again. I had installed the latest BIOS with ECC and used a third-party adapter that matches the requirements for the APU2. I also reinstalled pfSense after doing all of the above. I have attached the error log. I'm out of ideas on what could be causing this.
-
I'm updating to v4.9.0.2 to see if that solves the issue.
-
Hmm, very different crash:
db:0:kdb.enter.default> bt
Tracing pid 0 tid 100250 td 0xfffff8001d5f0000
lz4_compress() at lz4_compress+0x761/frame 0xfffffe01205358d0
zio_compress_data() at zio_compress_data+0x8c/frame 0xfffffe0120535910
zio_write_compress() at zio_write_compress+0x21f/frame 0xfffffe0120535990
zio_execute() at zio_execute+0xac/frame 0xfffffe01205359e0
taskqueue_run_locked() at taskqueue_run_locked+0x154/frame 0xfffffe0120535a40
taskqueue_thread_loop() at taskqueue_thread_loop+0x98/frame 0xfffffe0120535a70
fork_exit() at fork_exit+0x83/frame 0xfffffe0120535ab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0120535ab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
db:0:kdb.enter.default> ps
Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address = 0x1
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff8300ab51
stack pointer = 0x28:0xfffffe0120535860
frame pointer = 0x28:0xfffffe01205358d0
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 0 (zio_write_issue_2)
With different crashes like that, it looks more like a hardware issue.
Steve
-
I seem to be having some instability issues with my APU2C. It was running OK for over a week. This morning the orange lights on each NIC were not flashing and all connected clients were receiving a self-assigned IP address.
The only way to resolve this was to reboot. I had a look through the logs but couldn't find anything. My Grafana dashboard shows that something odd started to occur around midnight:
CPU temperature on average is about 53 degrees Celsius and it is running the latest BIOS v4.10.0.2