Crashes on APU2



  • It seems to crash every two weeks to a month. I have no idea what I'm looking for in the error dump, and can't even work out what to google to find an answer. I've tried reinstalling it and running a self-test on the SSD, and neither has fixed the issue. If anyone has any ideas, please let me know.

    0_1550212997635_textdump.tar.0



  • Current temperatures are 54 °C to 60 °C. It seems to crash only when I'm not actively using the internet; it has never crashed on me during usage. The device is only a couple of months old. I don't think it was crashing on earlier versions of pfSense.


  • Netgate Administrator

    The key parts of that are:

    db:0:kdb.enter.default>  show pcpu
    cpuid        = 2
    dynamic pcpu = 0xfffffe0197692480
    curthread    = 0xfffff801034d9620: pid 4609 "sh"
    curpcb       = 0xfffffe012089fb80
    fpcurthread  = 0xfffff801034d9620: pid 4609 "sh"
    idlethread   = 0xfffff80003975000: tid 100005 "idle: cpu2"
    curpmap      = 0xfffff8007b66f138
    tssp         = 0xffffffff82bb47e0
    commontssp   = 0xffffffff82bb47e0
    rsp0         = 0xfffffe012089fb80
    gs32p        = 0xffffffff82bbb038
    ldt          = 0xffffffff82bbb078
    tss          = 0xffffffff82bbb068
    db:0:kdb.enter.default>  bt
    Tracing pid 4609 tid 100201 td 0xfffff801034d9620
    pmap_remove_pages() at pmap_remove_pages+0x2d7/frame 0xfffffe012089f450
    exec_new_vmspace() at exec_new_vmspace+0x1b5/frame 0xfffffe012089f4c0
    exec_elf64_imgact() at exec_elf64_imgact+0x931/frame 0xfffffe012089f5b0
    kern_execve() at kern_execve+0x77c/frame 0xfffffe012089f900
    sys_execve() at sys_execve+0x4a/frame 0xfffffe012089f980
    amd64_syscall() at amd64_syscall+0xa38/frame 0xfffffe012089fab0
    fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe012089fab0
    --- syscall (59, FreeBSD ELF64, sys_execve), rip = 0x800b4664a, rsp = 0x7fffffffe218, rbp = 0x7fffffffe360 ---
    db:0:kdb.enter.default>  ps
    

    and

    Fatal trap 12: page fault while in kernel mode
    cpuid = 2; apic id = 02
    fault virtual address	= 0xfffff83df000e028
    fault code		= supervisor read data, page not present
    instruction pointer	= 0x20:0xffffffff81181117
    stack pointer	        = 0x28:0xfffffe012089f380
    frame pointer	        = 0x28:0xfffffe012089f450
    code segment		= base 0x0, limit 0xfffff, type 0x1b
    			= DPL 0, pres 1, long 1, def32 0, gran 1
    processor eflags	= interrupt enabled, resume, IOPL = 0
    current process		= 4609 (sh)
    

    Unfortunately nothing super conclusive there but it does look similar to this:
    https://forum.netgate.com/topic/106192/regular-crash-reports-on-my-apu2-2-3-2

    I would boot memtest86+ and run that for a few loops to be sure if you can.

    Steve
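To line up the trap summary from one report against another, the "Fatal trap" block can be turned into key/value pairs. This is only a minimal sketch: `parse_trap` is a hypothetical helper name, the sample text is copied from the dump quoted above, and it assumes every field follows the `name = value` layout seen there.

```python
def parse_trap(text):
    """Parse 'name = value' lines from a FreeBSD fatal trap summary into a dict."""
    fields = {}
    for line in text.splitlines():
        # "cpuid = 2; apic id = 02" carries two fields on one line.
        for part in line.split(";"):
            key, sep, value = part.partition("=")
            key, value = key.strip(), value.strip()
            # Skip the heading and continuation lines that have no field name.
            if sep and key:
                fields[key] = value
    return fields


# Sample copied from the first textdump in this thread.
trap_text = """Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address   = 0xfffff83df000e028
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff81181117
current process         = 4609 (sh)
"""

fields = parse_trap(trap_text)
for name in ("fault virtual address", "fault code", "current process"):
    print(f"{name}: {fields[name]}")
```

Running the same function over each new report makes it easy to see at a glance whether the faulting address and process repeat.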



  • @cypher100 Apart from running memtest on your APU2, you could also consider upgrading the BIOS to v4.0.24, which enables ECC on APU2 variants with 4GB RAM (e.g. APU2C4). FreeBSD supports ECC and can report errors via MCA, although ECC on the APU2 is relatively recent and so is unproven. Of course I'm not suggesting you continue using marginal HW, but it may add another data point.

    It might also be worth checking the power supply. Specifically, "It seems to crash when I'm not actively using the internet" could well be a PS issue.


  • Netgate Administrator

    It would be interesting to compare it to other reports if it crashes regularly. If they are all the same that's usually a pretty big clue.

    Steve



  • I changed some options around, and the crashes continue. Memtest didn't show anything wrong with the memory. I turned off PowerD to make sure it wasn't a downclocking issue, and it crashed sooner after I turned that off.

    I have a universal laptop charger, so I'll test the PSU theory and report back here.



  • I will also give v4.0.24 a shot, and update here if any crashes occur.



  • Today it crashed again. I installed the latest BIOS with ECC, and used a third-party adapter that matches the requirements for the APU2. I also reinstalled pfSense after doing all of the above. I have attached the error log. I'm out of ideas on what could be causing this.

    0_1551922109339_textdump.tar.0



  • I'm updating to v4.9.0.2 to see if that solves the issue.


  • Netgate Administrator

    Hmm, very different crash:

    db:0:kdb.enter.default>  bt
    Tracing pid 0 tid 100250 td 0xfffff8001d5f0000
    lz4_compress() at lz4_compress+0x761/frame 0xfffffe01205358d0
    zio_compress_data() at zio_compress_data+0x8c/frame 0xfffffe0120535910
    zio_write_compress() at zio_write_compress+0x21f/frame 0xfffffe0120535990
    zio_execute() at zio_execute+0xac/frame 0xfffffe01205359e0
    taskqueue_run_locked() at taskqueue_run_locked+0x154/frame 0xfffffe0120535a40
    taskqueue_thread_loop() at taskqueue_thread_loop+0x98/frame 0xfffffe0120535a70
    fork_exit() at fork_exit+0x83/frame 0xfffffe0120535ab0
    fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0120535ab0
    --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
    db:0:kdb.enter.default>  ps
    
    Fatal trap 12: page fault while in kernel mode
    cpuid = 2; apic id = 02
    fault virtual address	= 0x1
    fault code		= supervisor read data, page not present
    instruction pointer	= 0x20:0xffffffff8300ab51
    stack pointer	        = 0x28:0xfffffe0120535860
    frame pointer	        = 0x28:0xfffffe01205358d0
    code segment		= base 0x0, limit 0xfffff, type 0x1b
    			= DPL 0, pres 1, long 1, def32 0, gran 1
    processor eflags	= interrupt enabled, resume, IOPL = 0
    current process		= 0 (zio_write_issue_2)
    

    Different crashes like that look more like a hardware issue.

    Steve
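Comparing crash reports can be done mechanically by pulling the function names out of each `bt` listing and checking whether the innermost frames agree. The following is a rough sketch rather than an established tool: `frames` and `same_crash` are hypothetical helper names, the three-frame depth is an arbitrary assumption, and the sample text is trimmed from the two dumps posted in this thread.

```python
import re

# Matches ddb backtrace frames such as
# "lz4_compress() at lz4_compress+0x761/frame 0xfffffe01205358d0"
FRAME_RE = re.compile(r"^\s*(\w+)\(\) at \w+\+0x[0-9a-f]+/frame 0x[0-9a-f]+", re.M)


def frames(ddb_text):
    """Return the function names from a ddb 'bt' listing, innermost first."""
    return FRAME_RE.findall(ddb_text)


def same_crash(a, b, depth=3):
    """Treat two reports as the same crash if their innermost frames match."""
    fa, fb = frames(a)[:depth], frames(b)[:depth]
    return bool(fa) and fa == fb


first = """
pmap_remove_pages() at pmap_remove_pages+0x2d7/frame 0xfffffe012089f450
exec_new_vmspace() at exec_new_vmspace+0x1b5/frame 0xfffffe012089f4c0
exec_elf64_imgact() at exec_elf64_imgact+0x931/frame 0xfffffe012089f5b0
"""

second = """
lz4_compress() at lz4_compress+0x761/frame 0xfffffe01205358d0
zio_compress_data() at zio_compress_data+0x8c/frame 0xfffffe0120535910
zio_write_compress() at zio_write_compress+0x21f/frame 0xfffffe0120535990
"""

print("first crash in: ", frames(first)[0])
print("second crash in:", frames(second)[0])
print("same crash?", same_crash(first, second))
```

For these two reports the innermost frames differ (`pmap_remove_pages` versus `lz4_compress`), so the check reports that they are not the same crash.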

