Crash report



  • I had a crash this morning on my SG-4860 running with

    pfSense-Full-Update-2.2.5-DEVELOPMENT-amd64-20151018-0257.tgz

    Crash report submitted via web interface. Please let me know if you need more detail.

    [Chris, in case you are going to ask, this was stock SU+J]


  • Rebel Alliance Developer Netgate

    We would need to know the IP address it was submitted from. Looking at the IP address you logged into the forum from, I see a crash from a nearby system, submitted yesterday, that was running 2.2.5, so I suppose that might be it. The IP address ended in .73.

    Looks like something crashed in unbound somehow:

    Backtrace:

    db:0:kdb.enter.default>  show pcpu
    cpuid        = 0
    dynamic pcpu = 0x63a600
    curthread    = 0xfffff80100852920: pid 80566 "unbound"
    curpcb       = 0xfffffe006441dcc0
    fpcurthread  = 0xfffff80100852920: pid 80566 "unbound"
    idlethread   = 0xfffff80003390000: tid 100003 "idle: cpu0"
    curpmap      = 0xfffff80126edd9f8
    tssp         = 0xffffffff8219d190
    commontssp   = 0xffffffff8219d190
    rsp0         = 0xfffffe006441dcc0
    gs32p        = 0xffffffff8219ebe8
    ldt          = 0xffffffff8219ec28
    tss          = 0xffffffff8219ec18
    db:0:kdb.enter.default>  bt
    Tracing pid 80566 tid 100156 td 0xfffff80100852920
    done_store_dr() at done_store_dr+0x21/frame 0xfffffe006441daf0
    mi_switch() at mi_switch+0xe1/frame 0xfffffe006441db30
    critical_exit() at critical_exit+0x7a/frame 0xfffffe006441db50
    intr_event_handle() at intr_event_handle+0x106/frame 0xfffffe006441dba0
    intr_execute_handlers() at intr_execute_handlers+0x48/frame 0xfffffe006441dbd0
    lapic_handle_intr() at lapic_handle_intr+0x3f/frame 0xfffffe006441dbf0
    Xapic_isr1() at Xapic_isr1+0xa4/frame 0xfffffe006441dbf0
    --- interrupt, rip = 0x4354e4, rsp = 0x7fffffffebb0, rbp = 0x7fffffffebc0 ---
    
    

    End of the message buffer:

    kernel trap 12 with interrupts disabled
    
    Fatal trap 12: page fault while in kernel mode
    cpuid = 0; apic id = 00
    fault virtual address	= 0xfffffe006443bfff
    fault code		= supervisor write data, page not present
    instruction pointer	= 0x20:0xffffffff80f34434
    stack pointer	        = 0x28:0xfffffe006441da80
    frame pointer	        = 0x28:0xfffffe006441daf0
    code segment		= base 0x0, limit 0xfffff, type 0x1b
    			= DPL 0, pres 1, long 1, def32 0, gran 1
    processor eflags	= resume, IOPL = 0
    current process		= 80566 (unbound)
    
    

    That's a pretty deep area for it to have crashed. Unless it crashes repeatedly in the exact same spot, I might be inclined to distrust the hardware at the moment.
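
    One way to check whether repeated crashes land in the same spot is to pull a few key fields out of each report's message buffer and compare them. The following is only a minimal Python sketch, assuming each report uses the same "name = value" layout as the excerpt above. The names parse_trap and same_spot are made up for illustration and are not part of pfSense or FreeBSD.

```python
# Fields of interest from the "Fatal trap" block (layout assumed from the excerpt above).
FIELDS = ("fault virtual address", "instruction pointer", "current process")


def parse_trap(text):
    """Return a dict of the selected 'name = value' fields found in a message buffer."""
    result = {}
    for line in text.splitlines():
        # Split on the first '=' only; values such as "0x20:0xffff..." are kept whole.
        name, sep, value = line.partition("=")
        name = name.strip()
        if sep and name in FIELDS:
            result[name] = value.strip()
    return result


def same_spot(report_a, report_b):
    """Hypothetical helper: True if both reports faulted at the same instruction pointer."""
    ip_a = parse_trap(report_a).get("instruction pointer")
    ip_b = parse_trap(report_b).get("instruction pointer")
    return ip_a is not None and ip_a == ip_b


# Sample text copied from the message buffer quoted in this thread.
sample = """\
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address\t= 0xfffffe006443bfff
instruction pointer\t= 0x20:0xffffffff80f34434
current process\t\t= 80566 (unbound)
"""

print(parse_trap(sample))
```

    Matching instruction pointers across several reports would point toward a repeatable software fault, while scattered ones would fit the hardware suspicion better. Treat the comparison as a rough hint, not a diagnosis.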



  • I just sent in another. Same place. I believe I've had the issue previously with 2.2.2 or 2.2.3. If you look back, you should find previous crash reports, either from .73 or from .78. All Unbound-related, I believe.

    I "fixed" the issue previously by turning off DHCP registration. With DHCP registration disabled, Unbound has been fairly stable for me. One crash (spontaneous exit) per month maybe, but no system crashes.

    I've been testing 2.2.5 for a few weeks, and it's been very stable for me aside from an install problem that I've been talking with Chris about. I just turned DHCP registration back on as part of 2.2.5 testing about 3 days ago. In those 3 days, I've had 2 system crashes.



  • I had another this morning. In php-fpm this time, but still at the point of a lease update.

    If you want to swap out the hardware, I'm okay with that. However, before doing that, I think you probably want to have a close look at some of the earlier crash reports I submitted. The first ones should show an SG-2440 rather than the current SG-4860.



  • I just sent in another, again with Unbound.

    Unfortunately, this one hit in the middle of an upgrade and left the system unbootable. It required a reinstall.



  • Another in the middle of an update. Unbound again.

    Given that no one else seems to see these problems, maybe it is a hardware issue.

    Do you guys want to swap it out?

    @jimp:

    That's a pretty deep area for it to have crashed. Unless it crashes repeatedly in the exact same spot, I might be inclined to distrust the hardware at the moment.



  • It might be better to ask on the portal.



  • I have had two crashes on 2.2.5 in the last few days. I never had a problem before with my equipment. My IP should be the same as what is logged on this post, and it ends in .161.

    I did recently upgrade my FiOS to 150/150, so my WAN port is now connected via gigabit. Let me know if you need any more information.



  • It looks like my crashes may have been the result of an issue with hardware crypto acceleration. At cmb's suggestion, I've disabled aesni and haven't had a crash since. Of course, your mileage may vary.



  • I just turned AES-NI off and will see what happens.  Thanks for the information.



  • @cwagz:

    I just turned AES-NI off and will see what happens.  Thanks for the information.

    I found a couple of crash reports submitted from the same IP you're visiting the forum from, and it's not likely that AES-NI is the cause in your case. There have been known AES-NI panics related to the FPU in all versions, which the vast majority of users never hit but some routinely hit. It's something we're pursuing upstream and expect to have resolved in 2.3. Disabling it is something to try, but I don't expect it'll have any impact for you.

    Your crash looks nothing at all like those (nor any others I can recall offhand), and the two different crashes aren't even similar to each other. Most often, when you're getting crashes with that frequency and they're not the same or at least similar, the root cause is a hardware problem. Both of those were memory corruption related, which could still be a software problem.

    If you're continuing to get crashes, keep submitting the crash reports, and start a new thread since this is not the same as the original issue here, and I'll check them and suggest how to proceed from there.

