Crash report

jimp

We would need to know the IP address it was submitted from. Looking at the IP address you logged into the forum from, I see a crash from a nearby system submitted yesterday that was running 2.2.5, so I suppose that might be it. The IP address ended in .73.

Looks like something crashed in unbound somehow:

Backtrace:

db:0:kdb.enter.default>  show pcpu
cpuid        = 0
dynamic pcpu = 0x63a600
curthread    = 0xfffff80100852920: pid 80566 "unbound"
curpcb       = 0xfffffe006441dcc0
fpcurthread  = 0xfffff80100852920: pid 80566 "unbound"
idlethread   = 0xfffff80003390000: tid 100003 "idle: cpu0"
curpmap      = 0xfffff80126edd9f8
tssp         = 0xffffffff8219d190
commontssp   = 0xffffffff8219d190
rsp0         = 0xfffffe006441dcc0
gs32p        = 0xffffffff8219ebe8
ldt          = 0xffffffff8219ec28
tss          = 0xffffffff8219ec18
db:0:kdb.enter.default>  bt
Tracing pid 80566 tid 100156 td 0xfffff80100852920
done_store_dr() at done_store_dr+0x21/frame 0xfffffe006441daf0
mi_switch() at mi_switch+0xe1/frame 0xfffffe006441db30
critical_exit() at critical_exit+0x7a/frame 0xfffffe006441db50
intr_event_handle() at intr_event_handle+0x106/frame 0xfffffe006441dba0
intr_execute_handlers() at intr_execute_handlers+0x48/frame 0xfffffe006441dbd0
lapic_handle_intr() at lapic_handle_intr+0x3f/frame 0xfffffe006441dbf0
Xapic_isr1() at Xapic_isr1+0xa4/frame 0xfffffe006441dbf0
--- interrupt, rip = 0x4354e4, rsp = 0x7fffffffebb0, rbp = 0x7fffffffebc0 ---

End of the message buffer:

kernel trap 12 with interrupts disabled

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address	= 0xfffffe006443bfff
fault code		= supervisor write data, page not present
instruction pointer	= 0x20:0xffffffff80f34434
stack pointer	        = 0x28:0xfffffe006441da80
frame pointer	        = 0x28:0xfffffe006441daf0
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= resume, IOPL = 0
current process		= 80566 (unbound)

That's a pretty deep area for it to have crashed. Unless it crashes repeatedly in the exact same spot, I might be inclined to distrust the hardware at the moment.
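For anyone wanting to check the "exact same spot" question across several reports, here is a minimal sketch in Python. It assumes each report's message buffer is available as plain text in the format shown above; the helper names `parse_trap` and `same_spot` are made up for illustration and are not part of pfSense or FreeBSD tooling.

```python
import re

# Hypothetical helper: pull a few fields out of a "Fatal trap" message.
# The field names are taken from the message buffer quoted above.
FIELD_RE = re.compile(
    r"^(fault virtual address|instruction pointer|current process)\s*=\s*(.+)$"
)

def parse_trap(text):
    """Return a dict of the recognised trap fields found in text."""
    fields = {}
    for line in text.splitlines():
        m = FIELD_RE.match(line.strip())
        if m:
            fields[m.group(1)] = m.group(2).strip()
    return fields

def same_spot(report_a, report_b):
    """True if both reports carry the same instruction pointer."""
    a = parse_trap(report_a).get("instruction pointer")
    b = parse_trap(report_b).get("instruction pointer")
    return a is not None and a == b

# Sample built from the lines in the report above.
sample = """Fatal trap 12: page fault while in kernel mode
fault virtual address\t= 0xfffffe006443bfff
instruction pointer\t= 0x20:0xffffffff80f34434
current process\t\t= 80566 (unbound)
"""

print(parse_trap(sample))
```

A matching instruction pointer only says that two panics faulted at the same kernel address. It does not by itself separate a software bug from bad hardware.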

dennypage

I just sent in another. Same place. I believe I've had the issue previously with 2.2.2 or 2.2.3. If you look back, you should find previous crash reports, either from .73 or from .78. All Unbound related, I believe.

I "fixed" the issue previously by turning off DHCP registration. With DHCP registration disabled, Unbound has been fairly stable for me. One crash (spontaneous exit) per month maybe, but no system crashes.

I've been testing 2.2.5 for a few weeks, and it's been very stable for me aside from an install problem that I've been talking with Chris about. I just turned DHCP registration back on as part of 2.2.5 testing about 3 days ago. In those 3 days, I've had 2 system crashes.

dennypage

I had another this morning. In php-fpm this time, but still at the point of a lease update.

If you want to swap out the hardware, I'm okay with that. However, before doing that, I think you probably want to have a close look at some of the earlier crash reports I submitted. The first ones should show an SG-2440 rather than the current SG-4860.

dennypage

I just sent in another, again with Unbound.

Unfortunately, this one hit in the middle of an upgrade and left the system unbootable. Required a re-install.

dennypage

Another in the middle of an update. Unbound again.

Given that no one else seems to see these problems, maybe it is a hardware issue.

Do you guys want to swap it out?

@jimp:

That's a pretty deep area for it to have crashed. Unless it crashes repeatedly in the exact same spot, I might be inclined to distrust the hardware at the moment.

heper

might be better to ask on the portal

cwagz

I have had two crashes on 2.2.5 in the last few days. Never had a problem before with my equipment. My IP should be the same as what is logged on this post and ends in .161

I did recently upgrade my FiOS to 150 / 150. So my WAN port is now connected via gigabit. Let me know if you need any more information.

dennypage

It looks like my crashes may have been the result of an issue with hardware crypto acceleration. At cmb's suggestion, I've disabled aesni and haven't had a crash since. Of course, your mileage may vary.

cwagz

I just turned AES-NI off and will see what happens. Thanks for the information.

cmb

@cwagz:

I just turned AES-NI off and will see what happens. Thanks for the information.

I found a couple of crash reports submitted from the same IP you're visiting the forum from, and it's not likely that's the cause in your case. There have been known AES-NI panics related to the FPU in all versions, which the vast majority never hit, but some routinely hit. It's something we're pursuing upstream and expect to have resolved in 2.3. It's something to try, but I don't expect it'll have any impact for you.

Your crash looks nothing at all like those (nor any others I can recall offhand), and the two different crashes aren't even similar to each other. Most often when you're getting crashes with that frequency, and they're not the same or at least similar, the root cause is a hardware problem. Both those were memory corruption related, which could still be a software problem.

If you're continuing to get crashes, keep submitting the crash reports and start a new thread, since this is not the same as the original issue here. I'll check them and suggest how to proceed from there.