Crash report
-
I had a crash this morning on my SG-4860 running with
pfSense-Full-Update-2.2.5-DEVELOPMENT-amd64-20151018-0257.tgz
Crash report submitted via web interface. Please let me know if you need more detail.
[Chris, in case you were going to ask, this was stock SU+J.]
-
We would need to know the IP address it was submitted from. Looking at the IP address you logged into the forum from, I see a crash from a nearby system submitted yesterday that was running 2.2.5, so I suppose that might be it. The IP address ended in .73.
Looks like something crashed in unbound somehow:
Backtrace:
db:0:kdb.enter.default> show pcpu
cpuid = 0
dynamic pcpu = 0x63a600
curthread = 0xfffff80100852920: pid 80566 "unbound"
curpcb = 0xfffffe006441dcc0
fpcurthread = 0xfffff80100852920: pid 80566 "unbound"
idlethread = 0xfffff80003390000: tid 100003 "idle: cpu0"
curpmap = 0xfffff80126edd9f8
tssp = 0xffffffff8219d190
commontssp = 0xffffffff8219d190
rsp0 = 0xfffffe006441dcc0
gs32p = 0xffffffff8219ebe8
ldt = 0xffffffff8219ec28
tss = 0xffffffff8219ec18
db:0:kdb.enter.default> bt
Tracing pid 80566 tid 100156 td 0xfffff80100852920
done_store_dr() at done_store_dr+0x21/frame 0xfffffe006441daf0
mi_switch() at mi_switch+0xe1/frame 0xfffffe006441db30
critical_exit() at critical_exit+0x7a/frame 0xfffffe006441db50
intr_event_handle() at intr_event_handle+0x106/frame 0xfffffe006441dba0
intr_execute_handlers() at intr_execute_handlers+0x48/frame 0xfffffe006441dbd0
lapic_handle_intr() at lapic_handle_intr+0x3f/frame 0xfffffe006441dbf0
Xapic_isr1() at Xapic_isr1+0xa4/frame 0xfffffe006441dbf0
--- interrupt, rip = 0x4354e4, rsp = 0x7fffffffebb0, rbp = 0x7fffffffebc0 ---
End of the message buffer:
kernel trap 12 with interrupts disabled
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0xfffffe006443bfff
fault code = supervisor write data, page not present
instruction pointer = 0x20:0xffffffff80f34434
stack pointer = 0x28:0xfffffe006441da80
frame pointer = 0x28:0xfffffe006441daf0
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = resume, IOPL = 0
current process = 80566 (unbound)
That's a pretty deep area for it to have crashed. Unless it crashes repeatedly in the exact same spot, I'd be inclined to distrust the hardware for now.
-
I just sent in another. Same place. I believe I had the issue previously with 2.2.2 or 2.2.3. If you look back, you should find previous crash reports from either .73 or .78. All were Unbound-related, I believe.
I "fixed" the issue previously by turning off DHCP registration. With DHCP registration disabled, Unbound has been fairly stable for me: maybe one crash (a spontaneous exit) per month, but no system crashes.
I've been testing 2.2.5 for a few weeks, and it's been very stable for me aside from an install problem that I've been talking with Chris about. I just turned DHCP registration back on as part of 2.2.5 testing about 3 days ago. In those 3 days, I've had 2 system crashes.
-
I had another this morning. In php-fpm this time, but still at the point of a lease update.
If you want to swap out the hardware, I'm okay with that. However, before doing that, I think you'll want to take a close look at some of the earlier crash reports I submitted. The first ones should show an SG-2440 rather than the current SG-4860.
-
I just sent in another, again with Unbound.
Unfortunately, this one hit in the middle of an upgrade and left the system unbootable. It required a reinstall.
-
Another in the middle of an update. Unbound again.
Given that no one else seems to see these problems, maybe it is a hardware issue.
Do you guys want to swap it out?
-
It might be better to ask on the portal.
-
I have had two crashes on 2.2.5 in the last few days. I've never had a problem with my equipment before. My IP should be the same as the one logged on this post, and it ends in .161.
I recently upgraded my FiOS to 150/150, so my WAN port is now connected at gigabit. Let me know if you need any more information.
-
It looks like my crashes may have been the result of an issue with hardware crypto acceleration. At cmb's suggestion, I've disabled aesni and haven't had a crash since. Of course, your mileage may vary.
-
I just turned AES-NI off and will see what happens. Thanks for the information.
-
I found a couple of crash reports submitted from the same IP you're visiting the forum from, and AES-NI is not likely the cause in your case. There have been known AES-NI panics related to the FPU in all versions; the vast majority of users never hit them, but some hit them routinely. It's something we're pursuing upstream and expect to have resolved in 2.3. Disabling AES-NI is something to try, but I don't expect it'll have any impact for you.
Your crashes look nothing at all like those (nor like any others I can recall offhand), and the two crashes aren't even similar to each other. Most often, when you're getting crashes at that frequency and they're not identical or at least similar, the root cause is a hardware problem. Both of those were related to memory corruption, though, which could still be a software problem.
If you keep getting crashes, keep submitting the crash reports and start a new thread, since this is not the same as the original issue here. I'll check them and suggest how to proceed from there.