Help with a crash dump
-
Hmm. Well I'd disable Nut as a test just because it's the only thing doing anything active.
Pretty sure there are others running that box without issue, so I'd guess it's either a config issue or some bad component, assuming you have not added anything like a wifi card.
-
I'll disable nut and report back.
Nothing unusual in the config. Not sure how I could tell if there is?
No other components added and no additional cards / hardware.
-
I disabled nut and it crashed again this week.
Anything else I can do?
Thanks.
-
Are the crashes all the same with this error:
db:0:kdb.enter.default> bt
Tracing pid 12 tid 100026 td 0xfffff8000396b620
pfslowtimo() at pfslowtimo+0x52/frame 0xfffffe010e6e6810
softclock_call_cc() at softclock_call_cc+0x13a/frame 0xfffffe010e6e68c0
Or is it different?
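One way to answer that across several dumps is to reduce each backtrace to its list of function names and compare the lists. The following is only a sketch: `bt_frames` is a made-up helper name, and it assumes the trace uses the `name() at name+0x.../frame` layout shown above.

```shell
# Print one function name per line from a ddb backtrace read on stdin.
# Works whether the frames are on separate lines or run together on one line.
bt_frames() {
    # Match "funcname() at " and then strip the trailing "() at ".
    grep -oE '[A-Za-z_][A-Za-z0-9_]*\(\) at ' | sed 's/() at $//'
}
```

Running it over the ddb.txt from each textdump and comparing the output should show quickly whether the crashes share the same call path.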
-
This is the latest error I received:
db:0:kdb.enter.default> bt
Tracing pid 12 tid 100026 td 0xfffff8000397d620
ipport_tick() at ipport_tick+0x4e/frame 0xfffffe010e6e6810
softclock_call_cc() at softclock_call_cc+0x13a/frame 0xfffffe010e6e68c0
softclock() at softclock+0x79/frame 0xfffffe010e6e68e0
-
That looks like hardware. I would consider changing the on-board battery to see if that helps, but I highly doubt it would.
-
Hi,
I'm experiencing the same kind of issue: my pfSense box has been crashing daily for a couple of months. I also tried changing the SSD, but I'm getting the same results.
Looking at the logs I see the same kind of details as discussed above:
Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 06
fault virtual address = 0xc46b3dd0
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80d89866
stack pointer = 0x28:0xfffffe01188d6688
frame pointer = 0x28:0xfffffe01188d6688
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 55216 (darkstat)

version.txt
FreeBSD 11.2-RELEASE-p6 #3 518496b29ae(RELENG_2_4_4): Wed Dec 12 07:41:44 EST 2018 root@buildbot2.nyi.netgate.com:/build/ce-crossbuild-244/obj/amd64/ZfGpH5cd/build/ce-crossbuild-244/pfSense/tmp/FreeBSD-src/sys/pfSense

Filename: /var/crash/textdump.tar.11
ddb.txt
db:0:kdb.enter.default> run lockinfo
db:1:lockinfo> show locks
No such command; use "help" to list available commands
db:1:lockinfo> show alllocks
No such command; use "help" to list available commands
db:1:lockinfo> show lockedvnods
Locked vnodes
db:0:kdb.enter.default> show pcpu
cpuid = 2
dynamic pcpu = 0xfffffe018f873480
curthread = 0xfffff80003dd0000: pid 12 "irq259: re0"
curpcb = 0xfffffe0118646cc0
fpcurthread = none
idlethread = 0xfffff80003939000: tid 100005 "idle: cpu2"
curpmap = 0xffffffff82b83898
tssp = 0xffffffff82bb47e0
commontssp = 0xffffffff82bb47e0
rsp0 = 0xfffffe0118646cc0
gs32p = 0xffffffff82bbb038
ldt = 0xffffffff82bbb078
tss = 0xffffffff82bbb068
db:0:kdb.enter.default> bt
Tracing pid 12 tid 100057 td 0xfffff80003dd0000
turnstile_broadcast() at turnstile_broadcast+0x47/frame 0xfffffe0118646050
__mtx_unlock_sleep() at __mtx_unlock_sleep+0xb9/frame 0xfffffe0118646080
pf_state_insert() at pf_state_insert+0xb33/frame 0xfffffe0118646110
pf_test_rule() at pf_test_rule+0x2c7c/frame 0xfffffe01186465a0
pf_test() at pf_test+0x20e9/frame 0xfffffe0118646800
pf_check_in() at pf_check_in+0x1d/frame 0xfffffe0118646820
pfil_run_hooks() at pfil_run_hooks+0x90/frame 0xfffffe01186468b0
ip_input() at ip_input+0x441/frame 0xfffffe0118646910
netisr_dispatch_src() at netisr_dispatch_src+0xa8/frame 0xfffffe0118646960
ether_demux() at ether_demux+0x173/frame 0xfffffe0118646990
ether_nh_input() at ether_nh_input+0x32b/frame 0xfffffe01186469f0
netisr_dispatch_src() at netisr_dispatch_src+0xa8/frame 0xfffffe0118646a40
ether_input() at ether_input+0x26/frame 0xfffffe0118646a60
re_rxeof() at re_rxeof+0x601/frame 0xfffffe0118646ad0
re_intr_msi() at re_intr_msi+0xfc/frame 0xfffffe0118646b20
intr_event_execute_handlers() at intr_event_execute_handlers+0xe9/frame 0xfffffe0118646b60
ithread_loop() at ithread_loop+0xe7/frame 0xfffffe0118646bb0
fork_exit() at fork_exit+0x83/frame 0xfffffe0118646bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0118646bf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
I also attached the latest crash report to this post: https://forum.netgate.com/post/819822
Thanks
-
@warden said in Help with a crash dump:
Your crash is very different from the other one. Can you open a new thread?
-
It's definitely nut. I uninstalled and there were no crashes. I re-installed and it's started crashing. Is this an integration issue you can look at or should I contact the nut team?
Thanks.
-
Yet it still crashed with Nut installed but disabled previously?
Were you able to replicate that? It seems hard to imagine that could happen if it really was disabled.
If it's a problem with the nut binaries in FreeBSD, that would need to be reported upstream, but there must be a lot of people running that in FreeBSD.
Do you have any additional crash reports? Anything showing the NUT package specifically?
Steve
-
I'm pretty sure it did. I'm going to disable it again and see if I get a crash dump with nut installed but disabled.
Here is the latest crash:
-
Mmm, well identical crash then. Implies probably software at least.
-
Bumping this thread back up. I've continued to have this problem. I disabled nut for a few months and it didn't go away. I'm seeing crashes about once or twice a week still. Any next debugging steps?
Latest crash dump attached.
Thanks in advance.
-
Hmm, well that is three almost identical crashes:
hardclock_cnt() at hardclock_cnt+0x131/frame 0xfffffe010e4d44e0
handleevents() at handleevents+0xc9/frame 0xfffffe010e4d4530
timercb() at timercb+0xad/frame 0xfffffe010e4d4580
lapic_handle_timer() at lapic_handle_timer+0xa2/frame 0xfffffe010e4d45c0
Xtimerint() at Xtimerint+0xa8/frame 0xfffffe010e4d45c0
I have to think it's some issue with the system clock being used on that system.
I see it's loading the speedstep driver (est). Is powerd enabled? You might try disabling it if so. It's been a while since I've seen one, but some systems have issues with varying the CPU clock that would throw errors.
You could usually work past that by selecting a non-variable system timer instead.
For example:
[2.5.0-DEVELOPMENT][admin@apu.stevew.lan]/root: sysctl kern.timecounter.choice
kern.timecounter.choice: ACPI-fast(900) HPET(950) i8254(0) TSC(800) dummy(-1000000)
[2.5.0-DEVELOPMENT][admin@apu.stevew.lan]/root: sysctl kern.timecounter.hardware
kern.timecounter.hardware: HPET
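The two commands above only read the current values. A rough sketch of actually switching is below; `set_timecounter` is a hypothetical wrapper, not something shipped with pfSense or FreeBSD, and it only sets the sysctl if the requested name appears in kern.timecounter.choice.

```shell
# Switch the active timecounter, but only to one the kernel lists as available.
# Usage (as root): set_timecounter HPET
set_timecounter() {
    want=$1
    # e.g. "ACPI-fast(900) HPET(950) i8254(0) TSC(800) dummy(-1000000)"
    choices=$(sysctl -n kern.timecounter.choice)
    case " $choices " in
        *" $want("*)
            sysctl kern.timecounter.hardware="$want"
            ;;
        *)
            echo "timecounter '$want' not offered: $choices" >&2
            return 1
            ;;
    esac
}
```

A value set this way at runtime does not survive a reboot. To keep it, it would normally be added as a tunable (on pfSense, under System > Advanced > System Tunables).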
Steve
-
@stephenw10 said in Help with a crash dump:
sysctl kern.timecounter.hardware
Thanks very much. powerd is not running.
I changed to HPET and will see what happens. I have to be honest, I know next to nothing about system timers so this is a stab in the dark for me. Will report back if anything happens.
-
HPET didn't work. My system froze with 're1 watchdog timeout' within about 5 minutes. I rebooted, set HPET again, and the same thing happened.
-
@nik-taylor said in Help with a crash dump:
watchdog timeout
You saw that error only using the HPET timecounter? What is the default timecounter there?
Did it actually 'freeze' or just stop responding to the network? That error is typical when using Realtek NICs but the system usually still responds at the console for example. It's only the NICs that fail.
You might try the alternative driver if so. That has been shown to help if the default driver is triggering watchdog timeouts.
https://forum.netgate.com/topic/135850/official-realtek-driver-binary-1-95-for-2-4-4-release
Steve
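To see how often each interface is hitting this, the kernel messages can be tallied per NIC. This is just a sketch: `watchdog_counts` is a made-up helper name, and it assumes the message has the form "re1: watchdog timeout" as quoted above.

```shell
# Count "watchdog timeout" messages per interface in kernel log text on stdin.
# Prints one "interface count" pair per line, e.g. "re1 2".
watchdog_counts() {
    grep -oE '[a-z]+[0-9]+: watchdog timeout' \
        | cut -d: -f1 \
        | sort \
        | uniq -c \
        | awk '{print $2, $1}'
}
```

Something like `dmesg | watchdog_counts` before and after a driver change would give a quick comparison.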
-
I have seen that error once or twice in the past, but it was pretty consistent once I changed to HPET.
It just stopped responding to the network. I could still use a keyboard directly attached to the box.
Default timecounter is TSC.
I'll try the updated driver.
Thanks for helping me keep on top of this.
-
No problem.
Yeah, if the console was still active but the NIC/driver crashed out, definitely try the alternative driver.
Ultimately there's not too much you can do. The Realtek NICs are budget items and not that well supported in FreeBSD.
Steve
-
@stephenw10 - 5 days and no crash. I think the NIC driver patch fixed it. Thanks for all your help. I think I learned my lesson: Intel NICs from here on out.