pfSense crashes randomly - new setup
-
I have a bit of an update and maybe some more questions:
Running pciconf, I have determined that I am using the em driver:
em0@pci0:3:0:0: class=0x020000 rev=0x06 hdr=0x00 vendor=0x8086 device=0x10bc subvendor=0x8086 subdevice=0x11bc
vendor = 'Intel Corporation'
device = '82571EB/82571GB Gigabit Ethernet Controller (Copper)'
class = network
subclass = ethernet
I also read that in the past the em and igb drivers were merged. Could I be using the wrong driver? Do I need to use the igb driver? I don't know how to tell pfSense to use a different driver. Or does this all just mean that my NIC is not compatible?
Thanks for the help.
-
That's the correct driver.
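For reference, the attached driver is the prefix of the device name on the first line of each pciconf entry (em0 means driver em, unit 0). Below is a minimal Python sketch that pulls the driver and the vendor/device IDs out of such a line. The function name parse_pciconf_entry and the regular expression are illustrative assumptions, not part of pfSense or pciconf, and the pattern only targets the line format shown above.

```python
import re

# Matches the first line of a pciconf entry, for example:
#   em0@pci0:3:0:0: class=0x020000 rev=0x06 hdr=0x00 vendor=0x8086 device=0x10bc ...
# Assumption: the name before "@" is <driver><unit>, followed by the PCI selector.
ENTRY_RE = re.compile(
    r"^(?P<driver>[a-z][a-z0-9_]*?)(?P<unit>\d+)@pci\d+:\d+:\d+:\d+:\s+(?P<fields>.*)$"
)


def parse_pciconf_entry(line):
    """Return driver, unit, vendor and device for one entry line, or None."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        return None
    # The rest of the line is a run of key=value pairs separated by spaces.
    fields = dict(
        pair.split("=", 1) for pair in match.group("fields").split() if "=" in pair
    )
    return {
        "driver": match.group("driver"),
        "unit": int(match.group("unit")),
        "vendor": fields.get("vendor"),
        "device": fields.get("device"),
    }
```

Fed the em0 line from the post above, this should report the driver as em with vendor 0x8086 and device 0x10bc. The indented description lines (vendor = ..., device = ...) do not match the pattern and return None.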
We need to see the backtrace from the ddb.txt file in the crash report to know more.
So for example:
db:0:kdb.enter.default> show pcpu
cpuid = 0
dynamic pcpu = 0x532100
curthread = 0xfffff800033a0000: pid 11 "idle: cpu0"
curpcb = 0xfffffe0059bc3cc0
fpcurthread = none
idlethread = 0xfffff800033a0000: tid 100003 "idle: cpu0"
curpmap = 0xffffffff820f89a0
tssp = 0xffffffff82113890
commontssp = 0xffffffff82113890
rsp0 = 0xfffffe0059bc3cc0
gs32p = 0xffffffff821152e8
ldt = 0xffffffff82115328
tss = 0xffffffff82115318
db:0:kdb.enter.default> bt
Tracing pid 11 tid 100003 td 0xfffff800033a0000
callout_process() at callout_process+0x1a0/frame 0xfffffe0059bc38b0
handleevents() at handleevents+0x18e/frame 0xfffffe0059bc3910
timercb() at timercb+0x318/frame 0xfffffe0059bc3970
lapic_handle_timer() at lapic_handle_timer+0x9c/frame 0xfffffe0059bc39a0
Xtimerint() at Xtimerint+0x8c/frame 0xfffffe0059bc39a0
--- interrupt, rip = 0xffffffff80f84316, rsp = 0xfffffe0059bc3a70, rbp = 0xfffffe0059bc3a70 ---
acpi_cpu_c1() at acpi_cpu_c1+0x6/frame 0xfffffe0059bc3a70
acpi_cpu_idle() at acpi_cpu_idle+0x15a/frame 0xfffffe0059bc3ac0
cpu_idle_acpi() at cpu_idle_acpi+0x3f/frame 0xfffffe0059bc3ae0
cpu_idle() at cpu_idle+0x90/frame 0xfffffe0059bc3b00
sched_idletd() at sched_idletd+0x1d5/frame 0xfffffe0059bc3bb0
fork_exit() at fork_exit+0x9a/frame 0xfffffe0059bc3bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0059bc3bf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
db:0:kdb.enter.default> ps
Steve
-
Here is what is in my ddb.txt file. I had to use pastebin because it was too long: https://pastebin.com/vWni3cmP
Thanks.
-
OK so:
db:0:kdb.enter.default> show pcpu
cpuid = 3
dynamic pcpu = 0xfffffe008cd085c0
curthread = 0xfffffe000fdc93a0: pid 16 tid 100079 critnest 1 "usbus1"
curpcb = 0xfffffe000fdc98c0
fpcurthread = none
idlethread = 0xfffffe000fcbce40: tid 100006 "idle: cpu3"
self = 0xffffffff84013000
curpmap = 0xffffffff8303ef30
tssp = 0xffffffff84013384
rsp0 = 0xfffffe007ca93000
kcr3 = 0xffffffffffffffff
ucr3 = 0xffffffffffffffff
scr3 = 0x0
gs32p = 0xffffffff84013404
ldt = 0xffffffff84013444
tss = 0xffffffff84013434
curvnet = 0
db:0:kdb.enter.default> bt
Tracing pid 16 tid 100079 td 0xfffffe000fdc93a0
kdb_enter() at kdb_enter+0x32/frame 0xfffffe007ca92bb0
vpanic() at vpanic+0x183/frame 0xfffffe007ca92c00
panic() at panic+0x43/frame 0xfffffe007ca92c60
_mtx_lock_indefinite_check() at _mtx_lock_indefinite_check+0x67/frame 0xfffffe007ca92c70
_mtx_lock_spin_cookie() at _mtx_lock_spin_cookie+0xd5/frame 0xfffffe007ca92ce0
cpu_new_callout() at cpu_new_callout+0x2a2/frame 0xfffffe007ca92d30
callout_reset_sbt_on() at callout_reset_sbt_on+0x1a8/frame 0xfffffe007ca92d90
sleepq_set_timeout_sbt() at sleepq_set_timeout_sbt+0xbd/frame 0xfffffe007ca92dd0
_sleep() at _sleep+0x178/frame 0xfffffe007ca92e50
pause_sbt() at pause_sbt+0xff/frame 0xfffffe007ca92e80
usb_pause_mtx() at usb_pause_mtx+0x55/frame 0xfffffe007ca92eb0
usb_process() at usb_process+0xd7/frame 0xfffffe007ca92ef0
fork_exit() at fork_exit+0x7d/frame 0xfffffe007ca92f30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe007ca92f30
--- trap 0xd9738ee, rip = 0x85d0315689903142, rsp = 0x22e7aa0baea7aa0e, rbp = 0x9ed8862492988620 ---
So it looks to be something USB related, but I don't see any USB devices in your log other than controllers and hubs. Do you have any USB devices connected?
-
@stephenw10
No, the only USB device plugged in is the keyboard. However, I tried running it completely headless, without the keyboard or monitor connected, with just the power and one cable on the LAN side connected to a laptop to access the GUI. There is no WAN connection, as I am trying to work out the bugs before I disconnect my current internet setup. Even with only the LAN and power cables, it still gets a panic error.
Also, I just got 2 brand new RAM sticks today (2x8GB) and still have the problem, so I'm guessing it's not related to the RAM.
-
If it's a RAM issue the panics will be more random. Do all your crash reports show that same backtrace?
It could be a driver issue with one of the USB controllers. You might try disabling the USB3 (xhci) controller in the BIOS if you can.
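One way to check whether several crash reports share a backtrace is to reduce each one to its list of function names and compare those lists. The Python sketch below does that under some assumptions: the helper names frame_names and shared_leading_frames are made up for illustration, the pattern only targets frame lines of the form shown in the backtraces in this thread, and the text passed in is assumed to contain only the bt output from one ddb.txt.

```python
import re

# One ddb frame looks like:
#   vpanic() at vpanic+0x183/frame 0xfffffe007ca92c00
# The back-reference \1 requires the same name on both sides of " at ".
FRAME_RE = re.compile(r"(\w+)\(\) at \1\+0x[0-9a-f]+/frame 0x[0-9a-f]+")


def frame_names(bt_text):
    """Return the function names of all frames, in the order they appear."""
    return FRAME_RE.findall(bt_text)


def shared_leading_frames(first_bt, second_bt):
    """Return the frames two backtraces have in common, counted from the top."""
    shared = []
    for a, b in zip(frame_names(first_bt), frame_names(second_bt)):
        if a != b:
            break  # the traces diverge here
        shared.append(a)
    return shared
```

If two reports share their top frames (the panic path) but differ below them, the same check is failing but it is being reached from different code. Because the pattern is searched across the whole string, it also works on a backtrace that has been pasted as one long line.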
-
It looks like this latest crash is different.
db:0:kdb.enter.default> show registers
cs 0x20
ds 0x3b
es 0x3b
fs 0x13
gs 0x1b
ss 0x28
rax 0x12
rcx 0x1
rdx 0xfffffe001b7e4690
rbx 0x100
rsp 0xfffffe001b7e4a70
rbp 0xfffffe001b7e4a70
rsi 0x32
rdi 0xffffffff82d82918 vt_conswindow+0x10
r8 0
r9 0x1e6b00
r10 0xffffffff82d82908 vt_conswindow
r11 0x15f
r12 0
r13 0xfffffe001e255c80
r14 0xfffffe001b7e4b00
r15 0xfffffe001e2563a0
rip 0xffffffff80d43122 kdb_enter+0x32
rflags 0x86
kdb_enter+0x32: movq $0,0x2347ce3(%rip)
db:0:kdb.enter.default> run lockinfo
db:1:lockinfo> show locks
No such command; use "help" to list available commands
db:1:lockinfo> show alllocks
No such command; use "help" to list available commands
db:1:lockinfo> show lockedvnods
Locked vnodes
db:0:kdb.enter.default> show pcpu
cpuid = 0
dynamic pcpu = 0x10865c0
curthread = 0xfffffe001e2563a0: pid 11 tid 100003 critnest 3 "idle: cpu0"
curpcb = 0xfffffe001e2568c0
fpcurthread = none
idlethread = 0xfffffe001e2563a0: tid 100003 "idle: cpu0"
self = 0xffffffff84010000
curpmap = 0xffffffff8303ef30
tssp = 0xffffffff84010384
rsp0 = 0xfffffe001b7e5000
kcr3 = 0xffffffffffffffff
ucr3 = 0xffffffffffffffff
scr3 = 0x0
gs32p = 0xffffffff84010404
ldt = 0xffffffff84010444
tss = 0xffffffff84010434
curvnet = 0
db:0:kdb.enter.default> bt
Tracing pid 11 tid 100003 td 0xfffffe001e2563a0
kdb_enter() at kdb_enter+0x32/frame 0xfffffe001b7e4a70
vpanic() at vpanic+0x183/frame 0xfffffe001b7e4ac0
panic() at panic+0x43/frame 0xfffffe001b7e4b20
_mtx_lock_indefinite_check() at _mtx_lock_indefinite_check+0x67/frame 0xfffffe001b7e4b30
_mtx_lock_spin_cookie() at _mtx_lock_spin_cookie+0xd5/frame 0xfffffe001b7e4ba0
handleevents() at handleevents+0x2cb/frame 0xfffffe001b7e4be0
timercb() at timercb+0x25b/frame 0xfffffe001b7e4c30
hpet_intr_single() at hpet_intr_single+0x1b0/frame 0xfffffe001b7e4c60
intr_event_handle() at intr_event_handle+0x123/frame 0xfffffe001b7e4cd0
intr_execute_handlers() at intr_execute_handlers+0x4a/frame 0xfffffe001b7e4d00
Xapic_isr1() at Xapic_isr1+0xdc/frame 0xfffffe001b7e4d00
--- interrupt, rip = 0xffffffff8125b026, rsp = 0xfffffe001b7e4dd0, rbp = 0xfffffe001b7e4dd0 ---
acpi_cpu_c1() at acpi_cpu_c1+0x6/frame 0xfffffe001b7e4dd0
acpi_cpu_idle() at acpi_cpu_idle+0x2fe/frame 0xfffffe001b7e4e10
cpu_idle_acpi() at cpu_idle_acpi+0x48/frame 0xfffffe001b7e4e30
cpu_idle() at cpu_idle+0x9e/frame 0xfffffe001b7e4e50
sched_idletd() at sched_idletd+0x4d1/frame 0xfffffe001b7e4ef0
fork_exit() at fork_exit+0x7d/frame 0xfffffe001b7e4f30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe001b7e4f30
--- trap 0x552ee2ab, rip = 0xdd69eb03d129eb07, rsp = 0x7a5e704f761e704b, rbp = 0xc6615c61ca215c65 ---
db:0:kdb.enter.default> ps
-
Hmm, still the same spin lock issue, though this time reached from something else.
Given that platform is known I would try removing the Intel NIC and running with only the Realtek NIC for a few days, see if it still crashes.
-
@stephenw10
OK, thanks for the advice. I will try that now and report back after a few days. If there are no crashes, then I guess I'll blame the NIC.
-
I took the NIC out and it has been running for 6+ hours with no issues. I'll let it run longer to make sure, but before it never made it past 2-1/2 hours without a kernel panic. I'm guessing that the NIC was the issue. I guess I need to find a new NIC. I will report back tomorrow after I run it all night.
-
So I've just passed 24 hrs running without the NIC installed and no kernel panics. I'm assuming the NIC was causing the problems. I guess the IBM Pro/1000 PT Quad NIC is not compatible, even though it is listed as a compatible device? I guess I'll be looking for an i350-T4 as they seem to be the best.
Thanks for your help!
-
@farizno said in pfSense crashes randomly - new setup:
So I've just passed 24 hrs running without the NIC installed and no kernel panics. I'm assuming the NIC was causing the problems. I guess the IBM Pro/1000 PT Quad NIC is not compatible, even though it is listed as a compatible device? I guess I'll be looking for an i350-T4 as they seem to be the best.
Thanks for your help!
It could just be a faulty card and not a compatibility issue.
-
Or some low-level compatibility issue with that particular device.
Or a power or heat issue there with the expansion card.
-
@stephenw10 I do appreciate all the assistance. I will order an i350-T4 this week and report back after trying that card. Thanks again.
-
After installing an i350-T4 card, I can confirm that there have been no more kernel panics. I think this definitely points to an issue with the IBM/Intel PRO/1000 PT 82571EB/82571GB card that I have. I am not sure if the card is faulty and I don't really know how to test it. I guess I can try installing it in a Windows desktop PC that I have and see if it causes my desktop to crash, but the desktop that I have is connected on wireless (I don't have an ethernet drop near where it is located) so I am not sure if just having it installed will tell me if it functions properly.
Anyways, thanks for all the assistance stephenw10.
-
Yup, testing the card in a different host is really the only way to know for sure.
-