Help Understanding a Crash [kernel panic]
-
@bmeeks
My mistake, there's no netmap_ring_reinit this time:
<6>arp: 192.168.0.30 moved from 00:1d:60:7d:8c:61 to 00:05:4b:04:5e:7c on igb1
<6>arp: 192.168.0.39 moved from 00:22:15:6c:eb:96 to 00:e0:53:0b:40:f8 on igb1
<6>pid 18785 (grep), jid 0, uid 0: exited on signal 10 (core dumped)
<6>igb1: link state changed to DOWN
<6>igb1: link state changed to UP
<6>igb1: link state changed to DOWN
<6>igb1: link state changed to UP
<6>igb1: link state changed to DOWN
<6>igb1: link state changed to UP
<6>igb1: link state changed to DOWN
<6>igb1: link state changed to UP
<6>arp: 192.168.0.30 moved from f4:6d:04:e4:84:54 to 00:05:4b:04:5e:7c on igb1
<6>igb1: promiscuous mode disabled
<6>igb1: promiscuous mode enabled
<6>igb1: link state changed to DOWN
<6>igb1: link state changed to UP
<6>igb1: link state changed to DOWN
<6>igb1: link state changed to UP
Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address = 0xffff
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff83cf7396
stack pointer = 0x28:0xfffffe0089906ac0
frame pointer = 0x28:0xfffffe0089906ac0
code segment = base 0x0, limit 0xfffff, type 0x1b
    = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 19 (dp_zil_clean_taskq_)
trap number = 12
panic: page fault
cpuid = 2
time = 1630329038
KDB: enter: panic
So, as you said, this is not related to Suricata at all. netmap is ruled out.
OK, this log shows "promiscuous mode" changes, and even though that is normal, I uninstalled darkstat anyway.
Now the only other package installed is pfBlockerNG-devel. Let's see whether my system crashes again in the next few days...
Thanks.
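Not something posted in the thread: a minimal Python sketch for tallying the events in a kernel message buffer like the one above, per interface, so the repeated igb1 link flaps and the promiscuous-mode toggle stand out. `summarize_msgbuf` is a hypothetical helper, and the regexes assume the FreeBSD message formats exactly as they appear in this log.

```python
import re
from collections import Counter

# Patterns for the FreeBSD kernel messages seen in the log above (assumed formats).
# findall() matches anywhere in the text, so the "<6>" syslog priority prefix and
# whether entries are on separate lines or flattened onto one line do not matter.
LINK_RE = re.compile(r"(\w+): link state changed to (UP|DOWN)")
PROMISC_RE = re.compile(r"(\w+): promiscuous mode (enabled|disabled)")
ARP_RE = re.compile(r"arp: (\S+) moved from (\S+) to (\S+) on (\w+)")


def summarize_msgbuf(text: str) -> dict:
    """Count link-state changes and promiscuous-mode toggles per interface,
    and list every ARP 'moved from' event in the order it was logged."""
    link = Counter(f"{ifname} {state}" for ifname, state in LINK_RE.findall(text))
    promisc = Counter(f"{ifname} {state}" for ifname, state in PROMISC_RE.findall(text))
    arp_moves = [
        {"ip": ip, "old": old, "new": new, "if": ifname}
        for ip, old, new, ifname in ARP_RE.findall(text)
    ]
    return {"link": link, "promisc": promisc, "arp_moves": arp_moves}
```

Run over the whole buffer above, it should report six DOWN/UP pairs on igb1, three ARP moves and one disabled/enabled promiscuous-mode pair.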
-
Do you have the backtrace from the crash report?
-
@stephenw10 Hi! Yes, here: ddb.txt
-
Mmm, well, very similar but not identical:
db:0:kdb.enter.default> bt
Tracing pid 40766 tid 100593 td 0xfffff8013f829740
kdb_enter() at kdb_enter+0x37/frame 0xfffffe00910f4530
vpanic() at vpanic+0x197/frame 0xfffffe00910f4580
panic() at panic+0x43/frame 0xfffffe00910f45e0
trap_fatal() at trap_fatal+0x391/frame 0xfffffe00910f4640
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00910f4690
trap() at trap+0x286/frame 0xfffffe00910f47a0
calltrap() at calltrap+0x8/frame 0xfffffe00910f47a0
--- trap 0xc, rip = 0xffffffff8120874b, rsp = 0xfffffe00910f4870, rbp = 0xfffffe00910f4880 ---
vm_radix_remove() at vm_radix_remove+0x1b/frame 0xfffffe00910f4880
vm_page_free_prep() at vm_page_free_prep+0x55/frame 0xfffffe00910f48a0
vm_page_free_toq() at vm_page_free_toq+0x12/frame 0xfffffe00910f48d0
vm_object_page_remove() at vm_object_page_remove+0x61/frame 0xfffffe00910f4930
vm_map_entry_delete() at vm_map_entry_delete+0x104/frame 0xfffffe00910f4980
vm_map_delete() at vm_map_delete+0x184/frame 0xfffffe00910f49e0
vm_map_remove() at vm_map_remove+0xab/frame 0xfffffe00910f4a10
vmspace_exit() at vmspace_exit+0xcb/frame 0xfffffe00910f4a50
exit1() at exit1+0x55b/frame 0xfffffe00910f4ab0
sys_sys_exit() at sys_sys_exit+0xd/frame 0xfffffe00910f4ac0
amd64_syscall() at amd64_syscall+0x387/frame 0xfffffe00910f4bf0
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe00910f4bf0
--- syscall (1, FreeBSD ELF64, sys_sys_exit), rip = 0x800c2a00a, rsp = 0x7fffffffec38, rbp = 0x7fffffffec50 ---
What is that running on?
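Not something posted in the thread: a minimal Python sketch for pulling the frames out of a ddb `bt` like the one above and reducing it to a short "fault signature" (the first few frames below the panic/trap handlers), so that two crash reports can be compared side by side. `parse_ddb_bt`, `fault_signature` and the set of frames to skip are hypothetical helpers, not part of pfSense or FreeBSD; the regex assumes the `func() at func+0xNN/frame 0x...` layout seen in this trace.

```python
import re

# One ddb "bt" frame, e.g.
# "vm_radix_remove() at vm_radix_remove+0x1b/frame 0xfffffe00910f4880".
FRAME_RE = re.compile(r"(\w+)\(\) at \w+\+(0x[0-9a-f]+)/frame (0x[0-9a-f]+)")

# Frames added by the panic/trap machinery itself (assumed list); they show up in
# every page-fault panic and say nothing about where the fault actually happened.
PANIC_FRAMES = {"kdb_enter", "vpanic", "panic", "trap_fatal",
                "trap_pfault", "trap", "calltrap"}


def parse_ddb_bt(text: str) -> list[tuple[str, str]]:
    """Return (function, offset) pairs in the order ddb printed them.
    Non-frame lines such as '--- trap 0xc ...' do not match and are skipped."""
    return [(func, off) for func, off, _frame in FRAME_RE.findall(text)]


def fault_signature(frames: list[tuple[str, str]], depth: int = 3) -> list[str]:
    """First `depth` frames after dropping the panic/trap handlers, as 'func+off'."""
    interesting = [f"{func}+{off}" for func, off in frames if func not in PANIC_FRAMES]
    return interesting[:depth]
```

On the trace above the signature comes out as `vm_radix_remove+0x1b`, `vm_page_free_prep+0x55`, `vm_page_free_toq+0x12`, i.e. a page fault while freeing pages during process exit. Matching signatures across several reports would hint at a repeatable code path, whereas RAM errors usually land somewhere different each time.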
-
@none-0 said in Help Understanding a Crash [kernel panic]:
What is that running on?
Sorry, what do you mean?
I suspect he means what type of hardware: a Netgate appliance (and if so, which model, as different models have different CPU families) or generic Intel/AMD hardware.
-
It's a generic Intel box:
CPU: i3-4170
RAM: Kingston 2x8GB
Mobo: Asus (don't remember the model, but I can check)
Network adapter: Intel I350-T4V2
-
Mmm, well, I would run a RAM test there when you can, to be sure it's not a hardware issue.
Though the crashes seem far too similar to be a RAM error, which is usually pretty random.
-
@stephenw10 I think I'll buy a new stick... Memory tests work sometimes, but for intermittent problems I would possibly need to run them for days...
Then I can test the old ones with more time and reuse them elsewhere if they turn out to be fine. Would a single module do the trick, or does dual channel benefit pfSense? I mean, it won't use the bandwidth, but latency is better in dual channel too... What do you guys think?
-
Unlikely to make much difference IMO. For a test it doesn't really matter anyway.
Steve
-
Hello,
Just an update on the crashes: they haven't happened again.
Also, I've been using the Suricata 6.0.3 release since then, with no netmap issues.
So, I changed my RAM and tested the old modules:
24 hours of MemTest86+ and at least 5 hours of GoldMemory (not the best tests, but still) did not raise a single red flag (each module tested individually), AND I'm using them in other Windows machines without any BSOD or anything in the logs.
I have seen RAM tests fail to detect problems before, so based on what you explained, I'm assuming that both 1) the issue with Suricata's multithreaded ring access and 2) darkstat were hitting some intermittent problem that I could not reproduce with tests or on another OS.
Anyway, thank you for helping me solve this. Really appreciate it, @stephenw10 and @bmeeks!
-