Crash Report

stephenw10

You can upload the crash report file here: https://nc.netgate.com/nextcloud/s/m8t7sSzGHLRW5ji

stephenw10

Sorry I didn't realise you had uploaded this.

Backtrace:

db:0:kdb.enter.default>  bt
Tracing pid 65762 tid 100256 td 0xfffffe00fbb20000
kdb_enter() at kdb_enter+0x32/frame 0xfffffe00fa39c5f0
vpanic() at vpanic+0x163/frame 0xfffffe00fa39c720
panic() at panic+0x43/frame 0xfffffe00fa39c780
trap_fatal() at trap_fatal+0x40c/frame 0xfffffe00fa39c7e0
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00fa39c840
calltrap() at calltrap+0x8/frame 0xfffffe00fa39c840
--- trap 0xc, rip = 0xffffffff80e5fc25, rsp = 0xfffffe00fa39c910, rbp = 0xfffffe00fa39caf0 ---
sysctl_iflist() at sysctl_iflist+0x2b5/frame 0xfffffe00fa39caf0
sysctl_rtsock() at sysctl_rtsock+0x2fb/frame 0xfffffe00fa39cbd0
sysctl_root_handler_locked() at sysctl_root_handler_locked+0x90/frame 0xfffffe00fa39cc20
sysctl_root() at sysctl_root+0x216/frame 0xfffffe00fa39cca0
userland_sysctl() at userland_sysctl+0x176/frame 0xfffffe00fa39cd50
sys___sysctl() at sys___sysctl+0x5c/frame 0xfffffe00fa39ce00
amd64_syscall() at amd64_syscall+0x109/frame 0xfffffe00fa39cf30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe00fa39cf30
--- syscall (202, FreeBSD ELF64, __sysctl), rip = 0x82247826a, rsp = 0x820656178, rbp = 0x8206561b0 ---

Panic:

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 02
fault virtual address	= 0xffffea0055c55000
fault code		= supervisor read data, page not present
instruction pointer	= 0x20:0xffffffff80e5fc25
stack pointer	        = 0x28:0xfffffe00fa39c910
frame pointer	        = 0x28:0xfffffe00fa39caf0
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 65762 (darkstat)
rdi: fffffe00fa39ccc0 rsi: 0000000000000000 rdx: 0000000000000ae0
rcx: 00000000000009f8  r8: 00000000667dedf3  r9: 00000000667dedf3
rax: 0000000000000000 rbx: fffffe00fa39cb68 rbp: fffffe00fa39caf0
r10: 000000000007cf53 r11: fffffe00fbb20520 r12: ffffea0055c55000
r13: 00000000000000e8 r14: 0000000000000000 r15: fffff8001584dd00
trap number		= 12
panic: page fault
cpuid = 1
time = 1719551956
KDB: enter: panic

So it looks like it's in darkstat while trying to use sysctl to see available interfaces.

Do you see the same panic/backtrace on several crash reports?

Try disabling darkstat.

hassling

118>Bootup complete

Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 06
fault virtual address = 0x100000000012
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80fa3cfb
stack pointer = 0x28:0xfffffe00c5975e60
frame pointer = 0x28:0xfffffe00c5975e80
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 7 (pf purge)
rdi: 0000100000000000 rsi: ffffffff84010000 rdx: 0000000000000180
rcx: fffffe00db3d83a0 r8: 0000000000000004 r9: ffffffff84013000
rax: 0000000000000000 rbx: 0000000000064071 rbp: fffffe00c5975e80
r10: 000000000000000f r11: 0000000081b3bc6d r12: fffffe00deee01c8
r13: 0000100000000000 r14: fffffe00deee01a8 r15: 00000000000025d7
trap number = 12
panic: page fault
cpuid = 3
time = 1721232192
KDB: enter: panic
FreeBSD 14.0-CURRENT amd64 1400094 #1 RELENG_2_7_2-n255948-8d2b56da39c: Wed Dec 6 20:45:47 UTC 2023
root@freebsd:/var/jenkins/workspace/pfSense-CE-snapshots-2_7_2-main/obj/amd64/StdASW5b/var/jenkins/workspace/pfSense-CE-snapshots-2_7_2-main/sources/FreeBSD-src-RELENG_2_7_2/amd64.amd64/sys/pfSense

patient0

@hassling under no circumstances provide more information :).

But more seriously: is it the same system you opened the last thread about?

If yes, did you do what @stephenw10 recommended and disable darkstat?

stephenw10

Yup need to see the backtrace. However that appears to be different.

If you have a number of crash reports and they are all different it's probably a hardware issue.

hassling

Body:

Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 06
fault virtual address = 0x100000000012
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80fa3cfb
stack pointer = 0x28:0xfffffe00c5975e60
frame pointer = 0x28:0xfffffe00c5975e80
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 7 (pf purge)
rdi: 0000100000000000 rsi: ffffffff84010000 rdx: 0000000000000180
rcx: fffffe00db3d83a0 r8: 0000000000000004 r9: ffffffff84013000
rax: 0000000000000000 rbx: 0000000000064071 rbp: fffffe00c5975e80
r10: 000000000000000f r11: 0000000081b3bc6d r12: fffffe00deee01c8
r13: 0000100000000000 r14: fffffe00deee01a8 r15: 00000000000025d7
trap number = 12
panic: page fault

stephenw10

Still need to see a backtrace. And posting new threads just confuses everyone.

How many crash reports do you have? We need to compare multiple crashes.
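Comparing crashes mostly comes down to comparing the first frame after the `--- trap` line in each backtrace. A minimal sketch of how that could be scripted, assuming each backtrace has been saved as plain text in the ddb format shown in this thread. The function names and the two short samples are only for illustration and are not part of any pfSense tooling:

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Matches ddb frame lines such as:
#   pf_state_expires() at pf_state_expires+0xb/frame 0xfffffe00c5808e80
FRAME_RE = re.compile(r"^(\w+)\(\) at \w+\+0x[0-9a-f]+/frame 0x[0-9a-f]+")


def faulting_function(backtrace: str) -> Optional[str]:
    """Return the function on the first frame after the first '--- trap' line."""
    after_trap = False
    for raw_line in backtrace.splitlines():
        line = raw_line.strip()
        if not after_trap:
            # Frames before the trap line are the panic machinery itself.
            after_trap = line.startswith("--- trap")
            continue
        match = FRAME_RE.match(line)
        if match:
            return match.group(1)
    return None


def summarize(backtraces: Iterable[str]) -> Counter:
    """Count how many backtraces fault in each function."""
    return Counter(faulting_function(bt) for bt in backtraces)


# Hypothetical inputs: shortened copies of two backtraces from this thread.
SAMPLE_PF_PURGE = """\
calltrap() at calltrap+0x8/frame 0xfffffe00c5808d90
--- trap 0xc, rip = 0xffffffff80fa3cfb, rsp = 0xfffffe00c5808e60, rbp = 0xfffffe00c5808e80 ---
pf_state_expires() at pf_state_expires+0xb/frame 0xfffffe00c5808e80
pf_purge_expired_states() at pf_purge_expired_states+0xd5/frame 0xfffffe00c5808ec0
"""

SAMPLE_DARKSTAT = """\
calltrap() at calltrap+0x8/frame 0xfffffe00fa39c840
--- trap 0xc, rip = 0xffffffff80e5fc25, rsp = 0xfffffe00fa39c910, rbp = 0xfffffe00fa39caf0 ---
sysctl_iflist() at sysctl_iflist+0x2b5/frame 0xfffffe00fa39caf0
"""

if __name__ == "__main__":
    for name, count in summarize([SAMPLE_PF_PURGE, SAMPLE_DARKSTAT]).items():
        print(f"{count} crash(es) in {name}")
```

If most reports land in the same function, that points towards a software bug. If they are scattered across unrelated functions, hardware becomes the more likely suspect.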

hassling

@stephenw10
Sorry for the confusion. Yes, it is the same system, and yes, I disabled darkstat. The crashes are still occurring.

stephenw10

Ok, can you upload any further crash reports to the Nextcloud link?

Or just give us the backtraces here to compare?

hassling

@patient0 I did. Still happening. I bought this Topton pfSense router from AliExpress two months ago.

hassling

@stephenw10 I'll post backtraces later as I am not home. Thank you

hassling

@stephenw10 uploaded.

Konstanti

@hassling

This line
current process = 7 (pf purge)
indicates that the error occurs at the pfSense boot stage when initializing the PF kernel module.
I would recommend reinstalling the system first.

It is possible that there is some kind of problem at the hardware level

hassling

@Konstanti The last two different codes happened after I put an external USB fan (plugged into the wall) under the router, as per the seller's recommendation, to dissipate heat. The recorded temperature on the 4 cores reaches 62 Celsius, though pfSense occasionally reports 70 Celsius.

Correlation vs causation? I am just trying to exclude.

stephenw10

Did the fan bring down the CPU temps? On an N100 that's normally passively cooled, any fan should have a dramatic effect. If it doesn't, I would check that the CPU is correctly attached to the heatsink/case, has thermal paste etc.

There are 4 distinct crashes there. One each in darkstat and netstat and all the others in pfctl and pfpurge which are very similar:

db:0:kdb.enter.default>  show pcpu
cpuid        = 3
dynamic pcpu = 0xfffffe009d4fff80
curthread    = 0xfffffe0020552740: pid 7 tid 100113 critnest 1 "pf purge"
curpcb       = 0xfffffe0020552c60
fpcurthread  = none
idlethread   = 0xfffffe002047fe40: tid 100006 "idle: cpu3"
self         = 0xffffffff84013000
curpmap      = 0xffffffff83020ab0
tssp         = 0xffffffff84013384
rsp0         = 0xfffffe00c5809000
kcr3         = 0xffffffffffffffff
ucr3         = 0xffffffffffffffff
scr3         = 0x0
gs32p        = 0xffffffff84013404
ldt          = 0xffffffff84013444
tss          = 0xffffffff84013434
curvnet      = 0xfffff80001240440
db:0:kdb.enter.default>  bt
Tracing pid 7 tid 100113 td 0xfffffe0020552740
kdb_enter() at kdb_enter+0x32/frame 0xfffffe00c5808b40
vpanic() at vpanic+0x163/frame 0xfffffe00c5808c70
panic() at panic+0x43/frame 0xfffffe00c5808cd0
trap_fatal() at trap_fatal+0x40c/frame 0xfffffe00c5808d30
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00c5808d90
calltrap() at calltrap+0x8/frame 0xfffffe00c5808d90
--- trap 0xc, rip = 0xffffffff80fa3cfb, rsp = 0xfffffe00c5808e60, rbp = 0xfffffe00c5808e80 ---
pf_state_expires() at pf_state_expires+0xb/frame 0xfffffe00c5808e80
pf_purge_expired_states() at pf_purge_expired_states+0xd5/frame 0xfffffe00c5808ec0
pf_purge_thread() at pf_purge_thread+0x13b/frame 0xfffffe00c5808ef0
fork_exit() at fork_exit+0x7f/frame 0xfffffe00c5808f30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00c5808f30
--- trap 0xe8e0e8e0, rip = 0x3cd03cdd431d431, rsp = 0x186b186b6af46af4, rbp = 0x5fc25fc2d476d476 ---

db:0:kdb.enter.default>  show pcpu
cpuid        = 1
dynamic pcpu = 0xfffffe009d4e1f80
curthread    = 0xfffffe00fdaee3a0: pid 36100 tid 100648 critnest 1 "pfctl"
curpcb       = 0xfffffe00fdaee8c0
fpcurthread  = 0xfffffe00fdaee3a0: pid 36100 "pfctl"
idlethread   = 0xfffffe0020480c80: tid 100004 "idle: cpu1"
self         = 0xffffffff84011000
curpmap      = 0xfffff800095f9ad0
tssp         = 0xffffffff84011384
rsp0         = 0xfffffe00fcdb3000
kcr3         = 0xffffffffffffffff
ucr3         = 0xffffffffffffffff
scr3         = 0x0
gs32p        = 0xffffffff84011404
ldt          = 0xffffffff84011444
tss          = 0xffffffff84011434
curvnet      = 0xfffff80001240440
db:0:kdb.enter.default>  bt
Tracing pid 36100 tid 100648 td 0xfffffe00fdaee3a0
kdb_enter() at kdb_enter+0x32/frame 0xfffffe00fcdb23e0
vpanic() at vpanic+0x163/frame 0xfffffe00fcdb2510
panic() at panic+0x43/frame 0xfffffe00fcdb2570
trap_fatal() at trap_fatal+0x40c/frame 0xfffffe00fcdb25d0
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00fcdb2630
calltrap() at calltrap+0x8/frame 0xfffffe00fcdb2630
--- trap 0xc, rip = 0xffffffff80fc1b62, rsp = 0xfffffe00fcdb2700, rbp = 0xfffffe00fcdb2be0 ---
pfioctl() at pfioctl+0x2ca2/frame 0xfffffe00fcdb2be0
devfs_ioctl() at devfs_ioctl+0xcc/frame 0xfffffe00fcdb2c30
vn_ioctl() at vn_ioctl+0xcf/frame 0xfffffe00fcdb2ca0
devfs_ioctl_f() at devfs_ioctl_f+0x1e/frame 0xfffffe00fcdb2cc0
kern_ioctl() at kern_ioctl+0x255/frame 0xfffffe00fcdb2d30
sys_ioctl() at sys_ioctl+0x123/frame 0xfffffe00fcdb2e00
amd64_syscall() at amd64_syscall+0x109/frame 0xfffffe00fcdb2f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe00fcdb2f30
--- syscall (54, FreeBSD ELF64, ioctl), rip = 0x929772e79ca, rsp = 0x929740778a8, rbp = 0x92974077920 ---

All of those crashes are after bootup completes.

Nothing else significant is shown. How long does it take to crash after booting?

stephenw10

Similar to this: https://redmine.pfsense.org/issues/13417

hassling

@stephenw10 Random, between 1 and 2 days. However, when I removed the fan the system was stable for 4 days. When I put the fan back, connected through a USB wall outlet, there were two crashes and one freeze within two days.

Now I removed the fan completely and I am watching. Next I'll check the thermal paste and update.

Is there any way to clear leftover files from darkstat or other packages (I already removed them)?

hassling

@stephenw10 Yes, the fan brought the temperature down to 40 Celsius.

Is there anything I can do to clear leftover darkstat/netstat cache files?

stephenw10

darkstat was the first crash, about 20 days ago (1719551956). The netstat crash was on the 5th of July (1720211173).
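The numbers in brackets are the `time =` values from the panic output, which are Unix epoch seconds. A small sketch for turning them into dates, using the three timestamps quoted in this thread. The dictionary labels are just for readability, and the output is in UTC, so local time on the firewall may differ:

```python
from datetime import datetime, timezone

# "time =" values copied from the panic output in this thread.
CRASH_TIMES = {
    "darkstat": 1719551956,
    "netstat": 1720211173,
    "pf purge": 1721232192,
}


def to_utc(epoch_seconds: int) -> str:
    """Format Unix epoch seconds as a UTC date and time string."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S UTC")


if __name__ == "__main__":
    for name, stamp in CRASH_TIMES.items():
        print(f"{name:10s} {to_utc(stamp)}")
    # darkstat   2024-06-28 05:19:16 UTC
    # netstat    2024-07-05 20:26:13 UTC
    # pf purge   2024-07-17 16:03:12 UTC
```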

Hard to imagine the fan caused a crash. Especially if it wasn't powered from the device itself.

hassling

@stephenw10 A little update: since the last crash on July 17th, there have been no more crashes after removing the external (USB) fan.

I am now inclined to believe that trying to cool the processor is the culprit for the crash. The processor seems to be very sensitive to temp change.

Hope this will help someone else. I'll post here if anything changes.

Thank you for your insight.