2.5.2 Crashing Every Few Weeks
-
Every few weeks, my pfSense 2.5.2 server (a Supermicro E300-8D) crashes. I can't upgrade to 2.6.0 because of the known issue where Intel NICs can't tag VLAN 0. Since I have AT&T fiber and use pfatt.sh, that issue blocks me from upgrading to 2.6.0.
I have a complex config for home use, but the box handles it well. Temps appear good, and the hardware is fairly new.
The system logs don't show anything that looks like an issue to me. I've attached the crash report in case wiser minds can spot something I can't.
-
Backtrace:
db:0:kdb.enter.default> bt
Tracing pid 34 tid 100305 td 0xfffff8000e287000
kdb_enter() at kdb_enter+0x37/frame 0xfffffe00a3443720
vpanic() at vpanic+0x197/frame 0xfffffe00a3443770
panic() at panic+0x43/frame 0xfffffe00a34437d0
trap_fatal() at trap_fatal+0x391/frame 0xfffffe00a3443830
trap() at trap+0x67/frame 0xfffffe00a3443940
calltrap() at calltrap+0x8/frame 0xfffffe00a3443940
--- trap 0x9, rip = 0xffffffff8137fe96, rsp = 0xfffffe00a3443a10, rbp = 0xfffffe00a3443a10 ---
memcpy_erms() at memcpy_erms+0x106/frame 0xfffffe00a3443a10
abd_copy_off() at abd_copy_off+0xbd/frame 0xfffffe00a3443a80
zio_ready() at zio_ready+0x10f/frame 0xfffffe00a3443ad0
zio_execute() at zio_execute+0xac/frame 0xfffffe00a3443b20
taskqueue_run_locked() at taskqueue_run_locked+0x144/frame 0xfffffe00a3443b80
taskqueue_thread_loop() at taskqueue_thread_loop+0xb6/frame 0xfffffe00a3443bb0
fork_exit() at fork_exit+0x7e/frame 0xfffffe00a3443bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00a3443bf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
Something in ZFS, maybe...
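Why the trace points at ZFS is easier to see if you read the frames below the "--- trap" marker in order, since ddb prints the innermost frame first. The sketch below is a hypothetical helper (the names `parse_frames` and `frames_after_trap` are made up for illustration) that assumes every frame has the `func() at func+0xOFF/frame 0xADDR` shape seen in this trace; it is not a pfSense or FreeBSD tool.

```python
import re
from typing import List, Tuple

# Matches one ddb frame, e.g. "zio_ready() at zio_ready+0x10f/frame 0xfffffe00a3443ad0".
# Assumption: the function name before "()" repeats before the "+offset".
FRAME_RE = re.compile(r"(\w+)\(\) at \1\+(0x[0-9a-f]+)/frame (0x[0-9a-f]+)")


def parse_frames(trace: str) -> List[Tuple[str, str]]:
    """Return (function, offset) pairs in printed order, i.e. innermost frame first."""
    return [(m.group(1), m.group(2)) for m in FRAME_RE.finditer(trace)]


def frames_after_trap(trace: str) -> List[str]:
    """Function names from the faulting frame outward, skipping the panic/trap handlers."""
    # Frames before the first "--- trap" marker belong to the panic machinery itself.
    _, _, after = trace.partition("--- trap")
    return [name for name, _ in parse_frames(after)]


# Excerpt of the backtrace from this thread, from the trap handler onward.
TRACE = (
    "calltrap() at calltrap+0x8/frame 0xfffffe00a3443940 "
    "--- trap 0x9, rip = 0xffffffff8137fe96, rsp = 0xfffffe00a3443a10, rbp = 0xfffffe00a3443a10 --- "
    "memcpy_erms() at memcpy_erms+0x106/frame 0xfffffe00a3443a10 "
    "abd_copy_off() at abd_copy_off+0xbd/frame 0xfffffe00a3443a80 "
    "zio_ready() at zio_ready+0x10f/frame 0xfffffe00a3443ad0 "
    "zio_execute() at zio_execute+0xac/frame 0xfffffe00a3443b20 "
    "taskqueue_run_locked() at taskqueue_run_locked+0x144/frame 0xfffffe00a3443b80"
)

if __name__ == "__main__":
    # "<-" reads as "called from".
    print(" <- ".join(frames_after_trap(TRACE)))
```

Run as-is, it prints `memcpy_erms <- abd_copy_off <- zio_ready <- zio_execute <- taskqueue_run_locked`: the fault is in a memory copy inside the ZFS I/O pipeline, which matches the `zio_write_issue_5` process named in the panic message.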
Panic:
Fatal trap 9: general protection fault while in kernel mode
cpuid = 0; apic id = 00
instruction pointer = 0x20:0xffffffff8137fe96
stack pointer = 0x28:0xfffffe00a3443a10
frame pointer = 0x28:0xfffffe00a3443a10
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 34 (zio_write_issue_5)
trap number = 9
panic: general protection fault
cpuid = 0
time = 1662585067
KDB: enter: panic
Nothing obvious.
Has it always done this?

Steve
-
Thanks for checking. Yeah, it's done this every 3-5 weeks on this hardware, and it also did it on the Asus RS200-E9 server I used previously. I imported the config I created on the Asus into the newer Supermicro unit.
I have about 15 VLANs and 3 Mullvad VPN clients in HA, but as far as I'm aware the crash never happens under load. Today's crash happened while the system was fairly idle, with little traffic or inter-VLAN routing going on.
-
Both those systems were 2.5.2? Has it shown the same behaviour in either system in any other pfSense version?
-
Correct, both systems were 2.5.2 to the best of my recollection. I believe as soon as 2.5.2 was out and proven to be working with pfatt.sh, I upgraded to it, and have been on it ever since. Both systems, with the same version, have had this issue. I may have had it on 2.5.0, but I could be misremembering. That feels like so long ago.
-
Hmm, the only thing this resembles is an issue we had before 2.5.2 was released, where pfctl was bogging down and exhausting the RAM, which triggered a panic in ZFS. But to trigger that we had to deliberately use very low-memory systems, and this one has 32GB, so... that seems unlikely!
However, check the memory usage history in Status > Monitoring.
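To find the right point on the memory graph, the `time = 1662585067` field in the panic message can be read as a Unix timestamp (seconds since the epoch, UTC). The short sketch below assumes Python 3 on any workstation (nothing here runs on or queries pfSense) and converts it to a readable date; `panic_time_utc` is a hypothetical helper name. The graphs in Status > Monitoring may be shown in the firewall's local time zone, so shift the result accordingly.

```python
from datetime import datetime, timezone

# Value of the "time =" field from the panic message in this thread.
PANIC_EPOCH = 1662585067


def panic_time_utc(epoch: int) -> datetime:
    """Convert a Unix timestamp (seconds since 1970-01-01 UTC) to an aware UTC datetime."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc)


if __name__ == "__main__":
    utc = panic_time_utc(PANIC_EPOCH)
    print(utc.isoformat())               # 2022-09-07T21:11:07+00:00
    print(utc.astimezone().isoformat())  # same instant in this machine's local zone
```

If memory use climbs steadily toward that time, the pfctl/RAM-exhaustion theory gets more plausible; a flat graph up to the crash points elsewhere.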