pfSense v.2.6 crashes and reboot

mauro.tridici

Dear pfSense experts/developers,

Over the last month, pfSense has suddenly crashed and rebooted twice.
Nothing changed during this period.

I read that, in this case, I should share the crash report here.
Could you please help me to understand the cause of this issue?
The dump files are attached.

Many thanks in advance,
Mauro

info.0 textdump.tar.0

stephenw10

Backtrace:

db:0:kdb.enter.default>  bt
Tracing pid 84479 tid 100662 td 0xfffff80309ee7000
kdb_enter() at kdb_enter+0x37/frame 0xfffffe0094f3b860
vpanic() at vpanic+0x197/frame 0xfffffe0094f3b8b0
panic() at panic+0x43/frame 0xfffffe0094f3b910
pmap_remove_pages() at pmap_remove_pages+0xa1d/frame 0xfffffe0094f3ba10
vmspace_exit() at vmspace_exit+0x9e/frame 0xfffffe0094f3ba50
exit1() at exit1+0x55b/frame 0xfffffe0094f3bab0
sys_sys_exit() at sys_sys_exit+0xd/frame 0xfffffe0094f3bac0
amd64_syscall() at amd64_syscall+0x387/frame 0xfffffe0094f3bbf0
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe0094f3bbf0
--- syscall (1, FreeBSD ELF64, sys_sys_exit), rip = 0x8004095fa, rsp = 0x7fffffffebe8, rbp = 0x7fffffffec00 ---

Panics:

panic: bad pte va 80c200000 pte 80000002be20ac28
cpuid = 2
time = 1690324290
KDB: enter: panic

panic: bad pte va 800f5c000 pte 0
cpuid = 4
time = 1692058926
KDB: enter: panic

Hmm, I would say that's likely a hardware error... except it's in VMware. Has the hypervisor been updated in that time? What version of ESXi is it?

The only error shown other than the panic is this:

(da0:mpt0:0:0:0): UNMAP failed, disabling BIO_DELETE
(da0:mpt0:0:0:0): UNMAP. CDB: 42 00 00 00 00 00 00 00 08 00 
(da0:mpt0:0:0:0): CAM status: SCSI Status Error
(da0:mpt0:0:0:0): SCSI status: Check Condition
(da0:mpt0:0:0:0): SCSI sense: ILLEGAL REQUEST asc:24,0 (Invalid field in CDB)
(da0:mpt0:0:0:0): Command byte 7 is invalid
(da0:mpt0:0:0:0): Error 22, Unretryable error

That looks like a drive error, except that, again, it's in VMware...

Steve
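For anyone lining these panics up against hypervisor or storage logs, the `time =` fields can be converted to readable dates. This is only a small sketch: it assumes the values are Unix epoch seconds, and `to_utc` is a helper name made up for the example.

```python
from datetime import datetime, timezone

# "time =" values copied from the two panic messages quoted above.
PANIC_TIMES = [1690324290, 1692058926]


def to_utc(epoch_seconds: int) -> str:
    """Format a Unix timestamp (seconds) as an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).isoformat()


for t in PANIC_TIMES:
    print(t, "->", to_utc(t))
```

Under that assumption the two panics fall on 2023-07-25 and 2023-08-15 (UTC), roughly three weeks apart.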

Stewart

@stephenw10 A bad bit in hardware, if it is in the right place, could also affect the vmdk file. I would suspect that bit would be unreadable in the vmfs and get passed on. Could possibly still be a drive or controller error just getting passed up the stack.
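To make the "bad bit" idea a little more concrete, the two `pte` values from the panics can be split into their flag bits. The sketch below uses the architectural x86-64 page table entry layout (flag bits 0 to 8, frame address in bits 12 to 51, no-execute in bit 63). It assumes, without confirmation from this thread, that the "bad pte" panic fires when the kernel finds an entry not marked present; `decode_pte` and the flag labels are names chosen for the example, not FreeBSD identifiers.

```python
# Architectural flag bits of an x86-64 page table entry.
PTE_FLAGS = {
    0: "present",
    1: "writable",
    2: "user",
    3: "write-through",
    4: "cache-disable",
    5: "accessed",
    6: "dirty",
    7: "PAT/PS",
    8: "global",
    63: "no-execute",
}
FRAME_MASK = 0x000F_FFFF_FFFF_F000  # bits 12..51: physical frame address


def decode_pte(pte: int) -> dict:
    """Split a raw 64-bit entry into its set flags and frame address."""
    flags = [name for bit, name in PTE_FLAGS.items() if (pte >> bit) & 1]
    return {
        "present": bool(pte & 1),
        "flags": flags,
        "frame": hex(pte & FRAME_MASK),
    }


# Values copied from the two "bad pte" panic lines.
for raw in (0x80000002BE20AC28, 0x0):
    print(hex(raw), decode_pte(raw))
```

Both values have the present bit clear. The first still carries a frame address and other flags, while the second is entirely zero. That alone does not say whether the damage came from RAM, storage, or software.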

mauro.tridici

Hi Stephen,

thank you for your support.

@stephenw10 said in pfSense v.2.6 crashes and reboot:

Hmm, I would say that's likely a hardware error... except it's in VMware. Has the hypervisor been updated in that time? What version of ESXi is it?

No, the hypervisor hasn't been updated during that period.
The version of ESXi is 6.7 u3

I'll check the status of drives and controller and I will let you know.
Thanks,
Mauro

mauro.tridici

@Stewart thank you for the additional info.

I just checked the status of drives and controller from the server management GUI, but it seems everything is ok.
No lines have been added recently to the server's logs page.

It is very strange, I don't know how to manage it.

Mauro

stephenw10

Unfortunately none of that crash data is very revealing. Are those the only crashes it's seen?

mauro.tridici

@stephenw10 it happened again some minutes ago.

No CPU overload, no hard issues on controller and drives...
I still can't work out what the cause is...

info.0 textdump.tar.0

stephenw10

Hmm, different crash but still nothing specific.

Fatal trap 9: general protection fault while in kernel mode
cpuid = 6; apic id = 0c
instruction pointer	= 0x20:0xffffffff80d6f3f7
stack pointer	        = 0x28:0xfffffe000455f680
frame pointer	        = 0x28:0xfffffe000455f700
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 28 (dom0)
trap number		= 9
panic: general protection fault
cpuid = 6
time = 1692624634
KDB: enter: panic

db:0:kdb.enter.default>  bt
Tracing pid 28 tid 100208 td 0xfffff800090ed000
kdb_enter() at kdb_enter+0x37/frame 0xfffffe000455f390
vpanic() at vpanic+0x197/frame 0xfffffe000455f3e0
panic() at panic+0x43/frame 0xfffffe000455f440
trap_fatal() at trap_fatal+0x391/frame 0xfffffe000455f4a0
trap() at trap+0x67/frame 0xfffffe000455f5b0
calltrap() at calltrap+0x8/frame 0xfffffe000455f5b0
--- trap 0x9, rip = 0xffffffff80d6f3f7, rsp = 0xfffffe000455f680, rbp = 0xfffffe000455f700 ---
__mtx_lock_sleep() at __mtx_lock_sleep+0xd7/frame 0xfffffe000455f700
pmap_ts_referenced() at pmap_ts_referenced+0xc63/frame 0xfffffe000455f7b0
vm_pageout_worker() at vm_pageout_worker+0xf88/frame 0xfffffe000455fb70
vm_pageout() at vm_pageout+0x193/frame 0xfffffe000455fbb0
fork_exit() at fork_exit+0x7e/frame 0xfffffe000455fbf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe000455fbf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---

Is there some reason you're not on 2.7?

You should probably stop logging those ARP movements if those MACs are known.
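When comparing several textdumps, it can help to pull out just the function that was running when the trap fired, so different crashes can be compared quickly. The following is a rough sketch that assumes the `bt` output keeps the `func() at func+0xNN/frame 0x...` shape shown above; `parse_backtrace` and `first_frame_after_trap` are hypothetical helper names, not part of any pfSense or FreeBSD tool.

```python
import re

# One ddb backtrace frame, e.g. "trap() at trap+0x67/frame 0xfffffe000455f5b0".
FRAME_RE = re.compile(
    r"^(?P<func>\w+)\(\) at (?P=func)\+(?P<offset>0x[0-9a-f]+)"
    r"/frame (?P<frame>0x[0-9a-f]+)$"
)


def parse_backtrace(text):
    """Return (function, offset) pairs for every frame line in the text."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            frames.append((m.group("func"), int(m.group("offset"), 16)))
    return frames


def first_frame_after_trap(text):
    """Return the function executing when the trap fired, or None."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("--- trap 0x"):
            rest = parse_backtrace("\n".join(lines[i + 1:]))
            return rest[0][0] if rest else None
    return None


# A few lines copied from the backtrace above.
SAMPLE = """\
trap_fatal() at trap_fatal+0x391/frame 0xfffffe000455f4a0
trap() at trap+0x67/frame 0xfffffe000455f5b0
calltrap() at calltrap+0x8/frame 0xfffffe000455f5b0
--- trap 0x9, rip = 0xffffffff80d6f3f7, rsp = 0xfffffe000455f680, rbp = 0xfffffe000455f700 ---
__mtx_lock_sleep() at __mtx_lock_sleep+0xd7/frame 0xfffffe000455f700
pmap_ts_referenced() at pmap_ts_referenced+0xc63/frame 0xfffffe000455f7b0
"""

print(first_frame_after_trap(SAMPLE))
```

The first backtrace in this thread has no trap marker because it is a direct `panic()` call, so `first_frame_after_trap` would return `None` for it. There the frame just below `panic()` is the one to look at.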

mauro.tridici

@stephenw10 thank you for the analysis.

I'm still on 2.6 because pfSense is in production and we need to be sure that the update will not cause any issues...
Do you think I can update to 2.7 without impacting the existing services (syslog-ng, snort, pfblocker-ng, iperf, and so on)?

In addition, I noticed that some installed package names are in yellow.

Screenshot 2023-08-21 at 17.51.00.png

Sorry, but I didn't understand your last sentence:
"You should probably stop logging those ARP movements if those MACs are known."

What do I need to do in this case?

Thank you in advance,
Mauro

stephenw10

https://docs.netgate.com/pfsense/en/latest/troubleshooting/logs-arp-moved.html