HA crash report: need help understanding why

Puma

Hello,

I would like to know if you can analyse the crash report and help us understand why the slave pfSense crashed, and why we had downtime on our primary pfSense and instability for a 30-minute period.

To explain: we have two pfSense boxes configured in HA, running version 2.1.5 (I know this is an old version; we have a project to upgrade). Last week we had a production outage, and our internet lines were down (fiber, VPN, VDSL): the primary pfSense had a high load average (~13), and the secondary pfSense crashed with the attached crash report. We shut down the secondary and disabled the SYNC (HA/pfsync) interface to bring the primary pfSense back to life.

Currently, these pfSense instances are virtualized on Proxmox with Intel e1000 network cards (we would like to move to physical hardware with the newest version, but when I tested it we had problems with IPsec and FTP).

So, can you help us? Do you need more information?

Thanks.

crash_pfsense2.txt

jimp

Your disk and/or disk controller is shot.

A wipe and reload might help, but it looks more like hardware to me because of the NMI (non-maskable interrupt) trap there; that signal can only be generated by hardware.

If it were just a corrupted filesystem, it would only have crashed in filesystem functions, and it wouldn't have the NMI bits in the trace.

db:0:kdb.enter.default>  bt
Tracing pid 24734 tid 100230 td 0xc891e5c0
bcopy(2,eeb32924,c0e8f7ba,c62ee600,0,...) at bcopy+0x1a
ipi_nmi_handler(c62ee600,0,c0f92f98,eeb32a40,c891a000,...) at ipi_nmi_handler+0x2c
trap(eeb32930) at trap+0x26a
calltrap() at calltrap+0x6
--- trap 0x13, eip = 0xc0eaded0, esp = 0xeeb32970, ebp = 0xeeb32970 ---
VOP_ISLOCKED_APV(c1502c60,eeb329e0,c0fa12dd,1f8,eeb329c0,...) at VOP_ISLOCKED_APV+0x20
lookup(eeb32b8c,c62d1000,400,eeb32bac,c0d48dd6,...) at lookup+0x3fa
namei(eeb32b8c,c14eca80,eeb32af8,0,eeb32ac4,...) at namei+0x5b8
vn_open_cred(eeb32b8c,eeb32c40,1a4,0,c5d8f700,...) at vn_open_cred+0xc0
vn_open(eeb32b8c,eeb32c40,1a4,c8935620,c1d8aaf8,...) at vn_open+0x3b
kern_openat(c891e5c0,ffffff9c,2ccc05ec,0,602,...) at kern_openat+0x11e
kern_open(c891e5c0,2ccc05ec,0,601,1b6,...) at kern_open+0x35
open(c891e5c0,eeb32cec,eeb32cc0,c0ac9a76,c155c734,...) at open+0x30
syscall(eeb32d28) at syscall+0x1fb
Xint0x80_syscall() at Xint0x80_syscall+0x21

ata1: WARNING - READ_TOC read data overrun 18>12

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x1f4
fault code              = supervisor read, page not present
instruction pointer     = 0x20:0xc0a93746
stack pointer           = 0x28:0xc5a2abbc
frame pointer           = 0x28:0xc5a2abd4
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 12 (swi6: task queue)

0xc680a860: tag ufs, type VDIR
    usecount 1, writecount 0, refcount 4 mountedhere 0
    flags ()
    v_object 0xc6752770 ref 0 pages 1
    lock type ufs: EXCL by thread 0xc85322e0 (pid 53831)
                ino 3933184, on dev ad0s1a

0xc8676000: tag ufs, type VREG
    usecount 1, writecount 0, refcount 1 mountedhere 0
    flags ()
    lock type ufs: EXCL by thread 0xc85322e0 (pid 53831)
                ino 3933374, on dev ad0s1a
FreeBSD 8.3-RELEASE-p16 #0: Mon Aug 25 08:25:41 EDT 2014
    root@pf2_1_1_i386.pfsense.org:/usr/obj.i386/usr/pfSensesrc/src/sys/pfSense_SMP.8
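The reasoning above (an NMI in the trace points at hardware, while a plain filesystem-corruption crash would show only filesystem functions and no NMI) can be sketched as a rough Python check over a saved backtrace such as crash_pfsense2.txt. This is a hypothetical helper, not a pfSense or FreeBSD tool. It assumes the trace uses the ddb "func(args) at func+0xNN" frame format shown above, relies on FreeBSD/i386 numbering trap type 0x13 (19) as T_NMI, and treats an assumed, non-exhaustive list of VFS/UFS name prefixes as "filesystem" code.

```python
import re
import sys

# Rough heuristic: NMI frames suggest hardware (or the hypervisor), while a
# filesystem-corruption crash would show filesystem frames and no NMI.

# Matches one ddb backtrace frame such as "lookup(eeb32b8c,...) at lookup+0x3fa"
# and captures the function name.
FRAME_RE = re.compile(r"^\s*(\w+)\(.*\)\s+at\s+\w+\+0x[0-9a-f]+", re.MULTILINE)

# On FreeBSD/i386, trap type 0x13 (19) is T_NMI, as in "--- trap 0x13, ...".
NMI_TRAP_RE = re.compile(r"---\s*trap\s+0x13\b")

# Assumption: name prefixes treated as VFS/UFS code (not an exhaustive list).
FS_PREFIXES = ("VOP_", "ufs_", "ffs_", "vfs_", "vn_", "namei", "lookup")


def classify_backtrace(text: str) -> dict:
    """Return the frames found, whether an NMI appears, and a rough verdict."""
    frames = FRAME_RE.findall(text)
    has_nmi = bool(NMI_TRAP_RE.search(text)) or any(
        "nmi" in name.lower() for name in frames
    )
    fs_frames = [name for name in frames if name.startswith(FS_PREFIXES)]

    if has_nmi:
        verdict = "NMI in trace: suspect hardware (disk, controller, or host)"
    elif fs_frames:
        verdict = "No NMI, filesystem frames present: filesystem corruption possible"
    else:
        verdict = "Inconclusive"
    return {"frames": frames, "has_nmi": has_nmi,
            "fs_frames": fs_frames, "verdict": verdict}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python classify_trace.py crash_pfsense2.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        print(classify_backtrace(fh.read())["verdict"])
```

On the trace above, the ipi_nmi_handler frame and the "trap 0x13" line both flag an NMI, even though the interrupted code (VOP_ISLOCKED_APV, lookup, namei) is filesystem code. That is the distinction being made: the filesystem functions were simply what was running when the NMI arrived. The check is only a first-pass filter; a virtualized guest may also see NMIs injected by the hypervisor.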

Puma

Sorry, I don't really understand your answer (English isn't my native language). Is there a problem with the hard drive? Should I check it?

jimp

A problem with the hard drive, or possibly with the disk controller itself on the motherboard (where the drive is plugged in).

I'm not sure if Proxmox is smart enough to generate an NMI on its own for things like that, so it may be passed through from the actual hardware.

There is a chance it's something in Proxmox or the host itself, but someone more familiar with Proxmox would have to chime in and answer that part.