Netgate Discussion Forum

    HA - Crash report - Need help to understand why

    General pfSense Questions
    • Puma

      Hello,

      Could you analyse the attached crash report and help us understand why the secondary (slave) pfSense crashed, and why our first pfSense had downtime and was unstable for about 30 minutes?

      To explain: we have two pfSense instances configured in HA, running version 2.1.5 (I know this is an old version; we have a project to upgrade). Last week we had a production outage and our internet lines (fiber, VPN, VDSL) went down: the first pfSense had a high load average (~13) and the secondary pfSense crashed with this crash report. We shut down the secondary and disabled the SYNC (HA - pfsync) interface to bring the first pfSense back to life.

      Currently, these pfSense instances are virtualized on Proxmox with Intel e1000 network cards (we would like to move to physical hardware with the newest version, but I have tested it and we have a problem with IPsec and FTP).

      So, can you help us? Do you need more information?

      Thanks.

      crash_pfsense2.txt

      • jimp Rebel Alliance Developer Netgate

        Your disk and/or disk controller is shot.

        A wipe and reload might help but it looks more like hardware to me because of the NMI trap there – that signal can only be generated from hardware.

        If it was just a corrupted filesystem it would only have crashed in filesystem functions and it wouldn't have the NMI bits in the trace.

        db:0:kdb.enter.default>  bt
        Tracing pid 24734 tid 100230 td 0xc891e5c0
        bcopy(2,eeb32924,c0e8f7ba,c62ee600,0,...) at bcopy+0x1a
        ipi_nmi_handler(c62ee600,0,c0f92f98,eeb32a40,c891a000,...) at ipi_nmi_handler+0x2c
        trap(eeb32930) at trap+0x26a
        calltrap() at calltrap+0x6
        --- trap 0x13, eip = 0xc0eaded0, esp = 0xeeb32970, ebp = 0xeeb32970 ---
        VOP_ISLOCKED_APV(c1502c60,eeb329e0,c0fa12dd,1f8,eeb329c0,...) at VOP_ISLOCKED_APV+0x20
        lookup(eeb32b8c,c62d1000,400,eeb32bac,c0d48dd6,...) at lookup+0x3fa
        namei(eeb32b8c,c14eca80,eeb32af8,0,eeb32ac4,...) at namei+0x5b8
        vn_open_cred(eeb32b8c,eeb32c40,1a4,0,c5d8f700,...) at vn_open_cred+0xc0
        vn_open(eeb32b8c,eeb32c40,1a4,c8935620,c1d8aaf8,...) at vn_open+0x3b
        kern_openat(c891e5c0,ffffff9c,2ccc05ec,0,602,...) at kern_openat+0x11e
        kern_open(c891e5c0,2ccc05ec,0,601,1b6,...) at kern_open+0x35
        open(c891e5c0,eeb32cec,eeb32cc0,c0ac9a76,c155c734,...) at open+0x30
        syscall(eeb32d28) at syscall+0x1fb
        Xint0x80_syscall() at Xint0x80_syscall+0x21
        
        
        ata1: WARNING - READ_TOC read data overrun 18>12
        
        Fatal trap 12: page fault while in kernel mode
        cpuid = 0; apic id = 00
        fault virtual address   = 0x1f4
        fault code              = supervisor read, page not present
        instruction pointer     = 0x20:0xc0a93746
        stack pointer           = 0x28:0xc5a2abbc
        frame pointer           = 0x28:0xc5a2abd4
        code segment            = base 0x0, limit 0xfffff, type 0x1b
                                = DPL 0, pres 1, def32 1, gran 1
        processor eflags        = interrupt enabled, resume, IOPL = 0
        current process         = 12 (swi6: task queue)
        
        0xc680a860: tag ufs, type VDIR
            usecount 1, writecount 0, refcount 4 mountedhere 0
            flags ()
            v_object 0xc6752770 ref 0 pages 1
            lock type ufs: EXCL by thread 0xc85322e0 (pid 53831)
                        ino 3933184, on dev ad0s1a
        
        0xc8676000: tag ufs, type VREG
            usecount 1, writecount 0, refcount 1 mountedhere 0
            flags ()
            lock type ufs: EXCL by thread 0xc85322e0 (pid 53831)
                        ino 3933374, on dev ad0s1a
        version.txt: FreeBSD 8.3-RELEASE-p16 #0: Mon Aug 25 08:25:41 EDT 2014
            root@pf2_1_1_i386.pfsense.org:/usr/obj.i386/usr/pfSensesrc/src/sys/pfSense_SMP.8
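
As a rough illustration of the reasoning above, the following Python sketch pulls the function names out of a DDB-style `bt` listing and reports which frames look NMI-related and which look like filesystem lookup code. The helper names (`parse_frames`, `summarize`) and the `FS_HINTS` prefix list are assumptions made for this example, not part of pfSense or FreeBSD. An NMI frame in a trace is only a hint worth following up, not proof of failing hardware.

```python
import re

# Matches DDB backtrace lines such as:
#   namei(eeb32b8c,c14eca80,...) at namei+0x5b8
FRAME_RE = re.compile(r"^\s*([A-Za-z_]\w*)\(.*\)\s+at\s+\S+?\+0x[0-9a-fA-F]+\s*$")

# Assumption: an illustrative, incomplete list of name prefixes that
# suggest VFS/filesystem code paths. Adjust for your own traces.
FS_HINTS = ("VOP_", "vn_", "namei", "lookup", "ufs_", "ffs_")


def parse_frames(trace):
    """Return the function name of every backtrace frame, innermost first."""
    frames = []
    for line in trace.splitlines():
        match = FRAME_RE.match(line)
        if match:
            frames.append(match.group(1))
    return frames


def summarize(trace):
    """Group frames into NMI-related and filesystem-related names."""
    frames = parse_frames(trace)
    return {
        "frames": frames,
        "nmi_frames": [f for f in frames if "nmi" in f.lower()],
        "fs_frames": [f for f in frames if f.startswith(FS_HINTS)],
    }


# A few lines copied from the trace in this thread.
SAMPLE = """\
bcopy(2,eeb32924,c0e8f7ba,c62ee600,0,...) at bcopy+0x1a
ipi_nmi_handler(c62ee600,0,c0f92f98,eeb32a40,c891a000,...) at ipi_nmi_handler+0x2c
trap(eeb32930) at trap+0x26a
calltrap() at calltrap+0x6
--- trap 0x13, eip = 0xc0eaded0, esp = 0xeeb32970, ebp = 0xeeb32970 ---
VOP_ISLOCKED_APV(c1502c60,eeb329e0,c0fa12dd,1f8,eeb329c0,...) at VOP_ISLOCKED_APV+0x20
lookup(eeb32b8c,c62d1000,400,eeb32bac,c0d48dd6,...) at lookup+0x3fa
namei(eeb32b8c,c14eca80,eeb32af8,0,eeb32ac4,...) at namei+0x5b8
vn_open_cred(eeb32b8c,eeb32c40,1a4,0,c5d8f700,...) at vn_open_cred+0xc0
"""

if __name__ == "__main__":
    result = summarize(SAMPLE)
    print("NMI-related frames:", result["nmi_frames"])
    print("Filesystem-related frames:", result["fs_frames"])
```

A trace that shows only filesystem frames would point more toward a corrupted filesystem; one that also contains an NMI handler frame, as here, is a reason to look at the disk, the controller, or the hypervisor host as well.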
        

        Remember: Upvote with the 👍 button for any user/post you find to be helpful, informative, or deserving of recognition!

        Need help fast? Netgate Global Support!

        Do not Chat/PM for help!

        • Puma

          Sorry, I don't really understand your answer (English isn't my native language). Is there a problem with the hard drive? Should I check it?

            • jimp Rebel Alliance Developer Netgate

            A problem with the hard drive, or possibly with the disk controller itself on the motherboard (where the drive is plugged in).

            I'm not sure if Proxmox is smart enough to generate an NMI on its own for things like that, so it may be passed through from the actual hardware.

            There is a chance it's something in Proxmox or the host itself, but someone more familiar with Proxmox would have to chime in and answer that part.
