SG-5100 Kernel Panic/Boot Loop



  • Good day, everyone.

    I have an SG-5100 I received last week. Brand spankin' new. I booted it up on the workbench and did a config restore from my SG-3100. Everything was working fine, except I didn't have it connected to the network so it could pull down the packages to reinstall them. So I factory reset it again, racked it up in the closet, plugged it back in and powered it up so I could restore the config while it was connected and in place.

    When nothing happened for a while, I pulled it back down and hooked it up to the console. This is what I saw.

    https://pastebin.com/FerCZ94p

    [snip]
    Welcome to pfSense 2.4.4-RELEASE (Patch 3)...
    
    No core dumps found.
    ...ELF ldconfig path: /lib /usr/lib /usr/lib/compat /usr/local/lib /usr/local/lib/ipsec /usr/local/lib/perl5/5.26/mach/CORE
    32-bit compatibility ldconfig path:
    done.
    External config loader 1.0 is now starting... mmcsd0s1 mmcsd0s1a mmcsd0s1b
    Launching the init system...Updating CPU Microcode...
    CPU: Intel(R) Atom(TM) CPU C3558 @ 2.20GHz (2200.07-MHz K8-class CPU)
      Origin="GenuineIntel"  Id=0x506f1  Family=0x6  Model=0x5f  Stepping=1
      Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
      Features2=0x4ff8ebbf<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,SDBG,CX16,xTPR,PDCM,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,RDRAND>
      AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
      AMD Features2=0x101<LAHF,Prefetch>
      Structured Extended Features=0x2294e283<FSGSBASE,TSCADJ,SMEP,ERMS,NFPUSG,MPX,PQE,RDSEED,SMAP,CLFLUSHOPT,PROCTRACE,SHA>
      Structured Extended Features3=0xac000400<IBPB,STIBP,ARCH_CAP,SSBD>
      XSAVE Features=0xf<XSAVEOPT,XSAVEC,XINUSE,XSAVES>
      IA32_ARCH_CAPS=0x69<RDCL_NO>
      VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID,VID,PostIntr
      TSC: P-state invariant, performance statistics
    Done.
    mode = 0100666, inum = 561816, fs = /
    panic: ffs_valloc: dup alloc
    cpuid = 3
    KDB: enter: panic
    [ thread pid 408 tid 100172 ]
    Stopped at      kdb_enter+0x3b: movq    $0,kdb_why
    db:0:kdb.enter.default> textdump set
    textdump set
    db:0:kdb.enter.default>  capture on
    db:0:kdb.enter.default>  run lockinfo
    db:1:lockinfo> show locks
    No such command; use "help" to list available commands
    db:1:lockinfo>  show alllocks
    No such command; use "help" to list available commands
    db:1:lockinfo>  show lockedvnods
    Locked vnodes
    vnode 0xfffff800076603b0: tag ufs, type VDIR
        usecount 1, writecount 0, refcount 4 mountedhere 0
        flags (VI_ACTIVE)
        v_object 0xfffff800076a90f0 ref 0 pages 1 cleanbuf 0 dirtybuf 1
        lock type ufs: EXCL by thread 0xfffff80007720000 (pid 408, php-cgi, tid 100172)
    ino 561792, on dev ufsid/5d4890a17562fc55
    vnode 0xfffff800077b5000: tag ufs, type VREG
        usecount 1, writecount 0, refcount 1
        flags (VI_ACTIVE)
        lock type ufs: EXCL by thread 0xfffff80007720000 (pid 408, php-cgi, tid 100172)
    ino 561816, on dev ufsid/5d4890a17562fc55
    [snip]
    

    More in the pastebin. At the end, it reboots and does it all over again. Did I brick it? Can I save it?



  • I think you just need to run

    # /sbin/fsck -y /
    

    The filesystem is not clean and is probably corrupted.
    Boot into single-user mode and repeat that command until fsck neither finds nor fixes any problems. Do not stop when it claims to have cleaned the filesystem after fixing an issue.
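
    If retyping the command gets tedious, the "repeat until nothing is found" loop could be sketched roughly like this. It is only a sketch under a few assumptions: the function names and the pass limit are made up for illustration, and it decides whether a pass was quiet purely by looking for the "FILE SYSTEM IS CLEAN" and "FILE SYSTEM WAS MODIFIED" banners that fsck prints, not by any official exit-status contract.

```shell
#!/bin/sh
# Sketch: repeat a filesystem check until one pass reports a clean
# filesystem without having changed anything. Names are hypothetical.

# pass_is_quiet OUTPUT
# Succeeds only if the pass printed "FILE SYSTEM IS CLEAN" and did not
# print "FILE SYSTEM WAS MODIFIED". A journal-only recovery prints
# "FILE SYSTEM MARKED CLEAN" instead, so it does not count as quiet.
pass_is_quiet() {
    case "$1" in
        *"FILE SYSTEM WAS MODIFIED"*) return 1 ;;
        *"FILE SYSTEM IS CLEAN"*)     return 0 ;;
        *)                            return 1 ;;
    esac
}

# fsck_until_quiet MAX_PASSES COMMAND [ARGS...]
# Runs COMMAND up to MAX_PASSES times and stops at the first quiet pass.
fsck_until_quiet() {
    max_passes=$1
    shift
    pass=1
    while [ "$pass" -le "$max_passes" ]; do
        echo "=== pass $pass ==="
        # Capture the output so it can be both shown and inspected.
        output=$("$@" 2>&1)
        printf '%s\n' "$output"
        if pass_is_quiet "$output"; then
            echo "Pass $pass found nothing to fix."
            return 0
        fi
        pass=$((pass + 1))
    done
    echo "Still finding problems after $max_passes passes." >&2
    return 1
}

# Intended use from the single-user shell (the limit of 5 is arbitrary):
#   fsck_until_quiet 5 /sbin/fsck -y /
```

    The point of treating "MARKED CLEAN" as not good enough is that a journal replay can report success while a full check would still turn up problems, so the loop keeps going until a full pass finishes without modifying anything.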



  • That did it all right. I saw that fsck had marked it clean in the original console output, so I assumed that couldn't be the problem.

    Thanks for the help!

    Snaps for kiokoman.

    Enter full pathname of shell or RETURN for /bin/sh: uhub0: 8 ports with 8 removable, self powered
    
    # /sbin/fsck -y /
    ** /dev/ufsid/5d4890a17562fc55
    
    USE JOURNAL? yes
    
    ** SU+J Recovering /dev/ufsid/5d4890a17562fc55
    ** Reading 33554432 byte journal from inode 4.
    
    RECOVER? yes
    
    ** Building recovery table.
    ** Resolving unreferenced inode list.
    ** Processing journal entries.
    
    WRITE CHANGES? yes
    
    ** 39 journal records in 2560 bytes for 48.75% utilization
    ** Freed 0 inodes (0 dirs) 2 blocks, and 2 frags.
    
    ***** FILE SYSTEM MARKED CLEAN *****
    # /sbin/fsck -y /
    ** /dev/ufsid/5d4890a17562fc55
    
    USE JOURNAL? yes
    
    ** SU+J Recovering /dev/ufsid/5d4890a17562fc55
    Journal timestamp does not match fs mount time
    ** Skipping journal, falling through to full fsck
    
    ** Last Mounted on /
    ** Root file system
    ** Phase 1 - Check Blocks and Sizes
    ** Phase 2 - Check Pathnames
    ** Phase 3 - Check Connectivity
    ** Phase 4 - Check Reference Counts
    UNREF FILE I=561816  OWNER=root MODE=100666
    SIZE=0 MTIME=Aug 10 21:37 2019
    CLEAR? yes
    
    UNREF FILE I=561821  OWNER=root MODE=100666
    SIZE=0 MTIME=Aug 10 21:27 2019
    CLEAR? yes
    
    ** Phase 5 - Check Cyl groups
    FREE BLK COUNT(S) WRONG IN SUPERBLK
    SALVAGE? yes
    
    SUMMARY INFORMATION BAD
    SALVAGE? yes
    
    BLK(S) MISSING IN BIT MAPS
    SALVAGE? yes
    
    21873 files, 231050 used, 1521533 free (997 frags, 190067 blocks, 0.1% fragmentation)
    
    ***** FILE SYSTEM IS CLEAN *****
    
    ***** FILE SYSTEM WAS MODIFIED *****
    # /sbin/fsck -y /
    ** /dev/ufsid/5d4890a17562fc55
    
    USE JOURNAL? yes
    
    ** SU+J Recovering /dev/ufsid/5d4890a17562fc55
    Journal timestamp does not match fs mount time
    ** Skipping journal, falling through to full fsck
    
    ** Last Mounted on /
    ** Root file system
    ** Phase 1 - Check Blocks and Sizes
    ** Phase 2 - Check Pathnames
    ** Phase 3 - Check Connectivity
    ** Phase 4 - Check Reference Counts
    ** Phase 5 - Check Cyl groups
    21873 files, 231050 used, 1521533 free (997 frags, 190067 blocks, 0.1% fragmentation)
    
    ***** FILE SYSTEM IS CLEAN *****
    # /sbin/fsck -y /
    ** /dev/ufsid/5d4890a17562fc55
    
    USE JOURNAL? yes
    
    ** SU+J Recovering /dev/ufsid/5d4890a17562fc55
    Journal timestamp does not match fs mount time
    ** Skipping journal, falling through to full fsck
    
    ** Last Mounted on /
    ** Root file system
    ** Phase 1 - Check Blocks and Sizes
    ** Phase 2 - Check Pathnames
    ** Phase 3 - Check Connectivity
    ** Phase 4 - Check Reference Counts
    ** Phase 5 - Check Cyl groups
    21873 files, 231050 used, 1521533 free (997 frags, 190067 blocks, 0.1% fragmentation)
    
    ***** FILE SYSTEM IS CLEAN *****
    # /sbin/fsck -y /
    ** /dev/ufsid/5d4890a17562fc55
    
    USE JOURNAL? yes
    
    ** SU+J Recovering /dev/ufsid/5d4890a17562fc55
    Journal timestamp does not match fs mount time
    ** Skipping journal, falling through to full fsck
    
    ** Last Mounted on /
    ** Root file system
    ** Phase 1 - Check Blocks and Sizes
    ** Phase 2 - Check Pathnames
    ** Phase 3 - Check Connectivity
    ** Phase 4 - Check Reference Counts
    ** Phase 5 - Check Cyl groups
    21873 files, 231050 used, 1521533 free (997 frags, 190067 blocks, 0.1% fragmentation)
    
    ***** FILE SYSTEM IS CLEAN *****
    # /sbin/fsck -y /
    ** /dev/ufsid/5d4890a17562fc55
    
    USE JOURNAL? yes
    
    ** SU+J Recovering /dev/ufsid/5d4890a17562fc55
    Journal timestamp does not match fs mount time
    ** Skipping journal, falling through to full fsck
    
    ** Last Mounted on /
    ** Root file system
    ** Phase 1 - Check Blocks and Sizes
    ** Phase 2 - Check Pathnames
    ** Phase 3 - Check Connectivity
    ** Phase 4 - Check Reference Counts
    ** Phase 5 - Check Cyl groups
    21873 files, 231050 used, 1521533 free (997 frags, 190067 blocks, 0.1% fragmentation)
    
    ***** FILE SYSTEM IS CLEAN *****
    # reboot
    Aug 11 21:47:48 init: single user shell terminated.
    Waiting (max 60 seconds) for system process `vnlru' to stop... done
    Waiting (max 60 seconds) for system process `bufdaemon' to stop... done
    Waiting (max 60 seconds) for system process `syncer' to stop...
    Syncing disks, vnodes remaining... 0 0 0 0 0 0 0 0 0 0 done
    All buffers synced.
    Uptime: 2m21s
    


  • You didn't say how you powered it down prior to re-racking it. You should never just unplug the power. Always use the shutdown command in the pfSense menu. Simply removing power is just about guaranteed to cause disk corruption.



  • @bmeeks

    I'm ashamed to say that's very likely what I did wrong. I'm usually more conscientious than that. I did a legitimate shutdown and halt before I put it back in the closet this time. How embarrassing.



  • No harm since you repaired the file system. These little Netgate appliances are still PCs at heart, so you have to shut them down gracefully even though they do sort of resemble a network switch. They are actively writing to the disk even if it is solid-state, unlike, say, a typical firmware-based switch, which just reads from a ROM of some type.

