SG-5100 Kernel Panic/Boot Loop

JoeDiffieHellman

Good day, everyone.

I have an SG-5100 I received last week. Brand spankin' new. I booted it up on the workbench and did a config restore from my SG-3100. Everything was working fine, except I didn't have it connected to the network so it could pull down the packages to reinstall them. So I factory reset it again, racked it up in the closet, plugged it back in and powered it up so I could restore the config while it was connected and in place.

When nothing happened for a while, I pulled it back down and hooked it up to the console. This is what I saw.

https://pastebin.com/FerCZ94p

[snip]
Welcome to pfSense 2.4.4-RELEASE (Patch 3)...

No core dumps found.
...ELF ldconfig path: /lib /usr/lib /usr/lib/compat /usr/local/lib /usr/local/lib/ipsec /usr/local/lib/perl5/5.26/mach/CORE
32-bit compatibility ldconfig path:
done.
External config loader 1.0 is now starting... mmcsd0s1 mmcsd0s1a mmcsd0s1b
Launching the init system...Updating CPU Microcode...
CPU: Intel(R) Atom(TM) CPU C3558 @ 2.20GHz (2200.07-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x506f1  Family=0x6  Model=0x5f  Stepping=1
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x4ff8ebbf<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,SDBG,CX16,xTPR,PDCM,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,RDRAND>
  AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
  AMD Features2=0x101<LAHF,Prefetch>
  Structured Extended Features=0x2294e283<FSGSBASE,TSCADJ,SMEP,ERMS,NFPUSG,MPX,PQE,RDSEED,SMAP,CLFLUSHOPT,PROCTRACE,SHA>
  Structured Extended Features3=0xac000400<IBPB,STIBP,ARCH_CAP,SSBD>
  XSAVE Features=0xf<XSAVEOPT,XSAVEC,XINUSE,XSAVES>
  IA32_ARCH_CAPS=0x69<RDCL_NO>
  VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID,VID,PostIntr
  TSC: P-state invariant, performance statistics
Done.
mode = 0100666, inum = 561816, fs = /
panic: ffs_valloc: dup alloc
cpuid = 3
KDB: enter: panic
[ thread pid 408 tid 100172 ]
Stopped at      kdb_enter+0x3b: movq    $0,kdb_why
db:0:kdb.enter.default> textdump set
textdump set
db:0:kdb.enter.default>  capture on
db:0:kdb.enter.default>  run lockinfo
db:1:lockinfo> show locks
No such command; use "help" to list available commands
db:1:lockinfo>  show alllocks
No such command; use "help" to list available commands
db:1:lockinfo>  show lockedvnods
Locked vnodes
vnode 0xfffff800076603b0: tag ufs, type VDIR
    usecount 1, writecount 0, refcount 4 mountedhere 0
    flags (VI_ACTIVE)
    v_object 0xfffff800076a90f0 ref 0 pages 1 cleanbuf 0 dirtybuf 1
    lock type ufs: EXCL by thread 0xfffff80007720000 (pid 408, php-cgi, tid 100172)
ino 561792, on dev ufsid/5d4890a17562fc55
vnode 0xfffff800077b5000: tag ufs, type VREG
    usecount 1, writecount 0, refcount 1
    flags (VI_ACTIVE)
    lock type ufs: EXCL by thread 0xfffff80007720000 (pid 408, php-cgi, tid 100172)
ino 561816, on dev ufsid/5d4890a17562fc55
[snip]

More in the pastebin. At the end, it reboots and does it all over again. Did I brick it? Can I save it?

kiokoman

I think you just need to run

# /sbin/fsck -y /

The filesystem is not clean and is probably corrupted.
Boot into single-user mode and repeat that command until fsck neither finds nor fixes any problems. Do not stop just because it claims to have cleaned the filesystem after fixing an issue.
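
For anyone who would rather not retype the command by hand, the repeat-until-nothing-changes idea can be sketched as a small /bin/sh loop. This is only a rough sketch, not an official pfSense tool. The function names and the FSCK_CMD and MAX_PASSES variables are made up for illustration, and it assumes fsck prints banners such as "FILE SYSTEM WAS MODIFIED" and "FILE SYSTEM MARKED CLEAN". A pass that only replays the journal and prints "MARKED CLEAN" is treated as not final, on the assumption that a full check should still follow it.

```shell
#!/bin/sh
# Sketch only: rerun fsck until a pass reports no changes.
# FSCK_CMD and MAX_PASSES are hypothetical knobs, not pfSense settings.
FSCK_CMD="${FSCK_CMD:-/sbin/fsck -y /}"
MAX_PASSES="${MAX_PASSES:-5}"

# Succeeds (returns 0) when the captured output suggests another pass is
# worthwhile: either fsck modified the file system, or it only replayed
# the journal and marked the file system clean.
pass_changed_fs() {
    case "$1" in
        *"FILE SYSTEM WAS MODIFIED"*) return 0 ;;
        *"FILE SYSTEM MARKED CLEAN"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Runs FSCK_CMD up to MAX_PASSES times and stops at the first pass that
# reports no changes. Returns 1 if every pass still reported changes.
fsck_until_clean() {
    pass=1
    while [ "$pass" -le "$MAX_PASSES" ]; do
        out=$($FSCK_CMD 2>&1)
        printf '%s\n' "$out"
        if ! pass_changed_fs "$out"; then
            echo "pass $pass: no changes reported"
            return 0
        fi
        pass=$((pass + 1))
    done
    echo "still reporting changes after $MAX_PASSES passes" >&2
    return 1
}
```

At the single-user prompt you would paste in the two functions, run fsck_until_clean, and reboot once it reports a pass with no changes. If it gives up after MAX_PASSES, that is a hint the damage may need more than fsck.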

JoeDiffieHellman

That did it all right. I saw that fsck had marked it clean in the original console output, so I assumed that couldn't be the problem.

Thanks for the help!

Snaps for kiokoman.

Enter full pathname of shell or RETURN for /bin/sh: uhub0: 8 ports with 8 removable, self powered

# /sbin/fsck -y /
** /dev/ufsid/5d4890a17562fc55

USE JOURNAL? yes

** SU+J Recovering /dev/ufsid/5d4890a17562fc55
** Reading 33554432 byte journal from inode 4.

RECOVER? yes

** Building recovery table.
** Resolving unreferenced inode list.
** Processing journal entries.

WRITE CHANGES? yes

** 39 journal records in 2560 bytes for 48.75% utilization
** Freed 0 inodes (0 dirs) 2 blocks, and 2 frags.

***** FILE SYSTEM MARKED CLEAN *****
# /sbin/fsck -y /
** /dev/ufsid/5d4890a17562fc55

USE JOURNAL? yes

** SU+J Recovering /dev/ufsid/5d4890a17562fc55
Journal timestamp does not match fs mount time
** Skipping journal, falling through to full fsck

** Last Mounted on /
** Root file system
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
UNREF FILE I=561816  OWNER=root MODE=100666
SIZE=0 MTIME=Aug 10 21:37 2019
CLEAR? yes

UNREF FILE I=561821  OWNER=root MODE=100666
SIZE=0 MTIME=Aug 10 21:27 2019
CLEAR? yes

** Phase 5 - Check Cyl groups
FREE BLK COUNT(S) WRONG IN SUPERBLK
SALVAGE? yes

SUMMARY INFORMATION BAD
SALVAGE? yes

BLK(S) MISSING IN BIT MAPS
SALVAGE? yes

21873 files, 231050 used, 1521533 free (997 frags, 190067 blocks, 0.1% fragmentation)

***** FILE SYSTEM IS CLEAN *****

***** FILE SYSTEM WAS MODIFIED *****
# /sbin/fsck -y /
** /dev/ufsid/5d4890a17562fc55

USE JOURNAL? yes

** SU+J Recovering /dev/ufsid/5d4890a17562fc55
Journal timestamp does not match fs mount time
** Skipping journal, falling through to full fsck

** Last Mounted on /
** Root file system
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
21873 files, 231050 used, 1521533 free (997 frags, 190067 blocks, 0.1% fragmentation)

***** FILE SYSTEM IS CLEAN *****
# /sbin/fsck -y /
** /dev/ufsid/5d4890a17562fc55

USE JOURNAL? yes

** SU+J Recovering /dev/ufsid/5d4890a17562fc55
Journal timestamp does not match fs mount time
** Skipping journal, falling through to full fsck

** Last Mounted on /
** Root file system
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
21873 files, 231050 used, 1521533 free (997 frags, 190067 blocks, 0.1% fragmentation)

***** FILE SYSTEM IS CLEAN *****
# /sbin/fsck -y /
** /dev/ufsid/5d4890a17562fc55

USE JOURNAL? yes

** SU+J Recovering /dev/ufsid/5d4890a17562fc55
Journal timestamp does not match fs mount time
** Skipping journal, falling through to full fsck

** Last Mounted on /
** Root file system
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
21873 files, 231050 used, 1521533 free (997 frags, 190067 blocks, 0.1% fragmentation)

***** FILE SYSTEM IS CLEAN *****
# /sbin/fsck -y /
** /dev/ufsid/5d4890a17562fc55

USE JOURNAL? yes

** SU+J Recovering /dev/ufsid/5d4890a17562fc55
Journal timestamp does not match fs mount time
** Skipping journal, falling through to full fsck

** Last Mounted on /
** Root file system
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
21873 files, 231050 used, 1521533 free (997 frags, 190067 blocks, 0.1% fragmentation)

***** FILE SYSTEM IS CLEAN *****
# reboot
Aug 11 21:47:48 init: single user shell terminated.
Waiting (max 60 seconds) for system process `vnlru' to stop... done
Waiting (max 60 seconds) for system process `bufdaemon' to stop... done
Waiting (max 60 seconds) for system process `syncer' to stop...
Syncing disks, vnodes remaining... 0 0 0 0 0 0 0 0 0 0 done
All buffers synced.
Uptime: 2m21s

bmeeks

You didn't say how you powered it down prior to re-racking it. You should never just unplug the power. Always use the shutdown command in the pfSense menu. Simply removing power is just about guaranteed to cause disk corruption.

JoeDiffieHellman

@bmeeks

I'm ashamed to say that's very likely what I did wrong. I'm usually more conscientious than that. I did a legitimate shutdown and halt before I put it back in the closet this time. How embarrassing.

bmeeks

No harm done, since you repaired the file system. These little Netgate appliances are still PCs at heart, so you have to shut them down gracefully, even though they do sort of resemble a network switch. They are actively writing to the disk, even if it is solid-state, unlike, say, a typical firmware-based switch, which just reads from a ROM of some type.