Router aborting boot at mountroot (not related to recent install/update/reboot)



  • Short and simple:

    pfSense 2.3.4 on a Haswell platform with a single boot SSD. I woke up to find it had evidently tried to reboot and hit a problem, but I can't see how to narrow it down. All I can see is in the attached photo. The failure is consistent on every reboot.

    The initial "boot normally/single user" prompt is shown, device detection is normal, and then it suddenly drops out to the boot loader prompt. Because it's on a VGA screen without scrollback, this is all I can see. I doubt there's any other useful info in dmesg. I don't know how to figure out what's wrong, or whether it's hardware, disk I/O, a scrambled loader.conf, or a filesystem that needs fsck. (I'm not very familiar with BSD before userland kicks in.)

    My other pfSense install (working fine) shows almost identical dmesg output right up to the point where the failing machine bails into the loader:

    ada0: Previously was known as ad20
    SMP: AP CPU #3 Launched!
    SMP: AP CPU #1 Launched!
    SMP: AP CPU #2 Launched!
    Timecounter "TSC-low" frequency 1200067654 Hz quality 1000

    ----> THIS IS WHERE THE NON-BOOTING MACHINE BAILS INTO THE LOADER PROMPT.
          THE BOOT OUTPUT FROM DMESG SHOULD PRESUMABLY MATCH THE WORKING
          INSTALL, AND HAVE AS THE NEXT LINE:

    Trying to mount root from ufs:/dev/ufsid/[some disk id] [rw]…

    The loader "?" prompt shows "sensible" disks and slices. There's only one disk. The backtrace ("db> bt") shows minimal information, but it confirms that mounting the root filesystem is what fails: start_init() -> vfs_mountroot() -> panic()

    I could grab the config and reinstall, but I'd rather actually trace and fix it than just work around it, so that I have a clue if I ever see it again.

    There hasn't been a reboot or update recently, nor has anyone else touched the machine. It's had a decent uptime and has been rock solid. So something has happened in the last few hours. I'd say hardware, except it's having zero issues with hardware enumeration or disk detection right up to vfs_mountroot(), where it hits the buffers.

    How can I diagnose this from here, and what are the basic tools/tricks needed to view the log, get a more detailed debug log, and (if it needs fixing within BSD) resolve it?