XG-1541 Weird Boot Behavior and Changing FSCK Messages



  • We're running an XG-1541, and I came into the office to find it had not recovered from a scheduled reboot. When I got in in the morning it was offline, and at the KVM it showed a blank screen: no messages, no response to input. I restarted it and noticed that the "Intel Boot Agent" screen reset numerous times before it moved on to the standard boot loader. The Intel Boot Agent message appears at the top, then the prompt "press ctrl-s to enter setup" appears below it. The setup prompt disappears, then the Intel Boot Agent message disappears, the screen goes black, and the whole sequence repeats. It did this 5 or 6 times before continuing.

    Once it got to the menu I went to the shell and ran FSCK and got a bunch of errors. Not being a BSD/*NIX admin, I am unfamiliar with FSCK output and not sure how serious they are. I rebooted into single-user mode and ran it again to repair, and it marked the drive clean. I rebooted back to normal mode and again got multiple errors. The boot behavior was the same each time, resetting at the Intel Boot Agent message multiple times. The output from FSCK varies: if I run it several times in a row I frequently get different messages. I've included a couple below:

    UNEXPECTED SOFT UPDATE INCONSISTENCY
    ** Last Mounted on /
    ** Root file system
    ** Phase 1 - Check Blocks and Sizes
    ** Phase 2 - Check Pathnames
    ** Phase 3 - Check Connectivity
    ** Phase 4 - Check Reference Counts
    UNREF FILE I=4333827 OWNER=root MODE=100666
    SIZE=0 MTIME=Sep 25 07:42 2019
    CLEAR? no

    ** Phase 5 - Check Cyl groups
    208838 files, 3777527 used, 16479844 free (462412 frags, 2002179 blocks, 2.3% fragmentation)

    And when I run it again immediately after that I get:

    SETTING DIRTY FLAG IN READ_ONLY MODE

    UNEXPECTED SOFT UPDATE INCONSISTENCY
    ** Last Mounted on /
    ** Root file system
    ** Phase 1 - Check Blocks and Sizes
    INCORRECT BLOCK COUNT I=4333845 (8 should be 0)
    CORRECT? no

    INCORRECT BLOCK COUNT I=4333846 (408 should be 0)
    CORRECT? no

    ** Phase 2 - Check Pathnames
    ** Phase 3 - Check Connectivity
    ** Phase 4 - Check Reference Counts
    UNREF FILE I=4333827 OWNER=root MODE=100666
    SIZE=0 MTIME=Sep 25 07:42 2019
    CLEAR? no

    ** Phase 5 - Check Cyl groups
    FREE BLK COUNT(S) WRONG IN SUPERBLK
    SALVAGE? no

    SUMMARY INFORMATION BAD
    SALVAGE? no

    BLK(S) MISSING IN BIT MAPS
    SALVAGE? no

    208838 files, 3777470 used, 16479792 free (462408 frags, 2002173 blocks, 2.3% fragmentation)

    The drive is the Intel 53x and Pro 2500 Series SSD that shipped with the system. It still passes SMART tests, but to me it looks like it could be failing. The boot agent might be having trouble with the drive, which would explain the temporary loop.

    I'm going to get a new drive and attempt to clone it, but before that, are there any other recommended diagnostics I should run?


  • LAYER 8

    I don't think it's anything serious, judging from the fsck log.
    When you reboot to single-user mode, run
    fsck -y /
    multiple times, even if it tells you that the file system is clean.
    Run it at least 5 or 6 times.
    This can happen after a blackout.

    But there could be a worse problem, judging from what you saw at the KVM.
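
For anyone who would rather not retype the command, the repeated check suggested above could be wrapped in a small loop. This is only a sketch: `repeat_check` is a made-up helper name, not part of fsck or pfSense, and it should only be pointed at `fsck -y /` from single-user mode, as described above.

```shell
#!/bin/sh
# Sketch only: repeat a check command a fixed number of times.
# repeat_check is a hypothetical helper name, not an fsck or pfSense tool.
repeat_check() {
    passes=$1
    shift
    i=1
    while [ "$i" -le "$passes" ]; do
        echo "=== pass $i of $passes ==="
        # Run the command given on the rest of the command line.
        "$@"
        status=$?
        if [ "$status" -ne 0 ]; then
            echo "pass $i exited with status $status"
        fi
        i=$((i + 1))
    done
}

# Intended use, from single-user mode only:
#   repeat_check 5 fsck -y /
```

Printing a header per pass makes it easier to see whether later passes still report anything after the first one has marked the file system clean.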



  • I'll try that tomorrow morning when I get in and can take it offline, Thanks.

    Is the stutter at the Intel boot agent normal for this model? I don't remember noticing it do so before.


  • Netgate Administrator

    No, that does not sound like the normal boot output.

    If running fsck multiple times from single-user mode does not correct it, I would open a ticket with us: https://go.netgate.com

    Steve



  • Yeah, no change. I booted single user and ran FSCK about 20 times, and it said clean each time. Then I rebooted and tried from the shell, only to get an SU+J error and a long list of incorrect block counts right away. I'll open a ticket, thanks everyone.

