Recover after a power failure



  • Hi,

    I've found a problem that occurs after an unclean shutdown of the machine (power loss, system failure, …).

    On the next power-up, pfSense mounts the root partition read-only because the partition is marked unclean.

    Then, because root is read-only, the system freezes at random (perhaps because it keeps in RAM the changes it cannot write to disk).

    To fix this I have to run fsck on the partition and reboot.

    My questions are:
    1 - Why is fsck not run automatically when the root partition is unclean?
    2 - How can I make fsck run automatically when the root partition is unclean?
    3 - Is there a place in the web interface that can help me detect this kind of situation?

    Best regards.



  • What version of pfSense are you running? Have you considered purchasing a small, supported UPS to allow a clean shutdown on power failure?



  • It should ask you to start a fsck or do it automatically. If it doesn't complete this, then there might be something wrong with the hardware. If you can, save your config and reload pfSense, then restore your config.



  • Mine does this quite often (test box). I run 2.0 RC3 and I have a gmirror of two 36 GB WD 10k rpm HDDs. The rebuild process is pretty fast.

    I would suggest that you use a smaller HDD and also a supported UPS.

    Are you worried about resume time or just that you have to rebuild the drive? I am unclear about the issue.

    Depending on where you are located: if you are in NYC, I have piles of 36 GB WD 10k rpm HDDs and you are welcome to try a mirror setup to see if this changes things for you.



  • I already use a UPS.

    However, not all unclean shutdowns are due to power failure. They can also be caused by a system crash.

    I am running pfSense 2.0 release, and it does not run fsck automatically at startup when the partition is marked unclean.

    When I run fsck manually there is nothing to correct; it just removes the unclean flag, and then I can reboot the system to get pfSense stable again :(

    Let me summarize the problem so it can be reproduced:
    1 - Unclean shutdown
    2 - pfSense powers up
    3 - The root partition is mounted read-only because the partition is unclean (this can be seen with a simple dmesg)
    4 - pfSense freezes because it tries to write to a volume that is read-only (after roughly 15 minutes of running)

    To correct this I must run fsck after step 3 and then reboot; the pfSense init does not run that fsck. And I must do it quickly, because the system hangs after a short time.

    Has no one else had this problem?
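
Step 3 above can also be checked from the output of `mount` instead of dmesg. What follows is only a minimal sketch in Python, assuming the FreeBSD-style line format `/dev/ad0s1a on / (ufs, local, read-only)`. The function name `root_is_read_only` and the sample text are hypothetical, and Python may not be available on the firewall itself, so the idea is to paste the `mount` output into the script on another machine.

```python
import re

# Matches lines such as: /dev/ad0s1a on / (ufs, local, read-only)
# (assumed format; adjust the pattern if your mount output differs)
MOUNT_LINE = re.compile(r"^(\S+) on (\S+) \((.*)\)$")


def root_is_read_only(mount_output: str) -> bool:
    """Return True if the entry mounted on "/" carries the read-only option."""
    for line in mount_output.splitlines():
        match = MOUNT_LINE.match(line.strip())
        if match and match.group(2) == "/":
            options = [opt.strip() for opt in match.group(3).split(",")]
            return "read-only" in options
    # No line for "/" was found, so nothing can be said; report False.
    return False


# Illustrative sample in the assumed format, not captured from a real system.
SAMPLE = """/dev/ad0s1a on / (ufs, local, read-only)
devfs on /dev (devfs, local)"""

if __name__ == "__main__":
    print(root_is_read_only(SAMPLE))
```

If this reports a read-only root, that matches the situation described in step 3; it says nothing about why fsck did not run.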



  • @Cry:

    What version of pfSense are you running?

    I can't see this issue happening on full embedded.  I just pulled the plug on my system and it booted fine (with the exception of HAVP starting due to the permissions problem that is known about).  It checked the disk and started fine.



  • I have the full version 2.0 release installed and can reproduce this problem on a VirtualBox installation.
    If I power off the virtual machine with the power button and start it again, it mounts the disk read-only.

    These are the last lines from dmesg:
    Trying to mount root from ufs:/dev/ad0s1a
    WARNING: / was not properly dismounted
    WARNING: R/W mount of / denied.  Filesystem is not clean - run fsck

    Edit:
    Spoke too soon.
    I did some more tests and the filesystem is not in read-only mode.
    I changed a setting and rebooted, and the setting was still there.

    Also found the command dmesg -a
    There I can see that the fsck is run at startup.
    From dmesg:

    Trying to mount root from ufs:/dev/ad0s1a
    WARNING: / was not properly dismounted
    Configuring crash dumps…
    Using /dev/ad0s1b for dump device.
    Mounting filesystems...
    WARNING: R/W mount of / denied.  Filesystem is not clean - run fsck
    mount: /dev/ad0s1a : Operation not permitted
    ** /dev/ad0s1a
    ** Last Mounted on /
    ** Root file system
    ** Phase 1 - Check Blocks and Sizes
    INCORRECT BLOCK COUNT I=79 (4 should be 0)
    CORRECT? yes

    ** Phase 2 - Check Pathnames
    ** Phase 3 - Check Connectivity
    ** Phase 4 - Check Reference Counts
    UNREF FILE I=68  OWNER=root MODE=100644
    SIZE=0 MTIME=Sep 21 20:29 2011
    CLEAR? yes

    UNREF FILE I=69  OWNER=root MODE=100644
    SIZE=0 MTIME=Sep 21 20:29 2011
    CLEAR? yes

    UNREF FILE I=74  OWNER=root MODE=100644
    SIZE=0 MTIME=Sep 21 20:29 2011
    CLEAR? yes

    UNREF FILE I=75  OWNER=root MODE=100644
    SIZE=0 MTIME=Sep 21 20:29 2011
    CLEAR? yes

    UNREF FILE I=76  OWNER=root MODE=100644
    SIZE=0 MTIME=Sep 21 20:29 2011
    CLEAR? yes

    UNREF FILE I=97  OWNER=root MODE=100644
    SIZE=0 MTIME=Sep 21 20:29 2011
    CLEAR? yes

    UNREF FILE I=98  OWNER=root MODE=100644
    SIZE=0 MTIME=Sep 21 20:29 2011
    CLEAR? yes

    UNREF FILE I=99  OWNER=root MODE=100644
    SIZE=0 MTIME=Sep 21 20:29 2011
    CLEAR? yes

    UNREF FILE I=101  OWNER=root MODE=100644
    SIZE=0 MTIME=Sep 21 20:29 2011
    CLEAR? yes

    UNREF FILE I=102  OWNER=root MODE=100644
    SIZE=0 MTIME=Sep 21 20:29 2011
    CLEAR? yes

    UNREF FILE I=106  OWNER=root MODE=100644
    SIZE=0 MTIME=Sep 21 20:29 2011
    CLEAR? yes

    UNREF FILE I=107  OWNER=root MODE=100644
    SIZE=0 MTIME=Sep 21 20:29 2011
    CLEAR? yes

    UNREF FILE I=108  OWNER=root MODE=100644
    SIZE=0 MTIME=Sep 21 20:29 2011
    CLEAR? yes

    UNREF FILE I=109  OWNER=root MODE=100644
    SIZE=0 MTIME=Sep 21 20:29 2011
    CLEAR? yes

    UNREF FILE I=110  OWNER=root MODE=100644
    SIZE=0 MTIME=Sep 21 20:29 2011
    CLEAR? yes

    UNREF FILE  I=128  OWNER=root MODE=100644
    SIZE=0 MTIME=Sep 21 21:00 2011
    RECONNECT? yes

    ** Phase 5 - Check Cyl groups
    FREE BLK COUNT(S) WRONG IN SUPERBLK
    SALVAGE? yes

    SUMMARY INFORMATION BAD
    SALVAGE? yes

    BLK(S) MISSING IN BIT MAPS
    SALVAGE? yes

    5957 files, 93674 used, 412789 free (517 frags, 51534 blocks, 0.1% fragmentation)

    ***** FILE SYSTEM MARKED CLEAN *****

    ***** FILE SYSTEM WAS MODIFIED *****
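
To tell the two cases in this thread apart (root denied and left alone, versus root denied and then repaired by fsck), the messages quoted above can be searched for in saved `dmesg -a` output. This is a rough sketch in Python, not a pfSense feature: the function names `summarize_boot_log` and `needs_attention` are hypothetical, and it assumes the text passed in covers a single boot.

```python
def summarize_boot_log(log_text: str) -> dict:
    """Report which of the messages quoted in the log above are present."""
    return {
        "unclean_root": "was not properly dismounted" in log_text,
        "rw_mount_denied": "R/W mount of / denied" in log_text,
        "fsck_started": "** Phase 1 - Check Blocks and Sizes" in log_text,
        "marked_clean": "FILE SYSTEM MARKED CLEAN" in log_text,
    }


def needs_attention(summary: dict) -> bool:
    """True when root was flagged unclean but fsck never marked it clean."""
    flagged = summary["unclean_root"] or summary["rw_mount_denied"]
    return flagged and not summary["marked_clean"]


# Excerpt assembled from lines quoted in the post above.
REPAIRED_BOOT = """Trying to mount root from ufs:/dev/ad0s1a
WARNING: / was not properly dismounted
WARNING: R/W mount of / denied.  Filesystem is not clean - run fsck
** Phase 1 - Check Blocks and Sizes
***** FILE SYSTEM MARKED CLEAN *****
***** FILE SYSTEM WAS MODIFIED *****"""

if __name__ == "__main__":
    summary = summarize_boot_log(REPAIRED_BOOT)
    print(summary)
    print("needs attention:", needs_attention(summary))
```

Plain `dmesg` shows only the kernel warnings, so the check is only meaningful on `dmesg -a` output, which also includes the fsck run.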


  • It always runs fsck automatically after an unclean shutdown. It says r/w mount denied, so then it runs fsck to fix the problem with the filesystem, and once it's cleaned up then it mounts the drive normally.

    If that automated process doesn't work, then it's possible the drive was corrupted more than usual by the power loss. If it's that bad, then it could possibly take a few fsck runs (boot to single user mode) to fix it up all the way.

