pfSense stuck during boot on "Starting DNS Resolver" after power loss.
Hello, I hope somebody can help me. This is my first time posting on the Netgate forum.
I have a Netgate SG-3100, which worked perfectly until it had a sudden power loss. Now it gets stuck while booting and never fully comes up. I connected to its console over USB, and I can watch the boot process.
It starts all the processes before the "Starting DNS Resolver" line, and there it gets stuck. I can press Ctrl+C, then Enter, and get to the shell. I can see the files, run some commands, and edit files with vi. I could probably disable the DNS resolver in some file to temporarily skip it during boot, get to the GUI, and at least back up the settings. Or I could somehow back them up directly from the files; I just don't know which ones exactly.
Support suggested (no subscription, so limited help) that I try the fsck command to repair the filesystem. I have tried. It looks like it fixes some, but not all, of the errors. The list of errors is longer at first; after a few repeats of /sbin/fsck -y / it gets shorter, but only down to a certain length. After a reboot it is longer again. Here is how it looks at its shortest:
** /dev/diskid/DISK-01FDFC17s2a (NO WRITE)
USE JOURNAL? no
** Skipping journal, falling through to full fsck
SETTING DIRTY FLAG IN READ_ONLY MODE
UNEXPECTED SOFT UPDATE INCONSISTENCY
** Last Mounted on /
** Root file system
** Phase 1 - Check Blocks and Sizes
INCORRECT BLOCK COUNT I=175898 (16 should be 8)
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
UNREF FILE I=59063 OWNER=root MODE=100666
SIZE=0 MTIME=Jan 28 04:12 2020
** Phase 5 - Check Cyl groups
SUMMARY INFORMATION BAD
FREE BLK COUNT(S) WRONG IN SUPERBLK
BLK(S) MISSING IN BIT MAPS
21715 files, 1239180 used, 558779 free (2795 frags, 69498 blocks, 0.2% fragmentation)
I've also mounted a USB drive and copied the conf files so I can set everything up from scratch without losing settings. I just hope that the contents of the /cf/conf folder are all the settings; I don't know for sure.
It looks like I will probably need to reinstall the system from USB. I have requested an image of the latest firmware in the ticket.
If there is a way to fix it without losing the current configuration, I would prefer that. If nothing can be done to save the old setup, I just need to know the right way to get things working and how best to do the new setup.
Please help me or refer me to the right place to search and/or ask. I really hope I can fix my stuck device. Thank you and have a great day!
First, before you even think about restoring the firewall, get a UPS and connect it up!!! A firewall is a PC, and sudden power loss almost always results in disk corruption. That's your problem now.
After you get the UPS in place so this won't happen again, then go follow the instructions in this documentation: https://docs.netgate.com/pfsense/en/latest/backup/automatically-restore-during-install.html. You can import your config.xml file during the installation process.
The funny part is that it is connected to a UPS, and I know that it's a PC... but the sockets going to the UPS were mistakenly turned off, which I will now keep from happening. It was my stupid mistake, and I'm paying for it now. At least the firewall's drive is alive.
Thank you for the link, it really looks like what I need. Very cool that it lets me use the old config file.
As I said before, I was able to copy the conf directory to the USB drive. In addition to the main config file there is also a backup folder with several versions of the same config. Are they good for this too? I'm not sure what's corrupt; maybe it's the config file. I hope it won't brick it in the process. Support has sent me the firmware, so I'll try later to do what your link suggests.
If you had a UPS and it still shut down without warning, then maybe you don't have either the NUT or apcupsd package installed. Install and configure one of those packages and connect your UPS to the pfSense box (usually via USB cable) so that when the battery is about to expire, the UPS will signal the NUT or apcupsd package and the package will gracefully shut down the firewall and avoid disk corruption.
Each time you make a change on the firewall and save it, a backup of the configuration is created. That's what all those other file versions are. Choose the most recent one from before the crash and you should be fine.
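If you are unsure which of the copied backups is intact, a rough way to check them off the firewall is sketched below in Python. It assumes the backup files are named config-<unix timestamp>.xml and that a usable config parses as XML with a <pfsense> root element; the function names are made up for this illustration and are not part of pfSense.

```python
import re
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Optional

# Assumption: backups are named config-<unix timestamp>.xml
BACKUP_NAME = re.compile(r"^config-(\d+)\.xml$")


def is_usable_config(path: Path) -> bool:
    """Return True if the file parses as XML with a <pfsense> root element."""
    try:
        root = ET.parse(path).getroot()
    except (ET.ParseError, OSError):
        # Truncated or empty files (typical after power loss) end up here.
        return False
    return root.tag == "pfsense"


def newest_good_backup(backup_dir: Path,
                       before: Optional[int] = None) -> Optional[Path]:
    """Pick the newest well-formed backup.

    If `before` (epoch seconds) is given, only backups with an older
    timestamp are considered, e.g. those written before the crash.
    """
    candidates = []
    for path in backup_dir.iterdir():
        match = BACKUP_NAME.match(path.name)
        if not match:
            continue  # ignore anything that is not a timestamped backup
        stamp = int(match.group(1))
        if before is not None and stamp >= before:
            continue
        candidates.append((stamp, path))
    # Walk from newest to oldest and return the first file that parses.
    for _, path in sorted(candidates, reverse=True):
        if is_usable_config(path):
            return path
    return None
```

Calling newest_good_backup(Path("/path/to/usb/conf/backup")) on the copied folder (the path is a placeholder) returns the file to feed to the installer, or None if nothing parses. This only checks that the XML is well formed, not that every setting inside it is sane.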
Thank you again for your help! I have successfully restored my firewall using the provided firmware and the old but latest config.xml file, which I took from that backup folder before reinstalling. Looks like it's working now: it has all the rules, and it also auto-installed the packages and their settings, so I hope that's the happy end. :)
About the UPS, it isn't guilty; it was me who disconnected everything from that UPS and got all this mess as a reward. But anyway it's a good idea to let it control the shutdown. I'll check it out, thank you so much!
Glad to hear everything is back in operation.
Using either the NUT or apcupsd package with your UPS is a good idea. The package can query the UPS for status info, and in the other direction, the UPS will signal the package any time it goes on battery. It will also tell the package what the state of battery charge is. That way, when the UPS says the battery is about to expire, the package will initiate a pfSense shutdown. This will protect you from disk issues.
Within either package you can configure when pfSense will shut down by telling it at what "runtime remaining" value you want the graceful shutdown to start. The default values are usually fine.
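To make the threshold idea concrete, here is a small Python sketch of the kind of decision such a package makes. It is not the actual NUT or apcupsd code. The UpsStatus class and should_shut_down function are hypothetical, and the default thresholds (5 percent charge, 3 minutes of runtime) are assumptions meant to resemble typical stock settings, so check the values in your package's settings page.

```python
from dataclasses import dataclass


@dataclass
class UpsStatus:
    on_battery: bool         # True once mains power is lost
    charge_percent: float    # remaining battery charge reported by the UPS
    runtime_minutes: float   # the UPS's own "runtime remaining" estimate


def should_shut_down(status: UpsStatus,
                     min_charge_percent: float = 5.0,    # assumed default
                     min_runtime_minutes: float = 3.0    # assumed default
                     ) -> bool:
    """Return True once the UPS is on battery and either threshold is reached."""
    if not status.on_battery:
        # On mains power there is nothing to do, whatever the charge level.
        return False
    return (status.charge_percent <= min_charge_percent
            or status.runtime_minutes <= min_runtime_minutes)
```

For example, should_shut_down(UpsStatus(True, 80.0, 2.5)) is True because the runtime estimate has dropped below three minutes even though the charge is still high. Raising the thresholds starts the graceful shutdown earlier and leaves more margin.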
@bmeeks Thank you for this tip, so cool that pfSense is prepared for everything! I have an APC, so I will try apcupsd.