Pfsense 2.5 stacks at boot with dots
-
@sjm said in Pfsense 2.5 stacks at boot with dots:
Under 2.4 I'd had numerous power outages and brown outs and the system kept coming back every time without a hitch.
Same. The only time I've had a device not come back up is when there was a clear explanation as to why.
-
@chamilton_ccn I do remember seeing a zero length config.xml file.
-
@sjm said in Pfsense 2.5 stacks at boot with dots:
@chamilton_ccn I do remember seeing a zero length config.xml file.
Interesting! Do you know if the /cf/conf/backup directory was empty and/or whether the most recent backup config was empty? If you're uncertain and this happens again, definitely check that and report back!
-
@chamilton_ccn said in Pfsense 2.5 stacks at boot with dots:
It seems the /cf/conf/config.xml file was also empty.
I should repeat myself : impossible.
But I acknowledge : you saw it ... so it's possible.
I never saw such a thing for the past 10 years or so.
It's not pfSense "2.5", as that version doesn't exist - is that 2.5.0 ? 2.5.1 ? Or 21.02.2-RELEASE (amd64) ? All of them ?
Meanwhile, some real info is shown now : there are logs !!.
So, now, it's known where the issue is : in /etc/inc/config.lib.inc, the while loop at line 465 loops again and again, printing a dot ".".
I suppose not only the backup xml files are 'gone', but also the main /cf/conf/config.xml.
So, the code 'thinks' it needs to be "upgraded" as it retrieves a "0" as an initial version number..... and 0 + something changed by the config upgrade conversion becomes "B.S.".
That should be changed. The system should halt / bail out with a message like : "No valid config found - See you."
Now, it takes the second-to-last config .... which is zero, so it goes back to the one before that, which is zero too (right ?).
Normally, when a file gets created, there is content to be written to it.
The creation process worked - but nothing gets written. Even a simple file copy doesn't work on your system.
Ok, so it's not a file system error.
I'll take the next best : PHP is brain dead ? The kernel is brain dead ? What's so special with your 'setup' that it is so messed up ?
Please keep going with the info.
Take your pfSense to a new VM - or even better : a dedicated machine - and see that it behaves as thousands of others do : it can create files, fill them with info, etc.
If /cf/conf/config.xml creation was an issue for many others, this would be a major show stopper.
So, use another hypervisor : Windows, for example, has Hyper-V built in. I'm using two of them : works great.
-
@gertjan It WAS pfsense 2.5.0 stable when it happened.
And is also confirmed to happen on official netgate hardware.
The same vm host runs happily various other workloads including pf 2.5.1 as we speak.
And no, this has nothing to do with the Hypervisor.
And Hyper-V isn't going to solve it anyway.
-
@gertjan said in Pfsense 2.5 stacks at boot with dots:
But I acknowledge : you saw it ... so it's possible.
I never saw such a thing for the past 10 years or so.
Oh it definitely happened, but I appreciate your skepticism :-) This is a first time for me as well.
EDIT: In my situation, it was version 21.02.2-RELEASE (amd64).
-
@netblues
I suggested Hyper-V as an alternative because I saw "pfsense 2.5 under centos8 kvm" in the beginning.
Also because Netgate - I'm speaking for myself - did not test CentOS 8. Hyper-V was tested.
I'm not saying it's better. Just to get you on "common ground".
Btw :
[2.5.1-RELEASE][root@pfsense.outside.bdx.net.net]/cf/conf/backup: ls -al
total 28788
drwxr-xr-x 2 root wheel 2048 May 5 08:17 .
drwxr-xr-x 4 root wheel 2048 May 5 13:01 ..
-rw-r--r-- 1 root wheel 10999 May 5 08:17 backup.cache
-rw-r--r-- 1 root wheel 429728 Apr 27 17:17 config-1619536501.xml
-rw-r--r-- 1 root wheel 430599 Apr 29 11:08 config-1619536620.xml
-rw-r--r-- 1 root wheel 430590 Apr 29 11:08 config-1619687301.xml
-rw-r--r-- 1 root wheel 430609 Apr 29 14:52 config-1619687320.xml
-rw-r--r-- 1 root wheel 430647 Apr 30 08:29 config-1619700777.xml
-rw-r--r-- 1 root wheel 430666 Apr 30 08:30 config-1619764170.xml
-rw-r--r-- 1 root wheel 430709 Apr 30 08:39 config-1619764238.xml
-rw-r--r-- 1 root wheel 430644 Apr 30 08:40 config-1619764777.xml
-rw-r--r-- 1 root wheel 430653 Apr 30 08:40 config-1619764818.xml
-rw-r--r-- 1 root wheel 430372 Apr 30 08:41 config-1619764822.xml
-rw-r--r-- 1 root wheel 430279 Apr 30 08:41 config-1619764863.xml
-rw-r--r-- 1 root wheel 430689 Apr 30 08:41 config-1619764867.xml
-rw-r--r-- 1 root wheel 430666 Apr 30 13:41 config-1619764918.xml
-rw-r--r-- 1 root wheel 430675 Apr 30 13:41 config-1619782889.xml
-rw-r--r-- 1 root wheel 430394 Apr 30 13:42 config-1619782894.xml
-rw-r--r-- 1 root wheel 430346 Apr 30 13:43 config-1619782965.xml
-rw-r--r-- 1 root wheel 430281 Apr 30 13:43 config-1619783032.xml
-rw-r--r-- 1 root wheel 430691 Apr 30 18:00 config-1619783036.xml
-rw-r--r-- 1 root wheel 430624 Apr 30 18:00 config-1619798400.xml
-rw-r--r-- 1 root wheel 430633 May 1 00:00 config-1619798431.xml
-rw-r--r-- 1 root wheel 430624 May 1 00:00 config-1619820000.xml
-rw-r--r-- 1 root wheel 430633 May 1 06:00 config-1619820004.xml
-rw-r--r-- 1 root wheel 430624 May 1 06:00 config-1619841600.xml
-rw-r--r-- 1 root wheel 430633 May 1 12:00 config-1619841604.xml
-rw-r--r-- 1 root wheel 430624 May 1 12:00 config-1619863200.xml
-rw-r--r-- 1 root wheel 430633 May 1 12:05 config-1619863231.xml
-rw-r--r-- 1 root wheel 430655 May 1 12:06 config-1619863557.xml
-rw-r--r-- 1 root wheel 430374 May 1 12:07 config-1619863565.xml
-rw-r--r-- 1 root wheel 430337 May 1 12:07 config-1619863639.xml
-rw-r--r-- 1 root wheel 430346 May 1 12:08 config-1619863646.xml
-rw-r--r-- 1 root wheel 430281 May 1 12:08 config-1619863713.xml
-rw-r--r-- 1 root wheel 430691 May 1 12:12 config-1619863720.xml
-rw-r--r-- 1 root wheel 430534 May 1 12:12 config-1619863929.xml
-rw-r--r-- 1 root wheel 430543 May 1 12:12 config-1619863966.xml
-rw-r--r-- 1 root wheel 430262 May 1 12:13 config-1619863970.xml
-rw-r--r-- 1 root wheel 430315 May 1 12:13 config-1619864004.xml
-rw-r--r-- 1 root wheel 430279 May 1 12:13 config-1619864016.xml
-rw-r--r-- 1 root wheel 430689 May 3 16:32 config-1619864022.xml
-rw-r--r-- 1 root wheel 430668 May 4 07:29 config-1620052367.xml
-rw-r--r-- 1 root wheel 430622 May 4 07:30 config-1620106188.xml
-rw-r--r-- 1 root wheel 430609 May 4 07:31 config-1620106241.xml
-rw-r--r-- 1 root wheel 429702 May 4 07:33 config-1620106271.xml
-rw-r--r-- 1 root wheel 429703 May 4 07:34 config-1620106429.xml
-rw-r--r-- 1 root wheel 429660 May 4 07:34 config-1620106440.xml
-rw-r--r-- 1 root wheel 428464 May 4 07:35 config-1620106491.xml
-rw-r--r-- 1 root wheel 428497 May 4 07:36 config-1620106534.xml
-rw-r--r-- 1 root wheel 428498 May 4 07:36 config-1620106574.xml
-rw-r--r-- 1 root wheel 427619 May 4 07:37 config-1620106612.xml
-rw-r--r-- 1 root wheel 426756 May 4 08:41 config-1620106653.xml
-rw-r--r-- 1 root wheel 425903 May 5 07:56 config-1620110479.xml
-rw-r--r-- 1 root wheel 426800 May 5 07:57 config-1620194206.xml
-rw-r--r-- 1 root wheel 426814 May 5 08:07 config-1620194237.xml
-rw-r--r-- 1 root wheel 426814 May 5 08:09 config-1620194839.xml
-rw-r--r-- 1 root wheel 426842 May 5 08:10 config-1620194993.xml
-rw-r--r-- 1 root wheel 426861 May 5 08:10 config-1620195027.xml
-rw-r--r-- 1 root wheel 426886 May 5 08:13 config-1620195040.xml
-rw-r--r-- 1 root wheel 426924 May 5 08:15 config-1620195192.xml
-rw-r--r-- 1 root wheel 426959 May 5 08:16 config-1620195341.xml
-rw-r--r-- 1 root wheel 426991 May 5 08:16 config-1620195365.xml
-rw-r--r-- 1 root wheel 427023 May 5 08:17 config-1620195391.xml
If one of these was zero, I would surely hit that big red alarm button right away.
I'm pretty sure the main config.xml was zeroed out also.
That's close to a Windows PC with a nuked registry file : that system will not boot, period.
Edit :
This :
@chamilton_ccn said in Pfsense 2.5 stacks at boot with dots:
/ecl.php: Netgate pfSense Plus is restoring the configuration
is pfSense Plus, and I have no ecl.php file as I use the Community Edition.
@chamilton_ccn said in Pfsense 2.5 stacks at boot with dots:
Here's some console output that I believe supports what I said, above:
Not really.
What is the ls -al at that moment ?
A "/conf/backup/config-1547535734.xml" is found, read, promoted to new config.xml but also found empty (== no objects).
Btw : I installed this many moons ago : https://github.com/KoenZomers/pfSenseBackup : works great. A real set-it-and-forget-it backup tool.
-
@gertjan said in Pfsense 2.5 stacks at boot with dots:
Not really.
Ok, but everything I said was tested, empirically. Or else I'd still have a non-booting device.
What is the ls -al at that moment ?
There's no way to know at this point.
-
@chamilton_ccn will keep an eye out. As mentioned I had to restore service ASAP so when fsck and forum searches were unable to resolve the grief I went straight to reformatting and an external config.xml backup restore.
-
@sjm the only other remote thought is a comment from my wife. She said a number of people in town had been experiencing internet service disruption. Not sure if there is any relationship, but it may help someone else to consider a possible link.
-
Ok done some digging; in summary.
- this happened on pfsense 2.5.1.
- pfsense was streaming a YouTube video to my TV at the time it crashed.
- It happened around 07:45 AEST Wednesday the 5th of May.
- NOTE:- the pfsense box was connected to a UPS. There were no audible warnings from the UPS indicating a power fluctuation or drop out in any case.
- pfsense had partly crashed. I.e. web interface had stopped responding. pfsense OS was still working.
- direct connected a keyboard and a monitor. Saw the spooling dots appearing on the screen.
- ctrl-c allowed me to bring up the command prompt.
- ran fsck after booting into single user mode, if I recall correctly, at least three times, in attempt to repair fs corruption. This did not succeed.
- had noted that config.xml had a zero byte count file size.
- rebooted a number of times with the same failed result: endless dots appearing on the screen.
- scrubbed the install and did a complete reinstall with a reload of an external backup copy of config.xml.
- everything now appears to be operating normally.
- later on my wife indicated that the local chat had mentioned a number of people in the area where we live experiencing internet service issues (one would assume this would not be in any way connected with the crash I experienced with pfsense?!?).
- see photos and hope they help.
-
@sjm said in Pfsense 2.5 stacks at boot with dots:
It happened around 07:45 AEST Wednesday the 5th of May.
.....You saw the dots.
And yet you said :
@sjm said in Pfsense 2.5 stacks at boot with dots:
pfsense had partly crashed.
When you see the dots, pfSense is booting.
Did you reboot (reset ?) it ?
If not, it crashed and rebooted itself.
in attempt to repair fs corruption. This did not succeed.
If the file system doesn't get cleaned/repaired, it will get mounted in read only mode at best.
This means : not one single byte can get written to the system.
This is not good at all.
Read-only mode means that files can't get created : no empty files - no file name. Nothing. The file system can't get altered any more.
I would start doing some severe hardware / drive tests. Or just change the drive.
Btw : updating Facebook, doing nothing, or watching Youtube isn't related here ^^
A WAN (or LAN) disconnection should not 'break' or reset the system. Or mess up the file system.
These two events are not related.
I mean : I can rip out the WAN connector any time, power down my ISP router any time, rip out the mains plug of my UPS (it protects my ISP router, pfSense and main switch) any time.
I actually do so every month or so.
Before doing so, I look at ALL the logs for 'special' events.
I start removing power.
And afterwards, I check how the system builds the WAN again - and check if the main LAN works normally.
Btw : your photos : try capturing the moment the kernel boots and finds the drive dirty. It would then start a file system check (fsck), or you would instruct it to, and it should terminate with a clean file system and continue booting.
-
@gertjan What we really don't know is if the system suddenly crashes and restarts and then hangs with dots, or dots start to appear without rebooting.
That's quite difficult to pinpoint.
I would say the first, but this is an assumption.
-
@sjm I agree with this comment by @Gertjan:
Btw : updating Facebook, doing nothing, or watching Youtube isn't related here ^^
A WAN (or LAN) disconnection should not 'break' or reset the system. Or mess up the file system.
These two events are not related.
What I'm curious about is this: In one of your photos, there's a message:
2021-05-04T17:49:29.134875+10:00 php-fpm 328 - - /ecl.php: New alert found: pfSense is restoring the configuration /cf/conf/backup/config-1618488403.xml
1618488403 == Thu Apr 15 08:06:43 2021
Is that date relevant to your scenario? Also, can you confirm whether that file is empty or not? If it's empty, can you try dropping to a shell and deleting it, and then rebooting? In my situation, config.xml was empty so the device tried restoring the most recent backup, which was also empty, resulting in infinite dots. As I've previously stated, deleting the zero-byte backup config seems to have caused the device to roll over to the next most recent config, which wasn't empty, and the system was able to boot on that attempt.
The only thing I can think of is that this is "unexpected shutdown" related. I was putting the finishing touches on my configuration while the device was on my test bench. After I was done, I logged out of the console and maybe 30 seconds to a minute later, I powered down the device using the rocker switch so I could move it into the rack. When I got it into the rack and powered it back up, that's when this problem occurred.
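In case it helps anyone reading along, the number in a backup filename is just epoch seconds, so it can be decoded from a shell. A small sketch, assuming the config-<epoch>.xml naming seen in this thread; backup_epoch and epoch_to_utc are names made up for this example. It prints UTC, so 1618488403 comes out as 12:06:43, which lines up with the 08:06:43 quoted above if that was read on a clock at UTC-4.

```sh
#!/bin/sh
# Extract the epoch seconds from a name of the form config-<epoch>.xml.
backup_epoch() {
    name=${1##*/}          # strip any directory part
    name=${name#config-}   # strip the "config-" prefix
    printf '%s\n' "${name%.xml}"
}

# Print epoch seconds as a UTC timestamp.
# BSD date (FreeBSD, hence pfSense) takes -r SECONDS; GNU date takes -d @SECONDS.
# The BSD form is tried first; on GNU it fails (there -r expects a file name,
# so this assumes no file with that numeric name exists) and the GNU form runs.
epoch_to_utc() {
    date -u -r "$1" '+%Y-%m-%d %H:%M:%S' 2>/dev/null ||
        date -u -d "@$1" '+%Y-%m-%d %H:%M:%S'
}

epoch_to_utc "$(backup_epoch /cf/conf/backup/config-1618488403.xml)"
# prints: 2021-04-15 12:06:43
```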
-
@netblues said in Pfsense 2.5 stacks at boot with dots:
@gertjan What we really don't know is if the system suddenly crashes and restarts and then hangs with dots, or dots start to appear without rebooting.
I'm gonna go with the former. I find it hard to believe a running system would just start doing this on its own. It appears to be directly related to the device being unable to load a good config file (either from config.xml or from /cf/conf/backup/). Whether or not it happens only if the config files are empty remains to be seen, I think.
-
@chamilton_ccn as previously mentioned I had to wipe the installation and perform a clean install ASAP to restore my service. The only data I have is contained in these photos.
Given the helpful comments I'll know what action to take if there is a next time.
- Boot into single user mode and run fsck multiple times to clear any fs corruption errors.
- Delete any zeroed config.xml files including backups.
- Attempt a reboot.
- In addition, try to determine the sequence of events: crash, reboot, then the dots (fs corruption) loop; or the dots (fs corruption) loop with no crash first; etc.
- Save any and all config.xml files and log files for further checking.
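
The checklist above could be partly scripted. What follows is a minimal sketch, assuming configs live under /cf/conf as mentioned in this thread; the function names and the /root/empty-configs quarantine directory are made up for illustration. It moves zero-length files aside rather than deleting them, so they are still around for later checking.

```sh
#!/bin/sh
# List zero-length config files under a config directory.
# Usage: find_empty_configs [DIR]   (DIR defaults to /cf/conf)
find_empty_configs() {
    # -size 0 matches only files of exactly zero bytes
    find "${1:-/cf/conf}" -type f -name 'config*.xml' -size 0
}

# Move every empty config into a quarantine directory instead of deleting it.
# Usage: quarantine_empty_configs [SRC_DIR] [DEST_DIR]
quarantine_empty_configs() {
    src=${1:-/cf/conf}
    dest=${2:-/root/empty-configs}   # hypothetical location, outside SRC_DIR
    mkdir -p "$dest" || return 1
    find "$src" -type f -name 'config*.xml' -size 0 -exec mv {} "$dest"/ \;
}
```

Running find_empty_configs first only prints the candidates; quarantine_empty_configs /cf/conf /root/empty-configs would then move them, after which a reboot can be attempted.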
-
@sjm Gotcha.
If anyone in this thread is willing to attempt to recreate the problem, you might go about it by first creating a backup of your config and downloading/saving it somewhere safe, then interrupting the boot process and dropping to a shell, then:
echo > /cf/conf/config.xml
echo > /cf/conf/backup/config-`date +%s`.xml
reboot
All my machines are in production and I don't have a VM at the ready.
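
One detail worth knowing before trying this: echo with no arguments writes a newline, so the resulting file is 1 byte, not 0. To reproduce a truly zero-length file, a plain truncation such as `: > file` is one option. The sketch below only touches a scratch directory, and size_of is a helper name invented for the example.

```sh
#!/bin/sh
# Compare two ways of "emptying" a file, in a scratch directory only.
scratch=$(mktemp -d)

echo > "$scratch/with-echo.xml"     # writes a single newline: 1 byte
: > "$scratch/truncated.xml"        # writes nothing: 0 bytes

size_of() {
    # wc -c reads from stdin so that only the number is printed
    wc -c < "$1" | tr -d ' '
}

echo "echo >  gives $(size_of "$scratch/with-echo.xml") byte(s)"
echo ": >     gives $(size_of "$scratch/truncated.xml") byte(s)"

rm -rf "$scratch"
```

Whether a 1-byte config triggers the same endless dots as a 0-byte one is not something this thread establishes, so both variants may be worth testing.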
-
@chamilton_ccn said in Pfsense 2.5 stacks at boot with dots:
echo > /cf/conf/config.xml
echo > /cf/conf/backup/config-`date +%s`.xml
reboot
That's like deleting the Windows Registry files and rebooting Windows to see if it boots up again.
It won't.
Don't take my word for it. Try it out yourself. Or better : some have made videos about it.
The real issue is : why is the file system (is it ?) messed up ? Why are (config only ?) files created, but not 'filled' up ?
-
@gertjan said in Pfsense 2.5 stacks at boot with dots:
@chamilton_ccn said in Pfsense 2.5 stacks at boot with dots:
echo > /cf/conf/config.xml
echo > /cf/conf/backup/config-`date +%s`.xml
reboot
That's like deleting the Windows Registry files and rebooting Windows to see if it boots up again.
It won't.
Don't take my word for it. Try it out yourself. Or better : some have made videos about it.
It looks nothing like that. The backup directory contains multiple backup files, so the question should be: Why, upon encountering an empty or bad file, doesn't it roll over to the next one? Or at the least, why isn't an error logged ... or something other than printing an infinite string of dots to the console?
That said, one of the most helpful steps in any troubleshooting scenario is the ability to reliably recreate the problem. That's all I was suggesting here, recreating the conditions that are suspected to lead to the observed symptoms, to see if they do in fact lead to the same end.
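
For what it's worth, the rollover behaviour being asked about is not hard to picture. The sketch below is NOT pfSense's actual restore code, just an illustration in plain sh: newest_usable_backup is a hypothetical helper, and treating the string '<pfsense' as a sanity marker is an assumption about what a valid config contains.

```sh
#!/bin/sh
# Hypothetical helper, not pfSense code: print the newest backup that is
# non-empty and at least looks like a pfSense config, skipping bad ones.
# Usage: newest_usable_backup [DIR]   (DIR defaults to /cf/conf/backup)
newest_usable_backup() {
    dir=${1:-/cf/conf/backup}
    best=
    # Globs expand in sorted order; with equal-length epoch timestamps in the
    # names, the last file that passes the checks is the newest usable one.
    for f in "$dir"/config-*.xml; do
        [ -s "$f" ] || continue                # skip missing or zero-length files
        grep -q '<pfsense' "$f" || continue    # skip files without the expected marker
        best=$f
    done
    if [ -n "$best" ]; then
        printf '%s\n' "$best"
    else
        # Fail loudly instead of looping forever.
        echo "No valid config found in $dir" >&2
        return 1
    fi
}
```

The point of the design is simply that a bad candidate is skipped and the absence of any good candidate ends with an error message and a non-zero exit, rather than with more dots.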
-
@chamilton_ccn said in Pfsense 2.5 stacks at boot with dots:
Why, upon encountering an empty or bad file
.... this situation was probably not tested.
Why would the most important system file disappear ??
It's the one and only file you should back up regularly - as with this file you can rebuild an entire pfSense system from scratch in a couple of minutes.
The absence of this file will take down the same system - but faster ^^
@chamilton_ccn said in Pfsense 2.5 stacks at boot with dots:
or something other than printing an infinite string of dots to the console?
These dots are just what should be a "low-budget" progress bar, as it could take some time to look over 'all' the saved config files. It could be the last one that is good.
Or the first.
Edit : see below : I found out what it does when running ecl.php ..... it's a lengthy, non-standard process. I never saw this file running.
@chamilton_ccn said in Pfsense 2.5 stacks at boot with dots:
That said, one of the most helpful steps in any troubleshooting scenario is the ability to reliably recreate the problem.
I totally agree.
But by just wiping (zeroing out) the config files, you test what the system would do without a valid config file. That is : with multiple config files - all with no content.
It should load the default config (this one has no interface assigned), and wait for you at the console so you can re-initialize the entire system again.
I want to know a test that creates a failing disk system .... or whatever is needed so that only empty files are created.
Btw : I just found ecl.php - it's in /etc/ and not in /etc/inc/
(wtf ...)
It stands for External ( external to what ??? ) Config Loader ....
It scans other disks / partitions / slices / .... !!!!
It looks like it found something somewhere - I don't know what 'disks and partitions' you have - and then the system tries to 'upgrade' it (every pfSense version has its own config 'level' version) and that process fails.
True, that's an issue.
What I would like to know is : why is the (your) main /cf/conf/config.xml fckd up ?
I never saw that before.
And I'm pretty sure that when you install a clean pfSense :
- On another system
- Not changing any settings
that you can not create the problem.
I'll leave it up to you to draw the (a ?) conclusion ......
So, tell us :
All your device details.
All your settings.
I know, this is silly - and hard to answer.
But where would I start to debug test this ? By nuking my own systems ..... no thanks.