pfSense Crashed
-
Yeah… it rocks… most of the time. Yesterday's build, though, sucks rocks.
What happened? Must have been something pretty fundamental, but nothing significant shows up in commit history!
-
Yeah… it rocks… most of the time. Yesterday's build, though, sucks rocks.
What happened? Must have been something pretty fundamental, but nothing significant shows up in commit history!
While I certainly share your frustration as many of us are running 2.1 nightlies on production systems for driver reasons, I think saying it "sucks rocks" is unfair. These are nightlies we're running. They're not even alphas (yes, I know they're labeled "beta" but they're not betas in the traditional sense of a beta that's been slightly tested and expected to mostly work - they're automatic nightly builds). Nightlies can be broken in very fundamental ways, it's the nature of software development.
The pfSense team is doing great work, and I'm especially appreciative of the members of the team who are friendly and helpful on these forums to all of us running pfSense. When 2.1 is out, or even an actual beta/RC, then we'll have builds we can use and not run the risk of them fundamentally trashing the whole system. Until then, they're nightlies, for better or for worse. I'm glad the team's made them available for widespread community testing.
-
OK, I'm posting this in case it helps someone. The easiest way I found to get pfSense back to a working state was restoring a full backup image I had made some time ago.
Just connect to the console, press Ctrl+Break while the system is initializing, and run:
/etc/rc.restore_full_backup /root/pfSense-full-backup-20130219-2240.tgz
(replace pfSense-full-backup-20130219-2240.tgz with the filename of your full backup). Then wait for the process to complete and, when it's done, run:
reboot
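If you keep several full backups, a small /bin/sh helper can pick the newest one for you. This is a minimal sketch, assuming the pfSense-full-backup-YYYYMMDD-HHMM.tgz naming shown above and that the tarballs sit in /root; the function name latest_full_backup is hypothetical and not part of pfSense. If your console shell is not a Bourne-style shell, start sh first.

```shell
#!/bin/sh
# Hypothetical helper (not part of pfSense): print the newest full-backup
# tarball in a directory, or fail if there is none.
# Assumes names like pfSense-full-backup-20130219-2240.tgz: the zero-padded
# YYYYMMDD-HHMM stamp makes lexical order match chronological order.
latest_full_backup() {
    dir=${1:-/root}
    newest=$(printf '%s\n' "$dir"/pfSense-full-backup-*.tgz | sort | tail -n 1)
    # With no match, the shell leaves the pattern unexpanded; reject it.
    [ -f "$newest" ] || return 1
    printf '%s\n' "$newest"
}

# Example (commented out so that sourcing this file changes nothing):
# backup=$(latest_full_backup /root) && /etc/rc.restore_full_backup "$backup" && reboot
```

Check the printed path before restoring: the newest backup is not necessarily the last known-good one, for example if you made it right before a broken upgrade.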
Since I always check the "make full backup" box when upgrading, that was an almost painless process to get things back working.
Almost painless, because I didn't know how until I read it here, but of course my connection was down, so… And also because my box has to be opened up and a ribbon connector plugged onto the motherboard before I can hook up a screen and keyboard, since that's just a "for install only" afterthought. Time to mod the case, maybe.
In any case, things are up and running again, as this post shows. Will someone post a message here when a working new build is available?
Given the hoops I have to jump through when things are busted in this particular way, I'm not keen on installing random builds until I know this particular issue is fixed.
-
While I certainly share your frustration as many of us are running 2.1 nightlies on production systems for driver reasons, I think saying it "sucks rocks" is unfair. These are nightlies we're running. They're not even alphas (yes, I know they're labeled "beta" but they're not betas in the traditional sense of a beta that's been slightly tested and expected to mostly work - they're automatic nightly builds). Nightlies can be broken in very fundamental ways, it's the nature of software development.
The pfSense team is doing great work, and I'm especially appreciative of the members of the team who are friendly and helpful on these forums to all of us running pfSense. When 2.1 is out, or even an actual beta/RC, then we'll have builds we can use and not run the risk of them fundamentally trashing the whole system. Until then, they're nightlies, for better or for worse. I'm glad the team's made them available for widespread community testing.
Lighten up, guys. My comment was only an expression of surprise at the catastrophic outcome of applying this latest nightly 'alpha', 'beta' or whatever. I had been lulled into complacency by the general quality of these releases and was caught off guard by this one.
One thing that would have made recovery easier is a boot option to select and re-install one of the previously created backup tarballs. Sure, it's easily done manually, but it's difficult to look up how when you've lost all Internet connectivity.
-
Yes, let's all relax a bit… we all know that the pfSense developers are doing an AWESOME job; let's just remember that version 2.1 is still in development. For example, I run it at home, but at the office I run the stable 2.0.X version, and it's rock solid.
Since version 2.1 is still in development, this can happen. We all have to remember that on every update and have a backup plan: keep a previous config or full backup made while things are working (because once things stop working it could be too late), keep a stable backup disk/flash/microdrive to use in an emergency, update two or more firewalls at different times, and so on.
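One way to act on the "make a backup while things are working" advice is to copy the live config to a dated file before each update. This is a sketch: the function name snapshot_config is hypothetical, the source path /cf/conf/config.xml is the one used later in this thread, and the destination directory is an assumption you should adjust. A copy on the same disk does not help if that disk dies, so move it off the box as well.

```shell
#!/bin/sh
# Hypothetical helper: copy the live config to a timestamped file and
# print the new file's path.
snapshot_config() {
    src=${1:-/cf/conf/config.xml}   # config path used elsewhere in this thread
    dest_dir=${2:-/root}            # assumed destination; change as needed
    if [ ! -f "$src" ]; then
        echo "no config at $src" >&2
        return 1
    fi
    stamp=$(date +%Y%m%d-%H%M%S)
    dest="$dest_dir/config-$stamp.xml"
    cp -p "$src" "$dest" || return 1
    printf '%s\n' "$dest"
}

# Example: take a snapshot before upgrading, then copy it off the firewall.
# snapshot_config /cf/conf/config.xml /root
```

Timestamping each copy, rather than overwriting a single file, means a backup taken after something already broke does not replace the last good one.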
Btw, let's wait for the pfSense staff to produce a stable snapshot with all the latest updates and fixes.
Michele
-
Since version 2.1 is still in development, this can happen. We all have to remember that on every update and have a backup plan: keep a previous config or full backup made while things are working (because once things stop working it could be too late), keep a stable backup disk/flash/microdrive to use in an emergency, update two or more firewalls at different times, and so on.
No, I totally agree; that's all I meant, in a very light-hearted way :D
-
Since version 2.1 is still in development, this can happen. We all have to remember that on every update and have a backup plan: keep a previous config or full backup made while things are working (because once things stop working it could be too late), keep a stable backup disk/flash/microdrive to use in an emergency, update two or more firewalls at different times, and so on.
Fully understood, but also one of the reasons why it would be great if the non-embedded version of pfSense could also have two slices, or a recovery partition that one can boot into to restore a previous backup, maybe even automatically if a system crashes more than X times within timespan t.
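The "more than X crashes within timespan t" rule amounts to a sliding-window count of boot times. The sketch below is hypothetical: pfSense does not ship these functions, every boot is treated as a possible crash (telling a crash from a clean reboot is left out), and the log has to live on storage that survives reboots.

```shell
#!/bin/sh
# record_boot LOG [NOW]: append the boot time (epoch seconds) to LOG.
record_boot() {
    log=$1
    now=${2:-$(date +%s)}
    echo "$now" >> "$log"
}

# too_many_boots LOG MAX WINDOW [NOW]: succeed if more than MAX boots
# were recorded in the last WINDOW seconds (the "X times within t" rule).
too_many_boots() {
    log=$1; max=$2; window=$3
    now=${4:-$(date +%s)}
    [ -f "$log" ] || return 1
    cutoff=$((now - window))
    # Count the timestamps that fall inside the window.
    count=$(awk -v c="$cutoff" '$1 >= c { n++ } END { print n + 0 }' "$log")
    [ "$count" -gt "$max" ]
}

# Example boot hook (hypothetical path; X = 3 boots, t = 3600 s):
# record_boot /root/boot-times.log
# if too_many_boots /root/boot-times.log 3 3600; then
#     echo "crash loop suspected; restore a known-good backup" >&2
# fi
```

The optional NOW argument exists only to make the logic testable; a real hook would also prune old entries so the log does not grow forever.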
Otherwise, particularly if a unit is in a remote location, it can be a true PITA to even restore a backup that does exist.
-
I just upgraded 3 instances to the Thursday build, and I've started experiencing this as well.
They're all configured differently, and the only things they have in common are:
- OpenVPN client export
- iPerf
- They have OpenVPN configured (but disabled on one of them)
It seems that messing with the firmware upgrade page is a sure-fire way to trigger a crash. One of them seems stable as long as I don't log into it.
The problem only started after a few hours of working on two of the boxes; then, all of a sudden, all three started acting unstable. I don't think it's just the webGUI: I had a number of crashes while launching daemons from SSH as well. Right now I have a ping -t running against one, I have stopped trying to log into the GUI, and it seems stable.
Not sure how helpful that is.
-
Um, me again…
I think anybody choosing to run the DEVELOPMENT version should be doing so in the full knowledge and understanding that there could be issues. I do.
Make sure your backup/DR/contingency plan is robust; that's basic. Carry out your own testing before deploying to a production or working rig; that's basic as well.
I don't think it's right, however "light-hearted" it seems, to criticise the project for a broken DEVELOPMENT version. But there we go. That's my opinion.
Constructively: would the developers perhaps consider PULLING the broken release to keep more people from being affected?
-
In my case it doesn't reboot: it simply freezes and refuses to accept any input. Strangely, it was working fine yesterday. Not that it matters much: I've been using it on my secondary (backup) firewall to provide IPv6 connectivity. My primary one is still on 2.0.X.
It would probably be a good idea to pull the broken release.
-
Re-installed image from Mon Apr 22 04:52:47 EDT 2013 … to get back online for now.
That works so far ;) I will follow this thread to see when it is safe to update to the latest snapshot again.
Thanks, Stefan
-
The April 24 snapshots are safe; I've been using them for the past few days.
-
I don't think it's right, however "light-hearted" it seems, to criticise the project for a broken DEVELOPMENT version. But there we go. That's my opinion.
Constructively: would the developers perhaps consider PULLING the broken release to keep more people from being affected?
Umm, I DIDN'T criticise it (since I was the one who said what I said was light-hearted). I was saying how great pfSense was. I was light-heartedly criticising those of us who got burned by this snap :) (I'm in that group, the snaps have been so close to production-ready that I thought nothing of clicking the upgrade button. My fault, it's development)
-
In my case, attempting to upgrade through the GUI triggered reboots 90% of the time. This should fix the problem if anyone has run into it. From SSH or the console menu, choose 8 (Shell), download the update into /root, exit the shell, then choose 13 (Upgrade from console), select 2, enter the path of the downloaded file and confirm with y:
For i386:
8
cd /root
fetch http://snapshots.pfsense.org/FreeBSD_RELENG_8_3/i386/pfSense_HEAD/updates/pfSense-Full-Update-2.1-BETA1-i386-20130423-1530.tgz
exit
13
2
/root/pfSense-Full-Update-2.1-BETA1-i386-20130423-1530.tgz
y
For amd64:
8
cd /root
fetch http://snapshots.pfsense.org/FreeBSD_RELENG_8_3/amd64/pfSense_HEAD/updates/pfSense-Full-Update-2.1-BETA1-amd64-20130423-0841.tgz
exit
13
2
/root/pfSense-Full-Update-2.1-BETA1-amd64-20130423-0841.tgz
y
This has been tested on a remote system over SSH, and works fine.
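Before feeding the downloaded file to option 13, it may be worth confirming that it is a complete, readable gzip tarball, since a truncated download or an HTML error page saved under the .tgz name would not be. This is a hedged sketch: check_update_archive is a hypothetical name, and it only checks that tar can list the archive, not that the image is the right one or authentic.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if FILE exists, is non-empty and tar can
# list the gzip-compressed archive (a truncated file typically fails here).
check_update_archive() {
    f=$1
    if [ ! -s "$f" ]; then
        echo "missing or empty: $f" >&2
        return 1
    fi
    if ! tar -tzf "$f" > /dev/null 2>&1; then
        echo "unreadable archive: $f" >&2
        return 1
    fi
}

# Example, using the amd64 file name from the steps above:
# check_update_archive /root/pfSense-Full-Update-2.1-BETA1-amd64-20130423-0841.tgz && echo OK
```

Run it from the shell (option 8) right after fetch, and only go on to option 13 if it succeeds.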
I should add that if you are on the affected version from Thursday, you need to downgrade ASAP; on one of my (virtual) systems the problem progressed until the VM no longer booted. It seems to get worse over time, probably due to repeated dirty unmounts of the filesystem.
-
I can confirm the "getting worse" part, and that the GUI access makes it even less stable.
However, there seem to be other factors, maybe VPN-related, that left my system too unstable to recover from the CLI: by the time it was up and I had logged in with slogin, it was about to crash. So the only way I could do it was to physically disconnect all network cables, so there were no network events (packets, pings, VPN down, etc.), and then restore a full backup from the physical console, which was a bit tricky given the kind of hardware I use (no keyboard or video port on the outside of the case).
So your suggestion may not work for everyone.
-
If you are in the state where it won't boot, this should work. I have used it to remotely restore 3 boxes so far, and it seems to work well.
Get an ISO of a "good" version (2.0.3, or 2.1 as of April 23). Boot from it and select "recovery". Pick your drive and continue.
You will need to re-assign your interfaces to their adapters; don't worry about getting all of them right, as we will restore the config.
Once you are at the standard menu, choose 8 (Shell) and run the following:
cp /tmp/hdrescue/cf/conf/config.xml /cf/conf/config.xml
cp /tmp/hdrescue/cf/conf/config.xml /conf/config.xml
rm /tmp/config.cache
exit
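The copy step above can be wrapped in a small function that refuses to copy a missing or obviously truncated config. A sketch only: restore_rescued_config is a hypothetical name, and the check simply looks for the closing </pfsense> tag of the config's root element, which catches truncation but not every kind of damage.

```shell
#!/bin/sh
# Hypothetical helper: copy SRC to every DEST given, but only if SRC is
# non-empty and contains the closing </pfsense> root tag.
restore_rescued_config() {
    src=$1
    shift
    if [ ! -s "$src" ]; then
        echo "missing or empty: $src" >&2
        return 1
    fi
    # A truncated config.xml would normally lack its closing root tag.
    if ! grep -q '</pfsense>' "$src"; then
        echo "no </pfsense> tag in $src" >&2
        return 1
    fi
    for dest in "$@"; do
        cp "$src" "$dest" || return 1
    done
}

# Example, mirroring the steps above:
# restore_rescued_config /tmp/hdrescue/cf/conf/config.xml /cf/conf/config.xml /conf/config.xml && rm -f /tmp/config.cache
```

Because nothing is copied when the check fails, the config already on disk stays untouched if the rescued file turns out to be damaged.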
Your config should now be loaded. Manually assign the proper IPs to your interfaces, and you should have proper web-gui access again. Log in, and make a backup of your config.
Continue with the installation, which should preserve your now in-memory configuration.
I HIGHLY recommend that you A) confirm that the downloaded configuration is correct and B) check that cat /cf/conf/config.xml shows your configuration. Make SURE you have backups before proceeding with the install, which will involve unmounting and wiping your existing partition.
-
@markuhde - I know you didn't. It was somebody else mangling my earlier comment about pfSense rocking. Which it does!
Moving swiftly on - all my 32-bit builds are happy running Thu Apr 25 09:08:19 EDT 2013, which I think was the last snapshot prior to the broken one. All good this end!
-
Looking at today's build log, it seems the error from the weekend is gone (I think it was a sig 15 during a make world). A build seems to be running now.
-
Ahh..whew!! That's a good crash!
At least it wasn't someone on the network trying to play master hacker.
-
Anyone dare load today's build yet?