Hiccup
-
The other day one of my boxes hiccuped: it froze for about a minute or two. When it came back, I checked the temps (phpSysInfo) and they were fine (36, 48, 20); 48 is the CPU, the others are from the motherboard. Just about everything (logs, etc.) got reset, as if I had rebooted it. I thought it might have rebooted automatically, but after a reboot it normally takes about 3 minutes to come online completely.
I have attached a graph showing the above, and as far as I can tell it wasn't really overloaded.
I am wondering what would cause this.
-
It looks like a reboot, are you sure it didn't reboot? What is the uptime?
Random reboots are not a good sign. They aren't always temperature related, but they typically point to something in hardware (RAM, PSU, HDD/media, etc.).
-
I don't think it was a reboot, as all services were up and running; normally I can't access anything for about 3-5 minutes until it's ready, and the Internet was down for only a minute.
Uptime is 5 days 5 hours right now (so maybe it did).
-
I have started to get this entry in the logs. Could it be related to my problem?
9440806(0) win 8192 1460>
-
No, that's just a partial firewall log entry. Not all that uncommon.
-
Well, it did it again today. The CPU and one other temp are a little high; I will investigate the fans.
-
This system is rebooting again. It was a little warm, but the fans are OK. Should it reboot when the temperature gets above a certain point? My other system does not have much cooling and runs hotter, but it has not reset. Both motherboards are set to shut down when temps get too high, so I think something in software is causing the reboot. Am I correct?
-
You could install a debug kernel. If the cause is software, it will stop at the panic screen instead of automatically rebooting.
-
Thanks for the fast reply jimp.
That would require a complete reinstall though, wouldn't it?
Also, the Realtek issue showed up again; I will post to my thread on that to keep topics separate.
-
No need for a reinstall:
http://doc.pfsense.org/index.php/Switching_Kernels
-
If you can spare a little downtime, it could be worth the effort to run a Memtest on the system to see if the hardware is all working fine.
-
Unless you get "lucky" and have a really bad stick, MemTest86+ is really only useful if run 48-72 hours. I've had quite a few "unstable" systems that will run for the better part of a weekend with dozens of passes without an error and then suddenly throw up hundreds at once.
-
Any burn-in/error test I do, I always run for 96 hours. To do those tests I would have to replace the current box with a temporary system. I am going to load the dev kernel today: for now it is easier to have it freeze at the panic than to take it out of service. If that turns up no issues, I will do the error tests.
Thanks for the input.
-
Unless you get "lucky" and have a really bad stick, MemTest86+ is really only useful if run 48-72 hours. I've had quite a few "unstable" systems that will run for the better part of a weekend with dozens of passes without an error and then suddenly throw up hundreds at once.
That is if you run all the tests in a row. Looping tests #3 and #5 alone will quickly throw errors on bad sticks of memory. The other tests take a long time and check for other system faults.
For other hardware failures (aside from the memory controller and RAM), it's better to use Prime (Orthos), IBT or OCCT.