Random? reboots, is there any way to know if it's software or hardware…
-
Hi!
I think I found what is causing those reboots…
The CPU is overheating...
Thing is, it doesn't quite make sense...
It's an Atom 330 CPU which is passively cooled (and always has been stock from the factory).
Why would it suddenly start to overheat like that???
CPU usage seems fine so it doesn't appear to be overheating because it has to do much processing...
Could anything on the software side cause that? I doubt it could but I cannot explain why, all of a sudden, it overheats when nothing cooling wise has changed...
Thank you and have a nice day!
Nick
-
What does the pfSense CPU graph show in Status > Monitoring?
No matter what, your cooling seems insufficient.
-
It's an Atom 330 CPU which is passively cooled (and always has been stock from the factory).
Why would it suddenly start to overheat like that???
I also like to cover my computer with a protective anti-static layer of cigarette-smoke tar enhanced dust.
-
Hi Derelict!
What does the pfSense CPU graph show in Status > Monitoring?
Around 215 processes but the highest percentage I had there was around 5%…
No matter what, your cooling seems insufficient.
And this was my own fault and turned out not to be the problem…
My gut feeling is that this motherboard probably has some bad caps or something similar...
I tried to replace the power supply for the same reason and my pfSense box is still unstable...
It was my own fault because I had temporarily removed a 120 mm fan I had added (it is not supposed to be necessary, but I had added it just in case) and had forgotten about it...
That fan is not supposed to be there and I actually had to be creative to make it fit there and not touch anything...
Thank you very much for your help and have a nice day!
Season's Greetings!
Nick
-
Hi doktornotor!
Actually there is barely any dust in the computer that is failing…
As I mentioned in the post I made just prior to this one, I was wrong about overheating being the cause...
My guess is bad caps...
Thank you, have a nice day and Season's Greetings!
Nick
-
it's entirely possible that from the factory the heatsink isn't properly mounted, or is just insufficient in design. what model is the box, where is it from?
could try pressing the heatsink down onto the proc, but not too hard, just enough to make sure the thermal paste is compacted nicely.
I've seen many a time where the heatsink is just sitting on the paste and not actually compacting it because the mounting bracket/pins are loose fitting and don't compress it at all
-
Hi MasterX-BKC!
it's entirely possible that from the factory the heatsink isn't properly mounted, or is just insufficient in design
I was actually wrong about the reboot being caused by overheating… The box was overheating because when it started having problems I opened it and temporarily removed a fan I had added...
That fan is
- not actually supposed to be there
- it's a pain to put it there and have it work because if I move it ever so slightly it stops spinning because it touches something in the casing.
So, in retrospect, the case I used is absolute c...
what model is the box, where is it from?
If you mean the motherboard, it's a Zotac IONITX-F-E I believe…
(Only the last two letters I am not sure of but the IONITX-F-E seems to be what I have...)
If you mean the case, I really don't know who made it...
could try pressing the heatsink down onto the proc, but not too hard, just enough to make sure the thermal paste is compacted nicely.
I've seen many a time where the heatsink is just sitting on the paste and not actually compacting it because the mounting bracket/pins are loose fitting and don't compress it at all
As far as I can tell, and it's the first time ever I have seen this, the heatsink is screwed in from the back side of the board using the same kind of screws as PCI/PCIe slots…
With proper airflow the box doesn't overheat, and before I had to open it because it was getting unstable it had proper airflow, so I think the heatsink is OK...
I thought I had found the cause of my problems when I saw it overheat but it was because of something I did after it started to be unstable…
Thank you and have a nice day!
Nick
-
OK guys (and gals if there are any reading this…), I have some bad news...
I replaced the box with another new one and it did it again today…
:( :( :( :( :( :( :( :(
I reused 3 parts from the old one...
- the NIC, an Intel I340-T4 (those things are kinda costly...)
- the optical drive (I put one but it will probably never see much use since I installed with a USB key).
- the SSD... SMART isn't reporting any problem with it, and since my problem seemed more power supply or motherboard related, I decided to reuse it...
Today I was greeted with the same screen as last time...
The box rebooted and seems to freeze while trying to do a PXE boot...
It's like, when it gets to choose between
F1 pfSense
F6 PXE boot
it chooses F6, and as far as I can tell only after it rebooted by itself…
Why does it do that, and why did it reboot in the first place?
What do you guys think I should try next?
I doubt the optical drive is to blame so that leaves the Intel I340-T4 and the SSD...
Apparently, if the number of MBUFs is too low with that card it could cause problems, so I followed the recommendations I found here https://doc.pfsense.org/index.php/Tuning_and_Troubleshooting_Network_Cards#Intel_igb.284.29_and_em.284.29_Cards but it looks like it's no longer necessary to do this with the more recent versions, as I ended up with slightly fewer MBUFs after doing this (before: 1,009,342; after: 1,000,000).
Could the card hardware itself, and not its software, cause a reboot? I am not so sure of that.
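For anyone who wants to check the mbuf numbers from the shell rather than the dashboard, here is a minimal sketch that parses text in the general shape of FreeBSD's `netstat -m` output and reports how much of the cluster limit (normally set by the `kern.ipc.nmbclusters` tunable) is in use. The sample text and its numbers are made up for illustration and are not from the box in this thread; the function names are hypothetical, and the exact line wording may differ between FreeBSD versions.

```python
import re

# Illustrative sample only, in the general shape of FreeBSD `netstat -m` output.
# These numbers are invented and are NOT taken from the box discussed here.
SAMPLE_NETSTAT_M = """\
8193/4107/12300 mbufs in use (current/cache/total)
8191/3845/12036/1000000 mbuf clusters in use (current/cache/total/max)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
"""


def parse_mbuf_clusters(text):
    """Return (current, cache, total, maximum) from the cluster line, or None."""
    match = re.search(
        r"^(\d+)/(\d+)/(\d+)/(\d+) mbuf clusters in use", text, re.MULTILINE
    )
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


def cluster_usage_percent(text):
    """Return total clusters as a percentage of the maximum, or None if unknown."""
    parsed = parse_mbuf_clusters(text)
    if parsed is None or parsed[3] == 0:
        return None
    total, maximum = parsed[2], parsed[3]
    return 100.0 * total / maximum


def denied_requests(text):
    """Return the three 'requests for mbufs denied' counters, or None."""
    match = re.search(
        r"^(\d+)/(\d+)/(\d+) requests for mbufs denied", text, re.MULTILINE
    )
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


if __name__ == "__main__":
    print("cluster usage %:", cluster_usage_percent(SAMPLE_NETSTAT_M))
    print("denied requests:", denied_requests(SAMPLE_NETSTAT_M))
```

If usage stays at a few percent and the denied counters stay at zero, mbuf exhaustion is probably not what is happening, though that alone does not clear the card.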
And there's the SSD... Maybe it is starting to go bad but shouldn't I get errors in the SMART reports and shouldn't my box behave even more strangely?
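On the SMART question, a rough sketch of how the attribute table from `smartctl -A` could be screened for attributes at or below their threshold or with a non-empty WHEN_FAILED column. The sample table is invented for illustration (it is not this SSD's report), the function names are hypothetical, and column layout can vary by drive and smartmontools version. A clean result here does not prove the drive is healthy; drives can misbehave without SMART flagging anything.

```python
# Illustrative sample only, in the usual column layout of `smartctl -A`.
# The values are invented and are NOT from the SSD discussed in this thread.
SAMPLE_SMART = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       4312
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       87
"""


def parse_smart_attributes(text):
    """Parse attribute rows into dicts; skip headers and rows that do not fit."""
    attributes = []
    for line in text.splitlines():
        fields = line.split()
        # A data row starts with a numeric ID and has at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        try:
            attributes.append({
                "id": int(fields[0]),
                "name": fields[1],
                "value": int(fields[3]),
                "worst": int(fields[4]),
                "thresh": int(fields[5]),
                "when_failed": fields[8],
                "raw": " ".join(fields[9:]),
            })
        except ValueError:
            # Some drives print placeholders instead of numbers; ignore those rows.
            continue
    return attributes


def suspicious_attributes(attributes):
    """Return names of attributes at/below threshold or marked as failed."""
    flagged = []
    for attr in attributes:
        at_or_below = attr["thresh"] > 0 and attr["value"] <= attr["thresh"]
        marked_failed = attr["when_failed"] != "-"
        if at_or_below or marked_failed:
            flagged.append(attr["name"])
    return flagged


if __name__ == "__main__":
    print(suspicious_attributes(parse_smart_attributes(SAMPLE_SMART)))
```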
Any ideas?
Thank you and have a nice day!
Nick
-
Hi!
I changed the SSD about 10 days ago…
I didn't touch the optical drive (since it's quite improbable that it is the cause) nor the NIC (since those things are kinda costly)...
Last time it took close to two weeks before it became unstable again IIRC, so I should know soon enough if the SSD was truly to blame...
Thank you and have a nice day!
Nick