100% CPU unreachable router - squealing fans and burning hot case (randomly)
-
I've been using pfSense on the same box for about a year now. It's an Intel system I used to use for music production. It's massively overkill for this purpose, but it wasn't doing anything else.
Normally it rests at around 1.2-2% CPU usage. This morning I noticed a high-pitched squealing sound from a fan in the server closet. I was transcoding/ripping some Blu-rays on a different PC in the same closet, so I thought that was the cause. A few hours later the rips were done but the squealing persisted. I went into the closet just now and could feel the box was crazy hot and the fans were spinning wildly. The internet was unfazed: full 1.2 Gb speed, no services affected, but I couldn't reach the webGUI or the CLI. I tried plugging in a monitor and keyboard but nothing would show on the screen. After about 10 minutes the webGUI loaded. Everything looked normal except the CPU was at 100%, and probably had been for the last 12 hours. I couldn't navigate anywhere, so I had to hard shut it down.
It took a few fsck runs to get it back up, and now it's back in its normal state. The fan kept squealing for 10 minutes and has now calmed down (I guess because it's not on fire anymore).
I couldn't run top while it was happening because it was locked hard. But what would cause this? This actually happened a couple of months ago as well, and I'd found some post about a PCI or smartcard or something hardware-related running away with the CPU. One of those "on by default for 1% of the cases" things. I entered some commands and that fixed it, but I think only until the next reboot. I assumed this would be resolved in the next update, but I'm a few updates past that now and wondering if this is the same issue again. Any idea what I'm talking about?
Any idea how to track this issue down? I'm worried about pfSense killing an expensive CPU over a memory leak or something if I fail to catch it next time. Thanks.
EDIT:
It's already spiking again. Looks like it's trying to roll logs non-stop?
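On tracking this down when top can't be run during the lockup: one option is to record the busiest processes at regular intervals, so there is a trail to read after the next hard reset. Below is a minimal sketch, assuming Python 3 is available on the box (not guaranteed on every pfSense install). The function names and the log path in the usage note are hypothetical; the only external command used is the standard `ps` utility.

```python
import subprocess
import time


def parse_ps(ps_output, limit=5):
    """Return the `limit` busiest processes as (cpu_percent, pid, command) tuples."""
    rows = []
    for line in ps_output.splitlines():
        # Expected columns: pid, %cpu, command (the command may contain spaces).
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue
        pid, cpu, command = parts
        try:
            rows.append((float(cpu), int(pid), command.strip()))
        except ValueError:
            continue  # skip anything that is not a data row
    rows.sort(reverse=True)  # highest CPU first
    return rows[:limit]


def log_sample(path, limit=5):
    """Append one timestamped snapshot of the busiest processes to `path`."""
    result = subprocess.run(
        # Separate -o options with empty headers, so no header line is printed.
        ["ps", "-A", "-o", "pid=", "-o", "pcpu=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    )
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(path, "a") as fh:
        for cpu, pid, command in parse_ps(result.stdout, limit):
            fh.write(f"{stamp} {cpu:5.1f}% pid={pid} {command}\n")
```

Calling something like `log_sample("/root/cpu_samples.log")` once a minute (from cron, for example) keeps the file small while still showing which process was climbing before the box stopped responding. If the system is locked hard enough, the sampler itself may stall, so the last few entries before the gap are the interesting ones.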
-
Disable your /tmp and /var RAM disks.
System -> Advanced -> Miscellaneous
Remember to reboot afterwards.
-
I would disable log compression and increase the log sizes, so that each roll-over is a lot faster and happens less often.
However, that is a symptom of something else logging far more than it probably should. I would guess that's Suricata based on that top output.
It should not be possible to damage a CPU by running it at 100%. If it is, then I suggest improving the cooling setup you have on it. The i5-3470 only requires 77W of cooling, so that should be relatively easy to achieve. If it was running hot enough to heat the enclosure, that indicates a problem to me.
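As a rough illustration of why a larger log size means fewer roll-overs, here is a small sketch of the arithmetic. The write rate and file sizes are made-up example values, not measurements from this system, and the function name is hypothetical.

```python
def rotations_per_hour(write_rate_bytes_per_sec, max_log_bytes):
    """Estimate how often a log rolls over at a steady write rate."""
    if max_log_bytes <= 0:
        raise ValueError("max_log_bytes must be positive")
    return write_rate_bytes_per_sec * 3600 / max_log_bytes


# Hypothetical example: something writes about 1 kB of log lines per second.
small = rotations_per_hour(1_000, 500_000)     # 500 kB log file
large = rotations_per_hour(1_000, 5_000_000)   # 5 MB log file
print(f"500 kB log: {small:.1f} rotations/hour")
print(f"5 MB log:   {large:.2f} rotations/hour")
```

At the same write rate, ten times the log size gives one tenth the number of roll-overs. It does not reduce the amount being logged, which is why finding the noisy service still matters.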
Steve
-
@cool_corona The RAM disk was already off. I increased the log size and disabled compression. Seems to be stable for now.
Thanks.
-
@stephenw10 Turning log compression off and raising the log size seems to have stabilized it.
There are about 12 computers in that closet. There is cooling and venting into the closet and the alarm never went off, but the case was pretty hot to the touch. I will keep an eye on it. Thank you.