Fresh load, minimal tweaks, idle then catastrophe
-
So far, this has happened twice. The second time was with different drives. I'm hoping if I tell enough, someone will key up and tell me where a possible weak link is.
I took a spare 1U Supermicro X9DRL-7F chassis with dual E5-2603 v2 CPUs and 32 GB of RAM, and loaded pfSense. I have an additional 2-port 1 Gb Ethernet adapter, because I like to party.
It seems like a bit overkill, but the system was handy.
I put the drives in a RAID 1 array using the built-in SAS controller. I know the BSD world loves ZFS and ZFS hates hardware RAID, but this was a simple load and I pretended ZFS didn't exist for either attempt.
I set up the machine, things look good, I add a VPN client to get familiar with routing subnets over different vpn connections. I set it aside and do other things while it burns in. After a few weeks, the machine reboots and goes into endless boot loops.
I have seen mentions of filesystem corruption and instructions to boot into single user mode to repair the filesystem. This unfortunately was not a point I could get to, it would just crash out before then. I couldn't really catch much from the IPMI interface before it would reboot.
After the first time this happened and I gave up, I simply reloaded the system with new drives and it worked fine... until last night.
There very well could be something up with the hardware, but being able to reload the system and have it show no symptoms until a certain amount of time has passed seems a bit weird. Are there known issues with certain configurations that are time bombs like this?
Unless someone has some good ideas, the next thing I will do is disable the internal SAS controller and wire into the SATA ports. I'll probably build a spare and keep it on standby until this one croaks again.
-
@ultrasilence said in Fresh load, minimal tweaks, idle then catastrophe:
After a few weeks, the machine reboots and goes into endless boot loops.
why did it reboot ?
-
Yeah, impossible to say without seeing a crash report of some kind.
But, yes, hardware raid controllers can suck under pfSense and are best avoided if possible.
Steve
-
@heper said in Fresh load, minimal tweaks, idle then catastrophe:
@ultrasilence said in Fresh load, minimal tweaks, idle then catastrophe:
After a few weeks, the machine reboots and goes into endless boot loops.
why did it reboot ?
That unfortunately is an answer I do not have. It only had one PC routing through it, and that PC was offline.
I'm resisting the urge to put on an OS that I am more familiar with.
-
@stephenw10 said in Fresh load, minimal tweaks, idle then catastrophe:
Yeah, impossible to say without seeing a crash report of some kind.
But, yes, hardware raid controllers can suck under pfSense and are best avoided if possible.
Steve
The heat sink on the LSI 2208 is pretty hot to the touch, so I have disabled it and will move things over to the SATA ports.
I don't often encounter hardware failures that will destroy operating systems.
Time for round 3. If it happens again, are there any "paint by numbers" guides for retrieving logs externally?
-
The best thing you can do is hook up a serial console and log its output to something locally.
If it is a drive or drive controller failure, it may not be able to record that event, but it will spew a load of errors to the console.
The next best thing is to set up log exporting via syslog:
https://docs.netgate.com/pfsense/en/latest/monitoring/logs/remote.html
Steve
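For anyone wanting something to receive those exported logs, here is a rough sketch of a bare-bones UDP syslog listener in Python that could run on another machine on the LAN. It is only an illustration: the file name `pfsense-remote.log`, the port `5514` (picked so it doesn't need root, unlike the standard 514), and the function names are all assumptions, and a real syslog daemon (rsyslog, syslog-ng) is the more robust choice. It assumes the firewall has been pointed at this machine's IP and port as its remote log server.

```python
import socketserver
import sys
from datetime import datetime, timezone

LOG_PATH = "pfsense-remote.log"  # hypothetical output file
LISTEN_PORT = 5514               # assumed unprivileged port; standard syslog is 514


def format_record(sender_ip: str, payload: bytes) -> str:
    """Turn one raw syslog datagram into a single timestamped log line."""
    text = payload.decode("utf-8", errors="replace").rstrip("\r\n\x00")
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    return f"{stamp} {sender_ip} {text}"


class SyslogHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # For UDP servers, self.request is a (data, socket) pair.
        payload, _sock = self.request
        line = format_record(self.client_address[0], payload)
        # Open, append, close per message so lines reach the disk promptly
        # even if the sender dies right afterwards.
        with open(LOG_PATH, "a", encoding="utf-8") as log:
            log.write(line + "\n")


def serve(host: str = "0.0.0.0", port: int = LISTEN_PORT) -> None:
    """Listen for syslog datagrams until interrupted."""
    with socketserver.UDPServer((host, port), SyslogHandler) as server:
        server.serve_forever()


if __name__ == "__main__" and "--serve" in sys.argv:
    serve()
```

Saved under any file name and started with the `--serve` argument, it appends one line per received message. The last lines before a crash are the interesting ones, though as noted above, a dying drive or controller may never get the chance to send anything.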