pfSense on a SuperMicro Atom Server Randomly Rebooting
-
Can you ssh into the box?
If you can, open a shell, run clog /var/log/system.log, and post the log lines from just before the reboot and just after it.
I wrote a post with more info regarding grabbing logs here. Posting log info usually gets the problem identified very quickly.
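For anyone who wants to trim the log before posting, here is a minimal Python sketch, not an official pfSense tool. It assumes you have saved the output of clog /var/log/system.log to a text file, and that the first line syslogd writes after a boot contains "kernel boot file is" (as in the excerpt posted later in this thread). The function name and the before/after defaults are made up for illustration.

```python
# Hypothetical helper: slice saved syslog text around each boot marker.
# Assumption: syslogd's first line after boot contains this string.
BOOT_MARKER = "kernel boot file is"


def reboot_context(lines, before=20, after=5):
    """Return a list of (pre, post) line slices, one per boot marker found.

    pre  = up to `before` lines logged just prior to the boot marker
    post = the boot marker line plus the next `after - 1` lines
    """
    results = []
    for i, line in enumerate(lines):
        if BOOT_MARKER in line:
            start = max(0, i - before)
            results.append((lines[start:i], lines[i:i + after]))
    return results


if __name__ == "__main__":
    import sys

    # Usage: python reboot_context.py saved_system_log.txt
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
            log_lines = fh.read().splitlines()
        for pre, post in reboot_context(log_lines):
            print("\n".join(pre + post))
            print("-" * 40)
```

If the last lines before the marker are just routine messages, that already tells you the box went down without logging anything, which points away from a clean shutdown.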
-
Thanks for the info guys. I will work on getting 2.1 installed in the next day or two and see what happens from there. I'll know relatively quickly if it is going to work or not, and will post what I find from there.
Just out of curiosity... any idea when 2.1 is going to move to stable?
-
Well it looks like I am in the same boat with 2.1. Here's the syslog right before and after the reboot. Sure doesn't look like anything is getting logged.
Feb 21 19:56:16 atlas check_reload_status: Syncing firewall
Feb 21 19:56:49 atlas check_reload_status: Syncing firewall
Feb 21 19:56:53 atlas php: /snort/snort_alerts.php: Checking for and disabling any rules dependent upon disabled preprocessors for WAN…
Feb 21 19:57:33 atlas check_reload_status: Syncing firewall
Feb 21 19:57:37 atlas php: /snort/snort_alerts.php: Checking for and disabling any rules dependent upon disabled preprocessors for WAN...
Feb 21 20:02:02 atlas syslogd: kernel boot file is /boot/kernel/kernel
Feb 21 20:02:02 atlas kernel: Copyright (c) 1992-2012 The FreeBSD Project.
Feb 21 20:02:02 atlas kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
Feb 21 20:02:02 atlas kernel: The Regents of the University of California. All rights reserved.
Feb 21 20:02:02 atlas kernel: FreeBSD is a registered trademark of The FreeBSD Foundation.
Feb 21 20:02:02 atlas kernel: FreeBSD 8.3-RELEASE-p6 #1: Thu Feb 21 11:33:28 EST 2013
Feb 21 20:02:02 atlas kernel: root@snapshots-8_3-amd64.builders.pfsense.org:/usr/obj./usr/pfSensesrc/src/sys/pfSense_SMP.8 amd64
Feb 21 20:02:02 atlas kernel: Timecounter "i8254" frequency 1193182 Hz quality 0
Feb 21 20:02:02 atlas kernel: CPU: Intel(R) Atom(TM) CPU D525 @ 1.80GHz (1807.21-MHz K8-class CPU)
Feb 21 20:02:02 atlas kernel: Origin = "GenuineIntel" Id = 0x106ca Family = 6 Model = 1c Stepping = 10
Feb 21 20:02:02 atlas kernel: Features=0xbfebfbff <fpu,vme,de,pse,tsc,msr,pae,mce,cx8,apic,sep,mtrr,pge,mca,cmov,pat,pse36,clflush,dts,acpi,mmx,fxsr,sse,sse2,ss,htt,tm,pbe>
Feb 21 20:02:02 atlas kernel: Features2=0x40e31d <sse3,dtes64,mon,ds_cpl,tm2,ssse3,cx16,xtpr,pdcm,movbe>
Feb 21 20:02:02 atlas kernel: AMD Features=0x20100800 <syscall,nx,lm>
Feb 21 20:02:02 atlas kernel: AMD Features2=0x1 <lahf>
Feb 21 20:02:02 atlas kernel: TSC: P-state invariant
Feb 21 20:02:02 atlas kernel: real memory = 8589934592 (8192 MB)
Feb 21 20:02:02 atlas kernel: avail memory = 8244371456 (7862 MB)
Feb 21 20:02:02 atlas kernel: ACPI APIC Table: <022112 APIC1550>
Feb 21 20:02:02 atlas kernel: FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
Feb 21 20:02:02 atlas kernel: FreeBSD/SMP: 1 package(s) x 2 core(s) x 2 HTT threads
Feb 21 20:02:02 atlas kernel: cpu0 (BSP): APIC ID: 0
Feb 21 20:02:02 atlas kernel: cpu1 (AP/HT): APIC ID: 1
Feb 21 20:02:02 atlas kernel: cpu2 (AP): APIC ID: 2
Feb 21 20:02:02 atlas kernel: cpu3 (AP/HT): APIC ID: 3
Crash happened between 19:57 and 20:02
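To put a number on that silent window, here is a rough Python sketch that measures the gap between the last line logged before the reboot and the first line after it. It uses three lines copied from the excerpt above. The function names are made up, the "kernel boot file is" marker is assumed to be the first post-boot line, and the year is an assumption (2013, from the kernel build date) because syslog timestamps don't carry one.

```python
from datetime import datetime

# Assumption: syslogd's first line after boot contains this string.
BOOT_MARKER = "kernel boot file is"


def parse_stamp(line, year=2013):
    """Parse the leading 'Feb 21 19:57:37' stamp; the year is supplied by us."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def silent_gap(lines, year=2013):
    """Return the time between the last pre-boot line and the boot marker."""
    for prev, cur in zip(lines, lines[1:]):
        if BOOT_MARKER in cur:
            return parse_stamp(cur, year) - parse_stamp(prev, year)
    return None  # no boot marker in this excerpt


sample = [
    "Feb 21 19:57:33 atlas check_reload_status: Syncing firewall",
    "Feb 21 19:57:37 atlas php: /snort/snort_alerts.php: Checking for and "
    "disabling any rules dependent upon disabled preprocessors for WAN...",
    "Feb 21 20:02:02 atlas syslogd: kernel boot file is /boot/kernel/kernel",
]

gap = silent_gap(sample)
print(gap)  # 0:04:25
```

Nothing at all was written during those four and a half minutes, no panic and no shutdown message, so the log alone can't say when in that window the box actually went down.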
-
I'm on a customer network at a hotel running that exact same hardware right now with 80-some active users. That platform is widely used with factory defaults. Still no crash report from the sounds of it? Definitely, without question, a hardware problem of some sort if you're still not getting a crash report.
-
Still no crash report, and still nothing logged in the BIOS log.
I've run Untangle, and most recently ClearOS, on this box for about 5 months with no crashes. I'd just rather run pfSense.
So you think it's something faulty? I suppose I could see if SuperMicro would be willing to replace the system board.
I've tried each drive individually, so I don't think it's the drives. And you said earlier you didn't think it was RAM, because there was no crash report.
In your similar setups are you using dual hard drives in a RAID array?
The only thing I have changed from defaults in BIOS is the IDE/SATA Config. I have it set as follows:
Configure Sata#1 as: RAID
ICH Raid CodeBase: Adaptec
If memory serves me right, I tried the CodeBase as Intel, and it wouldn't even see the RAID volume.
-
Ok, thought I'd provide an update…
So I stumbled upon SuperMicro's supported OSes page. Supposedly FreeBSD is supported, but not the onboard RAID.
http://www.supermicro.com/support/resources/OS/Atom.cfm
So I installed pfSense setting up a gmirror. That didn't seem to solve it.
So I started wondering if it had something to do with ACPI. Looking in BIOS it was set to ACPI version 2.0. I switched it to 3.0. It's been up for about 24 hours now, so I'm cautiously optimistic now. It would never make it a full 24 hours before.
-
Spoke too soon, rebooted overnight. Back to the drawing board.
-
Can you bypass the RAID on the motherboard and directly connect to an IDE/SATA port?
-
Turning RAID on and off is just a BIOS setting, no jumpers or anything on the board for it. SATA ports are the same, there aren't special ones for the RAID. According to SuperMicro, AHCI mode is supported, which is what I have it on now. I'm accomplishing the RAID with gmirror now.
I just swapped the RAM out with a different brand that I happened to have, so I'm going to give this a go now and see what happens. I went from 8GB of Crucial RAM to 8GB of Hynix RAM that I had left over from a RAM upgrade on my laptop.
I just can't think of what would be physically wrong with the board to only give me grief in FreeBSD, but work fine under Linux-based systems. But if the RAM doesn't do it, I think my only other option is to see if SuperMicro will send me another board. I just don't know if they will.
-
Well it's not the RAM. It rebooted in less than three hours this time.
I've submitted an RMA to SuperMicro, hopefully they'll send a replacement.
-
SuperMicro is shipping a new system board. Hope to have it in a day or two.
-
New system board is in, so we'll soon see if this is the answer.
Interesting side note: it's been up for about 3 hours now, and the CPU is running about 10 degrees cooler than on the other board. It was never close to overheating. I just thought it was noteworthy that the new one is running cooler.
-
Running 2 of those same boxes here with 2.0.2 amd64 on them.
Never had any issues. I did read, however, that a lot of people had temperature issues with them, and to fix them they taped the vents on the FRONT (?!) closed so it forced air in from the back, across the passive CPU cooler, and out through the power supply. Seemed weird, but they say that the temps drop more than 10 degrees C when they do that.
Mine run in the mid-50s, and the chips are specced up to 100 degrees C, so I'm gonna leave mine alone for now.
They're surprisingly fast squid boxes when paired with an Intel or Samsung SSD. :) Nice low-budget, really fast router.
-
The CPU temp on the new system board is sitting at 55-56 degrees Celsius. On the old one it was always 65-70 degrees Celsius. So it never got to the point of overheating, but the difference tells me that there was definitely something going on.
It's been up 1.5 days now without rebooting, so I'm optimistic.
-
Four days and counting. I think it's solved.
-
Curious…how did you end up accomplishing RAID on the replacement board? I understood where you switched to gmirror; did you revert to the onboard RAID again with the new board?
-
I did revert to the onboard RAID on the new board and it is working perfectly. I have it set to RAID mode in the BIOS and the CodeBase is set to Adaptec.
-
We have 2 boxes with this Supermicro board running pfSense 2.0.1. For one year there were no problems at all: no restarts, very stable. Since this morning we have had the same problem on both boxes: sudden reboots and weird HDD problems (the disk reports full when it isn't, while fsck, SMART, and the vendor tools all say the HDDs are fine). RAM is OK. CPU temperature is 38 degrees Celsius on one box and 21 on the other. It looks like something with the cable or the SATA controller. Each box runs with one HDD, no RAID.
For 3 months I had the same issue with other boxes (exactly the same hardware configuration) at my customer's datacenter. Everything ran well for about 6 months, and then both boxes suddenly had weird problems, reboots and so on. That problem was solved by replacing the SATA cables and swapping the HDDs from WD Scorpio Black 160 GB 2.5" to Seagate. They have now been running for 4 months with no problems.
And today these problems hit our infrastructure too :( There was an IPMI update on the Supermicro website; it is now installed and we are waiting to see if it helps. Box #2 has been OK for 5 hours since the update; box #1 was updated 1 hour ago, no crashes so far.
I'll keep you updated if anything goes wrong, but for me now: no more of these boards with pfSense.