pfSense on a SuperMicro Atom Server Randomly Rebooting

  • I'm hoping someone here can help point me in the right direction. I purchased and built a SuperMicro server for the purpose of running a pfSense firewall.  I was running 2.0.1, but had to give up because the system would just continue to randomly reboot throughout the day.  Now that 2.0.2 is out, I'd like to try again.  I wanted to post my configuration to the user community in hopes that someone could point me in the right direction as to what may be going on.  Here is my configuration:

    SuperMicro SuperServer 5015A-EHF-D525
    Which contains the X7SPE-HF-D525 motherboard.
    RAM: Crucial 8GB Kit (4GBx2) DDR3 1333 MT/s (PC3-10600) CL9 SODIMM 204-Pin Notebook Memory Modules CT2KIT51264BC1339
    Hard drives: 2 x Western Digital HDD 160GB WD1600BEKT SATA Mobile Storage 7200 RPM 8MB Cache Bare, RAID 1.

    Memory tests have come up fine, so I think the RAM is OK.
    The BIOS is the latest revision from SuperMicro.

    I thought it might be the onboard RAID and FreeBSD not getting along, so I disabled the RAID and reinstalled using just one hard drive.  No change.

    I'm hoping someone else has come across this, or has an idea for me to try.  Thanks!

  • I should mention I was running the 64-bit pfSense, with no additional add-ons installed.

  • No crash reports from the IP you're posting from or anything close to it. Are you not getting any, or just not submitting them?

  • Couple of things to check:  Is the system BIOS set to reboot if the system encounters an error (hardware)?

    My pfSense box would randomly die, and I realized it was overheating when I looked at the system logs.  I set the BIOS to reboot so I wouldn't have a dark router.  A quick addition of a fan fixed that.

  • I was never notified of crash reports that could be sent to the developers.  Do I need to install the developer's kernel in order for this to be an option?  If that's the case, I will definitely make sure to do that when I try 2.0.2.

    I do have the BIOS set to reboot on a hardware problem, but nothing was ever logged in the BIOS logs.  I'm 99% sure it's not a cooling issue.  I've got three of these boxes running other tasks (PBX, NAS) and they run like a charm.  I'm running a different UTM distro on this box as well, and it doesn't have any issues.  I'd just much rather run pfSense, as I think it is far superior.

  • Perhaps you might have more success with pfSense 2.1 snapshot builds. They have much more up to date device drivers than the 2.0.x series of builds.

    Without some sort of crash dump or crash report it is almost impossible to tell what is going on.

  • If you're not getting crash reports then it's pretty much a certainty it's a hardware problem (and likely not RAM, since RAM problems will almost always cause kernel panics). Software problems that cause a reboot will be from a kernel panic, and you'll be prompted to submit the crash report upon your next login, and at every login until you either delete or submit it. That happens with every kernel; there's no need for the dev kernel, and you generally don't want it. It wouldn't be a bad idea to try 2.1 as well, so you're trying a newer base OS.

  • Can you ssh into the box?

    If you can, go into the shell and type in clog /var/log/system.log and post the logs from just prior to the reboot and following it.

    I wrote a post with more info regarding grabbing logs here.  Posting log info usually gets the problem identified very quickly.

  • Thanks for the info guys.  I will work on getting 2.1 installed in the next day or two and see what happens from there.  I'll know relatively quickly if it is going to work or not, and will post what I find from there.

    Just out of curiosity….  Any idea when 2.1 is going to move to stable?

  • Well it looks like I am in the same boat with 2.1.  Here's the syslog right before and after the reboot.  Sure doesn't look like anything is getting logged.

    Feb 21 19:56:16 atlas check_reload_status: Syncing firewall
    Feb 21 19:56:49 atlas check_reload_status: Syncing firewall
    Feb 21 19:56:53 atlas php: /snort/snort_alerts.php: Checking for and disabling any rules dependent upon disabled preprocessors for WAN...
    Feb 21 19:57:33 atlas check_reload_status: Syncing firewall
    Feb 21 19:57:37 atlas php: /snort/snort_alerts.php: Checking for and disabling any rules dependent upon disabled preprocessors for WAN...
    Feb 21 20:02:02 atlas syslogd: kernel boot file is /boot/kernel/kernel
    Feb 21 20:02:02 atlas kernel: Copyright (c) 1992-2012 The FreeBSD Project.
    Feb 21 20:02:02 atlas kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
    Feb 21 20:02:02 atlas kernel: The Regents of the University of California. All rights reserved.
    Feb 21 20:02:02 atlas kernel: FreeBSD is a registered trademark of The FreeBSD Foundation.
    Feb 21 20:02:02 atlas kernel: FreeBSD 8.3-RELEASE-p6 #1: Thu Feb 21 11:33:28 EST 2013
    Feb 21 20:02:02 atlas kernel: amd64
    Feb 21 20:02:02 atlas kernel: Timecounter "i8254" frequency 1193182 Hz quality 0
    Feb 21 20:02:02 atlas kernel: CPU: Intel(R) Atom(TM) CPU D525   @ 1.80GHz (1807.21-MHz K8-class CPU)
    Feb 21 20:02:02 atlas kernel: Origin = "GenuineIntel"  Id = 0x106ca  Family = 6  Model = 1c  Stepping = 10
    Feb 21 20:02:02 atlas kernel: Features=0xbfebfbff <fpu,vme,de,pse,tsc,msr,pae,mce,cx8,apic,sep,mtrr,pge,mca,cmov,pat,pse36,clflush,dts,acpi,mmx,fxsr,sse,sse2,ss,htt,tm,pbe>
    Feb 21 20:02:02 atlas kernel: Features2=0x40e31d <sse3,dtes64,mon,ds_cpl,tm2,ssse3,cx16,xtpr,pdcm,movbe>
    Feb 21 20:02:02 atlas kernel: AMD Features=0x20100800 <syscall,nx,lm>
    Feb 21 20:02:02 atlas kernel: AMD Features2=0x1 <lahf>
    Feb 21 20:02:02 atlas kernel: TSC: P-state invariant
    Feb 21 20:02:02 atlas kernel: real memory  = 8589934592 (8192 MB)
    Feb 21 20:02:02 atlas kernel: avail memory = 8244371456 (7862 MB)
    Feb 21 20:02:02 atlas kernel: ACPI APIC Table: <022112 APIC1550>
    Feb 21 20:02:02 atlas kernel: FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
    Feb 21 20:02:02 atlas kernel: FreeBSD/SMP: 1 package(s) x 2 core(s) x 2 HTT threads
    Feb 21 20:02:02 atlas kernel: cpu0 (BSP): APIC ID:  0
    Feb 21 20:02:02 atlas kernel: cpu1 (AP/HT): APIC ID:  1
    Feb 21 20:02:02 atlas kernel: cpu2 (AP): APIC ID:  2
    Feb 21 20:02:02 atlas kernel: cpu3 (AP/HT): APIC ID:  3

    Crash happened between 19:57 and 20:02.
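
One way to pin down a reboot window like the one above is to look for the syslogd boot banner and report the last entry logged before it. The following is only a minimal Python sketch, not a pfSense tool: the function names are hypothetical, it assumes the log has already been dumped to plain text lines, and it assumes the caller supplies the year, since these syslog timestamps do not carry one.

```python
from datetime import datetime

# Line that syslogd writes when the system comes back up (taken from the log above).
BOOT_MARKER = "syslogd: kernel boot file is"


def parse_stamp(line, year):
    """Parse the leading 'Mon DD HH:MM:SS' timestamp; the year is an assumption passed in by the caller."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def find_reboot_windows(lines, year):
    """Return (last entry before boot, boot banner time) pairs, one per reboot found."""
    windows = []
    previous = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        try:
            stamp = parse_stamp(line, year)
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        if BOOT_MARKER in line and previous is not None:
            windows.append((previous, stamp))
        previous = stamp
    return windows


# A few lines copied from the log posted above.
sample = [
    "Feb 21 19:57:33 atlas check_reload_status: Syncing firewall",
    "Feb 21 19:57:37 atlas php: /snort/snort_alerts.php: Checking for and disabling any rules dependent upon disabled preprocessors for WAN...",
    "Feb 21 20:02:02 atlas syslogd: kernel boot file is /boot/kernel/kernel",
    "Feb 21 20:02:02 atlas kernel: Copyright (c) 1992-2012 The FreeBSD Project.",
]

for before, after in find_reboot_windows(sample, 2013):
    print(f"Nothing logged between {before} and {after} (gap {after - before})")
```

If nothing unusual shows up in the entries just before each window, that is consistent with an abrupt hardware reset rather than a kernel panic, which is the reasoning given elsewhere in this thread. The input would be the text output of the `clog /var/log/system.log` command mentioned earlier, saved to a file and read line by line.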

  • I'm on a customer network at a hotel running that exact same hardware right now with 80-some active users. That platform is widely used with factory defaults. Still no crash report from the sounds of it? Definitely, without question, a hardware problem of some sort if you're still not getting a crash report.

  • Still no crash report, and still nothing logged in the BIOS log.

    I've run Untangle, and most recently ClearOS, on this box for about 5 months with no crashes.  I'd just rather run pfSense.

    So you think it's something faulty?  I suppose I could see if SuperMicro would be willing to replace the system board.

    I've tried each drive individually, so I don't think it's the drives.  And you said earlier you didn't think it was RAM, because there was no crash report.

    In your similar setups are you using dual hard drives in a RAID array? 
    The only thing I have changed from defaults in BIOS is the IDE/SATA Config.  I have it set as follows:
    Configure Sata#1 as: RAID
    ICH Raid CodeBase: Adaptec

    If memory serves me right, I tried the CodeBase as Intel, and it wouldn't even see the raid volume.

  • Ok, thought I'd provide an update…

    So I stumbled upon SuperMicro's supported OSes page.  Supposedly FreeBSD is supported, but not the onboard RAID.

    So I installed pfSense setting up a gmirror.  That didn't seem to solve it.

    So I started wondering if it had something to do with ACPI.  Looking in BIOS it was set to ACPI version 2.0.  I switched it to 3.0.  It's been up for about 24 hours now, so I'm cautiously optimistic now. It would never make it a full 24 hours before.

  • Spoke too soon, rebooted overnight.  Back to the drawing board.

  • Can you bypass the RAID on the motherboard and directly connect to an IDE/SATA port?

  • Turning RAID on and off is just a BIOS setting, no jumpers or anything on the board for it.  SATA ports are the same, there aren't special ones for the RAID.  According to SuperMicro, AHCI mode is supported, which is what I have it on now.  I'm accomplishing the RAID with gmirror now.

    I just swapped the RAM out for a different brand that I happened to have: I went from 8GB of Crucial RAM to 8GB of Hynix RAM left over from a RAM upgrade on my laptop.  I'm going to give this a go now and see what happens.

    I just can't think of what would be physically wrong with the board to only give me grief in FreeBSD, but work fine under Linux-based distros.  But if the RAM doesn't do it, I think my only other option is to see if SuperMicro will send me another board.  I just don't know if they will.

  • Well it's not the RAM.  It rebooted in less than three hours this time.

    I've submitted an RMA to SuperMicro, hopefully they'll send a replacement.

  • SuperMicro is shipping a new system board.  Hope to have it in a day or two.

  • New system board is in, so we'll soon see if this is the answer.

    Interesting side note.  It's been up for about 3 hours now.  The CPU is running about 10 degrees cooler than on the other board.  It was never close to overheating.  I just thought it was noteworthy that the new one is running cooler.

  • Running 2 of those same boxes here with 2.0.2 amd64 on them.

    Never had any issues.  I did read, however, that a lot of people had temperature issues with them, and to fix them they taped the vents on the FRONT (?!) closed so it forced air in from the back, across the passive CPU cooler, and out through the power supply.  Seemed weird, but they say that the temps drop more than 10 degrees C when they do that.

    Mine run in the mid-50s, and the chips are specced up to 100 degrees C, so I'm gonna leave mine alone for now.

    They're surprisingly fast Squid boxes when paired with an Intel or Samsung SSD. :)  Nice low-budget, really fast router.

  • The CPU temp on the new system board is sitting at 55-56 degrees Celsius.  On the old one it was always 65-70 degrees Celsius.  So it never got to the point of overheating, but the difference tells me that there was definitely something going on.

    It's been up 1.5 days now without rebooting, so I'm optimistic.

  • Four days and counting.  I think it's solved.

  • Curious…how did you end up accomplishing RAID on the replacement board?  I understood where you switched to gmirror; did you revert to the onboard RAID again with the new board?

  • @bryanj0207:

    Curious…how did you end up accomplishing RAID on the replacement board?  I understood where you switched to gmirror; did you revert to the onboard RAID again with the new board?

    I did revert to the onboard RAID on the new board and it is working perfectly. I have it set to RAID mode in the BIOS, with the code base set to Adaptec.

  • We have 2 boxes with this SuperMicro board running pfSense 2.0.1. For a year there were no problems at all: no restarts, very stable. Since this morning we have had the same problem with both boxes: sudden reboots and weird HDD problems (it says the disk is full when it isn't, and fsck, SMART and the vendor tools all say the HDDs are fine). The RAM is OK. CPU temp is 38 Celsius on one box and 21 Celsius on the other. It looks like something with the cable or the SATA controller. Each box runs with one HDD, no RAID.

    For 3 months I had the same issue with other boxes (absolutely the same hardware configuration) at my customer's datacenter. Everything ran fine for about 6 months, then suddenly both boxes had weird problems, reboots and so on. That problem was solved by replacing the SATA cables and swapping the HDDs from WD Scorpio Black 160 GB 2.5" to Seagate. They have now been running for 4 months with no problems.

    And today we have these problems in our own infrastructure too :( There was an IPMI update on the SuperMicro website; it is now installed and I'm waiting to see if it helps. Box #2 has been OK for 5 hours since the update; box #1 was updated 1 hour ago, with no crashes so far.

    I'll keep you updated if anything goes wrong, but for me, for now: no more of these boards with pfSense.
