Netgate Discussion Forum

    PfSense on a SuperMicro Atom Server Randomly Rebooting

    Problems Installing or Upgrading pfSense Software
    • C
      cmb
      last edited by

      No crash reports from the IP you're posting from or anything close to it. Are you not getting any, or just not submitting them?

      • T
        tim.mcmanus
        last edited by

        Couple of things to check:  Is the system BIOS set to reboot if the system encounters an error (hardware)?

        My pfSense box would randomly die, and I realized it was overheating when I looked at the system logs.  I set the BIOS to reboot so I wouldn't have a dark router.  A quick addition of a fan fixed that.

        • T
          tjgertge
          last edited by

          I was never notified of crash reports that could be sent to the developers.  Do I need to install the developer's kernel in order for this to be an option?  If that's the case, I will definitely make sure to do that when I try 2.0.2.

          I do have BIOS set to reboot on a hardware problem, but nothing was ever logged in the BIOS logs.  I'm 99% sure it's not a cooling issue.  I've got three of these boxes running other tasks (PBX, NAS) and they run like a charm.  I'm running a different UTM distro on this box as well, and it doesn't have any issues.  I'd just much rather run pfSense, as I think it is far superior.

          • W
            wallabybob
            last edited by

            Perhaps you might have more success with pfSense 2.1 snapshot builds. They have much more up to date device drivers than the 2.0.x series of builds.

            Without some sort of crash dump or crash report it is almost impossible to tell what is going on.

            • C
              cmb
              last edited by

              If you're not getting crash reports then it's pretty much a certainty it's a hardware problem (and likely not RAM, since RAM problems will almost always cause kernel panics). Software problems that cause a reboot will be from a kernel panic, and you'll be prompted to submit the crash report upon your next login and every login until you either delete or submit it. That happens with every kernel; there's no need for the dev kernel, and you generally don't want it for that. It wouldn't be a bad idea to try 2.1 as well, so you're trying a newer base OS.

              • T
                tim.mcmanus
                last edited by

                Can you ssh into the box?

                If you can, go into the shell and type in clog /var/log/system.log and post the logs from just prior to the reboot and following it.

                I wrote a post with more info regarding grabbing logs here.  Posting log info usually gets the problem identified very quickly.

                • T
                  tjgertge
                  last edited by

                  Thanks for the info guys.  I will work on getting 2.1 installed in the next day or two and see what happens from there.  I'll know relatively quickly if it is going to work or not, and will post what I find from there.

                  Just out of curiosity….  Any idea when 2.1 is going to move to stable?

                  • T
                    tjgertge
                    last edited by

                    Well it looks like I am in the same boat with 2.1.  Here's the syslog right before and after the reboot.  Sure doesn't look like anything is getting logged.

                    Feb 21 19:56:16 atlas check_reload_status: Syncing firewall
                    Feb 21 19:56:49 atlas check_reload_status: Syncing firewall
                    Feb 21 19:56:53 atlas php: /snort/snort_alerts.php: Checking for and disabling any rules dependent upon disabled preprocessors for WAN…
                    Feb 21 19:57:33 atlas check_reload_status: Syncing firewall
                    Feb 21 19:57:37 atlas php: /snort/snort_alerts.php: Checking for and disabling any rules dependent upon disabled preprocessors for WAN...
                    Feb 21 20:02:02 atlas syslogd: kernel boot file is /boot/kernel/kernel
                    Feb 21 20:02:02 atlas kernel: Copyright (c) 1992-2012 The FreeBSD Project.
                    Feb 21 20:02:02 atlas kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
                    Feb 21 20:02:02 atlas kernel: The Regents of the University of California. All rights reserved.
                    Feb 21 20:02:02 atlas kernel: FreeBSD is a registered trademark of The FreeBSD Foundation.
                    Feb 21 20:02:02 atlas kernel: FreeBSD 8.3-RELEASE-p6 #1: Thu Feb 21 11:33:28 EST 2013
                    Feb 21 20:02:02 atlas kernel: root@snapshots-8_3-amd64.builders.pfsense.org:/usr/obj./usr/pfSensesrc/src/sys/pfSense_SMP.8 amd64
                    Feb 21 20:02:02 atlas kernel: Timecounter "i8254" frequency 1193182 Hz quality 0
                    Feb 21 20:02:02 atlas kernel: CPU: Intel(R) Atom(TM) CPU D525   @ 1.80GHz (1807.21-MHz K8-class CPU)
                    Feb 21 20:02:02 atlas kernel: Origin = "GenuineIntel"  Id = 0x106ca  Family = 6  Model = 1c  Stepping = 10
                    Feb 21 20:02:02 atlas kernel: Features=0xbfebfbff <fpu,vme,de,pse,tsc,msr,pae,mce,cx8,apic,sep,mtrr,pge,mca,cmov,pat,pse36,clflush,dts,acpi,mmx,fxsr,sse,sse2,ss,htt,tm,pbe>
                    Feb 21 20:02:02 atlas kernel: Features2=0x40e31d <sse3,dtes64,mon,ds_cpl,tm2,ssse3,cx16,xtpr,pdcm,movbe>
                    Feb 21 20:02:02 atlas kernel: AMD Features=0x20100800 <syscall,nx,lm>
                    Feb 21 20:02:02 atlas kernel: AMD Features2=0x1 <lahf>
                    Feb 21 20:02:02 atlas kernel: TSC: P-state invariant
                    Feb 21 20:02:02 atlas kernel: real memory  = 8589934592 (8192 MB)
                    Feb 21 20:02:02 atlas kernel: avail memory = 8244371456 (7862 MB)
                    Feb 21 20:02:02 atlas kernel: ACPI APIC Table: <022112 APIC1550>
                    Feb 21 20:02:02 atlas kernel: FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
                    Feb 21 20:02:02 atlas kernel: FreeBSD/SMP: 1 package(s) x 2 core(s) x 2 HTT threads
                    Feb 21 20:02:02 atlas kernel: cpu0 (BSP): APIC ID:  0
                    Feb 21 20:02:02 atlas kernel: cpu1 (AP/HT): APIC ID:  1
                    Feb 21 20:02:02 atlas kernel: cpu2 (AP): APIC ID:  2
                    Feb 21 20:02:02 atlas kernel: cpu3 (AP/HT): APIC ID:  3

                    Crash happened between 19:57 and 20:02.
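For a longer log, the same window can be found mechanically. The following is only a rough sketch in Python, not anything pfSense ships: it assumes the log has already been dumped to plain text (for example with clog), that timestamps use the classic syslog layout with English month names, and that a "syslogd: kernel boot file is" line marks each boot, as in the excerpt above. The function names and the hard-coded year are made up for illustration.

```python
from datetime import datetime

# Marker taken from the excerpt above; assumed to appear once per boot.
BOOT_MARKER = "syslogd: kernel boot file is"


def parse_ts(line, year=2013):
    # Classic syslog timestamps ("Feb 21 19:57:37") carry no year, so one is assumed.
    # %b relies on English month names (C/English locale).
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def find_reboot_windows(lines):
    """Return (last entry before boot, first boot entry) timestamp pairs."""
    windows = []
    prev = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if BOOT_MARKER in line and prev is not None:
            windows.append((parse_ts(prev), parse_ts(line)))
        prev = line
    return windows


# A few lines copied from the log posted above.
sample = [
    "Feb 21 19:57:33 atlas check_reload_status: Syncing firewall",
    "Feb 21 19:57:37 atlas php: /snort/snort_alerts.php: Checking for and disabling any rules dependent upon disabled preprocessors for WAN...",
    "Feb 21 20:02:02 atlas syslogd: kernel boot file is /boot/kernel/kernel",
    "Feb 21 20:02:02 atlas kernel: Copyright (c) 1992-2012 The FreeBSD Project.",
]

for before, after in find_reboot_windows(sample):
    print(f"last entry {before:%H:%M:%S}, first boot entry {after:%H:%M:%S}, gap {after - before}")
```

A gap with no shutdown or panic messages in front of the boot marker is consistent with the box resetting without the OS getting a chance to log anything.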

                    • C
                      cmb
                      last edited by

                      I'm on a customer network at a hotel running that exact same hardware right now with 80-some active users. That platform is widely used with factory defaults. Still no crash report from the sounds of it? Definitely, without question, a hardware problem of some sort if you're still not getting a crash report.

                      • T
                        tjgertge
                        last edited by

                        Still no crash report, and still nothing logged in the BIOS log.

                        I've run Untangle, and most recently ClearOS, on this box for about 5 months with no crashes.  I'd just rather run pfSense.

                        So you think it's something faulty?  I suppose I could see if SuperMicro would be willing to replace the system board.

                        I've tried each drive individually, so I don't think it's the drives.  And you said earlier you didn't think it was RAM, because there was no crash report.

                        In your similar setups are you using dual hard drives in a RAID array? 
                        The only thing I have changed from defaults in BIOS is the IDE/SATA Config.  I have it set as follows:
                        Configure Sata#1 as: RAID
                        ICH Raid CodeBase: Adaptec

                        If memory serves me right, I tried the CodeBase as Intel, and it wouldn't even see the raid volume.

                        • T
                          tjgertge
                          last edited by

                          Ok, thought I'd provide an update…

                          So I stumbled upon SuperMicro's supported OSes page.  Supposedly FreeBSD is supported, but not the onboard RAID.
                          http://www.supermicro.com/support/resources/OS/Atom.cfm

                          So I installed pfSense setting up a gmirror.  That didn't seem to solve it.

                          So I started wondering if it had something to do with ACPI.  Looking in BIOS it was set to ACPI version 2.0.  I switched it to 3.0.  It's been up for about 24 hours now, so I'm cautiously optimistic now. It would never make it a full 24 hours before.

                          • T
                            tjgertge
                            last edited by

                            Spoke too soon, rebooted overnight.  Back to the drawing board.

                            • T
                              tim.mcmanus
                              last edited by

                              Can you bypass the RAID on the motherboard and directly connect to an IDE/SATA port?

                              • T
                                tjgertge
                                last edited by

                                Turning RAID on and off is just a BIOS setting, no jumpers or anything on the board for it.  SATA ports are the same, there aren't special ones for the RAID.  According to SuperMicro, AHCI mode is supported, which is what I have it on now.  I'm accomplishing the RAID with gmirror now.

                                I just swapped the RAM out with a different brand that I happened to have, so I'm going to give this a go now and see what happens.  I went from 8GB of Crucial RAM to 8GB of Hynix RAM that I had left over from a RAM upgrade on my laptop.

                                I just can't think of what would be physically wrong with the board to only give me grief in FreeBSD, but work fine under the Linux-based distros.  But if the RAM doesn't do it, I think my only other option is to see if SuperMicro will send me another board.  I just don't know if they will.

                                • T
                                  tjgertge
                                  last edited by

                                  Well it's not the RAM.  It rebooted in less than three hours this time.

                                  I've submitted an RMA to SuperMicro, hopefully they'll send a replacement.

                                  • T
                                    tjgertge
                                    last edited by

                                    SuperMicro is shipping a new system board.  Hope to have it in a day or two.

                                    • T
                                      tjgertge
                                      last edited by

                                      New system board is in, so we'll soon see if this is the answer.

                                      Interesting side note.  It's been up for about 3 hours now.  The CPU is running about 10 degrees cooler than on the other board.  It was never close to overheating.  I just thought it was noteworthy that the new one is running cooler.

                                      • C
                                        cheddarlump
                                        last edited by

                                        Running 2 of those same boxes here with 2.0.2 amd64 on them.

                                        Never had any issues..  I did read however that a lot of people had temperature issues with them, and to fix them they taped the vents on the FRONT (?!) closed so it forced air in from the back, across the passive CPU cooler, and out through the power supply.  Seemed weird, but they say that the temps drop more than 10 degrees C when they do that..

                                        Mine run in the mid 50's, and the chips are specced up to 100 degrees C, so I'm gonna leave mine alone for now.

                                        They're surprisingly fast squid boxes when paired with an intel or samsung SSD.. :)  Nice low budget really fast router.

                                        • T
                                          tjgertge
                                          last edited by

                                          The CPU temp on the new system board is sitting at 55-56 degrees Celsius.  On the old one it was always 65-70 degrees Celsius.  So it never got to the point that it was overheating, but the difference tells me that there was definitely something going on.

                                          It's been up 1.5 days now without rebooting, so I'm optimistic.
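For anyone who wants to keep an eye on per-core temperatures rather than reading them off the dashboard, here is a small Python sketch. It makes several assumptions: that the coretemp driver is loaded, that sysctl then reports lines of the form dev.cpu.N.temperature: 56.0C, and that the output has been captured to a string. The readings in sample_output are invented for illustration, and the 65 degree limit is just the low end of what the old board showed, not a vendor figure.

```python
import re

# Assumed line format of `sysctl dev.cpu` temperature entries with coretemp loaded.
TEMP_RE = re.compile(r"^dev\.cpu\.(\d+)\.temperature:\s*([\d.]+)C$")


def parse_core_temps(sysctl_output):
    """Map CPU index -> temperature in degrees C from sysctl-style text."""
    temps = {}
    for line in sysctl_output.splitlines():
        match = TEMP_RE.match(line.strip())
        if match:
            temps[int(match.group(1))] = float(match.group(2))
    return temps


def hot_cores(temps, limit_c):
    """Return the sorted CPU indexes whose reading is at or above limit_c."""
    return sorted(cpu for cpu, temp in temps.items() if temp >= limit_c)


# Invented sample readings, not measurements from this box.
sample_output = """\
dev.cpu.0.temperature: 56.0C
dev.cpu.1.temperature: 56.0C
dev.cpu.2.temperature: 68.0C
dev.cpu.3.temperature: 55.0C
"""

temps = parse_core_temps(sample_output)
print(temps)
print(hot_cores(temps, limit_c=65.0))
```

Logging these readings over a few days would show whether a board sits consistently hotter than its siblings, which is the kind of difference seen between the old and new boards here.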

                                          • T
                                            tjgertge
                                            last edited by

                                            Four days and counting.  I think it's solved.

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.