Random Reboots, Possibly Hardware Related



  • I'm looking for some help with an unstable pfSense box.  The thing randomly reboots itself, sometimes as often as every five minutes.  Trying to fix the problem, I completely reinstalled pfSense last night and went back to version 1.0.1.  It took me three tries to install.  The first two times I got this error message:

    Fatal trap 12:  page fault while in kernel mode
    cpuid = 0; apic id = 00
    Fault virtual address = 0x4000004
    Fault code = supervisor write, page not present
    Instruction pointer = 0x20:0xc0714dbc
    Stack pointer = 0x28:0xef362c6c
    Frame pointer = 0x28:0xef362c7c
    Code segment = base 0x0, limit 0xfffff, type 0x1b
    = DPL 0, pres 1, def32 1, gran 1
    Processor eflags = interrupt enabled, resume, IOPL = 0
    Current process = 110 (umount)
    Trap number = 12
    Panic:  page fault
    cpuid = 0
    Uptime: 26s
    Cannot Dump. No dump device defined.
    Automatic reboot in 15 seconds

    I assume this is hardware related, but I don't know what could be causing the problem.  The system ran Memtest for an hour last night with no errors, so I don't think it's a memory thing.  I finished the install last night around 1200.  It crashed once overnight, at about 6AM.  Currently it's 1030AM and it has 4 hours uptime.  When the system crashes, it simply reboots and comes back online like nothing happened.

    The hardware (minus the hard drive and network cards) was new in August:
    ECS 755-A2 Motherboard (SiS Chipset, onboard network disabled)
    Sempron CPU
    1 GB RAM
    4 3Com PCI network cards

    Does anyone have ideas on what could be going on or what I should do to test it?  Thanks!



  • Had a similar issue with a power supply unit that couldn't deliver the necessary current on the 12V line.
    Do you know http://www.ultimatebootcd.com/ ?
    Maybe you could do a burnin test with it and see if it happens also when under different load than from pfSense.
    Do you have the same problem when you install another OS?



  • I've never tried another OS on the system.  I downloaded the bootdisk and ran the CPU Stress test for 25 min and Prime95 for 20 min with no errors.  I didn't want to take the machine down for too long because people get cranky when they don't have internet!

    I suppose it could be a power supply problem.  The power supply is a CoolerMaster RS-430PMSR, which I thought was a good brand.  400W should be plenty for my system.

    Are there any ideas on that error message I got?



  • @swanny:

    When the system crashes, it simply reboots and comes back online like nothing happened.

    When I was running versions around the 1.0.1 time frame I never had any problems on my Nokia IP330 box. Now for the last few months with snapshots of 1.2, even 1.2 RC2 mine will randomly reboot as well. Mine seems to be every 20-30 days and it comes back up like nothing happened. I haven't been able to figure it out as the box works perfectly fine the rest of the time.



  • rsw686: would you be willing to get a kernel on there with debugging and get a backtrace (assuming it's panicking)?



  • swanny: your problems sound memory related, based on the panic you're seeing and the other symptoms you describe. It can take a lot longer than what you've tested for intermittent failures to show up. Your power supply is adequately sized, but if it's flaky, that could cause this as well. It could be any number of other components as well.
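
    To put a rough number on "a lot longer": here is a minimal sketch, assuming faults arrive at random at a steady average rate (a Poisson model, which is my assumption and not something measured on this box) and that a test like Memtest would trigger the fault about as often as normal use does. The function name and the 6-hour figure (loosely taken from the midnight install and the 6AM crash) are illustrative only.

```python
import math


def chance_test_passes(test_hours: float, mean_hours_between_faults: float) -> float:
    """Probability that a test of the given length sees no fault at all.

    Assumes faults follow a Poisson process with the given mean interval.
    This is a simplifying assumption, not a property of any real hardware.
    """
    if test_hours < 0 or mean_hours_between_faults <= 0:
        raise ValueError("test_hours must be >= 0 and the mean interval must be > 0")
    return math.exp(-test_hours / mean_hours_between_faults)


if __name__ == "__main__":
    # Hypothetical figure: one fault every 6 hours on average.
    for hours in (1, 6, 24):
        p = chance_test_passes(hours, 6)
        print(f"{hours:>2} h test passes cleanly with probability {p:.2f}")
```

    Under those assumptions, a one-hour test would still pass cleanly about 85% of the time, so a clean hour says very little. An overnight or 24-hour run is much more telling.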



  • @cmb:

    rsw686: would you be willing to get a kernel on there with debugging and get a backtrace (assuming it's panicking)?

    Sure, is that going to work considering it's an embedded install? I'm all up for figuring out why it does this.



  • Yeah, I can get you an embedded kernel with that built in, though I don't have one right now. I'll email you when I have one you can try.



  • @rsw686:

    @swanny:

    When the system crashes, it simply reboots and comes back online like nothing happened.

    When I was running versions around the 1.0.1 time frame I never had any problems on my Nokia IP330 box. Now for the last few months with snapshots of 1.2, even 1.2 RC2 mine will randomly reboot as well. Mine seems to be every 20-30 days and it comes back up like nothing happened. I haven't been able to figure it out as the box works perfectly fine the rest of the time.

    Similar problem here.
    pfSense 1.2 as the software.
    The box is an Athlon XP, 1 GB RAM, IDE HDD.
    I have used this box for 3 years as an SMB server (Linux) without problems.
    Only hardware difference: I added 4 new NICs (2 were already present, for a total of 6; for now, in testing, only LAN and WAN are connected/configured).
    Every few hours the box silently reboots and comes back online like nothing happened.

    Regards, P.



  • Try removing a few of those nics.



  • @sullrich:

    Try removing a few of those nics.

    Can a NIC cause this problem?
    Is there a way to debug it?

    Thanks, P.



  • Well, if the only change was adding the NICs then that makes them the most likely suspect.



  • Process of elimination, young Jedi.



  • @PaoloA:

    Can a NIC cause this problem?

    Sure, with that many NICs you could be getting interrupt handler problems, for instance.

    You're lucky that you can get a reboot within an hour.  Take all 4 out, wait an hour.  Put 2 back in, wait an hour - binary search it like that until you achieve stability. :)
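
    For anyone who wants the halving idea spelled out, here is a minimal sketch in Python. It assumes exactly one bad card, and that "wait an hour" reliably tells stable from unstable, which is optimistic with intermittent faults. The names `find_culprit`, `is_stable` and the card labels are made up for illustration; in real life `is_stable` is you, swapping cards and watching the uptime.

```python
from typing import Callable, List, Optional, Sequence


def find_culprit(
    suspects: Sequence[str],
    is_stable: Callable[[List[str]], bool],
) -> Optional[str]:
    """Narrow a list of suspect parts down to one by halving.

    is_stable(installed) should return True if the box stays up with only
    the given suspects installed. Assumes a single faulty part and a
    reliable test; returns None if those assumptions do not hold.
    """
    if not is_stable([]):
        return None  # still crashing with every suspect removed
    if is_stable(list(suspects)):
        return None  # cannot reproduce the crash with everything installed

    candidates = list(suspects)
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        if is_stable(half):
            # First half is clean, so the fault is in the rest.
            candidates = candidates[len(candidates) // 2 :]
        else:
            candidates = half
    return candidates[0]


if __name__ == "__main__":
    # Simulated box: pretend the third card is the bad one.
    bad_card = "nic3"
    cards = ["nic1", "nic2", "nic3", "nic4"]
    print(find_culprit(cards, lambda installed: bad_card not in installed))
```

    With 4 cards that is the two baseline checks plus two more swaps, rather than testing every card on its own.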



  • I'm having the same problem, and it happens very randomly.  Sometimes within an hour, sometimes it takes a few days.

    I had only two NICs in my box.  The first is a 3Com 3c905-TX; the second was a Netgear FA-310TX.  I then replaced the Netgear with an Intel Pro S desktop adapter and tried again.  It seemed to be working fine for a while, but it just rebooted about half an hour ago.  Possibly pfSense 1.2 doesn't like my 3Com NIC?



  • Or more likely you too have some dodgy hardware.  Random reboots are usually related to hardware problems, including a lack of sufficient cooling.  Sometimes they can be down to driver problems, but that's pretty uncommon these days.  Actually, with the core drivers in any modern OS, including FreeBSD, it's pretty much unheard of.

    Some time back Tom's Hardware posted a useful list of free and commercial tools to allow you to identify faulty hardware.  If you're having intermittent problems it gives you a good starting point for diagnosing the fault(s).

