Random Crashing



  • Just loaded up a pfSense box.

    HW:
    AMD Athlon XP 3000+
    1GB DDR
    WD 250GB SATA HDD

    The box is in a rack in the server room and is staying cool.

    It also passed an Inquisitor burn-in test.

    I'm getting random restarts.

    I can see the crash dumps, but nothing jumps out at me (my Linux skills are weak).

    Here is a link to the crash dump on Pastebin:

    http://pastebin.com/YBJc3ghp

    The only thing I saw was a DMA write delay right before it crashed, so I installed pfSense to a flash drive and tried that, but I'm still getting crashes.

    Can anyone give me any ideas on what is going on?

    Thanks,
    Lothar863


  • Netgate Administrator

    I would normally suggest a bad hard drive, bad RAM, overheating, or a bad PSU, in that order, but it seems like you've tested for those. It's an older box, so bad capacitors on the board seem a likely suspect, but those too should show up in a burn-in.

    Which install type are you using? What NICs? (It looks like 6x Realteks.)

    Any reason you're using 2.1.1 and not 2.1.2?

    Any idea what this address is and why it keeps moving MACs?

    <6>arp: 172.29.10.25 moved from 00:15:17:54:ee:00 to 00:15:17:54:ee:01 on re0
    <6>arp: 172.29.10.25 moved from 00:15:17:54:ee:01 to 00:15:17:54:ee:00 on re0

    Steve
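For anyone digging through a longer log for the same symptom, here is a minimal Python sketch that counts how often each IP address "moves" between MAC addresses. It assumes the log has been saved as plain text and that the messages follow the exact format of the two lines quoted above; the function name `find_arp_flaps` is hypothetical and not part of pfSense.

```python
import re
from collections import defaultdict

# Modeled on the kernel messages quoted above, e.g.
# "<6>arp: 172.29.10.25 moved from 00:15:17:54:ee:00 to 00:15:17:54:ee:01 on re0"
# (assumption: other pfSense/FreeBSD versions may word this differently)
ARP_MOVED = re.compile(
    r"arp: (?P<ip>\d{1,3}(?:\.\d{1,3}){3}) moved from "
    r"(?P<old>[0-9a-f:]{17}) to (?P<new>[0-9a-f:]{17}) on (?P<iface>\S+)",
    re.IGNORECASE,
)


def find_arp_flaps(log_text):
    """Map (ip, interface) -> (number of 'moved' messages, sorted MACs seen)."""
    moves = defaultdict(int)
    macs = defaultdict(set)
    for line in log_text.splitlines():
        match = ARP_MOVED.search(line)
        if match:
            key = (match.group("ip"), match.group("iface"))
            moves[key] += 1
            macs[key].update((match.group("old").lower(), match.group("new").lower()))
    return {key: (moves[key], sorted(macs[key])) for key in moves}


if __name__ == "__main__":
    sample = (
        "<6>arp: 172.29.10.25 moved from 00:15:17:54:ee:00 to 00:15:17:54:ee:01 on re0\n"
        "<6>arp: 172.29.10.25 moved from 00:15:17:54:ee:01 to 00:15:17:54:ee:00 on re0\n"
    )
    for (ip, iface), (count, seen) in find_arp_flaps(sample).items():
        print(f"{ip} on {iface}: {count} moves between {', '.join(seen)}")
```

In the quoted lines the two MACs differ only in the last byte, which would be consistent with (though not proof of) one host answering from two ports of the same multi-port card.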



  • I'm on 2.1.1 after reloading onto the flash drive. Before the reload I was on 2.1.2.

    I installed using a new image downloaded from the site and written to a flash drive, then installed to another flash drive connected to the native ports on the back of the system.

    The 172 address is a virtual NIC on another pfSense box at a remote location.

    Not sure why the MAC would change.

    I checked all the caps before building the box and saw no sign of swelling.

    The current temperature is about 20 C in the server room and 23 C on the CPU heat sink; the CPU reports 44F in the BIOS.



  • I was incorrect about the 172 address: it goes to an Intel PRO dual-port NIC. Trunking may not be working properly on that card, or it was connected to the other port.



  • I am not familiar with "Inquisitor", but it seems to be a package of many different stress tests, and "inquisitor burn in test" seems to refer specifically to the CPU burn-in test. Have you actually done any specific memory tests?


  • Netgate Administrator

    Ah, good point. I just assumed it was a combined hardware test. Does it test the RAM and HD?
    Which image are you using? Are you running a full install?

    Steve



  • The default test is:

    1800-second CPU burn
    19-step memory test
    destructive HDD read/write test

    and then it repeats until stopped.

    I also ran Memtest86+ with no errors.



  • http://pastebin.com/YBJc3ghp

    Here is another one. I have turned off the 172 connection and it is still crashing.


  • Netgate Administrator

    Nothing jumps out at me, I'm afraid.  :(
    You may have to wait for someone who can read those crash dumps properly.

    The process that seems to be causing the problem is tcpdump, so perhaps it's trying to do something your NICs don't support. Do you know which Realtek NICs they are? Try disabling all the hardware offloading features in System: Advanced: Networking.
    That's pretty much a guess, but it's easy to try.

    Steve
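When several crash dumps pile up, it can help to pull out just the lines that name the panic and the process that was running, which is where a name like tcpdump would appear. Below is a minimal Python sketch under stated assumptions: the dump has been saved as plain text, and it uses the usual FreeBSD panic wording ("Fatal trap N: ...", "current process = PID (name)", "panic: ..."), optionally prefixed with a "<N>" priority tag like the arp lines above. The function name `summarize_crash` is hypothetical; adjust the patterns to whatever the actual dump contains.

```python
import re

# Assumed FreeBSD-style panic lines; the optional "<N>" prefix matches
# the priority tags seen on other kernel messages in the dump.
PREFIX = r"^\s*(?:<\d+>)?\s*"
PATTERNS = {
    "trap": re.compile(PREFIX + r"(Fatal trap \d+:.*)$"),
    "process": re.compile(PREFIX + r"current process\s*=\s*(\d+)\s*\((.+)\)\s*$"),
    "panic": re.compile(PREFIX + r"panic:\s*(.+)$"),
}


def summarize_crash(dump_text):
    """Collect trap descriptions, (pid, process name) pairs and panic strings."""
    summary = {"trap": [], "process": [], "panic": []}
    for line in dump_text.splitlines():
        for key, pattern in PATTERNS.items():
            match = pattern.match(line)
            if not match:
                continue
            if key == "process":
                summary[key].append((int(match.group(1)), match.group(2)))
            else:
                summary[key].append(match.group(1).strip())
    return summary


if __name__ == "__main__":
    # Illustrative input only, not taken from the pastebin dump.
    example = (
        "Fatal trap 12: page fault while in kernel mode\n"
        "current process\t\t= 4242 (tcpdump)\n"
        "panic: page fault\n"
    )
    print(summarize_crash(example))
```

If every dump reports the same process, that strengthens the tcpdump lead; if the process changes from crash to crash, a hardware cause becomes more likely again.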