Zotac CI323 crashes / shuts down / sleeps / turns off NICs unexpectedly



  • I have a Zotac CI323 running pfSense 2.4.3-p1 directly from an SSD. It ran without any trouble until recently. Without any configuration change, and after running successfully for at least 3 weeks, the Zotac now seems to crash without notice every few minutes or hours.

    Symptoms:

    • pfSense starts normally: everything is working fine
    • After 15 minutes the NIC lights turn off and pfSense is no longer reachable. The power LED on the Zotac remains green. Nothing more is written to the pfSense logs, and I cannot find any clue that pfSense started a shutdown or encountered a problem. (NB: after a BIOS update, the 15 minutes stretched to 1-3 hours before a crash.)
    • When I press the power button (while the LED is green), the Zotac gives its boot sound and boots normally; then the cycle repeats

    I have tried many things to find the problem:

    • updating the BIOS (now I have some extra time before a crash)
    • using the 1.94 Realtek drivers
    • turning off all VPN services
    • disabling offloading
    • disabling the use of AES-NI
    • acting as if the WAN is always online
    • running without a WAN cable connected

    Nothing solved the problem.

    Temperatures of the Zotac stay below 50 °C, so overheating shouldn't be the problem.
    Does anyone have a clue what I can do next?



  • Any error messages in the console or syslog? With the information you provided all that can be said is: get better hardware.



  • Nope, after boot the only log entry is a php-fpm message about me logging in to the web portal (when I do so). DHCP logging continues until the crash, handing out leases without any problems, and then there is silence...



  • @martinnl said in Zotac CI323 crashes / shuts down / sleeps / turns off NICs unexpectedly:

    Nope, after boot the only log entry is a php-fpm message about me logging in to the web portal (when I do so). DHCP logging continues until the crash, handing out leases without any problems, and then there is silence...

    Well, you should read the console when it crashes or shortly afterwards (hint: attach a monitor, or use a serial console and log its output); after a reboot that output is obviously gone. Also check the syslog and maybe dmesg, not the DHCP log.
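
Logging the serial console output, as suggested above, could be sketched roughly as follows. This is only a minimal illustration, not a pfSense tool: the function name `log_console` and the idea of reading the console as a line-oriented text stream are assumptions. It timestamps every line and flushes immediately, so the last message before a hang is not lost in a buffer.

```python
import io
from datetime import datetime, timezone


def log_console(stream, sink, clock=lambda: datetime.now(timezone.utc)):
    """Copy each line from `stream` to `sink`, prefixed with a UTC timestamp.

    `stream` is any iterable of text lines (for example an opened serial
    device); `sink` is any writable text file. Returns the number of lines
    copied.
    """
    count = 0
    for line in stream:
        stamp = clock().strftime("%Y-%m-%dT%H:%M:%SZ")
        # Strip trailing CR/LF from the console and write one clean line.
        sink.write(f"{stamp} {line.rstrip()}\n")
        # Flush per line so the final message before a crash reaches disk.
        sink.flush()
        count += 1
    return count
```

On the machine that captures the output, `stream` could be the serial adapter opened as a text file (the device path depends on the adapter and OS, and the baud rate must already be configured), and `sink` a log file opened in append mode.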



  • Hi,

    Do you have console access with a keyboard and VGA, or USB-serial access?
    If so, boot, enter, and shut down one or more interfaces (the box becomes useless, that's just fine during testing). If it stops crashing, enable them one by one until you find the faulty NIC.
    If the system keeps crashing with no interfaces, doing pretty much nothing, well .....



  • @grimson
    I checked dmesg and I've been watching the logs using the following command:

    clog -f /var/log/system.log
    

    Apparently I do have some errors around syslogd, possibly explaining the lack of logging:

    Aug 28 21:27:10	root		/etc/rc.d/hostid: WARNING: hostid: unable to figure out a UUID from DMI data, generating a new one
    Aug 28 21:27:12	syslogd		Logging subprocess 25293 (exec /usr/local/sbin/sshlockout_pf 15) exited due to signal 15.
    Aug 28 21:27:12	syslogd		exiting on signal 15
    Aug 28 21:27:13	syslogd		kernel boot file is /boot/kernel/kernel
    

    I'm trying to figure out what this means and how to resolve it.
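
To pin down when the box actually went silent, one option is to scan a saved copy of the system log for long gaps between consecutive entries. The sketch below is an illustration only: the function names and the 5-minute threshold are assumptions, and it relies solely on the leading `Mon DD HH:MM:SS` timestamp visible in the lines above. Because those timestamps carry no year, the caller has to supply one.

```python
from datetime import datetime, timedelta


def parse_stamp(line, year):
    """Parse the leading 'Mon DD HH:MM:SS' timestamp of a syslog line."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def find_silences(lines, year, threshold=timedelta(minutes=5)):
    """Return (last_seen, next_seen, first_message_after_gap) tuples.

    One tuple is produced for every pair of consecutive log lines whose
    timestamps are further apart than `threshold`.
    """
    gaps = []
    previous = None
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        current = parse_stamp(line, year)
        if previous is not None and current - previous > threshold:
            gaps.append((previous, current, line[15:].strip()))
        previous = current
    return gaps
```

The `last_seen` value of a gap is the latest moment the system was demonstrably still logging, which narrows down what to look for on the console around that time.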



  • I run two CI323s in a similar configuration and they've been very stable. It makes me wonder a bit about the hardware. Do you have an alternate stick of RAM you could try swapping out, and/or running memtest86 on the existing stick? If you could spare the machine for a while, you could also boot Ubuntu or another Linux distro from a thumb drive and do some stress testing to see whether you can reproduce crashes outside of pfSense.

    Also if I can inject a question of my own . . . you mentioned updating the BIOS. Can I assume that means you're running the latest (Version 2K180507)? If so, did you experience any issues? I tried updating one of my CI323s to that BIOS as soon as it came out, but it had the effect of making pfSense "lose" one of the NICs. I double checked all the BIOS settings, but on boot pfSense consistently saw only one NIC and bailed, so I had to downgrade to the prior BIOS release.


  • Netgate Administrator

    None of those are unusual.

    As others have said if you are going to see any error output it will be on the physical console so you need to leave it hooked up to catch that.

    If it appears to be hung at the console, try pressing Ctrl+T. That can reveal a hung process the system is waiting for.

    It sounds more like a hardware fault though. A bad disk for example can behave like that.

    Steve



  • Good point, disk testing would be advisable as well. For both memory and disk testing, the Ultimate Boot CD (a bit of an anachronism, you can just throw it on a thumb drive) is a great resource: http://www.ultimatebootcd.com/



  • @thenarc said in Zotac CI323 crashes / shuts down / sleeps / turns off NICs unexpectedly:

    Also if I can inject a question of my own . . . you mentioned updating the BIOS. Can I assume that means you're running the latest (Version 2K180507)? If so, did you experience any issues? I tried updating one of my CI323s to that BIOS as soon as it came out, but it had the effect of making pfSense "lose" one of the NICs. I double checked all the BIOS settings, but on boot pfSense consistently saw only one NIC and bailed, so I had to downgrade to the prior BIOS release.

    Yes I'm running the 2K180507 BIOS and I had no problems upgrading to this BIOS: both NICs work fine.
    I've enabled Wake-on-LAN and run Win8 boot sequence.



  • @martinnl said in Zotac CI323 crashes / shuts down / sleeps / turns off NICs unexpectedly:

    Yes I'm running the 2K180507 BIOS and I had no problems upgrading to this BIOS: both NICs work fine.
    I've enabled Wake-on-LAN and run Win8 boot sequence.

    Thanks for the info, maybe I'll be brave and try again ;) Not sure why it went sideways for me, but at least I know it can work now.



  • @stephenw10 @TheNarc

    I'll A) do some log-catching first. I have a spare SSD and spare memory too (both different brands), so I'll see if I can B) image the disk onto the other SSD and swap the memory.
    Plan C) would be to put another OS on the system :)



  • Mine started acting up when I moved from 2.3 embedded to 2.4. Then 2.4 was stable for some reason, and recently it started acting up again. The last thing I did was plug in the UPS USB cable for apcupsd.

    Also, interestingly enough, when I plugged in that USB cable, it freaked out and either crashed or rebooted, I can't remember which. I run it headless and could never get anything useful out of the log either.

    I have removed the USB cable / connection and it's been running fine for 7 days.

    I'm pretty sure that the first time I had 2.4 and it was unstable, I had a Logitech USB wireless receiver for keyboard/mouse plugged in; at some point I removed it and it was stable for approx. 70 days.

    So, while I can't confirm at the moment, I think there's something going on with the USB. If you have anything USB plugged in, try removing it and see what happens.



  • @gertjan said in Zotac CI323 crashes / shuts down / sleeps / turns off NICs unexpectedly:

    If so, boot, enter, and shut down one or more interfaces (the box becomes useless, that's just fine during testing). If it stops crashing, enable them one by one until you find the faulty NIC.

    I disabled all interfaces except the LAN interface and shut down all services except the syslogd service. pfSense has now been running for 16 hours without problems, though it's pretty much bricked in this state.

    I'm now bringing services back one at a time, every 3 hours, to see when it crashes again. It is currently running with the DHCP and NTP services enabled.
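
The staged re-enabling described above can be written down as a simple schedule, which makes it easier to map a later crash time back to whatever was running at that moment. This is a rough sketch under assumptions: the function names and the component names in the usage note are hypothetical, and the 3-hour interval is taken from the post.

```python
from datetime import datetime, timedelta


def build_schedule(components, start, interval=timedelta(hours=3)):
    """Return (enable_time, component) pairs, one component per interval."""
    return [(start + i * interval, name) for i, name in enumerate(components)]


def suspects_at(schedule, crash_time):
    """List the components already enabled at `crash_time`.

    The most recently enabled component comes first, because it is the
    most likely trigger if everything before it ran without a crash.
    """
    enabled = [name for when, name in schedule if when <= crash_time]
    return list(reversed(enabled))
```

For example, with a plan such as `build_schedule(["dhcpd", "ntpd", "WAN", "openvpn"], start)`, a crash 7.5 hours after `start` would point first at `WAN`, then at the components enabled before it.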



  • @duren said in Zotac CI323 crashes / shuts down / sleeps / turns off NICs unexpectedly:

    So, while I can't confirm at the moment, I think there's something going on with the USB. If you have anything USB plugged in, try removing it and see what happens.

    Thx for the suggestion. That won't be the cause in my case: I have nothing plugged in except two Ethernet cables and the power supply.



  • I'd suspect that the power supply is flaking out. But a memory test can't hurt.



  • Power supply is an interesting thought . . . the CI323 uses an external AC-to-DC brick. Are you still using the one that came with the machine? If you have another with the same specs (same voltage, same polarity, and at least as high a current rating as the official brick) you could try swapping that out.
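
The three criteria for a replacement brick (same voltage, same polarity, at least the same current rating) can be expressed as a small check. This is a sketch only: the class and function names are hypothetical, the values have to be read from the labels of the actual adapters, and the physical plug dimensions are not modelled here.

```python
from dataclasses import dataclass
from math import isclose


@dataclass(frozen=True)
class DcAdapter:
    volts: float           # rated output voltage from the label
    amps: float            # rated maximum output current from the label
    center_positive: bool  # polarity of the barrel plug


def is_safe_substitute(original, candidate):
    """True if `candidate` matches voltage and polarity and can supply
    at least as much current as `original`."""
    return (
        isclose(candidate.volts, original.volts)
        and candidate.center_positive == original.center_positive
        and candidate.amps >= original.amps
    )
```

A higher current rating on the replacement is fine because the device only draws what it needs; a different voltage or reversed polarity is not.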



  • @vamike said in Zotac CI323 crashes / shuts down / sleeps / turns off NICs unexpectedly:

    I'd suspect that the power supply is flaking out. But a memory test can't hurt.

    I had this very issue. I ended up trading mine out for a rackmount unit because my Zotac kept crashing and turning completely off. It took me a few tries to realize that if I wiggled the cord, it would turn on and off. No matter how I turned or secured the cord, it would lose power randomly. Plus I needed more NIC ports, so switching was inevitable. Great little box until I ran into those issues a few months ago.

    Note - In my case it was the internal power barrel connector and not the brick itself. That's why I didn't bother trying to fix it. Could be an easy fix with some solder, but oh well :)



  • Ok, my CI323 is back on track! It ran for 48 hours until a manual reboot and has now been running for over 60 hours.

    But what was the missing piece of the puzzle behind the unexpected shutdowns?
    I haven't been able to reproduce the error, but my guess is that resetting all logs and enlarging the maximum log file size did the trick.

    I reset the logs in order to get rid of the 'signal 15' message, but without the desired effect: the 'signal 15' message is still there.
    After this step I disabled all but the LAN interface and all but the syslogd service, then brought all services and interfaces back one by one, every 3 hours. I ended up with all interfaces and services up, without the system shutting down anymore.
    After 48 hours I rebooted so that it started with all interfaces and services enabled by default, and it still did not shut down.

    As the only actual change to the system is the reset of the logging, I would say this solved my problem.
    Another option is that gradually enabling the services/interfaces broke some kind of chain of events. This seems less likely to me.

    So: I'm happily using my CI323 again (on the 1.95 Realtek drivers)!
    I'll be back if I'm able to reproduce the error (and solve it).


  • Netgate Administrator

    I would be surprised if that is what solved it. The behaviour really looks hardware related.

    Still, it would be nice if that has resolved it. 😉

    Steve


 

© Copyright 2002 - 2018 Rubicon Communications, LLC