Random reboots



  • I've been having an issue with pfSense rebooting at random every 2 or 3 weeks. The system log shows nothing at all for several hours prior to the start of the boot sequence.

    Example:

    Sep 7 09:23:22	kernel		Cache level 1:
    Sep 7 09:23:22	kernel		LoUU:2 LoC:2 LoUIS:2
    Sep 7 09:23:22	kernel		UMULL, SMULL, SIMD(ext)
    Sep 7 09:23:22	kernel		Optional instructions:
    Sep 7 09:23:22	kernel		Multiprocessing, Thumb2, Security, VMSAv7, Coherent Walk
    Sep 7 09:23:22	kernel		CPU Features:
    Sep 7 09:23:22	kernel		CPU: ARM Cortex-A9 r4p1 (ECO: 0x00000000)
    Sep 7 09:23:22	kernel		FreeBSD clang version 5.0.1 (tags/RELEASE_501/final 320880) (based on LLVM 5.0.1)
    Sep 7 09:23:22	kernel		root@buildbot2.netgate.com:/xbuilder/crossbuild-243/work/armv6/obj/cETt6vGN/arm.armv6/xbuilder/crossbuild-243/pfSense/tmp/FreeBSD-src/sys/pfSense-SG-3100 arm
    Sep 7 09:23:22	kernel		FreeBSD 11.1-RELEASE-p10 #0 r313908+9347c667615(factory-RELENG_2_4): Thu May 10 17:03:14 CDT 2018
    Sep 7 09:23:22	kernel		FreeBSD is a registered trademark of The FreeBSD Foundation.
    Sep 7 09:23:22	kernel		The Regents of the University of California. All rights reserved.
    Sep 7 09:23:22	kernel		Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
    Sep 7 09:23:22	kernel		Copyright (c) 1992-2017 The FreeBSD Project.
    Sep 7 09:23:22	syslogd		kernel boot file is /boot/kernel/kernel
    Sep 7 01:01:01	php-cgi		rc.dyndns.update: phpDynDNS (odin.dahoney.me): No change in my IP address and/or 25 days has not passed. Not updating dynamic DNS entry.
    Sep 7 00:08:13	php		/usr/local/pkg/snort/snort_check_for_rule_updates.php: End of portal.pfsense.org configuration backup (success).
    Sep 7 00:08:11	php		/usr/local/pkg/snort/snort_check_for_rule_updates.php: Beginning https://portal.pfsense.org configuration backup.
    Sep 7 00:08:11	check_reload_status		Syncing firewall
    

    The reboot occurred at 09:23:22, but there were no entries for over 8 hours before that. I would really appreciate any advice on where to start looking to track this down.
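
The silent period described above can be measured instead of read off by eye. The following is a minimal sketch, assuming the log has been exported as plain text with one entry per line in the format shown in the excerpt. The function names are hypothetical, and because syslog stamps carry no year, one is assumed.

```python
from datetime import datetime

# First syslogd line of each boot in the excerpt above.
BOOT_MARKER = "kernel boot file is"

def parse_stamp(line, year=2018):
    """Parse the leading 'Sep 7 09:23:22' stamp; the year is an assumption."""
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")

def silence_before_boot(lines, year=2018):
    """Return (last_entry, boot_time, gap) for every boot marker found.

    Works for newest-first or oldest-first exports because the stamps are sorted.
    """
    stamps = sorted({parse_stamp(l, year) for l in lines if l.strip()})
    results = []
    for line in lines:
        if BOOT_MARKER in line:
            boot = parse_stamp(line, year)
            earlier = [s for s in stamps if s < boot]
            if earlier:
                results.append((earlier[-1], boot, boot - earlier[-1]))
    return results

# Three entries copied from the excerpt above.
SAMPLE = [
    "Sep 7 09:23:22\tsyslogd\t\tkernel boot file is /boot/kernel/kernel",
    "Sep 7 01:01:01\tphp-cgi\t\trc.dyndns.update: phpDynDNS (odin.dahoney.me): No change in my IP address and/or 25 days has not passed. Not updating dynamic DNS entry.",
    "Sep 7 00:08:11\tcheck_reload_status\t\tSyncing firewall",
]

for last, boot, gap in silence_before_boot(SAMPLE):
    print(gap)  # prints 8:22:21
```

Note that a long gap on its own only shows that nothing was logged. It does not show when in that window the box actually stopped running.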


  • Netgate Administrator

    Check the monitoring graphs for CPU or memory spikes leading up to that. A lack of any data there could also be telling.

    If you can hook up a console and log the output, it may show more than the system log does, since the system may be unable to write to the log for some reason, such as a failing drive.

    Steve



  • @stephenw10
    [Image: monitoring graph]

    Is this the right graph? I don't see anything unusual here.


  • Netgate Administrator

    Indeed, I don't see anything spiking there, but it did stop logging data entirely.
    I would hook up a console to log any output there if you can. When nothing at all is logged like that, it does start to point towards a hardware issue, though. That's an SG-3100, I assume?

    Steve



  • @stephenw10
    I can hook up a console, no problem. I'm not sure how to set up the logging on my Mac, though.

    I’m investigating buying replacement RAM (rather than a long shutdown to run memtest) to test that. Just need to figure out what to buy. 🙂

    If that doesn’t work, I may just contact Netgate about a warranty replacement.


  • Netgate Administrator

    The RAM is non-replaceable in the SG-3100.
    If you have not re-installed it clean I would certainly try that first:
    https://www.netgate.com/docs/pfsense/solutions/sg-3100/reinstall-pfsense.html

    Logging the console output in OS X is not something I've ever tried to do, though I don't imagine it's difficult with the right terminal client. I always use PuTTY in Linux or Windows (if forced).

    Steve



  • @stephenw10 said in Random reboots:

    If you have not re-installed it clean I would certainly try that first:

    It hasn't been more than a couple of months since the last clean install, but it's worth a shot. I'll also work out the console logging to see if that turns up anything useful.



  • @stephenw10 said in Random reboots:

    The RAM is non-replaceable in the SG-3100.
    If you have not re-installed it clean I would certainly try that first:
    https://www.netgate.com/docs/pfsense/solutions/sg-3100/reinstall-pfsense.html

    Logging the console output in OS X is not something I've ever tried to do, though I don't imagine it's difficult with the right terminal client. I always use PuTTY in Linux or Windows (if forced).

    Steve

    I've got console logging started, but I've never tried this before. Should I just be logging the shell window?


  • Netgate Administrator

    Yes, just log all the output there. If there is a drive error, for example, it will show on the console.

    Steve


  • Netgate

    Yeah, Terminal by default will just keep a scrollback buffer.

    But I use C-Kermit for the serial console. Not sure if screen or another client changes things.

    If you're using screen, Ctrl-A H (which toggles logging to a file) might be a friend.
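
A plain console capture has no timestamps, which makes it hard to line up with gaps in the system log. The following is a minimal sketch of one way to add them. `timestamp_lines` is a hypothetical helper that copies lines from any binary stream to a text file and prefixes each with the local time. How the serial port itself is opened is left to whatever tool or library is in use; the commented usage assumes the third-party pyserial package, a placeholder device name, and a 115200 baud console.

```python
import io
from datetime import datetime

def timestamp_lines(src, dst, now=datetime.now):
    """Copy lines from a binary stream to a text stream, prefixing the local time.

    `src` only needs a readline() that returns bytes, so a serial port object,
    a pipe, or a file all work. The loop ends when readline() returns b"".
    """
    for raw in iter(src.readline, b""):
        text = raw.decode("utf-8", errors="replace").rstrip("\r\n")
        dst.write(f"{now().isoformat(timespec='seconds')} {text}\n")
        dst.flush()  # keep the file current in case the logging host is interrupted

# Hypothetical usage (assumptions: pyserial is installed, the adapter shows up
# under this placeholder name, and the console runs at 115200 baud):
#
#   import serial
#   with serial.Serial("/dev/cu.usbserial-XXXX", 115200) as port, \
#           open("console.log", "a") as log:
#       timestamp_lines(port, log)
```

With timestamps in the capture, the last console line before a reboot can be compared directly against the last system log entry.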



  • @derelict
    Actually, I already had the serial app installed. It will save everything to a text file.



  • No luck with the console messages. I had a reboot this morning, and nothing was shown on the console other than the bootup. 😞

    Guess I'll give the clean install a try next?



  • Is it possible that a latency alarm could trigger this behavior?

    Looking through all my logs I see a latency alarm occurred about the same time as the reboot.

    Sep 14 02:59:58	dpinger		send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 30% dest_addr 8.8.8.8 bind_addr 24.211.135.107 identifier "WAN_DHCP "
    
    Sep 14 02:59:51	syslogd		kernel boot file is /boot/kernel/kernel
    

    The timestamp shows the reboot happened first, though, so maybe the latency is being caused by the boot process?

    Edit: Never mind. Looks like I read the log entry wrong. Now that I’ve had a little sleep, I see that it’s just showing the current settings.


  • Netgate Administrator

    Yes, that is logged when dpinger is started, with whatever settings are configured.

    With nothing in the logs, no error reports, and nothing on the console, it looks like either a hardware issue or a power problem.

    I assume nothing else is rebooting at that time, though.

    Steve



  • No. Nothing else is rebooting. I doubt it’s an external power problem. The unit is powered via a rack-mounted APC UPS, so power should be relatively clean.


  • Netgate Administrator

    Overheating then, maybe: either the PSU brick or the CPU.

    Steve



  • I believe this is the CPU temp. It seems to always run a little warm to me (between 64 and 70°C). Not sure what a normal temp would be on the 3100.

    http://i67.tinypic.com/28as01x.jpg

    No idea about power supply temps.



  • @stephenw10 said in Random reboots:

    If you have not re-installed it clean I would certainly try that first:
    https://www.netgate.com/docs/pfsense/solutions/sg-3100/reinstall-pfsense.html

    Finally got a chance to do the re-install. The install failed at first: the USB port is extremely hard to insert a flash drive into, and I didn't have it fully inserted. After a good hard push and another reboot, I think it went OK. We'll see what happens.


  • Netgate Administrator

    That temperature looks normal for the 3100. Nothing I haven't seen here.

    Steve



  • @stephenw10 said in Random reboots:

    That temperature looks normal for the 3100. Nothing I haven't seen here.

    Steve

    Thanks. I might change my set points for temp alerts so that it’s not in the yellow as much.



  • I'm having the same issue on a 3100: random restarts. I suspect temperature. It constantly runs at 70°C or more, and I've seen 77-78°C at one point. I couldn't find a way to log historical temperature data, though.



  • @dbinoj said in Random reboots:

    Couldn't get to log historical data of temperature though.

    Yeah, I'd love to know how to do this as well. The clean install seems to have fixed my issue though [fingers crossed].
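
One way to keep a temperature history is to sample the sensor on a schedule and append each reading to a CSV file. The following is a minimal sketch, not a supported pfSense feature. It assumes a Python 3 interpreter is available on the box, that the reading is exposed through `sysctl`, and that the OID is named as in `TEMP_OID` (an assumption; `sysctl -a | grep -i temperature` shows what a given unit reports). The function names and the output path are hypothetical.

```python
import csv
import re
import subprocess
from datetime import datetime

# Assumption: the sensor OID differs between platforms; check it on the unit.
TEMP_OID = "hw.temperature.CPU"

def parse_celsius(text):
    """Extract the number from sysctl output such as '67.0C'."""
    match = re.search(r"(-?\d+(?:\.\d+)?)\s*C\b", text)
    if match is None:
        raise ValueError(f"no temperature found in {text!r}")
    return float(match.group(1))

def read_temperature(oid=TEMP_OID):
    """Ask sysctl for the value only (-n) and convert it to a float."""
    out = subprocess.run(
        ["sysctl", "-n", oid], capture_output=True, text=True, check=True
    )
    return parse_celsius(out.stdout)

def append_reading(path, celsius, when=None):
    """Append one 'timestamp,temperature' row to a CSV file."""
    when = when or datetime.now()
    with open(path, "a", newline="") as fh:
        csv.writer(fh).writerow(
            [when.isoformat(timespec="seconds"), f"{celsius:.1f}"]
        )

# Hypothetical usage, run once a minute from cron:
#   append_reading("/root/cpu_temp.csv", read_temperature())
```

The last row written before an unexpected reboot then shows how warm the CPU was shortly before the box went down.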


  • Netgate

    Is it sitting on top of something hot? The base plate of the 3100 is a passive heat sink so if it doesn't have decent airflow it would likely warm up.



  • @derelict Yeah, it was on top of a Mac mini, and the ambient temp is ~32°C. Now, after moving it off the top of the Mac mini, the temperature is at 67°C.

    At what temperature does the 3100 reset itself?


  • Netgate Administrator

    I have not personally ever seen it reset due to overheating, even when load testing.
    I know that during manufacturer testing they were run at up to 65°C ambient, so I would expect them to tolerate temperatures quite a bit hotter than what you're seeing.

    Steve



  • If overheating is indeed the root cause, I'm very concerned that the SG-3100 does not have an adequate heatsink. Are owners expected to place the device on an ice block? There's barely 1/8 of an inch of clearance with the rubber feet. How are owners supposed to get airflow from underneath?



  • @msf2000
    I think it’s unlikely that my reboots were due to overheating since a clean install seems to have corrected the problem. Still keeping my fingers crossed, but it’s been 2 weeks with no reboots.


  • Netgate Administrator

    @msf2000

    They are not, and there is no problem as far as I know. As I said, they were tested at far higher temperatures.

    Steve



  • @stephenw10 Hopefully they are not. I haven't had a reboot since I moved mine from the top of a Mac mini to a rack tray, so I thought it must be the temperature. If not, that reboot will remain a mystery to me because there are no logs.



  • I thought this issue had resolved itself, but it looks like I was wrong. I just had another unexpected reboot. Nothing in the logs and nothing captured via the serial console.

    Oct 14 08:10:12	php		/usr/local/pkg/apcupsd_mail.php: Message sent to walterstarks@icloud.com OK
    Oct 14 08:11:25	apcupsd	33988	Communications with UPS restored.
    Oct 14 08:11:27	php		/usr/local/pkg/apcupsd_mail.php: Message sent to walterstarks@icloud.com OK
    Oct 14 13:10:02	snort	33227	invalid appid in appStatRecord (4385)
    Oct 14 14:25:00	snort	33227	invalid appid in appStatRecord (4385)
    Oct 14 16:30:01	php-fpm	22709	/diag_backup.php: Successful login for user 'backup' from: 10.0.1.20 (Local Database)
    Oct 14 18:24:06	syslogd		kernel boot file is /boot/kernel/kernel
    Oct 14 18:24:06	kernel		Copyright (c) 1992-2018 The FreeBSD Project.
    Oct 14 18:24:06	kernel		Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
    Oct 14 18:24:06	kernel		The Regents of the University of California. All rights reserved.
    Oct 14 18:24:06	kernel		FreeBSD is a registered trademark of The FreeBSD Foundation.
    

    No idea where to go from here. ❓