Random reboots
-
I've been having an issue with pfSense rebooting at random every 2 or 3 weeks. The system log shows nothing at all for several hours prior to the start of the boot sequence.
Example-
Sep 7 09:23:22 kernel Cache level 1:
Sep 7 09:23:22 kernel LoUU:2 LoC:2 LoUIS:2
Sep 7 09:23:22 kernel UMULL, SMULL, SIMD(ext)
Sep 7 09:23:22 kernel Optional instructions:
Sep 7 09:23:22 kernel Multiprocessing, Thumb2, Security, VMSAv7, Coherent Walk
Sep 7 09:23:22 kernel CPU Features:
Sep 7 09:23:22 kernel CPU: ARM Cortex-A9 r4p1 (ECO: 0x00000000)
Sep 7 09:23:22 kernel FreeBSD clang version 5.0.1 (tags/RELEASE_501/final 320880) (based on LLVM 5.0.1)
Sep 7 09:23:22 kernel root@buildbot2.netgate.com:/xbuilder/crossbuild-243/work/armv6/obj/cETt6vGN/arm.armv6/xbuilder/crossbuild-243/pfSense/tmp/FreeBSD-src/sys/pfSense-SG-3100 arm
Sep 7 09:23:22 kernel FreeBSD 11.1-RELEASE-p10 #0 r313908+9347c667615(factory-RELENG_2_4): Thu May 10 17:03:14 CDT 2018
Sep 7 09:23:22 kernel FreeBSD is a registered trademark of The FreeBSD Foundation.
Sep 7 09:23:22 kernel The Regents of the University of California. All rights reserved.
Sep 7 09:23:22 kernel Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
Sep 7 09:23:22 kernel Copyright (c) 1992-2017 The FreeBSD Project.
Sep 7 09:23:22 syslogd kernel boot file is /boot/kernel/kernel
Sep 7 01:01:01 php-cgi rc.dyndns.update: phpDynDNS (odin.dahoney.me): No change in my IP address and/or 25 days has not passed. Not updating dynamic DNS entry.
Sep 7 00:08:13 php /usr/local/pkg/snort/snort_check_for_rule_updates.php: End of portal.pfsense.org configuration backup (success).
Sep 7 00:08:11 php /usr/local/pkg/snort/snort_check_for_rule_updates.php: Beginning https://portal.pfsense.org configuration backup.
Sep 7 00:08:11 check_reload_status Syncing firewall
The reboot occurred at 09:23:22 but there were no entries for over 8 hours prior to that. Would really appreciate any advice on where to start looking to track this down.
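For anyone wanting to measure a silence like this from an exported log, here is a minimal Python sketch that finds the longest gap between consecutive entries. The function names are my own, the year is an assumption (syslog stamps omit it; 2018 is taken from the kernel build date above), and the `%b` month parsing assumes an English locale.

```python
from datetime import datetime, timedelta

def parse_stamp(line, year=2018):
    # Syslog stamps omit the year, so it is supplied as an assumption.
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")

def largest_gap(lines, year=2018):
    # Sort by time so the input may be newest-first (as pasted here) or oldest-first.
    stamps = sorted(parse_stamp(line, year) for line in lines if line.strip())
    best = (timedelta(0), None, None)
    for earlier, later in zip(stamps, stamps[1:]):
        if later - earlier > best[0]:
            best = (later - earlier, earlier, later)
    return best

# Three entries copied from the log excerpt in this post.
sample = [
    "Sep 7 09:23:22 syslogd kernel boot file is /boot/kernel/kernel",
    "Sep 7 01:01:01 php-cgi rc.dyndns.update: phpDynDNS (odin.dahoney.me): No change in my IP address and/or 25 days has not passed. Not updating dynamic DNS entry.",
    "Sep 7 00:08:13 php /usr/local/pkg/snort/snort_check_for_rule_updates.php: End of portal.pfsense.org configuration backup (success).",
]

gap, last_entry, first_after = largest_gap(sample)
print(f"Longest silence: {gap}, from {last_entry} to {first_after}")
```

Running it over a full export should show whether each reboot is preceded by a similar stretch with no log entries.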
-
Check the monitoring graphs for CPU or memory spikes leading up to that. A lack of any data there could also be telling.
If you can hook up a console and log the output there might be more there if it's unable to write to the system log for some reason, a failing drive for example.
Steve
-
Is this the right graph? I don't see anything unusual here.
-
Indeed, I don't see anything spiking there but it did stop logging data entirely.
I would hook up a console to log any output there if you can. When there is nothing logged at all like that it does start to point towards a hardware issue though. That's an SG-3100 I assume?
Steve
-
@stephenw10
I can hook up a console no problem. Not sure how to set up the logging on my Mac though.
I’m investigating buying replacement RAM (rather than a long shutdown to run memtest) to test that. Just need to figure out what to buy.
If that doesn’t work I may just contact Netgate regarding warrantied replacement.
-
The RAM is non-replaceable in the SG-3100.
If you have not re-installed it clean I would certainly try that first:
https://www.netgate.com/docs/pfsense/solutions/sg-3100/reinstall-pfsense.html
Logging the console output in OSX is not something I've ever tried to do, though I don't imagine it's difficult with the right terminal client. I always use PuTTY in Linux or Windows (if forced).
Steve
-
@stephenw10 said in Random reboots:
If you have not re-installed it clean I would certainly try that first:
Hasn't been more than a couple of months since the last clean install but worth a shot. I'll also work out the console logging to see if that turns up anything useful.
-
@stephenw10 said in Random reboots:
The RAM is non-replaceable in the SG-3100.
If you have not re-installed it clean I would certainly try that first:
https://www.netgate.com/docs/pfsense/solutions/sg-3100/reinstall-pfsense.html
Logging the console output in OSX is not something I've ever tried to do, though I don't imagine it's difficult with the right terminal client. I always use PuTTY in Linux or Windows (if forced).
Steve
I've got console logging started, but never tried this before. Should I just be logging the shell window?
-
Yes, just log all the output there. If there is some drive error etc it will show on the console.
Steve
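If the terminal client only saves raw text, one option is to prefix each line with the time it arrived, so the capture can be lined up against the system log afterwards. A rough Python sketch follows; `stamp` and `log_stream` are made-up names and the demo lines are placeholders, not real console output.

```python
import io
from datetime import datetime

def stamp(line, now=None):
    # Prefix one console line with a wall-clock timestamp.
    now = now or datetime.now()
    return f"{now:%Y-%m-%d %H:%M:%S} {line.rstrip()}"

def log_stream(source, sink, clock=datetime.now):
    # Copy lines from source to sink, stamping and flushing each one
    # so that nothing is lost if the capture is interrupted.
    count = 0
    for line in source:
        sink.write(stamp(line, clock()) + "\n")
        sink.flush()
        count += 1
    return count

# Demo with in-memory streams and a fixed clock (placeholder text only).
demo_in = io.StringIO("first console line\nsecond console line\n")
demo_out = io.StringIO()
copied = log_stream(demo_in, demo_out, clock=lambda: datetime(2018, 9, 14, 2, 59, 51))
print(demo_out.getvalue(), end="")
```

In real use `source` could be `sys.stdin` fed by whatever tool is reading the serial port, and `sink` a file opened in append mode.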
-
Yeah Terminal by default will just keep a scrollback buffer.
But I use C-Kermit for serial console. Not sure if screen etc changes things.
If you're using screen, Ctrl-A H might be a friend.
-
@derelict
Actually, I already had the serial app installed. It will save everything to a text file.
-
No luck with the console messages. Had a reboot this morning. Nothing shown on the console other than the bootup.
Guess I'll give the clean install a try next?
-
Is it possible that a latency alarm could trigger this behavior?
Looking through all my logs I see a latency alarm occurred about the same time as the reboot.
Sep 14 02:59:58 dpinger send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 30% dest_addr 8.8.8.8 bind_addr 24.211.135.107 identifier "WAN_DHCP "
Sep 14 02:59:51 syslogd kernel boot file is /boot/kernel/kernel
Time stamp shows the reboot happened first though, so maybe the latency is being caused by the boot process???
Edit: Never mind. Looks like I read the log entry wrong. Now that I’ve had a little sleep I see that it’s just showing the current settings.
-
Yes, that is logged when dpinger is started, with whatever settings are configured.
With nothing in the logs, no error reports and nothing on the console, this looks like either a hardware issue or a power problem.
I assume nothing else is rebooting at that time though.
Steve
-
No. Nothing else rebooting. I doubt it’s an external power problem. The unit is powered via a rack-mounted APC UPS, so power should be relatively clean.
-
Overheating then, maybe: the PSU brick or the CPU.
Steve
-
I believe this is the CPU temp. It seems to always run a little warm to me (between 64 and 70C). Not sure what a normal temp would be on the 3100.
No idea about power supply temps.
-
@stephenw10 said in Random reboots:
If you have not re-installed it clean I would certainly try that first:
https://www.netgate.com/docs/pfsense/solutions/sg-3100/reinstall-pfsense.html
Finally got a chance to do the re-install. Install failed at first. The USB port is extremely hard to insert a flash drive into and I didn't have it fully inserted. After a good hard push and another reboot I think it went ok. We'll see what happens.
-
That temperature looks normal for the 3100. Nothing I haven't seen here.
Steve
-
@stephenw10 said in Random reboots:
That temperature looks normal for the 3100. Nothing I haven't seen here.
Steve
Thanks. I might change my set points for temp alerts so that it’s not in the yellow as much.