Random reboots

GP

Hi all

I run several boxes with pfSense on them and am really grateful for such an awesome product.
My latest box shows strange behavior: it restarts randomly. I get the following notification by mail: xx.xx.xx Bootup complete

This happens at random intervals with no visible pattern: sometimes after two days, sometimes after a week, sometimes every day. The version is 2.4.3-RELEASE-p1, and it runs on a PC-Engines APU2.

The average temperature is around 60°C, and the box is used only as a DHCP server and an OpenVPN server, with some firewall rules on it.

The system log tells me the following:

Aug 10 01:01:01 php-cgi rc.dyndns.update: phpDynDNS (): No change in my IP address and/or 25 days has not passed. Not updating dynamic DNS entry.
Aug 11 01:01:01 php-cgi rc.dyndns.update: phpDynDNS (): No change in my IP address and/or 25 days has not passed. Not updating dynamic DNS entry.
Aug 12 01:01:00 php-cgi rc.dyndns.update: phpDynDNS (): No change in my IP address and/or 25 days has not passed. Not updating dynamic DNS entry.
Aug 12 17:29:00 syslogd kernel boot file is /boot/kernel/kernel
Aug 12 17:29:00 kernel Copyright (c) 1992-2017 The FreeBSD Project.
Aug 12 17:29:00 kernel Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
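The log above shows the typical pattern of a hard reset: routine entries, then the kernel boot banner with no error message in between. A minimal sketch for spotting these silent gaps in a longer log, assuming the system log has been exported as plain text (for example, copied out of the GUI). The `BOOT_MARKER` string is taken from the log above; the function names and the placeholder year are illustrative assumptions.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Optional, Tuple

# First message syslogd records after every boot (taken from the log above).
BOOT_MARKER = "kernel boot file is"


def parse_stamp(line: str, year: int = 2000) -> datetime:
    """Parse the 15-character BSD syslog timestamp, e.g. 'Aug 12 17:29:00'.

    Syslog omits the year, so a placeholder year is supplied. 2000 is a
    leap year, so 'Feb 29' entries also parse. Only differences matter.
    """
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def find_reboots(lines: Iterable[str]) -> List[Tuple[str, str, timedelta]]:
    """Return (last entry before boot, boot entry, silent gap) per reboot."""
    reboots = []
    previous: Optional[str] = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        if BOOT_MARKER in line and previous is not None:
            gap = parse_stamp(line) - parse_stamp(previous)
            reboots.append((previous, line, gap))
        previous = line
    return reboots


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python find_reboots.py system-log.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for before, boot, gap in find_reboots(f):
            print(f"silent for {gap} before reboot")
            print(f"  last entry: {before}")
            print(f"  boot entry: {boot}")
```

For the excerpt above, this reports a single reboot preceded by 16 h 28 min of silence, which only confirms that the log itself holds no clue about the cause.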

Could you point me in the right direction to find the cause of these reboots?

Thank you in advance!

jimp

A random reboot is usually physical or hardware. Things like power issues, overheating, faulty hardware and so on. That isn't 100%, but more often than not it's something along those lines.

The first thing to do is connect something to its serial console and log the output over time. Then, when it fails, see what showed up on the console. The logs won't tell you everything, but the console output will have more detail. For example, if the disk is failing, the system can't write the logs to it, so those messages are lost; after a hardware reset the disk may work again, and nothing in the logs shows what happened.
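A terminal program's logging feature is enough for this, but a small script can also add a timestamp to each console line so a crash can be matched against the wall clock. A minimal sketch, assuming the third-party pyserial package and a baud rate of 115200 (a common APU2 console setting; check yours). The port name, file name, and function names are hypothetical.

```python
import sys
from datetime import datetime
from typing import IO, Callable, Iterable, Iterator


def log_console(lines: Iterable[bytes], out: IO[str],
                now: Callable[[], datetime] = datetime.now) -> int:
    """Write each console line to `out`, prefixed with a local timestamp.

    Flushes after every line so the capture is on disk even if the
    logging machine is interrupted. Returns the number of lines written.
    """
    count = 0
    for raw in lines:
        text = raw.decode("ascii", errors="replace").rstrip("\r\n")
        out.write(f"{now():%Y-%m-%d %H:%M:%S} {text}\n")
        out.flush()
        count += 1
    return count


def serial_lines(port: str, baud: int = 115200) -> Iterator[bytes]:
    """Yield complete lines from a serial port forever (needs pyserial)."""
    import serial  # third-party: pip install pyserial

    with serial.Serial(port, baud, timeout=1) as conn:
        pending = b""
        while True:
            # readline() returns early when the 1 s timeout expires, possibly
            # with a partial line, so buffer until a newline arrives.
            pending += conn.readline()
            if pending.endswith(b"\n"):
                yield pending
                pending = b""


if __name__ == "__main__":
    # Hypothetical usage: python console_log.py /dev/ttyUSB0 apu2-console.log
    # Runs until interrupted with Ctrl+C.
    port, path = sys.argv[1], sys.argv[2]
    with open(path, "a", encoding="utf-8") as out:
        log_console(serial_lines(port), out)
```

Appending to the file (mode `"a"`) keeps earlier captures when the script is restarted, which matters when the failure may take days to appear.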

GP

Jimp, thank you very much for your suggestion! I was finally able to catch such a reboot with PuTTY logging.

First, there are several entries like this:

ahcich0: Timeout on slot 17 port 0
ahcich0: is 00000008 cs 00000000 ss 00000000 rs 00038000 tfd 40 serr 00000000 cmd 00407117
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 08 30 8f 7e 40 03 00 00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: Command timeout
(ada0:ahcich0:0:0:0): Retrying command

Somewhere among the twenty or so retries, there is this line:

ada0: <SATA SSD SBFM01.1> s/n DDDF077C02EE05628026 detached

Finally, it ends with:

panic: I/O to pool 'zroot' appears to be hung on vdev guid 1346866773727312689 at '/dev/ada0p3'.
cpuid = 1
KDB: enter: panic
[ thread pid 0 tid 100025 ]

Then it dumps a lot of other information. At the end, it tries to write the dump to disk, which fails, and then the reboot happens:

db:0:kdb.enter.default>  capture off
db:0:kdb.enter.default>  textdump dump
textdump_writeblock: offset 2147483136, error 6
Textdump: Error 6 writing dump
db:0:kdb.enter.default>  reset
cpu_reset: Restarting BSP

So I would assume that the 16 GB mSATA SSD module has a problem, or should I look somewhere else?

jimp

You are correct; that appears to be a symptom of a failing disk.
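The signature in this capture is fairly distinctive: repeated command timeouts and retries, the drive detaching from the bus, and finally ZFS panicking because I/O to the pool hangs. A minimal triage sketch for scanning future console captures, assuming a plain-text log like the PuTTY one. The patterns are copied from the messages in this thread and are not an exhaustive list, and the `looks_like_failing_disk` rule is a hypothetical heuristic rather than an official check.

```python
import re
from collections import Counter
from typing import Iterable

# Signatures copied from the console capture in this thread; not exhaustive.
PATTERNS = {
    "command_timeout": re.compile(r"CAM status: Command timeout"),
    "retry": re.compile(r"Retrying command"),
    "detached": re.compile(r"^ada\d+: .* detached$"),
    "zfs_hang_panic": re.compile(r"panic: I/O to pool '.*' appears to be hung"),
    "dump_failed": re.compile(r"Error \d+ writing dump"),
}


def disk_symptoms(lines: Iterable[str]) -> Counter:
    """Count how many lines match each disk-failure signature."""
    counts: Counter = Counter()
    for raw in lines:
        line = raw.strip()
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def looks_like_failing_disk(counts: Counter) -> bool:
    """Hypothetical heuristic: the drive dropped off the bus, or command
    timeouts ended in a ZFS hang panic."""
    return counts["detached"] > 0 or (
        counts["command_timeout"] > 0 and counts["zfs_hang_panic"] > 0
    )


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python triage.py putty.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        counts = disk_symptoms(f)
    print(dict(counts))
    print("failing disk suspected" if looks_like_failing_disk(counts)
          else "no disk signature found")
```

A match only points at the storage path; a loose connector or a power problem on the SSD could produce similar messages, so swapping the module is still the real test.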