Intermittent reboots
-
I'm running pfSense 2.7.2-RELEASE on an R86S mini PC and experiencing frequent reboots (a few times a day).
When I check the machine I don't see anything in /var/crash. I'm exporting the logs to a remote syslog server but I don't see any errors or warnings captured around the time of reboots to help troubleshoot. I've run memtest on the machine overnight and not hit any errors. I also set up a spare machine to act as my router and left this one running but not connected to anything and managed to keep it up for days without a reboot. As soon as I brought it back into service I started encountering the reboots again. I backed up my config and restored it to the spare machine and it ran fine for over a week. It appears to be some interaction between my config, this machine, and some activity that occurs when it's actually in use that's causing the reboot, but I'm at a complete loss as to how I can narrow down what exactly that might be.
Any suggestions for troubleshooting would be appreciated. I'm running pretty stock pfSense; the only packages I have installed are wireguard and pfblockerng-devel. Thanks very much
-
Do you see anything logged locally when that happens? How do you know it's rebooting? Uptime shows low?
If it actually panics and there is no crash report it could be a drive issue. It could also be that there is no SWAP, though default installs do include SWAP.
Steve
-
@stephenw10 thanks for following up
- Nothing in the local logs, they just start at the kernel bootup sequence, that's why I was hoping remote capture would shed some light
- Uptime shows low, and more obviously my connection drops if I happen to be online when it happens
- Originally I had it installed on eMMC, but I've since reinstalled to NVMe because some of my reading suggested the storage format might have been why I wasn't able to capture logs
- I've got 1GB of swap installed now. I'm not 100% sure I did on the previous iteration though.
-
Ah, OK. Then, yes, you should now get a crash report if it's panicking.
As a test you can force it to panic at some more convenient time by running:
sysctl debug.kdb.panic=1
It should show a crash report after rebooting.
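Before forcing a panic it can help to confirm that swap and a dump device are actually configured, and to see what is already in /var/crash. A rough sketch, assuming the stock FreeBSD tools swapinfo and dumpon are available in the pfSense shell (the script skips them if not); list_crash_files and CRASH_DIR are just illustrative names, not anything pfSense provides:

```shell
#!/bin/sh
# Sketch: check the prerequisites for capturing a crash dump.
# CRASH_DIR and list_crash_files are hypothetical names used for illustration.

CRASH_DIR="${CRASH_DIR:-/var/crash}"

# Print files in the crash directory, skipping the 'minfree' and 'bounds'
# bookkeeping files so that only real crash artifacts are listed.
list_crash_files() {
    dir="$1"
    [ -d "$dir" ] || return 0
    for f in "$dir"/*; do
        [ -e "$f" ] || continue
        case "$(basename "$f")" in
            minfree|bounds) ;;
            *) echo "$f" ;;
        esac
    done
}

# Show configured swap devices (no output rows means no swap).
if command -v swapinfo >/dev/null 2>&1; then
    swapinfo -h
fi

# Show the device the kernel will write a dump to on panic.
if command -v dumpon >/dev/null 2>&1; then
    dumpon -l
fi

# Anything listed here is an existing crash artifact.
list_crash_files "$CRASH_DIR"
```

If swapinfo shows no devices, a panic would most likely have nowhere to write its dump, which would explain an empty /var/crash.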
-
@stephenw10 ok that did trigger a crash log as expected. Now I'll just wait for an unexpected reboot. Thanks very much for your help!
-
@stephenw10 alright, I just experienced a couple reboots in a row. No crash log when I booted back up so I guess it's not panicking. Looking at the logs I can see some stuff that might be relevant but nothing I can conclusively point to:
2024-04-02 10:39:05.274 [94169:0] notice: Restart of unbound 1.18.0.
2024-04-02 10:39:05.273 /rc.newwanip: Gateway, NONE AVAILABLE
2024-04-02 10:39:05.272 send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 1 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% alarm_hold 10000ms dest_addr <My external IP> bind_addr <My external IP> identifier "WAN_DHCP "
Is there anything in particular you'd suggest I look for or test to help figure this out?
-
Those logs are expected at boot. Or at any time the WAN reconnects.
If it hard reboots without panicking that is almost always a hardware issue unfortunately. About the only thing you can do there to get more info is to log the console output but I don't think that device has a serial console?
-
@stephenw10 I don't think so, unless I can use one of the USB ports as a console. I'm noticing on my spare machine (that I swapped back to) that I'm experiencing a lot of packet loss. Is it possible that an upstream issue like that could be leading to reboots? I have "disable gateway monitoring action" unchecked under routing -> gateways for my WAN so I wouldn't expect so, but the fact that it only seems to reboot when there's actual network activity happening makes me wonder.
-
It shouldn't be possible for anything external to reboot it. You might see a lot of logs or disconnections. Or potentially it could stop passing traffic entirely, but it would still remain up. Or it would panic and log that.