Random crashes
-
Removed the NVMe drive, installed a standard 2.5" HD, reinstalled 2.3.4, updated it to 2.3.4-p1 and installed the same packages. After restoring the config file everything worked as expected, but this morning I checked the uptime and it looks like it rebooted itself again around 4:40 am.
It always happens within 36hrs and again I can't see anything in the logs :(
I'm going to remove packages one at a time and see what happens. Removed ntopng and manually rebooted. Next step will be to disable the traffic shaper, and after that an overnight memtest on the box.
-
After more troubleshooting and keeping a serial console always connected I was able to determine that the crashes are caused by the HD getting detached for some unknown reason:
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <ST9160310AS DE06> s/n 5GV6HR23D detached
(ada0:ahcich0:0:0:0): Periph destroyed
/: got error 6 while accessing filesystem
I could try a different HD, but I think the current one is good as it was working just fine on my previous pfSense box (laptop). Also, I don't know for sure if the same thing was happening with the NVMe drive since I did not have a console connected at the time.
What I find interesting is that it always happens between 24 and 28 hours of uptime. Once 24 hours have passed I know that a crash is imminent.
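For anyone else leaving a serial console attached and logging it to a file, here is a minimal sketch of how the detach messages could be picked out of the capture afterwards. The pattern is modeled only on the single message quoted in this post, with one console message per line; other drivers or FreeBSD versions may word it differently, and the function name is made up for illustration.

```python
import re

# Modeled on the one console message quoted in this thread, e.g.
#   ada0: <ST9160310AS DE06> s/n 5GV6HR23D detached
# This wording is an assumption for any other driver or release.
DETACH_RE = re.compile(
    r"^(?P<dev>[a-z]+\d+): <(?P<model>[^>]*)> s/n (?P<serial>\S+) detached\s*$"
)


def find_detach_events(lines):
    """Return (line_number, device, serial) for each detach message found."""
    events = []
    for number, line in enumerate(lines, start=1):
        match = DETACH_RE.match(line.strip())
        if match:
            events.append((number, match.group("dev"), match.group("serial")))
    return events


if __name__ == "__main__":
    capture = [
        "ada0 at ahcich0 bus 0 scbus0 target 0 lun 0",
        "ada0: <ST9160310AS DE06> s/n 5GV6HR23D detached",
        "(ada0:ahcich0:0:0:0): Periph destroyed",
        "/: got error 6 while accessing filesystem",
    ]
    for number, dev, serial in find_detach_events(capture):
        print(f"line {number}: {dev} (s/n {serial}) detached")
```

If the capture tool timestamps each line, pairing those timestamps with the hits would show whether the detach really clusters at 24+ hours of uptime.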
-
Just happened again, after about 26 hours, with the same storage detached issue.
Is there anything at the OS level that would cause the storage to get detached maybe after inactivity?
This is also a Skylake-based CPU/chipset. Are there any known compatibility issues? I'll try a different HD, but I'm not confident it will change anything. After that I might consider giving 2.4 a try… I'm really puzzled, because when it works it works very well... So frustrating.
-
Unfortunately I have to report that things have not improved for me. I tried two standard HDs, switched to 2.4 and tried the NVMe drive in EFI mode which, at least now, is supported by the installer. I replaced the RAM with a specific brand and model from the DS68U compatibility list, even though the existing RAM passed memtest86 with flying colors.
Nothing! Every day, between 24 and 31 hours (new record) of uptime, the box just spontaneously reboots. It can be in the middle of the night or any time during the day, but never before reaching at least 24 hours of uptime. When it runs, it works great: performance is good, load low, temps stable around 34C, SMART reporting all good. I am at a loss…
I will have to schedule a daily reboot at night with cron; at this point I don't know what else to do.
-
Did you ever figure this out? I converted my old Skylake-based server into a router and I'm having the same issue. I swapped out the SSD thinking that was it. I changed the BIOS SATA controller to IDE mode and it seemed stable for a long time (a full month without the drive dismounting). It seems totally random; sometimes I can't go a few hours without having an issue. Going to try a fresh install with a USB drive as the mount instead tomorrow.
-
You see a crash report?
Anything on the console?
If it reboots at random with nothing logged it's almost certainly hardware.
Steve
-
It's the "Periph destroyed" error with the SSD dismounting. I've been running a USB drive install since this morning without any issues so far.
-
Be sure you don't have SWAP (or at least are not swapping) and have moved /var and /tmp to RAM if running from flash.
Also check the root is mounted noatime.
[2.4.4-RELEASE][admin@fw1.stevew.lan]/root: mount -p
/dev/diskid/DISK-9E18E959s2a / ufs rw,noatime 1 1
devfs /dev devfs rw 0 0
/dev/diskid/DISK-9E18E959s1 /boot/u-boot msdosfs rw,noatime 0 0
/dev/md0 /tmp ufs rw 2 2
/dev/md1 /var ufs rw 2 2
devfs /var/dhcpd/dev devfs rw 0 0
Steve
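The two checks above (root mounted noatime, /var and /tmp on memory disks) can be read straight off `mount -p`. A minimal sketch, assuming you paste the output in as text; the function names are hypothetical, and a `/dev/md*` device only suggests a RAM disk, since md(4) devices can also be file-backed. Swap does not appear in `mount -p` at all, so it is not covered here.

```python
def parse_mount_p(text):
    """Parse fstab-style `mount -p` output into a list of dicts."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0].startswith("#"):
            continue  # skip blank, short, or commented lines
        entries.append({
            "device": fields[0],
            "mountpoint": fields[1],
            "fstype": fields[2],
            "options": fields[3].split(","),
        })
    return entries


def find_entry(text, mountpoint):
    """Return the entry for a mount point, or raise if it is not listed."""
    for entry in parse_mount_p(text):
        if entry["mountpoint"] == mountpoint:
            return entry
    raise ValueError(f"{mountpoint} not found in mount output")


def root_has_noatime(text):
    """True if the / entry lists noatime among its options."""
    return "noatime" in find_entry(text, "/")["options"]


def on_memory_disk(text, mountpoint):
    """True if the mount point sits on an md(4) device (likely RAM-backed)."""
    return find_entry(text, mountpoint)["device"].startswith("/dev/md")


if __name__ == "__main__":
    output = """/dev/diskid/DISK-9E18E959s2a / ufs rw,noatime 1 1
devfs /dev devfs rw 0 0
/dev/md0 /tmp ufs rw 2 2
/dev/md1 /var ufs rw 2 2"""
    print("root noatime:", root_has_noatime(output))
    for mp in ("/tmp", "/var"):
        print(mp, "on md device:", on_memory_disk(output, mp))
```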
-
@stephenw10 said in Random crashes:
noatime
/dev/gptid/ba785815-9ce5-11e9-8bc0-90e2ba09f08c / ufs rw 1 1
devfs /dev devfs rw 0 0
/dev/md0 /tmp ufs rw 2 2
/dev/md1 /var ufs rw 2 2
devfs /var/dhcpd/dev devfs rw 0 0
It seems swap is enabled. I was following this tutorial on disabling "Swap": https://forum.netgate.com/topic/107375/howto-remove-swap-post-install-and-resize/2
/dev/gptid/ba785815-9ce5-11e9-8bc0-90e2ba09f08c / ufs rw 1 1
#/dev/gptid/ba7d42de-9ce5-11e9-8bc0-90e2ba09f08c none swap sw 0 0
Should I change the "ba785815-9ce5-11e9-8bc0-90e2ba09f08c" to something else before rebooting? I'm pretty sure it's good but I wanted to double check before rebooting.
-
Hmm, I've never tried that. I would back up the config, reinstall, and remove the swap during the install.
You should edit the fstab to set it to mount root noatime though if it's not already.
Steve
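As a rough illustration of the fstab change, here is a minimal sketch that adds noatime to the options field of the / entry and leaves every other line (including the commented-out swap line) untouched. It assumes the usual whitespace-separated fstab(5) layout, works only on text passed in, and writes nothing to disk; the function name is made up for this example. Keep a copy of the original fstab before changing the real file.

```python
def add_noatime_to_root(fstab_text):
    """Return fstab text with noatime added to the options of the / entry."""
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        is_entry = len(fields) >= 4 and not fields[0].startswith("#")
        if is_entry and fields[1] == "/":
            options = fields[3].split(",")
            if "noatime" not in options:
                # Only rewrite the line when a change is actually needed.
                options.append("noatime")
                fields[3] = ",".join(options)
                line = "\t".join(fields)
        out.append(line)
    return "\n".join(out) + ("\n" if out else "")


if __name__ == "__main__":
    before = (
        "/dev/gptid/ba785815-9ce5-11e9-8bc0-90e2ba09f08c / ufs rw 1 1\n"
        "#/dev/gptid/ba7d42de-9ce5-11e9-8bc0-90e2ba09f08c none swap sw 0 0\n"
    )
    print(add_noatime_to_root(before), end="")
```

Running it a second time on its own output changes nothing, so it is safe to re-apply.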