all services fail to start all packages gone
-
Upgraded to 21.05.1 yesterday and everything was working properly after the upgrade so not sure if this is related.
Got a couple of notifications from watchdog that all services had stopped and were being restarted so I logged into the firewall and found that all services were stopped, all packages were gone and there aren't any logs. I restarted the 3100 but issues still persist. I've tried restarting the services but just get failed to start. No idea what to try next?
-
Tried a restore but didn't help. Looks like all monitoring is gone.
-
Serial interface shows this repeating:
kern.ipc.maxpipekva exceeded; see tuning(7)
kern.ipc.maxpipekva exceeded; see tuning(7)
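For reference, a minimal sketch of how one might see how close the system is to that limit. The helper name pipe_kva_pct is hypothetical, and the sysctl line in the comment assumes the FreeBSD kern.ipc.pipekva (current usage) and kern.ipc.maxpipekva (limit) values can still be read on the affected box:

```shell
# Hypothetical helper: print pipe KVA usage as a whole-number percentage.
# $1 = bytes currently in use, $2 = configured limit in bytes.
pipe_kva_pct() {
    if [ "$2" -eq 0 ]; then
        echo "limit is zero" >&2
        return 1
    fi
    echo $(( $1 * 100 / $2 ))
}

# Assumed usage on the firewall (not run here):
#   pipe_kva_pct "$(sysctl -n kern.ipc.pipekva)" "$(sysctl -n kern.ipc.maxpipekva)"
```

A value that sits near 100 while the message repeats would be consistent with something opening pipes faster than they are released.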
-
That is a symptom of something else doing something it shouldn't be, and spawning loads of pipes.
Try running this at the command line:
ps -auxwwd
Steve
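The full ps listing can be long when something is spawning processes in a loop. As a rough sketch (the function name count_by_command is made up for illustration, and it assumes one command name per input line with a header on the first line), the output can be summarised by command name so that a runaway script stands out:

```shell
# Hypothetical helper: read command names on stdin (first line is a header),
# then print "count name" pairs with the most frequent command first.
count_by_command() {
    awk 'NR > 1 { n[$0]++ } END { for (c in n) print n[c], c }' | sort -rn
}

# Assumed usage on the firewall (not run here):
#   ps -axo comm | count_by_command | head
```

Any command that shows up hundreds of times is a likely candidate for whatever is exhausting the pipe buffers.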
-
@stephenw10
Thanks. I should have posted this last night but it was late. I started a support ticket with Netgate and they sent me a recovery image for 21.05 within just a few minutes. Took me a bit to re-configure the basic settings and restore the rest from recent backups but got everything up and running a little after midnight.
Tried to capture a status report for tech support before recovery but couldn’t get that to work. Wouldn’t retrieve any of the info. Tech Support also sent me a recovery image for 21.05.1. I’ll try updating via that process later this week when I have a few hours free and see what happens. Hopefully this was just gremlins and they’re gone now. If not I’ve got the recovery images already and can revert to 21.05 again.
-
Ah, good to hear.
-
@stephenw10
Looks like I jinxed myself. The system went right back into the same state this afternoon. Is it possible that an automated local backup could cause this? Maybe it's just coincidence, but both days this has occurred at roughly the same time as my automated backup runs.
Backup script:

#!/bin/bash
BACKUP_HOST=10.0.1.1
BACKUP_USER=backup
BACKUP_PASSWORD=redacted

# Create config file directory if it doesn't exist
[ -d files/ ] || mkdir files

# Fetch the login form and save the cookies and CSRF token:
wget -qO- --keep-session-cookies --save-cookies cookies.txt \
  --no-check-certificate https://${BACKUP_HOST}/diag_backup.php \
  | grep "name='__csrf_magic'" | sed 's/.*value="\(.*\)".*/\1/' > csrf.txt

# Submit the login form along with the first CSRF token and save the second CSRF token
# (can't reuse the same file); now the script is logged in and can take action:
wget -qO- --keep-session-cookies --load-cookies cookies.txt \
  --save-cookies cookies.txt --no-check-certificate \
  --post-data "login=Login&usernamefld=${BACKUP_USER}&passwordfld=${BACKUP_PASSWORD}&__csrf_magic=$(cat csrf.txt)" \
  https://${BACKUP_HOST}/diag_backup.php | grep "name='__csrf_magic'" \
  | sed 's/.*value="\(.*\)".*/\1/' > csrf2.txt

# Submit the download form along with the second CSRF token to save a copy of config.xml:
wget --keep-session-cookies --load-cookies cookies.txt --no-check-certificate \
  --post-data "download=download&donotbackuprrd=yes&__csrf_magic=$(head -n 1 csrf2.txt)" \
  https://${BACKUP_HOST}/diag_backup.php -O ./files/config_${BACKUP_HOST}_$(date +%Y-%m-%d-%H-%M-%S).xml 2>/dev/null

# Clean up
rm cookies.txt csrf.txt csrf2.txt
unset BACKUP_HOST BACKUP_USER BACKUP_PASSWORD

# Remove files older than 100 days
find /mnt/user/odin_backup/OdinBackUp/files/ -type f -name '*.xml' -mtime +100 -exec rm {} \;
I've done another recovery and the system is back up and running but wondering for how long.
-
It seems unlikely but I guess it could if it's doing something unexpected.
Run
ps -auxwwd
if it fails again, and see what it's actually doing.
Steve
-
Looks like this is the gw_leds script which it appears you're also running:
https://forum.netgate.com/topic/165680/sg-3100-21-05-1-kern-ipc-maxpipekva-exceeded-see-tuning-7
Steve
-
@stephenw10
Thanks. I’ll follow that post.