20 minute pause on reboot
-
Odd that restarting the web interface does anything there. In prior versions, that'd kill off all PHP which could fix a variety of things, but in 2.2 that doesn't touch PHP.
That a system where you can mess with things a bit? I'm curious if you uninstall all packages and reboot, if it still does that.
-
One moment please…
@cmb:
That a system where you can mess with things a bit? I'm curious if you uninstall all packages and reboot, if it still does that.
-
Good guess, and somewhat curious results.
[Note that I had done several reboot tests previously, including removing all but one last package.]
If I remove all packages, then the reboot works as expected. Further, after the clean reboot with no packages installed, I am able to re-add packages and reboot continues to work as expected. Perhaps there is a flag set regarding packages on firmware install that isn't being cleared?
When the next snapshot update is available, I will try uninstalling all packages prior to performing the firmware update.
Is there anything else I can do to diagnose?
@cmb:
I'm curious if you uninstall all packages and reboot, if it still does that.
-
Chris, I was able to confirm this with the last firmware update.
1. Remove all packages
2. Invoke firmware auto update
[System updates and reboots in a timely manner]
3. Initiate reboot via the web interface
[System reboots in a timely manner]
4. Add a package
5. Initiate reboot via the web interface
[System reboots in a timely manner]So it does appear that it is associated with installing packages.
I can leave a package installed next time before the firmware upgrade to reconfirm if you like.
-
Can you narrow it down to a specific package, or does it matter which?
-
I have two packages installed (NUT and pfBlockerNG). It doesn't appear to matter which is enabled. My testing with a single package was with NUT. pfBlockerNG was completely removed prior to the tests.
Is there a package you would consider safe that you prefer I test with? I'd like something relatively easy to clean up if possible as this is a non-vm production install.
-
I had this problem with 2.2 alpha a few months ago, I don't remember what I did to fix it. I just tested this and it works now, and rebooted normally after update and reboot.
Avahi Network Management 0.6.31 pkg v1.09
Cron Services 0.1.8
nmap Security nmap-6.47 pkg v1.3
ntopng Network Management 1.2.1 v0.2
Open-VM-Tools Services 1280544_9
pfBlockerNG Firewall Current: 0.99
Installed: 1.0
snort Security 2.9.7.0 pkg v3.2.1
System Patches System 1.0.2 -
anything other than nut would suffice as a test. nmap's an easy one that requires no config and would at least determine whether it's specific to having a package installed, or having nut installed. I suspect it's specific to nut.
-
I think you are correct. I installed nmap and removed NUT. The problem did not occur.
I submitted a patch to nut.inc in the bug tracking system do fix the NUT rc file. Could the firmware upgrade problem be related to the problem with NUT's rc file?
-
I need to do more testing to be 100% sure, but what I am currently thinking is that the problem occurs if anything gets in the way of successful package re-installation and activation.
This morning's fix to NUT solves the problem if the package manager is using the standard server. However, if a non standard package server is set and NUT cannot be re-installed, then the problem still occurs.
[The reason for the non-standard package server is that I'm beta testing pfBlockerNG and I had neglected to clear the server before the firmware update.]
-
So is it only during reboots where packages are reinstalled? Thought earlier it was every reboot.
-
Once it occurs, it's every reboot.
I believe that what causes it to occur is associated with package reinstall (or lack thereof) on reboot. However, once the problem starts, the 20 minute delay appears to happen every reboot until all packages are removed.
Still working on what exactly triggers it.
-
So, I have not had sufficient time to vigorously test to be 100% what certain what triggers the problem…
However, I currently have a system that experiences the 20 minute pause on each reboot. During the pause, the system will not route packets, DNS/DHCP do not function, and the web interface cannot be connected to. The system is responsive to ping and ssh. This is true regardless of whether it's the first reboot after install or subsequent reboots.
If I go in via ssh and look at top, the system is completely idle. If I choose "Restart webConfigurator" during the pause, the system will recover within a few seconds and complete the boot process normally.
Are there any file system contents that would be useful in diagnosing the problem? Things you would like me to check, either during a pause or after? I will try to leave the system in it's current state as long as possible.
More information:
At the point of the firmware upgrade, the status was as follows:
- Two packages installed: NUT and pfBlockerNG
- The package server was set to a non standard server (pfBlockerNG beta server)
- The system had previously been through a reboot cleanly
- Auto update with fill backup was selected
During the pause resulting from the first reboot after firmware install:
- The system was left alone for 5 minutes or more after boot
- Top was checked via ssh--system was idle
- Package installation had not begun
- The webConfigurator was restarted via the ssh menu
- The system recovers and begins normal startup, including installing packages
- After pfBlocker is installed, the system declares itself to be up and all is normal
- Note that NUT cannot be installed because the system is pointing to a non-standard package server which does not have the NUT package available
During the pause resulting from the second reboot after firmware install:
- The system was left alone for 5 minutes or more after boot
- Top was checked via ssh--system was idle
- The webConfigurator was restarted via the ssh menu
- The system recovers and competes normal startup within a few seconds
In previous testing I have determined that the following will fix the pause:
- Remove all packages via the UI
- Reboot
- Reinstall all packages via the UI
- Reboot
Please let me know what I can do to help diagnose
-
Chris, anything I can get you from the existing system?
I'm currently holding off on the next snapshot to preserve the problem.
-
What's the last log entry you see in the system log (SSH in, run "clog /var/log/system.log") while it's hung? That'll hopefully help narrow down where to look.
-
I did a full dump during and after for a reboot yesterday. You can see my login at 12:27:35. Nothing else in the system log until 12:40:38.
Dec 29 12:26:37 fw kernel: vlan0: changing name to 'em1_vlan1003'
Dec 29 12:26:37 fw php: rc.bootup: Resyncing OpenVPN instances.
Dec 29 12:26:37 fw kernel: done.
Dec 29 12:26:37 fw kernel:
Dec 29 12:26:37 fw kernel: tun1: changing name to 'ovpns1'
Dec 29 12:26:37 fw kernel: tun2: changing name to 'ovpns2'
Dec 29 12:26:37 fw kernel: ovpns1: link state changed to UP
Dec 29 12:26:37 fw kernel: pflog0: promiscuous mode enabled
Dec 29 12:26:37 fw check_reload_status: rc.newwanip starting ovpns1
Dec 29 12:26:37 fw kernel:
Dec 29 12:26:37 fw kernel: ovpns2: link state changed to UP
Dec 29 12:26:37 fw check_reload_status: rc.newwanip starting ovpns2
Dec 29 12:26:38 fw sshd[14700]: Server listening on :: port 22.
Dec 29 12:26:38 fw sshd[14700]: Server listening on 0.0.0.0 port 22.
Dec 29 12:26:38 fw sshlockout[14992]: sshlockout/webConfigurator v3.0 starting up
Dec 29 12:26:40 fw check_reload_status: Linkup starting em2
Dec 29 12:26:40 fw kernel: em2: link state changed to UP
Dec 29 12:26:41 fw check_reload_status: Linkup starting em0
Dec 29 12:26:41 fw kernel: em0: link state changed to UP
Dec 29 12:26:41 fw check_reload_status: Linkup starting em1
Dec 29 12:26:41 fw kernel: em1: link state changed to UP
Dec 29 12:26:41 fw kernel: em1_vlan1003: link state changed to UP
Dec 29 12:26:41 fw check_reload_status: Linkup starting em1_vlan1003
Dec 29 12:27:35 fw sshd[16156]: Accepted publickey for root from 192.168.230.21 port 59902 ssh2: RSA be:81:cc:09:1d:7e:72:99:ef:a4:89:c6:9f:14:1d:29
Dec 29 12:30:47 fw sshd[19907]: Accepted publickey for root from 192.168.230.21 port 59978 ssh2: RSA be:81:cc:09:1d:7e:72:99:ef:a4:89:c6:9f:14:1d:29Dec 29 12:40:38 fw kernel: ..
Dec 29 12:40:39 fw php: rc.restart_webgui: Creating rrd update script
Dec 29 12:40:39 fw sshd[19907]: fatal: Write failed: Operation not permitted
Dec 29 12:40:39 fw sshd[19907]: fatal: Write failed: Operation not permitted
Dec 29 12:40:39 fw php: rc.bootup: Removing static route for monitor 50.152.240.1 and adding a new route through 50.193.41.78
Dec 29 12:40:39 fw kernel: .done.
Dec 29 12:40:39 fw sshd[16156]: fatal: Write failed: Operation not permitted
Dec 29 12:40:39 fw sshd[16156]: fatal: Write failed: Operation not permitted
Dec 29 12:40:41 fw kernel: done.
Dec 29 12:40:41 fw lighttpd[31353]: (server.c.1558) server stopped by UID = 0 PID = 81972
Dec 29 12:40:41 fw kernel: done.
Dec 29 12:40:42 fw php: rc.bootup: ROUTING: setting default route to 50.193.41.78
Dec 29 12:40:42 fw kernel: done.
Dec 29 12:40:44 fw kernel: done.
Dec 29 12:40:44 fw kernel: done.
Dec 29 12:40:44 fw kernel: done.
Dec 29 12:40:44 fw dhcpleases: kqueue error: unkown
Dec 29 12:40:44 fw check_reload_status: Updating all dyndns
Dec 29 12:40:49 fw kernel: ..
Dec 29 12:40:49 fw kernel: ..
Dec 29 12:40:49 fw kernel: .done.
Dec 29 12:40:49 fw kernel: done
Dec 29 12:40:51 fw kernel: done.
Dec 29 12:40:58 fw php-fpm[6277]: /index.php: Successful login for user 'denny' from: 192.168.230.21
Dec 29 12:40:58 fw php-fpm[6277]: /index.php: Successful login for user 'denny' from: 192.168.230.21
Dec 29 12:41:02 fw php: rc.bootup: Creating rrd update script
Dec 29 12:41:02 fw kernel: done.
Dec 29 12:41:02 fw syslogd: exiting on signal 15
Dec 29 12:41:02 fw syslogd: kernel boot file is /boot/kernel/kernel
Dec 29 12:41:02 fw php-fpm[14317]: /rc.start_packages: Restarting/Starting all packages.
Dec 29 12:41:02 fw php-fpm[14317]: /rc.start_packages: [pfBlockerNG] Sync terminated during boot process.
Dec 29 12:41:02 fw php-fpm[14317]: /rc.start_packages: [pfBlockerNG] Sync terminated during boot process.
Dec 29 12:41:02 fw php-fpm[14317]: /rc.start_packages: [pfBlockerNG] Sync terminated during boot process.
Dec 29 12:41:02 fw php-fpm[14317]: /rc.start_packages: [pfBlockerNG] Sync terminated during boot process.
Dec 29 12:41:04 fw login: login on ttyv0 as root
Dec 29 12:41:04 fw sshlockout[71340]: sshlockout/webConfigurator v3.0 starting up
Dec 29 12:41:20 fw sshd[12868]: Accepted publickey for root from 192.168.230.21 port 60066 ssh2: RSA be:81:cc:09:1d:7e:72:99:ef:a4:89:c6:9f:14:1d:29
Dec 29 12:41:27 fw snmpd[1924]: could not encode error response
Dec 29 12:41:28 fw snmpd[1924]: could not encode error response -
Of note is that the problem hasn't occurred in the last several updates… was anything done specifically to address this?
-
Of note is that the problem hasn't occurred in the last several updates… was anything done specifically to address this?
I'm pretty sure the problem with the nut package was fixed, which seemed to be the root cause of the delay.