20 minute pause on reboot
-
I have two packages installed (NUT and pfBlockerNG). It doesn't appear to matter which is enabled. My testing with a single package was with NUT. pfBlockerNG was completely removed prior to the tests.
Is there a package you would consider safe that you prefer I test with? I'd like something relatively easy to clean up if possible as this is a non-vm production install.
-
I had this problem with 2.2 alpha a few months ago, I don't remember what I did to fix it. I just tested this and it works now, and rebooted normally after update and reboot.
Avahi Network Management 0.6.31 pkg v1.09
Cron Services 0.1.8
nmap Security nmap-6.47 pkg v1.3
ntopng Network Management 1.2.1 v0.2
Open-VM-Tools Services 1280544_9
pfBlockerNG Firewall Current: 0.99
Installed: 1.0
snort Security 2.9.7.0 pkg v3.2.1
System Patches System 1.0.2 -
anything other than nut would suffice as a test. nmap's an easy one that requires no config and would at least determine whether it's specific to having a package installed, or having nut installed. I suspect it's specific to nut.
-
I think you are correct. I installed nmap and removed NUT. The problem did not occur.
I submitted a patch to nut.inc in the bug tracking system do fix the NUT rc file. Could the firmware upgrade problem be related to the problem with NUT's rc file?
-
I need to do more testing to be 100% sure, but what I am currently thinking is that the problem occurs if anything gets in the way of successful package re-installation and activation.
This morning's fix to NUT solves the problem if the package manager is using the standard server. However, if a non standard package server is set and NUT cannot be re-installed, then the problem still occurs.
[The reason for the non-standard package server is that I'm beta testing pfBlockerNG and I had neglected to clear the server before the firmware update.]
-
So is it only during reboots where packages are reinstalled? Thought earlier it was every reboot.
-
Once it occurs, it's every reboot.
I believe that what causes it to occur is associated with package reinstall (or lack thereof) on reboot. However, once the problem starts, the 20 minute delay appears to happen every reboot until all packages are removed.
Still working on what exactly triggers it.
-
So, I have not had sufficient time to vigorously test to be 100% what certain what triggers the problem…
However, I currently have a system that experiences the 20 minute pause on each reboot. During the pause, the system will not route packets, DNS/DHCP do not function, and the web interface cannot be connected to. The system is responsive to ping and ssh. This is true regardless of whether it's the first reboot after install or subsequent reboots.
If I go in via ssh and look at top, the system is completely idle. If I choose "Restart webConfigurator" during the pause, the system will recover within a few seconds and complete the boot process normally.
Are there any file system contents that would be useful in diagnosing the problem? Things you would like me to check, either during a pause or after? I will try to leave the system in it's current state as long as possible.
More information:
At the point of the firmware upgrade, the status was as follows:
- Two packages installed: NUT and pfBlockerNG
- The package server was set to a non standard server (pfBlockerNG beta server)
- The system had previously been through a reboot cleanly
- Auto update with fill backup was selected
During the pause resulting from the first reboot after firmware install:
- The system was left alone for 5 minutes or more after boot
- Top was checked via ssh--system was idle
- Package installation had not begun
- The webConfigurator was restarted via the ssh menu
- The system recovers and begins normal startup, including installing packages
- After pfBlocker is installed, the system declares itself to be up and all is normal
- Note that NUT cannot be installed because the system is pointing to a non-standard package server which does not have the NUT package available
During the pause resulting from the second reboot after firmware install:
- The system was left alone for 5 minutes or more after boot
- Top was checked via ssh--system was idle
- The webConfigurator was restarted via the ssh menu
- The system recovers and competes normal startup within a few seconds
In previous testing I have determined that the following will fix the pause:
- Remove all packages via the UI
- Reboot
- Reinstall all packages via the UI
- Reboot
Please let me know what I can do to help diagnose
-
Chris, anything I can get you from the existing system?
I'm currently holding off on the next snapshot to preserve the problem.
-
What's the last log entry you see in the system log (SSH in, run "clog /var/log/system.log") while it's hung? That'll hopefully help narrow down where to look.
-
I did a full dump during and after for a reboot yesterday. You can see my login at 12:27:35. Nothing else in the system log until 12:40:38.
Dec 29 12:26:37 fw kernel: vlan0: changing name to 'em1_vlan1003'
Dec 29 12:26:37 fw php: rc.bootup: Resyncing OpenVPN instances.
Dec 29 12:26:37 fw kernel: done.
Dec 29 12:26:37 fw kernel:
Dec 29 12:26:37 fw kernel: tun1: changing name to 'ovpns1'
Dec 29 12:26:37 fw kernel: tun2: changing name to 'ovpns2'
Dec 29 12:26:37 fw kernel: ovpns1: link state changed to UP
Dec 29 12:26:37 fw kernel: pflog0: promiscuous mode enabled
Dec 29 12:26:37 fw check_reload_status: rc.newwanip starting ovpns1
Dec 29 12:26:37 fw kernel:
Dec 29 12:26:37 fw kernel: ovpns2: link state changed to UP
Dec 29 12:26:37 fw check_reload_status: rc.newwanip starting ovpns2
Dec 29 12:26:38 fw sshd[14700]: Server listening on :: port 22.
Dec 29 12:26:38 fw sshd[14700]: Server listening on 0.0.0.0 port 22.
Dec 29 12:26:38 fw sshlockout[14992]: sshlockout/webConfigurator v3.0 starting up
Dec 29 12:26:40 fw check_reload_status: Linkup starting em2
Dec 29 12:26:40 fw kernel: em2: link state changed to UP
Dec 29 12:26:41 fw check_reload_status: Linkup starting em0
Dec 29 12:26:41 fw kernel: em0: link state changed to UP
Dec 29 12:26:41 fw check_reload_status: Linkup starting em1
Dec 29 12:26:41 fw kernel: em1: link state changed to UP
Dec 29 12:26:41 fw kernel: em1_vlan1003: link state changed to UP
Dec 29 12:26:41 fw check_reload_status: Linkup starting em1_vlan1003
Dec 29 12:27:35 fw sshd[16156]: Accepted publickey for root from 192.168.230.21 port 59902 ssh2: RSA be:81:cc:09:1d:7e:72:99:ef:a4:89:c6:9f:14:1d:29
Dec 29 12:30:47 fw sshd[19907]: Accepted publickey for root from 192.168.230.21 port 59978 ssh2: RSA be:81:cc:09:1d:7e:72:99:ef:a4:89:c6:9f:14:1d:29Dec 29 12:40:38 fw kernel: ..
Dec 29 12:40:39 fw php: rc.restart_webgui: Creating rrd update script
Dec 29 12:40:39 fw sshd[19907]: fatal: Write failed: Operation not permitted
Dec 29 12:40:39 fw sshd[19907]: fatal: Write failed: Operation not permitted
Dec 29 12:40:39 fw php: rc.bootup: Removing static route for monitor 50.152.240.1 and adding a new route through 50.193.41.78
Dec 29 12:40:39 fw kernel: .done.
Dec 29 12:40:39 fw sshd[16156]: fatal: Write failed: Operation not permitted
Dec 29 12:40:39 fw sshd[16156]: fatal: Write failed: Operation not permitted
Dec 29 12:40:41 fw kernel: done.
Dec 29 12:40:41 fw lighttpd[31353]: (server.c.1558) server stopped by UID = 0 PID = 81972
Dec 29 12:40:41 fw kernel: done.
Dec 29 12:40:42 fw php: rc.bootup: ROUTING: setting default route to 50.193.41.78
Dec 29 12:40:42 fw kernel: done.
Dec 29 12:40:44 fw kernel: done.
Dec 29 12:40:44 fw kernel: done.
Dec 29 12:40:44 fw kernel: done.
Dec 29 12:40:44 fw dhcpleases: kqueue error: unkown
Dec 29 12:40:44 fw check_reload_status: Updating all dyndns
Dec 29 12:40:49 fw kernel: ..
Dec 29 12:40:49 fw kernel: ..
Dec 29 12:40:49 fw kernel: .done.
Dec 29 12:40:49 fw kernel: done
Dec 29 12:40:51 fw kernel: done.
Dec 29 12:40:58 fw php-fpm[6277]: /index.php: Successful login for user 'denny' from: 192.168.230.21
Dec 29 12:40:58 fw php-fpm[6277]: /index.php: Successful login for user 'denny' from: 192.168.230.21
Dec 29 12:41:02 fw php: rc.bootup: Creating rrd update script
Dec 29 12:41:02 fw kernel: done.
Dec 29 12:41:02 fw syslogd: exiting on signal 15
Dec 29 12:41:02 fw syslogd: kernel boot file is /boot/kernel/kernel
Dec 29 12:41:02 fw php-fpm[14317]: /rc.start_packages: Restarting/Starting all packages.
Dec 29 12:41:02 fw php-fpm[14317]: /rc.start_packages: [pfBlockerNG] Sync terminated during boot process.
Dec 29 12:41:02 fw php-fpm[14317]: /rc.start_packages: [pfBlockerNG] Sync terminated during boot process.
Dec 29 12:41:02 fw php-fpm[14317]: /rc.start_packages: [pfBlockerNG] Sync terminated during boot process.
Dec 29 12:41:02 fw php-fpm[14317]: /rc.start_packages: [pfBlockerNG] Sync terminated during boot process.
Dec 29 12:41:04 fw login: login on ttyv0 as root
Dec 29 12:41:04 fw sshlockout[71340]: sshlockout/webConfigurator v3.0 starting up
Dec 29 12:41:20 fw sshd[12868]: Accepted publickey for root from 192.168.230.21 port 60066 ssh2: RSA be:81:cc:09:1d:7e:72:99:ef:a4:89:c6:9f:14:1d:29
Dec 29 12:41:27 fw snmpd[1924]: could not encode error response
Dec 29 12:41:28 fw snmpd[1924]: could not encode error response -
Of note is that the problem hasn't occurred in the last several updates… was anything done specifically to address this?
-
Of note is that the problem hasn't occurred in the last several updates… was anything done specifically to address this?
I'm pretty sure the problem with the nut package was fixed, which seemed to be the root cause of the delay.