Netgate 1100 not rebooting
-
After rebooting from the GUI, the device enters a state where both LEDs on all network ports flicker rapidly. This first happened with 24.03-BETA, but it also happens with 23.09.1-RELEASE reinstalled.
It happened a few days ago, so I may have forgotten some details already, but while on the BETA I tested automatic shutdown with an APC Smart-UPS 1000 connected to the USB port and the apcupsd service. All three devices connected to the UPS shut down while they were still powered, except the 1100, because its shutdown behavior was set to Halt in apcupsd.
To start the 1100 up I disconnected power and, after all LEDs had gone dark, connected power again. This was the first time the LEDs started to flicker. I disconnected the UPS, but it didn't help. Only after waiting much longer between disconnecting and reconnecting power did the 1100 boot up, and then it booted to 23.09.1-RELEASE.
No matter what I tried, I couldn't get it to reboot to 24.03-BETA. I didn't try shutting the device down after setting the default boot environment to 24.03-BETA to see if that would have worked. At this point I reinstalled 23.09.1-RELEASE, but reboot still would not work. Also, I did not reboot the 1100 between installing the beta and testing the UPS.
This is console output after initiating reboot in 23.09.1-RELEASE:
Netgate 1100
Netgate Device ID: ******
Serial: ******
*** Welcome to Netgate pfSense Plus 23.09.1-RELEASE (arm64) on pfSense1100 ***
Current Boot Environment: default_20240407162645
Next Boot Environment: default_20240407162645
WAN (wan) -> mvneta0.4090 -> v4/DHCP4: 91.155.14.186/23
LAN (lan) -> mvneta0.4091 -> v4: 192.168.99.1/24
OPT (opt1) -> mvneta0.4092 -> v4: 172.16.1.1/24
0) Logout (SSH only)                  9) pfTop
1) Assign Interfaces                 10) Filter Logs
2) Set interface(s) IP address       11) Restart webConfigurator
3) Reset webConfigurator password    12) PHP shell + Netgate pfSense Plus tools
4) Reset to factory defaults         13) Update from console
5) Reboot system                     14) Disable Secure Shell (sshd)
6) Halt system                       15) Restore recent configuration
7) Ping host                         16) Restart PHP-FPM
8) Shell
Enter an option:
pflog0: promiscuous mode disabled
Waiting (max 60 seconds) for system process `vnlru' to stop... done
Waiting (max 60 seconds) for system process `syncer' to stop...
Syncing disks, vnodes remaining... 0 0 0 0 done
All buffers synced.
Uptime: 33m30s
Khelp module "ertt" can't unload until its ref
-
What happens if you run reboot at the command line?
Or use option 5 at the console menu?
Steve
-
@stephenw10 I'll try it when I have a chance.
Has this bug somehow crept in again?
-
Nope, that only happened at shutdown. It still rebooted fine with that present.
The ertt output is harmless.
-
@stephenw10 Reboot 23.09.1 from console fails:
FreeBSD/arm64 (pfSense1100.localdomain) (ttyu0)
Netgate 1100
Netgate Device ID: ***
Serial: ***
*** Welcome to Netgate pfSense Plus 23.09.1-RELEASE (arm64) on pfSense1100 ***
Current Boot Environment: default_20240407162645
Next Boot Environment: default_20240407162645
WAN (wan) -> mvneta0.4090 -> v4/DHCP4: 91.155.14.186/23
LAN (lan) -> mvneta0.4091 -> v4: 192.168.99.1/24
OPT (opt1) -> mvneta0.4092 -> v4: 172.16.1.1/24
0) Logout (SSH only)                  9) pfTop
1) Assign Interfaces                 10) Filter Logs
2) Set interface(s) IP address       11) Restart webConfigurator
3) Reset webConfigurator password    12) PHP shell + Netgate pfSense Plus tools
4) Reset to factory defaults         13) Update from console
5) Reboot system                     14) Disable Secure Shell (sshd)
6) Halt system                       15) Restore recent configuration
7) Ping host                         16) Restart PHP-FPM
8) Shell
Enter an option: 5
Netgate pfSense Plus will reboot. This may take a few minutes, depending on your hardware.
Do you want to proceed?
Y/y: Reboot normally
R/r: Reroot (Stop processes, remount disks, re-run startup sequence)
Enter: Abort
Enter an option: y
Netgate pfSense Plus is rebooting now.
Stopping package apcupsd...done.
Stopping /usr/local/etc/rc.d/pfb_dnsbl.sh...done.
Stopping /usr/local/etc/rc.d/pfb_filter.sh...done.
pflog0: promiscuous mode disabled
Waiting (max 60 seconds) for system process `vnlru' to stop... done
Waiting (max 60 seconds) for system process `syncer' to stop...
Syncing disks, vnodes remaining... 0 0 0 done
All buffers synced.
Uptime: 4d20h8m34s
Khelp module "ertt" can't unload until its ref
I also couldn't get the 1100 to boot, so I reinstalled 23.09.1 again. Having nothing to lose at this point, I updated to 24.03-RC. The 1100 didn't reboot during the update, but it did boot after having been unplugged from power for a while. The update finished after that boot, but it had the same problem as the 6100 with unbound not starting, so here is another reboot attempt from the boot after that:
Netgate 1100
Netgate Device ID: **
Serial: **
*** Welcome to Netgate pfSense Plus 24.03-RC (arm64) on pfSense1100 ***
Current Boot Environment: default
Next Boot Environment: default
WAN (wan) -> mvneta0.4090 -> v4/DHCP4: 91.155.14.186/23
LAN (lan) -> mvneta0.4091 -> v4: 192.168.99.1/24
OPT (opt1) -> mvneta0.4092 -> v4: 172.16.1.1/24
0) Logout (SSH only)                    9) pfTop
1) Assign Interfaces                   10) Filter Logs
2) Set interface(s) IP address         11) Restart GUI
3) Reset admin account and password    12) PHP shell + Netgate pfSense Plus tools
4) Reset to factory defaults           13) Update from console
5) Reboot system                       14) Disable Secure Shell (sshd)
6) Halt system                         15) Restore recent configuration
7) Ping host                           16) Restart PHP-FPM
8) Shell
Enter an option: 5
Netgate pfSense Plus will reboot. This may take a few minutes, depending on your hardware.
Do you want to proceed?
Y/y: Reboot normally
R/r: Reroot (Stop processes, remount disks, re-run startup sequence)
Enter: Abort
Enter an option: y
Netgate pfSense Plus is rebooting now.
Stopping package apcupsd...done.
Stopping /usr/local/etc/rc.d/pfb_dnsbl.sh...done.
Stopping /usr/local/etc/rc.d/pfb_filter.sh...done.
pflog0: promiscuous mode disabled
Waiting (max 60 seconds) for system process `vnlru' to stop... done
Waiting (max 60 seconds) for system process `syncer' to stop...
Syncing disks, vnodes remaining... 0 0 0 done
All buffers synced.
Uptime: 17m31s
During USB recovery the console displayed these errors:
scanning bus 0 for devices... 1 USB Device(s) found
scanning bus 1 for devices... 2 USB Device(s) found
scanning usb for storage devices... 1 Storage Device(s) found
18022 armada-3720-netgate-1100.dtb
18022 armada-3720-sg1100.dtb
12944 armada-3720-netgate-2100.dtb
12944 armada-3720-sg2100.dtb
System Volume Information/
132485 config.xml
5 file(s), 1 dir(s)
2097152 bytes read in 104 ms (19.2 MiB/s)
18022 bytes read in 17 ms (1 MiB/s)
## Starting EFI application at 07000000 ...
Card did not respond to voltage select!
Scanning disk sdhci@d0000.blk...
Disk sdhci@d0000.blk not ready
Scanning disk sdhci@d8000.blk...
bad MBR sector signature 0x0000
bad MBR sector signature 0x0000
. . .
bad MBR sector signature 0x0000
bad MBR sector signature 0x0000
Scanning disk usb_mass_storage.lun0...
Found 5 disks
-
None of those errors are a concern; all 1100s show that.
Do you have any additional hardware attached to that 1100?
-
@stephenw10 Console, OPT and WAN ports were connected and also UPS was connected to USB3 port when trying to reboot 23.09.1 above, then I disconnected UPS. Connected it again after update and the boots above.
This reboot problem started around the same time I connected UPS to 1100, but that may just be coincidental.
-
Hmm, did you try rebooting on a completely clean install of 23.09.1?
-
@stephenw10 It turned out that the culprit was apcupsd, which somehow interfered with rebooting. Once I removed it, 24.03-RC rebooted normally.
I used apcupsd only because nut didn't find/connect UPS on 1100.
-
Ah, nice catch!
Not seeing a bug for that, is it a known issue?
-
If this problem is specific to the 1100, then it might have been unknown. It might not be so common to connect a UPS to an 1100, I don't know. It should be easy to reproduce.
-
After uninstalling apcupsd and then rebooting 1100 successfully I thought I was done with reboot problem. Then I read about someone with 1100 and apcupsd having no problems with rebooting and thought I would check if my 1100 would still reboot. Turned out it did not, so apcupsd was not the culprit after all.
Tried the revised RC, though I was pretty sure it wouldn't change anything. It did not. Then I reinstalled 23.09.1 again and tried to reboot without first restoring the config. It did not reboot.
I have had 1100 for less than three years, is it failing already? ** Edit: I've used RAM disk from the beginning **
What do flickering lights of the ethernet connectors indicate? Earlier I tried another power brick, but it didn't change anything.
Here's the recovery and reboot attempt log: putty.txt
-
Hmm, nothing there looks especially unusual, even the Bad MBR logs.
@pfsjap said in Netgate 1100 not rebooting:
What do flickering lights of the ethernet connectors indicate?
How exactly do you mean 'flickering'? Can you capture that in a video?
-
@stephenw10 Here's a short clip: Netgate 1100 - Trim.7z
-
Hmm, no that's not good. You shouldn't see the port LEDs light without anything connected. Does it do that continually?
-
@stephenw10 Continually after it starts? For as long as I have had patience to wait.
Reboot behaves like shutdown: to make it boot again I have to unplug power, wait and then replug power. Sometimes I have to repeat the unplug-wait-replug cycle several times until booting succeeds. The longer I wait in between, the more likely booting succeeds.
-
Sorry, I mean does it start doing that immediately after powering it on or sometime later? Like before or after POST. If it starts flickering immediately that's probably a hardware issue. If it starts after a minute or so it could be the drivers.
-
After reboot the LEDs stay off, but start flickering after about 4 minutes and 15 seconds (I don't know how consistent this time is, I only have one data point). First it is kind of random but over time becomes like on that video clip.
Usually I don't wait for this long, but unplug power right away, then wait and replug again. After plugging in power the flickering may start right away, or not. Anyway, I just try cycling power again.
If power has been unplugged long enough, 1100 boots up right away.
-
What U-Boot version do you have?
[24.03-RC][root@1100-3.stevew.lan]/root: kenv | grep smbios.bios
smbios.bios.reldate="10/07/2021"
smbios.bios.vendor="U-Boot"
smbios.bios.version="2018.03-devel-18.12.3-gc9aa92c-dirty"
-
@stephenw10 Seems to be same as in your post above:
[24.03-RC][admin@pfSense1100.localdomain]/root: uname -a
FreeBSD pfSense1100.localdomain 15.0-CURRENT FreeBSD 15.0-CURRENT #0 plus-RELENG_24_03-n256311-e71f834dd81: Tue Apr 16 00:38:13 UTC 2024 root@freebsd:/var/jenkins/workspace/pfSense-Plus-snapshots-24_03-main/obj/aarch64/f8EaPNPx/var/jenkins/workspace/pfSense-Plus-snapshots-24_03-main/sources/FreeBSD-src-plus-RELENG_24_03/arm64.aarch64/sys/pfSense arm64
[24.03-RC][admin@pfSense1100.localdomain]/root: kenv | grep smbios.bios
smbios.bios.reldate="10/07/2021"
smbios.bios.vendor="U-Boot"
smbios.bios.version="2018.03-devel-18.12.3-gc9aa92c-dirty"
[24.03-RC][admin@pfSense1100.localdomain]/root:
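If you want to compare the U-Boot version between two units without reading the strings by eye, something like the following might help. It is only a rough sketch, assuming kenv prints key="value" lines in the form quoted above; the function name kenv_value is made up for illustration and is not a pfSense or FreeBSD tool.

```shell
#!/bin/sh
# Hypothetical helper (not part of pfSense): print the value of one key
# from kenv-style key="value" lines read on stdin.
kenv_value() {
    key="$1"
    # Match lines that begin with key=" as a fixed string, then strip
    # the key=" prefix and the closing quote.
    awk -v k="$key" 'index($0, k "=\"") == 1 {
        v = substr($0, length(k) + 3)   # skip the key, "=" and opening quote
        sub(/"$/, "", v)                # drop the closing quote
        print v
    }'
}

# Example using the lines quoted in this thread; on the device itself
# you would pipe the output of kenv into the function instead.
printf '%s\n' \
    'smbios.bios.reldate="10/07/2021"' \
    'smbios.bios.vendor="U-Boot"' \
    'smbios.bios.version="2018.03-devel-18.12.3-gc9aa92c-dirty"' |
    kenv_value smbios.bios.version
```

Running the same function on both boxes and comparing the two strings should show whether the firmware differs. Here both posts report the same version, so U-Boot is probably not what distinguishes the failing unit.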