22.01 - problem with ssh host key permissions after restore from backup, sshd fails to start
-
I wanted a clean start today with the latest 22.01 beta, so I backed up my running config from 22.01.b.20211220.0600 and did a clean install of 21.05.2 to my 6100.
Then, first thing I did after bootup was try to restore my config. Ouch, that completely nuked the system. GUI/php painted a bunch of errors (something about the encryption key length, not sure if it was SSH or the encrypted password it was complaining about, sorry I didn't take a screenshot...). Rendered the whole box inoperable though...even from console.
So I reformatted, and this time updated to 22.01 first before trying to restore my config. That worked (mostly).
But I found I could no longer SSH into the system. I eventually noticed the warning that sshd was failing to start:
Log:
Looking at the perms from another system, it seems like both
ssh_host_ed25519_key
andssh_host_rsa_key
(private keys) should bechmod 600
—so I did that, and voila, ssh was back in business.This is a potential gotcha though, hope it can be fixed in time for 22.01
update: redmine #12640
-
@luckman212 how did you changed the mode?
forgive my ignorance, when i try on Diagnostics > Command Prompt
i get Not a directory.chmod 600 /usr/sbin/sshd/ssh_host_ed25519_key
-
@oddussiben-3161 I would suggest getting familiar with the console and doing it from there. But, either way, the files you would need to change are:
/etc/ssh/ssh_host_rsa_key
/etc/ssh/ssh_host_ed25519_key
-
@luckman212 cant ssh to the box, as sshd is not starting after restoring from backup
-
@oddussiben-3161 right- I guess serial console is also out of the question then? Anyway, your command should work as long as you fix the paths.
-
@luckman212 yes, thanks.
i ran the command but no luck (i only have ssh_host_ed25519_key), sshd is still down, rebooted the box too. -
@oddussiben-3161 please post your logs.
-
SSH KeyGen pfSense has started creating missing SSH keys. SSH Startup will be delayed. Please note that reloading the filter rules and changes will be delayed until this operation is completed. @ 2022-01-12 19:11:55
this is all i see
-
@oddussiben-3161 I meant from your Status > System Logs. Anything related to sshd there?
You can run this command from Diagnostics > Command Prompt
grep ssh /var/log/system.log
post the output.
-
did
Reset Log Files
yesterday. since thensyslogd
service is not starting :( -
I'm running out of ideas, but this sure seems like a bigger issue than just permissions on the ssh keys. You really need to hook up a console cable to see what's happening during boot, you'd probably spot the error pretty quickly.
I'd start thinking about backing up and reinstalling. Unless Netgate support has any other ideas.
-
@luckman212 upgraded 2.5 to 2.6RC now syslogd started working :d
sshd is still down, here is the log
grep ssh /var/log/system.log Jan 14 17:25:00 the sshguard[80932]: Exiting on signal. Jan 14 17:25:00 the sshguard[26945]: Now monitoring attacks. Jan 14 17:26:00 the sshguard[26945]: Exiting on signal. Jan 14 17:26:00 the sshguard[95288]: Now monitoring attacks. Jan 14 17:27:00 the sshguard[95288]: Exiting on signal. Jan 14 17:27:00 the sshguard[92574]: Now monitoring attacks. Jan 14 17:28:00 the sshguard[92574]: Exiting on signal. Jan 14 17:28:00 the sshguard[32703]: Now monitoring attacks. Jan 14 17:28:27 the check_reload_status[418]: starting sshd Jan 14 17:29:00 the sshguard[32703]: Exiting on signal. Jan 14 17:29:00 the sshguard[31906]: Now monitoring attacks. Jan 14 17:30:00 the sshguard[31906]: Exiting on signal. Jan 14 17:30:00 the sshguard[3832]: Now monitoring attacks.
-
@oddussiben-3161 Still not clear what could be causing that. There's no way to turn off sshguard that I know of. Can you try showing some more context:
grep -B1 -A3 ssh /var/log/system.log
-
grep -B1 -A3 ssh /var/log/system.log Jan 15 09:41:00 the newsyslog[24752]: logfile turned over due to size>97K Jan 15 09:41:00 the sshguard[39770]: Exiting on signal. Jan 15 09:41:00 the sshguard[25944]: Now monitoring attacks. Jan 15 09:42:00 the sshguard[25944]: Exiting on signal. Jan 15 09:42:00 the sshguard[33294]: Now monitoring attacks. Jan 15 09:43:00 the sshguard[33294]: Exiting on signal. Jan 15 09:43:00 the sshguard[9172]: Now monitoring attacks. Jan 15 09:44:00 the sshguard[9172]: Exiting on signal. Jan 15 09:44:00 the sshguard[76252]: Now monitoring attacks. Jan 15 09:44:03 the php[75348]: pfSsh.php: OpenVPN ID client01 PID 57590 still running, killing. Jan 15 09:44:03 the kernel: ovpnc01: link state changed to DOWN Jan 15 09:44:04 the check_reload_status[418]: Reloading filter -- Jan 15 09:44:07 the php-fpm[61923]: /rc.newwanip: Removing static route for monitor 94.x.x.x and adding a new route through 10.x.x.x Jan 15 09:44:07 the check_reload_status[418]: starting sshd Jan 15 09:44:08 the php-fpm[61923]: /rc.newwanip: IP Address has changed, killing states on former IP Address 10.x.x.x. Jan 15 09:44:11 the php-fpm[61923]: /rc.newwanip: Creating rrd update script Jan 15 09:44:13 the php-fpm[61923]: /rc.newwanip: pfSense package system has detected an IP change or dynamic WAN reconnection - 10.x.x.x -> 10.x.x.x - Restarting packages. -- Jan 15 09:44:59 the check_reload_status[418]: starting sshd Jan 15 09:45:00 the sshguard[76252]: Exiting on signal. Jan 15 09:45:00 the sshguard[17392]: Now monitoring attacks. Jan 15 09:45:03 the php[15086]: /usr/local/sbin/acbupload.php: End of configuration backup to https://acb.netgate.com/save (success). Jan 15 09:47:00 the sshguard[17392]: Exiting on signal. Jan 15 09:47:00 the sshguard[78519]: Now monitoring attacks. Jan 15 09:47:32 the php-fpm[35829]: /index.php: Successful login for user 'admin' from: 192.168.1.x (Local Database)
-
@oddussiben-3161 Amazingly that still doesn't reveal much detail, I assume sshguard is failing or stuck in a loop because sshd isn't starting properly. And your logfile is too small and is rotating out the important part(s) before we get a chance to see them.
Try increasing your logfile size limit to 5000000 (5MB) and reboot, then run this command to look for sshd only:
grep -B2 -A5 sshd /var/log/system.log
-
increased log size, after reboot
Jan 15 12:23:43 the kernel: coretemp0: <CPU On-Die Thermal Sensors> on cpu0 Jan 15 12:23:43 the kernel: lo0: link state changed to UP Jan 15 12:23:44 the php-fpm[346]: /sshd: New alert found: pfSense has started creating missing SSH keys. SSH Startup will be delayed. Please note that reloading the filter rules and changes will be delayed until this operation is completed. Jan 15 12:23:44 the check_reload_status[418]: Linkup starting igb0 Jan 15 12:23:44 the kernel: Jan 15 12:23:44 the kernel: igb0: link state changed to UP Jan 15 12:23:44 the kernel: igb0: link state changed to DOWN Jan 15 12:23:44 the check_reload_status[418]: Linkup starting igb0 -- Jan 15 12:24:20 the kernel: done. Jan 15 12:24:21 the root[84238]: /etc/rc.d/hostid: WARNING: hostid: unable to figure out a UUID from DMI data, generating a new one Jan 15 12:24:21 the php-fpm[346]: /sshd: API to Telegram did not return data in expected format! Jan 15 12:24:22 the php-fpm[32372]: /index.php: Successful login for user 'admin' from: 192.168.1.x (Local Database) Jan 15 12:24:22 the sshguard[93066]: Now monitoring attacks. Jan 15 12:24:23 the syslogd: exiting on signal 15 Jan 15 12:24:23 the syslogd: kernel boot file is /boot/kernel/kernel Jan 15 12:24:23 the kernel: ovpnc1: link state changed to UP
-
Not 100% sure, but it looks like the system is failing to generate the host keys during boot.
Can you try going to Diagnostics → Command Prompt and executing this command:
for t in rsa ed25519; do rm -f /etc/ssh/ssh_host_${t}_key; ssh-keygen -t $t -b 4096 -N '' -f /etc/ssh/ssh_host_${t}_key; done
You should see output like this after a few seconds:
Try rebooting after that & see if sshd is working.
-
@luckman212 you are awesome!!!!.
that worked. thank you -
@oddussiben-3161 Glad to help.