Two nodes with v2.3.2 - ssh faulty on one?
-
Hi all,
As per the topic title, we installed two new nodes with pfSense 2.3.2 (latest stable) and imported our configuration. Besides a few minor problems with VIPs or packages from the old version, this went fine.
We have a third machine running Ubuntu 14.04 LTS which is configured to SSH into both nodes as a specific user with an SSH key. That ran smoothly for the last 3 years. After the two new nodes were brought in and replaced the old machines, I now see a strange effect:
The new node fwl01 is working fine. The server (crimson) is ssh'ing into it and scp'ing some files. Great!
The other node fwl02 is working - 1 out of 10 times. :O If I manually hop onto crimson and just call a simple "ssh fwl02", I get a connection about 1 out of 10 times. SSH debugging mode didn't help either; crimson simply doesn't get a response from pfSense on the other end. pfSense itself logs that access with a strange SSH failure:
Sep 23 11:40:11 fwl02 sshd[94279]: fatal: Fssh_ssh_dispatch_run_fatal: Connection from <ip> port 38615: Operation not permitted [preauth]
Sep 23 11:43:29 fwl02 sshd[28060]: fatal: Fssh_ssh_dispatch_run_fatal: Connection from <ip> port 58669: Operation not permitted [preauth]
Sep 23 11:59:00 fwl02 sshd[66932]: Accepted publickey for nbackup from <ip> port 58670 ssh2: RSA SHA256:9OtuB6wUUNpHyZSNB/B+BG2nlFUr9WvGDoGw9caQY7Y
Sep 23 11:59:00 fwl02 sshd[67355]: Received disconnect from <ip> port 58670:11: disconnected by user
As can be seen, the first two connects were faulty; the one a few minutes later got through and actually worked. I can reproduce that at will on that server with normal SSH as well as SCP. As Ubuntu 14.04 LTS already ships with ED25519 support, new(er) SSH KEX, MACs or ciphers shouldn't be a problem here. Additionally, fwl01 seems to have no trouble at all connecting with crimson. Even if you fire a dozen ssh requests at fwl01, it answers every one of them. fwl02, not so much. As they were both installed from the exact same USB image and got literally the same configuration (they are running in a CARP cluster scenario), I'm out of clues as to what could trigger such a phenomenon. I never had problems with pfSense's SSH implementation at all.
Any clues?
Greets
Jens
-
If it works inconsistently, then it's either something with the server itself (hardware, perhaps?) or something on the network between them. Perhaps the network layer is being interrupted or there is some other inconsistency there.
Googling that error turns up nothing, which is even more perplexing.
Are there any other errors in your system log around the ssh errors?
-
with the server itself (hardware, perhaps?)
Checked that multiple times already. No faulty hardware found up to this point. Even the ECC RAM checked out without errors.
Googling that error turns up nothing, which is even more perplexing.
I hear you. You'd have to see my expression on the google results…
Are there any other errors in your system log around the ssh errors?
Checked that. No, the fatal error message is the only one triggered by an SSH connection request. No others are logged (checked by running clog -f on system.log at the same time).
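To put numbers on how often sshd bails out before authentication, the system log lines can be tallied. The following is only a rough sketch: it assumes the log format shown in the excerpts in this thread (including the redacted `<ip>` placeholder), and the names `tally` and `SAMPLE` are made up for illustration.

```python
import re
from collections import Counter

# Matches the "sshd[pid]: message" part of a syslog line as quoted in the thread.
SSHD_LINE = re.compile(r"sshd\[(?P<pid>\d+)\]: (?P<msg>.*)$")


def tally(lines):
    """Count pre-auth fatal errors and accepted logins in sshd log lines."""
    counts = Counter()
    for line in lines:
        match = SSHD_LINE.search(line)
        if not match:
            continue  # not an sshd line
        msg = match.group("msg").strip()
        if msg.startswith("fatal:") and "[preauth]" in msg:
            counts["preauth_fatal"] += 1
        elif msg.startswith("Accepted "):
            counts["accepted"] += 1
    return counts


# Log lines copied from the first post (<ip> is the poster's own redaction).
SAMPLE = [
    "Sep 23 11:40:11 fwl02 sshd[94279]: fatal: Fssh_ssh_dispatch_run_fatal: Connection from <ip> port 38615: Operation not permitted [preauth]",
    "Sep 23 11:43:29 fwl02 sshd[28060]: fatal: Fssh_ssh_dispatch_run_fatal: Connection from <ip> port 58669: Operation not permitted [preauth]",
    "Sep 23 11:59:00 fwl02 sshd[66932]: Accepted publickey for nbackup from <ip> port 58670 ssh2: RSA SHA256:9OtuB6wUUNpHyZSNB/B+BG2nlFUr9WvGDoGw9caQY7Y",
    "Sep 23 11:59:00 fwl02 sshd[67355]: Received disconnect from <ip> port 58670:11: disconnected by user",
]
```

Calling `tally(SAMPLE)` counts the two fatal lines and the one accepted login; how the lines get from the firewall's log into the function is left open here.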
Strange thing: from a Windows machine running MobaXterm and a shell-like environment, I can connect to SSH on fwl02 multiple times without running into the same error. I'm at a loss on this one. I never had any trouble with the backup VM, though, as it is an almost naked minimal install of Ubuntu Trusty that otherwise has no problems connecting anywhere. And as fwl01 isn't producing the same errors (nor are our two office pfSense boxes), I'm going bonkers about what the heck that may be.
-
Well, did you turn on debug in your ssh client so you might glean some actual information to work with? Most likely some sort of issue with which cipher/algo to use, etc.
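As background on the cipher/algo idea: per RFC 4253 section 7.1, the cipher for each direction is the first entry in the client's list that the server also offers, and if there is none, both sides must disconnect. A minimal sketch of that rule follows. The client list is copied from the debug output quoted in this thread; the server list is purely hypothetical, since the thread never shows what fwl02 offers.

```python
def negotiate(client_list, server_list):
    """Return the first client-preferred algorithm the server also supports.

    Returns None when there is no overlap, in which case the RFC requires
    both sides to disconnect.
    """
    supported = set(server_list)
    for name in client_list:
        if name in supported:
            return name
    return None


# Client cipher proposal as printed by the Ubuntu 14.04 client in this thread.
CLIENT_CIPHERS = (
    "aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,"
    "aes128-gcm@openssh.com,aes256-gcm@openssh.com,"
    "chacha20-poly1305@openssh.com,aes128-cbc,3des-cbc,blowfish-cbc,"
    "cast128-cbc,aes192-cbc,aes256-cbc,arcfour,rijndael-cbc@lysator.liu.se"
).split(",")

# HYPOTHETICAL server list, for illustration only.
SERVER_CIPHERS = ["chacha20-poly1305@openssh.com", "aes256-gcm@openssh.com", "aes128-ctr"]

chosen = negotiate(CLIENT_CIPHERS, SERVER_CIPHERS)
```

With these inputs the client's first preference, `aes128-ctr`, wins because the server's order does not matter. The same first-client-match rule applies to MACs and compression; key exchange selection has extra conditions that this sketch ignores.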
-
@johnpoz: yeah already done and no, as soon as I see the SSH error message on fwl02 there's no feedback from the sshd to the client anymore. The communication simply stops (for the client) with
debug1: SSH2_MSG_KEXINIT sent
Every other debug output is exactly the same. If the connection works, it gets a response with
debug1: SSH2_MSG_KEXINIT received
debug2: kex_parse_kexinit: curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha1,diffie-hellman-group1-sha1
debug2: kex_parse_kexinit: ssh-ed25519-cert-v01@openssh.com,ssh-ed25519,ecdsa-sha2-nistp256-cert-v01@openssh.com,ecdsa-sha2-nistp384-cert-v01@openssh.com,ecdsa-sha2-nistp521-cert-v01@openssh.com,ssh-rsa-cert-v01@openssh.com,ssh-dss-cert-v01@openssh.com,ssh-rsa-cert-v00@openssh.com,ssh-dss-cert-v00@openssh.com,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521,ssh-rsa,ssh-dss
debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,aes128-gcm@openssh.com,aes256-gcm@openssh.com,chacha20-poly1305@openssh.com,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc,arcfour,rijndael-cbc@lysator.liu.se
debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,aes128-gcm@openssh.com,aes256-gcm@openssh.com,chacha20-poly1305@openssh.com,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc,arcfour,rijndael-cbc@lysator.liu.se
debug2: kex_parse_kexinit: hmac-md5-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-ripemd160-etm@openssh.com,hmac-sha1-96-etm@openssh.com,hmac-md5-96-etm@openssh.com,hmac-md5,hmac-sha1,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-ripemd160,hmac-ripemd160@openssh.com,hmac-sha1-96,hmac-md5-96
debug2: kex_parse_kexinit: hmac-md5-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-ripemd160-etm@openssh.com,hmac-sha1-96-etm@openssh.com,hmac-md5-96-etm@openssh.com,hmac-md5,hmac-sha1,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-ripemd160,hmac-ripemd160@openssh.com,hmac-sha1-96,hmac-md5-96
debug2: kex_parse_kexinit: none,zlib@openssh.com,zlib
debug2: kex_parse_kexinit: none,zlib@openssh.com,zlib
... ...etc etc...
But other than that there is no visible "error". The client just never gets a response back from pfSense afterwards. Even worse: the ssh client binary won't even time out and just sits there waiting forever :/
fatal: Fssh_ssh_dispatch_run_fatal: Connection from <ip> port 38615: Operation not permitted [preauth]
Old KEY/MAC/cipher settings were my first guess, as we had quite a few of those problems with hosting customers after Trusty and Xenial upped their game with tighter OpenSSH settings, and quite a few customer tools with bad and old ssh settings no longer worked. But that error is a first for us.
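Since the client hangs forever on a failed attempt, measuring the "1 out of 10" rate needs a hard limit per attempt. Below is a rough sketch that runs a command repeatedly and sorts each attempt into ok, failed, or hung. The helper names `probe` and `ssh_command` are invented here; `BatchMode` and `ConnectTimeout` are standard OpenSSH client options, but the process-level timeout is what actually catches the stall after KEXINIT. Running `true` on the remote side assumes the login user gets a shell there, which may not hold for every account.

```python
import subprocess
import sys


def probe(cmd, attempts=10, timeout=15.0):
    """Run cmd `attempts` times and classify each run as ok / failed / hung."""
    counts = {"ok": 0, "failed": 0, "hung": 0}
    for _ in range(attempts):
        try:
            result = subprocess.run(
                cmd,
                stdin=subprocess.DEVNULL,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout,  # child is killed once this expires
            )
        except subprocess.TimeoutExpired:
            counts["hung"] += 1
            continue
        counts["ok" if result.returncode == 0 else "failed"] += 1
    return counts


def ssh_command(host):
    """Build a non-interactive ssh invocation that runs `true` on the host."""
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=10", host, "true"]


# Example, using the host names from this thread:
#   for host in ("fwl01", "fwl02"):
#       print(host, probe(ssh_command(host)))
```

Comparing the counts for fwl01 and fwl02 from crimson, and again from a second client, would show whether the stalls follow the firewall node or the client.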
-
So it seems that when your client asks for kex, the sshd is crashing. That's what I take from you saying KEXINIT is sent but nothing comes back.
-
Aye, that's what I assume. As to why, I'm stuck, as it doesn't "crash" every time (as if that whole thing wasn't crazy enough already).