firewall unresponsive - kernel: sonewconn: pcb: pru_attach() failed
System Netgate 3100
Version 22.01-RELEASE (arm)
built on Mon Feb 07 16:39:01 UTC 2022
Sequence of events from system log:
Jan 27 11:49:10 firewall sshd: Received disconnect from 220.127.116.11 port 55774:11: Bye Bye [preauth]
Jan 27 11:49:10 firewall sshd: Disconnected from authenticating user root 18.104.22.168 port 55774 [preauth]
Jan 27 11:49:10 firewall sshguard: Attack from "22.214.171.124" on service SSH with danger 10.
Jan 27 11:49:10 firewall sshguard: Blocking "126.96.36.199/32" for 240 secs (3 attacks in 224 secs, after 2 abuses over 656 secs.)
Jan 27 11:49:30 firewall sshd: Received disconnect from 188.8.131.52 port 46824:11: Bye Bye [preauth]
Jan 27 11:49:30 firewall sshd: Disconnected from authenticating user root 184.108.40.206 port 46824 [preauth]
Jan 27 11:49:30 firewall sshguard: Attack from "220.127.116.11" on service SSH with danger 10.
Jan 27 11:49:30 firewall sshguard: Blocking "18.104.22.168/32" for 240 secs (3 attacks in 243 secs, after 2 abuses over 634 secs.)
Jan 27 11:53:44 firewall sshguard: 22.214.171.124: unblocking after 274 secs
Jan 27 11:53:44 firewall sshguard: 126.96.36.199: unblocking after 254 secs
Jan 27 12:20:21 firewall kernel: sonewconn: pcb 0xeeec1148: pru_attach() failed
Jan 27 12:20:21 firewall kernel: sonewconn: pcb 0xeeec1148: pru_attach() failed
Jan 27 12:20:21 firewall kernel: sonewconn: pcb 0xeeec1148: pru_attach() failed
(...)
The last line was repeated 3,510 times over 4 minutes with "0xeeec1148" changing to a different value a few times.
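To quantify a burst like this from a saved copy of the system log, a minimal Python sketch is shown here. It assumes the BSD-style line layout quoted above (timestamp, hostname, `kernel:`, message); the function name `summarize_pru_attach` and the script usage are made up for illustration and are not part of pfSense.

```python
import re
import sys
from collections import Counter

# Assumed line layout, taken from the log excerpt above:
# "Jan 27 12:20:21 firewall kernel: sonewconn: pcb 0xeeec1148: pru_attach() failed"
PRU_ATTACH_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"sonewconn: pcb (?P<pcb>0x[0-9a-fA-F]+): pru_attach\(\) failed"
)


def summarize_pru_attach(lines):
    """Return (total, per-pcb Counter, first timestamp, last timestamp)."""
    per_pcb = Counter()
    first = last = None
    for line in lines:
        m = PRU_ATTACH_RE.match(line)
        if not m:
            continue  # ignore every other log entry
        per_pcb[m.group("pcb")] += 1
        if first is None:
            first = m.group("ts")
        last = m.group("ts")
    return sum(per_pcb.values()), per_pcb, first, last


# Hypothetical usage: python3 summarize.py /path/to/system.log
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as fh:
        total, per_pcb, first, last = summarize_pru_attach(fh)
    print(f"{total} failures between {first} and {last}")
    for pcb, count in per_pcb.most_common():
        print(f"  {pcb}: {count}")
```

The per-pcb breakdown shows how often the address changed, and the first and last timestamps bound the burst.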
The problem was discovered quickly because the Squid web proxy stopped working, which broke internet browsing.
As I could access the firewall neither via SSH nor HTTPS, I power cycled it, and it has been behaving since (2+ hours).
Before the reboot it was up for many months without noticeable issues.
As far as I can tell no other logs add anything useful to the above.
I'm not sure whether sshguard activity was the direct cause. It's been quite evenly spread across Jan 2023, with about 76,000 "Attack from" entries flagged in total.
Is that fairly normal?
Any idea why the firewall crashed and needed a power cycle?
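To check whether the sshguard activity really is evenly spread, the "Attack from" entries can be counted per day. This is a rough Python sketch that assumes the same line layout as the excerpt above; `attacks_per_day` is a hypothetical helper name, not an sshguard or pfSense tool.

```python
import re
from collections import Counter

# Assumed line layout, taken from the log excerpt above:
# 'Jan 27 11:49:10 firewall sshguard: Attack from "22.214.171.124" on service SSH with danger 10.'
ATTACK_RE = re.compile(
    r'^(?P<day>\w{3}\s+\d+)\s+\d{2}:\d{2}:\d{2}\s+\S+\s+sshguard: '
    r'Attack from "(?P<ip>[^"]+)" on service (?P<service>\S+)'
)


def attacks_per_day(lines):
    """Return (Counter of attacks keyed by 'Mon DD', set of source addresses)."""
    per_day = Counter()
    sources = set()
    for line in lines:
        m = ATTACK_RE.match(line)
        if not m:
            continue  # not an sshguard "Attack from" entry
        # Normalise padding so "Jan  7" and "Jan 7" land in the same bucket.
        day = " ".join(m.group("day").split())
        per_day[day] += 1
        sources.add(m.group("ip"))
    return per_day, sources
```

A sharp spike on one day, or a handful of sources producing most of the entries, would point to something other than ordinary background scanning.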
No clue about the error, but I advise removing the pass rules on your WAN for SSH / web GUI access and using a VPN instead.
It doesn't look like sshguard was directly involved in preventing Squid from responding, but you should not be seeing those SSH attacks like that.
@heper Web GUI has never been exposed to the world directly, only over VPN or SSH port forwarding.
Do not allow SSH connections from any remote IP. If you must have it open, limit the source IPs that can connect.
@stephenw10 said in firewall unresponsive - kernel: sonewconn: pcb: pru_attach() failed:
you should not be seeing those SSH attacks like that.
What can I realistically do about it if I need to keep the SSH port open to the world (it's not 22, BTW)?
I've yet to see a good reason to have it open to any source IP, but it should at least be key-only if it must be.
Any restriction on the source IP would help there. Use dyndns if you need to connect from unknown IPs, or try geo-restricting it with a pfBlocker alias.
It's all useful advice and thanks for that, but how about:
kernel: sonewconn: pcb 0xeeec1148: pru_attach() failed
@adamw the first thing Google turned up was a BSD mailing list thread from 2017, indicating this can happen when the system runs out of memory.
You could check the graphs.
Interestingly there is no data between 0:00 sharp and the power cycle at 12:25:
Mmm, if it stopped logging data at the same point, that's probably exhaustion of something. Drive space maybe? I would expect it to stop logging at all if that was the case though.
@adamw it also indicates only 2-4% free memory before rrd data stopped ...
# df -h
Filesystem                            Size    Used   Avail Capacity  Mounted on
/dev/diskid/DISK-XXXXXXXXXXXXXXXXX     28G    5.9G     20G    23%    /
devfs                                 1.0K    1.0K      0B   100%    /dev
/dev/diskid/DISK-YYYYYYYYYYYYYYYY      34M    2.0M     32M     6%    /boot/u-boot
tmpfs                                 4.0M    148K    3.9M     4%    /var/run
devfs                                 1.0K    1.0K      0B   100%    /var/dhcpd/dev
System log was populated the whole time with no unusual entries around midnight.
Anything logged when it stops updating the RRD files at 0:00?
Nothing in /var/log/system.log(s)
Anywhere else to check?
Not really, I'd expect to see something there if the RRD update script stopped and it was still logging at all.
Is this the first time you've seen this?
First time I've seen the firewall crashing like that and producing "kernel: sonewconn: pcb: pru_attach() failed".
Before the crash the uptime was 257 days. Looking at the 1-year memory usage graph, some slow build-ups can be observed:
Hmm, well I would upgrade to 22.05. Or you could wait for 23.01 at this point.
Is there any reason you're still running 22.01?
I have 3 x Netgate 3100 appliances. 2 live and one spare. One of the live ones is located in a distant datacenter so upgrading it remotely is too risky.
Typically I upgrade all 3 firewalls only about once per year when I have other reasons to travel to the dc. I import config to the spare one and just physically swap them around followed by some testing. If anything goes wrong then I just swap them back.
Unless the issue comes back, I'll wait for the next major release and its first follow-up update.