firewall unresponsive - kernel: sonewconn: pcb: pru_attach() failed

adamw

System Netgate 3100
Version 22.01-RELEASE (arm)
built on Mon Feb 07 16:39:01 UTC 2022
FreeBSD 12.3-STABLE

Sequence of events from system log:

Jan 27 11:49:10 firewall sshd[57894]: Received disconnect from 52.160.46.145 port 55774:11: Bye Bye [preauth]
Jan 27 11:49:10 firewall sshd[57894]: Disconnected from authenticating user root 52.160.46.145 port 55774 [preauth]
Jan 27 11:49:10 firewall sshguard[25971]: Attack from "52.160.46.145" on service SSH with danger 10.
Jan 27 11:49:10 firewall sshguard[25971]: Blocking "52.160.46.145/32" for 240 secs (3 attacks in 224 secs, after 2 abuses over 656 secs.)
Jan 27 11:49:30 firewall sshd[58241]: Received disconnect from 209.141.45.231 port 46824:11: Bye Bye [preauth]
Jan 27 11:49:30 firewall sshd[58241]: Disconnected from authenticating user root 209.141.45.231 port 46824 [preauth]
Jan 27 11:49:30 firewall sshguard[25971]: Attack from "209.141.45.231" on service SSH with danger 10.
Jan 27 11:49:30 firewall sshguard[25971]: Blocking "209.141.45.231/32" for 240 secs (3 attacks in 243 secs, after 2 abuses over 634 secs.)
Jan 27 11:53:44 firewall sshguard[25971]: 52.160.46.145: unblocking after 274 secs
Jan 27 11:53:44 firewall sshguard[25971]: 209.141.45.231: unblocking after 254 secs
Jan 27 12:20:21 firewall kernel: sonewconn: pcb 0xeeec1148: pru_attach() failed
Jan 27 12:20:21 firewall kernel: sonewconn: pcb 0xeeec1148: pru_attach() failed
Jan 27 12:20:21 firewall kernel: sonewconn: pcb 0xeeec1148: pru_attach() failed
(...)

The last line was repeated 3,510 times over 4 minutes with "0xeeec1148" changing to a different value a few times.
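
For anyone wanting to quantify a burst like this from a saved copy of the log, here is a minimal Python sketch (not part of the original post). It tallies the `sonewconn` messages per pcb address and reports the first and last timestamp. The regular expression is modelled on the three log lines quoted above; the `year` argument is an assumption, since syslog timestamps in this format carry no year, and `summarize_sonewconn` is an illustrative name.

```python
import re
from collections import Counter
from datetime import datetime

# Matches lines such as:
# Jan 27 12:20:21 firewall kernel: sonewconn: pcb 0xeeec1148: pru_attach() failed
PATTERN = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"sonewconn: pcb (?P<pcb>0x[0-9a-f]+): pru_attach\(\) failed"
)

def summarize_sonewconn(lines, year=2023):
    """Return (total, per-pcb Counter, first timestamp, last timestamp)."""
    per_pcb = Counter()
    first = last = None
    for line in lines:
        m = PATTERN.match(line)
        if not m:
            continue  # not a sonewconn failure line
        # The log has no year, so the caller supplies one (assumption).
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        per_pcb[m["pcb"]] += 1
        first = ts if first is None or ts < first else first
        last = ts if last is None or ts > last else last
    return sum(per_pcb.values()), per_pcb, first, last

# The three lines quoted in the post.
sample = [
    "Jan 27 12:20:21 firewall kernel: sonewconn: pcb 0xeeec1148: pru_attach() failed",
    "Jan 27 12:20:21 firewall kernel: sonewconn: pcb 0xeeec1148: pru_attach() failed",
    "Jan 27 12:20:21 firewall kernel: sonewconn: pcb 0xeeec1148: pru_attach() failed",
]
total, per_pcb, first, last = summarize_sonewconn(sample)
print(total, dict(per_pcb))  # 3 {'0xeeec1148': 3}
```

Run against the full log, the per-pcb counts would show how many distinct listening sockets were affected and `last - first` would give the duration of the burst.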

The problem was discovered quickly because the Squid web proxy stopped working, and with it internet browsing.
As I could reach the firewall neither via SSH nor HTTPS, I power cycled it, and it has been behaving since (2+ hours).
Before the reboot it was up for many months without noticeable issues.

As far as I can tell no other logs add anything useful to the above.

I'm not sure whether sshguard activity was the direct cause. It has been spread quite evenly across Jan 2023, with about 76,000 "Attack from" entries flagged in total.
Is that fairly normal?
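
To check how evenly the sshguard activity is spread, a per-day and per-source tally can help. The following is a rough Python sketch under the assumption that the log lines have the same shape as the sshguard entries quoted above; `count_attacks` is an illustrative name, not an sshguard tool.

```python
import re
from collections import Counter

# Matches lines such as:
# Jan 27 11:49:10 firewall sshguard[25971]: Attack from "52.160.46.145" on service SSH with danger 10.
ATTACK = re.compile(
    r'^(?P<day>\w{3}\s+\d+)\s+\S+\s+\S+\s+sshguard\[\d+\]:\s+'
    r'Attack from "(?P<ip>[^"]+)"'
)

def count_attacks(lines):
    """Return (attacks per day, attacks per source address) as Counters."""
    per_day, per_ip = Counter(), Counter()
    for line in lines:
        m = ATTACK.match(line)
        if m:
            # Normalise "Jan  7" and "Jan 7" to the same key.
            per_day[" ".join(m["day"].split())] += 1
            per_ip[m["ip"]] += 1
    return per_day, per_ip

# Lines taken from the log excerpt in the post.
sample = [
    'Jan 27 11:49:10 firewall sshguard[25971]: Attack from "52.160.46.145" on service SSH with danger 10.',
    'Jan 27 11:49:10 firewall sshguard[25971]: Blocking "52.160.46.145/32" for 240 secs (3 attacks in 224 secs, after 2 abuses over 656 secs.)',
    'Jan 27 11:49:30 firewall sshguard[25971]: Attack from "209.141.45.231" on service SSH with danger 10.',
]
per_day, per_ip = count_attacks(sample)
print(dict(per_day))         # {'Jan 27': 2}
print(per_ip.most_common())
```

A sudden spike in the per-day counts just before the failure would point towards the attacks; a flat profile would make them a less likely trigger.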

Any idea why the firewall crashed and needed a power cycle?

heper

@adamw
No clue about the error,
but I'd advise removing the pass rules on your WAN for SSH / web GUI access and using a VPN instead.

stephenw10

Yup, that.
It doesn't look like sshguard was directly involved in preventing Squid from responding, but you should not be seeing SSH attacks like that.

adamw

@heper Web GUI has never been exposed to the world directly, only over VPN or SSH port forwarding.

stephenw10

Do not allow SSH connections from any remote IP. If you must have it open, limit the source IPs that can connect.

adamw

@stephenw10 said in firewall unresponsive - kernel: sonewconn: pcb: pru_attach() failed:

you should not be seeing those SSH attacks like that.

What can I realistically do about it if I need to keep the SSH port open to the world (it's not 22, BTW)?

stephenw10

I've yet to see a good reason to have it open to any source IP but it should at least be key only if it must be.
But any restriction to the source IP would help there. Use dyndns if you need to connect from unknown IPs. Try geo-restricting it with a pfBlocker alias.
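
As an illustration of what restricting the source amounts to, here is a small Python sketch of allowlist matching. It is not pfSense configuration; on the firewall this would be an alias used as the source of the WAN pass rule. The networks below are documentation-range placeholders, and `is_allowed` is a hypothetical helper.

```python
import ipaddress

# Hypothetical allowlist: documentation-range networks, for illustration only.
ALLOWED_SOURCES = [
    ipaddress.ip_network(net)
    for net in ("198.51.100.0/24", "203.0.113.7/32")
]

def is_allowed(source_ip):
    """Return True if source_ip falls inside any allowed network."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in ALLOWED_SOURCES)

print(is_allowed("198.51.100.20"))  # True
print(is_allowed("52.160.46.145"))  # False
```

With a rule like this in place, connections from addresses such as those in the log excerpt would be dropped by the firewall before they ever reach sshd.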

adamw

It's all useful advice, and thanks for that, but how about:

kernel: sonewconn: pcb 0xeeec1148: pru_attach() failed

?

heper

@adamw the first thing Google turned up was a BSD mailing list thread from 2017, indicating this can happen when the system runs out of memory.

You could check the graphs

adamw

@heper

Interestingly there is no data between 0:00 sharp and the power cycle at 12:25:

stephenw10

Mmm, if it stopped logging data at the same point, that's probably exhaustion of something. Drive space maybe? I would expect it to have stopped logging altogether if that was the case, though.

heper

@adamw it also indicates only 2-4% free memory before rrd data stopped ...

adamw

# df -h
Filesystem                             Size    Used   Avail Capacity  Mounted on
/dev/diskid/DISK-XXXXXXXXXXXXXXXXX     28G    5.9G     20G    23%    /
devfs                                  1.0K    1.0K      0B   100%    /dev
/dev/diskid/DISK-YYYYYYYYYYYYYYYY      34M    2.0M     32M     6%    /boot/u-boot
tmpfs                                  4.0M    148K    3.9M     4%    /var/run
devfs                                  1.0K    1.0K      0B   100%    /var/dhcpd/dev

System log was populated the whole time with no unusual entries around midnight.
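
For checking several boxes at once, the `df -h` output above can be screened automatically. This is a minimal Python sketch that assumes the column layout shown in the post and mount points without spaces; `nearly_full` and its 90% threshold are illustrative choices. `devfs` entries are skipped because they always report 100%.

```python
# df -h output as posted above.
DF_OUTPUT = """\
Filesystem                             Size    Used   Avail Capacity  Mounted on
/dev/diskid/DISK-XXXXXXXXXXXXXXXXX     28G    5.9G     20G    23%    /
devfs                                  1.0K    1.0K      0B   100%    /dev
/dev/diskid/DISK-YYYYYYYYYYYYYYYY      34M    2.0M     32M     6%    /boot/u-boot
tmpfs                                  4.0M    148K    3.9M     4%    /var/run
devfs                                  1.0K    1.0K      0B   100%    /var/dhcpd/dev
"""

def nearly_full(df_text, threshold=90, ignore=("devfs",)):
    """Return (mount point, capacity %) for filesystems at or above threshold."""
    hits = []
    for line in df_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6 or fields[0] in ignore:
            continue
        capacity = int(fields[4].rstrip("%"))
        if capacity >= threshold:
            hits.append((fields[5], capacity))
    return hits

print(nearly_full(DF_OUTPUT))  # []
```

An empty result here matches the reading of the table: after the reboot, no real filesystem is close to full. It says nothing about the state before the power cycle.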

stephenw10

Anything logged when it stops updating the RRD files at 0:00?

adamw

@stephenw10

Nothing in /var/log/system.log(s)
Anywhere else to check?

stephenw10

Not really, I'd expect to see something there if the RRD update script stopped and it was still logging at all.
Is this the first time you've seen this?

adamw

@stephenw10
First time I've seen the firewall crashing like that and producing "kernel: sonewconn: pcb: pru_attach() failed".

Before the crash the uptime was 257 days. Looking at the 1-year memory usage graph, some slow build-ups can be observed:

stephenw10

Hmm, well I would upgrade to 22.05. Or you could wait for 23.01 at this point.

Is there any reason you're still running 22.01?

adamw

@stephenw10

I have 3 x Netgate 3100 appliances. 2 live and one spare. One of the live ones is located in a distant datacenter so upgrading it remotely is too risky.

Typically I upgrade all 3 firewalls only about once per year when I have other reasons to travel to the dc. I import config to the spare one and just physically swap them around followed by some testing. If anything goes wrong then I just swap them back.

Unless the issue comes back, I'll wait for the next major release and its first follow-up update.