firewall unresponsive - kernel: sonewconn: pcb: pru_attach() failed

heper

@adamw the first thing Google turned up was a BSD mailing list thread from 2017, indicating this can happen when the system runs out of memory.

You could check the graphs

adamw

@heper

Interestingly there is no data between 0:00 sharp and the power cycle at 12:25:

stephenw10

Mmm, if it stopped logging data at the same point, that's probably exhaustion of something. Drive space maybe? I would expect it to stop logging altogether if that was the case though.

heper

@adamw the graph also indicates only 2-4% free memory before the RRD data stopped ...

adamw

# df -h
Filesystem                             Size    Used   Avail Capacity  Mounted on
/dev/diskid/DISK-XXXXXXXXXXXXXXXXX     28G    5.9G     20G    23%    /
devfs                                  1.0K    1.0K      0B   100%    /dev
/dev/diskid/DISK-YYYYYYYYYYYYYYYY      34M    2.0M     32M     6%    /boot/u-boot
tmpfs                                  4.0M    148K    3.9M     4%    /var/run
devfs                                  1.0K    1.0K      0B   100%    /var/dhcpd/dev

System log was populated the whole time with no unusual entries around midnight.
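
For anyone repeating this check on several boxes, here is a minimal sketch that flags nearly full filesystems from `df -h` output. The function name `flag_full` and the 90% default are assumptions for illustration, not anything pfSense ships; devfs is skipped because it always reports 100%.

```shell
# Hypothetical helper: print mount point and capacity for every filesystem
# at or above a threshold (default 90%). Reads `df -h` style output on stdin.
flag_full() {
    threshold="${1:-90}"
    awk -v t="$threshold" '
        # Skip the header row and devfs entries (always 100% by design).
        NR > 1 && $1 != "devfs" {
            cap = $5
            sub(/%/, "", cap)          # "23%" -> "23"
            if (cap + 0 >= t) print $6, $5
        }'
}

# Usage: df -h | flag_full 90
```

Run against the output above, it should print nothing, since the only entries at 100% are the devfs mounts.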

stephenw10

Anything logged when it stops updating the RRD files at 0:00?

adamw

@stephenw10

Nothing in /var/log/system.log(s)
Anywhere else to check?
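
In case it helps others narrow down the same window, this is a rough sketch for pulling only the entries logged around midnight. It assumes plain-text log files with BSD syslog-style timestamps (`Jan  5 00:00:01 host ...`); the function name `around_midnight` is made up for this example, and compressed rotated logs would need to be decompressed first.

```shell
# Hypothetical helper: print log lines stamped from 23:55:00 to 00:05:59.
# Assumes BSD syslog-style timestamps at the start of each line and
# uncompressed log files passed as arguments.
around_midnight() {
    grep -hE '^[A-Z][a-z]{2} +[0-9]+ (23:5[5-9]|00:0[0-5]):[0-9]{2} ' "$@"
}

# Usage: around_midnight /var/log/system.log
```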

stephenw10

Not really, I'd expect to see something there if the RRD update script stopped and it was still logging at all.
Is this the first time you've seen this?

adamw

@stephenw10
First time I've seen the firewall crashing like that and producing "kernel: sonewconn: pcb: pru_attach() failed".

Before the crash the uptime was 257 days. Looking at the 1-year memory usage graph, some slow build-ups can be observed:

stephenw10

Hmm, well I would upgrade to 22.05. Or you could wait for 23.01 at this point.

Is there any reason you're still running 22.01?

adamw

@stephenw10

I have 3 x Netgate 3100 appliances. 2 live and one spare. One of the live ones is located in a distant datacenter so upgrading it remotely is too risky.

Typically I upgrade all 3 firewalls only about once per year, when I have other reasons to travel to the DC. I import the config to the spare one and just physically swap them around, followed by some testing. If anything goes wrong then I just swap them back.

Unless the issue comes back, I'll wait for the next major release and its first follow-up update.