21.02 Sudden lockup
-
Installed 21.02 about 3 hours ago. 30 minutes ago our SG-3100 became unresponsive. Had to power off and restart. Not sure how to diagnose what might have happened. Running snort and pfBlockerNG on this device.
-
First step is to connect to the console and leave a terminal open monitoring the console output in case it happens again. Set the terminal program to log in case it might scroll beyond your terminal's buffer size.
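If your terminal program can't log by itself, here is a minimal sketch of one way to keep a timestamped copy of whatever console text you can feed it. It assumes the console output is already available as lines of text (for example piped in from the terminal program); the names `stamp_lines` and `log_console` and the file name `console.log` are made up for illustration and are not part of pfSense.

```python
from datetime import datetime


def stamp_lines(lines, now=datetime.now):
    """Yield each console line prefixed with the time it was read."""
    for line in lines:
        stamp = now().strftime("%Y-%m-%d %H:%M:%S")
        # Serial consoles usually end lines with CR LF; strip both.
        yield f"{stamp} {line.rstrip(chr(13) + chr(10))}"


def log_console(lines, path, now=datetime.now):
    """Append timestamped console lines to path, flushing after each one."""
    with open(path, "a", encoding="utf-8") as logfile:
        for entry in stamp_lines(lines, now):
            logfile.write(entry + "\n")
            logfile.flush()  # keep the file current while the session runs


# Hypothetical usage: log_console(sys.stdin, "console.log")
```

The timestamps come from the logging machine, so they can be compared with the firewall's system log to see what was printed right before connectivity dropped.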
-
Same here. I upgraded to 21.02 (SG-3100) 2 hours ago, and since then the device has become totally unresponsive 3 times already. The console disconnects, and the only thing that helps is a power cycle. I will now revert to the previous version.
The only thing I found is a lot of strange entries in the system log:
Feb 17 23:24:01 nginx 2021/02/17 23:24:01 [error] 37560#100148: send() failed (54: Connection reset by peer)
Feb 17 23:23:59 root 71184 /etc/rc.d/hostid: WARNING: hostid: unable to figure out a UUID from DMI data, generating a new one
Feb 17 23:23:58 php 375 rc.bootup: The command '/usr/sbin/powerd -b 'adp' -a 'adp' -n 'adp'' returned exit code '69', the output was 'powerd: no cpufreq(4) support -- aborting: No such file or directory'
Feb 17 23:23:55 kernel matchaddr failed
Feb 17 23:23:54 kernel matchaddr failed
Feb 17 23:23:53 kernel matchaddr failed
Feb 17 23:23:54 php 375 rc.bootup: The command '/usr/local/sbin/strongswanrc stop' returned exit code '1', the output was 'strongswan not running? (check /var/run/daemon-charon.pid).'
Feb 17 23:23:52 kernel ..
Feb 17 23:23:52 kernel .
Feb 17 23:23:50 kernel .
Feb 17 23:23:33 kernel route: writing to routing socket: Network is unreachable
-
Same occurred here - with the 3100 also.
Once the dashboard finally came up, the CPU was running at 100%. Disabling running services did not bring CPU usage down. Removing Snort finally did, and I was able to stabilize and reboot. It now seems to be behaving normally.
But when I reinstalled Snort, it was running but not accessible from the menus. When I then did the same remotely on a second 3100, I got the same behavior.
-
Yeah, I just had the same thing happen. I reported this back in 2.5 beta, it seems to only occur on the 3100 series. I still have IPv4/IPv6 addresses on all of my interfaces, but I get total connectivity failure. I had about 6 hours of uptime before this happened. It's completely random.
I tried to get logs of it, but since it's a total loss of network communication, my log servers never get anything, and the local logs never showed anything.
I have no packages running btw, just openvpn export.
A power cycle will fix it, but I used a console connection to manually reboot. Everything came right back up.
I will leave a console session open as jimp has suggested.
-
It's indeed really weird. Because it was late yesterday (EU), I wanted to do the rollback / reinstall this morning, but it didn't crash during the night. Maybe because there wasn't much load, or maybe it's random?
The only thing I changed yesterday was to disable all packages, which actually weren't that many:
pfblocker, avahi, service watchdog, lldp
But after the crash it's the same picture for me: besides the logs posted above, no other clue.
Let's see how it runs during the day without packages enabled.
-
Same issue here, install went without issues. Device was working for about 30-45 minutes before it froze/locked up the first time. Now I need to power cycle it every 10-60 minutes. Tried removing all unnecessary packages, but without success.
When it freezes I can't even ping it via LAN.
This has now happened 5 times. I'm opening a support ticket in order to get access to the image, so I can test whether a reinstall solves the issue...
Netgate: SG-3100.
-
@kuser
Added a ticket and got hold of 21.02 image, reflashed the device and reimported backup.
Same issue after about 65 minutes.
The device doesn't actually freeze, but something happens with internal switch/interface.
It stops responding on WAN/LAN, but the USB console is still available. I've requested the 2.4.5p1 image from Netgate.
-
Has anyone monitored the console yet when this happens? The system log wouldn't have the same information printed to the console necessarily.
And that also would let you check easily if it's actually locked up vs still being responsive at the console but losing connectivity.
-
I can confirm that the console was available the last time I lost LAN/WAN. I didn't find anything interesting in the logs (dmesg), but I'm not really sure what I was looking for. I do suspect it might be related to the internal switch. I am currently connected to the console and can provide some debug information if it locks up again. Anything in particular I should check?
I tried service netif restart but that seemed to hang.
-
Every time this has happened to me the console is accessible. Both interfaces also keep their ipv6/ipv4 addresses. It "feels" like routes are randomly disappearing, but I should still be able to ping stuff on the local connected network if that was the issue, and I can't even do that. Traffic pretty much stops.
-
Try disabling pfBlockerNG. I'm getting similar behavior, and it's working with it disabled.
-
@behemyth If the console is accessible like you said, can you please provide the output?
-
I am also experiencing these same issues with loss of LAN/WAN on my 3100 after upgrade last night to 21.02. I am not running any special packages aside from DHCP, DNS, NTP and UPnP.
-
Been running for 18+ hours. However, just noticed that Snort is NOT running and aborted just after midnight:
Feb 18 00:30:17 kernel pid 76998 (php), jid 0, uid 0: exited on signal 11 (core dumped)
Feb 18 00:30:14 php 76998 [Snort] Building new sid-msg.map file for WAN...
Something is very wrong with this release!
-
I am waiting for it to happen again - I've had a console open and logging since last night. Once it does I will post the output.
-
@rloeb said in 21.02 Sudden lockup:
Been running for 18+ hours. However, just noticed that Snort is NOT running and aborted just after midnight:
Feb 18 00:30:17 kernel pid 76998 (php), jid 0, uid 0: exited on signal 11 (core dumped)
Feb 18 00:30:14 php 76998 [Snort] Building new sid-msg.map file for WAN...
Something is very wrong with this release!
Getting similar errors but with pfblockerng, during boot.
https://forum.netgate.com/post/964587
Feb 18 02:05:29 kernel pid 49475 (php-fpm), jid 0, uid 0: exited on signal 11 (core dumped)
Feb 18 02:09:02 kernel pid 375 (php-cgi), jid 0, uid 0: exited on signal 11 (core dumped)
Feb 18 02:16:21 kernel pid 375 (php-cgi), jid 0, uid 0: exited on signal 11 (core dumped)
Feb 18 02:39:03 kernel pid 375 (php-cgi), jid 0, uid 0: exited on signal 11 (core dumped)
Feb 18 02:44:59 kernel pid 377 (php-cgi), jid 0, uid 0: exited on signal 11 (core dumped)
Feb 18 02:52:02 kernel pid 375 (php-cgi), jid 0, uid 0: exited on signal 11 (core dumped)
Feb 18 03:07:38 kernel pid 375 (php-cgi), jid 0, uid 0: exited on signal 11 (core dumped)
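For anyone who wants to compare how often each process is crashing, here is a rough sketch that tallies "exited on signal" entries from a copy of the system log. The regular expression is only a guess based on the lines posted in this thread, and `count_crashes` is a made-up helper name, not a pfSense tool.

```python
import re
from collections import Counter

# Pattern assumed from the kernel lines quoted in this thread, e.g.
# "kernel pid 375 (php-cgi), jid 0, uid 0: exited on signal 11 (core dumped)"
CRASH = re.compile(
    r"kernel pid \d+ \(([^)]+)\), jid \d+, uid \d+: exited on signal (\d+)"
)


def count_crashes(log_text):
    """Return a Counter keyed by (process name, signal number)."""
    counts = Counter()
    for name, signal in CRASH.findall(log_text):
        counts[(name, int(signal))] += 1
    return counts
```

Because it scans the whole text rather than line by line, it also works when several log entries were pasted onto a single line.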
-
@mcury Can someone provide serial console output? We've asked for this a few times and until someone gives us diagnostics information we can't move forward.
-
@kphillips said in 21.02 Sudden lockup:
@mcury Can someone provide serial console output? We've asked for this a few times and until someone gives us diagnostics information we can't move forward.
Sure, the only problem during boot is the Configuring Firewall.Segmentation fault (core dumped). This only happens after the pfblocker installation, and after a reboot.
Let me install the pfblockerng-devel again, and reboot to provide you the logs.
One moment please.
-
@mcury Thank you. So, to confirm, this issue is only present when you are running pfBlockerNG and you don't experience the issue when you are not running pfBlockerNG?