21.02 Sudden lockup
-
When your device "locks up", can someone go into the serial console, choose option 8 for the shell, and run the following?
php /usr/local/www/status.php
then run
cp /tmp/status_output.tgz /root/
After this, reboot your firewall, go to Diagnostics --> Command Prompt in the webConfigurator, paste /root/status_output.tgz into the "Download File" prompt, and download the file from the firewall.
If you can then open a ticket with our support providing this file, it will hopefully help us figure out what's going on here.
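The two console steps above could be wrapped in one small shell function, as a rough sketch. The paths are the ones quoted in this post; the function name collect_status and the three overridable variables are assumptions made for illustration and are not part of pfSense.

```shell
#!/bin/sh
# Sketch: run the status script, then copy its archive to a place the GUI can fetch it from.
# Defaults are the paths quoted in the post; the variable and function names are illustrative.
STATUS_SCRIPT="${STATUS_SCRIPT:-/usr/local/www/status.php}"
STATUS_ARCHIVE="${STATUS_ARCHIVE:-/tmp/status_output.tgz}"
DEST_DIR="${DEST_DIR:-/root}"

collect_status() {
    # Bail out early if the status script is not where we expect it.
    if [ ! -f "$STATUS_SCRIPT" ]; then
        echo "status script not found: $STATUS_SCRIPT" >&2
        return 1
    fi
    # Generate the status archive.
    php "$STATUS_SCRIPT" || return 2
    # Make sure the archive was actually produced before copying it.
    if [ ! -f "$STATUS_ARCHIVE" ]; then
        echo "expected archive missing: $STATUS_ARCHIVE" >&2
        return 3
    fi
    cp "$STATUS_ARCHIVE" "$DEST_DIR/"
}
```

After sourcing this from the shell (console option 8), calling collect_status would run both steps and stop with a non-zero return code if either one fails.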
Thank you everyone for your patience.
-
@kphillips @mcury I wanted to come back after a few hours... I was finally able to warm up, take a nice warm shower, and have a hot meal. Things down in the south (Houston, TX) are not good with this winter storm.
I have not had a hard lockup after uninstalling pfblockerng (Several hours now)
-
@ffuentes Same here.
-
I disabled pfblockerng and it lasts longer but eventually still locks up. Of course, I have many devices connected all the time. I have the serial console connected and will snag the status info once it locks up again.
-
@softcoder I had the same issue, I had to uninstall it. Disabling did nothing for me.
Try uninstalling it.
-
@kphillips I made an attempt...
Since my last post, I followed the manual and reinstalled 21.02 via USB stick. I then restored my previously saved configuration.
After restoring the config and logging in, I had 2 errors:
Feb 18 17:08:43 php-fpm 97269 /rc.update_urltables: : ERROR: could not update pfB_PRI1_v4 content from https://127.0.0.1:443/pfblockerng/pfblockerng.php?pfb=pfB_PRI1_v4 <br />[ Abuse_Feodo_C2_v4, Abuse_IPBL_v4, Abuse_SSLBL_v4, CINS_army_v4, ET_Block_v4, ET_Comp_v4, ISC_1000_30_v4, ISC_Block_v4, Spamhaus_Drop_v4, Spamhaus_eDrop_v4, Talos_BL_v4 ]
Feb 18 17:08:43 php-fpm 97269 /rc.update_urltables: Download file failed with status code 404. URL: https://127.0.0.1:443/pfblockerng/pfblockerng.php?pfb=pfB_PRI1_v4 <br />[ Abuse_Feodo_C2_v4, Abuse_IPBL_v4, Abuse_SSLBL_v4, CINS_army_v4, ET_Block_v4, ET_Comp_v4, ISC_1000_30_v4, ISC_Block_v4, Spamhaus_Drop_v4, Spamhaus_eDrop_v4, Talos_BL_v4 ]
I found under Firewall->Aliases a leftover pfblockerng reference.
I deleted the alias entry and, upon saving, pfSense 'crashed'. I then rebooted via the serial console.
Back online, I tried to access the firewall logs (Status->System Logs->Firewall) and the GUI just hangs. After 90 seconds or so I can access the dashboard again and see:
"Netgate pfSense Plus has detected a crash report or programming bug. Click here for more information."
[18-Feb-2021 18:44:58 America/Los_Angeles] PHP Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 20415890 bytes) in /usr/local/www/csrf/csrf-magic.php on line 161
This is 100% consistent. I cannot access the firewall logs at all (although the widget on the dashboard is working).
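For scale, the limit in the PHP error above works out to 128 MiB, and the allocation that failed is roughly 19 MiB. A quick sketch of that conversion using integer shell arithmetic (the function name bytes_to_mib is just illustrative):

```shell
# Convert a byte count to whole MiB using integer shell arithmetic (rounds down).
bytes_to_mib() {
    echo $(( $1 / 1024 / 1024 ))
}

bytes_to_mib 134217728   # memory limit from the error message
bytes_to_mib 20415890    # size of the allocation that failed
```

So a single request for the firewall log page tried to grab about 19 MiB more when the PHP process was already at its 128 MiB ceiling.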
This appears separate from the lockup this thread deals with.
pfSense ran for about 45 minutes, then crashed. I attempted to run your commands via the serial console, but the first command, php /usr/local/www/status.php, returned "Gathering Status Data..." and hung.
I let it sit for several minutes with no change and no response from the PuTTY console. I closed the connection and reconnected, but now I could not even access the console menu. It was completely locked. I had to reboot via the reset button on the back of the 3100.
Once I rebooted, I was able to run the commands via the console. I have the output, but I don't know if it is what you are looking for.
I'm on the verge of rolling back to 2.4.5
-
I'm fairly certain I've tracked this one down, after spending a few hours trying various attempts to reproduce on my own SG-3100. I have shared my test results with our developer team. I will update when I have more information or a resolution for you all.
-
@kphillips Is this on redmine to track?
-
@kphillips awesome news. Thanks for the update.
-
I can confirm the same results. With the SG-3100 locked up, I ran: php /usr/local/www/status.php
I saw: "Gathering Status Data..."
and it hung. I disconnected and tried to reconnect over serial, but there was no response at all from the SG-3100 anymore, even though the LEDs were normal.
-
I had the same issue until uninstalling pfblockerng
"""
Configuring firewall.Segmentation fault (core dumped)
"""
-
I have the same issue with my SG-3100. Prior to upgrading from 2.4.5p1, I removed all packages (pfBlockerNG-devel, pimd, Avahi, openvpn-client-export). After the upgrade I only reinstalled pimd and Avahi.
After roughly 18 hours, the SG-3100 locked up. It was not accessible and could no longer route traffic. I cycled the power once and it came back online OK. I then rebooted once more just to make sure it had gone through a clean reboot cycle. It's been up and running fine for about 18 hours, although based on this thread, I'm guessing it will randomly happen again until a fix is applied.
-
Since a series of lockups I've been running continuously for 42 hours. However, pfBlockerNG is running; Snort is not running and keeps attempting to update:
Feb 19 00:35:00 php 27294 [Snort] Alert tcpdump packet capture file cleanup job removed 1 tcpdump packet capture file(s) from /var/log/snort/snort_mvneta256093/...
Feb 19 00:30:33 kernel pid 49817 (php), jid 0, uid 0: exited on signal 11 (core dumped)
Feb 19 00:30:30 php 49817 [Snort] Building new sid-msg.map file for WAN...
Feb 19 00:30:29 php 49817 [Snort] Enabling any flowbit-required rules for: WAN...
Feb 19 00:30:29 php 49817 [Snort] Enabling any flowbit-required rules for: WAN...
Feb 19 00:30:23 php 49817 [Snort] Updating rules configuration for: WAN ...
Feb 19 00:30:19 php 49817 [Snort] Emerging Threats Open rules file update downloaded successfully
Feb 19 00:30:18 php 49817 [Snort] There is a new set of Emerging Threats Open rules posted. Downloading emerging.rules.tar.gz...
Feb 19 00:30:17 php 49817 [Snort] Snort GPLv2 Community Rules are up to date...
Feb 19 00:30:17 php 49817 [Snort] Snort AppID Open Text Rules are up to date...
Feb 19 00:30:17 php 49817 [Snort] Snort OpenAppID detectors are up to date...
Feb 19 00:30:15 php 49817 [Snort] Snort Subscriber rules are up to date...
Don't know if this info is helpful.
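In an excerpt like this, the line that matters most is the kernel one reporting the exit on signal 11. A minimal sketch for pulling such lines out of a longer log is below; it reads log text on stdin and assumes the line format shown above. The function name list_segfaults is made up for illustration.

```shell
# Print "pid process" for every kernel line that reports an exit on signal 11 (segfault).
# Reads log text on stdin; lines that do not match the pattern are skipped.
list_segfaults() {
    sed -n 's/.*kernel pid \([0-9][0-9]*\) (\([^)]*\)).*exited on signal 11.*/\1 \2/p'
}
```

It could be fed from the system log, for example list_segfaults < /var/log/system.log, assuming that is where the log lives as plain text on your install.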
-
Hello All,
Just a quick update that our dev team is going to be pulling 21.02 for the SG-3100 shortly. The issue is with the Filter Reload when the firewall is under moderate to heavy load.
If you push moderate to heavy traffic from LAN out your WAN, then go to Status --> Filter Reload and manually reload the filter, it will freeze sometimes. This problem ONLY affects the SG-3100 AFAIK and we'll have a hotfix out for it ASAP. The reason people with pfBlockerNG were seeing this more often is that pfBlockerNG reloads the filter whenever it reloads its Aliases for blocklists it creates, so the issue would crop up more often. This was a tricky one to track down and took a few hours last night to nail, but I finally was able to reproduce it reliably by heavily running iPerf3 traffic from LAN to WAN.
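For anyone who wants to generate the same kind of load, here is a rough sketch of an iperf3 client command, run from a LAN host against an iperf3 server on the WAN side. The helper only prints the command so it can be reviewed first. The server address, the 300 second duration, and the 4 parallel streams are example values, not the settings used in the test described above, and the function name is hypothetical.

```shell
# Print an iperf3 client command that pushes sustained traffic through the firewall.
# $1 = iperf3 server reachable via the WAN side (placeholder address in the example)
# $2 = test duration in seconds (default 300, an arbitrary example value)
# $3 = number of parallel streams (default 4, an arbitrary example value)
iperf3_load_cmd() {
    server="$1"
    seconds="${2:-300}"
    streams="${3:-4}"
    echo "iperf3 -c $server -t $seconds -P $streams"
}

# Example with a documentation-range address standing in for a real server.
iperf3_load_cmd 203.0.113.10
```

While the printed command is running on the LAN host, the filter would be reloaded from Status --> Filter Reload as described above. Expect the firewall to lock up if the bug is present, so only try this with console access at hand.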
I'll update you all again when I have more. If you want to revert to 2.4.5p1, please PM me or open a ticket with support and we can provide an image to roll back with. Otherwise, we hope to push out a patch very, very soon.
Thanks again for your patience everyone and your help with tracking this down.
-
Thank you for your diligence and perseverance!
-
If you push moderate to heavy traffic from LAN out your WAN, then go to Status --> Filter Reload and manually reload the filter, it will freeze sometimes.
FWIW, last night I disabled IP blocking in pfBlockerNG and things have been stable since. Sounds like this may limit the impact of the issue, since hourly filter reloads will no longer occur?
No changes to Firewall rules, skipping Filter Reload
-
@ryanbe said in 21.02 Sudden lockup:
Sounds like this may limit impact of the issue since hourly filter reloads will no longer occur?
Exactly. Whilst it appeared that pfBlocker was causing this initially it now seems that was only because it periodically reloads the ruleset. Without that happening you will probably never see this issue.
Steve
-
@stephenw10 what about SNORT not loading on SG3100 running 21.02?
-
@stephenw10 I've still occasionally seen this but only when running speed tests. That has been a fairly consistent way to replicate the issue even with pfBlockerNG-devel disabled and running dnsmasq vs unbound.
To note, I have also seen the 3100 not wanting to come back up after a hard power reset, and it took 3+ resets and/or running
/etc/rc.reload_all
via console to get things running again or even to get into the Web UI, though pfBlocker/unbound were still the active processes at this time.
-
Exactly. Whilst it appeared that pfBlocker was causing this initially it now seems that was only because it periodically reloads the ruleset. Without that happening you will probably never see this issue.
Well, guess this was wishful thinking on my part. Just crashed right when my kids joined their school Zoom sessions - I had been on a Zoom for 15 minutes with no issues until the additional load was added. pfBlockerNG was not running an update at that time.