21.02 Sudden lockup
-
Does anyone know if Suricata on 21.02 is impacted the same as Snort? Thanks!
-
On the SG-3100 it would be, in blocking mode at least. Like Snort it has to reload the ruleset whenever a new IP is added to the block table.
Steve
-
@stephenw10 ok thanks for the response
Will hold fire.
-
...unless you're seeing this: https://redmine.pfsense.org/issues/11466
That applies to Snort only.
-
Hello, is there an update on this issue?
I'm experiencing major packet loss and unable to download new packages.
I've already added hw.ncpu=1 to /boot/loader.conf.local.
This had no noticeable effect. Our systems are completely degraded by this issue. We cannot handle the risk and downtime required to reinstall. This is a major impact for us.
Thanks...
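For anyone else trying the same workaround, here is a minimal Python sketch of adding a tunable to a loader.conf-style file without creating duplicate entries. The helper name set_loader_tunable is hypothetical (it is not part of pfSense or FreeBSD), and the only assumption about the file is that it holds simple name=value lines with # comments.

```python
from pathlib import Path


def set_loader_tunable(conf_path, name, value):
    """Set `name=value` in a loader.conf-style file.

    Hypothetical helper for illustration only. Replaces the first existing
    uncommented assignment of `name`, drops later duplicates, and appends
    the setting if it is missing. Returns True if the file was changed.
    """
    path = Path(conf_path)
    wanted = f"{name}={value}"
    lines = path.read_text().splitlines() if path.exists() else []

    kept = []
    found = False
    for line in lines:
        stripped = line.strip()
        # Match an existing, uncommented assignment of the same tunable.
        if not stripped.startswith("#") and stripped.split("=", 1)[0].strip() == name:
            if not found:
                kept.append(wanted)
                found = True
            # Skip the old line (and any duplicate assignments).
            continue
        kept.append(line)

    if not found:
        kept.append(wanted)

    if kept == lines:
        return False  # already set exactly as requested
    path.write_text("\n".join(kept) + "\n")
    return True
```

On the firewall this would be called as set_loader_tunable("/boot/loader.conf.local", "hw.ncpu", 1), followed by a reboot. Running sysctl hw.ncpu afterwards shows the CPU count the kernel is actually using, which is a quick way to check whether the setting took effect.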
-
@router Packet loss is not a symptom of this issue. The SG-3100 would completely freeze up and force a reboot. If you have packet loss, it's most likely not 21.02. Check your gateway monitoring.
-
Well, add me to the list of people that downgraded back to 2.4.5p1. And that was quite a hassle/nightmare by itself. Even with pfBlockerNG removed (which was an unsustainable solution for any period of time, of course), the system was still freezing or behaving erratically at times. Getting the packages back to the way they were pre-21.02 did not automatically happen as it should have -- and I had to manually intervene several times. This failed upgrade cost me a couple of days at least of my time, and like others I am very unhappy about that.
I am a software engineer too and understand how very hard it is to test field configurations for an extremely customizable product. So I'm not trying to make anyone at Netgate feel bad, but this was pretty disastrous for many users, and a detailed post mortem explaining what went wrong, why, and how it will be avoided in the future would be hugely appreciated. For example, it appears your QA did not have pfBlockerNG(-devel) (which I would be willing to guess is in very widespread use) properly in its standard performance test suite. I hope that's been rectified.
Thanks for the hard work and responsiveness when things did blow up, particularly you moderators on the front lines absorbing all the screams from your users. And especially to those of you responding while impacted by the much worse disasters in Texas.
-
In case anyone is wondering what the root cause of the SG-3100 locking up was, here is the FreeBSD compiler issue. It has been fixed, and that fix will be used for the fixed release when it comes out. Dev team has been working hard over the weekend on this one.
https://reviews.freebsd.org/D28821
-
@kphillips I have to add my name to the packet loss issue. I've had this SG-3100 since approx 2019 and across multiple ISPs I've only had one instance of packet loss, and that was not pfSense related. After disabling pfBlockerNG-devel I have so far had 1 or 2 complete lockups, and today had 2 instances of 90%+ packet loss over my IPv4 main gateway and the overlying IPv6 over v4 tunnel which exits the same gateway. No CPU spikes that I could see.
-
@bldnightowl Even if we had tested pfBlockerNG-devel, it wouldn't have caused the issue unless the firewall was under moderate to heavy load. This was never pfBlockerNG's fault, but was a problem with the filter reload which pfBlockerNG was triggering more often than was normal. I expect we'll be adding more packet-driven stress tests to our list of things to do in any future releases and will be using any and all problems discovered to improve our testing matrix.
Thank you again to everyone for your patience while we work on this. Have a great weekend and stay safe.
-
@nick108 Please open a ticket with our support team. If you have truly hit a bug, we'd like to know about it so that we can make sure any revised release includes a fix for it.
-
@kphillips Thank you. Are you in a position to be able to share an ETA?
-
Not beyond 'as soon as possible' unfortunately.
We have to confirm that is the cause and that the listed fix works as expected. Then we have to generate new images and test them.
More info to follow as we get it.
Steve
-
@stephenw10 said in 21.02 Sudden lockup:
Not beyond 'as soon as possible' unfortunately.
We have to confirm that is the cause and that the listed fix works as expected. Then we have to generate new images and test them.
More info to follow as we get it.
Steve
Pretty much this. The patch I mentioned earlier was literally put into code less than an hour ago. We don't want to make anything worse by just sending it.
-
@kphillips said in 21.02 Sudden lockup:
@router Packet loss is not a symptom of this issue. The SG-3100 would completely freeze up and force a reboot. If you have packet loss, it's most likely not 21.02. Check your gateway monitoring.
Thanks for the reply. However,
Nothing was changed other than this update.
When pfSense is bypassed all the issues go away.
I guess we are left with no choice other than to revert or switch to OPNsense.
-
@bldnightowl Mission-critical software development, testing, deployment, and support has been my career since 1985. First rule for applying an update: develop and test a contingency plan to ensure you can fall back to a known operational state if anything goes wrong during or after the update.
You can blame Netgate for missing a defect but we are all responsible for ensuring that we follow industry established best practices to ensure a rapid recovery from an unplanned event or disaster. If I upgraded my environment without testing or having a fallback plan and I am suffering from unexpected or degraded performance, I am to blame - not Netgate.
If I don’t have the time or resources to test or recover then I wait to upgrade and monitor forums like this to see if there are any issues that may impact my environment.
Finally, thank you very much to everyone who did discover this issue and provided the valuable information for Netgate to address this issue.
-
@router You should open a new thread for that because what you're seeing there is not the pf reload issue this thread is documenting.
Packet loss from the SG-3100 is not an issue we are aware of, so if you are hitting something new then we need info about that in order to address it.
But you can certainly re-install 2.4.5p1 until we have an update ready. Just open a ticket with us:
https://go.netgate.com/
Steve
-
Is there somewhere in particular we can check to see when this major issue is resolved? Like a release page or email?
-
@router https://redmine.pfsense.org/issues/11444
-
Yup there. But we will also post here and on the blog etc. It should be pretty hard to miss.
Steve