21.02 Sudden lockup
-
@kphillips I have to add my name to the packet loss issue. I've had this SG-3100 since approximately 2019, and across multiple ISPs I've only had one instance of packet loss, and that was not pfSense related. After disabling pfBlockerNG-devel I have so far had 1 or 2 complete lockups, and today had 2 instances of 90%+ packet loss over my main IPv4 gateway and the overlying IPv6-over-IPv4 tunnel, which exits the same gateway. No CPU spikes that I could see.
-
@bldnightowl Even if we had tested pfBlockerNG-devel, it wouldn't have caused the issue unless the firewall was under moderate to heavy load. This was never pfBlockerNG's fault, but was a problem with the filter reload which pfBlockerNG was triggering more often than was normal. I expect we'll be adding more packet-driven stress tests to our list of things to do in any future releases and will be using any and all problems discovered to improve our testing matrix.
Thank you again to everyone for your patience while we work on this. Have a great weekend and stay safe.
-
@nick108 Please open a ticket with our support team. If you have truly hit a bug, we'd like to know about it so that we can make sure any revised release includes a fix for it.
-
@kphillips Thank you. Are you in a position to be able to share an ETA?
-
Not beyond 'as soon as possible' unfortunately.
We have to confirm that is the cause and that the listed fix works as expected, then generate new images and test them.
More info to follow as we get it.
Steve
-
@stephenw10 said in 21.02 Sudden lockup:
Not beyond 'as soon as possible' unfortunately.
We have to confirm that is the cause and that the listed fix works as expected, then generate new images and test them.
More info to follow as we get it.
Steve
Pretty much this. The patch I mentioned earlier was literally put into code less than an hour ago. We don't want to make anything worse by just sending it.
-
@kphillips said in 21.02 Sudden lockup:
@router Packet loss is not a symptom of this issue. The SG-3100 would completely freeze up and force a reboot. If you have packet loss, it's most likely not 21.02. Check your gateway monitoring.
Thanks for the reply. However,
Nothing was changed other than this update.
When pfSense is bypassed, all the issues go away. I guess we are left with no choice other than to revert or switch to OPNsense.
-
@bldnightowl Mission-critical software development, testing, deployment, and support has been my career since 1985. First rule for applying an update: develop and test a contingency plan to ensure you can fall back to a known operational state if anything goes wrong during or after the update.
You can blame Netgate for missing a defect, but we are all responsible for following industry-established best practices to ensure a rapid recovery from an unplanned event or disaster. If I upgraded my environment without testing or having a fallback plan and I am suffering from unexpected or degraded performance, I am to blame, not Netgate.
If I don’t have the time or resources to test or recover then I wait to upgrade and monitor forums like this to see if there are any issues that may impact my environment.
Finally, thank you very much to everyone who did discover this issue and provided the valuable information for Netgate to address this issue.
-
@router You should open a new thread for that because what you're seeing there is not the pf reload issue this thread is documenting.
Packet loss from the SG-3100 is not an issue we are aware of, so if you are hitting something new then we need info about that in order to address it.
But you can certainly re-install 2.4.5p1 until we have an update ready. Just open a ticket with us:
https://go.netgate.com/
Steve
-
Is there somewhere in particular we can check to see when this major issue is resolved? Like a release page or email?
-
@router https://redmine.pfsense.org/issues/11444
-
Yup there. But we will also post here and on the blog etc. It should be pretty hard to miss.
Steve
-
Thank you for getting this resolved. I've been quietly waiting. Thankfully manual restarts haven't caused me too much grief and my remote instance held strong somehow (which gets much more traffic.)
-
Is Snort still not working or is it just not starting correctly?
-
The 21.02p1 update released today addresses only https://redmine.pfsense.org/issues/11444.
By doing so we could get the required testing completed faster and the fix released to impacted users.
The issue with the Snort package is being worked on now. Since that is a package issue it can be fixed outside of the pfSense releases.
Steve
-
@stephenw10 Understood. Be nice if we could communicate a more global situational awareness...
-
@rloeb said in 21.02 Sudden lockup:
Snort
See the Redmine entry for that in this thread.
Netgate: kudos on the super fast turnaround on the SG-3100 fix.
-
@teamits said in 21.02 Sudden lockup:
Netgate: kudos on the super fast turnaround on the SG-3100 fix.
This!
-
Blog post here for more details: https://www.netgate.com/blog/pfsense-obscure-bugs-and-code-wizards.html
How can I find out when the snort issue has been fixed?
Thanks for the fix.
-
@alpharulez said in 21.02 Sudden lockup:
Blog post here for more details: https://www.netgate.com/blog/pfsense-obscure-bugs-and-code-wizards.html
How can I find out when the snort issue has been fixed?
Thanks for the fix.
You can track the Snort issue here: https://redmine.pfsense.org/issues/11466. This one may take several days to figure out and clear up.