Netgate Discussion Forum

    21.02 Sudden lockup

    Scheduled Pinned Locked Moved Official Netgate® Hardware
    164 Posts 30 Posters 51.6k Views
    • A
      alpharulez @stephenw10
      last edited by

      @stephenw10 ok thanks for the response 👍
      Will hold fire.

      • stephenw10S
        stephenw10 Netgate Administrator
        last edited by

        ....unless you're seeing this: https://redmine.pfsense.org/issues/11466
        That applies to Snort only.

        • R
          router
          last edited by

          Hello, is there an update on this issue?

          I'm experiencing major packet loss and unable to download new packages.

          I've already added hw.ncpu=1 to /boot/loader.conf.local.
          This had no noticeable effect.
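For anyone wanting to confirm the tunable is actually present in the file, here is a minimal sketch (not from this thread) that reads loader.conf-style text and reports the value set for `hw.ncpu`. The function name `read_tunable` and the sample text are hypothetical, and the parsing assumes simple `name=value` lines with optional quotes and `#` comments.

```python
def read_tunable(conf_text, name):
    """Return the last value assigned to `name` in loader.conf-style text, or None."""
    value = None
    for raw in conf_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, _, val = line.partition("=")
        if key.strip() == name:
            # Later assignments override earlier ones; quotes are optional.
            value = val.strip().strip('"')
    return value


# Hypothetical file contents, for illustration only.
sample = '# local overrides\nhw.ncpu=1\nkern.hz="1000"\n'
print(read_tunable(sample, "hw.ncpu"))
```

On a real system the text would come from reading /boot/loader.conf.local; a result of `None` would mean the line never made it into the file.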

          Our systems are completely degraded by this issue. We cannot handle the risk and downtime required to reinstall. This is a major impact for us.

          Thanks...

          • K
            kphillips Administrator Netgate @router
            last edited by

            @router Packet loss is not a symptom of this issue. The SG-3100 would completely freeze up and force a reboot. If you have packet loss, it's most likely not 21.02. Check your gateway monitoring.
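As a rough illustration of putting a number on packet loss independently of the GUI, the sketch below computes a loss percentage from a ping summary line. It is not part of pfSense's gateway monitoring; the function name `loss_percent` is hypothetical, and the summary format ("N packets transmitted, M packets received") is an assumption based on typical `ping` output.

```python
import re


def loss_percent(summary):
    """Compute packet loss (%) from a ping summary line."""
    m = re.search(r"(\d+) packets transmitted, (\d+) (?:packets )?received", summary)
    if m is None:
        raise ValueError("unrecognized ping summary")
    sent, received = int(m.group(1)), int(m.group(2))
    if sent == 0:
        return 0.0  # nothing sent, so nothing can be counted as lost
    return 100.0 * (sent - received) / sent


# Hypothetical summary line, for illustration only.
print(loss_percent("10 packets transmitted, 7 packets received, 30.0% packet loss"))
```

Comparing this figure for the gateway address against one for a host beyond it can help show whether the loss is on the firewall or upstream.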

            • B
              bldnightowl
              last edited by

              Well, add me to the list of people that downgraded back to 2.4.5p1. And that was quite a hassle/nightmare by itself. Even with pfBlockerNG removed (which was an unsustainable solution for any period of time, of course), the system was still freezing or behaving erratically at times. Getting the packages back to the way they were pre-21.02 did not happen automatically as it should have, and I had to manually intervene several times. This failed upgrade cost me at least a couple of days of my time, and like others I am very unhappy about that.

              I am a software engineer too and understand how very hard it is to test field configurations for an extremely customizable product. So I'm not trying to make anyone at Netgate feel bad, but this was pretty disastrous for many users, and a detailed post-mortem explaining what went wrong, why, and how it will be avoided in the future would be hugely appreciated. For example, it appears your QA did not have pfBlockerNG(-devel) (which I would be willing to guess is in very widespread use) properly in its standard performance test suite. I hope that's been rectified.

              Thanks for the hard work and responsiveness when things did blow up, particularly you moderators on the front lines absorbing all the screams from your users. And especially to those of you responding while impacted by the much worse disasters in Texas.

              • K
                kphillips Administrator Netgate
                last edited by

                In case anyone is wondering what the root cause of the SG-3100 locking up was, here is the FreeBSD compiler issue. It has been fixed, and that fix will be used for the fixed release when it comes out. Dev team has been working hard over the weekend on this one.

                https://reviews.freebsd.org/D28821

                • N
                  nick108 @kphillips
                  last edited by nick108

                  @kphillips I have to add my name to the packet loss issue. I've had this SG-3100 since approx 2019 and across multiple ISPs I've only had one instance of packet loss, and that was not pfSense related. After disabling pfBlockerNG-devel I have so far had 1 or 2 complete lockups, and today had 2 instances of 90%+ packet loss over my IPv4 main gateway and the overlying IPv6-over-v4 tunnel which exits the same gateway. No CPU spikes that I could see.

                  • K
                    kphillips Administrator Netgate @bldnightowl
                    last edited by

                    @bldnightowl Even if we had tested pfBlockerNG-devel, it wouldn't have caused the issue unless the firewall was under moderate to heavy load. This was never pfBlockerNG's fault, but was a problem with the filter reload which pfBlockerNG was triggering more often than was normal. I expect we'll be adding more packet-driven stress tests to our list of things to do in any future releases and will be using any and all problems discovered to improve our testing matrix.

                    Thank you again to everyone for your patience while we work on this. Have a great weekend and stay safe.

                    • K
                      kphillips Administrator Netgate @nick108
                      last edited by

                      @nick108 Please open a ticket with our support team. If you have truly hit a bug, we'd like to know about it so that we can make sure any revised release includes a fix for it.

                      • A
                        alpharulez @kphillips
                        last edited by alpharulez

                        @kphillips Thank you. Are you in a position to be able to share an ETA?

                        • stephenw10S
                          stephenw10 Netgate Administrator
                          last edited by

                          Not beyond 'as soon as possible' unfortunately.

                          We have to confirm that is the cause and that the listed fix works as expected, then generate new images and test them.

                          More info to follow as we get it.

                          Steve

                          • K
                            kphillips Administrator Netgate @stephenw10
                            last edited by

                            @stephenw10 said in 21.02 Sudden lockup:

                            Not beyond 'as soon as possible' unfortunately.

                            We have to confirm that is the cause and that the listed fix works as expected, then generate new images and test them.

                            More info to follow as we get it.

                            Steve

                            Pretty much this. The patch I mentioned earlier was literally put into code less than an hour ago. We don't want to make anything worse by just sending it.

                            • R
                              router @kphillips
                              last edited by

                              @kphillips said in 21.02 Sudden lockup:

                              @router Packet loss is not a symptom of this issue. The SG-3100 would completely freeze up and force a reboot. If you have packet loss, it's most likely not 21.02. Check your gateway monitoring.

                              Thanks for the reply. However,

                              Nothing was changed other than this update.
                              When pfSense is bypassed, all the issues go away.

                              I guess we are left with no choice other than to revert or switch to OPNsense.

                              • MaxK 0M
                                MaxK 0 @bldnightowl
                                last edited by

                                @bldnightowl Mission-critical software development, testing, deployment, and support has been my career since 1985. First rule for applying an update: develop and test a contingency plan to ensure you can fall back to a known operational state if anything goes wrong during or after the update.

                                You can blame Netgate for missing a defect, but we are all responsible for following industry-established best practices to ensure a rapid recovery from an unplanned event or disaster. If I upgraded my environment without testing or having a fallback plan and I am suffering from unexpected or degraded performance, I am to blame, not Netgate.

                                If I don’t have the time or resources to test or recover then I wait to upgrade and monitor forums like this to see if there are any issues that may impact my environment.

                                Finally, thank you very much to everyone who did discover this issue and provided the valuable information for Netgate to address this issue.

                                • stephenw10S
                                  stephenw10 Netgate Administrator @router
                                  last edited by

                                  @router You should open a new thread for that because what you're seeing there is not the pf reload issue this thread is documenting.
                                  Packet loss from the SG-3100 is not an issue we are aware of, so if you are hitting something new then we need info about that in order to address it.
                                  But you can certainly re-install 2.4.5p1 until we have an update ready. Just open a ticket with us:
                                  https://go.netgate.com/

                                  Steve

                                  • R
                                    router
                                    last edited by

                                    Is there somewhere in particular we can check to see when this major issue is resolved? Like a release page or email?

                                    • S
                                      softcoder @router
                                      last edited by

                                      @router https://redmine.pfsense.org/issues/11444

                                      • stephenw10S
                                        stephenw10 Netgate Administrator
                                        last edited by

                                        Yup there. But we will also post here and on the blog etc. It should be pretty hard to miss.

                                        Steve

                                        • J
                                          jkibbey
                                          last edited by

                                          Thank you for getting this resolved. I've been quietly waiting. Thankfully manual restarts haven't caused me too much grief and my remote instance held strong somehow (which gets much more traffic.)

                                          • R
                                            rloeb
                                            last edited by

                                            Is Snort still not working or is it just not starting correctly?

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.