Netgate Discussion Forum

    SG-3100 stops responding every 2 days on 24.03

    General pfSense Questions
    • A
      andrew_cb
      last edited by

      We have a 3100 that was upgraded to 24.03 on May 28th. Since then it has stopped responding roughly every 2 days and has to be power-cycled to get it working again. This has happened 3 times now.

      The last entries in the system log are

      kernel: fq_codel_new_sched cannot allocate memory for fq_codel configuration parameters
      kernel: si_new     new_sched error
      

      CoDel limiters were configured, but yesterday I deleted the traffic limiters and floating firewall rules, which obviously didn't help.

      The firewall is monitored using Zabbix, and the graphs show that wired memory usage climbs steadily from reboot until the firewall stops responding:
      a0f04857-c3af-4b63-8809-d564220fc635-Memory wired.png

      The free memory is above 1.2GB during this time:
      a79ede3c-54cc-4307-a028-6f44cffa73b2-Memory free.png

      Other graphs don't seem to indicate anything that would explain this (MBUFs and states remain relatively flat).

      I've attached the other Zabbix graphs in case they are useful:
      graphs.zip

      After reboot, the dashboard had a PHP crash error log that contained 3 lines and then 8 lines of

      PHP Fatal error:  Unable to start pfSense module in Unknown on line 0
      

      I also have the logs from the device, but I don't want to post them publicly as I'd need to go through and remove any sensitive information first. I can PM them to someone if they want to have a look.

      • stephenw10S
        stephenw10 Netgate Administrator
        last edited by

        Run top -HaSP and sort by memory usage. See what is growing.

        • A
          andrew_cb @stephenw10
          last edited by andrew_cb

          @stephenw10
          I see that since the upgrade to 24.03, the system log is full of these messages every 10-30 seconds:

          ugen1.2: <CPS ST Series> at usbus1
          ugen1.2: <CPS ST Series> at usbus1 (disconnected)
          

          From what I can tell, this is a Cyberpower UPS connected to the USB port. I've found some reports that some USB devices, including some Cyberpower units, will reset if they don't establish a connection within a certain amount of time.

          Could this repeated USB connection & disconnection cause a memory leak of some kind?
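
To get a rough measure of how often the device is flapping, the disconnect messages can be counted per device. This is a minimal shell sketch, not something posted in the thread: the function name count_usb_disconnects is hypothetical, and it assumes the log lines look like the two shown above and can be read as plain text on stdin.

```shell
# Count USB "(disconnected)" events per ugen device.
# Reads syslog-style lines on stdin and prints "<device> <count>" per device.
# Hypothetical helper; assumes the message format shown in the post above.
count_usb_disconnects() {
    grep -E 'ugen[0-9]+\.[0-9]+: .*\(disconnected\)' |
        sed -E 's/.*(ugen[0-9]+\.[0-9]+):.*/\1/' |
        sort | uniq -c | awk '{ print $2, $1 }'
}
```

It could be run as `count_usb_disconnects < /var/log/system.log`, assuming that is where the system log is stored on the install in question. A count that keeps rising between runs would line up with the connect/disconnect cycle described here.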

          • stephenw10S
            stephenw10 Netgate Administrator
            last edited by

            Potentially it could. Try unplugging it and see if the leak stops.

            I assume you are running NUT? Do you have the current package version installed?

            • A
              andrew_cb @stephenw10
              last edited by

              @stephenw10 The UPS is at a remote site, so I can't unplug the UPS.
              I was able to disable the USB port using usbconfig and that has stopped the log entries.

              I tried to set up NUT but it wouldn't connect to the UPS. I don't know whether that was because the USB connection kept resetting so frequently or whether it is unrelated. I tried changing the polling settings to some suggestions I found but that didn't help. NUT is currently uninstalled.

              • stephenw10S
                stephenw10 Netgate Administrator
                last edited by

                Hmm, OK, interesting. Well I guess you'll know in a few hours.

                • S
                  SteveITS Galactic Empire @andrew_cb
                  last edited by

                  @andrew_cb I vaguely recall posts about Zabbix being a problem of some kind, but I don't use it and couldn't find it in a quick search. This may just be a red herring, and if so I apologize, but you could try disabling that for a while and just looking at pfSense's graphs.

                  Pre-2.7.2/23.09: Only install packages for your version, or risk breaking it. Select your branch in System/Update/Update Settings.
                  When upgrading, allow 10-15 minutes to restart, or more depending on packages and device speed.
                  Upvote 👍 helpful posts!

                  • stephenw10S
                    stephenw10 Netgate Administrator
                    last edited by

                    Could be this if Zabbix is using SNMP: https://redmine.pfsense.org/issues/15481

                    • A
                      andrew_cb @SteveITS
                      last edited by andrew_cb

                      @SteveITS @stephenw10

                      We have Zabbix on 40 other Netgates without issue, including two SG-3100 running 24.03.

                      I don't think we're doing any SNMP monitoring, just Zabbix Agent (active) and Zabbix Proxy both running on the firewalls.

                      Memory usage is holding flat (it has actually decreased slightly), so it might be that disabling USB to work around the UPS issue is the fix?

                      It will be interesting to see how it looks tomorrow morning.

                      • stephenw10S
                        stephenw10 Netgate Administrator
                        last edited by

                        Mmm, interesting indeed. It's not something we'd ever normally see so it could have a leak that's simply never been hit.

                        • A
                          andrew_cb
                          last edited by

                          So disabling the "flapping" USB seems to have resolved the escalating memory usage. I don't know if the non-responsive issue is resolved though, as we replaced the affected unit earlier today with a 4100 because we couldn't "see how it goes" and risk any further interruptions and downtime at this customer.

                          Past 90 days. The memory usage on all 10 of our SG-3100 units was flat and nearly identical.
                          09873ac9-cd45-4e70-93a7-ff5a7487662f-image.png

                          Past 11 days. The affected unit was upgraded on 05/28 and the USB ports were disabled on 06/04, and the memory usage remained flat afterward.
                          89a77b27-b0f1-4c59-bfa3-0a3adf9543d1-image.png

                          I should be able to play with the 3100 next week and will try to reproduce the issue on the test bench. Hopefully, that will shed light on what's happening and lead to identifying the root cause.

                          • stephenw10S
                            stephenw10 Netgate Administrator
                            last edited by

                            Interesting. How did you disable the USB exactly in that case?

                            • A
                              andrew_cb
                              last edited by

                              I don't recall the exact command but it was something like

                              usbconfig -i ugen0.2 detach_kernel_driver
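
For reference, usbconfig(8) on FreeBSD selects the device with -d and the interface with -i, so the recalled command above probably had a slightly different form. A hedged sketch follows. The wrapper name detach_usb_driver is hypothetical, interface index 0 is an assumed default, and the ugen1.2 used in the usage note is taken from the log lines earlier in the thread rather than from the command that was actually run.

```shell
# Detach the kernel driver from one interface of a USB device.
# $1 = device as reported in the log (e.g. ugen1.2)
# $2 = interface index (assumed default: 0)
# Hypothetical wrapper around usbconfig(8); needs root on the firewall.
detach_usb_driver() {
    dev="$1"
    iface="${2:-0}"
    usbconfig -d "$dev" -i "$iface" detach_kernel_driver
}
```

It would be called as `detach_usb_driver ugen1.2`. Running `usbconfig list` first shows which ugen address the UPS currently has, since that may differ from the one in older log entries.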
                              
                              Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.