Netgate Discussion Forum

    SG3100 keeps locking up after latest update

    Official Netgate® Hardware
    74 Posts 8 Posters 13.6k Views
    • stephenw10 Netgate Administrator

      The biggest thing that makes me think it's something unique to that install/location is that there are a lot of 3100s running 23.05.1 and if this were common to all 23.0X installs we would be flooded with support tickets.
      It almost has to be some combination of unusual things in that specific setup. Testing either of those units in a different location would confirm that.

      One other thing we could try is using the debug kernel. If there is some issue it might throw some additional errors before it stops responding. I wouldn't really expect to see anything else when it stops though as it logs nothing at all currently.

      Steve

      • tuser11 @stephenw10

        It's been over a month since the last lockup after consistently locking up 2-4 times a month. There was also an un-managed switch that was randomly failing (3 switches downstream of the router and only managing traffic for 2 computers so seemingly unrelated) and it magically stopped failing. I've only been troubleshooting networks for ~9 years and it's not my primary job. I've never seen hardware problems go away on their own while usage remains the same.

        The only things that have happened since the last lockup:

        • Kicked everyone off wifi networks (even separate guest wifi) for a few days after the last failure. The networks are segregated and firewalled but without high confidence in my log analysis, this seemed like a fair step.
        • Publicly announced plans in the office, after getting the green light from the owner, to start locking down the network if the problem persisted (every device would have to be registered as a MAC and static IP pair before being allowed on the network).
        • Re-allowed everyone to use wifi as usual with the knowledge that the network will eventually be locked down (no more personal devices able to easily get on) to help isolate the mysterious problems if they kept occurring.

        No changes were made, just made plans for next move. All hardware and general usage has remained unchanged and in over a month, not 1 failure.

        Any ideas about how the issues mysteriously went away (or at least haven't happened in ~38 days)?

        • stephenw10 Netgate Administrator

          Some general power issue maybe?

          • tuser11 @stephenw10

            @stephenw10 Not sure, since power hasn't changed to my knowledge. And I would expect power issues at that scale to also affect other equipment. For example, the issue with the switch was only going on for about a month, whereas the router issues date back to before the summer.
            My initial hunch was power user or script kiddie based on the environment and employee history.

            • netplumbers @stephenw10

              @stephenw10
              I continue to have similar issues to this poster. They seemed to improve, but were not eliminated, after the power supply was replaced. I think there is something more going on. However, with the end of support for the SG-3100 it may be time to shift platforms.

              • stephenw10 Netgate Administrator

                Potentially it could be some DoS attack. Something that could make a switch appear to be failing could certainly affect a 3100. Though I'd expect that to require a volume of traffic that could not go unnoticed.

                • tuser11 @stephenw10

                  I've been trying to settle on better monitoring tools to make monitoring, keeping history and alerting on condition changes in traffic easier. If you have suggestions I'm all ears. Security Onion is proving to be too much for someone who doesn't have enough time to figure out the appropriate configurations to be sure I'm properly monitoring. ntopng randomly fails. My Graylog server for traffic from the firewall never shows any significant changes but that's no confirmation that there isn't a problem.

                  Snort felt like a bit of a waste considering I would get alerts about issues but couldn't confirm if there really was a problem (erroneous FTP activity alerts when FTP activity itself seems valid). I may go back to focusing on Snort or Suricata for intrusion detection but still need a good solution for monitoring abnormal network usage.
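For the "abnormal network usage" part, one lightweight approach is to compare each new traffic sample against a rolling baseline and flag large upward deviations. This is only a rough Python sketch, assuming a periodic counter (for example bytes per polling interval) is already being exported from the firewall by some other means. The function name `flag_spikes` and the `window`, `threshold` and `min_spread` defaults are illustrative choices, not something taken from Snort, Suricata, ntopng or Graylog.

```python
from collections import deque
from statistics import mean, stdev


def flag_spikes(samples, window=12, threshold=3.0, min_spread=1.0):
    """Return the indices of samples that sit far above the rolling baseline.

    samples    -- iterable of numeric traffic readings, oldest first
    window     -- how many previous readings form the baseline (assumed value)
    threshold  -- how many standard deviations above the mean counts as a spike
    min_spread -- floor for the standard deviation so a perfectly flat
                  baseline does not cause a division by zero
    """
    history = deque(maxlen=window)
    flagged = []
    for index, value in enumerate(samples):
        # Only judge a sample once a full window of history exists.
        if len(history) == window:
            baseline = mean(history)
            spread = max(stdev(history), min_spread)
            if (value - baseline) / spread > threshold:
                flagged.append(index)
        # Every sample, flagged or not, becomes part of later baselines.
        history.append(value)
    return flagged
```

A check like this only says that something changed, not why, so it would complement an IDS rather than replace one.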

                  For power management, everything is on APC UPS. Most of the server equipment is on more sensitive versions of APC (pure sine instead of step). I don't think I'll get access to anything better to help protect against building power issues.

                  • stephenw10 Netgate Administrator

                    Mmm, if it was either power or DoS it would have to be really obscure to not show up there. I'm not sure what else might cause this though.

                    • netplumbers @stephenw10

                      @stephenw10
                      Indeed - part of the problem with this (assuming I have the same issue as the OP) is that there are no logs indicating anything, even on the console. I do have a security onion installation capturing logs from the sg-3100 and monitoring all traffic on the inside interface and it indicated nothing interesting around the time of the failures. I had another lockup just this week. And, power is solid here with UPS, other devices sharing the same UPS feed that are happy and other power monitoring that showed no changes in voltage or frequency around the failures.

                      • stephenw10 Netgate Administrator

                        Potentially it could be some hardware failure common to those devices, but if it is, it's not something we've seen more widely. And the fact it only seemed to happen after upgrading implies either software, or something in hardware that the upgrade started hitting. But again I'd expect to see it far more widespread if so. It could be a combination of uncommon config and hardware.

                        • SteveITS Galactic Empire @netplumbers

                          @netplumbers do you have (not service) Watchdog enabled or disabled? (https://docs.netgate.com/pfsense/en/latest/config/advanced-misc.html#watchdog) It may be completely unrelated but in the past couple years we’ve had I think 3-4 incidents of client 3100 routers rebooting for no apparent reason. One was twice, and after the second time I turned off the RAM disk to better capture any logs but it hasn’t happened in the year or so after that. Several other 3100s without issue.

                          Pre-2.7.2/23.09: Only install packages for your version, or risk breaking it. Select your branch in System/Update/Update Settings.
                          When upgrading, allow 10-15 minutes to restart, or more depending on packages and device speed.
                          Upvote 👍 helpful posts!

                          • netplumbers @SteveITS

                            @SteveITS said in SG3100 keeps locking up after latest update:

                            https://docs.netgate.com/pfsense/en/latest/config/advanced-misc.html#watchdog

                            I don't want to hijack this thread but, yes, watchdog is enabled with a 128s timeout. When I experience this lockup, the box sometimes reboots and sometimes doesn't. I was experiencing this with increasing frequency, getting to every couple of days, before replacing the 3100's power supply on TAC's advice. Now I experience the issue every few months. I think I had 49 days of uptime until it hard locked a few days ago and required a power cycle to return to service. Perhaps if I had waited longer the watchdog would have rebooted the box.

                            • stephenw10 Netgate Administrator

                              Mmm, it's an interesting question. If the watchdog is enabled then it should have rebooted itself rather than locked up requiring manual intervention. The fact it didn't implies that the watchdog process is still running. Or that the hardware suffered something so low level that even the watchdog stopped, which seems very unlikely.

                              • SteveITS Galactic Empire @netplumbers

                                @netplumbers, @tuser11 Just in case you were unaware, FreeBSD 15 will remove 32 bit ARM support so the 3100s will eventually need replacing anyway. Not really a "solution" to the problem at hand, but it presumably won't follow to new hardware...

                                • netplumbers @SteveITS

                                  @SteveITS said in SG3100 keeps locking up after latest update:

                                  @netplumbers, @tuser11 Just in case you were unaware, FreeBSD 15 will remove 32 bit ARM support so the 3100s will eventually need replacing anyway. Not really a "solution" to the problem at hand, but it presumably won't follow to new hardware...

                                  Yes, I'm waiting on the end state of the home-lab fallout to decide what/when to replace it. I was about to replace the 3100 before this announcement.

                                  • tuser11

                                    How do you all deal with e-waste? I've been holding off on upgrading because we have 2 units. I always buy 2 as it's cheaper to have 1 on standby to swap out immediately than to deal with troubleshooting hardware failure in production. I guess it's technically the same problem we all face with phones, laptops, server hard drives, etc. It's not totally related to this thread but it's part of my hangup when getting ready to buy new equipment. I've started evaluating/testing moving our local servers from the Dell racks to ATX boxes or rack mountable ATX so we don't have so much waste when we just need a CPU/etc upgrade.

                                    In this case, just the SG-3100 cpu needs to be tossed but instead we'll have to toss the whole thing. That's 2 units to the garbage. I don't really want to go back to virtualization of pfsense and Netgate doesn't seem to sell any evergreen appliances where just the appropriate components can be upgraded.

                                    • SteveITS Galactic Empire @tuser11

                                      @tuser11 said in SG3100 keeps locking up after latest update:

                                      e-waste

                                      As an MSP we recover many PCs that clients replace. We have an arrangement with a local e-waste recycler: we are a drop-off location for them, and they will come to our office for a pick-up occasionally. Since they charge a fee for some items, we get a small percentage of that, and probably break even with our time.

                                      Very few devices can just be upgraded to "current"...CPU sockets, memory sockets, drive tech, etc. all change frequently, so trying to upgrade something in a 7 year old PC/device would basically just be trying to find a replacement part from 5-7 years ago. By the time one replaces a motherboard/CPU/RAM/drive it would have been better to just start over. I suppose in that sense virtualization is a big way to eliminate e-waste since only the host hardware needs to be swapped out.

                                      • SteveITS Galactic Empire @SteveITS

                                        @SteveITS said in SG3100 keeps locking up after latest update:

                                        we’ve had I think 3-4 incidents of client 3100 routers rebooting for no apparent reason

                                        One happened today, I think the first time for this client. I haven't asked but I doubt anyone was there at 7:34 am when it booted:

                                        Dec 1 07:34:20 	kernel 		Copyright (c) 1992-2023 The FreeBSD Project.
                                        Dec 1 07:34:20 	kernel 		KDB: current backend: ddb
                                        Dec 1 07:34:20 	kernel 		KDB: debugger backends: ddb gdb
                                        Dec 1 07:34:20 	kernel 		GDB: current port: uart
                                        Dec 1 07:34:20 	kernel 		GDB: debug ports: uart
                                        Dec 1 07:34:20 	kernel 		---<<BOOT>>---
                                        Dec 1 07:34:20 	syslogd 		kernel boot file is /boot/kernel/kernel
                                        Dec 1 07:30:07 	sshd 	60046 	banner exchange: Connection from 192.168.16.5 port 63668: invalid format
                                        Dec 1 07:30:07 	sshd 	60046 	error: Fssh_kex_exchange_identification: client sent invalid protocol identifier " "
                                        Dec 1 06:21:43 	kernel 		mvneta1: promiscuous mode enabled
                                        Dec 1 06:21:33 	php-cgi 	58330 	[Suricata] The Rules update has finished.
                                        

                                        The 7:30 entry is a network probe/scan, and benign.
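The time between the last entry logged before the reboot and the boot marker can be measured straight from an excerpt like the one above. Below is a minimal Python sketch for doing that. The helper names `parse_stamp` and `silence_before_boot` are made up for illustration, and the year is an assumption passed in by the caller, because these syslog-style lines do not carry one.

```python
from datetime import datetime

BOOT_MARKER = "---<<BOOT>>---"


def parse_stamp(line, year):
    """Parse the leading 'Mon D HH:MM:SS' fields of a syslog-style line."""
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")


def silence_before_boot(lines, year=2023):
    """Return the gap between the boot marker and the newest earlier entry.

    Returns None when there is no boot marker or nothing logged before it.
    The year default is an assumption; pass the real one for your logs.
    """
    entries = [(parse_stamp(line, year), line) for line in lines if line.strip()]
    boot_times = [stamp for stamp, line in entries if BOOT_MARKER in line]
    if not boot_times:
        return None
    boot = min(boot_times)
    earlier = [stamp for stamp, _ in entries if stamp < boot]
    if not earlier:
        return None
    return boot - max(earlier)
```

On the excerpt above this gives 4 minutes 13 seconds (07:30:07 to 07:34:20). That is only the window in which the unit logged nothing. It does not show when within that window the unit actually stopped responding.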

                                        • stephenw10 Netgate Administrator

                                          Hmm, does that scan take longer than 5 mins?

                                          • SteveITS Galactic Empire @stephenw10

                                            @stephenw10 It's a port scan to find new PCs/devices on the network. So the computer doing it will take a while to get through the subnet but it shouldn't spend very long on each IP address.

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.