Netgate Discussion Forum

    SG-3100 hangs every 1-2 days

    Official Netgate® Hardware
    7 Posts 3 Posters 528 Views
    • victorhooi

      Hi,

      I have several Netgate SG-3100 units.

      One of those units appears to lock up or hang every 1-2 days, which is quite strange.

      It is running 2.5.0-devel (as are several other units), but I'm not clear if it's a hardware issue, or a software issue. I have done updates to the latest DEV builds, and the issue persists.

      Symptoms:

      1. All LAN hosts lose connectivity.
      2. Unable to connect to the pfSense Web UI.
      3. If I connect via the Mini-USB console plug, the unit appears unresponsive.
      4. The front lights are flashing sequentially in blue, and the rear LAN activity lights are flashing.

      I have excerpted /var/log/system.log below. The router appears to have hung sometime around 02:54, and we then rebooted it at 06:10 (by pulling the power plug and reinserting it).

      Oct 15 02:42:59 foobarrouter sshguard[90086]: Attack from "167.99.131.243" on service SSH with danger 2.
      Oct 15 02:47:05 foobarrouter sshd[41617]: Did not receive identification string from 222.161.223.147 port 43260
      Oct 15 02:47:05 foobarrouter sshguard[90086]: Attack from "222.161.223.147" on service SSH with danger 10.
      Oct 15 02:48:30 foobarrouter sshd[16776]: Unable to negotiate with 218.92.0.185 port 37838: no matching key exchange method found. Their offer: diffie-hellman-group1-sha1,diffie-hellman-group14-sha1,diffie-hellman-group-exchange-sha1 [preauth]
      Oct 15 02:48:30 foobarrouter sshguard[90086]: Attack from "218.92.0.185" on service SSH with danger 10.
      Oct 15 02:50:46 foobarrouter sshd[93530]: Connection closed by 51.210.14.124 port 53144 [preauth]
      Oct 15 02:50:46 foobarrouter sshguard[90086]: Attack from "51.210.14.124" on service SSH with danger 2.
      Oct 15 02:52:20 foobarrouter sshd[33215]: Connection closed by 167.172.52.225 port 42462 [preauth]
      Oct 15 02:52:20 foobarrouter sshguard[90086]: Attack from "167.172.52.225" on service SSH with danger 2.
      Oct 15 02:53:05 foobarrouter sshd[69645]: Unable to negotiate with 122.194.229.122 port 59920: no matching key exchange method found. Their offer: diffie-hellman-group1-sha1,diffie-hellman-group14-sha1,diffie-hellman-group-exchange-sha1 [preauth]
      Oct 15 02:53:05 foobarrouter sshguard[90086]: Attack from "122.194.229.122" on service SSH with danger 10.
      Oct 15 02:54:14 foobarrouter rc.gateway_alarm[7799]: >>> Gateway alarm: OPT8GW (Addr:110.175.245.125 Alarm:1 RTT:14.945ms RTTsd:4.263ms Loss:22%)
      Oct 15 02:54:14 foobarrouter check_reload_status[386]: updating dyndns OPT8GW
      Oct 15 02:54:14 foobarrouter check_reload_status[386]: Restarting ipsec tunnels
      Oct 15 02:54:14 foobarrouter check_reload_status[386]: Restarting OpenVPN tunnels/interfaces
      Oct 15 02:54:14 foobarrouter check_reload_status[386]: Reloading filter
      Oct 15 06:20:02 foobarrouter syslogd: kernel boot file is /boot/kernel/kernel
      Oct 15 06:20:02 foobarrouter kernel: ---<<BOOT>>---
      Oct 15 06:20:02 foobarrouter kernel: Copyright (c) 1992-2020 The FreeBSD Project.
      Oct 15 06:20:02 foobarrouter kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
      Oct 15 06:20:02 foobarrouter kernel:         The Regents of the University of California. All rights reserved.
      Oct 15 06:20:02 foobarrouter kernel: FreeBSD is a registered trademark of The FreeBSD Foundation.
      Oct 15 06:20:02 foobarrouter kernel: FreeBSD 12.2-STABLE 56359d090cf(factory-devel-12) pfSense-SG-3100 arm
      Oct 15 06:20:02 foobarrouter kernel: FreeBSD clang version 10.0.1 (git@github.com:llvm/llvm-project.git llvmorg-10.0.1-0-gef32c611aa2)
      Oct 15 06:20:02 foobarrouter kernel: CPU: ARM Cortex-A9 r4p1 (ECO: 0x00000000)
      Oct 15 06:20:02 foobarrouter kernel: CPU Features: 
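One way to pin down when the unit went silent is to look for the largest gap between consecutive timestamps in system.log. The following is a minimal sketch, not anything pfSense ships: the function names are made up for illustration, and the year is an assumption (syslog lines in this format carry no year).

```python
from datetime import datetime, timedelta


def parse_timestamp(line, year=2020):
    """Parse a leading 'Oct 15 02:54:14' style timestamp; return None if absent.

    The year is an assumption, since this syslog format does not record it.
    """
    try:
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def largest_gap(lines, year=2020):
    """Return (gap, last_line_before_gap, first_line_after_gap)."""
    best = (timedelta(0), None, None)
    prev_ts, prev_line = None, None
    for line in lines:
        line = line.strip()
        ts = parse_timestamp(line, year)
        if ts is None:
            continue  # skip lines without a timestamp
        if prev_ts is not None and ts - prev_ts > best[0]:
            best = (ts - prev_ts, prev_line, line)
        prev_ts, prev_line = ts, line
    return best
```

Fed the excerpt above, this would point at the silence between the "Reloading filter" line at 02:54:14 and the first boot message at 06:20:02.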
      
      • Gertjan @victorhooi

        @victorhooi said in SG-3100 hangs every 1-2 days:

        (pulling the power plug and reinserting)

        This step is advisable.

        Edit: and give sshguard a break: remove all firewall "SSH" rules on WAN; only an incoming VPN rule should be there.
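To see how much SSH noise is reaching the WAN before and after tightening the rules, the sshguard lines in system.log can be tallied per source address. This is a rough sketch that assumes the log format shown in the excerpt above; the regular expression and function name are illustrative only.

```python
import re
from collections import Counter

# Matches lines like:
#   sshguard[90086]: Attack from "167.99.131.243" on service SSH with danger 2.
ATTACK_RE = re.compile(
    r'sshguard\[\d+\]: Attack from "([^"]+)" on service (\S+) with danger (\d+)'
)


def summarize_attacks(lines):
    """Return a Counter mapping source IP to its summed danger score."""
    totals = Counter()
    for line in lines:
        match = ATTACK_RE.search(line)
        if match:
            totals[match.group(1)] += int(match.group(3))
    return totals
```

Calling `summarize_attacks(open("/var/log/system.log"))` and then `.most_common(10)` on the result would list the noisiest sources; an empty result after the rule change suggests the WAN is no longer exposing SSH.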

        No "help me" PM's please. Use the forum, the community will thank you.
        Edit: and where are the logs??

        • stephenw10 Netgate Administrator

          Connect the serial console to something and log that if you can. That will show if something is throwing an error and causing the reboot.
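If nothing purpose-built is at hand for logging the console, a small script that prefixes each console line with the time it arrived can help line up the last output with the moment of the hang. The sketch below assumes the serial port has already been configured by other means (for example a terminal program); the function names, device path, and log path are placeholders, not anything from pfSense.

```python
from datetime import datetime


def timestamp_lines(stream, out, now=datetime.now):
    """Copy each line from a binary stream to out, prefixed with its read time."""
    for raw in stream:
        text = raw.decode("ascii", errors="replace").rstrip("\r\n")
        out.write(f"{now().isoformat(timespec='seconds')} {text}\n")
        out.flush()  # keep the log current in case the logging host is interrupted


def log_console(device_path, log_path):
    """Append timestamped console output to log_path until the stream ends.

    device_path is a hypothetical serial device such as /dev/ttyUSB0; its
    speed and framing must already be set before this is called.
    """
    with open(device_path, "rb", buffering=0) as port, open(log_path, "a") as log:
        timestamp_lines(port, log)
```

With timestamps in place, a log that simply stops, followed by boot loader output at the time of the power cycle, makes it clear the unit printed nothing when it hung.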

          Do you have the watchdog enabled in System > Advanced > Misc?
          If you disable it, does it just hang rather than reboot? That might show something at the console if you can't log all its output.

          And, yeah, you should lock down the WAN from all those drive-by SSH attempts!

          Steve

          • victorhooi

            I ran a filesystem check per the video link and the online documentation. The first pass repaired a few inconsistencies, and every pass after that came back clean:

            Enter full pathname of shell or RETURN for /bin/sh: 
            # fsck -fy /
            ** /dev/diskid/DISK-CEF032182700245s2a
            ** Last Mounted on /
            ** Root file system
            ** Phase 1 - Check Blocks and Sizes
            ** Phase 2 - Check Pathnames
            ** Phase 3 - Check Connectivity
            ** Phase 4 - Check Reference Counts
            ** Phase 5 - Check Cyl groups
            SUMMARY INFORMATION BAD
            SALVAGE? yes
            
            FREE BLK COUNT(S) WRONG IN SUPERBLK
            SALVAGE? yes
            
            BLK(S) MISSING IN BIT MAPS
            SALVAGE? yes
            
            33074 files, 440466 used, 6993345 free (26281 frags, 870883 blocks, 0.4% fragmentation)
            
            ***** FILE SYSTEM IS CLEAN *****
            
            ***** FILE SYSTEM WAS MODIFIED *****
            # fsck -fy /
            ** /dev/diskid/DISK-CEF032182700245s2a
            ** Last Mounted on /
            ** Root file system
            ** Phase 1 - Check Blocks and Sizes
            ** Phase 2 - Check Pathnames
            ** Phase 3 - Check Connectivity
            ** Phase 4 - Check Reference Counts
            ** Phase 5 - Check Cyl groups
            33074 files, 440466 used, 6993345 free (26281 frags, 870883 blocks, 0.4% fragmentation)
            
            ***** FILE SYSTEM IS CLEAN *****
            # fsck -fy /
            ** /dev/diskid/DISK-CEF032182700245s2a
            ** Last Mounted on /
            ** Root file system
            ** Phase 1 - Check Blocks and Sizes
            ** Phase 2 - Check Pathnames
            ** Phase 3 - Check Connectivity
            ** Phase 4 - Check Reference Counts
            ** Phase 5 - Check Cyl groups
            33074 files, 440466 used, 6993345 free (26281 frags, 870883 blocks, 0.4% fragmentation)
            
            ***** FILE SYSTEM IS CLEAN *****
            # fsck -fy /
            ** /dev/diskid/DISK-CEF032182700245s2a
            ** Last Mounted on /
            ** Root file system
            ** Phase 1 - Check Blocks and Sizes
            ** Phase 2 - Check Pathnames
            ** Phase 3 - Check Connectivity
            ** Phase 4 - Check Reference Counts
            ** Phase 5 - Check Cyl groups
            33074 files, 440466 used, 6993345 free (26281 frags, 870883 blocks, 0.4% fragmentation)
            
            ***** FILE SYSTEM IS CLEAN *****
            # fsck -fy /
            ** /dev/diskid/DISK-CEF032182700245s2a
            ** Last Mounted on /
            ** Root file system
            ** Phase 1 - Check Blocks and Sizes
            ** Phase 2 - Check Pathnames
            ** Phase 3 - Check Connectivity
            ** Phase 4 - Check Reference Counts
            ** Phase 5 - Check Cyl groups
            33074 files, 440466 used, 6993345 free (26281 frags, 870883 blocks, 0.4% fragmentation)
            
            ***** FILE SYSTEM IS CLEAN *****
            

            I have checked: I do have the watchdog enabled, and it is set to 128 seconds. (This appears to be the default, as I don't believe I've ever changed this setting.)

            [Screenshot: dddd3c14-d003-487e-8479-1c6da8beb790-image.png]

            When the issue occurs, the SG-3100 doesn't reboot. It simply hangs: the activity lights keep flashing, but there's no longer any internet connectivity for LAN hosts, and you can't SSH in or connect to the web interface.

            I've plugged it into a console server to log the serial output, and it doesn't seem to log anything when the issue happens. The next line after the pfSense console prompt is, I believe, from the startup (after we've power-cycled the unit):

             WIFI (opt9)     -> mvneta1.65 -> v4: 10.7.65.1/23
             PRINTERS (opt10) -> mvneta1.148 -> v4: 10.7.148.1/24
             HIKVISION_DEFAULT (opt11) -> mvneta1.99 -> v4: 192.168.1.1/24
             0) Logout (SSH only)                  9) pfTop
             1) Assign Interfaces                 10) Filter Logs
             2) Set interface(s) IP address       11) Restart webConfigurator
             3) Reset webConfigurator password    12) PHP shell + pfSense tools
             4) Reset to factory defaults         13) Update from console
             5) Reboot system                     14) Disable Secure Shell (sshd)
             6) Halt system                       15) Restore recent configuration
             7) Ping host                         16) Restart PHP-FPM
             8) Shell
            Enter an option:
            General initialization - Version: 1.0.0
            AVS selection from EFUSE disabled (Skip reading EFUSE values)
            Overriding default AVS value to: 0x23
            Detected Device ID 6820
            High speed PHY - Version: 2.0
            Init Customer board board SerDes lanes topology details:
             | Lane # | Speed|    Type     |
             ------------------------------|
             |   0    |  3   |  SATA0      |
             |   1    |  5   |  PCIe0      |
             |   2    |  3   |  SATA1      |
            

            Is it an issue with the unit or something else?

            • stephenw10 Netgate Administrator

              With no output at all and the console itself unresponsive, it does start to look more like a hardware issue. However, you are running a 2.5 snapshot, so you might be hitting a hard software lock somehow.

              I would re-install 2.4.5p1 as a next step to test that.

              Steve

              • victorhooi

                Hmm, the issue is still happening on 2.4.5p1 😲. It seems to happen slightly less often (every 2 days or so, but that may just be coincidence).

                If it is hardware - is this repairable at all? Or is there anything Netgate can do?

                • stephenw10 Netgate Administrator

                  Open a ticket with the device details: https://go.netgate.com/

                  Steve

                  Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.