Netgate Discussion Forum

    Netgate 2100 Stalling - HW issue?

    Official Netgate® Hardware
    16 Posts 4 Posters 671 Views
      stephenw10 Netgate Administrator

      Yup, the other monitoring graphs may show something there; a spike in CPU usage or states perhaps. Or a gap in data would also be telling.

      This doesn't look like a hardware issue to me. Or at least, if it is, it's unlike any hardware issue I've seen before!

      Are you running from eMMC or SSD?

        sammiorelli

        See attached. To be honest, I'm not seeing any hints here. I took some time to quantify exactly how long the stalls are. I think my earlier 4-minute upper-bound estimate was just bad luck. Watching carefully for about half an hour, the longest stall was 97s, the shortest was 4s, and the average was about 31s.

        Since the finest resolution of these graphs is 1 minute, I think they end up smoothing over these stalls.
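
To illustrate the smoothing effect described above, here is a minimal sketch. It assumes a hypothetical steady 10 Mbit/s flow sampled once per second with a single 31 s stall (the average stall length reported above), and averages it into 1-minute buckets. The traffic figures and the function name are made up for illustration; this is not how the pfSense monitoring graphs are actually computed.

```python
def bucket_averages(per_second_rates, bucket_seconds=60):
    """Average per-second samples into fixed-size buckets, like a 1-minute graph."""
    averages = []
    for start in range(0, len(per_second_rates), bucket_seconds):
        bucket = per_second_rates[start:start + bucket_seconds]
        averages.append(sum(bucket) / len(bucket))
    return averages


# Hypothetical traffic: 3 minutes at a steady 10 Mbit/s ...
rates = [10.0] * 180
# ... with a 31 s stall covering t = 70 s through t = 100 s inclusive.
for t in range(70, 101):
    rates[t] = 0.0

# The middle bucket drops to roughly half the normal rate instead of
# showing a clear gap, so a short stall is easy to miss on the graph.
print(bucket_averages(rates))
```

With these made-up numbers the stalled minute still shows about 4.8 Mbit/s, which would look like a dip rather than an outage.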

        It's running on the eMMC that came with the device.

        Attachments: mbuf and states.png, memory processor.png

          stephenw10 Netgate Administrator

          Other metrics in the monitoring graphs may show something.

          The fact that the traffic graphs appear to stop updating rather than go to zero implies something significant is happening. Either the RRD-update process stops or it's unable to get the data. Since it looks like other graphs continue, it seems more like it can't get data. I'm surprised there is nothing in the system log at that time. It really feels like pf stops responding.

            sammiorelli @stephenw10

            So just to be clear, here's the entirety of the logs in System Logs / System / General for this afternoon while this behavior was ongoing. Logs are the same earlier in the morning when I was doing the screenshots of the performance metrics. Just a bunch of the sshguard spam which seems like a known benign issue (https://forum.netgate.com/topic/169923/tons-sshguard-log-entries-and-its-not-enabled/14).

            Also, the sshguard log entries are far less frequent than the drop-outs: the drop-outs happen every 1-2 minutes, while the sshguard entries appear approximately every 24 minutes.

            I agree with the instinct that this is pf hanging, because active connections are never affected during an outage, but no new connections can be established during that time, regardless of device. I've also confirmed that during an outage the pfSense WebUI itself is stuck: clicks to other screens and so on don't work until the outage clears. But I'm at a loss for how to investigate further since the logs are so silent on this topic.
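
One way to put numbers on the "no new connections during an outage" symptom is to open a fresh TCP connection once a second and record when that fails. The sketch below is only an illustration: `probe_once`, `collect_samples`, and `stall_durations` are hypothetical helper names, and the host and port you probe (for example the firewall's WebUI address) are assumptions to fill in yourself. The sample data at the bottom is invented so the block runs without touching the network.

```python
import socket
import time


def probe_once(host, port, timeout=2.0):
    """Return True if a brand-new TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def collect_samples(host, port, count, interval=1.0):
    """Probe `count` times, returning time-ordered (timestamp, ok) pairs."""
    samples = []
    for _ in range(count):
        samples.append((time.monotonic(), probe_once(host, port)))
        time.sleep(interval)
    return samples


def stall_durations(samples):
    """Return the length of each stall in time-ordered (timestamp, ok) samples.

    A stall runs from the first failed probe to the next successful one.
    A stall still in progress at the end of the samples is not counted.
    """
    stalls = []
    stall_start = None
    for timestamp, ok in samples:
        if not ok and stall_start is None:
            stall_start = timestamp
        elif ok and stall_start is not None:
            stalls.append(timestamp - stall_start)
            stall_start = None
    return stalls


# Invented samples (seconds, success) standing in for real probe results.
example = [(0, True), (1, False), (2, False), (5, True),
           (6, True), (7, False), (38, True)]
print(stall_durations(example))
```

To use it against a real device, something like `stall_durations(collect_samples(host, port, count=1800))` would cover about half an hour; note that a failed probe also waits for its timeout, so the actual spacing between samples is longer during a stall.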

            Attachment: afternoon logs.png

              stephenw10 Netgate Administrator

              Yes, that sshguard restart is usually just log spam and not important. However, we have seen issues where log compression can put significant load on the firewall. The fact that sshguard is restarting implies the logs are rotating every ~20 minutes. You should check which log is filling and rotating. And I would disable log compression, at least as a test. That's in Status > System Logs > Settings.
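
As a rough way to see which log is filling and rotating, the sketch below lists the most recently modified files in a log directory with their sizes. It assumes the logs live in `/var/log` and that a Python interpreter is available wherever you run it; both are assumptions, and `recent_log_files` is a hypothetical helper, not a pfSense tool. The same check can be done by hand with a directory listing sorted by time.

```python
from pathlib import Path


def recent_log_files(log_dir, limit=10):
    """Return (name, size_bytes, mtime) for the newest files in log_dir."""
    entries = []
    for path in Path(log_dir).iterdir():
        if path.is_file():
            info = path.stat()
            entries.append((path.name, info.st_size, info.st_mtime))
    # Newest modification time first: a log that is rotating often will
    # show several recently written copies near the top of the list.
    entries.sort(key=lambda entry: entry[2], reverse=True)
    return entries[:limit]
```

Calling `recent_log_files("/var/log")` a few times over half an hour and comparing the results should show which file keeps growing or being replaced.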

                w0w @sammiorelli

                @sammiorelli
                https://docs.netgate.com/pfsense/troubleshooting/disk-lifetime.html
                Check the eMMC status, just to be sure it is OK and not the root cause.

                  sammiorelli

                  It looks like the default settings for logging blocked traffic were putting a lot of entries in the Firewall log, so I turned those off. I confirmed that compression was in the default "none" configuration.

                  What I'm really struggling with in all of this is that we're now dealing with a factory-default device. I reflashed it and did not restore my backup, and the behavior is unchanged. This feels like a glaring red flag to me.

                  I also checked the eMMC and it looks healthy, with 0-10% of life consumed and a Pre-EOL status of Normal. Full report below.

                  =============================================
                  Extended CSD rev 1.8 (MMC 5.1)

                  Card Supported Command sets [S_CMD_SET: 0x01]
                  HPI Features [HPI_FEATURE: 0x01]: implementation based on CMD13
                  Background operations support [BKOPS_SUPPORT: 0x01]
                  Max Packet Read Cmd [MAX_PACKED_READS: 0x3f]
                  Max Packet Write Cmd [MAX_PACKED_WRITES: 0x3f]
                  Data TAG support [DATA_TAG_SUPPORT: 0x01]
                  Data TAG Unit Size [TAG_UNIT_SIZE: 0x03]
                  Tag Resources Size [TAG_RES_SIZE: 0x03]
                  Context Management Capabilities [CONTEXT_CAPABILITIES: 0x05]
                  Large Unit Size [LARGE_UNIT_SIZE_M1: 0x00]
                  Extended partition attribute support [EXT_SUPPORT: 0x03]
                  Generic CMD6 Timer [GENERIC_CMD6_TIME: 0x19]
                  Power off notification [POWER_OFF_LONG_TIME: 0x19]
                  Cache Size [CACHE_SIZE] is 512 KiB
                  Background operations status [BKOPS_STATUS: 0x01]
                  1st Initialisation Time after programmed sector [INI_TIMEOUT_AP: 0x5a]
                  Power class for 52MHz, DDR at 3.6V [PWR_CL_DDR_52_360: 0x00]
                  Power class for 52MHz, DDR at 1.95V [PWR_CL_DDR_52_195: 0xdd]
                  Power class for 200MHz at 3.6V [PWR_CL_200_360: 0xdd]
                  Power class for 200MHz, at 1.95V [PWR_CL_200_195: 0x00]
                  Minimum Performance for 8bit at 52MHz in DDR mode:
                  [MIN_PERF_DDR_W_8_52: 0x00]
                  [MIN_PERF_DDR_R_8_52: 0x00]
                  TRIM Multiplier [TRIM_MULT: 0x03]
                  Secure Feature support [SEC_FEATURE_SUPPORT: 0x55]
                  Boot Information [BOOT_INFO: 0x07]
                  Device supports alternative boot method
                  Device supports dual data rate during boot
                  Device supports high speed timing during boot
                  Boot partition size [BOOT_SIZE_MULTI: 0x20]
                  Access size [ACC_SIZE: 0x08]
                  High-capacity erase unit size [HC_ERASE_GRP_SIZE: 0x01]
                  i.e. 512 KiB
                  High-capacity erase timeout [ERASE_TIMEOUT_MULT: 0x03]
                  Reliable write sector count [REL_WR_SEC_C: 0x01]
                  High-capacity W protect group size [HC_WP_GRP_SIZE: 0x10]
                  i.e. 8192 KiB
                  Sleep current (VCC) [S_C_VCC: 0x05]
                  Sleep current (VCCQ) [S_C_VCCQ: 0x07]
                  Sleep/awake timeout [S_A_TIMEOUT: 0x12]
                  Sector Count [SEC_COUNT: 0x00e90e80]
                  Device is block-addressed
                  Minimum Write Performance for 8bit:
                  [MIN_PERF_W_8_52: 0x0a]
                  [MIN_PERF_R_8_52: 0x0a]
                  [MIN_PERF_W_8_26_4_52: 0x0a]
                  [MIN_PERF_R_8_26_4_52: 0x0a]
                  Minimum Write Performance for 4bit:
                  [MIN_PERF_W_4_26: 0x0a]
                  [MIN_PERF_R_4_26: 0x0a]
                  Power classes registers:
                  [PWR_CL_26_360: 0x00]
                  [PWR_CL_52_360: 0x00]
                  [PWR_CL_26_195: 0xdd]
                  [PWR_CL_52_195: 0xdd]
                  Partition switching timing [PARTITION_SWITCH_TIME: 0x03]
                  Out-of-interrupt busy timing [OUT_OF_INTERRUPT_TIME: 0x0a]
                  I/O Driver Strength [DRIVER_STRENGTH: 0x1f]
                  Card Type [CARD_TYPE: 0x57]
                  HS400 Dual Data Rate eMMC @200MHz 1.8VI/O
                  HS200 Single Data Rate eMMC @200MHz 1.8VI/O
                  HS Dual Data Rate eMMC @52MHz 1.8V or 3VI/O
                  HS eMMC @52MHz - at rated device voltage(s)
                  HS eMMC @26MHz - at rated device voltage(s)
                  CSD structure version [CSD_STRUCTURE: 0x02]
                  Command set [CMD_SET: 0x00]
                  Command set revision [CMD_SET_REV: 0x00]
                  Power class [POWER_CLASS: 0x0d]
                  High-speed interface timing [HS_TIMING: 0x01]
                  Enhanced Strobe mode [STROBE_SUPPORT: 0x01]
                  Erased memory content [ERASED_MEM_CONT: 0x00]
                  Boot configuration bytes [PARTITION_CONFIG: 0x03]
                  Not boot enable
                  R/W Replay Protected Memory Block (RPMB)
                  Boot config protection [BOOT_CONFIG_PROT: 0x00]
                  Boot bus Conditions [BOOT_BUS_CONDITIONS: 0x00]
                  High-density erase group definition [ERASE_GROUP_DEF: 0x01]
                  Boot write protection status registers [BOOT_WP_STATUS]: 0x00
                  Boot Area Write protection [BOOT_WP]: 0x00
                  Power ro locking: possible
                  Permanent ro locking: possible
                  partition 0 ro lock status: not locked
                  partition 1 ro lock status: not locked
                  User area write protection register [USER_WP]: 0x00
                  FW configuration [FW_CONFIG]: 0x00
                  RPMB Size [RPMB_SIZE_MULT]: 0x20
                  Write reliability setting register [WR_REL_SET]: 0x1f
                  user area: the device protects existing data if a power failure occurs during a write operation
                  partition 1: the device protects existing data if a power failure occurs during a write operation
                  partition 2: the device protects existing data if a power failure occurs during a write operation
                  partition 3: the device protects existing data if a power failure occurs during a write operation
                  partition 4: the device protects existing data if a power failure occurs during a write operation
                  Write reliability parameter register [WR_REL_PARAM]: 0x15
                  Device supports writing EXT_CSD_WR_REL_SET
                  Device supports the enhanced def. of reliable write
                  Enable background operations handshake [BKOPS_EN]: 0x02
                  H/W reset function [RST_N_FUNCTION]: 0x00
                  HPI management [HPI_MGMT]: 0x00
                  Partitioning Support [PARTITIONING_SUPPORT]: 0x07
                  Device support partitioning feature
                  Device can have enhanced tech.
                  Max Enhanced Area Size [MAX_ENH_SIZE_MULT]: 0x0001b5
                  i.e. 3579904 KiB
                  Partitions attribute [PARTITIONS_ATTRIBUTE]: 0x00
                  Partitioning Setting [PARTITION_SETTING_COMPLETED]: 0x00
                  Device partition setting NOT complete
                  General Purpose Partition Size
                  [GP_SIZE_MULT_4]: 0x000000
                  [GP_SIZE_MULT_3]: 0x000000
                  [GP_SIZE_MULT_2]: 0x000000
                  [GP_SIZE_MULT_1]: 0x000000
                  Enhanced User Data Area Size [ENH_SIZE_MULT]: 0x000000
                  i.e. 0 KiB
                  Enhanced User Data Start Address [ENH_START_ADDR]: 0x00000000
                  i.e. 0 bytes offset
                  Bad Block Management mode [SEC_BAD_BLK_MGMNT]: 0x00
                  Periodic Wake-up [PERIODIC_WAKEUP]: 0x00
                  Program CID/CSD in DDR mode support [PROGRAM_CID_CSD_DDR_SUPPORT]: 0x01
                  Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[127]]: 0x00
                  Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[126]]: 0x00
                  Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[125]]: 0x00
                  Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[124]]: 0x00
                  Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[123]]: 0x01
                  Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[122]]: 0x00
                  Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[121]]: 0x00
                  Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[120]]: 0x01
                  Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[119]]: 0x00
                  Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[118]]: 0x00
                  Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[117]]: 0x00
                  Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[116]]: 0x00
                  Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[115]]: 0x00
                  Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[114]]: 0x00
                  Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[113]]: 0x00
                  Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[112]]: 0x00
                  Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[111]]: 0x00
                  Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[110]]: 0x00
                  Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[109]]: 0x00
                  Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[108]]: 0x00
                  Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[107]]: 0x00
                  Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[106]]: 0x00
                  Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[105]]: 0x00
                  Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[104]]: 0x00
                  Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[103]]: 0x00
                  Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[102]]: 0x00
                  Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[101]]: 0x00
                  Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[100]]: 0x00
                  Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[99]]: 0x00
                  Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[98]]: 0x00
                  Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[97]]: 0x00
                  Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[96]]: 0x00
                  Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[95]]: 0x02
                  Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[94]]: 0x00
                  Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[93]]: 0x01
                  Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[92]]: 0x00
                  Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[91]]: 0x00
                  Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[90]]: 0x00
                  Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[89]]: 0x00
                  Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[88]]: 0x00
                  Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[87]]: 0x00
                  Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[86]]: 0x00
                  Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[85]]: 0x00
                  Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[84]]: 0x00
                  Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[83]]: 0x00
                  Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[82]]: 0x00
                  Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[81]]: 0x00
                  Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[80]]: 0x00
                  Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[79]]: 0x00
                  Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[78]]: 0x00
                  Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[77]]: 0x00
                  Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[76]]: 0x00
                  Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[75]]: 0x00
                  Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[74]]: 0x00
                  Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[73]]: 0x00
                  Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[72]]: 0x00
                  Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[71]]: 0x00
                  Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[70]]: 0x00
                  Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[69]]: 0x00
                  Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[68]]: 0x00
                  Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[67]]: 0x00
                  Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[66]]: 0x00
                  Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[65]]: 0x00
                  Vendor Specific Fields [VENDOR_SPECIFIC_FIELD[64]]: 0x00
                  Native sector size [NATIVE_SECTOR_SIZE]: 0x00
                  Sector size emulation [USE_NATIVE_SECTOR]: 0x00
                  Sector size [DATA_SECTOR_SIZE]: 0x00
                  1st initialization after disabling sector size emulation [INI_TIMEOUT_EMU]: 0x0a
                  Class 6 commands control [CLASS_6_CTRL]: 0x00
                  Number of addressed group to be Released[DYNCAP_NEEDED]: 0x00
                  Exception events control [EXCEPTION_EVENTS_CTRL]: 0x0000
                  Exception events status[EXCEPTION_EVENTS_STATUS]: 0x0000
                  Extended Partitions Attribute [EXT_PARTITIONS_ATTRIBUTE]: 0x0000
                  Context configuration [CONTEXT_CONF[51]]: 0x00
                  Context configuration [CONTEXT_CONF[50]]: 0x00
                  Context configuration [CONTEXT_CONF[49]]: 0x00
                  Context configuration [CONTEXT_CONF[48]]: 0x00
                  Context configuration [CONTEXT_CONF[47]]: 0x00
                  Context configuration [CONTEXT_CONF[46]]: 0x00
                  Context configuration [CONTEXT_CONF[45]]: 0x00
                  Context configuration [CONTEXT_CONF[44]]: 0x00
                  Context configuration [CONTEXT_CONF[43]]: 0x00
                  Context configuration [CONTEXT_CONF[42]]: 0x00
                  Context configuration [CONTEXT_CONF[41]]: 0x00
                  Context configuration [CONTEXT_CONF[40]]: 0x00
                  Context configuration [CONTEXT_CONF[39]]: 0x00
                  Context configuration [CONTEXT_CONF[38]]: 0x00
                  Context configuration [CONTEXT_CONF[37]]: 0x00
                  Packed command status [PACKED_COMMAND_STATUS]: 0x00
                  Packed command failure index [PACKED_FAILURE_INDEX]: 0x00
                  Power Off Notification [POWER_OFF_NOTIFICATION]: 0x00
                  Control to turn the Cache ON/OFF [CACHE_CTRL]: 0x01
                  Control to turn the Cache Barrier ON/OFF [BARRIER_CTRL]: 0x00
                  eMMC Firmware Version: 73103517
                  eMMC Life Time Estimation A [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A]: 0x01
                  eMMC Life Time Estimation B [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B]: 0x01
                  eMMC Pre EOL information [EXT_CSD_PRE_EOL_INFO]: 0x01
                  Secure Removal Type [SECURE_REMOVAL_TYPE]: 0x08
                  information is configured to be removed by an erase of the physical memory
                  Supported Secure Removal Type:
                  information removed using a vendor defined
                  Command Queue Support [CMDQ_SUPPORT]: 0x01
                  Command Queue Depth [CMDQ_DEPTH]: 32
                  Command Enabled [CMDQ_MODE_EN]: 0x00
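
For readers wondering how the three health fields near the end of the report map to "0-10% of life consumed" and "Normal", here is a minimal decoding sketch. The value tables follow the eMMC 5.x register definitions as I understand them (life time estimates in 10% steps from 0x01 to 0x0A, 0x0B meaning the estimate is exceeded; Pre-EOL 0x01 Normal, 0x02 Warning, 0x03 Urgent). The helper names are hypothetical, and the `report` string just repeats the three relevant lines from the output above.

```python
import re

PRE_EOL_STATES = {0x01: "Normal", 0x02: "Warning", 0x03: "Urgent"}


def describe_life_time(value):
    """Translate a DEVICE_LIFE_TIME_EST_TYP_A/B value into a wear range."""
    if value is None:
        return "not found"
    if 0x01 <= value <= 0x0A:
        return f"{(value - 1) * 10}-{value * 10}% of estimated life used"
    if value == 0x0B:
        return "estimated life exceeded"
    return "not defined"


def describe_pre_eol(value):
    """Translate a PRE_EOL_INFO value into its state name."""
    if value is None:
        return "not found"
    return PRE_EOL_STATES.get(value, "not defined")


def read_field(report, field_name):
    """Find '[FIELD_NAME]: 0xNN' in the report text and return NN as an int."""
    pattern = r"\[" + re.escape(field_name) + r"\]:\s*0x([0-9a-fA-F]+)"
    match = re.search(pattern, report)
    return int(match.group(1), 16) if match else None


report = """\
eMMC Life Time Estimation A [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A]: 0x01
eMMC Life Time Estimation B [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B]: 0x01
eMMC Pre EOL information [EXT_CSD_PRE_EOL_INFO]: 0x01
"""

for name in ("EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A",
             "EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B"):
    print(name, "->", describe_life_time(read_field(report, name)))
print("EXT_CSD_PRE_EOL_INFO ->",
      describe_pre_eol(read_field(report, "EXT_CSD_PRE_EOL_INFO")))
```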

                    stephenw10 Netgate Administrator

                    Hmm, I've never seen a hardware issue present like that though. If it's not a config problem it could be an environmental issue, something in the local network causing a connectivity problem. Somehow.

                      sammiorelli @stephenw10

                      @stephenw10 the prior device that was RMA'd with this behavior was ticket INC-96963. Any chance that device was investigated when it came back?

                        w0w @sammiorelli

                        @sammiorelli
                        Is it possible that you have enabled flow control on the network, for example, on the switch?
                        Did you try to continuously ping pfSense from the PC and vice versa?

                          stephenw10 Netgate Administrator

                          Hmm, that must have been a while ago; we no longer use that ticket system. Do you have the serial number or NDI from it? You can send it to me in chat.

                          Was that 2100 installed in the same location? Same network?

                          Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.