Netgate Discussion Forum

    Netgate 4200 freeze and a possible fix

    27 Posts 4 Posters 1.2k Views 4 Watching
    • stephenw10S Offline
      stephenw10 Netgate Administrator
      last edited by

      Yes, if the ISP device is connected directly it will cause errors if it reboots and loses link. But it would also show specific link state change logs.

      • B Offline
        belajasmert
        last edited by

        Looking at timestamps of events on client devices, I can see that at 00:08 and 04:16 on the 9th Feb a large number of IoT devices lost connectivity.

        So, whatever happened, happened sometime after midnight, but there is nothing in any logs near those times. All quiet and normal, and the sendto errors are almost certainly caused by me power cycling the ISP device.

        So, I'll wait and see if this repeats and then maybe disable the only hardware offloading setting not yet disabled.

        • stephenw10S Offline
          stephenw10 Netgate Administrator
          last edited by

          Hmm, hard to imagine how it could be anything on the firewall with nothing logged at all. 🤔

          Do you have multiple gateways defined? Do you have the system default gateway still set to automatic?
          It may be defaulting to something invalid. Though that would also be logged.

          • B Offline
            belajasmert @stephenw10
            last edited by

            @stephenw10 There are only two gateways, IPv4 + v6. Default gateways are set to automatic. I think those were created automatically and I have not changed the configuration.

            I'm also not using IPv6 at all, it has been disabled.

            • B Offline
              belajasmert @belajasmert
              last edited by

              I guess I could do a bit of cleanup by setting the IPv6 gateway to "none" and also marking it disabled.

              • stephenw10S Offline
                stephenw10 Netgate Administrator
                last edited by

                Yes, those are expected, and if there are only those two it can't be the problem.

                • B Offline
                  belajasmert
                  last edited by belajasmert

                  Ok, ended up doing nothing to the configuration and now experienced the freeze again - uptime roughly 17 days and 11 hours.

                  Before rebooting I did a lot of searching and noticed one specific issue - the DNS resolver status page would not load at all. Everything else opened up, albeit very slowly. After making sure I had screenshots and logs safely stored away, I tried to restart the DNS resolver service through the UI.

                  The restart never finished and other status pages stopped working as well. After roughly 10 minutes of waiting I power cycled the firewall (ACPI button shutdown first) and everything is now working beautifully again.

                  There are two items that might be related to the issue.

                  1. I am using the firewall to respond to all DNS queries and there is one domain that is never resolved by the service:
                    "Feb 26 21:25:54 filterdns 62690 failed to resolve host xx.yyyyyy.zzzz will retry later again."
                    There is nothing that I can see that is problematic in the DNS log otherwise.

                  2. My PC is directly connected to the igc2 LAN port. This results in dpinger sig 15 restarts and other things like link changes. Again, this problem was noticed after my PC woke up from standby, so I am wondering if this might be related as well.

                  • B Offline
                    belajasmert
                    last edited by

                    Did some digging and found these:
                    https://forum.netgate.com/topic/161400/unbound-stops-listening-on-interface
                    https://redmine.pfsense.org/issues/11547

                    I am starting to suspect that the LAN flapping because of the PC sleeping / power on / power off might be the origin of the problem.

                    These are the timestamps for LAN port flapping and correspond to the timing of the firewall freeze:
                    19:59:13 DOWN
                    19:59:19 UP
                    20:01:20 DOWN
                    20:01:22 UP
                    20:01:31 DOWN
                    20:01:34 UP
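To put numbers on the flapping, here is a minimal Python sketch that pairs each DOWN with the following UP and reports how long the link was down. The event list is copied from the timestamps above; the function name `outages` is just an illustrative helper, not anything from pfSense.

```python
from datetime import datetime

# Link events exactly as listed above (time, state).
EVENTS = [
    ("19:59:13", "DOWN"),
    ("19:59:19", "UP"),
    ("20:01:20", "DOWN"),
    ("20:01:22", "UP"),
    ("20:01:31", "DOWN"),
    ("20:01:34", "UP"),
]


def outages(events):
    """Pair each DOWN with the next UP; return (down_time, seconds_down)."""
    result = []
    down_at = None
    for stamp, state in events:
        t = datetime.strptime(stamp, "%H:%M:%S")
        if state == "DOWN":
            down_at = t
        elif state == "UP" and down_at is not None:
            seconds = int((t - down_at).total_seconds())
            result.append((down_at.strftime("%H:%M:%S"), seconds))
            down_at = None
    return result


flaps = outages(EVENTS)
for start, seconds in flaps:
    print(f"link down at {start} for {seconds}s")
```

That gives three short outages (6 s, 2 s and 3 s) within roughly two and a half minutes.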

                    • stephenw10S Offline
                      stephenw10 Netgate Administrator
                      last edited by

                      Neither a failing filterdns entry nor an interface changing state should really be a problem.

                      The NIC re-linking when your client wakes will trigger a bunch of processes, but that would only cause a temporary delay, if anything.

                      If either of those did cause a problem I'd expect it to log something. Do you see the Unbound service restarting in the logs when you lost access?

                      Do you see anything logged at the time it failed? Services failing spontaneously like that can be a sign of a failing drive. pfSense will keep running, but anything that tries to read or write will fail, so you end up with services slowly failing. However, since logs cannot be written in that state either, missing log entries are a pretty clear indication.

                      • B Offline
                        belajasmert
                        last edited by

                        These are the unbound events:
                        19:59:14 — Unbound stopped
                        “Feb 26 19:59:14 unbound … info: service stopped (unbound 1.24.2).”
                        19:59:19 — Unbound started
                        “Feb 26 19:59:19 unbound … info: start of service (unbound 1.24.2).”
                        19:59:22 — Unbound stopped
                        “Feb 26 19:59:22 unbound … info: service stopped (unbound 1.24.2).”
                        19:59:22 — Unbound started
                        “Feb 26 19:59:22 unbound … info: start of service (unbound 1.24.2).”

                        These correspond with the time I started up the PC and the subsequent LAN port flapping. Other logs show only entries that are caused by LAN flapping: nothing that looks problematic, and they are always present when the PC starts up.
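As a rough cross-check, the Unbound events can be matched against the nearest LAN link event. This is only a sketch: the timestamps are the ones posted in this thread, and the 5-second window (`WINDOW_SECONDS`) is an arbitrary assumption of mine, not a pfSense setting.

```python
from datetime import datetime

# LAN link events (DOWN/UP) and Unbound events as posted in this thread.
LINK_EVENTS = ["19:59:13", "19:59:19", "20:01:20", "20:01:22", "20:01:31", "20:01:34"]
UNBOUND_EVENTS = [
    ("19:59:14", "stopped"),
    ("19:59:19", "started"),
    ("19:59:22", "stopped"),
    ("19:59:22", "started"),
]
WINDOW_SECONDS = 5  # assumption: how close counts as "caused by the flap"


def parse(stamp):
    return datetime.strptime(stamp, "%H:%M:%S")


def nearest_link_event(stamp, link_events):
    """Return (link_stamp, gap_seconds) for the closest link event."""
    t = parse(stamp)
    best = min(link_events, key=lambda s: abs((parse(s) - t).total_seconds()))
    return best, int(abs((parse(best) - t).total_seconds()))


matches = []
for stamp, action in UNBOUND_EVENTS:
    link, gap = nearest_link_event(stamp, LINK_EVENTS)
    matches.append((stamp, action, link, gap, gap <= WINDOW_SECONDS))

for stamp, action, link, gap, close in matches:
    print(f"unbound {action} at {stamp}: nearest link event {link} ({gap}s apart, within window: {close})")
```

All four Unbound entries land within a few seconds of the first DOWN/UP pair. The 20:01 link events have no matching Unbound entry in the excerpt posted here.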

                        The domain that was failing to resolve was listed in an alias. I removed it as unnecessary anyway, and that should clean up the DNS resolver log a bit.

                        The Netgate device itself is new, but I don't know if there is something wrong with this model. The first device I got was DoA. Can I use the SMART status or something else to verify that the drive is working as it should?

                        • stephenw10S Offline
                          stephenw10 Netgate Administrator
                          last edited by

                          Yes on a 4200-max you can use the SMART data to see any drive errors. And I'd certainly expect to see errors there if it were failing.

                          • B Offline
                            belajasmert @stephenw10
                            last edited by

                            @stephenw10 There are no errors in the log:
                            Logs
                            smartctl 7.5 2025-04-30 r5714 [FreeBSD 16.0-CURRENT amd64] (local build)
                            Copyright (C) 2002-25, Bruce Allen, Christian Franke, www.smartmontools.org

                            === START OF SMART DATA SECTION ===
                            Error Information (NVMe Log 0x01, 16 of 64 entries)
                            No Errors Logged

                            The tests (short / long) both give this when you try to run them:
                            "Test Results
                            smartctl 7.5 2025-04-30 r5714 [FreeBSD 16.0-CURRENT amd64] (local build)
                            Copyright (C) 2002-25, Bruce Allen, Christian Franke, www.smartmontools.org

                            Self-tests not supported"

                            And if you check the NVMe log you can see this:
                            "Logs
                            smartctl 7.5 2025-04-30 r5714 [FreeBSD 16.0-CURRENT amd64] (local build)
                            Copyright (C) 2002-25, Bruce Allen, Christian Franke, www.smartmontools.org

                            =======> INVALID ARGUMENT TO -l: nvmelog
                            =======> VALID ARGUMENTS ARE: error, selftest, selective, directory[,g|s], xerror[,N][,error], xselftest[,N][,selftest], background, sasphy[,reset], sataphy[,reset], scttemp[sts,hist], scttempint,N[,p], scterc[,N,M][,p|reset], devstat[,N], defects[,N], ssd, gplog,N[,RANGE], smartlog,N[,RANGE], nvmelog,N,SIZE, tapedevstat, zdevstat, envrep, farm <=======

                            Use smartctl -h to get a usage summary"

                            Looking at all the SMART data, there is only one item indicating any issue:
                            "Information
                            smartctl 7.5 2025-04-30 r5714 [FreeBSD 16.0-CURRENT amd64] (local build)
                            Copyright (C) 2002-25, Bruce Allen, Christian Franke, www.smartmontools.org

                            === START OF INFORMATION SECTION ===
                            Model Number: TS128GMTE460T-SIL
                            Serial Number: J279400393
                            Firmware Version: V0804A3
                            PCI Vendor/Subsystem ID: 0x1d79
                            IEEE OUI Identifier: 0x7c3548
                            Controller ID: 1
                            NVMe Version: 1.3
                            Number of Namespaces: 1
                            Namespace 1 Size/Capacity: 128,035,676,160 [128 GB]
                            Namespace 1 Formatted LBA Size: 512
                            Namespace 1 IEEE EUI-64: 7c3548 5264b333c9
                            Local Time is: Sat Feb 28 14:58:26 2026 EET
                            Firmware Updates (0x12): 1 Slot, no Reset required
                            Optional Admin Commands (0x0007): Security Format Frmw_DL
                            Optional NVM Commands (0x0015): Comp DS_Mngmt Sav/Sel_Feat
                            Log Page Attributes (0x03): S/H_per_NS Cmd_Eff_Lg
                            Maximum Data Transfer Size: 64 Pages
                            Warning Comp. Temp. Threshold: 85 Celsius
                            Critical Comp. Temp. Threshold: 90 Celsius

                            Supported Power States
                            St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
                             0 +     6.00W       -        -    0  0  0  0    15000       0
                             1 +     3.00W       -        -    1  1  1  1    15000       0
                             2 +     1.50W       -        -    2  2  2  2    15000       0
                             3 -   0.0450W       -        -    3  3  3  3    15000   15000
                             4 -   0.0040W       -        -    4  4  4  4    25000   25000

                            Supported LBA Sizes (NSID 0x1)
                            Id Fmt  Data  Metadt  Rel_Perf
                             0 +     512       0         0

                            === START OF SMART DATA SECTION ===
                            SMART overall-health self-assessment test result: PASSED

                            SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff)
                            Critical Warning: 0x00
                            Temperature: 52 Celsius
                            Available Spare: 100%
                            Available Spare Threshold: 10%
                            Percentage Used: 0%
                            Data Units Read: 13,327 [6.82 GB]
                            Data Units Written: 842,330 [431 GB]
                            Host Read Commands: 115,375
                            Host Write Commands: 31,129,084
                            Controller Busy Time: 71
                            Power Cycles: 33
                            Power On Hours: 971
                            Unsafe Shutdowns: 12
                            Media and Data Integrity Errors: 0
                            Error Information Log Entries: 0
                            Warning Comp. Temperature Time: 0
                            Critical Comp. Temperature Time: 0

                            Error Information (NVMe Log 0x01, 16 of 64 entries)
                            No Errors Logged

                            Self-tests not supported"
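For anyone wanting to sanity-check output like this, below is a small Python sketch that parses a few of the health fields from the text above and applies some basic checks. The field values are copied from this post; the parser (`parse_health`) and the choice of checks are my own assumptions, not an official Netgate or smartmontools procedure. The byte conversion relies on the NVMe convention that one "data unit" is 1000 blocks of 512 bytes.

```python
import re

# A few lines copied from the SMART/Health section above.
SAMPLE = """\
Critical Warning: 0x00
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 0%
Data Units Written: 842,330 [431 GB]
Power Cycles: 33
Power On Hours: 971
Unsafe Shutdowns: 12
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
"""

DATA_UNIT_BYTES = 512 * 1000  # NVMe: one data unit = 1000 x 512-byte blocks


def parse_health(text):
    """Turn 'Name: value' lines into a dict of integers (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        value = value.strip()
        if value.startswith("0x"):
            fields[key.strip()] = int(value.split()[0], 16)
            continue
        match = re.match(r"[\d,]+", value)
        if match:
            fields[key.strip()] = int(match.group().replace(",", ""))
    return fields


health = parse_health(SAMPLE)

problems = []
if health["Critical Warning"] != 0:
    problems.append("critical warning flag set")
if health["Media and Data Integrity Errors"] != 0:
    problems.append("media/data integrity errors logged")
if health["Available Spare"] <= health["Available Spare Threshold"]:
    problems.append("available spare at or below threshold")

written_gb = health["Data Units Written"] * DATA_UNIT_BYTES / 1e9
gb_per_day = written_gb / (health["Power On Hours"] / 24)

print("problems:", problems or "none")
print(f"written: {written_gb:.0f} GB, about {gb_per_day:.1f} GB per powered-on day")
print(f"unsafe shutdowns: {health['Unsafe Shutdowns']} of {health['Power Cycles']} power cycles")
```

None of the checks trip on these values. The Unsafe Shutdowns counter records power losses without a clean shutdown, so on its own it points at how the box was powered off rather than at failing media.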

                            • B Offline
                              belajasmert
                              last edited by

                              Information
                              smartctl 7.5 2025-04-30 r5714 [FreeBSD 16.0-CURRENT amd64] (local build)
                              Copyright (C) 2002-25, Bruce Allen, Christian Franke, www.smartmontools.org

                              === START OF SMART DATA SECTION ===
                              SMART overall-health self-assessment test result: PASSED

                              • stephenw10S Offline
                                stephenw10 Netgate Administrator
                                last edited by

                                Yup, that looks fine. Doesn't look like a drive issue.

                                • B Offline
                                  belajasmert @stephenw10
                                  last edited by belajasmert

                                  @stephenw10 Ok, just had another freeze today. Based on all the evidence from all the freezes:

                                  • PC directly connected to a port (no switch)
                                  • They always coincide with the PC power on/off (power save)
                                  • LAN flapping during the power on
                                  • A lot of filterdns reloads etc., as the firewall is the DNS provider through DNS redirect
                                  • Hardware (storage) looks ok
                                  • Logs are still written during the freeze (checked separately for today's incident)
                                  • I have a habit of keeping dozens of tabs open in the browser (so a lot of DNS queries immediately after a LAN flap), and the browser is often left open (power save -> LAN flap)
                                  • Previous bugs that were related to PC <-> port direct connect

                                  I'll most likely get a new switch and drop it in between the PC and the firewall, with the expectation that the issues get resolved. I also have a couple of devices that might be hooked up to the new switch anyway.

                                  Then we'll see if problems go away.

                                  • stephenw10S Offline
                                    stephenw10 Netgate Administrator
                                    last edited by

                                    Yup, good test to confirm it. Still surprising it actually stops Unbound though....

                                    • B Offline
                                      belajasmert @stephenw10
                                      last edited by

                                      @stephenw10 Switch now in place and, as expected, the system & DNS resolver logs are really quiet. If I manage to run for 30 days without freezes and without changing anything else (configuration / my own behaviour), it will be a strong indicator of some kind of issue.

                                      • GertjanG Offline
                                        Gertjan @belajasmert
                                        last edited by

                                        @belajasmert said in Netgate 4200 freeze and a possible fix:

                                        Switch now in place and as expected the system & DNS resolver logs are really quiet

                                        👍

                                        This - the LAN interface events :

                                        19:59:13 DOWN
                                        19:59:19 UP
                                        20:01:20 DOWN
                                        20:01:22 UP
                                        20:01:31 DOWN
                                        20:01:34 UP
                                        

                                        will also trigger other events, like the restart (!) of processes that use this (LAN) interface :
                                        The pfSense WebGUI (nginx), the resolver (unbound, you found that one already), and more. Check the main system log to see what happens when an interface goes down and up.

                                        The solution : you've found it : use a switch.

                                        And you can do even better : the upstream WAN device (an ISP router or modem), pfSense itself, and the downstream LAN switch(es) are normally all close to each other, so hook them up to the same power strip and use a UPS.

                                        No "help me" PM's please. Use the forum, the community will thank you.

                                        • B Offline
                                          belajasmert
                                          last edited by

                                          Just checking in to confirm that everything is still running smoothly without any issues.

                                          I did end up setting up additional services like Avahi but all freezes are history.

                                          Copyright 2026 Rubicon Communications LLC (Netgate). All rights reserved.