Netgate 4200 freeze and a possible fix
-
Did some digging and found these:
https://forum.netgate.com/topic/161400/unbound-stops-listening-on-interface
https://redmine.pfsense.org/issues/11547
I am starting to suspect that the LAN flapping because of the PC sleeping / power on / power off might be the origin of the problem.
These are the timestamps of the LAN port flapping; they correspond to the timing of the firewall freeze:
19:59:13 DOWN
19:59:19 UP
20:01:20 DOWN
20:01:22 UP
20:01:31 DOWN
20:01:34 UP
-
Neither a failing filterdns entry nor an interface changing state should really be a problem.
The NIC re-linking when your client wakes will trigger a bunch of processes, but that would only cause a temporary delay, if anything.
If either of those did cause a problem I'd expect it to log something. Do you see the Unbound service restarting in the logs when you lost access?
Do you see anything logged at the time it failed? Services failing spontaneously like that can be a sign of a failing drive. pfSense will keep running, but anything that tries to read or write will fail, so you end up with services slowly failing. However, since logs cannot be written in that case, missing log entries are a pretty clear indication.
-
These are the unbound events:
19:59:14 — Unbound stopped
“Feb 26 19:59:14 unbound … info: service stopped (unbound 1.24.2).”
19:59:19 — Unbound started
“Feb 26 19:59:19 unbound … info: start of service (unbound 1.24.2).”
19:59:22 — Unbound stopped
“Feb 26 19:59:22 unbound … info: service stopped (unbound 1.24.2).”
19:59:22 — Unbound started
“Feb 26 19:59:22 unbound … info: start of service (unbound 1.24.2).”
These correspond with the time I started up the PC and the subsequent LAN port flapping. The other logs show only entries that are caused by the LAN flapping. Nothing there looks problematic, and those entries are always present when the PC starts up.
The domain that was failing to resolve was listed in an alias. I removed it as it was unnecessary anyway, which should clean up the DNS resolver log a bit.
The Netgate device itself is new, but I don't know if there is something wrong with this model. The first device I got was DoA. Can I use the SMART status or something else to verify that the drive is working as it should?
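For anyone who wants to repeat the comparison, here is a minimal Python sketch that lines up the two event lists above. The timestamps are copied from this post; the 5-second window and the names `correlate`, `lan_events` and `unbound_events` are assumptions made for illustration and are not part of pfSense.

```python
from datetime import datetime, timedelta

# Timestamps copied from the posts above (Feb 26).
lan_events = [
    ("19:59:13", "DOWN"),
    ("19:59:19", "UP"),
    ("20:01:20", "DOWN"),
    ("20:01:22", "UP"),
    ("20:01:31", "DOWN"),
    ("20:01:34", "UP"),
]
unbound_events = [
    ("19:59:14", "stopped"),
    ("19:59:19", "started"),
    ("19:59:22", "stopped"),
    ("19:59:22", "started"),
]

# Assumption: an Unbound event counts as "related" if a LAN link event
# happened at most this long before it. The value is arbitrary.
WINDOW = timedelta(seconds=5)


def parse(ts):
    """Parse an HH:MM:SS timestamp (all events are from the same day)."""
    return datetime.strptime(ts, "%H:%M:%S")


def correlate(lan, unbound, window=WINDOW):
    """Pair each Unbound event with the LAN events that precede it within the window."""
    pairs = []
    for u_ts, u_state in unbound:
        u_time = parse(u_ts)
        nearby = [
            (l_ts, l_state)
            for l_ts, l_state in lan
            if timedelta(0) <= u_time - parse(l_ts) <= window
        ]
        pairs.append(((u_ts, u_state), nearby))
    return pairs


for (u_ts, u_state), nearby in correlate(lan_events, unbound_events):
    print(u_ts, u_state, "<-", nearby or "no LAN event within window")
```

With these values every Unbound stop/start has a LAN link event a few seconds before it, which is consistent with the link flapping triggering the restarts but does not prove it.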
-
Yes on a 4200-max you can use the SMART data to see any drive errors. And I'd certainly expect to see errors there if it were failing.
-
@stephenw10 There are no errors in the log:
Logs
smartctl 7.5 2025-04-30 r5714 [FreeBSD 16.0-CURRENT amd64] (local build)
Copyright (C) 2002-25, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF SMART DATA SECTION ===
Error Information (NVMe Log 0x01, 16 of 64 entries)
No Errors Logged
The tests (short / long) both give this when you try to run them:
"Test Results
smartctl 7.5 2025-04-30 r5714 [FreeBSD 16.0-CURRENT amd64] (local build)
Copyright (C) 2002-25, Bruce Allen, Christian Franke, www.smartmontools.org
Self-tests not supported"
And if you check the NVMe log you can see this:
"Logs
smartctl 7.5 2025-04-30 r5714 [FreeBSD 16.0-CURRENT amd64] (local build)
Copyright (C) 2002-25, Bruce Allen, Christian Franke, www.smartmontools.org
=======> INVALID ARGUMENT TO -l: nvmelog
=======> VALID ARGUMENTS ARE: error, selftest, selective, directory[,g|s], xerror[,N][,error], xselftest[,N][,selftest], background, sasphy[,reset], sataphy[,reset], scttemp[sts,hist], scttempint,N[,p], scterc[,N,M][,p|reset], devstat[,N], defects[,N], ssd, gplog,N[,RANGE], smartlog,N[,RANGE], nvmelog,N,SIZE, tapedevstat, zdevstat, envrep, farm <=======
Use smartctl -h to get a usage summary"
Looking at all the SMART data, there is only one item indicating any issue:
"Information
smartctl 7.5 2025-04-30 r5714 [FreeBSD 16.0-CURRENT amd64] (local build)
Copyright (C) 2002-25, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number: TS128GMTE460T-SIL
Serial Number: J279400393
Firmware Version: V0804A3
PCI Vendor/Subsystem ID: 0x1d79
IEEE OUI Identifier: 0x7c3548
Controller ID: 1
NVMe Version: 1.3
Number of Namespaces: 1
Namespace 1 Size/Capacity: 128,035,676,160 [128 GB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: 7c3548 5264b333c9
Local Time is: Sat Feb 28 14:58:26 2026 EET
Firmware Updates (0x12): 1 Slot, no Reset required
Optional Admin Commands (0x0007): Security Format Frmw_DL
Optional NVM Commands (0x0015): Comp DS_Mngmt Sav/Sel_Feat
Log Page Attributes (0x03): S/H_per_NS Cmd_Eff_Lg
Maximum Data Transfer Size: 64 Pages
Warning Comp. Temp. Threshold: 85 Celsius
Critical Comp. Temp. Threshold: 90 Celsius
Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     6.00W       -        -    0  0  0  0    15000       0
 1 +     3.00W       -        -    1  1  1  1    15000       0
 2 +     1.50W       -        -    2  2  2  2    15000       0
 3 -   0.0450W       -        -    3  3  3  3    15000   15000
 4 -   0.0040W       -        -    4  4  4  4    25000   25000
Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff)
Critical Warning: 0x00
Temperature: 52 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 0%
Data Units Read: 13,327 [6.82 GB]
Data Units Written: 842,330 [431 GB]
Host Read Commands: 115,375
Host Write Commands: 31,129,084
Controller Busy Time: 71
Power Cycles: 33
Power On Hours: 971
Unsafe Shutdowns: 12
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Error Information (NVMe Log 0x01, 16 of 64 entries)
No Errors Logged
Self-tests not supported"
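As a rough cross-check of the values above, here is a minimal Python sketch that parses a few of the SMART/Health lines and applies some simple checks. The field values are copied from the output in this post. The helper names (`parse_smart`, `health_flags`) and the choice of checks are assumptions for illustration, not smartmontools or Netgate guidance. The byte conversion relies on NVMe reporting data units in thousands of 512-byte blocks.

```python
import re

# Excerpt copied from the smartctl output above.
SMART_TEXT = """\
Critical Warning: 0x00
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 0%
Data Units Read: 13,327 [6.82 GB]
Data Units Written: 842,330 [431 GB]
Power Cycles: 33
Power On Hours: 971
Unsafe Shutdowns: 12
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
"""

# NVMe reports data units in thousands of 512-byte blocks.
BYTES_PER_DATA_UNIT = 512 * 1000


def parse_smart(text):
    """Return {field name: integer value} for lines shaped like 'Name: 1,234 ...'."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        match = re.match(r"\s*(0x[0-9a-fA-F]+|[\d,]+)", value)
        if sep and match:
            raw = match.group(1).replace(",", "")
            fields[key.strip()] = int(raw, 0)  # base 0 accepts decimal and 0x hex
    return fields


def health_flags(fields):
    """Hypothetical checks; which fields to test is an assumption of this sketch."""
    flags = []
    if fields["Critical Warning"] != 0:
        flags.append("critical warning bits set")
    if fields["Available Spare"] < fields["Available Spare Threshold"]:
        flags.append("available spare below threshold")
    if fields["Media and Data Integrity Errors"] > 0:
        flags.append("media/data integrity errors")
    if fields["Error Information Log Entries"] > 0:
        flags.append("error log entries present")
    return flags


fields = parse_smart(SMART_TEXT)
written_gb = fields["Data Units Written"] * BYTES_PER_DATA_UNIT / 1e9

print("flags:", health_flags(fields) or "none")
print(f"written: {written_gb:.0f} GB over {fields['Power On Hours']} power-on hours")
print(f"unsafe shutdowns: {fields['Unsafe Shutdowns']} of {fields['Power Cycles']} power cycles")
```

None of the checks in this sketch trip on the values posted here. The unsafe shutdown count is only printed, since it is a counter rather than an error entry.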
-
Information
smartctl 7.5 2025-04-30 r5714 [FreeBSD 16.0-CURRENT amd64] (local build)
Copyright (C) 2002-25, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
-
Yup, that looks fine. Doesn't look like a drive issue.
-
@stephenw10 Ok, just had another freeze today. Based on all the evidence from all the freezes:
- PC directly connected to a port (no switch)
- They always coincide with the PC power on/off (power save)
- LAN flapping during the power on
- A lot of filterdns reloads etc. as the firewall is the DNS provider through DNS redirect
- Hardware (storage) looks ok
- There are logs written during freeze (separately checked from today's incident)
- I have a habit of keeping dozens of tabs open in the browser (so a lot of DNS queries immediately after a LAN flap) and the browser is often left open (power save -> LAN flap)
- Previous bugs that were related to PC <-> port direct connect
I'll most likely get a new switch and drop it in between the PC and the firewall, with the expectation that the issues get resolved. And I have a couple of devices that might be hooking up to the new switch anyway.
Then we'll see if problems go away.
-
Yup, good test to confirm it. Still surprising it actually stops Unbound though....
-
@stephenw10 Switch now in place and as expected the system & DNS resolver logs are really quiet. If I manage to run for 30 days without freezes and without changing anything else (configuration / my own behaviour), it will be a strong indicator of some kind of issue with the direct PC-to-port connection.
-
@belajasmert said in Netgate 4200 freeze and a possible fix:
Switch now in place and as expected the system & DNS resolver logs are really quiet

This - the LAN interface events :
19:59:13 DOWN
19:59:19 UP
20:01:20 DOWN
20:01:22 UP
20:01:31 DOWN
20:01:34 UP
will also trigger other events, like the restart (!) of processes that use this (LAN) interface :
The pfSense WebGUI, (nginx), the resolver (unbound), you found that one already, and more, check the main system log for what happens when an interface goes down and up.
The solution : you've found it : use a switch.
And you can do even better : the upstream WAN device (an ISP router or modem), pfSense itself, and the downstream LAN switch(es) are normally all close to each other, so hook them up to the same power strip, and use a UPS.