6100 - Stopped passing traffic / Web GUI not accessible
-
@bnetworker said in 6100 - Stopped passing traffic / Web GUI not accessible:
Any ideas how to determine the cause?
Well, you saw the patient, and he wanted to know what's up with him.
You gave him the lethal injection and said : "next". So, maybe some post mortem investigation might shed some light.
For the next time :
You have the console : use it, and use option 8.
Remember these two words : /var/log/
as that is the place where you can find the log files.
These contain "what happened, and when" and are plain text files with timestamps.
@bnetworker said in 6100 - Stopped passing traffic / Web GUI not accessible:
by pulling power
pfSense uses a disk with a very resilient file system, but still, don't do that.
It's not a light bulb.
You can use the console, and then option 5 (Reboot system) or option 6 (Halt system).
-
Do you see any gaps in the logs or monitoring graphs? A disk issue can present like that.
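One way to look for such gaps without scrolling through the whole file is a small script. This is only a minimal sketch: it assumes classic syslog lines that start with a timestamp like `Apr 17 09:08:00`, and since that format carries no year, the `year` argument is an arbitrary placeholder. The function name `find_gaps` and the 4 hour threshold are made up for illustration; the sample lines are the ones posted in this thread.

```python
from datetime import datetime, timedelta


def find_gaps(lines, min_gap=timedelta(hours=1), year=2000):
    """Yield (previous, current) timestamps that are further apart than min_gap.

    `year` is an assumption: classic syslog timestamps do not include one.
    """
    previous = None
    for line in lines:
        stamp = line[:15]  # e.g. "Apr 17 09:08:00"
        try:
            current = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # not a timestamped line, skip it
        if previous is not None and current - previous > min_gap:
            yield previous, current
        previous = current


# Sample lines copied from the log excerpt in this thread.
sample = [
    "Apr 17 09:08:00 sshguard 99451 Exiting on signal.",
    "Apr 17 09:08:00 sshguard 9417 Now monitoring attacks.",
    "Apr 17 12:30:00 sshguard 9417 Exiting on signal.",
    "Apr 17 12:30:00 sshguard 51269 Now monitoring attacks.",
    "Apr 17 20:55:55 syslogd kernel boot file is /boot/kernel/kernel",
    "Apr 17 20:55:55 kernel ---<<BOOT>>---",
]

gaps = list(find_gaps(sample, min_gap=timedelta(hours=4)))
for before, after in gaps:
    print(f"no log entries between {before:%b %d %H:%M:%S} and {after:%b %d %H:%M:%S}")
```

On a quiet system a few silent hours can be normal, so the threshold needs tuning; a gap that ends in a boot banner is the interesting kind.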
-
@stephenw10 said in 6100 - Stopped passing traffic / Web GUI not accessible:
Do you see any gaps in the logs or monitoring graphs? A disk issue can present like that.
Hey @stephenw10 - Yes, the logs do show a long period (looks like issues started about 12:33, then finally fully locked up about 20:50, rebooted shortly after) where they show 0.00 for all counters:
This is the MAX with the M.2 (P80) 3TE6 (SMART overall-health self-assessment test result: PASSED)
Logs show similar gap:
Apr 17 09:08:00 sshguard 99451 Exiting on signal.
Apr 17 09:08:00 sshguard 9417 Now monitoring attacks.
Apr 17 12:30:00 sshguard 9417 Exiting on signal.
Apr 17 12:30:00 sshguard 51269 Now monitoring attacks.
Apr 17 20:55:55 syslogd kernel boot file is /boot/kernel/kernel
Apr 17 20:55:55 kernel ---<<BOOT>>---
Apr 17 20:55:55 kernel Copyright (c) 1992-2022 The FreeBSD Project
-
"Disk full" issues with a 6100 MAX ... only suricata (and snort ? ntopng ?) users could manage to do that, as some of them didn't imagine that these packages can create huge log files.
( and only for these users the auto log rotate mechanism is 'broken' )
@bnetworker : disk space : all is ok ?
What packages are you using ?
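For the disk space question, a minimal sketch that reports how full a file system is and which files under a directory are the biggest. The helper names `percent_used` and `largest_files` are made up for this example, and `/var/log` is used only because that is where the logs live; nothing here is specific to pfSense.

```python
import os
import shutil


def percent_used(path="/"):
    """Return the used share of the file system holding `path`, in percent."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def largest_files(root, count=5):
    """Return up to `count` (size_in_bytes, path) tuples under `root`, biggest first."""
    sizes = []
    for folder, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(folder, name)
            try:
                sizes.append((os.path.getsize(full), full))
            except OSError:
                continue  # file vanished or is unreadable, skip it
    return sorted(sizes, reverse=True)[:count]


if __name__ == "__main__":
    print(f"/ is {percent_used('/'):.1f} % full")
    for size, path in largest_files("/var/log"):
        print(f"{size:>12} {path}")
```

If a package's log directory dominates that list, that is the one whose log settings deserve a look.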
If I compare your graph with mine, it's identical - it hovers around 310 processes.
pfSense Plus 23.01, on a 4100 MAX, with pfBlockerng doing 'something' - nothing more.
Look also at memory used.
That's me trying to load and use very huge pfBlockerng DNSBL feeds. It was not a success story. The system even started to use swap, and that's bad on a firewall.
2 % disk space used.
I wonder why I bought a MAX
-
@gertjan - Very little in use. This is the disk usage now.
Very few packages as well:
I did just notice there is a new 03.00.00.03t-uc-18 firmware out there.
-
The new firmware would not affect the disk.
You might check the SMART data.
Unfortunately in that situation often the only place that shows the error would be at the console. If it happens again check the console before resetting it. You would see drive errors there when trying to do anything that tries to read or write from it.
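For the SMART check, a minimal sketch that pulls the overall health verdict out of `smartctl` output. The line it looks for is the one already quoted in this thread ("SMART overall-health self-assessment test result: PASSED"). The function names are made up, and the device path you pass to `read_health` (something like `/dev/nvme0`) is an assumption that depends on the hardware.

```python
import re
import subprocess

HEALTH_RE = re.compile(
    r"SMART overall-health self-assessment test result:\s*(\S+)"
)


def parse_health(smartctl_output):
    """Return the health verdict (e.g. 'PASSED') or None if the line is missing."""
    match = HEALTH_RE.search(smartctl_output)
    return match.group(1) if match else None


def read_health(device):
    """Run `smartctl -H <device>` and return the parsed verdict.

    Needs smartctl installed and enough privileges; the device path is
    an assumption supplied by the caller.
    """
    result = subprocess.run(
        ["smartctl", "-H", device],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_health(result.stdout)


# Parsing the line quoted earlier in this thread:
print(parse_health("SMART overall-health self-assessment test result: PASSED"))
```

A PASSED verdict does not rule out a drive that drops off the bus now and then, which is why the console output at the moment of the hang is still the better evidence.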
Steve
-
@bnetworker
All that looks fine.
One exception though : do yourself a favor, and remove the Service_Watchdog package.
See it as a nasty dog : yes, it will bite every burglar, if it finds one. If none, it will bite you, the wife, the kids and even worse : parents-in-law.
"Service_Watchdog" is a software development tool (during the 'things don't work well yet' phase).
"Service_Watchdog" is bad ... I'm not sure it will lock up a 6100. It might be capable of doing so.
I've never seen pfSense services like unbound, the captive portal etc dying on me for the last 10+ years. Running on own hardware, VM and now a 4100.
The others you've listed : they run or do something at startup, and then they do nothing anymore.
openvpn-client-export : only used when you use the GUI openvpn-client-export to export an ovpn file.
edit : yes, look at the "dmesg" log file ... !
-
@stephenw10 - SMART data shows clean. I'll check the console if it happens again. If the NVMe is failing, is it possible to replace it with another and reload?
@Gertjan - Service watchdog was to restart OpenVPN, as sometimes I've seen the service stop after a config change, then I'm down remotely till I get home. It's resolved that issue. If it's that horrible and buggy, should we not raise an issue, or is it beyond repair?
-
@bnetworker said in 6100 - Stopped passing traffic / Web GUI not accessible:
If the NVME is failing, is it possible to replace with another and reload?
Yes, that is possible. It's not recommended to open the case normally though as it's easy to damage it doing so. Care is required! If it's in warranty we would replace that for you if needs be.
Steve
-
Understood. I'll keep an eye on it. I'll report back if there are further issues.
-
@bnetworker said in 6100 - Stopped passing traffic / Web GUI not accessible:
should we not raise an issue, or is it beyond repair?
It's dumb.
It loops around with a time delay, like */1 as it's a cron task, and checks if the pid of the OpenVPN server exists.
If the process was commanded to shut down, then the pid will be removed also.
But who gave the shut down command then ?
If the process just 'dies', or goes zombie in memory, the pid (file !) still exists. Watchdog still sees the file, and does nothing. IMHO : not a really good indication.
If the process had a bug, and dies 'clean' it will get restarted : to eventually hit the same bug, and die .... etc etc
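To make the "pid file exists" versus "process is really there" difference concrete, a minimal sketch. This is not the Service_Watchdog code, just an illustration of the two checks; the function names and the pid file path in the example call are made up.

```python
import os


def read_pid(pid_file):
    """Return the pid stored in pid_file, or None if it is missing or unusable."""
    try:
        with open(pid_file) as handle:
            pid = int(handle.read().strip())
    except (OSError, ValueError):
        return None
    return pid if pid > 0 else None


def process_alive(pid):
    """Return True if a process with this pid exists (signal 0 sends nothing)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # it exists, it just belongs to another user
    return True


def service_state(pid_file):
    """Classify a service as 'no pid file', 'stale pid file' or 'running'."""
    pid = read_pid(pid_file)
    if pid is None:
        return "no pid file"
    return "running" if process_alive(pid) else "stale pid file"


if __name__ == "__main__":
    # Hypothetical path, only for illustration.
    print(service_state("/var/run/example_service.pid"))
```

A check that only looks for the file cannot tell "running" from "stale pid file". Even the signal 0 test is not the full story: a zombie still has a process entry, so it would be reported as running.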
@bnetworker said in 6100 - Stopped passing traffic / Web GUI not accessible:
beyond repair
If a process dies, have it repaired. Permanently electrocuting it never made anything 'better'. ;)
Btw : Ok, I understand your usage.
I've always an OpenVPN instance running for remote 'admin' access. I never found it 'stopped' because it had 'failed'.
Where things go downhill fast, is when it gets used with unbound, and the user also has pfBlockerng installed (of course with many and big dnsbl lists).
unbound can have big startup times, so, if it dies and was revived by the watchdog, and it needs more than one minute to start, it will get restarted while it was already restarting (and didn't write out its pid file yet).
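A toy simulation of that timing problem, assuming a watchdog that runs once per check interval and treats a missing pid file as "dead". The function name and all the numbers are made up for illustration; this is not how the package is implemented, only the restart loop as described above.

```python
def simulate_watchdog(startup_seconds, check_interval=60, horizon=600):
    """Simulate a naive watchdog and return (restart_count, became_ready).

    The service is (re)started at t=0 and writes its pid file only after
    `startup_seconds`. Every `check_interval` seconds the watchdog looks
    for the pid file and restarts the service if it is not there yet.
    """
    started_at = 0
    restarts = 0
    for now in range(check_interval, horizon + 1, check_interval):
        pid_file_written = (now - started_at) >= startup_seconds
        if pid_file_written:
            return restarts, True
        # No pid file yet: the watchdog assumes the service is dead
        # and restarts it, which throws away the startup work so far.
        started_at = now
        restarts += 1
    return restarts, False


for startup in (45, 90):
    restarts, ready = simulate_watchdog(startup)
    print(f"startup {startup}s: restarts={restarts}, ready={ready}")
```

As long as the startup time stays below the check interval the service comes up; once it is longer, every check resets the clock and the service never gets there.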
Best situation : the system's DNS is down as unbound never reaches a 'working' state. Worst : the entire system goes downhill fast.
-
The service watchdog is a troubleshooting tool and should be seen as such.
There are some situations where you might want to enable it on a service in a more permanent way but even then it's usually to address some underlying bug.
For example, if you have an OpenVPN client and want it to be always up, you might use the watchdog. But only because, if the server side rejects the connection as unauthorized, the client will exit and not retry. Most clients would never see that, or if they did, retrying would not help.
Steve