Suricata won't stop
-
Legacy normally but currently have blocking disabled everywhere to rule it out.
-
@ballistic said in Suricata won't stop:
Legacy normally but currently have blocking disabled everywhere to rule it out.
Okay. That's good.
Are you using any RAM disks?
Have a look at the files in /var/run on the firewall by using DIAGNOSTICS > EDIT FILE to browse to that folder. Look for all the Suricata files. You should see one file per configured interface. The filename will have the interface name along with a UUID (a random unique identifier). This UUID is like the serial number for an interface instance of Suricata. The presence or absence of the file (with the .pid suffix) is how the GUI determines whether that Suricata interface instance is running. No file equals "not running" in the logic of the GUI. A present file equals "running" for the GUI.
The PID files are simple text. They contain the process ID (PID) of the Suricata instance that started and created the file.
Take an inventory of the Suricata PID files in /var/run and compare that to the actual running instances you see with this command at a shell prompt:
ps -ax | grep suricata
They should all match up. But if something causes a running Suricata instance to abort abnormally, then the PID file is not cleaned up. That will confuse the GUI logic and it will misreport the status of Suricata instances.
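A minimal sketch of the first half of that inventory: for every Suricata PID file, check whether the PID stored in it still belongs to a live process. The function name check_pidfiles and the RUNNING/STALE/INVALID labels are made up for illustration and are not part of the Suricata package, and the suricata_*.pid glob is an assumption based on the file naming described above. It should be run as root, because kill -0 also fails for processes owned by another user.

```sh
#!/bin/sh
# Hypothetical helper: report the state of each Suricata PID file.
# Usage: check_pidfiles [RUN_DIR]   (RUN_DIR defaults to /var/run)
check_pidfiles() {
    run_dir="${1:-/var/run}"
    for f in "$run_dir"/suricata_*.pid; do
        [ -f "$f" ] || continue              # the glob matched nothing
        pid=$(tr -d '[:space:]' < "$f")      # PID files are plain text
        case "$pid" in
            ''|*[!0-9]*) echo "INVALID $f"; continue ;;
        esac
        # kill -0 sends no signal; it only tests whether the PID exists.
        if kill -0 "$pid" 2>/dev/null; then
            echo "RUNNING $f pid=$pid"
        else
            echo "STALE $f pid=$pid"
        fi
    done
}
```

A STALE line means a PID file was left behind by a process that is gone. A RUNNING line only says that some process owns that PID, so it is still worth comparing against the ps output.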
-
Thanks. I will check it next time it goes wrong.
If I understand your last sentence correctly, you are referring to the Suricata process being stopped but showing as running in the GUI. It's actually the other way around. The process runs (so the PID file should be there) but the GUI still thinks it's stopped.
Let's see what happens when the problem shows up again.
-
@ballistic said in Suricata won't stop:
Thanks. I will check it next time it goes wrong.
If I understand your last sentence correctly, you are referring to the Suricata process being stopped but showing as running in the GUI. It's actually the other way around. The process runs (so the PID file should be there) but the GUI still thinks it's stopped.
Let's see what happens when the problem shows up again.
The absence or presence of the PID file is what the GUI code is working with. So yeah, an inventory the next time the problem presents is going to be helpful.
Is there anything going on that might result in a file getting deleted? Typically the file should be "locked" by the running process and normal deletion not allowed.
At any rate, a screenshot of the contents of /var/run while the issue is present will help me. Including a screenshot (or the output) from the ps -ax | grep suricata command will also be helpful to cross-correlate with the PID files.
Oh, and one last thing I just thought of. The GUI code that examines the PID files is actually triggered by JavaScript code running on the local client (so in the browser of the device you use to connect to the firewall GUI). The JavaScript runs a recurring series of Ajax form posts to query the status and update the icons. It's possible there could be problems with that script running in your browser.
-
Rock stable so far. I have re-enabled 12 hour updates to see if that makes it break.
-
@ballistic said in Suricata won't stop:
Rock stable so far. I have re-enabled 12 hour updates to see if that makes it break.
If rule updates wind up being the apparent cause, you might consider switching on the "Live Rule Swap on Update" option on the GLOBAL SETTINGS tab. When that option is enabled, Suricata itself is not restarted. Instead, a copy of the updated rules is read into a separate memory area, parsed, and then switched to being active. After that, the previous rule set is removed from memory. The only downside of this feature is a temporary increase in RAM usage while two copies of the rules are present in memory during the swap.
But really the only time I've seen rule updates result in duplicate processes is when a user also had the Service Watchdog package installed and monitoring Suricata. In that case, because Service Watchdog simply looks for a running Suricata process, when it does not see one, it calls the shell script to restart it. If this happens during the rules update cron task when that task is already restarting Suricata, then two copies can get started on the same interface depending on timing. The "Live Rule Swap" feature works around that because the running Suricata instances are themselves not restarted.
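To spot the duplicate-process case, one option is to count Suricata processes per interface from the ps output. This is only a rough sketch: duplicate_instances is a hypothetical helper name, and it assumes every Suricata command line names its interface with a -i <interface> argument.

```sh
#!/bin/sh
# Hypothetical helper: read ps output on stdin and print every interface
# that has more than one Suricata process, followed by the process count.
duplicate_instances() {
    awk '/suricata/ {
        for (i = 1; i < NF; i++) {
            if ($i == "-i") {        # the word after -i is the interface
                count[$(i + 1)]++
                break
            }
        }
    }
    END {
        for (iface in count)
            if (count[iface] > 1) print iface, count[iface]
    }'
}
```

It could be fed with something like ps -axww | grep '[s]uricata' | duplicate_instances. The ww is my addition so that long command lines are not truncated, and the [s] keeps grep from matching itself. No output means no interface has more than one instance.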
-
Thanks for the info! I do not have the watchdog package installed.
Does the Live rule reload have any impact on the size of the config file?
Last time I enabled Suricata on a 4th interface, my already 92MB config file grew to 120MB+ causing a total crash of the GUI (php), SSH, etc. Traffic was still running but the only way I could recover was to SCP in, switch back to the previous config file and reboot.
I have not looked into the cause of this issue yet, but at this moment I don't want it to happen again by enabling the live update function.
-
@ballistic said in Suricata won't stop:
Does the Live rule reload have any impact on the size of the config file?
Last time I enabled Suricata on a 4th interface, my already 92MB config file grew to 120MB+ causing a total crash of the GUI (php), SSH, etc. Traffic was still running but the only way I could recover was to SCP in, switch back to the previous config file and reboot.
No, changing that should have almost zero effect on the config file size. It literally just stores about 41 or 42 ASCII characters in total, depending on whether it is set to "on" or "off".
I can't imagine why Suricata is making your config file that large. It only stores basic configuration info in ASCII XML. The only thing I can possibly imagine that would be larger is if you had thousands of rules forced enabled or disabled, or you had huge SID management conf files. But even then I can't imagine Suricata adding almost 30 MB of stuff to config.xml.
Have you actually looked in the file to see what is using that amount of data? It may be RRD logging (the input/output stats graph data). That has absolutely nothing to do with Suricata, though.
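One quick way to answer that question from a shell, without loading a 100 MB file into an editor, is to measure the Suricata section and list the longest lines. A sketch under assumptions: both function names are hypothetical, and the first one assumes the package's settings sit between <suricata> and </suricata> tags that each appear on their own line.

```sh
#!/bin/sh
# Hypothetical helper: bytes between <suricata> and </suricata>, inclusive.
# Assumes the opening and closing tags are on separate lines.
suricata_section_bytes() {
    sed -n '/<suricata>/,/<\/suricata>/p' "$1" | wc -c | tr -d ' '
}

# Hypothetical helper: print "length line-number first-40-chars" for the
# N longest lines of a file (N defaults to 5).
longest_lines() {
    awk '{ print length($0), NR, substr($0, 1, 40) }' "$1" |
        sort -rn | head -n "${2:-5}"
}
```

For example, suricata_section_bytes /conf/config.xml compared with the total from wc -c /conf/config.xml shows how much of the file the package accounts for, and longest_lines /conf/config.xml shows which elements hold the bulk of it.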
Open /conf/config.xml in an editor and either search for, or scroll down to, the <installedpackages> section. The file is plaintext XML. Then within that section search for <suricata>. Only the info between the two tags <suricata></suricata> is used by the Suricata package.
-
Suricata still rock stable.
Regarding the config: it's probably my insane number of enabled rules that is causing the issue. There are only 5500 lines in the config, but some lines are miles long; even Notepad++ chokes on it. Especially this one:
<customrules>YWxlcnQgdGNwICRIT01FX05FVCBhbnkgLT4gWzY3L... +another few miles of txt
And that times 3 (3 Suricata-enabled interfaces). I will look into more efficient rule enabling. Thanks!
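The <customrules> value quoted above is Base64 text, which can be confirmed by decoding its first 40 characters (a multiple of four, so they decode cleanly on their own). A small sketch, assuming a base64 utility that accepts -d as in GNU coreutils; a system without one needs another decoder. The encoded_size function is a hypothetical helper that shows why the stored text is about a third larger than the rules themselves.

```sh
#!/bin/sh
# First 40 characters of the <customrules> value quoted in this thread.
prefix='YWxlcnQgdGNwICRIT01FX05FVCBhbnkgLT4gWzY3'

# Assumes a base64 command that supports -d (e.g. GNU coreutils).
decoded=$(printf '%s' "$prefix" | base64 -d)
printf '%s\n' "$decoded"    # prints: alert tcp $HOME_NET any -> [67

# Hypothetical helper: Base64 emits 4 characters per 3 input bytes,
# padded up to a multiple of 4.
encoded_size() {
    echo $(( ($1 + 2) / 3 * 4 ))
}

encoded_size 30             # prints: 40
```

So each interface's custom rules are stored as one long line of roughly 4/3 the size of the raw rule text, repeated once per interface.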
-
@ballistic said in Suricata won't stop:
Suricata still rock stable.
Regarding the config: it's probably my insane number of enabled rules that is causing the issue. There are only 5500 lines in the config, but some lines are miles long; even Notepad++ chokes on it. Especially this one:
<customrules>YWxlcnQgdGNwICRIT01FX05FVCBhbnkgLT4gWzY3L... +another few miles of txt
And that times 3 (3 Suricata-enabled interfaces). I will look into more efficient rule enabling. Thanks!
Custom Rules and various lists such as SID management conf files are stored in config.xml as Base64-encoded ASCII text within the applicable element tag. Those can get long if you have tons and tons of those kinds of entries. As originally envisioned and coded, the idea was that users would have maybe only a dozen or a couple of dozen custom rules max. There is nothing inherently wrong with having more, but it will result in a much bigger XML element entry in the config.xml file.
-
Thanks for the explanation.
The problem I got after enabling a 4th Suricata interface was something like this:
https://forum.netgate.com/topic/156679/pfsense-fatal-error-allowed-memory-exhausted-cause
I was only able to restore after a manual config file replacement and reboot.
This was on 2.4 with Suricata 2.0.5 or something. I haven't dared to try it on 2.5/2.0.6 yet, but perhaps I should open a separate topic for that as it's out of scope for this topic's subject.
Spoiler alert: I have 56 thousand custom rules per interface. It's the URLhaus blocklist, which I haven't updated in a while it seems. If I update now, it would go up to 84k rules :)
-
@ballistic said in Suricata won't stop:
Thanks for the explanation.
The problem I got after enabling a 4th Suricata interface was something like this:
https://forum.netgate.com/topic/156679/pfsense-fatal-error-allowed-memory-exhausted-cause
I was only able to restore after a manual config file replacement and reboot.
This was on 2.4 with Suricata 2.0.5 or something. I haven't dared to try it on 2.5/2.0.6 yet, but perhaps I should open a separate topic for that as it's out of scope for this topic's subject.
Spoiler alert: I have 56 thousand custom rules per interface. It's the URLhaus blocklist, which I haven't updated in a while it seems. If I update now, it would go up to 84k rules :)
That is an absurd number of rules to be honest. Are they just IP addresses? If they are from URLhaus, then I suspect they are actually simple lists of IP addresses/networks to block, translated into Suricata/Snort rules syntax. Something like that would be more efficient when used as an IP list in pfBlockerNG-devel or else by simply loading a URL table alias in pfSense. Because almost 100% of email traffic today is TLS (and thus encrypted), I can't imagine those rules examining any actual content (meaning data in the packet payloads). That is unless you are doing full MITM proxying of email traffic.
-
It's this list;
https://urlhaus.abuse.ch/downloads/suricata-ids/
-
@ballistic said in Suricata won't stop:
It's this list;
https://urlhaus.abuse.ch/downloads/suricata-ids/
Did you know that there is now an option under the GLOBAL SETTINGS tab to add your own additional rules URLs? User @viktor_g here on the forums added that new feature a few months ago.
So you could simply copy this URL onto that tab as an "additional rules" entry and then Suricata will download the list each time it updates the rules. You would not need to copy all of that text into the Custom Rules dialog, and your config.xml would be considerably smaller as well (since it would no longer need all that Base64-encoded text).
-
I could hug you sir! :)
Correct URL in this case is https://urlhaus.abuse.ch/downloads/urlhaus_suricata.tar.gz
-
@ballistic said in Suricata won't stop:
I could hug you sir! :)
Correct URL in this case is https://urlhaus.abuse.ch/downloads/urlhaus_suricata.tar.gz
Yes, I was about to post that I've never seen rules lists like that without some kind of zip or tar archive being available as well.
Oh, and after examining some of those rules, I see that they are doing content matching on the URI (so the only unencrypted part of https traffic). So not looking at the actual content as I originally said, but looking at the URI instead of just the IP. So these rules would not work in a typical alias.
-
Ok it took a few days but it stopped again after enabling updates.
GUI currently states stopped.
Process is actually running:
root 59462 110.9 21.3 1939960 1774988 - SNs Thu00 590:16.49 /usr/local/bin/suricata -i vtnet0.100 -D -c /usr/local/etc/suricata/suricata_24829_vtnet0.100/suricata.yaml --pidfile /var/run/suricata_vtnet0.10024829.pid
The PID file is not there:
[2.5.2-RELEASE][admin@thuis]/root: ls -al /var/run |grep suri
-rw-r--r-- 1 root wheel 6 Feb 11 00:03 suricata_vtnet0.10133180.pid
-rw-r--r-- 1 root wheel 6 Feb 11 00:03 suricata_vtnet0.20053803.pid
I have now enabled "Live Rule Swap on Update". We'll see how that goes.
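The mismatch shown here (process running, PID file missing) can also be detected mechanically: take the path that follows --pidfile on each Suricata command line and test whether that file exists. A minimal sketch; missing_pidfiles is a hypothetical helper name, and it relies on the --pidfile argument being visible in the ps output as it is above.

```sh
#!/bin/sh
# Hypothetical helper: read ps output on stdin, extract the path that
# follows --pidfile on each line, and report paths that do not exist.
missing_pidfiles() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "--pidfile") print $(i + 1)
    }' | while IFS= read -r path; do
        [ -e "$path" ] || echo "MISSING $path"
    done
}
```

It could be run as ps -axww | grep '[s]uricata' | missing_pidfiles, where the ww is my addition to keep ps from truncating the long command lines. For the listing above it would be expected to report /var/run/suricata_vtnet0.10024829.pid.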
-
Will you please post the entire contents of this file: /usr/local/etc/rc.d/suricata.sh? I want to see how the interfaces are named in there, because I see you are running VLANs.
And just to clarify, is blocking currently disabled?
But if enabled, which type are you using: Legacy Mode or Inline IPS Mode?
-
I have 2 pretty much identically configured machines. One is bare metal (Xeon E-2236, 16GB) and does not experience any of these issues. The one we have been talking about is a VM on a Proxmox node (i5-8259U, 16GB; 8GB for pfSense).
Because everything was stable, I re-enabled blocking (Legacy mode) and it was still stable. Now, after enabling the updates, the problem came back within a day (1d update interval).
#!/bin/sh
########
# This file was automatically generated
# by the pfSense service handler.
######## Start of main suricata.sh

rc_start() {

    ### Make sure libraries path cache is up2date
    /etc/rc.d/ldconfig start

    ### Lock out other start signals until we are done
    /usr/bin/touch /var/run/suricata_pkg_starting.lck

    ## Start suricata on SECUREWIFI (vtnet0.101) ##
    if [ ! -f /var/run/suricata_vtnet0.10133180.pid ]; then
        pid=`/bin/pgrep -fn "suricata -i vtnet0.101 -D -c /usr/local/etc/suricata/suricata_33180_vtnet0.101/suricata.yaml "`
    else
        pid=`/bin/pgrep -F /var/run/suricata_vtnet0.10133180.pid`
    fi

    if [ -z $pid ]; then
        /bin/cp /dev/null /var/log/suricata/suricata_vtnet0.10133180/suricata.log
        /usr/bin/logger -p daemon.info -i -t SuricataStartup "Suricata START for SECUREWIFI(33180_vtnet0.101)..."
        /usr/local/bin/suricata -i vtnet0.101 -D -c /usr/local/etc/suricata/suricata_33180_vtnet0.101/suricata.yaml --pidfile /var/run/suricata_vtnet0.10133180.pid > /dev/null 2>&1
    fi

    sleep 1

    ## Start suricata on UNSECUREWIFI (vtnet0.200) ##
    if [ ! -f /var/run/suricata_vtnet0.20053803.pid ]; then
        pid=`/bin/pgrep -fn "suricata -i vtnet0.200 -D -c /usr/local/etc/suricata/suricata_53803_vtnet0.200/suricata.yaml "`
    else
        pid=`/bin/pgrep -F /var/run/suricata_vtnet0.20053803.pid`
    fi

    if [ -z $pid ]; then
        /bin/cp /dev/null /var/log/suricata/suricata_vtnet0.20053803/suricata.log
        /usr/bin/logger -p daemon.info -i -t SuricataStartup "Suricata START for UNSECUREWIFI(53803_vtnet0.200)..."
        /usr/local/bin/suricata -i vtnet0.200 -D -c /usr/local/etc/suricata/suricata_53803_vtnet0.200/suricata.yaml --pidfile /var/run/suricata_vtnet0.20053803.pid > /dev/null 2>&1
    fi

    sleep 1

    ## Start suricata on WIRED (vtnet0.100) ##
    if [ ! -f /var/run/suricata_vtnet0.10024829.pid ]; then
        pid=`/bin/pgrep -fn "suricata -i vtnet0.100 -D -c /usr/local/etc/suricata/suricata_24829_vtnet0.100/suricata.yaml "`
    else
        pid=`/bin/pgrep -F /var/run/suricata_vtnet0.10024829.pid`
    fi

    if [ -z $pid ]; then
        /bin/cp /dev/null /var/log/suricata/suricata_vtnet0.10024829/suricata.log
        /usr/bin/logger -p daemon.info -i -t SuricataStartup "Suricata START for WIRED(24829_vtnet0.100)..."
        /usr/local/bin/suricata -i vtnet0.100 -D -c /usr/local/etc/suricata/suricata_24829_vtnet0.100/suricata.yaml --pidfile /var/run/suricata_vtnet0.10024829.pid > /dev/null 2>&1
    fi

    sleep 1

    ### Remove the lock since we have started all interfaces
    if [ -f /var/run/suricata_pkg_starting.lck ]; then
        /bin/rm /var/run/suricata_pkg_starting.lck
    fi
}

rc_stop() {

    if [ -f /var/run/suricata_vtnet0.10133180.pid ]; then
        pid=`/bin/pgrep -F /var/run/suricata_vtnet0.10133180.pid`
        /usr/bin/logger -p daemon.info -i -t SuricataStartup "Suricata STOP for SECUREWIFI(33180_vtnet0.101)..."
        /bin/pkill -TERM -F /var/run/suricata_vtnet0.10133180.pid
        time=0 timeout=30
        while /bin/kill -TERM $pid 2>/dev/null; do
            sleep 1
            time=$((time+1))
            if [ $time -gt $timeout ]; then
                break
            fi
        done
        if [ -f /var/run/suricata_vtnet0.10133180.pid ]; then
            /bin/rm /var/run/suricata_vtnet0.10133180.pid
        fi
    else
        pid=`/bin/pgrep -fn "suricata -i vtnet0.101 -D -c /usr/local/etc/suricata/suricata_33180_vtnet0.101/suricata.yaml "`
        if [ ! -z $pid ]; then
            /usr/bin/logger -p daemon.info -i -t SuricataStartup "Suricata STOP for SECUREWIFI(33180_vtnet0.101)..."
            /bin/pkill -TERM -fn "suricata -i vtnet0.101 "
            time=0 timeout=30
            while /bin/kill -TERM $pid 2>/dev/null; do
                sleep 1
                time=$((time+1))
                if [ $time -gt $timeout ]; then
                    break
                fi
            done
        fi
    fi

    sleep 1

    if [ -f /var/run/suricata_vtnet0.20053803.pid ]; then
        pid=`/bin/pgrep -F /var/run/suricata_vtnet0.20053803.pid`
        /usr/bin/logger -p daemon.info -i -t SuricataStartup "Suricata STOP for UNSECUREWIFI(53803_vtnet0.200)..."
        /bin/pkill -TERM -F /var/run/suricata_vtnet0.20053803.pid
        time=0 timeout=30
        while /bin/kill -TERM $pid 2>/dev/null; do
            sleep 1
            time=$((time+1))
            if [ $time -gt $timeout ]; then
                break
            fi
        done
        if [ -f /var/run/suricata_vtnet0.20053803.pid ]; then
            /bin/rm /var/run/suricata_vtnet0.20053803.pid
        fi
    else
        pid=`/bin/pgrep -fn "suricata -i vtnet0.200 -D -c /usr/local/etc/suricata/suricata_53803_vtnet0.200/suricata.yaml "`
        if [ ! -z $pid ]; then
            /usr/bin/logger -p daemon.info -i -t SuricataStartup "Suricata STOP for UNSECUREWIFI(53803_vtnet0.200)..."
            /bin/pkill -TERM -fn "suricata -i vtnet0.200 "
            time=0 timeout=30
            while /bin/kill -TERM $pid 2>/dev/null; do
                sleep 1
                time=$((time+1))
                if [ $time -gt $timeout ]; then
                    break
                fi
            done
        fi
    fi

    sleep 1

    if [ -f /var/run/suricata_vtnet0.10024829.pid ]; then
        pid=`/bin/pgrep -F /var/run/suricata_vtnet0.10024829.pid`
        /usr/bin/logger -p daemon.info -i -t SuricataStartup "Suricata STOP for WIRED(24829_vtnet0.100)..."
        /bin/pkill -TERM -F /var/run/suricata_vtnet0.10024829.pid
        time=0 timeout=30
        while /bin/kill -TERM $pid 2>/dev/null; do
            sleep 1
            time=$((time+1))
            if [ $time -gt $timeout ]; then
                break
            fi
        done
        if [ -f /var/run/suricata_vtnet0.10024829.pid ]; then
            /bin/rm /var/run/suricata_vtnet0.10024829.pid
        fi
    else
        pid=`/bin/pgrep -fn "suricata -i vtnet0.100 -D -c /usr/local/etc/suricata/suricata_24829_vtnet0.100/suricata.yaml "`
        if [ ! -z $pid ]; then
            /usr/bin/logger -p daemon.info -i -t SuricataStartup "Suricata STOP for WIRED(24829_vtnet0.100)..."
            /bin/pkill -TERM -fn "suricata -i vtnet0.100 "
            time=0 timeout=30
            while /bin/kill -TERM $pid 2>/dev/null; do
                sleep 1
                time=$((time+1))
                if [ $time -gt $timeout ]; then
                    break
                fi
            done
        fi
    fi

    sleep 1
}

case $1 in
    start)
        if [ ! -f /var/run/suricata_pkg_starting.lck ]; then
            rc_start
        else
            /usr/bin/logger -p daemon.info -i -t SuricataStartup "Ignoring additional START command since Suricata is already starting..."
        fi
        ;;
    stop)
        rc_stop
        ;;
    restart)
        rc_stop
        sleep 5
        rc_start
        ;;
esac
-
@ballistic said in Suricata won't stop:
I have 2 pretty much identically configured machines. One is bare metal (Xeon E-2236, 16GB) and does not experience any of these issues. The one we have been talking about is a VM on a Proxmox node (i5-8259U, 16GB; 8GB for pfSense).
Oh, that certainly changes the possible causes. In my mind this strongly points the finger at something Proxmox related. If you search here on the Netgate forum you will find a fair number of posts related to various issues with running pfSense on Proxmox. Has this just started and otherwise the Proxmox instances have been good, or is the Proxmox install new?
The shell script you posted looks fine and contains what I expected to see. The script works by searching for the running Suricata process (one section in the file for each configured interface) and sending it a TERM signal. It searches two ways. First it looks for the PID file, and if found kills the process using the process ID read from the file in /var/run. If it can't find the PID file, then it searches using pkill with some of the command line arguments passed when that Suricata instance was started, in an attempt to be sure any matching running process is stopped. In either case, it then waits up to 30 seconds in a loop for the signalled Suricata process to stop and remove its PID file from /var/run. After those 30 seconds, the PID file is forcibly removed if it still exists. This is necessary because if that PID file exists, then Suricata will not start up. It will log an error about a "stale PID file".
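Reduced to its PID-file branch, the stop logic described above might look roughly like the following. This is a sketch and not the generated script: stop_by_pidfile is a hypothetical function name, the timeout is a parameter instead of a fixed 30, and the wait loop polls with kill -0 rather than re-sending TERM the way the generated script does.

```sh
#!/bin/sh
# Hypothetical helper: stop the process named in PIDFILE, wait up to
# TIMEOUT seconds for it to exit, then remove the PID file regardless.
# Usage: stop_by_pidfile PIDFILE [TIMEOUT]
stop_by_pidfile() {
    pidfile="$1"
    timeout="${2:-30}"
    [ -f "$pidfile" ] || return 1        # nothing to stop via PID file

    pid=$(cat "$pidfile")
    kill -TERM "$pid" 2>/dev/null        # ask the process to shut down

    waited=0
    while kill -0 "$pid" 2>/dev/null; do # still alive?
        [ "$waited" -ge "$timeout" ] && break
        sleep 1
        waited=$((waited + 1))
    done

    # A leftover PID file would block the next start, so always remove it.
    rm -f "$pidfile"
}
```

The unconditional rm -f at the end corresponds to the forced removal after the wait loop. It does nothing for the case reported in this thread, where the process is alive but its PID file has already disappeared; that case falls to the script's second branch, which matches on the command line instead.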