surricata keeps shutting down
-
@steveits said in surricata keeps shutting down:
@vjizzle said in surricata keeps shutting down:
this morning I checked and the interface where suricata is enabled was stopped again
Same thing on ours. I had checked a couple of times yesterday and at the end of the day it still looked OK.
I almost upgraded our backup (HA) router in our data center to _11 to test it on 21.05...is it OK to run that way if the primary is _10? I would think so but figured I should ask.
Yes, the version mismatch will be okay for now. I will look into the problem. I have a batch of GUI fixes ready to submit for Suricata, but I have been holding off for about a week working on getting the 6.0.3 binary to work with netmap. So far, that is not going well. So I may go ahead and release the GUI fixes and leave the binary on 5.0.6 for now.
-
@darcey said in surricata keeps shutting down:
When suricata is restarted after a rules update, the pid files are deleted but not recreated, despite suricata successfully starting and being invoked with the --pidfile option. Stopping/starting the same interfaces works fine via the suricata->interfaces UI.
Thank you for these details. One of my suspicions was it had something to do with the PID file, because that's how the GUI controls/indicators on the INTERFACES tab query the daemons.
I will look into this today and see what is going on.
-
For users experiencing this issue, I believe the "stopped" condition showing on the INTERFACES tab is incorrect. If the PID file is missing for a running instance, the code on the INTERFACES tab thinks Suricata is not running, even when it actually is.
The pfSense code that populates the SERVICES menu display uses pgrep looking for 'suricata' to determine if the service is running. So it will show the daemon present (if it is actually running).
So until I get a fix posted, don't 100% trust the display on the INTERFACES tab after an automated rules update. Either check under the SERVICES menu in pfSense, or get a CLI prompt on the firewall and run this command:
ps -ax | grep suricata
It will show any running processes.
WARNING: if Suricata is actually running, and then you go to the INTERFACES tab and click "Start", you will launch an identical clone Suricata process and that will cause you trouble!
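For anyone who wants to cross-check from the shell, here is a minimal sketch, not part of the package, that reads process command lines on stdin and reports whether the file named after --pidfile exists. The function name report_pidfiles is made up for illustration, and the parsing assumes the PID file path contains no spaces, as in the output posted in this thread.

```shell
#!/bin/sh
# Hypothetical helper: for each command line on stdin that contains
# "--pidfile <path>", report whether <path> exists on disk.
report_pidfiles() {
    while IFS= read -r line; do
        case "$line" in
            *"--pidfile "*) ;;      # only look at lines that name a PID file
            *) continue ;;
        esac
        pidfile=${line##*"--pidfile "}   # text after the last "--pidfile "
        pidfile=${pidfile%%" "*}         # cut at the next space, if any
        if [ -e "$pidfile" ]; then
            echo "$pidfile present"
        else
            echo "$pidfile MISSING"
        fi
    done
}

# Usage on the firewall (the [s] keeps grep from matching itself):
# ps -axww -o command | grep '[s]uricata' | report_pidfiles
```

A line marked MISSING for a process that ps still lists matches the false "stopped" state on the INTERFACES tab: the daemon is up, only its PID file is gone.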
-
@steveits said in surricata keeps shutting down:
ERROR: Snort GPLv2 Community Rules md5 download failed
This looks to be this issue.
-
@darcey said in surricata keeps shutting down:
pid files are deleted
That's what I see.
2519 - SNs 17:56.44 /usr/local/bin/suricata -i em0 -D -c /usr/local/etc/suricata/suricata_1532_em0/suricata.yaml --pidfile /var/run/suricata_em01532.pid
ls: /var/run/suricata_em01532.pid: No such file or directory
-
I've reproduced the issue in my test system. I did not see it originally because I had the setting for "Live Rule Updates" enabled on the GLOBAL SETTINGS tab. With that setting, Suricata is not physically stopped and restarted. Instead, it is sent a SIGUSR2, which tells it to reload the rules and configuration.
You can temporarily work around this issue by enabling that setting. I'm working to identify exactly where the failure is, and then will get a fix posted for the pfSense team to review and make available.
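Roughly what a live reload amounts to at the process level, as a sketch only: Suricata reloads its rules when it receives SIGUSR2, so no stop/start cycle (and no PID file churn) is involved. The helper name reload_rules is hypothetical and this is not the package's actual code; the GUI setting does the equivalent for you.

```shell
#!/bin/sh
# Hypothetical helper: send SIGUSR2 to the process recorded in a PID file.
# Suricata treats SIGUSR2 as a request to reload its rules without exiting.
reload_rules() {
    pidfile="$1"
    if [ ! -r "$pidfile" ]; then
        echo "cannot read PID file: $pidfile" >&2
        return 1
    fi
    kill -USR2 "$(cat "$pidfile")"
}

# Example, using the PID file path from the ps output earlier in the thread:
# reload_rules /var/run/suricata_em01532.pid
```

Only point this at a Suricata PID file. A process without a SIGUSR2 handler is terminated by that signal.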
-
@bmeeks said in surricata keeps shutting down:
You can temporarily work around this issue by enabling that setting. I'm working to identify exactly where the failure is, and then will get a fix posted for the pfSense team to review and make available.
I think I just went with the default at the time I installed your package: live updates disabled.
What issues might typically be seen with this enabled?
BTW after reading your earlier posts in this thread, I tried enabling JA3 fingerprinting. I'd overlooked that option.
Once enabled, I obviously no longer get the warnings. However I also found the suricata instance, for the interface on which I enabled JA3, restarted and was recognised as such in the UI. i.e. a pid file was created. Whereas it wasn't for the interface with JA3 still disabled.
Hope that makes sense. Many thanks.
-
@darcey said in surricata keeps shutting down:
I think I just went with the default at the time I installed your package: live updates disabled.
Is there any reason I might not want to enable live updates?
BTW after reading your earlier posts in this thread, I tried enabling JA3 fingerprinting. I'd overlooked that option.
Once enabled, I obviously no longer get the warnings. However I also found the suricata instance, for the interface on which I enabled JA3, restarted and was recognised as such in the UI. i.e. a pid file was created. Whereas it wasn't for the interface with JA3 still disabled.
Hope that makes sense. Many thanks.
The "live rule update" option works pretty well for most folks. There were some edge cases in the past where it caused problems, so the default was left at "off". I don't even remember now what those edge cases were. I think it had to do with consuming too much RAM, because with that option enabled, Suricata has to keep two complete copies of the enabled rules in RAM for a short period.
The JA3 issue has nothing to do with the PID file problem. I've found the cause of that. It's a race condition of sorts: a call to suricata_stop() followed immediately by a call to suricata_start(). Adding a simple 10-second time delay between those two calls prevents the problem, but I'm looking for a slightly more elegant solution for the permanent fix. When you saved the JA3 change, that bumped Suricata and the PID file was created.
-
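One way to picture an alternative to a fixed delay between stop and start, purely as a sketch and not the fix that was actually submitted: poll until the old process has really exited before starting the new one. The real code is PHP inside the package; wait_for_exit below is a hypothetical shell helper used only to illustrate the idea.

```shell
#!/bin/sh
# Hypothetical helper: wait until process $1 has exited, polling once per
# second for up to $2 seconds (default 10).
# Returns 0 once the process is gone, 1 if it is still alive at the timeout.
wait_for_exit() {
    pid="$1"
    remaining="${2:-10}"
    # kill -0 sends no signal; it only tests whether the process exists.
    while kill -0 "$pid" 2>/dev/null; do
        [ "$remaining" -gt 0 ] || return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}

# Example: only start a new instance once the old PID is really gone.
# wait_for_exit "$old_pid" 10 && echo "safe to start"
```

Compared with an unconditional sleep, this returns as soon as the old process is gone and reports a timeout instead of silently racing ahead.
-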
@bmeeks
When I tried enabling the JA3 fingerprint option, I believe I forced a rule update via the web UI. But you now have me doubting myself.
Great work and many thanks for taking the time to explain. I'm going to have a go with the live rules swap. Cheers.
-
A package update with a fix for this problem has been posted for review and merge by the pfSense team. I've asked them to update the DEVEL, RELEASE and pfSense+ branches.
The Pull Request is posted here: https://github.com/pfsense/FreeBSD-ports/pull/1085. It may take a day or two for the updated package to be posted.
-
That's a pretty fast turnaround! :) It already shows as available on 21.05 and 2.5.2 this morning.
The "Live Rule Swap on Update" option did work around it as well. Re: defaulting that to off, I also recall earlier posts mentioning RAM usage on lower-RAM hardware.
-
@bmeeks Hope I'm not resurrecting a thread that should be dead.
I'm seeing the same issue with Suricata randomly stopping on some interfaces and not being able to restart them.
This is the error I see in the log: 6/8/2021 -- 07:59:27 - <Error> -- [ERRCODE: SC_ERR_INITIALIZATION(45)] - pid file '/var/run/suricata_mvneta211853.pid' exists but appears stale. Make sure Suricata is not running and then remove /var/run/suricata_mvneta211853.pid. Aborting!
This is what I see when running the grep command mentioned above:
9533 - S 0:00.01 sh -c ps -ax | grep suricata 2>&1
59933 - R 0:00.00 grep suricata
87965 - Rs 0:28.53 /usr/local/bin/suricata -i mvneta1.3 -D -c /usr/local/etc/suricata/suricata_33184_mvneta1.3/suricata.yaml --pidfile /var/run/suricata_mvneta1.333184.pid
When I run ls /var/run/, though, the "stale" PID files show up:
suricata_mvneta1.333184.pid
suricata_mvneta114834.pid
suricata_mvneta211853.pid
If I manually delete these I can restart the service on those interfaces, but they ultimately just stop again. I've tried the live update option and that hasn't helped.
For reference I'm running on a Netgate SG-3100 so I would think I have enough horsepower for this as it's only a home system.
-
@bitslammer said in surricata keeps shutting down:
I'm seeing the same issue with Suricata randomly stopping on some interfaces and not being able to restart them.
This is the error I see in the log: 6/8/2021 -- 07:59:27 - <Error> -- [ERRCODE: SC_ERR_INITIALIZATION(45)] - pid file '/var/run/suricata_mvneta211853.pid' exists but appears stale. Make sure Suricata is not running and then remove /var/run/suricata_mvneta211853.pid. Aborting!
Your issue is not the same as the ones posted in this thread. The problem in this thread was a disappearing PID file, not a stale one. You have the "stale" files because Suricata is crashing and not cleaning up after itself. The stale PID files are a symptom, not a cause, of your problem.
Look in the pfSense system log for any Suricata or php-fpm related messages. I'm betting you'll find some from at least one of those sources.
What version of pfSense+ are you running on your firewall, and what version of the Suricata package?
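To make the missing-versus-stale distinction concrete, a small sketch (the function name pidfile_state is invented here): a PID file is "stale" when it exists but the PID inside it no longer belongs to a live process. kill -0 sends no signal; it only tests whether the process exists and can be signalled, so this assumes you run it as root on the firewall.

```shell
#!/bin/sh
# Hypothetical helper: classify a PID file as missing, live, or stale.
pidfile_state() {
    pidfile="$1"
    if [ ! -f "$pidfile" ]; then
        echo "missing"          # the symptom from earlier in this thread
        return 0
    fi
    pid=$(cat "$pidfile")
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        echo "live"             # some process with that PID exists
    else
        echo "stale"            # file left behind by a process that died
    fi
}

# Example, using a path from the error message above:
# pidfile_state /var/run/suricata_mvneta211853.pid
```

"live" only means some process currently has that PID. After a crash the number could in principle be reused by an unrelated process, so treat the result as a hint, not proof.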
-
@bmeeks Thanks.
Found the issue:
Aug 26 07:18:27 kernel pid 46136 (suricata), jid 0, uid 0, was killed: out of swap space
Aug 26 07:18:27 kernel pid 55608 (suricata), jid 0, uid 0, was killed: out of swap space
Not sure how to correct it. I was having no issues prior to the 21.05.1 upgrade. I'm running Suricata 6.0.0_14.
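A sketch of the log check that turned these lines up, for anyone else chasing random Suricata stops. It assumes the system log is a plain-text file at /var/log/system.log (an assumption; pass a different path if yours differs), and scan_system_log is a made-up name.

```shell
#!/bin/sh
# Hypothetical helper: print system log lines that mention Suricata,
# php-fpm, or the kernel's out-of-swap kill message.
# The default path is an assumption; pass the log file as $1 to override it.
scan_system_log() {
    logfile="${1:-/var/log/system.log}"
    grep -E 'suricata|php-fpm|out of swap space' "$logfile"
}

# Example:
# scan_system_log | tail -n 20
```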
-
@bitslammer said in surricata keeps shutting down:
@bmeeks Thanks.
Found the issue:
Aug 26 07:18:27 kernel pid 46136 (suricata), jid 0, uid 0, was killed: out of swap space
Aug 26 07:18:27 kernel pid 55608 (suricata), jid 0, uid 0, was killed: out of swap space
Not sure how to correct it. I was having no issues prior to the 21.05.1 upgrade. I'm running Suricata 6.0.0_14.
Check top to see what is using up your memory. RAM is limited in the SG-3100, so if you are running a large number of rules, that can be a problem. Other potential problem spots are DNSBL with very large domain blacklists.
Do you have "Live Reload" enabled on the GLOBAL SETTINGS tab for the Suricata rules update? If so, that can result in increased memory usage during rule updates as Suricata keeps two copies of the rules in RAM for a bit as it loads the updated rules alongside the existing older rules. After everything is ready, it dumps the older rules. But for a while, two copies exist, and thus RAM usage goes up.
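If the interactive top display is awkward over SSH, a non-interactive sketch that ranks processes by resident memory can help. rank_by_rss is a hypothetical helper; it expects rows of PID, RSS and command on stdin, such as the output of ps -axo pid,rss,command, where RSS is reported in kilobytes.

```shell
#!/bin/sh
# Hypothetical helper: read "PID RSS COMMAND" rows on stdin and print the
# N rows with the largest RSS (default 10), biggest first.
rank_by_rss() {
    n="${1:-10}"
    # Keep only rows whose second column is numeric (drops the header),
    # then sort on that column, numerically and in descending order.
    awk '$2 ~ /^[0-9]+$/' | sort -k2,2nr | head -n "$n"
}

# Example on the firewall:
# ps -axo pid,rss,command | rank_by_rss 10
```

Running it once right after boot and again a day later makes slow growth in a single process easy to spot.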
-
@bmeeks Thanks. Kind of figured this was the issue. I'll see what I can clean up and will turn off live update.
-
@bitslammer said in surricata keeps shutting down:
out of swap space
There is an issue in 21.x and 2.5.x with pcscd gradually taking up RAM. There is a patch or you can just stop it after booting.
-
@steveits said in surricata keeps shutting down:
@bitslammer said in surricata keeps shutting down:
out of swap space
There is an issue in 21.x and 2.5.x with pcscd gradually taking up RAM. There is a patch or you can just stop it after booting.
Good catch! I forgot about that bug.
-
@bmeeks said in surricata keeps shutting down:
I forgot about that bug.
I upgraded a bunch of clients this month and went back and stopped it on previous upgrades. Hopefully I have burned it in and won't forget it on one. :) Didn't actually run into a problem, but I think the highest I saw was over 2 GB on our one non-appliance install and it definitely grows over time.
(for lurkers, the intent is to have pcscd not running by default on future versions...would be a non-issue if it didn't have a memory leak)
-
About a day or so after I raised this, the issue was addressed and successfully fixed.
I haven't had an issue since. It all works great.