surricata keeps shutting down
-
A package update with a fix for this problem has been posted for review and merge by the pfSense team. I've asked them to update the DEVEL, RELEASE and pfSense+ branches.
The Pull Request is posted here: https://github.com/pfsense/FreeBSD-ports/pull/1085. It may take a day or two for the updated package to be posted.
-
That's a pretty fast turnaround! :) It already shows as available on 21.05 and 2.5.2 this morning.
The "Live Rule Swap on Update" option did work around it as well. Re: defaulting that to off, I also recall earlier posts mentioning RAM usage on lower RAM hardware.
-
@bmeeks Hope I'm not resurrecting a thread that should be dead.
I'm seeing the same issue with Suricata randomly stopping on some interfaces and not being able to restart them.
This is the error I see in the log:
6/8/2021 -- 07:59:27 - <Error> -- [ERRCODE: SC_ERR_INITIALIZATION(45)] - pid file '/var/run/suricata_mvneta211853.pid' exists but appears stale. Make sure Suricata is not running and then remove /var/run/suricata_mvneta211853.pid. Aborting!
This is what I see running the grep command mentioned above:
9533 - S 0:00.01 sh -c ps -ax | grep suricata 2>&1
59933 - R 0:00.00 grep suricata
87965 - Rs 0:28.53 /usr/local/bin/suricata -i mvneta1.3 -D -c /usr/local/etc/suricata/suricata_33184_mvneta1.3/suricata.yaml --pidfile /var/run/suricata_mvneta1.333184.pid
When I run ls /var/run/, though, the "stale" PID files are still there:
suricata_mvneta1.333184.pid
suricata_mvneta114834.pid
suricata_mvneta211853.pid
If I manually delete these I can restart the service on those interfaces, but they ultimately just stop again. I've tried the live update option and that hasn't helped.
For reference I'm running on a Netgate SG-3100 so I would think I have enough horsepower for this as it's only a home system.
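For anyone wanting to check which PID files are actually stale, here is a minimal sketch. It assumes you run it as root from a shell on the firewall; pidfile_status is a hypothetical helper name, not something shipped with Suricata or pfSense. It only reports status and does not delete anything.

```shell
#!/bin/sh
# Classify a PID file as "running", "stale", or "missing".
# pidfile_status is a hypothetical helper, not part of Suricata or pfSense.
pidfile_status() {
    pidfile="$1"
    if [ ! -f "$pidfile" ]; then
        echo "missing"
        return 0
    fi
    pid=$(cat "$pidfile" 2>/dev/null)
    # kill -0 sends no signal; it only tests whether the process can be signalled.
    # Run as root, otherwise a live process owned by another user looks stale.
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        echo "running"
    else
        echo "stale"
    fi
}

# Report on every Suricata PID file under /var/run (path taken from the log above).
for f in /var/run/suricata_*.pid; do
    [ -e "$f" ] || continue
    printf '%s: %s\n' "$f" "$(pidfile_status "$f")"
done
```

A file reported as stale points at a process that no longer exists, which matches the error message above.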
-
Your issue is not the same as the ones posted in this thread. The problem in this thread was a disappearing PID file, not a stale one. You have the "stale" files because Suricata is crashing and not cleaning up after itself. The stale PID files are a symptom, not a cause, of your problem.
Look in the pfSense system log for any Suricata or php-fpm related messages. I'm betting you find some from at least one of those sources.
What version of pfSense+ are you running on your firewall, and what version of the Suricata package?
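If you prefer the shell to the GUI log viewer, something along these lines would pull out the relevant messages. This is a rough sketch: the /var/log/system.log path is an assumption about where your install keeps the system log, and suricata_log_events is just a name made up for illustration.

```shell
#!/bin/sh
# LOGFILE is an assumption; point it at wherever your system log lives.
LOGFILE="${LOGFILE:-/var/log/system.log}"

# Print log lines that mention Suricata or php-fpm (case-insensitive).
# suricata_log_events is a hypothetical helper name.
suricata_log_events() {
    grep -Ei 'suricata|php-fpm' "$1"
}

# Show the 20 most recent matches, if the log is readable.
if [ -r "$LOGFILE" ]; then
    suricata_log_events "$LOGFILE" | tail -n 20
fi
```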
-
@bmeeks Thanks.
Found the issue:
Aug 26 07:18:27 kernel pid 46136 (suricata), jid 0, uid 0, was killed: out of swap space
Aug 26 07:18:27 kernel pid 55608 (suricata), jid 0, uid 0, was killed: out of swap space
Not sure how to correct it. I was having no issues prior to the 21.05.1 upgrade. I'm running Suricata 6.0.0_14.
-
Check top to see what is using up your memory. RAM is limited in the SG-3100, so if you are running a large number of rules, that can be a problem. Other potential problem spots are DNSBL with very large domain blacklists.
Do you have "Live Reload" enabled on the GLOBAL SETTINGS tab for the Suricata rules update? If so, that can result in increased memory usage during rule updates as Suricata keeps two copies of the rules in RAM for a bit as it loads the updated rules alongside the existing older rules. After everything is ready, it dumps the older rules. But for a while, two copies exist, and thus RAM usage goes up.
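As a non-interactive alternative to top, a one-shot listing of the largest processes by resident memory might look like the sketch below. It assumes ps reports rss in kilobytes (the usual unit on FreeBSD); top_rss is a hypothetical helper name, and command names containing spaces will be cut at the first word.

```shell
#!/bin/sh
# Read "RSS_KB PID COMMAND" lines on stdin and print the N largest by RSS, in MB.
# top_rss is a hypothetical helper; N defaults to 10.
top_rss() {
    sort -rn | head -n "${1:-10}" |
        awk '{ printf "%.1f MB  pid %s  %s\n", $1 / 1024, $2, $3 }'
}

# The trailing "=" on each column suppresses the ps header line.
if command -v ps >/dev/null 2>&1; then
    ps -axo rss=,pid=,comm= | top_rss 10
fi
```

Running it once before and once during a rules update should make the temporary jump from the two rule copies visible, if that is what is happening.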
-
@bmeeks Thanks. Kind of figured this was the issue. I'll see what I can clean up and will turn off live update.
-
@bitslammer said in surricata keeps shutting down:
out of swap space
There is an issue in 21.x and 2.5.x with pcscd gradually taking up RAM. There is a patch or you can just stop it after booting.
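To see whether pcscd is the one eating RAM on a given box, a small sketch like this totals its resident memory. rss_for is a hypothetical helper name, and the sketch only measures; it does not stop anything.

```shell
#!/bin/sh
# Sum the RSS (in KB) of all processes whose command name equals $1.
# Expects "RSS_KB COMMAND" lines on stdin; prints 0 if nothing matches.
# rss_for is a hypothetical helper name.
rss_for() {
    awk -v name="$1" '$2 == name { total += $1 } END { print total + 0 }'
}

# Report pcscd's current resident memory in KB.
if command -v ps >/dev/null 2>&1; then
    printf 'pcscd RSS: %s KB\n' "$(ps -axo rss=,comm= | rss_for pcscd)"
fi
```

Checking the number a few times over several days should show whether it keeps growing.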
-
Good catch! I forgot about that bug.
-
@bmeeks said in surricata keeps shutting down:
I forgot about that bug.
I upgraded a bunch of clients this month and went back and stopped it on previous upgrades. Hopefully I have burned it in and won't forget it on one. :) Didn't actually run into a problem, but I think the highest I saw was over 2 GB on our one non-appliance install and it definitely grows over time.
(for lurkers, the intent is to have pcscd not running by default on future versions...would be a non-issue if it didn't have a memory leak)
-
About a day or so after I raised this, the issue was addressed and successfully fixed.
I haven't had an issue since. It all works great.