Suricata restart after failure

bmeeks

@mind12
What kind of hardware is pfSense running on? Is it by chance an SG-3100 Netgate appliance?

If yes, then look in the firewall's system log for any Signal 10 errors. Let me know what you find.

The cause of the "stale PID file" error is a crashing Suricata process. Suricata creates a PID file when it starts up and deletes that same file when the process shuts down gracefully. However, if the running process crashes, it never gets the chance to delete the PID file. The first thing Suricata checks when starting up is whether a PID file already exists for the process. If one does, Suricata will not start and prints the error you are getting.
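To make the mechanism concrete, here is a minimal Python sketch of how a stale PID file can be told apart from a live one, assuming a POSIX system and a PID file that contains only the process ID. This is not Suricata's own startup code: as described above, Suricata simply refuses to start when the file exists. The sketch goes one step further and checks whether the recorded PID is still alive, which is one way to confirm a file is stale before removing it by hand. The function name pid_file_status is hypothetical.

```python
import os


def pid_file_status(pid_path):
    """Classify a PID file as "absent", "running", or "stale".

    "stale" means the file exists but does not name a live process, which is
    what a crashed process leaves behind because it never removed its file.
    """
    try:
        with open(pid_path) as f:
            pid = int(f.read().strip())
    except FileNotFoundError:
        return "absent"
    except ValueError:
        # Empty or garbled contents: nothing usable, treat as a leftover.
        return "stale"
    if pid <= 0:
        # 0 and negative values address process groups, not a single process.
        return "stale"
    try:
        # Signal 0 delivers nothing; it only checks whether the PID exists.
        os.kill(pid, 0)
    except ProcessLookupError:
        return "stale"
    except PermissionError:
        # The process exists, but we are not allowed to signal it.
        return "running"
    return "running"
```

A result of "stale" corresponds to the situation in this thread: the file is still there, but the process that wrote it has died.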

Oh, and DO NOT run Service Watchdog with Suricata or Snort. That service does not understand how Suricata and Snort work internally with regard to rules updates and such. Service Watchdog will misinterpret a simple rules-update restart as a Suricata service failure and attempt to restart Suricata while Suricata is already restarting itself. This will result in multiple instances running on the same physical interface. The Service Watchdog package should never be used with either Snort or Suricata.

mind12

@bmeeks
I'm running the latest version in Inline mode with one interface on ESXi 6.0 as a VM.
I tried to figure out what caused the issue but couldn't find any related logs. I have also increased the Flow/Stream Memory cap values as suggested by your other post. I will gather logs the next time it happens if you give me some instructions.

I will disable the watchdog service, but how should I monitor whether Suricata is running after that?

bmeeks

@mind12 said in Suricata restart after failure:

I will disable the watchdog service but how should I monitor after that whether suricata is running or not?

Unfortunately, manually checking from time to time is the only way. The Service Watchdog package is not suitable for a package such as Suricata, which runs multiple processes when you have it configured on more than one interface. Service Watchdog simply looks for a process called "suricata" and fails to recognize when there is more than one. This also means it can't tell that Suricata is dead on the LAN if it is still running on the WAN. For example, the LAN process might have crashed while the WAN process is still running. When Service Watchdog searches for a running "suricata" process, it sees one (the WAN) and is happy, but in fact half of Suricata is dead (the LAN process).
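The difference between a name-only check and a per-interface check can be sketched in a few lines of Python. This is a simplified model, not Service Watchdog's actual code: the Proc record and its interface field are hypothetical, since in practice you would have to map each Suricata process to its interface yourself, using whatever per-interface PID file or command-line argument your setup provides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proc:
    name: str
    interface: str  # hypothetical: which interface this instance serves


def name_based_check(procs):
    """Model of the check described above: pass if *any* process is named
    "suricata", no matter how many interfaces should have one."""
    return any(p.name == "suricata" for p in procs)


def per_interface_check(procs, configured):
    """Return the configured interfaces that have no running instance."""
    running = {p.interface for p in procs if p.name == "suricata"}
    return sorted(set(configured) - running)


# The LAN instance has crashed; only the WAN instance is left.
procs = [Proc("suricata", "WAN")]
print(name_based_check(procs))                     # True
print(per_interface_check(procs, ["WAN", "LAN"]))  # ['LAN']
```

The name-based check reports success even though LAN is unprotected, which is exactly the blind spot described above; the per-interface check names the missing instance.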

While in theory things should be better with Suricata on a single interface, in practice it does not work well. The Service Watchdog package just periodically checks for a running process named "suricata". If it does not find one, it executes the restart shell script for the process or package. So during a rules update, when Suricata restarts itself, Service Watchdog can check, find the "suricata" process missing, and immediately run the restart script without realizing Suricata is already restarting itself because of the rules update. The result can be either two copies of Suricata running, or two processes on the same interface stepping on each other and potentially causing a Suricata crash. Since you are running in a VM (which means Intel/x86 code, not ARM), this could be the cause of your crashes. My question about an SG-3100 and the Signal 10 error only applies to ARM-based hardware.

mind12

So I have no options to resolve the crashes because this is a VM?
I hope Watchdog caused them and the problem will disappear.

bmeeks

@mind12
No, not "no options", but it won't be the problem I am familiar with, which is caused by the clang/llvm compiler used to produce ARM binaries on FreeBSD.

Your crashes could be caused by Service Watchdog or it could be a particular rule that you have enabled. I would put the chance of it being a rule rather low, though. There are many Suricata package users on pfSense. Discounting the known issue with ARM-based hardware, I see very, very few complaints about Suricata just crashing.

Let's see if removing Suricata from Service Watchdog helps.

mind12

Let's see. I will post the logs here if it crashes again. Thank you.

mind12

@bmeeks
It happened again. I found the following logs in the system logs:

Feb 26 15:20:10	kernel		pid 57464 (suricata), uid 0: exited on signal 11 (core dumped)
Feb 26 13:21:38	kernel		698.717626 [1071] netmap_grab_packets bad pkt at 150 len 4577

Any idea what is causing this?

bmeeks

@mind12 said in Suricata restart after failure:

@bmeeks
It happened again. I found the following logs in the system logs:
Feb 26 15:20:10	kernel		pid 57464 (suricata), uid 0: exited on signal 11 (core dumped)
Feb 26 13:21:38	kernel		698.717626 [1071] netmap_grab_packets bad pkt at 150 len 4577
Any idea what is causing this?

Yes, the "netmap_grab_packets" message tells me you are using Inline IPS Mode, which uses Netmap. If you also have Intel NICs, there are some settings in this thread you need to try. But ultimately you may be better off just switching to Legacy Blocking Mode. Netmap, Suricata and FreeBSD still do not always live together peacefully. There are some Netmap changes coming to the Suricata binary in the near future. Hopefully they will make Netmap behave better within Suricata.

mind12

Yep, I use Inline mode with an Intel NIC. I will try to adjust the settings. Thank you.

bmeeks

Those Netmap changes I mentioned are being done by the upstream developers (meaning the Suricata developer team). I don't know the timeline for when they are going to merge them into a release.

mind12

Ok, I'm fine with this Netmap setup. I have been using it since you posted that Inline mode was available.