SNORT - Snort 2.9.2.3 pkg v. 2.2.1 processes do not quit via update scripts or GUI

sprout

2.2.2 fixes the issue for me :)

eri--

@ermal:

Put a comment on the pull request but those were not the right fixes.
Especially the != 0 is wrong.

Then why did this fix my snort? Changing to != made my snort kill existing instances when the service is restarted. The "0" simply means the process exited without error. However, that if statement is looking for existing instances in the PID file. When pgrep finds a pid number in the pid file, it returns that number rather than 0. Therefore, it SHOULD be != or the script will not work. I suggest you try it; you'll see I'm right. Just issue that pgrep command as part of an echo statement at a command prompt and see what it returns when the PID file has a valid instance and when it does not. If an instance exists, you'll see the pid number, which is when you should run pkill to stop it, since we are looking to stop existing instances. If an instance does not exist, you will get 0. In that case there are no running instances, so don't bother trying to stop them with pkill. Honestly, I did actually test this out before suggesting the fix.

Usually I do not get into this kind of debate, but from pgrep(1):

EXIT STATUS
The pgrep and pkill utilities return one of the following values upon
exit:

0 One or more processes were matched.

1 No processes were matched.

2 Invalid options were specified on the command line.

3 An internal error occurred.

And from testing this on the shell:

Finding the pid:

pfsense-dev# pgrep -x cron
1519
pfsense-dev# echo $?
0
pfsense-dev#

Not finding

pfsense-dev# pgrep -x init
pfsense-dev# echo $?
1
pfsense-dev#
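
The man page excerpt and the transcript above suggest branching on pgrep's exit status rather than on the PID it prints. A minimal sketch of that idea follows; the function name is_running and the pid file path are illustrative assumptions, not taken from snort.sh:

```shell
#!/bin/sh
# is_running PIDFILE
# Succeeds (exit 0) only if PIDFILE exists and pgrep matches a live process
# listed in it. The decision uses pgrep's exit status, never its printed PID.
is_running() {
    [ -e "$1" ] || return 1
    pgrep -F "$1" >/dev/null 2>&1
}

# Hypothetical pid file path, for illustration only.
pidfile="/var/run/snort_example.pid"

if is_running "$pidfile"; then
    echo "running"
else
    echo "not running"
fi
```

Checking for the file first avoids the "Cannot open pidfile" message, and discarding pgrep's output keeps the printed PID out of the comparison entirely.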

breusshe

This is a modified version of my last post, those that read the earlier version, my apologies. I'm having a bad day and failed to check myself before making an ass of myself. I hope my attitude didn't put anyone off. That aside, here is the modified version:

The changes I made to snort.inc and, as a consequence, snort.sh fixed my snort. I now have only one copy of any given instance running on my system, whether I update the rules, manually stop/start an instance, or even reboot the server.

Without these changes, snort does not restart correctly, even with the most recent version you released (v.2.2.2).

You are not looking at the behavior of the test command in relation to its output and pgrep's output. In truth, my solution is a bit wrong also. Look at this:


echo test "`/bin/pgrep -nF /var/run/snort_re027549.pid`" != "0"
test 39615 != 0

This is the command you are actually running. See how the pid is the first value being tested? With how you have the script set up, you are checking to see if "39615 = 0", which it will never be, so test fails and the code is not run. With my original solution (used in the above example), the test is to see if "39615 != 0", which it always is, so the code is run. In other words, the exit code is not being checked; the printed output is. If a pid exists, that number is printed. Now, let's look at when the pid file doesn't exist:


echo test "`/bin/pgrep -nF /var/run/snort_re0275491.pid`" != "0"
pgrep: Cannot open pidfile `/var/run/snort_re0275491.pid': No such file or directory
test  != 0

So, now we can't find the file and the test is invalid (since the first parameter doesn't even exist, which causes test to throw an "unexpected operator" error). So the code is not run; which is good since the pid file doesn't exist, so neither do any processes. So, the outcome is correct, but the code is wrong.
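
One caveat, offered as a minimal sketch: whether test errors out here depends on whether the command substitution is quoted in the generated script, which this thread does not show. Assuming it is quoted as in the echo above, the empty string is a valid operand and the != comparison simply succeeds. The variable names out and quoted are illustrative:

```shell
#!/bin/sh
# Simulate what the backticks yield when pgrep prints nothing
# (for example, because the pid file is missing).
out=""

# Quoted operand: the empty string is a valid operand, and "" != "0" is true.
if [ "$out" != "0" ]; then
    quoted="true"
else
    quoted="false"
fi
echo "quoted comparison: $quoted"
```

Under that assumption the stop branch would run even though no process exists, which is another reason to branch on pgrep's exit status instead of comparing its output.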

Okay, so we are both wrong. How do we fix it? The perfect solution would first check for an existing pid file, then perform the check to make sure the value returned is greater than 3 (since this is the highest exit code pgrep uses, thus we can assume a valid pid was found). So:


if [ -e /var/run/snort_re027549.pid ]; then
    if [ `/bin/pgrep -nF /var/run/snort_re027549.pid` -gt 3 ]; then

Now we know the file exists, so the test can actually do something accurate and useful. Also, whatever the output is, it exceeds the highest exit code (0-3) which could be used as a return value, so we know we have a pid number. Since we are trying to stop an existing process before starting new ones, this is good, so run the code.

As for letting rc_start() handle the restart code all by itself, you would need to use the same code as I've explained above and, rather than putting the start command inside an else{} statement, simply call rc_stop() inside of rc_start(). This would best handle the issue without code redundancy. I know it means losing the -HUP from the pkill command, but it isn't necessary, and I've seen -HUP fail to stop a process if it is currently busy, which Snort tends to be. Also, I know this would mean barnyard2 is killed regardless of the situation. This could be settled by adding a boolean value: if true, kill barnyard2; if false, don't kill barnyard2. It might also be necessary to add, to the block of code that actually starts snort and barnyard2, a check to make sure that the snort processes stopped by rc_stop() have actually finished closing out before starting the new ones. This could be a loop: if pgrep finds processes, sleep one second, then check again; otherwise, start snort and barnyard2, then break out.

I'm going to put the more relevant parts of these suggestions into a corrected snort.inc file and post another pull request later today. My corrections will simply remove the if-statement part of the if-else and leave the else-statement part intact in rc_start(); i.e., I'm going to disable the if-else statement while leaving the else part intact. I'm also going to remove the line that deletes the pid file (since rc_stop takes care of this) and add at the very top of rc_start() a call to rc_stop.
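
The wait loop described above could look roughly like the following. This is a sketch under stated assumptions, not code from snort.inc: the function name wait_for_exit and the 30-second default timeout are hypothetical, and the timeout is an addition so the loop cannot spin forever.

```shell
#!/bin/sh
# wait_for_exit PIDFILE [MAX_SECONDS]
# Polls once per second until no process listed in PIDFILE is running.
# Returns 0 once the process is gone (or the pid file does not exist),
# 1 if a process is still alive after MAX_SECONDS.
wait_for_exit() {
    pidfile="$1"
    max="${2:-30}"      # assumed default timeout, not from the original script
    waited=0
    while [ -e "$pidfile" ] && pgrep -F "$pidfile" >/dev/null 2>&1; do
        [ "$waited" -ge "$max" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

The start commands for snort and barnyard2 would then run only after a call such as wait_for_exit /var/run/snort_re027549.pid 30 returns successfully.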

In rc_stop(), I'm going to add the double if-statement listed in this post to check for the pid file, then run a check for instances in it. I'm also going to do a check for the pid file on the rm statement that happens right after the pkill commands are run.

Ermal, I ask that you test this out to see whether it actually fixes the problem. The changes I'm proposing are active on my pfsense server right now and they work. I no longer have issues with restarting or starting Snort.

eri--

Install stock 2.2.2 and give me syslog errors on what is happening!
Or run snort.sh from ssh session manually specifying restart and post it here.

For me everything works, and your suggestion is a logic bug.

judex

Hey Breusshe!

Thx for your investigations. Did you also add the rc_stop statement in snort.inc before line 937? This generates a working snort.sh for me. Logic bug or not, at least it works. Stock installation gives no errors in the log. You just get two instances.

Greets, Judex

breusshe

Here is restart:


/usr/local/etc/rc.d(60): ./snort.sh restart                
inside rc_start()                                                                                             
inside rc_stop()                                                                                              
Spawning daemon child...                                                                                      
My daemon child 45455 lives...                                                                                
Daemon parent exiting (0)                                                                                     
Spawning daemon child...                                                                                      
My daemon child 45983 lives...                                                                                
Daemon parent exiting (0)

The "inside rc_start()" and "inside rc_stop()" lines are echo tags I put into the snort.sh file so I could see when "pid file does not exist" errors were occurring. My current iteration of snort.sh no longer throws such errors.

and here is pgrep showing how many are now running:


/usr/local/etc/rc.d(61): pgrep snort
45983
45455

breusshe

@Judex:

Yeah, I put that in with my first round of corrections. But, after seeing Ermal's post and checking things out, I've revised my suggested fix. See the long post I put up a few minutes ago.

judex

@ermal:

Install stock 2.2.2 and give me syslog errors on what is happening!
Or run snort.sh from ssh session manually specifying restart and post it here.

I ran "snort.sh restart" with the stock snort.sh. One pid before, two pids after:


[2.0.1-RELEASE][root@gatekeeper.me.local]/root(27): pgrep snort
57551
[2.0.1-RELEASE][root@gatekeeper.me.local]/root(28): /usr/local/etc/rc.d/snort.sh restart
Spawning daemon child...
My daemon child 11345 lives...
Daemon parent exiting (0)
[2.0.1-RELEASE][root@gatekeeper.me.local]/root(29): pgrep snort
11345
57551

Sorry for the redundant information. Breusshe was faster…

breusshe

No, that's great, Judex. It shows stock isn't working, but mine is. Now the problem can be fixed.

dwood

Uninstalling v. 2.2.1 and installing 2.2.2 worked for me (just now). Snort can be stopped now, and pgrep snort shows only 1 instance per interface when started. The issue of IPs being blocked while the interface shows as stopped is also addressed.

Thx :-)

SectorNine50

I've also found that I had two instances of snort running after the update.

I'm not sure if it happened immediately or not, but I noticed today that my RAM and SWAP usage was way up, so I checked top and noticed both the instances.

I also noticed that pressing the stop button in the services window did not stop both snort services, however, pressing stop on the specific interface did successfully stop one of them.

Not sure if this helps or not.

breusshe

Okay, for the third time, I've posted fixes for the snort restart problem to the repository. Here is the pull request:

https://github.com/bsdperimeter/pfsense-packages/pull/278

Hopefully someone will actually try these changes out before stating they will not fix the problem and closing the pull request.

Here's to hope!

eri--

OK, fixed. I found why your suggestion of != 0 seemed to work for you.
Check the new commit I made that fixes the bug in the rc script.

breusshe

@ermal:

OK, fixed. I found why your suggestion of != 0 seemed to work for you.
Check the new commit I made that fixes the bug in the rc script.

Thanks, Ermal. I re-installed snort and re-saved an instance to update the snort.sh file. Your solution works, with two minor hitches. See the below output from top:

1st Trial:


Mem: 950M Active, 107M Inact, 336M Wired, 4K Cache, 248M Buf, 2524M Free
  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
59839 root          2  44    0   928M   613M nanslp  0   0:00  0.00% snort
23711 root          2  44    0   616M   301M nanslp  0   0:00  0.00% snort

Mem: 979M Active, 305M Inact, 339M Wired, 4K Cache, 248M Buf, 2294M Free
  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
59839 root          2  44    0  1056M   731M nanslp  0   1:27  0.00% snort
23711 root          2  44    0   738M   411M nanslp  1   0:29  0.00% snort

Mem: 409M Active, 280M Inact, 334M Wired, 4K Cache, 248M Buf, 2893M Free
  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND

23711 root          2  44    0   876M   543M nanslp  0   0:57  0.00% snort

2nd Trial:


Mem: 945M Active, 112M Inact, 336M Wired, 4K Cache, 248M Buf, 2523M Free
  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
29717 root          2  44    0   928M   614M nanslp  2   0:04 20.65% snort
31324 root          2  45    0   616M   300M nanslp  2   0:02 70.61% snort

Mem: 975M Active, 312M Inact, 339M Wired, 4K Cache, 248M Buf, 2290M Free        
  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
29717 root          2  44    0  1050M   726M nanslp  2   1:54 10.89% snort
31324 root          2  44    0   746M   420M nanslp  0   0:55 10.25% snort

Mem: 413M Active, 276M Inact, 334M Wired, 4K Cache, 248M Buf, 2894M Free
  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND

31324 root          2  44    0   874M   542M nanslp  0   1:44  3.96% snort

3rd trial:


Mem: 948M Active, 109M Inact, 336M Wired, 4K Cache, 248M Buf, 2523M Free        
  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
44906 root          2  44    0   616M   300M nanslp  0   0:05 15.58% snort
43256 root          2  44    0   928M   614M nanslp  1   0:05  9.18% snort

Mem: 985M Active, 317M Inact, 339M Wired, 4K Cache, 248M Buf, 2276M Free        
  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
43256 root          2  44    0  1058M   734M nanslp  2   1:48  5.76% snort
44906 root          2  44    0   752M   426M nanslp  0   0:52  5.57% snort

Mem: 414M Active, 294M Inact, 335M Wired, 4K Cache, 248M Buf, 2874M Free        
  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND

44906 root          2  44    0   894M   561M nanslp  0   1:39  0.68% snort

The above are three trial runs where I just kept issuing the command snort.sh restart; each trial has only three restarts. I then had to issue snort.sh stop because, as you can see in the 3rd run of each trial, one of my snort instances does not restart. That is when I began the next trial.

As you can see, the snort instances use more memory with each successive run of the snort.sh script. That makes sense to me since -HUP doesn't kill the process; it only refreshes it, mapping changes to new segments of memory. The problem is that many folks use systems with limited resources. As each update to the snort rules happens, a little more memory is lost to HUP. Eventually, all the resources are used up. That means admins need to watch Snort to make sure it isn't hitting that limit. However, this might not be an issue since, as I mentioned, one of my snort instances will not survive the 3rd restart. It is a consistent behavior: on the 3rd restart of snort, an instance of snort does not recover. I then removed -HUP from snort.sh and changed the if-else statement it was in into just an if-statement. The contents of the else-statement were moved outside of the if-statement. Here is the modified code:


if [ $? = 0 ]; then
        /bin/pkill -F /var/run/snort_re027549.pid -a
        /usr/bin/logger -p daemon.info -i -t SnortStartup "Snort SOFT START For Block Only Rules(27549_re0)..."
fi
# Start snort and barnyard2
/bin/rm /var/run/snort_re027549.pid
/usr/local/bin/snort -R 27549 -D -q -l /var/log/snort/snort_re027549 --pid-path /var/run --nolock-pidfile -G 27549 -c /usr/local/etc/snort/snort_27549_re0/snort.conf -i re0
/usr/bin/logger -p daemon.info -i -t SnortStartup "Snort START For Block Only Rules(27549_re0)..."

I then started repeating my trials. Only got one trial, but it has six restarts in it. Both snort instances come back with each restart and memory usage is consistent across all restarts.

Removed -HUP, new trials:


Mem: 951M Active, 107M Inact, 336M Wired, 4K Cache, 248M Buf, 2523M Free
  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
15636 root          2  44    0   616M   300M nanslp  2   0:13  1.95% snort
52030 root          2  44    0   928M   614M nanslp  2   0:13  1.95% snort

Mem: 946M Active, 111M Inact, 336M Wired, 4K Cache, 248M Buf, 2524M Free
  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
12265 root          2  44    0   928M   614M nanslp  2   0:00  0.29% snort
41341 root          2  44    0   616M   300M nanslp  2   0:00  0.29% snort

Mem: 949M Active, 109M Inact, 336M Wired, 4K Cache, 248M Buf, 2523M Free
  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
24100 root          2  47    0   616M   300M nanslp  2   0:01 92.82% snort
61007 root          2  44    0   928M   614M nanslp  0   0:01 14.65% snort

Mem: 950M Active, 108M Inact, 336M Wired, 4K Cache, 248M Buf, 2523M Free
  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
42501 root          2  44    0   616M   300M nanslp  2   0:03 43.46% snort
12811 root          2  44    0   928M   614M nanslp  2   0:05 19.78% snort

Mem: 952M Active, 106M Inact, 336M Wired, 4K Cache, 248M Buf, 2522M Free
  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
36328 root          2  45    0   616M   301M nanslp  2   0:01 71.44% snort
34615 root          2  44    0   928M   614M nanslp  1   0:03 18.95% snort

Mem: 950M Active, 108M Inact, 336M Wired, 4K Cache, 248M Buf, 2523M Free
  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
17614 root          2  44    0   616M   300M nanslp  0   0:03 43.65% snort
15390 root          2  44    0   928M   614M nanslp  0   0:05 19.34% snort

I'm suggesting that the -HUP be removed from snort.inc so it does not, at the very least, hinder performance for users with limited resources. I know this means that when a snort restart happens, the system isn't protected by snort for a few minutes. However, I believe this will also prevent snort from dying after a restart for other users. I'm not saying at least one snort instance will die on the third restart, but I do assert that snort will die after some number of restarts and that, for a given system, that number will be consistent. I've posted a proposed patch to snort.inc, unless you can see a better solution than mine:

https://github.com/bsdperimeter/pfsense-packages/pull/281

Maybe a better solution would be to add a new setting to the global settings, a checkbox that allows the user to determine if they want to use -HUP or not. Then they can look at their environment and decide which is better for them: a steady creep of resource loss or being unprotected for a few minutes. Depending on that setting, either the code as it stands in the repository is used, or something similar to the code I'm proposing.
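
As a rough illustration of how such a setting might drive the generated script, here is a minimal sketch. The setting itself does not exist in the package; the function name restart_signal and the "yes" value are hypothetical, and the sketch only selects the signal, leaving out the stop-then-start sequence that the non-HUP path would also need.

```shell
#!/bin/sh
# restart_signal USE_HUP
# Prints the name of the signal to send to a running instance on restart.
# USE_HUP is a hypothetical setting ("yes" enables the soft restart);
# it is not an existing option in the snort package.
restart_signal() {
    if [ "$1" = "yes" ]; then
        echo "HUP"      # soft restart: keep the process, reload in place
    else
        echo "TERM"     # full restart: stop the process, then start a new one
    fi
}
```

The result could then be passed to pkill, for example pkill -"$(restart_signal "$use_hup")" -F "$pidfile", where use_hup and pidfile are likewise placeholder variables.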

eri--

I think that time is a factor here with your snort.sh.
A service like snort needs time to prepare itself for releasing and allocating new resources after a HUP, so you have to take that into consideration.

Furthermore, I would expect a system log entry associated with that.

breusshe

@ermal:

I think that time is a factor here with your snort.sh.
A service like snort needs time to prepare itself for releasing and allocating new resources after a HUP, so you have to take that into consideration.

Furthermore, I would expect a system log entry associated with that.

How much time? Will that result in the memory usage dropping back down closer to its original levels?