100% CPU problem with pfSense 2.3

epionier

I think I have traced the problem to snort. As I stated above, snort was not working after the PPPoE refresh although the service seemed to be running. I waited a couple of hours, but snort did not re-activate for the "new" PPPoE connection. That is why I restarted the service manually. After that the CPU went to 100% and the problem with the nonfunctional SSH/console was back. Unfortunately I did not run "top -aSH" beforehand to verify this, which I should have done.

bmeeks

@epionier:

I think I have traced the problem to snort. As I stated above, snort was not working after the PPPoE refresh although the service seemed to be running. I waited a couple of hours, but snort did not re-activate for the "new" PPPoE connection. That is why I restarted the service manually. After that the CPU went to 100% and the problem with the nonfunctional SSH/console was back. Unfortunately I did not run "top -aSH" beforehand to verify this, which I should have done.

How large is the system disk partition where /tmp and /var/log are located? Is one or more of these on a RAM disk? If so, Snort will give you fits whenever any of its "disk space" is on a RAM disk, because a RAM disk can almost never be large enough. Using one also limits the working memory Snort has for loading rules. Snort needs a lot of free disk space in /tmp and /var/log. "A lot" means at least 256 MB free in /tmp and 1 GB or more free in /var/log (preferably a lot more, to hold the Snort log files).
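The free-space figures above can be checked with a few lines of script. The following is only a minimal sketch, assuming a Python 3 interpreter is available (a stock pfSense install may not ship one); the function names are made up for illustration and the thresholds are simply the 256 MB and 1 GB numbers from the post.

```python
import shutil

MIB = 1024 * 1024

# Thresholds taken from the post above: 256 MB free in /tmp, 1 GB free in /var/log.
REQUIRED_FREE_MIB = {"/tmp": 256, "/var/log": 1024}

def free_mib(path):
    """Return the free space, in MiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free // MIB

def check_free_space(required=REQUIRED_FREE_MIB, probe=free_mib):
    """Return {path: (free_mib, ok)}; `probe` can be swapped out for testing."""
    report = {}
    for path, need in required.items():
        have = probe(path)
        report[path] = (have, have >= need)
    return report

if __name__ == "__main__":
    for path, (have, ok) in check_free_space().items():
        status = "ok" if ok else "too small for Snort"
        print(f"{path}: {have} MiB free ({status})")
```

Note that this only reports free space on whatever filesystem holds each path; it does not tell you whether that filesystem is a RAM disk.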

You gave some information in one of your earlier posts, but I'm not clear if that is for all packages or just Squid.

Bill

epionier

@bmeeks

The SSD disk partition is 15 GB with 18% in use. RAM disks are not in use for either /tmp or /var.
Here are the current stats of my pfSense while it is having a good life ;) :

MBUF Usage 4% 1016/26584
Load average 0.12, 0.07, 0.07
CPU usage 6%
Memory usage 23% of 3037 MiB
SWAP usage 0% of 4095 MiB
Disk usage ( / ) 18% of 15GiB - ufs
Disk usage ( /var/run ) 4% of 3.4MiB - ufs in RAM

I reduced the RAM to 3 GB to try something, but as I found out, RAM is not the issue. The issue mostly appears when pfSense reboots, when the WAN IP changes, or when I re-activate the snort service (because it is not "really" active).

A backup of the pfSense VM is done every workday, like last night. The VM is powered off beforehand (Open-VM-Tools installed) and is restarted after the backup. When I checked it last night after the backup, I noticed that the CPU had hit 100% again. So I rebooted via the VMware ESXi manager (connected via IPsec) and quickly started "top -aSH" in the shell, and the CPU went straight to 100% again.

I noticed that there were two identical snort processes running, which is the first point I cannot explain. The main CPU usage alternated between the two snort processes and the processes "currentipsecpinghosts" and "netstat". I took a picture of that: http://fs5.directupload.net/images/160517/zyfl9k8q.jpg

As I said, sometimes the two snort processes used most of the CPU, then it shifted to the other processes mentioned and back again, but the CPU was always at 100%.

bmeeks

Snort can sometimes get double-started by the firewall system processes during boot up or when the WAN IP address changes. I put some code into the /usr/local/etc/rc.d/snort.sh shell script to try and prevent that, but it still sometimes happens. On an IP change (which can be during boot, as the interface comes up and then gets, say, a DHCP or PPPoE address), the system issues a few "restart all packages" commands in quick succession. You can see this if you examine the system log. It takes Snort quite a while to start up, and those multiple "restart all packages" commands sometimes result in duplicate Snort processes getting started.

This does not always happen, and for some users it almost never happens, but others see it frequently. On my personal firewall, it happens very rarely if my cable connection bounces several times in quick succession and my WAN IP gets toggled between the modem's private address space and the actual public IP assigned by my cable provider.
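To illustrate the kind of guard described above, here is a minimal sketch of a PID-file check that a start routine could run before launching a daemon. It is not the code in snort.sh (that script is shell, and its contents are not shown in this thread); the helper names and the PID-file path in the usage note are hypothetical.

```python
import os

def pid_is_running(pid):
    """Return True if a process with this PID currently exists."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence; nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True

def already_started(pidfile):
    """Return True if `pidfile` names a live process, so a new start should be skipped."""
    try:
        with open(pidfile) as fh:
            pid = int(fh.read().strip())
    except (FileNotFoundError, ValueError):
        return False  # no PID file or unreadable contents: treat as not running
    return pid_is_running(pid)
```

A start routine would call something like already_started("/var/run/example_snort.pid") (a made-up path) and return early when it is True. A check like this still leaves a window: if the daemon takes a long time to start and only writes its PID file late, two start requests arriving in quick succession can both see "not running", which fits the occasional duplicates described above.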

Bill

epionier

@bmeeks

Thank you for your explanation. Some days have passed since then and I have the following experiences:

1. I generally run into the 100% CPU problem when pfSense is shut down and restarted after a couple of minutes. I changed my backup method (pfSense is a VM) so that it backs up from a snapshot and no longer powers pfSense off beforehand and restarts it afterwards. But this is just a workaround and not a permanent solution in my opinion. I strongly believe the CPU problem is mainly due to SNORT (see 3.)

2. The (2nd) problem still remains that SNORT is not properly re-activated when the WAN IP changes. E.g. I just looked in the system logs, and the WAN IP changed 3 hours ago because of the provider's 24h disconnection. Since then there has been no SNORT alert in the system logs (they usually appear about every 5 minutes). The script /rc.newwanip did not restart snort at all: there is no entry for snort in the system logs, but the ("old") snort process is still running according to top.

3. After snort did not re-activate, I manually restarted it via Services->Snort->Reload, and the CPU went to 100% again. I noticed that a second snort process started and that almost 100% of the CPU went to this process, while the other snort process was at 0% CPU. After a minute this changed: two "sh" processes took almost 80% of the CPU and snort the remaining 20%. Some minutes later one "sh" process was at 0%, but a "cat /tmp/tmpHOSTS" process was using 70% (20% snort, 10% the remaining sh process). A couple of minutes after that the second sh process was at 0% too, and a "sleep 55" process was using 70% CPU (20% snort, 10% cat). From then on the CPU usage shifted between cat, sleep and snort, and so on. Sometimes there are also a couple of "/usr/local/bin/php -f /usr/local/pkg/snort/snort_check_cron_misc.inc" processes active (in total >10%). The CPU load does not go down by itself, but when I kill the snort process by its PID (kill -9 PID) the CPU goes down to 0%.
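Spotting the duplicate process described in point 3 can be done by grouping process-list output by command line. The sketch below is only an illustration, assuming Python 3 and a process listing with one "PID command" pair per line, such as the output of `ps -ax -o pid,command`; the function name is made up.

```python
from collections import defaultdict

def duplicate_commands(ps_output, needle="snort"):
    """Group PIDs by identical command line; return only groups with more than one PID.

    `ps_output` is text with one process per line: a PID, whitespace, then the command.
    """
    groups = defaultdict(list)
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip the header row and blank lines
        pid, command = int(parts[0]), parts[1].strip()
        if needle in command:
            groups[command].append(pid)
    return {cmd: pids for cmd, pids in groups.items() if len(pids) > 1}

if __name__ == "__main__":
    import subprocess
    out = subprocess.run(["ps", "-ax", "-o", "pid,command"],
                         capture_output=True, text=True).stdout
    print(duplicate_commands(out))
```

Two snort processes with different command lines (for example, one per interface) are not reported; only exact duplicates are, which matches the "two identical snort processes" symptom.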

Maybe this information helps with improving the scripts.

Perforado

I suspect the cronjobs aren't checking whether they're already running, maybe?

Had a similar problem with pfblockerng cronjobs.
Spawning every hour for updates and having selected 1M hosts in the "Filter with Alexa"-subconfig just wasn't something an Atom D525 could handle in 60 minutes or less.
So an hour later another one spawned and so on …

My first firewall ever with 2.5 GB swapped :o
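The overlapping-cronjob situation described above is commonly avoided with a lock that a job takes before doing any work, so a new run exits at once if the previous one is still busy. This is a minimal sketch assuming Python 3 on a Unix-like system; it does not reflect how the pfBlockerNG or Snort cron scripts are actually written, and the function names and lock path are hypothetical.

```python
import fcntl

def try_lock(lock_path):
    """Try to take an exclusive, non-blocking lock; return the open file, or None if held."""
    fh = open(lock_path, "w")
    try:
        fcntl.flock(fh, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        fh.close()
        return None  # another run still holds the lock
    return fh

def run_exclusive(lock_path, job):
    """Run `job()` only if nobody else holds the lock; return True if the job ran."""
    fh = try_lock(lock_path)
    if fh is None:
        return False
    try:
        job()
        return True
    finally:
        fcntl.flock(fh, fcntl.LOCK_UN)
        fh.close()
```

A cron entry would then wrap its work as run_exclusive("/tmp/example_update.lock", do_update), where do_update stands for the real job. Because the kernel drops the lock when the process exits, a crashed run does not leave a stale lock behind, which is the main advantage over a plain "lock file exists" check.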

epionier

I don't believe so, because the process connected with the WAN IP reconnect is a script that is run when the IP changes, and also when the VM starts. I believe the problem is in that script, or in the (startup/reload) script that it invokes (maybe in conjunction with the snort process).

I listed my cronjobs that are directly related to snort and snort2c here: https://forum.pfsense.org/index.php?topic=111883.msg623353#msg623353 Perhaps you could compare them to yours to see if they are identical, because mine were set automatically.

phil123456

Hello,

just installed snort today on pfsense 2.3

cpu is up to 100%

cannot get the shell in the terminal window

is there any fix yet ??

phil123456

ok I added a core and put 2gb instead of 512mb of ram, and now it seems to work fine

jee snort is such a resource hog

bmeeks

@phil123456:

ok I added a core and put 2gb instead of 512mb of ram, and now it seems to work fine

jee snort is such a resource hog

Yes, all IDS/IPS systems are resource hogs because of what they have to do. If you start to run a full Snort or Suricata rule set, you may find even 2 GB of RAM can get a bit tight. 4 GB is a good RAM number for either Snort or Suricata in my view. I suggest at least 2 cores for CPU, and 4 is even better.

Bill