100% CPU problem with pfSense 2.3
-
Hello,
I am using pfSense 2.3 in an ESXi VM configured with a single core and enough RAM (4 GB). Everything works fine, but from time to time the CPU suddenly spikes to 100% and stays there. In the ESXi manager I can see that the CPU for the pfSense VM is at 100% for hours (e.g. at night) and never comes back down.
When the CPU is at 100% I can no longer access the webConfigurator, and when I SSH to the machine it only displays the welcome line and nothing more (no menu selection); the same happens on the console. But IPsec or Squid, for example, keep working; I am just unable to access pfSense for configuration, etc. The only way to recover is rebooting the machine, which takes a while. Afterwards the CPU sometimes hits 100% again immediately, and sometimes not.
Sometimes, when I was quick enough after the reboot, I restarted the webConfigurator and PHP from the console/SSH, and then it seemed to stay stable (for some time).
The system logs in the webConfigurator show no errors or anything similar, and neither does the Dashboard.
How do I find out what's causing this? When the CPU is at 100% I cannot SSH to the machine to run "top -aSH" and find the faulty process.
Last night I assume the problem occurred after the PPPoE renewal, because afterwards the log did not show any Snort entries anymore.
Here is the log from last night (newest on top, some information XXXed out):
May 15 12:15:34 shutdown reboot by root:
May 15 12:09:02 sshd 65760 Accepted keyboard-interactive/pam for admin from 10.21.32.1 port 53297 ssh2
May 15 12:07:58 php-fpm 17736 /index.php: Successful login for user 'admin' from: 10.21.32.1
May 15 03:31:35 kernel pppoe0: promiscuous mode enabled
May 15 03:30:49 xinetd 11808 Reconfigured: new=0 old=1 dropped=0 (services)
May 15 03:30:49 xinetd 11808 readjusting service 6969-udp
May 15 03:30:49 xinetd 11808 Swapping defaults
May 15 03:30:49 xinetd 11808 Starting reconfiguration
May 15 03:30:48 SnortStartup 34071 Snort START for WAN(61399_pppoe0)…
May 15 03:30:48 php-fpm 65608 /rc.start_packages: [lightsquid] Updating cronjobs…
May 15 03:30:48 check_reload_status Syncing firewall
May 15 03:30:47 check_reload_status Syncing firewall
May 15 03:30:47 php-fpm 65608 /rc.start_packages: [lightsquid] Removing old cronjobs…
May 15 03:30:47 php-fpm 65608 /rc.start_packages: [lightsquid] Successfully created '/usr/local/etc/lightsquid/lightsquid.cfg' configuration file.
May 15 03:30:47 php-fpm 65608 /rc.start_packages: [lightsquid] Loaded default '/usr/local/etc/lightsquid/lightsquid.cfg.sample' configuration file.
May 15 03:30:47 check_reload_status Reloading filter
May 15 03:30:46 php-fpm 65608 /rc.start_packages: [squid] Starting a proxy monitor script
May 15 03:30:46 php-fpm 65608 /rc.start_packages: [squid] Reloading for configuration sync…
May 15 03:30:45 php-fpm 65608 /rc.start_packages: [squid] Stopping any running proxy monitors
May 15 03:30:45 php-fpm 65608 /rc.start_packages: [squid] Reloading C-ICAP…
May 15 03:30:45 php-fpm 65608 /rc.start_packages: [squid] Reloading ClamAV…
May 15 03:30:45 php-fpm 65608 /rc.start_packages: Checked cron job for /usr/local/bin/freshclam --config-file=/usr/local/etc/freshclam.conf, no change needed
May 15 03:30:45 php-fpm 65608 /rc.start_packages: [squid] Adding freshclam cronjob.
May 15 03:30:45 php-fpm 65608 /rc.start_packages: Checked cron job for /usr/local/pkg/swapstate_check.php, no change needed
May 15 03:30:45 php-fpm 65608 /rc.start_packages: Checked cron job for /usr/local/sbin/squid -k rotate -f /usr/local/etc/squid/squid.conf, no change needed
May 15 03:30:45 php-fpm 65608 /rc.start_packages: [squid] Adding cronjobs …
May 15 03:30:33 xinetd 11808 Reconfigured: new=0 old=1 dropped=0 (services)
May 15 03:30:33 xinetd 11808 readjusting service 6969-udp
May 15 03:30:33 xinetd 11808 Swapping defaults
May 15 03:30:33 xinetd 11808 Starting reconfiguration
May 15 03:30:32 php-fpm 65608 /rc.start_packages: [squid] - squid_resync function call pr:1 bp: rpc:no
May 15 03:30:32 check_reload_status Reloading filter
May 15 03:30:31 php-fpm 65608 /rc.start_packages: [squid] Starting a proxy monitor script
May 15 03:30:31 php-fpm 65608 /rc.start_packages: [squid] Reloading for configuration sync…
May 15 03:30:29 php-fpm 65608 /rc.start_packages: [squid] Stopping any running proxy monitors
May 15 03:30:28 php-fpm 65608 /rc.start_packages: [squid] Reloading C-ICAP…
May 15 03:30:28 php-fpm 65608 /rc.start_packages: [squid] Reloading ClamAV…
May 15 03:30:28 php-fpm 65608 /rc.start_packages: Checked cron job for /usr/local/bin/freshclam --config-file=/usr/local/etc/freshclam.conf, no change needed
May 15 03:30:28 php-fpm 65608 /rc.start_packages: [squid] Adding freshclam cronjob.
May 15 03:30:28 php-fpm 65608 /rc.start_packages: Checked cron job for /usr/local/pkg/swapstate_check.php, no change needed
May 15 03:30:28 php-fpm 65608 /rc.start_packages: Checked cron job for /usr/local/sbin/squid -k rotate -f /usr/local/etc/squid/squid.conf, no change needed
May 15 03:30:28 php-fpm 65608 /rc.start_packages: [squid] Adding cronjobs …
May 15 03:30:16 php-fpm 65608 /rc.start_packages: [squid] - squid_resync function call pr:1 bp: rpc:no
May 15 03:30:16 php-fpm 97269 /rc.start_packages: Skipping STARTing packages process because previous/another instance is already running
May 15 03:30:16 php-fpm 65608 /rc.start_packages: Restarting/Starting all packages.
May 15 03:30:15 check_reload_status Starting packages
May 15 03:30:15 php-fpm 60012 /rc.newwanip: pfSense package system has detected an IP change or dynamic WAN reconnection - 217.248.XXX.XXX -> 93.208.XXX.XXX - Restarting packages.
May 15 03:30:15 xinetd 11808 Reconfigured: new=0 old=1 dropped=0 (services)
May 15 03:30:15 xinetd 11808 readjusting service 6969-udp
May 15 03:30:15 xinetd 11808 Swapping defaults
May 15 03:30:15 xinetd 11808 Starting reconfiguration
May 15 03:30:15 check_reload_status Starting packages
May 15 03:30:15 php-fpm 65608 /rc.newwanip: pfSense package system has detected an IP change or dynamic WAN reconnection - -> 10.21.31.1 - Restarting packages.
May 15 03:30:15 check_reload_status Reloading filter
May 15 03:30:15 php-fpm 65608 /rc.newwanip: rc.newwanip: on (IP address: 10.21.31.1) (interface: []) (real interface: ovpns1).
May 15 03:30:15 php-fpm 65608 /rc.newwanip: rc.newwanip: Info: starting on ovpns1.
May 15 03:30:14 xinetd 11808 Reconfigured: new=0 old=1 dropped=0 (services)
May 15 03:30:14 xinetd 11808 readjusting service 6969-udp
May 15 03:30:14 xinetd 11808 Swapping defaults
May 15 03:30:14 xinetd 11808 Starting reconfiguration
May 15 03:30:13 check_reload_status rc.newwanip starting ovpns1
May 15 03:30:12 kernel ovpns1: link state changed to UP
May 15 03:30:12 php-fpm 60012 /rc.newwanip: Creating rrd update script
May 15 03:30:12 check_reload_status Reloading filter
May 15 03:30:12 kernel ovpns1: link state changed to DOWN
May 15 03:30:12 php-fpm 60012 /rc.newwanip: Resyncing OpenVPN instances for interface WAN.
May 15 03:30:12 check_reload_status Reloading filter
May 15 03:30:12 php-fpm 60012 /rc.newwanip: Forcefully reloading IPsec
May 15 03:30:11 php-fpm 60012 /rc.newwanip: phpDynDNS (): (Success) IP Address Updated Successfully!
May 15 03:30:11 php-fpm 60012 /rc.newwanip: phpDynDNS: updating cache file /conf/dyndns_wancustom''0.cache: 93.208.XXX.XXX
May 15 03:30:11 php-fpm 60012 /rc.newwanip: Message sent to XXX@XXX.de OK
May 15 03:30:07 php-fpm 60012 /rc.newwanip: Removing static route for monitor 217.237.XXX.XXX and adding a new route through 87.186.XXX.XXX
May 15 03:30:07 php-fpm 60012 /rc.newwanip: ROUTING: setting default route to 87.186.XXX.XXX
May 15 03:30:07 xinetd 11808 Reconfigured: new=0 old=1 dropped=0 (services)
May 15 03:30:07 xinetd 11808 readjusting service 6969-udp
May 15 03:30:07 xinetd 11808 Swapping defaults
May 15 03:30:07 xinetd 11808 Starting reconfiguration
May 15 03:30:06 php-fpm 60012 /rc.newwanip: IP has changed, killing states on former IP 217.248.XXX.XXX.
May 15 03:30:06 php-fpm 60012 /rc.newwanip: rc.newwanip: on (IP address: 93.208.XXX.XXX) (interface: WAN[wan]) (real interface: pppoe0).
May 15 03:30:06 php-fpm 60012 /rc.newwanip: rc.newwanip: Info: starting on pppoe0.
May 15 03:30:05 ppp [wan] IFACE: Rename interface ng0 to pppoe0
May 15 03:30:05 ppp [wan] IFACE: Up event
May 15 03:30:05 check_reload_status rc.newwanip starting pppoe0
May 15 03:30:04 check_reload_status Rewriting resolv.conf
May 15 03:30:04 ppp [wan] 93.208.XXX.XXX -> 87.186.XXX.XXX
May 15 03:30:04 ppp [wan] IPCP: LayerUp
May 15 03:30:04 ppp [wan] IPCP: state change Ack-Sent –> Opened
May 15 03:30:04 ppp [wan] SECDNS 217.237.151.142
May 15 03:30:04 ppp [wan] PRIDNS 217.237.150.188
May 15 03:30:04 ppp [wan] IPADDR 93.208.XXX.XXX
May 15 03:30:04 ppp [wan] IPCP: rec'd Configure Ack #3 (Ack-Sent)
May 15 03:30:04 ppp [wan] SECDNS 217.237.151.142
May 15 03:30:04 ppp [wan] PRIDNS 217.237.150.188
May 15 03:30:04 ppp [wan] IPADDR 93.208.XXX.XXX
May 15 03:30:04 ppp [wan] IPCP: SendConfigReq #3
May 15 03:30:04 ppp [wan] SECDNS 217.237.151.142
May 15 03:30:04 ppp [wan] PRIDNS 217.237.150.188
May 15 03:30:04 ppp [wan] 93.208.XXX.XXX is OK
May 15 03:30:04 ppp [wan] IPADDR 93.208.XXX.XXX
May 15 03:30:04 ppp [wan] IPCP: rec'd Configure Nak #2 (Ack-Sent)
May 15 03:30:04 ppp [wan] IPV6CP: LayerFinish
May 15 03:30:04 ppp [wan] IPV6CP: state change Req-Sent –> Stopped
May 15 03:30:04 ppp [wan] IPV6CP: protocol was rejected by peer
May 15 03:30:04 ppp [wan_link0] LCP: protocol IPV6CP was rejected
May 15 03:30:04 ppp [wan_link0] LCP: rec'd Protocol Reject #74 (Opened)
May 15 03:30:04 ppp [wan] SECDNS 0.0.0.0
May 15 03:30:04 ppp [wan] PRIDNS 0.0.0.0
May 15 03:30:04 ppp [wan] IPADDR 0.0.0.0
May 15 03:30:04 ppp [wan] IPCP: SendConfigReq #2
May 15 03:30:04 ppp [wan] COMPPROTO VJCOMP, 16 comp. channels, no comp-cid
May 15 03:30:04 ppp [wan] IPCP: rec'd Configure Reject #1 (Ack-Sent)
May 15 03:30:04 ppp [wan] IPCP: state change Req-Sent –> Ack-Sent
May 15 03:30:04 ppp [wan] IPADDR 87.186.XXX.XXX
May 15 03:30:04 ppp [wan] IPCP: SendConfigAck #28
May 15 03:30:04 ppp [wan] 87.186.XXX.XXX is OK
May 15 03:30:04 ppp [wan] IPADDR 87.186.XXX.XXX
May 15 03:30:04 ppp [wan] IPCP: rec'd Configure Request #28 (Req-Sent)
May 15 03:30:04 ppp [wan] IPV6CP: SendConfigReq #1
May 15 03:30:04 ppp [wan] IPV6CP: state change Starting –> Req-Sent
May 15 03:30:04 ppp [wan] IPV6CP: Up event
May 15 03:30:04 ppp [wan] SECDNS 0.0.0.0
May 15 03:30:04 ppp [wan] PRIDNS 0.0.0.0
May 15 03:30:04 ppp [wan] COMPPROTO VJCOMP, 16 comp. channels, no comp-cid
May 15 03:30:04 ppp [wan] IPADDR 0.0.0.0
May 15 03:30:04 ppp [wan] IPCP: SendConfigReq #1
May 15 03:30:04 ppp [wan] IPCP: state change Starting –> Req-Sent
May 15 03:30:04 ppp [wan] IPCP: Up event
May 15 03:30:04 ppp [wan] IPV6CP: LayerStart
May 15 03:30:04 ppp [wan] IPV6CP: state change Initial –> Starting
May 15 03:30:04 ppp [wan] IPV6CP: Open event
May 15 03:30:04 ppp [wan] IPCP: LayerStart
May 15 03:30:04 ppp [wan] IPCP: state change Initial –> Starting
May 15 03:30:04 ppp [wan] IPCP: Open event
May 15 03:30:04 ppp [wan] Bundle: Status update: up 1 link, total bandwidth 64000 bps
May 15 03:30:04 ppp [wan_link0] Link: Join bundle "wan"
May 15 03:30:04 ppp [wan_link0] Link: Matched action 'bundle "wan" ""'
May 15 03:30:04 ppp [wan_link0] LCP: authorization successful
May 15 03:30:04 ppp [wan_link0] PAP: rec'd ACK #1 len: 5
May 15 03:30:04 ppp [wan_link0] LCP: LayerUp
May 15 03:30:04 ppp [wan_link0] PAP: sending REQUEST #1 len: 60
May 15 03:30:04 ppp [wan_link0] PAP: using authname "XXXXXXXXX@t-online-com.de"
May 15 03:30:04 ppp [wan_link0] LCP: auth: peer wants PAP, I want nothing
May 15 03:30:04 ppp [wan_link0] LCP: state change Ack-Sent –> Opened
May 15 03:30:04 ppp [wan_link0] MAGICNUM 0xeb21b71a
May 15 03:30:04 ppp [wan_link0] MRU 1492
May 15 03:30:04 ppp [wan_link0] LCP: rec'd Configure Ack #2 (Ack-Sent)
May 15 03:30:04 ppp [wan_link0] MAGICNUM 0xeb21b71a
May 15 03:30:04 ppp [wan_link0] MRU 1492
May 15 03:30:04 ppp [wan_link0] LCP: SendConfigReq #2
May 15 03:30:04 ppp [wan_link0] PROTOCOMP
May 15 03:30:04 ppp [wan_link0] LCP: rec'd Configure Reject #1 (Ack-Sent)
May 15 03:30:04 ppp [wan_link0] LCP: state change Req-Sent –> Ack-Sent
May 15 03:30:04 ppp [wan_link0] MAGICNUM 0x4b0a8b6e
May 15 03:30:04 ppp [wan_link0] AUTHPROTO PAP
May 15 03:30:04 ppp [wan_link0] MRU 1492
May 15 03:30:04 ppp [wan_link0] LCP: SendConfigAck #73
May 15 03:30:04 ppp [wan_link0] MAGICNUM 0x4b0a8b6e
May 15 03:30:04 ppp [wan_link0] AUTHPROTO PAP
May 15 03:30:04 ppp [wan_link0] MRU 1492
May 15 03:30:04 ppp [wan_link0] LCP: rec'd Configure Request #73 (Req-Sent)
May 15 03:30:04 ppp [wan_link0] MAGICNUM 0xeb21b71a
May 15 03:30:04 ppp [wan_link0] MRU 1492
May 15 03:30:04 ppp [wan_link0] PROTOCOMP
May 15 03:30:04 ppp [wan_link0] LCP: SendConfigReq #1
May 15 03:30:04 ppp [wan_link0] LCP: state change Starting –> Req-Sent
May 15 03:30:04 ppp [wan_link0] LCP: Up event
May 15 03:30:04 ppp [wan_link0] Link: UP event
May 15 03:30:04 ppp [wan_link0] PPPoE: connection successful
May 15 03:30:04 ppp PPPoE: rec'd ACNAME "XXXR73-se800-B222E170705301"
May 15 03:30:04 ppp [wan_link0] PPPoE: Connecting to ''
May 15 03:30:04 ppp [wan_link0] LCP: LayerStart
May 15 03:30:04 ppp [wan_link0] LCP: state change Initial –> Starting
May 15 03:30:04 ppp [wan_link0] LCP: Open event
May 15 03:30:04 kernel ng0: changing name to 'pppoe0'
May 15 03:30:04 ppp [wan_link0] Link: OPEN event
May 15 03:30:04 ppp [wan] Bundle: Interface ng0 created
May 15 03:30:04 ppp web: web is not running
May 15 03:30:03 ppp process 19511 terminated
May 15 03:30:03 ppp [wan_link0] Link: Shutdown
May 15 03:30:03 kernel pppoe0: promiscuous mode disabled
May 15 03:30:03 ppp [wan] Bundle: Shutdown
May 15 03:30:03 ppp [wan_link0] LCP: LayerFinish
May 15 03:30:03 ppp [wan_link0] LCP: state change Stopping –> Stopped
May 15 03:30:03 ppp waiting for process 19511 to die...
May 15 03:30:02 ppp waiting for process 19511 to die...
May 15 03:30:01 ppp [wan_link0] LCP: LayerDown
May 15 03:30:01 ppp [wan_link0] LCP: SendTerminateAck #3
May 15 03:30:01 ppp [wan] IPV6CP: state change Closed –> Initial
May 15 03:30:01 ppp [wan] IPV6CP: Down event
May 15 03:30:01 ppp [wan] IPCP: state change Closing –> Initial
May 15 03:30:01 ppp [wan] Bundle: No NCPs left. Closing links…
May 15 03:30:01 ppp [wan] IPCP: LayerFinish
May 15 03:30:01 ppp [wan] IPCP: Down event
May 15 03:30:01 ppp [wan] IPV6CP: Close event
May 15 03:30:01 ppp [wan] IPCP: Close event
May 15 03:30:01 ppp [wan] Bundle: Status update: up 0 links, total bandwidth 9600 bps
May 15 03:30:01 ppp [wan_link0] Link: Leave bundle "wan"
May 15 03:30:01 ppp [wan_link0] LCP: state change Opened –> Stopping
May 15 03:30:01 ppp [wan_link0] LCP: rec'd Terminate Request #180 (Opened)
May 15 03:30:01 ppp [wan] IPV6CP: state change Stopped –> Closed
May 15 03:30:01 ppp [wan] IPV6CP: Close event
May 15 03:30:01 ppp [wan] IFACE: Rename interface pppoe0 to pppoe0
May 15 03:30:01 ppp [wan] IFACE: Down event
May 15 03:30:01 ppp [wan] IFACE: Removing IPv4 address from pppoe0 failed(IGNORING for now. This should be only for PPPoE friendly!): Can't assign requested address
May 15 03:30:01 check_reload_status Rewriting resolv.conf
May 15 03:30:01 ppp [wan] IPCP: LayerDown
May 15 03:30:01 ppp [wan] IPCP: SendTerminateReq #4
May 15 03:30:01 ppp [wan] IPCP: state change Opened –> Closing
May 15 03:30:01 ppp [wan] IPCP: Close event
May 15 03:30:01 ppp [wan] IFACE: Close event
May 15 03:30:01 ppp caught fatal signal TERM
May 15 03:30:01 ppp waiting for process 19511 to die…
May 15 03:30:01 ppp process 57918 started, version 5.8 (root@pfSense_v2_3_0_amd64-pfSense_v2_3_0-job-14 22:52 6-Apr-2016)
May 15 03:30:01 ppp Multi-link PPP daemon for FreeBSD
May 15 03:30:00 check_reload_status Configuring interface wan
May 15 03:30:00 pppoe0 PPPoE periodic reset executed on wan
May 15 03:24:32 snort 16007 [1:2008578:6] ET SCAN Sipvicious Scan [Classification: Attempted Information Leak] [Priority: 2] {UDP} 62.138.0.118:5277 -> 217.248.XXX.XXX:5060
May 15 03:24:32 snort 16007 [1:2011716:4] ET SCAN Sipvicious User-Agent Detected (friendly-scanner) [Classification: Attempted Information Leak] [Priority: 2] {UDP} 62.138.0.118:5277 -> 217.248.XXX.XXX:5060
May 15 03:19:21 snort 16007 [1:2008578:6] ET SCAN Sipvicious Scan [Classification: Attempted Information Leak] [Priority: 2] {UDP} 85.114.135.163:5084 -> 217.248.XXX.XXX:5060
May 15 03:19:21 snort 16007 [1:2011716:4] ET SCAN Sipvicious User-Agent Detected (friendly-scanner) [Classification: Attempted Information Leak] [Priority: 2] {UDP} 85.114.135.163:5084 -> 217.248.XXX.XXX:5060
-
What packages are you running? I see you're running Squid; remember, don't give the local cache area more RAM than you actually have.
-
Thank you for your reply. I am essentially running Squid, LightSquid, and Snort.
My local HDD cache is 6 GB, "files stored in RAM" is set to 256 MB, and system memory is 4 GB, of which usually only ~30-40% is in use. So this should not be the problem, and I am aware that the RAM Squid uses for caching files is not the only RAM Squid uses.
Are there more detailed system logs that I can access via SSH?
-
Via SSH, can you run:
top -aSH
-
Which process is responsible for the CPU usage? You can run top at the VM's console if SSH isn't working at the time. We need to know that before we can suggest anything further.
-
The problem is that neither the console nor SSH works properly in this case, so I am unable to run "top -aSH". The console still displays new messages such as a login (although the SSH login stops at the welcome line), but does not allow any selection (e.g. shell access).
That's the point: I don't know how to solve the problem ;)
-
Maybe add a second core to your VM so it won't be completely stuck. That might not help, but it's worth a shot at least.
Or just leave 'top -SH' running in the VM console; then it'll be there to review when the box dies. Even if the system is so hung up that top can't update, you'll at least have the last refresh it got, which will likely show the culprit.
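To make sure that last refresh survives a hang, you can also log batch-mode samples to a file instead of relying on the screen. A minimal sketch, assuming FreeBSD's top flags (-a full command lines, -S system processes, -H threads, -n non-interactive output); the log path is hypothetical:

```shell
#!/bin/sh
# Append one timestamped batch-mode top sample to a log file, so the
# last snapshot before a hang is preserved on disk.
# FreeBSD top flags assumed: -a (full commands), -S (system processes),
# -H (threads), -n (batch output, print once). Log path is hypothetical.
LOG=${LOG:-/root/topsnap.log}

snapshot() {
    date >> "$LOG"
    top -aSHn 20 >> "$LOG" 2>&1    # top 20 threads, one sample
}

# On the console you would loop it, e.g.:
#   while :; do snapshot; sleep 30; done
```

Even if the box locks up mid-loop, the tail of the log file then shows the last threads that were eating CPU.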
-
That is the conclusion I came to as well, thanks. I will try that and post updates when the CPU problem occurs again.
One more question, by the way. I also disabled the scheduled PPPoE reconnection for testing purposes, because I believe that restarting all the services leads to the CPU problem. I had a reconnect (caused by the provider) about an hour ago, at 13:30.
I noticed in the system logs that Snort did not reload automatically for the "new" PPPoE connection.
I checked the (automated) cron entries regarding Snort:
*/5 * * * * root /usr/bin/nice -n20 /usr/local/bin/php -f /usr/local/pkg/snort/snort_check_cron_misc.inc
0 6,18 * * * root /usr/bin/nice -n20 /usr/local/bin/php -f /usr/local/pkg/snort/snort_check_for_rule_updates.php
2 */2 * * * root /usr/bin/nice -n20 /sbin/pfctl -q -t snort2c -T expire 86400
In the first system log I posted above, Snort was restarted for the new PPPoE connection at the end of the process (although it didn't seem to work, because there were no new Snort entries in the system log afterwards):
May 15 03:30:48 SnortStartup 34071 Snort START for WAN(61399_pppoe0)…
Is this a bug (in rc.newwanip or rc.start_packages) or a misconfiguration on my side? I cannot find an appropriate setting under the Snort services.
The Snort service is still running according to Status->Services and Services->Snort, but I think only for the "old" PPPoE connection, which is no longer valid.
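When the service status looks fine but the alerts have stopped, a quick shell check for stale or duplicated Snort processes can confirm what is actually running. A rough sketch (the 'snort' process-name pattern is an assumption):

```shell
#!/bin/sh
# Count running snort processes; more than one usually means a
# duplicate start after an IP change. (Process pattern 'snort' assumed.)
count=$(pgrep -f snort | wc -l)
echo "snort processes: $count"

if [ "$count" -gt 1 ]; then
    echo "possible duplicate snort instances:"
    # The full command line shows which interface each instance
    # was started on (e.g. pppoe0).
    ps ax -o pid,command | grep '[s]nort'
fi
```

If the only process shown was started before the reconnect, it is most likely still bound to the old interface instance.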
-
I think I have located the problem, in connection with Snort. As I stated above, Snort was not working after the PPPoE refresh although the service appeared to be running. I waited a couple of hours, but Snort did not re-activate for the "new" PPPoE connection, so I restarted the service manually. Having done this, the CPU went to 100% and the problem with the nonfunctional SSH/console was back. Unfortunately I had not run "top -aSH" beforehand to verify this, which I should have done.
-
How large is the system disk partition where /tmp and /var/log are located? Is one or more of these on a RAM disk? If so, Snort will give you fits when any of its "disk space" is on a RAM disk, because RAM disks can almost never be large enough. Doing so also limits the working memory Snort has for loading rules. Snort needs lots of free disk space in /tmp and /var/log: at least 256 MB free in /tmp and 1 GB or more free in /var/log (preferably a lot more, to hold the Snort log files).
You gave some information in one of your earlier posts, but I'm not clear if that is for all packages or just Squid.
Bill
-
The SSD disk partition is 15 GB with 18% in use. RAM disks are not in use, neither for /tmp nor for /var.
Here are current stats of my pfSense having a good life ;) :
MBUF Usage 4% 1016/26584
Load average 0.12, 0.07, 0.07
CPU usage 6%
Memory usage 23% of 3037 MiB
SWAP usage 0% of 4095 MiB
Disk usage ( / ) 18% of 15GiB - ufs
Disk usage ( /var/run ) 4% of 3.4MiB - ufs in RAM
I reduced the RAM to 3 GB to try something, but as I found out, RAM is not the issue. The issue mostly appears when pfSense reboots, when the WAN IP changes, or when I re-activate the Snort service (because it is not "really" active).
A backup of the pfSense VM is made every workday, as it was last night. The VM is powered off beforehand (Open-VM-Tools is installed) and restarted after the backup. When I checked last night after the backup, I noticed the CPU had hit 100% again. So I rebooted via the VMware ESXi manager (connected via IPsec) and quickly started "top -aSH" in the shell, and the CPU went straight to 100% again.
I noticed that there were two identical Snort processes running, which is the first thing I cannot explain. The main CPU usage also alternated between the two Snort processes and the processes "currentipsecpinghosts" and "netstat". I took a picture of that: http://fs5.directupload.net/images/160517/zyfl9k8q.jpg
As I said, sometimes the two Snort processes were the main CPU consumers, then it changed to the other processes mentioned and back again, but the CPU was always at 100%.
-
Snort can sometimes get double-started by the firewall system processes during boot-up or when the WAN IP address changes. I put some code into the /usr/local/etc/rc.d/snort.sh shell script to try to prevent that, but it still sometimes happens. On an IP change (which can happen during boot, as the interface comes up and then gets, say, a DHCP or PPPoE address), the system issues a few "restart all packages" commands in quick succession; you can see this if you examine the system log. Snort takes quite a while to start up, and those multiple "restart all packages" commands sometimes result in duplicate Snort processes getting started.
This does not always happen, and for some users it almost never happens, but others see it frequently. On my personal firewall, it happens very rarely if my cable connection bounces several times in quick succession and my WAN IP gets toggled between the modem's private address space and the actual public IP assigned by my cable provider.
Bill
-
Thank you for your explanation. Some days have passed since then and I have the following experiences:
1. I generally run into the 100% CPU problem when pfSense is shut down and restarted after a couple of minutes. I changed my backup method (pfSense is a VM) so that it backs up from a snapshot instead of powering the VM off beforehand and restarting it afterwards. But this is just a workaround, not a permanent solution, in my opinion. I strongly believe the CPU problem is mainly due to Snort (see 3.).
2. The second problem still remains: Snort is not properly re-activated when the WAN IP changes. For example, I just looked in the system logs and the WAN IP changed 3 hours ago because of the provider's 24h disconnection. Since then there has been no Snort alert in the system logs (they usually appear roughly every 5 minutes). The script /rc.newwanip did not restart Snort at all; there is no Snort entry in the system logs, but the "old" Snort process is still running according to top.
3. After Snort did not re-activate, I manually restarted it via Services->Snort->Reload, and the CPU went to 100% again. I noticed that a second Snort process started and took almost 100% CPU, while the other Snort process sat at 0%. After a minute it changed: two "sh" processes took almost 80% of the CPU and Snort the remaining 20%. Some minutes later one "sh" process was at 0%, but a "cat /tmp/tmpHOSTS" process was using 70% (20% Snort, 10% the remaining sh process), and a couple of minutes after that the second sh process was at 0% too and a "sleep 55" process was using 70% CPU (20% Snort, 10% cat). Then the CPU usage kept rotating between cat, sleep, snort, and so on. Sometimes a couple of "/usr/local/bin/php -f /usr/local/pkg/snort/snort_check_cron_misc.inc" processes were active too (>10% in total). The CPU never went down, but when I killed the Snort process by its PID (kill -9 PID), the CPU dropped to 0%.
Maybe this information helps with improving the scripts.
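To find out which script is spawning those cat/sleep loops, walking a busy PID's parent chain usually points straight at the shell script responsible. A portable sketch (the helper name is made up):

```shell
#!/bin/sh
# Walk up the parent chain of a busy process (e.g. the 'cat
# /tmp/tmpHOSTS' or 'sleep 55' PIDs seen in top) to find the
# shell script that spawned it. Helper name is hypothetical.
trace_parents() {
    pid=$1
    while [ -n "$pid" ] && [ "$pid" -gt 1 ]; do
        ps -p "$pid" -o pid=,ppid=,command=
        pid=$(ps -p "$pid" -o ppid= | tr -d ' ')
    done
}

# Example (PID taken from top):  trace_parents 12345
```

Each line shows a process and its parent, so the first shell-script entry in the chain is the likely spawner.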
-
I suspect the cronjobs aren't checking whether they're already running, maybe?
Had a similar problem with pfblockerng cronjobs.
Spawning every hour for updates, with 1M hosts selected in the "Filter with Alexa" sub-config, just wasn't something an Atom D525 could handle in 60 minutes or less.
So an hour later another one spawned, and so on… My first firewall ever with 2.5 GB swapped :o
-
I don't believe so, because the process connected with the WAN IP reconnect is a script that runs when the IP changes (also when the VM starts). I believe the problem is in that script, or in the (startup/reload) script it invokes (maybe in conjunction with the Snort process).
I listed the cronjobs that are directly related to Snort and snort2c here: https://forum.pfsense.org/index.php?topic=111883.msg623353#msg623353. Perhaps you could compare them with yours to see if they are identical, since they were set automatically.
-
Hello,
I just installed Snort today on pfSense 2.3.
The CPU is up at 100%.
I cannot get a shell in the terminal window.
Is there any fix yet?
-
OK, I added a core and put in 2 GB of RAM instead of 512 MB, and now it seems to work fine.
Gee, Snort is such a resource hog.
-
Yes, all IDS/IPS systems are resource hogs because of what they have to do. If you start to run a full Snort or Suricata rule set, you may find that even 2 GB of RAM gets a bit tight; 4 GB is a good amount for either Snort or Suricata, in my view. I suggest at least 2 CPU cores, and 4 is even better.
Bill