Multiple php-cgi's using all CPU
-
I have an APU2C4 with Suricata kinda running out of control. Running top shows
Message from syslogd@GateKeeper at Nov 9 16:41:25 ... C TIME WCPU COMMAND
94106 root 1 49 20 5951M 1184M biowr 3 5:04 17.60% php-cgi
72899 root 1 48 20 5951M 995M biowr 1 12:53 17.32% php-cgi
13071 root 1 52 20 5952M 97160K biowr 2 0:05 15.49% php-cgi
16399 root 1 48 20 5952M 1277M biowr 3 3:28 15.23% php-cgi
37062 root 1 52 20 5952M 547M biowr 2 0:48 14.62% php-cgi
98370 root 1 52 20 5952M 1017M biowr 3 1:27 14.00% php-cgi
58938 root 1 49 20 5952M 974M bo_wwa 1 2:08 13.68% php-cgi
53435 root 1 52 20 5951M 897M biowr 2 7:21 13.12% php-cgi
13920 root 1 49 20 5951M 995M bo_wwa 3 10:18 12.89% php-cgiwhile top -aSH shows /usr/local/bin/php-cgi -f /usr/local/pkg/suricata/suricata_check_cron_misc.inc
37062 root 41 20 5952M 620M biowr 2 0:54 1.40% /usr/local/bin/php-cgi -f /usr/local/pkg/suricata/suricata_check_cron_misc.inc
94106 root 41 20 5951M 1179M biowr 1 5:11 1.31% /usr/local/bin/php-cgi -f /usr/local/pkg/suricata/suricata_check_cron_misc.inc
13920 root 41 20 5951M 947M biowr 2 10:25 1.30% /usr/local/bin/php-cgi -f /usr/local/pkg/suricata/suricata_check_cron_misc.inc
13071 root 41 20 5952M 168M biowr 2 0:12 1.26% /usr/local/bin/php-cgi -f /usr/local/pkg/suricata/suricata_check_cron_misc.inc
72899 root 40 20 5951M 947M biowr 0 13:00 1.25% /usr/local/bin/php-cgi -f /usr/local/pkg/suricata/suricata_check_cron_misc.inc
53435 root 40 20 5951M 892M biowr 2 7:27 1.25% /usr/local/bin/php-cgi -f /usr/local/pkg/suricata/suricata_check_cron_misc.inc
16399 root 41 20 5952M 1270M biowr 2 3:34 1.22% /usr/local/bin/php-cgi -f /usr/local/pkg/suricata/suricata_check_cron_misc.inc
58938 root 40 20 5952M 1007M biowr 0 2:14 1.21% /usr/local/bin/php-cgi -f /usr/local/pkg/suricata/suricata_check_cron_misc.inc
98370 root 41 20 5952M 1054M bo_wwa 1 1:33 1.19% /usr/local/bin/php-cgi -f /usr/local/pkg/suricata/suricata_check_cron_misc.incpfSense gives me a ton of these errors:
PHP ERROR: Type: 1, File: /usr/local/pkg/suricata/suricata_check_cron_misc.inc, Line: 165, Message: Maximum execution time of 900 seconds exceeded @ 2018-11-06 13:08:13Any idea on how to fix it? I suppose I could uninstall and reinstall the package but I'd like to know what's going on so I have a better understanding. I have many other units out using the same config on the same hardware without any issues. Thanks!
-
That process is associated with rotation and pruning of log files using the limits and retention periods specified on the LOG MGMT tab. Looks like that process is hanging up and never terminating, and so after the designated interval expires, cron fires up another copy. I think the cron interval is 5 minutes, so that means every 5 minutes it's going to launch another copy of the PHP file.
You first need to kill all of those processes manually. Then look in /var/log/suricata and all the sub-directories there to see what's going on. You can try manually removing and/or rotating some of the log files.
Removing and reinstalling the package is not a bad idea either, but I suggest deleting all the log files before installing again. I'm thinking something is amiss with one or more of those files and that's why the pruning PHP process is hanging up.
-
The only issue I found in the log files was a bunch of errors stating that about 30 of the IDs in my suppress file weren't valid. I uninstalled Suricata, updated pfSense to 2.4.4, and reinstalled. That issue isn't reoccurring but the WAN IP is getting listed in the Blocks list. If I go to the WAN interface I can see the IP in the View PASS LIST area so I'm not sure why. It's a static IP so it isn't changing.
-
@stewart said in Multiple php-cgi's using all CPU:
The only issue I found in the log files was a bunch of errors stating that about 30 of the IDs in my suppress file weren't valid. I uninstalled Suricata, updated pfSense to 2.4.4, and reinstalled. That issue isn't reoccurring but the WAN IP is getting listed in the Blocks list. If I go to the WAN interface I can see the IP in the View PASS LIST area so I'm not sure why. It's a static IP so it isn't changing.
Do you mean the WAN IP is in the actual BLOCKED IPs list, or did you mean to say it's listed on the ALERTS tab. It would be expected to see it on the ALERTS tab, but it would not actually be blocked if it is on the Pass List. Anything you see in the BLOCKED IPs list on the BLOCKED tab should acutally be getting blocked, so if your WAN IP is listed there I would expect all Internet access to be blocked. Is that what you are experiencing?
-
It was actually in the blocked list. Some oddities about that. We use LabTech to remotely monitor the computers and could see them all going offline as the traffic was blocked. We do have redirects and whitelists for our network IP range x.x.x.34-x.x.x.62 (remote to the pfSense router) which also includes the LabTech server they were connecting to and we were still able to get into the GUI and SSH so we were able to clear the block. Very bizarre.
-
@stewart said in Multiple php-cgi's using all CPU:
It was actually in the blocked list. Some oddities about that. We use LabTech to remotely monitor the computers and could see them all going offline as the traffic was blocked. We do have redirects and whitelists for our network IP range x.x.x.34-x.x.x.62 (remote to the pfSense router) which also includes the LabTech server they were connecting to and we were still able to get into the GUI and SSH so we were able to clear the block. Very bizarre.
So you have a completely separate method via a different WAN IP some place that let's you interface with the LAN behind the pfSense box? I'm not familiar with LabTech software itself, but a quick look at their web site and some YouTube videos indicates it's just another in the product family of remote management agents (not too much different from the old Microsoft Systems Management Server). So how do you remotely connect with the LabTech clients if not through the WAN interface on the pfSense box?
I'm trying to understand how you have remote access into the LAN environment if the pfSense WAN IP is actually being blocked.
-
First, a quick rundown:
LabTech, Kaseya, Solarwinds (to an extent) etc. are all RMM tools for MSPs to connect to and remotely manage clients. Inside of the software we can see all of the companies that we service and inside each company group we can see all of the PCs and other devices. Inside of the RMM software we can integrate patching, AV, backups, etc. In short, we have a single portal that we can see the health and status of a specific company or all companies as an aggregate. For example, we can go to the AV dashboard and see the AV status of all the machines across all of our clients for us to know who is getting updates, infections, etc. Or we can go to the patching page and see the patching state of all the machines for a particular client, or all as a whole. If there's any issue we can issue remote commands, connect to a virtual Command Prompt, modify services and processes, edit the registry, issue lockdowns, all without disturbing the user. (These also bypass the normal lockdowns imposed by infections.) If we need to take over the computer, we can immediately connect and do so.How it functions:
All of those PCs and servers connect back to a LabTech server to maintain communication. That LabTech server has it's own Public IP. In our case x.x.x.46. The network I sit in is x.x.x.34. Same subnet but different IP. The pfSense router is at a client's location. Behind it are all of their PCs. I have a firewall rule that allows our Public subnet to connect to the GUI of the pfSense router by going to the WAN IP over an off port that redirects to the internal IP. That way we can manage it without needing to utilize any other services and only if we connect from inside of our network.What I saw:
I noticed that all of their PCs went offline to our LabTech and ScreenConnect servers once Suricata was brought online. I was able to connect by going directly to their public IP in my browser on the off port. From there I was able to see the block list. None of my IPs were in there, but their Public IP was. I don't know if the way we redirect into the system on an off port into the LAN IP bypassed the Suricata check or what but the WAN IP was listed in Suricata and it was blocking traffic, just not my GUI connection. I was able to clear the block and so far it hasn't come back but it was very odd. I've never seen Suricata place an Interface IP on the block list and I'm not sure how I was able to still connect via the GUI while Suricata was blocking traffic like that. -
@stewart said in Multiple php-cgi's using all CPU:
What I saw:
I noticed that all of their PCs went offline to our LabTech and ScreenConnect servers once Suricata was brought online. I was able to connect by going directly to their public IP in my browser on the off port. From there I was able to see the block list. None of my IPs were in there, but their Public IP was. I don't know if the way we redirect into the system on an off port into the LAN IP bypassed the Suricata check or what but the WAN IP was listed in Suricata and it was blocking traffic, just not my GUI connection. I was able to clear the block and so far it hasn't come back but it was very odd. I've never seen Suricata place an Interface IP on the block list and I'm not sure how I was able to still connect via the GUI while Suricata was blocking traffic like that.Thanks for the explanation up above for how the LabTech setup works. I understand the concept, but I was not familiar with that particular software package.
I, too, am puzzled how you could connect to the clients remotely if the WAN IP was supposedly blocked. That would indicate it obviously was not blocked even though it was listed in the Block List. Very strange! It might have been some kind of race condition that led to that. The Suricata binary, upon startup, will query FreeBSD asking for all of the firewall interface IP addresses. It will automatically add those to an internal automatic pass list for firewall interfaces. The GUI package, though, works a bit differently but very similar in concept. When you click the button to display a Pass List, the GUI code builds the list by querying pfSense using system calls (to pfSense APIs). You would think the two should always yield the same results, but perhaps for some reason at startup it failed to work that way on that firewall. The fact it appears to have been a one-off event makes it difficult to troubleshoot. You can find the list of IP addresses auto-added to the internal interface Pass List by examining the suricata.log file for the WAN interface. You can access it via the LOGS VIEW tab. Choose the interface whose log you want to examine, and then choose which log file in the drop-down choices. See if Suricata did in fact add the correct WAN interface static IP to its internal list. This list is actually generated by a thread that spawns to continually watch the firewall interfaces for IP changes. When it senses a changed local interface IP, it updates that internal pass list accordingly. Those updates, if any occurred, will also be logged in the suricata.log file.
Post back here if it recurs.
-
No problem. Always helps to know all the aspects when figuring it out. I'll let you know if the WAN gets listed again. Over the past 2 years or so we now have 30 units in production. We review logs daily and log into each one at least once a quarter (usually monthly) to ensure everything is fine. So far I've found 6 (out of 15 so far) having the same issue of having /usr/local/bin/php-cgi -f /usr/local/pkg/suricata/suricata_check_cron_misc.inc with high CPU. I'll try these fixes on them as well but it's very unusual. I've not had any issues and now I'm seeing it on many units. The affected ones are all 2.4.3 or 2.4.4. Don't know if that has any bearing. I'll update as I find more.
-
I've got another instance of Suricata blocking its own Public IP on the WAN interface at a different client. This one is a dynamic IP, not static. At 9:01am the System Logs show/rc.start_packages restarting everything. I don't know if that is normal. At roughly the same time, Suricata blocked the WAN IP under "ET DROP Dshield Block Listed Source group 1 - 11/21/2018-08:56:19". Traffic would not pass and I couldn't ping the WAN IP from my remote location. I was, however, still able to go to the GUI through our remote forward. This is the second time on 2 units I've faced this issue and the results have been identical. Is there something else I can/should check to figure this out? At clients with Static IPs I could add them to the pass list. Not so with Dynamic ones.
-
@stewart said in Multiple php-cgi's using all CPU:
I've got another instance of Suricata blocking its own Public IP on the WAN interface at a different client. This one is a dynamic IP, not static. At 9:01am the System Logs show/rc.start_packages restarting everything. I don't know if that is normal. At roughly the same time, Suricata blocked the WAN IP under "ET DROP Dshield Block Listed Source group 1 - 11/21/2018-08:56:19". Traffic would not pass and I couldn't ping the WAN IP from my remote location. I was, however, still able to go to the GUI through our remote forward. This is the second time on 2 units I've faced this issue and the results have been identical. Is there something else I can/should check to figure this out? At clients with Static IPs I could add them to the pass list. Not so with Dynamic ones.
Well, one of my first questions would be why is your public WAN IP on an Emerging Threats Dshield Block list of malicious IP addresses? That would give me pause about my choice of ISP.
However, that notwithstanding, I added code in the recent past that lets Suricata monitor all the firewall interfaces for IP address changes. It is supposed to automatically update its internal automatic interface pass list IP address (in your case, the WAN). Maybe there is a race condition where a packet gets through before the monitoring thread has updated the internal table. At any rate, look in the suricata.log file for the WAN interface (it can be found on the LOGS VIEW tab by selecting the suricata.log file in the drop-down). You should see some lines in there showing each firewall interface IP address that the monitoring thread added to its internal table. You should also see a line where it updated one or more of those addresses when your WAN dynamic IP changed. Now, if you have restarted Suricata (or if pfSense has done it automatically via the restart_all_packages script) since this blocking happened, then the data I'm looking for will have been wiped.
-
That log file goes back to 13/7/2017 so it might have what you need. It also has IPs and stuff in there (as well as some other things I guess I should look at). Do you want me to send you the file?
-
@stewart said in Multiple php-cgi's using all CPU:
That log file goes back to 13/7/2017 so it might have what you need. It also has IPs and stuff in there (as well as some other things I guess I should look at). Do you want me to send you the file?
No, I don't really need the file. Just look through it for entries around the time of the dynamic IP change on your WAN. I would expect to see a line in there saying something like "IP address change detected on WAN". The exact wording may be different, but it will be obvious when you see the line it's the one I'm talking about. Along with that, at the last startup of Suricata, you should also see entries in there saying "IP address xxxx added to internal pass list". Again, the exact wording will be different, but the lines I'm talking about will be obvious. These are all log entries made by that interface IP change monitoring thread I mentioned earlier. I'm just wanting to see if it was running and if it saw the WAN IP change. If it did see it, it should have created a log entry about it.
-
There were 28 of these exchanges from 8:36 to 8:56 for a total of 56 entries. At one point we saw it give us a private IP:
21/11/2018 -- 08:50:32 - <Info> -- alert-pf -> added address 108.189.X.X to automatic interface IP Pass List. 21/11/2018 -- 08:50:49 - <Info> -- alert-pf -> Received notification of IP address change on interface igb1. 21/11/2018 -- 08:50:49 - <Info> -- alert-pf -> deleted address 108.189.X.X from automatic interface IP Pass List. 21/11/2018 -- 08:52:50 - <Info> -- alert-pf -> Received notification of IP address change on interface igb1. 21/11/2018 -- 08:52:50 - <Info> -- alert-pf -> added address 192.168.100.20 to automatic interface IP Pass List. 21/11/2018 -- 08:52:54 - <Info> -- alert-pf -> Received notification of IP address change on interface igb1. 21/11/2018 -- 08:52:54 - <Info> -- alert-pf -> deleted address 192.168.100.20 from automatic interface IP Pass List. 21/11/2018 -- 08:53:49 - <Info> -- alert-pf -> Received notification of IP address change on interface igb1. 21/11/2018 -- 08:53:49 - <Info> -- alert-pf -> added address 108.189.X.X to automatic interface IP Pass List. 21/11/2018 -- 08:54:12 - <Info> -- alert-pf -> Received notification of IP address change on interface igb1. 21/11/2018 -- 08:54:12 - <Info> -- alert-pf -> deleted address 108.189.X.X from automatic interface IP Pass List.
-
@stewart said in Multiple php-cgi's using all CPU:
There were 28 of these exchanges from 8:36 to 8:56 for a total of 56 entries. At one point we saw it give us a private IP:
21/11/2018 -- 08:50:32 - <Info> -- alert-pf -> added address 108.189.X.X to automatic interface IP Pass List. 21/11/2018 -- 08:50:49 - <Info> -- alert-pf -> Received notification of IP address change on interface igb1. 21/11/2018 -- 08:50:49 - <Info> -- alert-pf -> deleted address 108.189.X.X from automatic interface IP Pass List. 21/11/2018 -- 08:52:50 - <Info> -- alert-pf -> Received notification of IP address change on interface igb1. 21/11/2018 -- 08:52:50 - <Info> -- alert-pf -> added address 192.168.100.20 to automatic interface IP Pass List. 21/11/2018 -- 08:52:54 - <Info> -- alert-pf -> Received notification of IP address change on interface igb1. 21/11/2018 -- 08:52:54 - <Info> -- alert-pf -> deleted address 192.168.100.20 from automatic interface IP Pass List. 21/11/2018 -- 08:53:49 - <Info> -- alert-pf -> Received notification of IP address change on interface igb1. 21/11/2018 -- 08:53:49 - <Info> -- alert-pf -> added address 108.189.X.X to automatic interface IP Pass List. 21/11/2018 -- 08:54:12 - <Info> -- alert-pf -> Received notification of IP address change on interface igb1. 21/11/2018 -- 08:54:12 - <Info> -- alert-pf -> deleted address 108.189.X.X from automatic interface IP Pass List.
The key is seeing if your "blocked" WAN IP address ever showed up in any of those messages. Is that 108.189.x.x IP the WAN that got blocked? The log entry you posted up earlier indicated the block happened at 08:56:19, but the latest entry I see in the suricata.log data you posted in 08:54:12.
What I would expect is that as your WAN interface flapped up and down and received different IP addresses, the log entries you pulled from the suricata.log file should have followed the changes. What this thread is doing is looking for dynamic IP address changes and updating an internal table in the blocking plugin that holds the list of IP addresses to never block. It does this by subscribing to FreeBSD kernel routing messages. So if this thread works properly your current WAN IP should always be in that table and thus never blocked. I'm trying to see why it got blocked anyway. That's why I wanted to see the entries from suricata.log that match up with the 08:56:19 blocking event.