Multiple php-cgi's using all CPU



  • I have an APU2C4 with Suricata kinda running out of control. Running top shows

    Message from syslogd@GateKeeper at Nov 9 16:41:25 ... C TIME WCPU COMMAND
    94106 root 1 49 20 5951M 1184M biowr 3 5:04 17.60% php-cgi
    72899 root 1 48 20 5951M 995M biowr 1 12:53 17.32% php-cgi
    13071 root 1 52 20 5952M 97160K biowr 2 0:05 15.49% php-cgi
    16399 root 1 48 20 5952M 1277M biowr 3 3:28 15.23% php-cgi
    37062 root 1 52 20 5952M 547M biowr 2 0:48 14.62% php-cgi
    98370 root 1 52 20 5952M 1017M biowr 3 1:27 14.00% php-cgi
    58938 root 1 49 20 5952M 974M bo_wwa 1 2:08 13.68% php-cgi
    53435 root 1 52 20 5951M 897M biowr 2 7:21 13.12% php-cgi
    13920 root 1 49 20 5951M 995M bo_wwa 3 10:18 12.89% php-cgi

    while top -aSH shows /usr/local/bin/php-cgi -f /usr/local/pkg/suricata/suricata_check_cron_misc.inc

    37062 root 41 20 5952M 620M biowr 2 0:54 1.40% /usr/local/bin/php-cgi -f /usr/local/pkg/suricata/suricata_check_cron_misc.inc
    94106 root 41 20 5951M 1179M biowr 1 5:11 1.31% /usr/local/bin/php-cgi -f /usr/local/pkg/suricata/suricata_check_cron_misc.inc
    13920 root 41 20 5951M 947M biowr 2 10:25 1.30% /usr/local/bin/php-cgi -f /usr/local/pkg/suricata/suricata_check_cron_misc.inc
    13071 root 41 20 5952M 168M biowr 2 0:12 1.26% /usr/local/bin/php-cgi -f /usr/local/pkg/suricata/suricata_check_cron_misc.inc
    72899 root 40 20 5951M 947M biowr 0 13:00 1.25% /usr/local/bin/php-cgi -f /usr/local/pkg/suricata/suricata_check_cron_misc.inc
    53435 root 40 20 5951M 892M biowr 2 7:27 1.25% /usr/local/bin/php-cgi -f /usr/local/pkg/suricata/suricata_check_cron_misc.inc
    16399 root 41 20 5952M 1270M biowr 2 3:34 1.22% /usr/local/bin/php-cgi -f /usr/local/pkg/suricata/suricata_check_cron_misc.inc
    58938 root 40 20 5952M 1007M biowr 0 2:14 1.21% /usr/local/bin/php-cgi -f /usr/local/pkg/suricata/suricata_check_cron_misc.inc
    98370 root 41 20 5952M 1054M bo_wwa 1 1:33 1.19% /usr/local/bin/php-cgi -f /usr/local/pkg/suricata/suricata_check_cron_misc.inc

    pfSense gives me a ton of these errors:
    PHP ERROR: Type: 1, File: /usr/local/pkg/suricata/suricata_check_cron_misc.inc, Line: 165, Message: Maximum execution time of 900 seconds exceeded @ 2018-11-06 13:08:13

    Any idea on how to fix it? I suppose I could uninstall and reinstall the package but I'd like to know what's going on so I have a better understanding. I have many other units out using the same config on the same hardware without any issues. Thanks!



  • That process is associated with rotation and pruning of log files using the limits and retention periods specified on the LOG MGMT tab. Looks like that process is hanging up and never terminating, and so after the designated interval expires, cron fires up another copy. I think the cron interval is 5 minutes, so that means every 5 minutes it's going to launch another copy of the PHP file.

    You first need to kill all of those processes manually. Then look in /var/log/suricata and all the sub-directories there to see what's going on. You can try manually removing and/or rotating some of the log files.

    Removing and reinstalling the package is not a bad idea either, but I suggest deleting all the log files before installing again. I'm thinking something is amiss with one or more of those files and that's why the pruning PHP process is hanging up.



  • The only issue I found in the log files was a bunch of errors stating that about 30 of the IDs in my suppress file weren't valid. I uninstalled Suricata, updated pfSense to 2.4.4, and reinstalled. That issue isn't reoccurring but the WAN IP is getting listed in the Blocks list. If I go to the WAN interface I can see the IP in the View PASS LIST area so I'm not sure why. It's a static IP so it isn't changing.



  • @stewart said in Multiple php-cgi's using all CPU:

    The only issue I found in the log files was a bunch of errors stating that about 30 of the IDs in my suppress file weren't valid. I uninstalled Suricata, updated pfSense to 2.4.4, and reinstalled. That issue isn't reoccurring but the WAN IP is getting listed in the Blocks list. If I go to the WAN interface I can see the IP in the View PASS LIST area so I'm not sure why. It's a static IP so it isn't changing.

    Do you mean the WAN IP is in the actual BLOCKED IPs list, or did you mean to say it's listed on the ALERTS tab. It would be expected to see it on the ALERTS tab, but it would not actually be blocked if it is on the Pass List. Anything you see in the BLOCKED IPs list on the BLOCKED tab should acutally be getting blocked, so if your WAN IP is listed there I would expect all Internet access to be blocked. Is that what you are experiencing?



  • It was actually in the blocked list. Some oddities about that. We use LabTech to remotely monitor the computers and could see them all going offline as the traffic was blocked. We do have redirects and whitelists for our network IP range x.x.x.34-x.x.x.62 (remote to the pfSense router) which also includes the LabTech server they were connecting to and we were still able to get into the GUI and SSH so we were able to clear the block. Very bizarre.



  • @stewart said in Multiple php-cgi's using all CPU:

    It was actually in the blocked list. Some oddities about that. We use LabTech to remotely monitor the computers and could see them all going offline as the traffic was blocked. We do have redirects and whitelists for our network IP range x.x.x.34-x.x.x.62 (remote to the pfSense router) which also includes the LabTech server they were connecting to and we were still able to get into the GUI and SSH so we were able to clear the block. Very bizarre.

    So you have a completely separate method via a different WAN IP some place that let's you interface with the LAN behind the pfSense box? I'm not familiar with LabTech software itself, but a quick look at their web site and some YouTube videos indicates it's just another in the product family of remote management agents (not too much different from the old Microsoft Systems Management Server). So how do you remotely connect with the LabTech clients if not through the WAN interface on the pfSense box?

    I'm trying to understand how you have remote access into the LAN environment if the pfSense WAN IP is actually being blocked.



  • First, a quick rundown:
    LabTech, Kaseya, Solarwinds (to an extent) etc. are all RMM tools for MSPs to connect to and remotely manage clients. Inside of the software we can see all of the companies that we service and inside each company group we can see all of the PCs and other devices. Inside of the RMM software we can integrate patching, AV, backups, etc. In short, we have a single portal that we can see the health and status of a specific company or all companies as an aggregate. For example, we can go to the AV dashboard and see the AV status of all the machines across all of our clients for us to know who is getting updates, infections, etc. Or we can go to the patching page and see the patching state of all the machines for a particular client, or all as a whole. If there's any issue we can issue remote commands, connect to a virtual Command Prompt, modify services and processes, edit the registry, issue lockdowns, all without disturbing the user. (These also bypass the normal lockdowns imposed by infections.) If we need to take over the computer, we can immediately connect and do so.

    How it functions:
    All of those PCs and servers connect back to a LabTech server to maintain communication. That LabTech server has it's own Public IP. In our case x.x.x.46. The network I sit in is x.x.x.34. Same subnet but different IP. The pfSense router is at a client's location. Behind it are all of their PCs. I have a firewall rule that allows our Public subnet to connect to the GUI of the pfSense router by going to the WAN IP over an off port that redirects to the internal IP. That way we can manage it without needing to utilize any other services and only if we connect from inside of our network.

    What I saw:
    I noticed that all of their PCs went offline to our LabTech and ScreenConnect servers once Suricata was brought online. I was able to connect by going directly to their public IP in my browser on the off port. From there I was able to see the block list. None of my IPs were in there, but their Public IP was. I don't know if the way we redirect into the system on an off port into the LAN IP bypassed the Suricata check or what but the WAN IP was listed in Suricata and it was blocking traffic, just not my GUI connection. I was able to clear the block and so far it hasn't come back but it was very odd. I've never seen Suricata place an Interface IP on the block list and I'm not sure how I was able to still connect via the GUI while Suricata was blocking traffic like that.



  • @stewart said in Multiple php-cgi's using all CPU:

    What I saw:
    I noticed that all of their PCs went offline to our LabTech and ScreenConnect servers once Suricata was brought online. I was able to connect by going directly to their public IP in my browser on the off port. From there I was able to see the block list. None of my IPs were in there, but their Public IP was. I don't know if the way we redirect into the system on an off port into the LAN IP bypassed the Suricata check or what but the WAN IP was listed in Suricata and it was blocking traffic, just not my GUI connection. I was able to clear the block and so far it hasn't come back but it was very odd. I've never seen Suricata place an Interface IP on the block list and I'm not sure how I was able to still connect via the GUI while Suricata was blocking traffic like that.

    Thanks for the explanation up above for how the LabTech setup works. I understand the concept, but I was not familiar with that particular software package.

    I, too, am puzzled how you could connect to the clients remotely if the WAN IP was supposedly blocked. That would indicate it obviously was not blocked even though it was listed in the Block List. Very strange! It might have been some kind of race condition that led to that. The Suricata binary, upon startup, will query FreeBSD asking for all of the firewall interface IP addresses. It will automatically add those to an internal automatic pass list for firewall interfaces. The GUI package, though, works a bit differently but very similar in concept. When you click the button to display a Pass List, the GUI code builds the list by querying pfSense using system calls (to pfSense APIs). You would think the two should always yield the same results, but perhaps for some reason at startup it failed to work that way on that firewall. The fact it appears to have been a one-off event makes it difficult to troubleshoot. You can find the list of IP addresses auto-added to the internal interface Pass List by examining the suricata.log file for the WAN interface. You can access it via the LOGS VIEW tab. Choose the interface whose log you want to examine, and then choose which log file in the drop-down choices. See if Suricata did in fact add the correct WAN interface static IP to its internal list. This list is actually generated by a thread that spawns to continually watch the firewall interfaces for IP changes. When it senses a changed local interface IP, it updates that internal pass list accordingly. Those updates, if any occurred, will also be logged in the suricata.log file.

    Post back here if it recurs.