php find_rule_by_number function super slow due to exec calls, why?
-
pfSense 24.03. I made my own copy of /usr/local/bin/filterparser.php to use in a mail reporting script, and then modified it so it would add the human label for a firewall rule to the end of the output, eg:
Jul 16 10:25:45 block mvneta0.4091 TCP:SEC 192.168.1.5:49706 157.240.245.8:443 "Facebook v4 Outbound"
In my mod to filterparser.php, I used a call to find_rule_by_number(), from /etc/inc/syslog.inc to get the human label string. This php function is super slow due to the exec() of pfctl within. I tried using buffer_rules_load() and find_rule_by_number_buffer() from syslog.inc to speed things up, but could not get them to give me the human-readable string. (Advice please?)
Then I noticed that /tmp/rules.debug has the info I need: the tracker integer for the rule number, plus the label/USER_RULE string.
Even inefficient coding of : open("tmp/rules.debug, "r"), read the file until you find the tracker number, save that line for further parsing, close file within the larger while loop that reads syslog lines is an order of magnitude faster than using find_rule_by_number.
Why oh why does find_rule_by_number do such an expensive exec() operation when the data is already there in /tmp? How much is this POS function used in actual pfSense code?
-
-
Some more testing here: The basic test script is:
/usr/bin/time /bin/cat /var/log/filter.log.6 | /root/myfilterparser[version].php > /dev/null
where filter.log.6 has 3442 lines of syslog output.
Versions of myfilterparser used here (code files attached, suffix changed to txt)
myfilterparser.php myfilterparser.txt : used find_rule_by_number() calls, which execs pfctl for each syslog line.
myfilterparser2.php myfilterparser2.txt : stupid method of reading /tmp/rules.debug. Open the file, read thru it to find tracker number and human-readable text, close file. Do this for every syslog line.
myfilterparser3.php myfilterparser3.txt : smart method of reading /tmp/rules.debug via the file() command once at the beginning. Parse this data in memory for each syslog line to get the human-readable text.Outputs from the time command:
myfilterparser.php: 1751.5 seconds real time (29+ minutes!!) It also pounded on my 1100's cpu and memory the whole time. "top -C" showed a lot of pfctl processes using boatloads of cpu.
myfilterparser2.php: 4.9 second real time.
myfilterparser3.php: 3.2 seconds real time.This experiment tells me that execing pfctl from within pfSense php code is a supremely stupid thing to do, if you can get the same information from a pre-existing file like /tmp/rules.debug.
-
A final harrumph from an old-time C programmer on UNIX systems... I quote from "Advanced UNIX Programming" by Marc J. Rochkind, first edition 1985, page 112: "The cost of a fork [system call in C] is enormous". I suppose this book dates me, but it was a classic in its day.
Modern PHP code is doing the C sequence of fork/exec/wait for the UNIX command under the covers (maybe with the modern advantages of COW memory management), so an "exec" in PHP is a really expensive operation. So minimize exec calls, especially in loops. If you can get the same data by accessing memory or reading a file someplace, then do it.