[SOLVED] egrep used lots of CPU, what did it grep?
-
I found that the egrep command uses a lot of CPU, and it runs once a minute.
How can I stop it? I used lsof to trace the process, but I cannot find what it is grepping.
COMMAND   PID USER FD  TYPE DEVICE                SIZE/OFF  NODE NAME
grep    64277 root txt VREG 1591211450,3983924771    29368 40713 /usr/bin/grep (pfSense/ROOT/default)
grep    64277 root cwd VDIR 1591211450,3983924771      262  5456 /usr/local/www (pfSense/ROOT/default)
grep    64277 root rtd VDIR 1591211450,3983924771       28    34 / (pfSense/ROOT/default)
grep    64277 root 0u  PIPE 0xfffffe00c7c322d8       16384       ->0xfffffe00c7c32430
grep    64277 root 1u  PIPE 0xfffffe00c7c21cb8        4096       ->0xfffffe00c7c21b60
grep    64277 root 2w  VCHR 0,30                       0t0    30 /dev/null (devfs)
grep    64277 root 8r  VREG 1591211450,3983924771    14843 33500 /etc/rc.bootup (pfSense/ROOT/default)

COMMAND   PID USER FD  TYPE DEVICE                SIZE/OFF  NODE NAME
egrep   55520 root txt VREG 1591211450,3983924771    29368 40713 /usr/bin/grep (pfSense/ROOT/default)
egrep   55520 root cwd VDIR 1591211450,3983924771      262  5456 /usr/local/www (pfSense/ROOT/default)
egrep   55520 root rtd VDIR 1591211450,3983924771       28    34 / (pfSense/ROOT/default)
egrep   55520 root 0u  PIPE 0xfffffe00caf522d8       65536       ->0xfffffe00caf52430, cnt=32768, out=32768
egrep   55520 root 1u  PIPE 0xfffffe00c7c20540        4096       ->0xfffffe00c7c203e8
egrep   55520 root 2w  VCHR 0,30                       0t0    30 /dev/null (devfs)
egrep   55520 root 8r  VREG 1591211450,3983924771    14843 33500 /etc/rc.bootup (pfSense/ROOT/default)
-
Try running
ps -auxwwd
to see what is calling it.
-
@stephenw10 said in egrep used lots of CPU, what did it grep?:
ps -auxwwd
[root@GW /var/db/rrd]# grep sleep updaterrd.sh
sleep 60
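For anyone retracing this step: the parent chain that ps -auxwwd shows in descendancy order can also be printed for a single PID. The following is only a sketch. The function name ancestry is made up for illustration; it reads the output of ps -axo pid,ppid,command on standard input and walks from the given PID up through its parents.

```shell
# Print the chain of parents for one PID.
# Input (stdin): output of `ps -axo pid,ppid,command`, header line included.
# Argument:      the PID to start from.
ancestry() {
    awk -v target="$1" '
        NR > 1 {
            ppid[$1] = $2
            # Rebuild the command column from fields 3..NF.
            cmd = $3
            for (i = 4; i <= NF; i++) cmd = cmd " " $i
            name[$1] = cmd
        }
        END {
            pid = target
            # Stop at an unknown parent, or if a PID repeats.
            while ((pid in ppid) && !(pid in seen)) {
                seen[pid] = 1
                print pid ": " name[pid]
                pid = ppid[pid]
            }
        }'
}
```

Usage, with the egrep PID from the lsof output above: ps -axo pid,ppid,command | ancestry 55520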
I found that the pfctl -ss and grep commands cost a lot of CPU when reading the state table on a busy server. I commented out line 47 in /var/db/rrd/updaterrd.sh, emptied /tmp/pfctl_ss_out, and ran updaterrd.sh again; that freed up a lot of CPU.
[23.05-RELEASE][root@GW.Tel]/root: /sbin/pfctl -ss|wc -l
220823

#pfctl_ss_out="` /sbin/pfctl -ss > /tmp/pfctl_ss_out`"
pfrate="` cat /tmp/pfctl_si_out | egrep "inserts|removals" | awk '{ pfrate = $3 + pfrate } {print pfrate}'|tail -1 `"
pfstates="` cat /tmp/pfctl_ss_out | egrep -v '\(([0-9a-f:.]|[|])+\) (\->|<\-)' | wc -l|sed 's/ //g'`"
pfnat="` cat /tmp/pfctl_ss_out | egrep '\(([0-9a-f:.]|[|])+\) (\->|<\-)' | wc -l|sed 's/ //g' `"
srcip="` cat /tmp/pfctl_ss_out | egrep -v '\(([0-9a-f:.]|[|])+\) (\->|<\-)' | grep '\->' | awk '{print $3}' | awk -F: '{print $1}' | sort -u|wc -l|sed 's/ //g' `"
dstip="` cat /tmp/pfctl_ss_out | egrep -v '\(([0-9a-f:.]|[|])+\) (\->|<\-)' | grep '<\-' | awk '{print $3}' | awk -F: '{print $1}' | sort -u|wc -l|sed 's/ //g' `"
/usr/bin/nice -n20 /usr/local/bin/rrdtool update /var/db/rrd/system-states.rrd N:$pfrate:$pfstates:$pfnat:$srcip:$dstip
I found that it only affects the state data shown in status_monitoring.php, which does not matter to me.
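One possible way to keep the graph data while cutting the cost is to scan the state dump once instead of four times. The sketch below is not the pfSense code and has not been tested on a real firewall; the function name summarize_states is made up. It applies the same NAT pattern as the egrep calls above in a single awk pass and prints pfstates:pfnat:srcip:dstip. Like the original pipeline, it keeps only the part of the address before the first colon, so IPv6 addresses are not counted accurately.

```shell
# Summarize `pfctl -ss` output (stdin or file arguments) in a single pass.
# Prints pfstates:pfnat:srcip:dstip, mirroring the four values in the thread.
summarize_states() {
    awk '
        # NAT states carry a parenthesized address right before the arrow.
        /\(([0-9a-f:.]|[|])+\) (->|<-)/ { nat++; next }
        # Everything else counts as a plain state.
        { states++ }
        # Field 3 is the first address; keep the part before the first colon.
        /->/ { split($3, a, ":"); if (!(a[1] in src)) { src[a[1]] = 1; nsrc++ } }
        /<-/ { split($3, a, ":"); if (!(a[1] in dst)) { dst[a[1]] = 1; ndst++ } }
        END { printf "%d:%d:%d:%d\n", states, nat, nsrc, ndst }
    ' "$@"
}
```

Usage: /sbin/pfctl -ss | summarize_states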
-
Hmm, 125K states is not that much.
-
@stephenw10 It will probably double next month :)
The CPU is weak.
CPU: Intel(R) Atom(TM) Processor E3940 @ 1.60GHz (1593.60-MHz K8-class CPU)
-
Hmm, not a powerful CPU to be sure. Still surprised that it would have a problem with 125K states...
-
@stephenw10 I took a look at the CPU usage: system is 15%, interrupt is 14%, and nice is 5%.
I guess the system time comes from pf doing NAT, the interrupt time comes from the network interfaces receiving each packet, and the nice time comes from rrdtool.
I can only reduce the nice time, by commenting out everything in the updaterrd.sh script except wan-traffic.rrd, but I cannot find a way to reduce the system and interrupt time. Do you have any suggestions? Thanks.
[23.05-RELEASE][root@PF.SH]/var/db/rrd: grep nice *
Binary file system-processor.rrd matches
updaterrd.sh:/usr/bin/nice -n20 /usr/local/bin/rrdtool update /var/db/rrd/wan-traffic.rrd N:`/sbin/pfctl -vvsI -i igb0 | awk '\
updaterrd.sh:/usr/bin/nice -n20 /usr/local/bin/rrdtool update /var/db/rrd/wan-packets.rrd N:`/sbin/pfctl -vvsI -i igb0 | awk '\
updaterrd.sh:/usr/bin/nice -n20 /usr/local/bin/rrdtool update /var/db/rrd/lan-traffic.rrd N:`/sbin/pfctl -vvsI -i igb1 | awk '\
updaterrd.sh:/usr/bin/nice -n20 /usr/local/bin/rrdtool update /var/db/rrd/lan-packets.rrd N:`/sbin/pfctl -vvsI -i igb1 | awk '\
updaterrd.sh:/usr/bin/nice -n20 /usr/local/bin/rrdtool update /var/db/rrd/ipsec-traffic.rrd N:`/sbin/pfctl -vvsI -i enc0 | awk '\
updaterrd.sh:/usr/bin/nice -n20 /usr/local/bin/rrdtool update /var/db/rrd/ipsec-packets.rrd N:`/sbin/pfctl -vvsI -i enc0 | awk '\
updaterrd.sh:/usr/bin/nice -n20 /usr/local/bin/rrdtool update /var/db/rrd/system-states.rrd N:$pfrate:$pfstates:$pfnat:$srcip:$dstip
updaterrd.sh:/usr/bin/nice -n20 /usr/local/bin/rrdtool update /var/db/rrd/system-processor.rrd N:${CPU}:${PROCS}
updaterrd.sh:/usr/bin/nice -n20 /usr/local/bin/rrdtool update /var/db/rrd/system-memory.rrd N:${MEM}
updaterrd.sh:/usr/bin/nice -n20 /usr/local/bin/rrdtool update /var/db/rrd/system-mbuf.rrd N:${MBUF}
updaterrd.sh:/usr/bin/nice -n20 /usr/local/bin/rrdtool update /var/db/rrd/system-sensors.rrd N:$CPU_15:$CPU_14:$CPU_13:$CPU_12:$CPU_11:$CPU_10:$CPU_9:$CPU_8:$CPU_7:$CPU_6:$CPU_5:$CPU_4:$CPU_3:$CPU_2:$CPU_1:$CPU_0
updaterrd.sh: /usr/bin/nice -n20 /usr/local/bin/rrdtool create /var/db/rrd/$gw-quality.rrd --step 60 \
updaterrd.sh: /usr/bin/nice -n20 /usr/local/bin/rrdtool update /var/db/rrd/$gw-quality.rrd -t loss:delay:stddev N:U:U:U
updaterrd.sh: /usr/bin/nice -n20 /usr/local/bin/rrdtool update /var/db/rrd/$gw-quality.rrd -t loss:delay:stddev N:$loss:$delay:$stddev
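The system, interrupt and nice percentages quoted above are shares of the kernel's CPU tick counters. As a rough illustration (not something posted in this thread), FreeBSD exposes those counters through the kern.cp_time sysctl as five numbers in the order user, nice, system, interrupt, idle, and each share is the difference between two samples divided by the total difference. The function name cpu_shares is made up, and the tick values in the last line are invented so that the result reproduces the 15% / 14% / 5% figures; the user and idle values are arbitrary.

```shell
# Print per-state CPU percentages between two kern.cp_time samples.
# Each argument is one sample: "user nice system interrupt idle" ticks.
cpu_shares() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        NR == 1 { for (i = 1; i <= 5; i++) prev[i] = $i }
        NR == 2 {
            for (i = 1; i <= 5; i++) { d[i] = $i - prev[i]; total += d[i] }
            if (total <= 0) exit 1   # no ticks elapsed between the samples
            printf "user=%d nice=%d system=%d interrupt=%d idle=%d\n", d[1] * 100 / total, d[2] * 100 / total, d[3] * 100 / total, d[4] * 100 / total, d[5] * 100 / total
        }'
}

# On a live system (assumed usage):
#   a=$(sysctl -n kern.cp_time); sleep 5; b=$(sysctl -n kern.cp_time); cpu_shares "$a" "$b"
# Invented sample values:
cpu_shares "1000 100 500 400 8000" "1100 150 650 540 8560"
```

A breakdown like this only shows where the time goes; it does not by itself confirm the guess about which component causes each share.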
-
None of those seem very high. I'm just surprised that the egrep function was having such a hard time.
Perhaps the file /tmp/pfctl_ss_out was corrupted or far larger than expected?
-
@stephenw10 Yes, pfctl_ss_out is too large to grep. I've mounted /tmp as a memory disk, so there should be zero iowait; the file is simply too large.
I'll add another PPPoE link with the same bandwidth to this pfSense box using the "Multi-WAN on a Stick" approach.
I guess the CPU cost will double next week.
-
@stephenw10 said in [SOLVED] egrep used lots of CPU, what did it grep?:
/tmp/pfctl_ss_out
How big is it?
-
@stephenw10
[23.05-RELEASE][root@GW.Tel]/root: pfctl -ss |wc -l
166370
-
Hmm, also not especially large. Is this something you noticed after upgrading to 23.05?
-
@stephenw10 It's running 23.05 now.
-
Yes, but did it start showing the excess CPU usage after updating to 23.05? I.e. is this new behaviour in 23.05?
Or perhaps this was never running anything else...
-
@stephenw10 After I installed pfSense, I upgraded it to Plus right away...
I'm not sure whether pfSense CE behaves the same, but I guess /var/db/rrd/updaterrd.sh may be the same.
[23.05-RELEASE][root@GW.Tel]/root: cat /var/db/rrd/updaterrd.sh
#!/bin/sh
export TERM=dumb
echo $$ > /var/run/updaterrd.sh.pid
counter=1
while [ "$counter" -ne 0 ]
do
# polling traffic for interface wan pppoe0 IPv4/IPv6 counters
/usr/bin/nice -n20 /usr/local/bin/rrdtool update /var/db/rrd/wan-traffic.rrd N:`/sbin/pfctl -vvsI -i pppoe0 | awk '\
/In4\/Pass/ { b4pi = $6 };/Out4\/Pass/ { b4po = $6 };/In4\/Block/ { b4bi = $6 };/Out4\/Block/ { b4bo = $6 };\
/In6\/Pass/ { b6pi = $6 };/Out6\/Pass/ { b6po = $6 };/In6\/Block/ { b6bi = $6 };/Out6\/Block/ { b6bo = $6 };\
END {print b4pi ":" b4po ":" b4bi ":" b4bo ":" b6pi ":" b6po ":" b6bi ":" b6bo};'`
# polling packets for interface wan pppoe0
/usr/bin/nice -n20 /usr/local/bin/rrdtool update /var/db/rrd/wan-packets.rrd N:`/sbin/pfctl -vvsI -i pppoe0 | awk '\
/In4\/Pass/ { b4pi = $4 };/Out4\/Pass/ { b4po = $4 };/In4\/Block/ { b4bi = $4 };/Out4\/Block/ { b4bo = $4 };\
/In6\/Pass/ { b6pi = $4 };/Out6\/Pass/ { b6po = $4 };/In6\/Block/ { b6bi = $4 };/Out6\/Block/ { b6bo = $4 };\
END {print b4pi ":" b4po ":" b4bi ":" b4bo ":" b6pi ":" b6po ":" b6bi ":" b6bo};'`
# polling traffic for interface lan igb1 IPv4/IPv6 counters
/usr/bin/nice -n20 /usr/local/bin/rrdtool update /var/db/rrd/lan-traffic.rrd N:`/sbin/pfctl -vvsI -i igb1 | awk '\
/In4\/Pass/ { b4pi = $6 };/Out4\/Pass/ { b4po = $6 };/In4\/Block/ { b4bi = $6 };/Out4\/Block/ { b4bo = $6 };\
/In6\/Pass/ { b6pi = $6 };/Out6\/Pass/ { b6po = $6 };/In6\/Block/ { b6bi = $6 };/Out6\/Block/ { b6bo = $6 };\
END {print b4pi ":" b4po ":" b4bi ":" b4bo ":" b6pi ":" b6po ":" b6bi ":" b6bo};'`
# polling packets for interface lan igb1
/usr/bin/nice -n20 /usr/local/bin/rrdtool update /var/db/rrd/lan-packets.rrd N:`/sbin/pfctl -vvsI -i igb1 | awk '\
/In4\/Pass/ { b4pi = $4 };/Out4\/Pass/ { b4po = $4 };/In4\/Block/ { b4bi = $4 };/Out4\/Block/ { b4bo = $4 };\
/In6\/Pass/ { b6pi = $4 };/Out6\/Pass/ { b6po = $4 };/In6\/Block/ { b6bi = $4 };/Out6\/Block/ { b6bo = $4 };\
END {print b4pi ":" b4po ":" b4bi ":" b4bo ":" b6pi ":" b6po ":" b6bi ":" b6bo};'`
# polling traffic for interface opt1 pppoe1 IPv4/IPv6 counters
/usr/bin/nice -n20 /usr/local/bin/rrdtool update /var/db/rrd/opt1-traffic.rrd N:`/sbin/pfctl -vvsI -i pppoe1 | awk '\
/In4\/Pass/ { b4pi = $6 };/Out4\/Pass/ { b4po = $6 };/In4\/Block/ { b4bi = $6 };/Out4\/Block/ { b4bo = $6 };\
/In6\/Pass/ { b6pi = $6 };/Out6\/Pass/ { b6po = $6 };/In6\/Block/ { b6bi = $6 };/Out6\/Block/ { b6bo = $6 };\
END {print b4pi ":" b4po ":" b4bi ":" b4bo ":" b6pi ":" b6po ":" b6bi ":" b6bo};'`
# polling packets for interface opt1 pppoe1
/usr/bin/nice -n20 /usr/local/bin/rrdtool update /var/db/rrd/opt1-packets.rrd N:`/sbin/pfctl -vvsI -i pppoe1 | awk '\
/In4\/Pass/ { b4pi = $4 };/Out4\/Pass/ { b4po = $4 };/In4\/Block/ { b4bi = $4 };/Out4\/Block/ { b4bo = $4 };\
/In6\/Pass/ { b6pi = $4 };/Out6\/Pass/ { b6po = $4 };/In6\/Block/ { b6bi = $4 };/Out6\/Block/ { b6bo = $4 };\
END {print b4pi ":" b4po ":" b4bi ":" b4bo ":" b6pi ":" b6po ":" b6bi ":" b6bo};'`
# polling traffic for interface ipsec enc0 IPv4/IPv6 counters
/usr/bin/nice -n20 /usr/local/bin/rrdtool update /var/db/rrd/ipsec-traffic.rrd N:`/sbin/pfctl -vvsI -i enc0 | awk '\
/In4\/Pass/ { b4pi = $6 };/Out4\/Pass/ { b4po = $6 };/In4\/Block/ { b4bi = $6 };/Out4\/Block/ { b4bo = $6 };\
/In6\/Pass/ { b6pi = $6 };/Out6\/Pass/ { b6po = $6 };/In6\/Block/ { b6bi = $6 };/Out6\/Block/ { b6bo = $6 };\
END {print b4pi ":" b4po ":" b4bi ":" b4bo ":" b6pi ":" b6po ":" b6bi ":" b6bo};'`
# polling packets for interface ipsec enc0
/usr/bin/nice -n20 /usr/local/bin/rrdtool update /var/db/rrd/ipsec-packets.rrd N:`/sbin/pfctl -vvsI -i enc0 | awk '\
/In4\/Pass/ { b4pi = $4 };/Out4\/Pass/ { b4po = $4 };/In4\/Block/ { b4bi = $4 };/Out4\/Block/ { b4bo = $4 };\
/In6\/Pass/ { b6pi = $4 };/Out6\/Pass/ { b6po = $4 };/In6\/Block/ { b6bi = $4 };/Out6\/Block/ { b6bo = $4 };\
END {print b4pi ":" b4po ":" b4bi ":" b4bo ":" b6pi ":" b6po ":" b6bi ":" b6bo};'`
#pfctl_si_out="` /sbin/pfctl -si > /tmp/pfctl_si_out `"
#pfctl_ss_out="` /sbin/pfctl -ss > /tmp/pfctl_ss_out`"
#pfrate="` cat /tmp/pfctl_si_out | egrep "inserts|removals" | awk '{ pfrate = $3 + pfrate } {print pfrate}'|tail -1 `"
#pfstates="` cat /tmp/pfctl_ss_out | egrep -v '\(([0-9a-f:.]|[|])+\) (\->|<\-)' | wc -l|sed 's/ //g'`"
#pfnat="` cat /tmp/pfctl_ss_out | egrep '\(([0-9a-f:.]|[|])+\) (\->|<\-)' | wc -l|sed 's/ //g' `"
#srcip="` cat /tmp/pfctl_ss_out | egrep -v '\(([0-9a-f:.]|[|])+\) (\->|<\-)' | grep '\->' | awk '{print $3}' | awk -F: '{print $1}' | sort -u|wc -l|sed 's/ //g' `"
#dstip="` cat /tmp/pfctl_ss_out | egrep -v '\(([0-9a-f:.]|[|])+\) (\->|<\-)' | grep '<\-' | awk '{print $3}' | awk -F: '{print $1}' | sort -u|wc -l|sed 's/ //g' `"
#/usr/bin/nice -n20 /usr/local/bin/rrdtool update /var/db/rrd/system-states.rrd N:$pfrate:$pfstates:$pfnat:$srcip:$dstip
CPU=`/usr/local/sbin/cpustats | cut -f1-4 -d':'`
PROCS=`ps uxaH | wc -l | awk '{print $1;}'`
/usr/bin/nice -n20 /usr/local/bin/rrdtool update /var/db/rrd/system-processor.rrd N:${CPU}:${PROCS}
MEM=`/sbin/sysctl -qn vm.stats.vm.v_page_count vm.stats.vm.v_active_count vm.stats.vm.v_inactive_count vm.stats.vm.v_free_count kstat.zfs.misc.arcstats.size vm.stats.vm.v_wire_count vm.stats.vm.v_user_wire_count vm.stats.vm.v_laundry_count vfs.bufspace hw.pagesize | /usr/bin/awk '{getline active;getline inactive;getline free;getline cache;getline wire;getline userwire;getline laundry;getline buffers;getline pagesize;cache=(cache/pagesize);buffers=(buffers/pagesize);printf ((active/$0) * 100)":"((inactive/$0) * 100)":"((free/$0) * 100)":"((cache/$0) * 100)":"((wire - (cache + buffers))/$0 * 100)":"((userwire/$0) * 100)":"((laundry/$0) * 100)":"((buffers/$0) * 100)}'`
/usr/bin/nice -n20 /usr/local/bin/rrdtool update /var/db/rrd/system-memory.rrd N:${MEM}
MBUF=`/usr/bin/netstat -m | /usr/bin/awk '/mbuf clusters in use/ { gsub(/\//, ":", $1); print $1; }'`
/usr/bin/nice -n20 /usr/local/bin/rrdtool update /var/db/rrd/system-mbuf.rrd N:${MBUF}
THERMAL_TZ0=`/sbin/sysctl -qn hw.acpi.thermal.tz0.temperature | /usr/bin/sed 's/C//'`
CPU_3=`/sbin/sysctl -qn dev.cpu.3.temperature | /usr/bin/sed 's/C//'`
CPU_2=`/sbin/sysctl -qn dev.cpu.2.temperature | /usr/bin/sed 's/C//'`
CPU_1=`/sbin/sysctl -qn dev.cpu.1.temperature | /usr/bin/sed 's/C//'`
CPU_0=`/sbin/sysctl -qn dev.cpu.0.temperature | /usr/bin/sed 's/C//'`
/usr/bin/nice -n20 /usr/local/bin/rrdtool update /var/db/rrd/system-sensors.rrd N:$THERMAL_TZ0:$CPU_3:$CPU_2:$CPU_1:$CPU_0
# Gateway quality graphs
for sock in /var/run/dpinger_*.sock; do
    if [ ! -S "$sock" ]; then
        continue
    fi
    t=$(/usr/bin/nc -U $sock)
    if [ -z "$t" ]; then
        continue
    fi
    gw=$(echo "$t" | awk '{ print $1 }')
    delay=$(echo "$t" | awk '{ print $2 }')
    stddev=$(echo "$t" | awk '{ print $3 }')
    loss=$(echo "$t" | awk '{ print $4 }')
    if echo "$loss" | grep -Eqv '^[0-9]+$'; then
        loss="U"
    fi
    if echo "$delay" | grep -Eqv '^[0-9]+$'; then
        delay="U"
    else
        # Convert delay from microseconds to seconds
        delay=$(echo "scale=7; $delay / 1000 / 1000" | /usr/bin/bc)
    fi
    if echo "$stddev" | grep -Eqv '^[0-9]+$'; then
        stddev="U"
    else
        # Convert stddev from microseconds to seconds
        stddev=$(echo "scale=7; $stddev / 1000 / 1000" | /usr/bin/bc)
    fi
    if [ ! -f /var/db/rrd/$gw-quality.rrd ]; then
        /usr/bin/nice -n20 /usr/local/bin/rrdtool create /var/db/rrd/$gw-quality.rrd --step 60 \
        DS:loss:GAUGE:120:0:100 \
        DS:delay:GAUGE:120:0:100000 \
        DS:stddev:GAUGE:120:0:100000 \
        RRA:AVERAGE:0.5:1:1200 \
        RRA:AVERAGE:0.5:5:720 \
        RRA:AVERAGE:0.5:60:1860 \
        RRA:AVERAGE:0.5:1440:2284
        /usr/bin/nice -n20 /usr/local/bin/rrdtool update /var/db/rrd/$gw-quality.rrd -t loss:delay:stddev N:U:U:U
    fi
    /usr/bin/nice -n20 /usr/local/bin/rrdtool update /var/db/rrd/$gw-quality.rrd -t loss:delay:stddev N:$loss:$delay:$stddev
done
sleep 60
done
-
Ok, thanks. Let me see what I can find....
-
@stephenw10 I found that after I reboot pfSense, the /var/db/rrd/updaterrd.sh file is restored :(
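Until there is a proper fix, the manual edit has to be re-applied each time the file comes back. A stopgap sketch, assuming the regenerated script still names /tmp/pfctl_si_out, /tmp/pfctl_ss_out and system-states.rrd on the lines to disable, as in the listing above. The function name disable_state_polling is made up, and this approach gives up the state data on the monitoring page entirely.

```shell
# Comment out the pf state polling lines of an updaterrd.sh read from stdin
# and write the result to stdout. Lines that are already commented out are
# left unchanged, so running it twice does no harm.
disable_state_polling() {
    sed -e '/pfctl_s[is]_out/{/^[[:space:]]*#/!s/^/#/;}' \
        -e '/system-states\.rrd/{/^[[:space:]]*#/!s/^/#/;}'
}
```

Usage: disable_state_polling < /var/db/rrd/updaterrd.sh and redirect the output to a new file, then review it before replacing the original.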
-
Yes, it's generated based on the data sources you have.
Are you able to test some other code? A patch?
-
@stephenw10
Yes, I can test a patch for it.
-
Ok, let me see what I can do here...