pfSense Slow and High CPU
-
Hi,
We have a pfSense 2.2.4-RELEASE (i386)
built on Sat Jul 25 19:56:41 CDT 2015
FreeBSD 10.1-RELEASE-p15
The CPU has a load of 100%.
Here is a top -S -H:
[code]
last pid: 35567;  load averages: 5.68, 6.43, 6.58  up 3+10:16:58  11:20:16
165 processes: 6 running, 131 sleeping, 28 waiting
CPU: 49.2% user, 0.0% nice, 18.1% system, 26.5% interrupt, 6.2% idle
Mem: 70M Active, 207M Inact, 116M Wired, 89M Buf, 1578M Free
Swap: 4096M Total, 4096M Free

  PID USERNAME  PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
34796 root       85    0 14640K  9044K RUN     1   0:20 41.70% pfctl
   12 root      -92    -     0K   224K WAIT    0 217:21 32.57% intr{irq259: em1:rx 0}
   12 root      -92    -     0K   224K WAIT    1 222:34 21.58% intr{irq262: em2:rx 0}
79377 root       52    0 82632K 28484K accept  1   0:23 18.46% php
29027 root       34    0 82632K 28492K accept  1   0:28 12.26% php
98220 root       47    0 82632K 28484K accept  0   2:04 10.16% php
54636 root       34    0 82632K 28488K accept  0   1:27  9.28% php
24411 root       37    0 86728K 36216K accept  1   1:55  9.18% php
 4965 root       39    0 82632K 28496K accept  1   0:35  7.08% php
60367 root       20    0 86728K 35704K CPU0    0   2:09  6.79% php
59102 root       25    0 82632K 28484K RUN     0   0:25  5.08% php
29995 nobody     21    0 11400K  3908K select  1   0:35  3.27% dnsmasq
   11 root      155 ki31     0K    16K RUN     0  41.2H  1.56% idle{idle: cpu0}
 8330 root       31    0 86728K 36016K accept  0   1:16  1.56% php
   11 root      155 ki31     0K    16K RUN     1  42.7H  1.46% idle{idle: cpu1}
   12 root      -92    -     0K   224K WAIT    1  11:07  0.88% intr{irq260: em1:tx 0}
85824 root       44    0 86728K 36256K accept  1   2:51  0.88% php
[/code]
systat -vmstat 1:
[code]
    2 users    Load  4.87  5.05  5.78                  Sep 28 11:28
Mem:KB    REAL            VIRTUAL                      VN PAGER   SWAP PAGER
        Tot   Share      Tot    Share    Free          in   out     in   out
Act  293116   15968  1327504    20168 1603584  count
All  358248   30264  3676572   172716          pages
Proc:                                                          Interrupts
  r   p   d   s   w   Csw  Trp  Sys  Int  Sof  Flt      ioflt  10836 total
  4          63      13k  1135 2497 8625   77 2190        cow      4 em5 uhci2
                                                      337 zfod    125 em4 uhci1
25.6%Sys 36.3%Intr 33.2%User 5.0%Nice 0.0%Idle           ozfod        uhci0 ehci
|    |    |    |    |    |    |    |    |    |          %ozfod   1115 cpu0:timer
=============++++++++++++++++++>>>>>>>>>>>>>>>>>--       daefr   2464 em1:rx 0
                                        36 dtbuf         prcfr   2073 em1:tx 0
Namei     Name-cache   Dir-cache    111445 desvn       1 totfr        em1:link
   Calls    hits   %    hits   %      1471 numvn         react   1857 em2:rx 0
     885     885 100                   229 frevn         pdwak   2076 em2:tx 0
                                                      35 pdpgs        em2:link
Disks  md0 ada0 ada1 pass0 pass1                         intrn        ahci0 266
KB/t  0.00 0.00 0.00  0.00  0.00    120160 wire     1122 cpu1:timer
tps      1    0    0     0     0     88296 act
MB/s  0.00 0.00 0.00  0.00  0.00    207240 inact
%busy    0    0    0     0     0           cache
                                   1603700 free
[/code]
netstat -m:
[code]
3688/2642/6330 mbufs in use (current/cache/total)
3673/1647/5320/26584 mbuf clusters in use (current/cache/total/max)
3673/1640 mbuf+clusters out of packet secondary zone in use (current/cache)
0/44/44/13291 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/3938 9k jumbo clusters in use (current/cache/total/max)
0/0/0/2215 16k jumbo clusters in use (current/cache/total/max)
8302K/4130K/12433K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/11/6656 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
[/code]
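Those counters can be sanity-checked quickly. A minimal sketch, using the mbuf-cluster line pasted above (no denied/delayed requests here, so mbufs don't look like the bottleneck); on the firewall you would pipe the live netstat -m output instead:

```shell
# Parse the "mbuf clusters" line and print utilization against the max.
# The sample line is copied verbatim from the output above.
line='3673/1647/5320/26584 mbuf clusters in use (current/cache/total/max)'
printf '%s\n' "$line" | awk -F'[/ ]' '{printf "%.1f%% of mbuf clusters in use\n", 100*$1/$4}'
# -> 13.8% of mbuf clusters in use
```

Anything approaching 100% here, or nonzero "denied" counters, would point at mbuf exhaustion rather than PHP.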
-
0/ Please use the CODE bbcode (the fine # button) to post console output in a readable form.
1/ Post the output of ps auxw | grep '[p]hp'
-
Sorry for the code
Here is the output:
ps auxw | grep '[p]hp'
[code]
root 54636 18.2 1.4 82632 28488 - S  10:50AM 2:34.58 /usr/local/bin/php
root 79377 17.1 1.4 82632 28484 - R  11:15AM 2:05.14 /usr/local/bin/php
root 85143 15.0 1.4 82632 28480 - S  11:31AM 0:32.80 /usr/local/bin/php
root 15517 10.8 1.7 86728 35704 - S  11:25AM 1:27.37 /usr/local/bin/php
root 77242  7.2 1.6 86728 33032 - D  11:37AM 0:04.15 /usr/local/bin/php -f /etc/rc.prunecaptiveportal test
root 56754  7.0 1.4 82632 28488 - S  10:57AM 1:19.90 /usr/local/bin/php
root 70124  7.0 1.8 86824 36256 - R  11:31AM 0:22.09 php-fpm: pool lighty (php-fpm)
root 39322  6.8 1.5 86824 30476 - R  11:31AM 0:21.20 php-fpm: pool lighty (php-fpm)
root 80658  6.6 1.7 86824 35824 - S  11:31AM 0:22.52 php-fpm: pool lighty (php-fpm)
root  8330  3.6 1.8 86728 36020 - S  10:32AM 1:43.94 /usr/local/bin/php
root 85706  3.5 1.4 82632 28476 - S  11:37AM 0:06.11 /usr/local/bin/php
root 26757  3.2 1.4 82632 28480 - S  11:26AM 0:44.44 /usr/local/bin/php
root 20387  0.8 1.5 86824 31508 - S  11:31AM 0:25.87 php-fpm: pool lighty (php-fpm)
root 77335  0.4 1.4 82632 28476 - S  11:37AM 0:06.43 /usr/local/bin/php
root 12429  0.1 1.4 82632 28484 - I  11:30AM 0:06.67 /usr/local/bin/php
root 87715  0.1 1.4 82632 28484 - I  10:53AM 0:39.54 /usr/local/bin/php
root   253  0.0 0.9 78632 18512 - Ss Fri01AM 0:21.82 php-fpm: master process (/usr/local/lib/php-fpm.conf) (php-fpm)
root  4965  0.0 1.4 82632 28496 - I  11:12AM 0:57.50 /usr/local/bin/php
root 24411  0.0 1.8 86728 36220 - I  10:32AM 2:14.04 /usr/local/bin/php
root 56709  0.0 0.1 10076  1704 - S  Fri01AM 0:01.14 minicron: helper /usr/local/sbin/gmirror_status_check.php (minic
root 80479  0.0 1.7 86728 35716 - I  10:49AM 0:38.18 /usr/local/bin/php
root 82250  0.0 1.0 78536 20400 - Is  9:37AM 0:00.37 /usr/local/bin/php
root 83172  0.0 1.0 78536 20400 - Is  9:37AM 0:00.39 /usr/local/bin/php
root 83392  0.0 1.0 78536 20400 - Is  9:37AM 0:00.40 /usr/local/bin/php
root 83704  0.0 1.0 78536 20400 - Is  9:37AM 0:00.40 /usr/local/bin/php
root 83948  0.0 1.0 78536 20400 - Is  9:37AM 0:00.42 /usr/local/bin/php
root 84168  0.0 1.0 78536 20400 - Is  9:37AM 0:00.44 /usr/local/bin/php
root 84514  0.0 1.8 86728 36268 - I   9:37AM 2:44.40 /usr/local/bin/php
root 84694  0.0 1.8 86728 35960 - I   9:37AM 2:29.28 /usr/local/bin/php
root 84917  0.0 1.8 86728 36208 - I   9:37AM 1:35.46 /usr/local/bin/php
root 85437  0.0 1.7 86728 35724 - I   9:37AM 2:37.51 /usr/local/bin/php
[/code]
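A quick way to see how much CPU all those PHP processes add up to is to sum the %CPU column (field 3 in ps aux format). A minimal sketch; the three sample lines are copied from the paste above, and on the firewall you would pipe the live ps auxw | grep '[p]hp' output instead of the here-string:

```shell
# Sum the %CPU column over the matched php processes.
sample='root 54636 18.2 1.4 82632 28488 - S 10:50AM 2:34.58 /usr/local/bin/php
root 79377 17.1 1.4 82632 28484 - R 11:15AM 2:05.14 /usr/local/bin/php
root 85143 15.0 1.4 82632 28480 - S 11:31AM 0:32.80 /usr/local/bin/php'
printf '%s\n' "$sample" | awk '{n++; cpu += $3} END {printf "%d php processes, %.1f%% CPU total\n", n, cpu}'
# -> 3 php processes, 50.3% CPU total
```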
-
Any chance you are getting attacked? What do the RRD graphs show for em1 & em2? Are those WANs or LANs?
-
em1 = WAN, Colt fiber
em2 = customer network
-
Well, you have a shitton of php processes running for god knows what reason.
- Restart webConfigurator from console
- Restart PHP-FPM from console
- Limit the GUI processes if not done yet (System - Advanced - Max Processes).
Also check on the dashboard whether the gateway monitor shows some totally BS figures, like 109% packet loss. If it does, kill apinger.
-
- Restart webConfigurator from console
What's the right command to do it?
- Restart PHP-FPM from console
What's the right command to do it?
- Limit the GUI processes if not done yet (System - Advanced - Max Processes).
It's set to 2; do I have to change it?
-
The right command is to use the fine console menu that's there (available both via SSH and on the physical console).
-
Hi,
I've restarted both; still 100% CPU.
[code]
last pid: 70334;  load averages: 7.58, 6.90, 7.28  up 3+11:37:44  12:41:02
172 processes: 9 running, 135 sleeping, 28 waiting
CPU: 38.2% user, 0.2% nice, 34.7% system, 26.9% interrupt, 0.0% idle
Mem: 111M Active, 211M Inact, 119M Wired, 74M Buf, 1531M Free
Swap: 4096M Total, 4096M Free

  PID USERNAME  PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
   12 root      -92    -     0K   224K WAIT    0 235:41 24.27% [intr{irq259:
49598 root       45    0 86728K 35944K RUN     0   2:07 19.68% /usr/local/bin
93820 root       39    0 86728K 35680K accept  1   1:03 15.97% /usr/local/bin
   12 root      -92    -     0K   224K WAIT    0 239:00 11.67% [intr{irq262:
10691 root       29    0 86728K 35716K accept  1   1:03 11.38% /usr/local/bin
 5449 root       46    0 82632K 28476K accept  1   0:05  9.57% /usr/local/bin
41372 root       42    0 86728K 35724K accept  1   2:20  7.76% /usr/local/bin
49355 root       74    0 86824K 33060K RUN     0   0:27  7.28% php-fpm: pool
14204 root       74    0 86824K 33940K RUN     1   0:30  6.88% php-fpm: pool
54828 root       52    0 86824K 33828K piperd  0   0:27  6.40% php-fpm: pool
96397 root       52    0 86728K 33052K sysctl  0   0:04  6.40% /usr/local/bin
48718 root       52    0 82632K 28496K accept  0   0:36  5.37% /usr/local/bin
 2200 root       52    0 86824K 35476K piperd  1   0:35  5.08% php-fpm: pool
45926 root       25    0 17248K  6644K kqread  1   0:03  4.39% /usr/local/sbi
29995 nobody     22    0 11400K  4020K select  1   2:35  3.47% /usr/local/sbi
27091 root       26    0 82632K 28492K accept  0   0:09  2.49% /usr/local/bin
54198 admin-itp  20    0 11384K  2724K CPU1    1   0:00  0.59% top -aSH
[/code]
-
The output with truncated paths is completely useless.
-
Sorry
[code]
last pid: 78757;  load averages: 7.10, 7.19, 7.14  up 3+11:53:35  12:56:53
167 processes: 9 running, 132 sleeping, 26 waiting
CPU: 38.8% user, 0.0% nice, 44.4% system, 16.2% interrupt, 0.6% idle
Mem: 143M Active, 202M Inact, 119M Wired, 85M Buf, 1509M Free
Swap: 4096M Total, 4096M Free

  PID USERNAME  PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
26911 root       49    0 82632K 28484K accept  1   2:01 25.59% /usr/local/bin/php
   12 root      -92    -     0K   224K CPU0    0 241:42 21.78% [intr{irq262: em2:rx 0}]
90617 root       33    0 82632K 28484K CPU0    0   1:48 17.77% /usr/local/bin/php
   12 root      -92    -     0K   224K CPU0    0 238:40 14.06% [intr{irq259: em1:rx 0}]
14204 root       75    0 99112K 50224K RUN     1   1:40 11.47% php-fpm: pool lighty (php-fpm)
 2200 root       75    0   101M 50364K RUN     1   1:50 10.69% php-fpm: pool lighty (php-fpm)
93820 root       40    0 86728K 35728K accept  1   1:24 10.16% /usr/local/bin/php
54828 root       75    0   101M 51096K RUN     0   1:42  9.57% php-fpm: pool lighty (php-fpm)
49355 root       52    0 99112K 47252K nanslp  1   1:40  9.38% php-fpm: pool lighty (php-fpm)
 5449 root       27    0 82632K 28484K accept  1   1:40  7.76% /usr/local/bin/php
10691 root       33    0 86728K 35732K accept  1   1:26  5.66% /usr/local/bin/php
48718 root       52    0 82632K 28496K accept  1   1:55  1.46% /usr/local/bin/php
29995 nobody     21    0 11400K  4020K select  1   2:54  1.27% /usr/local/sbin/dnsmasq --all-servers --rebind-localhost-ok --stop-dns-rebind --dns-forward-max=5000 --cache-size=100
54198 admin-itp  21    0 11384K  2908K CPU1    1   0:11  1.17% top -aSH
89404 root       26    0 82632K 28488K accept  0   0:42  0.49% /usr/local/bin/php
   12 root      -92    -     0K   224K WAIT    0  12:20  0.39% [intr{irq260: em1:tx 0}]
47366 root       20    0 13148K  5484K kqread  0   0:06  0.20% /usr/local/sbin/lighttpd -f /var/etc/lighty-test-CaptivePortal.conf
[/code]
We also have a lot of these log entries; I don't know if it's related:
[code]
Sep 28 12:52:42 lighttpd[47366]: (connections.c.137) (warning) close: 21 Connection reset by peer
Sep 28 12:52:51 lighttpd[47366]: (connections.c.137) (warning) close: 13 Connection reset by peer
Sep 28 12:53:05 lighttpd[47366]: (connections.c.137) (warning) close: 13 Connection reset by peer
Sep 28 12:53:15 lighttpd[47366]: (connections.c.137) (warning) close: 15 Connection reset by peer
Sep 28 12:53:46 lighttpd[47366]: (connections.c.137) (warning) close: 15 Connection reset by peer
Sep 28 12:53:49 lighttpd[47366]: (connections.c.137) (warning) close: 18 Connection reset by peer
Sep 28 12:54:11 lighttpd[47366]: (connections.c.137) (warning) close: 18 Connection reset by peer
Sep 28 12:54:35 lighttpd[47366]: (connections.c.137) (warning) close: 21 Connection reset by peer
Sep 28 12:54:57 lighttpd[47366]: (connections.c.137) (warning) close: 14 Connection reset by peer
Sep 28 12:55:02 lighttpd[47366]: (connections.c.137) (warning) close: 16 Connection reset by peer
Sep 28 12:55:15 lighttpd[47366]: (connections.c.137) (warning) close: 14 Connection reset by peer
Sep 28 12:55:33 lighttpd[47366]: (connections.c.137) (warning) close: 14 Connection reset by peer
Sep 28 12:55:54 lighttpd[47366]: (connections.c.137) (warning) close: 13 Connection reset by peer
Sep 28 12:55:57 lighttpd[47366]: (connections.c.137) (warning) close: 14 Connection reset by peer
Sep 28 12:56:21 lighttpd[47366]: (connections.c.137) (warning) close: 22 Connection reset by peer
Sep 28 12:56:23 lighttpd[47366]: (connections.c.137) (warning) close: 18 Connection reset by peer
Sep 28 12:57:05 lighttpd[47366]: (connections.c.137) (warning) close: 16 Connection reset by peer
Sep 28 12:57:18 lighttpd[47366]: (connections.c.137) (warning) close: 14 Connection reset by peer
[/code]
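Those warnings can be counted to see whether they correlate with the load spikes. A minimal sketch using three of the log lines above; on the box you would feed the real log file instead of the here-string:

```shell
# Count "Connection reset by peer" warnings in a log excerpt.
# Sample lines are copied from the log paste above.
log='Sep 28 12:52:42 lighttpd[47366]: (connections.c.137) (warning) close: 21 Connection reset by peer
Sep 28 12:52:51 lighttpd[47366]: (connections.c.137) (warning) close: 13 Connection reset by peer
Sep 28 12:53:05 lighttpd[47366]: (connections.c.137) (warning) close: 13 Connection reset by peer'
printf '%s\n' "$log" | grep -c 'Connection reset by peer'
# -> 3
```

A steadily climbing rate here would fit clients (e.g. mobile devices behind the captive portal) dropping connections mid-request and forcing PHP to start over.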
-
Disable the Captive portal thing.
-
Do you think the platform is not powerful enough? (We have 100-150 users.)
-
Also might try
[code]
sysctl dev.em.0.debug=1
[/code]
from console.
-
karlzre - Apparently not in your case when idiots/broken mobile shit keeps hammering the CP over and over again. (Other than that, you told us exactly NOTHING about your "platform"). And yeah, there's also something very broken about the NICs. Did you enable some nonsense like polling there?
-
also might try
[code]
sysctl dev.em.0.debug=1
[/code]
from console

[code]
dev.em.0.debug: -1 -> -1
[/code]
Intel(R) Atom(TM) CPU N270 @ 1.60GHz
Current: 800 MHz, Max: 1600 MHz
2 CPUs: 1 package(s) x 1 core(s) x 2 HTT threads
Memory usage
13% of 2006 MB
SWAP usage
0% of 4096 MB
Disk usage
/ (ufs): 2% of 23G
/var/run (ufs in RAM): 3% of 3.4M
-
Did you enable some nonsense like polling there?
Enable device polling is disabled.
-
Intel(R) Atom(TM) CPU N270 @ 1.60GHz Current: 800 MHz, Max: 1600 MHz 2 CPUs: 1 package(s) x 1 core(s) x 2 HTT threads
This CPU is just an old slow POS, but that thing is running at half the frequency it's spec'ed for, with 100% CPU usage? :o Try fiddling with the powerd settings.
Also, post some information about the NICs.
-
This CPU is just old slow POS, but that thing is running at half the freq it's spec'ed for with 100% CPU usage? :o Try fiddling with the powerd settings.
Also, post some information about the NICs.
I have a bit more power behind my box and I've run into the same thing. Is powerd on minimum capable of causing this issue?
My machine gets loaded up with 60 Mbit of torrent/Netflix traffic (that is my connection speed) from time to time, and I see the [kernel{em0 taskq}] and [intr{swi5: fast taskq}] tasks taking up the majority of the available CPU. Polling is turned off.
This machine was built from spare parts that I had lying around; is it possible that I'm just running into a limitation of the hardware?
Intel C2D 2.13 GHz, Asus OEM shitty mobo, 3 GB RAM, Intel Pro/1000 Legacy (external), Realtek 8169 (internal)
-
I have a bit more power behind my box and Ive run into the same thing.. Is powerd on minimum capable of causing this issue?
You need to fiddle with whatever settings are available there. People here mentioned boxes where CPU is never run at full freq unless powerd is enabled somehow. Sure if set @minimum, that's a good candidate.
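For reference, on stock FreeBSD powerd is tuned from rc.conf; pfSense exposes the equivalent knobs in the GUI (the PowerD options under System > Advanced on 2.2.x, if memory serves). A sketch of the plain-FreeBSD equivalent; the flag values are illustrative assumptions, not a tested recommendation:

```shell
# /etc/rc.conf fragment (plain FreeBSD syntax; on pfSense, change the
# PowerD settings in the GUI instead of editing files by hand).
powerd_enable="YES"
# -a = mode on AC power, -b = mode on battery. "hiadaptive" ramps the clock
# up aggressively under load; "minimum" would pin the CPU at its lowest
# P-state, which matches a 1.6 GHz part stuck at 800 MHz as seen above.
powerd_flags="-a hiadaptive -b adaptive"
```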