pfSense Slow and High CPU
-
Hi,
We have a pfSense 2.2.4-RELEASE (i386)
built on Sat Jul 25 19:56:41 CDT 2015
FreeBSD 10.1-RELEASE-p15
The CPU has a load of 100%.
Here is a top -S -H:
[code]
last pid: 35567;  load averages: 5.68, 6.43, 6.58  up 3+10:16:58  11:20:16
165 processes: 6 running, 131 sleeping, 28 waiting
CPU: 49.2% user, 0.0% nice, 18.1% system, 26.5% interrupt, 6.2% idle
Mem: 70M Active, 207M Inact, 116M Wired, 89M Buf, 1578M Free
Swap: 4096M Total, 4096M Free

  PID USERNAME  PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
34796 root       85    0 14640K  9044K RUN     1   0:20 41.70% pfctl
   12 root      -92    -     0K   224K WAIT    0 217:21 32.57% intr{irq259: em1:rx 0}
   12 root      -92    -     0K   224K WAIT    1 222:34 21.58% intr{irq262: em2:rx 0}
79377 root       52    0 82632K 28484K accept  1   0:23 18.46% php
29027 root       34    0 82632K 28492K accept  1   0:28 12.26% php
98220 root       47    0 82632K 28484K accept  0   2:04 10.16% php
54636 root       34    0 82632K 28488K accept  0   1:27  9.28% php
24411 root       37    0 86728K 36216K accept  1   1:55  9.18% php
 4965 root       39    0 82632K 28496K accept  1   0:35  7.08% php
60367 root       20    0 86728K 35704K CPU0    0   2:09  6.79% php
59102 root       25    0 82632K 28484K RUN     0   0:25  5.08% php
29995 nobody     21    0 11400K  3908K select  1   0:35  3.27% dnsmasq
   11 root      155 ki31     0K    16K RUN     0  41.2H  1.56% idle{idle: cpu0}
 8330 root       31    0 86728K 36016K accept  0   1:16  1.56% php
   11 root      155 ki31     0K    16K RUN     1  42.7H  1.46% idle{idle: cpu1}
   12 root      -92    -     0K   224K WAIT    1  11:07  0.88% intr{irq260: em1:tx 0}
85824 root       44    0 86728K 36256K accept  1   2:51  0.88% php
[/code]
systat -vmstat 1:
[code]
    2 users    Load  4.87  5.05  5.78                  Sep 28 11:28
Mem:KB    REAL            VIRTUAL                      VN PAGER   SWAP PAGER
        Tot   Share      Tot    Share    Free          in   out     in   out
Act  293116   15968  1327504    20168 1603584  count
All  358248   30264  3676572   172716          pages
Proc:                                                          Interrupts
  r   p   d   s   w   Csw  Trp  Sys  Int  Sof  Flt      ioflt  10836 total
  4          63      13k  1135 2497 8625   77 2190        cow      4 em5 uhci2
                                                      337 zfod    125 em4 uhci1
25.6%Sys 36.3%Intr 33.2%User 5.0%Nice 0.0%Idle           ozfod        uhci0 ehci
|    |    |    |    |    |    |    |    |    |          %ozfod   1115 cpu0:timer
=============++++++++++++++++++>>>>>>>>>>>>>>>>>--       daefr   2464 em1:rx 0
                                        36 dtbuf         prcfr   2073 em1:tx 0
Namei     Name-cache   Dir-cache    111445 desvn       1 totfr        em1:link
   Calls    hits   %    hits   %      1471 numvn         react   1857 em2:rx 0
     885     885 100                   229 frevn         pdwak   2076 em2:tx 0
                                                      35 pdpgs        em2:link
Disks  md0 ada0 ada1 pass0 pass1                         intrn        ahci0 266
KB/t  0.00 0.00 0.00  0.00  0.00    120160 wire     1122 cpu1:timer
tps      1    0    0     0     0     88296 act
MB/s  0.00 0.00 0.00  0.00  0.00    207240 inact
%busy    0    0    0     0     0           cache
                                   1603700 free
[/code]
netstat -m:
[code]
3688/2642/6330 mbufs in use (current/cache/total)
3673/1647/5320/26584 mbuf clusters in use (current/cache/total/max)
3673/1640 mbuf+clusters out of packet secondary zone in use (current/cache)
0/44/44/13291 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/3938 9k jumbo clusters in use (current/cache/total/max)
0/0/0/2215 16k jumbo clusters in use (current/cache/total/max)
8302K/4130K/12433K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/11/6656 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
[/code]
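Those counters can be sanity-checked quickly. A minimal sketch, using the mbuf-cluster line pasted above (no denied/delayed requests here, so mbufs don't look like the bottleneck); on the firewall you would pipe the live netstat -m output instead:

```shell
# Parse the "mbuf clusters" line and print utilization against the max.
# The sample line is copied verbatim from the output above.
line='3673/1647/5320/26584 mbuf clusters in use (current/cache/total/max)'
printf '%s\n' "$line" | awk -F'[/ ]' '{printf "%.1f%% of mbuf clusters in use\n", 100*$1/$4}'
# -> 13.8% of mbuf clusters in use
```

Anything approaching 100% here, or nonzero "denied" counters, would point at mbuf exhaustion rather than PHP.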
-
0/ Please use the CODE bbcode (the fine # button) to post console output in a readable form.
1/ Post the output of ps auxw | grep '[p]hp'
-
Sorry for the code
Here is the output:
ps auxw | grep '[p]hp'
[code]
root 54636 18.2 1.4 82632 28488 - S  10:50AM 2:34.58 /usr/local/bin/php
root 79377 17.1 1.4 82632 28484 - R  11:15AM 2:05.14 /usr/local/bin/php
root 85143 15.0 1.4 82632 28480 - S  11:31AM 0:32.80 /usr/local/bin/php
root 15517 10.8 1.7 86728 35704 - S  11:25AM 1:27.37 /usr/local/bin/php
root 77242  7.2 1.6 86728 33032 - D  11:37AM 0:04.15 /usr/local/bin/php -f /etc/rc.prunecaptiveportal test
root 56754  7.0 1.4 82632 28488 - S  10:57AM 1:19.90 /usr/local/bin/php
root 70124  7.0 1.8 86824 36256 - R  11:31AM 0:22.09 php-fpm: pool lighty (php-fpm)
root 39322  6.8 1.5 86824 30476 - R  11:31AM 0:21.20 php-fpm: pool lighty (php-fpm)
root 80658  6.6 1.7 86824 35824 - S  11:31AM 0:22.52 php-fpm: pool lighty (php-fpm)
root  8330  3.6 1.8 86728 36020 - S  10:32AM 1:43.94 /usr/local/bin/php
root 85706  3.5 1.4 82632 28476 - S  11:37AM 0:06.11 /usr/local/bin/php
root 26757  3.2 1.4 82632 28480 - S  11:26AM 0:44.44 /usr/local/bin/php
root 20387  0.8 1.5 86824 31508 - S  11:31AM 0:25.87 php-fpm: pool lighty (php-fpm)
root 77335  0.4 1.4 82632 28476 - S  11:37AM 0:06.43 /usr/local/bin/php
root 12429  0.1 1.4 82632 28484 - I  11:30AM 0:06.67 /usr/local/bin/php
root 87715  0.1 1.4 82632 28484 - I  10:53AM 0:39.54 /usr/local/bin/php
root   253  0.0 0.9 78632 18512 - Ss Fri01AM 0:21.82 php-fpm: master process (/usr/local/lib/php-fpm.conf) (php-fpm)
root  4965  0.0 1.4 82632 28496 - I  11:12AM 0:57.50 /usr/local/bin/php
root 24411  0.0 1.8 86728 36220 - I  10:32AM 2:14.04 /usr/local/bin/php
root 56709  0.0 0.1 10076  1704 - S  Fri01AM 0:01.14 minicron: helper /usr/local/sbin/gmirror_status_check.php (minic
root 80479  0.0 1.7 86728 35716 - I  10:49AM 0:38.18 /usr/local/bin/php
root 82250  0.0 1.0 78536 20400 - Is  9:37AM 0:00.37 /usr/local/bin/php
root 83172  0.0 1.0 78536 20400 - Is  9:37AM 0:00.39 /usr/local/bin/php
root 83392  0.0 1.0 78536 20400 - Is  9:37AM 0:00.40 /usr/local/bin/php
root 83704  0.0 1.0 78536 20400 - Is  9:37AM 0:00.40 /usr/local/bin/php
root 83948  0.0 1.0 78536 20400 - Is  9:37AM 0:00.42 /usr/local/bin/php
root 84168  0.0 1.0 78536 20400 - Is  9:37AM 0:00.44 /usr/local/bin/php
root 84514  0.0 1.8 86728 36268 - I   9:37AM 2:44.40 /usr/local/bin/php
root 84694  0.0 1.8 86728 35960 - I   9:37AM 2:29.28 /usr/local/bin/php
root 84917  0.0 1.8 86728 36208 - I   9:37AM 1:35.46 /usr/local/bin/php
root 85437  0.0 1.7 86728 35724 - I   9:37AM 2:37.51 /usr/local/bin/php
[/code]
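A quick way to see how much CPU all those PHP processes add up to is to sum the %CPU column (field 3 in ps aux format). A minimal sketch; the three sample lines are copied from the paste above, and on the firewall you would pipe the live ps auxw | grep '[p]hp' output instead of the here-string:

```shell
# Sum the %CPU column over the matched php processes.
sample='root 54636 18.2 1.4 82632 28488 - S 10:50AM 2:34.58 /usr/local/bin/php
root 79377 17.1 1.4 82632 28484 - R 11:15AM 2:05.14 /usr/local/bin/php
root 85143 15.0 1.4 82632 28480 - S 11:31AM 0:32.80 /usr/local/bin/php'
printf '%s\n' "$sample" | awk '{n++; cpu += $3} END {printf "%d php processes, %.1f%% CPU total\n", n, cpu}'
# -> 3 php processes, 50.3% CPU total
```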
-
Any chance you are getting attacked? What do the RRD graphs show for em1 & em2? Are those WANs or LANs?
-
em1 = WAN, Colt fiber
em2 = customer network
-
Well, you have a shitton of php processes running for god knows what reason.
- Restart webConfigurator from console
- Restart PHP-FPM from console
- Limit the GUI processes if not done yet (System - Advanced - Max Processes).
Also check on the dashboard whether the gateway monitor shows some totally BS figures, like 109% packet loss. If it does, kill apinger.
-
- Restart webConfigurator from console
What's the right command to do it?
- Restart PHP-FPM from console
What's the right command to do it?
- Limit the GUI processes if not done yet (System - Advanced - Max Processes).
It's set to 2; do I have to change it?
-
The right command is to use the fine console menu that's there (available both via SSH and on the physical console).
-
Hi,
I've restarted both; still 100% CPU.
[code]
last pid: 70334;  load averages: 7.58, 6.90, 7.28  up 3+11:37:44  12:41:02
172 processes: 9 running, 135 sleeping, 28 waiting
CPU: 38.2% user, 0.2% nice, 34.7% system, 26.9% interrupt, 0.0% idle
Mem: 111M Active, 211M Inact, 119M Wired, 74M Buf, 1531M Free
Swap: 4096M Total, 4096M Free

  PID USERNAME  PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
   12 root      -92    -     0K   224K WAIT    0 235:41 24.27% [intr{irq259:
49598 root       45    0 86728K 35944K RUN     0   2:07 19.68% /usr/local/bin
93820 root       39    0 86728K 35680K accept  1   1:03 15.97% /usr/local/bin
   12 root      -92    -     0K   224K WAIT    0 239:00 11.67% [intr{irq262:
10691 root       29    0 86728K 35716K accept  1   1:03 11.38% /usr/local/bin
 5449 root       46    0 82632K 28476K accept  1   0:05  9.57% /usr/local/bin
41372 root       42    0 86728K 35724K accept  1   2:20  7.76% /usr/local/bin
49355 root       74    0 86824K 33060K RUN     0   0:27  7.28% php-fpm: pool
14204 root       74    0 86824K 33940K RUN     1   0:30  6.88% php-fpm: pool
54828 root       52    0 86824K 33828K piperd  0   0:27  6.40% php-fpm: pool
96397 root       52    0 86728K 33052K sysctl  0   0:04  6.40% /usr/local/bin
48718 root       52    0 82632K 28496K accept  0   0:36  5.37% /usr/local/bin
 2200 root       52    0 86824K 35476K piperd  1   0:35  5.08% php-fpm: pool
45926 root       25    0 17248K  6644K kqread  1   0:03  4.39% /usr/local/sbi
29995 nobody     22    0 11400K  4020K select  1   2:35  3.47% /usr/local/sbi
27091 root       26    0 82632K 28492K accept  0   0:09  2.49% /usr/local/bin
54198 admin-itp  20    0 11384K  2724K CPU1    1   0:00  0.59% top -aSH
[/code]
-
The output with truncated paths is completely useless.
-
Sorry
[code]
last pid: 78757;  load averages: 7.10, 7.19, 7.14  up 3+11:53:35  12:56:53
167 processes: 9 running, 132 sleeping, 26 waiting
CPU: 38.8% user, 0.0% nice, 44.4% system, 16.2% interrupt, 0.6% idle
Mem: 143M Active, 202M Inact, 119M Wired, 85M Buf, 1509M Free
Swap: 4096M Total, 4096M Free

  PID USERNAME  PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
26911 root       49    0 82632K 28484K accept  1   2:01 25.59% /usr/local/bin/php
   12 root      -92    -     0K   224K CPU0    0 241:42 21.78% [intr{irq262: em2:rx 0}]
90617 root       33    0 82632K 28484K CPU0    0   1:48 17.77% /usr/local/bin/php
   12 root      -92    -     0K   224K CPU0    0 238:40 14.06% [intr{irq259: em1:rx 0}]
14204 root       75    0 99112K 50224K RUN     1   1:40 11.47% php-fpm: pool lighty (php-fpm)
 2200 root       75    0   101M 50364K RUN     1   1:50 10.69% php-fpm: pool lighty (php-fpm)
93820 root       40    0 86728K 35728K accept  1   1:24 10.16% /usr/local/bin/php
54828 root       75    0   101M 51096K RUN     0   1:42  9.57% php-fpm: pool lighty (php-fpm)
49355 root       52    0 99112K 47252K nanslp  1   1:40  9.38% php-fpm: pool lighty (php-fpm)
 5449 root       27    0 82632K 28484K accept  1   1:40  7.76% /usr/local/bin/php
10691 root       33    0 86728K 35732K accept  1   1:26  5.66% /usr/local/bin/php
48718 root       52    0 82632K 28496K accept  1   1:55  1.46% /usr/local/bin/php
29995 nobody     21    0 11400K  4020K select  1   2:54  1.27% /usr/local/sbin/dnsmasq --all-servers --rebind-localhost-ok --stop-dns-rebind --dns-forward-max=5000 --cache-size=100
54198 admin-itp  21    0 11384K  2908K CPU1    1   0:11  1.17% top -aSH
89404 root       26    0 82632K 28488K accept  0   0:42  0.49% /usr/local/bin/php
   12 root      -92    -     0K   224K WAIT    0  12:20  0.39% [intr{irq260: em1:tx 0}]
47366 root       20    0 13148K  5484K kqread  0   0:06  0.20% /usr/local/sbin/lighttpd -f /var/etc/lighty-test-CaptivePortal.conf
[/code]
We also have a lot of these log entries; I don't know if it's related:
[code]
Sep 28 12:52:42 lighttpd[47366]: (connections.c.137) (warning) close: 21 Connection reset by peer
Sep 28 12:52:51 lighttpd[47366]: (connections.c.137) (warning) close: 13 Connection reset by peer
Sep 28 12:53:05 lighttpd[47366]: (connections.c.137) (warning) close: 13 Connection reset by peer
Sep 28 12:53:15 lighttpd[47366]: (connections.c.137) (warning) close: 15 Connection reset by peer
Sep 28 12:53:46 lighttpd[47366]: (connections.c.137) (warning) close: 15 Connection reset by peer
Sep 28 12:53:49 lighttpd[47366]: (connections.c.137) (warning) close: 18 Connection reset by peer
Sep 28 12:54:11 lighttpd[47366]: (connections.c.137) (warning) close: 18 Connection reset by peer
Sep 28 12:54:35 lighttpd[47366]: (connections.c.137) (warning) close: 21 Connection reset by peer
Sep 28 12:54:57 lighttpd[47366]: (connections.c.137) (warning) close: 14 Connection reset by peer
Sep 28 12:55:02 lighttpd[47366]: (connections.c.137) (warning) close: 16 Connection reset by peer
Sep 28 12:55:15 lighttpd[47366]: (connections.c.137) (warning) close: 14 Connection reset by peer
Sep 28 12:55:33 lighttpd[47366]: (connections.c.137) (warning) close: 14 Connection reset by peer
Sep 28 12:55:54 lighttpd[47366]: (connections.c.137) (warning) close: 13 Connection reset by peer
Sep 28 12:55:57 lighttpd[47366]: (connections.c.137) (warning) close: 14 Connection reset by peer
Sep 28 12:56:21 lighttpd[47366]: (connections.c.137) (warning) close: 22 Connection reset by peer
Sep 28 12:56:23 lighttpd[47366]: (connections.c.137) (warning) close: 18 Connection reset by peer
Sep 28 12:57:05 lighttpd[47366]: (connections.c.137) (warning) close: 16 Connection reset by peer
Sep 28 12:57:18 lighttpd[47366]: (connections.c.137) (warning) close: 14 Connection reset by peer
[/code]
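Those warnings can be counted to see whether they correlate with the load spikes. A minimal sketch using three of the log lines above; on the box you would feed the real log file instead of the here-string:

```shell
# Count "Connection reset by peer" warnings in a log excerpt.
# Sample lines are copied from the log paste above.
log='Sep 28 12:52:42 lighttpd[47366]: (connections.c.137) (warning) close: 21 Connection reset by peer
Sep 28 12:52:51 lighttpd[47366]: (connections.c.137) (warning) close: 13 Connection reset by peer
Sep 28 12:53:05 lighttpd[47366]: (connections.c.137) (warning) close: 13 Connection reset by peer'
printf '%s\n' "$log" | grep -c 'Connection reset by peer'
# -> 3
```

A steadily climbing rate here would fit clients (e.g. mobile devices behind the captive portal) dropping connections mid-request and forcing PHP to start over.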
-
Disable the Captive portal thing.
-
Do you think the platform is not powerful enough? (We have 100-150 users.)
-
Also might try
[code]
sysctl dev.em.0.debug=1
[/code]
from console.
-
karlzre - Apparently not in your case when idiots/broken mobile shit keeps hammering the CP over and over again. (Other than that, you told us exactly NOTHING about your "platform"). And yeah, there's also something very broken about the NICs. Did you enable some nonsense like polling there?
-
also might try
[code]
sysctl dev.em.0.debug=1
[/code]
from console

[code]
dev.em.0.debug: -1 -> -1
[/code]
Intel(R) Atom(TM) CPU N270 @ 1.60GHz
Current: 800 MHz, Max: 1600 MHz
2 CPUs: 1 package(s) x 1 core(s) x 2 HTT threads
Memory usage
13% of 2006 MB
SWAP usage
0% of 4096 MB
Disk usage
/ (ufs): 2% of 23G
/var/run (ufs in RAM): 3% of 3.4M
-
Did you enable some nonsense like polling there?
Enable device polling is disabled.
-
Intel(R) Atom(TM) CPU N270 @ 1.60GHz Current: 800 MHz, Max: 1600 MHz 2 CPUs: 1 package(s) x 1 core(s) x 2 HTT threads
This CPU is just an old slow POS, but that thing is running at half the frequency it's spec'ed for, with 100% CPU usage? :o Try fiddling with the powerd settings.
Also, post some information about the NICs.
-
This CPU is just old slow POS, but that thing is running at half the freq it's spec'ed for with 100% CPU usage? :o Try fiddling with the powerd settings.
Also, post some information about the NICs.
I have a bit more power behind my box and I've run into the same thing. Is powerd on minimum capable of causing this issue?
My machine gets loaded up with 60 Mbit of torrent/Netflix traffic (that is my connection speed) from time to time, and I see the [kernel{em0 taskq}] and [intr{swi5: fast taskq}] tasks taking up the majority of the available CPU. Polling is turned off.
This machine was built from spare parts that I had lying around; is it possible that I'm just running into a limitation of the hardware?
Intel C2D 2.13 GHz, Asus OEM shitty mobo, 3 GB RAM, Intel Pro/1000 Legacy (external), Realtek 8169 (internal)
-
I have a bit more power behind my box and Ive run into the same thing.. Is powerd on minimum capable of causing this issue?
You need to fiddle with whatever settings are available there. People here mentioned boxes where CPU is never run at full freq unless powerd is enabled somehow. Sure if set @minimum, that's a good candidate.
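For reference, on stock FreeBSD powerd is tuned from rc.conf; pfSense exposes the equivalent knobs in the GUI (the PowerD options under System > Advanced on 2.2.x, if memory serves). A sketch of the plain-FreeBSD equivalent; the flag values are illustrative assumptions, not a tested recommendation:

```shell
# /etc/rc.conf fragment (plain FreeBSD syntax; on pfSense, change the
# PowerD settings in the GUI instead of editing files by hand).
powerd_enable="YES"
# -a = mode on AC power, -b = mode on battery. "hiadaptive" ramps the clock
# up aggressively under load; "minimum" would pin the CPU at its lowest
# P-state, which matches a 1.6 GHz part stuck at 800 MHz as seen above.
powerd_flags="-a hiadaptive -b adaptive"
```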