pfSense Slow and High CPU



  • Hi,

    We have a pfSense 2.2.4-RELEASE (i386)
    built on Sat Jul 25 19:56:41 CDT 2015
    FreeBSD 10.1-RELEASE-p15

    The CPU is running at 100%.

    Here is the output of top -S -H:
    
    last pid: 35567;  load averages:  5.68,  6.43,  6.58                                        up 3+10:16:58  11:20:16
    165 processes: 6 running, 131 sleeping, 28 waiting
    CPU: 49.2% user,  0.0% nice, 18.1% system, 26.5% interrupt,  6.2% idle
    Mem: 70M Active, 207M Inact, 116M Wired, 89M Buf, 1578M Free
    Swap: 4096M Total, 4096M Free
    
      PID USERNAME   PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
    34796 root        85    0 14640K  9044K RUN     1   0:20  41.70% pfctl
       12 root       -92    -     0K   224K WAIT    0 217:21  32.57% intr{irq259: em1:rx 0}
       12 root       -92    -     0K   224K WAIT    1 222:34  21.58% intr{irq262: em2:rx 0}
    79377 root        52    0 82632K 28484K accept  1   0:23  18.46% php
    29027 root        34    0 82632K 28492K accept  1   0:28  12.26% php
    98220 root        47    0 82632K 28484K accept  0   2:04  10.16% php
    54636 root        34    0 82632K 28488K accept  0   1:27   9.28% php
    24411 root        37    0 86728K 36216K accept  1   1:55   9.18% php
     4965 root        39    0 82632K 28496K accept  1   0:35   7.08% php
    60367 root        20    0 86728K 35704K CPU0    0   2:09   6.79% php
    59102 root        25    0 82632K 28484K RUN     0   0:25   5.08% php
    29995 nobody      21    0 11400K  3908K select  1   0:35   3.27% dnsmasq
       11 root       155 ki31     0K    16K RUN     0  41.2H   1.56% idle{idle: cpu0}
     8330 root        31    0 86728K 36016K accept  0   1:16   1.56% php
       11 root       155 ki31     0K    16K RUN     1  42.7H   1.46% idle{idle: cpu1}
       12 root       -92    -     0K   224K WAIT    1  11:07   0.88% intr{irq260: em1:tx 0}
    85824 root        44    0 86728K 36256K accept  1   2:51   0.88% php
    
    systat -vmstat 1
     2 users    Load  4.87  5.05  5.78                  Sep 28 11:28
    
    Mem:KB    REAL            VIRTUAL                       VN PAGER   SWAP PAGER
            Tot   Share      Tot    Share    Free           in   out     in   out
    Act  293116   15968  1327504    20168 1603584  count
    All  358248   30264  3676572   172716          pages
    Proc:                                                            Interrupts
      r   p   d   s   w   Csw  Trp  Sys  Int  Sof  Flt        ioflt 10836 total
      4          63       13k 1135 2497 8625   77 2190        cow       4 em5 uhci2
                                                          337 zfod    125 em4 uhci1
    25.6%Sys  36.3%Intr 33.2%User  5.0%Nice  0.0%Idle         ozfod       uhci0 ehci
    |    |    |    |    |    |    |    |    |    |           %ozfod  1115 cpu0:timer
    =============++++++++++++++++++>>>>>>>>>>>>>>>>>--        daefr  2464 em1:rx 0
                                            36 dtbuf          prcfr  2073 em1:tx 0
    Namei     Name-cache   Dir-cache    111445 desvn        1 totfr       em1:link
       Calls    hits   %    hits   %      1471 numvn          react  1857 em2:rx 0
         885     885 100                   229 frevn          pdwak  2076 em2:tx 0
                                                           35 pdpgs       em2:link
    Disks   md0  ada0  ada1 pass0 pass1                       intrn       ahci0 266
    KB/t   0.00  0.00  0.00  0.00  0.00                120160 wire   1122 cpu1:timer
    tps       1     0     0     0     0                 88296 act
    MB/s   0.00  0.00  0.00  0.00  0.00                207240 inact
    %busy     0     0     0     0     0                       cache
                                                      1603700 free
    
    netstat -m
    
    3688/2642/6330 mbufs in use (current/cache/total)
    3673/1647/5320/26584 mbuf clusters in use (current/cache/total/max)
    3673/1640 mbuf+clusters out of packet secondary zone in use (current/cache)
    0/44/44/13291 4k (page size) jumbo clusters in use (current/cache/total/max)
    0/0/0/3938 9k jumbo clusters in use (current/cache/total/max)
    0/0/0/2215 16k jumbo clusters in use (current/cache/total/max)
    8302K/4130K/12433K bytes allocated to network (current/cache/total)
    0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
    0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
    0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
    0/0/0 requests for jumbo clusters denied (4k/9k/16k)
    0/11/6656 sfbufs in use (current/peak/max)
    0 requests for sfbufs denied
    0 requests for sfbufs delayed
    0 requests for I/O initiated by sendfile
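One thing worth noting in the netstat -m output above: the "denied" and "delayed" counters are all zero, so the box is not running out of mbufs, and packet-buffer exhaustion can be ruled out. A quick sketch of that check, demonstrated on two sample lines from the output above (the real output can be piped in with `netstat -m | awk '...'`):

```shell
# Scan netstat -m "denied"/"delayed" lines; any whose leading count
# is not zero would indicate packet-buffer exhaustion.
printf '%s\n' \
  '0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)' \
  '0 requests for sfbufs denied' |
awk '/denied|delayed/ && $1 !~ /^0(\/0\/0)?$/ { hit = 1; print }
     END { if (!hit) print "no denials recorded" }'
# -> no denials recorded
```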
    

  • Banned

    0/ Please use the CODE bbcode (the fine # button) to post console output in a readable form.
    1/ Post the output of

    ps auxw | grep '[p]hp'
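Side note on that pattern: the brackets in `[p]hp` are a common grep idiom. The pattern still matches the text "php" in the process list, but the grep process's own command line contains the literal string "[p]hp", which the pattern does not match, so grep filters itself out of the output:

```shell
# The pattern [p]hp matches the literal text "php", but not the
# string "[p]hp" in grep's own command line, so grep does not
# report itself when filtering a ps listing.
printf 'root 123 /usr/local/bin/php\nroot 456 grep [p]hp\n' | grep '[p]hp'
# -> root 123 /usr/local/bin/php
```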
    


  • Sorry about the code formatting.

    Here is the output:

    ps auxw | grep '[p]hp'
    root      54636 18.2  1.4 82632 28488  -  S    10:50AM    2:34.58 /usr/local/bin/php
    root      79377 17.1  1.4 82632 28484  -  R    11:15AM    2:05.14 /usr/local/bin/php
    root      85143 15.0  1.4 82632 28480  -  S    11:31AM    0:32.80 /usr/local/bin/php
    root      15517 10.8  1.7 86728 35704  -  S    11:25AM    1:27.37 /usr/local/bin/php
    root      77242  7.2  1.6 86728 33032  -  D    11:37AM    0:04.15 /usr/local/bin/php -f /etc/rc.prunecaptiveportal test
    root      56754  7.0  1.4 82632 28488  -  S    10:57AM    1:19.90 /usr/local/bin/php
    root      70124  7.0  1.8 86824 36256  -  R    11:31AM    0:22.09 php-fpm: pool lighty (php-fpm)
    root      39322  6.8  1.5 86824 30476  -  R    11:31AM    0:21.20 php-fpm: pool lighty (php-fpm)
    root      80658  6.6  1.7 86824 35824  -  S    11:31AM    0:22.52 php-fpm: pool lighty (php-fpm)
    root       8330  3.6  1.8 86728 36020  -  S    10:32AM    1:43.94 /usr/local/bin/php
    root      85706  3.5  1.4 82632 28476  -  S    11:37AM    0:06.11 /usr/local/bin/php
    root      26757  3.2  1.4 82632 28480  -  S    11:26AM    0:44.44 /usr/local/bin/php
    root      20387  0.8  1.5 86824 31508  -  S    11:31AM    0:25.87 php-fpm: pool lighty (php-fpm)
    root      77335  0.4  1.4 82632 28476  -  S    11:37AM    0:06.43 /usr/local/bin/php
    root      12429  0.1  1.4 82632 28484  -  I    11:30AM    0:06.67 /usr/local/bin/php
    root      87715  0.1  1.4 82632 28484  -  I    10:53AM    0:39.54 /usr/local/bin/php
    root        253  0.0  0.9 78632 18512  -  Ss   Fri01AM    0:21.82 php-fpm: master process (/usr/local/lib/php-fpm.conf) (php-fpm)
    root       4965  0.0  1.4 82632 28496  -  I    11:12AM    0:57.50 /usr/local/bin/php
    root      24411  0.0  1.8 86728 36220  -  I    10:32AM    2:14.04 /usr/local/bin/php
    root      56709  0.0  0.1 10076  1704  -  S    Fri01AM    0:01.14 minicron: helper /usr/local/sbin/gmirror_status_check.php  (minic
    root      80479  0.0  1.7 86728 35716  -  I    10:49AM    0:38.18 /usr/local/bin/php
    root      82250  0.0  1.0 78536 20400  -  Is    9:37AM    0:00.37 /usr/local/bin/php
    root      83172  0.0  1.0 78536 20400  -  Is    9:37AM    0:00.39 /usr/local/bin/php
    root      83392  0.0  1.0 78536 20400  -  Is    9:37AM    0:00.40 /usr/local/bin/php
    root      83704  0.0  1.0 78536 20400  -  Is    9:37AM    0:00.40 /usr/local/bin/php
    root      83948  0.0  1.0 78536 20400  -  Is    9:37AM    0:00.42 /usr/local/bin/php
    root      84168  0.0  1.0 78536 20400  -  Is    9:37AM    0:00.44 /usr/local/bin/php
    root      84514  0.0  1.8 86728 36268  -  I     9:37AM    2:44.40 /usr/local/bin/php
    root      84694  0.0  1.8 86728 35960  -  I     9:37AM    2:29.28 /usr/local/bin/php
    root      84917  0.0  1.8 86728 36208  -  I     9:37AM    1:35.46 /usr/local/bin/php
    root      85437  0.0  1.7 86728 35724  -  I     9:37AM    2:37.51 /usr/local/bin/php
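To put a number on how much CPU those PHP processes eat in aggregate, the %CPU column (third field of ps aux output) can be summed with awk. Demonstrated here on two sample lines from the listing above; the same filter can be fed the live listing with `ps auxw | awk ...`:

```shell
# Sum the %CPU (field 3) of every php line in a ps aux-style listing.
printf '%s\n' \
  'root 54636 18.2 1.4 82632 28488 - S 10:50AM 2:34.58 /usr/local/bin/php' \
  'root 79377 17.1 1.4 82632 28484 - R 11:15AM 2:05.14 /usr/local/bin/php' |
awk '/[p]hp/ { total += $3 } END { printf "php total: %.1f%%\n", total }'
# -> php total: 35.3%
```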
    
    


  • Any chance you are getting attacked? What do the RRD graphs show for em1 & em2? Are those WANs or LANs?



  • em1 = WAN (Colt fiber)
    em2 = customer network


  • Banned

    Well, you have a shitton of PHP processes running for god knows what reason.

    • Restart webConfigurator from console
    • Restart PHP-FPM from console
    • Limit the GUI processes if not done yet (System - Advanced - Max Processes).

    Also look at the dashboard to check whether the GW monitor is showing some totally BS figures, like 109% packet loss. If it does, kill apinger.



    • Restart webConfigurator from console

    What's the right command to do it?

    • Restart PHP-FPM from console

    What's the right command to do it?

    • Limit the GUI processes if not done yet (System - Advanced - Max Processes).

    It's set to 2; do I have to change it?


  • Banned

    The right command is to use the fine console menu that's there (both via SSH and on the physical console).



  • Hi,

    I've restarted both, but the CPU is still at 100%.

    last pid: 70334;  load averages:  7.58,  6.90,  7.28    up 3+11:37:44  12:41:02
    172 processes: 9 running, 135 sleeping, 28 waiting
    CPU: 38.2% user,  0.2% nice, 34.7% system, 26.9% interrupt,  0.0% idle
    Mem: 111M Active, 211M Inact, 119M Wired, 74M Buf, 1531M Free
    Swap: 4096M Total, 4096M Free
    
      PID USERNAME   PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
       12 root       -92    -     0K   224K WAIT    0 235:41  24.27% [intr{irq259:
    49598 root        45    0 86728K 35944K RUN     0   2:07  19.68% /usr/local/bin
    93820 root        39    0 86728K 35680K accept  1   1:03  15.97% /usr/local/bin
       12 root       -92    -     0K   224K WAIT    0 239:00  11.67% [intr{irq262:
    10691 root        29    0 86728K 35716K accept  1   1:03  11.38% /usr/local/bin
     5449 root        46    0 82632K 28476K accept  1   0:05   9.57% /usr/local/bin
    41372 root        42    0 86728K 35724K accept  1   2:20   7.76% /usr/local/bin
    49355 root        74    0 86824K 33060K RUN     0   0:27   7.28% php-fpm: pool
    14204 root        74    0 86824K 33940K RUN     1   0:30   6.88% php-fpm: pool
    54828 root        52    0 86824K 33828K piperd  0   0:27   6.40% php-fpm: pool
    96397 root        52    0 86728K 33052K sysctl  0   0:04   6.40% /usr/local/bin
    48718 root        52    0 82632K 28496K accept  0   0:36   5.37% /usr/local/bin
     2200 root        52    0 86824K 35476K piperd  1   0:35   5.08% php-fpm: pool
    45926 root        25    0 17248K  6644K kqread  1   0:03   4.39% /usr/local/sbi
    29995 nobody      22    0 11400K  4020K select  1   2:35   3.47% /usr/local/sbi
    27091 root        26    0 82632K 28492K accept  0   0:09   2.49% /usr/local/bin
    54198 admin-itp   20    0 11384K  2724K CPU1    1   0:00   0.59% top -aSH
    
    

  • Banned

    The output with truncated paths is completely useless.



  • Sorry

    last pid: 78757;  load averages:  7.10,  7.19,  7.14                                                                                                           up 3+11:53:35  12:56:53
    167 processes: 9 running, 132 sleeping, 26 waiting
    CPU: 38.8% user,  0.0% nice, 44.4% system, 16.2% interrupt,  0.6% idle
    Mem: 143M Active, 202M Inact, 119M Wired, 85M Buf, 1509M Free
    Swap: 4096M Total, 4096M Free
    
      PID USERNAME   PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
    26911 root        49    0 82632K 28484K accept  1   2:01  25.59% /usr/local/bin/php
       12 root       -92    -     0K   224K CPU0    0 241:42  21.78% [intr{irq262: em2:rx 0}]
    90617 root        33    0 82632K 28484K CPU0    0   1:48  17.77% /usr/local/bin/php
       12 root       -92    -     0K   224K CPU0    0 238:40  14.06% [intr{irq259: em1:rx 0}]
    14204 root        75    0 99112K 50224K RUN     1   1:40  11.47% php-fpm: pool lighty (php-fpm)
     2200 root        75    0   101M 50364K RUN     1   1:50  10.69% php-fpm: pool lighty (php-fpm)
    93820 root        40    0 86728K 35728K accept  1   1:24  10.16% /usr/local/bin/php
    54828 root        75    0   101M 51096K RUN     0   1:42   9.57% php-fpm: pool lighty (php-fpm)
    49355 root        52    0 99112K 47252K nanslp  1   1:40   9.38% php-fpm: pool lighty (php-fpm)
     5449 root        27    0 82632K 28484K accept  1   1:40   7.76% /usr/local/bin/php
    10691 root        33    0 86728K 35732K accept  1   1:26   5.66% /usr/local/bin/php
    48718 root        52    0 82632K 28496K accept  1   1:55   1.46% /usr/local/bin/php
    29995 nobody      21    0 11400K  4020K select  1   2:54   1.27% /usr/local/sbin/dnsmasq --all-servers --rebind-localhost-ok --stop-dns-rebind --dns-forward-max=5000 --cache-size=100
    54198 admin-itp   21    0 11384K  2908K CPU1    1   0:11   1.17% top -aSH
    89404 root        26    0 82632K 28488K accept  0   0:42   0.49% /usr/local/bin/php
       12 root       -92    -     0K   224K WAIT    0  12:20   0.39% [intr{irq260: em1:tx 0}]
    47366 root        20    0 13148K  5484K kqread  0   0:06   0.20% /usr/local/sbin/lighttpd -f /var/etc/lighty-test-CaptivePortal.conf
    

    We also get a lot of these log entries; I don't know if it's related:

    Sep 28 12:52:42	lighttpd[47366]: (connections.c.137) (warning) close: 21 Connection reset by peer
    Sep 28 12:52:51	lighttpd[47366]: (connections.c.137) (warning) close: 13 Connection reset by peer
    Sep 28 12:53:05	lighttpd[47366]: (connections.c.137) (warning) close: 13 Connection reset by peer
    Sep 28 12:53:15	lighttpd[47366]: (connections.c.137) (warning) close: 15 Connection reset by peer
    Sep 28 12:53:46	lighttpd[47366]: (connections.c.137) (warning) close: 15 Connection reset by peer
    Sep 28 12:53:49	lighttpd[47366]: (connections.c.137) (warning) close: 18 Connection reset by peer
    Sep 28 12:54:11	lighttpd[47366]: (connections.c.137) (warning) close: 18 Connection reset by peer
    Sep 28 12:54:35	lighttpd[47366]: (connections.c.137) (warning) close: 21 Connection reset by peer
    Sep 28 12:54:57	lighttpd[47366]: (connections.c.137) (warning) close: 14 Connection reset by peer
    Sep 28 12:55:02	lighttpd[47366]: (connections.c.137) (warning) close: 16 Connection reset by peer
    Sep 28 12:55:15	lighttpd[47366]: (connections.c.137) (warning) close: 14 Connection reset by peer
    Sep 28 12:55:33	lighttpd[47366]: (connections.c.137) (warning) close: 14 Connection reset by peer
    Sep 28 12:55:54	lighttpd[47366]: (connections.c.137) (warning) close: 13 Connection reset by peer
    Sep 28 12:55:57	lighttpd[47366]: (connections.c.137) (warning) close: 14 Connection reset by peer
    Sep 28 12:56:21	lighttpd[47366]: (connections.c.137) (warning) close: 22 Connection reset by peer
    Sep 28 12:56:23	lighttpd[47366]: (connections.c.137) (warning) close: 18 Connection reset by peer
    Sep 28 12:57:05	lighttpd[47366]: (connections.c.137) (warning) close: 16 Connection reset by peer
    Sep 28 12:57:18	lighttpd[47366]: (connections.c.137) (warning) close: 14 Connection reset by peer
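    Those lighttpd warnings just mean clients (typically captive-portal devices) dropped their connections mid-request. To judge whether the rate is unusual, the warnings can be bucketed per minute with awk; shown here on three sample lines from the log above, but the same one-liner works on the full log file:

```shell
# Bucket "Connection reset by peer" warnings by hour:minute
# (field 3 of the syslog line is the HH:MM:SS timestamp).
printf '%s\n' \
  'Sep 28 12:52:42 lighttpd[47366]: (warning) close: 21 Connection reset by peer' \
  'Sep 28 12:52:51 lighttpd[47366]: (warning) close: 13 Connection reset by peer' \
  'Sep 28 12:53:05 lighttpd[47366]: (warning) close: 13 Connection reset by peer' |
awk '/Connection reset by peer/ { split($3, t, ":"); n[t[1] ":" t[2]]++ }
     END { for (m in n) print m, n[m] }' | sort
# -> 12:52 2
#    12:53 1
```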
    

  • Banned

    Disable the Captive portal thing.



  • Do you think the platform is not powerful enough? (We have 100-150 users.)



  • Also might try, from the console:

    sysctl dev.em.0.debug=1

  • Banned

    karlzre - Apparently not in your case, when idiots/broken mobile shit keeps hammering the CP over and over again. (Other than that, you told us exactly NOTHING about your "platform.") And yeah, there's also something very broken about the NICs. Did you enable some nonsense like polling there?



  • also might try
    sysctl dev.em.0.debug=1
    from console

    dev.em.0.debug: -1 -> -1
    
    
    Intel(R) Atom(TM) CPU N270 @ 1.60GHz
    Current: 800 MHz, Max: 1600 MHz
    2 CPUs: 1 package(s) x 1 core(s) x 2 HTT threads
    

    Memory usage
    13% of 2006 MB
    SWAP usage
    0% of 4096 MB
    Disk usage
    / (ufs): 2% of 23G
    /var/run (ufs in RAM): 3% of 3.4M



  • Did you enable some nonsense like polling there?

    The "Enable device polling" option is disabled.


  • Banned

    @karlzre:

    Intel(R) Atom(TM) CPU N270 @ 1.60GHz
    Current: 800 MHz, Max: 1600 MHz
    2 CPUs: 1 package(s) x 1 core(s) x 2 HTT threads
    

    This CPU is just an old, slow POS, but that thing is running at half the frequency it's spec'd for, with 100% CPU usage?  :o Try fiddling with the powerd settings.

    Also, post some information about the NICs.



  • @doktornotor:

    This CPU is just an old, slow POS, but that thing is running at half the frequency it's spec'd for, with 100% CPU usage?  :o Try fiddling with the powerd settings.

    Also, post some information about the NICs.

    I have a bit more power behind my box and I've run into the same thing. Is powerd on minimum capable of causing this issue?

    My machine gets loaded up with 60 Mbit/s of torrent/Netflix traffic (that is my connection speed) from time to time, and I see the [kernel{em0 taskq}] and [intr{swi5: fast taskq}] tasks taking up the majority of the available CPU. Polling is turned off.

    This machine was built from spare parts I had lying around; is it possible that I'm just running into a hardware limitation?

    Core 2 Duo 2.13 GHz, shitty Asus OEM mobo, 3 GB RAM, Intel Pro/1000 legacy (external), Realtek 8169 (internal)


  • Banned

    @SplineZ:

    I have a bit more power behind my box and I've run into the same thing. Is powerd on minimum capable of causing this issue?

    You need to fiddle with whatever settings are available there. People here have mentioned boxes where the CPU never runs at full frequency unless powerd is enabled somehow. Sure, if it's set to minimum, that's a good candidate.



  • You may also want to look at https://forum.pfsense.org/index.php?topic=59028.0 to remove those messages from the log and help cut down the noise. It looks like they may not mean anything.



  • @doktornotor:

    @SplineZ:

    I have a bit more power behind my box and I've run into the same thing. Is powerd on minimum capable of causing this issue?

    You need to fiddle with whatever settings are available there. People here have mentioned boxes where the CPU never runs at full frequency unless powerd is enabled somehow. Sure, if it's set to minimum, that's a good candidate.

    Interestingly enough, I switched it to adaptive and the overall load on the machine has dropped a HUGE amount, let's say from 6.0 to under 0.5-0.7. Kind of ironic that the feature meant to save energy is actually causing an increase (in this situation)…

    Seriously: turn on the "minimum" profile and max out your connection, and your load will rise slowly; turn it to "adaptive" and watch it drop almost immediately. [kernel{em0 taskq}] is the task using all the CPU.

    
    With "Minimum" power setting on powerd
    
    last pid: 11607;  load averages:  3.36,  1.79,  0.99 
      PID USERNAME      PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
        0 root          -92    0     0K   192K CPU0    0 647:46  79.49% [kernel{em0 taskq}]
       12 root          -56    -     0K   272K WAIT    1 432:38  51.46% [intr{swi5: fast taskq}]
    87987 root          100   20 25160K  6464K CPU0    0   0:04  26.27% /sbin/pfctl -ss
       11 root          155 ki31     0K    32K RUN     1 390.8H  15.97% [idle{idle: cpu1}]
       11 root          155 ki31     0K    32K RUN     0 385.3H  11.18% [idle{idle: cpu0}]
    
    With "HiAdaptive" power setting on powerd
    
    last pid: 61673;  load averages:  0.53,  1.23,  0.89   
         PID USERNAME      PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
       11 root          155 ki31     0K    32K RUN     1 390.9H  97.56% [idle{idle: cpu1}]
       11 root          155 ki31     0K    32K CPU0    0 385.3H  83.98% [idle{idle: cpu0}]
        0 root          -92    0     0K   192K -       0 648:19  10.16% [kernel{em0 taskq}]
       12 root          -56    -     0K   272K WAIT    1 433:00   5.66% [intr{swi5: fast taskq}]
       12 root          -92    -     0K   272K WAIT    1  58:56   0.98% [intr{irq19: em0 uhci1}]
       12 root          -92    -     0K   272K WAIT    0  40:54   0.78% [intr{irq16: re0 uhci3}]
    
    

    No other changes were made on the machine other than fiddling with the powerd settings. I never would have thought to look there. Kudos and thanks!



  • You're misunderstanding what's being reported. Load is relative. If your CPU is running at 100 MHz, it's not hard to reach 75% usage. The reason "load" is high when using minimum is that the CPU is being down-clocked; you are saving power.
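The arithmetic behind this is worth spelling out: the packet-forwarding workload needs a roughly fixed number of CPU cycles per second regardless of clock speed, so reported utilization scales inversely with frequency. A toy calculation (the 600 MHz "demand" figure is made up purely for illustration):

```shell
# Fixed demand vs. two clock speeds: utilization = demand / capacity,
# so halving the clock doubles the reported busy percentage.
awk 'BEGIN {
  demand = 600                       # "MHz worth" of work the traffic needs
  split("1600 800", mhz, " ")
  for (i = 1; i <= 2; i++)
    printf "at %s MHz: %.1f%% busy\n", mhz[i], 100 * demand / mhz[i]
}'
# -> at 1600 MHz: 37.5% busy
#    at 800 MHz: 75.0% busy
```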

