High Load Troubleshooting - RCC-VE 2440 2.2.5



  • Hi,

    I read https://doc.pfsense.org/index.php/High_Load_Troubleshooting as I am trying to track down an elusive CPU usage problem on a new RCC-VE2440 / 2.2.5 build I'm using at home.  Once in a while after some as-yet-undetermined set of events, the CPU usage on the dashboard starts hovering around 20% and not dropping down. It will dip to 18, up to 21 etc but generally just stay "high" even when not under any load.

    I have tried looking at output from "top -aSH" as well as "systat -vmstat 1" and I honestly don't know how to track down the culprit (nothing stands out).

    I have PowerD enabled (HiAdaptive) and also the NUT package installed attached to an APC UPS via USB (I suspected NUT/USB drivers but even stopping the nut daemon and unplugging the USB cord the CPU usage remained high)

    Is there any more detail on how to "drill in" to what's causing this high-cpu issue? Do long-term detailed CPU monitoring? thanks



  • @luckman212:

    I have tried looking at output from "top -aSH" as well as "systat -vmstat 1" and I honestly don't know how to track down the culprit (nothing stands out).

    Post the output of those and we can tell you. Otherwise no way to even guess.



  • sorry- I had to wait for the problem to occur again
    here is some output:

    top -aSH

    last pid: 42450;  load averages:  0.10,  0.07,  0.02
    122 processes: 3 running, 93 sleeping, 26 waiting
    CPU:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
    Mem: 40M Active, 73M Inact, 150M Wired, 167M Buf, 3636M Free
    Swap: 8192M Total, 8192M Free
    
      PID USERNAME PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
       11 root     155 ki31     0K    32K RUN     1 179:29 100.00% [idle{idle: cpu1}]
       11 root     155 ki31     0K    32K CPU0    0 178:18 100.00% [idle{idle: cpu0}]
     3828 root      52    0   233M 32824K accept  1   0:00   0.59% php-fpm: pool lighty (php-fpm)
        0 root     -16    0     0K   288K swapin  1   0:36   0.00% [kernel{swapper}]
       12 root     -92    -     0K   416K WAIT    0   0:31   0.00% [intr{irq259: igb1:que}]
       12 root     -92    -     0K   416K WAIT    0   0:22   0.00% [intr{irq256: igb0:que}]
       15 root     -16    -     0K    16K -       0   0:13   0.00% [rand_harvestq]
        5 root     -16    -     0K    16K pftm    1   0:10   0.00% [pf purge]
       12 root     -60    -     0K   416K WAIT    0   0:10   0.00% [intr{swi4: clock}]
    22002 root      20    0 12456K  2172K select  0   0:09   0.00% /usr/local/sbin/apinger -c /var/etc/apinger.conf
    86962 root      20    0 18984K  2688K select  0   0:09   0.00% /usr/local/libexec/nut/usbhid-ups -a ups
    91411 root      20    0 18984K  2688K select  0   0:08   0.00% /usr/local/libexec/nut/usbhid-ups -a ups
       12 root     -92    -     0K   416K WAIT    1   0:08   0.00% [intr{irq257: igb0:que}]
       12 root     -92    -     0K   416K WAIT    1   0:08   0.00% [intr{irq260: igb1:que}]
    71277 root      20    0 55628K  6108K select  0   0:06   0.00% sshd: admin@pts/0 (sshd)
       16 root     -68    -     0K    64K -       1   0:05   0.00% [usb{usbus0}]
    50575 root      20    0 14540K  1992K select  0   0:04   0.00% /usr/sbin/powerd -b hadp -a hadp -n hadp
       12 root     -88    -     0K   416K WAIT    0   0:04   0.00% [intr{irq22: ehci0}]
    90883 root      20    0 18844K  2488K select  1   0:03   0.00% /usr/local/sbin/upsd -u root
    48457 root      52   20 17136K  2436K wait    0   0:03   0.00% /bin/sh /var/db/rrd/updaterrd.sh
    10588 root      20    0 28344K 18104K select  1   0:03   0.00% /usr/local/sbin/ntpd -g -c /var/etc/ntpd.conf -p /var/run/ntpd.pid{ntpd}
       20 root      16    -     0K    16K syncer  1   0:02   0.00% [syncer]
       16 root     -72    -     0K    64K -       1   0:02   0.00% [usb{usbus0}]
    27214 root      20    0 48692K  6036K kqread  0   0:02   0.00% /usr/local/sbin/lighttpd -f /var/etc/lighty-webConfigurator.conf
      244 root      44    0   229M 21852K kqread  0   0:02   0.00% php-fpm: master process (/usr/local/lib/php-fpm.conf) (php-fpm)
       12 root     -72    -     0K   416K WAIT    1   0:01   0.00% [intr{swi1: netisr 1}]
    74277 root      20    0 17476K  4056K pause   1   0:01   0.00% /bin/tcsh
    97371 uucp      20    0 18832K  2520K nanslp  0   0:01   0.00% /usr/local/sbin/upsmon ups@localhost
        7 root     -16    -     0K    16K psleep  0   0:01   0.00% [pagedaemon]
       14 root      -8    -     0K    48K -       0   0:01   0.00% [geom{g_event}]
       12 root     -88    -     0K   416K WAIT    1   0:01   0.00% [intr{irq268: ahci0}]
       16 root     -68    -     0K    64K -       0   0:01   0.00% [usb{usbus0}]
    17661 root      20    0 16804K  2200K bpf     0   0:01   0.00% /usr/local/sbin/filterlog -i pflog0 -p /var/run/filterlog.pid
       62 root      -8    -     0K    16K mdwait  1   0:01   0.00% [md0]
       18 root      20    -     0K    32K sdflus  0   0:01   0.00% [bufdaemon{/ worker}]
    38529 root      20    0 14532K  2228K select  0   0:01   0.00% /usr/local/sbin/radvd -p /var/run/radvd.pid -C /var/etc/radvd.conf -m syslog
    22306 root      20    0 28328K  3004K piperd  0   0:00   0.00% rrdtool -
    65334 root      20    0 14656K  2336K select  1   0:00   0.00% /usr/sbin/syslogd -s -c -c -l /var/dhcpd/var/run/log -P /var/run/syslog.pid -f /var/etc/syslog.conf
       18 root     -16    -     0K    32K psleep  0   0:00   0.00% [bufdaemon{bufdaemon}]
       19 root     -16    -     0K    16K vlruwt  0   0:00   0.00% [vnlru]
    84285 proxy     20    0 12448K  2124K kqread  0   0:00   0.00% /usr/sbin/ftp-proxy
        4 root     -16    -     0K    32K -       0   0:00   0.00% [cam{doneq0}]
    99065 unbound   20    0 47036K 14404K kqread  1   0:00   0.00% /usr/local/sbin/unbound -c /var/unbound/unbound.conf{unbound}
    90103 root      20    0 21988K  3232K CPU1    1   0:00   0.00% top -aSH
    99065 unbound   20    0 47036K 14404K kqread  1   0:00   0.00% /usr/local/sbin/unbound -c /var/unbound/unbound.conf{unbound}
        0 root       8    0     0K   288K -       1   0:00   0.00% [kernel{thread taskq}]
    27598 root      20    0 16664K  2300K nanslp  0   0:00   0.00% /usr/sbin/cron -s
     2007 root      20    0 14560K  2184K nanslp  0   0:00   0.00% /usr/libexec/getty al.Pc ttyv0
        9 root     -16 ki-1     0K    16K pollid  0   0:00   0.00% [idlepoll]
        4 root     -16    -     0K    32K -       0   0:00   0.00% [cam{scanner}]
        0 root       8    0     0K   288K -       0   0:00   0.00% [kernel{ffs_trim taskq}]
        1 root      52    0  9112K   792K wait    1   0:00   0.00% [init]
      276 root      20    0 13160K  4476K select  0   0:00   0.00% /sbin/devd -q
    
    

    systat -vmstat 1

        2 users    Load  0.08  0.09  0.03                  Dec  8 01:04
    
    Mem:KB    REAL            VIRTUAL                       VN PAGER   SWAP PAGER
            Tot   Share      Tot    Share    Free           in   out     in   out
    Act  125912    9412  1200760    10336 3713544  count            
    All  205768   14056  1261620    59092          pages            
    Proc:                                                            Interrupts
      r   p   d   s   w   Csw  Trp  Sys  Int  Sof  Flt        ioflt  1645 total
      1          51      1688 4850 8767   98   26 5786    915 cow         uart1 3
                                                         3299 zfod     12 ehci0 22
    19.8%Sys   0.4%Intr 17.1%User  0.0%Nice 62.8%Idle         ozfod  1128 cpu0:timer
    |    |    |    |    |    |    |    |    |    |           %ozfod    35 igb0:que 0
    ==========>>>>>>>>>                                       daefr    10 igb0:que 1
                                            31 dtbuf     1758 prcfr       igb0:link
    Namei     Name-cache   Dir-cache     85468 desvn     3227 totfr    31 igb1:que 0
       Calls    hits   %    hits   %      2778 numvn          react    11 igb1:que 1
        4545    4535 100       2   0       492 frevn          pdwak       igb1:link
                                                           17 pdpgs       ahci0 268
    Disks   md0   da0  ada0 pass0 pass1                       intrn   418 cpu1:timer
    KB/t   0.00  0.00  0.00  0.00  0.00                158452 wire
    tps      12     0     0     0     0                 44712 act
    MB/s   0.00  0.00  0.00  0.00  0.00                 76524 inact
    %busy     0     0     0     0     0                       cache
                                                      3713560 free
                                                       175472 buf
    
    


  • I think it's hard to capture what I'm seeing here in a screen full of text.  So I made a small screen recording showing the dashboard with a systat -vmstat 0.2 running on top of it.  You can see there are these quick "surges" of activity and the dashboard CPU stays pretty steady around 20% even though the system is 99%+ idle.  This is what I am trying to track down if you have any other ideas

    pfsense225.mp4.zip



  • After 24h the Processor RRD shows almost no CPU usage…

    even though dashboard shows high load avgs … 0.11, 0.16, 0.10

    mem usage seems normal, state table very small etc.
    Not sure what's going on??


  • Banned

    I think the dashboard is a CPU load generator, not exactly something usable for serious monitoring.



  • Yes I know that the php scripts etc on the dashboard cause some load but 20% continuous is not something I'm seeing on the other systems I've set up.  Also, I can observe the same cpu spikes with systat (see video above) so something is definitely going on here.



  • 0.1 is not a high load average at all, that's very low. Anything under 1 is generally normal, and values above that can be specious.

    Brief small spikes of 20% that are so trivial the 1 minute average is under 0.5% total CPU usage are also normal. The RRD stats gathering for instance will chew a bit of CPU for a fraction of a second once a minute to pull the stats. Other things will do similar depending on what you're running. You're not seeing 20% continuous. Everything there looks perfectly normal.



  • It's weird though because of many many other systems that I've set up, I've not seen one that idles at 20% on the dashboard. They are all at 0-2%.  I understand spikes to 20% here and there when pages are loaded or background tasks are run but to have it sit steadily at that level is something I have not seen.  Where is the code that generates the load % on the dashboard? Maybe somehow it's not giving accurate readings? I mean in this case if you look at the RRD graph it certainly doesn't seem to correlate with what the dash is showing…



  • Maybe your dashboard widget isn't actually updating? The initial load is always higher but falls off quickly as it updates. If it stays at exactly the same value, it's definitely not updating, which would be your browser's AJAX. Try clearing cache, or a diff browser or machine, see if that's any different.



  • I had thought that too but if you look at the video I posted, you can see the dash value is actually updating, it's just staying steadily elevated. I think I am missing the FreeBSD mojo to dig into what is causing this… interrupts??


Log in to reply