/usr/local/bin/rate taking 100% of CPU



  • Hi,

    I am using the latest pfSense on a Netgate SG-4860. I have a 10/1 DSL connection, and I am trying to upload some files to a cloud file storage provider. The cloud uploader seems to saturate the link, and I think it is causing pfSense to stall. Once the stall happens, pfSense is very sluggish to respond to queries. Here is the output of the system activity when the stall occurs:

    last pid: 89803; load averages: 2.08, 2.07, 2.07 up 6+00:51:40 10:19:35
    220 processes: 9 running, 167 sleeping, 44 waiting

    Mem: 48M Active, 171M Inact, 802M Wired, 523M Buf, 6865M Free
    Swap: 16G Total, 16G Free
    
    
      PID USERNAME PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
     5992 root     103    0  7004K  2492K CPU0    0 119.9H 100.00% /usr/local/bin/rate -i igb1 -nlq 1 -Aba 20
    92169 root     103    0  7004K  2492K CPU2    2  95.9H 100.00% /usr/local/bin/rate -i igb1 -nlq 1 -Aba 20
       11 root     155 ki31     0K    64K RUN     2 105.1H  71.29% [idle{idle: cpu2}]
       11 root     155 ki31     0K    64K RUN     3  86.6H  56.05% [idle{idle: cpu3}]
       11 root     155 ki31     0K    64K RUN     1  78.3H  43.99% [idle{idle: cpu1}]
       11 root     155 ki31     0K    64K RUN     0  90.7H  42.19% [idle{idle: cpu0}]
    85332 unbound   21    0 69152K 49000K CPU1    1   0:22   1.07% /usr/local/sbin/unbound -c /var/unbound/un
    85332 unbound   20    0 69152K 49000K kqread  3   0:08   0.68% /usr/local/sbin/unbound -c /var/unbound/un
    84909 root      52    0 98660K 41072K select  2   0:34   0.29% php-fpm: pool nginx (php-fpm){php-fpm}
    32840 root      20    0 96480K 41288K piperd  3   0:37   0.10% php-fpm: pool nginx (php-fpm)
       12 root     -60    -     0K   704K WAIT    0   5:01   0.00% [intr{swi4: clock (0)}]
    23665 root      20    0 51316K 38252K nanslp  3   2:49   0.00% /usr/local/bin/php -f /usr/local/pkg/pfblo
       27 root      16    -     0K    16K syncer  2   1:57   0.00% [syncer]
       20 root     -16    -     0K    16K -       3   1:36   0.00% [rand_harvestq]
    43804 root      52   20  6968K  2884K wait    1   1:32   0.00% /bin/sh /var/db/rrd/updaterrd.sh
    94164 root      24    0 96476K 39932K piperd  0   1:31   0.00% php-fpm: pool nginx (php-fpm)
       19 root     -16    -     0K    16K RUN     3   1:19   0.00% [pf purge]
    39493 root      20    0  6600K  2632K bpf     3   1:11   0.00% /usr/local/sbin/filterlog -i pflog0 -p /va
    

    The high load from the /usr/local/bin/rate processes continues even if I stop the upload. I have had to reboot pfSense to recover; after a reboot, things come back to normal.

    Does anyone have any idea about what might be going on and how I can mitigate this?

    Thanks,

    Tom


  • Galactic Empire

    What packages are you running? I don't see the rate command running.



  • The /usr/local/bin/rate commands are in the first two lines of the top output, both pegged at 100%.

    I have pfBlockerNG, Snort and Status_Traffic_Totals. I had the 'Traffic Graph' window open and was looking at the upload speeds when the problem occurred.


  • Netgate Administrator

    I would disable those packages until it stops happening. Given what 'rate' does I'm most suspicious of the traffic totals package. It could also have been the data on the traffic graphs page calling it though.

    Steve


  • Galactic Empire

    @stephenw10 said in /usr/local/bin/rate taking 100% of CPU:

    I would disable those packages until it stops happening. Given what 'rate' does I'm most suspicious of the traffic totals package. It could also have been the data on the traffic graphs page calling it though.

    Steve

    I run pfBlockerNG, Snort and Status_Traffic_Totals, and I've never seen the rate command being run. Odd.


  • Netgate Administrator

    Yeah, I do too. There is obviously something additional at play here.

    The rate command getting stuck at 100% seems to occasionally pop up for some users. I don't think we've ever been able to replicate it.

    Steve
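
A minimal sketch, in Python, of how the stuck processes could be picked out of `top` output like the listing in the first post. The helper names (`cpu_hours`, `find_runaway`) and the thresholds are assumptions for illustration, not part of pfSense, and the column layout is assumed to match the listing above. It only flags candidates; the thread does not establish whether stopping them avoids the reboot.

```python
def cpu_hours(time_field):
    """Convert a top TIME field ('119.9H' or 'MMM:SS') to hours."""
    if time_field.endswith("H"):
        return float(time_field[:-1])
    minutes, seconds = time_field.split(":")
    return (int(minutes) * 60 + int(seconds)) / 3600.0


def find_runaway(top_lines, command="/usr/local/bin/rate",
                 min_hours=1.0, min_wcpu=90.0):
    """Return (pid, cpu_hours, wcpu) for matching processes over both thresholds.

    Assumed columns: PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
    """
    hits = []
    for line in top_lines:
        fields = line.split(None, 10)
        # Skip headers, blank lines and anything that is not a process row.
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        pid, time_field, wcpu, cmd = fields[0], fields[8], fields[9], fields[10]
        if not cmd.startswith(command):
            continue
        hours = cpu_hours(time_field)
        percent = float(wcpu.rstrip("%"))
        if hours >= min_hours and percent >= min_wcpu:
            hits.append((int(pid), hours, percent))
    return hits


# Three rows copied from the top output in the first post.
SAMPLE = [
    "  PID USERNAME PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND",
    " 5992 root     103    0  7004K  2492K CPU0    0 119.9H 100.00% /usr/local/bin/rate -i igb1 -nlq 1 -Aba 20",
    "92169 root     103    0  7004K  2492K CPU2    2  95.9H 100.00% /usr/local/bin/rate -i igb1 -nlq 1 -Aba 20",
    "85332 unbound   21    0 69152K 49000K CPU1    1   0:22   1.07% /usr/local/sbin/unbound -c /var/unbound/un",
]

if __name__ == "__main__":
    for pid, hours, percent in find_runaway(SAMPLE):
        print(pid, hours, percent)
```

Requiring both a large accumulated CPU time and a high current WCPU keeps a short-lived `rate` invocation from being flagged.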



  • When the stall happened I was looking at the 'Traffic Graph' page, viewing the amount of outbound traffic to see how fast it was pumping it out.

    During the stall the traffic graph also stopped updating. On one occasion it seemed to recover, and the traffic graph updated several times in quick succession (several times a second) before getting back to normal.

    I am a bit suspicious that this is a heisenbug that only shows up when I'm watching the traffic graph. I will do a few trials with the traffic graph window open and closed to see if a pattern emerges.

    BTW, what does the 'rate' command do? What package is it in? Where can I find docs about it?


  • Galactic Empire

    @tmoore said in /usr/local/bin/rate taking 100% of CPU:

    BTW, what does the 'rate' command do? What package is it in? Where can I find docs about it?

    [2.4.4-RELEASE][admin@pfsense]/root: rate -h
    rate 0.9 - Mateusz 'mteg' Golicz <mtg@elsat.net.pl>, 2003
    usage: rate [-h | -?]
           rate [mode select option] [-h | -?]
           rate -L <name>
           rate [filtering/generic options] [mode select option] [mode options]
    
    MODES COMPILED INTO THIS BINARY OF RATE:
     -R  Rate estimator
         Use this mode to estimate bandwidth utilized by packets matching
         given filtering options. This is the default mode.
    
     -A  Bandwidth abusers
         This mode is for determining IPs of hosts that consume the highest
         amount of available bandwidth.
    
     -T  Stream analyzer
         Using this mode you can have a deeper look on TCP connections and
         ICMP and UDP streams detected on an interface.
    
     -E  Regular expression extractor
         Use this mode to extract strings from packets.
    
    
    GENERIC OPTIONS:
      -h  -?    Show this help
      -r <t>    Print reports every <t> seconds (default: 1)
      -g        Dump reports on SIGUSR1, ignore timing.
      -k        Dump reports on newline on stdin, ignore timing.
      -q <r>    Quit after printing r reports.
      -l        Make stdout line-buffered.
      -p <pref> Datalink layer header size (gets substracted from each packet size, default: 14)
      -s <b>    Capture l bytes (default: 40)
      -i <int>  Bind to interface <int> - default eth0
      -P	    Bring the interface into promiscuous mode
      -n        Numeric IPs. Don't do reverse DNS lookups.
      -c        Use colors (ANSI-compatible) whenever possible.
      -v        Print exact values, do not use SI prefixes.
      -e        Output a separator after every report to improve readability.
      -w        Clear the screen before dumping a report.
      -S <name> Save operation mode (saves all specified command line options).
      -L <name> Recall operation mode (recalls previously saved option set).
    
    FILTERING OPTIONS:
      -f <bpf>  BPF filter expression to use
      -x <rege> Match this regex in packet (increases capture length to at least 1500b)
      -0 n      Replace nul character with this before doing regex matches (default: '@')
    [2.4.4-RELEASE][admin@pfsense]/root: 
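
As a rough illustration, the help text above can be used to decode the command line seen in the top output (`-i igb1 -nlq 1 -Aba 20`). The Python sketch below is only that: the `explain` helper is hypothetical, the option table is transcribed from the help output, and everything after the mode switch `-A` is reported as-is because the help shown here does not describe the mode options.

```python
# Generic and filtering options from `rate -h`: flag -> (description, takes an argument)
GENERIC = {
    "r": ("print reports every <t> seconds", True),
    "g": ("dump reports on SIGUSR1, ignore timing", False),
    "k": ("dump reports on newline on stdin, ignore timing", False),
    "q": ("quit after printing <r> reports", True),
    "l": ("make stdout line-buffered", False),
    "p": ("datalink layer header size", True),
    "s": ("capture length in bytes", True),
    "i": ("bind to interface", True),
    "P": ("promiscuous mode", False),
    "n": ("numeric IPs, no reverse DNS lookups", False),
    "c": ("use ANSI colors", False),
    "v": ("print exact values", False),
    "e": ("separator after every report", False),
    "w": ("clear the screen before each report", False),
    "S": ("save operation mode", True),
    "L": ("recall operation mode", True),
    "f": ("BPF filter expression", True),
    "x": ("regex to match in packet", True),
    "0": ("replacement for nul characters", True),
}
MODES = {
    "R": "rate estimator",
    "A": "bandwidth abusers",
    "T": "stream analyzer",
    "E": "regular expression extractor",
}


def explain(args):
    """Return a list of (flag, meaning) pairs for a rate argument string."""
    tokens = args.split()
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        i += 1
        if not tok.startswith("-"):
            out.append((tok, "unexpected bare argument"))
            continue
        flags = tok[1:]
        for pos, ch in enumerate(flags):
            if ch in MODES:
                # Usage: [generic options] [mode select option] [mode options]
                rest = flags[pos + 1:]
                tail = (["-" + rest] if rest else []) + tokens[i:]
                out.append(("-" + ch, "mode: " + MODES[ch]))
                if tail:
                    out.append((" ".join(tail),
                                "mode options, not described in the help shown"))
                return out
            desc, takes_arg = GENERIC.get(ch, ("unknown option", False))
            if takes_arg:
                value = flags[pos + 1:]  # value attached to the flag, e.g. -q1
                if not value and i < len(tokens):
                    value = tokens[i]
                    i += 1
                out.append(("-%s %s" % (ch, value), desc))
                break
            out.append(("-" + ch, desc))
    return out


if __name__ == "__main__":
    for flag, meaning in explain("-i igb1 -nlq 1 -Aba 20"):
        print("%-10s %s" % (flag, meaning))
```

Read this way, the command binds to igb1, skips DNS lookups, line-buffers its output, and is asked via `-q 1` to quit after a single report in bandwidth-abusers mode. If that reading is right, a `rate` process with over 100 hours of CPU time looks stuck rather than busy.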
    

  • Netgate Administrator

    Also:
    https://www.freebsd.org/cgi/man.cgi?query=rate&apropos=0&sektion=1&manpath=FreeBSD+11.2-RELEASE+and+Ports&arch=default&format=html

    It could well be related to the Traffic Graphs page, which also shows flow info for IPs on that interface.
    You might try using the traffic graphs widget instead, which does not display that.

    Steve

