Possible memory leak in bandwidthd



  • Hello,

    I have 2.2.1 on an Alix board with a Hifn 7955 card and bandwidthd installed.  State table is 2% (349/22000), usable memory is shown as 223 MB, Load average 0.75, 0.88, 0.82

    It looks like there was a memory leak (starting 4/23) that finally exhausted all the RAM yesterday.  I logged in yesterday to check on a bandwidth usage issue, and the out-of-memory condition killed charon and took down an IPSec tunnel:

    
    Apr 30 11:17:48	php-fpm[7217]: /index.php: Successful login for user 'yyyyyyyy' from: xx.xx.xxx.xx
    Apr 30 11:17:48	php-fpm[7217]: /index.php: Successful login for user 'yyyyyyyy' from: xx.xx.xxx.xx
    Apr 30 11:17:50	kernel: pid 34148 (charon), uid 0, was killed: out of swap space
    
    Apr 30 11:17:55 charon: 00[DMN] Starting IKE charon daemon (strongSwan 5.2.1, FreeBSD 10.1-RELEASE-p6, i386)
    Apr 30 11:17:56 charon: 00[KNL] unable to set UDP_ENCAP: Invalid argument
    Apr 30 11:17:56 charon: 00[NET] enabling UDP decapsulation for IPv6 on port 4500 failed
    Apr 30 11:17:56 charon: 00[CFG] ipseckey plugin is disabled
    ...
    
    

    Both the remote router and this router showed the IPSec tunnel as up, but no traffic would pass.  The tunnel eventually came back up after about 30 minutes of fiddling with IPSec settings and uninstalling bandwidthd.
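    In case it helps anyone else, next time I'll try asking strongSwan directly from the shell instead of fiddling in the GUI; something like the following (assuming pfSense puts strongSwan's ipsec tool in /usr/local/sbin):

    
    # show IKE SAs and child SAs as strongSwan sees them
    /usr/local/sbin/ipsec statusall
    
    # restart just the IPsec stack
    /usr/local/sbin/ipsec restart
    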

    Has anyone else had this sort of trouble on Alix?  I think I installed bandwidthd on 4/23.

    I just reduced Max Processes from 2 to 1.
    Block bogons is enabled on WAN, and the maximum table entries setting is at its default of 200000.  Memory usage is now shown as 46% of 223 MB.
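    For reference, the state table and the bogons tables can also be checked from the shell; a quick sketch (the table names 'bogons' and 'bogonsv6' are what I believe pfSense uses, so treat them as assumptions):

    
    # state-table usage and the configured hard limits
    pfctl -si
    pfctl -sm
    
    # rough size of the bogons tables (names assumed)
    pfctl -t bogons -T show | wc -l
    pfctl -t bogonsv6 -T show | wc -l
    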


  • Banned

    pfBlockerNG is not a service, so it cannot leak. When you create too many aliases or run too many services, you eventually run out of RAM on low-end, limited boxes like the Alix.

    (Getting rid of the bogons is a good idea. The IPv6 bogons table is huge.)


  • Moderator

    Hi ttblum,

    Run the 'top' command from the Shell and see if any process is using large amounts of CPU or memory. What other packages are you using?
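    For example, sorting by resident memory makes the biggest consumers float to the top; a minimal sketch ('-S' includes system processes on FreeBSD):

    
    # sort by resident memory, include system processes
    top -S -o res
    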



  • Here's the output of top.  bandwidthd was the only package installed, and since I've uninstalled it, there are no packages installed now.

    
    last pid: 16234;  load averages:  0.94,  0.88,  0.85                                                                                up 18+02:14:41  13:53:09
    40 processes:  1 running, 39 sleeping
    CPU:  0.4% user,  0.0% nice,  1.5% system,  1.9% interrupt, 96.3% idle
    Mem: 11M Active, 72M Inact, 80M Wired, 296K Cache, 30M Buf, 57M Free
    Swap: 
    
      PID USERNAME  THR PRI NICE   SIZE    RES STATE    TIME    WCPU COMMAND
    65326 root        1  20    0 25116K 16592K select   0:05   0.00% bsnmpd
     4891 root        1  20    0 13084K 13112K select   0:02   0.00% ntpd
    44424 unbound     1  20    0 21468K 12120K kqread   0:13   0.00% unbound
    34703 root       17  20    0 26216K 11028K uwait    0:03   0.00% charon
    52520 dhcpd       1  20    0 16348K  9072K select   0:01   0.00% dhcpd
      284 root        1  20    0 32368K  7668K kqread   1:34   0.00% php-fpm
    25135 root        1  20    0 13148K  6784K kqread   1:23   0.00% lighttpd
    79669 root        1  20    0 15912K  6252K select   0:02   0.00% sshd
    13074 root        1  20    0 13160K  3780K select   0:00   0.00% sshd
      320 root        1  20    0  8980K  3244K select   0:03   0.00% devd
    55527 root        1  20    0 10956K  2892K pause    0:00   0.00% tcsh
     6646 root        1  20    0 11392K  2552K RUN      0:04   0.20% top
    33846 root        1  20    0 10760K  2536K select   0:00   0.00% starter
    60678 root        1  52   20 10592K  2136K wait     0:03   0.00% sh
    10576 root        1  20    0 11576K  2056K piperd   0:30   0.00% rrdtool
     2965 root        1  20    0 10300K  2056K select   0:04   0.00% syslogd
    80584 root        1  20    0 10592K  1944K wait     0:00   0.00% sh
     7667 root        1  20    0 10292K  1920K select   0:00   0.00% inetd
    80230 root        2  20    0 10372K  1916K nanslp   0:00   0.00% sshlockout_pf
    29608 root        1  20    0 10168K  1904K select   1:20   0.00% radvd
    66422 root        2  20    0 10372K  1880K nanslp   0:00   0.00% sshlockout_pf
     7160 root        1  20    0 10364K  1872K bpf      9:31   0.00% filterlog
    13406 root        2  20    0 10372K  1868K nanslp   0:00   0.00% sshlockout_pf
    66564 root        1  52    0 10592K  1852K ttyin    0:00   0.00% sh
    66680 root        1  52    0 10592K  1852K ttyin    0:00   0.00% sh
    10405 root        1  20    0 10132K  1796K select   8:18   0.00% apinger
    16094 root        1  52   20  5984K  1660K nanslp   0:00   0.00% sleep
    51069 root        1  20    0 10244K  1036K nanslp   0:05   0.00% cron
      299 root        1  40   20 10524K   992K kqread   0:00   0.00% check_reload_status
    53534 root        1  52    0 10084K   340K nanslp   0:01   0.00% minicron
    54160 root        1  20    0 10084K   340K nanslp   0:00   0.00% minicron
    54522 root        1  20    0 10084K   316K nanslp   0:00   0.00% minicron
    66494 root        1  52    0 10592K     0K wait     0:00   0.00% <sh>
    66522 root        1  52    0 10592K     0K wait     0:00   0.00% <sh>
    66075 root        1  52    0 10584K     0K wait     0:00   0.00% <login>
    66030 root        1  52    0 10584K     0K wait     0:00   0.00% <login>
      301 root        1  52   20 10524K     0K kqread   0:00   0.00% <check_reload_stat>
    

  • Moderator

    You need to run the 'top' command while you are actually noticing the issue; then you can start to diagnose it. Also keep an eye on the logs.
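    If you can't be at the console when it happens, capturing periodic snapshots gives you history to look back at. A minimal sketch (the log path and schedule are arbitrary examples):

    
    # one-shot, non-interactive snapshot of the top 20 processes,
    # sorted by resident memory
    /usr/bin/top -b -o res 20 >> /var/log/memwatch.log
    
    # e.g. as a line in /etc/crontab, every 30 minutes:
    # */30  *  *  *  *  root  /usr/bin/top -b -o res 20 >> /var/log/memwatch.log
    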



  • Ok.

    Why did the kernel choose to kill the charon process instead of any of the other processes?



  • "out of swap space" on Alix means "out of real memory" because there is zero swap space. When every page of memory is used, the next process that wants to allocate a page of memory will have the allocation request fail and the kernel will kill it. The process that this happens to is "just lucky" :)

    As others have said above, you need to monitor the memory use from time to time and see if there is a process that is gradually using more and more memory.
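    A crude but effective way to do that is to log resident sizes on a schedule and compare snapshots over a few days; a minimal sketch (log path and interval are arbitrary):

    
    #!/bin/sh
    # snapshot the 15 biggest processes by resident memory once an hour;
    # a leaking process will climb the list from one snapshot to the next
    while :; do
        date >> /var/log/rsswatch.log
        ps -ax -o rss,pid,command | sort -rn | head -15 >> /var/log/rsswatch.log
        sleep 3600
    done
    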


  • Rebel Alliance Developer Netgate

    There is a slight leak somewhere in strongSwan that we haven't tracked down yet, which is probably why charon was the one to die since it couldn't get more memory.

    Most systems have enough RAM that it goes unnoticed, but ALIX is severely limited by today's standards.
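    If you want to see whether charon itself is the one growing, watching its resident size over time would show it; a sketch, assuming pgrep finds a single charon process:

    
    # print charon's resident size (in KB) every 10 minutes
    while :; do
        printf '%s ' "$(date)"
        ps -o rss= -p "$(pgrep -n charon)"
        sleep 600
    done
    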

