Possible memory leak in bandwidthd
-
Hello,
I have pfSense 2.2.1 on an Alix board with a Hifn 7955 card and bandwidthd installed. The state table is at 2% (349/22000), usable memory is shown as 223 MB, and the load average is 0.75, 0.88, 0.82.
It looks like there was a memory leak (starting 4/23) that finally exhausted all the RAM yesterday. I logged in yesterday to check on a bandwidth usage issue, and found that the out-of-memory condition had killed charon and taken down an IPSec tunnel:
Apr 30 11:17:48 php-fpm[7217]: /index.php: Successful login for user 'yyyyyyyy' from: xx.xx.xxx.xx
Apr 30 11:17:48 php-fpm[7217]: /index.php: Successful login for user 'yyyyyyyy' from: xx.xx.xxx.xx
Apr 30 11:17:50 kernel: pid 34148 (charon), uid 0, was killed: out of swap space
Apr 30 11:17:55 charon: 00[DMN] Starting IKE charon daemon (strongSwan 5.2.1, FreeBSD 10.1-RELEASE-p6, i386)
Apr 30 11:17:56 charon: 00[KNL] unable to set UDP_ENCAP: Invalid argument
Apr 30 11:17:56 charon: 00[NET] enabling UDP decapsulation for IPv6 on port 4500 failed
Apr 30 11:17:56 charon: 00[CFG] ipseckey plugin is disabled
...
Both the remote router and this router showed the IPSec tunnel as up, but no traffic would pass. The tunnel eventually came back up after about 30 minutes of fiddling with IPSec settings and uninstalling bandwidthd.
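For reference, next time the GUI claims the tunnel is up but nothing passes, I'll check what strongSwan itself reports from the shell before fiddling with settings. I believe these are the relevant commands on 2.2.x (someone correct me if not):

ipsec statusall    # IKE and CHILD SAs with traffic counters, straight from charon
setkey -D          # dump the kernel security associations to confirm SAs really exist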
Has anyone else had this sort of trouble on Alix? I think I installed bandwidthd on 4/23.
I just reduced Max Process from 2 to 1.
Block bogons is enabled on WAN, and Firewall Maximum Table Entries is at its default of 200000. Memory usage is now shown as 46% of 223 MB.
-
pfBlockerNG is not a service, so it cannot leak. When you create too many aliases or run too many services, you eventually run out of RAM on low-end, memory-limited boxes like the Alix.
(Getting rid of the bogons is a good idea. The IPv6 part is huuuuge.)
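If you want to see how much of the table limit the bogon lists actually eat, something like this from the shell will show it. I think the tables are called bogons and bogonsv6 on pfSense, but pfctl -s Tables will tell you for sure:

pfctl -s Tables                      # list all pf tables
pfctl -t bogons -T show | wc -l      # number of IPv4 bogon entries
pfctl -t bogonsv6 -T show | wc -l    # number of IPv6 bogon entries (the huge one)
pfctl -s memory                      # shows the table-entries hard limit in effect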
-
Hi ttblum,
Run the 'top' command from the Shell and see if any process is using large amounts of CPU/memory… What other packages are you using?
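If you sort it by resident size, the biggest consumers float to the top; on FreeBSD that should be:

top -o res

(or press 'o' inside top and type res).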
-
Here's the output of top. bandwidthd was the only package I had installed, and since I uninstalled it there are no packages installed now.
last pid: 16234;  load averages: 0.94, 0.88, 0.85   up 18+02:14:41  13:53:09
40 processes: 1 running, 39 sleeping
CPU: 0.4% user, 0.0% nice, 1.5% system, 1.9% interrupt, 96.3% idle
Mem: 11M Active, 72M Inact, 80M Wired, 296K Cache, 30M Buf, 57M Free
Swap:

  PID USERNAME  THR PRI NICE   SIZE    RES STATE   TIME   WCPU COMMAND
65326 root        1  20    0 25116K 16592K select  0:05  0.00% bsnmpd
 4891 root        1  20    0 13084K 13112K select  0:02  0.00% ntpd
44424 unbound     1  20    0 21468K 12120K kqread  0:13  0.00% unbound
34703 root       17  20    0 26216K 11028K uwait   0:03  0.00% charon
52520 dhcpd       1  20    0 16348K  9072K select  0:01  0.00% dhcpd
  284 root        1  20    0 32368K  7668K kqread  1:34  0.00% php-fpm
25135 root        1  20    0 13148K  6784K kqread  1:23  0.00% lighttpd
79669 root        1  20    0 15912K  6252K select  0:02  0.00% sshd
13074 root        1  20    0 13160K  3780K select  0:00  0.00% sshd
  320 root        1  20    0  8980K  3244K select  0:03  0.00% devd
55527 root        1  20    0 10956K  2892K pause   0:00  0.00% tcsh
 6646 root        1  20    0 11392K  2552K RUN     0:04  0.20% top
33846 root        1  20    0 10760K  2536K select  0:00  0.00% starter
60678 root        1  52   20 10592K  2136K wait    0:03  0.00% sh
10576 root        1  20    0 11576K  2056K piperd  0:30  0.00% rrdtool
 2965 root        1  20    0 10300K  2056K select  0:04  0.00% syslogd
80584 root        1  20    0 10592K  1944K wait    0:00  0.00% sh
 7667 root        1  20    0 10292K  1920K select  0:00  0.00% inetd
80230 root        2  20    0 10372K  1916K nanslp  0:00  0.00% sshlockout_pf
29608 root        1  20    0 10168K  1904K select  1:20  0.00% radvd
66422 root        2  20    0 10372K  1880K nanslp  0:00  0.00% sshlockout_pf
 7160 root        1  20    0 10364K  1872K bpf     9:31  0.00% filterlog
13406 root        2  20    0 10372K  1868K nanslp  0:00  0.00% sshlockout_pf
66564 root        1  52    0 10592K  1852K ttyin   0:00  0.00% sh
66680 root        1  52    0 10592K  1852K ttyin   0:00  0.00% sh
10405 root        1  20    0 10132K  1796K select  8:18  0.00% apinger
16094 root        1  52   20  5984K  1660K nanslp  0:00  0.00% sleep
51069 root        1  20    0 10244K  1036K nanslp  0:05  0.00% cron
  299 root        1  40   20 10524K   992K kqread  0:00  0.00% check_reload_status
53534 root        1  52    0 10084K   340K nanslp  0:01  0.00% minicron
54160 root        1  20    0 10084K   340K nanslp  0:00  0.00% minicron
54522 root        1  20    0 10084K   316K nanslp  0:00  0.00% minicron
66494 root        1  52    0 10592K     0K wait    0:00  0.00% <sh>
66522 root        1  52    0 10592K     0K wait    0:00  0.00% <sh>
66075 root        1  52    0 10584K     0K wait    0:00  0.00% <login>
66030 root        1  52    0 10584K     0K wait    0:00  0.00% <login>
  301 root        1  52   20 10524K     0K kqread  0:00  0.00% <check_reload_stat>
-
You need to run the 'top' command while you are actually noticing the issue… then you can start to diagnose it. Also keep an eye on the logs...
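For the logs, pfSense keeps them in circular clog format if I remember right, so from the shell something like this works:

clog -f /var/log/system.log                        # follow the system log live
clog /var/log/system.log | grep -i 'out of swap'   # look for earlier OOM kills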
-
Ok.
Why did the kernel choose to kill the charon process instead of any of the other processes?
-
"out of swap space" on Alix means "out of real memory" because there is zero swap space. When every page of memory is used, the next process that wants to allocate a page of memory will have the allocation request fail and the kernel will kill it. The process that this happens to is "just lucky" :)
As others have said above, you need to monitor the memory use from time to time and see if there is a process that is gradually using more and more memory.
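One low-effort way to do that is to append a timestamped snapshot every hour and compare the RES column over a few days. A rough sketch, assuming /root/memlog.txt as the output file (run it under /bin/sh, since root's login shell is tcsh):

/bin/sh -c 'while :; do ( date; top -b -o res 15 ) >> /root/memlog.txt; sleep 3600; done' &

After a day or two, a leaking process will stand out because its RES value keeps climbing in every snapshot.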
-
There is a slight leak somewhere in strongSwan that we haven't tracked down yet, which is probably why charon was the one to die: it was the process asking for more memory at the point when there was none left to give.
Most systems have enough RAM that it goes unnoticed, but ALIX is severely limited by today's standards.
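If anyone wants to confirm it on an ALIX, watching just charon's resident size over a day or two should show it creeping up; roughly:

ps -o pid,rss,vsz,command -p `pgrep charon`

run a few times, hours apart, and compare the RSS column.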