Possible memory leak in bandwidthd
-
Hello,
I have pfSense 2.2.1 on an Alix board with a Hifn 7955 card and bandwidthd installed. The state table is at 2% (349/22000), usable memory is shown as 223 MB, and the load average is 0.75, 0.88, 0.82.
It looks like there was a memory leak (starting 4/23) that finally exhausted all the RAM yesterday. I logged in yesterday to check on a bandwidth usage issue, and found that the out-of-memory condition had killed charon and taken down an IPSec tunnel:
Apr 30 11:17:48 php-fpm[7217]: /index.php: Successful login for user 'yyyyyyyy' from: xx.xx.xxx.xx
Apr 30 11:17:48 php-fpm[7217]: /index.php: Successful login for user 'yyyyyyyy' from: xx.xx.xxx.xx
Apr 30 11:17:50 kernel: pid 34148 (charon), uid 0, was killed: out of swap space
Apr 30 11:17:55 charon: 00[DMN] Starting IKE charon daemon (strongSwan 5.2.1, FreeBSD 10.1-RELEASE-p6, i386)
Apr 30 11:17:56 charon: 00[KNL] unable to set UDP_ENCAP: Invalid argument
Apr 30 11:17:56 charon: 00[NET] enabling UDP decapsulation for IPv6 on port 4500 failed
Apr 30 11:17:56 charon: 00[CFG] ipseckey plugin is disabled
...
Both the remote router and this router showed the IPSec tunnel as up, but no traffic would pass. The tunnel eventually came back up after about 30 minutes of fiddling with IPSec settings and uninstalling bandwidthd.
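For reference, next time the GUI claims the tunnel is up but nothing passes, I'll check what strongSwan itself reports from the shell before fiddling with settings. I believe these are the relevant commands on 2.2.x (someone correct me if not):

ipsec statusall    # IKE and CHILD SAs with traffic counters, straight from charon
setkey -D          # dump the kernel security associations to confirm SAs really exist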
Has anyone else had this sort of trouble on Alix? I think I installed bandwidthd on 4/23.
I just reduced Max Process from 2 to 1.
Block bogons is enabled on WAN, and Firewall Maximum Table Entries is at its default of 200000. Memory usage is now shown as 46% of 223 MB.
-
pfBlockerNG is not a service, so it cannot leak. When you create too many aliases or run too many services, you eventually run out of RAM on low-end, memory-limited boxes like the Alix.
(Getting rid of the bogons is a good idea. The IPv6 part is huuuuge.)
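If you want to see how much of the table limit the bogon lists actually eat, something like this from the shell will show it. I think the tables are called bogons and bogonsv6 on pfSense, but pfctl -s Tables will tell you for sure:

pfctl -s Tables                      # list all pf tables
pfctl -t bogons -T show | wc -l      # number of IPv4 bogon entries
pfctl -t bogonsv6 -T show | wc -l    # number of IPv6 bogon entries (the huge one)
pfctl -s memory                      # shows the table-entries hard limit in effect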
-
Hi ttblum,
Run the 'top' command from the Shell and see if any process is using large amounts of CPU/memory… What other packages are you using?
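If you sort it by resident size, the biggest consumers float to the top; on FreeBSD that should be:

top -o res

(or press 'o' inside top and type res).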
-
Here's the output of top. bandwidthd was the only package I had installed, and since I uninstalled it there are no packages installed now.
last pid: 16234;  load averages: 0.94, 0.88, 0.85   up 18+02:14:41  13:53:09
40 processes: 1 running, 39 sleeping
CPU: 0.4% user, 0.0% nice, 1.5% system, 1.9% interrupt, 96.3% idle
Mem: 11M Active, 72M Inact, 80M Wired, 296K Cache, 30M Buf, 57M Free
Swap:

  PID USERNAME  THR PRI NICE   SIZE    RES STATE   TIME   WCPU COMMAND
65326 root        1  20    0 25116K 16592K select  0:05  0.00% bsnmpd
 4891 root        1  20    0 13084K 13112K select  0:02  0.00% ntpd
44424 unbound     1  20    0 21468K 12120K kqread  0:13  0.00% unbound
34703 root       17  20    0 26216K 11028K uwait   0:03  0.00% charon
52520 dhcpd       1  20    0 16348K  9072K select  0:01  0.00% dhcpd
  284 root        1  20    0 32368K  7668K kqread  1:34  0.00% php-fpm
25135 root        1  20    0 13148K  6784K kqread  1:23  0.00% lighttpd
79669 root        1  20    0 15912K  6252K select  0:02  0.00% sshd
13074 root        1  20    0 13160K  3780K select  0:00  0.00% sshd
  320 root        1  20    0  8980K  3244K select  0:03  0.00% devd
55527 root        1  20    0 10956K  2892K pause   0:00  0.00% tcsh
 6646 root        1  20    0 11392K  2552K RUN     0:04  0.20% top
33846 root        1  20    0 10760K  2536K select  0:00  0.00% starter
60678 root        1  52   20 10592K  2136K wait    0:03  0.00% sh
10576 root        1  20    0 11576K  2056K piperd  0:30  0.00% rrdtool
 2965 root        1  20    0 10300K  2056K select  0:04  0.00% syslogd
80584 root        1  20    0 10592K  1944K wait    0:00  0.00% sh
 7667 root        1  20    0 10292K  1920K select  0:00  0.00% inetd
80230 root        2  20    0 10372K  1916K nanslp  0:00  0.00% sshlockout_pf
29608 root        1  20    0 10168K  1904K select  1:20  0.00% radvd
66422 root        2  20    0 10372K  1880K nanslp  0:00  0.00% sshlockout_pf
 7160 root        1  20    0 10364K  1872K bpf     9:31  0.00% filterlog
13406 root        2  20    0 10372K  1868K nanslp  0:00  0.00% sshlockout_pf
66564 root        1  52    0 10592K  1852K ttyin   0:00  0.00% sh
66680 root        1  52    0 10592K  1852K ttyin   0:00  0.00% sh
10405 root        1  20    0 10132K  1796K select  8:18  0.00% apinger
16094 root        1  52   20  5984K  1660K nanslp  0:00  0.00% sleep
51069 root        1  20    0 10244K  1036K nanslp  0:05  0.00% cron
  299 root        1  40   20 10524K   992K kqread  0:00  0.00% check_reload_status
53534 root        1  52    0 10084K   340K nanslp  0:01  0.00% minicron
54160 root        1  20    0 10084K   340K nanslp  0:00  0.00% minicron
54522 root        1  20    0 10084K   316K nanslp  0:00  0.00% minicron
66494 root        1  52    0 10592K     0K wait    0:00  0.00% <sh>
66522 root        1  52    0 10592K     0K wait    0:00  0.00% <sh>
66075 root        1  52    0 10584K     0K wait    0:00  0.00% <login>
66030 root        1  52    0 10584K     0K wait    0:00  0.00% <login>
  301 root        1  52   20 10524K     0K kqread  0:00  0.00% <check_reload_stat>
-
You need to run the 'top' command while you are actually noticing the issue… then you can start to diagnose it. Also keep an eye on the logs...
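For the logs, pfSense keeps them in circular clog format if I remember right, so from the shell something like this works:

clog -f /var/log/system.log                        # follow the system log live
clog /var/log/system.log | grep -i 'out of swap'   # look for earlier OOM kills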
-
Ok.
Why did the kernel choose to kill the charon process instead of any of the other processes?
-
"out of swap space" on Alix means "out of real memory" because there is zero swap space. When every page of memory is used, the next process that wants to allocate a page of memory will have the allocation request fail and the kernel will kill it. The process that this happens to is "just lucky" :)
As others have said above, you need to monitor the memory use from time to time and see if there is a process that is gradually using more and more memory.
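One low-effort way to do that is to append a timestamped snapshot every hour and compare the RES column over a few days. A rough sketch, assuming /root/memlog.txt as the output file (run it under /bin/sh, since root's login shell is tcsh):

/bin/sh -c 'while :; do ( date; top -b -o res 15 ) >> /root/memlog.txt; sleep 3600; done' &

After a day or two, a leaking process will stand out because its RES value keeps climbing in every snapshot.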
-
There is a slight leak somewhere in strongSwan that we haven't tracked down yet, which is probably why charon was the one to die: it was the process asking for more memory at the point when there was none left to give.
Most systems have enough RAM that it goes unnoticed, but ALIX is severely limited by today's standards.
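If anyone wants to confirm it on an ALIX, watching just charon's resident size over a day or two should show it creeping up; roughly:

ps -o pid,rss,vsz,command -p `pgrep charon`

run a few times, hours apart, and compare the RSS column.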