Severe Problems on ALIX 2D13: high cpu load, out of memory/swap
-
Hello,
I have two ALIX 2D13 boxes and did an upgrade from 1.2.3 to 2.0 on one box and a fresh install of 2.0 on the other box.
Since the upgrades I have severe problems with processes (like php) dying because of low memory or swap space and the overall performance of the webgui is very poor (takes 1 minute to load pages, displays error 500 randomly, have php-related error messages on the top of the page, etc.)I have no additional packages installed and just some very simple filtering rules and the box has a load value between 3 and 7.
Whenever I try to install packages, the process never finishes and the webgui stops loading on a certain point.I have error messages in system.log:
Dec 3 14:55:53 gateway kernel: pid 9536 (php), uid 0, was killed: out of swap space Dec 3 14:55:53 gateway kernel: pid 17096 (php), uid 0, was killed: out of swap space Dec 3 13:56:42 gateway check_reload_status: Syncing firewall Dec 3 14:56:42 gateway php: /pkg_mgr_install.php: Beginning package installation for OpenVPN Client Export Utility. Dec 3 14:57:15 gateway kernel: pid 12524 (php), uid 0, was killed: out of swap space Dec 3 14:57:15 gateway kernel: pid 42267 (php), uid 0, was killed: out of swap space
lighttpd.error.log
2011-12-03 14:43:33: (mod_fastcgi.c.2566) unexpected end-of-file (perhaps the fastcgi process died): pid: 51917 socket: unix:/tmp/php-fastcgi.socket-19 2011-12-03 14:43:33: (mod_fastcgi.c.3354) response not received, request sent: 492 on socket: unix:/tmp/php-fastcgi.socket-19 for /preload.php?, closing connection 2011-12-03 14:46:26: (mod_fastcgi.c.2566) unexpected end-of-file (perhaps the fastcgi process died): pid: 51917 socket: unix:/tmp/php-fastcgi.socket-19 2011-12-03 14:46:26: (mod_fastcgi.c.3369) response already sent out, but backend returned error on socket: unix:/tmp/php-fastcgi.socket-19 for /pkg_mgr.php?, terminating connection 2011-12-03 14:50:16: (mod_fastcgi.c.2566) unexpected end-of-file (perhaps the fastcgi process died): pid: 51917 socket: unix:/tmp/php-fastcgi.socket-19 2011-12-03 14:50:16: (mod_fastcgi.c.3369) response already sent out, but backend returned error on socket: unix:/tmp/php-fastcgi.socket-19 for /pkg_mgr_install.php?id=OpenVPN%20Client%20Export%20Utility, terminating connection
Anyone has similar issues? Any hints on how to find out the problem?
Thanks in between.EDIT: I should have mentioned that I use 4GB CF-Cards ;)
-
Running from CF you should be using the NanoBSD images which don't have swap at all so I'm surprised to see that.
Are you running NanoBSD?Steve
-
Yes, I'm running the nanobsd image, so the message about swap is really strange.
Is it possible that the box is running out of memory?
I think this should not happen unless I have some memory consuming packages installed.The box has a Nanobsd-Image, NO packages, a very simple ruleset and no VPN activated at all.
Why are the php/fastcgi processes terminating?Thanks!
-
I'm not sure to be honest. I've just been looking on the forum and it seems those messages can appear on a NanoBSD install if you run low on memory.
How much RAM do you have in those boxes? How much does it show as used in the dashboard?Steve
-
Hi,
The Dashboard says 70% and the box has a total of 256MB and a 500MHz AMD Geode CPU.
Thanks
EDIT: And the dashboard always says 100% for CPU usage !?
-
Hmm, 100% you say. Even when you're not doing anything?
What's using all your cpu cycles? Try running at the console:
top -S
Steve
-
It was a process called "idlepoll", when I deactivated the device polling option the cpu load went down, but the memory problems with dying processes are the same as before - now my box doesn't even display the dashboard :(
Thanks
-
Device polling can cause some odd things, at least it did for me last time I played with it.
I would make sure you have rebooted after enabling/disabling polling, switching it didn't always behave as I expected.
Otherwise I'm out of suggestions. Add more ram? I can't really see why you should have to if you haven't started using more services. :-\Steve
-
There is no possibility to add more RAM on these embedded boxes.
I found another strange message on the serial console:Approaching the limit on PV entries, consider increasing either the vm.pmap.shpgperproc or the vm.pmap.pv_entry_max tunable.
-
Read this http://forum.pfsense.org/index.php/topic,38707.0.html
To ease out:
run a shell command:```
sysctl vm.pmap.shpgperprocanswer should be 200 then create a file, if it doesn't exist``` /boot/loader.conf.local
add next line in there```
vm.pmap.shpgperproc=500save and reboot
-
"out of swap space" just means you're entirely out of memory. There is no swap on embedded.
You running any packages? The stock base system wouldn't exhaust 256 MB unless you had 200,000+ active connections, but there are many other things that could with packages.
-
My home firewall has been running "full" pfSense for over 3 years on 256MB of RAM and I can't recall ever seeing it even use swap. I'm currently running only two packages: siproxd and pfflowd. I have also run bandwidthd and countryblock. top currently tells me I have 92MB free.
What packages are you trying to run on that system?
Please post a snapshot of top output. Here's a snapshot of top from my system:```
last pid: 18074; load averages: 0.07, 0.04, 0.06 up 2+07:49:06 14:45:23
60 processes: 2 running, 50 sleeping, 8 zombie
CPU: 1.6% user, 0.0% nice, 4.3% system, 1.2% interrupt, 93.0% idle
Mem: 57M Active, 18M Inact, 47M Wired, 32M Buf, 92M Free
Swap: 260M Total, 260M FreePID USERNAME THR PRI NICE SIZE RES STATE TIME WCPU COMMAND
12997 root 1 44 0 3712K 2084K RUN 0:01 0.10% top
3552 root 1 76 20 3656K 1584K wait 5:17 0.00% sh
19946 root 1 64 20 3316K 1364K RUN 2:25 0.00% apinger
51310 root 1 64 20 7688K 4196K bpf 0:50 0.00% bandwidthd
50565 root 1 64 20 8712K 5232K bpf 0:50 0.00% bandwidthd
50805 root 1 44 0 7688K 3876K bpf 0:50 0.00% bandwidthd
50259 root 1 45 0 8712K 4908K bpf 0:50 0.00% bandwidthd
50479 root 1 44 0 7688K 3888K bpf 0:49 0.00% bandwidthd
50394 root 1 44 0 8712K 4196K bpf 0:49 0.00% bandwidthd
50777 root 1 64 20 8712K 4520K bpf 0:48 0.00% bandwidthd
51040 root 1 64 20 7688K 4216K bpf 0:48 0.00% bandwidthd
57601 root 1 76 0 34140K 23668K accept 0:46 0.00% php
61779 nobody 1 44 0 5556K 2676K select 0:45 0.00% dnsmasq
57693 root 1 76 0 33116K 21108K accept 0:44 0.00% php
24595 root 1 44 0 4944K 2544K select 0:42 0.00% syslogd
49099 root 1 44 0 2912K 740K bpf 0:37 0.00% pfflowd
60468 dhcpd 1 44 0 8436K 5372K select 0:28 0.00% dhcpd
50016 root 2 65 r21 7824K 5880K select 0:19 0.00% siproxd
51812 root 1 44 0 8464K 4580K select 0:19 0.00% mpd5
16171 root 1 44 0 5176K 2632K select 0:15 0.00% hostapd
28718 root 1 44 0 5912K 3040K bpf 0:12 0.00% tcpdump
52732 root 1 44 0 6588K 3500K kqread 0:12 0.00% lighttpdEven with 8 copies of bandwidthd running (that I think shouldn't be running because I removed the package some reboots ago) I still have 92MB free memory. I wonder if you have lots of extraneous processes. I have a vague memory of some reports of runaway process creation in some 2.0 snapshots builds.
-
Hi,
As mentioned before I have NO packages installed. The pfSense 1.2.3 worked flawlessly on the same hardware for a long time.
I did a upgrade to 2.0 and had device polling activated under 1.2.3 (gave me more bandwidth).
With device polling the load was at 100% on 2.0 and the webgui wasn't useable at all.Afterwards I deactivated device polling and now the webgui works again (if you don't worry about the frequent error 500).
But the errors I mentioned above are the same, especially when the box has been up for some days.I had to do a reboot yesterday evening, so currently my top displays the following:
last pid: 53715; load averages: 0.20, 0.08, 0.02 up 0+17:06:56 15:15:58 459 processes: 1 running, 458 sleeping CPU: 0.4% user, 0.0% nice, 1.5% system, 2.3% interrupt, 95.8% idle Mem: 126M Active, 14M Inact, 70M Wired, 5572K Cache, 34M Buf, 18M Free Swap: PID USERNAME THR PRI NICE SIZE RES STATE TIME WCPU COMMAND 53715 root 1 45 0 4736K 2196K RUN 0:00 0.98% top 58990 root 1 76 20 3656K 1080K wait 1:04 0.00% sh 26060 root 1 44 0 4944K 1636K select 0:51 0.00% syslogd 34843 root 1 64 20 3316K 692K select 0:34 0.00% apinger 27425 root 1 44 0 3316K 740K piperd 0:22 0.00% logger 27329 root 1 44 0 5912K 2712K bpf 0:19 0.00% tcpdump 8719 root 1 76 0 33116K 15040K accept 0:17 0.00% php 8684 root 1 76 0 33116K 12680K accept 0:09 0.00% php 4764 dhcpd 1 44 0 8436K 1736K select 0:09 0.00% dhcpd 5482 root 1 76 0 33116K 12580K accept 0:07 0.00% php 1460 root 1 76 0 33116K 12388K accept 0:06 0.00% php 1278 root 1 76 0 33116K 15960K accept 0:04 0.00% php 48287 root 1 44 0 6588K 3528K kqread 0:03 0.00% lighttpd 6467 nobody 1 44 0 5556K 1288K select 0:02 0.00% dnsmasq 10276 root 1 44 0 8464K 2800K select 0:02 0.00% mpd5 2945 root 1 44 0 5116K 2984K select 0:01 0.00% openvpn 54285 root 1 45 0 33116K 12240K accept 0:01 0.00% php 14737 root 1 76 0 33116K 10740K accept 0:01 0.00% php 59858 root 1 76 0 34140K 14748K accept 0:01 0.00% php 35322 sshd 1 76 0 6616K 2756K select 0:01 0.00% sshd 8924 root 1 76 0 33116K 14596K accept 0:01 0.00% php 57661 root 1 44 0 33116K 11596K accept 0:01 0.00% php 14508 root 1 76 0 33116K 11084K accept 0:01 0.00% php 17403 _ntp 1 64 20 3316K 1044K select 0:01 0.00% ntpd 37689 root 1 44 0 3436K 1172K select 0:00 0.00% inetd 1200 root 1 76 0 33116K 11744K accept 0:00 0.00% php 51639 root 1 44 0 3404K 832K nanslp 0:00 0.00% cron 14983 root 1 76 0 33116K 9772K accept 0:00 0.00% php 48520 root 1 76 0 32092K 6388K wait 0:00 0.00% php 53598 root 1 76 0 32092K 6388K wait 0:00 0.00% php 53449 root 1 76 0 32092K 6388K wait 0:00 0.00% php 52507 root 1 76 0 32092K 6388K wait 0:00 0.00% php 52757 root 1 76 0 32092K 6388K wait 0:00 0.00% php 53888 root 1 76 0 32092K 6388K wait 0:00 0.00% php 44532 root 1 44 0 7992K 2924K select 0:00 0.00% sshd 53150 root 1 76 0 32092K 6388K wait 0:00 0.00% php 53426 root 1 76 0 32092K 6388K wait 0:00 0.00% php
I also wondered why there are SOO many php processes running…
Thanks for now
-
I also wondered why there are SOO many php processes running…
And what are they waiting for so close to their startup?
Perhaps the system log will give some clues. pfSense shell command clog /var/log/system.log can be used to display the whole log. (The web GUI Status -> System Logs normally displays only the most recent 50 lines or so.)
-
If you removed bandwidthd (and siproxd I see) they didn't fully come out. Try switching to the other NanoBSD slice to see if it behaves better. First make extra sure there are no packages installed under System > Packages.
It would be rather easy to run an ALIX out of RAM.
It's normal to have a couple PHP processes going, but top doesn't show you the whole story, check the output of "ps uxawww" and see if it shows more about what PHP is doing, like if it's being used to run a certain script, like rc.filter_configure_sync, etc.