2.0.1 Nano Alix Memory Usage, Unstable System, Config saved but not applying
-
Hello, I've been seeing problems in several sites that are running pfSense 2.0.1 Nanobsd on Alix 2d13's (256MB ram).
The usual symptom is that the captive portal stops working for new users, to restore functionality the firewall needs to be rebooted.
In one case today all config changes have stopped applying. I can make a change, click apply changes, but the changes never actually get applied to the system. The changes are saved, but not applied. If I look at /tmp/rules.debug it doesn't get updated and if I run pfctl -sa |less I can see that the changes have not been applied.
I did notice that there were a large number of php processes just sitting around when I first looked at the box. I tried killing them to see if one of them has the system stuck, and now I have 5 zombie processes. Now there are 16 php processes just sitting around.
When I look at the filter reload status, it is just stuck at one line "Creating NAT rule VNC…".
When I look at the process list I see.
PID USERNAME THR PRI NICE SIZE RES STATE TIME WCPU COMMAND 32169 root 1 76 20 39260K 17940K lockf 0:01 0.00% php 19144 root 1 76 20 39260K 17900K lockf 0:01 0.00% php 50355 root 1 76 20 39260K 17896K lockf 0:01 0.00% php 2772 root 1 76 0 34140K 16080K accept 0:06 0.00% php 2540 root 1 76 0 34140K 16012K accept 0:03 0.00% php 2222 root 1 76 0 34140K 15220K accept 0:03 0.00% php
Does this mean that there is some sort of lock contention going on, 3 processes are all trying to grab locks, so I'm in a deadlock?
My full process list looks like
USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND root 10 89.0 0.0 0 8 ?? RL 28Feb12 277714:04.60 [idle] root 2823 4.0 5.2 33116 12788 ?? S 2:40PM 0:22.60 /usr/local/bin/php root 11 2.0 0.0 0 104 ?? WL 28Feb12 3050:44.41 [intr] root 0 0.0 0.0 0 56 ?? DLs 28Feb12 871:49.73 [kernel] root 1 0.0 0.0 1888 84 ?? ILs 28Feb12 0:06.85 /sbin/init -- root 2 0.0 0.0 0 8 ?? DL 28Feb12 20:15.29 [g_event] root 3 0.0 0.0 0 8 ?? DL 28Feb12 8:00.53 [g_up] root 4 0.0 0.0 0 8 ?? DL 28Feb12 7:56.95 [g_down] root 5 0.0 0.0 0 8 ?? DL 28Feb12 0:00.00 [crypto] root 6 0.0 0.0 0 8 ?? DL 28Feb12 0:00.00 [crypto returns] root 7 0.0 0.0 0 8 ?? DL 28Feb12 19:27.58 [pfpurge] root 8 0.0 0.0 0 8 ?? DL 28Feb12 0:00.00 [xpt_thrd] root 9 0.0 0.0 0 8 ?? DL 28Feb12 33:28.61 [pagedaemon] root 12 0.0 0.0 0 8 ?? DL 28Feb12 0:00.00 [ng_queue] root 13 0.0 0.0 0 8 ?? DL 28Feb12 417:45.44 [yarrow] root 14 0.0 0.0 0 64 ?? DL 28Feb12 5:27.94 [usb] root 15 0.0 0.0 0 8 ?? DL 28Feb12 0:00.94 [vmdaemon] root 16 0.0 0.0 0 8 ?? DL 28Feb12 0:00.52 [pagezero] root 17 0.0 0.0 0 8 ?? DL 28Feb12 0:46.74 [idlepoll] root 18 0.0 0.0 0 8 ?? DL 28Feb12 2:34.03 [bufdaemon] root 19 0.0 0.0 0 8 ?? DL 28Feb12 2:23.98 [vnlru] root 20 0.0 0.0 0 8 ?? DL 28Feb12 32:50.15 [syncer] root 21 0.0 0.0 0 8 ?? DL 28Feb12 2:37.69 [softdepflush] root 29 0.0 0.0 0 8 ?? DL 28Feb12 35:54.14 [md0] root 35 0.0 0.0 0 8 ?? DL 28Feb12 125:34.09 [md1] root 267 0.0 0.2 3408 476 ?? INs 28Feb12 0:00.06 /usr/local/sbin/check_reload_status root 272 0.0 0.0 3408 0 ?? IWN - 0:00.00 check_reload_status: Monitoring daemon of check_reload_s root 288 0.0 0.1 1888 268 ?? Is 28Feb12 0:00.01 /sbin/devd root 1863 0.0 3.0 32092 7324 ?? Is 2:40PM 0:00.27 /usr/local/bin/php root 1933 0.0 5.3 33116 13044 ?? I 2:40PM 0:00.83 /usr/local/bin/php root 2222 0.0 6.2 34140 15220 ?? I 2:40PM 0:03.10 /usr/local/bin/php root 2539 0.0 6.1 34140 15008 ?? I 2:40PM 0:02.89 /usr/local/bin/php root 2540 0.0 6.5 34140 16012 ?? I 2:40PM 0:03.41 /usr/local/bin/php root 2772 0.0 6.6 34140 16080 ?? I 2:40PM 0:06.24 /usr/local/bin/php root 8528 0.0 0.5 3656 1144 ?? SN 2:41PM 0:02.35 /bin/sh /var/db/rrd/updaterrd.sh root 8546 0.0 0.7 7336 1672 ?? Ss 28Feb12 13:13.89 /usr/sbin/bsnmpd -c /var/etc/snmpd.conf -p /var/run/snmp root 12657 0.0 0.2 1564 468 ?? SN 3:14PM 0:00.00 sleep 60 root 16710 0.0 0.4 7412 1020 ?? Is 7Sep12 0:23.69 /usr/local/sbin/dhcpleases -l /var/dhcpd/var/db/dhcpd.le nobody 17307 0.0 0.9 5556 2176 ?? S 7Sep12 15:52.22 /usr/local/sbin/dnsmasq --local-ttl 1 --all-servers --dn root 17537 0.0 0.2 3404 436 ?? Is 28Feb12 1:23.92 /usr/sbin/cron -s root 18854 0.0 1.1 7992 2812 ?? Ss 2:10PM 0:02.34 sshd: admin@pts/0 (sshd) root 19144 0.0 7.3 39260 17900 ?? IN 2:44PM 0:01.05 /usr/local/bin/php -f /etc/rc.filter_configure_sync root 20237 0.0 0.1 3316 140 ?? Is 28Feb12 0:30.70 /usr/local/bin/minicron 240 /var/run/ping_hosts.pid /usr root 20525 0.0 0.1 3316 208 ?? Is 28Feb12 0:02.14 /usr/local/bin/minicron 3600 /var/run/expire_accounts.pi root 20879 0.0 0.1 3316 196 ?? Is 28Feb12 0:00.09 /usr/local/bin/minicron 86400 /var/run/update_alias_url_ root 20943 0.0 0.4 4948 1100 ?? Is 28Feb12 8:59.26 /usr/sbin/syslogd -c -c -l /var/dhcpd/var/run/log -f /va root 22238 0.0 0.6 5116 1372 ?? Ss 28Feb12 99:21.51 /usr/local/sbin/openvpn --config /var/etc/openvpn/client root 22652 0.0 2.4 10008 5892 ?? SN 31Aug12 1:22.85 /usr/sbin/tcpdump -s 256 -v -l -n -e -ttt -i pflog0 root 22711 0.0 0.2 3316 448 ?? IN 31Aug12 0:35.32 logger -t pf -p local0.info root 25226 0.0 1.0 12108 2376 ?? Is 2:44PM 0:00.14 /usr/local/sbin/filterdns -p /tmp/filterdns-cpah.pid -i dhcpd 26565 0.0 0.9 8436 2296 ?? Ss 7Sep12 3:17.26 /usr/local/sbin/dhcpd -user dhcpd -group _dhcp -chroot / root 26684 0.0 0.6 5272 1404 ?? Is 28Feb12 0:00.01 /usr/sbin/sshd root 27154 0.0 0.9 5576 2108 ?? S 2:44PM 0:00.08 /usr/local/sbin/lighttpd -f /var/etc/lighty-CaptivePorta root 27550 0.0 0.3 3532 620 ?? Is 28Feb12 0:00.42 /usr/local/sbin/sshlockout_pf 15 root 27602 0.0 3.0 32092 7368 ?? Is 2:44PM 0:00.27 /usr/local/bin/php root 28089 0.0 0.3 3316 768 ?? Is 2:44PM 0:00.01 /usr/local/bin/minicron 60 /var/run/cp_prunedb.pid /etc/ root 28358 0.0 3.0 32092 7368 ?? Is 2:44PM 0:00.27 /usr/local/bin/php root 30798 0.0 0.1 3436 152 ?? Is 28Feb12 0:00.03 /usr/sbin/inetd -wW -R 0 -a 127.0.0.1 /var/etc/inetd.con root 31219 0.0 3.0 32092 7380 ?? I 2:44PM 0:00.00 /usr/local/bin/php root 31476 0.0 3.0 32092 7380 ?? I 2:44PM 0:00.00 /usr/local/bin/php root 31821 0.0 3.0 32092 7380 ?? I 2:44PM 0:00.00 /usr/local/bin/php root 31922 0.0 3.0 32092 7380 ?? I 2:44PM 0:00.00 /usr/local/bin/php root 32169 0.0 7.3 39260 17940 ?? IN 2:44PM 0:01.05 /usr/local/bin/php -f /etc/rc.filter_configure_sync root 34062 0.0 0.4 4496 1068 ?? S 2:43PM 0:00.03 /usr/local/bin/rrdtool - root 46065 0.0 0.3 3316 652 ?? Ss 28Feb12 110:20.73 /usr/local/sbin/apinger -c /var/etc/apinger.conf root 50355 0.0 7.3 39260 17896 ?? IN 2:41PM 0:01.05 /usr/local/bin/php -f /etc/rc.filter_configure_sync root 54262 0.0 1.4 7612 3480 ?? S 28Feb12 30:40.66 /usr/local/sbin/lighttpd -f /var/etc/lighty-webConfigura root 54517 0.0 0.0 0 0 ?? Z 28Feb12 66:53.76 <defunct>root 55563 0.0 0.0 0 0 ?? Z 28Feb12 239:57.08 <defunct>root 55928 0.0 0.0 0 0 ?? Z 28Feb12 20:47.48 <defunct>root 56079 0.0 0.2 3316 524 ?? Ss 28Feb12 0:32.77 ntpd: [priv] (ntpd) root 56281 0.0 0.0 0 0 ?? Z 28Feb12 20:59.64 <defunct>root 56618 0.0 0.0 0 0 ?? Z 28Feb12 26:27.41 <defunct>root 37159 0.0 0.0 3684 0 u0 IWs - 0:00.00 login [pam] (login) root 37712 0.0 0.0 3656 0 u0 IW - 0:00.00 -sh (sh) root 39851 0.0 0.1 3656 220 u0 I+ 28Feb12 0:00.01 /bin/sh /etc/rc.initial _ntp 55519 0.0 0.2 3316 556 u0- I 28Feb12 2:40.48 ntpd: ntp engine (ntpd) root 13484 0.0 0.4 3456 892 0 R+ 3:14PM 0:00.01 ps aux root 19426 0.0 0.4 3656 976 0 Is 2:10PM 0:00.02 /bin/sh /etc/rc.initial root 25765 0.0 0.9 4696 2144 0 S 2:10PM 0:00.13 /bin/tcsh</defunct></defunct></defunct></defunct></defunct>
My system log looks like this.
Sep 27 14:17:37 cr-firewall kernel: pid 45333 (php), uid 0, was killed: out of swap space Sep 27 17:08:25 cr-firewall kernel: pid 62341 (php), uid 0, was killed: out of swap space Sep 27 17:09:26 cr-firewall php: : The command '/sbin/ipfw table 1 delete 192.168.1.208' returned exit code '71', the output was 'ipfw: setsockopt(IP_FW_TABLE_DEL): No such process' Sep 27 17:09:26 cr-firewall php: : The command '/sbin/ipfw table 2 delete 192.168.1.208' returned exit code '71', the output was 'ipfw: setsockopt(IP_FW_TABLE_DEL): No such process' Sep 27 17:09:26 cr-firewall php: : The command '/sbin/ipfw pipe 20794 delete' returned exit code '1', the output was 'ipfw: rule 1: setsockopt(IP_DUMMYNET_DEL): Invalid argument' Sep 27 17:09:27 cr-firewall php: : The command '/sbin/ipfw pipe 20795 delete' returned exit code '1', the output was 'ipfw: rule 1: setsockopt(IP_DUMMYNET_DEL): Invalid argument' Sep 27 21:13:27 cr-firewall kernel: pid 54687 (php), uid 0, was killed: out of swap space Sep 28 02:23:39 cr-firewall kernel: pid 30905 (php), uid 0, was killed: out of swap space Sep 28 03:34:27 cr-firewall filterdns: host_dns: failed looking up "ils.larl.org": hostname nor servname provided, or not known Sep 28 03:34:27 cr-firewall filterdns: host_dns: failed looking up "ils.larl.org": hostname nor servname provided, or not known Sep 28 03:34:27 cr-firewall filterdns: host_dns: failed looking up "ils.larl.org": hostname nor servname provided, or not known Sep 28 03:34:27 cr-firewall filterdns: host_dns: failed looking up "ils.larl.org": hostname nor servname provided, or not known Sep 28 04:10:02 cr-firewall php: /index.php: Successful webConfigurator login for user 'admin' from 216.239.28.98 Sep 28 04:10:02 cr-firewall php: /index.php: Successful webConfigurator login for user 'admin' from 216.239.28.98 Sep 28 04:18:01 cr-firewall php: /index.php: Successful webConfigurator login for user 'admin' from 216.239.28.98 Sep 28 04:18:01 cr-firewall php: /index.php: Successful webConfigurator login for user 'admin' from 216.239.28.98 Sep 28 04:25:01 cr-firewall php: /index.php: Successful webConfigurator login for user 'admin' from 216.239.28.98 Sep 28 04:25:01 cr-firewall php: /index.php: Successful webConfigurator login for user 'admin' from 216.239.28.98 Sep 28 10:07:59 cr-firewall php: /index.php: Successful webConfigurator login for user 'admin' from 216.239.28.98 Sep 28 10:07:59 cr-firewall php: /index.php: Successful webConfigurator login for user 'admin' from 216.239.28.98 Sep 28 11:57:07 cr-firewall php: /index.php: Successful webConfigurator login for user 'admin' from 216.239.28.98 Sep 28 11:57:07 cr-firewall php: /index.php: Successful webConfigurator login for user 'admin' from 216.239.28.98 Sep 28 16:58:35 cr-firewall check_reload_status: Syncing firewall Sep 28 16:59:10 cr-firewall check_reload_status: Syncing firewall Sep 28 16:59:43 cr-firewall check_reload_status: Syncing firewall Sep 28 17:00:11 cr-firewall check_reload_status: Syncing firewall Sep 28 17:00:32 cr-firewall check_reload_status: Syncing firewall Sep 28 17:00:50 cr-firewall check_reload_status: Syncing firewall Sep 28 17:00:50 cr-firewall check_reload_status: Reloading filter Sep 28 12:00:50 cr-firewall php: /firewall_shaper.php: Creating rrd update script Sep 28 12:00:57 cr-firewall kernel: pid 14302 (php), uid 0, was killed: out of swap space Sep 28 17:01:16 cr-firewall check_reload_status: Syncing firewall Sep 28 12:01:24 cr-firewall apinger: Error while feeding rrdtool: Broken pipe Sep 28 17:01:37 cr-firewall check_reload_status: Syncing firewall Sep 28 17:01:45 cr-firewall check_reload_status: Syncing firewall Sep 28 17:01:45 cr-firewall check_reload_status: Reloading filter Sep 28 12:01:53 cr-firewall kernel: pid 19436 (php), uid 0, was killed: out of swap space Sep 28 12:02:02 cr-firewall kernel: pid 19636 (php), uid 0, was killed: out of swap space Sep 28 12:04:39 cr-firewall kernel: pid 21966 (php), uid 0, was killed: out of swap space Sep 28 17:04:52 cr-firewall check_reload_status: Syncing firewall Sep 28 17:04:57 cr-firewall check_reload_status: Reloading filter Sep 28 12:16:12 cr-firewall kernel: pid 33523 (php), uid 0, was killed: out of swap space Sep 28 17:17:52 cr-firewall check_reload_status: Syncing firewall Sep 28 17:17:57 cr-firewall check_reload_status: Syncing firewall Sep 28 17:17:57 cr-firewall check_reload_status: Reloading filter Sep 28 12:17:59 cr-firewall php: /firewall_shaper.php: Creating rrd update script Sep 28 12:18:05 cr-firewall kernel: pid 8111 (php), uid 0, was killed: out of swap space Sep 28 12:18:24 cr-firewall apinger: Error while feeding rrdtool: Broken pipe Sep 28 12:20:17 cr-firewall kernel: pid 8159 (php), uid 0, was killed: out of swap space Sep 28 18:57:39 cr-firewall check_reload_status: Syncing firewall Sep 28 18:57:45 cr-firewall check_reload_status: Syncing firewall Sep 28 18:57:45 cr-firewall check_reload_status: Reloading filter Sep 28 13:57:47 cr-firewall php: /firewall_shaper.php: Creating rrd update script Sep 28 13:57:53 cr-firewall kernel: pid 57576 (php), uid 0, was killed: out of swap space Sep 28 13:58:25 cr-firewall apinger: Error while feeding rrdtool: Broken pipe Sep 28 19:08:33 cr-firewall check_reload_status: Syncing firewall Sep 28 19:08:57 cr-firewall check_reload_status: Syncing firewall Sep 28 19:08:57 cr-firewall check_reload_status: Reloading filter Sep 28 14:08:58 cr-firewall php: /firewall_shaper.php: Creating rrd update script Sep 28 14:09:05 cr-firewall kernel: pid 32043 (php), uid 0, was killed: out of swap space Sep 28 14:09:25 cr-firewall apinger: Error while feeding rrdtool: Broken pipe Sep 28 14:10:24 cr-firewall sshd[18854]: Accepted keyboard-interactive/pam for admin from 216.239.28.98 port 45647 ssh2 Sep 28 19:33:04 cr-firewall check_reload_status: Syncing firewall Sep 28 19:33:16 cr-firewall check_reload_status: Syncing firewall Sep 28 19:33:16 cr-firewall check_reload_status: Reloading filter Sep 28 14:33:17 cr-firewall php: /firewall_shaper.php: Creating rrd update script Sep 28 14:33:26 cr-firewall apinger: Error while feeding rrdtool: Broken pipe Sep 28 14:35:19 cr-firewall kernel: pid 21821 (php), uid 0, was killed: out of swap space Sep 28 14:36:20 cr-firewall php: : The command '/sbin/ipfw table 1 delete 192.168.1.199' returned exit code '71', the output was 'ipfw: setsockopt(IP_FW_TABLE_DEL): No such process' Sep 28 14:36:20 cr-firewall php: : The command '/sbin/ipfw table 2 delete 192.168.1.199' returned exit code '71', the output was 'ipfw: setsockopt(IP_FW_TABLE_DEL): No such process' Sep 28 14:36:20 cr-firewall php: : The command '/sbin/ipfw pipe 20014 delete' returned exit code '1', the output was 'ipfw: rule 1: setsockopt(IP_DUMMYNET_DEL): Invalid argument' Sep 28 14:36:20 cr-firewall php: : The command '/sbin/ipfw pipe 20015 delete' returned exit code '1', the output was 'ipfw: rule 1: setsockopt(IP_DUMMYNET_DEL): Invalid argument' Sep 28 19:41:28 cr-firewall check_reload_status: Syncing firewall Sep 28 19:41:37 cr-firewall check_reload_status: Syncing firewall Sep 28 19:41:37 cr-firewall check_reload_status: Reloading filter Sep 28 14:41:37 cr-firewall php: /firewall_shaper.php: Creating rrd update script Sep 28 14:42:26 cr-firewall apinger: Error while feeding rrdtool: Broken pipe Sep 28 19:43:54 cr-firewall check_reload_status: Syncing firewall Sep 28 14:43:55 cr-firewall kernel: IP firewall unloaded Sep 28 19:43:55 cr-firewall check_reload_status: Reloading filter Sep 28 19:44:10 cr-firewall check_reload_status: Syncing firewall Sep 28 14:44:11 cr-firewall kernel: ipfw2 (+ipv6) initialized, divert loadable, nat loadable, rule-based forwarding enabled, default to accept, logging disabled Sep 28 19:44:16 cr-firewall check_reload_status: Reloading filter
My theory right now is that the system gets low on memory, then starts killing processes randomly, or however it decided which process to kill, and a process that holds a lock gets killed, which deadlocks the system until it is rebooted.
Does anyone have any tips for figuring out what is causing the memory usage to run out of control?
Thanks
Josh -
This redmine ticket may be related to my problem.
http://redmine.pfsense.org/issues/2475
According to the submitter, the captive portal page limits don't actually work because a lighttpd module isn't included with pfsense. So it could be that the captive portal is getting dos'd once in a while, which causes memory to be used up, which triggers system instability.
Josh
-
The Mod_evasive per-ip connection to the captive portal issue should be fixed in 2.0.3, which should keep the Captive portal from causing OOM events. I've upgraded several sites and the memory usage does seem to be lower now.
Josh