2.0.1 Nano Alix Memory Usage, Unstable System, Config saved but not applying



  • Hello, I've been seeing problems in several sites that are running pfSense 2.0.1 Nanobsd on Alix 2d13's (256MB ram).

    The usual symptom is that the captive portal stops working for new users, to restore functionality the firewall needs to be rebooted.

    In one case today all config changes have stopped applying.  I can make a change, click apply changes, but the changes never actually get applied to the system.  The changes are saved, but not applied.  If I look at /tmp/rules.debug it doesn't get updated and if I run pfctl -sa |less I can see that the changes have not been applied.

    I did notice that there were a large number of php processes just sitting around when I first looked at the box.  I tried killing them to see if one of them has the system stuck, and now I have 5 zombie processes.  Now there are 16 php processes just sitting around.

    When I look at the filter reload status, it is just stuck at one line "Creating NAT rule VNC…".

    When I look at the process list I see.

      PID USERNAME  THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
    32169 root        1  76   20 39260K 17940K lockf    0:01  0.00% php
    19144 root        1  76   20 39260K 17900K lockf    0:01  0.00% php
    50355 root        1  76   20 39260K 17896K lockf    0:01  0.00% php
     2772 root        1  76    0 34140K 16080K accept   0:06  0.00% php
     2540 root        1  76    0 34140K 16012K accept   0:03  0.00% php
     2222 root        1  76    0 34140K 15220K accept   0:03  0.00% php
    
    

    Does this mean that there is some sort of lock contention going on, 3 processes are all trying to grab locks, so I'm in a deadlock?

    My full process list looks like

    USER     PID %CPU %MEM   VSZ   RSS  TT  STAT STARTED      TIME COMMAND
    root      10 89.0  0.0     0     8  ??  RL   28Feb12 277714:04.60 [idle]
    root    2823  4.0  5.2 33116 12788  ??  S     2:40PM   0:22.60 /usr/local/bin/php
    root      11  2.0  0.0     0   104  ??  WL   28Feb12 3050:44.41 [intr]
    root       0  0.0  0.0     0    56  ??  DLs  28Feb12 871:49.73 [kernel]
    root       1  0.0  0.0  1888    84  ??  ILs  28Feb12   0:06.85 /sbin/init --
    root       2  0.0  0.0     0     8  ??  DL   28Feb12  20:15.29 [g_event]
    root       3  0.0  0.0     0     8  ??  DL   28Feb12   8:00.53 [g_up]
    root       4  0.0  0.0     0     8  ??  DL   28Feb12   7:56.95 [g_down]
    root       5  0.0  0.0     0     8  ??  DL   28Feb12   0:00.00 [crypto]
    root       6  0.0  0.0     0     8  ??  DL   28Feb12   0:00.00 [crypto returns]
    root       7  0.0  0.0     0     8  ??  DL   28Feb12  19:27.58 [pfpurge]
    root       8  0.0  0.0     0     8  ??  DL   28Feb12   0:00.00 [xpt_thrd]
    root       9  0.0  0.0     0     8  ??  DL   28Feb12  33:28.61 [pagedaemon]
    root      12  0.0  0.0     0     8  ??  DL   28Feb12   0:00.00 [ng_queue]
    root      13  0.0  0.0     0     8  ??  DL   28Feb12 417:45.44 [yarrow]
    root      14  0.0  0.0     0    64  ??  DL   28Feb12   5:27.94 [usb]
    root      15  0.0  0.0     0     8  ??  DL   28Feb12   0:00.94 [vmdaemon]
    root      16  0.0  0.0     0     8  ??  DL   28Feb12   0:00.52 [pagezero]
    root      17  0.0  0.0     0     8  ??  DL   28Feb12   0:46.74 [idlepoll]
    root      18  0.0  0.0     0     8  ??  DL   28Feb12   2:34.03 [bufdaemon]
    root      19  0.0  0.0     0     8  ??  DL   28Feb12   2:23.98 [vnlru]
    root      20  0.0  0.0     0     8  ??  DL   28Feb12  32:50.15 [syncer]
    root      21  0.0  0.0     0     8  ??  DL   28Feb12   2:37.69 [softdepflush]
    root      29  0.0  0.0     0     8  ??  DL   28Feb12  35:54.14 [md0]
    root      35  0.0  0.0     0     8  ??  DL   28Feb12 125:34.09 [md1]
    root     267  0.0  0.2  3408   476  ??  INs  28Feb12   0:00.06 /usr/local/sbin/check_reload_status
    root     272  0.0  0.0  3408     0  ??  IWN  -         0:00.00 check_reload_status: Monitoring daemon of check_reload_s
    root     288  0.0  0.1  1888   268  ??  Is   28Feb12   0:00.01 /sbin/devd
    root    1863  0.0  3.0 32092  7324  ??  Is    2:40PM   0:00.27 /usr/local/bin/php
    root    1933  0.0  5.3 33116 13044  ??  I     2:40PM   0:00.83 /usr/local/bin/php
    root    2222  0.0  6.2 34140 15220  ??  I     2:40PM   0:03.10 /usr/local/bin/php
    root    2539  0.0  6.1 34140 15008  ??  I     2:40PM   0:02.89 /usr/local/bin/php
    root    2540  0.0  6.5 34140 16012  ??  I     2:40PM   0:03.41 /usr/local/bin/php
    root    2772  0.0  6.6 34140 16080  ??  I     2:40PM   0:06.24 /usr/local/bin/php
    root    8528  0.0  0.5  3656  1144  ??  SN    2:41PM   0:02.35 /bin/sh /var/db/rrd/updaterrd.sh
    root    8546  0.0  0.7  7336  1672  ??  Ss   28Feb12  13:13.89 /usr/sbin/bsnmpd -c /var/etc/snmpd.conf -p /var/run/snmp
    root   12657  0.0  0.2  1564   468  ??  SN    3:14PM   0:00.00 sleep 60
    root   16710  0.0  0.4  7412  1020  ??  Is    7Sep12   0:23.69 /usr/local/sbin/dhcpleases -l /var/dhcpd/var/db/dhcpd.le
    nobody 17307  0.0  0.9  5556  2176  ??  S     7Sep12  15:52.22 /usr/local/sbin/dnsmasq --local-ttl 1 --all-servers --dn
    root   17537  0.0  0.2  3404   436  ??  Is   28Feb12   1:23.92 /usr/sbin/cron -s
    root   18854  0.0  1.1  7992  2812  ??  Ss    2:10PM   0:02.34 sshd: admin@pts/0 (sshd)
    root   19144  0.0  7.3 39260 17900  ??  IN    2:44PM   0:01.05 /usr/local/bin/php -f /etc/rc.filter_configure_sync
    root   20237  0.0  0.1  3316   140  ??  Is   28Feb12   0:30.70 /usr/local/bin/minicron 240 /var/run/ping_hosts.pid /usr
    root   20525  0.0  0.1  3316   208  ??  Is   28Feb12   0:02.14 /usr/local/bin/minicron 3600 /var/run/expire_accounts.pi
    root   20879  0.0  0.1  3316   196  ??  Is   28Feb12   0:00.09 /usr/local/bin/minicron 86400 /var/run/update_alias_url_
    root   20943  0.0  0.4  4948  1100  ??  Is   28Feb12   8:59.26 /usr/sbin/syslogd -c -c -l /var/dhcpd/var/run/log -f /va
    root   22238  0.0  0.6  5116  1372  ??  Ss   28Feb12  99:21.51 /usr/local/sbin/openvpn --config /var/etc/openvpn/client
    root   22652  0.0  2.4 10008  5892  ??  SN   31Aug12   1:22.85 /usr/sbin/tcpdump -s 256 -v -l -n -e -ttt -i pflog0
    root   22711  0.0  0.2  3316   448  ??  IN   31Aug12   0:35.32 logger -t pf -p local0.info
    root   25226  0.0  1.0 12108  2376  ??  Is    2:44PM   0:00.14 /usr/local/sbin/filterdns -p /tmp/filterdns-cpah.pid -i
    dhcpd  26565  0.0  0.9  8436  2296  ??  Ss    7Sep12   3:17.26 /usr/local/sbin/dhcpd -user dhcpd -group _dhcp -chroot /
    root   26684  0.0  0.6  5272  1404  ??  Is   28Feb12   0:00.01 /usr/sbin/sshd
    root   27154  0.0  0.9  5576  2108  ??  S     2:44PM   0:00.08 /usr/local/sbin/lighttpd -f /var/etc/lighty-CaptivePorta
    root   27550  0.0  0.3  3532   620  ??  Is   28Feb12   0:00.42 /usr/local/sbin/sshlockout_pf 15
    root   27602  0.0  3.0 32092  7368  ??  Is    2:44PM   0:00.27 /usr/local/bin/php
    root   28089  0.0  0.3  3316   768  ??  Is    2:44PM   0:00.01 /usr/local/bin/minicron 60 /var/run/cp_prunedb.pid /etc/
    root   28358  0.0  3.0 32092  7368  ??  Is    2:44PM   0:00.27 /usr/local/bin/php
    root   30798  0.0  0.1  3436   152  ??  Is   28Feb12   0:00.03 /usr/sbin/inetd -wW -R 0 -a 127.0.0.1 /var/etc/inetd.con
    root   31219  0.0  3.0 32092  7380  ??  I     2:44PM   0:00.00 /usr/local/bin/php
    root   31476  0.0  3.0 32092  7380  ??  I     2:44PM   0:00.00 /usr/local/bin/php
    root   31821  0.0  3.0 32092  7380  ??  I     2:44PM   0:00.00 /usr/local/bin/php
    root   31922  0.0  3.0 32092  7380  ??  I     2:44PM   0:00.00 /usr/local/bin/php
    root   32169  0.0  7.3 39260 17940  ??  IN    2:44PM   0:01.05 /usr/local/bin/php -f /etc/rc.filter_configure_sync
    root   34062  0.0  0.4  4496  1068  ??  S     2:43PM   0:00.03 /usr/local/bin/rrdtool -
    root   46065  0.0  0.3  3316   652  ??  Ss   28Feb12 110:20.73 /usr/local/sbin/apinger -c /var/etc/apinger.conf
    root   50355  0.0  7.3 39260 17896  ??  IN    2:41PM   0:01.05 /usr/local/bin/php -f /etc/rc.filter_configure_sync
    root   54262  0.0  1.4  7612  3480  ??  S    28Feb12  30:40.66 /usr/local/sbin/lighttpd -f /var/etc/lighty-webConfigura
    root   54517  0.0  0.0     0     0  ??  Z    28Feb12  66:53.76 <defunct>root   55563  0.0  0.0     0     0  ??  Z    28Feb12 239:57.08 <defunct>root   55928  0.0  0.0     0     0  ??  Z    28Feb12  20:47.48 <defunct>root   56079  0.0  0.2  3316   524  ??  Ss   28Feb12   0:32.77 ntpd: [priv] (ntpd)
    root   56281  0.0  0.0     0     0  ??  Z    28Feb12  20:59.64 <defunct>root   56618  0.0  0.0     0     0  ??  Z    28Feb12  26:27.41 <defunct>root   37159  0.0  0.0  3684     0  u0  IWs  -         0:00.00 login [pam] (login)
    root   37712  0.0  0.0  3656     0  u0  IW   -         0:00.00 -sh (sh)
    root   39851  0.0  0.1  3656   220  u0  I+   28Feb12   0:00.01 /bin/sh /etc/rc.initial
    _ntp   55519  0.0  0.2  3316   556  u0- I    28Feb12   2:40.48 ntpd: ntp engine (ntpd)
    root   13484  0.0  0.4  3456   892   0  R+    3:14PM   0:00.01 ps aux
    root   19426  0.0  0.4  3656   976   0  Is    2:10PM   0:00.02 /bin/sh /etc/rc.initial
    root   25765  0.0  0.9  4696  2144   0  S     2:10PM   0:00.13 /bin/tcsh</defunct></defunct></defunct></defunct></defunct> 
    

    My system log looks like this.

    Sep 27 14:17:37 cr-firewall kernel: pid 45333 (php), uid 0, was killed: out of swap space
    Sep 27 17:08:25 cr-firewall kernel: pid 62341 (php), uid 0, was killed: out of swap space
    Sep 27 17:09:26 cr-firewall php: : The command '/sbin/ipfw table 1 delete 192.168.1.208' returned exit code '71', the output was 'ipfw: setsockopt(IP_FW_TABLE_DEL): No such process'
    Sep 27 17:09:26 cr-firewall php: : The command '/sbin/ipfw table 2 delete 192.168.1.208' returned exit code '71', the output was 'ipfw: setsockopt(IP_FW_TABLE_DEL): No such process'
    Sep 27 17:09:26 cr-firewall php: : The command '/sbin/ipfw pipe 20794 delete' returned exit code '1', the output was 'ipfw: rule 1: setsockopt(IP_DUMMYNET_DEL): Invalid argument'
    Sep 27 17:09:27 cr-firewall php: : The command '/sbin/ipfw pipe 20795 delete' returned exit code '1', the output was 'ipfw: rule 1: setsockopt(IP_DUMMYNET_DEL): Invalid argument'
    Sep 27 21:13:27 cr-firewall kernel: pid 54687 (php), uid 0, was killed: out of swap space
    Sep 28 02:23:39 cr-firewall kernel: pid 30905 (php), uid 0, was killed: out of swap space
    Sep 28 03:34:27 cr-firewall filterdns: host_dns: failed looking up "ils.larl.org": hostname nor servname provided, or not known
    Sep 28 03:34:27 cr-firewall filterdns: host_dns: failed looking up "ils.larl.org": hostname nor servname provided, or not known
    Sep 28 03:34:27 cr-firewall filterdns: host_dns: failed looking up "ils.larl.org": hostname nor servname provided, or not known
    Sep 28 03:34:27 cr-firewall filterdns: host_dns: failed looking up "ils.larl.org": hostname nor servname provided, or not known
    Sep 28 04:10:02 cr-firewall php: /index.php: Successful webConfigurator login for user 'admin' from 216.239.28.98
    Sep 28 04:10:02 cr-firewall php: /index.php: Successful webConfigurator login for user 'admin' from 216.239.28.98
    Sep 28 04:18:01 cr-firewall php: /index.php: Successful webConfigurator login for user 'admin' from 216.239.28.98
    Sep 28 04:18:01 cr-firewall php: /index.php: Successful webConfigurator login for user 'admin' from 216.239.28.98
    Sep 28 04:25:01 cr-firewall php: /index.php: Successful webConfigurator login for user 'admin' from 216.239.28.98
    Sep 28 04:25:01 cr-firewall php: /index.php: Successful webConfigurator login for user 'admin' from 216.239.28.98
    Sep 28 10:07:59 cr-firewall php: /index.php: Successful webConfigurator login for user 'admin' from 216.239.28.98
    Sep 28 10:07:59 cr-firewall php: /index.php: Successful webConfigurator login for user 'admin' from 216.239.28.98
    Sep 28 11:57:07 cr-firewall php: /index.php: Successful webConfigurator login for user 'admin' from 216.239.28.98
    Sep 28 11:57:07 cr-firewall php: /index.php: Successful webConfigurator login for user 'admin' from 216.239.28.98
    Sep 28 16:58:35 cr-firewall check_reload_status: Syncing firewall
    Sep 28 16:59:10 cr-firewall check_reload_status: Syncing firewall
    Sep 28 16:59:43 cr-firewall check_reload_status: Syncing firewall
    Sep 28 17:00:11 cr-firewall check_reload_status: Syncing firewall
    Sep 28 17:00:32 cr-firewall check_reload_status: Syncing firewall
    Sep 28 17:00:50 cr-firewall check_reload_status: Syncing firewall
    Sep 28 17:00:50 cr-firewall check_reload_status: Reloading filter
    Sep 28 12:00:50 cr-firewall php: /firewall_shaper.php: Creating rrd update script
    Sep 28 12:00:57 cr-firewall kernel: pid 14302 (php), uid 0, was killed: out of swap space
    Sep 28 17:01:16 cr-firewall check_reload_status: Syncing firewall
    Sep 28 12:01:24 cr-firewall apinger: Error while feeding rrdtool: Broken pipe
    Sep 28 17:01:37 cr-firewall check_reload_status: Syncing firewall
    Sep 28 17:01:45 cr-firewall check_reload_status: Syncing firewall
    Sep 28 17:01:45 cr-firewall check_reload_status: Reloading filter
    Sep 28 12:01:53 cr-firewall kernel: pid 19436 (php), uid 0, was killed: out of swap space
    Sep 28 12:02:02 cr-firewall kernel: pid 19636 (php), uid 0, was killed: out of swap space
    Sep 28 12:04:39 cr-firewall kernel: pid 21966 (php), uid 0, was killed: out of swap space
    Sep 28 17:04:52 cr-firewall check_reload_status: Syncing firewall
    Sep 28 17:04:57 cr-firewall check_reload_status: Reloading filter
    Sep 28 12:16:12 cr-firewall kernel: pid 33523 (php), uid 0, was killed: out of swap space
    Sep 28 17:17:52 cr-firewall check_reload_status: Syncing firewall
    Sep 28 17:17:57 cr-firewall check_reload_status: Syncing firewall
    Sep 28 17:17:57 cr-firewall check_reload_status: Reloading filter
    Sep 28 12:17:59 cr-firewall php: /firewall_shaper.php: Creating rrd update script
    Sep 28 12:18:05 cr-firewall kernel: pid 8111 (php), uid 0, was killed: out of swap space
    Sep 28 12:18:24 cr-firewall apinger: Error while feeding rrdtool: Broken pipe
    Sep 28 12:20:17 cr-firewall kernel: pid 8159 (php), uid 0, was killed: out of swap space
    Sep 28 18:57:39 cr-firewall check_reload_status: Syncing firewall
    Sep 28 18:57:45 cr-firewall check_reload_status: Syncing firewall
    Sep 28 18:57:45 cr-firewall check_reload_status: Reloading filter
    Sep 28 13:57:47 cr-firewall php: /firewall_shaper.php: Creating rrd update script
    Sep 28 13:57:53 cr-firewall kernel: pid 57576 (php), uid 0, was killed: out of swap space
    Sep 28 13:58:25 cr-firewall apinger: Error while feeding rrdtool: Broken pipe
    Sep 28 19:08:33 cr-firewall check_reload_status: Syncing firewall
    Sep 28 19:08:57 cr-firewall check_reload_status: Syncing firewall
    Sep 28 19:08:57 cr-firewall check_reload_status: Reloading filter
    Sep 28 14:08:58 cr-firewall php: /firewall_shaper.php: Creating rrd update script
    Sep 28 14:09:05 cr-firewall kernel: pid 32043 (php), uid 0, was killed: out of swap space
    Sep 28 14:09:25 cr-firewall apinger: Error while feeding rrdtool: Broken pipe
    Sep 28 14:10:24 cr-firewall sshd[18854]: Accepted keyboard-interactive/pam for admin from 216.239.28.98 port 45647 ssh2
    Sep 28 19:33:04 cr-firewall check_reload_status: Syncing firewall
    Sep 28 19:33:16 cr-firewall check_reload_status: Syncing firewall
    Sep 28 19:33:16 cr-firewall check_reload_status: Reloading filter
    Sep 28 14:33:17 cr-firewall php: /firewall_shaper.php: Creating rrd update script
    Sep 28 14:33:26 cr-firewall apinger: Error while feeding rrdtool: Broken pipe
    Sep 28 14:35:19 cr-firewall kernel: pid 21821 (php), uid 0, was killed: out of swap space
    Sep 28 14:36:20 cr-firewall php: : The command '/sbin/ipfw table 1 delete 192.168.1.199' returned exit code '71', the output was 'ipfw: setsockopt(IP_FW_TABLE_DEL): No such process'
    Sep 28 14:36:20 cr-firewall php: : The command '/sbin/ipfw table 2 delete 192.168.1.199' returned exit code '71', the output was 'ipfw: setsockopt(IP_FW_TABLE_DEL): No such process'
    Sep 28 14:36:20 cr-firewall php: : The command '/sbin/ipfw pipe 20014 delete' returned exit code '1', the output was 'ipfw: rule 1: setsockopt(IP_DUMMYNET_DEL): Invalid argument'
    Sep 28 14:36:20 cr-firewall php: : The command '/sbin/ipfw pipe 20015 delete' returned exit code '1', the output was 'ipfw: rule 1: setsockopt(IP_DUMMYNET_DEL): Invalid argument'
    Sep 28 19:41:28 cr-firewall check_reload_status: Syncing firewall
    Sep 28 19:41:37 cr-firewall check_reload_status: Syncing firewall
    Sep 28 19:41:37 cr-firewall check_reload_status: Reloading filter
    Sep 28 14:41:37 cr-firewall php: /firewall_shaper.php: Creating rrd update script
    Sep 28 14:42:26 cr-firewall apinger: Error while feeding rrdtool: Broken pipe
    Sep 28 19:43:54 cr-firewall check_reload_status: Syncing firewall
    Sep 28 14:43:55 cr-firewall kernel: IP firewall unloaded
    Sep 28 19:43:55 cr-firewall check_reload_status: Reloading filter
    Sep 28 19:44:10 cr-firewall check_reload_status: Syncing firewall
    Sep 28 14:44:11 cr-firewall kernel: ipfw2 (+ipv6) initialized, divert loadable, nat loadable, rule-based forwarding enabled, default to accept, logging disabled
    Sep 28 19:44:16 cr-firewall check_reload_status: Reloading filter
    
    

    My theory right now is that the system gets low on memory, then starts killing processes randomly, or however it decided which process to kill, and a process that holds a lock gets killed, which deadlocks the system until it is rebooted.

    Does anyone have any tips for figuring out what is causing the memory usage to run out of control?
    Thanks
    Josh



  • This redmine ticket may be related to my problem.

    http://redmine.pfsense.org/issues/2475

    According to the submitter, the captive portal page limits don't actually work because a lighttpd module isn't included with pfsense.  So it could be that the captive portal is getting dos'd once in a while, which causes memory to be used up, which triggers system instability.

    Josh



  • The Mod_evasive per-ip connection to the captive portal issue should be fixed in 2.0.3, which should keep the Captive portal from causing OOM events.  I've upgraded several sites and the memory usage does seem to be lower now.
    Josh


Log in to reply