Netgate Discussion Forum

    Severe Problems on ALIX 2D13: high cpu load, out of memory/swap

    Problems Installing or Upgrading pfSense Software
    • S
      setchi
      last edited by

      Hello,

      I have two ALIX 2D13 boxes and did an upgrade from 1.2.3 to 2.0 on one box and a fresh install of 2.0 on the other box.
      Since the upgrades I have had severe problems: processes (like php) die because of low memory or swap space, and the overall performance of the webgui is very poor (pages take 1 minute to load, error 500 is displayed randomly, php-related error messages appear at the top of the page, etc.).

      I have no additional packages installed and just some very simple filtering rules, yet the box has a load value between 3 and 7.
      Whenever I try to install packages, the process never finishes and the webgui stops loading at a certain point.

      I have error messages in system.log:

      Dec  3 14:55:53 gateway kernel: pid 9536 (php), uid 0, was killed: out of swap space
      Dec  3 14:55:53 gateway kernel: pid 17096 (php), uid 0, was killed: out of swap space
      Dec  3 13:56:42 gateway check_reload_status: Syncing firewall
      Dec  3 14:56:42 gateway php: /pkg_mgr_install.php: Beginning package installation for OpenVPN Client Export Utility.
      Dec  3 14:57:15 gateway kernel: pid 12524 (php), uid 0, was killed: out of swap space
      Dec  3 14:57:15 gateway kernel: pid 42267 (php), uid 0, was killed: out of swap space
      

      lighttpd.error.log

      2011-12-03 14:43:33: (mod_fastcgi.c.2566) unexpected end-of-file (perhaps the fastcgi process died): pid: 51917 socket: unix:/tmp/php-fastcgi.socket-19
      2011-12-03 14:43:33: (mod_fastcgi.c.3354) response not received, request sent: 492 on socket: unix:/tmp/php-fastcgi.socket-19 for /preload.php?, closing connection
      2011-12-03 14:46:26: (mod_fastcgi.c.2566) unexpected end-of-file (perhaps the fastcgi process died): pid: 51917 socket: unix:/tmp/php-fastcgi.socket-19
      2011-12-03 14:46:26: (mod_fastcgi.c.3369) response already sent out, but backend returned error on socket: unix:/tmp/php-fastcgi.socket-19 for /pkg_mgr.php?, terminating connection
      2011-12-03 14:50:16: (mod_fastcgi.c.2566) unexpected end-of-file (perhaps the fastcgi process died): pid: 51917 socket: unix:/tmp/php-fastcgi.socket-19
      2011-12-03 14:50:16: (mod_fastcgi.c.3369) response already sent out, but backend returned error on socket: unix:/tmp/php-fastcgi.socket-19 for /pkg_mgr_install.php?id=OpenVPN%20Client%20Export%20Utility, terminating connection
      

      Does anyone have similar issues? Any hints on how to track down the problem?
      Thanks in advance.

      EDIT: I should have mentioned that I use 4GB CF-Cards ;)
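
The kernel kill messages in the system.log excerpt above can be tallied per process name. This is a minimal sketch, assuming the log has already been dumped to plain text and copied off the box. `count_oom_kills` and `KILL_RE` are names made up for this illustration, and the pattern is fitted only to the message format quoted in this thread.

```python
import re
from collections import Counter

# Matches kernel lines of the form quoted in this thread, e.g.
#   kernel: pid 9536 (php), uid 0, was killed: out of swap space
KILL_RE = re.compile(r"pid (\d+) \(([^)]+)\), uid \d+, was killed: out of swap space")

def count_oom_kills(lines):
    """Return a Counter mapping process name -> number of kill messages."""
    kills = Counter()
    for line in lines:
        match = KILL_RE.search(line)
        if match:
            kills[match.group(2)] += 1
    return kills

# Lines copied from the system.log excerpt in the post above.
sample = [
    "Dec  3 14:55:53 gateway kernel: pid 9536 (php), uid 0, was killed: out of swap space",
    "Dec  3 14:55:53 gateway kernel: pid 17096 (php), uid 0, was killed: out of swap space",
    "Dec  3 13:56:42 gateway check_reload_status: Syncing firewall",
    "Dec  3 14:57:15 gateway kernel: pid 12524 (php), uid 0, was killed: out of swap space",
    "Dec  3 14:57:15 gateway kernel: pid 42267 (php), uid 0, was killed: out of swap space",
]
print(count_oom_kills(sample))
```

A count dominated by a single process name, as here, points at that process family rather than at the system as a whole.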

      • stephenw10S
        stephenw10 Netgate Administrator
        last edited by

        Running from CF you should be using the NanoBSD images which don't have swap at all so I'm surprised to see that.
        Are you running NanoBSD?

        Steve

        • S
          setchi
          last edited by

          Yes, I'm running the nanobsd image, so the message about swap is really strange.
          Is it possible that the box is running out of memory?
          I think this should not happen unless I have some memory consuming packages installed.

          The box has a Nanobsd-Image, NO packages, a very simple ruleset and no VPN activated at all.
          Why are the php/fastcgi processes terminating?

          Thanks!

          • stephenw10S
            stephenw10 Netgate Administrator
            last edited by

            I'm not sure to be honest. I've just been looking on the forum and it seems those messages can appear on a NanoBSD install if you run low on memory.
            How much RAM do you have in those boxes? How much does it show as used in the dashboard?

            Steve

            • S
              setchi
              last edited by

              Hi,

              The Dashboard says 70% and the box has a total of 256MB and a 500MHz AMD Geode CPU.

              Thanks

              EDIT: And the dashboard always says 100% for CPU usage !?

              • stephenw10S
                stephenw10 Netgate Administrator
                last edited by

                Hmm, 100% you say. Even when you're not doing anything?

                What's using all your cpu cycles? Try running at the console:

                top -S
                

                Steve

                • S
                  setchi
                  last edited by

                  It was a process called "idlepoll". When I deactivated the device polling option the cpu load went down, but the memory problems with dying processes are the same as before. Now my box doesn't even display the dashboard :(

                  Thanks

                  • stephenw10S
                    stephenw10 Netgate Administrator
                    last edited by

                    Device polling can cause some odd things, at least it did for me last time I played with it.
                    I would make sure you have rebooted after enabling/disabling polling; switching it didn't always behave as I expected.
                    Otherwise I'm out of suggestions. Add more ram? I can't really see why you should have to if you haven't started using more services.  :-\

                    Steve

                    • S
                      setchi
                      last edited by

                      It is not possible to add more RAM to these embedded boxes.
                      I found another strange message on the serial console:

                      Approaching the limit on PV entries, consider increasing either the vm.pmap.shpgperproc or the vm.pmap.pv_entry_max tunable.
                      
                      • M
                        Metu69salemi
                        last edited by

                        Read this http://forum.pfsense.org/index.php/topic,38707.0.html

                        To make it easier, run this shell command:
                        ```
                        sysctl vm.pmap.shpgperproc
                        ```
                        The answer should be 200.
                        Then create this file, if it doesn't exist:
                        ```
                        /boot/loader.conf.local
                        ```
                        Add the following line to it:
                        ```
                        vm.pmap.shpgperproc=500
                        ```
                        Save and reboot.
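
The edit described above (set the tunable once, whether or not the file already mentions it) can be sketched as a small text transformation. This is only an illustration of the logic on a string, under stated assumptions: `set_loader_tunable` is a made-up helper, the sample file content is hypothetical, and Python may not be available on the firewall itself. On a NanoBSD install the filesystem is normally mounted read-only, so the real file would have to be written while it is mounted read-write.

```python
def set_loader_tunable(text, name, value):
    """Return loader.conf-style text with name=value set exactly once."""
    new_line = f"{name}={value}"
    out = []
    replaced = False
    for line in text.splitlines():
        key = line.split("=", 1)[0].strip()
        if key == name:
            if not replaced:
                out.append(new_line)
                replaced = True
            # Any later duplicate of the same tunable is dropped.
            continue
        out.append(line)
    if not replaced:
        out.append(new_line)
    return "\n".join(out) + "\n"

# Hypothetical existing file content, for illustration only.
existing = 'autoboot_delay="3"\nvm.pmap.shpgperproc=200\n'
print(set_loader_tunable(existing, "vm.pmap.shpgperproc", 500), end="")
```

Running the function twice gives the same result as running it once, so re-applying the change does not pile up duplicate lines.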
                        • C
                          cmb
                          last edited by

                          "out of swap space" just means you're entirely out of memory. There is no swap on embedded.

                          You running any packages? The stock base system wouldn't exhaust 256 MB unless you had 200,000+ active connections, but there are many other things that could with packages.
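
As a rough cross-check of the 200,000-connection figure: pfSense documentation commonly cites about 1 KB of RAM per firewall state. Taking that as an assumption (the real per-state cost varies by version and architecture), the estimate is simple arithmetic. `estimate_state_table_mib` is a name made up for this sketch.

```python
def estimate_state_table_mib(states, bytes_per_state=1024):
    """Approximate state-table memory in MiB; bytes_per_state is an assumption."""
    return states * bytes_per_state / (1024 * 1024)

for states in (10_000, 200_000):
    print(f"{states} states ~ {estimate_state_table_mib(states):.1f} MiB")
```

Under that assumption, 200,000 states would account for most of a 256 MB box on their own, while a lightly loaded firewall would use only a few MiB for states.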

                          • W
                            wallabybob
                            last edited by

                            My home firewall has been running "full" pfSense for over 3 years on 256MB of RAM and I can't recall ever seeing it even use swap. I'm currently running only two packages: siproxd and pfflowd. I have also run bandwidthd and countryblock. top currently tells me I have 92MB free.

                            What packages are you trying to run on that system?

                            Please post a snapshot of top output. Here's a snapshot of top from my system:
                            ```
                            last pid: 18074;  load averages:  0.07,  0.04,  0.06    up 2+07:49:06  14:45:23
                            60 processes:  2 running, 50 sleeping, 8 zombie
                            CPU:  1.6% user,  0.0% nice,  4.3% system,  1.2% interrupt, 93.0% idle
                            Mem: 57M Active, 18M Inact, 47M Wired, 32M Buf, 92M Free
                            Swap: 260M Total, 260M Free

                            PID USERNAME  THR PRI NICE  SIZE    RES STATE    TIME  WCPU COMMAND
                            12997 root        1  44    0  3712K  2084K RUN      0:01  0.10% top
                            3552 root        1  76  20  3656K  1584K wait    5:17  0.00% sh
                            19946 root        1  64  20  3316K  1364K RUN      2:25  0.00% apinger
                            51310 root        1  64  20  7688K  4196K bpf      0:50  0.00% bandwidthd
                            50565 root        1  64  20  8712K  5232K bpf      0:50  0.00% bandwidthd
                            50805 root        1  44    0  7688K  3876K bpf      0:50  0.00% bandwidthd
                            50259 root        1  45    0  8712K  4908K bpf      0:50  0.00% bandwidthd
                            50479 root        1  44    0  7688K  3888K bpf      0:49  0.00% bandwidthd
                            50394 root        1  44    0  8712K  4196K bpf      0:49  0.00% bandwidthd
                            50777 root        1  64  20  8712K  4520K bpf      0:48  0.00% bandwidthd
                            51040 root        1  64  20  7688K  4216K bpf      0:48  0.00% bandwidthd
                            57601 root        1  76    0 34140K 23668K accept  0:46  0.00% php
                            61779 nobody      1  44    0  5556K  2676K select  0:45  0.00% dnsmasq
                            57693 root        1  76    0 33116K 21108K accept  0:44  0.00% php
                            24595 root        1  44    0  4944K  2544K select  0:42  0.00% syslogd
                            49099 root        1  44    0  2912K  740K bpf      0:37  0.00% pfflowd
                            60468 dhcpd      1  44    0  8436K  5372K select  0:28  0.00% dhcpd
                            50016 root        2  65  r21  7824K  5880K select  0:19  0.00% siproxd
                            51812 root        1  44    0  8464K  4580K select  0:19  0.00% mpd5
                            16171 root        1  44    0  5176K  2632K select  0:15  0.00% hostapd
                            28718 root        1  44    0  5912K  3040K bpf      0:12  0.00% tcpdump
                            52732 root        1  44    0  6588K  3500K kqread  0:12  0.00% lighttpd
                            ```

                            Even with 8 copies of bandwidthd running (that I think shouldn't be running because I removed the package some reboots ago) I still have 92MB free memory.

                            I wonder if you have lots of extraneous processes. I have a vague memory of some reports of runaway process creation in some 2.0 snapshot builds.
                            • S
                              setchi
                              last edited by

                              Hi,

                              As mentioned before I have NO packages installed. The pfSense 1.2.3 worked flawlessly on the same hardware for a long time.
                              I did an upgrade to 2.0 and had device polling activated under 1.2.3 (gave me more bandwidth).
                              With device polling the load was at 100% on 2.0 and the webgui wasn't usable at all.

                              Afterwards I deactivated device polling and now the webgui works again (if you don't mind the frequent error 500).
                              But the errors I mentioned above are the same, especially when the box has been up for some days.

                              I had to do a reboot yesterday evening, so currently my top displays the following:

                              last pid: 53715;  load averages:  0.20,  0.08,  0.02                                                                                            up 0+17:06:56  15:15:58
                              459 processes: 1 running, 458 sleeping
                              CPU:  0.4% user,  0.0% nice,  1.5% system,  2.3% interrupt, 95.8% idle
                              Mem: 126M Active, 14M Inact, 70M Wired, 5572K Cache, 34M Buf, 18M Free
                              Swap:
                              
                                PID USERNAME  THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
                              53715 root        1  45    0  4736K  2196K RUN      0:00  0.98% top
                              58990 root        1  76   20  3656K  1080K wait     1:04  0.00% sh
                              26060 root        1  44    0  4944K  1636K select   0:51  0.00% syslogd
                              34843 root        1  64   20  3316K   692K select   0:34  0.00% apinger
                              27425 root        1  44    0  3316K   740K piperd   0:22  0.00% logger
                              27329 root        1  44    0  5912K  2712K bpf      0:19  0.00% tcpdump
                               8719 root        1  76    0 33116K 15040K accept   0:17  0.00% php
                               8684 root        1  76    0 33116K 12680K accept   0:09  0.00% php
                               4764 dhcpd       1  44    0  8436K  1736K select   0:09  0.00% dhcpd
                               5482 root        1  76    0 33116K 12580K accept   0:07  0.00% php
                               1460 root        1  76    0 33116K 12388K accept   0:06  0.00% php
                               1278 root        1  76    0 33116K 15960K accept   0:04  0.00% php
                              48287 root        1  44    0  6588K  3528K kqread   0:03  0.00% lighttpd
                               6467 nobody      1  44    0  5556K  1288K select   0:02  0.00% dnsmasq
                              10276 root        1  44    0  8464K  2800K select   0:02  0.00% mpd5
                               2945 root        1  44    0  5116K  2984K select   0:01  0.00% openvpn
                              54285 root        1  45    0 33116K 12240K accept   0:01  0.00% php
                              14737 root        1  76    0 33116K 10740K accept   0:01  0.00% php
                              59858 root        1  76    0 34140K 14748K accept   0:01  0.00% php
                              35322 sshd        1  76    0  6616K  2756K select   0:01  0.00% sshd
                               8924 root        1  76    0 33116K 14596K accept   0:01  0.00% php
                              57661 root        1  44    0 33116K 11596K accept   0:01  0.00% php
                              14508 root        1  76    0 33116K 11084K accept   0:01  0.00% php
                              17403 _ntp        1  64   20  3316K  1044K select   0:01  0.00% ntpd
                              37689 root        1  44    0  3436K  1172K select   0:00  0.00% inetd
                               1200 root        1  76    0 33116K 11744K accept   0:00  0.00% php
                              51639 root        1  44    0  3404K   832K nanslp   0:00  0.00% cron
                              14983 root        1  76    0 33116K  9772K accept   0:00  0.00% php
                              48520 root        1  76    0 32092K  6388K wait     0:00  0.00% php
                              53598 root        1  76    0 32092K  6388K wait     0:00  0.00% php
                              53449 root        1  76    0 32092K  6388K wait     0:00  0.00% php
                              52507 root        1  76    0 32092K  6388K wait     0:00  0.00% php
                              52757 root        1  76    0 32092K  6388K wait     0:00  0.00% php
                              53888 root        1  76    0 32092K  6388K wait     0:00  0.00% php
                              44532 root        1  44    0  7992K  2924K select   0:00  0.00% sshd
                              53150 root        1  76    0 32092K  6388K wait     0:00  0.00% php
                              53426 root        1  76    0 32092K  6388K wait     0:00  0.00% php
                              
                              

                              I also wondered why there are SOO many php processes running…

                              Thanks for now
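
To put a number on the php processes in a top listing like the one above, the process rows can be counted and their RES column summed. This is a minimal sketch, assuming the 11-column row layout shown in this thread and saved output analysed off the box; `parse_size_kib` and `summarize_command` are names made up here. Summing RES counts shared pages once per process, so the total is an upper bound rather than an exact figure.

```python
def parse_size_kib(field):
    """Convert a top size field such as '15040K' or '126M' to KiB."""
    units = {"K": 1, "M": 1024, "G": 1024 * 1024}
    return int(field[:-1]) * units[field[-1]]

def summarize_command(top_lines, command):
    """Return (process_count, total_res_kib) for rows whose COMMAND matches."""
    count = 0
    total = 0
    for line in top_lines:
        fields = line.split()
        # Process rows have 11 columns and start with a numeric PID;
        # the header and summary lines are skipped by this check.
        if len(fields) != 11 or not fields[0].isdigit():
            continue
        if fields[10] == command:
            count += 1
            total += parse_size_kib(fields[6])
    return count, total

# A few rows copied from the listing in the post above.
rows = [
    "  PID USERNAME  THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND",
    " 8719 root        1  76    0 33116K 15040K accept   0:17  0.00% php",
    " 8684 root        1  76    0 33116K 12680K accept   0:09  0.00% php",
    " 4764 dhcpd       1  44    0  8436K  1736K select   0:09  0.00% dhcpd",
    "48520 root        1  76    0 32092K  6388K wait     0:00  0.00% php",
]
count, total_kib = summarize_command(rows, "php")
print(f"{count} php processes, about {total_kib / 1024:.1f} MiB resident")
```

Fed the full listing instead of these sample rows, the same function would show how much of the 256 MB the php processes could be holding.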

                              • W
                                wallabybob
                                last edited by

                                @setchi:

                                I also wondered why there are SOO many php processes running…

                                And what are they waiting for so close to their startup?

                                Perhaps the system log will give some clues. The pfSense shell command clog /var/log/system.log can be used to display the whole log. (The web GUI Status -> System Logs normally displays only the most recent 50 lines or so.)

                                • jimpJ
                                  jimp Rebel Alliance Developer Netgate
                                  last edited by

                                  If you removed bandwidthd (and siproxd I see) they didn't fully come out. Try switching to the other NanoBSD slice to see if it behaves better. First make extra sure there are no packages installed under System > Packages.

                                  It would be rather easy to run an ALIX out of RAM.

                                  It's normal to have a couple of PHP processes going, but top doesn't show you the whole story. Check the output of "ps uxawww" and see if it shows more about what PHP is doing, for example whether it's being used to run a certain script such as rc.filter_configure_sync.

                                  Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.