Netgate Discussion Forum

    SLBD using entire CPU

    Routing and Multi WAN
    • sai

      I am using RC3 and I do not see this problem.

      • wjs

        @sai:

        I am using RC3 and I do not see this problem.

        What are your load balancer configurations?

        • sai

          Status:
          Loadb       gateway  opt2  Online  Last change Nov 15
                               wan   Online  Last change Nov 15

          Failoverlb  gateway  opt2  Online  Last change Nov 15
                               wan   Online  Last change Nov 15

          Config:
          Loadb       gateway  opt2  192.168.100.1  (this is a cable modem and we need to change the monitor IP)
                               wan   a.b.c.1

          Failoverlb  gateway  opt2  192.168.100.1
                               wan   a.b.c.1

          top:
          last pid: 52013;  load averages:  0.25,  0.12,  0.09  up 0+02:29:12    11:31:10
          32 processes:  1 running, 31 sleeping

          Mem: 32M Active, 8572K Inact, 25M Wired, 14M Buf, 114M Free
          Swap:

            PID USERNAME THR PRI NICE   SIZE    RES STATE   TIME  WCPU COMMAND
          51634 root       1  -8    0 14796K 12004K piperd  0:03 4.07% php
            861 root       1   4    0  3328K  2464K kqread  0:14 0.00% lighttpd
           6391 root       5  20    0  1908K  1128K kserel  0:08 0.00% slbd
            866 root       1   4    0 22384K 19896K accept  0:05 0.00% php
            250 root       1  96    0  1440K  1072K select  0:04 0.00% syslogd
           1129 root       1   8   20  1768K  1208K wait    0:03 0.00% sh
            366 root       1 -58    0  4208K  2492K bpf     0:02 0.00% tcpdump
            367 root       1  -8    0  1276K   728K piperd  0:02 0.00% logger
            919 nobody     1  96    0  1472K  1108K select  0:02 0.00% dnsmasq
           1274 root       1   8   20  1272K   716K nanslp  0:01 0.00% check_reload_status
           1155 dhcpd      1  96    0  2268K  1892K select  0:00 0.00% dhcpd
            285 root       1  96    0  2804K  1788K select  0:00 0.00% mpd
           1245 _ntp       1  96    0  1340K  1052K select  0:00 0.00% ntpd
            872 root       1   8    0 14200K  4644K wait    0:00 0.00% php
            786 proxy      1   4    0   704K   452K kqread  0:00 0.00% pftpx
            808 proxy      1   4    0   704K   504K kqread  0:00 0.00% pftpx
           1248 root       1   8    0  1384K  1032K nanslp  0:00 0.00% cron
            862 root       1   8    0 14200K  4644K wait    0:00 0.00% php

          The CPU is a VIA (probably a C7 or maybe a C3).

          • wjs

            I've been trying to execute the killall slbd command. In its default state (sending the TERM signal) the processes don't exit. The killall -9 slbd command does seem to work (but it is kill -9!).

            Anyway, does anyone have any advice on changing the script to use killall -9 slbd?

            I'm not sure that this is a good idea…
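For readers weighing TERM against -9: a common middle ground is to send TERM first and only fall back to KILL if the process is still there after a short grace period. The following is a minimal sketch of that pattern, not the script pfSense ships. `stop_pid` and its grace-period argument are hypothetical names introduced here for illustration.

```shell
#!/bin/sh
# Sketch only: stop_pid is a hypothetical helper, not part of pfSense.
# Usage: stop_pid PID [GRACE_SECONDS]
stop_pid() {
    pid=$1
    grace=${2:-5}

    # Ask politely first. Ignore the error if the process is already gone.
    kill -TERM "$pid" 2>/dev/null || true

    # kill -0 delivers no signal; it only reports whether the PID can
    # still be signalled. Poll once per second until the grace period ends.
    while [ "$grace" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        grace=$((grace - 1))
    done

    # Still present after the grace period: force it.
    kill -KILL "$pid" 2>/dev/null || true
    return 0
}
```

Called as `stop_pid 6391 5`, this would give PID 6391 five seconds to exit on TERM before sending KILL. Since the thread reports that slbd ignores TERM, the KILL step would most likely still be reached here; the sketch only makes the escalation explicit.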

            • sai

              Maybe you should reinstall using the latest image? Probably some update went wrong, which is why the script you have is not working.

              • wjs

                This install was done to a clean hard drive and configured from scratch. I currently have RC3 installed also.

                I'm not sure what you're recommending. Would you like me to put a newer snapshot on?

                • sai

                  I was recommending a clean install from scratch

                  :)

                  • Superman

                    I have to say that my system is a complete clean install from the released 1.2RC3, and I'm seeing the same problems. I tried the killall -9 slbd and that worked on my system as well. But the regular script doesn't work at all.

                    • wjs

                      I think I'm going to try modifying that script and see what happens.

                      I'm not too happy about using kill -9 every 5 hours to fix a problem though…
                      Any other ideas?

                      • wjs

                        I changed the script so that it uses kill -9.
                        I ran it by hand and it worked. Now it's time to wait a few hours and see if the problem is "fixed".

                        $ cat /usr/local/sbin/reset_slbd.sh

                        #!/bin/sh
                        
                        if [ `ps awux | grep slbd | wc -l` -gt 0 ]; then
                        	killall slbd
                        	killall -9 slbd
                        	/usr/local/sbin/slbd -c/var/etc/slbd.conf -r5000
                        fi
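A side note on the test in the script above: `ps awux | grep slbd` can also list the grep process itself, so the count may be above zero even when no slbd is running. Below is a minimal sketch of counting by exact command name instead. `count_by_name` is a hypothetical helper, not part of pfSense, and the `ps -axo comm` call shown in the comment is an assumption about the local ps.

```shell
#!/bin/sh
# Sketch only: count_by_name is a hypothetical helper.
# Reads one command name per line on stdin and prints how many lines
# match NAME exactly (so "grep slbd" or "reset_slbd.sh" do not count).
count_by_name() {
    awk -v name="$1" '$1 == name { n++ } END { print n + 0 }'
}

# On a live system (assumes ps supports "-o comm", as on FreeBSD):
#   if [ "$(ps -axo comm | count_by_name slbd)" -gt 0 ]; then ...
```

Reading from stdin keeps the counting logic separate from the ps invocation, so it can be checked against canned input before being wired into a reset script.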
                        
                        • SurfceS

                          I have the same problem here…

                          http://forum.pfsense.org/index.php/topic,6852.0.html

                          I have two boxes. I changed the script on the main one and left the other one with the old script.

                          After a few hours, I can see only 2 slbd processes on the main one and 14 on the second one... So, that did the trick.

                          The script was changed as follows, as the second killall command is not needed.

                          #!/bin/sh

                          if [ `ps awux | grep slbd | wc -l` -gt 0 ]; then
                              killall -9 slbd
                              killall slbd
                              /usr/local/sbin/slbd -c/var/etc/slbd.conf -r5000
                          fi

                          Regards,

                          • wjs

                            Things appear stable although I have very little traffic being routed through the pool.
                            It looks like this 'fix' works.

                            I'm definitely not a developer, but should something like this be considered for integration into the source tree?

                            I'm not sure who to talk to even to mention this…

                            • wjs

                              I've started moving more and more traffic back into the load balancing pools…
                              SLBD is getting stuck at full CPU usage again, even with the modified script!

                              I think the script isn't being run often enough.
                              This is rapidly turning into less of a fix and more of a workaround. I want to make this right.

                              Is anyone still having this problem, or am I going nuts?

                              I currently have the majority of my traffic going into my primary wan port without going through a pool. The rest (light web browsing from a few users) goes into a pool which has its own two wan ports.
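If the timed reset is not running often enough, one option is to react to the symptom itself: look for an slbd process pinned at high CPU and reset only then. The following is a rough sketch of the detection step. `hot_pids` and the 90% threshold are hypothetical choices for illustration, and the `ps -axo pid,pcpu,comm` call in the comment is an assumption about the local ps.

```shell
#!/bin/sh
# Sketch only: hot_pids is a hypothetical helper.
# Usage: hot_pids NAME THRESHOLD
# Reads "PID %CPU COMMAND" lines on stdin and prints the PID of every
# process named NAME whose %CPU is at or above THRESHOLD.
hot_pids() {
    awk -v name="$1" -v limit="$2" \
        '$3 == name && $2 + 0 >= limit + 0 { print $1 }'
}

# On a live system (assumes ps supports "-o pid,pcpu,comm"):
#   ps -axo pid,pcpu,comm | hot_pids slbd 90
```

A watchdog could run this every minute or so and call the existing reset script only when the list is non-empty. This is still a workaround for the stuck processes, not a fix for slbd.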

                              • cmb

                                It's not a fix; it's a workaround until we can properly test and implement an alternative to slbd. We know what the problem is, but unfortunately it's pretty much impossible to solve. The solution is ditching slbd for hoststated, which will be done in a future version.

                                • cmb

                                  Also, this workaround does seem to work for the vast majority of people.

                                  wjs: how much load are you pushing to cause it to break down so easily?

                                  • wjs

                                    cmb,
                                    Thanks very much for pushing that change, I saw it on the cvs track.

                                    Right now it's only my web browsing that's going into the WAN pool. The primary WAN port, which is not part of the pool at the moment, has a good bit of traffic. Last night we had about 1MB/s continuous, sometimes going up to about 10MB/s when someone would pull down something big.

                                    The CPU load hovers under 15% or 20%, but I think most of that is because I've got the whole dashboard open.
                                    I am only getting one process at a time maxing out before the script kicks in, so the system never goes to full load (dual-CPU system).

                                    I'm not sure this answered your question…

                                    If there is anything I can do to help get "hoststated" working for the next version let me know.

                                    • sullrich

                                      @wjs:

                                      If there is anything I can do to help get "hoststated" working for the next version let me know.

                                      Basically translate the .conf file from slbd -> hoststated. I really want to get this in here and it's on my gigantic whiteboard now, but if you want to do the work please do so, as my gigantic whiteboard has many entries now :)

                                      • wjs

                                        I'm not an expert, but I'll take a shot at it.

                                        • Superman

                                          Are there any steps we can take to install and test hoststated, maybe having it alongside slbd just in case?

                                          The change in the script is working, but in between runs several processes get started and start to chew up 100% CPU again. It would be nice to try out the newer service, but have the other to fall back on just in case…

                                          • Juve

                                            Sorry to pull that topic up again, but I am suffering the same problem with a clean and fresh 1.2 install. The pool is a failover pool with two WANs. One of the lines is currently down, so one gateway can't be pinged.

                                            Am I unlucky, or is it something people still encounter?

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.