Netgate Discussion Forum

    SLBD using entire CPU

    Routing and Multi WAN
    35 Posts 8 Posters 14.5k Views
    • W
      wjs

      I know there is a thread that discusses this, but I was unable to post to it (inactive for too long?).

      I have been trying to configure load balancing across multiple WANs. Everything seems to be functioning OK; however, slbd gets stuck at 100% CPU use. This doesn't take down the machine, as I have two CPUs.

      I saw in a thread that this "would be fixed in 1.2 beta 1", but I am running 1.2 RC3 and am still having this problem.

      If someone could point me towards any information or offer any advice it would be greatly appreciated.

      • W
        wjs

        If it's any help, I've included the output from the top command below. The last time this happened, a few hours ago, I kill -9'ed the process and then started it again by hand.
        The system is functioning and everything seems to be OK (although the network load is very low at the moment anyway).

        $ top
        last pid: 15825;  load averages:  1.06,  1.01,  1.00  up 0+17:23:12    20:40:52
        35 processes:  2 running, 33 sleeping

        Mem: 27M Active, 9316K Inact, 29M Wired, 16M Buf, 675M Free
        Swap: 2048M Total, 2048M Free

        PID USERNAME  THR PRI NICE  SIZE    RES STATE  C  TIME  WCPU COMMAND
        7073 root        1 115    0  1924K  1104K CPU0  0  57:17 98.93% slbd
        57045 root        1  -8    0 15228K 13288K piperd 0  0:08  0.93% php
          411 root        1  4    0  3976K  3536K kqread 1  2:06  0.44% lighttpd
        54436 root        1  4    0 14768K 12340K accept 0  0:11  0.39% php
        85342 root        1  8  20  1864K  1304K wait  0  0:15  0.00% sh
          188 root        1  96    0  1440K  1072K select 1  0:08  0.00% syslogd
          705 root        1  8  20  1272K  716K nanslp 1  0:07  0.00% check_reload_status
          324 root        1 -58    0  3716K  1912K bpf    0  0:06  0.00% tcpdump
        1584 dhcpd      1  96    0  2540K  2172K select 0  0:04  0.00% dhcpd
          325 root        1  -8    0  1276K  728K piperd 1  0:03  0.00% logger
        9271 root        6  20    0  1924K  1140K kserel 0  0:01  0.00% slbd
        83441 nobody      1 116  20  1472K  1128K select 0  0:01  0.00% dnsmasq
          683 _ntp        1  96    0  1340K  1052K select 0  0:00  0.00% ntpd
          380 proxy      1  4    0  704K  452K kqread 0  0:00  0.00% pftpx
          694 root        1  8    0  1384K  1016K nanslp 0  0:00  0.00% cron
        83681 proxy      1  4  20  704K  504K kqread 0  0:00  0.00% pftpx
          109 root        1  96    0  504K  360K select 1  0:00  0.00% devd
          684 root        1  96    0  1376K  1048K select 0  0:00  0.00% ntpd
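
        For reference, the manual recovery went roughly like this (the PID is the stuck slbd from the top output above, and the restart line matches the slbd command line visible in the ps output further down):

        # kill the spinning slbd (PID 7073 above); SIGKILL, since a plain TERM had no effect
        kill -9 7073
        # start a fresh instance by hand
        /usr/local/sbin/slbd -c/var/etc/slbd.conf -r5000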

        • S
          Superman

          Just to confirm that this still seems to be a problem, here's the output from our pfSense firewall.

          Dual Pentium III Xeon
          256MB RDRAM
          3 Intel Pro/100 NICs

          # top
          last pid: 19867;  load averages: 41.03, 40.61, 40.37                                                                                 up 2+23:14:45  08:16:23
          83 processes:  39 running, 43 sleeping, 1 zombie
          CPU states:  7.6% user,  0.0% nice, 91.7% system,  0.7% interrupt,  0.0% idle
          Mem: 58M Active, 14M Inact, 30M Wired, 15M Buf, 137M Free
          Swap: 512M Total, 512M Free
          
            PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
          87681 root        1 132    0  2064K  1120K RUN    0  10:58  5.13% slbd
          33521 root        1 132    0  2060K  1116K RUN    0 245:02  5.03% slbd
          22389 root        1 132    0  2064K  1120K RUN    1 120:53  5.03% slbd
           5023 root        1 132    0  2068K  1124K RUN    0 273:17  4.98% slbd
          18288 root        1 132    0  2068K  1124K RUN    0 178:08  4.98% slbd
          74050 root        1 132    0  2064K  1120K RUN    0 143:04  4.93% slbd
          47444 root        1 132    0  2064K  1120K RUN    0 337:37  4.83% slbd
          60666 root        1 132    0  2064K  1120K RUN    0 321:13  4.83% slbd
          98236 root        1 132    0  2068K  1124K RUN    0 193:23  4.79% slbd
          68726 root        1 132    0  2064K  1120K RUN    0 146:49  4.79% slbd
          55706 root        1 132    0  2068K  1124K RUN    1 549:09  4.74% slbd
          80004 root        1 131    0  2068K  1128K RUN    0 462:31  4.74% slbd
           8586 root        1 131    0  2068K  1128K RUN    0 397:24  4.74% slbd
          17741 root        1 131    0  2064K  1124K RUN    0 259:00  4.74% slbd
          69227 root        1 132    0  2064K  1124K RUN    0 214:44  4.74% slbd
           7107 root        1 131    0  2064K  1120K RUN    0 186:13  4.74% slbd
          22395 root        1 132    0  2064K  1120K RUN    1 174:30  4.74% slbd
          61055 root        1 132    0  2064K  1120K RUN    0 151:43  4.74% slbd
          29996 root        1 131    0  2064K  1120K RUN    0 116:23  4.74% slbd
          80622 root        1 131    0  2064K  1120K RUN    0  92:01  4.74% slbd
           4439 root        1 132    0  2068K  1124K RUN    0  41:14  4.74% slbd
          88900 root        1 131    0  2068K  1128K RUN    0 437:57  4.69% slbd
          25584 root        1 131    0  2064K  1120K RUN    0 171:57  4.69% slbd
          78407 root        1 132    0  2064K  1120K RUN    1 140:01  4.69% slbd
           6488 root        1 132    0  2064K  1120K RUN    1  40:14  4.69% slbd
           1442 root        1 131    0  2068K  1124K RUN    0 411:39  4.64% slbd
          40966 root        1 132    0  2064K  1120K RUN    0 348:16  4.64% slbd
          48921 root        1 131    0  2068K  1128K RUN    0 230:18  4.64% slbd
          23749 root        1 131    0  2064K  1120K RUN    0  71:48  4.64% slbd
           4887 root        1 132    0  2064K  1120K CPU1   0  40:59  4.64% slbd
          32569 root        1 132    0  2068K  1124K RUN    1  29:48  4.59% slbd
          89364 root        1 132    0  2064K  1120K RUN    0  10:18  4.59% slbd
          53988 root        1 132    0  2068K  1124K RUN    1 556:30  4.54% slbd
          80933 root        1 131    0  2064K  1120K RUN    0  91:50  4.54% slbd
           2640 root        1 132   20  2068K  1132K RUN    0 420:57  2.29% slbd
           2794 root        1 132   20  2068K  1132K RUN    1 418:49  2.15% slbd
          19867 root        1 128    0  2444K  1664K CPU0   0   0:00  0.51% top
            667 root        1  96    0  1280K   716K select 0   0:40  0.00% choparp
            607 root        1   4    0  3408K  2548K kqread 0   0:36  0.00% lighttpd
           1344 root        1  -8   20  1868K  1308K piperd 1   0:22  0.00% sh
           1138 root        1   8    0  1720K  1156K wait   0   0:16  0.00% sh
            309 root        1 -58    0  4308K  2588K bpf    0   0:14  0.00% tcpdump
            192 root        1  96    0  1460K  1092K select 0   0:12  0.00% syslogd
            225 root        1  96    0  2804K  1792K select 0   0:06  0.00% mpd
          17433 root        1   8    0 14752K 11796K nanslp 0   0:06  0.00% php
           1137 root        1  96    0  1372K  1056K select 0   0:05  0.00% miniupnpd
            616 root        1   4    0 22912K 20528K accept 0   0:03  0.00% php
            310 root        1  -8    0  1276K   728K piperd 0   0:03  0.00% logger
           1286 root        1 116   20  2880K  2372K select 0   0:03  0.00% racoon
            570 proxy       1   4    0   704K   452K kqread 0   0:02  0.00% pftpx
          

          Relevant part of ps output:

          # ps wwaux
          USER     PID %CPU %MEM   VSZ   RSS  TT  STAT STARTED      TIME COMMAND
          root   25584  5.6  0.4  2064  1120  ??  R    Mon02PM 172:04.36 /usr/local/sbin/slbd -c/var/etc/slbd.conf -r5000
          root   19970  5.1  0.5  2064  1124  ??  R     8:16AM   0:10.30 /usr/local/sbin/slbd -c/var/etc/slbd.conf -r5000
          root   22389  5.0  0.4  2064  1120  ??  R    Tue12AM 121:01.22 /usr/local/sbin/slbd -c/var/etc/slbd.conf -r5000
          root   87681  4.8  0.4  2064  1120  ??  R     4:51AM  11:05.18 /usr/local/sbin/slbd -c/var/etc/slbd.conf -r5000
          root   89364  4.8  0.4  2064  1120  ??  R     5:03AM  10:25.35 /usr/local/sbin/slbd -c/var/etc/slbd.conf -r5000
          root   40966  4.8  0.4  2064  1120  ??  R    Sun08PM 348:23.30 /usr/local/sbin/slbd -c/var/etc/slbd.conf -r5000
          root   80622  4.8  0.4  2064  1120  ??  R    Tue07AM  92:08.38 /usr/local/sbin/slbd -c/var/etc/slbd.conf -r5000
          root    4439  4.7  0.5  2068  1124  ??  R     7:58PM  41:21.34 /usr/local/sbin/slbd -c/var/etc/slbd.conf -r5000
          root   88900  4.7  0.5  2068  1128  ??  R    Sun03PM 438:04.66 /usr/local/sbin/slbd -c/var/etc/slbd.conf -r5000
          root    7107  4.6  0.4  2064  1120  ??  R    Mon12PM 186:20.32 /usr/local/sbin/slbd -c/var/etc/slbd.conf -r5000
          root   17741  4.6  0.5  2064  1124  ??  R    Mon03AM 259:07.83 /usr/local/sbin/slbd -c/var/etc/slbd.conf -r5000
          root   61055  4.6  0.4  2064  1120  ??  R    Mon06PM 151:50.78 /usr/local/sbin/slbd -c/var/etc/slbd.conf -r5000
          root   78407  4.6  0.4  2064  1120  ??  R    Mon08PM 140:08.98 /usr/local/sbin/slbd -c/var/etc/slbd.conf -r5000
          root    5023  4.5  0.5  2068  1124  ??  R    Mon02AM 273:24.07 /usr/local/sbin/slbd -c/var/etc/slbd.conf -r5000
          root   22395  4.5  0.4  2064  1120  ??  R    Mon02PM 174:37.21 /usr/local/sbin/slbd -c/var/etc/slbd.conf -r5000
          root   74050  4.5  0.4  2064  1120  ??  R    Mon07PM 143:11.92 /usr/local/sbin/slbd -c/var/etc/slbd.conf -r5000
          root    6488  4.5  0.4  2064  1120  ??  RL    8:13PM  40:21.46 /usr/local/sbin/slbd -c/var/etc/slbd.conf -r5000
          root   23749  4.5  0.4  2064  1120  ??  R    12:04PM  71:55.50 /usr/local/sbin/slbd -c/var/etc/slbd.conf -r5000
          root   29996  4.5  0.4  2064  1120  ??  R    Tue01AM 116:30.97 /usr/local/sbin/slbd -c/var/etc/slbd.conf -r5000
          root    8586  4.4  0.5  2068  1128  ??  R    Sun05PM 397:31.74 /usr/local/sbin/slbd -c/var/etc/slbd.conf -r5000
          root   33521  4.4  0.4  2060  1116  ??  R    Mon05AM 245:09.41 /usr/local/sbin/slbd -c/var/etc/slbd.conf -r5000
          root   53988  4.4  0.5  2068  1124  ??  R    Sun12PM 556:37.39 /usr/local/sbin/slbd -c/var/etc/slbd.conf -r5000
          root   55706  4.4  0.5  2068  1124  ??  R    Sun12PM 549:16.78 /usr/local/sbin/slbd -c/var/etc/slbd.conf -r5000
          root   68726  4.4  0.4  2064  1120  ??  R    Mon07PM 146:56.54 /usr/local/sbin/slbd -c/var/etc/slbd.conf -r5000
          root   80933  4.4  0.4  2064  1120  ??  R    Tue07AM  91:57.58 /usr/local/sbin/slbd -c/var/etc/slbd.conf -r5000
          root    1442  4.4  0.5  2068  1124  ??  R    Sun04PM 411:46.58 /usr/local/sbin/slbd -c/var/etc/slbd.conf -r5000
          root   47444  4.4  0.4  2064  1120  ??  R    Sun08PM 337:44.57 /usr/local/sbin/slbd -c/var/etc/slbd.conf -r5000
          root   48921  4.4  0.5  2068  1128  ??  R    Mon06AM 230:26.46 /usr/local/sbin/slbd -c/var/etc/slbd.conf -r5000
          root   69227  4.4  0.5  2064  1124  ??  R    Mon08AM 214:51.33 /usr/local/sbin/slbd -c/var/etc/slbd.conf -r5000
          root   80004  4.4  0.5  2068  1128  ??  R    Sun02PM 462:38.85 /usr/local/sbin/slbd -c/var/etc/slbd.conf -r5000
          root   98236  4.4  0.5  2068  1124  ??  R    Mon11AM 193:30.23 /usr/local/sbin/slbd -c/var/etc/slbd.conf -r5000
          root   18288  4.3  0.5  2068  1124  ??  R    Mon01PM 178:15.61 /usr/local/sbin/slbd -c/var/etc/slbd.conf -r5000
          root   60666  4.3  0.4  2064  1120  ??  R    Sun10PM 321:20.62 /usr/local/sbin/slbd -c/var/etc/slbd.conf -r5000
          root   32569  4.2  0.5  2068  1124  ??  R    11:15PM  29:55.74 /usr/local/sbin/slbd -c/var/etc/slbd.conf -r5000
          root    4887  4.2  0.4  2064  1120  ??  R     8:01PM  41:06.42 /usr/local/sbin/slbd -c/var/etc/slbd.conf -r5000
          root    2640  2.2  0.5  2068  1132  ??  RN   Sun09AM 421:01.26 /usr/local/sbin/slbd -c/var/etc/slbd.conf -r5000
          root    2794  1.8  0.5  2068  1132  ??  RN   Sun09AM 418:53.32 /usr/local/sbin/slbd -c/var/etc/slbd.conf -r5000
          

          I noticed in cvstrac that a script was created to kill and restart slbd every 5 hours, but it doesn't seem to actually be running on my system, as the multiple slbd processes in the ps output above indicate.
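
          If it's wired up the way the check-ins suggest, there should be a crontab entry along these lines (a hypothetical sketch; the script path and the exact field layout are assumptions on my part):

          # /etc/crontab (hypothetical): run the slbd reset script every 5 hours
          0  */5  *  *  *  root  /usr/local/sbin/reset_slbd.sh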

          Thanks

          • W
            wjs

            I found the ticket number, 1316, and the check-ins: 18733, 18734, and 18735.
            The script does look like it would work. Was something broken in RC3? I didn't try load balancing with previous versions.

            I'm going to try executing that script manually and see if that fixes anything.
            Maybe the script isn't being called?
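
            Something like this should tell me (generic shell; nothing pfSense-specific assumed):

            # see whether cron references the reset script at all
            grep -i slbd /etc/crontab
            # then run the script by hand and watch whether the stuck processes clear
            sh /usr/local/sbin/reset_slbd.sh
            ps awux | grep slbd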

            • S
              sai

              I am using RC3 and I do not see this problem.

              • W
                wjs

                @sai:

                I am using RC3 and I do not see this problem.

                What are your load balancer configurations?

                • S
                  sai

                  Status:
                    Loadb (gateway)       opt2  Online  (last change Nov 15)
                                          wan   Online  (last change Nov 15)
                    Failoverlb (gateway)  opt2  Online  (last change Nov 15)
                                          wan   Online  (last change Nov 15)

                  Config (monitor IPs):
                    Loadb (gateway)       opt2  192.168.100.1  (this is a cable modem and we need to change the monitor IP)
                                          wan   a.b.c.1
                    Failoverlb (gateway)  opt2  192.168.100.1
                                          wan   a.b.c.1

                  top:
                  last pid: 52013;  load averages:  0.25,  0.12,  0.09  up 0+02:29:12    11:31:10
                  32 processes:  1 running, 31 sleeping

                  Mem: 32M Active, 8572K Inact, 25M Wired, 14M Buf, 114M Free
                  Swap:

                  PID USERNAME  THR PRI NICE  SIZE    RES STATE    TIME  WCPU COMMAND
                  51634 root        1  -8    0 14796K 12004K piperd  0:03  4.07% php
                    861 root        1  4    0  3328K  2464K kqread  0:14  0.00% lighttpd
                  6391 root        5  20    0  1908K  1128K kserel  0:08  0.00% slbd
                    866 root        1  4    0 22384K 19896K accept  0:05  0.00% php
                    250 root        1  96    0  1440K  1072K select  0:04  0.00% syslogd
                  1129 root        1  8  20  1768K  1208K wait    0:03  0.00% sh
                    366 root        1 -58    0  4208K  2492K bpf      0:02  0.00% tcpdump
                    367 root        1  -8    0  1276K  728K piperd  0:02  0.00% logger
                    919 nobody      1  96    0  1472K  1108K select  0:02  0.00% dnsmasq
                  1274 root        1  8  20  1272K  716K nanslp  0:01  0.00% check_reload_status
                  1155 dhcpd      1  96    0  2268K  1892K select  0:00  0.00% dhcpd
                    285 root        1  96    0  2804K  1788K select  0:00  0.00% mpd
                  1245 _ntp        1  96    0  1340K  1052K select  0:00  0.00% ntpd
                    872 root        1  8    0 14200K  4644K wait    0:00  0.00% php
                    786 proxy      1  4    0  704K  452K kqread  0:00  0.00% pftpx
                    808 proxy      1  4    0  704K  504K kqread  0:00  0.00% pftpx
                  1248 root        1  8    0  1384K  1032K nanslp  0:00  0.00% cron
                    862 root        1  8    0 14200K  4644K wait    0:00  0.00% php

                  The CPU is a VIA (probably a C7, or maybe a C3).

                  • W
                    wjs

                    I've been trying to execute the killall slbd command. In its default state (sending the TERM signal), the processes don't exit. The killall -9 slbd command does seem to work (it is kill -9, after all!).
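
                    To spell out the signal behaviour (a minimal illustration of what I'm observing; nothing here is specific to slbd):

                    killall slbd       # sends SIGTERM by default; the stuck slbd processes ignore it
                    killall -9 slbd    # sends SIGKILL, which cannot be caught or ignored, so they die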

                    Anyway, does anyone have any advice on changing the script to killall -9 slbd?

                    I'm not sure that this is a good idea…

                    • S
                      sai

                      Maybe you should reinstall using the latest image? Probably some update went wrong, which would explain why the script you have is not working.

                      • W
                        wjs

                        This install was done to a clean hard drive and configured from scratch, and I currently have RC3 installed.

                        I'm not sure what you're recommending. Would you like me to put a newer snapshot on?

                        • S
                          sai

                          I was recommending a clean install from scratch.

                          :)

                          • S
                            Superman

                            I have to say that my system is a completely clean install of the released 1.2RC3, and I'm seeing the same problems. I tried killall -9 slbd and that worked on my system as well, but the regular script doesn't work at all.

                            • W
                              wjs

                              I think I'm going to try modifying that script and see what happens.

                              I'm not too happy about using kill -9 every 5 hours to fix a problem, though…
                              Any other ideas?

                              • W
                                wjs

                                I changed the script so that it kill -9's.
                                I ran it by hand and it worked. Now it's time to wait a few hours and see if the problem is "fixed".

                                $ cat /usr/local/sbin/reset_slbd.sh

                                 #!/bin/sh

                                 # '[s]lbd' keeps grep from matching its own entry in the ps
                                 # output, so the count reflects only real slbd processes
                                 if [ `ps awux | grep '[s]lbd' | wc -l` -gt 0 ]; then
                                 	killall slbd
                                 	killall -9 slbd
                                 	/usr/local/sbin/slbd -c/var/etc/slbd.conf -r5000
                                 fi
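
                                 After running it, a quick sanity check (same '[s]lbd' trick so grep doesn't count itself):

                                 # should now list only the freshly started slbd
                                 ps awux | grep '[s]lbd'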
                                
                                 • S
                                   SurfceS

                                  I have the same problem here…

                                  http://forum.pfsense.org/index.php/topic,6852.0.html

                                   I have two boxes; I changed the script on the main one and left the other one with the old script.

                                   After a few hours, I can see only 2 slbd processes on the main one and 14 on the second one... so that did the trick.

                                   The script was changed as follows (the second killall command is not really needed):

                                   #!/bin/sh

                                   # same self-match guard: count only real slbd processes
                                   if [ `ps awux | grep '[s]lbd' | wc -l` -gt 0 ]; then
                                   	killall -9 slbd
                                   	killall slbd
                                   	/usr/local/sbin/slbd -c/var/etc/slbd.conf -r5000
                                   fi

                                  Regards,

                                   • W
                                     wjs

                                     Things appear stable, although I have very little traffic being routed through the pool.
                                     It looks like this 'fix' works.

                                     I'm definitely not a developer, but should something like this be considered for integration into the source tree?

                                     I'm not sure who I'd even talk to about this…

                                     • W
                                       wjs

                                       I've started moving more and more traffic back into the load-balancing pools…
                                       slbd is getting stuck at full usage again, even with the modified script!

                                       I think the script isn't being run often enough.
                                       This is rapidly turning into less of a fix and more of a workaround; I want to make this right.

                                       Is anyone else still having this problem, or am I going nuts?

                                       I currently have the majority of my traffic going out my primary WAN port without going through a pool. The rest (light web browsing from a few users) goes into a pool that has its own two WAN ports.
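
                                       If nothing else, one stopgap would be running the reset script more often, e.g. hourly (a hypothetical crontab tweak; the entry itself is assumed, as sketched earlier in the thread):

                                       # /etc/crontab (hypothetical): hourly instead of every 5 hours
                                       0  *  *  *  *  root  /usr/local/sbin/reset_slbd.sh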

                                       • C
                                         cmb

                                         It's not a fix; it's a workaround until we can properly test and implement an alternative to slbd. We know what the problem is; unfortunately, it's pretty much impossible to solve. The solution is ditching slbd for hoststated, which will be done in a future version.

                                         • C
                                           cmb

                                           Also, this workaround does seem to work for the vast majority of people.

                                           wjs: how much load are you pushing to cause it to break down so easily?

                                           • W
                                             wjs

                                             cmb,
                                             Thanks very much for pushing that change; I saw it in cvstrac.

                                             Right now it's only my web browsing that goes into the WAN pool. The primary WAN port, which is not part of the pool at the moment, carries a good bit of traffic: last night we had about 1 MB/s continuous, sometimes going up to about 10 MB/s when someone pulled down something big.

                                             The CPU load hovers under 15% or 20%, but I think most of that is because I've got the whole dashboard open.
                                             Only one slbd process at a time maxes out before the script kicks in, so the system never reaches full load (it's a dual-CPU system).

                                             I'm not sure this answered your question…

                                             If there is anything I can do to help get hoststated working for the next version, let me know.
