Squid 100% CPU every hour since it's been started



  • Hi all,

    I'm using pfSense 2.2.2 and I've recently installed Squid 3.4.10_2 and set it to transparent mode. The problem I'm facing is that every hour since the process started, Squid goes to 100% CPU for about 2.5 minutes. While this is happening Squid stops responding and no one can browse the internet; then, as the CPU usage starts dropping, everything starts working again… until an hour later, like clockwork.

    While this happens, if I try using "squidclient mgr:info" I get the message saying "Sending HTTP request ... done." and it hangs until the process becomes responsive again and then I receive a message saying "Alarm Clock".

    So obviously this alarm clock is going off every hour to tell the process to do something; the problem is I have no idea what. Can anyone shed some light on this, or suggest what I can try to figure out what Squid is doing?

    Also in the cache.log, at the time Squid starts working again, I get a message saying "Select loop Error. Retry 1".

    I do have a fairly large cache, at 50GB. The reason is that I'm caching both Windows and Apple updates. Could Squid be doing some kind of hourly check on this and struggling with a cache that large, or with the drive speed (it's a Samsung mSATA SSD in an APU box)? I switched the cache from UFS to AUFS in the hope this was the case, but it hasn't appeared to make any difference.
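
    (For reference, switching between UFS and AUFS is just the cache_dir type in squid.conf. Something along these lines, though the path and numbers here are only an illustration rather than what pfSense actually writes out:)

    # cache_dir <type> <path> <size in MB> <L1 dirs> <L2 dirs>  -- path/sizes are example values
    cache_dir aufs /var/squid/cache 51200 16 256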

    I guess at this point I'm just trying to figure out what exactly Squid is doing every hour that's causing this.
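
    (In case it helps with the digging: Squid's cache manager is supposed to have an "events" report that lists the internal timed events and when they are next due, so dumping it while the process is responsive might show what the hourly job is. Assuming the report names exist in this build:)

    squidclient mgr:menu     # list the cache manager reports this build supports
    squidclient mgr:events   # the queue of scheduled internal events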

    Thanks!
    Matt



  • This question has been repeatedly asked, but there has never been a single response.  It is consistent and repeatable across installations.

    It has been a problem on every version of pfSense since I started using pfSense 1.9 in 2010, including current (Nov. 2015) releases.

    Squid by itself does not have the issue (or it is not as noticeable).  If squidGuard is enabled, then every hour (the exact time depends on when squidGuard was started), the Squid process goes to 100% CPU for 30-40 seconds.  During this time, no internet traffic is passed.

    Also, if the Shalla blacklist is enabled, the hang is 2-3 times longer.
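
    (For context, squidGuard is hooked into Squid as a URL rewrite helper; roughly the following in squid.conf, although the binary path and child counts here are my guesses rather than what the pfSense package actually generates:)

    # squidGuard as a url_rewrite helper; paths and child counts are assumed values
    url_rewrite_program /usr/local/bin/squidGuard -c /usr/local/etc/squidGuard/squidGuard.conf
    url_rewrite_children 16 startup=8 idle=4 concurrency=0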

    For the same reasons as above, I originally had the HD cache set to 100GB.  But I have tried reducing it all the way down to 500MB.

    PS, this is running on a very capable Dell R210 with a dual-core Xeon, 16GB RAM and a 500GB Black drive.



    I have set the "Memory Cache Size" to 16 MB.

    This fixes the problem with the internet hang every hour.

    My cache is now 50GB out of a total of 120GB.
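
    (If I understand the GUI mapping correctly, that field ends up as cache_mem in squid.conf:)

    # assuming the pfSense "Memory Cache Size" field maps straight to cache_mem
    cache_mem 16 MB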



  • This is the first reasonable answer I have heard.  THANKS!

    I tried greatly reducing the HD cache size to 1GB and the RAM cache to 500MB.  That helped, but it still hangs for 20-25 seconds every hour.

    I will try a very small RAM cache… although that is partly why I bought 16GB of RAM.



  • It took a couple days for the problem to come back, but it DOES still happen.  However…. The hang time is greatly reduced to about 10 seconds, but it is still present.

    Squid settings - Local HD Cache 1000Mb (UFS), Max Object size=4mb.  Local RAM Cache 1000Mb, Max Obj size=256kb.

    I have just reduced the Ram to 10Mb and max obj size to 100Kb.

    I will post any updates, but it will probably take a few days to get a consistent result.



  • Hello,

    Do you use any auth helper? NTLM auth, something like that?


  • @duanes:

    It took a couple days for the problem to come back, but it DOES still happen.  However…. The hang time is greatly reduced to about 10 seconds, but it is still present.

    Squid settings - Local HD Cache 1000Mb (UFS), Max Object size=4mb.  Local RAM Cache 1000Mb, Max Obj size=256kb.

    I have just reduced the Ram to 10Mb and max obj size to 100Kb.

    I will post any updates, but it will probably take a few days to get a consistent result.

    For the record: b = bits, B = bytes. Which is it?



    Sorry, B for bytes.



  • No authentication is used for the proxy.  All users have the same access level.

    So… after a few days of running, I am not seeing any of the hangs like before.  The key points are a very small HD and RAM cache, and the high/low water marks set very close together (95/94%).

    I am now going to boost the HD cache to 80GB and the max HD cache object size to 1000MB.  I'll give it some time and see what happens.
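
    (My understanding is that the water-mark fields map to the cache_swap_* directives, i.e. roughly:)

    # keep the cleanup band between the marks as narrow as possible
    cache_swap_low  94
    cache_swap_high 95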



  • Hello!

    Has this improved? I've had the same problem for a long time now, even though I use the latest version of everything:

    "2016/01/04 07:43:14 kid1| Select loop Error. Retry 1
    2016/01/04 08:43:14 kid1| Select loop Error. Retry 1
    2016/01/04 09:43:14 kid1| Select loop Error. Retry 1
    2016/01/04 10:43:13 kid1| Select loop Error. Retry 1
    "



    Yes. The HD cache seems to work fine; however, if I try to use the RAM cache in any meaningful capacity, I get the same problems.

    Also, the problem DOES seem to appear again if the HD cache is too large.  My system is a Dell R210 (not a heavy-duty machine, but it has decent IO).  I have limited the HD cache to about 50GB.  The trick is to ensure that the high-water and low-water marks are only one percentage point apart; mine are set at 95% and 94%.  The hang seems to happen when clearing the cache, and it is VERY intrusive.

    Finally, I have the RAM cache set to 1MB and the max cached object size to 1KB (the minimum settings).  I have 16GB of RAM available, but any time I increase the RAM cache and max item size to 1MB, the hang starts showing up within a matter of hours (or a few days with a 4GB RAM cache setting).  I believe it to be the same process in which old items are flushed hourly; however, it is EXCRUCIATINGLY slow, even though it is a RAM-based operation.

    So, I've backed my physical RAM down to 4GB, set the RAM cache to 1MB and the max size to 1KB.  The HD is 100GB, but I have the HD cache set to 50GB with 1GB as the max item size (this will cache virtually all SW updates).  I have the cache policy set to keep the largest items longer, and I use diskd as the drive access method.
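
    (If done by hand, the diskd part of that is just the cache_dir type and the object size limit; a rough sketch, with the cache path assumed:)

    # diskd hands disk I/O to a separate helper process per cache_dir
    cache_dir diskd /var/squid/cache 51200 16 256   # 50GB cache; path assumed
    maximum_object_size 1 GB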

    So far, the hangs have not returned after 3+ weeks of operation.



    It now looks like it comes down to the settings.

    My ""Proxy Server: Cache Management"" settings:

    Squid Cache General Settings:
    Low-Water Mark in % = 93
    High-Water Mark in % = 95

    Squid Hard Disk Cache Settings:
    Hard Disk Cache Size = 60000
    Hard Disk Cache System = ufs
    Level 1 Directories = 8
    Minimum Object Size = 32
    Maximum Object Size = 256

    Squid Memory Cache Settings:
    Memory Cache Size = 5120
    Maximum Object Size in RAM = 512
    Memory Replacement Policy = Heap GDSF

    What exactly do you have set?

    thx



  • I've been working on this for quite some time -

    I started getting the hangs again when I upped the RAM cache, so I keep the RAM size to an absolute minimum (which is sad, because I specifically bought a ton of RAM thinking it would be better than an HD cache).  Also, the low/high water marks need to be as close together as possible.  Apparently, the hourly trash-collection process runs at a high priority and prevents all other activity, or maybe locks something.  Either way, it is very intrusive.

    These settings have been running without the hang for 33 days.

    Squid Cache General Settings:
    Low-Water Mark in % = 94
    High-Water Mark in % = 95

    Squid Hard Disk Cache Settings:
    Cache Replacement Policy: Heap LFUDA
    Hard Disk Cache Size = 80000
    Hard Disk Cache System = diskd
    Level 1 Directories = 16
    Minimum Object Size = 0
    Maximum Object Size = 2000

    Squid Memory Cache Settings:
    Memory Cache Size = 300
    Maximum Object Size in RAM = 4
    Memory Replacement Policy = Heap GDSF
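
    (My best guess at the squid.conf this corresponds to, for anyone comparing outside the GUI; the cache path and the units of the GUI fields are assumptions on my part:)

    cache_swap_low  94
    cache_swap_high 95
    cache_replacement_policy heap LFUDA
    cache_dir diskd /var/squid/cache 80000 16 256   # path assumed
    minimum_object_size 0 KB
    maximum_object_size 2000 MB                     # assuming the GUI field is in MB
    cache_mem 300 MB
    maximum_object_size_in_memory 4 KB              # assuming the RAM field is in KB
    memory_replacement_policy heap GDSF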



  • Hi,
    I have exactly the same problem with pfSense 2.2.6/squid3 in transparent mode, with squidGuard and the squid virus check.
    I bought the pfSense XG-1540 with 2x128GB SSDs, 32GB RAM, 2x10GbE, 6x1GbE and one 1TB USB 3.0 external SSD drive (used only for the squid cache).
    After changing the Squid memory cache settings to

    Memory Cache Size = 300  (before 8092)
    Maximum Object Size in RAM = 4 (before 1024)

    the problem was fixed.

    But now the appliance uses only 4GB RAM….
    The performance is great because of the SSD cache drive.

    Sincerely
    Roman



  • Unfortunately, the problem has not disappeared.
    I tried again yesterday.
    The new pfSense (FreeBSD 10.3-RELEASE) does not help with this problem.

    Interestingly, it occurs every 60 minutes.

    Sincerely
    kemecs



    I am pretty certain that this is an hourly garbage-collection issue in Squid and there is no way to overcome it.  For some reason the garbage collection is a blocking thread and stops all network traffic.  Additionally, all existing connections are dropped.  Finally, the collection runs every 60 minutes from the time the process was started.

    I do see that having a memory cache of ANY size greatly increases the hang time.  There are a number of complaints about this around the internet, but none of the responders seem to really grasp the problem.

    I have also found that I had to limit my squid HD cache size to about 40GB.  I wanted a larger cache to hold all of the MS updates, AV updates and other various files that tend to be large and repetitive.  Alas, I believe that I am stuck with the problem for now.



    Same problem. Any chance this garbage-collecting process can be set to a low priority?
    100.00% (squid-1) -f /usr/local/etc/squid/squid.co
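
    (One thing that might be worth a try, although it may do nothing if the stall is a blocking operation inside Squid's event loop rather than CPU contention: lowering the worker's priority from the shell. An untested sketch, matching on the "(squid-1)" process title shown in top:)

    # lower the scheduling priority of the squid worker process
    renice 10 -p `pgrep -f 'squid-1'`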

    This looks to be a bug. Can someone post a new bug for this issue here: https://redmine.pfsense.org/projects/pfsense-packages

    I have posted the bug here: https://redmine.pfsense.org/issues/6485



    For the hard disk cache, once it reaches about 30GB of the 200GB allocated, Squid starts pulling a high load:

    
    last pid: 34059;  load averages:  0.65,  0.87,  0.87  up 10+04:05:45    22:20:31
    327 processes: 3 running, 307 sleeping, 17 waiting
    Mem: 127M Active, 2984M Inact, 450M Wired, 3688K Cache, 336M Buf, 349M Free
    
      PID USERNAME PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
       11 root     155 ki31     0K    32K CPU0    0 236.6H  75.00% [idle{idle: cpu0}]
       11 root     155 ki31     0K    32K RUN     1 236.2H  68.16% [idle{idle: cpu1}]
     4349 squid     37    0   191M 89032K kqread  1  59:26  12.35% (squid-1) -f /usr/local/etc/squid/squid.co
     6952 root      52    0   262M 28748K piperd  1   0:01   5.57% php-fpm: pool nginx (php-fpm)
    10213 root      52    0   262M 29140K accept  0   0:00   2.29% php-fpm: pool nginx (php-fpm)
    79057 squid     20    0 37660K 13416K sbwait  0   0:03   0.29% (squidGuard) -c /usr/local/etc/squidGuard/
       12 root     -92    -     0K   272K WAIT    0  34:56   0.00% [intr{irq260: re1}]
       12 root     -92    -     0K   272K WAIT    1  25:14   0.00% [intr{irq261: re2}]
       12 root     -60    -     0K   272K WAIT    0  20:42   0.00% [intr{swi4: clock}]
       19 root      16    -     0K    16K syncer  0  10:05   0.00% [syncer]
        5 root     -16    -     0K    16K pftm    0   8:07   0.00% [pf purge]
       15 root     -16    -     0K    16K -       0   2:44   0.00% [rand_harvestq]
     4223 unbound   20    0 55640K 26308K kqread  0   2:40   0.00% /usr/local/sbin/unbound -c /var/unbound/un
    26898 root      20    0 30140K 17968K select  1   2:19   0.00% /usr/local/sbin/ntpd -g -c /var/etc/ntpd.c
    23145 root      20    0 28608K  6416K kqread  0   2:04   0.00% nginx: worker process (nginx)
     6187 squid     20    0 37752K  3544K select  0   2:00   0.00% (pinger) (pinger)
    60390 squid     20    0 37752K  3544K select  0   1:48   0.00% (pinger) (pinger)
    40522 squid     20    0 37752K  3544K select  0   1:48   0.00% (pinger) (pinger)
    
    

    Going to try setting a 20GB cache.

    Update: the high load stops with a 20GB cache. Raising it to 30GB; I will see at what point the issue starts.

    Load averages: 0.24, 0.19, 0.08
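
    (To see how full the disk cache actually is while testing different sizes, the cache manager reports should show it, assuming squidclient is available on the box:)

    squidclient mgr:storedir                        # per-cache_dir capacity and usage
    squidclient mgr:info | grep -i 'Storage Swap'   # current swap size vs. capacity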



  • Looks like this issue has been reported against Squid:
    http://bugs.squid-cache.org/show_bug.cgi?id=4477