Netgate Discussion Forum

    Squid 100% CPU every hour since it's been started

Cache/Proxy (19 Posts, 8 Posters, 10.3k Views)
• cracker1985

      Hello,

Do you use any auth helper? NTLM auth, something like that?

• Derelict (Netgate)

        @duanes:

        It took a couple days for the problem to come back, but it DOES still happen.  However…. The hang time is greatly reduced to about 10 seconds, but it is still present.

        Squid settings - Local HD Cache 1000Mb (UFS), Max Object size=4mb.  Local RAM Cache 1000Mb, Max Obj size=256kb.

        I have just reduced the Ram to 10Mb and max obj size to 100Kb.

        I will post any updates, but it will probably take a few days to get a consistent result.

For the record: b = bits, B = bytes. Which is it?

        Chattanooga, Tennessee, USA
        A comprehensive network diagram is worth 10,000 words and 15 conference calls.
        DO NOT set a source address/port in a port forward or firewall rule unless you KNOW you need it!
        Do Not Chat For Help! NO_WAN_EGRESS(TM)

• duanes

Sorry, B for bytes.

• duanes

            No authentication is used for the proxy.  All users have the same access level.

So... after a few days of running, I am not seeing any of the hangs we were getting. The key points are a very small HD and RAM cache, and high/low water marks set very close together (95%/94%).

I am now going to boost the HD cache to 80GB and the max HD cache object to 1000MB. I'll give it some time and see what happens.
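
For reference, a minimal sketch of the squid.conf directives behind those GUI fields (the pfSense package generates the file itself; the cache path and example sizes below are assumptions based on the stock package and the values quoted earlier in the thread):

    cache_swap_low  94     # eviction starts once the store passes this percentage full
    cache_swap_high 95     # and becomes more aggressive as usage approaches this one
    # The GUI's "Hard Disk Cache Size" (MB) becomes the size argument of cache_dir:
    cache_dir ufs /var/squid/cache 1000 16 256   # e.g. a 1000 MB UFS cache, 16 level-1 dirs
    cache_mem 10 MB                              # tiny RAM cache
    maximum_object_size_in_memory 100 KB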

• kemecs

              Hello!

Has it improved?
I've had the same problem for a long time now.
I'm using the latest version of everything.

              "2016/01/04 07:43:14 kid1| Select loop Error. Retry 1
              2016/01/04 08:43:14 kid1| Select loop Error. Retry 1
              2016/01/04 09:43:14 kid1| Select loop Error. Retry 1
              2016/01/04 10:43:13 kid1| Select loop Error. Retry 1
              "

• duanes

Yes - the HD cache seems to work fine; however, if I try to use the RAM cache in any meaningful capacity, I get the same problems.

Also, the problem DOES seem to appear again if the HD cache is too large.  My system is a Dell i210 (not a heavy-duty machine, but it has decent IO).  I have limited the HD cache to about 50GB.  The trick is to ensure that the high-water and low-water marks are only one percentage point apart.  Mine are set at 95% and 94%.  The hang seems to happen while clearing the cache, and it is VERY intrusive.

Finally, I have the RAM cache set to 1MB and the max cached object size to 1KB (the minimum settings).  I have 16GB of RAM available, but any time I increase the RAM cache and raise the max item size to 1MB, the hang starts showing up within a matter of hours (or a few days with a 4GB RAM cache).  I believe it is the same process in which old items are flushed hourly; however, it is EXCRUCIATINGLY slow, even though it is a RAM-based operation.

So, I've backed my physical RAM down to 4GB, set the RAM cache to 1MB and max size to 1KB.  The HD is 100GB, but I have the HD cache set to 50GB with 1GB as the max item size (this will cache virtually all SW updates).  I have the cache policy set to keep the largest items longer and use diskd as the drive access method.

                So far, the hangs have not returned after 3+ weeks of operation.
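
In squid.conf terms that description works out to roughly the following (the cache path, and the assumption that the GUI size fields are MB for the disk cache and KB for in-RAM objects, are mine; adjust to your install):

    cache_mem 1 MB                                  # RAM cache effectively disabled
    maximum_object_size_in_memory 1 KB
    cache_replacement_policy heap LFUDA             # favors keeping large, popular objects
    cache_dir diskd /var/squid/cache 50000 16 256   # ~50 GB on-disk cache via diskd
    maximum_object_size 1000 MB                     # big enough for most SW updates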

• kemecs

It now looks like it comes down to the options.

My "Proxy Server: Cache Management" settings:

                  Squid Cache General Settings:
                  Low-Water Mark in % = 93
                  High-Water Mark in % = 95

                  Squid Hard Disk Cache Settings:
                  Hard Disk Cache Size = 60000
                  Hard Disk Cache System = ufs
                  Level 1 Directories = 8
                  Minimum Object Size = 32
                  Maximum Object Size = 256

                  Squid Memory Cache Settings:
                  Memory Cache Size = 5120
                  Maximum Object Size in RAM = 512
                  Memory Replacement Policy = Heap GDSF

What exactly do you have set?

                  thx

• duanes

                    I've been working on this for quite some time -

I started getting the hangs again when I upped the RAM cache, so I keep the RAM size to an absolute minimum (which is sad, because I specifically bought a ton of RAM thinking it would be better than HD cache).  Also, the low/high water marks need to be as close together as possible.  Apparently, the hourly trash-collection process runs at a high priority and either prevents all other activity or locks something.  Either way, it is very intrusive.

These settings have been running without the hang for 33 days:

                    Squid Cache General Settings:
                    Low-Water Mark in % = 94
                    High-Water Mark in % = 95

                    Squid Hard Disk Cache Settings:
                    Cache Replacement Policy: Heap LFUDA
                    Hard Disk Cache Size = 80000
                    Hard Disk Cache System = diskd
                    Level 1 Directories = 16
                    Minimum Object Size = 0
                    Maximum Object Size = 2000

                    Squid Memory Cache Settings:
                    Memory Cache Size = 300
                    Maximum Object Size in RAM = 4
                    Memory Replacement Policy = Heap GDSF
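
Rendered as squid.conf directives, those settings come out roughly as below (the cache path and the MB/KB unit mapping are assumptions on my part; check the hints next to each field in the pfSense GUI for your package version):

    cache_swap_low  94
    cache_swap_high 95

    cache_replacement_policy heap LFUDA
    cache_dir diskd /var/squid/cache 80000 16 256   # 80000 MB disk cache, diskd, 16 level-1 dirs
    minimum_object_size 0 KB
    maximum_object_size 2000 MB

    cache_mem 300 MB
    maximum_object_size_in_memory 4 KB
    memory_replacement_policy heap GDSF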

• sporkrom

                      Hi,
I have exactly the same problem with pfSense 2.2.6/squid3 in transparent mode, with squidGuard and the squid virus check.
I bought the pfSense XG-1540 with 2x 128GB SSDs, 32GB RAM, 2x 10GE, 6x 1GE, and one 1TB USB 3.0 external SSD drive (used only for the squid cache).
After changing the Squid memory cache settings to

                      Memory Cache Size = 300  (before 8092)
                      Maximum Object Size in RAM = 4 (before 1024)

                      the problem was fixed.

But now the appliance only uses 4GB of RAM...
The performance is great because of the SSD cache drive.

                      Sincerely
                      Roman

• kemecs

                        Unfortunately, the problem has not disappeared.
                        I tried again yesterday.
The new pfSense (FreeBSD 10.3-RELEASE) does not help with this problem.

Interesting that it occurs every 60 minutes.

                        Sincerely
                        kemecs

• duanes

I am pretty certain that this is an hourly garbage-collection issue in Squid and there is no way to overcome it.  For some reason the garbage collection is a blocking thread and stops all network traffic.  Additionally, all existing connections are dropped.  Finally, the collection runs every 60 minutes, counted from boot time.

                          I do see that having a memory cache of ANY size greatly increases the hang time.  There are a number of complaints about this around on the internet, but none of the responders seem to really grasp the problem.

                          I have also found that I had to limit my squid HD cache size to about 40GB.  I wanted a larger cache to hold all of the MS updates, AV updates and other various files that tend to be large and repetitive.  Alas, I believe that I am stuck with the problem for now.

• aGeekhere

Same problem here. Any chance this garbage-collecting process can be set to low priority?
                            100.00% (squid-1) -f /usr/local/etc/squid/squid.co

Looks to be a bug. Can someone post a new bug for this issue here: https://redmine.pfsense.org/projects/pfsense-packages

                            I have posted the bug here https://redmine.pfsense.org/issues/6485

                            Never Fear, A Geek is Here!

• aGeekhere

For the hard disk cache: once it reaches about 30GB of the 200GB allocated, squid starts pulling a high load.

                              
                              last pid: 34059;  load averages:  0.65,  0.87,  0.87  up 10+04:05:45    22:20:31
                              327 processes: 3 running, 307 sleeping, 17 waiting
                              Mem: 127M Active, 2984M Inact, 450M Wired, 3688K Cache, 336M Buf, 349M Free
                              
                                PID USERNAME PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
                                 11 root     155 ki31     0K    32K CPU0    0 236.6H  75.00% [idle{idle: cpu0}]
                                 11 root     155 ki31     0K    32K RUN     1 236.2H  68.16% [idle{idle: cpu1}]
                               4349 squid     37    0   191M 89032K kqread  1  59:26  12.35% (squid-1) -f /usr/local/etc/squid/squid.co
                               6952 root      52    0   262M 28748K piperd  1   0:01   5.57% php-fpm: pool nginx (php-fpm)
                              10213 root      52    0   262M 29140K accept  0   0:00   2.29% php-fpm: pool nginx (php-fpm)
                              79057 squid     20    0 37660K 13416K sbwait  0   0:03   0.29% (squidGuard) -c /usr/local/etc/squidGuard/
                                 12 root     -92    -     0K   272K WAIT    0  34:56   0.00% [intr{irq260: re1}]
                                 12 root     -92    -     0K   272K WAIT    1  25:14   0.00% [intr{irq261: re2}]
                                 12 root     -60    -     0K   272K WAIT    0  20:42   0.00% [intr{swi4: clock}]
                                 19 root      16    -     0K    16K syncer  0  10:05   0.00% [syncer]
                                  5 root     -16    -     0K    16K pftm    0   8:07   0.00% [pf purge]
                                 15 root     -16    -     0K    16K -       0   2:44   0.00% [rand_harvestq]
                               4223 unbound   20    0 55640K 26308K kqread  0   2:40   0.00% /usr/local/sbin/unbound -c /var/unbound/un
                              26898 root      20    0 30140K 17968K select  1   2:19   0.00% /usr/local/sbin/ntpd -g -c /var/etc/ntpd.c
                              23145 root      20    0 28608K  6416K kqread  0   2:04   0.00% nginx: worker process (nginx)
                               6187 squid     20    0 37752K  3544K select  0   2:00   0.00% (pinger) (pinger)
                              60390 squid     20    0 37752K  3544K select  0   1:48   0.00% (pinger) (pinger)
                              40522 squid     20    0 37752K  3544K select  0   1:48   0.00% (pinger) (pinger)
                              
                              

Going to try setting a 20GB cache.

Update:
The high load stops with a 20GB cache. Raising it to 30GB now; will see at what point the issue starts.

                              Load averages: 0.24, 0.19, 0.08
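
A hedged sketch of simply capping the store below the point where the hourly sweep starts to hurt (the path is an assumption as before; the size argument to cache_dir is in MB, and the store type should match the GUI's "Hard Disk Cache System" setting):

    cache_dir diskd /var/squid/cache 20000 16 256   # cap the store at ~20 GB
    cache_swap_low  94                              # keep the purge window narrow,
    cache_swap_high 95                              # as suggested earlier in the thread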

                              Never Fear, A Geek is Here!

• aGeekhere

Looks like this issue has already been reported upstream to Squid:
                                http://bugs.squid-cache.org/show_bug.cgi?id=4477

                                Never Fear, A Geek is Here!
