[solved] 2.0.3 to 2.1 release - squid dies

Ecnerwal

Hmmm - what are folks doing to post pretty pictures here? Hosting them elsewhere, I suppose. Also, goodnight ;-)

The hard way:
hard disk cache size 320000 MB
hard disk cache system ufs
hard disk cache location /squid/cache
memory cache size 7168 MB (was 8192)
min object size 0 KB
max object size 4000000 KB
max object size in RAM 3200 KB
level 1 subdirectories 128
memory replacement Heap LFUDA
cache replacement Heap LFUDA
low-water % 96
high-water % 98

kejianshi

HDD cache size is too big (assuming it's being filled to that level).

With your RAM, you should reduce it to about 120000 MB (very safe).

Also be sure not to dedicate more than half of your RAM to RAM cache.

The reason you can't just make the HDD cache however big you have space for is that it takes RAM to index the HDD cache.

If you actually fill all that cache, at some point between 120000MB and say 200000MB, you will probably exhaust your RAM and soon after, your swap.

Check to see how much space has actually been used so far.

df -h

Ecnerwal

8GB is only half the RAM. But I've started de-tuning, dropping to 7. I suppose I'll probably start reducing disk cache as well, but the math isn't there, or the squid docs are wrong.

df -h output is in first post. 167GB used on that date

Dashboard claim of swap usage (only place I see that reported) is typically 0-12%, and memory cache size (while in 2.0.3) was tuned while keeping an eye on it (aiming for 0).

Per squid docs, 320GB of cache should require 4.5 GB of index on a 64-bit system at 14MB/GB. An additional 112MB is needed to index 8GB in RAM cache. That's 13GB or so total (4.5GB disk index + 8GB RAM cache + 112MB RAM cache index), leaving 3GB of 16 for everything else, if it were even full (it's not).

At 167GB, the memory index should be less than 2.5 GB, which comes out to less than 11GB for memory index and 8GB RAM cache and 112MB memory cache index.
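
The arithmetic above can be written out as a quick sanity check. This is only a sketch of the rule of thumb quoted in the post (14 MB of index per GB of cache, applied to both the disk cache and the RAM cache). The function name and the 1 GB = 1000 MB conversion are assumptions made for illustration, not anything squid provides.

```python
INDEX_MB_PER_GB = 14  # rule of thumb quoted above for 64-bit systems

def squid_ram_estimate_gb(disk_cache_gb, ram_cache_gb):
    """Rough RAM footprint in GB: disk index + RAM cache + RAM cache index."""
    disk_index_gb = disk_cache_gb * INDEX_MB_PER_GB / 1000
    ram_index_gb = ram_cache_gb * INDEX_MB_PER_GB / 1000
    return disk_index_gb + ram_cache_gb + ram_index_gb

full = squid_ram_estimate_gb(320, 8)     # disk cache completely full
current = squid_ram_estimate_gb(167, 8)  # at the 167 GB reported by df -h
print(f"full: {full:.2f} GB, current: {current:.2f} GB")
```

This gives roughly 12.6 GB for a full 320 GB cache and about 10.5 GB at 167 GB, in line with the "13GB or so" and "less than 11GB" figures. It says nothing about hot objects or other overhead, which is where the real-world numbers may diverge.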

Yes, mention is made of "hot objects" exceeding cache allocation. But I've got 3GB (5, supposedly, at current cache fill) of RAM and the swap disk if things are really going to get exciting, and I find it hard to believe that the full 32GB of swap has been used up, given that swap use is normally 0, or very close to it, and I've never seen it above 12%

Mostly, what I notice is that 2.0.3 handled the exception "gracefully" while 2.1 kills squid, does not restart it, and runs like cold molasses (i.e., very slowly) if I restart squid without rebooting the whole system…

kejianshi

OK - But my install was blowing up even on 2.0.3 with an excessively large HDD cache. When I started allocating no more than 20x my available RAM to HDD cache, my system became very reliable. And that was the only change I made, so that was for sure the issue. I calculate available RAM for me to be no more than 1/2 total RAM since I dedicate 1/2 the total RAM to RAM cache already.

For me anyway. That's also why I switched to a 64GB SSD. I realized I had no practical use for more HDD unless I also seriously upgraded my RAM.

I'm on 2.1 now and it's rock solid with these same settings.

Uptime 12 Days 20 Hours 37 Minutes 19 Seconds (not that that's amazingly long, but it's long enough to suspect there are no problems)
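
The 20x rule of thumb from this post can be sketched as a small calculation. This is one poster's empirical guideline, not a squid-documented limit. The function name, parameter names, and the 8 GB example machine are hypothetical.

```python
def max_disk_cache_mb(total_ram_mb, ram_cache_fraction=0.5, disk_to_ram_ratio=20):
    """Upper bound on disk cache size (MB) under the 20x rule described above.

    Half the RAM is assumed to go to the RAM cache, and the disk cache is
    capped at 20x whatever RAM is left over.
    """
    available_ram_mb = total_ram_mb * (1 - ram_cache_fraction)
    return int(available_ram_mb * disk_to_ram_ratio)

print(max_disk_cache_mb(16 * 1024))  # the 16 GB machine discussed in the thread
print(max_disk_cache_mb(8 * 1024))   # a hypothetical 8 GB machine
```

For a 16 GB machine the bound comes out near 160 GB, which is above the "very safe" 120000 MB suggested earlier and well below the original 320000 MB setting.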

kejianshi

P.S. "167GB used on that date" - Yeah. That probably means you are caching lots of really small objects also. In that case, even a 120GB HDD cache might put you close to the ragged edge.

I'd have figured somewhere in that range it would crash or at least start swapping really annoyingly.

One easy fix is to double your RAM if you are married to a really big HDD cache.

Ecnerwal

kejianshi - What is your RAM amount?

Back on the network in question, updated numbers:

182 GB this morning.

78% memory use, 0% swap per dashboard (this is at 7GB RAM cache still, down one from the previous setup, but only up for 25 hours or so since reboot)

Dropped disk cache size to 220GB, moved min object size on disk up to 4 KB, dropped the watermarks 1% each.

We'll see how that plays out, and reduce further if need be. Or buy more RAM, but to some extent my focus is on getting the maximum use from what I have (quite a bit) rather than simply throwing more at it.

I've found very little on tuning things like the number of level 1 subdirectories - the built-in commentary is rather limited on the subject, and most other things I can find about squid are similar. Likewise choice of hard disk cache system…or min object size on disk, for that matter.

kejianshi

Oddly - I have found that if anything 2.1 is easier on my memory. It doesn't go up into the 80% range very often now and the disk cache seems to purge a lot more than it used to.

I'm only running 4GB of ram on this one, but the math on cache is simple and fairly constant.

For you, I'd keep an eye on exactly when you start having problems. I wouldn't set the HDD cache size to prevent the system from running out of swap. I'd set it up to avoid swapping at all. My swap size is 8GB but if I were using even 1KB of that, I'd assume something wasn't optimal and adjust down.

Ecnerwal

I have cut down both disk and RAM cache, and it's at least not crapping out entirely anymore or so far. But it has made me look in the logs a bit more.

One thing I noticed in the logs was a complaint about number of url_rewrite_children, so I bumped that from 5 to 50 (custom options in the GUI - I was going to mess about with the conf file but noticed that it was already in there set to 5, so I set it to 50) but it does not appear to be working - in that I'm still getting complaints that would make it seem it's set to 5. 50 is what it says in /usr/local/etc/squid/squid.conf, however, so the option is being passed from the GUI.

The main system log has this:
Oct 9 09:47:43 squid[33282]: The url_rewriter helpers are crashing too rapidly, need help!

and at the time of those, this is in the squid cache log:

2013/10/09 09:47:43| WARNING: url_rewriter #5 (FD 22) exited
2013/10/09 09:47:43| WARNING: url_rewriter #3 (FD 18) exited
2013/10/09 09:47:43| WARNING: url_rewriter #4 (FD 19) exited
2013/10/09 09:47:43| Too few url_rewriter processes are running
FATAL: The url_rewriter helpers are crashing too rapidly, need help!

While the squid cache log has lots of this (and url_rewrite_children 50 is set in squid.conf):

2013/10/11 08:34:47| WARNING: All url_rewriter processes are busy.
2013/10/11 08:50:59| WARNING: All url_rewriter processes are busy.
2013/10/11 08:53:10| WARNING: All url_rewriter processes are busy.
2013/10/11 08:53:10| Consider increasing the number of url_rewriter processes to at least 16 in your config file.
2013/10/11 08:54:04| WARNING: All url_rewriter processes are busy.
2013/10/11 08:54:04| Consider increasing the number of url_rewriter processes to at least 15 in your config file.
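
To see how often this happens and what helper count squid is asking for, the warnings can be tallied with a short script. This is a minimal sketch that only matches the two message texts shown in the log excerpt above; the function name is hypothetical and the sample lines are copied from that excerpt.

```python
import re

# Matches the "Consider increasing ... to at least N" hint from cache.log.
SUGGEST_RE = re.compile(
    r"increasing the number of url_rewriter processes to at least (\d+)"
)

def summarize_rewriter_warnings(lines):
    """Count 'busy' warnings and return the largest suggested helper count."""
    busy = 0
    largest_suggestion = None
    for line in lines:
        if "All url_rewriter processes are busy" in line:
            busy += 1
        match = SUGGEST_RE.search(line)
        if match:
            n = int(match.group(1))
            if largest_suggestion is None or n > largest_suggestion:
                largest_suggestion = n
    return busy, largest_suggestion

sample = [
    "2013/10/11 08:34:47| WARNING: All url_rewriter processes are busy.",
    "2013/10/11 08:50:59| WARNING: All url_rewriter processes are busy.",
    "2013/10/11 08:53:10| WARNING: All url_rewriter processes are busy.",
    "2013/10/11 08:53:10| Consider increasing the number of url_rewriter processes to at least 16 in your config file.",
    "2013/10/11 08:54:04| WARNING: All url_rewriter processes are busy.",
    "2013/10/11 08:54:04| Consider increasing the number of url_rewriter processes to at least 15 in your config file.",
]
busy, suggested = summarize_rewriter_warnings(sample)
print(busy, suggested)
```

On a live system the same function could be fed the lines of /var/squid/log/cache.log (the cache_log path in the config) instead of the sample list.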

kejianshi

Today my RAM used is at 90%. Still haven't hit the swap ever and no lockups or crashes. For me, pfSense works like a champ so long as I don't get silly with the cache settings. It's been 25 days since a restart, shutdown or reboot.

I'm glad yours is working well now.

kejianshi

BTW, what is your RAM and what are your current cache settings? (In case someone later wants to know what is working)

Ecnerwal

RAM is 16 GB, as stated in my signature (and a few times higher in the thread), as it has been since I actually deployed in February 2013 (two identical systems on different networks, second one deployed in July 2013 - some i386 fiddling before building the 64bit systems for serious use.) One is still running 2.0.3

Cache detuned to 7GB RAM and 160 GB disk with the post-2.1 troubles.
Typical daily throughput in the 30GB range, 150-175 users (per lightsquid - about 100 people, but people with multiple devices get counted twice or three times)

I am a bit disturbed by url_rewrite_children being set to 50, but squid asking for it to be set to 15-16 as though it is still at 5. 50 was picked since the largest number I found was 52, while most were suggesting the teens, so it seemed like it would cover most of the issues with that - but it's not clear that it's "taking."

I also added the vm.pmap.shpgperproc tunable in system tunables due to complaints in the main system log about "Approaching the limit on PV entries" and raised it from 200 to 500.

$ cat /usr/local/etc/squid/squid.conf
# Do not edit manually !
http_port 172.XX.XX.1:3128
http_port 127.0.0.1:3128 transparent
icp_port 0

pid_filename /var/run/squid.pid
cache_effective_user proxy
cache_effective_group proxy
error_directory /usr/pbi/squid-amd64/etc/squid/errors/en
icon_directory /usr/pbi/squid-amd64/etc/squid/icons
visible_hostname localhost
cache_mgr admin@localhost
access_log /var/squid/log/access.log
cache_log /var/squid/log/cache.log
cache_store_log none
logfile_rotate 14
shutdown_lifetime 3 seconds
# Allow local network(s) on interface(s)
acl localnet src  172.XX.XX.0/255.255.192.0
uri_whitespace strip

cache_mem 7200 MB
maximum_object_size_in_memory 3200 KB
memory_replacement_policy heap LFUDA
cache_replacement_policy heap LFUDA
cache_dir aufs /squid/cache 160000 128 256
minimum_object_size 4 KB
maximum_object_size 4000000 KB
offline_mode off
cache_swap_low 50
cache_swap_high 80

# No redirector configured

# Setup some default acls
acl all src 0.0.0.0/0.0.0.0
acl localhost src 127.0.0.1/255.255.255.255
acl safeports port 21 70 80 210 280 443 488 563 591 631 777 901  3128 1025-65535 
acl sslports port 443 563  
acl manager proto cache_object
acl purge method PURGE
acl connect method CONNECT
acl dynamic urlpath_regex cgi-bin \?
cache deny dynamic
http_access allow manager localhost

# Allow external cache managers
acl ext_manager_1 src 127.0.0.1 
http_access allow manager ext_manager_1
acl ext_manager_2 src 172.XX.XX.1 
http_access allow manager ext_manager_2

http_access deny manager
http_access allow purge localhost
http_access deny purge
http_access deny !safeports
http_access deny CONNECT !sslports

# Always allow localhost connections
http_access allow localhost

quick_abort_min 4000 KB
quick_abort_max 0 KB
quick_abort_pct 40
request_body_max_size 0 KB
reply_body_max_size 0 deny all
delay_pools 1
delay_class 1 2
delay_parameters 1 -1/-1 -1/-1
delay_initial_bucket_level 100
# Throttle extensions matched in the url
acl throttle_exts urlpath_regex -i "/var/squid/acl/throttle_exts.acl"
delay_access 1 allow throttle_exts
delay_access 1 deny all

# Custom options
redirect_program /usr/pbi/squidguard-amd64/bin/squidGuard -c /usr/pbi/squidguard-amd64/etc/squidGuard/squidGuard.conf
redirector_bypass off
url_rewrite_children 50
# Setup allowed acls
# Allow local network(s) on interface(s)
http_access allow localnet
# Default block all to be sure
http_access deny all
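
One way to double-check what the generated file actually contains for a given directive is to pull out every occurrence of it. This is a rough sketch with a hypothetical helper name. It only reads the text of the file, so it confirms what the GUI wrote, not whether the running squid process has picked the value up.

```python
def directive_values(conf_text, name):
    """Return every value given for directive `name`, in file order."""
    values = []
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        parts = line.split(None, 1)  # directive name, then the rest of the line
        if parts[0] == name:
            values.append(parts[1] if len(parts) > 1 else "")
    return values

# Excerpt copied from the squid.conf above.
sample_conf = """\
cache_mem 7200 MB
cache_dir aufs /squid/cache 160000 128 256
# Custom options
redirector_bypass off
url_rewrite_children 50
"""
print(directive_values(sample_conf, "url_rewrite_children"))
print(directive_values(sample_conf, "cache_mem"))
```

If the returned list has more than one entry, the directive appears more than once in the file, which would be worth looking at before assuming the last value is the one in effect.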