[solved] 2.0.3 to 2.1 release - squid dies

Ecnerwal

Decided to upgrade my less-critical 2.0.3 system now that 2.1 release is out, and squid (2.7.9) did not survive the process. Hardware is in my signature, amd64 version on both releases.

Reinstalled (package), no better.

Removed and installed (package), no better.

Tried squid3, killed internet access until removed, didn't run either.
System log from most recent 2.7.9 install attempt:

Sep 21 12:07:42	php: /pkg_edit.php: Starting Squid
Sep 21 12:07:42	squid[48092]: Error Directory /usr/pbi/squid-amd64/etc/squid/errors/en: (2) No such file or directory
Sep 21 12:07:42	php: /pkg_edit.php: The command '/usr/pbi/squid-amd64/sbin/squid -D' returned exit code '6', the output was '2013/09/21 12:07:42| aclParseAclLine: WARNING: empty ACL: acl throttle_exts urlpath_regex -i "/var/squid/acl/throttle_exts.acl" FATAL: Error Directory /usr/pbi/squid-amd64/etc/squid/errors/en: (2) No such file or directory Squid Cache (Version 2.7.STABLE9): Terminated abnormally. CPU Usage: 0.003 seconds = 0.003 user + 0.000 sys Maximum Resident Size: 3668 KB Page faults with physical i/o: 0'
Sep 21 12:07:42	kernel: pid 48092 (squid), uid 0: exited on signal 6
Sep 21 12:07:52	check_reload_status: Reloading filter
Sep 21 12:07:54	php: rc.filter_configure_sync: SQUID is installed but not started. Not installing "nat" rules.
Sep 21 12:07:54	php: rc.filter_configure_sync: SQUID is installed but not started. Not installing "pfearly" rules.
Sep 21 12:07:54	php: rc.filter_configure_sync: SQUID is installed but not started. Not installing "filter" rules.
Sep 21 12:08:02	php: /status_services.php: The command '/usr/local/etc/rc.d/squid.sh stop' returned exit code '2', the output was '/usr/local/etc/rc.d/squid.sh: 10: Syntax error: "}" unexpected (expecting "then")'

/var/squid/log:

2013/09/21 12:06:29| Starting Squid Cache version 2.7.STABLE9 for amd64-portbld-freebsd8.3...
2013/09/21 12:06:29| Process ID 92872
2013/09/21 12:06:29| With 11095 file descriptors available
2013/09/21 12:06:29| Using kqueue for the IO loop
2013/09/21 12:06:29| DNS Socket created at 0.0.0.0, port 51902, FD 11
2013/09/21 12:06:29| Adding domain local from /etc/resolv.conf
2013/09/21 12:06:29| Adding nameserver 127.0.0.1 from /etc/resolv.conf
2013/09/21 12:06:29| Adding nameserver 8.8.8.8 from /etc/resolv.conf
2013/09/21 12:06:29| Adding nameserver 8.8.4.4 from /etc/resolv.conf
2013/09/21 12:06:29| Referer logging is disabled.
2013/09/21 12:06:29| logfileOpen: opening log /var/squid/logs/access.log
2013/09/21 12:06:29| Unlinkd pipe opened on FD 17
2013/09/21 12:06:29| Swap maxSize 102400 + 8192 KB, estimated 8507 objects
2013/09/21 12:06:29| Target number of buckets: 425
2013/09/21 12:06:29| Using 8192 Store buckets
2013/09/21 12:06:29| Max Mem  size: 8192 KB
2013/09/21 12:06:29| Max Swap size: 102400 KB
2013/09/21 12:06:29| Local cache digest enabled; rebuild/rewrite every 3600/3600 sec
2013/09/21 12:06:29| logfileOpen: opening log /var/squid/logs/store.log
2013/09/21 12:06:29| /var/squid/cache/00: (2) No such file or directory
FATAL: 	Failed to verify one of the swap directories, Check cache.log
	for details.  Run 'squid -z' to create swap directories
	if needed, or if running Squid for the first time.
Squid Cache (Version 2.7.STABLE9): Terminated abnormally.
CPU Usage: 0.005 seconds = 0.000 user + 0.005 sys
Maximum Resident Size: 4672 KB
Page faults with physical i/o: 0

Interestingly, I have, and the proxy cache configuration GUI page confirms, my cache set to /squid/cache, not /var/squid/cache. So I don't know why the squid log is reporting squid looking in /var/squid/cache for anything. It appears to be getting the other settings as set in the GUI
/squid/cache/00 (through 7F) exist. /var/squid is only used for the logs.

I tried symlinking /var/squid/cache to /squid/cache, no better.

My partition scheme, which is why I don't use the default arrangement:```
$ df -h
Filesystem Size Used Avail Capacity Mounted on
/dev/ad4s1a 70G 484M 64G 1% /
devfs 1.0k 1.0k 0B 100% /dev
/dev/ad4s1d 9.7G 1.4G 7.5G 16% /var
/dev/ad4s1e 989M 12M 898M 1% /tmp
/dev/ad4s1f 339G 167G 144G 54% /squid
/dev/md0 3.6M 42k 3.3M 1% /var/run
devfs 1.0k 1.0k 0B 100% /var/dhcpd/dev

Ecnerwal

$ squid -z
2013/09/21 23:26:06| aclParseAclLine: WARNING: empty ACL: acl throttle_exts urlpath_regex -i "/var/squid/acl/throttle_exts.acl"
FATAL: Error Directory /usr/pbi/squid-amd64/etc/squid/errors/en: (2) No such file or directory
Squid Cache (Version 2.7.STABLE9): Terminated abnormally.
CPU Usage: 0.009 seconds = 0.000 user + 0.009 sys
Maximum Resident Size: 3504 KB
Page faults with physical i/o: 0

Lord, love a duck. This seems to be what it wanted. 2.7.9 is running again.

ln -s /usr/pbi/squid-amd64/etc/squid/errors/English /usr/pbi/squid-amd64/etc/squid/errors/en

Ecnerwal

6 days later, and squid was down again (actually went down on the 5th day, I guess). Log reports something on the order of "out of swap space" (settings exactly the same as on 2.0.3, which never had a problem (or never a noticeable problem) and ran for ~3 months without a hitch - 16GB RAM, 32GB dedicated swap partition, as far as I recall.) I'll copy in what the log actually reported when I'm on that network again (this machine lives on the network I've left at 2.0.3 for now, and will until the network I upgraded is actually solid, unless I give up and back-grade that one.)

Restarted squid (dropping the memory cache from 8 to 7 gig) and everything was sluggish as all get out. Barked up a few wrong trees looking for a non-pfSense network problem, then rebooted pfSense and everything hunky again… for now.
Filter on squid: (package restart 9:44, reboot 10:50)

Sep 26 16:19:06	kernel: pid 61679 (squid), uid 62, was killed: out of swap space
Sep 26 16:19:07	squid[61285]: Squid Parent: child process 61679 exited due to signal 9
Sep 27 09:44:47	php: /status_services.php: The command '/usr/local/etc/rc.d/squid.sh stop' returned exit code '1', the output was '2013/09/27 09:44:42| aclParseAclLine: WARNING: empty ACL: acl throttle_exts urlpath_regex -i "/var/squid/acl/throttle_exts.acl" squid: ERROR: Could not send signal 15 to process 61679: (3) No such process'
Sep 27 09:44:49	squid[10615]: Squid Parent: child process 11020 started
Sep 27 10:50:50	squid[10615]: Squid Parent: child process 11020 exited with status 0
Sep 27 10:50:50	php: /reboot.php: The command '/usr/local/etc/rc.d/squid.sh stop' returned exit code '1', the output was '2013/09/27 10:50:45| aclParseAclLine: WARNING: empty ACL: acl throttle_exts urlpath_regex -i "/var/squid/acl/throttle_exts.acl"'
Sep 27 10:52:36	squid[85746]: Squid Parent: child process 86095 started
Sep 27 10:52:36	php: rc.start_packages: The command '/usr/pbi/squid-amd64/sbin/squid -k reconfigure' returned exit code '1', the output was '2013/09/27 10:52:36| aclParseAclLine: WARNING: empty ACL: acl throttle_exts urlpath_regex -i "/var/squid/acl/throttle_exts.acl" squid: ERROR: No running copy'
Sep 27 10:52:36	php: rc.start_packages: The command '/usr/pbi/squid-amd64/sbin/squid -k reconfigure' returned exit code '1', the output was '2013/09/27 10:52:36| aclParseAclLine: WARNING: empty ACL: acl throttle_exts urlpath_regex -i "/var/squid/acl/throttle_exts.acl" squid: ERROR: No running copy'
Sep 27 10:52:36	php: rc.start_packages: Not calling package sync code for dependency squid of squid because some include files are missing.
Sep 27 10:52:36	php: rc.start_packages: The command '/usr/pbi/squid-amd64/sbin/squid -k reconfigure' returned exit code '1', the output was '2013/09/27 10:52:36| aclParseAclLine: WARNING: empty ACL: acl throttle_exts urlpath_regex -i "/var/squid/acl/throttle_exts.acl" squid: ERROR: No running copy'
Sep 27 10:52:36	php: rc.start_packages: The command '/usr/pbi/squid-amd64/sbin/squid -k reconfigure' returned exit code '1', the output was '2013/09/27 10:52:36| aclParseAclLine: WARNING: empty ACL: acl throttle_exts urlpath_regex -i "/var/squid/acl/throttle_exts.acl" squid: ERROR: No running copy'
Sep 27 10:52:36	php: rc.start_packages: The command '/usr/pbi/squid-amd64/sbin/squid -k reconfigure' returned exit code '1', the output was '2013/09/27 10:52:36| aclParseAclLine: WARNING: empty ACL: acl throttle_exts urlpath_regex -i "/var/squid/acl/throttle_exts.acl" squid: ERROR: No running copy'
Sep 27 10:52:36	php: rc.start_packages: The command '/usr/pbi/squid-amd64/sbin/squid -k reconfigure' returned exit code '1', the output was '2013/09/27 10:52:36| aclParseAclLine: WARNING: empty ACL: acl throttle_exts urlpath_regex -i "/var/squid/acl/throttle_exts.acl" squid: ERROR: No running copy'
Sep 27 10:52:38	php: rc.start_packages: Not calling package sync code for dependency squid of squid because some include files are missing.
Sep 27 10:52:41	php: rc.start_packages: The command '/usr/local/etc/rc.d/squid.sh stop' returned exit code '1', the output was '2013/09/27 10:52:36| aclParseAclLine: WARNING: empty ACL: acl throttle_exts urlpath_regex -i "/var/squid/acl/throttle_exts.acl" squid: ERROR: No running copy'
Sep 27 10:52:43	squid[32503]: Squid Parent: child process 32737 started
Sep 27 10:52:43	php: rc.start_packages: The command '/usr/pbi/squid-amd64/sbin/squid -k reconfigure' returned exit code '1', the output was '2013/09/27 10:52:43| aclParseAclLine: WARNING: empty ACL: acl throttle_exts urlpath_regex -i "/var/squid/acl/throttle_exts.acl" squid: ERROR: Could not send signal 1 to process 86095: (3) No such process'
Sep 27 10:52:44	php: rc.start_packages: The command '/usr/local/etc/rc.d/squid.sh stop' returned exit code '1', the output was '2013/09/27 10:52:39| aclParseAclLine: WARNING: empty ACL: acl throttle_exts urlpath_regex -i "/var/squid/acl/throttle_exts.acl"'
Sep 27 10:52:46	squid[42886]: Squid Parent: child process 43273 started
Sep 27 10:52:46	php: rc.start_packages: The command '/usr/pbi/squid-amd64/sbin/squid -k reconfigure' returned exit code '1', the output was '2013/09/27 10:52:46| aclParseAclLine: WARNING: empty ACL: acl throttle_exts urlpath_regex -i "/var/squid/acl/throttle_exts.acl" squid: ERROR: Could not send signal 1 to process 32737: (3) No such process'

Filter on swap, which indicates that this did happen on 9/16 (running 2.0.3), but if so, squid restarted cleanly and there was no "great sluggishness" nor a need to manually restart squid. When I finally upped the GUI log entries enough to see that far back looking for squid, there is, in fact, this entry 7 seconds after it was killed:
Sep 16 20:28:30 squid[43487]: Squid Parent: child process 44026 started

Sep 16 20:28:23	kernel: swap zone exhausted, increase kern.maxswzone
Sep 16 20:28:23	kernel: pid 13449 (squid), uid 62, was killed: out of swap space
Sep 16 20:28:23	kernel: swap zone exhausted, increase kern.maxswzone
Sep 16 20:28:23	kernel: pid 30069 (php), uid 0, was killed: out of swap space
Sep 21 11:15:05	squid[55785]: Failed to verify one of the swap directories, Check cache.log for details. Run 'squid -z' to create swap directories if needed, or if running Squid for the first time.
Sep 21 11:39:16	squid[5931]: Failed to verify one of the swap directories, Check cache.log for details. Run 'squid -z' to create swap directories if needed, or if running Squid for the first time.
Sep 21 12:06:29	squid[92872]: Failed to verify one of the swap directories, Check cache.log for details. Run 'squid -z' to create swap directories if needed, or if running Squid for the first time.
Sep 26 16:19:06	kernel: swap zone exhausted, increase kern.maxswzone
Sep 26 16:19:06	kernel: pid 61679 (squid), uid 62, was killed: out of swap space
Sep 26 16:19:06	kernel: swap zone exhausted, increase kern.maxswzone
Sep 26 16:19:06	kernel: pid 31134 (php), uid 0, was killed: out of swap space

kejianshi

What does your squid cache management setup page look like?

Ecnerwal

Hmmm - what are folks doing to post pretty pictures here? Hosting them elsewhere, I suppose. Also, goodnight ;-)

The hard way: Hard disk cache size 320000 MB
hard disk cache system ufs
hard disk cache location /squid/cache
memory cache size 7168 (was 8192)
min object size 0
max object size 4000000
max object size in RAM 3200
level 1 subdirectories 128
memory replacement Heap LFUDA
cache replacement Heap LFUDA
low-water % 96
high water % 98

kejianshi

HDD cache size is too big (assuming its being filled to that level)

with your RAM, you should reduce it to about 120000 (very safe).

Also be sure not to dedicate more than half of your RAM to RAM cache.

The reason you can't just make the HDD cache however big you have space for is that it take RAM to index HDD cache.

If you actually fill all that cache, at some point between 120000MB and say 200000MB, you will probably exhaust your RAM and soon after, your swap.

Check to see how much space is has actually been used so far.

df -h

Ecnerwal

8GB is only half the RAM. But I've started de-tuning, dropping to 7. I suppose I'll probably start reducing disk cache as well, but the math isn't there, or the squid docs are wrong.

df -h output is in first post. 167GB used on that date

Dashboard claim of swap usage (only place I see that reported) is typically 0-12%, and memory cache size (while in 2.0.3) was tuned while keeping an eye on it (aiming for 0).

per squid docs, 320GB of cache should require 4.5 GB of index on 64 bit system at 14MB/GB. An additional 112MB is needed to index 8GB in RAM cache. 13GB or so total, leaving 3GB of 16 for everything else, if it were even full (it's not)

at 167GB, memory index should be less than 2.5 GB, which comes out less than 11GB for memory index and 8GB RAM cache and 112MB memory cache index.

Yes, mention is made of "hot objects" exceeding cache allocation. But I've got 3GB (5, supposedly, at current cache fill) of RAM and the swap disk if things are really going to get exciting, and I find it hard to believe that the full 32GB of swap has been used up, given that swap use is normally 0, or very close to it, and I've never seen it above 12%

Mostly, what I notice is that 2.0.3 handled the exception "gracefully" while 2.1 kills squid, does not restart it, and runs like cold molasses (ie, very slowly) if I restart squid without rebooting the whole system…

kejianshi

OK - But my install was blowing up even on 2.03 with excessively large HDD cache. When I started allocating no more than 20x my available RAM to HDD cache my system became very reliable. And that was the only change I made, so that was for sure the issue. I calculate available RAM for me to be no more than 1/2 total RAM since I dedicate 1/2 the total RAM to RAM cache already.

For me anyway. Thats also why I switched to a 64GB SSD. I realized I had no practical use for more HDD unless I also seriously upgraded my RAM.

I'm on 2.1 now and its rock solid with these same settings.

Uptime 12 Days 20 Hours 37 Minutes 19 Seconds (not that thats amazing long, but its long enough to suspect there are no problems)

kejianshi

P.S. "167GB used on that date" - Yeah. That probably means you are caching lots of really small objects also. In that case, even a 120GB HDD cache might put you close to the ragged edge.

I'd have figured somewhere in that range it would crash or at least start swapping really annoyingly.

1 easy fix is double your RAM if you a married to a really big HDD cache.

Ecnerwal

kejianshi - What is your RAM amount?

Back on the network in question, updated numbers:

182 GB this morning.

78% memory use, 0% swap per dashboard (this is at 7GB RAM cache still, down one from the previous setup, but only up for 25 hours or so since reboot)

Dropped disk cache size to 220GB, moved min object size on disk up to 4, dropped the watermarks 1% each.

We'll see how that plays out, and reduce further if need be. Or buy more RAM, but to some extent my focus is on getting the maximum use from what I have (quite a bit) rather than simply throwing more at it.

I've found very little on tuning things like the number of level 1 subdirectories - the built-in commentary is rather limited on the subject, and most other things I can find about squid are similar. Likewise choice of hard disk cache system…or min object size on disk, for that matter.

kejianshi

Oddly - I have found that if anything 2.1 is easier on my memory. Doesn't go up in the 80% very often now and the disk cache seems to purge alot more than it used to.

I'm only running 4GB of ram on this one, but the math on cache is simple and fairly constant.

For you, I'd keep an eye on exactly when you start having problems. I wouldn't set the HDD cache size to prevent the system from running out of swap. I'd set it up to avoid swapping at all. My swap size is 8GB but if I were using even 1KB of that, I'd assume something wasn't optimal and adjust down.

Ecnerwal

I have cut down both disk and RAM cache, and it's at least not crapping out entirely anymore or so far. But it has made me look in the logs a bit more.

One thing I noticed in the logs was a complaint about number of url_rewrite_children, so I bumped that from 5 to 50 (custom options in the GUI - I was going to mess about with the conf file but noticed that it was already in there set to 5, so I set it to 50) but it does not appear to be working - in that I'm still getting complaints that would make it seem it's set to 5. 50 is what it says in /usr/local/etc/squid/squid.conf, however, so the option is being passed from the GUI.

The main system log has this:
Oct 9 09:47:43 squid[33282]: The url_rewriter helpers are crashing too rapidly, need help!

and at the time of those, this in squid cache log

2013/10/09 09:47:43| WARNING: url_rewriter #5 (FD 22) exited
2013/10/09 09:47:43| WARNING: url_rewriter #3 (FD 18) exited
2013/10/09 09:47:43| WARNING: url_rewriter #4 (FD 19) exited
2013/10/09 09:47:43| Too few url_rewriter processes are running
FATAL: The url_rewriter helpers are crashing too rapidly, need help!

While the squid cache log has lots of this (and```
url_rewrite_children 50

2013/10/11 08:34:47| WARNING: All url_rewriter processes are busy.
2013/10/11 08:50:59| WARNING: All url_rewriter processes are busy.
2013/10/11 08:53:10| WARNING: All url_rewriter processes are busy.
2013/10/11 08:53:10| Consider increasing the number of url_rewriter processes to at least 16 in your config file.
2013/10/11 08:54:04| WARNING: All url_rewriter processes are busy.
2013/10/11 08:54:04| Consider increasing the number of url_rewriter processes to at least 15 in your config file.

kejianshi

Today my RAM used is at 90%. Still haven't hit the swap ever and no lockups or crashes. For me, pfsense works like a champ so long as I don't get silly with the cache settings. Its been 25 days since a restart, shutdown or reboot.

I'm glad yours is working well now.

kejianshi

BTW, what is your RAM and what are your current cache settings? (Incase someone later wants to know what is working)

Ecnerwal

RAM is 16 GB, as stated in my signature (and a few times higher in the thread), as it has been since I actually deployed in February 2013 (two identical systems on different networks, second one deployed in July 2013 - some i386 fiddling before building the 64bit systems for serious use.) One is still running 2.0.3

Cache detuned to 7GB RAM and 160 GB disk with the post-2.1 troubles.
Typical daily throughput in the 30GB range, 150-175 users (per lightsquid - about 100 people, but people with multiple devices get counted twice or three times)

I am a bit disturbed by the url_rewrite_chrildren being set to 50, but asking to be set to 15-16 as though they are still at 5. 50 was picked since the largest number I found was 52, while most were suggesting the teens, so it seemed like it would cover most of the issues with that - but it's not clear that it's "taking."

I also added the vm.pmap.shpgperproc tunable in system tunables due to complaints in the main system log about "Approaching the limit on PV entries" and raised it from 200 to 500.

$ cat /usr/local/etc/squid/squid.conf
# Do not edit manually !
http_port 172.XX.XX.1:3128
http_port 127.0.0.1:3128 transparent
icp_port 0

pid_filename /var/run/squid.pid
cache_effective_user proxy
cache_effective_group proxy
error_directory /usr/pbi/squid-amd64/etc/squid/errors/en
icon_directory /usr/pbi/squid-amd64/etc/squid/icons
visible_hostname localhost
cache_mgr admin@localhost
access_log /var/squid/log/access.log
cache_log /var/squid/log/cache.log
cache_store_log none
logfile_rotate 14
shutdown_lifetime 3 seconds
# Allow local network(s) on interface(s)
acl localnet src  172.XX.XX.0/255.255.192.0
uri_whitespace strip

cache_mem 7200 MB
maximum_object_size_in_memory 3200 KB
memory_replacement_policy heap LFUDA
cache_replacement_policy heap LFUDA
cache_dir aufs /squid/cache 160000 128 256
minimum_object_size 4 KB
maximum_object_size 4000000 KB
offline_mode off
cache_swap_low 50
cache_swap_high 80

# No redirector configured

# Setup some default acls
acl all src 0.0.0.0/0.0.0.0
acl localhost src 127.0.0.1/255.255.255.255
acl safeports port 21 70 80 210 280 443 488 563 591 631 777 901  3128 1025-65535 
acl sslports port 443 563  
acl manager proto cache_object
acl purge method PURGE
acl connect method CONNECT
acl dynamic urlpath_regex cgi-bin \?
cache deny dynamic
http_access allow manager localhost

# Allow external cache managers
acl ext_manager_1 src 127.0.0.1 
http_access allow manager ext_manager_1
acl ext_manager_2 src 172.XX.XX.1 
http_access allow manager ext_manager_2

http_access deny manager
http_access allow purge localhost
http_access deny purge
http_access deny !safeports
http_access deny CONNECT !sslports

# Always allow localhost connections
http_access allow localhost

quick_abort_min 4000 KB
quick_abort_max 0 KB
quick_abort_pct 40
request_body_max_size 0 KB
reply_body_max_size 0 deny all
delay_pools 1
delay_class 1 2
delay_parameters 1 -1/-1 -1/-1
delay_initial_bucket_level 100
# Throttle extensions matched in the url
acl throttle_exts urlpath_regex -i "/var/squid/acl/throttle_exts.acl"
delay_access 1 allow throttle_exts
delay_access 1 deny all

# Custom options
redirect_program /usr/pbi/squidguard-amd64/bin/squidGuard -c /usr/pbi/squidguard-amd64/etc/squidGuard/squidGuard.conf
redirector_bypass off
url_rewrite_children 50
# Setup allowed acls
# Allow local network(s) on interface(s)
http_access allow localnet
# Default block all to be sure
http_access deny all