Caching a SharePoint library with an HTTPS reverse proxy
-
Hello,
I have set up an HTTPS reverse proxy accelerator with pfSense and Squid for a SharePoint Online library in Office 365.
The HTTPS reverse proxy is working fine, and I can see all the downloaded documents in the Squid logs.
Now I would like to cache the documents to reduce latency for our branch office.
By default, the documents are served with the cache header directives "no-cache" or "private".
Still, I would like to force caching of the shared library documents (otherwise this setup is of no real use).
My setup is a classic one, as described below:
User PC <--- request tenant.sharepoint.com ---> pfSense reverse proxy with internal CA certificate <---> Microsoft SharePoint.com online library
My Squid configuration file:
---------------------------------------------------------------------
# This file is automatically generated by pfSense
# Do not edit manually !
http_port 10.10.10.10:3128
icp_port 0
digest_generation off
dns_v4_first on
pid_filename /var/run/squid/squid.pid
cache_effective_user squid
cache_effective_group proxy
error_default_language en
icon_directory /usr/local/etc/squid/icons
visible_hostname pfSense Firewall
cache_mgr pfsense@mycomp.cloud
access_log /var/squid/logs/access.log
cache_log /var/squid/logs/cache.log
cache_store_log none
netdb_filename /var/squid/logs/netdb.state
pinger_enable on
pinger_program /usr/local/libexec/squid/pinger
logfile_rotate 7
debug_options rotate=7
shutdown_lifetime 3 seconds
# Allow local network(s) on interface(s)
acl localnet src 10.10.10.0/24
forwarded_for on
uri_whitespace strip
cache_mem 128 MB
maximum_object_size_in_memory 20 MB
memory_replacement_policy heap GDSF
cache_replacement_policy heap LFUDA
minimum_object_size 0 KB
maximum_object_size 20 MB
cache_dir ufs /var/squid/cache 300 16 256
offline_mode on
cache_swap_low 90
cache_swap_high 95
cache allow all
# Add any of your own refresh_pattern entries above these.
refresh_pattern ^ftp: 1440 20% 10080
refresh_pattern ^gopher: 1440 0% 1440
refresh_pattern -i (/cgi-bin/|\?) 0 0% 0
refresh_pattern . 0 20% 4320
refresh_pattern -i \.jpg$ 30 50% 4320 ignore-reload ignore-no-cache ignore-no-store ignore-private
refresh_pattern -i \.pdf$ 30 50% 4320 ignore-reload ignore-no-cache ignore-no-store ignore-private
refresh_pattern -i \.docx$ 30 50% 4320 ignore-reload ignore-no-cache ignore-no-store ignore-private
#Remote proxies
# Setup some default acls
# ACLs all, manager, localhost, and to_localhost are predefined.
acl allsrc src all
acl safeports port 21 70 80 210 280 443 488 563 591 631 777 901 4443 3128 3129 1025-65535
acl sslports port 443 563 4443
---------------------------------------------------------------------
The Squid access log:
Date IP Status Address User Destination
24.08.2017 12:42:18 10.10.10.100 TCP_MISS/200 https://tenant.sharepoint.com/sites/Marketing/Shared%20Documents/picture.jpg
24.08.2017 12:42:17 10.10.10.100 TCP_MISS/200 https://tenant.sharepoint.com/sites/Marketing/Shared%20Documents/large1.pdf
24.08.2017 12:42:16 10.10.10.100 TCP_MISS/200 https://tenant.sharepoint.com/sites/Marketing/Shared%20Documents/large1.docx
The cache manager info:
Cache information for squid:
Hits as % of all requests: 5min: 0.0%, 60min: 0.0%
Hits as % of bytes sent: 5min: 0.0%, 60min: 0.0%
Memory hits as % of hit requests: 5min: 0.0%, 60min: 0.0%
Disk hits as % of hit requests: 5min: 0.0%, 60min: 0.0%
Storage Swap size: 0 KB
Storage Swap capacity: 0.0% used, 100.0% free
Storage Mem size: 216 KB
Storage Mem capacity: 0.2% used, 99.8% free
Mean Object Size: 0.00 KB
If I retry the download, I keep getting TCP_MISS. I can't get any file into the cache.
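To make the "always a miss" observation easier to verify, here is a minimal Python sketch that counts Squid result tags in log text and computes a hit ratio. It assumes the column layout shown in the excerpt above (the pfSense log view), not Squid's native access.log format; the function names `count_results` and `hit_ratio` are made up for this illustration.

```python
import re
from collections import Counter

# Sample lines copied from the access log excerpt in this post.
SAMPLE_LOG = """\
24.08.2017 12:42:18 10.10.10.100 TCP_MISS/200 https://tenant.sharepoint.com/sites/Marketing/Shared%20Documents/picture.jpg
24.08.2017 12:42:17 10.10.10.100 TCP_MISS/200 https://tenant.sharepoint.com/sites/Marketing/Shared%20Documents/large1.pdf
24.08.2017 12:42:16 10.10.10.100 TCP_MISS/200 https://tenant.sharepoint.com/sites/Marketing/Shared%20Documents/large1.docx
"""

# Result codes look like TCP_MISS/200; capture the tag before the slash.
RESULT_RE = re.compile(r"\b(TCP_[A-Z_]+)/\d{3}\b")


def count_results(log_text):
    """Count Squid result tags (TCP_MISS, TCP_HIT, ...) found in log text."""
    counts = Counter()
    for line in log_text.splitlines():
        match = RESULT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def hit_ratio(counts):
    """Fraction of requests whose tag contains HIT (0.0 when there are none)."""
    total = sum(counts.values())
    hits = sum(n for tag, n in counts.items() if "HIT" in tag)
    return hits / total if total else 0.0


if __name__ == "__main__":
    counts = count_results(SAMPLE_LOG)
    print(dict(counts), f"hit ratio: {hit_ratio(counts):.0%}")
```

Running it against a larger export of the log after a second download attempt would show whether any request was served from the cache at all.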
Microsoft is tagging all documents with the following cache tag:
HTTP/1.1 Cache-Control Header is present: private,max-age=0
private: This response MUST NOT be cached by a shared cache.
max-age: This resource will expire immediately. (0 sec)
That's for security reasons, I suppose (by default).
But I would like to know if I can override this and evaluate the security risk.
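As a rough illustration of why a standards-following shared cache refuses these responses, here is a simplified Python sketch that parses a Cache-Control value and applies the two rules quoted above. It is not how Squid implements this internally; the helper names are hypothetical, and the parser deliberately ignores edge cases such as commas inside quoted arguments.

```python
def parse_cache_control(value):
    """Split a Cache-Control header value into a {directive: argument} dict."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        # Directives without an argument (e.g. "private") map to None.
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def shared_cache_may_store(value):
    """Simplified rule: a shared cache must not store private or no-store responses."""
    directives = parse_cache_control(value)
    return "private" not in directives and "no-store" not in directives


def freshness_seconds(value):
    """Freshness lifetime for a shared cache; s-maxage takes precedence over max-age."""
    directives = parse_cache_control(value)
    for key in ("s-maxage", "max-age"):
        arg = directives.get(key)
        if arg is not None and arg.isdigit():
            return int(arg)
    return None


if __name__ == "__main__":
    header = "private,max-age=0"  # the value SharePoint Online sends, per this post
    print(parse_cache_control(header))
    print("shared cache may store:", shared_cache_may_store(header))
    print("freshness (s):", freshness_seconds(header))
```

Overriding these rules in the proxy means that one user's response can be served to another user, which is the security risk to weigh.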
Microsoft offers a lot of online storage space, but without a caching proxy it's pretty much useless for us.
Thank you for any help you could give.
-
I forgot to add that I am actually using the WebDAV protocol.
But it seems that Squid supports caching WebDAV: http://www.webdav.org/other/proxy.html
I am not sure that this is going to work:
1 - using WebDAV
2 - over HTTPS
3 - with files tagged with cache-control : no-cache / cache-private
Even if I finally get it working, as I've read somewhere else, I run a high risk of corrupting data in the WebDAV repository.
Please tell me if I'm wrong.
-
Long story short, I am now able to cache SharePoint documents with the following configuration file:
# This file is automatically generated by pfSense
# Do not edit manually !
http_port 10.10.10.10:3128
icp_port 0
digest_generation off
dns_v4_first on
pid_filename /var/run/squid/squid.pid
cache_effective_user squid
cache_effective_group proxy
error_default_language en
icon_directory /usr/local/etc/squid/icons
visible_hostname sv-1101-wvp01.virtualdesk.cloud
cache_mgr pfsense@virtualdesk.cloud
access_log /var/squid/logs/access.log
cache_log /var/squid/logs/cache.log
cache_store_log none
netdb_filename /var/squid/logs/netdb.state
pinger_enable on
pinger_program /usr/local/libexec/squid/pinger
logfile_rotate 7
debug_options rotate=7
shutdown_lifetime 3 seconds
# Allow local network(s) on interface(s)
acl localnet src 92.222.209.0/24
forwarded_for on
uri_whitespace strip
cache_mem 128 MB
maximum_object_size_in_memory 512 KB
memory_replacement_policy heap GDSF
cache_replacement_policy heap LFUDA
minimum_object_size 0 KB
maximum_object_size 20 MB
cache_dir ufs /var/squid/cache 100 16 256
offline_mode on
cache_swap_low 90
cache_swap_high 95
cache allow all
# Cache documents regardless what the server says
refresh_pattern .jpg 60 90% 600 override-expire override-lastmod ignore-reload ignore-private
refresh_pattern .gif 60 90% 600 override-expire override-lastmod ignore-reload ignore-private
refresh_pattern .png 60 90% 600 override-expire override-lastmod ignore-reload ignore-private
refresh_pattern .txt 60 90% 600 override-expire override-lastmod ignore-reload ignore-private
refresh_pattern .doc 60 90% 600 override-expire override-lastmod ignore-reload ignore-private
refresh_pattern .docx 60 90% 600 override-expire override-lastmod ignore-reload ignore-private
refresh_pattern .xls 60 90% 600 override-expire override-lastmod ignore-reload ignore-private
refresh_pattern .xlsx 60 90% 600 override-expire override-lastmod ignore-reload ignore-private
refresh_pattern .pdf 60 90% 600 override-expire override-lastmod ignore-reload ignore-private
# Setup acls
acl allsrc src all
http_access allow all
request_body_max_size 0 KB
delay_pools 1
delay_class 1 2
delay_parameters 1 -1/-1 -1/-1
delay_initial_bucket_level 100
delay_access 1 allow allsrc
# Reverse Proxy settings
https_port 10.10.10.10:443 accel cert=/usr/local/etc/squid/599eae0080989.crt key=/usr/local/etc/squid/599eae0080989.key
cache_peer tenant.sharepoint.com parent 443 0 no-query no-digest originserver login=PASSTHRU connection-auth=on ssl sslflags=DONT_VERIFY_PEER front-end-https=auto name=rvp_sharepoint
deny_info TCP_RESET allsrc
Unfortunately, it is not fully working yet.
The WebDAV client (Windows) refuses to download from the cache.
I get the following error from Squid:
TCP_OFFLINE_HIT_ABORTED/000
(see attachment)
-
Found the right configuration with the help of the Squid Users mailing list.
I had to add different options to ignore cache control and force the cache to keep and serve the content.
But it's working now.
For the record, I'm posting the working Squid configuration below.
http_port 10.10.10.10.108:3128
icp_port 0
digest_generation off
dns_v4_first on
pid_filename /var/run/squid/squid.pid
cache_effective_user squid
cache_effective_group proxy
error_default_language en
icon_directory /usr/local/etc/squid/icons
visible_hostname pfSense Firewall
cache_mgr pfsense@virtualdesk.cloud
access_log /var/squid/logs/access.log
cache_log /var/squid/logs/cache.log
cache_store_log none
netdb_filename /var/squid/logs/netdb.state
pinger_enable on
pinger_program /usr/local/libexec/squid/pinger
logfile_rotate 7
debug_options rotate=7
shutdown_lifetime 3 seconds
forwarded_for on
uri_whitespace strip
refresh_pattern -i \.(jpg|gif|png|txt|docx|xlsx|pdf) 30240 100% 43800 override-expire ignore-private ignore-reload store-stale
cache_mem 128 MB
maximum_object_size_in_memory 20480 KB
memory_replacement_policy lru
cache_replacement_policy lru
minimum_object_size 0 KB
maximum_object_size 50 MB
cache_dir ufs /var/squid/cache 100 16 256
offline_mode on
cache_swap_low 90
cache_swap_high 95
cache allow all
# Add any of your own refresh_pattern entries above these.
refresh_pattern ^ftp: 1440 20% 10080
refresh_pattern ^gopher: 1440 0% 1440
refresh_pattern -i (/cgi-bin/|\?) 0 0% 0
refresh_pattern . 0 20% 4320
#ACL allow all
acl allsrc src all
http_access allow allsrc
request_body_max_size 0 KB
delay_pools 1
delay_class 1 2
delay_parameters 1 -1/-1 -1/-1
delay_initial_bucket_level 100
delay_access 1 allow allsrc
# Reverse Proxy settings
https_port 10.10.10.10.108:443 accel cert=/usr/local/etc/squid/599eae0080989.crt key=/usr/local/etc/squid/599eae0080989.key defaultsite=tenant.sharepoint.com vhost
#
cache_peer 13.107.6.151 parent 443 0 ignore-cc no-query no-digest originserver login=PASSTHRU connection-auth=on round-robin ssl sslflags=DONT_VERIFY_PEER front-end-https=auto name=rvp_sharepoint
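One visible difference from the configuration in the first post is where the custom refresh_pattern sits relative to the catch-all `.` rule. Squid checks refresh_pattern lines in order and uses the first one that matches, so rules placed after the catch-all are never reached; this may have contributed to the earlier misses, although other options changed as well. The Python sketch below emulates that first-match lookup, using Python's `re` module as a stand-in for Squid's regex engine (an assumption that holds for these simple patterns). The rule tuples and the `first_match` helper are made up for this illustration.

```python
import re

# Each rule: (regex, case_insensitive, min_minutes, max_minutes), in file order.
DEFAULT_RULES = [
    (r"^ftp:", False, 1440, 10080),
    (r"^gopher:", False, 1440, 1440),
    (r"(/cgi-bin/|\?)", True, 0, 0),
    (r".", False, 0, 4320),  # catch-all: matches every URL
]

# Working configuration: the custom rule comes before the defaults.
WORKING_ORDER = [
    (r"\.(jpg|gif|png|txt|docx|xlsx|pdf)", True, 30240, 43800),
] + DEFAULT_RULES

# First post: the file-type rules come after the catch-all.
FIRST_POST_ORDER = DEFAULT_RULES + [
    (r"\.jpg$", True, 30, 4320),
    (r"\.pdf$", True, 30, 4320),
    (r"\.docx$", True, 30, 4320),
]

URL = "https://tenant.sharepoint.com/sites/Marketing/Shared%20Documents/large1.pdf"


def first_match(rules, url):
    """Return (pattern, min_minutes, max_minutes) of the first matching rule."""
    for pattern, case_insensitive, min_minutes, max_minutes in rules:
        flags = re.IGNORECASE if case_insensitive else 0
        if re.search(pattern, url, flags):
            return pattern, min_minutes, max_minutes
    return None


if __name__ == "__main__":
    for label, rules in (("working", WORKING_ORDER), ("first post", FIRST_POST_ORDER)):
        pattern, min_minutes, max_minutes = first_match(rules, URL)
        # refresh_pattern times are given in minutes; 1440 minutes is one day.
        print(f"{label}: {pattern!r} min={min_minutes / 1440:g}d max={max_minutes / 1440:g}d")
```

With the working order, the PDF URL hits the custom rule, whose minimum of 30240 minutes is 21 days; with the first post's order, it falls through to the catch-all with a minimum of 0.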