Caching a SharePoint library with an HTTPS reverse proxy



  • Hello,

    I have set up an HTTPS reverse proxy (accelerator) with pfSense and Squid for a SharePoint Online library in Office 365.
    The reverse proxy is working fine, and I can see all the downloaded documents in the Squid logs.

    Now I would like to cache the documents to reduce latency for our branch office.
    By default the documents are served with a Cache-Control header of "no-cache" or "private".
    Still, I would like to force caching of the shared library documents (otherwise this setup serves no real purpose).

    My setup is a fairly standard one:

    User PC <--- request tenant.sharepoint.com ---> pfSense reverse proxy with internal CA certificate <---> Microsoft SharePoint Online library

    My squid.conf file:

    
    ---------------------------------------------------------------------
    # This file is automatically generated by pfSense
    # Do not edit manually !
    
    http_port 10.10.10.10:3128
    icp_port 0
    digest_generation off
    dns_v4_first on
    pid_filename /var/run/squid/squid.pid
    cache_effective_user squid
    cache_effective_group proxy
    error_default_language en
    icon_directory /usr/local/etc/squid/icons
    visible_hostname pfSense Firewall
    cache_mgr pfsense@mycomp.cloud
    access_log /var/squid/logs/access.log
    cache_log /var/squid/logs/cache.log
    cache_store_log none
    netdb_filename /var/squid/logs/netdb.state
    pinger_enable on
    pinger_program /usr/local/libexec/squid/pinger
    
    logfile_rotate 7
    debug_options rotate=7
    shutdown_lifetime 3 seconds
    # Allow local network(s) on interface(s)
    acl localnet src  10.10.10.0/24
    forwarded_for on
    uri_whitespace strip
    
    cache_mem 128 MB
    maximum_object_size_in_memory 20 MB
    memory_replacement_policy heap GDSF
    cache_replacement_policy heap LFUDA
    minimum_object_size 0 KB
    maximum_object_size 20 MB
    cache_dir ufs /var/squid/cache 300 16 256
    offline_mode on
    cache_swap_low 90
    cache_swap_high 95
    cache allow all
    # Add any of your own refresh_pattern entries above these.
    refresh_pattern ^ftp:    1440  20%  10080
    refresh_pattern ^gopher:  1440  0%  1440
    refresh_pattern -i (/cgi-bin/|\?) 0  0%  0
    refresh_pattern .    0  20%  4320
    refresh_pattern -i \.jpg$ 30 50% 4320 ignore-reload ignore-no-cache ignore-no-store ignore-private
    refresh_pattern -i \.pdf$ 30 50% 4320 ignore-reload ignore-no-cache ignore-no-store ignore-private
    refresh_pattern -i \.docx$ 30 50% 4320 ignore-reload ignore-no-cache ignore-no-store ignore-private
    
    #Remote proxies
    
    # Setup some default acls
    # ACLs all, manager, localhost, and to_localhost are predefined.
    acl allsrc src all
    acl safeports port 21 70 80 210 280 443 488 563 591 631 777 901 4443 3128 3129 1025-65535
    acl sslports port 443 563 4443
    ---------------------------------------------------------------------
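
    One thing worth checking in the configuration above: Squid evaluates refresh_pattern lines in order and uses the first one whose regex matches the URL, so the catch-all "." line may shadow the three custom lines placed after it. The Python sketch below only illustrates that first-match behaviour. It uses Python's re module rather than Squid's own regex engine, and the helper name first_match is made up for illustration.

```python
import re

# refresh_pattern regexes from the configuration above, in file order.
# The boolean marks lines that carry the -i (case-insensitive) flag.
RULES = [
    (r"^ftp:", False),
    (r"^gopher:", False),
    (r"(/cgi-bin/|\?)", True),
    (r".", False),
    (r"\.jpg$", True),
    (r"\.pdf$", True),
    (r"\.docx$", True),
]


def first_match(url, rules=RULES):
    """Return the first regex in `rules` that matches `url`, or None."""
    for pattern, ignore_case in rules:
        flags = re.IGNORECASE if ignore_case else 0
        if re.search(pattern, url, flags):
            return pattern
    return None


url = "https://tenant.sharepoint.com/sites/Marketing/Shared Documents/picture.jpg"

# With the original order, the catch-all "." matches before \.jpg$ is tried.
print(first_match(url))

# Moving the custom lines above the catch-all lets them match instead.
reordered = RULES[4:] + RULES[:4]
print(first_match(url, reordered))
```

    If this is what is happening, the custom lines would need to sit above the "Add any of your own refresh_pattern entries above these" comment to take effect.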
    
    

    The Squid access log:

    Date IP Status Address User Destination
    24.08.2017 12:42:18 10.10.10.100 TCP_MISS/200 https://tenant.sharepoint.com/sites/Marketing/Shared Documents/picture.jpg
    24.08.2017 12:42:17 10.10.10.100 TCP_MISS/200 https://tenant.sharepoint.com/sites/Marketing/Shared Documents/large1.pdf
    24.08.2017 12:42:16 10.10.10.100 TCP_MISS/200 https://tenant.sharepoint.com/sites/Marketing/Shared Documents/large1.docx

    The cache manager info:

    Cache information for squid:
    Hits as % of all requests: 5min: 0.0%, 60min: 0.0%
    Hits as % of bytes sent: 5min: 0.0%, 60min: 0.0%
    Memory hits as % of hit requests: 5min: 0.0%, 60min: 0.0%
    Disk hits as % of hit requests: 5min: 0.0%, 60min: 0.0%
    Storage Swap size: 0 KB
    Storage Swap capacity: 0.0% used, 100.0% free
    Storage Mem size: 216 KB
    Storage Mem capacity: 0.2% used, 99.8% free
    Mean Object Size: 0.00 KB

    If I retry the download, I keep getting TCP_MISS. I can't get any file into the cache.
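
    To track this over a longer log, the result codes can be tallied with a short script. This is a rough Python sketch that assumes the five-column layout of the lines pasted above (date, time, client IP, result code/HTTP status, URL), not Squid's native log format. The function names are hypothetical.

```python
from collections import Counter

# Lines copied from the access log table above. The URL may contain spaces,
# so only the first four whitespace-separated fields are split off.
LOG_LINES = [
    "24.08.2017 12:42:18 10.10.10.100 TCP_MISS/200 https://tenant.sharepoint.com/sites/Marketing/Shared Documents/picture.jpg",
    "24.08.2017 12:42:17 10.10.10.100 TCP_MISS/200 https://tenant.sharepoint.com/sites/Marketing/Shared Documents/large1.pdf",
    "24.08.2017 12:42:16 10.10.10.100 TCP_MISS/200 https://tenant.sharepoint.com/sites/Marketing/Shared Documents/large1.docx",
]


def result_code(line):
    """Extract the Squid result code (e.g. TCP_MISS) from one log line."""
    _date, _time, _client, status, _url = line.split(None, 4)
    return status.split("/", 1)[0]


def summarize(lines):
    """Count result codes and return (counts, hit_ratio)."""
    counts = Counter(result_code(line) for line in lines if line.strip())
    total = sum(counts.values())
    # Any code containing "HIT" is counted as a hit in this sketch.
    hits = sum(n for code, n in counts.items() if "HIT" in code)
    return counts, (hits / total if total else 0.0)


counts, ratio = summarize(LOG_LINES)
print(dict(counts), f"hit ratio: {ratio:.0%}")
```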

    Microsoft tags all documents with the following Cache-Control header:

    HTTP/1.1 Cache-Control Header is present: private,max-age=0
    private: This response MUST NOT be cached by a shared cache.
    max-age: This resource will expire immediately. (0 sec)
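
    The header above is why a shared cache such as a reverse proxy refuses these responses by default: a response marked private must not be stored by it. A minimal Python sketch of that check, for illustration only. The function names are hypothetical, and the parser ignores less common forms such as quoted field lists.

```python
def parse_cache_control(value):
    """Split a Cache-Control header value into a {directive: argument} dict."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def shared_cache_may_store(directives):
    """Simplified rule: private or no-store forbids storage in a shared cache."""
    return "private" not in directives and "no-store" not in directives


header = "private,max-age=0"
parsed = parse_cache_control(header)
print(parsed)
print("shared cache may store:", shared_cache_may_store(parsed))
```

    Overriding this in Squid with ignore-private means the proxy may serve a cached document to a client without SharePoint re-checking that client's permissions, which is probably the main risk to evaluate.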

    I suppose that is the default for security reasons.
    But I would like to know whether I can override it, and evaluate the security risk of doing so.
    Microsoft offers a lot of online storage space, but without a caching proxy it's pretty much useless.

    Thank you for any help you could give.



  • I forgot to add that I am actually using the WebDAV protocol.
    It seems that Squid supports caching WebDAV: http://www.webdav.org/other/proxy.html

    I am not sure that this is going to work:

    1 - using WebDAV
    2 - over HTTPS
    3 - with files tagged with Cache-Control: no-cache / private

    Even if I finally get it working, I have read elsewhere that there is a high risk of corrupting data in the WebDAV repository.

    Please tell me if I'm wrong.



  • Long story short, I am now able to cache SharePoint documents with the following configuration file:

    
    # This file is automatically generated by pfSense
    # Do not edit manually !
    
    http_port 10.10.10.10:3128
    icp_port 0
    digest_generation off
    dns_v4_first on
    pid_filename /var/run/squid/squid.pid
    cache_effective_user squid
    cache_effective_group proxy
    error_default_language en
    icon_directory /usr/local/etc/squid/icons
    visible_hostname sv-1101-wvp01.virtualdesk.cloud
    cache_mgr pfsense@virtualdesk.cloud
    access_log /var/squid/logs/access.log
    cache_log /var/squid/logs/cache.log
    cache_store_log none
    netdb_filename /var/squid/logs/netdb.state
    pinger_enable on
    pinger_program /usr/local/libexec/squid/pinger
    
    logfile_rotate 7
    debug_options rotate=7
    shutdown_lifetime 3 seconds
    # Allow local network(s) on interface(s)
    acl localnet src  92.222.209.0/24
    forwarded_for on
    uri_whitespace strip
    
    cache_mem 128 MB
    maximum_object_size_in_memory 512 KB
    memory_replacement_policy heap GDSF
    cache_replacement_policy heap LFUDA
    minimum_object_size 0 KB
    maximum_object_size 20 MB
    cache_dir ufs /var/squid/cache 100 16 256
    offline_mode on
    cache_swap_low 90
    cache_swap_high 95
    cache allow all
    
    # Cache documents regardless what the server says
    refresh_pattern .jpg 60 90% 600 override-expire override-lastmod ignore-reload ignore-private
    refresh_pattern .gif 60 90% 600 override-expire override-lastmod ignore-reload ignore-private
    refresh_pattern .png 60 90% 600 override-expire override-lastmod ignore-reload ignore-private
    refresh_pattern .txt 60 90% 600 override-expire override-lastmod ignore-reload ignore-private
    refresh_pattern .doc 60 90% 600 override-expire override-lastmod ignore-reload ignore-private
    refresh_pattern .docx 60 90% 600 override-expire override-lastmod ignore-reload ignore-private
    refresh_pattern .xls 60 90% 600 override-expire override-lastmod ignore-reload ignore-private
    refresh_pattern .xlsx 60 90% 600 override-expire override-lastmod ignore-reload ignore-private
    refresh_pattern .pdf 60 90% 600 override-expire override-lastmod ignore-reload ignore-private
    
    # Setup acls
    acl allsrc src all
    http_access allow all
    
    request_body_max_size 0 KB
    delay_pools 1
    delay_class 1 2
    delay_parameters 1 -1/-1 -1/-1
    delay_initial_bucket_level 100
    delay_access 1 allow allsrc
    
    # Reverse Proxy settings
    https_port 10.10.10.10:443 accel cert=/usr/local/etc/squid/599eae0080989.crt key=/usr/local/etc/squid/599eae0080989.key
    cache_peer tenant.sharepoint.com parent 443 0 no-query no-digest originserver login=PASSTHRU connection-auth=on ssl sslflags=DONT_VERIFY_PEER front-end-https=auto name=rvp_sharepoint
    deny_info TCP_RESET allsrc
    
    

    Unfortunately, serving the documents from the cache does not work yet.
    The Windows WebDAV client refuses to download from the cache.

    Squid logs errors such as:

    TCP_OFFLINE_HIT_ABORTED/000

    (see attachment)




  • I found the right configuration with the help of the Squid Users mailing list.
    I had to add several options to ignore Cache-Control and force the cache to keep and serve the content.
    It's working now.
    For the record, I'm posting the working Squid configuration below.

    
    http_port 10.10.10.10:3128
    icp_port 0
    digest_generation off
    dns_v4_first on
    pid_filename /var/run/squid/squid.pid
    cache_effective_user squid
    cache_effective_group proxy
    error_default_language en
    icon_directory /usr/local/etc/squid/icons
    visible_hostname pfSense Firewall
    cache_mgr pfsense@virtualdesk.cloud
    access_log /var/squid/logs/access.log
    cache_log /var/squid/logs/cache.log
    cache_store_log none
    netdb_filename /var/squid/logs/netdb.state
    pinger_enable on
    pinger_program /usr/local/libexec/squid/pinger
    
    logfile_rotate 7
    debug_options rotate=7
    shutdown_lifetime 3 seconds
    forwarded_for on
    uri_whitespace strip
    
    refresh_pattern -i \.(jpg|gif|png|txt|docx|xlsx|pdf) 30240 100% 43800 override-expire ignore-private ignore-reload store-stale
    
    cache_mem 128 MB
    maximum_object_size_in_memory 20480 KB
    memory_replacement_policy lru
    cache_replacement_policy lru
    minimum_object_size 0 KB
    maximum_object_size 50 MB
    cache_dir ufs /var/squid/cache 100 16 256
    offline_mode on
    cache_swap_low 90
    cache_swap_high 95
    cache allow all
    
    # Add any of your own refresh_pattern entries above these.
    refresh_pattern ^ftp:    1440  20%  10080
    refresh_pattern ^gopher:  1440  0%  1440
    refresh_pattern -i (/cgi-bin/|\?) 0  0%  0
    refresh_pattern .    0  20%  4320
    
    #ACL allow all
    acl allsrc src all
    http_access allow allsrc
    
    request_body_max_size 0 KB
    delay_pools 1
    delay_class 1 2
    delay_parameters 1 -1/-1 -1/-1
    delay_initial_bucket_level 100
    delay_access 1 allow allsrc
    
    # Reverse Proxy settings
    https_port 10.10.10.10:443 accel cert=/usr/local/etc/squid/599eae0080989.crt key=/usr/local/etc/squid/599eae0080989.key defaultsite=tenant.sharepoint.com vhost
    
    #
    cache_peer 13.107.6.151 parent 443 0 ignore-cc no-query no-digest originserver login=PASSTHRU connection-auth=on round-robin ssl sslflags=DONT_VERIFY_PEER front-end-https=auto name=rvp_sharepoint
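
    For anyone adapting the refresh_pattern line above: the two numeric fields are the minimum and maximum ages in minutes, which is easy to misread. The Python sketch below just parses that one line and converts the values to days. parse_refresh_pattern is a hypothetical helper, not part of Squid, and it handles only the simple layout used in this thread.

```python
MINUTES_PER_DAY = 24 * 60


def parse_refresh_pattern(line):
    """Parse 'refresh_pattern [-i] regex min percent% max [options]'."""
    tokens = line.split()
    if not tokens or tokens[0] != "refresh_pattern":
        raise ValueError("not a refresh_pattern line")
    tokens = tokens[1:]
    case_insensitive = bool(tokens) and tokens[0] == "-i"
    if case_insensitive:
        tokens = tokens[1:]
    regex, min_s, pct_s, max_s, *options = tokens
    return {
        "regex": regex,
        "case_insensitive": case_insensitive,
        "min_minutes": int(min_s),
        "percent": int(pct_s.rstrip("%")),
        "max_minutes": int(max_s),
        "options": options,
    }


rule = parse_refresh_pattern(
    r"refresh_pattern -i \.(jpg|gif|png|txt|docx|xlsx|pdf) "
    "30240 100% 43800 override-expire ignore-private ignore-reload store-stale"
)
min_days = rule["min_minutes"] / MINUTES_PER_DAY
max_days = rule["max_minutes"] / MINUTES_PER_DAY
print(f"min: {min_days:.1f} days, max: {max_days:.1f} days")
print("options:", ", ".join(rule["options"]))
```

    With these values a matching document can be treated as fresh for about three weeks, so edits made in SharePoint may not be visible through the proxy during that time.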
    
    
