Correctly cache Windows updates with Squid3. (And fix bad range_offset_limit)



  • Hello,

    After finding a problem with the built-in refresh patterns under the dynamic cache settings, as well as several forum posts here and the linked documentation on Squid tuning that give bad examples of how to implement Windows update caching, it seems there is a misunderstanding about how and why to configure Squid to cache Windows update content without affecting other traffic.  So I'm posting this to clear up how to set up Squid to get the desired behaviour of forcing Windows updates to cache, without causing problems with other traffic.

    So the first step to caching Windows updates, the obvious one, is to set a refresh_pattern to force the cache to retain Windows update files.  I think everyone has this part and it is in all the docs, so there is no need to elaborate on it.

    The devil in the details is the second part: Windows Update uses chunked (ranged) downloads to quietly retrieve updates in the background.  Since a chunked or ranged response is not a complete object, it is not cacheable; only the download of a complete file is cacheable.  The result is many workstations all downloading the same updates and nothing getting cached.  So we need Squid to fetch the full file, not a ranged chunk, so that subsequent requests can be satisfied from the cache.

    And this is not a problem: Squid3 has a configuration directive that turns a ranged download into a full file retrieval.  It's range_offset_limit, which controls how Squid handles range offset download requests.  When set to -1, it causes Squid to ignore the range request and retrieve the whole file.  So with this directive set, the first Windows workstation to request a file causes Squid to retrieve the whole file, and subsequent requests can be satisfied from the cache.

    All good yes?  Just what we want to happen to get those windows updates into our cache, yes?

    Now here is where the problems occur.  The range_offset_limit directive accepts an ACL to limit the scope of what the setting applies to.  Using an ACL means you can have different values for different purposes, and it also means you do not need to change the overall system default behaviour, which lets Squid handle ranged downloads normally, just to force caching of Windows updates.  But as with all Squid directives that take optional ACLs, the lack of an ACL is a wild-card match for all traffic.  And Squid evaluates multiple range_offset_limit directives on a first-match basis.  So a range_offset_limit -1 with no ACL is a wild-card match that applies to all traffic, not just the Windows updates that we specifically want to force download.
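
    To illustrate the first-match rule, here is a minimal sketch.  The single domain in the ACL and the name Some_Other_ACL are only for illustration, and the commented-out lines at the end show the ordering to avoid:

    ```
    acl Windows_Update dstdomain .windowsupdate.com

    # Scoped line first: only requests matching Windows_Update get -1.
    range_offset_limit -1 Windows_Update

    # Optional explicit catch-all, listed last.  With no ACL it matches
    # everything that did not match a line above it.  0 is also what
    # Squid uses when no line matches at all.
    range_offset_limit 0

    # Ordering to avoid: a bare -1 listed first matches all traffic,
    # so any scoped line after it is never reached.
    #range_offset_limit -1
    #range_offset_limit 0 Some_Other_ACL
    ```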

    Many examples show range_offset_limit set to -1 to force the update files to be downloaded and cached, but do not show the use of ACLs to limit the scope of the setting to just the intended traffic without overriding the system default.  This then causes problems with other content that should be fetched in chunks but that Squid instead downloads as full files.  Not bad with small files, but with very large files Squid will appear unresponsive and may even slow down, because it is trying to download multiple large files in full, where only chunks were requested, until it can get a copy into its cache.

    The solution, so as not to break ranged downloads for unintended traffic while still forcing full downloads for our Windows Update files, is to use an ACL to restrict the range_offset_limit -1 to just the items we want to force to cache.

    So here is an example of setting the range_offset_limit to only apply to windows updates to get the forced caching behaviour we want for windows updates, but not affect other traffic.

    
    acl Windows_Update dstdomain windowsupdate.microsoft.com
    acl Windows_Update dstdomain .update.microsoft.com
    acl Windows_Update dstdomain download.windowsupdate.com
    acl Windows_Update dstdomain www.download.windowsupdate.com
    
    range_offset_limit -1 Windows_Update
    
    refresh_pattern -i microsoft.com/.*\.(cab|exe|ms[iuf]|asf|wm[va]|dat|zip) 4320 80% 43200 reload-into-ims
    refresh_pattern -i windowsupdate.com/.*\.(cab|exe|ms[iuf]|asf|wm[va]|dat|zip) 4320 80% 43200 reload-into-ims
    
    

    The ACL "Windows_Update" makes the range_offset_limit -1 apply only to requests matched by the stacked ACL "Windows_Update", which in this case is a list of known domains used for Windows updates, and to no other traffic.  All non-Windows-update traffic therefore behaves according to the default (0), where Squid forwards the range request.  Other forms of ACL could be used if you wanted to be more specific, but suffice to say, using any ACL with range_offset_limit -1 keeps it from applying to traffic it should not be used for.  After the files are retrieved, the refresh_patterns control how long they are retained in cache.

    And of course, this applies to any traffic where you want to override the default chunked download behaviour but don't want to force it for all traffic.  Just set up an ACL and apply it to your range_offset_limit.
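
    As a sketch of the same idea for other traffic, assuming a hypothetical download host (cdn.example.com) and ACL name (Big_Installers) that are not taken from any real setup:

    ```
    # Hypothetical ACL: a site whose large installers you want cached whole.
    acl Big_Installers dstdomain cdn.example.com

    # Fetch the whole object for ranged requests to that site only.
    # All other traffic keeps the default range handling.
    range_offset_limit -1 Big_Installers
    ```

    A byte size can be used in place of -1 if you only want Squid to prefetch the whole file when the requested range starts within that many bytes of the beginning; the Squid docs for range_offset_limit describe the details.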

    Now, on to the problem with the current squid3 built-in refresh patterns under the dynamic content cache configuration settings.  When any of the built-in patterns are enabled, then in addition to the refresh_patterns added to squid.conf, each one adds a "range_offset_limit -1" directive, with no ACL, to the generated squid.conf.  Further, these statements appear in the generated squid.conf before any other GUI-configured sections, so they are the first match ahead of any other user settings and thus override the overall default behaviour.

    The resulting behaviour is that enabling any of the built-in refresh patterns changes range_offset_limit from the default of 0 to -1 for all traffic.  So until this is fixed in the pfSense GUI, if you do not want this behaviour applied to all traffic, you should disable the built-in patterns and use your own patterns that correctly implement ACLs to restrict the scope of the range_offset_limit override, so that you do not unexpectedly break other traffic.

    See the Squid docs for more details about the range_offset_limit directive.
    http://www.squid-cache.org/Doc/config/range_offset_limit/

    And the docs here should be corrected to properly implement range_offset_limit with an ACL, instead of a warning that other things might break.

    https://doc.pfsense.org/index.php/Squid_Package_Tuning#Caching_Windows_Updates

    So if you're having problems with squid misbehaving with chunked downloads, check your squid.conf for unexpected range_offset_limit entries that may be over-riding the default for all traffic.


  • Banned

    All this shit will go to /dev/null with the next version - https://github.com/pfsense/pfsense-packages/pull/1146. It never worked properly or reliably, it's outdated and useless, and things change all the time. If people want to abuse Squid for nonsense like distributing Windows updates, they need to do it manually. There's also some crap like Avast (I know for a fact it's a complete no-op; they've switched to streaming updates that occur hundreds of times every day), or Symantec - exactly the same. WTF.

    There^^^



  • Windows 10 updates are .PSF files.  You might want to update your regex.



  • @KOM:

    Windows 10 updates are .PSF files.  You might want to update your regex.

    Thank you for the note on Windows 10 updates.  But these were not "my" regexes.  For the sake of example, I extracted them from the built-ins (which have now been removed from the webgui).
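
    For anyone who does want to pick up the .psf files mentioned above, a possible variant of the windowsupdate.com pattern is sketched below.  The added psf extension is an assumption based only on that note and is untested against Windows 10 traffic:

    ```
    # Same pattern as the example, with psf added to the extension list (assumption).
    refresh_pattern -i windowsupdate.com/.*\.(cab|exe|ms[iuf]|psf|asf|wm[va]|dat|zip) 4320 80% 43200 reload-into-ims
    ```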

    However, my point was not to focus on windows updates specifically, I only used that as an example that everyone seems familiar with, in the context of overriding default squid behaviour to force caching of content, in order to demonstrate how to limit the scope of the range_offset_limit, without causing problems with other traffic.

    This concept can be applied to any traffic for which you need to override the default behaviour for a specific set of traffic, matched by ACL, without the need to change the default for all traffic.

    Overall, I see people keep mucking with this setting, along with quick_abort_min -1, which will cause Squid to appear to zombie-download content.  This generally shows up as "performance" problems that develop over time, because they change these settings to affect one specific type of traffic without realizing what it does to other traffic that was working fine.
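
    For reference, here is a sketch of the stock quick_abort values as given in the upstream Squid documentation.  As far as I know these directives take no ACL, so unlike range_offset_limit they cannot be scoped, and changing them affects every aborted download:

    ```
    # Squid defaults: after the client aborts, finish the fetch only if
    # less than 16 KB remains or more than 95% has already been
    # transferred; otherwise abort the fetch.
    quick_abort_min 16 KB
    quick_abort_max 16 KB
    quick_abort_pct 95

    # The setting discussed above.  It makes Squid always continue a
    # cacheable fetch after the client has gone away, for all traffic.
    #quick_abort_min -1 KB
    ```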

    Given the number of complaints about Squid performance problems that are indicative of this very problem, and the number of bad examples found throughout the internet (not just related to pfSense), the risk only grows now that pre-built configs are being removed from the pfSense GUI in favour of users adding their own custom configs.  There is a much larger chance of people simply cutting and pasting poorly implemented configurations that lead them back to the same unexpected behaviour and performance problems.

    If anything, I think the pfSense GUI for Squid could be augmented to help users create properly ACL-scoped configurations instead of changing the overall default behaviour, simply by enforcing that any added configuration be wrapped in an ACL, so that users can't unintentionally override the default for all traffic just to address an issue with a limited set of traffic.

    Instead of having pre-built configs, the GUI could have config lists where users create a "Traffic Override" that requires an ACL name, sanity-checks the ACL patterns, and ensures that the configs generated for a specific set of traffic are limited by the ACL.  That way users can generate narrowly scoped traffic override configurations without causing themselves more problems, which they then blame on pfSense / Squid because they don't understand how these settings affect other forms of traffic.



  • I am running the new pfSense 2.3. What are the recommended refresh settings for Squid in order to get optimal performance? We are Windows 10 based and the kids game a lot. I'm sure a portion of our traffic could be cached and shared with others in the house.



  • I am a newbie in pfSense and I did not understand where to enter the code that you posted. Do I enable "Cache Dynamic Content", of course, and insert all the code in "Custom refresh_patterns"?



  • Hello guys,
    I am Brazilian and I have this same problem. I want to block the automatic Windows Update downloads; I have some Windows 7 and Windows 10 machines here and would like to do the blocking via Squid. Could anyone help? Thanks for the help.