Squid + https

techtester-m

I want to try squid on pfSense.

Since https connections encrypt everything including the URLs, so that no one, not even the ISP or the local router, can see anything, how does Squid manage to get the entire URL in order to cache it?

Sorry if it's a noob question...

DaddyGo

@techtester-m

hi,

A good Squid setup takes a lot of time and effort...
The answer to your question is: an intermediate cert, used to intercept https (MITM).


techtester-m

@DaddyGo But how does it do that exactly?

DaddyGo

@techtester-m

The same way you do when you browse with Firefox/Chrome, etc.: the proxy looks inside the https traffic and then passes it on, re-encrypted with its intermediate cert.

It has to be said that in many cases this only works with problems, for example on bank and government sites.

techtester-m

@DaddyGo But unlike the browser which has all the data, when it reaches the pfSense router it's all already encrypted. I'm sorry but I don't understand exactly how it's possible or allowed lol

Actually, these are all the questions (I hope) I have about Squid:

How does squid's ssl interception/MITM work exactly?
When a client asks for a cached https page squid needs to serve it with a new certificate. Could that be a trusted certificate, in order to not have to install certificates on every client?
Does every request for cached content need a different certificate or something like that?
What happens when a client has received cached content but now requests or does something new on that website? Obviously the new request/action, which is not cached, will try to reach the actual website, but with what certificate/key, since a different one was signed by Squid? Will it simply create a new connection/session with that website and fetch new certs/keys etc.?

I'd appreciate it very much If you could take the time to answer these and bear with me for a little bit.

Thank you,

Gertjan

@techtester-m said in Squid + https:

Could that be a trusted certificate, in order to not have to install certificates on every client?

A (your!) cert has to be installed on every device that has to use your proxy.
And worse: it could be any certificate, set to be trusted on every user's device.
This enables the browser to actually accept the connection when it wants to connect to "your-bank.tld" but receives a reply from "yourpfsense.tld". Normally the browser would yell and raise a big intrusion alarm, and without your cert installed that is exactly what happens.
Now the browser silently accepts the fake certificate from the web server it actually connected to: the device where Squid is running. On the squid side, everything is unpacked, and thus readable, and squid connects on your behalf to the real "your-bank.tld" site over SSL as normal. Replies that come back are unpacked on squid's side, and repacked in the SSL connections between squid and your device.

Btw: these are my words. I think I'm not far off here.
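
A toy sketch of the trust decision described above, in Python, may make this concrete. It involves no real cryptography: the `Cert` class, the CA names and the `browser_accepts` check are all hypothetical stand-ins, and a real browser verifies signatures, validity dates and more. It only shows why the forged certificate is rejected until the proxy's CA is added to the client's trust store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cert:
    subject: str  # hostname the certificate claims to be for
    issuer: str   # name of the CA that signed it


def browser_accepts(cert: Cert, requested_host: str, trusted_cas: set) -> bool:
    """Toy check: the name must match and the issuer must be trusted."""
    return cert.subject == requested_host and cert.issuer in trusted_cas


def proxy_forge(host: str, proxy_ca: str) -> Cert:
    """The intercepting proxy mints a cert for the requested host, signed by its own CA."""
    return Cert(subject=host, issuer=proxy_ca)


# Hypothetical stand-in for the CAs that ship with the browser/OS.
PUBLIC_CAS = {"Some Public CA"}

forged = proxy_forge("your-bank.tld", "My pfSense CA")

# Client without the proxy CA installed: the forged cert is rejected.
without_ca = browser_accepts(forged, "your-bank.tld", PUBLIC_CAS)

# Client with the proxy CA installed: the same cert is silently accepted.
with_ca = browser_accepts(forged, "your-bank.tld", PUBLIC_CAS | {"My pfSense CA"})

print(without_ca, with_ca)  # False True
```

This is also why a publicly trusted certificate cannot be used for the job: the proxy needs a CA key that is allowed to sign certificates for arbitrary hostnames, and only a CA you install yourself will do that.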

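https traffic is opaque to any cache that sits in the middle without decrypting it, so a plain proxy cannot cache it. (Browsers do cache https responses themselves, according to the usual Cache-Control headers.)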

techtester-m

@Gertjan said in Squid + https:

On the squid side, everything is unpacked, and thus readable, and squid connects on your behalf to the real "your-bank.tld" site over SSL as normal. Replies that come back are unpacked on squid's side, and repacked in the SSL connections between squid and your device.

I understand the part where Squid is the middle man and everything but how does it know what the client has asked from the real website in order to ask it on his behalf and cache it (https)? The first time a webpage is visited by the client it goes to the real one, right? Unless all requests go through Squid/MITM?

What I fail to understand is where/when/how Squid can "decrypt/unpack" the https traffic of the client? Due to the fact that the client has installed that cert or...?
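
Part of the answer can be sketched in Python. Assuming the browser is explicitly configured to use a proxy, it first sends a plain-text `CONNECT host:port` request, so the proxy learns the destination host before any TLS starts; with transparent interception the proxy typically has to read the hostname from the TLS handshake (SNI) instead. The full URL only becomes visible once the proxy terminates TLS itself. The helper name `parse_connect` is made up for this example and is not Squid code.

```python
def parse_connect(request_line: str) -> tuple:
    """Return (host, port) from a line such as 'CONNECT example.com:443 HTTP/1.1'."""
    method, target, _version = request_line.strip().split(" ")
    if method.upper() != "CONNECT":
        raise ValueError(f"not a CONNECT request: {method}")
    # Split on the last colon so bracketed IPv6 literals keep their colons.
    host, _, port = target.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"malformed CONNECT target: {target}")
    return host, int(port)


host, port = parse_connect("CONNECT your-bank.tld:443 HTTP/1.1\r\n")
print(host, port)  # your-bank.tld 443
```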

DaddyGo

@techtester-m

I see you are quite enthusiastic about the Squid topic.
Do you want to learn?

I have a knowledgeable acquaintance (Marcelo) here on the forum who can help you
(he recently wrote that he has more time because of COVID anyway)

I can ping @mcury here... and if he has time you can learn a lot from him

techtester-m

@DaddyGo I just like to know everything (almost, within the reasonable limits of my understanding) about what I'm working on, using, dealing with etc. haha
I hate the "It just works, so don't touch it" approach :)

DaddyGo

@techtester-m said in Squid + https:

I hate the "It just works, so don't touch it" approach :)

Perfectly good approach.
I think Marcelo will show up soon, unless he is on vacation right now.

techtester-m

@DaddyGo That's the visualization of my question haha...
Screen Shot 2020-07-15 at 12.43.03.png

DaddyGo

@techtester-m

I understand your question, but now it's your turn...
The best way to learn is to do everything yourself and just get guidance from colleagues.

https://www.howtoforge.com/filtering-https-traffic-with-squid
https://wiki.squid-cache.org/Features/SslBump

Of course you have to run a test copy of Squid, practice the steps and watch what happens.

if you get completely stuck, then it is our job to help

techtester-m

@DaddyGo Ok...I've read it. So in order to intercept Squid does exactly this:
Screen Shot 2020-07-15 at 14.05.12.png

Correct?

DaddyGo

@techtester-m

Exactly, more or less.
Now it's time to test
(if you use it at home, create a test environment).

You will run into a lot of problems, mainly with Android and iOS devices, so you'll see what you need to tune.

next dose of curriculum (for example):
http://www.webdnstools.com/articles/squid-proxy-whitelist

Here is a sample file:
https://www.dropbox.com/sh/pp3m9reh2eikks2/AADIJmyBKZ4cZKOqs3A-eBGva?dl=0

techtester-m

@DaddyGo Ok...
A few more, if I may :)

As I understand it so far it goes something like this (and please correct me where I'm wrong) -

(1) Client --> Squid interception of PK, aka MITM --> destination.

(2) It will do the above only for what is in its whitelist.

(3) A private, self-created CA has to be installed on every client that needs to use the Squid proxy, so it can dynamically generate certificates. Using Let's Encrypt or anything of the sort won't work here and would also be a big no-no.

(4) Everything in the whitelist will be cached (up to X G/MB limit), including files.

(5) A destination that is not on the whitelist will be blocked? If so then I think I prefer DNS blocking.

(6) Is there an option to cache/intercept only certain destinations (whitelist or something) and treat all the others as usual without "squidding" them?

DaddyGo

@techtester-m

1. Yes
2. no, these are exceptions as they require special rules and accesses
(otherwise they do not work)
note the whole world cannot be whitelisted
3. Yes, with this method... ("Using Let's Encrypt or anything of the sort won't work here and also be a big no no.") yes
4. half true, (including files - it would require awful storage capacity - depending on what file we are talking about)
5. No, the whitelist is necessary, because of the above, I also recommend DNS blocking in SOHO environment...
(Squid is a big boys game, because of big systems and proxy capabilities)
6. Yes, by bypassing the proxy (ACL, PAC, etc.)
https://wiki.squid-cache.org/ConfigExamples/Authenticate/Bypass

techtester-m

@DaddyGo said in Squid + https:

no, these are exceptions as they require special rules and accesses
(otherwise they do not work)

So what do you do with exceptions? Let them bypass squid and go as usual or "fine tuning" them?

@DaddyGo said in Squid + https:

half true, (including files - it would require awful storage capacity - depending on what file we are talking about)

For the sake of learning let's say there's enough storage, even though an organized NAS would be MUCH better for any business/company. Does Squid have the option to set a maximum X G/MB for caching? Sort of a circular caching where a new file would push out the old one(s) when that maximum X is met.

@DaddyGo said in Squid + https:

No, the whitelist is necessary, because of the above

Perhaps it's only semantics here but the whitelist in this case is what you want Squid to authenticate for you (MITM) and cache and has nothing to do with restrictions, blocking etc.?

Screen Shot 2020-07-15 at 15.21.54.png
Ok...Semantics again. Squid does both access restrictions AND caching?

Gertjan

There are still some sites left on the Internet that accept http (NON SSL) requests.
Install Squid, and focus on the classic 'intercept port 80' or 'use a wpad file' and have your browser 'redirected to a 8080 proxy port'.
This way, you can see how it worked in the old days. I guess many examples are available for that type of setup.

When you feel up to it, add 'https' to the mix.

Take note: it's an ongoing battle. I wonder how one proxies the sites that have HSTS ( https://fr.wikipedia.org/wiki/HTTP_Strict_Transport_Security - it's yet another anti-MITM measure ) activated. I use HSTS on all my sites.
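
For reference, HSTS is just a response header (`Strict-Transport-Security`) that tells the browser to use https only for that site for a period of time. The Python sketch below parses such a header; `parse_hsts` is a hypothetical helper, not part of any library. As far as I understand it, HSTS mainly removes the user's option to click through certificate warnings, so an intercepting proxy whose CA is trusted by the client is a separate matter from clients that pin specific certificates.

```python
def parse_hsts(header_value: str) -> dict:
    """Parse a Strict-Transport-Security header value into a small policy dict."""
    policy = {"max_age": None, "include_subdomains": False, "preload": False}
    for directive in header_value.split(";"):
        name, _, value = directive.strip().partition("=")
        name = name.strip().lower()  # directive names are case-insensitive
        if name == "max-age":
            policy["max_age"] = int(value.strip().strip('"'))
        elif name == "includesubdomains":
            policy["include_subdomains"] = True
        elif name == "preload":
            policy["preload"] = True
    return policy


# Example header value; one year expressed in seconds.
policy = parse_hsts("max-age=31536000; includeSubDomains")
print(policy)
```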

DaddyGo

@Gertjan said in Squid + https:

Take note: it's an ongoing battle. I wonder how one proxies the sites that have HSTS ( https://fr.wikipedia.org/wiki/HTTP_Strict_Transport_Security - it's yet another anti-MITM measure ) activated. I use HSTS on all my sites.

Exactly, I was referring to problems such as those described by @Gertjan,
and then there are the mobile operating systems, etc.

Never give up, but I think Squid and similar solutions are coming to an end.

or "fine tuning" them? - Yes, fine-tuning them; otherwise WhatsApp, etc. will be dead behind a proxy
(because there is more than one encrypted process going on in the background)

like Apple stuff:

*.phobos.itunes-apple.com.akadns.net
*.gateway.push-apple.com.akadns.net
*.ax.itunes.apple.com
*.mesu.apple.com
*.phobos.apple.com
*.albert.gcsis-apple.com.akadns.net
*.ax.init.itunes.apple.com
*.init.itunes.apple.com
*.ocsp.apple.com
*.deploy.static.akamaitechnologies.com
*.itunes.apple.com.edgekey.net
*.swcdn.apple.com
*.swdownload.apple.com
*.swquery.apple.com
*.swscan.apple.com
*.appldnld.apple.com
*.suconfig.apple.com
*.serverstatus.apple.com
*.gs.apple.com
*.apple.com
*.updates.cdn-apple.com
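
A minimal Python sketch of how such a bypass list could be matched against a hostname, for instance the name the client asks for. The patterns are a few entries from the list above; `should_bypass` is a hypothetical helper, and this is shell-style wildcard matching, not Squid's own ACL syntax.

```python
from fnmatch import fnmatchcase

# A few wildcard entries taken from the list above.
BYPASS_PATTERNS = [
    "*.apple.com",
    "*.updates.cdn-apple.com",
    "*.deploy.static.akamaitechnologies.com",
]


def should_bypass(hostname: str, patterns=BYPASS_PATTERNS) -> bool:
    """Return True if the hostname matches any bypass pattern (case-insensitive)."""
    host = hostname.lower().rstrip(".")  # normalise case and a trailing dot
    return any(fnmatchcase(host, pattern) for pattern in patterns)


print(should_bypass("swcdn.apple.com"))   # True: passed through untouched
print(should_bypass("your-bank.tld"))     # False: would be intercepted
print(should_bypass("apple.com"))         # False: "*.apple.com" needs a subdomain
```

Note the last case: a pattern of this shape covers subdomains only, so the bare domain would need its own entry.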

@techtester-m "For the sake of learning let's say there's enough storage, even though an organized NAS would be MUCH better for any business/company. Does Squid have the option to set a maximum X G/MB for caching? Sort of a circular caching where a new file would push out the old one(s) when that maximum X is met."

just think of a multi-gig Windows update here
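
The "circular" behaviour asked about is essentially least-recently-used (LRU) eviction under a size budget. As far as I know Squid exposes this through the size given to `cache_dir`, plus `maximum_object_size` and `cache_replacement_policy`, but check the documentation for your version. The Python class below is only a sketch of the idea with a made-up name (`ByteBudgetLRU`), not how Squid stores objects.

```python
from collections import OrderedDict


class ByteBudgetLRU:
    """Keeps response bodies up to max_bytes, evicting the least recently used."""

    def __init__(self, max_bytes: int):
        self.max_bytes = max_bytes
        self.used = 0
        self._items = OrderedDict()  # url -> body, oldest first

    def get(self, url: str):
        body = self._items.get(url)
        if body is not None:
            self._items.move_to_end(url)  # mark as recently used
        return body

    def put(self, url: str, body: bytes) -> bool:
        if len(body) > self.max_bytes:
            return False  # larger than the whole budget: never cached
        if url in self._items:
            self.used -= len(self._items.pop(url))
        # Push out the oldest entries until the new body fits.
        while self.used + len(body) > self.max_bytes:
            _, evicted = self._items.popitem(last=False)
            self.used -= len(evicted)
        self._items[url] = body
        self.used += len(body)
        return True


cache = ByteBudgetLRU(max_bytes=10)
cache.put("a", b"12345")
cache.put("b", b"12345")
cache.get("a")            # "a" is now more recent than "b"
cache.put("c", b"123")    # does not fit, so "b" is evicted
print(cache.get("b"), cache.used)  # None 8
```

An object bigger than the limit is simply not cached at all, which is the multi-gig Windows update case.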

and has nothing to do with restrictions, blocking etc.?

Don't think of Squid as something like pfBlockerNG, for example,
and we haven't even talked about SquidGuard yet.

++++edit:
just watch the Squid cache directory...

techtester-m

@Gertjan @DaddyGo Ok...as I see it, the headache that comes with Squid/MITM solutions, just in order to cache data, makes it not really worth it. A business would prefer solutions like a NAS or a local server with, let's say, Elasticsearch or something like that. If an employee needs a certain file, they first search the local storage.

For simply caching web pages it's almost redundant and would make sense only in places without broadband or for very large organizations.