Content Filtering HTTPS WITHOUT a Proxy?

empbac

Hi Kababayan,

"Content filtering is filtering the web content itself; it seems you want only URL or DNS filtering. Go with DNS filtering; you can use the domains listed in SquidGuard to make the table easier to build."

Well, I am using DNSBL in pfBlockerNG, but I don't see any options for using the target categories that SquidGuard downloads. But if these lists could be integrated there then I think this would be a satisfactory solution. Do you have any advice for getting these categories set up under DNSBL / pfBlockerNG? Or would some other tool be needed? Would I need to write a script to accomplish this?
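A small script is probably enough for the conversion itself. The following is a minimal sketch, not how pfBlockerNG works internally. It assumes the blacklist unpacks to one plain-text `domains` file per category (one name per line) and that the result is a list of Unbound `local-zone` / `local-data` lines to include in the resolver configuration. The sinkhole address `10.10.10.1` and the function name `domains_to_unbound` are made up for illustration.

```python
import re

# Hypothetical sinkhole address that blocked names will resolve to.
SINKHOLE_IP = "10.10.10.1"

# Loose hostname check: dot-separated labels ending in an alphabetic TLD.
# Bare IP addresses in the list fail this check and are skipped.
_DOMAIN_RE = re.compile(
    r"^(?=.{1,253}$)([a-z0-9_]([a-z0-9_-]{0,61}[a-z0-9_])?\.)+[a-z]{2,63}$"
)


def domains_to_unbound(lines, sinkhole=SINKHOLE_IP):
    """Turn the lines of a category 'domains' file into Unbound config lines."""
    out = []
    seen = set()
    for raw in lines:
        # Drop comments, whitespace, case differences and a trailing dot.
        name = raw.split("#", 1)[0].strip().lower().rstrip(".")
        if not name or name in seen or not _DOMAIN_RE.match(name):
            continue
        seen.add(name)
        # 'redirect' answers for the name and all of its subdomains.
        out.append(f'local-zone: "{name}" redirect')
        out.append(f'local-data: "{name} A {sinkhole}"')
    return out
```

Feeding it an open category file (for example `domains_to_unbound(open(path))`, where `path` points at that category's `domains` file) yields lines that can be written to a file and pulled into the resolver. Large categories produce very large configs, so it is worth testing with one small category first.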

chris4916

The way you describe it is, to me, misleading ;)
If you intend to filter based on HTTP content, there is no way to do it without looking at the protocol (here HTTP) in order to examine the content and make a decision.
Currently the best way to look at HTTP content is to put a proxy in the middle. If the goal is to look at HTTPS, you have to break the tunnel with SSL bump (MITM).

Everything based on DNS is not bad, but its purpose is different and it can't be labeled "HTTP content filtering".

empbac

@chris4916

True, as I have learned, this is not exactly "Content / Web / HTTP Filtering," it's actually site-blocking. So it can be done on the IP or DNS level. I think URL parsing would actually be content filtering, correct? And as I understand it, the full URL is invisible to a non-MiTM-SSL proxy because it is already encrypted upon arrival.

I have more or less resolved the issue, though I'm still not 100% satisfied. I now have firewall rules with aliases which I use to block specific sites, such as Facebook. I include "facebook.com, apps.facebook.com, fb.com, fbcdn.com," etc in the alias and the alias resolves to IPs at a set interval. I block all outbound DNS traffic except from the pfSense box itself, so only it can resolve DNS queries, and the IPs it resolves for clients should be the same ones it resolves for the aliases. I also have squidGuard turned on for category-based HTTP filtering (actual content filtering), and DNSBL / pfBlockerNG for blocking ads. It would be nice to get the categories of the squidGuard blacklist (from shallalist.de) into DNSBL as well and block DNS requests by category that way, thus covering HTTPS as well as HTTP; it is kind of annoying that the categories only apply to URL filtering and not to DNS filtering.
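To make the alias mechanism concrete, here is a rough sketch of what "resolve the alias to IPs at a set interval" amounts to. It is an illustration only, not pfSense's own code. The function name `resolve_alias` and the injectable `resolver` parameter are assumptions added so the logic can be exercised without touching the network.

```python
import socket


def resolve_alias(hostnames, resolver=socket.getaddrinfo):
    """Return the sorted set of IP addresses the hostnames resolve to right now."""
    addresses = set()
    for host in hostnames:
        try:
            infos = resolver(host, 443, proto=socket.IPPROTO_TCP)
        except socket.gaierror:
            continue  # name did not resolve this round; keep the rest
        for _family, _type, _proto, _canon, sockaddr in infos:
            addresses.add(sockaddr[0])  # first element is the IP string
    return sorted(addresses)
```

Each call is only a snapshot. Sites behind a CDN can return different answers on successive lookups, so the set the firewall blocks can lag behind what a client was just handed, which is one reason this approach is only partly effective.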

Thanks to everyone who commented and to Chris and Allan of the TechSNAP podcast who fielded a question of mine regarding this. Their recommendations led to this final setup.

chris4916

@empbac:

True, as I have learned, this is not exactly "Content / Web / HTTP Filtering," it's actually site-blocking. So it can be done on the IP or DNS level. I think URL parsing would actually be content filtering, correct? And as I understand it, the full URL is invisible to a non-MiTM-SSL proxy because it is already encrypted upon arrival.

Well, this is not that simple.
1 - Blocking an IP may block more than one single "site", e.g. if this IP points to a reverse proxy or a web server with vhosts. BTW, blocking this IP will also block other protocols. For sure, you may decide that if the HTTP content is wrong, then other content should be wrong too. This is basically the blockerNG approach. Why not. But it is definitely not HTTP filtering ;)
2 - When using a proxy in explicit mode (not transparent) without MITM, the content is inside a tunnel, correct. Nevertheless, one part of the URL (the left part describing the "host") is not encrypted, so that the CONNECT method can apply. This is what an HTTP proxy uses to filter HTTPS "per domain" without having to break the HTTPS tunnel.

This means that with an explicit proxy you can easily prevent access to HTTPS web sites based on domain name, without a DNS-based solution and without having to maintain IP addresses (because you will have to maintain them, BTW).
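As an illustration of point 2, here is a minimal sketch of the decision an explicit proxy can make from the CONNECT request line alone. It is not Squid's implementation; the blocklist contents and the function names `connect_host` and `is_blocked` are hypothetical.

```python
# Example blocklist; in practice this would come from a category list.
BLOCKED_DOMAINS = {"facebook.com", "fb.com"}


def connect_host(request_line):
    """Return the lowercase host from 'CONNECT host:port HTTP/1.1', else None."""
    parts = request_line.strip().split()
    if len(parts) != 3 or parts[0] != "CONNECT":
        return None
    host, sep, port = parts[1].rpartition(":")
    if not sep or not port.isdigit():
        return None
    # Strip IPv6 brackets and a trailing dot, normalise case.
    return host.strip("[]").lower().rstrip(".")


def is_blocked(host, blocked=BLOCKED_DOMAINS):
    """True if host equals a blocked domain or is a subdomain of one."""
    labels = host.split(".")
    return any(".".join(labels[i:]) in blocked for i in range(len(labels)))
```

For the request line `CONNECT apps.facebook.com:443 HTTP/1.1`, `connect_host` returns `apps.facebook.com` and `is_blocked` matches it against `facebook.com`. The proxy never sees the path or the page content, only the host and port the client asked to be tunneled to.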

daleq

@sichent:

We (diladele.com) were thinking of adding a pfsense package that would do DNS blocking on pfSense level with ability to apply it to some networks/ip subnets/ip addresses only. This would be a golang based DNS server running on pfsense host instead of usual bind. Not sure if this is a good idea though as you seem to be the only one requiring this :)

Squid is already there and should be able to do the job ideally. Sonic Wall cannot magically block within HTTPS - they just block the connection attempts with TCP resets or so. Without MITM it is not possible to look into the HTTPS content and thus content filtering is not possible too.

I am very interested in this. I'd like to implement some level of HTTPS (and HTTP) blacklisting via domain/IP list (provided via subscription) since I am not willing/able to put up with client configuration of certificates for MITM. I would think that every small business/organization offering internet would be interested in a simple, somewhat effective solution since it is a significant step up from "zero".

Untangle has

[ ] Process HTTPS traffic by SNI (Server Name Indication) information if present
    [ ] Process HTTPS traffic by hostname in server certificate when SNI information not present
        [ ] Process HTTPS traffic by server IP if both SNI and certificate hostname information are not available

and this seems to be the limit of what one can do with HTTPS without MITM. But, this is "Good Enough" for me.
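For reference, the SNI name is readable without MITM because the client sends it in cleartext in the TLS ClientHello. The sketch below shows roughly how a filter could pull it out. It is not Untangle's code; `extract_sni` is a hypothetical name, and the parser assumes the whole ClientHello arrives in a single TLS record, which real traffic does not guarantee.

```python
import struct


def extract_sni(record):
    """Return the SNI host name from a TLS ClientHello record, or None."""
    # TLS record header: type 0x16 (handshake), version (2), length (2).
    if len(record) < 5 or record[0] != 0x16:
        return None
    pos = 5
    # Handshake header: type 0x01 (ClientHello), length (3).
    if len(record) < pos + 4 or record[pos] != 0x01:
        return None
    pos += 4
    pos += 2 + 32  # client version + random
    try:
        pos += 1 + record[pos]  # session id
        (suites_len,) = struct.unpack_from("!H", record, pos)
        pos += 2 + suites_len  # cipher suites
        pos += 1 + record[pos]  # compression methods
        (ext_total,) = struct.unpack_from("!H", record, pos)
        pos += 2
        end = min(len(record), pos + ext_total)
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", record, pos)
            pos += 4
            if ext_type == 0:  # server_name extension
                # Layout: list length (2), name type (1), name length (2), name.
                if record[pos + 2] != 0:  # 0 means host_name
                    return None
                (name_len,) = struct.unpack_from("!H", record, pos + 3)
                return record[pos + 5:pos + 5 + name_len].decode("ascii")
            pos += ext_len
    except (IndexError, struct.error, UnicodeDecodeError):
        return None  # truncated or malformed hello
    return None
```

A client that omits SNI leaves nothing to match on, which is why the fallbacks in the list above go to the certificate hostname and then to the server IP.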

I don't know if possible, but adding

[ ] Enforce safe search on popular search engines

for HTTPS would be great too.

Thanks.

empbac

@chris4916

Yes, the IP-to-multiple-sites scenario is one I hadn't considered previously, but it was mentioned on the TechSNAP podcast. Basically, even though the aliases are resolved through DNS, the result is effectively IP blocking. While this could cause false positives, etc., when restricted to specific sites (i.e., Facebook) this seems to be the most acceptable setup for our needs.

Your other idea of using an explicit proxy is one I toyed around with as well, setting up WPAD to push out proxy settings. I don't think we can ask or expect our customers (these are Catholic parishes) to set their phones, etc, to use a proxy, and WPAD looks hit or miss depending on the device. I would rather not have to change anything on client devices themselves. Otherwise, for a much more controlled environment, this could be a solid solution.

@daleq

A category-based subscription loaded into pfBlockerNG or aliases would seem a good way to do what we want, and it seems that it shouldn't be too difficult to program, but so far the most similar solution I have found is just to manually block aliases over port 443. The alias can be set up with multiple domains, but no wildcards. These names are resolved on a timer, which I have set on mine to 60 seconds.

While pfSense is a great enterprise-grade firewall, I'd like to see some simple features that allow it to compete with more consumer-grade routers also, maybe even a default config that allows it to work out of the box (on 192.168.1.1 LAN and DHCP on WAN). I don't see any serious security appliances in stores like Best Buy or Microcenter, so this seems to be an untapped market. Unless I'm missing something. I think quite a few people would pay $200+ for a pure firewall/router that works out of the box with simple wizards and configs for people who don't understand networking in any depth. Easy web site blocking would be a great feature to include in such a package.

I could be mistaken, but I think you can't force safe search over HTTPS, as that would be a form of HTTPS content filtering, which would not work without MiTM-SSL.

chris4916

For sure, if your point is to do it relying on a transparent proxy, the only way to do it is to implement SSL Bump (MITM).

Anything else is only a workaround: not HTTP filtering but IP filtering. However, if it fits your needs, why not? ;) In that case, don't bother to implement any HTTP proxy: rely on IP filtering for all protocols ;)

victorlclopes

Hi there,

sorry to revive a post from last year, but I think the question remains and, if some of you are still intrigued with this as I am and trying to find a better solution than proxy for content filtering, I would like to add a few more twists and see if you have ideas to share.

Building on SonicWall's capabilities, as some of you stated here, I've recently come across a newer implementation of Palo Alto. The guys at Palo Alto are promising to replace proxy servers with "next-generation firewall", App-ID, User-ID and some other commercial names.

Another piece of software that mentions AppID (OpenAppID), as you know, is Snort, which for some time now has had ways of detecting applications based on signatures and taking action on that, etc.

Another important aspect, one that led us towards proxy implementations, is the need for authentication. Palo Alto and similar vendors are apparently capable of passively identifying users and providing AD group-based policies and such, by analyzing authentication processes performed at your domain controller, for example. By doing so they can link a certain client IP to a user without directly asking said user for explicit authentication.

So the question is: is there now a way of performing content filtering with pfSense, using any available package, having user/group-based policies, without the need for proxy?
In other words, is there any progress being made towards what these commercial firewall solutions are claiming to do, in order to have a similar approach in pfSense?

Some experiences

We currently have in production both the "just firewall, no proxy at all" and the "explicit proxy with authentication, proxy filter and all the bells and whistles" implementations of pfSense, in different scenarios.

On the "just firewall" box we've embraced simplicity, and we are blocking a few websites using firewall aliases. It's somewhat effective, depending on the website. Controlling "which user accesses what" is not a concern.

On the "full proxy" box we have URL categories, AD group-based profiles, usage reports per user, custom target categories, etc. Squid and SquidGuard are an effective option for proxy and proxy filter. With automatic proxy discovery (WPAD) we don't have problems with standard Internet access using browsers.

The nightmare

The problem nowadays is that every single piece of software connects to some destination on the Internet. Developers can now assume that 100% of their customers have an active Internet connection, and with cloud computing this trend is only growing.
The thing is, most applications we use have problems with proxies. Some support a proxy but can't authenticate (like Outlook), some support it but have to be manually configured (like Skype), and many of them don't support a proxy at all. And I'm not just talking about "home" and "end user" class software like Skype, Dropbox, etc. Many financial and government apps don't support it, nor do those security plug-ins for internet banking, nor most installers (the apps that have lightweight installers and download their content during setup).

Then you find yourself making exception after exception on your pfSense box in order to accommodate all this non-proxy access.
Apps that connect to some cloud-based service and some web sites that are hosted on Amazon, for example, are almost impossible to be allowed in a simple firewall rule, because they have hundreds of IP addresses behind them, often on a custom scheme of redundancy (not traditional DNS round robin, for example).

Some companies I know have given up on controlling Internet access and have accepted defeat on this, concluding that either "you have fast internet access with no user complaints" or "you have a proxy with authentication and a huge administrative effort, with a lot of user complaints".

dexener

The same question here.

wwatanabe

Plus One !

An article suggests that SonicWall also uses MITM to block HTTPS content.

https://www.sonicwall.com/en-us/support/knowledge-base/170505508942849

They call it DPI-SSL, but it seems to be a MITM/SSL-Bump solution.

Regards.