Content Filtering HTTPS WITHOUT a Proxy?



  • QUESTIONS:

    • Is it possible to build a content filtering package that uses blacklist categories, but instead of a proxy, it automatically manages a group of firewall rules?
    • Is there a package that already does this, especially one that I could use on pfSense?
    • Am I misunderstanding something? MUST content filtering rely on a proxy server? Is SonicWall tricking me?

    BACKGROUND:

    I'm trying to show my boss that pfSense works as well as or better than SonicWall so that we can start installing pfSense with clients. For some of our sites, we need a firewall that does content filtering (blocking porn, hate speech, violence, ads, etc., by target category). SonicWall does this easily. Even without a license for their CF service, which provides the updated category lists, you can still enter a domain in a box and that domain is blocked over HTTP and over HTTPS. For HTTP, there is a redirect to a "blocked" page. For HTTPS there is no redirect, but there is an error, and the attempt is logged regardless of being HTTP or HTTPS.

    SonicWall apparently needs no proxy to do this. While connected through pfSense with Squid and SquidGuard, I used a proxy detection site and it does detect that I'm using a proxy. However, when I connect through the SonicWall with content filtering, no proxy is detected.

    I believe the difference is at what stage the request is checked. With Squid, the URL is sent to the service and Squid then parses it and decides whether to block it. It can only decide based on URL keywords or expressions, though, and by the time it hits Squid the DNS name has already been resolved and the IP address itself forwarded to SquidGuard. By that point, SquidGuard can no longer block an HTTPS request, as it sees the CONNECT method as encrypted. I may not be explaining this quite correctly, but it does appear that there is no way for Squid, as a transparent proxy, to do anything about HTTPS.

    So how does SonicWall do this so easily, without apparently requiring a proxy? My thought is that their lists (IP and domain-name-based, they say) in the blacklist categories are added as automatic firewall rules, thus blocking access at a much lower level than what SquidGuard does.

    I have set up the MiTM SSL-Bump feature and used it successfully, but we can't manage root CAs for our clients' devices, so that is not a solution here. I have not tried WPAD, but I have concerns about smart phones, etc, having problems with it and that still doesn't explain how SonicWall seems to accomplish filtering without WPAD. I have also played with pfBlockerNG and DNSBL, but there are no target categories there, so I consider that lacking for our purposes.

    I also am aware of OpenDNS. That could work, but it does cost something and if it can work then I don't see why there can't be a DNSBL specifically for content filtering right on the pfSense itself (access for DNS is already locked down so only the resolver on the pfSense can serve DNS queries). If I understand correctly, dansguardian works the same way as SquidGuard, and isn't offered as a package in pfSense 2.3.2.

    My questions are not so much about how I can apply a work-around, and more about the fundamental structure of the appliance. If there is no non-proxy categorized content filter for pfSense, could one feasibly be built? Is that something that anyone is working on? Applying updatable target categories to pfBlockerNG's DNSBL would be acceptable IMO, since DNS can be enforced. If it can be done then that would make pfSense more competitive with vendors such as SonicWall. I'm not saying that I could write such a package, but I would certainly like to use one if it is available and relatively stable.

    I've done quite a bit of research and testing with this now, but if I have missed something, please point it out.


  • Banned

    Dig into DNS filtering direction (dnsmasq)



  • Thanks for your reply.

    I have looked a bit into dnsmasq, and it appears that it does not support content filtering. I found one tutorial explaining that content filtering can be done with dnsmasq and OpenDNS, but then OpenDNS would work with or without dnsmasq. I think that content filtering at the DNS level is a good way to go, and should do exactly what we want it to do, but there is no package which accomplishes this locally. All I really need here is a way to do content filtering similar to the SonicWall, where no WPAD or MiTM SSL-Bump is necessary, yet HTTPS sites are still blocked (showing an error, not a redirect).

    The only reason I am not yet turning to OpenDNS is that it seems like pfSense should have a way to easily do the same thing locally. Telling my boss we need to sign up for OpenDNS to manage categories is not ideal (even though the SonicWall itself requires a subscription for this), and the SonicWall does not need such an additional account, though maybe it does use something like OpenDNS and they just don't advertise it. I will have to sniff the traffic to see where DNS requests are going in order to figure out if that's how it does it. But SonicWall also blocks requests based on IP address, which OpenDNS would not do, although turning on "Block IP addresses" in SquidGuard together with OpenDNS would cover that. Still, this does not appear to be the way SonicWall does it.

    It just annoys me that this setup is so far impossible on pfSense, and I love pfSense -- I'd much rather use it than any other firewall. But if we are to sell pfSense boxes to our clients then we need to retain the same functionality we get with pricy closed-source firewalls. And this seems like it should be simple. I'm not just looking for a work-around, but a real solution. And if there isn't one, maybe there should be... maybe a package could be created that does exactly this job.


  • Banned

    We (diladele.com) were thinking of adding a pfSense package that would do DNS blocking at the pfSense level, with the ability to apply it to only some networks/IP subnets/IP addresses. This would be a Go-based DNS server running on the pfSense host instead of the usual bind. Not sure if this is a good idea though, as you seem to be the only one requiring this :)

    Squid is already there and should ideally be able to do the job. SonicWall cannot magically block within HTTPS - they just block the connection attempts with TCP resets or so. Without MITM it is not possible to look into the HTTPS content, and thus content filtering is not possible either.



  • Content filtering means filtering the web content itself. It seems you only want URL or DNS filtering. Go with DNS filtering; you can use the domains listed in SquidGuard's lists to build the table more easily.



  • Hi sichent,

    Thanks again for the reply. I believe that in pfSense 2.3.2, which I installed recently, unbound DNS Resolver is active by default and bind DNS Forwarder is present, but not active. Would you actually need to build a complete DNS server to block requests at the DNS level? Or could the DNS Resolver unbound be configured / tweaked to pass DNS requests first to a blacklist to see if the request should be blocked? I imagine that if it's decided that something is blocked, the request could be resolved as a "blocked" page, HTTPS or no – not sure if this would break DNSSEC though.
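
    As a rough illustration of the "tweak the resolver" idea above, here is a minimal Python sketch that turns a list of blocked domains into Unbound `local-zone` / `local-data` lines, so the resolver answers those names (and their subdomains) with a sinkhole address. The function name and the sinkhole IP `10.10.10.1` are assumptions made up for this example; how the generated lines would be included in pfSense's Unbound configuration is not shown here.

```python
def unbound_block_lines(domains, sinkhole_ip="10.10.10.1"):
    """Return Unbound config lines that answer each domain with sinkhole_ip.

    sinkhole_ip is a placeholder: it would be the address of a local
    "blocked" page, or any address that simply refuses connections.
    """
    # Normalize: trim, lowercase, drop a trailing dot, skip blanks, dedupe.
    cleaned = {d.strip().lower().rstrip(".") for d in domains if d.strip()}
    lines = []
    for domain in sorted(cleaned):
        # "redirect" makes the answer apply to the name and everything below it.
        lines.append(f'local-zone: "{domain}" redirect')
        lines.append(f'local-data: "{domain} A {sinkhole_ip}"')
    return lines


if __name__ == "__main__":
    for line in unbound_block_lines(["Example.com", "ads.test"]):
        print(line)
```

    For HTTPS the browser would still show a connection or certificate error rather than the "blocked" page, since the sinkhole cannot present a valid certificate for the blocked name. A client that performs its own DNSSEC validation may also reject the locally altered answer.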

    I also thought it might be possible to skip DNS and instead use IP blacklists based on category (that can be automatically updated) that would automatically manage firewall groups, or aliases which can then easily be blocked in the firewall rules (just add the alias categories you want to block). I don't know how logging would be implemented here.

    The ACLs are not vital for our purposes, but to be robust I would imagine that this would definitely be desired if the rest of it were to be implemented.

    I have seen several people say that squid should be able to block HTTPS requests without WPAD or MiTM, but on further research this appears to not be the case. So, what bothers me is that SonicWall does this in a straight-forward way, without sniffing HTTPS traffic, as you explain. Maybe this really isn't an important feature for most people. Maybe the MiTM stuff is so good for corporate networks that this is sufficient for most. However, I have seen several posts by people asking how to get squidGuard to block HTTPS requests, and the short answer is that it can't. So, I don't think I'm the ONLY one interested in this, but I could still be representative of a minority. In my particular case, I just want to convince my boss that pfSense kicks SonicWall's ass, which it does anyway, though he might not agree.

    I guess that if something like this low-level blacklist were to be built, standard requirements would be:

    • Updateable black lists (shallalist.de, etc) that manage IP aliases (firewall rule-based) or groups of domain names (DNS-based) by target category; a third party would probably be used for the actual lists, and I'm not sure if there are IP- or domain-specific blacklists;
    • Automatically generated firewall rules or DNS redirects to a "blocked" page based on these lists;
    • A way to apply these generated rule-sets based on source IPs or ACLs; if aliases could be managed then rules could be created manually in the firewall IMO, without much hassle;
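
    The first requirement in the list above could be sketched roughly as follows, assuming the blacklist has been unpacked into the layout SquidGuard-style lists commonly use: one directory per category, each holding a plain-text `domains` file. The function names are invented for this example, and fetching or updating the list is left out.

```python
from pathlib import Path


def load_categories(root):
    """Map category name -> set of domains from <root>/<category>/domains files."""
    root = Path(root)
    categories = {}
    for domains_file in sorted(root.rglob("domains")):
        if not domains_file.is_file():
            continue
        names = set()
        text = domains_file.read_text(encoding="utf-8", errors="replace")
        for line in text.splitlines():
            line = line.strip().lower()
            if line and not line.startswith("#"):
                names.add(line)
        # Nested categories become names like "hobby/games".
        category = domains_file.parent.relative_to(root).as_posix()
        categories[category] = names
    return categories


def blocked_domains(categories, enabled):
    """Union of the domains in the categories the admin has switched on."""
    result = set()
    for name in enabled:
        result |= categories.get(name, set())
    return result
```

    The resulting set could then feed either a DNS-based block list or a hostname alias, depending on which of the two methods is chosen.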

    Just some ideas... but if this is not economically viable for you guys, if it's not worth your time, then I understand. But I know I'm not the first to bang my head against a wall on this one. I'm not a developer, so maybe there are aspects I'm not realizing here, but I don't know why this would have to be overly complex. I think the most important piece of the puzzle would be getting updated lists to generate firewall rules, or DNS redirect rules, depending on method.

    Long story short, while WPAD and MiTM SSL-Bump may be enough for many people in controlled networks with locked-down devices, and while some may not mind HTTPS at all, I believe the Internet is moving closer to HTTPS as the standard web protocol. As soon as porn sites and so on start employing HSTS so as to bypass these kinds of filters (maybe these already exist) then the content filtering we can do with pfSense will be obsolete, UNLESS we use those far more robust and intrusive features. Right now it's mostly Facebook that people want to block, but specific rules have to be added to block it by IP, and I think that eventually it will be a lot more than just Facebook that will need to be blocked. Do you disagree with this?

    Again, I appreciate your time. I'd like to see a package that does it, but if the ultimate answer is that it's not worth anyone's serious attention because 'better' alternatives already exist, then I can accept that. I do wonder how difficult it would be to script something that would load IP black lists into aliases by category... If I could do that then the rest is trivial.
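
    For what it's worth, the "load IP black lists into aliases" step might look something like this minimal Python sketch. It cleans one category's raw IP list and merges adjacent ranges, producing one network per line, which is the kind of plain-text list a firewall alias could be pointed at. The function name and the input format (one address or CIDR per line, `#` comments allowed) are assumptions for illustration, not a description of any existing package.

```python
import ipaddress


def alias_entries(raw_lines):
    """Clean a raw IP blacklist and return merged CIDR strings."""
    nets = []
    for line in raw_lines:
        entry = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not entry:
            continue
        try:
            nets.append(ipaddress.ip_network(entry, strict=False))
        except ValueError:
            continue  # skip anything that is not an address or network
    # collapse_addresses cannot mix IPv4 and IPv6, so merge them separately.
    v4 = [n for n in nets if n.version == 4]
    v6 = [n for n in nets if n.version == 6]
    merged = list(ipaddress.collapse_addresses(v4))
    merged += list(ipaddress.collapse_addresses(v6))
    return [str(n) for n in merged]


if __name__ == "__main__":
    sample = ["192.0.2.0/25", "192.0.2.128/25", "198.51.100.7  # one host"]
    print("\n".join(alias_entries(sample)))
```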



  • Hi Kababayan,

    "Content filtering means filtering the web content itself. It seems you only want URL or DNS filtering. Go with DNS filtering; you can use the domains listed in SquidGuard's lists to build the table more easily."

    Well, I am using DNSBL in pfBlockerNG, but I don't see any options for using the target categories that SquidGuard downloads. But if these lists could be integrated there then I think this would be a satisfactory solution. Do you have any advice for getting these categories set up under DNSBL / pfBlockerNG? Or would some other tool be needed? Would I need to write a script to accomplish this?



  • The way you describe it is, to me, misleading  ;)
    If you intend to filter based on HTTP content, then there is no way to do it without looking at the protocol (here HTTP) in order to examine the content and make a decision.
    Currently the best way to look at HTTP content is to have a proxy in the middle. If the goal is to look at HTTPS, you have to break the tunnel with SSL-bump (MITM).

    All the DNS-based stuff is not bad, but its purpose is different and it can't be labeled as "HTTP content filtering".



  • @chris4916

    True, as I have learned, this is not exactly "Content / Web / HTTP Filtering," it's actually site-blocking. So it can be done on the IP or DNS level. I think URL parsing would actually be content filtering, correct? And as I understand it, the full URL is invisible to a non-MiTM-SSL proxy because it is already encrypted upon arrival.

    I have more or less resolved the issue, though I'm still not 100% satisfied. I now have firewall rules with aliases which I use to block specific sites, such as FaceBook. I include "facebook.com, apps.facebook.com, fb.com, fbcdn.com," etc in the alias and the alias resolves to IPs at a set interval. I block all DNS traffic out except from the pfSense itself, so only it can resolve DNS queries and the IPs it resolves for clients should be the same that it resolves for the aliases. I also have squidGuard turned on for category-based HTTP filtering (actual content filtering), and DNSBL / pfBlockerNG for blocking ads. It would be nice to get the categories of the blacklist in squidGuard (using shallalist.de) into DNSBL also, and block DNS requests by specified category that way, thus covering HTTPS as well as HTTP – kind of annoying that it only applies to URL filtering, and not DNS filtering.

    Thanks to everyone who commented and to Chris and Allan of the TechSNAP podcast who fielded a question of mine regarding this. Their recommendations led to this final setup.



  • @empbac:

    True, as I have learned, this is not exactly "Content / Web / HTTP Filtering," it's actually site-blocking. So it can be done on the IP or DNS level. I think URL parsing would actually be content filtering, correct? And as I understand it, the full URL is invisible to a non-MiTM-SSL proxy because it is already encrypted upon arrival.

    Well, this is not that simple.
    1 - blocking an IP may block more than one single "site", e.g. if this IP points to a reverse proxy or a web server with vhosts. BTW, blocking this IP will also block other protocols. For sure, you may decide that if the HTTP content is wrong, then other content should be wrong too. This is basically the pfBlockerNG approach. Why not. But definitely not HTTP filtering  ;)
    2 - When using a proxy in explicit mode (not transparent) without MITM, the content is within a tunnel, correct. Nevertheless, one part of the URL (the left part describing the "host") is not encrypted, because the client has to send it in the clear with the CONNECT method. This is what an HTTP proxy uses to filter HTTPS "per domain" without having to break the HTTPS tunnel.

    This means that with an explicit proxy you can easily prevent access to HTTPS web sites based on domain name, without a DNS-based solution and without having to maintain IP addresses (which you would otherwise have to maintain, BTW).
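
    To make point 2 concrete, here is a small Python sketch of the decision an explicit proxy can make from the CONNECT request line alone, before any TLS bytes are exchanged. Both function names are made up for this example; a real proxy such as Squid does this through its own ACLs, not through code like this.

```python
def connect_host(request_line):
    """Return the target host of an HTTP CONNECT request line, or None."""
    parts = request_line.strip().split()
    if len(parts) != 3 or parts[0].upper() != "CONNECT":
        return None
    # The target is "host:port"; split on the last colon to allow IPv6.
    host, _, port = parts[1].rpartition(":")
    if not host or not port.isdigit():
        return None
    return host.strip("[]").lower()


def is_blocked(host, blocked):
    """True if host, or any parent domain of host, is in the blocked set."""
    labels = host.lower().rstrip(".").split(".")
    return any(".".join(labels[i:]) in blocked for i in range(len(labels)))


if __name__ == "__main__":
    host = connect_host("CONNECT www.facebook.com:443 HTTP/1.1")
    print(host, is_blocked(host, {"facebook.com"}))
```

    Matching on whole labels means that blocking `facebook.com` also covers `www.facebook.com` but not an unrelated name such as `notfacebook.com`.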



  • @sichent:

    We (diladele.com) were thinking of adding a pfSense package that would do DNS blocking at the pfSense level, with the ability to apply it to only some networks/IP subnets/IP addresses. This would be a Go-based DNS server running on the pfSense host instead of the usual bind. Not sure if this is a good idea though, as you seem to be the only one requiring this :)

    Squid is already there and should ideally be able to do the job. SonicWall cannot magically block within HTTPS - they just block the connection attempts with TCP resets or so. Without MITM it is not possible to look into the HTTPS content, and thus content filtering is not possible either.

    I am very interested in this.  I'd like to implement some level of HTTPS (and HTTP) blacklisting via domain/IP list (provided via subscription) since I am not willing/able to put up with client configuration of certificates for MITM.  I would think that every small business/organization offering internet would be interested in a simple, somewhat effective solution since it is a significant step up from "zero".

    Untangle has

    [ ] Process HTTPS traffic by SNI (Server Name Indication) information if present
    [ ] Process HTTPS traffic by hostname in server certificate when SNI information not present
    [ ] Process HTTPS traffic by server IP if both SNI and certificate hostname information are not available
    

    and this seems to be the limit of what one can do with HTTPS without MITM.  But, this is "Good Enough" for me.
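
    To illustrate the first option in that list: when SNI is present and unencrypted, the hostname travels in the clear inside the TLS ClientHello, so a filter can read it without decrypting anything. Below is a minimal Python sketch of such a parser, assuming the whole ClientHello arrives in a single TLS record. It is not how Untangle or any other product implements the feature, and `sample_client_hello` is only a helper invented here to produce test input without touching the network.

```python
import ssl


def extract_sni(record):
    """Return the SNI hostname from a TLS ClientHello record, or None."""
    try:
        # Record type 0x16 = handshake, handshake type 0x01 = ClientHello.
        if record[0] != 0x16 or record[5] != 0x01:
            return None
        pos = 5 + 4                      # record header + handshake header
        pos += 2 + 32                    # client version + random
        pos += 1 + record[pos]           # session id
        pos += 2 + int.from_bytes(record[pos:pos + 2], "big")  # cipher suites
        pos += 1 + record[pos]           # compression methods
        end = pos + 2 + int.from_bytes(record[pos:pos + 2], "big")
        pos += 2
        while pos + 4 <= min(end, len(record)):
            ext_type = int.from_bytes(record[pos:pos + 2], "big")
            ext_len = int.from_bytes(record[pos + 2:pos + 4], "big")
            pos += 4
            if ext_type == 0:            # server_name extension
                # Skip list length (2) and name type (1), then read the name.
                name_len = int.from_bytes(record[pos + 3:pos + 5], "big")
                name = record[pos + 5:pos + 5 + name_len]
                return name.decode("ascii").lower()
            pos += ext_len
    except (IndexError, UnicodeDecodeError):
        return None
    return None


def sample_client_hello(hostname):
    """Build a real ClientHello in memory (no network) for experimenting."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
    tls = ctx.wrap_bio(incoming, outgoing, server_hostname=hostname)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        pass  # expected: there is no server to answer
    return outgoing.read()


if __name__ == "__main__":
    print(extract_sni(sample_client_hello("blocked.example")))
```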

    I don't know if possible, but adding

    [ ] Enforce safe search on popular search engines
    

    for HTTPS would be great too.

    Thanks.



  • @chris4916

    Yes, the IP-to-multiple-sites scenario is one I hadn't considered previously, but it was mentioned on the TechSNAP podcast. Basically, even if DNS resolves the aliases to IP addresses, this is effectively IP blocking. While this could cause false positives, etc, when restricted to specific sites (ie, FaceBook) this seems to be the most acceptable setup for our needs.

    Your other idea of using an explicit proxy is one I toyed around with as well, setting up WPAD to push out proxy settings. I don't think we can ask or expect our customers (these are Catholic parishes) to set their phones, etc, to use a proxy, and WPAD looks hit or miss depending on the device. I would rather not have to change anything on client devices themselves. Otherwise, for a much more controlled environment, this could be a solid solution.

    @daleq

    A category-based subscription loaded into pfBlockerNG or aliases would seem a good way to do what we want, and it seems that it shouldn't be too difficult to program, but so far the most similar solution I have found is just to manually block aliases over port 443. The alias can be set up with multiple domains, but no wildcards. These names are resolved on a timer, which I have set on mine to 60 seconds.

    While pfSense is a great enterprise-grade firewall, I'd like to see some simple features that allow it to compete with more consumer-grade routers also, maybe even a default config that allows it to work out of the box (on 192.168.1.1 LAN and DHCP on WAN). I don't see any serious security appliances in stores like Best Buy or Microcenter, so this seems to be an untapped market. Unless I'm missing something. I think quite a few people would pay $200+ for a pure firewall/router that works out of the box with simple wizards and configs for people who don't understand networking in any depth. Easy web site blocking would be a great feature to include in such a package.

    I could be mistaken, but I think you can't force safe search over HTTPS, as that would be a form of HTTPS content filtering, which would not work without MiTM-SSL.



  • For sure, if your point is to do it relying on transparent proxy, the only way to do it is to implement SSL Bump (MITM)

    Anything else is only workaround, not HTTP filtering but IP filtering. However if it fits your needs, why not?  ;) In such case, don't bother to implement any HTTP proxy: rely on IP filtering for all protocols  ;)



  • Hi there,

    sorry to revive a post from last year, but I think the question remains and, if some of you are still intrigued with this as I am and trying to find a better solution than proxy for content filtering, I would like to add a few more twists and see if you have ideas to share.

    Building on SonicWall's capabilities, as some of you stated here, I've recently come across a newer implementation from Palo Alto. The guys at Palo Alto are promising to replace proxy servers with a "next-generation firewall", App-ID, User-ID and some other commercial names.

    Another piece of software that mentions AppID (OpenAppID), as you know, is Snort, which for some time now has had ways of detecting applications based on signatures and taking action on that, etc.

    Another important aspect, one that led us towards proxy implementations, is the need for authentication. Palo Alto and similar vendors are apparently capable of passively identifying users and providing AD group-based policies and such, by analyzing authentication processes performed at your domain controller, for example. By doing so they can link a certain client IP to a user without directly asking said user for explicit authentication.

    So the question is: is there now a way of performing content filtering with pfSense, using any available package, having user/group-based policies, without the need for proxy?
    In other words, is there any progress being made towards what these commercial firewall solutions are claiming to do, in order to have a similar approach in pfSense?

    Some experiences

    We currently have in production both the "just firewall, no proxy at all" and the "explicit proxy with authentication, proxy filter and all the bells and whistles" implementations of pfSense, in different scenarios.

    On the "just firewall" box we've embraced simplicity, and we're blocking a few websites using firewall aliases. It's somewhat effective, depending on the website. Controlling "which user accesses what" is not a concern.

    On the "full proxy" box we have URL categories, AD group-based profiles, usage reports per user, custom target categories, etc. Squid and SquidGuard are an effective option for proxy and proxy filter. With automatic proxy discovery (WPAD) we don't have problems with standard Internet access using browsers.

    The nightmare

    The problem nowadays is that every single piece of software connects to some destination on the Internet. Developers can now assume that 100% of their customers have an active Internet connection, and with cloud computing this is very much the trend.
    The thing is, most applications we use have problems with proxies. Some support a proxy but can't authenticate (like Outlook), some support it but have to be manually configured (like Skype), and many of them don't support a proxy at all. And I'm not just talking about "home" and "end user" class software like Skype, Dropbox, etc. Many financial and government apps don't support it, nor do those security plug-ins for internet banking, nor most installers (the lightweight kind that download their content during setup).

    Then you find yourself making exception after exception on your pfSense box in order to accommodate all this non-proxy access.
    Apps that connect to some cloud-based service, and some web sites that are hosted on Amazon, for example, are almost impossible to allow with a simple firewall rule, because they have hundreds of IP addresses behind them, often on a custom redundancy scheme (not traditional DNS round robin, for example).

    Some companies I know have given up on controlling Internet access and they've accepted defeat on this, concluding that either "you have a fast internet access with no user complaints" or "you have proxy with authentication and a huge administrative effort, with a lot of user complaints".



  • The same question here.



  • Plus One !

    An article suggests that SonicWall also uses MITM to block HTTPS content.

    https://www.sonicwall.com/en-us/support/knowledge-base/170505508942849

    They call it DPI-SSL, but it seems like a MITM/SSL-Bump solution.

    Regards.

