In-page filtering (not just IP/URL blacklisting)



  • I am hoping there is a way to perform the simple task of filtering the code in a web page before it reaches the user. So far, all I have found is Squid/SquidGuard for domain blocking, but nothing for filtering a web page directly.

    I used AdMuncher years ago because it filtered HTML pages directly, which let me modify anything in a page very easily. Adblock Plus has this capability, but it's difficult to learn and it's only a web filter. I want this ability in a firewall/proxy server, and I have yet to see it anywhere.

    I don't know whether this is functionality I can add to pfSense. I need to filter out the actual code in a web page (e.g. custom JS snippets, CSS styles, etc.).

    While blacklisting URLs is fine, the real threat is still the code in the page. If I can't block a popup loop on a site, then it is no good to me.
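    For illustration, the kind of rewriting AdMuncher did can be sketched in a few lines of Python. This is a minimal sketch, not a pfSense or Squid feature: it assumes the filter already has the plain HTML body in hand (for HTTPS that means the traffic has been decrypted first), and the blocked hosts and the window.open( snippet are hypothetical example rules. It drops <script> elements whose src points at a blocked host or whose inline body contains a blocked snippet, and re-emits everything else as written.

```python
from html.parser import HTMLParser

# Hypothetical example rules, for illustration only.
BLOCKED_SRC_HOSTS = ("ads.example.com", "popups.example.net")
BLOCKED_INLINE_SNIPPETS = ("window.open(",)  # typical popup-loop call


class ScriptFilter(HTMLParser):
    """Re-emits HTML unchanged except for <script> elements matching the rules."""

    def __init__(self):
        # Keep entities as written (&amp; etc.) so the output stays valid HTML.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.in_script = False
        self.script_start = ""
        self.script_src = ""
        self.script_body = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            # Buffer the script until its end tag, then decide whether to keep it.
            self.in_script = True
            self.script_start = self.get_starttag_text()
            self.script_src = dict(attrs).get("src") or ""
            self.script_body = []
        else:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "script" and self.in_script:
            body = "".join(self.script_body)
            blocked = any(h in self.script_src for h in BLOCKED_SRC_HOSTS) or any(
                s in body for s in BLOCKED_INLINE_SNIPPETS
            )
            if not blocked:
                self.out.append(self.script_start + body + "</script>")
            self.in_script = False
        else:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        (self.script_body if self.in_script else self.out).append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def filter_page(html):
    parser = ScriptFilter()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

    A real filtering proxy would also have to handle compressed bodies, character encodings and scripts injected later by other scripts; the sketch only shows the core idea of rewriting the page before it reaches the browser.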

    Thanks!



  • This will probably be difficult to find and implement, given the shift to HTTPS, where traffic is encrypted by the browser. You could either:

    a) take a man-in-the-middle approach, decrypting and then re-encrypting traffic, but you will then be in a position to view private information such as credit card numbers. The 'lock' in a user's browser will be false.
    b) implement DNS-based protection and block the ad servers via DNS. This is the better approach these days, due to the aforementioned HTTPS standardisation.
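    As a rough illustration of option b), a DNS sinkhole answers queries for listed domains (and their subdomains) with a dead address instead of the real one. This is a minimal Python sketch of that lookup rule, assuming a hypothetical blocklist and 0.0.0.0 as the sinkhole address; resolve stands in for whatever upstream resolver is used.

```python
# Hypothetical blocklist entries, for illustration only.
BLOCKLIST = {"doubleclick.net", "ads.example.com"}
SINKHOLE_ADDR = "0.0.0.0"


def is_blocked(qname: str, blocklist=BLOCKLIST) -> bool:
    """True if the queried name or any parent domain is on the blocklist."""
    labels = qname.lower().rstrip(".").split(".")
    # Check a.b.c, then b.c, then c, so subdomains of a listed domain match too.
    return any(".".join(labels[i:]) in blocklist for i in range(len(labels)))


def answer(qname, resolve):
    """Return the sinkhole address for blocked names, else the real answer."""
    return SINKHOLE_ADDR if is_blocked(qname) else resolve(qname)
```

    Note that the only thing this approach ever sees is the hostname being looked up; it never touches the page itself.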



  • Thanks, but this is exactly why DNS sinkholes are so limited. HTTP/HTTPS traffic must be filtered properly, not just by blocking DNS names or IPs, which can easily be spoofed or changed.

    Your security at work is meaningless when you can't stop a simple script from taking down your computers.

    So is there any way to set up a MITM (man-in-the-middle) with pfSense/Squid and filter the page source properly?



  • There are. I certainly came across a few while searching these forums and others for pfSense. I'm afraid I don't have any links for you, though…

    But I will comment on one thing - your MITM approach will likely violate any PCI-DSS or HIPAA security regulations, if you fall under those categories. Just sayin'  ;)



  • Thanks. As for MITM and HIPAA/PCI-DSS, I would then route that traffic over a private connection that isn't monitored. I think this would require knowledge of the sites in question (i.e. working closely with the business). This shouldn't be too hard if you have a simple policy that all internet needs must be approved before access is granted.

    If a user wants to shop online, they would know that we store their information for auditing purposes; if that is a problem, they shouldn't shop online. The site will be red-flagged anyway.

    I'm just tired of the unmanaged state of affairs, with the internet being deemed too important to filter or manage. It is far too easy to connect to a remote proxy server or VPN and 'hide' your actions.

    Having secured transactions is needed for business purposes only, not for normal internet surfing.



  • For filtering web page content, e2guardian is in the works: https://forum.pfsense.org/index.php?topic=87526.0

    For HTTPS you can use WPAD.
    Sites that have popup loops are normally bad sites, which can be blocked by SquidGuard and a blocklist.
    You can also enforce Google SafeSearch for all clients.

    Here is my current best setup for web filtering.
    https://forum.pfsense.org/index.php?topic=112335.0



  • @aGeekHere:

    For https you can use wpad.
    Sites that have popup loops are normally bad sites which can be blocked by squidguard and a block list.
    You can also enforce google safe search for all clients.

    Not that simple, although this is the beginning of the right approach.

    An explicit (i.e. not transparent) proxy is mandatory; otherwise HTTPS goes direct.
    (One could also intercept HTTPS transparently together with SSL-Bump… that is another approach, but not a simple one.)

    WPAD basically removes the burden of manually configuring each and every device to use the proxy. No more (and no less).
    Then, as a second step, the proxy.pac content tells the browser when to use a proxy (and which one) and when not to.
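    A real proxy.pac/wpad.dat file is JavaScript defining FindProxyForURL(url, host), which clients can locate via DHCP option 252 or by fetching http://wpad.<local domain>/wpad.dat. The decision such a file makes can be mirrored in Python as a sketch. The proxy address 192.168.1.1:3128 (3128 being Squid's default port) and the .lan local suffix are assumptions, not values from this thread.

```python
from urllib.parse import urlsplit

# Assumed values: pfSense LAN address with Squid on its default port 3128,
# and ".lan" as the local domain suffix.
PROXY = "PROXY 192.168.1.1:3128"
LOCAL_SUFFIX = ".lan"


def find_proxy_for_url(url: str) -> str:
    """Python mirror of the decision a proxy.pac FindProxyForURL() would make."""
    host = (urlsplit(url).hostname or "").lower()
    # Plain hostnames (no dot) and local names bypass the proxy.
    if "." not in host or host.endswith(LOCAL_SUFFIX):
        return "DIRECT"
    # Everything else, HTTP and HTTPS alike, goes to the explicit proxy.
    # No "; DIRECT" fallback: if the proxy is unreachable the browser fails
    # instead of quietly going direct around the filter.
    return PROXY
```

    Leaving out the DIRECT fallback is a design choice: it trades availability (no browsing if Squid is down) for making sure filtered traffic actually passes through the proxy.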

    The next step is proxy configuration:

    • blacklists at the SquidGuard level will prevent access to unwanted domains (including proxies, redirectors...  :P) and direct IP access  ;D ;D
    • page content obviously cannot be controlled for HTTPS flows (unless you enable SSL-Bump, AKA MITM)
    • ad removal is partially handled by the blacklists too
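    To make the first two points concrete, here is a hedged Python sketch of the per-request decision a SquidGuard-style filter makes. It is not SquidGuard's code or configuration format: it assumes the filter is handed the raw request line, and the blacklist entries are made up. For HTTPS without SSL-Bump the proxy only sees CONNECT host:port, so the hostname is all there is to judge.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical blacklist entries; a real setup would load category lists.
BLACKLIST = {"proxy-site.example", "ads.example.net"}


def host_of(request_line: str) -> str:
    """Extract the target host from an HTTP request line."""
    method, target, _version = request_line.split()
    if method == "CONNECT":
        # HTTPS without SSL-Bump: only "host:port" is visible, never the page.
        return target.rsplit(":", 1)[0].strip("[]")
    # Plain HTTP through an explicit proxy carries the full URL.
    return urlsplit(target).hostname or ""


def verdict(request_line: str) -> str:
    host = host_of(request_line).lower()
    try:
        ipaddress.ip_address(host)
        return "block"  # direct IP access would dodge name-based lists
    except ValueError:
        pass
    labels = host.split(".")
    # Match the host and every parent domain against the blacklist.
    if any(".".join(labels[i:]) in BLACKLIST for i in range(len(labels))):
        return "block"
    return "allow"
```

    Blocking bare IP addresses outright is what closes the "just type the IP" loophole, at the cost of occasionally blocking legitimate services that are only reachable by address.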
