Content Filtering - implementation thread to support bounty posted.

doc_holiday

This is a thread to discuss the current bounty on Content Filtering: http://forum.pfsense.org/index.php/topic,2703.0.html

Today, I set up DansGuardian on a Linux box. My preference would be for it to be installed on my pfSense box, but I want to make sure that I am happy with it before trying a mod. Besides, it is beyond my skillset to fully integrate it into pfSense, hence my support of the bounty.

I learned a few interesting things today though. There seem to me to be two clear options for implementing this. One would be through DansGuardian. Unfortunately, I am told it has restrictive licensing terms. However, what is to prevent someone from creating a plug-in so that DansGuardian can be added to pfSense? I just downloaded it and installed it on a Linux box, hence my question.

The other option is squidGuard. It seems you need "Shalla's URL blacklist", but interestingly enough there is also a commercial source which requires a subscription (http://urlblacklist.com/?sec=home). This is the same data that DansGuardian uses.

I don't know a lot about m0n0wall, but from the little that I have read on this, putting this into an embedded distribution is going to be quite a feat. It would seem to me that the simplest route would be to offer content filtering as an option for generic PC installs as a starting point.

Comments?

Guest

Just some background information on both of these solutions for you to consider:

SquidGuard: this is a redirector for squid. It is limited to blocking HTTP traffic based solely on whether or not a match occurs against the requested URL. This solution is extremely fast (compared to DansGuardian) but has the limitation of only being able to filter based on the URL. Also, squidGuard is not under active development and hasn't been for many years. There are a number of unpatched bugs and limitations which are being worked on by a couple of forks, none of which is really ready for prime time yet.
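To make the redirector idea concrete, here is a minimal Python sketch of a URL-only filter. It assumes the classic squid redirector line format (URL first, then client and method fields) and answers each line with a replacement URL or an empty line. The blocklist entries, the block page address, and the function names are made up for illustration; this is not squidGuard's actual code.

```python
import sys
from urllib.parse import urlsplit

# Hypothetical blocklist and block page; a real setup would load list files.
BLOCKED_DOMAINS = {"ads.example.com", "blocked.example.net"}
BLOCK_PAGE = "http://127.0.0.1/blocked.html"


def is_blocked(url, blocked=BLOCKED_DOMAINS):
    """Return True if the URL's host is a blocked domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    parts = host.split(".")
    # Check the host itself and each parent domain:
    # a.b.example.com -> b.example.com -> example.com -> com
    return any(".".join(parts[i:]) in blocked for i in range(len(parts)))


def rewrite(request_line):
    """Map one request line to one reply line.

    Assumed input format: "URL client_ip/fqdn ident method".
    Reply: a replacement URL to redirect, or "" to leave the request alone.
    """
    fields = request_line.split()
    if not fields:
        return ""
    return BLOCK_PAGE if is_blocked(fields[0]) else ""


def serve(stream_in, stream_out):
    """Answer requests line by line until the input is closed."""
    for line in stream_in:
        stream_out.write(rewrite(line) + "\n")
        stream_out.flush()  # the proxy waits for each answer, so do not buffer
```

Run as a helper, this would be driven with `serve(sys.stdin, sys.stdout)`. Note that the decision never looks past the URL, which is why this approach is fast and also why it cannot see what a page actually contains.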

DansGuardian: this is a full HTTP proxy, meant to be used as a peer to squid. It has more full-featured scanning, such as scanning webpage elements for viruses and filtering based on keywords found in the contents of the web page itself. Aside from the very odd and confusing licensing, the limitations of DansGuardian are that it's pretty slow when you're doing full content inspection and that there are a variety of ways to configure it to work with squid. The seemingly preferred method is to have two separate instances of squid running with DansGuardian in between (this is taken from the squid mailing list, which I encourage you to read).
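As a rough illustration of keyword-based content filtering, here is a toy Python sketch that scores a page by the phrases found in its text. The phrase weights, the threshold, and the function names are invented for the example; it only shows the general idea and makes no claim about how DansGuardian itself is implemented.

```python
import re

# Hypothetical phrase weights and threshold, made up for illustration.
# Negative weights let "innocent" context pull a score back down.
PHRASE_WEIGHTS = {"casino": 40, "poker": 30, "free spins": 50, "tutorial": -20}
BLOCK_THRESHOLD = 60


def page_score(text, weights=PHRASE_WEIGHTS):
    """Sum weight * occurrences for every listed phrase found in the page text."""
    # Lowercase and collapse whitespace so "Free   Spins" matches "free spins".
    normalized = re.sub(r"\s+", " ", text.lower())
    # str.count is a plain substring count, so "casinos" also counts as "casino".
    return sum(weight * normalized.count(phrase) for phrase, weight in weights.items())


def should_block(text, threshold=BLOCK_THRESHOLD):
    """Block the page once its total score reaches the threshold."""
    return page_score(text) >= threshold
```

Unlike a URL match, this has to read the whole page body before it can decide, which goes some way to explaining why full content inspection is so much slower.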

Both solutions are pretty memory intensive and have their individual limitations. Obviously DansGuardian has nicer features, but it's a pretty painful thing to set up properly. Given all the problems that the squid package, by itself, has had, doing a one-size-fits-all setup with squid+dansguardian is going to be pretty damned hard.

doc_holiday

Yes, my tests on my box show this is intensive. I have it running on a PII-400 with 378 MB of RAM and it's no speed demon (especially since this is pretty much all it is doing). I can easily see how, if you wanted to throw a real load at this, you would need some serious beef behind it.

Regardless, your points are well taken. The more I work on this, the more I realise this is fairly complex. BTW, Webmin has a nice module for tweaking DansGuardian at the moment.

@submicron:

Given all the problems that the squid package, by itself, has had, doing a one-size-fits-all setup with squid+dansguardian is going to be pretty damned hard.

The bounty is now $300 for a generic PC install, plus an additional $400 for an embedded version. It's obvious it isn't going to be easy because no one is jumping at it! ;)

Guest

Yeah, content filtering isn't exactly sexy or interesting, and the chances of this turning into a major support nightmare are pretty strong. As I understand it, the squid package annoys Scott immensely, so you can imagine squid coupled with something like DansGuardian.

dvserg

I made this for my own needs:
http://forum.pfsense.org/index.php/topic,3111.0.html
I do not know whether this is the right place for the project (if anybody needs it).

doc_holiday

@dvserg:

I made this for my own needs:
http://forum.pfsense.org/index.php/topic,3111.0.html
I do not know whether this is the right place for the project (if anybody needs it).

Very interesting. Thanks for sharing your work!

databeestje

I am not quite sure I want to go the DansGuardian route yet. That, and I have other more pressing issues.

I personally think squid auth is more important than filtering. That's because of personal and work bias. I have a working squid with auth installation on Linux, so if I can migrate it to pfSense in a workable fashion, that gets priority.

And my company doesn't bother with filtering. Most Dutch people are probably too liberal for it; a filter would likely end up annoying a lot of people.

Squid with auth means people are not anonymous whilst accessing whatever they normally do. It's not until a manager puts in a request that we even look at it.

DignionASP

Content filtering can also be done in an easier way. I know that DansGuardian is a complete and automatic solution, but implementing it on a CF card is probably not the fastest way to get that option into the embedded version. So… let's look at it a different way. At home I have a normal Netgear router. Although very basic, it has a 'content filter' and a service (port) filter. The content filter is just a text box where you can put in words and web addresses. It is time-based and you can exclude one IP address. The service blocker is a port filter: you can block a specific service (https, telnet, Quake, NetMeeting, or numbered tcp, udp, tcp/udp, etc.), and it is also time-based and IP address/range based. When somebody tries to visit a forbidden site, they get a nice message in the browser.
I think my Soekris 4601 must be powerful enough to support an option like this, expanded a bit: multiple time schedules, more IP addresses to exclude/include, an option to block all traffic except a list of specific websites and services (ports), and an editable (HTML-based) message. The Netgear router also has an option to mail a daily report of who violated the rules.
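A rule like the one described could be sketched roughly as follows in Python: a keyword list matched against the URL, a time-of-day window, and a set of exempt IP addresses. The class name, field names, and sample values are hypothetical and only meant to show how little logic such a filter needs.

```python
from dataclasses import dataclass, field
from datetime import time
from ipaddress import ip_address


@dataclass
class KeywordRule:
    keywords: list                    # words or site names to look for in the URL
    start: time                       # rule is active from this time of day...
    end: time                         # ...up to, but not including, this one
    exempt_ips: set = field(default_factory=set)  # clients the rule never applies to

    def active_at(self, now):
        """Return True if the time of day 'now' falls inside the schedule."""
        if self.start <= self.end:
            return self.start <= now < self.end
        # Window wraps past midnight, e.g. 22:00 to 06:00.
        return now >= self.start or now < self.end

    def blocks(self, url, client_ip, now):
        """Return True if this request should be refused."""
        if ip_address(client_ip) in self.exempt_ips:
            return False
        if not self.active_at(now):
            return False
        url = url.lower()
        return any(word.lower() in url for word in self.keywords)


# Example: block two keywords during office hours, except for one machine.
office_hours = KeywordRule(
    keywords=["quake", "games.example.com"],
    start=time(8, 0),
    end=time(17, 0),
    exempt_ips={ip_address("192.168.1.10")},
)
```

Multiple schedules or an allow-only mode would just be a list of such rules checked in order, with the block message served as a static HTML page.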

I believe that this is a much easier way of content blocking, faster to implement and available a lot sooner. More importantly, it has enough functionality to support homes and small offices.

Who wants to pick up this quest and can come up with a working solution in, let's say, two months or so?

Donation: $175

jeroen234

That looks to me like a lot more work than just making a web interface for an existing program like DansGuardian.

rwalker

DansGuardian is not as nice as URLfilter - http://www.urlfilter.net/. I have used both on IPCop before and URLfilter is much easier and seems faster. At least on the same hardware it seemed not to load it down as much. Also, there are some nice additional add-ons that go with it.

dvserg

It is possible to make several alternative packages, but I do not see how the resulting packages could be installed without modifying the pfSense code.
Maybe the developers will add the ability to specify other package directories from the GUI?

dvserg

One more redirector for squid: http://www.rejik.ru/index_en.html
It is a very popular redirector project in the .RU zone.

jonesboy

I use squid as my content filter already. I believe the biggest issue would be building a GUI for it. Just do a search on bsd + transparent squid proxy…