HTTP filtering based on user agent

  • Hi all

    I have been handed a request to block a big list of user agents (bots) from reaching a web server behind our firewall. I have never done anything like this before, so I am unsure how to proceed. I think I may need to add the Squid package and use it as a reverse proxy, but I also keep coming across mentions of Snort.

    Could someone just point me in the right direction? I am not looking for detailed instructions, as I am happy to do the research; I'm just not sure what to use at present.

    Many thanks

  • LAYER 8 Global Moderator

    What is your web server running? It's possible to do this with a simple .htaccess, and nginx also has a simple way to block bots and user agents.
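
    For anyone new to this, the check itself is small wherever it ends up running: a case-insensitive substring match against the request's User-Agent header, with a 403 returned on a hit. The Python below is only a rough sketch of that logic, not how Apache or nginx implement it. The names `is_blocked` and `status_for` and the agent string "BadBot" are made up for illustration.

```python
def is_blocked(user_agent, blocked_substrings):
    """Return True if any blocked substring occurs in the User-Agent (case-insensitive)."""
    ua = (user_agent or "").lower()
    # Skip empty entries so a blank line in a list cannot match every request.
    return any(s and s.lower() in ua for s in blocked_substrings)


def status_for(headers, blocked_substrings):
    """Return the HTTP status a filter would send: 403 for blocked agents, else 200."""
    if is_blocked(headers.get("User-Agent"), blocked_substrings):
        return 403
    return 200


if __name__ == "__main__":
    blocked = ["badbot"]  # hypothetical entry
    print(status_for({"User-Agent": "Mozilla/5.0 (compatible; BadBot/2.1)"}, blocked))
    print(status_for({"User-Agent": "Mozilla/5.0"}, blocked))
```

    Note that a request with no User-Agent header at all passes this check, which may or may not be what the customer wants.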

  • Thanks for the response.

    The web server is Apache but the client has said they don't want to do it on the actual server itself.

    So would using nginx be a better solution than using Squid, for example?

  • The web server is Apache but the client has said they don't want to do it on the actual server itself.

    <facepalm>Yes, why take advantage of a built-in capability that could solve the problem in a few lines when you could instead roll out and configure some clunky package to do the same job with the added benefit of increased complexity and an additional point of failure?

    IT would be much easier without users getting in the way  ;D

    Like John said, either Squid or HAProxy, or perhaps Snort.  A reverse proxy is probably lighter than an IDS for this job.</facepalm>

  • LAYER 8 Global Moderator

    Good luck filtering said user agents this way when they come in via HTTPS.

    If your customer is too stupid to do this on their own server, then show him how to do it - this sort of block makes zero sense to do at the firewall. Now, if you were running a load balancer (reverse proxy) with, say, multiple servers behind it serving up content, then OK, it might make sense to filter at that single point rather than having to configure all the different servers. Even then, you would need to be offloading the HTTPS to the load balancer as well, so that it could see the user agents inside the HTTPS traffic.

    All of the major web servers support this; even IIS can do it ;) Doing this at your firewall is the wrong way to go about it.

  • Thanks everyone for their responses.

    I agree, and most of my research points to the same thing, i.e. do it on the web server itself rather than try to offload it to a firewall, but the customer knows how to do that and still insists they want to offload it to the firewall (I agree about users getting in the way of IT  :D).

    So since yesterday I have loaded the HAProxy package on a test pfSense firewall and have got it forwarding from an outside VIP to the web server behind the firewall (there is no load balancing going on).

    I have also used a basic ACL filter to test blocking certain IPs, and that worked as well, but I cannot find a way to keep those IPs in a file and get HAProxy to load them from the file rather than having to enter them manually one by one.

    I need to do this as it is a large list of user agent bots they want to block.

    I know loading from a file can be done, but does anyone know if it can be done using the HAProxy package on pfSense rather than having HAProxy loaded on a separate server?

    Thanks again
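
    On the file question: HAProxy ACLs accept a `-f <file>` argument that loads one pattern per line, and `hdr_sub(User-Agent) -i` does a case-insensitive substring match on the header. Whether the pfSense package GUI exposes this is not something confirmed in this thread. As a rough sketch, the Python below tidies a raw bot list into one-pattern-per-line form and prints the two HAProxy lines that would reference it. The function names, the ACL name `blocked_ua` and the path `/path/to/bad_agents.lst` are all placeholders, not anything taken from the pfSense package.

```python
def normalize_patterns(lines):
    """Return cleaned, de-duplicated patterns in first-seen order.

    Blank lines and lines starting with '#' are dropped; duplicates are
    detected case-insensitively because the ACL below matches with -i.
    """
    seen = set()
    patterns = []
    for line in lines:
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue
        key = entry.lower()
        if key in seen:
            continue
        seen.add(key)
        patterns.append(entry)
    return patterns


def haproxy_rules(pattern_file, acl_name="blocked_ua"):
    """Return the HAProxy lines that deny requests matching the pattern file."""
    return [
        f"acl {acl_name} hdr_sub(User-Agent) -i -f {pattern_file}",
        f"http-request deny if {acl_name}",
    ]


if __name__ == "__main__":
    raw = ["# bots to block", "", "  BadBot  ", "badbot", "EvilScraper/1.0"]  # made-up entries
    print("\n".join(normalize_patterns(raw)))
    print("\n".join(haproxy_rules("/path/to/bad_agents.lst")))  # placeholder path
```

    The printed rules would go in the frontend (or a custom options field, if the package offers one), and the cleaned list would be saved at whatever path the ACL points to.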

  • Just realised there is a separate forum for packages so will ask there about HAProxy setup.

    Many thanks for the help to all.

  • LAYER 8 Global Moderator

    So your customer is not doing anything on HTTPS? They do not listen on it or serve up these pages via HTTPS? For you to block it via the proxy, you're going to break end-to-end encryption and would be doing MITM, which would in theory give you access to all HTTPS traffic.

    They are ok with this?

    Not as far as I am aware, no, although I will double-check with them.
