Any regex for sanitizing URL and domain blacklists such as shallalist?

  • Every now and then I update a few categories of shallalist's block list and add them as Target Categories in squidGuard, which then complains that some entries are not URLs, for example:

    "" is not an URL
    "" is not an URL

    I then go and delete or correct these entries manually.

    Does anyone have a regex for use with sed or awk to do that job automatically?

    find * -type f -exec sed -i -r '/([0-9]{1,3}\.){3}[0-9]{1,3}/d' {} \;
    deletes every line that contains an IP address, or

    find * -type f -exec sed -i -r ':a;N;$!ba;s/\n/ /g' {} \;
    replaces every line break with a space.
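
What I have in mind is roughly the sketch below, which combines such clean-up steps into one awk filter. It is only a sketch: `sanitize_list` is a name I made up, and the filter rules (strip DOS line endings and a leading scheme, drop blank lines, comments, IP-address entries and entries with unexpected characters) are my assumptions about what squidGuard rejects, not something taken from its documentation.

```shell
# Hypothetical helper: reads a domains/urls list on stdin and
# writes the entries that pass the (assumed) sanity checks to stdout.
sanitize_list() {
    awk '
    {
        sub(/\r$/, "")                  # strip DOS line endings
        gsub(/^[ \t]+|[ \t]+$/, "")     # trim leading/trailing whitespace
        sub("^[A-Za-z]+://", "")        # drop a leading scheme such as http://
    }
    $0 == "" { next }                   # skip empty lines
    /^#/     { next }                   # skip comment lines
    # skip entries that start with a dotted-quad IP address
    $0 ~ "^[0-9]+[.][0-9]+[.][0-9]+[.][0-9]+([/:]|$)" { next }
    # skip entries containing characters outside an assumed safe set
    $0 !~ "^[A-Za-z0-9._~:/?%&=+-]+$" { next }
    { print }
    '
}
```

It would be used per file, for example `sanitize_list < domains > domains.clean`, and the result checked before replacing the original list.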
