Any regex for URLs and domains to sanitize blacklists like shallalist?



  • Every now and then I update a few categories of shallalist's block list and add them as Target Categories in SquidGuard, which then complains that some entries are not URLs, for example:

    "1.2.3.4/xy" is not an URL
    "1.2.3.4:88/xy" is not an URL

    I then go and delete or correct these entries manually.

    Does anyone have a regex for use with sed or awk to do that job automatically?

    For example,
    find . -type f -exec sed -i -r '/([0-9]{1,3}\.){3}[0-9]{1,3}/d' {} \;
    deletes every line that contains an IP address, or

    find . -type f -exec sed -i -r ':a;N;$!ba;s/\n/ /g' {} \;
    replaces line breaks with spaces.
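
    To make the goal concrete, here is a rough sketch of the kind of filter I am after. It assumes that the offending entries are exactly those that begin with an IPv4 address (optionally followed by a port and a path), and that simply dropping them is acceptable. The function name drop_ip_entries and the file path in the comment are made up for illustration. Unlike the sed one-liner above, the pattern is anchored, so a hostname that merely contains digits and dots is kept.

```shell
# Hypothetical helper: reads a blacklist on stdin and writes the cleaned
# list to stdout. Nothing is modified in place.
# Assumption: a line is "bad" when the whole entry is an IPv4 address,
# optionally followed by :port and/or /path.
drop_ip_entries() {
    grep -Ev '^([0-9]{1,3}\.){3}[0-9]{1,3}(:[0-9]+)?(/.*)?$'
}

# Preview the result for one file before changing anything
# (the path is only an example):
# drop_ip_entries < some/category/urls
```

    Writing to stdout first makes it possible to diff the result against the original list before replacing it.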