SquidGuard - Local characters in regular expressions - Not supported
-
2.1-RELEASE (i386)
built on Wed Sep 11 18:16:22 EDT 2013
FreeBSD 8.3-RELEASE-p11
squidGuard-squid3 1.4_4 pkg v.1.9.5
While migrating an external (FreeBSD-based) proxy, I found regular expressions using local European characters, such as ó or ñ.
I entered them into pfSense squidGuard and saved. pfSense then showed a message saying that the system was restoring the configuration:
Mar 16 22:39:40 php: /pkg_edit.php: XML error: Undeclared entity error at line 1043 in /conf/config.xml
Mar 16 22:39:40 php: /pkg_edit.php: pfSense is restoring the configuration /cf/conf/backup/config-1395005929.xml
Mar 16 22:39:40 php: /pkg_edit.php: New alert found: pfSense is restoring the configuration /cf/conf/backup/config-1395005929.xml
Mar 16 22:39:40 check_reload_status: Syncing firewall
Fortunately this didn't cause a system reboot, and I only lost my regular expression.
![Captura de 2014-03-16 22:55:20.png](/public/imported_attachments/1/Captura de 2014-03-16 22:55:20.png)
-
You should not use national characters in URLs or expressions. An HTTP URL must use Latin characters [a-zA-Z] only.
Browsers automatically convert URLs containing national characters to Punycode, and squidGuard sees them in that Punycode form as well.
-
It doesn't work…
Tried with:
http://www.charset.org/punycode.php?decoded=coño&encode=Normal+text+to+Punycode#results
https://www.google.com/webhp?hl=ca#hl=ca&q=coño&safe=active
Tested with xn--coo-8ma and co%C3%B1o
Any idea?
Thanks!
-
You can look at the squid or squidGuard logs to see how this request is actually transmitted over the network.
I meant that Punycode is used for the domain part of the URL.
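To make the domain-versus-path distinction concrete, here is a minimal Python sketch (not something from squid or squidGuard) that encodes the same word both ways. The host name `coño.example` is made up for illustration, and Python's built-in `idna` codec is assumed to be close enough to what a browser does for this simple label.

```python
from urllib.parse import quote

word = "coño"

# Host names: IDNA/Punycode, here via Python's built-in "idna" codec.
# "coño.example" is a hypothetical host used only for illustration.
host = (word + ".example").encode("idna").decode("ascii")

# Paths and query strings: percent-encoding of the UTF-8 bytes
# (quote() uses UTF-8 by default).
path_utf8 = quote(word)

# The same word percent-encoded from ISO-8859-1 bytes, for comparison.
path_latin1 = quote(word, encoding="latin-1")

print(host)         # xn--coo-8ma.example
print(path_utf8)    # co%C3%B1o
print(path_latin1)  # co%F1o
```

So the xn-- form would only ever show up in the host part, while the % form is what appears after the host. Which of the two % forms arrives at the proxy depends on the client, so the logs are still the place to check.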
-
squidGuard uses Perl regular expressions.
So I tried things like this at the console:
echo "ñ" | grep -e "\x241"
echo "ñ" | grep -e "\xF1"
echo "ñ" | grep -e "\u00F1"
echo "ñ" | grep -e "\xc3\xb1"
echo "ñ" | grep -e "%C3%B1"
echo "ñ" | grep -e "\x{241}"
All without any result.
My old (FreeBSD) proxy runs with the ISO8859-15 locale, and there I have squidGuard regular expressions containing Latin characters.
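One thing that may explain part of the confusion: the bytes behind ñ depend on the encoding, and 241 is the decimal value of the code point while `\x` escapes are normally read as hex. The following Python sketch only illustrates that point. It does not reproduce how grep or squidGuard interpret escapes, which I have not verified.

```python
import re

ch = "ñ"

# The code point is U+00F1: F1 in hex, 241 in decimal.
code_point = ord(ch)

# The bytes depend on the encoding in use.
utf8_bytes = ch.encode("utf-8")         # two bytes: C3 B1
latin9_bytes = ch.encode("iso8859-15")  # one byte: F1

# Byte-level patterns, one per encoding.
pat_utf8 = re.compile(rb"\xc3\xb1")
pat_latin9 = re.compile(rb"\xf1")

print(code_point, hex(code_point))
print("UTF-8 input:  ", bool(pat_utf8.search(utf8_bytes)),
      bool(pat_latin9.search(utf8_bytes)))
print("Latin-9 input:", bool(pat_utf8.search(latin9_bytes)),
      bool(pat_latin9.search(latin9_bytes)))
```

Each pattern only matches input in its own encoding, so an expression written on an ISO8859-15 system would not be expected to match UTF-8 input, and the other way round.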
-
Browse YouTube using your characters and check the squid or squidGuard logs to see what your URLs look like.
-
The idiotic IDN idea itself left aside, the problem seems to be with:
- not using CDATA for the field
- even with that, htmlspecialchars() producing outright broken junk
@OP: When you look at /conf/config.xml.bad like:
less -N /conf/config.xml.bad
and post the offending line logged in syslog with a couple of lines of context, maybe we'll move somewhere here.
Normally, you can only use the
&lt; &gt; &apos; &quot; &amp;
entities with XML. Stuff like &oacute; or &ntilde; will crap out with "Undeclared entity error" unless stuck into CDATA (or taken care of in the DTD).
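The entity point is easy to reproduce outside pfSense. This is a small sketch using Python's standard XML parser rather than the PHP parser pfSense uses, so the exact error text will differ. The `<expr>` element name is made up and is not the real config.xml tag.

```python
import xml.etree.ElementTree as ET

def try_parse(xml_text):
    """Return the element text, or a short error string if parsing fails."""
    try:
        return ET.fromstring(xml_text).text
    except ET.ParseError as exc:
        return "ParseError: %s" % exc

# HTML-style named entity: not declared in plain XML, so parsing fails.
named = try_parse("<expr>co&ntilde;o</expr>")

# Numeric character reference: always valid XML, decodes to the character.
numeric = try_parse("<expr>co&#241;o</expr>")

# CDATA: the text is taken literally, ampersand and all.
cdata = try_parse("<expr><![CDATA[co&ntilde;o]]></expr>")

print(named)
print(numeric)
print(cdata)
```

Note that CDATA only stops the parser from choking. What comes back is the literal string co&ntilde;o, not coño, so whatever writes the squidGuard config would still have to deal with it.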
-
Are you sure that the squidGuard service configuration supports national characters? That is the primary problem, not config.xml or the GUI.
-
I'm very sorry: after searching a lot about the ñ character, I see that squidGuard doesn't support it.
Some people say that putting ñ in squidGuard regular expressions crashes squidGuard.
I think this behaviour could be because they have misconfigured the locale on the server.
In my old squid+squidGuard server (FreeBSD) I have some rules using ñ and other accented Latin characters.
But this morning I tested them and they don't work!
So, I would like to apologize for the time you devoted to this topic.
Thanks,
Josep
-
Are you sure that the squidGuard service configuration supports national characters? That is the primary problem, not config.xml or the GUI.
Using the escaped equivalent of whatever character in the expression lists should work. Well, if it does not, then input sanitization should be applied. Also, what exactly is being done here? So you save, say, "ñ" as "&ntilde;" into config.xml; now I wonder what's gonna end up in the squidGuard configuration and how it's gonna match Perl "\x{0241}"?
Some people say that putting ñ in squidGuard regular expressions crashes squidGuard.
You should use the escaped character-table equivalent. Anyway, things like this strongly suggest you should just move to DansGuardian and forget all of this.
-
As a sequel to this… apparently anything outside the ISO 8859-1 charset configured via the web GUI will get screwed up by pfSense on POST (i.e., on saving your config via the GUI). So indeed I'd suggest everyone here just give up. Any effort here is pretty much wasted until pfSense grows proper Unicode support.
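For anyone wondering what "outside ISO 8859-1" means in practice, here is a rough Python sketch. It says nothing about what pfSense actually does on POST (that part I have not traced). It only shows which characters fit in ISO 8859-1 and what UTF-8 bytes look like when something reads them as ISO 8859-1. The helper name `fits_latin1` is made up.

```python
def fits_latin1(text):
    """True if every character of text exists in ISO 8859-1."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False

# ñ and ó are inside ISO 8859-1; the euro sign is not
# (it is in ISO 8859-15, which is a different table).
for sample in ("coño", "ó", "€"):
    print(repr(sample), fits_latin1(sample))

# UTF-8 bytes misread as ISO 8859-1 turn one character into two.
garbled = "ñ".encode("utf-8").decode("iso-8859-1")
print(repr(garbled))
```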
-
Using the escaped equivalent of whatever character in the expression lists should work. Well, if it does not, then input sanitization should be applied. Also, what exactly is being done here? So you save, say, "ñ" as "&ntilde;" into config.xml; now I wonder what's gonna end up in the squidGuard configuration and how it's gonna match Perl "\x{0241}"?
URL parameters are encoded as %AA%BB%CC%20; I think this is the form that must be used in the regular expressions.
-
Sounds reasonable… Whatever, as said above, without UTF-8 available in the GUI this is pretty much a pointless exercise. :(
-
Example: when I search Google for White Stork in Spanish, the Latin characters aren't encoded on screen:
https://www.google.com/webhp?hl=es#hl=es&q=cigüeña
However, when I copy and paste the URL, it comes out encoded:
https://www.google.com/webhp?hl=es#hl=es&q=cig%C3%BCe%C3%B1a
http://en.wikipedia.org/wiki/White_Stork
I will try this encoding in squidGuard expressions again, but I think I already tried it and it didn't work.
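As a sanity check on the encoding itself, this Python sketch builds the percent-encoded form of the word and matches it against a URL. It uses Python's `re` module, not squidGuard's regex engine, and the `www.example.com` URL is made up, so it only confirms what the pattern text should look like, not that squidGuard will accept it.

```python
import re
from urllib.parse import quote

word = "cigüeña"

# Percent-encode the UTF-8 bytes, as seen in the copied URL.
encoded = quote(word)

# Hex digits in percent-escapes may be upper or lower case,
# so the match is done case-insensitively.
pattern = re.compile(re.escape(encoded), re.IGNORECASE)

# Hypothetical request URLs, for illustration only.
url_upper = "http://www.example.com/search?q=cig%C3%BCe%C3%B1a"
url_lower = "http://www.example.com/search?q=cig%c3%bce%c3%b1a"

print(encoded)
print(bool(pattern.search(url_upper)), bool(pattern.search(url_lower)))
```

One caveat on the Google URLs above: the part after # is a fragment, which the browser does not send in the HTTP request, so a proxy would not see it in that position anyway.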
-
I'm just thinking…
Is the squid proxy itself set to "encode"? When do the URLs get passed through to squidGuard?
What to do with requests that have whitespace characters in the URI:
- strip: The whitespace characters are stripped out of the URL. This is the behavior recommended by RFC2396.
- deny: The request is denied. The user receives an "Invalid Request" message.
- allow: The request is allowed and the URI is not changed. The whitespace characters remain in the URI.
- encode: The request is allowed and the whitespace characters are encoded according to RFC1738.
- chop: The request is allowed and the URI is chopped at the first whitespace.
-
That only applies to whitespace. I didn't find any other squid directive covering other characters.
-
No UTF support for perl version in pfSense…
http://en.wikibooks.org/wiki/Perl_Programming/Unicode_UTF-8
Do not use Perl versions prior to 5.8.1. Although support for UTF-8 began with v5.6.0, regular expressions do not work even in the next release, v5.6.1. v5.8.1 added some speed improvements. (By the way, PHP will not have UTF-8 support until v6.0.) By Perl 5.14, Unicode support is for the most part clean and smooth.
[2.1-RELEASE][admin@pfsense.localdomain]/root(61): find / -name perl
/usr/local/bin/perl
/usr/pbi/squid-i386/bin/perl
/usr/pbi/squid-i386/lib/perl5/5.16/perl
/usr/pbi/squidguard-squid3-i386/bin/perl
/usr/pbi/squidguard-squid3-i386/lib/perl5/5.16/perl
[2.1-RELEASE][admin@pfsense.localdomain]/root(62): perl -v
This is perl 5, version 16, subversion 3 (v5.16.3) built for i386-freebsd-thread-multi-64int
http://www.freebsd.org/cgi/ports.cgi?query=squidguard&stype=all
The squidGuard code also seems to be very old.
For the moment, I will continue using squidGuard, knowing this limitation. In the future I will test the DansGuardian package.
http://contentfilter.futuragts.com/wiki/doku.php?id=language_and_encoding_effects_on_phrase_matching
-
I have used squid and squidGuard for ten years and never had problems with any characters in squidGuard. I recently discovered pfSense, and I am disappointed with how it handles accented and special characters in regular expressions: saving them triggers an XML restoration. The problem comes from pfSense's XML file (config.xml) and its use of ISO instead of UTF-8, as Doktornotor says.
:-[ :-[