Netgate Discussion Forum

SquidGuard - Local characters in regular expressions - Not supported

pfSense Packages
18 Posts 5 Posters 6.7k Views
  • B
    bellera
    last edited by Mar 16, 2014, 9:56 PM

    2.1-RELEASE (i386)
    built on Wed Sep 11 18:16:22 EDT 2013
    FreeBSD 8.3-RELEASE-p11

    squidGuard-squid3 1.4_4 pkg v.1.9.5

    While migrating an external (FreeBSD-based) proxy, I found regular expressions that use local European characters, such as ó or ñ.

    I entered one into pfSense squidGuard and saved. pfSense then reported that the system was restoring the configuration:

    Mar 16 22:39:40 	php: /pkg_edit.php: XML error: Undeclared entity error at line 1043 in /conf/config.xml
    Mar 16 22:39:40 	php: /pkg_edit.php: pfSense is restoring the configuration /cf/conf/backup/config-1395005929.xml
    Mar 16 22:39:40 	php: /pkg_edit.php: New alert found: pfSense is restoring the configuration /cf/conf/backup/config-1395005929.xml
    Mar 16 22:39:40 	check_reload_status: Syncing firewall
    

    Fortunately this didn't cause a system reboot, and I only lost my regular expression.
    ![Captura de 2014-03-16 22:55:20.png](/public/imported_attachments/1/Captura de 2014-03-16 22:55:20.png)
    • D
      dvserg
      last edited by Mar 17, 2014, 5:14 AM

      You should not use national symbols in URLs or expressions. An HTTP URL must use Latin symbols [a-zA-Z] only.
      Browsers automatically convert all national URLs to Punycode, and SquidGuard sees that Punycode as-is too.

      SquidGuardDoc EN  RU Tutorial
      Localization ru_PFSense

      • B
        bellera
        last edited by Mar 17, 2014, 7:16 AM

        It doesn't work…

        Tried with:

        http://www.charset.org/punycode.php?decoded=coño&encode=Normal+text+to+Punycode#results

        https://www.google.com/webhp?hl=ca#hl=ca&q=coño&safe=active

        Tested with xn--coo-8ma and co%C3%B1o

        Any idea?

        Thanks!

        • D
          dvserg
          last edited by Mar 17, 2014, 7:25 AM

          You can look at the squid or squidGuard logs to see how this request is actually sent over the network.

          I meant that Punycode is used for the domain part of the URL.
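
The point about Punycode applying only to the host name can be sketched in Python. This is only an illustration, assuming Python 3 and its built-in `idna` codec; the host name `coño.example` is made up for the example, and nothing here reflects how squid or squidGuard handle URLs internally.

```python
# Hypothetical host name, used only for illustration ("\u00f1" is ñ).
host = "co\u00f1o.example"

# The built-in "idna" codec converts each non-ASCII label of a host name
# to its ASCII-compatible Punycode form, prefixed with "xn--".
ascii_host = host.encode("idna").decode("ascii")
print(ascii_host)

# Decoding reverses the conversion and restores the original label.
round_trip = ascii_host.encode("ascii").decode("idna")
print(round_trip)
```

Only the domain part is converted this way; a path or query string such as `q=coño` is percent-encoded instead (`co%C3%B1o` in UTF-8).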

          • B
            bellera
            last edited by Mar 17, 2014, 7:37 PM

            squidGuard uses Perl-style regular expressions.

            So I tried things like the following at the console:

            echo "ñ"  | grep -e "\x241"

            echo "ñ" | grep -e "\xF1"

            echo "ñ" | grep -e "\u00F1"

            echo "ñ" | grep -e "\xc3\xb1"

            echo "ñ" | grep -e "%C3%B1"

            echo "ñ" | grep -e "\x{241}"

            None of them produced a match.

            My old (FreeBSD) proxy runs with an ISO8859-15 locale, and there I have squidGuard regular expressions containing Latin characters.
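
One likely reason these attempts find nothing is that ñ is code point U+00F1 (decimal 241, so `\x241` and `\x{241}` name a different value), and that its byte form depends on the encoding: `C3 B1` in UTF-8, but the single byte `F1` in ISO8859-15. A plain grep pattern is also not generally expected to interpret `\x..` escapes. The sketch below, assuming Python 3, only illustrates the byte-level difference; it does not reproduce squidGuard's own matching.

```python
import re

n_tilde = "\u00f1"  # ñ: code point U+00F1, decimal 241

# The same character as bytes in the two encodings discussed in the thread.
utf8_bytes = n_tilde.encode("utf-8")         # two bytes: C3 B1
latin9_bytes = n_tilde.encode("iso8859_15")  # one byte:  F1

# Byte-oriented patterns: each one matches only its own encoding.
utf8_pattern = re.compile(rb"\xc3\xb1")
latin9_pattern = re.compile(rb"\xf1")

for label, data in (("utf-8", utf8_bytes), ("iso8859-15", latin9_bytes)):
    print(label, data,
          "utf8 pattern:", bool(utf8_pattern.search(data)),
          "latin9 pattern:", bool(latin9_pattern.search(data)))
```

So a rule written on a server with an ISO8859-15 locale would not be expected to match the same word once the traffic, the terminal, or the rule file uses UTF-8.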

            • D
              dvserg
              last edited by Mar 18, 2014, 5:03 AM

              Browse YouTube using your characters and check the squid or squidGuard logs to see what your URLs look like.

              • D
                doktornotor Banned
                last edited by Mar 18, 2014, 11:33 AM Mar 18, 2014, 11:15 AM

                The idiotic IDN idea itself left aside, the problem seems to be with:

                • not using CDATA for the field
                • even with that, htmlspecialchars() producing outright broken junk

                @OP: When you look at /conf/config.xml.bad like:

                
                less -N /conf/config.xml.bad
                
                

                and post the offending line logged in syslog with a couple of lines of context, maybe we'll move somewhere here.

                Normally, you can only use

                &lt; &gt; &apos; &quot; &amp;

                entities with XML. Stuff like &oacute; or &ntilde; will crap out with "Undeclared entity error" unless stuck into CDATA (or taken care of in the DTD).
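
The entity behaviour described here can be reproduced with any XML parser. A minimal sketch, assuming Python 3 and its standard `xml.etree.ElementTree` module; the `<expression>` element name is made up for the example and is not taken from pfSense's config.xml.

```python
import xml.etree.ElementTree as ET


def parse_text(xml_snippet):
    """Return the element text, or the parser's error message on failure."""
    try:
        return ET.fromstring(xml_snippet).text
    except ET.ParseError as err:
        return f"ParseError: {err}"


# A named HTML entity is not declared in plain XML, so parsing fails.
named = parse_text("<expression>co&ntilde;o</expression>")

# A numeric character reference needs no declaration and parses fine.
numeric = parse_text("<expression>co&#241;o</expression>")

# Inside CDATA the text is kept literally and never treated as an entity.
cdata = parse_text("<expression><![CDATA[co&ntilde;o]]></expression>")

for name, value in (("named", named), ("numeric", numeric), ("cdata", cdata)):
    print(name, "->", value)
```

The first case is the same class of failure as the "Undeclared entity error" in the syslog excerpt at the top of the thread.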

                • D
                  dvserg
                  last edited by Mar 18, 2014, 12:58 PM

                  Are you sure that squidGuard's service configuration supports national symbols? That is the primary problem, not config.xml or the GUI.

                  • B
                    bellera
                    last edited by Mar 18, 2014, 1:20 PM

                    Are you sure that squidGuard's service configuration supports national symbols? That is the primary problem, not config.xml or the GUI.

                    I'm very sorry: after reading a lot about the ñ character, I see that squidGuard doesn't support it.

                    Some people say that putting ñ in squidGuard regular expressions crashes squidGuard.

                    I think this behaviour could be because they have misconfigured the locale on the server.

                    On my old squid+squidGuard server (FreeBSD) I have some rules using ñ and other accented Latin characters.

                    But this morning I tested them and they don't work!

                    So, I would like to apologize for the time you devoted to this topic.

                    Thanks,

                    Josep

                    • D
                      doktornotor Banned
                      last edited by Mar 18, 2014, 2:58 PM

                      @dvserg:

                      Are you sure that squidGuard's service configuration supports national symbols? That is the primary problem, not config.xml or the GUI.

                      Using the escaped equivalent of whatever character in the expression lists should work. Well, if it does not, then input sanitization should be applied. Also, what exactly is being done here? So you save, say, "ñ" as "&ntilde;" into config.xml; now I wonder what's gonna end up in the squidGuard configuration and how it's gonna match Perl "\x{0241}"?

                      @bellera:

                      Some people say that putting ñ in squidGuard regular expressions crashes squidGuard.

                      You should use the (escaped) character table equivalent. Anyway, things like this strongly suggest you should just move to DansGuardian and forget all of this.

                      • D
                        doktornotor Banned
                        last edited by Mar 18, 2014, 3:22 PM

                        As a sequel to this… so apparently anything outside of the ISO 8859-1 charset configured via the web GUI will get screwed by pfSense on POST (i.e., on saving your config via the GUI). So indeed I'd suggest that everyone here just give up. Any effort here is pretty much wasted until pfSense grows proper Unicode support.
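
To see which characters fall outside ISO 8859-1 in the first place, a quick check can be sketched in Python 3. This only tests encodability and shows one common way UTF-8 input gets garbled when it is read as ISO 8859-1; it is an assumption-laden illustration, not a reproduction of what the pfSense GUI does on POST.

```python
def fits_latin1(text):
    """Return True if every character of text exists in ISO 8859-1."""
    try:
        text.encode("iso8859_1")
        return True
    except UnicodeEncodeError:
        return False


# ñ and ó are inside ISO 8859-1; the euro sign and Cyrillic ж are not.
samples = {
    "n with tilde": "\u00f1",
    "o with acute": "\u00f3",
    "euro sign": "\u20ac",
    "cyrillic zhe": "\u0436",
}
results = {name: fits_latin1(char) for name, char in samples.items()}
print(results)

# UTF-8 bytes misread as ISO 8859-1 turn one character into two.
garbled = "\u00f1".encode("utf-8").decode("iso8859_1")
print(repr(garbled))
```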

                        • D
                          dvserg
                          last edited by Mar 18, 2014, 4:45 PM

                          @doktornotor:

                          Using the escaped equivalent of whatever character in the expression lists should work. Well, if it does not, then input sanitization should be applied. Also, what exactly is being done here? So you save, say, "ñ" as "&ntilde;" into config.xml; now I wonder what's gonna end up in the squidGuard configuration and how it's gonna match Perl "\x{0241}"?

                          URL parameters are encoded as %AA%BB%CC%20; I think this is the form that must be used in regular expressions.

                          • D
                            doktornotor Banned
                            last edited by Mar 18, 2014, 5:00 PM

                            Sounds reasonable… Whatever, as said above, without UTF-8 available in the GUI this is pretty much a pointless exercise. :(

                            • B
                              bellera
                              last edited by Mar 18, 2014, 5:57 PM

                              Example: when I search Google for White Stork in Spanish, the Latin characters aren't encoded on screen:

                              https://www.google.com/webhp?hl=es#hl=es&q=cigüeña
                              

                              However, when copied and pasted, the URL turns out to be percent-encoded:

                              https://www.google.com/webhp?hl=es#hl=es&q=cig%C3%BCe%C3%B1a
                              

                              http://en.wikipedia.org/wiki/White_Stork

                              I will try again using this encoding in squidGuard expressions, but I think I already tried it and it didn't work.
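
The encoded form in the pasted URL is ordinary UTF-8 percent-encoding, which suggests writing the expression in pure ASCII. A minimal sketch, assuming Python 3's `urllib.parse` and `re`; it only shows how such an expression could be derived and checked against the URL above, and says nothing about whether squidGuard actually receives the URL in this form.

```python
import re
from urllib.parse import quote, unquote

word = "cig\u00fce\u00f1a"  # cigüeña

# Percent-encode the UTF-8 bytes of the word, as the browser does on copy.
encoded = quote(word)
print(encoded)

# An ASCII-only expression for the encoded word. Hex digits in percent
# escapes are case-insensitive, so the match ignores case.
pattern = re.compile(re.escape(encoded), re.IGNORECASE)

url = "https://www.google.com/webhp?hl=es#hl=es&q=cig%C3%BCe%C3%B1a"
print(bool(pattern.search(url)))
print(bool(pattern.search(url.lower())))

# Decoding gives back the original word.
print(unquote(encoded) == word)
```

Note that in this particular Google URL the search term sits after the `#`, i.e. in the fragment, which a browser does not normally send to the proxy at all.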

                              • T
                                Tikimotel
                                last edited by Mar 18, 2014, 9:14 PM

                                I'm just thinking…
                                Is the squid proxy itself set to "encode"? When do the urls get passed through to SquidGuard?

                                What to do with requests that have whitespace characters in the URI

                                strip: The whitespace characters are stripped out of the URL. This is the behavior recommended by RFC2396.
                                deny: The request is denied. The user receives an "Invalid Request" message.
                                allow: The request is allowed and the URI is not changed. The whitespace characters remain in the URI.
                                encode: The request is allowed and the whitespace characters are encoded according to RFC1738.
                                chop: The request is allowed and the URI is chopped at the first whitespace.
                                
                                • B
                                  bellera
                                  last edited by Mar 18, 2014, 11:43 PM

                                  That applies only to whitespace. I didn't find any other squid directive that deals with other characters.

                                  • B
                                    bellera
                                    last edited by Mar 19, 2014, 12:14 AM

                                    No UTF support for perl version in pfSense…

                                    http://en.wikibooks.org/wiki/Perl_Programming/Unicode_UTF-8

                                    Do not use Perl versions prior to 5.8.1. Although support for UTF-8 began with v5.6.0, regular expressions do not work even in the next release, v5.6.1. v5.8.1 added some speed improvements. (By the way, PHP will not have UTF-8 support until v6.0.) By Perl 5.14, Unicode support is for the most part clean and smooth.

                                    [2.1-RELEASE][admin@pfsense.localdomain]/root(61): find / -name perl
                                    /usr/local/bin/perl
                                    /usr/pbi/squid-i386/bin/perl
                                    /usr/pbi/squid-i386/lib/perl5/5.16/perl
                                    /usr/pbi/squidguard-squid3-i386/bin/perl
                                    /usr/pbi/squidguard-squid3-i386/lib/perl5/5.16/perl
                                    [2.1-RELEASE][admin@pfsense.localdomain]/root(62): perl -v
                                    
                                    This is perl 5, version 16, subversion 3 (v5.16.3) built for i386-freebsd-thread-multi-64int
                                    

                                    http://www.freebsd.org/cgi/ports.cgi?query=squidguard&stype=all

                                    The squidGuard code also seems to be very old.

                                    For the moment, I will continue using squidGuard, knowing this limitation. In the future I will test the DansGuardian package.

                                    http://contentfilter.futuragts.com/wiki/doku.php?id=language_and_encoding_effects_on_phrase_matching

                                    • C
                                      christian14
                                      last edited by May 13, 2014, 7:28 AM May 12, 2014, 8:38 PM

                                      I have used squid and squidGuard for ten years and never had problems with any characters in squidGuard. I have just discovered pfSense and am disappointed by how it handles accented and special characters in regular expressions… an XML restore is triggered. The problem comes from pfSense's XML file (config.xml) and from ISO instead of UTF-8 support, as doktornotor says.

                                      :-[ :-[

                                      Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.