Netgate Discussion Forum

    SquidGuard - Local characters in regular expressions - Not supported

    pfSense Packages
    18 Posts 5 Posters 6.6k Views
    • dvserg

      @bellera:

      It doesn't work…

      Tried with:

      http://www.charset.org/punycode.php?decoded=coño&encode=Normal+text+to+Punycode#results

      https://www.google.com/webhp?hl=ca#hl=ca&q=coño&safe=active

      Tested with xn--coo-8ma and co%C3%B1o

      Any idea?

      Thanks!

      You can look at the squid or squidGuard logs to see how this request is actually sent over the network.

      I meant that Punycode is used for the domain part of the URL.
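
For reference, a minimal Python sketch of the two encodings being mixed up here: Punycode for the host name, percent-encoded UTF-8 for the path or query. It assumes Python 3's built-in `punycode` and `idna` codecs (which follow the older IDNA 2003 rules), and `example.com` is a placeholder host that does not come from this thread. It says nothing about what squid or squidGuard actually log.

```python
from urllib.parse import quote

word = "co\u00f1o"  # "coño", written with an escape to avoid editor encoding issues

# Punycode form of a single label, as used in the host part of a URL.
label = word.encode("punycode").decode("ascii")
print(label)

# Full IDNA host name; "example.com" is a placeholder, not from the thread.
host = f"{word}.example.com".encode("idna").decode("ascii")
print(host)

# Percent-encoded UTF-8 form, as used in the path or query part of a URL.
query = quote(word)
print(query)
```

So a rule for a search term in the query string would most likely need the `co%C3%B1o` form, while the `xn--` form only matters when the character is in the domain name itself.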

      SquidGuardDoc EN  RU Tutorial
      Localization ru_PFSense

      • bellera

        squidGuard uses regex perl.

        So I tried, at console, things like:

        echo "ñ"  | grep -e "\x241"

        echo "ñ" | grep -e "\xF1"

        echo "ñ" | grep -e "\u00F1"

        echo "ñ" | grep -e "\xc3\xb1"

        echo "ñ" | grep -e "%C3%B1"

        echo "ñ" | grep -e "\x{241}"

        without any result.

        My old (FreeBSD) proxy works with ISO8859-15 locale and I have regular expressions with latin characters for squidGuard.
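
A small Python sketch of how "ñ" is represented under the encodings tried above; it is only an illustration and does not reproduce grep or squidGuard behaviour. One thing it makes visible: 241 is the decimal code point, while the hexadecimal value is F1, so an escape such as `\x241` or `\x{241}` would not refer to "ñ" in an engine that reads the digits as hex.

```python
from urllib.parse import quote

ch = "\u00f1"  # "ñ"

code_point = ord(ch)                  # decimal code point
hex_code = format(code_point, "X")    # same value in hexadecimal

latin9 = ch.encode("iso8859-15")      # single byte in ISO 8859-15
utf8 = ch.encode("utf-8")             # two bytes in UTF-8
percent = quote(ch)                   # percent-encoded UTF-8, as seen in URLs

print(code_point, hex_code)
print(latin9.hex(), utf8.hex(), percent)

# A single-byte pattern for the Latin-9 form cannot match the UTF-8 bytes.
print(latin9 in utf8)
```

Which of these forms a regular expression needs depends on the bytes that actually reach the matcher, which is why checking the logs first is a sensible step.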

        • dvserg

          Browse YouTube using your characters, then check the squid or squidGuard logs to see what your URLs look like.

          • doktornotor Banned

            The idiotic IDN idea itself left aside, the problem seems to be with:

            • not using CDATA for the field
            • even with that, htmlspecialchars() producing outright broken junk

            @OP: If you look at /conf/config.xml.bad like this:

            less -N /conf/config.xml.bad

            and post the offending line logged in syslog with a couple of lines of context, maybe we'll get somewhere here.

            Normally, you can only use the

            &lt; &gt; &apos; &quot; &amp;

            entities with XML. Stuff like &oacute; or &ntilde; will crap out with an "Undeclared entity" error unless stuck into CDATA (or taken care of in the DTD).
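
The entity behaviour described above can be checked with a short Python sketch. It uses Python's standard `xml.etree.ElementTree` parser rather than the PHP parser pfSense uses, and the `<expr>` element name is made up for the example, so treat it as an illustration of general XML rules only.

```python
import xml.etree.ElementTree as ET


def parses(doc: str) -> bool:
    """Return True if the document is well-formed for this parser."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


# <expr> is a hypothetical element name, not taken from config.xml.
named = "<expr>co&ntilde;o</expr>"                # HTML entity, undeclared in XML
cdata = "<expr><![CDATA[co&ntilde;o]]></expr>"    # same text protected by CDATA
numeric = "<expr>co&#241;o</expr>"                # numeric character reference

print(parses(named))                 # the undeclared entity is rejected
print(ET.fromstring(cdata).text)     # CDATA content is kept literally
print(ET.fromstring(numeric).text)   # numeric reference decodes to the character
```

A numeric character reference is a third option next to CDATA and a DTD declaration, although whether the GUI could emit one is a separate question.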

            • dvserg

              Are you sure that the squidGuard service config supports national symbols? That is the primary problem, not config.xml or the GUI.

              • bellera

                Are you sure that the squidGuard service config supports national symbols? That is the primary problem, not config.xml or the GUI.

                I'm very sorry, because after a lot of searching about the ñ character I see that squidGuard doesn't support it.

                Some people say that putting ñ in squidGuard regular expressions crashes squidGuard.

                I think this behaviour could be because they have misconfigured the locale on the server.

                In my old squid+squidGuard server (FreeBSD) I have some rules using ñ and other accented Latin characters.

                But this morning I tested them and they don't work!

                So, I would like to apologize for the time you devoted to this topic.

                Thanks,

                Josep

                • doktornotor Banned

                  @dvserg:

                  Are you sure that the squidGuard service config supports national symbols? That is the primary problem, not config.xml or the GUI.

                  Using the escaped equivalent of whatever character in the expression lists should work. Well, if it does not, then input sanitization should be applied. Also, what exactly is being done here? So you save, say, "ñ" as "&ntilde;" into config.xml. Now I wonder what's gonna end up in the squidGuard configuration and how it's gonna match perl "\x{0241}"?

                  @bellera:

                  Some people say that putting ñ in squidGuard regular expressions crashes squidGuard.

                  You should use the (escaped) character table equivalent. Anyway, things like this strongly suggest you should just move to DansGuardian and forget all of this.

                  • doktornotor Banned

                    As a sequel to this… so apparently anything outside of the ISO 8859-1 charset configured via the web GUI will get screwed up by pfSense on POST (i.e., on saving your config via the GUI). So indeed I'd suggest everyone here just give up. Any effort here is pretty much wasted until pfSense grows proper Unicode support.
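
A minimal Python sketch of the general kind of damage that a charset mismatch causes. How pfSense actually handles the POST data is not shown in this thread, so the mechanism below (UTF-8 bytes read as ISO 8859-1) is an assumption used only for illustration.

```python
text = "\u00f1"  # "ñ"

# UTF-8 bytes misread as ISO 8859-1 turn one character into two.
utf8_bytes = text.encode("utf-8")
garbled = utf8_bytes.decode("iso8859-1")
print(garbled)

# The damage is reversible only if nothing else touched the bytes.
restored = garbled.encode("iso8859-1").decode("utf-8")
print(restored == text)

# Characters outside ISO 8859-1 (here Cyrillic "я") cannot be stored at all.
try:
    "\u044f".encode("iso8859-1")
    representable = True
except UnicodeEncodeError:
    representable = False
print(representable)
```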

                    • dvserg

                      @doktornotor:

                      Using the escaped equivalent of whatever character in the expression lists should work. Well, if it does not, then input sanitization should be applied. Also, what exactly is being done here? So you save, say, "ñ" as "&ntilde;" into config.xml. Now I wonder what's gonna end up in the squidGuard configuration and how it's gonna match perl "\x{0241}"?

                      URL parameters are encoded as %AA%BB%CC%20; I think this is the form that must be used in regular expressions.

                      • doktornotor Banned

                        Sounds reasonable… Whatever, as said above, without UTF-8 available in the GUI this is pretty much a pointless exercise. :(

                        • bellera

                          Example: when I search Google for White Stork in Spanish, the Latin characters aren't encoded on screen:

                          https://www.google.com/webhp?hl=es#hl=es&q=cigüeña

                          However, when I copy and paste the URL it appears encoded:

                          https://www.google.com/webhp?hl=es#hl=es&q=cig%C3%BCe%C3%B1a

                          http://en.wikipedia.org/wiki/White_Stork

                          I will try again using this encoding in squidGuard expressions, but I think I already tried it and it didn't work.
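
A hedged Python sketch of matching against the percent-encoded form of the URL. It uses Python's `re` module, not the regex engine inside squidGuard, so it only shows that an ASCII-only pattern built from the encoded word matches the URL as transmitted.

```python
import re
from urllib.parse import quote, unquote

# The encoded URL quoted above.
url = "https://www.google.com/webhp?hl=es#hl=es&q=cig%C3%BCe%C3%B1a"
word = "cig\u00fce\u00f1a"  # "cigüeña"

# Build an ASCII-only pattern from the percent-encoded word.
encoded_word = quote(word)
pattern = re.compile(re.escape(encoded_word), re.IGNORECASE)
print(encoded_word)
print(bool(pattern.search(url)))

# The reverse approach: decode the URL first, then compare plain text.
print(word in unquote(url))
```

The case-insensitive flag is there because hex digits in percent-encoding can be sent in either case, for example `%c3%b1` instead of `%C3%B1`.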

                          • Tikimotel

                            I'm just thinking…
                            Is the squid proxy itself set to "encode"? When do the urls get passed through to SquidGuard?

                            What to do with requests that have whitespace characters in the URI:

                            strip: The whitespace characters are stripped out of the URL. This is the behavior recommended by RFC2396.
                            deny: The request is denied. The user receives an "Invalid Request" message.
                            allow: The request is allowed and the URI is not changed. The whitespace characters remain in the URI.
                            encode: The request is allowed and the whitespace characters are encoded according to RFC1738.
                            chop: The request is allowed and the URI is chopped at the first whitespace.

                            • bellera

                              That is only for whitespace. I didn't see any other squid directive that deals with other characters.

                              • bellera

                                No UTF support for perl version in pfSense…

                                http://en.wikibooks.org/wiki/Perl_Programming/Unicode_UTF-8

                                Do not use Perl versions prior to 5.8.1. Although support for UTF-8 began with v5.6.0, regular expressions do not work even in the next release, v5.6.1. v5.8.1 added some speed improvements. (By the way, PHP will not have UTF-8 support until v6.0.) By Perl 5.14, Unicode support is for the most part clean and smooth.

                                [2.1-RELEASE][admin@pfsense.localdomain]/root(61): find / -name perl
                                /usr/local/bin/perl
                                /usr/pbi/squid-i386/bin/perl
                                /usr/pbi/squid-i386/lib/perl5/5.16/perl
                                /usr/pbi/squidguard-squid3-i386/bin/perl
                                /usr/pbi/squidguard-squid3-i386/lib/perl5/5.16/perl
                                [2.1-RELEASE][admin@pfsense.localdomain]/root(62): perl -v
                                
                                This is perl 5, version 16, subversion 3 (v5.16.3) built for i386-freebsd-thread-multi-64int
                                

                                http://www.freebsd.org/cgi/ports.cgi?query=squidguard&stype=all

                                The squidGuard code also seems to be very old.

                                For the moment, I will continue using squidGuard knowing this limitation. In the future I will test DansGuardian package.

                                http://contentfilter.futuragts.com/wiki/doku.php?id=language_and_encoding_effects_on_phrase_matching

                                • christian14

                                  I have used squid and squidGuard for ten years and never had problems with any characters in squidGuard. I have just discovered pfSense and I am disappointed with how it handles accents and special characters in regular expressions … an XML restoration is made. The problem comes from pfSense's XML file (config.xml) and its use of ISO instead of UTF-8, as doktornotor says.

                                  :-[ :-[

                                  Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.