Netgate Discussion Forum
    SquidGuard - Local characters in regular expressions - Not supported

    • doktornotor (Banned)

      The idiotic IDN idea itself left aside, the problem seems to be with:

      • not using CDATA for the field
      • even with that, htmlspecialchars() producing outright broken junk

      @OP: If you look at /conf/config.xml.bad with:

      less -N /conf/config.xml.bad

      and post the offending line logged in syslog with a couple of lines of context, maybe we'll get somewhere.

      Normally, you can only use the

      &lt; &gt; &apos; &quot; &amp;

      entities with XML. Stuff like &oacute; or &ntilde; will crap out with an "Undeclared entity" error unless stuck into CDATA (or taken care of in the DTD).
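
The entity behaviour described above can be reproduced with a short Python sketch. This is not pfSense code (pfSense handles config.xml in PHP), and the element name `expr` is made up for the example. It assumes a parser working without a DTD: the five predefined entities parse, an HTML-style named entity is rejected, and CDATA or a numeric character reference gets through.

```python
import xml.etree.ElementTree as ET


def parse_text(xml_snippet):
    """Return the element text, or None if the parser rejects the snippet."""
    try:
        return ET.fromstring(xml_snippet).text
    except ET.ParseError:
        return None


# The five predefined XML entities are always accepted.
predefined = parse_text("<expr>&lt; &gt; &apos; &quot; &amp;</expr>")

# An HTML-style named entity is undeclared without a DTD, so parsing fails.
named = parse_text("<expr>&ntilde;</expr>")

# Inside CDATA the same characters are plain text and are not expanded.
cdata = parse_text("<expr><![CDATA[&ntilde;]]></expr>")

# A numeric character reference needs no declaration at all.
numeric = parse_text("<expr>&#241;</expr>")

print(predefined, named, cdata, numeric)
```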

      • dvserg

        @doktornotor:

        Are you sure that the squidGuard service's config supports national characters? That is the primary problem, not config.xml or the GUI.

        SquidGuardDoc EN  RU Tutorial
        Localization ru_PFSense

        • bellera

          @dvserg:

          Are you sure that the squidGuard service's config supports national characters? That is the primary problem, not config.xml or the GUI.

          I'm very sorry: after searching a lot about the ñ character, I see that squidGuard doesn't support it.

          Some people say that putting ñ in squidGuard regular expressions crashes squidGuard.

          I thought this behaviour could be because they had misconfigured the locale on the server.

          On my old squid+squidGuard server (FreeBSD) I have some rules using ñ and other accented Latin characters.

          But this morning I tested them and they don't work!

          So, I would like to apologize for the time you devoted to this topic.

          Thanks,

          Josep

          • doktornotor (Banned)

            @dvserg:

            Are you sure that the squidGuard service's config supports national characters? That is the primary problem, not config.xml or the GUI.

            Using the character's escaped equivalent in the expression lists should work. Well, if it does not, then input sanitization should be applied. Also, what exactly is being done here? So you save, say, "ñ" as "&ntilde;" into config.xml. Now I wonder what's going to end up in the squidGuard configuration, and how it's going to match Perl "\x{00F1}"?

            @bellera:

            Some people say that putting ñ in squidGuard regular expressions crashes squidGuard.

            You should use the character table equivalent (escaped). Anyway, things like this strongly suggest you should just move to DansGuardian and forget all of this.

            • doktornotor (Banned)

              As a sequel to this… apparently anything outside the ISO 8859-1 charset configured via the web GUI will get mangled by pfSense on POST (i.e., on saving your config via the GUI). So indeed I'd suggest everyone here just give up. Any effort here is pretty much wasted until pfSense grows proper Unicode support.
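
As a rough illustration of where that charset boundary lies, the Python sketch below checks whether a string is representable in ISO 8859-1. It does not reproduce what the pfSense GUI actually does on POST, which is not shown in this thread; `fits_latin1` is a made-up helper name.

```python
def fits_latin1(text):
    """True if every character of text exists in ISO 8859-1."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


# ñ and ü are part of ISO 8859-1, so they fit in a single byte each.
print(fits_latin1("cigüeña"))   # True
# č (U+010D) and Cyrillic letters are outside the charset.
print(fits_latin1("čaj"))       # False
print(fits_latin1("сайт"))      # False

# A lossy conversion replaces what does not fit with "?".
print("čaj".encode("iso-8859-1", errors="replace"))  # b'?aj'
```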

              • dvserg

                @doktornotor:

                Using the character's escaped equivalent in the expression lists should work. Well, if it does not, then input sanitization should be applied. Also, what exactly is being done here? So you save, say, "ñ" as "&ntilde;" into config.xml. Now I wonder what's going to end up in the squidGuard configuration, and how it's going to match Perl "\x{00F1}"?

                URL parameters are encoded as %AA%BB%CC%20; I think this is the form that must be used in regular expressions.
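
A quick Python sketch of that idea, assuming the filter sees the URL in its percent-encoded form and that the browser encodes as UTF-8. This is not squidGuard itself, which has its own regex engine; `encoded_pattern` is a hypothetical helper and the URL is made up.

```python
import re
from urllib.parse import quote


def encoded_pattern(word, encoding="utf-8"):
    """Build a regex that matches the percent-encoded form of word."""
    # quote() emits uppercase hex; IGNORECASE also accepts %c3%b1.
    return re.compile(re.escape(quote(word, encoding=encoding)), re.IGNORECASE)


url = "http://example.com/search?q=espa%C3%B1a"  # made-up URL

print(quote("ñ"))                                   # %C3%B1
print(bool(encoded_pattern("españa").search(url)))  # True
print(bool(re.search("españa", url)))               # False: no raw ñ in the URL
```

If the client sends ISO 8859-1 instead, the same word would need `encoding="iso-8859-1"`, so a rule may have to cover both forms.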

                • doktornotor (Banned)

                  Sounds reasonable… Whatever, as said above, without UTF-8 available in the GUI this is pretty much a pointless exercise. :(

                  • bellera

                    Example: when I search Google for White Stork in Spanish, Latin characters aren't encoded on screen:

                    https://www.google.com/webhp?hl=es#hl=es&q=cigüeña

                    However, after copying and pasting, the URL appears encoded:

                    https://www.google.com/webhp?hl=es#hl=es&q=cig%C3%BCe%C3%B1a
                    

                    http://en.wikipedia.org/wiki/White_Stork

                    I will try again using this encoding in squidGuard expressions, but I think I already tried it and it didn't work.
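
For reference, the encoded form above can be reproduced with Python's standard library. This is only an illustration and not part of squidGuard or pfSense. It also shows that the result depends on the character encoding: the pasted URL is UTF-8, whereas an ISO 8859-1 client would produce different bytes.

```python
from urllib.parse import quote, unquote

word = "cigüeña"

utf8_form = quote(word)                           # UTF-8 is the default
latin1_form = quote(word, encoding="iso-8859-1")  # one byte per character

print(utf8_form)            # cig%C3%BCe%C3%B1a
print(latin1_form)          # cig%FCe%F1a
print(unquote(utf8_form))   # cigüeña
```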

                    • Tikimotel

                      I'm just thinking…
                      Is the squid proxy itself set to "encode"? When do the urls get passed through to SquidGuard?

                      What to do with requests that have whitespace characters in the URI

                      strip: The whitespace characters are stripped out of the URL. This is the behavior recommended by RFC2396.
                      deny: The request is denied. The user receives an "Invalid Request" message.
                      allow: The request is allowed and the URI is not changed. The whitespace characters remain in the URI.
                      encode: The request is allowed and the whitespace characters are encoded according to RFC1738.
                      chop: The request is allowed and the URI is chopped at the first whitespace.
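
These are the options of Squid's `uri_whitespace` directive. The toy Python model below only makes the five behaviours concrete; it is not Squid's code, and `apply_uri_whitespace` and the sample URI are made up for the example.

```python
import re
from urllib.parse import quote

WHITESPACE = re.compile(r"\s")


def apply_uri_whitespace(uri, mode):
    """Return the URI as the proxy would pass it on, or None if denied."""
    if not WHITESPACE.search(uri):
        return uri
    if mode == "strip":
        return WHITESPACE.sub("", uri)
    if mode == "deny":
        return None
    if mode == "allow":
        return uri
    if mode == "encode":
        # Percent-encode each whitespace character (space -> %20).
        return WHITESPACE.sub(lambda m: quote(m.group()), uri)
    if mode == "chop":
        return WHITESPACE.split(uri, maxsplit=1)[0]
    raise ValueError(f"unknown mode: {mode}")


uri = "http://example.com/a b?q=ñ"
for mode in ("strip", "deny", "allow", "encode", "chop"):
    print(mode, apply_uri_whitespace(uri, mode))
```

In this model none of the modes touches the ñ; only whitespace is affected.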
                      
                      • bellera

                        That is only for whitespace. I didn't see any other squid directive dealing with other characters.

                        • bellera

                          No UTF support for perl version in pfSense…

                          http://en.wikibooks.org/wiki/Perl_Programming/Unicode_UTF-8

                          Do not use Perl versions prior to 5.8.1. Although support for UTF-8 began with v5.6.0, regular expressions do not work even in the next release, v5.6.1. v5.8.1 added some speed improvements. (By the way, PHP will not have UTF-8 support until v6.0.) By Perl 5.14, Unicode support is for the most part clean and smooth.

                          [2.1-RELEASE][admin@pfsense.localdomain]/root(61): find / -name perl
                          /usr/local/bin/perl
                          /usr/pbi/squid-i386/bin/perl
                          /usr/pbi/squid-i386/lib/perl5/5.16/perl
                          /usr/pbi/squidguard-squid3-i386/bin/perl
                          /usr/pbi/squidguard-squid3-i386/lib/perl5/5.16/perl
                          [2.1-RELEASE][admin@pfsense.localdomain]/root(62): perl -v
                          
                          This is perl 5, version 16, subversion 3 (v5.16.3) built for i386-freebsd-thread-multi-64int
                          

                          http://www.freebsd.org/cgi/ports.cgi?query=squidguard&stype=all

                          The squidGuard code also seems to be very old.

                          For the moment, I will continue using squidGuard, knowing this limitation. In the future I will test the DansGuardian package.

                          http://contentfilter.futuragts.com/wiki/doku.php?id=language_and_encoding_effects_on_phrase_matching

                          • christian14

                            I have used squid and squidGuard for ten years and never had problems with any characters in squidGuard. I recently discovered pfSense and I am disappointed by how it handles accented and special characters in regular expressions… The XML config gets restored. The problem comes from pfSense's XML file (config.xml) and its ISO charset instead of UTF-8 support, as doktornotor says.

                            :-[ :-[

                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.