Netgate Discussion Forum

    SquidGuard - Local characters in regular expressions - Not supported

    pfSense Packages
    18 Posts 5 Posters 6.5k Views
    • bellera

      2.1-RELEASE (i386)
      built on Wed Sep 11 18:16:22 EDT 2013
      FreeBSD 8.3-RELEASE-p11

      squidGuard-squid3 1.4_4 pkg v.1.9.5

      While migrating an external (FreeBSD-based) proxy, I found regular expressions that use local European characters, such as ó or ñ.

      I put one into the pfSense squidGuard configuration and saved. pfSense then showed a message saying that the system was restoring the configuration:

      Mar 16 22:39:40 	php: /pkg_edit.php: XML error: Undeclared entity error at line 1043 in /conf/config.xml
      Mar 16 22:39:40 	php: /pkg_edit.php: pfSense is restoring the configuration /cf/conf/backup/config-1395005929.xml
      Mar 16 22:39:40 	php: /pkg_edit.php: New alert found: pfSense is restoring the configuration /cf/conf/backup/config-1395005929.xml
      Mar 16 22:39:40 	check_reload_status: Syncing firewall
      

      Fortunately, this didn't cause a system reboot; I only lost my regular expression.
      ![Captura de 2014-03-16 22:55:20.png](/public/imported_attachments/1/Captura de 2014-03-16 22:55:20.png)

      • dvserg

        You should not use national characters in URLs or expressions. HTTP URLs must use Latin characters [a-zA-Z] only.
        All national URLs entered in the browser's address bar are automatically converted to Punycode, and squidGuard sees that Punycode as-is.
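A minimal Python sketch of the conversion described above, assuming Python's built-in `idna` codec (IDNA 2003) as a stand-in for what a browser does. The helper `to_ascii_host` and the hostname `coño.example` are hypothetical and only used for illustration.

```python
# Convert an internationalized hostname to its ASCII (Punycode) form,
# which is what is actually sent on the wire.
def to_ascii_host(host: str) -> str:
    # The built-in "idna" codec applies nameprep and adds the "xn--" prefix
    # to every label that contains non-ASCII characters; ASCII labels are kept.
    return host.encode("idna").decode("ascii")


if __name__ == "__main__":
    print(to_ascii_host("coño.example"))  # xn--coo-8ma.example
```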

        SquidGuardDoc EN  RU Tutorial
        Localization ru_PFSense

        • bellera

          It doesn't work…

          Tried with:

          http://www.charset.org/punycode.php?decoded=coño&encode=Normal+text+to+Punycode#results

          https://www.google.com/webhp?hl=ca#hl=ca&q=coño&safe=active

          Tested with xn--coo-8ma and co%C3%B1o.

          Any idea?

          Thanks!

          • dvserg


            You can check the squid or squidGuard logs to see how this request is actually transmitted over the network.

            I meant that Punycode is used for the domain part of the URL.
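To illustrate the distinction, here is a hypothetical Python sketch (the helper `encode_url` and the host `coño.example` are made up): only the hostname becomes Punycode, while a non-ASCII search term in the query string is sent as percent-encoded UTF-8 bytes, which is the form a proxy-side expression would have to match.

```python
from urllib.parse import quote


def encode_url(host: str, query_value: str) -> str:
    # Hostname: IDNA/Punycode ("xn--..."), as with the domain part of any URL.
    ascii_host = host.encode("idna").decode("ascii")
    # Query string: UTF-8 bytes written as %XX escapes, never Punycode.
    encoded_q = quote(query_value, safe="")
    return f"https://{ascii_host}/search?q={encoded_q}"


print(encode_url("coño.example", "coño"))
# https://xn--coo-8ma.example/search?q=co%C3%B1o
```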

            • bellera

              squidGuard uses Perl regular expressions.

              So I tried things like these at the console:

              echo "ñ"  | grep -e "\x241"

              echo "ñ" | grep -e "\xF1"

              echo "ñ" | grep -e "\u00F1"

              echo "ñ" | grep -e "\xc3\xb1"

              echo "ñ" | grep -e "%C3%B1"

              echo "ñ" | grep -e "\x{241}"

              None of them gave any result.

              My old (FreeBSD) proxy runs with the ISO8859-15 locale, and there I have squidGuard regular expressions containing Latin characters.
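The attempts above mix several representations of the same character. A minimal Python sketch (it uses Python's `re`, not squidGuard's regex engine): ñ is code point U+00F1 (decimal 241), the two bytes C3 B1 in UTF-8 and the single byte F1 in ISO-8859-15, and a pattern only matches data written in the same representation. Note that `\x{241}` denotes U+0241, a different character.

```python
import re

N_TILDE = "ñ"

# One character, several representations.
code_point = ord(N_TILDE)                     # 241 decimal == 0xF1, i.e. U+00F1
utf8_bytes = N_TILDE.encode("utf-8")          # b'\xc3\xb1' (two bytes)
latin9_bytes = N_TILDE.encode("iso-8859-15")  # b'\xf1' (one byte)

# \x{241} (Perl) or \u0241 (Python) is U+0241, not ñ.
assert chr(0x241) != N_TILDE

# A pattern only matches data in the same representation.
print(re.search(rb"\xc3\xb1", utf8_bytes) is not None)  # True: UTF-8 vs UTF-8
print(re.search(rb"\xf1", utf8_bytes) is not None)      # False: Latin-9 byte vs UTF-8
print(re.search(rb"\xf1", latin9_bytes) is not None)    # True: Latin-9 vs Latin-9
print(re.search("\u00f1", N_TILDE) is not None)         # True: code point vs decoded text
```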

              • dvserg


                Browse YouTube with your characters and check the squid or squidGuard logs to see what your URLs look like.
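A small Python sketch for reading squid's native access.log format, where the request URL is the seventh whitespace-separated field. The sample log line and the helper `url_from_access_log` are hypothetical. Also note that squid's `strip_query_terms` option is on by default and removes query strings from logged URLs, so it may need to be turned off while checking how search terms arrive.

```python
from urllib.parse import unquote


def url_from_access_log(line: str) -> str:
    # Native squid format:
    # time elapsed client code/status bytes method URL user hierarchy/peer type
    fields = line.split()
    return fields[6]


# Hypothetical log line, made up for illustration only.
sample = ("1395005929.123    250 192.168.1.10 TCP_MISS/200 5120 GET "
          "http://www.example.com/search?q=co%C3%B1o - DIRECT/203.0.113.5 text/html")

raw = url_from_access_log(sample)
print(raw)           # the URL as squid (and squidGuard) receive it
print(unquote(raw))  # the same URL decoded for human reading
```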

                • doktornotor (Banned)

                  Leaving the idiotic IDN idea itself aside, the problem seems to be:

                  • not using CDATA for the field
                  • even with that, htmlspecialchars() producing outright broken junk

                  @OP: Look at /conf/config.xml.bad with line numbers, e.g.:

                  less -N /conf/config.xml.bad

                  and post the offending line reported in syslog with a couple of lines of context; then maybe we'll get somewhere.

                  Normally, you can only use the

                  &lt; &gt; &amp; &apos; &quot;

                  entities with XML. Stuff like &oacute; or &ntilde; (the HTML entities for ó and ñ) will crap out with "Undeclared entity error" unless stuck into CDATA (or taken care of in the DTD).
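The XML behaviour described above can be reproduced with Python's expat-based `xml.etree.ElementTree`. This only illustrates the XML rules, not pfSense's own PHP code, and the `<expr>` element and `parse_text` helper are hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_text(xml: str):
    """Return the element's text, or None if the parser rejects the XML."""
    try:
        return ET.fromstring(xml).text
    except ET.ParseError:
        return None


# &ntilde; is an HTML entity, not one of XML's five predefined entities.
print(parse_text("<expr>co&ntilde;o</expr>"))              # None (undefined entity)
# Inside CDATA the text is taken literally, entity and all.
print(parse_text("<expr><![CDATA[co&ntilde;o]]></expr>"))  # co&ntilde;o
# A numeric character reference needs no declaration.
print(parse_text("<expr>co&#241;o</expr>"))                # coño
# The predefined entities are always fine.
print(parse_text("<expr>a &amp; b</expr>"))                # a & b
```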

                  • dvserg


                    Are you sure that the squidGuard service configuration supports national characters? That is the primary problem, not config.xml or the GUI.

                    • bellera

                      Are you sure that the squidGuard service configuration supports national characters? That is the primary problem, not config.xml or the GUI.

                      I'm very sorry: after searching a lot about the ñ character, I see that squidGuard doesn't support it.

                      Some people say that putting ñ in squidGuard regular expressions crashes squidGuard.

                      I think this behaviour could be caused by a misconfigured locale on their servers.

                      On my old squid+squidGuard server (FreeBSD) I have some rules using ñ and other accented Latin characters.

                      But this morning I tested them, and they don't work!

                      So, I would like to apologize for the time you devoted to this topic.

                      Thanks,

                      Josep

                      • doktornotor (Banned)

                        @dvserg:

                        Are you sure that the squidGuard service configuration supports national characters? That is the primary problem, not config.xml or the GUI.

                        Using the character's escaped equivalent in the expression lists should work. If it does not, then input sanitization should be applied. Also, what exactly is being done here? Say you save "ñ" as "&ntilde;" into config.xml: now I wonder what ends up in the squidGuard configuration, and how is that going to match a Perl "\x{00F1}"?

                        @bellera:

                        Some people say that putting ñ in squidGuard regular expressions crashes squidGuard.

                        You should use the escaped equivalent from the character table. Anyway, things like this strongly suggest you should just move to DansGuardian and forget all of this.

                        • doktornotor (Banned)

                          As a follow-up to this: apparently anything outside the ISO 8859-1 charset entered via the web GUI gets mangled by pfSense on POST (i.e., when saving your config via the GUI). So I'd indeed suggest that everyone here just give up. Any effort here is pretty much wasted until pfSense gains proper Unicode support.
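A Python sketch of two common ways text gets damaged when a processing step assumes ISO 8859-1. This is an assumption about the kind of mangling involved; the thread does not say exactly what pfSense does on POST, and `latin1_roundtrip` is a hypothetical helper.

```python
def latin1_roundtrip(text: str):
    """Return (ISO 8859-1 bytes or None, UTF-8 bytes misread as ISO 8859-1)."""
    # Characters outside ISO 8859-1 cannot be represented at all.
    try:
        stored = text.encode("iso-8859-1")
    except UnicodeEncodeError:
        stored = None
    # UTF-8 bytes read back as ISO 8859-1 become one junk character per byte.
    misread = text.encode("utf-8").decode("iso-8859-1")
    return stored, misread


print(latin1_roundtrip("ñ"))  # (b'\xf1', 'Ã±')
print(latin1_roundtrip("€"))  # "€" is not in ISO 8859-1, so stored is None
```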

                          • dvserg

                            @doktornotor:

                            Using the character's escaped equivalent in the expression lists should work. If it does not, then input sanitization should be applied. Also, what exactly is being done here? Say you save "ñ" as "&ntilde;" into config.xml: now I wonder what ends up in the squidGuard configuration, and how is that going to match a Perl "\x{00F1}"?

                            URL parameters are encoded as %AA%BB%CC%20; I think this is the form that must be used in regular expressions.
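Following that idea, a hypothetical Python helper (`url_regex`, not part of squidGuard or pfSense) can generate an expression that matches either the literal character or its percent-encoded UTF-8 form. It only uses grouping, alternation and literal characters, which both POSIX extended and Perl-style engines accept; whether a given squidGuard build accepts the literal non-ASCII branch is not verified here.

```python
import re
from urllib.parse import quote


def url_regex(word: str) -> str:
    """Build a pattern matching `word` literally or as percent-encoded UTF-8."""
    parts = []
    for ch in word:
        if ord(ch) < 128:
            parts.append(re.escape(ch))
        else:
            # e.g. "ñ" -> "(ñ|%C3%B1)"; only uppercase hex escapes are covered.
            parts.append(f"({re.escape(ch)}|{quote(ch)})")
    return "".join(parts)


pattern = url_regex("cigüeña")
print(pattern)  # cig(ü|%C3%BC)e(ñ|%C3%B1)a
print(re.search(pattern, "http://www.example.com/search?q=cig%C3%BCe%C3%B1a") is not None)  # True
```

Keep in mind that in the Google URLs quoted in this thread the search term sits after `#`, in the fragment, which browsers never send to the server; and over HTTPS a proxy that does not intercept TLS only sees the CONNECT host and port. In either case no expression can match the term.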

                            • doktornotor (Banned)

                              Sounds reasonable… Whatever, as said above, without UTF-8 available in the GUI this is pretty much a pointless exercise. :(

                              • bellera

                                Example: when I search Google for "White Stork" in Spanish, the Latin characters are not shown encoded on screen:

                                https://www.google.com/webhp?hl=es#hl=es&q=cigüeña
                                

                                However, when I copy and paste the URL, it appears encoded:

                                https://www.google.com/webhp?hl=es#hl=es&q=cig%C3%BCe%C3%B1a
                                

                                http://en.wikipedia.org/wiki/White_Stork

                                I will try again using this encoding in squidGuard expressions, but I think I already tried it and it didn't work.

                                • Tikimotel

                                  I'm just thinking…
                                  Is the squid proxy itself set to "encode"? At what point do the URLs get passed to squidGuard?

                                  What to do with requests that have whitespace characters in the URI

                                  strip: The whitespace characters are stripped out of the URL. This is the behavior recommended by RFC2396.
                                  deny: The request is denied. The user receives an "Invalid Request" message.
                                  
                                  allow: The request is allowed and the URI is not changed. The whitespace characters remain in the URI.
                                  
                                  encode: The request is allowed and the whitespace characters are encoded according to RFC1738.
                                  
                                  chop: The request is allowed and the URI is chopped at the first whitespace.
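A simplified Python emulation of the five options quoted above (squid's `uri_whitespace` setting), to show what each one does to a request URI. The whitespace set and the helper `apply_uri_whitespace` are assumptions, not squid's implementation; note that the non-ASCII letters pass through untouched in every mode.

```python
import re

# Assumed whitespace set for this sketch; squid's exact set is not modelled.
WHITESPACE = re.compile(r"[ \t\r\n]")


def apply_uri_whitespace(uri: str, mode: str):
    """Emulate the uri_whitespace modes; returns None for a denied request."""
    if mode == "strip":
        return WHITESPACE.sub("", uri)
    if mode == "deny":
        return None if WHITESPACE.search(uri) else uri
    if mode == "allow":
        return uri
    if mode == "encode":
        # Each whitespace character becomes a %XX escape (RFC 1738 style).
        return WHITESPACE.sub(lambda m: f"%{ord(m.group()):02X}", uri)
    if mode == "chop":
        m = WHITESPACE.search(uri)
        return uri[: m.start()] if m else uri
    raise ValueError(f"unknown mode: {mode}")


uri = "/search?q=cigüeña blanca"
for mode in ("strip", "deny", "allow", "encode", "chop"):
    print(mode, apply_uri_whitespace(uri, mode))
```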
                                  
                                  • bellera

                                    That only applies to whitespace. I didn't see any other squid directive about other characters.

                                    • bellera

                                      No UTF-8 support in the Perl version in pfSense…

                                      http://en.wikibooks.org/wiki/Perl_Programming/Unicode_UTF-8

                                      Do not use Perl versions prior to 5.8.1. Although support for UTF-8 began with v5.6.0, regular expressions do not work even in the next release, v5.6.1. v5.8.1 added some speed improvements. (By the way, PHP will not have UTF-8 support until v6.0.) By Perl 5.14, Unicode support is for the most part clean and smooth.

                                      [2.1-RELEASE][admin@pfsense.localdomain]/root(61): find / -name perl
                                      /usr/local/bin/perl
                                      /usr/pbi/squid-i386/bin/perl
                                      /usr/pbi/squid-i386/lib/perl5/5.16/perl
                                      /usr/pbi/squidguard-squid3-i386/bin/perl
                                      /usr/pbi/squidguard-squid3-i386/lib/perl5/5.16/perl
                                      [2.1-RELEASE][admin@pfsense.localdomain]/root(62): perl -v
                                      
                                      This is perl 5, version 16, subversion 3 (v5.16.3) built for i386-freebsd-thread-multi-64int
                                      

                                      http://www.freebsd.org/cgi/ports.cgi?query=squidguard&stype=all

                                      The squidGuard code also seems to be very old.

                                      For the moment, I will continue using squidGuard with this limitation in mind. In the future, I will test the DansGuardian package.

                                      http://contentfilter.futuragts.com/wiki/doku.php?id=language_and_encoding_effects_on_phrase_matching

                                      • christian14

                                        I have used squid and squidGuard for ten years and never had problems with any characters in squidGuard. I recently discovered pfSense, and I am disappointed with how it handles accented and special characters in regular expressions: an XML restoration is triggered. The problems come from pfSense's XML file (config.xml) and its ISO rather than UTF-8 support, as doktornotor says.

                                        :-[ :-[

                                        Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.