Geeklog Forums

error when searching specialchars


Status: offline

gape

Forum User
Full Member
Registered: 05/30/02
Posts: 138
Searching fails for special characters such as our native Slovenian ones,
especially č and š
(č is c with a caron).

Edit: I have now noticed that when I change the global locale setting to sl_SI (for Slovenia),
the character ž produces the same error
(all three, č, š and ž, have the problem).

The error occurs only when I change the language to Slovenian (the charset defined in the CSS is windows-1250),
and it reads:

An error has occurred:
(This text is only displayed to users in the group 'Root')

2 - preg_replace() [function.preg-replace]: Compilation failed: invalid UTF-8 string at offset 11 @ /snip/httpdocs/lib-common.php line 5889

Text Formatted Code
array(7) {
  ["text"]=>
  string(24) "Geeklog Project Homepage"
  ["query"]=>
  string(1) "?"
  ["class"]=>
  string(1) "b"
  ["mywords"]=>
  array(1) {
    [0]=>
    string(1) "?"
  }
  ["searchword"]=>
  string(1) "?"
  ["before"]=>
  string(11) "/(?<!\p{L})"
  ["after"]=>
  string(11) "(?!\p{L})/u"
}
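
A minimal sketch of how this error can arise, assuming the search word reaches the function as a single windows-1250 byte (0xE8 is č in that code page), which would match the string(1) values in the dump. The /u modifier tells PCRE to treat the pattern as UTF-8, and a lone 0xE8 byte is not a valid UTF-8 sequence. The sample values are illustrative and are not taken from lib-common.php.

```php
<?php
// Assumption: the query arrives as one windows-1250 byte; 0xE8 is "č" there.
$searchword = "\xE8";

// Same pattern pieces as in the dump above; /u switches PCRE to UTF-8 mode.
$before = '/(?<!\p{L})';
$after  = '(?!\p{L})/u';

// A lone 0xE8 byte is an incomplete multi-byte sequence, so this is false.
$isUtf8 = mb_check_encoding($searchword, 'UTF-8');

// @ hides the "Compilation failed" warning; preg_replace() returns null on error.
$result = @preg_replace($before . $searchword . $after, '<b>$0</b>', 'Geeklog Project Homepage');

// The same letter encoded as UTF-8 (U+010D, bytes C4 8D) compiles without error.
$ok = preg_replace($before . "\xC4\x8D" . $after, '<b>$0</b>', 'Geeklog Project Homepage');

var_dump($isUtf8, $result, $ok);
```

If this assumption holds, the pattern is rejected before the text is even looked at, which is why the error appears even though the text itself is plain ASCII.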


Why is there this line
Text Formatted Code
$_CONF['default_charset'] = 'iso-8859-1';

in siteconfig.php?
Do I need to change that?

Status: offline

gape

Forum User
Full Member
Registered: 05/30/02
Posts: 138
What can I do?

Please help.

Status: offline

Laugh

Site Admin
Admin
Registered: 09/27/05
Posts: 1468
Location:Canada
I would report it as a bug in the Geeklog Bug Tracker http://project.geeklog.net/. Geeklog 1.8.0 should be released in April.
One of the Geeklog Core Developers.

Status: offline

gape

Forum User
Full Member
Registered: 05/30/02
Posts: 138
I did that already.

Status: offline

gape

Forum User
Full Member
Registered: 05/30/02
Posts: 138
Since I could not get a fix for that error,
I commented out almost all of the function that is giving me trouble.

So now it reads:
Text Formatted Code
function COM_highlightQuery( $text, $query, $class = 'highlight' )
{
/*     
    // escape PCRE special characters
    $query = preg_quote($query, '/');

    $mywords = explode(' ', $query);
    foreach ($mywords as $searchword)
    {
        if (!empty($searchword))
        {
            $before = "/(?!(?:[^<]+>|[^>]+<\/a>))\b";
            $after = "\b/i";
            if ($searchword <> utf8_encode($searchword)) {
                 if (@preg_match('/^\pL$/u', urldecode('%C3%B1'))) { // Unicode property support
                      $before = "/(?<!\p{L})";
                      $after = "(?!\p{L})/u";
                 } else {
                      $before = "/";
                      $after = "/u";
                 }
            }
            $text = preg_replace($before . $searchword . $after, "<span class=\"$class\">\\0</span>", '<!-- x -->' . $text . '<!-- x -->' );
        }
    }
*/
    return $text;
}

So now it just receives the text and sends it back, I hope.

If someone understands where the problem is, I will be glad.

I have a few requests for upgrades, but I do not dare.

Does anyone see a problem with what I commented out?
Will it give me trouble elsewhere?

Status: offline

Laugh

Site Admin
Admin
Registered: 09/27/05
Posts: 1468
Location:Canada
Commenting out that code shouldn't give you any problems; it just will not return the text with the search query highlighted.

Sorry we couldn't look into the issue sooner. Someone should before 1.8.0 is released.

Tom
One of the Geeklog Core Developers.

Status: offline

gape

Forum User
Full Member
Registered: 05/30/02
Posts: 138
Please do,
and when you do, please contact me.
I tried to document the issue as well as I could,
and I think it goes rather deep,
all the way to misrepresenting UTF characters, if that is at all possible.

Status: offline

Roccivic

Forum User
Moderator
Registered: 05/19/10
Posts: 136
Quote by: gape

Why is there this line

Text Formatted Code
$_CONF['default_charset'] = 'iso-8859-1';

in siteconfig.php?
Do I need to change that?


Yes, you need to change that. You can do that by wiping out your Geeklog installation and then, when reinstalling, ticking the checkbox next to "Use UTF-8" at step 2. I just tested this and it fixed the problem.

I haven't traced the actual problem yet, but only the non-UTF-8 language files are affected...

Rouslan

Status: offline

gape

Forum User
Full Member
Registered: 05/30/02
Posts: 138
This Geeklog was already reinstalled once before.
I think that was enough ...

The question was why this line is in siteconfig.php at all.
I don't remember putting it there.
Furthermore, I would never set such a charset ...

I removed it and it did not help ...

"Use UTF-8" is for multilanguage sites, as far as I know.

I could use that, but I prefer smaller charsets for one-language sites (in my head that is still faster).
Furthermore, this is not a solution for sites that use something other than UTF-8,
and that is the majority of my sites.

Status: offline

Roccivic

Forum User
Moderator
Registered: 05/19/10
Posts: 136
Well, if you don't want to reinstall, I think that you are stuck with ISO-8859-1. We don't have charset migration for the database yet...

Meanwhile, try the code below; it works just fine for me with both UTF-8 and ISO-8859-1.

Rouslan

Text Formatted Code
/**
* Highlight the words from a search query in a given text string.
*
* @param    string  $text   the text
* @param    string  $query  the search query
* @param    string  $class  html class to use to highlight
* @return   string          the text with highlighted search words
*
*/
function COM_highlightQuery($text, $query, $class = 'highlight')
{
    if (!empty($text) && !empty($query)) {
        $flag = false;
        if (!mb_check_encoding($text, 'UTF-8')) { // convert strings to UTF if needed
            $text  = utf8_encode($text);
            $query = utf8_encode($query);
            $flag  = true;
        }
        // escape PCRE special characters
        $query = preg_quote($query, '/');

        $mywords = explode(' ', $query);
        foreach ($mywords as $searchword) {
            if (!empty($searchword)) {
                if (@preg_match('/^\pL$/u', urldecode('%C3%B1'))) {
                    // Unicode property support
                    $before = "/(?<!\p{L})";
                    $after = "(?!\p{L})/u";
                 } else {
                    $before = "/";
                    $after = "/u";
                 }
                $text = preg_replace($before . $searchword . $after,
                                     "<span class=\"$class\">\\0</span>",
                                     '<!-- x -->' . $text . '<!-- x -->');
            }
        }
        if ($flag) { // if we need to convert a string back to ISO-8859-1
            $text = utf8_decode($text);
        }
    }

    return $text;
}
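
A note on the check in the function above: mb_check_encoding() is applied only to $text, so an ASCII-only text such as "Geeklog Project Homepage" passes as UTF-8 and a non-UTF-8 $query is then left unconverted. In addition, utf8_encode() assumes ISO-8859-1 input (and is deprecated as of PHP 8.2). The following is a rough sketch of one alternative, not Geeklog's implementation: the function name highlightQuerySketch and its $charset parameter are hypothetical, and it assumes the iconv extension knows the site's charset under the name given (CP1250 here).

```php
<?php
/**
 * Hypothetical sketch: convert text AND query from the site charset to UTF-8,
 * highlight whole words, then convert the result back.
 * $charset is an assumed iconv charset name, e.g. 'CP1250' or 'UTF-8'.
 */
function highlightQuerySketch(string $text, string $query, string $charset = 'CP1250', string $class = 'highlight'): string
{
    if ($text === '' || $query === '') {
        return $text;
    }
    $isUtf8 = (strtoupper($charset) === 'UTF-8');

    // Convert both strings, so an ASCII-only text cannot hide a non-UTF-8 query.
    $utf8Text  = $isUtf8 ? $text  : iconv($charset, 'UTF-8', $text);
    $utf8Query = $isUtf8 ? $query : iconv($charset, 'UTF-8', $query);
    if ($utf8Text === false || $utf8Query === false) {
        return $text; // conversion failed: give the text back unhighlighted
    }

    foreach (explode(' ', $utf8Query) as $word) {
        if ($word === '') {
            continue;
        }
        // Whole-word match using Unicode letter boundaries, as in the function above.
        $pattern  = '/(?<!\p{L})' . preg_quote($word, '/') . '(?!\p{L})/u';
        $replaced = preg_replace($pattern, '<span class="' . $class . '">$0</span>', $utf8Text);
        if ($replaced !== null) {
            $utf8Text = $replaced;
        }
    }

    // Back to the site charset so the page encoding stays consistent.
    $result = $isUtf8 ? $utf8Text : iconv('UTF-8', $charset, $utf8Text);
    return $result === false ? $text : $result;
}
```

Under these assumptions, highlightQuerySketch("ro\xE8no delo", "ro\xE8no") returns the text with the first word wrapped in the span, still encoded as windows-1250. Like the function above, this sketch does not skip HTML tags or links, and a later search word could match inside markup added for an earlier one.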

Status: offline

::Ben

Forum User
Full Member
Registered: 01/14/05
Posts: 1569
Location:la rochelle, France
Maybe this can help: Converting a Geeklog site to UTF-8.

::Ben
I'm available to customise your themes or plugins for your Geeklog CMS

Status: offline

gape

Forum User
Full Member
Registered: 05/30/02
Posts: 138
Quote by: Roccivic

Well, if you don't want to reinstall, I think that you are stuck with ISO-8859-1. We don't have charset migration for the database yet...

Meanwhile, try the code below; it works just fine for me with both UTF-8 and ISO-8859-1.

Rouslan

Text Formatted Code
/**
* Highlight the words from a search query in a given text string.
*
* @param    string  $text   the text
* @param    string  $query  the search query
* @param    string  $class  html class to use to highlight
* @return   string          the text with highlighted search words
*
*/
function COM_highlightQuery($text, $query, $class = 'highlight')
{
    if (!empty($text) && !empty($query)) {
        $flag = false;
        if (!mb_check_encoding($text, 'UTF-8')) { // convert strings to UTF if needed
            $text  = utf8_encode($text);
            $query = utf8_encode($query);
            $flag  = true;
        }
        // escape PCRE special characters
        $query = preg_quote($query, '/');

        $mywords = explode(' ', $query);
        foreach ($mywords as $searchword) {
            if (!empty($searchword)) {
                if (@preg_match('/^pL$/u', urldecode('%C3%B1'))) {
                    // Unicode property support
                    $before = "/(?<!p{L})";
                    $after = "(?!p{L})/u";
                 } else {
                    $before = "/";
                    $after = "/u";
                 }
                $text = preg_replace($before . $searchword . $after,
                                     "<span class="$class">\0</span>",
                                     '<!-- x -->' . $text . '<!-- x -->');
            }
        }
        if ($flag) { // if we need to convert a string back to ISO-8859-1
            $text = utf8_decode($text);
        }
    }

    return $text;
}



Sorry,
but it produces this error:
Text Formatted Code
2 - preg_replace() [function.preg-replace]: Compilation failed: invalid UTF-8 string at offset 11 @ /snip/httpdocs/lib-common.php line 5895


That line reads:
Text Formatted Code
  '<!-- x -->' . $text . '<!-- x -->');

Status: offline

Roccivic

Forum User
Moderator
Registered: 05/19/10
Posts: 136
@gape: The code that you posted is not an exact copy of the code that I posted. Maybe that could be a problem...
I checked again, it still works for me. Anyway, that's probably not the best solution because it would replace stuff inside links...

Did you know that you can comment out just a part of that function, like below, so that it will only highlight ASCII search words? Maybe it could be the best temporary solution, while we fix it properly in a later release...

Rouslan

Text Formatted Code
function COM_highlightQuery($text, $query, $class = 'highlight')
{
    if (!empty($text) && !empty($query)) {

        // escape PCRE special characters
        $query = preg_quote($query, '/');

        $mywords = explode(' ', $query);
        foreach ($mywords as $searchword) {
            if (!empty($searchword)) {
                $before = "/(?!(?:[^<]+>|[^>]+<\/a>))\b";
                $after = "\b/i";
                /*if ($searchword <> utf8_encode($searchword)) {
                    if (@preg_match('/^\pL$/u', urldecode('%C3%B1'))) {
                        // Unicode property support
                        $before = "/(?<!\p{L})";
                        $after = "(?!\p{L})/u";
                     } else {
                        $before = "/";
                        $after = "/u";
                     }
                }*/
                $text = preg_replace($before . $searchword . $after,
                                     "<span class=\"$class\">\\0</span>",
                                     '<!-- x -->' . $text . '<!-- x -->');
            }
        }
    }

    return $text;
}
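
To illustrate what the $before lookahead in the function above is guarding against, here is a small sketch with a made-up sample text (the URL and wording are only for illustration). The first alternative rejects positions that are still inside a tag, the second rejects positions inside link text, so only the plain-text occurrence should be wrapped.

```php
<?php
// Pattern pieces copied from the function above.
$before = '/(?!(?:[^<]+>|[^>]+<\/a>))\b';
$after  = '\b/i';

// Hypothetical sample: the word appears in an href, in link text, and in plain text.
$text = '<a href="http://www.geeklog.net/">Geeklog home</a> of geeklog';
$word = preg_quote('geeklog', '/');

// \0 in single quotes is backslash-zero: PCRE's reference to the whole match.
$out = preg_replace($before . $word . $after, '<span class="highlight">\0</span>', $text);

echo $out, "\n";
// <a href="http://www.geeklog.net/">Geeklog home</a> of <span class="highlight">geeklog</span>
```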

Status: offline

gape

Forum User
Full Member
Registered: 05/30/02
Posts: 138
Quote by: Roccivic

@gape: The code that you posted is not an exact copy of the code that I posted. Maybe that could be a problem...


I'm not sure I understand this.
I'm using Geeklog 1.7.1sr1
on that particular site.
Was this code changed in 1.7.2?

Status: offline

Roccivic

Forum User
Moderator
Registered: 05/19/10
Posts: 136
Quote by: gape

Quote by: Roccivic

@gape: The code that you posted is not an exact copy of the code that I posted. Maybe that could be a problem...


I'm not sure I understand this.
I'm using Geeklog 1.7.1sr1
on that particular site.
Was this code changed in 1.7.2?


I just meant that some quotes are unescaped in your copy, that would cause a syntax error, look near the line:
"<span class=\"$class\">\\0</span>",
Just a bad copy/paste perhaps...
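
For anyone wondering why the lost backslashes matter: besides the unescaped quotes in "<span class="$class">", which are a parse error, a double-quoted "\0" in PHP is an octal escape for a NUL byte rather than PCRE's whole-match reference, and pL without its backslash is just the two literal characters "pL". A small sketch with made-up sample strings:

```php
<?php
// "\\0" in double quotes is a backslash followed by 0: PCRE's whole-match reference.
$escaped = "[\\0]";
// With one backslash lost, "\0" is PHP's octal escape for a single NUL byte.
$unescaped = "[\0]";

$good = preg_replace('/b/', $escaped, 'abc');   // wraps the match
$bad  = preg_replace('/b/', $unescaped, 'abc'); // inserts a NUL byte instead

// "ñ" in UTF-8, the same bytes as urldecode('%C3%B1') in the posted code.
$letter = "\xC3\xB1";
$withBackslash    = preg_match('/^\pL$/u', $letter); // Unicode letter property
$withoutBackslash = preg_match('/^pL$/u', $letter);  // literal "pL"

var_dump($good, bin2hex($bad), $withBackslash, $withoutBackslash);
```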

Status: offline

gape

Forum User
Full Member
Registered: 05/30/02
Posts: 138
I see what you mean now.
When I quoted your code, one \ was lost.
I didn't remove it; maybe the forum did.

In my first code there are two backslashes (\\).

Anyway,
your last code works well for me for now.

Thank you very much.
