
Geeklog Forums

Issue of the multibyte char sets



vvprok

Geeklog has been translated into many languages, which is great! :)
However, GL does not handle multibyte characters correctly.

As you know, PHP's string functions (strlen, strpos, substr, etc.) do not take the string encoding into account and operate on byte sequences only. As a result, the links plugin, for example, incorrectly composes the short title for the "What's New" block: it keeps the first 16 bytes of the link title and then appends "...". With the uk_UA.UTF-8 locale I got 7 characters of the Ukrainian title, followed by some garbage characters before the "...". :-o
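
To make the byte-versus-character problem concrete, here is a minimal sketch. It assumes the mbstring extension is loaded and the file is saved as UTF-8; the title string is only an illustrative example and is not taken from the links plugin.

```php
<?php
// Illustrative title (an assumption, not from the links plugin).
// In UTF-8 each Cyrillic letter takes 2 bytes and each space takes 1.
$title = 'Новини та події України';

$bytes = strlen($title);              // counts bytes
$chars = mb_strlen($title, 'UTF-8');  // counts characters

// Byte-based cut: the 16th byte falls in the middle of a 2-byte letter,
// so the result ends with half a character and is not valid UTF-8.
$byteCut = substr($title, 0, 16);

// Character-based cut: always ends on a character boundary.
$charCut = mb_substr($title, 0, 16, 'UTF-8');

echo "$bytes bytes, $chars characters\n";
echo 'byte cut is valid UTF-8: ',
    var_export(mb_check_encoding($byteCut, 'UTF-8'), true), "\n";
echo 'char cut is valid UTF-8: ',
    var_export(mb_check_encoding($charCut, 'UTF-8'), true), "\n";
```

The dangling half character at the end of the byte-based cut is what shows up as garbage in the browser.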

And as you also know, there is another set of functions specifically for multibyte encodings: mb_strlen, mb_strpos, mb_substr, etc. :D

I have already fixed the links plugin using the mb_* functions (see here).
I simply changed calls of the form
Text Formatted Code
str...(...)

 
to
Text Formatted Code
mb_str...(..., $LANG_CHARSET)

 
However, making this change by hand at every call site looks too cumbersome to serve as a general solution for all string operations.

So I propose creating a lib-strings.php module that contains the string-related functions. These functions would hide the implementation details of string handling from the rest of the GL code. All of them would look like this:
Text Formatted Code

function gl_strlen($string)
{
    global $LANG_CHARSET;
    return mb_strlen($string, $LANG_CHARSET);
}
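
As a rough sketch of how the rest of such a module might look, here are a few more wrappers in the same style. The names gl_substr, gl_strpos and gl_truncate are hypothetical, as is the fallback to the byte-based functions when mbstring is not installed. gl_strlen is repeated only so that the block runs on its own.

```php
<?php
// In Geeklog the language file sets $LANG_CHARSET; it is set here only
// so that this sketch runs standalone.
$GLOBALS['LANG_CHARSET'] = 'UTF-8';

function gl_strlen($string)
{
    global $LANG_CHARSET;
    // Assumed fallback: use the byte-based function without mbstring.
    return function_exists('mb_strlen')
        ? mb_strlen($string, $LANG_CHARSET)
        : strlen($string);
}

// Hypothetical wrapper: $start and $length are counted in characters.
function gl_substr($string, $start, $length)
{
    global $LANG_CHARSET;
    return function_exists('mb_substr')
        ? mb_substr($string, $start, $length, $LANG_CHARSET)
        : substr($string, $start, $length);
}

// Hypothetical wrapper: returns a character offset, or false if not found.
function gl_strpos($haystack, $needle, $offset = 0)
{
    global $LANG_CHARSET;
    return function_exists('mb_strpos')
        ? mb_strpos($haystack, $needle, $offset, $LANG_CHARSET)
        : strpos($haystack, $needle, $offset);
}

// Hypothetical helper for the "What's New" case: keep at most $maxChars
// characters and append a suffix only if something was cut off.
function gl_truncate($string, $maxChars, $suffix = '...')
{
    if (gl_strlen($string) <= $maxChars) {
        return $string;
    }
    return gl_substr($string, 0, $maxChars) . $suffix;
}
```

With something like this in place, the links plugin could call gl_truncate($title, 16) instead of handling the encoding itself.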


 


So, what do you think?


sakata

Hi,
I have created a COM_titlesplit function. See:
http://www.geeklog.net/forum/viewtopic.php?showtopic=65070

I think having lib-strings.php is a good idea.

