Geeklog Forums

Issue of the multibyte char sets

Forum User
Registered: 07/07/03
Posts: 10
Geeklog has been translated into many languages, which is great!
However, GL does not handle multibyte characters correctly.

As you know, string functions such as strlen, strpos, and substr do not take the string encoding into account and operate on byte sequences only. Because of this, the links plugin, for example, incorrectly composes the shortened title for the "What's New" block: it keeps the first 16 bytes of the link title and then appends "...". With the uk_UA.UTF-8 locale, I get 7 characters of the Ukrainian title followed by some garbage characters before the "...".
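To illustrate the problem, here is a minimal sketch that compares byte-based and character-based truncation. The title string and the variable names are made up for this example, and it assumes the file is saved as UTF-8 and the mbstring extension is loaded.

```php
<?php
// Hypothetical link title (not taken from the links plugin).
// Assumes this source file is saved as UTF-8.
$title = 'Привіт світе, це довгий заголовок посилання';

// Byte-oriented cut: keeps the first 16 bytes. Each Cyrillic letter takes
// 2 bytes in UTF-8, so the cut can land in the middle of a character.
$byteCut = substr($title, 0, 16) . '...';

// Character-oriented cut: keeps the first 16 characters.
$charCut = mb_substr($title, 0, 16, 'UTF-8') . '...';

// Check whether each result is still well-formed UTF-8.
var_dump(mb_check_encoding($byteCut, 'UTF-8'));
var_dump(mb_check_encoding($charCut, 'UTF-8'));
```

In this sketch the byte-oriented cut ends on half of a two-byte character, which is what shows up as garbage in the block.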

And as you also know, there is another set of functions specifically for multibyte encodings: mb_strlen, mb_strpos, mb_substr, and so on.

I have already fixed the links plugin using the mb_* functions (see here).
I simply changed calls of the form
Text Formatted Code
str...(...)

to
Text Formatted Code
mb_str...(..., $LANG_CHARSET)

However, this looks too cumbersome to serve as a general solution for all string operations.

So I propose creating a lib-strings.php module that contains the string-related functions. These functions would hide the implementation details of string handling from the rest of the GL code. Each of them would look like this:
Text Formatted Code

// Character-aware strlen using the charset of the active language file
function gl_strlen($string)
{
    global $LANG_CHARSET;
    return mb_strlen($string, $LANG_CHARSET);
}
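A slightly fuller sketch of what such a module might contain is shown below. Only gl_strlen comes from the proposal above; gl_substr, gl_strpos, and the fallback to the byte-based functions when mbstring is not loaded are assumptions added for illustration. $LANG_CHARSET is set here only so the sketch runs on its own.

```php
<?php
// Assumption: in Geeklog, $LANG_CHARSET is provided by the language file.
// It is defined here only to make this sketch self-contained.
$GLOBALS['LANG_CHARSET'] = 'UTF-8';

// Length in characters; falls back to bytes if mbstring is missing.
function gl_strlen($string)
{
    global $LANG_CHARSET;
    if (function_exists('mb_strlen')) {
        return mb_strlen($string, $LANG_CHARSET);
    }
    return strlen($string);
}

// Substring by character offsets (hypothetical wrapper).
function gl_substr($string, $start, $length = null)
{
    global $LANG_CHARSET;
    if (function_exists('mb_substr')) {
        return mb_substr($string, $start, $length, $LANG_CHARSET);
    }
    return $length === null
        ? substr($string, $start)
        : substr($string, $start, $length);
}

// Position of $needle in characters, or false (hypothetical wrapper).
function gl_strpos($haystack, $needle, $offset = 0)
{
    global $LANG_CHARSET;
    if (function_exists('mb_strpos')) {
        return mb_strpos($haystack, $needle, $offset, $LANG_CHARSET);
    }
    return strpos($haystack, $needle, $offset);
}
```

With wrappers like these, the links plugin could shorten a title with gl_substr($title, 0, 16) . '...' without referring to the charset or to the mb_* functions directly.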


So, what do you think?

Forum User
Registered: 12/17/01
Posts: 25
I have created a COM_titlesplit function.

I think having lib-strings.php is a good idea.
