Geeklog Forums

UTF-8 and language option


eg0master

Forum User
Regular Poster
Registered: 07/21/05
Posts: 73
Location:Stockholm
Dirk wrote in a comment:
Quote by Dirk: The language option is now only available when you're using UTF-8.

What is the reason for this? And will it be (or is it already) fixed? Feels a bit silly not showing this option if you don't go utf-8.
Geeklog Plugins: http://plugincms.com

tokyoahead

Anonymous
There are tons of reasons.
One of the main ones is that very few languages fit the same encoding 100%. As soon as you switch the encoding in the language file, all your existing content becomes unreadable. If you switch to UTF-8, however, you can use all languages without messing up your content.
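
To make the "unreadable" part concrete, here is a minimal PHP sketch (not Geeklog code; the byte values are simply the standard Latin-1 and UTF-8 encodings of "ä"). It shows that the same character is stored as different bytes in the two encodings, so content saved under one and displayed under the other breaks:

```php
<?php
// "ä" as stored by a Latin-1 (ISO-8859-1) site: one byte.
$latin1 = "\xE4";

// The same character converted to UTF-8: two bytes.
$utf8 = iconv('ISO-8859-1', 'UTF-8', $latin1);

// preg_match() with the /u modifier only succeeds on valid UTF-8,
// so it works as a cheap validity check.
$latin1IsValidUtf8 = preg_match('//u', $latin1) === 1;
$utf8IsValidUtf8   = preg_match('//u', $utf8) === 1;

// Reading UTF-8 bytes as if they were Latin-1 gives the classic
// two-character garbage ("Ã¤") instead of "ä".
$misread = iconv('ISO-8859-1', 'UTF-8', $utf8);

echo bin2hex($latin1), "\n";  // e4
echo bin2hex($utf8), "\n";    // c3a4
echo bin2hex($misread), "\n"; // c383c2a4
```

Pure ASCII text is byte-for-byte identical in both encodings, which is why English content survives the switch untouched.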


eg0master

Forum User
Regular Poster
Registered: 07/21/05
Posts: 73
Location:Stockholm
That was one reason... I'm not really against UTF-8, but for example English and Swedish (which are the only languages I use and allow at my sites) both work fine in Latin-1. I think most "western" European languages fit into the Latin-1 encoding.

Still, it seems like a strange thing to limit this option to UTF-8 (if it is only to protect admins from messing up their content), since as far as I understand it only affects multilingual sites that mix languages with different alphabets, e.g. Arabic, Latin, Greek, Russian, Japanese, Chinese etc. And people running such sites have probably already figured out that they should stick with UTF-8 if they do not want to mess up their content.

But I still don't see why I, running only English/Swedish, should have to switch to UTF-8...

tokyoahead

Anonymous
Because those are exactly the languages that take the least effort, and lose the least, when switching to UTF-8.

If we had kept multi-language support for all encodings, we would have to make sure you only set it to encodings that match your content. Otherwise people would come back to us and complain that users cannot read their content anymore. You would end up with 20+ languages in your system but only 1 or 2 to choose from, which does not make a lot of sense.

There would have to be quite a large number of changes and checks in the system, and since GL has multi-language capabilities now, this would go overboard.

In the end it's the same question as why we do not run on old PHP or MySQL versions anymore. There is something new that is better at accommodating everyone, instead of only the lucky few who can live with Latin-1. After all, there are a couple of billion people who cannot live with it. Remember that most languages have a very specific encoding that fits only that one single language.

Finally, UTF-8 is nothing new anymore. Most Linux distros use it as their default encoding, and all current Windows versions support it.

What is preventing you from using the UTF-8 language files? What effort is it for you to switch?

LWC

Forum User
Full Member
Registered: 02/19/04
Posts: 818
BTW, even if eg0master gives up and changes his site's default language to UTF-8, his users would still use whatever they had used until that point, and that could be something that isn't UTF-8. Does it mean he'll have to run an update command in phpMyAdmin? Is that why we on this site were all moved to English UTF-8? If so, maybe Geeklog should provide a "reset certain/all settings" screen, kind of like what Media Gallery has, and then people like eg0master could check "reset users' language to the default" and click go.

But more importantly, seeing that Geeklog is headed the way of Unicode, what about finally adding mysql_query("SET NAMES 'UTF8'") to system/databases/mysql.class.php? The big question is what it does to Latin databases. Maybe they won't even notice it, but what's for sure is that without it, everything that isn't Latin turns to gibberish in the database and you can't really work with phpMyAdmin. And since Geeklog really needs phpMyAdmin (for example, for the first paragraph), it's a life changer for admins who use Unicode in their databases.

eg0master

Forum User
Regular Poster
Registered: 07/21/05
Posts: 73
Location:Stockholm
There is no big hassle to change to utf-8 except for maybe the content of the database. I'm not sure what will happen there...

And I can buy the argument that users are stupid, so in order to help them do the right thing you limit functionality unless they do things the "recommended way". And I think you have answered my question: the reason for disabling the language option is to protect the below average administrators to do the right thing.
I accept it, but I think it is wrong. And in a way it is good that you are doing your bit for the "convert the world to UTF-8" effort.

I was just interested in the reason... ;-)

tokyoahead

Anonymous
Quote by LWC: BTW, even if eg0master gives up and changes his site's default language to UTF-8, his users would still use whatever it was they used until that point and that could be something that isn't UTF-8.


You simply delete the non-UTF-8 language file, then everyone switches to the default, which is UTF-8. And even if not, his users won't see any change, since it all looks the same for Unicode and Latin. There is no notion of giving up. It's a new version. There are new features, and changes are necessary since we cannot support every notion of every site. That's why old layouts are not in the core code anymore either. You cannot have everything, because our resources to support all of this are not endless.

tokyoahead

Anonymous
Quote by eg0master: The reason for disabling the language option is to protect the below average administrators to do the right thing. I accept it, but I think it is wrong.


Well, to be straightforward, you apparently do not know a lot about the Unicode encoding. Otherwise you would have simply switched language files and never thought about it again, because that's all it takes in your case. You also would have understood that 90% of all software on the market now supports Unicode and does not work in proprietary encodings anymore. So my guess is that, once you wanted to include another language, you would have made the same error that we are trying to prevent with this measure.

You call it below average, I do not. Encodings are a very complicated matter, and it is very normal that people do not know a lot about them, even admins. That is why we decided to step ahead and take the decision. Call it wrong, but we certainly gave it a lot of thought.

1000ideen

Forum User
Full Member
Registered: 08/04/03
Posts: 1298
Quote by tokyoahead: ... Otherwise you would have simply switched language files and never thought about it again, because that's all it takes in your case.


Are you sure? Swedish seems to contain some special characters, like German does: "Följande inlägg ägs". At least there is a description of how to do it: http://www.geeklog.net/article.php/200410120657418 It is not so difficult. I only got stuck with $_CONF['site_name'] and $_CONF['site_slogan'] because they are monolingual.


LWC

Forum User
Full Member
Registered: 02/19/04
Posts: 818
even if not, his users won't see any change, since it all looks the same for Unicode and Latin

In his case...

So deleting files is what Dirk did. I didn't think about that. But I didn't ask to keep support, just to give a basic option like "reset" instead of resorting to deleting files or using phpMyAdmin.

In any case, judging from all this, you really want Geeklog to be Unicode, and so you really need to consider supporting Unicode databases. It's all part of the same thing and is one command away (and who knows, maybe it doesn't even matter to non-Unicode databases).

LWC

Forum User
Full Member
Registered: 02/19/04
Posts: 818
Are you sure? Swedish seems to contain some special signs like German.

And what makes you think Unicode doesn't have those signs?

I only got stuck with $_CONF['site_name'] and $_CONF['site_slogan'] because they are monolingual.

Ah, I remember that one...took me a while, but I beat this problem. What's for sure, Unicode won't help you there because Geeklog's PHP files must be ANSI (except the language files).
I use Hebrew and so I do stuff like:
Text Formatted Code
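// The slogan is typed in a file saved as Windows-1255 (Hebrew).
$_CONF['site_slogan'] = "something in Hebrew";
// Convert it to UTF-8 so it matches the encoding the site uses.
$_CONF['site_slogan'] = iconv('windows-1255', 'utf-8', $_CONF['site_slogan']);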

But the assumptions are:
1) I know the technical name of Hebrew encoding (Windows-1255).
2) I use Unicode in the site.

Knowing this line I can fill my sites with Hebrew words without problems. Before I knew the line it was such a pain.

eg0master

Forum User
Regular Poster
Registered: 07/21/05
Posts: 73
Location:Stockholm
Quote by tokyoahead: Well, to be straightforward, you apparently do not know a lot about the Unicode encoding.

Well I've worked enough with different encodings to know things generally get messed up pretty bad when you don't think and do things "right" from the beginning.

Since I haven't had the opportunity (yet) to experiment with the beta release, I might be seeing a problem that isn't there, but I'm uncertain how the database content will be handled when switching to UTF-8 (when it contains those pesky special characters). I would not like to be in a situation where I have to convert content in the database (unless somebody already has a script that does this for all possible tables).

But all in all you probably made a good call in the worldwide conversion to utf-8 since there are too many applications out there not supporting unicode today.

Dirk

Site Admin
Admin
Registered: 01/12/02
Posts: 13073
Location:Stuttgart, Germany
The database will have to be converted. There's an older story by Oliver (somewhere in the "Geeklog" topic, I assume) that explains how to do that.
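
For a rough idea of what such a conversion involves, here is a sketch in PHP. This is not Oliver's procedure; the helper name to_utf8() and the assumption that the old content is ISO-8859-1 are illustrative only. It converts a value only when it is not already valid UTF-8, so running it twice does not double-encode anything:

```php
<?php
// Hypothetical helper, not part of Geeklog.
// Assumes any value that is not valid UTF-8 is ISO-8859-1 (Latin-1).
function to_utf8(string $value): string
{
    // Plain ASCII and already-converted text pass this check unchanged.
    if (preg_match('//u', $value) === 1) {
        return $value;
    }
    return iconv('ISO-8859-1', 'UTF-8', $value);
}

$row = "F\xF6ljande inl\xE4gg"; // "Följande inlägg" as Latin-1 bytes
$converted = to_utf8($row);     // the same text as UTF-8 bytes
```

Something like this would have to be run over every text column of every table, and only after a backup. A short Latin-1 string whose bytes happen to form a valid UTF-8 sequence would be skipped by the check, which is rare but possible.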

bye, Dirk

tokyoahead

Anonymous
Quote by LWC: really need to consider supporting Unicode databases.


Well, for MySQL all you can do is switch the collation to Unicode. This means that the "natural sorting" will go by the Unicode character instead of the first letter of the multibyte string. However, that does not necessarily mean it works better. The more important thing is that you do not switch the encoding without also converting the content of the database. You can run MySQL with a completely wrong sorting order and default charset as long as you don't switch in the middle, since that is what messes things up.

tokyoahead

Anonymous
Quote by 1000ideen: I only got stuck with $_CONF['site_name'] and $_CONF['site_slogan'] because they are monolingual.


Well, you might have a problem switching those depending on the language, but you can write a little if/then clause into your config.php that sets different names depending on the user's language. Additionally, you do not need to use iconv: you can pull out a Unicode text editor and write the values in there. Then you switch back to your normal editor (if that one is not Unicode) and simply continue to edit config.php with it in the future. With a non-Unicode editor you won't be able to read, for example, a Japanese site title, but you will not mess it up when saving the file after editing another variable in config.php.
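
Roughly like this, as a sketch. The function site_slogan_for(), the variable $userLanguage and the language names 'german_utf-8' and 'english_utf-8' are placeholders; how you get at the user's current language inside config.php is left open here, and the English slogan is just a made-up translation:

```php
<?php
// Hypothetical helper for config.php: pick a slogan per language.
function site_slogan_for(string $userLanguage): string
{
    if ($userLanguage === 'german_utf-8') {
        // UTF-8 bytes for "hält länger frisch", written as escapes so the
        // file itself can stay plain ASCII.
        return "h\xC3\xA4lt l\xC3\xA4nger frisch";
    }
    // Fallback for every other language.
    return 'stays fresh longer';
}

$userLanguage = 'german_utf-8'; // placeholder value
$_CONF['site_slogan'] = site_slogan_for($userLanguage);
```

Writing the non-ASCII bytes as escapes is a third option next to iconv and a Unicode editor: any editor can open and save the file without damaging the value.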

1000ideen

Forum User
Full Member
Registered: 08/04/03
Posts: 1298
Yea, "a little if/then clause" IF tokyoahead helps me THEN I can copy it...

@ LWC Actually I did use UTF-8 for the $_CONF['site_slogan'] "hält länger frisch" without problems.

I tried converting manually with 2 small sites which I use as CMS. Mainly I had to change the headlines of the stories and blocks. The stories contained no special signs as they were written with FCKeditor.