Wedge

Public area => The Pub => Topic started by: Arantor on October 11th, 2011, 03:30 PM

Title: English British support
Post by: Arantor on October 11th, 2011, 03:30 PM
I was wondering about this, since I figure it was inevitably going to come up (if nothing else, I'll probably end up wanting to do it)

Now, I figure there's a way we can support the English British set without too much extra work.

loadLanguage already loads English standard anyway, as a fallback, so the required strings should always be declared. But if you're doing that, you can be very cheeky and just have English British quite literally be *only the changed strings*. If there's no difference between English and English British for a given string, why define it again?
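
A minimal sketch of that overlay idea, not Wedge's actual loader: the two functions below stand in for including the English file and a hypothetical English British file, and the string keys are made up for illustration.

```php
<?php
// Stand-ins for including two language files. The file contents and keys
// here are hypothetical, chosen only to illustrate the overlay.
function include_english(array &$txt)
{
    $txt['color_label'] = 'Color';
    $txt['favorite'] = 'Favorite';
    $txt['reply'] = 'Reply';
}

function include_english_british(array &$txt)
{
    // Only the strings that differ from standard English are declared.
    $txt['color_label'] = 'Colour';
    $txt['favorite'] = 'Favourite';
}

$txt = array();
include_english($txt);          // fallback first
include_english_british($txt);  // variant overwrites what it redefines
```

After both calls, $txt holds the British spellings where they were declared and the plain English string for everything else, which is why the variant file can stay tiny.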

There is one caveat: right now, the load-fallback is actually a setting, though there's no UI for it, because I figure you almost never would change it (and if you did want to change it, it's not like you couldn't just tweak the code manually... if you're at the stage where it would make a difference, you're going to be up to making code changes)

Curious to know what the thoughts are on this, both technically and generally.
Title: Re: English British support
Post by: Nao on October 11th, 2011, 03:54 PM
I'd tend to say it's more amusing/exciting to only keep the relevant entries.

The load fallback is here to stay, if you ask me. I think most people really prefer to have an English string rather than nothing.

Possibilities to explore...

- add a version number to every single language file. Increase version number whenever a change is made to the English version. Compare current version against English version to determine whether we need to load the English fallback (and warn the admin that their language file is broken.)

- store language files in a serialized array in the database. (or in the cache. I think there's already a cache level for language files. A bad one IIRC, because of the variables involved.) When caching fallback & non-fallback, check whether the English version has some extra strings, and only store those.
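
The version-number idea could look roughly like this. It is only a sketch under assumptions: the 'version' and 'strings' layout, the needs_fallback() helper and the warning text are all hypothetical, not anything Wedge defines.

```php
<?php
// Hypothetical layout: each language exposes an integer version next to its strings.
function needs_fallback($english_version, $language_version)
{
    // An older language file may be missing strings added to English since.
    return $language_version < $english_version;
}

$english = array(
    'version' => 7,
    'strings' => array('reply' => 'Reply', 'quote' => 'Quote'),
);
$french = array(
    'version' => 6,
    'strings' => array('reply' => 'Répondre'),
);

$txt = $french['strings'];
$warnings = array();

if (needs_fallback($english['version'], $french['version']))
{
    // Array union keeps the translated strings and only fills in missing keys.
    $txt += $english['strings'];
    $warnings[] = 'The French language file is older than the English one.';
}
```

When the versions match, the English file would not need to be loaded at all, which is where the saving would come from.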
Title: Re: English British support
Post by: Arantor on October 11th, 2011, 04:01 PM
Quote
I'd tend to say it's more amusing/exciting to only keep the relevant entries.
That's exactly it. English/British is the most insane language pack to exist because 98% of it is the same as the main English set.
Quote
The load fallback is here to stay, if you ask me.
I can't see it disappearing either, in which case we might as well remove the internal disable option for it.
Quote
add a version number to every single language file
That's not a bad idea. A little bit tricky to get right, but certainly doable.
Quote
store language files in a serialized array in the database.
The variable nature before made it practically impossible to get this one right, and the old language cache was incredibly unreliable, mostly because of Modifications.*.php and the fact that files weren't re-pulled through the cache after modification. If a mod edited a language file, it was never recached until it naturally fell out of the cache.

The real question is whether the benefit outweighs the hassle. I'm not sure what benefit there actually is in doing it for all strings, especially given the way different files reuse strings.

That said, what we *can* do is put edited strings in the DB, so instead of users having to make files editable through permissions, they just get to put it into the DB.
Title: Re: English British support
Post by: Dismal Shadow on October 11th, 2011, 04:20 PM
That's amazing.
That's amasing?

Color.
Colour?
Title: Re: English British support
Post by: Arantor on October 11th, 2011, 04:24 PM
Quote from ~DS~ on October 11th, 2011, 04:20 PM
That's amazing.
That's amasing?
Nope, just amazing.
Quote
Color.
Colour?
Colour is the correct spelling, of course :P As is centre, serialise, initialise, humour, flavour etc.
Posted: October 11th, 2011, 04:21 PM

Also, I'm sure I mentioned I'd like to do a pirate language pack, which would similarly be based on English :whistle:
Title: Re: English British support
Post by: spoogs on October 11th, 2011, 04:24 PM
Will never forget making my school lose the spelling bee when I was in the 7th grade because they couldn't spell colour and favourite properly. An apology years later didn't help much either.
Title: Re: English British support
Post by: Dismal Shadow on October 11th, 2011, 04:27 PM
Quote from Arantor on October 11th, 2011, 04:24 PM
Quote from ~DS~ on October 11th, 2011, 04:20 PM
That's amazing.
That's amasing?
Nope, just amazing.
Quote
Color.
Colour?
Colour is the correct spelling, of course :P As is centre, serialise, initialise, humour, flavour etc.
Posted: October 11th, 2011, 04:21 PM

Also, I'm sure I mentioned I'd like to do a pirate language pack, which would similarly be based on English :whistle:
Do it, DO IT. It would be fun t' imitatin' pirate.
Title: Re: English British support
Post by: Arantor on October 11th, 2011, 04:30 PM
Oh, adding a pirate language pack is something I started to do back in the RC3 days. I just didn't finish it because I was frustrated at the things I'm fixing now.

That, and there was a small problem with dealing with locales, but that's not a problem I have to worry about these days (knowing much more than I did about setlocale 18 months ago)
Title: Re: English British support
Post by: Norodo on October 11th, 2011, 06:45 PM
Weh. Who needs the American way of spelling it anyway? British English and French should be standard!
Title: Re: English British support
Post by: MultiformeIngegno on October 11th, 2011, 10:31 PM
Yeah, British English should be the only one (no American English at all)!
Title: Re: English British support
Post by: Arantor on October 11th, 2011, 10:58 PM
I'd have to go through and fix a number of strings in the core, and if I did that I'd want to use colour bbc instead of color and so on :/

There is another thing I'd do with this overhaul, actually. There's something that's annoyed me for a long time: the way languages are displayed to users for choosing.

If you have a list of languages, and you list them as English, French, Spanish... you're Anglicising it. It should be the proper language, e.g. English, Francais, Espanol. (Accents included, I just couldn't be bothered typing them)

Now, the reason it's done how it is right now is simply ease of programming and performance: it's easy enough to look up index.*.php files and split the * on _ and upper-case the first letters. That's how you end up with Spanish (or worse, Spanish Es, a construction that's somewhere between meaningless, irritating and unintuitive)

Instead, if you build and store a list, you can actually load the files themselves to get the right string, so you can actually have the files themselves contain a proper language-dependent string holding the proper form of that language's name in that language... can't say more logical than that IMO.

(It would replace the current getLanguages() process, and would cache the value inside $modSettings. I have no problem with the admin panel being the only thing that sets it, so it's only rebuilt when the admin asks to recache it, or when you add a new language.)
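
To make the contrast concrete, here is a rough sketch of both approaches. The label_from_filename() and build_language_list() helpers and the 'lang_name' key are hypothetical; this is not the real getLanguages() code, only the file-name splitting as described above next to a list built from strings stored in the files.

```php
<?php
// File-name approach as described: index.spanish_es.php becomes "Spanish Es".
function label_from_filename($filename)
{
    if (!preg_match('~^index\.(.+)\.php$~', $filename, $match))
        return null;

    return ucwords(str_replace('_', ' ', $match[1]));
}

// Proposed approach: each file carries its own name in its own language.
// 'lang_name' is an assumed key; fall back to the file-name label without it.
function build_language_list(array $files)
{
    $list = array();
    foreach ($files as $id => $strings)
        $list[$id] = isset($strings['lang_name']) ? $strings['lang_name'] : ucwords(str_replace('_', ' ', $id));

    return $list;
}

// Hypothetical already-loaded strings, keyed by language id.
$files = array(
    'english' => array(),
    'french' => array('lang_name' => 'Français'),
    'spanish_es' => array('lang_name' => 'Español'),
);

$languages = build_language_list($files);
```

The resulting array is the kind of thing that could be serialized into $modSettings once and reused, rather than rebuilt on every page.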
Title: Re: English British support
Post by: Dismal Shadow on October 12th, 2011, 06:16 PM
We need a Yoda language. :P
Title: Re: English British support
Post by: Arantor on October 12th, 2011, 07:36 PM
Tempt me, you should not.
Title: Re: English British support
Post by: Dismal Shadow on October 12th, 2011, 07:49 PM
Yoda is not a language per se but a translation.

"No one wants to die. Even people who want to go to heaven don’t want to die to get there. And yet death is the destination we all share. No one has ever escaped it. And that is as it should be, because Death is very likely the single best invention of Life. It is Life’s change agent. It clears out the old to make way for the new. Right now the new is you, but someday not too long from now, you will gradually become the old and be cleared away. Sorry to be so dramatic, but it is quite true."

Yoda:
"To die no one wants.  Even people who want to go to heaven want not to die to get there. And the destination we all share, yet death is.  Ever escaped it, no one has. And as it should be, that is, because very likely the single best invention of life, death is.  Life's change agent, is it.  To make way for the new it clears out the old.  You, right now the new is, but too long from now someday not, gradually become the old and be cleared away, you will.  To be so dramatic sorry, but quite true, it is.  Yes, hmmm."
Title: Re: English British support
Post by: Arantor on October 12th, 2011, 08:09 PM
I know :P From my POV, anything that requires altering a lot of language strings is effectively a language pack candidate.
Title: Re: English British support
Post by: Dismal Shadow on October 12th, 2011, 08:26 PM
I never understood how it works if pirate and Yoda are in English? They are automatically translated for you on post or...?
Title: Re: English British support
Post by: Nao on October 12th, 2011, 08:46 PM
Technical suggestion...

British.language.php or whatever:

$based_on = 'English';
$txt[..] = '...';
$txt[...] = '....';

Load.php:

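// At this point $txt already contains the variant's own strings.
if (!empty($based_on))
{
  $original_txt = $txt;
  // Load the base language; its assignments may overwrite entries in $txt.
  loadLanguage($based_on);
  // Later arguments win for string keys, so the variant's strings end up on top.
  $txt = array_merge($txt, $original_txt);
}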

(And even better -- cache the resulting language...)
Posted: October 12th, 2011, 08:41 PM
Quote from Arantor on October 11th, 2011, 04:01 PM
I can't see it disappearing either, in which case we might as well remove the internal disable option for it.
Sure...
Quote
Quote
add a version number to every single language file
That's not a bad idea. A little bit tricky to get right, but certainly doable.
It's just a bit annoying to maintain once we hit gold... :-/
Or we could have some sort of script that automatically updates the internal version numbers if it finds any difference against the previously stored file (e.g. we just do a md5 of the contents, after removing comments and whitespace.)
Quote
The variable nature before made it practically impossible to get this one right,
I can see it working, with a 'complex_string' function as I said...
For instance, function_exists('something') could be turned into {function_exists:something} and automatically modified by complex_string. Of course, the more choices, the slower it gets... But we can simply check for strpos($string, '{') before continuing. ({, or <we:, or whatever...)
Title: Re: English British support
Post by: Arantor on October 12th, 2011, 09:19 PM
Quote from ~DS~ on October 12th, 2011, 08:26 PM
I never understood how it works if pirate and Yoda are in English? They are automatically translated for you on post or...?
That's the thing, it's no different to the code as, say, English British vs English, or Portuguese Brazilian vs Portuguese PT. It's the language around the posts, not the posts themselves...
Quote
(And even better -- cache the resulting language...)
Question. Let's say we have arbitrary language x, based on language y, neither of which is English. Do we load English, then y then x?

Caching is a tricky subject for languages. Caching into the cache folder in the usual fashion? If so, I'm not clear how it helps performance.

Also, other than English British, are there any other language packs that are variants of each other? I remember discussing the above mentioned Portuguese pack, given that the two packs are even closer than British is to English... Is it worth the effort to build something to support it rather than just leveraging a comfortable detail of implementation? If no language is really going to use it, I'd say it wasn't worth the effort.
Quote
(e.g. we just do a md5 of the contents, after removing comments and whitespace.)
No, we should store it as a version number in the file rather than rely on the contents: unless we store changes in the DB, the language files should be considered volatile.

Elsewhere I mentioned gathering a list of the languages in a manner that encourages loading the files themselves to get the language's own name from it. Putting the version number would be trivial to load at that point.

As far as variables go, I'd say it depends on their nature: you don't apply per-document or brute-force changes to every string when they only matter to individual strings. I wouldn't throw every string through a test for boardurl, for example.
Title: Re: English British support
Post by: Norodo on October 12th, 2011, 09:56 PM
Norwegian Bokmaal and Norwegian Nynorsk are similarish*... But I don't think it's worth creating a thing for it, just make them two different language packs...

*Both of which I can supply, probably, if you need any help, yay!
Title: Re: English British support
Post by: Arantor on October 12th, 2011, 09:58 PM
How different are they? Is it spelling variations like English vs British? Grammar changes?
Title: Re: English British support
Post by: Norodo on October 12th, 2011, 10:09 PM
There are a few grammar changes, but I'm not sure if they'll affect Wedge, prolly not. Mostly spelling changes.
Title: Re: English British support
Post by: Nao on October 12th, 2011, 10:15 PM
Quote from Arantor on October 12th, 2011, 09:19 PM
Question. Let's say we have arbitrary language x, based on language y, neither of which is English. Do we load English, then y then x?
Recursive load yeah... Not very efficient :P Nested languages would require caching, definitely.
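
A sketch of what that recursive load might look like, assuming each pack names its parent in a 'based_on' entry (as in the $based_on suggestion earlier) and that English is the last fallback of every chain. The $packs array, the resolve_language() helper and the sample strings are hypothetical.

```php
<?php
// Hypothetical in-memory packs: 'based_on' names the parent, 'txt' holds own strings.
$packs = array(
    'english' => array(
        'based_on' => null,
        'txt' => array('reply' => 'Reply', 'file' => 'File', 'quote' => 'Quote'),
    ),
    'portuguese_pt' => array(
        'based_on' => null,
        'txt' => array('reply' => 'Responder', 'file' => 'Ficheiro'),
    ),
    'portuguese_brazilian' => array(
        'based_on' => 'portuguese_pt',
        'txt' => array('file' => 'Arquivo'),
    ),
);

function resolve_language($name, array $packs, array $seen = array())
{
    // Unknown packs and circular chains contribute nothing.
    if (isset($seen[$name]) || !isset($packs[$name]))
        return array();
    $seen[$name] = true;

    $base = $packs[$name]['based_on'];
    // Assumption: any chain that does not name a parent falls back to English.
    if ($base === null && $name !== 'english')
        $base = 'english';

    $txt = $base === null ? array() : resolve_language($base, $packs, $seen);

    // Strings declared closer to the requested language win.
    return array_merge($txt, $packs[$name]['txt']);
}

$txt = resolve_language('portuguese_brazilian', $packs);
```

Here three packs get merged for one request, which is the inefficiency mentioned above and the reason the merged result is worth caching.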
Quote
Caching is a tricky subject for languages. Caching into the cache folder in the usual fashion? If so, I'm not clear how it helps performance.
$txt = unserialize('...') is only 10% faster than the regular version on index.french.php. Although this is about 50% faster:

Code:
$txt += array(
  'lang_locale' => 'fr_FR',
  'lang_dictionary' => 'fr',
  'lang_spelling' => 'french',
  'lang_rtl' => false,
  'lang_capitalize_dates' => false,
  'number_format' => '1 234,00',
  'time_format' => '%e %B %Y à %H:%M',
  ...

Also, the file is smaller by about 5%.

Oh, and yeah, most of the time, this format could be used directly in Wedge, rather than used after caching the files...
But it's probably not really worth the hassle. On my local install, loading index.french.php, even when going through loadLanguage and its complicated setup (template_include etc), only takes a millisecond... versus half a millisecond for a simple include() on the var_export version.
Quote
Quote
(e.g. we just do a md5 of the contents, after removing comments and whitespace.)
No, we should store it as a version number in the file rather than rely on the contents: unless we store changes in the DB, the language files should be considered volatile.
No no, I do mean having the version number in the file -- I just mean we should have a script, on our side, that will go through the language files and increase the version number inside the files if they're found to have been changed against our reference files (i.e. the previous version)...
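
One way such a maintenance script could compare a file against its reference copy is to fingerprint the code with comments and whitespace stripped, as suggested earlier with md5. This is a sketch using PHP's tokenizer; the language_fingerprint() name and the sample file contents are hypothetical.

```php
<?php
// Hash a PHP source string while ignoring comments and whitespace,
// so cosmetic edits do not count as a change.
function language_fingerprint($source)
{
    $code = '';
    foreach (token_get_all($source) as $token)
    {
        if (is_array($token))
        {
            if (in_array($token[0], array(T_COMMENT, T_DOC_COMMENT, T_WHITESPACE), true))
                continue;
            $code .= $token[1];
        }
        else
            $code .= $token; // single-character tokens such as [ ] = ;
    }

    return md5($code);
}

// Hypothetical file contents for illustration.
$reference = "<?php\n// Version: 1\n\$txt['reply'] = 'Reply';\n";
$reformatted = "<?php\n\n/* tidied */\n\$txt['reply']   =   'Reply';\n";
$changed = "<?php\n\$txt['reply'] = 'Respond';\n";

$same = language_fingerprint($reference) === language_fingerprint($reformatted);
$different = language_fingerprint($reference) !== language_fingerprint($changed);
```

A script on the developers' side could bump the version number stored in the file only when the fingerprint differs from the reference copy's.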

Heck, we could have a similar script for all files, so we don't have to modify manually all of those @version strings...
Quote
Elsewhere I mentioned gathering a list of the languages in a manner that encourages loading the files themselves to get the language's own name from it. Putting the version number would be trivial to load at that point.
Agreed.
Title: Re: English British support
Post by: Arantor on October 13th, 2011, 12:39 AM
Quote
$txt = unserialize('...') is only 10% faster than the regular version on index.french.php. Although this is about 50% faster:
We can only use that once we have built the final array, we cannot use it to combine English with other elements during the build process.

With the + operator, if the same key occurs in both arrays, the left-hand array's element is kept, as per http://php.net/manual/en/language.operators.array.php so if we load English into $txt, then use += to attempt to overlay another language, it's going to fail: the English strings stay in place.

Also note that there are some instances where strings are actually reused without anyone necessarily being aware of it.
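
A short illustration of that difference, with made-up strings:

```php
<?php
$english = array('color' => 'Color', 'reply' => 'Reply');
$british = array('color' => 'Colour');

// Union: for keys present on both sides the left-hand value is kept,
// so the British spelling is silently ignored.
$with_union = $english;
$with_union += $british;

// array_merge: for string keys, values from later arrays overwrite earlier ones.
$with_merge = array_merge($english, $british);

// Union does work the other way round: start from the variant,
// then fill the gaps from English.
$filled = $british + $english;
```

So += only behaves as an overlay if the more specific language is already in $txt and English is added afterwards to fill the holes.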
Quote
No no, I do mean having the version number in the file -- I just mean we should have a script, on our side, that will go through the language files and increase the version number inside the files if they're found to have been changed against our reference files
I'm thinking instead of attaching a script to SVN's post-commit hook, called on any given commit. If the commit updates anything in /languages/, we have an instant log that something needs to be done. It's all automatable. :D
Title: Re: English British support
Post by: Nao on October 13th, 2011, 07:32 AM
- I meant array_merge of course.

- sounds good. Is it what SMF used to update their changelog number? ;)
Title: Re: English British support
Post by: Arantor on October 13th, 2011, 09:36 AM
Quote from Nao on October 13th, 2011, 07:32 AM
- I meant array_merge of course.
That certainly works for the building of arrays for caching, but += is suitable (with aforementioned caveat) for loading language strings from cache.
Quote
- sounds good. Is it what SMF used to update their changelog number? ;)
No. The change log was updated by hand with more than one commit because a previous committer had forgotten to update it.