Bug in xbmc.getLanguage() / Adding ISO 3166-1 Capabilities
#1
The following description refers to v21.

Problem Statement

Under certain conditions, the 'xbmc.getLanguage()' function returns incorrect values.

Details

The function 'xbmc.getLanguage()' can be used in an addon to obtain Kodi's current language configuration in one of three formats (ENGLISH_NAME, ISO_639_1 and ISO_639_2), each either with or without additional regional information, giving 6 formatting options in total.

ENGLISH_NAME (Unformatted string)

The language name returned comes from the 'name' property in the active language addon's addon.xml file.

If the region is also required, it is supplied from the 'name' property of the selected 'region' node in the language addon's langinfo.xml file.  {Issue #1}

ISO_639_1 (Two lower-case character ISO language code.  eg: en, de, fr, ja, etc)

The language code is derived by performing a lookup on the 'English Name' of the language that returns the required 2 character code.  {Issue #2}

If the region code is also required, it is obtained by using the 'locale' property for that region and performing a lookup against an ISO-639-1 table.  {Issue #3}

ISO_639_2 (Three lower-case character ISO language code.  eg: eng, deu [or ger], fra [or fre], jpn, etc)

The language code is derived by performing a lookup on the 'name' of the language that returns the required 3 character code.  {Issue #4}

If the region code is also required, it is obtained by using the 'locale' property for that region and performing a lookup against an ISO 639-2B table.  {Issue #5}

Issues

Issue #1

Although mostly cosmetic, the returned language/region string will look something like "English (Australian)-Australia (24h)" instead of "English-Australia".

Proposed solution: The 'English Name' for the language should be based on a lookup of the 'locale' property [already an ISO 639-1 2 character language code] of the 'language' node in the langinfo.xml file.

Issue #2

Using the name of the language as defined in the 'name' property of the language (as provided in the addon.xml) results in a mismatch when extra details are added to a language, eg: 'English' will match whereas 'English (New Zealand)' will not match.

Proposed solution: This option should return the 2 character ISO-639-1 language code that is already present in the langinfo.xml file.

Issue #3

The 'locale' property for the region actually appears to be an 'ISO 3166-1 alpha-2' country code because it contains two UPPER-CASE characters, unlike an ISO-639-1 language code, which is specifically lower-case.

Sometimes a case-blind match will almost work, for example, country code 'DE' (Germany) will match language code 'de' and return 'German' rather than 'Germany'.

Unfortunately, region code 'CA' (Canada) will match the language code 'ca' which is 'Catalan, Valencian' and return the 3 character code 'cat'.
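To illustrate the collision, here is a minimal Python sketch. It is not Kodi's actual lookup code; the two small tables and both function names are made up for this example, and the real tables are far larger.

```python
# Illustrative subsets only; these are NOT Kodi's real lookup tables.
ISO_639_1_LANGUAGES = {
    "ca": "Catalan, Valencian",
    "de": "German",
    "en": "English",
}
ISO_3166_1_COUNTRIES = {
    "CA": "Canada",
    "DE": "Germany",
    "GB": "United Kingdom",
}


def case_blind_language_lookup(code):
    """Mimic the problem: treat any 2-character code as a language code."""
    return ISO_639_1_LANGUAGES.get(code.lower())


def country_lookup(code):
    """Look the code up in a dedicated country table instead."""
    return ISO_3166_1_COUNTRIES.get(code.upper())


for region in ("DE", "CA", "GB"):
    # Print the wrong (language) and the intended (country) interpretation.
    print(region, case_blind_language_lookup(region), country_lookup(region))
```

With these sample tables, 'CA' resolves to a language name in the first lookup and to a country name in the second, while 'GB' has no language match at all.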

Proposed solution:  A new lookup table needs to be introduced to return a country name from the provided 'ISO 3166-1 alpha-2' country code.

Issue #4

Similar to Issue #2.

Proposed solution:  This option should return the 3 character ISO-639-2 language code based on a lookup of the 2 character ISO-639-1 language code already present in the langinfo.xml file.

Issue #5

If a 3 character region code is required (to match the 3 character language code), then 'ISO 3166-1 alpha-3' would appear to be the most relevant code to use.

Proposed solution:  A lookup of an 'ISO 3166-1 alpha-3' country code based on the 'ISO 3166-1 alpha-2' country code would provide the desired result.

Findings

A number of functions already exist within Kodi to return some of the required values:

The existing function 'CLangInfo.GetLanguageCode()' already returns the 3 character ISO-639-2 language code ('eng').

The existing function 'CLangCodeExpander.ConvertToISO6391()' can be used to convert the 3 character ISO-639-2 language code to the 2 character ISO-639-1 language code ('eng' -> 'en').

The existing function 'CLangCodeExpander.Lookup()' can be used to obtain the language name (in English) from the 3 character ISO-639-2 language code ('eng' -> 'English').

Recommendation

Return values for the 'ENGLISH_NAME' formatting option should remain unchanged for backwards-compatibility reasons.  Addons may already exist that expect and compensate for the erroneous return values.

A new function needs to be created to perform ISO 3166-1 country code lookups.  Options are required to return the full name or the 3 character code from the 2 character code available from the region node in the langinfo.xml.

A new formatting option, ISO_NAME, should be introduced to return the language name and optionally the country name based on the ISO codes already present in the langinfo.xml file.

Conclusion

I have experimented with the proposed changes on a development fork and they appear to work as expected.

Part of these experiments involve producing a new function 'CLangCodeExpander.LookupISO31661()' that provides lookups between ISO 3166-1 2 character code, 3 character code and full name values.  ('IE' vs 'IRL' vs 'Ireland')
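As a rough idea of what such a lookup does, here is a Python sketch. The real function is C++ inside Kodi and its signature is not shown here; the 'Country' type, the 'lookup_iso_3166_1' name and the four-row table are assumptions for illustration only.

```python
from typing import NamedTuple, Optional


class Country(NamedTuple):
    alpha2: str  # ISO 3166-1 alpha-2 code, e.g. 'IE'
    alpha3: str  # ISO 3166-1 alpha-3 code, e.g. 'IRL'
    name: str    # English short name, e.g. 'Ireland'


# Illustrative subset; a real table would cover every ISO 3166-1 entry.
_COUNTRIES = [
    Country("AU", "AUS", "Australia"),
    Country("CA", "CAN", "Canada"),
    Country("DE", "DEU", "Germany"),
    Country("IE", "IRL", "Ireland"),
]

# Index every entry by its 2 character code, 3 character code and name.
_INDEX = {}
for _country in _COUNTRIES:
    for _key in _country:
        _INDEX[_key.upper()] = _country


def lookup_iso_3166_1(value: str) -> Optional[Country]:
    """Return the entry matching a code or name, or None if unknown."""
    return _INDEX.get(value.strip().upper())


entry = lookup_iso_3166_1("IE")
if entry is not None:
    print(entry.alpha2, entry.alpha3, entry.name)
```

Keeping one table with three keys per row means any of the three values can be converted to the other two with a single lookup.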

I would appreciate feedback on my findings/recommendations before finalising the changes and submitting them for inclusion in the next release.
Reply
#2
I have also seen problems with xbmc.getLanguage(), but in my testing the problem is in Kodi itself.  I am not sure whether these proposed fixes correct that.  When I set the language to English US in Kodi, switch to some other language and then switch back, Kodi's regional settings always go to AU 24h.

But I never really liked getLanguage at all, because I think a BCP47 language code is the real way forward.  There was/is a WIP PR by, IIRC, fbacher to implement the ICU language libraries, which I think, if implemented, might solve some of these problems as a side effect.

scott s.
.
Reply
#3
@scott967 - The problem with xbmc.getLanguage() is two-fold:  1) It is not using the appropriate functions that are already available; and 2) some of the required information and functions to return it do not actually exist within Kodi.

If implemented correctly, xbmc.getLanguage() is capable of returning what appears to be a BCP47 language code already.
Reply
#4
After a little more consideration, I am of the opinion that the solution should be implemented when the language and region are loaded, with the appropriate values cached for other areas of Kodi to use, just as language/region-specific information such as date formats is currently.

'CLangInfo::Load' and 'CLangInfo::SetCurrentRegion' would be appropriate places to lookup the correct ISO names and codes and then store them as properties of their respective objects.
Reply
#5
From the PR discussion, I had spent time observing how xbmc.getLanguage() works with region=true.  My assumption is that the user of that function wants to know which resource.language is in effect.  That pretty much works when there is only one regional variant for the language.  The problem is when there are multiple variants, like es_es and es_mx: the region settings allow the user to select either es or mx as the region.

The result is that getting the region setting value doesn't tell you which language resource is active.  I came across this in an addon that can request a JSON response from an online database with a "language" query field in the API, I guess to get the preferred text in the response.  The addon uses xbmc.getLanguage to set an addon variable for that API.  I admit I haven't looked closely at the API to see exactly what gets changed (like , or . as the numeric separator), which would influence what is "correct".

But the PR does address the "bug" aspect of the current code.

scott s.
.
Reply
#6
Testing this on master.

I don't see how the function's return value is useful, for example with the following Kodi settings:
- UI language English (United States) (resource.language.en_us)
- Region default format Central Europe

Results of a test running through all 8 combinations of parameters:

Code:
script.testlang:  region true - the current language is Kodilanguage639_1 en-DE
script.testlang:  region true - the current language is Kodilanguage639_2 eng-DEU
script.testlang:  region true - the current language is Kodilanguage_eng English (US)-Central Europe
script.testlang:  region true - the current language is Kodilanguage_3166 English-Germany
script.testlang: region false - the current language is Kodilanguage639_1 en
script.testlang: region false - the current language is Kodilanguage639_2 eng
script.testlang: region false - the current language is Kodilanguage_eng English (US)
script.testlang: region false - the current language is Kodilanguage_3166 English-Germany

I just don't see how getting e.g. "English-Germany" is at all useful.  I can see that en-DE as the locale could be useful, but I really think what's needed is "en-US", since this may be used to determine the language requested from something like tmdb, which has "primary translations" available as a list such as:
python:
["en-AU",
  "en-CA",
  "en-GB",
  "en-IE",
  "en-NZ",
  "en-US",
  "eo-EO",
  "es-ES",
  "es-MX",
  "et-EE",
  "eu-ES",
  "fa-IR",
  "fi-FI",
  "fr-CA",
  "fr-FR",]

scott s.
.
Reply
#7
I agree that "English-Germany" is not useful.  However, as far as I can see, the function is performing as designed.  I think that this example could actually be a matter of garbage in, garbage out.

Here is an annotated extract of the langinfo.xml file for the language and region that you indicated.

Extract from langinfo.xml:
<language locale="en">    <=============
##SNIP##
    <region name="Central Europe" locale="DE">    <=============
      <dateshort>YYYY-MM-DD</dateshort>
      <datelong>DDDD, D MMMM YYYY</datelong>
      <time symbolAM="" symbolPM="">H:mm:ss</time>
      <tempunit>C</tempunit>
      <speedunit>kmh</speedunit>
      <timezone>CET</timezone>
    </region>
  </regions>
</language>

Here is a link to the Kodi Wiki page describing how langinfo.xml is interpreted.
(Admittedly, this reference could be considered to be circular because I authored the update over a year ago, probably as a result of the research that I did to produce this PR plus one other PR.)

https://kodi.wiki/view/Language_support#...nginfo.xml

In langinfo.xml:

Code:
<language locale="en">
Denotes that the base language is English.

Code:
<region name="Central Europe" locale="DE">
Denotes that the region selected is Germany, although the text name indicates otherwise.

The display region name is "Central Europe"; however, the associated regional data states "DE".  If you actually wanted "en-US" then the region locale should be "US", not "DE".  I personally would contend that in Europe "en-EU" or "en-GB" would be more appropriate.

In this example, the region setting has more to do with date and time formats than the actual language/dialect being used.
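To show how the two 'locale' attributes combine, here is a small Python sketch that reads them from a langinfo.xml-style document. It is not Kodi's own parser (which is C++); the 'language_region_code' function and the cut-down sample XML are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Cut-down sample in the shape of langinfo.xml; not a real language pack.
SAMPLE_LANGINFO = """\
<language locale="en">
  <regions>
    <region name="Central Europe" locale="DE">
      <dateshort>YYYY-MM-DD</dateshort>
    </region>
    <region name="Australia (24h)" locale="AU"/>
  </regions>
</language>
"""


def language_region_code(langinfo_xml, region_name):
    """Build 'language-REGION' from the locale attributes in the XML."""
    root = ET.fromstring(langinfo_xml)
    language = root.get("locale")
    for region in root.iter("region"):
        if region.get("name") == region_name:
            country = region.get("locale")
            # Fall back to the bare language code if no locale is set.
            return f"{language}-{country}" if country else language
    # Unknown region name: return the language code on its own.
    return language


print(language_region_code(SAMPLE_LANGINFO, "Central Europe"))
```

For the "Central Europe" region this gives the same "en-DE" that appeared in the test output earlier in the thread, because the code follows the 'locale' attribute and ignores the display name.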

As for extracting external data from anywhere, the result returned will probably depend on the rigidity, or flexibility, of the platform in question.  A platform may interpret "en-GB" simply as "en".  If "en-GB" is requested and the service only has data for "en-US", maybe it will return the closest match or return no match.  It will depend on the logic implemented by that external platform.
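As a sketch of one possible matching policy, the Python below tries an exact match first and otherwise offers every supported tag for the same base language. This is only an assumption about how a client or service might behave, not how tmdb or any other platform actually works; the 'candidate_tags' function name is made up, and the list is the one quoted earlier in the thread.

```python
# The "primary translations" sample quoted earlier in this thread.
SUPPORTED = [
    "en-AU", "en-CA", "en-GB", "en-IE", "en-NZ", "en-US",
    "eo-EO", "es-ES", "es-MX", "et-EE", "eu-ES", "fa-IR",
    "fi-FI", "fr-CA", "fr-FR",
]


def candidate_tags(requested, supported):
    """Return the exact tag if supported, else all tags for that language."""
    if requested in supported:
        return [requested]
    language = requested.split("-")[0].lower()
    # No exact match: keep every tag that shares the base language.
    return [tag for tag in supported if tag.split("-")[0].lower() == language]


for tag in ("es-MX", "en-DE", "de-DE"):
    print(tag, candidate_tags(tag, SUPPORTED))
```

Returning a list rather than a single tag leaves the final choice (for example, preferring "en-GB" over "en-US") to the caller, since the region code alone cannot decide it.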

I think that the solution to this particular issue would be to update the Kodi language pack as follows:

Code:
<language locale="en">
##SNIP##
    <region name="Central Europe" locale="GB">    <=============
##SNIP##
</language>
Even though the region name will still say "Central Europe", the language/region returned will be "en-GB".
Reply
#8
Agree that with your PR, it works as designed.  And an issue I had, where script.tv.next.aired was getting an empty string response, should be fixed.  So it's a new requirement/feature.  Actually, in my perfect world, we would provide a BCP-47 compliant language specifier, so en-Latn-GB and, maybe more useful, yue-Hant-HK, which would allow selection of a proper font for unified CJK unicode.  But that's a lot more work.

Anyway, have no objection to merging the current PR.

Thought about this some more; you might consider removing the ISO 3166 bit to a separate PR and make this eligible for a backport into v20.

scott s.
.
Reply
#9
(2024-05-25, 23:44)scott967 Wrote: Anyway, have no objection to merging the current PR.
Thanks for taking the time and effort to review this.

I assume that you'll also add a comment on GitHub in due course.
(2024-05-25, 23:44)scott967 Wrote: Thought about this some more; you might consider removing the ISO 3166 bit to a separate PR and make this eligible for a backport into v20.
Sorry, this PR has been in limbo for over a year (nobody's fault) so I would not actually trust my memory to go back in and split it into multiple sections to facilitate backporting without messing it up.

I never considered backporting before you mentioned it.  Have you identified any areas of this PR that make it unsuitable for backporting in its entirety?
Reply
#10
You can't change the Python API in a backport; only fix bugs.  As it stands right now for Kodi 21.1, if you call xbmc.getLanguage(xbmc.ISO_639_1) with gui language English (US) and region format English (AU) it returns an empty string.
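Until that is fixed, an addon can guard against the empty string. The Python below is only a sketch of such a workaround: the 'language_code_or_default' helper and its "en" default are assumptions, and the xbmc module is only importable when the code runs inside Kodi.

```python
def language_code_or_default(get_language, default="en"):
    """Call get_language() and fall back when it returns an empty string."""
    code = get_language()
    return code if code else default


try:
    import xbmc  # Only available when running inside Kodi.
except ImportError:
    xbmc = None

if xbmc is not None:
    # Wrap the real call so an empty result does not reach later code.
    code = language_code_or_default(lambda: xbmc.getLanguage(xbmc.ISO_639_1))
    xbmc.log(f"language code: {code}")
```

Passing the call in as a function keeps the helper testable outside Kodi, where xbmc cannot be imported.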

scott s.
.
Reply
#11
(2024-05-29, 00:43)scott967 Wrote: You can't change the Python API in a backport; only fix bugs.

Is this restriction an issue of technology or policy?

From memory, the only 'change' is adding the 'ISO_NAME' parameter, everything else just fixed what was wrong, this includes the ISO_639_1 results.

Are you saying the 'ISO_NAME' functionality needs to be moved to another PR?

Are you saying that the 'ISO_639_1' results need to stay broken?
Reply
#12
(2024-05-25, 23:44)scott967 Wrote: And an issue I had, where script.tv.next.aired was getting an empty string response, should be fixed.

I haven't followed or even understood your entire discussion, but since @scott967 mentioned the "TV Show Next Aired"-Addon, I would like to take that up here:
The addon runs under Nexus, but under Omega only if certain values are selected for the "Long date format" setting, as described in https://forum.kodi.tv/showthread.php?tid...pid3198635.

From a user perspective, it is incomprehensible that a setting under Kodi influences an addon in this way and furthermore it should not be the task of the addon creator to intercept this behavior.
I therefore would welcome it if this could be fixed in a backport.
Reply
#13
(2024-05-29, 02:51)DeltaMikeCharlie Wrote:
(2024-05-29, 00:43)scott967 Wrote: You can't change the Python API in a backport; only fix bugs.

Is this restriction an issue of technology or policy?

From memory, the only 'change' is adding the 'ISO_NAME' parameter, everything else just fixed what was wrong, this includes the ISO_639_1 results.

Are you saying the 'ISO_NAME' functionality needs to be moved to another PR?

Are you saying that the 'ISO_639_1' results need to stay broken?
Note that I don't have any "merge" authority.  But the general policy for point releases is that the change is first merged into master, then is eligible for a backport only if there is no functionality change, including any api bump.  At least that is my understanding of how it works.

scott s.
.
Reply
#14
(2024-06-05, 22:11)scott967 Wrote: Note that I don't have any "merge" authority.  But the general policy for point releases is that the change is first merged into master, then is eligible for a backport only if there is no functionality change, including any api bump.  At least that is my understanding of how it works.

scott s.
.

So why not just add it to the next release and forget about backporting?
Reply
#15
(2024-05-25, 23:44)scott967 Wrote: Anyway, have no objection to merging the current PR.

Can you please repeat this statement on GitHub PR?

https://github.com/xbmc/xbmc/pull/23110
Reply
