Sorting issue: Words with accented letters
#1
I noticed that Kodi doesn't correctly sort words that contain accented letters.
In standard dictionary sorting, accented letters are treated as if they had no accent.
However, an actual list of movie titles looks like this:
Image
It shows that titles beginning with Á appear after all entries beginning with A, just before B, and they shouldn't.

I'm entering my third week with Kodi and already know there are some tricks for dealing with languages other than English.
In fact, to get the articles in that list recognized, such as French Les and Spanish El, I edited AdvancedSettings.xml, while Italian (and French) L' required an undocumented entry in that file.
Is there perhaps a way to work around this (incorrect sorting) bug?
#2
There is no "Standard dictionary sorting". Sorting (collation) is language-dependent. So we need to know what language settings you have on your system and Kodi.

scott s.
.
#3
(2017-08-17, 01:42)scott967 Wrote: There is no "Standard dictionary sorting". Sorting (collation) is language-dependent. So we need to know what language settings you have on your system and Kodi.

scott s.
Thank you.
Collating sequences have more than one level:
- in the primary level, diacritical marks, among them accents, are ignored;
- in the secondary level they are taken into account, in accordance with international standards like ISO 14651.
I referred to "standard dictionary sorting" because dictionaries present entries sorted at the primary level: French, German, Portuguese and Spanish dictionaries, for example.
(Windows also sorts according to the primary level, as dictionaries do.)
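To make the two levels concrete, here is a minimal Python sketch. It is only an approximation of multi-level collation, not an implementation of ISO 14651 and not anything Kodi does: it strips combining marks after Unicode decomposition to get a primary-level key, and uses the decomposed string as a tie-breaker for the secondary level. The function names and the sample titles are made up for illustration.

```python
import unicodedata


def strip_marks(text):
    """Remove combining marks (accents) after canonical decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def primary_key(title):
    """Primary level: diacritics and case are ignored."""
    return strip_marks(title).casefold()


def two_level_key(title):
    """Primary level first; accents only break ties (rough secondary level)."""
    return (primary_key(title), unicodedata.normalize("NFD", title))


# Hypothetical titles, not the ones from the screenshot.
titles = ["Avatar", "Água", "Babel", "Alien", "Árvore"]

by_code_point = sorted(titles)                   # plain code point comparison
by_primary = sorted(titles, key=primary_key)     # dictionary-like order
by_two_levels = sorted(["résumé", "resume"], key=two_level_key)

print(by_code_point)
print(by_primary)
print(by_two_levels)
```

With plain code point comparison the Á titles land after everything else, while the primary-level key interleaves them with the unaccented A titles, which is the order I would expect from a dictionary.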
For the list I used as an example, the different collating levels produce these results:

Image

Rephrasing my question:
How can I choose a collating sequence level without having to switch to a whole different interface language?
Is there some configuration file or entry to achieve that?
In particular, I would like to be able to choose primary level collating sequence, like language dictionaries.
#4
Thanks for clarifying. AFAIK (and my knowledge isn't deep), Kodi defaults to the C locale and then attempts to get the system locale from the environment. I believe that sets the collation category (the LC_COLLATE variable). When sorting strings, I believe Kodi first removes the selected articles where appropriate and then does a standard C string comparison. So right off the bat, I think you have to consider platform differences in locale handling (I don't know that Windows is equivalent to Linux, for example).
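As a rough illustration of that pipeline (strip a leading article, then compare the remaining strings the way the C locale would), here is a small Python sketch. It is not Kodi's code: the article list, the function name and the titles are assumptions for the example, and Python's default string comparison by code point only stands in for a C-locale comparison. Kodi's real comparison may well behave differently.

```python
# Hypothetical article list for illustration; Kodi takes its own list
# from its settings, not from this tuple.
ARTICLES = ("The ", "Les ", "El ", "L'")


def strip_article(title, articles=ARTICLES):
    """Drop one leading article, matched case-insensitively."""
    for article in articles:
        if title[:len(article)].casefold() == article.casefold():
            return title[len(article):]
    return title


# Made-up titles, not taken from the screenshots in this thread.
titles = ["The Abyss", "El Ángel", "Les Amants", "L'Avventura", "Zodiac"]

# Sort on the article-free text using plain code point comparison.
plain = sorted(titles, key=strip_article)
print(plain)
```

In this sketch "El Ángel" ends up after "Zodiac", because Á has a higher code point than any unaccented ASCII letter. That shows how a locale-unaware comparison misplaces accented titles, even if the exact position differs from what Kodi displays.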

I don't know, but I assume that setting the "language" in the interface settings forces collation to the locale derived from that language.

scott s.
.
#5
The language setting does not affect the collating sequence.
To test this, I changed the language to Portuguese (Brazilian), and the example list stayed in the same order as in English, which I consider incorrect:

Image

Isn't there a way for a user to define a collating table manually, in order to fix this?

I ask because there is a way to add articles (other than The) to be ignored in sorting: editing AdvancedSettings.xml.
Perhaps there's a similar way to enter a table to get sorting to work as it should.
#6
Haven't forgotten about this. I do see that with my Windows locale set to English (US), and Kodi also set to English (US), collation is wrong for accented Latin-1 characters. I don't understand C/C++ on Windows well enough to figure out just how Kodi sets and uses the collation (really, the string compare functions).

scott s.
.
#7
(2017-08-30, 22:11)scott967 Wrote: Haven't forgotten about this. I do see that with my Windows locale set to English (US), and Kodi also set to English (US), collation is wrong for accented Latin-1 characters. I don't understand C/C++ on Windows well enough to figure out just how Kodi sets and uses the collation (really, the string compare functions).

scott s.
Thank you for the feedback; the issue remains unresolved.

It really surprises me that no one besides you has responded to this post...

Kodi is a very carefully crafted application and it is widely used, not only by people who don't care about languages with diacritics (accents); in fact, English is one of the few languages that don't use them.
Doesn't anybody else want titles and names sorted the way dictionaries sort them?
