Show originaltitle (international original movie title) of movies in an extra field?
#61
carmatana Wrote: I am aware that my movie collection may be different from other users' in terms of AKAs "messiness". I am just trying to help and, as you may note, my comments are based on very detailed and extensive data.

Yes, I noted and I do appreciate your feedback. Thank you for the extensive reports!

I don't have much time to test, but I made some changes to the AKAs logic. Could you please run through your movies?

http://pastebin.com/raw.php?i=yfV6BfSV
Reply
#62
I will try tomorrow. In the meantime, here is another idea that may help and that is in line with the XBMC naming guidelines.

My movie collection is organized one movie per folder:

--Harry Potter 1 (2001) -abcd < Folder
-----HP1.avi
--Harry Potter 2 (2003) -bcde < Folder
-----HP 2.avi

The -abcd and -bcde suffixes after the folder name are tags that I use to identify the movie, e.g. it may be a movie that Rossi, my wife, wants to watch, so I put:

Harry Potter 1 (2001) -Rossi

or maybe it is a movie to watch when I am in the mood for nothing special (Harry Potter 2 (2003) -nothingspecial)

I also always use the "Use folder names for lookups" option. What if:

Option 1:
The IMDB scraper compares the folder name with all the AKAs and the Original Title and picks the most similar one, e.g. by matching only the first X characters (or the characters before the year). Cons: this depends on not having any typos in the folder name and on following the XBMC recommended naming. Or worse, it may require some kind of fuzzy logic (or maybe with regex it is not so difficult).

Option 2:
Take the folder name, without the year and the other comments, as the movie title. Cons: this is the easiest, but it will not allow false positives to be identified easily, as it may look as if the movie was perfectly identified, at least when reading only the title.
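To make the two options a bit more concrete, here is a minimal Python sketch. It is not how the XBMC scraper works (that is regex-based XML); the function names, the folder pattern "Title (Year) -tag", the 0.6 similarity threshold, and the candidate titles are all assumptions for illustration.

```python
import difflib
import re

# Assumed folder layout: "Title (Year) -tag", e.g. "Harry Potter 1 (2001) -Rossi".
FOLDER_RE = re.compile(
    r"^(?P<title>.+?)\s*\((?P<year>\d{4})\)\s*(?:-(?P<tag>.*))?$"
)


def parse_folder(name):
    """Option 2: split a folder name into (title, year, tag)."""
    m = FOLDER_RE.match(name.strip())
    if m is None:
        # No "(year)" found: fall back to the whole name as the title.
        return name.strip(), None, None
    return m.group("title"), int(m.group("year")), m.group("tag")


def closest_title(folder_name, candidates, threshold=0.6):
    """Option 1: return the candidate most similar to the folder title.

    Returns None when nothing reaches the (arbitrary) threshold.
    """
    wanted = parse_folder(folder_name)[0].lower()
    best_score, best = 0.0, None
    for cand in candidates:
        score = difflib.SequenceMatcher(None, wanted, cand.lower()).ratio()
        if score > best_score:
            best_score, best = score, cand
    return best if best_score >= threshold else None


print(parse_folder("Harry Potter 1 (2001) -Rossi"))
# ('Harry Potter 1', 2001, 'Rossi')

# Placeholder candidates standing in for the AKAs and Original Title.
candidates = ["Harry Potter 1", "Harry Potter 2"]
print(closest_title("Harry Poter 1 (2001) -Rossi", candidates))
```

The similarity ratio tolerates small typos in the folder name, which addresses the first con of Option 1, but a low threshold can also produce the false positives mentioned for Option 2.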

Not all users may be happy with it, but it can be an option that some may like to choose, and those who don't can use the traditional method.

These are just some ideas for dealing with the AKAs mess. I am sure that the new file will be much better, but I would bet that it is not perfect (or, the other way around, it may be perfect, but garbage in (AKAs), garbage out).

Another con of both options: again, it depends on how each person has organized their movies and how they choose to scrape them.

Thanks again.
Reply
#63
Hi Olympia,

I tested the latest version. It corrects most, if not all, of the problems in USA-English movies that V2 (Original minus 175-180 lines) brought.

Results are almost identical to the Original version: improved in 1 or 2 titles, worse in 1 or 2 titles, so no conclusion. At this point, after having looked at the IMDB AKAs and learnt that they are far from well structured, I would be indifferent between the Original and the Latest version. And I can't propose more alternatives within the existing constraints.

At least some of the issues of IMDB and the scraper were made explicit in these posts and may be helpful in the future.

Thanks again for all your attention and support.
Reply
#64
Hmmm... I've just tested and the below examples are ALL fixed:

Code:
9th Company
Black Book
Come and See
Mesrine: Killer Instinct
Mesrine: Public Enemy No. 1
Tae Guk Gi: The Brotherhood of War
The Barbarian Invasions
The Secret of the Grain

Are there other non-OK examples?
Reply
#65
Let me check. I tested the 3 versions with two different settings (usa+intl and only usa), so I probably overwrote some results, confused the outputs, or did not remove the forced title in the movie.nfo file. I was extracting the results with sqlite, and I am not an expert (honestly, I was falling asleep).

By the way, this is something that has to be tested: how good is each version if there is no movie.nfo with the IMDB URL? Do false positives increase or decrease? I will try to run a couple of tests taking this variable into consideration.
Reply
#66
FYI, changing 'usa /international' to only 'usa' in settings.xml doesn't make any sense and completely screws up the scraper logic. You might get better results, but this is only a coincidence (not entirely, but more or less).

All in all I am not interested in any test results with that hack as it is not useful at all for me.

BTW, these changes have no effect on the search results at all. It is the same with all the modifications we tried.
Reply
#67
I finished. You were right, I was doing something wrong. Here are my results.

(If you PM me your email address, I can send you an xls or csv file, which is friendlier and maybe more useful.)

Scraped Movies: 220 (all with "USA / International")

Mismatches:

Original Version (1V): 18
2nd Version (2V): 33
3rd Version (3V): 13

Movies with mismatch in only one version: 28
Movies with mismatches in two versions: 9
Movies with mismatches in the 3 versions: 6

Combined mismatches (sum of above): 43
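As a quick sanity check on the figures above, the per-version counts and the overlap counts can be cross-checked with a few lines of Python. The numbers are the ones reported in this post; the variable names are only illustrative.

```python
# Mismatches reported per scraper version.
per_version = {"1V": 18, "2V": 33, "3V": 13}

# Number of movies that mismatch in exactly 1, 2 or 3 versions.
by_overlap = {1: 28, 2: 9, 3: 6}

# Distinct movies with at least one mismatch.
distinct_movies = sum(by_overlap.values())

# Each movie counts once per version in which it mismatches,
# so both totals below should agree.
total_from_overlap = sum(k * n for k, n in by_overlap.items())
total_from_versions = sum(per_version.values())

print(distinct_movies)                          # 43
print(total_from_overlap, total_from_versions)  # 64 64
```

Both totals agree, so the breakdown is internally consistent: 43 distinct movies account for the 64 mismatches counted across the three versions.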


MOVIES WITH MISMATCHES IN THE 3 VERSIONS:

Code:
American Title    IMDB_ID    Year    Country

The Good, the Bad, the Weird    tt0901487    2008     South Korea
Tae Guk Gi: The Brotherhood of War  tt0386064  2004     South Korea
Ran    tt0089881    1985    Japan / France
Mesrine: Public Enemy #1    tt0411272    2008    France / Canada
Mesrine: Killer Instinct    tt1259014    2008    France / Canada / Italy
Das Experiment    tt0250258    2001    Germany


Maybe I am too strict but, in the case of Mesrine, the American version title (or perhaps I am wrong in my definition of the American title) differs slightly. In 2V and 3V they appear as:

Mesrine: Part 1 - Killer Instinct
Mesrine: Part 2 - Public Enemy #1

In other cases, such as Tae Guk Gi and Das Experiment, it was clear that there is no way to get the right title from the AKAs.

In some cases forcing the settings to "USA" was useful but, as you said, that is probably a coincidence.

IMPROVEMENTS IN 3V versus 1V

Code:
American Title    IMDB_ID    Year    Country

Time of the Gypsies    tt0097223    1988    UK / Italy / Yugoslavia
The Barbarian Invasions    tt0338135    2003    Canada / France
Black Book    tt0389557    2006    Netherlands / Germany / UK / Belgium
9th Company    tt0417397    2005    Finland / Russia / Ukraine
The Secret of the Grain    tt0487419    2007    France
Swades: We, the People    tt0367110    2004    India
Das Boot    tt0082096    1981    West Germany
Come and See    tt0091251    1985    Soviet Union


WORSE IN 3V THAN 1V

Code:
The Bandit        tt0116231    1996      Turkey (In 3V = Eskiya)
Infernal Affairs    tt0338564    2002       Hong Kong (In 3V = Non-Stop Way)
Monty Python and the Holy Grail    tt0071853    1975    UK (In 3V = Mønti Pythøn ik den Høli Gräilen)

It is worth noting that the first 2 movies were matched correctly in 2V.

BETTER IN 2V than 1V or 3V

Code:
Once Upon a Time in the West    tt0064116    1968    Italy / USA
Crouching Tiger, Hidden Dragon    tt0190332    2000    Taiwan / Hong Kong / USA / China
Blood In, Blood Out    tt0106469    1993    USA
Black Cat, White Cat    tt0118843    1998    Federal Republic of Yugoslavia / France / Germany / Austria / Greece / USA

MY CONCLUSION:

So, in general, 2V was better than 3V in 6 titles (these four and the 2 mentioned above). Something tells me that a better outcome could be achieved by combining some parts of 3V with 2V, even though 2V was, in absolute terms, the worst. Obviously, I have no idea how this could be done.

I thought that 2V was better for non-USA movies, but the funny thing is that these four movies have "USA" in the country field.

Happy to make more tests or provide more info if I can help.


P.S.

Sorry, but I couldn't resist. 3V "USA / Intl" and 3V with only "USA" have the same number of mismatches (13); 9 of them are in common and the others are different. So you were right, it seems to be random.
Reply
#68
Wow, extensive testing! Cheers for this.

Can you please give this one a try?
http://pastebin.com/raw.php?i=3Cbbgdjw

I don't need a version comparison. A simple run-through is enough for me; just give me the titles which are not OK with this newest version.
Reply
#69
It is getting better, just 8 mismatches:

Code:
American_Title    IMDB_ID    Year    Country    V4 Title

Das Experiment    tt0250258    2001    Germany    The Experiment
Mesrine: Public Enemy #1    tt0411272    2008    France / Canada    Mesrine: Public Enemy No. 1
The Good, the Bad, the Weird    tt0901487    2008    South Korea    Nom Nom Nom
Monty Python and the Holy Grail    tt0071853    1975    UK    Mønti Pythøn ik den Høli Gräilen
The Shawshank Redemption    tt0111161    1994    USA    Sueño de fuga
Before Sunset    tt0381681    2004    USA    Antes del atardecer
The Butterfly Effect    tt0289879    2004    USA / Canada    El efecto mariposa
Walk the Line    tt0358273    2005    USA / Germany    Johnny & June - Pasión y locura
Reply
#70
OK, here is the latest one:
http://pastebin.com/raw.php?i=W1E4Vkzi

I think that's the best I can do. Can you give it a try?
Reply
#71
Hi Olympia,

Wow, much better!!!

4 Mismatches:

1) Das Experiment (2001) - scraper returned "The Experiment"

I guess that there is nothing else to do with this movie. There is no cue in the AKAs to find the right one; even in TheMovieDB it appears as "The Experiment". Maybe I am wrong about what I think is the right title.

2) The Good, the Bad, the Weird (2008) - scraper returned "Nom Nom Nom"

Same here, nothing can be done.

3) The Girl Who Kicked the Hornet's Nest (2009) - result: "The Girl Who Kicked the Hornets' Nest"

This version is the first one having this mismatch. Maybe something can be done.

In Akas you find, among others, the following:

Code:
The Girl Who Kicked the Hornet's Nest     International (imdb display title) (English title)
The Girl Who Kicked the Hornets' Nest     UK (imdb display title)

The International - English title is the right one. However, given the AKAs "messiness", I would say that it is safer and smarter to pick the UK title.

4) Mesrine: Public Enemy #1 (2008) - scraper returned: "Mesrine: Public Enemy No. 1"

In the AKAs, among others, you find:

Code:
Mesrine: Public Enemy No. 1     UK / USA
Mesrine: Public Enemy Number One     UK (alternative spelling) / USA (alternative spelling)
Mesrine: Public Enemy #1     USA (new title)

So it seems like, in the previous title, the scraper (and its designer) is working INTELLIGENTLY when there is ambiguity.


Something worth noting about this version (the 5th) is that for all 100% USA (or 100% UK or other English-speaking country) movies, the scraper did a perfect job. This was not the case for versions 2, 3 and 4.

Even in version 1 (original one), the movie "Blood in, blood out" was returned with the alternative, but also correct USA, title "Bound by honor".

I do not know about other users but, as mentioned, I prefer consistency over accuracy, and I would not like to have to worry about USA (or similar) movies.

Great job!!! Congratulations.

Are you going to update the scraper in XBMC download center?
Reply
#72
Hi carmatana,

I am glad it's getting better. Believe me, it's really a beast to deal with... :)

However, since we're already into it, before I push this to the official repo I might try to resolve some more (not sure yet how to do it without screwing up something else again, but I have some ideas).

You're correct that nothing can be done about 'Das Experiment' and 'Mesrine...'. The USA (first matched) title wins over the international title in the new logic. We could catch the 'new title' (Mesrine), but I am almost sure other movies would require the old one, not the new.
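To illustrate the "first matched USA title wins over international" rule, here is a rough Python sketch. The real scraper is regex-based XML, so this is only a model of the priority order described above; the function names and the way region labels are parsed are assumptions.

```python
def has_region(label, wanted):
    """True if one of the "/"-separated regions in label is `wanted`,
    optionally followed by a qualifier such as "(new title)"."""
    for part in label.split("/"):
        region = part.strip()
        if region == wanted or region.startswith(wanted + " ("):
            return True
    return False


def pick_title(akas, original_title):
    """akas: list of (title, region_label) pairs in page order.

    Priority: first USA entry, then first International entry,
    otherwise the original title.
    """
    for wanted in ("USA", "International"):
        for title, label in akas:
            if has_region(label, wanted):
                return title
    return original_title


# AKAs for Mesrine as quoted earlier in this thread.
mesrine = [
    ("Mesrine: Public Enemy No. 1", "UK / USA"),
    ("Mesrine: Public Enemy Number One",
     "UK (alternative spelling) / USA (alternative spelling)"),
    ("Mesrine: Public Enemy #1", "USA (new title)"),
]
print(pick_title(mesrine, "ORIGINAL TITLE PLACEHOLDER"))
```

With this ordering the first USA entry, "Mesrine: Public Enemy No. 1", is returned, which is why the 'new title' is never reached unless it gets special treatment.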

However I feel some motivation to work out the issue with:
- The Girl Who Kicked the Hornet's Nest
- The Good, the Bad, the Weird

Once more, not sure I can do it, but if you're still willing to help and test I would try.
Reply
#73
Sure, I can help.
Reply
#74
Cool, then let's give this a run:
http://pastebin.com/raw.php?i=SyySbuV5
Reply
#75
Hi Olympia,

Tell me the truth, you are hard-coding the difficult titles in the xml file and that's the reason it is getting better, right?

The overall outcome is 3 mismatches, 1 new and 2 present in the previous version.

Surprisingly, and against my bets and partially against yours, "Das Experiment" and "The Good, the Bad, the Weird" are correctly matched with the American title in this version.

The two that are repeated from the previous version are:

*) Mesrine: Public Enemy #1
*) The Girl Who Kicked the Hornet's Nest

The new one is:

*) Heartbreaker (2010) - ID: tt1465487, Countries: France / Monaco

The scraper got "L'arnacoeur", which is the French original title.

Its AKAs:

Code:
Heartbreaker     Finland (imdb display title) / International (imdb display title) (English title) / Sweden (imdb display title) / UK
Сердцеед     Russia
Como Arrasar um Coração     Brazil
Der Auftragslover     Germany (imdb display title)
Epangelmatias kardiokataktitis     Greece (transliterated ISO-LATIN-1 title)
Heartbreaker. Licencja na uwodzenie     Poland (imdb display title)
I truffacuori     Italy (imdb display title)
Il truffacuori     Italy (imdb display title)
Los seductores     Spain (imdb display title)
Rompecorazones     Argentina (imdb display title)
Srcolomac     Croatia (imdb display title)
Szívrablók     Hungary (imdb display title)

Weird. I would have said this is an easy target, but you are right and I believe you: this is a hard beast to deal with.

And, off the record, because you do not seem receptive to this type of comment: this version and the previous one are now quantitatively and qualitatively superior to EMM-R, which I mentioned in one of my first posts.

Still willing to help.
Reply
