Posts: 71
Joined: May 2012
For example, my folder is named [Ice Age - The Meltdown (2006)]. When the IMDb scraper scrapes it, it always picks up
http://www.imdb.com/title/tt0795398/, which is actually the video game. Why isn't it picking up the correct title, which is an almost 100% match? Also, why does the IMDb scraper pick up video game titles at all? Shouldn't it be ignoring those entries?
Posts: 3,204
Joined: May 2008
Reputation: 107
olympia
Team-Kodi Member
Don't you have an nfo with the above link in it?
Posts: 71
Joined: May 2012
NFOs are only created after exporting the library. I'm talking about the initial scan, or a new scan with newly added content.
olympia
Team-Kodi Member
What happens if you refresh the movie? Do you see the correct one in the list?
Posts: 71
Joined: May 2012
2012-11-15, 04:36
(This post was last modified: 2012-11-15, 04:42 by edhen.)
Yeah, I do see the correct one, right after the first option. If it didn't take video games (VG) into consideration, it would no doubt scrape the correct one.
And I don't know how else I could name it any better: "Ice Age 2: The Meltdown" is the game and "Ice Age: The Meltdown" is the movie, and my folder/file name doesn't have the 2, which in my opinion should make it a better match for the movie. While manual work is possible, this doesn't happen with just one movie, some movies may get overlooked, and I frequently start a scan from scratch.
Posts: 71
Joined: May 2012
If I export, it's just going to export what it has already scraped. I don't think you understand the issue I'm raising, namely whether or not this is a bug.
olympia
Team-Kodi Member
I understand. I just don't see why you are complaining if very few movies fail to scrape. I also don't see why it makes sense to always re-scrape everything from scratch (with this approach you are also hitting the content providers pretty heavily, by the way).
...and finally, you can always place an nfo file containing only the correct IMDb link, and the scraper will use that to scrape.
Let me know if you find a good and safe way to filter out VG from the IMDb search and I will have a look at it.
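The nfo route suggested above is easy to script. A minimal sketch, assuming the nfo is named `movie.nfo` and sits inside each movie's folder (an nfo named after the video file is the other common convention, so check which one your setup expects). The helper name `write_url_nfo` is hypothetical; the file body is nothing but the IMDb URL.

```python
from pathlib import Path


def write_url_nfo(movie_dir: str, imdb_url: str, nfo_name: str = "movie.nfo") -> Path:
    """Write an nfo file whose only content is the IMDb title URL.

    nfo_name defaults to "movie.nfo" (an assumption; an nfo named after
    the video file itself is the other common convention).
    """
    nfo_path = Path(movie_dir) / nfo_name
    # The file holds just the URL; the scraper is expected to pick it up
    # and fetch the remaining details on the next scan.
    nfo_path.write_text(imdb_url.strip() + "\n", encoding="utf-8")
    return nfo_path
```

Usage would look like `write_url_nfo("/path/to/Ice Age - The Meltdown (2006)", "http://www.imdb.com/title/ttXXXXXXX/")`, where both the path and the `ttXXXXXXX` id are placeholders for your own values.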
Posts: 71
Joined: May 2012
2012-11-17, 23:12
(This post was last modified: 2012-11-17, 23:16 by edhen.)
I never meant to seem like I'm complaining (if I were complaining, I would have actually searched for and fixed it myself instead of raising the issue here). I'm raising an issue (2 completely different scenarios). I asked a question about something that may be a bug. If this issue was caused by an overlooked regular expression and is easily fixable, then maybe a dev of the IMDb scraper can attend to it for the next release. After all, XBMC itself is so fluid that it only seems right for add-ons and extras to follow suit.
I will also have a look into the nfo file routine. The reason I don't stick with nfos is that I've had situations where I decided to change scrapers, which requires the nfos to be removed, hence starting again.
I'll have a look into the code for the IMDb scraper later today and get back to you. I was hoping someone else might have noticed this and had their own fix to share.
olympia
Team-Kodi Member
It's not a regular expression and/or scraper issue. XBMC core sorts the search results returned by the scraper on its own, trying to list the best match first.
I am not asking you to look at the IMDb scraper code. The scraper obviously uses the IMDb search engine. I meant that if you can find a _good_ way on the IMDb site to filter out VG, I will make sure the scraper gets updated with your finding.
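To illustrate what "sorting the results to list the best match first" can look like, here is a rough sketch that ranks candidate titles by string similarity after turning punctuation into spaces, with a matching release year as a tie-breaker. This is not XBMC's actual code: the normalisation rule, the `rank_results` name and the use of Python's `difflib` are assumptions chosen for illustration.

```python
import re
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


def normalise(title: str) -> str:
    """Lower-case and turn punctuation into spaces, so that
    'Ice Age - The Meltdown' and 'Ice Age: The Meltdown' compare equal."""
    return " ".join(re.sub(r"[^\w\s]", " ", title.lower()).split())


def rank_results(query: str, year: Optional[int],
                 results: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Sort (title, year) candidates so the closest match comes first."""
    q = normalise(query)

    def score(candidate: Tuple[str, int]) -> Tuple[float, int]:
        title, cand_year = candidate
        # Primary key: title similarity in [0, 1].
        similarity = SequenceMatcher(None, q, normalise(title)).ratio()
        # Secondary key: 1 if the year matches the one from the folder name.
        year_match = int(year is not None and cand_year == year)
        return (similarity, year_match)

    # sorted() is stable, so equally scored results keep their original order.
    return sorted(results, key=score, reverse=True)
```

With the candidates `("Ice Age 2: The Meltdown", 2006)` and `("Ice Age: The Meltdown", 2006)` and the query `"Ice Age - The Meltdown"`, this sketch ranks the movie first, because normalisation makes the folder name and the movie title identical. Whether XBMC's core normalises titles in a similar way is not something this thread establishes.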
Posts: 71
Joined: May 2012
2012-11-18, 01:57
(This post was last modified: 2012-11-18, 02:05 by edhen.)
Well, I made my own IMDb scraper about a month ago, for a script that ran after a copy process on my system. My script used curl and regular expressions on the returned HTML to extract the required information. Since I haven't read the code, I can only assume the scraper does the same kind of thing. Considering IMDb doesn't have a public API, I can only assume the information is gathered via curl or similar (especially considering that scrapers break when the HTML layout changes). For example,
on http://www.imdb.com/title/tt0303016/ you can see that the title says video game, and the same goes for
http://www.imdb.com/title/tt0795398/. It is also shown as (VG) in search results.
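Along the lines of the curl-and-regex approach described above, here is a minimal sketch that drops entries marked `(VG)` from a list of search-result strings, and checks a fetched title page's `<title>` text for "Video Game". It works on strings you have already downloaded (no network access), and the exact markers are assumptions based on this post; IMDb's HTML changes over time, which is exactly why such regexes break.

```python
import re
from typing import List

# Assumed markers, taken from the description above: "(VG)" in search
# results and "Video Game" in a title page's <title>. IMDb's markup
# changes over time, so both patterns may need updating.
VG_RESULT_RE = re.compile(r"\(VG\)")
TITLE_TAG_RE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def drop_video_games(results: List[str]) -> List[str]:
    """Keep only search-result entries that are not marked (VG)."""
    return [r for r in results if not VG_RESULT_RE.search(r)]


def page_is_video_game(html: str) -> bool:
    """Return True if the page's <title> text mentions 'Video Game'."""
    match = TITLE_TAG_RE.search(html)
    return match is not None and "video game" in match.group(1).lower()
```

For instance, `drop_video_games(["Ice Age: The Meltdown (2006)", "Ice Age 2: The Meltdown (2006) (VG)"])` keeps only the first entry; these sample strings are made up for illustration, not copied from IMDb.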
Oh, and on top of that, there is an advanced search feature which can exclude video games.
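As a sketch of how the advanced search could be used from a script: the snippet below builds a search URL that asks only for feature films, so video games are excluded by simply not requesting them. The endpoint `http://www.imdb.com/search/title` and the `title` and `title_type` parameter names are assumptions based on IMDb's advanced title search and should be checked against the live site before relying on them.

```python
from typing import Tuple
from urllib.parse import urlencode

# Assumed endpoint and parameter names for IMDb's advanced title search;
# verify them against the live site before relying on this.
ADVANCED_SEARCH_URL = "http://www.imdb.com/search/title"


def advanced_search_url(title: str, title_types: Tuple[str, ...] = ("feature",)) -> str:
    """Build a search URL restricted to the listed title types.

    Video games are excluded simply by not listing their type.
    """
    params = {"title": title, "title_type": ",".join(title_types)}
    # urlencode() percent-encodes values and turns spaces into '+'.
    return ADVANCED_SEARCH_URL + "?" + urlencode(params)
```

Building the filter into the query, rather than post-filtering the results, keeps the unwanted entries out of the list the core has to sort in the first place.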
Posts: 71
Joined: May 2012
Also, if I have my script create an nfo file within the movie folder, do I only need to have
Code:
<movie>
<id>(MOVIE_ID)</id>
</movie>
so that when it scans, it gets the movie and the rest of the details automatically?
Posts: 26,215
Joined: Oct 2003
Reputation: 187
Unfortunately not; it's a bit messy. You need the IMDb URL. IIRC we mainly match on tt######, but there may be more to it than that, so just drop in the whole URL to be sure.
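Since the match is mainly on the `tt` number, a script that generates nfo files can pull that id out of whatever it has (a URL or an `<id>` element) and turn it back into a full URL. A minimal sketch, assuming ids of seven or more digits as in the URLs quoted in this thread (newer IMDb ids can be longer than seven digits); `imdb_id_from_text` and `canonical_title_url` are hypothetical helper names.

```python
import re
from typing import Optional

# "tt" followed by at least seven digits, e.g. tt0795398.
IMDB_ID_RE = re.compile(r"\btt\d{7,}\b")


def imdb_id_from_text(text: str) -> Optional[str]:
    """Return the first IMDb title id found in text (a URL or nfo body), or None."""
    match = IMDB_ID_RE.search(text)
    return match.group(0) if match else None


def canonical_title_url(imdb_id: str) -> str:
    """Rebuild the full title URL, which is what should go into the nfo."""
    return f"http://www.imdb.com/title/{imdb_id}/"
```

Fed the `<id>` style nfo from the question, `imdb_id_from_text` recovers the id and `canonical_title_url` produces the full URL to write instead.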
Posts: 71
Joined: May 2012