Kodi Community Forum
Release Universal Movie Scraper - Printable Version

+----- Thread: Release Universal Movie Scraper (/showthread.php?tid=129821)



RE: [Release] Universal Scraper - Martijn - 2012-06-06

(2012-06-06, 11:15)john.doe Wrote: Wow, you make it sound so easy. "Just fix RottenTomatoes". Yeah, let's all fix the nearly 1 million movies listed at RT, going through them one by one and ensuring that they all have an IMDb link. Geeze. Sounds so easy when you put it like that.

Call me when you are done with that.
I already did all my movies on themoviedb.org,
including tagging the posters with the correct language. So that accounts for 1500 of them.
Who says you need to do them all yourself? Nobody said such a thing. You can start by doing your own collection.

Quote:ACTUALLY SOLVE THE ISSUE AND CREATE A USABLE SCRAPER INSTEAD OF DREAMING ABOUT MANUALLY EDITING THE METADATA OF NEARLY A MILLION MOVIES AT RT.
Why not ask RT to fix it themselves? They could easily write the same code to compare their site with IMDb or TMDb and fix anything that is missing. No manual editing involved. They can fix their data far better, faster, and more easily than we can.

Quote:The last point is the most important one. In an ideal world, we could fix RT and link every single movie to its IMDb ID, but in reality it's just too much work. So by doing this very accurate workaround, we'd have an accurate scraper either way. In fact the scraper could log messages suggesting that the person contributes the IMDb <-> RT connection it has found for each movie where no such connection existed.

As olympia already said, this is currently not possible within scrapers. Perhaps something like this could be done nearer the final release of Frodo.
So let's put this to rest in this thread (or create another one in the scraper section) and start talking about the scraper itself again.





RE: [Release] Universal Scraper - john.doe - 2012-06-06

I agree with everything you just said (that people should contribute, that RT should optimally fix it themselves, and that it was news to me that scraper plugins can't do extra http requests on their own - but oh well). Let's get back to the topic. Again to olympia: Thank you so much for this fantastic plugin; the ability to combine sources really adds control over the database. Love it!


RE: [Release] Universal Scraper - olympia - 2012-06-06

No one said scrapers can't do extra http lookups...
However, they can't do that sophisticated comparison of actors (btw, that would be a lot of work even with Python).

Anyway, since it is not usable, I am thinking about removing the option to scrape ratings from RT.


RE: [Release] Universal Scraper - john.doe - 2012-06-06

(2012-06-06, 13:21)olympia Wrote: No one said scrapers can't do extra http lookups...
However, they can't do that sophisticated comparison of actors (btw, that would be a lot of work even with Python).

Anyway, since it is not usable, I am thinking about removing the option to scrape ratings from RT.

Ah so http requests *are* possible. Well, yes, it's a bit of work. The IMDb cast-list is already available from the initial IMDb scrape. From there, it'd be one http request to RT to find movies matching the IMDb title + year, then parsing the returned page to see if a good candidate match was in the results, if so, it's another request to the RT page for that movie, and from there it's a regexp to extract the cast list. Next, it's a loop / tuple array comparison to check if the top 5 cast listed at IMDb all exist on the RT page that we found. Not a massive amount of work, but definitely a bit tricky to implement...
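
As a rough illustration of the comparison step described above, here is a minimal Python sketch. It assumes both cast lists have already been extracted as plain lists of names; the function names `normalize_name` and `top_cast_matches` are hypothetical, the names in the usage lines are placeholders, and no request to IMDb or RT is made. It is standalone Python, not something that runs inside a scraper.

```python
def normalize_name(name):
    """Lowercase a name and collapse whitespace so trivial differences don't matter."""
    return " ".join(name.lower().split())


def top_cast_matches(imdb_cast, rt_cast, top_n=5):
    """Return True if the first top_n IMDb cast members all appear in the RT cast list.

    Both arguments are plain lists of names (an assumption of this sketch);
    an empty IMDb cast list never counts as a match.
    """
    rt_names = {normalize_name(n) for n in rt_cast}
    top = [normalize_name(n) for n in imdb_cast[:top_n]]
    return bool(top) and all(n in rt_names for n in top)


# Placeholder data, not real cast lists.
imdb_cast = ["Actor One", "Actor Two", "Actor Three"]
rt_cast = ["actor two", "Actor  One", "Actor Three", "Actor Four"]
print(top_cast_matches(imdb_cast, rt_cast))
```

Requiring every one of the top names to be present is the strict rule from the post; a looser threshold (say, four out of five) would be a separate design choice.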

However, all this being said, I support the decision to remove RT; might as well do that, since it'd be a hassle to implement the stuff I've described and it still wouldn't be 100% perfect (e.g. "Tucker and Dale vs Evil" at IMDb and "Tucker & Dale vs Evil" at RT; a simple example that would probably give results anyway, but there could be worse examples where even this trick wouldn't be able to find the movie at RT).
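
The "and" vs "&" mismatch mentioned above could be reduced by normalizing titles before comparing them. A minimal sketch, assuming plain ASCII titles; `normalize_title` is a hypothetical helper, and it would not rescue every mismatch between the two sites.

```python
import re


def normalize_title(title):
    """Lowercase a title, spell out '&' as 'and', and drop punctuation.

    Only a-z and 0-9 survive, so accented or non-Latin titles are not
    handled by this sketch.
    """
    t = title.lower().replace("&", " and ")
    t = re.sub(r"[^a-z0-9]+", " ", t)  # runs of other characters become one space
    return t.strip()


imdb_title = "Tucker and Dale vs Evil"
rt_title = "Tucker & Dale vs Evil"
print(normalize_title(imdb_title) == normalize_title(rt_title))
```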

I am happy with a combination of TMDb and IMDb in the Universal Scraper, after having quickly realized that RT matching had these issues.

Would have been really cool to be able to go from IMDb ratings to RT critic ratings + critic summary for movies, but it's not the end of the world. We've survived this far without that feature.

Thanks again for this fantastic add-on.


RE: [Release] Universal Scraper - Martijn - 2012-06-06

(2012-06-06, 15:54)john.doe Wrote: Ah so http requests *are* possible. Well, yes, it's a bit of work. The IMDb cast-list is already available from the initial IMDb scrape. From there, it'd be one http request to RT to find movies matching the IMDb title + year, then parsing the returned page to see if a good candidate match was in the results, if so, it's another request to the RT page for that movie, and from there it's a regexp to extract the cast list. Next, it's a loop / tuple array comparison to check if the top 5 cast listed at IMDb all exist on the RT page that we found. Not a massive amount of work, but definitely a bit tricky to implement...

olympia already said that the matching isn't possible. You can only request data, not match it between different sites.


RE: [Release] Universal Scraper - john.doe - 2012-06-06

(2012-06-06, 15:57)Martijn Wrote:
(2012-06-06, 15:54)john.doe Wrote: Ah so http requests *are* possible. Well, yes, it's a bit of work. The IMDb cast-list is already available from the initial IMDb scrape. From there, it'd be one http request to RT to find movies matching the IMDb title + year, then parsing the returned page to see if a good candidate match was in the results, if so, it's another request to the RT page for that movie, and from there it's a regexp to extract the cast list. Next, it's a loop / tuple array comparison to check if the top 5 cast listed at IMDb all exist on the RT page that we found. Not a massive amount of work, but definitely a bit tricky to implement...

olympia already said that the matching isn't possible. You can only request data, not match it between different sites.

Oh dear, all this time I thought scrapers were written in Python, hence my insistence that this could be done. Just saw now that they're written in XML as a series of regexps without the ability to store/use variables. Indeed, this can't be done as it currently stands (and old plans to extend scrapers to allow Python never went anywhere).

RT support probably shouldn't be removed, since you've done a lot of work matching your 1500 movies between RT and IMDb, but the RT support should come with a big warning: "Get ready to manually match RT <-> IMDb IDs for movies and contribute your results, or you won't get ratings for all movies; use RT scraping at your own risk".


RE: [Release] Universal Scraper - olympia - 2012-06-07

(2012-06-03, 17:31)sarakum Wrote: Is it possible to include Metacritic rating scraping in this add-on? The Metacritic rating is displayed on IMDb itself, so just as we can scrape the Tomatometer as well as the average rating from RT, it would be great if we could scrape both the IMDb and the Metacritic rating from IMDb.

Added in v1.2.0


[Release] Universal Scraper - mortstar - 2012-06-09

-1 from me for removing RT as a data source. Loads of developers have mentioned their messy data on the API forum...it is possible that they will fix it. A few more voices from a community as large as XBMC may get things moving too.


RE: [Release] Universal Scraper - gabbott - 2012-06-09

How about an option to choose between themoviedb and IMDb for the movie tagline?


RE: [Release] Universal Scraper - krish_2k4 - 2012-06-10

Just want to butt in and say the scraper is working perfectly! Don't know what all the fuss is about, lol.


RE: [Release] Universal Scraper - orescb - 2012-06-17

Thanks for this great scraper. I have a request, though. I find my taste in movies is closely replicated by the ratings on filmaffinity.com. By now you will have already guessed what my request is. And you are right, it would be splendid if you could add filmaffinity to the sources for ratings in your scraper. It may help that there is already a scraper for filmaffinity (the information it retrieves is in Spanish), and we might borrow some parts of that scraper to integrate them into yours. I do not know much about regular expressions or scrapers (I have just been reading the wiki), but I may try to help in case you need it, maybe finding a pattern, bringing coffee, or whatever.
Regarding filmaffinity.com, it is worth mentioning that it matches your tastes with those of other users, so you can see the average rating from people with a similar taste to yours. It would be good if we could capture different ratings of movies (one's own included) to be displayed in the "more information" section of XBMC, and also be able to sort the movies by any of those ratings, but I understand this would need a change in the skin and maybe in XBMC itself.
Finally, I find your scraper so powerful and great that it could be interesting to merge all other scrapers into yours, if possible involving the other developers in the process.

Anyhow, great work. Thanks for sharing, and for your effort.


RE: [Release] Universal Scraper - olympia - 2012-06-17

Thanks for your words!

There is a strict prerequisite for including other sites: it must be possible to look up a movie on the site by its IMDb ID as a unique identifier.
If you look at all the sources the Universal Scraper works with, all of those sites have this ability.

If you let me know the URL to look up a movie on filmaffinity.com by its IMDb ID, I will get you this rating.


RE: [Release] Universal Scraper - orescb - 2012-06-17

Thank you for your quick reply olympia.

I am not sure I understand what you mean. What I understand is that in order to use a website as a source in your scraper, each movie on the site must be associated with its IMDb ID. If this is the case, how is this association made on the sites already used by the scraper? I have been looking at those sites and was unable to find a movie by its IMDb ID using the search field. I also tried to find the ID in the website code and was unsuccessful too. Googling around, I found that I could find a movie on Rotten Tomatoes using the IMDb ID by submitting the following URL in the browser: 'http://www.rottentomatoes.com/alias?type=imdbid&s=' followed by the ID, so I understand that they link the movies to the ID. If it is not too much of a bother, I would appreciate it if you could tell me how some of the sites you use in the scraper link their movies to the IMDb ID, so I can find out whether filmaffinity supports this feature.

Thanks a lot


RE: [Release] Universal Scraper - olympia - 2012-06-17

It's a site-dependent feature. Knowing how other sites work doesn't mean you will figure this out. If filmaffinity.com has a forum, it's better to ask there.

Most of the sites I have in the Universal Scraper have an API, so it's well defined how to access a movie by its IMDb ID:
tmdb: http://api.themoviedb.org/3/movie/[imdbid]?api_key=[apikey]&language=[language]
trakttv: http://api.trakt.tv/movie/summary.json/[apikey]/[imdbid]

and so on. I will not list them all... :p
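
To make the lookup-by-IMDb-ID idea concrete, here is a small Python sketch that fills in the two URL patterns listed above. The patterns are copied from this post; the helper name `build_lookup_url`, the default language, and the placeholder ID and key are assumptions for illustration only, and nothing is actually requested, so whether these endpoints respond is not checked.

```python
from urllib.parse import quote

# URL patterns as given in the post, with [name] rewritten as {name}.
URL_TEMPLATES = {
    "tmdb": "http://api.themoviedb.org/3/movie/{imdbid}?api_key={apikey}&language={language}",
    "trakttv": "http://api.trakt.tv/movie/summary.json/{apikey}/{imdbid}",
}


def build_lookup_url(site, imdbid, apikey, language="en"):
    """Fill a site's URL pattern with an IMDb ID, API key, and language.

    Values are percent-encoded; placeholders a pattern does not use
    (e.g. language for trakttv) are simply ignored by str.format.
    """
    template = URL_TEMPLATES[site]
    return template.format(
        imdbid=quote(imdbid, safe=""),
        apikey=quote(apikey, safe=""),
        language=quote(language, safe=""),
    )


# "tt0000000" and "APIKEY" are placeholders, not real values.
print(build_lookup_url("tmdb", "tt0000000", "APIKEY"))
print(build_lookup_url("trakttv", "tt0000000", "APIKEY"))
```

A site without any such pattern keyed on the IMDb ID cannot be added to the table, which is the prerequisite described earlier in the thread.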


RE: [Release] Universal Scraper - Hans0815 - 2012-06-18

Hello,

Thank you for this nice scraper. It works very well.
But is it possible to change it so that I can use it for video playlist files too, not only for avi, mpg, or similar extensions? pls or m3u would be great.
I searched the XBMC git repository and could not find any function for that, so I think it may have to be declared in the sources of the scrapers.
That would be a very nice feature for me.