script.module.metautils dev

Eldorado Offline
Fan
Posts: 508
Joined: May 2009
Reputation: 14
Post: #1
I thought I would get the ball rolling on the next piece of the proposed Video Falcon project - a common metadata script

I've put the initial version on git - https://github.com/Eldorados/xbmc-metautils

This version was pulled from the master branch of Icefilms. I've basically put everything into its own script folder and set it up to stand on its own

Basic functionality:
* addon calls this module with an episode/movie IMDB ID or title
* module looks in its database for metadata for the required episode/movie
* if not found it scrapes a site for the required metadata, adds it to the database and returns it
* if it is found, it simply returns it from the database
* search by name and return a list of possible matches
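
The lookup flow in the list above can be sketched roughly as follows. This is only an illustration, not the module's actual code: the `MetaCache` class, the `fake_scrape` function and the `movie_meta` table are all made-up names, and the real module stores far more fields.

```python
import sqlite3


def fake_scrape(imdb_id):
    # Hypothetical stand-in for a real scraper (e.g. a TMDB lookup).
    return {'imdb_id': imdb_id, 'title': 'Horrible Bosses'}


class MetaCache(object):
    """Illustrative cache: check the database first, scrape only on a miss."""

    def __init__(self, scrape, db_path=':memory:'):
        self.scrape = scrape
        self.scrape_count = 0  # how many times the scraper was actually hit
        self.db = sqlite3.connect(db_path)
        self.db.execute('CREATE TABLE IF NOT EXISTS movie_meta '
                        '(imdb_id TEXT PRIMARY KEY, title TEXT)')

    def get_meta(self, imdb_id):
        row = self.db.execute(
            'SELECT imdb_id, title FROM movie_meta WHERE imdb_id = ?',
            (imdb_id,)).fetchone()
        if row is not None:
            # Found: simply return it from the database.
            return {'imdb_id': row[0], 'title': row[1]}
        # Not found: scrape, store in the database, then return it.
        meta = self.scrape(imdb_id)
        self.scrape_count += 1
        self.db.execute(
            'INSERT INTO movie_meta (imdb_id, title) VALUES (?, ?)',
            (meta['imdb_id'], meta['title']))
        self.db.commit()
        return meta
```

Calling `get_meta` twice with the same ID should hit the scraper only once; the second call is served from the database.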

How to


When searching by just the movie/TV show name, I recommend passing in as clean a name as possible. Strip anything that is not part of the actual name, e.g. many sites display 'The Hangover (2009)'; you need to strip '(2009)' from the name and pass the year in separately
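
A minimal sketch of that clean-up step, assuming the site appends the year in parentheses at the end of the title. The helper name `split_title_year` is made up for illustration and is not part of the module.

```python
import re

# Matches a title followed by a four-digit year in parentheses, e.g. 'Name (2009)'.
_YEAR_SUFFIX = re.compile(r'^(?P<name>.+?)\s*\((?P<year>\d{4})\)\s*$')


def split_title_year(raw_title):
    """Return (clean_name, year); year is '' when no '(YYYY)' suffix is present."""
    cleaned = raw_title.strip()
    match = _YEAR_SUFFIX.match(cleaned)
    if match:
        return match.group('name').strip(), match.group('year')
    return cleaned, ''
```

The two values can then be passed along separately, e.g. as `movie_name` and `year=year` in the `get_meta` call shown further down.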

Initialize:
Code:
metaget=metahandlers.MetaData()
You can specify a new cache path with path='<addon data path>', but using the default is recommended

If you wish to download the covers to the cache folders, in prep to release a meta data zip pack to users, specify the preparezip=True option

Code:
metaget=metahandlers.MetaData(preparezip=True)


Search for movie:
Code:
# Search by IMDB ID:
meta = metaget.get_meta('movie', movie_name, imdb_id=imdb_id)

# Search by TMDB ID:
meta = metaget.get_meta('movie', movie_name, tmdb_id=tmdb_id)

# Search by movie name + year:
meta = metaget.get_meta('movie', movie_name, year=year)

Search for tv series:
Code:
# Search by IMDB ID:
meta = metaget.get_meta('tvshow', tvshow_name, imdb_id=imdb_id)

# Search by name:
meta = metaget.get_meta('tvshow', tvshow_name)

Search for tv show season covers:

By this point you *should* have the IMDB ID of the TV show; if you don't have it, no results will be returned

get_seasons() returns one dictionary per season found, in order

Code:
season_list = [1,2,3]
seasons = metaget.get_seasons(imdb_id, season_list)

Search for tv show episode:
Code:
season_num = 1
episode_num = 1
episode=metaget.get_episode_meta(imdb_id, season_num, episode_num)

Update watched status:
Toggles the status in the DB from 6 to 7 or the reverse, depending on the initial value (6 = unwatched, 7 = watched)
Code:
metaget.change_watched('movie', movie_name, imdb_id, year)
# or
metaget.change_watched('tvshow', tvshow_name, imdb_id)
# or
metaget.change_watched('episode', episode_name, imdb_id, season)
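
The 6/7 flip described above boils down to something like the sketch below. The constant and function names are made up for illustration, and treating any value other than 7 as unwatched is an assumption, not confirmed behaviour of change_watched().

```python
UNWATCHED = 6  # overlay value stored for an unwatched item
WATCHED = 7    # overlay value stored for a watched item


def toggle_overlay(current):
    """Return the opposite watched status for the given overlay value."""
    if current == WATCHED:
        return UNWATCHED
    # Assumption: anything that is not 'watched' is treated as unwatched.
    return WATCHED
```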


Search for a movie by name and return a list of possible matches:

Code:
search_meta = metaget.search_movies(movie_name)

Returns a list of dictionaries with the following data:
- IMDB ID
- TMDB ID
- Name
- Year
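
Once you have the list of possible matches you still need to pick one. A rough sketch of that step follows; the key names 'name' and 'year' are assumptions about the returned dictionaries (check the actual output), and `pick_match` is a made-up helper.

```python
def pick_match(results, name, year=None):
    """Return the first result whose name (and year, if given) matches, else None."""
    wanted = name.strip().lower()
    for item in results:
        # Assumed key names; adjust to whatever search_movies() really returns.
        same_name = item.get('name', '').strip().lower() == wanted
        same_year = year is None or str(item.get('year', '')) == str(year)
        if same_name and same_year:
            return item
    return None
```

If nothing matches exactly, it is probably better to show the whole list to the user than to guess.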


Meta Data being collected


Movies:
Code:
IMDB ID
TMDB ID
Title
Writer
Director
Tagline
Cast & Role
Rating
Duration
Plot
MPAA Rating
Premiered
Year
Genre
Studio
Trailer URL
Thumb URL
Cover URL
Backdrop/Fanart URL
Overlay (watched status)

TV Shows:
Code:
IMDB ID
TheTVDB ID
Title
Rating
Duration
Plot
MPAA Rating
Premiered
Genre
Studio
Cast & Role
Trailer URL
Thumb URL
Cover URL
Backdrop/Fanart URL
Overlay (watched status)

Seasons:
Code:
IMDB ID
TheTVDB ID
Season #
Cover URL
Overlay (watched status)

Episodes:
Code:
IMDB ID
TheTVDB ID
Episode ID
Season #
Episode #
Title
Director
Writer
Plot
Rating
Premiered
Poster URL
Overlay (watched status)

To Do's

- LOTS of code cleanup/optimization
- add more meta - director, writers, cast DONE
- fleshing out methods and how they are called
- fix unicode problems, possibly integrating with t0mm0.common DONE
- metacontainers needs attention
- create a metacontainer zip file to optionally download instead of creating a blank DB ala icefilms

All welcome who are wanting to help!
(This post was last modified: 2011-10-24 18:38 by Eldorado.)
t0mm0 Offline
Fan
Posts: 486
Joined: Mar 2011
Reputation: 8
Location: UK
Post: #2
Eldorado Wrote:I thought I would get the ball rolling on getting the next piece in the proposed Video Falcon project - common meta data script

I've put the initial version on git - https://github.com/Eldorados/xbmc-metautils

This version was pulled from the master branch of Icefilms. I've basically put everything into its own script folder and set it up to stand on its own

There are some to-do's left in the code as well as quite a bit of Icefilms specific coding, once that stuff is cleaned up I'm optimistic that it could be very close to being release ready

Scraping of metadata for movies (TMDB) and tvshows (TheTVDB) is currently working using IMDB IDs. As an enhancement, it would be nice to add the ability to search based simply on the movie/tv show name

All welcome who are wanting to help!

hi eldorado!

is there a specification somewhere for what this bit is meant to do? i'm not very familiar with the metadata stuff so would be interested in what function this bit fills!

t0mm0
rogerthis Offline
Donor
Posts: 216
Joined: Apr 2011
Reputation: 1
Location: Connacht
Post: #3
Eldorado, could you give an example of how to execute this in an addon? I had a quick look at the code and I don't know where to start.
Eldorado Offline
Fan
Posts: 508
Joined: May 2009
Reputation: 14
Post: #4
Hi guys, I haven't dug into it yet to do up an example.. though the master branch of Icefilms currently uses it, so maybe a good place to look if you want to jump in?

Basically it needs work before it can be used in any sort of way outside of Icefilms; as I said, it has quite a bit of Icefilms-specific coding

t0mm0, this was another item on the Video Falcon list:

http://forum.xbmc.org/showthread.php?tid=99384

Quote:script.module.metahandlers
Metahandlers will be the module that can get and cache metadata.
As well as build/download and install metacontainers of pre-packaged metadata for sites (you provide it with a list of all content, and it will pre-make a cache of metadata for that list).
t0mm0 Offline
Fan
Posts: 486
Joined: Mar 2011
Reputation: 8
Location: UK
Post: #5
Eldorado Wrote:t0mm0, this was another item on the Video Falcon list:

http://forum.xbmc.org/showthread.php?tid=99384

so am i right in thinking what it needs to do is...
  • addon calls this module with an episode/movie imdb id or title
  • module looks in its database for metadata for the required episode/movie
  • if not found it scrapes a site for the required metadata, adds it to the database and returns it
  • if it is found, it simply returns it from the database

is it possible to use the existing xbmc scraper modules rather than having separately maintained ones? (just asking the question - i know nothing about metadata in xbmc)

i assume the intention is to maintain a central database so that if a movie is added from one addon its metadata will be available from all others using the module?

seems what is really needed is to be able to add stuff to the main xbmc library. there is also the hack that is doing the rounds at the moment with creating loads of strm files which is trying to solve the same problem i guess?

also there is the mention of building pre-packaged metadata bundles - this sounds like a nightmare to me but maybe there is a particular use?

there should probably be a definition of what metadata is required. does it include posters/thumbs for example? maybe this would also be a good place to track watched status (especially as it would work across addons) while we can't do it in xbmc?

(i always find it better to try and define what something is supposed to do before writing code - might save rewriting it too much later. i hope the questions above aren't too silly - as i say i don't know anything about metadata in xbmc)

t0mm0

ps. eldorado you need to add .pyo (and .pyc while you are at it) files to your .gitignore file in this repo!
slyi Offline
Junior Member
Posts: 40
Joined: Sep 2011
Reputation: 0
Post: #6
I was also tinkering with meta data updates using asynchronous methods for icefilms see demo on http://dl.dropbox.com/u/6589941/asyncmet...ncmeta.zip

I think a generic system should only download what's requested at the time and be text only (no images), as these are better stored online rather than filling the limited HD of embedded devices (Apple TV etc.)

I'd be interested in helping on this as well. Can you provide a sample that works with Icefilms v12?
Eldorado Offline
Fan
Posts: 508
Joined: May 2009
Reputation: 14
Post: #7
t0mm0 Wrote:so am i right in thinking what it needs to do is...
  • addon calls this module with an episode/movie imdb id or title
  • module looks in its database for metadata for the required episode/movie
  • if not found it scrapes a site for the required metadata, adds it to the database and returns it
  • if it is found, it simply returns it from the database

I think you nailed it here, pretty much what I was thinking the main functions should be, I'll add this to my op



t0mm0 Wrote:is it possible to use the existing xbmc scraper modules rather than having separately maintained ones? (just asking the question - i know nothing about metadata in xbmc)

Very good question and one I've been asking myself too!

Hoping someone can jump in with the knowledge to give a yay or nay. As you're right, it's very redundant and quite a bit of extra work to write and maintain a separate scraper

t0mm0 Wrote:i assume the intention is to maintain a central database so that if a movie is added from one addon its metadata will be available from all others using the module?

seems what is really needed is to be able to add stuff to the main xbmc library. there is also the hack that is doing the rounds at the moment with creating loads of strm files which is trying to solve the same problem i guess?

also there is the mention of building pre-packaged metadata bundles - this sounds like a nightmare to me but maybe there is a particular use?

there should probably be a definition of what metadata is required. does it include posters/thumbs for example? maybe this would also be a good place to track watched status (especially as it would work across addons) while we can't do it in xbmc?

(i always find it better to try and define what something is supposed to do before writing code - might save rewriting it too much later. i hope the questions above aren't too silly - as i say i don't know anything about metadata in xbmc)

t0mm0

All good points. I guess initially I was thinking to basically get this module running on its own first, keeping all the current functionality that it is performing with Icefilms: pulling in all metadata (plot, genre, cast, thumbnail etc.) and storing it all in a local cache accessible by any addon. Set those as its initial boundaries, then work towards defining what phase 2/enhancements should be, e.g. as you said, adding to the main library, pre-packaged meta containers etc.


t0mm0 Wrote:ps. eldorado you need to add .pyo (and .pyc while you are at it) files to your .gitignore file in this repo!

Eeee.. I usually make sure I don't copy those files
Eldorado Offline
Fan
Posts: 508
Joined: May 2009
Reputation: 14
Post: #8
slyi Wrote:I was also tinkering with meta data updates using asynchronous methods for icefilms see demo on http://dl.dropbox.com/u/6589941/asyncmet...ncmeta.zip

I think a generic system should only download what's requested at the time and be text only (no images), as these are better stored online rather than filling the limited HD of embedded devices (Apple TV etc.)

I'd be interested in helping on this as well. Can you provide a sample that works with Icefilms v12?

I'm not sure the user would like a system that has to re-scrape every time you pull up a list of movies, and I'm assuming nor would a site such as TMDB

The Apple TV has I believe 2gig storage space?

Perhaps an option between saving just text vs text & images?

The code I have posted only works with the master branch of Icefilms due to the number of changes. Anarchintosh had said it was 95% complete, but I don't see any notes on what is left to do..

If you need a v12 version you can simply pull it from the current addon folder, but you will need to modify it to remove all the Icefilms-specific logic
anarchintosh Offline
Senior Member
Posts: 288
Joined: Jul 2010
Reputation: 4
Post: #9
asynchronous updates are only realistic if you only use low quality thumbnail cover art instead of high quality cover art, which is a bit of a sad tradeoff.
Eldorado Offline
Fan
Posts: 508
Joined: May 2009
Reputation: 14
Post: #10
I've done some small updates

- removed all (that I could find) Icefilms-specific coding, which at a quick glance appeared to be scraping the Icefilms site for metadata if an IMDB ID did not exist. Something like this might be useful for other sites, so it's something to keep in mind for updates

- small changes to use getAddonInfo('path') and sqlite3

Below is a quick example on how to scrape for a movie or tv show, the metadata will be stored in a sql db in the addon_data folder

Code:
import xbmc
from metautils import metahandlers, metacontainers

metapath = xbmc.translatePath('special://profile/addon_data/script.module.metautils/meta_cache')
metaget = metahandlers.MetaData(metapath, preparezip=True)
meta = metaget.get_meta('tt1499658', 'movie', 'Horrible Bosses')
print meta

Output:
Code:
{'rating': 8.1999999999999993, 'genres': u'Comedy', 'name': u'Horrible Bosses', 'tmdb_id': u'51540', 'plot': 'Starring : \nJennifer Aniston, Jason Bateman, Charlie Day, Jason Sudeikis, Colin Farrell\n\nPlot : \nAfter three friends realize that their bosses are standing in the way of their happiness, they come up with a murderous plot, hoping to better their lives.', 'mpaa': u'R', 'studios': u'New Line Cinema', 'premiered': u'2011-07-08', 'imdb_id': u'tt1499658', 'imgs_prepacked': u'true', 'cover_url': u'http://cf1.imgobject.com/posters/9d8/4e258b037b9aa11b5c0009d8/horrible-bosses-cover.jpg', 'duration': 100, 'watched': 6, 'thumb_url': u'http://cf1.imgobject.com/posters/9d8/4e258b037b9aa11b5c0009d8/horrible-bosses-thumb.jpg', 'trailer_url': u'http://www.youtube.com/watch?v=mh9cG5dzs-U', 'backdrop_url': u'http://cf1.imgobject.com/backdrops/079/4dae1a515e73d67899000079/horrible-bosses-original.jpg'}

Still quite a bit left to do which mainly consists of code cleanup

Also looking for info on the possibility of using the existing TMDB and TVDB scrapers so that a second set does not need to be maintained

edit - found my answer, no can do currently : http://forum.xbmc.org/showthread.php?tid...ht=scraper
(This post was last modified: 2011-09-24 00:11 by Eldorado.)
k_zeon Offline
Senior Member
Posts: 191
Joined: Aug 2011
Reputation: 0
Post: #11
hey Eldorado.

how would you integrate this into an addon?
i.e. where would you put the call to get the info, and how would XBMC know there is data to show?

thanks
Eldorado Offline
Fan
Posts: 508
Joined: May 2009
Reputation: 14
Post: #12
k_zeon Wrote:hey Eldorado.

how would you integrate this into an addon?
i.e. where would you put the call to get the info, and how would XBMC know there is data to show?

thanks

The call would be made as you are adding either directories or video items. Currently you must have an IMDB ID for it to scrape

Using what is returned you can put it into a new dict with the proper labels

eg.

liz = xbmcgui.ListItem()
infoLabels = {}  # a dict, so values can be assigned by key
infoLabels['genre'] = str(meta['genres'])
infoLabels['duration'] = str(meta['duration'])
infoLabels['premiered'] = str(meta['premiered'])
infoLabels['studio'] = meta['studios']
infoLabels['mpaa'] = str(meta['mpaa'])
infoLabels['code'] = str(meta['imdb_id'])
infoLabels['rating'] = float(meta['rating'])

liz.setInfo(type="Video", infoLabels=infoLabels)

Or if you are using t0mm0's common library it looks like you can just pass infoLabels in:

add_video_item({'url': url}, infoLabels, img=thumb)


This is still very much in dev so just use for testing for now
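
The mapping in this post can also be wrapped in a small helper, sketched below. The function name `build_info_labels` is made up; the keys come from the output shown in post #10. Leaving the studio value unconverted mirrors the example above, and skipping missing or empty values is an added assumption about what you would want.

```python
def build_info_labels(meta):
    """Map a metadata dict (as returned by get_meta) to XBMC-style infoLabels."""
    keep = lambda value: value  # leave the value untouched (e.g. unicode studio names)
    # (key in meta, infoLabel name, converter)
    mapping = [
        ('genres', 'genre', str),
        ('duration', 'duration', str),
        ('premiered', 'premiered', str),
        ('studios', 'studio', keep),
        ('mpaa', 'mpaa', str),
        ('imdb_id', 'code', str),
        ('rating', 'rating', float),
    ]
    labels = {}
    for meta_key, label_key, convert in mapping:
        value = meta.get(meta_key)
        # Assumption: skip anything missing or empty rather than pass blanks on.
        if value is not None and value != '':
            labels[label_key] = convert(value)
    return labels
```

The resulting dict can then be handed to liz.setInfo(type="Video", infoLabels=labels) as in the example above.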
(This post was last modified: 2011-09-24 04:15 by Eldorado.)
k_zeon Offline
Senior Member
Posts: 191
Joined: Aug 2011
Reputation: 0
Post: #13
Eldorado Wrote:The call would be made as you are adding either directories or video items. Currently you must have an IMDB ID for it to scrape

Using what is returned you can put it into a new dict with the proper labels

eg.

liz = xbmcgui.ListItem()
infoLabels = {}  # a dict, so values can be assigned by key
infoLabels['genre'] = str(meta['genres'])
infoLabels['duration'] = str(meta['duration'])
infoLabels['premiered'] = str(meta['premiered'])
infoLabels['studio'] = meta['studios']
infoLabels['mpaa'] = str(meta['mpaa'])
infoLabels['code'] = str(meta['imdb_id'])
infoLabels['rating'] = float(meta['rating'])

liz.setInfo(type="Video", infoLabels=infoLabels)

Or if you are using t0mm0's common library it looks like you can just pass infoLabels in:

add_video_item({'url': url}, infoLabels, img=thumb)


This is still very much in dev so just use for testing for now

so basically

add_video_item({'url': url},{genre='Action',duration='102 mins',premiered='xxxxx' etc },img=thumb)

or would it be slightly different? haven't tried it yet.
Eldorado Offline
Fan
Posts: 508
Joined: May 2009
Reputation: 14
Post: #14
k_zeon Wrote:so basically

add_video_item({'url': url},{genre='Action',duration='102 mins',premiered='xxxxx' etc },img=thumb)

or would it be slightly different? haven't tried it yet.

Just a slight correction:

add_video_item({'url': url},{'genre': 'Action','duration':'102 mins','premiered': 'xxxxx' etc },img=thumb)
k_zeon Offline
Senior Member
Posts: 191
Joined: Aug 2011
Reputation: 0
Post: #15
Eldorado Wrote:Just a slight correction:

add_video_item({'url': url},{'genre': 'Action','duration':'102 mins','premiered': 'xxxxx' etc },img=thumb)

ahh thanks.

Does this mean that I would need to scrape an IMDB number for each movie the first time round?
The new TVShack has an IMDB number once you click into the page, so what I would need to do is:
1. get the webpage and scrape the IMDB number
2. use metautils to get the movie information
3. then call add_directory and pass in all the info as you mentioned for each movie

If the movie is found it would scrape the info. Next time, if the data is already there, would it skip scraping?
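
Step 1 above could look something like this sketch. It assumes the page contains a normal imdb.com/title/tt... link, which has not been checked against the actual site; `find_imdb_id` is a made-up helper name.

```python
import re

# Assumption: the page links to IMDB with a URL like imdb.com/title/tt1499658/
_IMDB_LINK = re.compile(r'imdb\.com/title/(tt\d{7,})')


def find_imdb_id(html):
    """Return the first IMDB ID found in the page source, or None if there is none."""
    match = _IMDB_LINK.search(html)
    if match:
        return match.group(1)
    return None
```

The ID it returns is what you would pass to metautils in step 2.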