script.module.metautils dev

t0mm0 Offline
Fan
Posts: 486
Joined: Mar 2011
Reputation: 8
Location: UK
Post: #31
k_zeon Wrote:i think it is to do with UnicodeEncodeError: 'ascii' codec can't encode character u'\xd6' in position 32: ordinal not in range(128)
if i comment out the metahandler bits it loads the menus fine.


Code:
infoLabels['plot'] = str(meta['plot'])
                                            UnicodeEncodeError: 'ascii' codec can't encode character u'\xd6' in position 32: ordinal not in range(128)
21:53:03 T:118956032 M:100352000    INFO: -->End of Python script error report<--

you shouldn't be casting stuff to str like that or you will run into all sorts of unicode problems as you are finding out. you need to make sure all your text is properly encoded unicode in order to cope with all the accents and other 'odd' characters. as i said earlier, if you use t0mm0.common.net to grab the page the unicode stuff should all be sorted out for you.
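A minimal sketch of the point above, assuming nothing about t0mm0.common.net beyond what is said in this thread: rather than pushing text through str() (which on the Python 2 interpreter used by XBMC implicitly encodes with the ascii codec), keep the text as unicode and encode it explicitly as UTF-8 only where a byte string is really needed. The helper name `to_utf8` and the sample plot text are made up for illustration.

```python
# -*- coding: utf-8 -*-
# Hypothetical helper, not part of metautils or t0mm0.common.

TEXT_TYPE = type(u'')  # `unicode` on Python 2, `str` on Python 3


def to_utf8(value):
    """Encode text explicitly as UTF-8; pass other values through untouched."""
    if isinstance(value, TEXT_TYPE):
        return value.encode('utf-8')
    return value


# Sample text containing the u'\xd6' character named in the traceback.
plot = u'A film set in \xd6sterreich'

# Encoding with the ascii codec is what str() does implicitly on Python 2,
# and it is the step that raises UnicodeEncodeError.
try:
    plot.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# An explicit UTF-8 encode copes with accented characters.
encoded_plot = to_utf8(plot)
```

Whether the infolabel values should end up as unicode objects or UTF-8 byte strings depends on what the consuming call expects, so treat the final encode step as optional.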

also wouldn't it be better to make it so that the dict returned by the meta stuff was ready to be used as an infolabel rather than having to mess with it in the addon?

t0mm0
k_zeon Offline
Senior Member
Posts: 217
Joined: Aug 2011
Reputation: 0
Post: #32
t0mm0 Wrote:you shouldn't be casting stuff to str like that or you will run into all sorts of unicode problems as you are finding out. you need to make sure all your text is properly encoded unicode in order to cope with all the accents and other 'odd' characters. as i said earlier, if you use t0mm0.common.net to grab the page the unicode stuff should all be sorted out for you.

also wouldn't it be better to make it so that the dict returned by the meta stuff was ready to be used as an infolabel rather than having to mess with it in the addon?

t0mm0

Hi t0mm0
I have used html = net.http_GET(url).content to get the page
is this what you mean, or is there something else that i need to do?

ie
Code:
html = net.http_GET(url).content
match = re.compile('<li class="searchList"><a href="(.+?)">(.+?)</a> <span>(.+?)</span>').findall(html)

metaget = metahandlers.MetaData()

for url, name, sYear in match:
    sYear = sYear.replace('(', '')
    sYear = sYear.replace(')', '')
    name = name.replace(':', '')

    meta = metaget.get_meta('', 'movie', name, sYear)
    infoLabels = create_infolabels(meta, name)
    addon.add_directory({'mode': 'GetMovieSource', 'url': url}, name, total_items=len(match))

not too sure what you mean with 'dict returned by the meta stuff was ready to be used as an infolabel' as i am just using eldorado's example

would really appreciate an example , if time permits.

tks
Eldorado Offline
Fan
Posts: 520
Joined: May 2009
Reputation: 14
Post: #33
t0mm0 Wrote:you shouldn't be casting stuff to str like that or you will run into all sorts of unicode problems as you are finding out. you need to make sure all your text is properly encoded unicode in order to cope with all the accents and other 'odd' characters. as i said earlier, if you use t0mm0.common.net to grab the page the unicode stuff should all be sorted out for you.

also wouldn't it be better to make it so that the dict returned by the meta stuff was ready to be used as an infolabel rather than having to mess with it in the addon?

t0mm0

Oh it would! It's on my cleanup list.. not sure why they dumped them to different labels, thus forcing us to put it into a new dict

I'll look into integrating with your common module for the unicode stuff.. yet another item on the list :)
t0mm0 Offline
Fan
Posts: 486
Joined: Mar 2011
Reputation: 8
Location: UK
Post: #34
Eldorado Wrote:Oh it would! It's on my cleanup list.. not sure why they dumped them to different labels, thus forcing us to put it into a new dict

I'll look into integrating with your common module for the unicode stuff.. yet another item on the list :)

hehe! why is it that todo lists always get longer and never shorter - i have the exact same problem ;)
Eldorado Offline
Fan
Posts: 520
Joined: May 2009
Reputation: 14
Post: #35
I've been doing lots of cleanups today

- reworked so that it uses t0mm0's common library for the HTTP GETs, which corrects any unicode problems
- added new fields to movies and renamed others - director, writer, tagline, cast
- now sends back a dict that can be passed directly as infoLabels
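As a rough illustration of the last item, here is a sketch of turning raw scraper fields into a dict whose keys can be used directly as infoLabels. This is not the actual metautils code: the source field names on the left of `FIELD_MAP` and the function name `to_infolabels` are invented for the example, while the target keys (title, code, plot, duration) are ones mentioned in this thread.

```python
# Hypothetical mapping from scraper field names to infolabel keys.
# The left-hand names are assumptions made for this example only.
FIELD_MAP = {
    'name': 'title',
    'imdb_id': 'code',
    'overview': 'plot',
    'runtime': 'duration',
}


def to_infolabels(raw_meta):
    """Return a new dict whose keys can be used directly as infoLabels."""
    labels = {}
    for key, value in raw_meta.items():
        if value is None:
            continue  # skip fields the scraper could not fill
        # Rename known fields; keep anything else under its own name.
        labels[FIELD_MAP.get(key, key)] = value
    return labels


# Placeholder data, not real scraper output.
raw = {
    'name': u'Risky Business',
    'imdb_id': 'tt0000000',
    'overview': u'...',
    'runtime': None,
    'genre': u'Comedy',
}
info_labels = to_infolabels(raw)
```

Doing the renaming once inside the library means each addon can pass the result straight through instead of building its own dict.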

I will push the updates later today


Can anyone help out on how to format the string that is passed into the 'cast' infolabel? I haven't been able to get any cast members to show up as yet..
slyi Offline
Junior Member
Posts: 40
Joined: Sep 2011
Reputation: 0
Post: #36
This will sound a little dumb, but when i was tinkering with the async i had issues with the cast in the info dialog as well, and i found that i needed to click the "cast" toggle button, not just select it.
Great work on the progress. Would you consider refactoring the code to use a base class that each scraper would implement?
Eldorado Offline
Fan
Posts: 520
Joined: May 2009
Reputation: 14
Post: #37
slyi Wrote:This will sound a little dumb, but when i was tinkering with the async i had issues with the cast in the info dialog as well, and i found that i needed to click the "cast" toggle button, not just select it.
Great work on the progress. Would you consider refactoring the code to use a base class that each scraper would implement?

With the confluence skin, and maybe others, you do need to click on 'Cast' to toggle the view.. but so far I can't get anything to display

Have you been able to populate cast? If so what format did you pass in? So far I'm just basically trying to send in one name to see if it will show

Not sure what you mean on using a base class, can you elaborate on that?
slyi Offline
Junior Member
Posts: 40
Joined: Sep 2011
Reputation: 0
Post: #38
What i mean by base class is to create one class interface like https://github.com/t0mm0/xbmc-urlresolve...erfaces.py that other meta providers can implement, and thus standardise and simplify your api and code
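A minimal sketch of what such an interface could look like. All class, method and function names here (`MetaProvider`, `get_movie_meta`, `first_movie_match` and so on) are hypothetical and are not taken from the urlresolver interfaces file or from metautils.

```python
class MetaProvider(object):
    """Hypothetical interface that every metadata scraper would implement."""

    name = 'base'

    def get_movie_meta(self, title, year=None):
        """Return a dict of movie metadata, or an empty dict if not found."""
        raise NotImplementedError

    def get_tvshow_meta(self, title, year=None):
        """Return a dict of TV show metadata, or an empty dict if not found."""
        raise NotImplementedError


class DummyProvider(MetaProvider):
    """Stand-in provider that echoes its input instead of scraping a site."""

    name = 'dummy'

    def get_movie_meta(self, title, year=None):
        return {'title': title, 'year': year}


def first_movie_match(providers, title, year=None):
    """Ask each provider in turn and return the first non-empty result."""
    for provider in providers:
        try:
            meta = provider.get_movie_meta(title, year)
        except NotImplementedError:
            continue  # this provider does not handle movies
        if meta:
            return meta
    return {}


result = first_movie_match([MetaProvider(), DummyProvider()],
                           'Risky Business', '1983')
```

With this shape the calling code only depends on the shared method names, so a new source could be added by writing one more subclass.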

For cast it is simply a list of strings, as below for imdbapi.com
Code:
imdbInfo['cast'] = re.split(r"\s*[,]\s*", jsonData['Actors'])
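A self-contained version of that split, assuming the API returns the actors as one comma-separated string under an 'Actors' key as in the line above. The sample value uses placeholder names and is not real API output.

```python
import re

# Placeholder response fragment with uneven spacing around the commas.
json_data = {'Actors': 'Actor One,  Actor Two , Actor Three'}

# Split on commas and swallow any whitespace around them.
cast = re.split(r'\s*,\s*', json_data['Actors'].strip())
```

The resulting list of names is what would be stored under the 'cast' key.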

You were also mentioning issues finding tv shows/movies by name/year. Other scrapers found that a search engine api provides optimal results, as they do the normalizing, word stemming, pagerank and spelling correction to find the closest match.
eg: http://api.bing.com/rss.aspx?source=web&...ess%201983 for a misspelt risKy bUsiness film query
or google's http://code.google.com/apis/websearch/docs/#fonje
(This post was last modified: 2011-10-06 00:22 by slyi.)
slyi Offline
Junior Member
Posts: 40
Joined: Sep 2011
Reputation: 0
Post: #39
Can I make one more suggestion: instead of downloading one megapack of data/images, how about just one zipped json file of the imdb data & imdb/fanart img urls of all icefilms movies a-z and tv shows a-z (without the episode details)? Once downloaded, parse and import it into your DB. Then the end users only need to scrape new individual shows/movies and uncommon movies/shows that are not part of icefilms. That would provide the majority of the info upfront and reduce the traffic to the scraped sites.
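A rough sketch of the import step described above, using only the standard library. The table name `movie_meta`, its columns and the sample records are assumptions for illustration and do not reflect the real metautils schema.

```python
import json
import sqlite3

# Stand-in for the contents of the downloaded JSON file (placeholder data).
seed_json = json.dumps([
    {'imdb_id': 'tt0000001', 'title': 'Example Movie',
     'cover_url': 'http://example.com/1.jpg'},
    {'imdb_id': 'tt0000002', 'title': 'Another Movie',
     'cover_url': 'http://example.com/2.jpg'},
])

conn = sqlite3.connect(':memory:')  # a real addon would open its db file
conn.execute('CREATE TABLE movie_meta ('
             'imdb_id TEXT PRIMARY KEY, title TEXT, cover_url TEXT)')

# A row the user already scraped locally; the import must not clobber it.
conn.execute('INSERT INTO movie_meta VALUES (?, ?, ?)',
             ('tt0000001', 'Locally Scraped Title', 'local.jpg'))

rows = [(rec['imdb_id'], rec['title'], rec['cover_url'])
        for rec in json.loads(seed_json)]

# INSERT OR IGNORE skips rows whose primary key already exists.
conn.executemany('INSERT OR IGNORE INTO movie_meta VALUES (?, ?, ?)', rows)
conn.commit()

titles = dict(conn.execute('SELECT imdb_id, title FROM movie_meta'))
conn.close()
```

Only image URLs are stored here, so covers would still be fetched on demand rather than shipped in the pack.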
Eldorado Offline
Fan
Posts: 520
Joined: May 2009
Reputation: 14
Post: #40
slyi Wrote:What i mean by base class is to create one class interface like https://github.com/t0mm0/xbmc-urlresolve...erfaces.py that other meta providers can implement, and thus standardise and simplify your api and code

I see what you mean, basically to create more scrapers to use

That would likely entail an entire rewrite of the code (which I am just cleaning up), but I also wonder what the benefit would be?

Currently it uses TMDB and TVDB as its main sources, then uses IMDB to fill in any holes. Do we need more?

Also, I would hate to go down this road and then in a couple of months have the XBMC devs make it possible to call the existing built-in scrapers

slyi Wrote:For cast it is simply a list of strings, as below for imdbapi.com
Code:
imdbInfo['cast'] = re.split(r"\s*[,]\s*", jsonData['Actors'])

Ah.. a list is probably the one type I didn't try to pass in! :)

I was hoping to get a similar cast view as what you see in the standard library where for each cast member it has a small picture and the role they played.. any ideas?

slyi Wrote:You were also mentioning issues finding tv shows/movies by name/year. Other scrapers found that a search engine api provides optimal results, as they do the normalizing, word stemming, pagerank and spelling correction to find the closest match.
eg: http://api.bing.com/rss.aspx?source=web&...ess%201983 for a misspelt risKy bUsiness film query
or google's http://code.google.com/apis/websearch/docs/#fonje

Thanks, I will look into this!
(This post was last modified: 2011-10-06 16:21 by Eldorado.)
Eldorado Offline
Fan
Posts: 520
Joined: May 2009
Reputation: 14
Post: #41
slyi Wrote:Can I make one more suggestion: instead of downloading one megapack of data/images, how about just one zipped json file of the imdb data & imdb/fanart img urls of all icefilms movies a-z and tv shows a-z (without the episode details)? Once downloaded, parse and import it into your DB. Then the end users only need to scrape new individual shows/movies and uncommon movies/shows that are not part of icefilms. That would provide the majority of the info upfront and reduce the traffic to the scraped sites.

I'm not sure what the advantage would be of using a json file over supplying an actual copy of the database + images folder?

Whichever method we go with will give the same results - a pre-loaded database to reduce the load on the sites

The images themselves will be packed in the zip file as well; one of the functions is to download the cover and save it locally so that it doesn't need to be requested from the site many times
Eldorado Offline
Fan
Posts: 520
Joined: May 2009
Reputation: 14
Post: #42
I would like some input on meta data to be included

Currently I have:

- title
- code (imdb id)
- overlay (watched status)
- writer
- director
- tagline
- cast - still working on
- rating (from imdb)
- duration
- plot
- mpaa rating
- premiered
- genre(s)
- studio(s)
- trailer url
- cover/thumb
- fanart

Too much? Any other items to suggest?

Basically I would like to ensure there is enough data to please most, but I also don't want to go overboard and fill up everyone's disk space with too much
t0mm0 Offline
Fan
Posts: 486
Joined: Mar 2011
Reputation: 8
Location: UK
Post: #43
Eldorado Wrote:I'm not sure what the advantage would be of using a json file over supplying an actual copy of the database + images folder?

Whichever method we go with will give the same results - a pre-loaded database to reduce the load on the sites

The images themselves will be packed in the zip file as well; one of the functions is to download the cover and save it locally so that it doesn't need to be requested from the site many times

i guess it would be useful if it makes it easier to merge the downloaded database in to the existing one - can you do that if you just download a db file? remember unlike icefilms this database will be shared between addons, so if addon1 has a pre-filled db, you don't want it to remove anything that addon2 or addon3 might have already added. we'll also need to think about how to handle duplicates...
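On the question of merging a downloaded db file: one possible approach, sketched below, is SQLite's ATTACH DATABASE combined with INSERT OR IGNORE, so that rows already added by other addons are left untouched. The table name `movie_meta`, its columns, the file names and the sample rows are all assumptions made for this example.

```python
import os
import sqlite3
import tempfile

SCHEMA = ('CREATE TABLE IF NOT EXISTS movie_meta ('
          'imdb_id TEXT PRIMARY KEY, title TEXT)')


def make_db(path, rows):
    """Create a small database file holding the given (imdb_id, title) rows."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    conn.executemany('INSERT INTO movie_meta VALUES (?, ?)', rows)
    conn.commit()
    conn.close()


def merge_db(shared_path, downloaded_path):
    """Copy rows from the downloaded db that the shared db does not have yet."""
    conn = sqlite3.connect(shared_path)
    conn.execute('ATTACH DATABASE ? AS incoming', (downloaded_path,))
    # Existing primary keys win; only unseen rows are inserted.
    conn.execute('INSERT OR IGNORE INTO main.movie_meta '
                 'SELECT imdb_id, title FROM incoming.movie_meta')
    conn.commit()  # the transaction must be closed before DETACH
    conn.execute('DETACH DATABASE incoming')
    conn.close()


workdir = tempfile.mkdtemp()
shared = os.path.join(workdir, 'shared.db')
downloaded = os.path.join(workdir, 'prefilled.db')
make_db(shared, [('tt0000001', 'Added by addon2')])
make_db(downloaded, [('tt0000001', 'From prefilled pack'),
                     ('tt0000003', 'Only in prefilled pack')])
merge_db(shared, downloaded)

check = sqlite3.connect(shared)
merged = dict(check.execute('SELECT imdb_id, title FROM movie_meta'))
check.close()
```

Here a duplicate is simply resolved in favour of the existing row; a policy such as "newest wins" would need a timestamp column and a different statement.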

t0mm0
Eldorado Offline
Fan
Posts: 520
Joined: May 2009
Reputation: 14
Post: #44
t0mm0 Wrote:i guess it would be useful if it makes it easier to merge the downloaded database in to the existing one - can you do that if you just download a db file? remember unlike icefilms this database will be shared between addons, so if addon1 has a pre-filled db, you don't want it to remove anything that addon2 or addon3 might have already added. we'll also need to think about how to handle duplicates...

t0mm0

Are you referring to the current Icefilms DB that users have and merging to this one?

If so the two will not be compatible due to the amount of changes, new fields, changed field names and datatypes, but perhaps some sort of conversion script could be written

Or were you thinking of sending periodic updates for the db? That might have to be something for a future release, definitely would be useful
slyi Offline
Junior Member
Posts: 40
Joined: Sep 2011
Reputation: 0
Post: #45
I understand you're using the icefilms meta code as a base, and it may sound like a lot to change for little benefit, but it would mean far cleaner code. Would you consider using some common functions for get/set db items?

To be honest, i have not been able to read the code too well, partly because i only started learning python a couple of weeks ago and i wanted to start with a clean slate for my own knowledge, but there seem to be overlapping functions in each scraper, tvdb & tmdb. I'll review your git some more at the weekend and may send you a patch for review for any common code i find.

I believe you want the castandrole attribute to show the role details as well. Looking at http://wiki.xbmc.org/?title=InfoLabels
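A small sketch of building the data for such a label, pairing each actor with a role. The names are placeholders, and representing each entry as an (actor, role) pair is an assumption: the exact shape XBMC expects for 'castandrole' may differ between versions, so check the documentation for the version being targeted.

```python
# Placeholder names; a real scraper would supply these.
actors = ['Actor One', 'Actor Two']
roles = ['Role One', 'Role Two']

# Pair each actor with the role they played, keeping the original order.
cast_and_role = list(zip(actors, roles))

info_labels = {
    'cast': actors,                # plain list of names
    'castandrole': cast_and_role,  # list of (actor, role) pairs (assumed shape)
}
```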

I would like to see the items below

CastAndRole
ListItem.Episode
ListItem.Season
ListItem.TVShowTitle
ListItem.Property(TotalSeasons)
ListItem.Property(TotalEpisodes)
ListItem.Property(WatchedEpisodes)
ListItem.Property(UnWatchedEpisodes)
ListItem.Property(NumEpisodes)
Container.FolderThumb
Container.TvshowThumb
Container.SeasonThumb
Fanart.Image
ListItem.StarRating

The reason i mentioned a json file is that it is text rather than binary, and partly that i don't yet fully understand the sqlite advantages/api's.

My main objection to one massive download is that say i watch x-files re-runs i don't see the need to have all jersey shore data and images polluting my limited HD. ;-)

BTW: For thumbnails/fanart what image format do you use png or dds?