Accessing XBMC's IMDb scraping from within a (python) video plugin or script?
#1
Is there currently a way to access the IMDb scraping from within a plugin or script? I want to be able to pass IMDb URLs to XBMC from another source in the video plugin, so that I can view the movie information.
#2
There is a theater showtimes plugin that has an IMDb module. Most of the regex is stolen from XBMC's imdb.xml.

It may be broken now, but you can see how to do it.

It's in the xbmc-addons SVN.
For python coding questions first see http://mirrors.xbmc.org/docs/python-docs/
#3
The myTV and DVDProfiler scripts have IMDb scraping modules. They were originally based on the ones Nuka mentioned, and they were working the last time I tried them.

It would be nice to be able to tap into the main XBMC scrapers, though...
Retired from Add-on dev
#4
Exposing the scrapers to Python is certainly doable. It would need some internal reorganization, but I think we want to do that after Atlantis regardless.
#5
I second this.

This would be great.
#6
It would be great if there was an API that would return IMDb info when passed an IMDb ID.
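
Until something like that exists, a script has to do the ID handling itself. Here is a minimal sketch, assuming IMDb title IDs look like `tt` followed by digits and that title pages live under `/title/<id>/` (as the URLs elsewhere in this thread suggest). `imdb_id_from` and `title_url` are hypothetical helper names, not part of XBMC:

```python
import re

# Assumption: an IMDb title ID is "tt" followed by one or more digits.
IMDB_ID = re.compile(r'tt\d+')

def imdb_id_from(text):
    """Return the first IMDb title ID found in a URL or bare string, or None."""
    match = IMDB_ID.search(text)
    return match.group(0) if match else None

def title_url(text):
    """Build a title-page URL from an IMDb ID or URL; None if no ID is present."""
    imdb_id = imdb_id_from(text)
    if imdb_id is None:
        return None
    return 'http://www.imdb.com/title/' + imdb_id + '/'
```

A plugin could run whatever URL it receives through `title_url` first, so the rest of the script only ever deals with one canonical form.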
#7
Try this function.
It's far from perfect but it works quite well.

imdb() - pass it the name of the movie as a string.
It returns genre, year, image, rating and plot.


Code:
import re
try:                    # Python 2, as used by XBMC scripts
    import urllib2
    from urllib import quote
except ImportError:     # Python 3
    import urllib.request as urllib2
    from urllib.parse import quote

USER_AGENT = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14'
NO_PLOT = 'No plot found on IMDb'

def fetch(url):
    # Download a page while presenting a browser User-Agent.
    req = urllib2.Request(url)
    req.add_header('User-Agent', USER_AGENT)
    data = urllib2.urlopen(req).read()
    return data if isinstance(data, str) else data.decode('utf-8', 'replace')

def first(pattern, text, default=''):
    # Return the first regex capture, or a default instead of raising IndexError.
    found = re.findall(pattern, text)
    return found[0] if found else default

def imdb(name):
    # Search IMDb by movie name; returns (genre, year, image, rating, plot).
    page = fetch('http://www.imdb.com/find?s=all&q=' + quote(name))
    title = first('<b>Media from&nbsp;<a href="/title/(.+?)/">', page)
    if title:
        # The search gave a results page: load the first title it links to.
        page = fetch('http://www.imdb.com/title/' + title)
    else:
        # The search went straight to a title page: take the ID from its plot link.
        title = first(r'<a class="tn15more inline" href="/title/(.+?)/plotsummary" onClick=".+?">.+?</a>', page)
    genre = first(r'<h5>Genre:</h5>\n<a href=".+?">(.+?)</a>', page)
    year = first(r'<a href="/Sections/Years/.+?/">(.+?)</a>', page)
    image = first(r'<img border="0" alt=".+?" title=".+?" src="(.+?)" /></a>', page)
    rating = first(r'<div class="meta">\n<b>(.+?)</b>', page)
    plot = ''
    if title:
        # Try the plot summary first, then fall back to the synopsis page.
        base = 'http://www.imdb.com/title/' + title
        plot = first('<p class="plotpar">\n(.+?)\n<i>\n', fetch(base + '/plotsummary'))
        if not plot:
            synopsis = fetch(base + '/synopsis').replace('\n', '')
            plot = first('<div id="swiki.2.1">(.+?)</div>', synopsis)
    # Leftover markup in the capture means the pattern grabbed the wrong block.
    if not plot or 'div' in plot:
        plot = NO_PLOT
    return genre, year, image, rating, plot
#8
Exposing the scrapers to Python is probably the way to go. Replicating the logic in each Python script that needs the info just creates more places to break when the website structure changes...
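
As a stopgap until that happens, scripts could at least share one module that holds the patterns, so a site change means one fix rather than many. A rough sketch, assuming a hypothetical `scrape_fields` helper and reusing two of the patterns from the imdb() function earlier in the thread; the sample HTML is made up for illustration and is not real IMDb output:

```python
import re

# Single place to update when the site markup changes.
# Both patterns are copied from the imdb() function earlier in the thread.
PATTERNS = {
    'genre': r'<h5>Genre:</h5>\n<a href=".+?">(.+?)</a>',
    'year': r'<a href="/Sections/Years/.+?/">(.+?)</a>',
}

def scrape_fields(html, patterns=PATTERNS):
    """Return {field: first captured value, or None when the pattern misses}."""
    result = {}
    for field, pattern in patterns.items():
        match = re.search(pattern, html)
        result[field] = match.group(1) if match else None
    return result

# Made-up fragment shaped like the markup the patterns expect.
SAMPLE = ('<h5>Genre:</h5>\n<a href="/Sections/Genres/Drama/">Drama</a>\n'
          '<a href="/Sections/Years/1994/">1994</a>')

fields = scrape_fields(SAMPLE)
```

Returning None for a missed field, rather than raising, lets each calling script decide how to degrade when a pattern goes stale.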