
gzusrawx · 2008-10-02, 03:46

Is there currently a way to access the imdb scraping from within a plugin or script? I want to be able to pass imdb urls to xbmc from another source in the video plugin to view the movie information.

Nuka1195 · 2008-10-02, 04:59

There is a theater showtimes plugin that has an IMDb module. Most of the regex is stolen from XBMC's imdb.xml.

it may be broken now, but you can see how to do it.

it's in the xbmc-addons svn.

BigBellyBilly · 2008-10-28, 18:44

The myTV and DVDProfiler scripts have IMDb scraping modules, originally based on the ones Nuka mentioned. They were working the last time I tried them.

It would be nice to be able to tap into the main XBMC scrapers, though...

**spiff** · 2008-10-28, 18:52

Exposing the scrapers to Python is certainly doable. It would need some internal reorganization, but I think we want to do that after Atlantis regardless.

nate12o6 · 2008-12-05, 23:05

I second this.

This would be great.

nate12o6 · 2008-12-05, 23:22

It would be great if there was an API that would return imdb info when passed an imdb id.
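
A minimal sketch of the input side of such an API, assuming IMDb title ids keep the form `tt` followed by digits and that title pages live under `http://www.imdb.com/title/<id>/`, as in the URLs used elsewhere in this thread. The helper names `imdb_id` and `title_url` are made up for illustration, and the id in the example is a placeholder; nothing like this is exposed by XBMC itself.

```python
import re

# Assumption: IMDb title ids look like "tt" followed by digits.
_ID_RE = re.compile(r'tt\d+')

def imdb_id(text):
    """Return the IMDb title id found in a URL or bare id string, or None."""
    match = _ID_RE.search(text)
    if match:
        return match.group(0)
    return None

def title_url(text):
    """Build the title page URL for a URL or id, or return None if no id is found."""
    found = imdb_id(text)
    if found is None:
        return None
    return 'http://www.imdb.com/title/' + found + '/'

# Both a full URL and a bare id resolve to the same title page.
print(title_url('http://www.imdb.com/title/tt0123456/plotsummary'))
print(title_url('tt0123456'))
```

With something like this in front, a plugin could accept either form from another source and hand a single normalized id to whatever does the actual lookup.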

~~Voinage~~ · 2008-12-07, 16:26

Try this function.
It's far from perfect but it works quite well.

imdb() - pass it the movie name as a string.
It returns genre, year, image, rating, plot.

Code:
import re
import urllib
import urllib2

USER_AGENT = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14'
NO_PLOT = 'No plot found on IMDb'

def fetch(url):
    # Download a page, sending a browser User-Agent.
    req = urllib2.Request(url)
    req.add_header('User-Agent', USER_AGENT)
    return urllib2.urlopen(req).read()

def first(pattern, text, default=''):
    # First regex capture, or the default when the page layout did not match.
    found = re.findall(pattern, text)
    if found:
        return found[0]
    return default

def get_plot(title_id):
    # Try the plot summary page first, then fall back to the synopsis page.
    page = fetch('http://www.imdb.com/title/' + title_id + '/plotsummary')
    plot = first('<p class="plotpar">\n(.+?)\n<i>\n', page, None)
    if plot is None:
        page = fetch('http://www.imdb.com/title/' + title_id + '/synopsis')
        plot = first('<div id="swiki.2.1">(.+?)</div>', page.replace('\n', ''), None)
    # A capture containing 'div' means the regex grabbed markup, not a plot.
    if plot is None or plot.find('div') > 0:
        return NO_PLOT
    return plot

def imdb(name):
    # Search IMDb by movie name; returns (genre, year, image, rating, plot).
    response = fetch('http://www.imdb.com/find?s=all&q=' + urllib.quote(name))
    title_id = first('<b>Media from&nbsp;<a href="/title/(.+?)/">', response, None)
    if title_id is not None:
        # The search returned a results list: load the first title's page.
        response = fetch('http://www.imdb.com/title/' + title_id)
    else:
        # The search landed directly on a title page: take the id from its plot summary link.
        title_id = first(r'<a class="tn15more inline" href="/title/(.+?)/plotsummary" onClick=".+?">.+?</a>', response, None)

    # Fields that no longer match come back as '' instead of raising IndexError.
    genre = first(r'<h5>Genre:</h5>\n<a href=".+?">(.+?)</a>', response)
    year = first(r'<a href="/Sections/Years/.+?/">(.+?)</a>', response)
    image = first(r'<img border="0" alt=".+?" title=".+?" src="(.+?)" /></a>', response)
    rating = first(r'<div class="meta">\n<b>(.+?)</b>', response)

    if title_id is None:
        plot = NO_PLOT
    else:
        plot = get_plot(title_id)
    return genre, year, image, rating, plot

Dan Dare · 2009-05-30, 20:30

Exposing the scrapers to Python is probably the way to go. Replicating the logic in each Python screen that needs the info is just more places to break when the website structure changes...
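
One way to picture the maintenance point: if every script pulls its regexes from a single shared table, a site layout change means one edit instead of one per script. A rough sketch, assuming a hypothetical shared module; the names `PATTERNS` and `extract` are made up, the patterns are the ones from Voinage's function above, and the sample HTML is hand-written to fit them rather than taken from a real IMDb page.

```python
import re

# Hypothetical shared table: the only place to edit when the page layout changes.
PATTERNS = {
    'genre': r'<h5>Genre:</h5>\n<a href=".+?">(.+?)</a>',
    'year': r'<a href="/Sections/Years/.+?/">(.+?)</a>',
    'rating': r'<div class="meta">\n<b>(.+?)</b>',
}

def extract(html, default=''):
    """Apply every pattern; a field whose pattern does not match gets the default."""
    info = {}
    for field, pattern in PATTERNS.items():
        found = re.findall(pattern, html)
        if found:
            info[field] = found[0]
        else:
            info[field] = default
    return info

# Hand-written sample shaped to fit the patterns, not real IMDb output.
SAMPLE = (
    '<h5>Genre:</h5>\n<a href="/Sections/Genres/Drama/">Drama</a>\n'
    '<a href="/Sections/Years/1999/">1999</a>\n'
)

info = extract(SAMPLE)
print(info['genre'])
print(info['year'])
print(repr(info['rating']))
```

The sample has no rating markup on purpose, so that field falls back to the default instead of raising, which is the failure mode each script would otherwise have to handle on its own.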