2012-09-04, 20:45
Hi,
I'm pretty comfortable with bash, cut, grep and awk, but doing the same stuff in py+soup is doing my head in. So far I can fetch the 'desc' class from an IMDB watchlist, but I can't turn it into 'keys' or 'variables' that I can do anything useful with. Here is my basic tutorial script:
Code:
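from bs4 import BeautifulSoup
import urllib2

url = "http://www.imdb.com/user/ur35645275/watchlist"
page = urllib2.urlopen(url)
# name the parser explicitly so bs4 doesn't have to guess one
soup = BeautifulSoup(page.read(), "html.parser")
# every watchlist entry sits in a <div class="desc">
movies = soup.find_all('div', {'class': 'desc'})
for eachmovie in movies:
    # the href and title live on the <a> inside the div, not on the div itself:
    # print eachmovie.a['href'] + "," + eachmovie.a.string
    print eachmovie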
This will output something like:
Quote:
<div class="desc">
<a href="/title/tt0187078/">Gone in Sixty Seconds</a>
</div>
<div class="desc">
<a href="/title/tt0477472/">Solo</a>
</div>
<div class="desc">
<a href="/title/tt0086250/">Scarface</a>
</div>
<div class="desc">
<a href="/title/tt0072890/">Dog Day Afternoon</a>
</div>
Which is cool and all, but I want clean strings I can feed into XBMC.
Can anyone help me carve this text up into something useful?
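In the BeautifulSoup script, the href and the title sit on the `<a>` inside each `desc` div, not on the div itself, which is why `eachmovie['href']` raises a KeyError; `eachmovie.a['href']` and `eachmovie.a.string` reach them. As a sketch that runs without bs4, the standard library's `html.parser` can do the same job on the quoted markup. It is written for Python 3; the class name `DescLinkParser` and the `SAMPLE` string are illustrative, and it assumes (as in the quote above) that no other `<div>` is nested inside a `desc` div.

```python
from html.parser import HTMLParser


class DescLinkParser(HTMLParser):
    """Collect (href, title) pairs from <a> tags inside <div class="desc">.

    Hypothetical helper; assumes no <div> is nested inside a desc div.
    """

    def __init__(self):
        super().__init__()
        self.in_desc = False  # currently inside a <div class="desc">?
        self.href = None      # href of the <a> being read, if any
        self.text = []        # text fragments seen inside that <a>
        self.links = []       # collected (href, title) tuples

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "div" and "desc" in classes:
            self.in_desc = True
        elif tag == "a" and self.in_desc:
            self.href = attrs.get("href")
            self.text = []

    def handle_data(self, data):
        # only keep text that belongs to the link we are reading
        if self.href is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.href is not None:
            self.links.append((self.href, "".join(self.text).strip()))
            self.href = None
        elif tag == "div" and self.in_desc:
            self.in_desc = False


# Two entries copied from the quoted output above.
SAMPLE = """
<div class="desc">
<a href="/title/tt0187078/">Gone in Sixty Seconds</a>
</div>
<div class="desc">
<a href="/title/tt0477472/">Solo</a>
</div>
"""

parser = DescLinkParser()
parser.feed(SAMPLE)
for href, title in parser.links:
    print(href + "," + title)
```

On the sample this prints one `href,title` line per movie, the same shape as the commented-out print in the script, which is easy to hand to cut/awk-style processing or to XBMC.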
------ To be more specific, I want the IMDB ID (the tt string, ^tt[0-9]{7} as a regex), the IMDB URL (/title/id/), the title and, of course, the thumbnail: (imdbid, url, title, thumbnail).
I have imdbpy, which is great for fetching stuff once I have a name or an ID, but here I just want that info for a given watchlist.
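Once each entry is an (href, title) pair, the IMDB ID can be cut out of the href with a regex and packed into a tuple. A minimal Python 3 sketch: it matches the `tt[0-9]{7}` pattern from the post, but searches inside `/title/…/` rather than using `^`, because the href starts with `/title/`, not with `tt`. The names `TITLE_ID`, `to_record` and `links` are hypothetical. The quoted markup contains no image tag, so the thumbnail is not covered here.

```python
import re

# "/title/" + the poster's ID pattern (tt followed by seven digits) + "/"
TITLE_ID = re.compile(r"/title/(tt[0-9]{7})/")


def to_record(url, title):
    """Turn an href like '/title/tt0086250/' plus its link text into an
    (imdbid, url, title) tuple; return None if the href has no title ID."""
    match = TITLE_ID.search(url)
    if match is None:
        return None
    return (match.group(1), url, title)


# (href, title) pairs as they appear in the quoted watchlist output
links = [
    ("/title/tt0086250/", "Scarface"),
    ("/title/tt0072890/", "Dog Day Afternoon"),
]

records = [r for r in (to_record(u, t) for u, t in links) if r is not None]
for imdbid, url, title in records:
    print(imdbid, url, title)
```

Skipping entries that return None keeps stray non-title links out of the list. Each resulting ID can then be handed to imdbpy for the remaining details.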