Developing an Amazon Movie Scraper

gyrene2083 Offline
Senior Member
Posts: 202
Joined: Oct 2008
Reputation: 0
Location: New York City
Post: #31
jelockwood,

I am looking forward to testing your scraper out. I have been following this thread since October. I have found that many of the DVDs on IMDb don't have cover art, whereas Amazon does, and that IMDb is also missing DVD information. I appreciate all your efforts.


-Semper Fi
gyrene2083
jelockwood Offline
Senior Member
Posts: 111
Joined: Mar 2008
Reputation: 0
Post: #32
gyrene2083 Wrote:jelockwood,

I am looking forward to testing your scraper out. I have been following this thread since October. I have found that many of the DVDs on IMDb don't have cover art, whereas Amazon does, and that IMDb is also missing DVD information. I appreciate all your efforts.

If the only problem is cover art, then you could use the IMDB scraper, and manually select a local picture, or put a picture in the directory with a .tbn file extension. I wrote the scrapers because some titles are not listed at all on IMDB and I still wanted to include them in the XBMC library.
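For anyone who wants to script the .tbn approach, here is a minimal sketch. It assumes the thumbnail should sit next to the video file and share its base name; the function name and the file names in the usage line are made up for illustration.

```python
import shutil
from pathlib import Path


def install_thumbnail(video_path, image_path):
    """Copy image_path next to video_path as '<video base name>.tbn'.

    Returns the path of the new thumbnail file.
    """
    video = Path(video_path)
    # Swap only the final extension: 'Some Film.avi' -> 'Some Film.tbn'.
    target = video.with_suffix(".tbn")
    shutil.copyfile(image_path, target)
    return target
```

Usage would look something like install_thumbnail("Movies/Some Film.avi", "cover.jpg"), after which the library entry may need to be refreshed to pick up the picture.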

The download link is now live so you can give it a go.
spiff Offline
Retired Developer
Posts: 12,386
Joined: Nov 2003
Post: #33
both are now sitting in svn (r16563). cheers again!
jelockwood Offline
Senior Member
Posts: 111
Joined: Mar 2008
Reputation: 0
Post: #34
I just tried using the Amazon scrapers (which I mostly wrote) for the first time in several weeks, and damn, they don't work for me any more.

Currently neither is finding any results (so it is not simply an issue of scraping info from a selected result). This was the original problem that I had (constructing a correct query in the scraper, and then getting/showing the list of results). This was originally solved by C-Quel generously providing his original Amazon scraper effort which I then finished off.

Could anyone else confirm whether the Amazon scrapers (either US or UK) are currently still working for them, and if so, which DVD title they used successfully?

If on the other hand, other users confirm it is broken, would anyone be able to assist in diagnosing it?

What held me up last time is that, without a LAN packet sniffer, I could not see what request the scraper sent out or what result it got back from Amazon, so I could not tell how far it got. Once I got past this and moved on to scraping the film info, testing was easy: I just counted how many fields successfully returned results.
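One way to get that visibility without a packet sniffer is to repeat the request outside XBMC and run the scraper's regular expressions against the response by hand. The sketch below is only an illustration: fetch and report_matches are hypothetical helper names, the User-Agent string is made up, and nothing here reproduces exactly what XBMC itself sends.

```python
import re
import urllib.request


def fetch(url, timeout=10):
    """Download a page and return (HTTP status code, decoded body)."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "scraper-debug/0.1"}  # made-up agent string
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read().decode("utf-8", errors="replace")
        return response.status, body


def report_matches(html, patterns):
    """Return {name: list of matches} for each named regex in patterns.

    An empty list means that expression found nothing in the page,
    which shows which step of the scraper is failing.
    """
    return {name: re.findall(pattern, html) for name, pattern in patterns.items()}
```

Saving the body returned by fetch to a file and calling report_matches on it offline makes it possible to test each expression repeatedly without hitting the site again.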
C-Quel Offline
Retired Team-Kodi Member
Posts: 1,375
Joined: Aug 2004
Reputation: 0
Post: #35
Try this...

Change the expression in GetSearchResults from

imageColumn"[^:]*a href="([^"]*)"[^:]*[^>]*alt="([^"]*)"

to

productTitle"><a href="([^"]*)"> ([^<]*)</a>

or, properly formatted (XML-escaped for the scraper file):

productTitle&quot;&gt;&lt;a href=&quot;([^&quot;]*)&quot;&gt; ([^&lt;]*)&lt;/a&gt;

It might not be perfect, as I simply glanced at Amazon with no tools to hand.
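As a rough check of the new expression, the sketch below runs it against a small sample and then produces the XML-escaped form. The sample markup and example.com URLs are invented for illustration and are not real Amazon HTML; search_results and xml_escaped are hypothetical helper names.

```python
import re
from xml.sax.saxutils import escape

# The replacement expression from the post above, unescaped.
NEW_PATTERN = r'productTitle"><a href="([^"]*)"> ([^<]*)</a>'

# Invented sample markup, shaped only to exercise the expression.
SAMPLE_HTML = (
    '<div class="productTitle"><a href="http://example.com/dp/A1"> First Film</a></div>'
    '<div class="productTitle"><a href="http://example.com/dp/B2"> Second Film</a></div>'
)


def search_results(html):
    """Return a list of (url, title) tuples found by NEW_PATTERN."""
    return re.findall(NEW_PATTERN, html)


def xml_escaped(pattern):
    """Escape &, <, > and double quotes so the pattern can sit inside XML."""
    return escape(pattern, {'"': "&quot;"})


if __name__ == "__main__":
    for url, title in search_results(SAMPLE_HTML):
        print(title, "->", url)
    print(xml_escaped(NEW_PATTERN))
```

Note that the expression expects exactly one space between the opening tag and the title, so any change in Amazon's whitespace would make it stop matching.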

jelockwood Offline
Senior Member
Posts: 111
Joined: Mar 2008
Reputation: 0
Post: #36
I have already thanked C-Quel (again) via a private message, but this fix does so far look successful. I will do some more testing and then put updated versions on my download page and issue a request for them to be included as updated and fixed versions in XBMC.

Many thanks again to C-Quel and everyone else who has helped out in the past.

C-Quel Wrote:Try this...

Change the expression in GetSearchResults from

imageColumn"[^:]*a href="([^"]*)"[^:]*[^>]*alt="([^"]*)"

to

productTitle"><a href="([^"]*)"> ([^<]*)</a>

or, properly formatted (XML-escaped for the scraper file):

productTitle&quot;&gt;&lt;a href=&quot;([^&quot;]*)&quot;&gt; ([^&lt;]*)&lt;/a&gt;

It might not be perfect, as I simply glanced at Amazon with no tools to hand.
mkortstiege Offline
Team-XBMC Developer
Posts: 2,907
Joined: Jan 2008
Reputation: 8
Location: Germany
Post: #37
jelockwood Wrote:I have already thanked C-Quel (again) via a private message, but this fix does so far look successful. I will do some more testing and then put updated versions on my download page and issue a request for them to be included as updated and fixed versions in XBMC.

Many thanks again to C-Quel and everyone else who has helped out in the past.

Please use our tracker instead and attach a unified diff against the previous version of the scraper.
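If no diff tool is to hand, a unified diff can be produced with Python's standard difflib module. This is a minimal sketch: unified_diff_text is a hypothetical helper, and the default file names are placeholders rather than the real scraper file names.

```python
import difflib


def unified_diff_text(old_text, new_text,
                      old_name="amazon.xml.orig", new_name="amazon.xml"):
    """Return a unified diff of two versions of a file as one string.

    The file names are placeholders used only in the diff header.
    """
    old_lines = old_text.splitlines(keepends=True)
    new_lines = new_text.splitlines(keepends=True)
    return "".join(
        difflib.unified_diff(old_lines, new_lines, fromfile=old_name, tofile=new_name)
    )
```

Reading the old and new scraper files into strings, passing them to unified_diff_text and writing the result to a .diff file gives something that can be attached to a ticket.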

Always read the XBMC online-manual, FAQ and search the forum before posting.
Do not e-mail XBMC-Team members directly asking for support. Read/follow the forum rules
For troubleshooting and bug reporting please make sure you read this first.
ultrabrutal Offline
Posting Freak
Posts: 952
Joined: Feb 2005
Reputation: 0
Location: South of Heaven
Post: #38
Amazon does not give permission to get info via HTTP. They have a web service which is legal to use; however, you have to delete the info after 3 months... hehe, this means that movies should automatically start to disappear from the library if they were scanned via an Amazon web service scraper ;)
spiff Offline
Retired Developer
Posts: 12,386
Joined: Nov 2003
Post: #39
both scrapers disabled in svn
nekrosoft13 Offline
Fan
Posts: 491
Joined: Dec 2008
Reputation: 1
Post: #40
ultrabrutal Wrote:Amazon does not give permission to get info via HTTP. They have a web service which is legal to use; however, you have to delete the info after 3 months... hehe, this means that movies should automatically start to disappear from the library if they were scanned via an Amazon web service scraper ;)

you gonna ruin everything
jelockwood Offline
Senior Member
Posts: 111
Joined: Mar 2008
Reputation: 0
Post: #41
Ok, I did some more testing of this updated version and found a couple more issues.

1. There was a problem with processing the DVD title for some entries on Amazon.co.uk, because some titles are formatted differently from others. I believe I have successfully modified the scraper to cope better with this.

2. I took the opportunity to add support for scraping the DVD "Writers" information if available on the Amazon pages. This applies to both the Amazon.com and Amazon.co.uk versions.

I have put the updated versions at this URL for those keen to get it before it appears in the next XBMC release.

http://homepage.mac.com/jelockwood/scrapers.html
Gamester17 Offline
Team-XBMC Forum Moderator
Posts: 10,523
Joined: Sep 2003
Reputation: 10
Location: Sweden
Post: #42
Thanks, however please (always) create a new ticket on trac for each new scraper update:
http://trac.xbmc.org (unified diff if possible, or better yet both diff and the full file).

Thanks again :D

jelockwood Offline
Senior Member
Posts: 111
Joined: Mar 2008
Reputation: 0
Post: #43
Gamester17 Wrote:Thanks, however please (always) create a new ticket on trac for each new scraper update:
http://trac.xbmc.org (unified diff if possible, or better yet both diff and the full file).

Thanks again :D

I have reopened and updated the original Trac ticket I used to submit the first version. The purpose of my previous message was to let the people who have been following this thread know what was happening.
Gamester17 Offline
Team-XBMC Forum Moderator
Posts: 10,523
Joined: Sep 2003
Reputation: 10
Location: Sweden
Post: #44
Please create a new Trac ticket for updates if the old ticket has been closed, instead of reopening it. (Only if the old ticket has never been closed is it OK to post updates to it.) This process makes tracking management easier.

Thanks again!

C-Quel Offline
Retired Team-Kodi Member
Posts: 1,375
Joined: Aug 2004
Reputation: 0
Post: #45
Neither does IMDb.

ultrabrutal Wrote:Amazon does not give permission to get info via HTTP. They have a web service which is legal to use; however, you have to delete the info after 3 months... hehe, this means that movies should automatically start to disappear from the library if they were scanned via an Amazon web service scraper ;)
