[Note: Although the examples below use the film title "Soylent Green", I have manually searched IMDB in a browser and confirmed that the other titles in question are definitely not listed there.]
Surprisingly, there is no existing Amazon scraper. As part of an effort to write one myself, I started by looking at the existing scrapers to see how they work. Following on from this, I made an initial attempt at converting the current FilmAffinity scraper to return English results rather than Spanish ones (you can download a copy here if you are interested: http://homepage.mac.com/jelockwood/.Publ...nityen.zip).
I have not yet got an Amazon scraper even partially working, but I have found some important information about the format of the various URLs that Amazon uses.
1. Amazon itself normally replaces spaces in title searches with a plus (+) sign, but searches also seem to work with a literal space or with %20.
All of the following search URLs work when entered in a web browser:
Code:
http://www.amazon.com/s/ref=nb_ss_d?url=search-alias=dvd&field-keywords=soylent+green&x=0&y=0
Code:
http://www.amazon.com/s/ref=nb_ss_d?url=search-alias=dvd&field-keywords=soylent green&x=0&y=0
Code:
http://www.amazon.com/s/ref=nb_ss_d?url=search-alias=dvd&field-keywords=soylent%20green&x=0&y=0
and so does this slightly shorter form:
Code:
http://www.amazon.com/s/ref=nb_ss_d?url=search-alias=dvd&field-keywords=soylent%20green
2. The URL of a result is normally in a rather messy and complicated format like this:
Code:
http://www.amazon.com/Soylent-Green-John-Barclay/dp/B0016I0AJG/ref=sr_1_1?ie=UTF8&s=dvd&qid=1217077050&sr=1-1
As you can see, there appear to be two different ID numbers plus a text field. However, I have found that the following much simpler form of the URL also works:
Code:
http://www.amazon.com/dp/B0016I0AJG/
Therefore we just need to extract the ID beginning with a B (Amazon calls this the ASIN; they all seem to begin with a B).
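To make the extraction step concrete, here is a minimal Python sketch (not XBMC scraper XML) that pulls the 10-character ID out of a messy result URL and rebuilds the short /dp/ form. The function names are hypothetical, and the pattern assumes the ID always follows "/dp/" and is 10 upper-case letters or digits; it does not require the leading B, even though every example so far starts with one.

```python
import re

# Assumption: the product ID is the 10 upper-case letters/digits after "/dp/",
# followed by "/", "?" or the end of the URL.
_DP_ID = re.compile(r"/dp/([A-Z0-9]{10})(?:[/?]|$)")


def extract_asin(url):
    """Return the product ID (ASIN) from a result URL, or None if absent."""
    match = _DP_ID.search(url)
    return match.group(1) if match else None


def short_product_url(asin):
    """Build the simplified product URL that was found to work."""
    return f"http://www.amazon.com/dp/{asin}/"


result = ("http://www.amazon.com/Soylent-Green-John-Barclay/dp/B0016I0AJG/"
          "ref=sr_1_1?ie=UTF8&s=dvd&qid=1217077050&sr=1-1")
asin = extract_asin(result)
print(asin)                     # B0016I0AJG
print(short_product_url(asin))  # http://www.amazon.com/dp/B0016I0AJG/
```

The text field and the qid value are simply ignored, since the short URL only needs the ID.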
3. The thumbnail image normally has a URL of the form
Code:
http://ecx.images-amazon.com/images/I/51bU-puSlkL._SL500_AA240_.jpg
and the large image a URL of the form
Code:
http://ecx.images-amazon.com/images/I/51bU-puSlkL._SS500_.jpg
As you can see, the image ID is completely different from anything used previously. However, I have also found that the following URL produces the same large image and uses the main ID number from the original URL:
Code:
http://ecx.images-amazon.com/images/P/B0016I0AJG.01.L.jpg
or, with the older alternative host name:
Code:
http://images.amazon.com/images/P/B0016I0AJG.01.L.jpg
Note that these forms of the URL must use a P rather than an I in the path.
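Because the large image can be addressed by the product ID alone, a scraper never needs the separate image ID. A minimal Python sketch of building both candidate URLs from the ID, assuming the /images/P/<ID>.01.L.jpg pattern observed above keeps working (the function name is hypothetical):

```python
def image_urls(asin):
    """Return large-image URLs built from the product ID.

    Uses the /images/P/<ID>.01.L.jpg pattern observed in the post; note the
    P (not I) path segment. The newer host is listed first, the older second.
    """
    path = f"/images/P/{asin}.01.L.jpg"
    return [
        "http://ecx.images-amazon.com" + path,
        "http://images.amazon.com" + path,
    ]


for url in image_urls("B0016I0AJG"):
    print(url)
```

Returning both hosts lets the scraper fall back to the older name if the first one ever stops resolving.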
Based on all of the above, would anyone care to help by coding up an initial scraper with the CreateSearchUrl and GetSearchResults sections? I will then try scraping the info fields.
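As a starting point for whoever picks this up, here is the logic those two sections would need, written as a Python sketch rather than as scraper XML. The names create_search_url and get_search_results are hypothetical stand-ins for CreateSearchUrl and GetSearchResults. The link pattern is an assumption: it supposes each result on the search page is an href (absolute or relative) containing "/dp/" followed by the 10-character ID, and the real page markup may differ. Titles are not extracted here because the post does not show the results-page HTML.

```python
import re
from urllib.parse import quote_plus

# The shorter search URL from item 1; the title is appended URL-encoded.
SEARCH_BASE = ("http://www.amazon.com/s/ref=nb_ss_d"
               "?url=search-alias=dvd&field-keywords=")

# Assumption: result links look like href=".../dp/<ID>..." on www.amazon.com,
# either absolute or relative to the site root.
_RESULT_LINK = re.compile(
    r'href="(?:http://www\.amazon\.com)?/(?:[^"]*/)?dp/([A-Z0-9]{10})')


def create_search_url(title):
    """Build a DVD search URL; quote_plus turns spaces into '+' as Amazon does."""
    return SEARCH_BASE + quote_plus(title)


def get_search_results(html):
    """Return (asin, short_url) pairs in page order, skipping duplicates."""
    seen = set()
    results = []
    for asin in _RESULT_LINK.findall(html):
        if asin not in seen:
            seen.add(asin)
            results.append((asin, f"http://www.amazon.com/dp/{asin}/"))
    return results


print(create_search_url("soylent green"))
```

Duplicates are dropped because a results page often links to the same product more than once (image and title links, for example).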
PS. On a different topic: if a VIDEO_TS folder sits inside a folder named after the film, that folder name can be used for IMDB scraping. However, as mentioned, not all of the DVDs are listed on IMDB. It should be possible to use an NFO file to provide at least some metadata, but I am unsure of the correct naming and placement in this scenario.
e.g. /DVDs/Soylent Green/VIDEO_TS/
What should the NFO file be called and in which of the three possible folders (DVDs, Soylent Green, or VIDEO_TS) should it be placed?