C-Quel Wrote: try this....
http://pastebin.com/m657636a8
old but with cleanup + minor changes should work
Many, many, many thanks for pointing me towards this. I felt I was getting closer to getting the GetSearchResults working but had not yet succeeded. Yours does work.
For your information I found the following results from your unmodified Amazon scraper.
1. It successfully did a CreateSearchUrl
2. It only listed one result using its GetSearchResults
3. When that result was used it only succeeded in filling in "Studio", "Runtime" and obtaining the movie thumbnail.
I have so far 'improved' it by
1. Listing the entire page of results returned by Amazon (a maximum of twelve results). This was done by adding a repeat command to your GetSearchResults.
2. Filling in the movie "Title", "Year" (the proper film year not the DVD year), and the "Plot".
3. Slightly changing the filename you used for the movie thumbnail to one that I believe will still return a result in the very few cases where yours might not. My modified filename always returns the largest available artwork (usually 500 pixels), whereas yours only gets 500-pixel-tall artwork, and I believe a very few DVDs may not have artwork available that big.
I have also added/changed the code to handle "Directors" and "Actors", but this is not working yet. My approach was to use a first regex to capture the block listing all the actor(s) or director(s), and then a second regex to extract the individual names from that block.
My efforts so far can be obtained from the following link
http://homepage.mac.com/jelockwood/.Publ...ustest.zip
As far as I can see, the Amazon product page has no information available for the MPAA rating (Amazon use a GIF and no text), nor for a tagline, summary, genre, or writer. There might be a way of getting a rating (that is, the reader score) by matching the following text: align="absbottom" alt="4.5 out of 5 stars" height="12". Note: this text appears several times in a product page, and we would always want to look only at the first occurrence.
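To illustrate the reader-score idea, here is a minimal sketch in Python (not the scraper's own XML format). The function name `first_star_rating` and the `<img>` tags wrapped around the attributes in the sample are assumptions made for the example; only the `alt="4.5 out of 5 stars"` text comes from the product page.

```python
import re

def first_star_rating(page):
    """Return the first 'N out of 5 stars' value found in the page, or None."""
    # re.search stops at the first match, so later occurrences are ignored.
    match = re.search(r'alt="([0-9]+(?:\.[0-9]+)?) out of 5 stars"', page)
    return float(match.group(1)) if match else None

# Hypothetical sample: two rating images, of which only the first should be used.
SAMPLE_PAGE = (
    '<img align="absbottom" alt="4.5 out of 5 stars" height="12">'
    '<img align="absbottom" alt="3.0 out of 5 stars" height="12">'
)

print(first_star_rating(SAMPLE_PAGE))
```

The same pattern should translate to a scraper regex fairly directly, provided it is applied without a repeat so that only the first occurrence is captured.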
If anyone else would like to help out it would be much appreciated. In particular getting the actor(s) and director(s) working is a priority.
For everyone's benefit this is what the block containing all the actors looks like
Code:
<li> <b>Actors:</b> <a href="/s?ie=UTF8&search-alias=dvd&field-keywords=Charlton%20Heston">Charlton Heston</a>, <a href="/s?ie=UTF8&search-alias=dvd&field-keywords=Edward%20G.%20Robinson">Edward G. Robinson</a>, <a href="/s?ie=UTF8&search-alias=dvd&field-keywords=Dick%20Van%20Patten">Dick Van Patten</a>, <a href="/s?ie=UTF8&search-alias=dvd&field-keywords=Chuck%20Connors">Chuck Connors</a>, <a href="/s?ie=UTF8&search-alias=dvd&field-keywords=Joseph%20Cotten">Joseph Cotten</a></li>
And the Directors block is virtually identical
Code:
<li> <b>Directors:</b> <a href="/s?ie=UTF8&search-alias=dvd&field-keywords=Richard%20Fleischer">Richard Fleischer</a></li>
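For anyone wanting to experiment outside XBMC, the two-regex approach (first grab the block, then pull the names out of it) can be sketched in Python as below. This is only an illustration of the technique, not the scraper's XML; the function name `extract_names` is made up for the example, and the patterns assume the markup looks exactly like the two blocks quoted above.

```python
import re
from html import unescape

# Markup copied from the Actors and Directors blocks quoted above.
ACTORS_HTML = (
    '<li> <b>Actors:</b> '
    '<a href="/s?ie=UTF8&search-alias=dvd&field-keywords=Charlton%20Heston">Charlton Heston</a>, '
    '<a href="/s?ie=UTF8&search-alias=dvd&field-keywords=Edward%20G.%20Robinson">Edward G. Robinson</a>, '
    '<a href="/s?ie=UTF8&search-alias=dvd&field-keywords=Dick%20Van%20Patten">Dick Van Patten</a>, '
    '<a href="/s?ie=UTF8&search-alias=dvd&field-keywords=Chuck%20Connors">Chuck Connors</a>, '
    '<a href="/s?ie=UTF8&search-alias=dvd&field-keywords=Joseph%20Cotten">Joseph Cotten</a></li>'
)
DIRECTORS_HTML = (
    '<li> <b>Directors:</b> '
    '<a href="/s?ie=UTF8&search-alias=dvd&field-keywords=Richard%20Fleischer">Richard Fleischer</a></li>'
)

def extract_names(page, label):
    """Return the names linked inside the <li> block that starts with `label`."""
    # Stage 1: isolate the block, e.g. everything between "<b>Actors:</b>" and "</li>".
    block = re.search(r'<li>\s*<b>' + re.escape(label) + r':</b>(.*?)</li>', page, re.S)
    if block is None:
        return []
    # Stage 2: take the link text of every anchor inside that block only.
    names = re.findall(r'<a\s[^>]*>([^<]+)</a>', block.group(1))
    return [unescape(name.strip()) for name in names]

print(extract_names(ACTORS_HTML, "Actors"))
print(extract_names(DIRECTORS_HTML, "Directors"))
```

The point of the first stage is to stop the second regex from picking up unrelated links elsewhere on the page; the non-greedy `(.*?)` keeps the block from running past its own closing `</li>`.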