2010-01-09, 08:48
Hello,
I've spent a few hours trying to get the US Amazon scraper working. I've tried the release version and the version from SVN, and I've searched Google and the XBMC forums with no luck. I turned on debugging and got these results when I attempted to scrape a movie:
Code:
01:46:25 T:2899311472 M:524914688 DEBUG: InternalFindMovie: Searching for 'bear in the big blue house' using Amazon US scraper (file: 'amazonus.xml', content: 'movies', language: 'en', date: '2009-05-22', framework: '1.0')
01:46:25 T:2899311472 M:524914688 DEBUG: FileCurl::Open(0xbfa9ef6c) http://www.amazon.com/s/ref=nb_ss_d_h_?url=search-alias%3Ddvd&field-keywords=bear%20in%20the%20big%20blue%20house
01:46:26 T:2899311472 M:524918784 DEBUG: FileCurl::Close(0xbfa9ef6c) http://www.amazon.com/s/ref=nb_ss_d_h_?url=search-alias%3Ddvd&field-keywords=bear%20in%20the%20big%20blue%20house
01:46:26 T:2899311472 M:524918784 DEBUG: scraper: GetSearchResults returned <?xml version="1.0" encoding="iso-8859-1" standalone="yes"?><results></results>
01:46:26 T:2899311472 M:524918784 ERROR: Process: Error looking up movie Bear in the big blue house
01:46:26 T:2899311472 M:524918784 DEBUG: Thread 2899311472 terminating
01:46:26 T:3078846352 M:524918784 INFO: Loading skin file: DialogKeyboard.xml
01:46:26 T:3078846352 M:524918784 DEBUG: Load DialogKeyboard.xml: 15.29ms
01:46:26 T:3078846352 M:524918784 DEBUG: ------ Window Init (DialogKeyboard.xml) ------
01:46:26 T:3078846352 M:524918784 DEBUG: Alloc resources: 2.77ms (0.00 ms skin load)
01:46:26 T:3078846352 M:524419072 DEBUG: ------ Window Deinit (DialogProgress.xml) ------
My movie collection has several kids' movies that IMDb does not handle well. I would love to be able to scrape from Amazon. Can anyone help me?
Thanks in advance.
P.S.
Here is the amazonus.xml file.
Code:
<?xml version="1.0" encoding="UTF-8"?>
<!-- Initial basic version doing Studio and Thumb believed to have been written by C-Quel -->
<!-- Then updated by John Lockwood to scrape Title, Year, MPAA, Runtime, Rating, Votes, Plot, Actors, Directors -->
<!-- This version 1.1 dated 12/01/09 includes fix by C-Quel for processing results from Amazon to match recent change -->
<!-- Version 1.1 also now supports the Writers field -->
<scraper framework="1.0" date="2009-05-22" content="movies" name="Amazon US" thumb="amazonus.png" language="en">
<CreateSearchUrl dest="3">
<RegExp input="$$1" output="<url>http://www.amazon.com/s/ref=nb_ss_d_h_?url=search-alias%3Ddvd&amp;field-keywords=\1</url>" dest="3">
<expression noclean="1"/>
</RegExp>
</CreateSearchUrl>
<GetSearchResults dest="8">
<RegExp input="$$5" output="<?xml version="1.0" encoding="iso-8859-1" standalone="yes"?><results>\1</results>" dest="8">
<RegExp input="$$1" output="<entity><title>\2</title><url>\1</url></entity>" dest="5">
<expression repeat="yes" clear="yes" noclean="1">productTitle"><a href="([^"]*)">([^<]*)</a></expression>
</RegExp>
<expression clear="yes" noclean="1"/>
</RegExp>
</GetSearchResults>
<GetDetails clearbuffers="no" dest="3">
<RegExp input="$$5" output="<details>\1</details>" dest="3">
<RegExp input="$$1" output="<title>\1</title>" dest="5">
<expression noclean="1"><title>[Amazon.com: ]*([^:]*)</expression>
</RegExp>
<RegExp input="$$1" output="<year>\1</year>" dest="5+">
<expression trim="1">[ \[\(]([0-9]{4})[ \]\)][^<]*</span></expression>
</RegExp>
<RegExp input="$$1" output="<top250>\1</top250>" dest="5+">
<expression>Top 250: #([0-9]*)</a></expression>
</RegExp>
<RegExp input="$$9" output="<mpaa>G</mpaa>" dest="5+">
<RegExp input="$$1" output="\1" dest="9">
<expression><b>Rating: </b>[^_]*/(g)._</expression>
</RegExp>
<expression>(g)</expression>
</RegExp>
<RegExp input="$$9" output="<mpaa>PG</mpaa>" dest="5+">
<RegExp input="$$1" output="\1" dest="9">
<expression><b>Rating: </b>[^_]*/(pg)._</expression>
</RegExp>
<expression>(pg)</expression>
</RegExp>
<RegExp input="$$9" output="<mpaa>PG-13</mpaa>" dest="5+">
<RegExp input="$$1" output="\1" dest="9">
<expression><b>Rating: </b>[^_]*/(pg-13)._</expression>
</RegExp>
<expression>(pg-13)</expression>
</RegExp>
<RegExp input="$$9" output="<mpaa>R</mpaa>" dest="5+">
<RegExp input="$$1" output="\1" dest="9">
<expression><b>Rating: </b>[^_]*/(r)._</expression>
</RegExp>
<expression>(r)</expression>
</RegExp>
<RegExp input="$$9" output="<mpaa>NC-17</mpaa>" dest="5+">
<RegExp input="$$1" output="\1" dest="9">
<expression><b>Rating: </b>[^_]*/(nc-17)._</expression>
</RegExp>
<expression>(nc-17)</expression>
</RegExp>
<RegExp input="$$9" output="<mpaa>UNRATED</mpaa>" dest="5+">
<RegExp input="$$1" output="\1" dest="9">
<expression><b>Rating: </b>[^_]*/(unrated)._</expression>
</RegExp>
<expression>(unrated)</expression>
</RegExp>
<RegExp input="$$1" output="<certification>\1</certification>" dest="5+">
<expression repeat="yes">Classification:</b>[^>]*alt="([0-9]*)"</expression>
</RegExp>
<RegExp input="$$1" output="<tagline>\1</tagline>" dest="5+">
<expression><h5>Tagline:</h5>([^<]*)</expression>
</RegExp>
<RegExp input="$$1" output="<runtime>\1</runtime>" dest="5+">
<expression trim="1">Run Time:</b>[^0-9]*([^<]*)</li></expression>
</RegExp>
<RegExp input="$$1" output="<rating>\1.\2</rating><votes>\3</votes>" dest="5+">
<expression noclean="1">Average Customer Review</b>[^_]*stars-([0-9])-([0-9])[^)]*>([0-9]*) customer reviews</a>\)</expression>
</RegExp>
<RegExp input="$$1" output="<genre>\1</genre>" dest="5+">
<expression repeat="yes">"/Sections/Genres/[^/]*/">([^<]*)</a></expression>
</RegExp>
<RegExp input="$$1" output="<studio>\1</studio>" dest="5+">
<expression>Studio:</b> ([^<]*)</li></expression>
</RegExp>
<RegExp input="$$1" output="<outline>\2</outline><plot>\2</plot>" dest="5+">
<expression trim="1">Plot (Outline|Summary):</h5>([^<]*)</expression>
</RegExp>
<RegExp input="$$1" output="<plot>\1</plot>" dest="5+">
<expression trim="1"><b>Product Description</b><br /[^>]*>([^<]+)</expression>
</RegExp>
<RegExp input="$$1" output="<thumb>\101.L.jpg</thumb>" dest="5+">
<expression noclean="1">"original_image", "([^"]*)AA2[0-9]0_\.jpg"</expression>
</RegExp>
<RegExp input="$$9" output="<credits>\1</credits>" dest="5+">
<RegExp input="$$1" output="\1" dest="9">
<expression noclean="1"><b>Writers:</b> ([^\n]*</a>)</expression>
</RegExp>
<expression noclean="1" repeat="yes">[^>]*>([^<]+)</a></expression>
</RegExp>
<RegExp input="$$9" output="<director>\1</director>" dest="5+">
<RegExp input="$$1" output="\1" dest="9">
<expression noclean="1"><b>Directors:</b> ([^\n]*</a>)</expression>
</RegExp>
<expression noclean="1" repeat="yes">[^>]*>([^<]+)</a></expression>
</RegExp>
<RegExp input="$$9" output="<actor><name>\1</name></actor>" dest="5+">
<RegExp input="$$1" output="\1" dest="9">
<expression noclean="1"><b>Actors:</b> ([^\n]*</a>)</expression>
</RegExp>
<expression noclean="1" repeat="yes">[^>]*>([^<]+)</a></expression>
</RegExp>
<expression noclean="1"/>
</RegExp>
</GetDetails>
</scraper>
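In case it helps anyone reproduce this outside XBMC, here is a rough Python sketch of what I think GetSearchResults is doing. The expression is copied from the amazonus.xml file above; the two HTML snippets and the `build_results` helper are made up for illustration and are not real Amazon markup. My guess (only a guess) is that the search page no longer contains the `productTitle` markup, so the expression matches nothing and the wrapper comes back empty, which is what the log shows.

```python
import re

# Expression copied from the GetSearchResults block of amazonus.xml.
SEARCH_EXPR = re.compile(r'productTitle"><a href="([^"]*)">([^<]*)</a>')

XML_HEADER = '<?xml version="1.0" encoding="iso-8859-1" standalone="yes"?>'


def build_results(html):
    """Hypothetical helper: mimic the scraper's output for one search page."""
    entities = "".join(
        "<entity><title>%s</title><url>%s</url></entity>" % (title, url)
        for url, title in SEARCH_EXPR.findall(html)
    )
    return XML_HEADER + "<results>%s</results>" % entities


# Made-up markup in the shape the expression expects (assumption).
old_style = (
    '<div class="productTitle"><a href="http://www.amazon.com/dp/EXAMPLE">'
    "Bear in the Big Blue House</a></div>"
)

# Made-up markup with a different class name (assumption), to show a miss.
new_style = (
    '<div class="title"><a class="title" href="http://www.amazon.com/dp/EXAMPLE">'
    "Bear in the Big Blue House</a></div>"
)

print(build_results(old_style))  # one <entity> inside <results>
print(build_results(new_style))  # empty <results></results>, as in my log
```

If the second case is what is happening, the fix would presumably be a new expression in GetSearchResults that matches whatever the current search page uses, but I have not been able to work that out.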