2009-06-21, 16:12
I'm no scraper developer or regex shark by far, so I need a little help.
From the TMDB scraper:
Regex:
<movie>.*?<title>([^<]*)</title>.*?<id>([^<]*)</id>.*?</movie>
XML:
<results for="terminator" xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">
<moviematches>
<movie>
<score>1.0</score>
<popularity>45</popularity>
<title>Terminator 2: Judgment Day</title>
<alternative_title>Terminator 2</alternative_title>
<type>movie</type>
<id>280</id>
<imdb>tt0103064</imdb>
<url>http://www.themoviedb.org/movie/280</url>
<short_overview>It has been ten long years since a Terminator failed to kill Sarah Connor and her unborn son, John. Now, Skynet has sent back another Terminator. This one being more advanced than the last one. John Connor, who is now ten years old, is the target. The future John sends back a replica of the Terminator that tried to kill him back in time to 1995. It's Terminator vs. Terminator.</short_overview>
<release>1991-07-03</release>
</movie>
</moviematches>
</results>
Here is my XML:
<response>
<titles>
<title id="1c5f16f0-d5f3-4107-bcb6-007681e76be1" type="DVD" country="United Kingdom" barcode="5014138026448" title="Rosemary Conley - Shape Up And Salsacise" edition="" year="2005" thumbnail="http://fs.luckydata.com/Covers/2e92c6ce-378c-4535-a938-2c20d0b2b517.jpg" thumbnailwidth="97" thumbnailheight="140" completepercentage="100" />
</titles>
</response>
Now I need the "id" and the "title" for my search result, just like TMDB.
Here's what I have so far (non-working):
<title id="([^<]*)" type=.*?title="([^<]*)" edition=
Can anyone help me out here?
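For comparison, here is a minimal sketch of one way to pull both attributes out of that response, tested in plain Python rather than in the scraper itself. It assumes attribute values never contain a double quote and that `id` is always the first attribute of `<title>`. The names `SAMPLE`, `PATTERN` and `matches` are made up for the illustration. The main change is using `[^"]*` for the attribute values instead of `[^<]*`, so each capture stops at its own closing quote.

```python
import re

# Sample response copied from the post above.
SAMPLE = '''<response>
<titles>
<title id="1c5f16f0-d5f3-4107-bcb6-007681e76be1" type="DVD" country="United Kingdom" barcode="5014138026448" title="Rosemary Conley - Shape Up And Salsacise" edition="" year="2005" thumbnail="http://fs.luckydata.com/Covers/2e92c6ce-378c-4535-a938-2c20d0b2b517.jpg" thumbnailwidth="97" thumbnailheight="140" completepercentage="100" />
</titles>
</response>'''

# Group 1: the id attribute. Group 2: the title attribute.
# [^"]* keeps each capture inside its own pair of quotes.
# [^>]*? skips the attributes in between without leaving the tag.
# The space before title= avoids matching an attribute that merely ends in "title".
PATTERN = re.compile(r'<title id="([^"]*)"[^>]*? title="([^"]*)"')

matches = PATTERN.findall(SAMPLE)
for title_id, title in matches:
    print(title_id, "|", title)
```

If the pattern is pasted into an XML file, a literal `<` has to be written as `&lt;` there. Whether the scraper engine needs anything beyond that is not something this sketch checks.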