MovieMeter.nl (Dutch Movies) Scraper development...

Trazer
Junior Member
Posts: 10
Joined: Feb 2008
Reputation: 0
Location: Netherlands
Post: #21
I finally had some time to fiddle with the scraper. It seems moviemeter.nl changed something on the main page, so I had to change the regexp that retrieves the hash code. At least that part is working again.
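The updated regexp itself was not posted. As a rough sketch, assuming the hash still appears inside the `calls/search.php?hash=...` URL of the quicksearch JavaScript (as in the commented-out snippet in fjskmdl's PHP script in this thread), extracting it in PHP could look like this. `extract_search_hash` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: pull the 32-character hex search hash out of the
// MovieMeter front-page HTML. Assumes the hash is still embedded in a
// "calls/search.php?hash=..." URL; returns null if the markup changed again.
function extract_search_hash(string $html): ?string
{
    if (preg_match('~calls/search\.php\?hash=([0-9a-f]{32})~i', $html, $m) === 1) {
        return $m[1];
    }
    return null;
}

$sample = "new Searcher.Ajax.Json('quicksearch', "
        . "'http://www.moviemeter.nl/calls/search.php?hash=29918b11647fdd3755d59e6ac45d4977&qs=1', {";
echo extract_search_hash($sample), "\n"; // 29918b11647fdd3755d59e6ac45d4977
```

Returning null instead of an empty string makes it easy to detect when the site layout changes again.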
I have a question about chaining in the GetSearchResults function. Does scrap.exe support this, or can I only test it using XBMC? My guess is the latter. Am I right?
spiff
Grumpy Bastard Developer
Posts: 12,186
Joined: Nov 2003
Reputation: 82
Post: #22
Yeah, scrap is totally deprecated, as we lost the source code :/

Always read the XBMC online-manual, FAQ and search the forum before posting.
Do not e-mail XBMC-Team members directly asking for support. Read/follow the forum rules.
For troubleshooting and bug reporting please make sure you read this first.
Trazer
Junior Member
Posts: 10
Joined: Feb 2008
Reputation: 0
Location: Netherlands
Post: #23
Wow, that was a fast reply. Thanks, much appreciated.

Time for me to set up XBMC for Windows so I can test the scraper properly and expand its functionality.
athloni
Fan
Posts: 312
Joined: Dec 2007
Reputation: 0
Post: #24
Where can I find the MovieMeter scraper?
Bigfoot87
Senior Member
Posts: 134
Joined: Dec 2006
Reputation: 0
Location: Netherlands
Post: #25
It's still under construction. ;)
fjskmdl
Junior Member
Posts: 7
Joined: Jan 2009
Reputation: 0
Post: #26
Hi, I made a PHP script that someone can hopefully translate into a working scraper file.



If you need more info, please reply.

PHP Code:
<?php
/*
if ($('quicksearch')) {
               new Searcher.Ajax.Json('quicksearch', 'http://www.moviemeter.nl/calls/search.php?hash=29918b11647fdd3755d59e6ac45d4977&qs=1', {
                       'postVar': 'search',
                       'quicksearch': true,
                       'maxChoices': 12,
                       'overflow':true,
                       'basic':true
                   });
}
*/

$term = 'jurassic park';
// urlencode() the term: the space in 'jurassic park' is not valid in a URL
$url = 'http://www.moviemeter.nl/calls/search.php?hash=29918b11647fdd3755d59e6ac45d4977&qs=1&search='.urlencode($term);

$str = file_get_contents($url);

//json response example for search "jurassic park"
//$str = '["header_films_0_3",{"i":"365","ty":"f","t":"Jurassic Park","a":"","y":"1993","img":"%3Cimg src%3D%22http%3A%2F%2Fwww.moviemeter.nl%2Fimages%2Fcovers%2Fthumbs%2F0%2F365.jpg%22 class%3D%22thumbnail%22 alt%3D%22Jurassic Park %281993%29%22 %2F%3E","px":75,"h":"%3Cp class%3D%22subtext%22%3EAvontuur %2F Science-Fiction%2C 127 minuten%3Cbr %2F%3Egeregisseerd door Steven Spielberg%3Cbr %2F%3Emet Sam Neill%2C Jeff Goldblum en Laura Dern%3Cbr %2F%3E%3C%2Fp%3E"},{"i":"341","ty":"f","t":"Jurassic Park III","a":"Jurassic Park 3","y":"2001","img":"%3Cimg src%3D%22http%3A%2F%2Fwww.moviemeter.nl%2Fimages%2Fcovers%2Fthumbs%2F0%2F341.jpg%22 class%3D%22thumbnail%22 alt%3D%22Jurassic Park III %282001%29%22 %2F%3E","px":75,"h":"%3Cp class%3D%22subtext%22%3EScience-Fiction %2F Actie%2C 92 minuten%3Cbr %2F%3Egeregisseerd door Joe Johnston%3Cbr %2F%3Emet Sam Neill%2C William H. Macy en T%E9a Leoni%3Cbr %2F%3E%3C%2Fp%3E"},{"i":"364","ty":"f","t":"Lost World%3A Jurassic Park%2C The","a":"Jurassic Park 2","y":"1997","img":"%3Cimg src%3D%22http%3A%2F%2Fwww.moviemeter.nl%2Fimages%2Fcovers%2Fthumbs%2F0%2F364.jpg%22 class%3D%22thumbnail%22 alt%3D%22Lost World%3A Jurassic Park%2C The %281997%29%22 %2F%3E","px":75,"h":"%3Cp class%3D%22subtext%22%3EScience-Fiction %2F Avontuur%2C 129 minuten%3Cbr %2F%3Egeregisseerd door Steven Spielberg%3Cbr %2F%3Emet Jeff Goldblum%2C Julianne Moore en Vince Vaughn%3Cbr %2F%3E%3C%2Fp%3E"},"header_directors_0_0","header_topics_0_2",{"i":"1424","t":"Jurassic Park 4 %28Film %3E Nieuws%29","ty":"t"},{"i":"5679","t":"Favoriete dino uit de Jurassic Park reeks %28Film %3E Toplijsten en favorieten%29","ty":"t"},"header_users_0_2",{"i":"37185","t":"JurassicPark","ty":"u","img":"%3Cimg src%3D%22http%3A%2F%2Fwww.moviemeter.nl%2Fimages%2Fuser_unknown.jpg%22 class%3D%22avatar%22 %2F%3E","px":54,"h":"%3Cp class%3D%22subtext%22%3Eingeschreven sinds 15 augustus 2006%3Cbr %2F%3E632 stemmen%2C 509 berichten%3C%2Fp%3E"},{"i":"1720","t":"Jurassic Smurf","ty":"u"}]';

//echo $str;
//echo '<hr />';
//i = id, t = movie title, y = movie year
preg_match_all('|"i":"(.*)".*"t":"(.*)".*"y":"(.*)".*|iUm', $str, $ids);

$detail_urls = array();
if (!empty($ids[1])) {
    foreach ($ids[1] as $id) {
        array_push($detail_urls, 'http://www.moviemeter.nl/film/'.$id);
    }
}

//echo 'matches<pre>';
//print_r($detail_urls);
//echo '</pre>';

//parse a detail url
$contents = file_get_contents('http://www.moviemeter.nl/film/364');
// remove all line breaks so the pattern below can match across the whole block
$contents = str_replace(array("\r\n", "\r", "\n"), '', $contents);

preg_match_all('|.*<div id="film_info">(.*)<br />(.*)<br />(.*)<br />(.*)<br />(.*)<br />(.*)<br />(.*)<br />(.*)<br />.*</div>.*|iUm', $contents, $movie_info);

echo 'country:'.$movie_info[1][0].'<br />';
echo 'genre(s):'.$movie_info[2][0].'<br />';
echo 'movie length:'.$movie_info[3][0].'<br />';
echo 'director:'.$movie_info[5][0].'<br />';
echo 'actors:'.$movie_info[6][0].'<br />';
echo 'movie info:'.$movie_info[8][0].'<br />';
?>
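Since the search call answers with JSON (see the commented-out sample response in the script), decoding it with json_decode() and unescaping the percent-encoded strings with rawurldecode() may be sturdier than matching fields with a regex. This is a sketch assuming the response keeps the shape of that sample (a flat array mixing "header_..." strings with objects whose "ty" is "f" for films); `parse_film_results` is a hypothetical name.

```php
<?php
// Sketch: turn the quicksearch JSON into a list of films.
function parse_film_results(string $json): array
{
    $items = json_decode($json, true);
    if (!is_array($items)) {
        return []; // not valid JSON
    }
    $films = [];
    foreach ($items as $item) {
        // skip section headers ("header_films_0_3") and non-film hits (topics "t", users "u")
        if (!is_array($item) || ($item['ty'] ?? '') !== 'f') {
            continue;
        }
        $films[] = [
            'id'    => $item['i'],
            'title' => rawurldecode($item['t']), // titles are percent-encoded
            'year'  => $item['y'] ?? '',
            'url'   => 'http://www.moviemeter.nl/film/' . $item['i'],
        ];
    }
    return $films;
}

$sample = '["header_films_0_3",{"i":"365","ty":"f","t":"Jurassic Park","a":"","y":"1993"},'
        . '{"i":"364","ty":"f","t":"Lost World%3A Jurassic Park%2C The","a":"Jurassic Park 2","y":"1997"},'
        . '"header_topics_0_2",{"i":"1424","t":"Jurassic Park 4 %28Film %3E Nieuws%29","ty":"t"}]';

foreach (parse_film_results($sample) as $film) {
    echo $film['title'], ' (', $film['year'], ') ', $film['url'], "\n";
}
```

This prints two lines, the second being "Lost World: Jurassic Park, The (1997) http://www.moviemeter.nl/film/364"; the topic hit is skipped. Escapes such as %E9 decode to single ISO-8859-1 bytes, which fits the iso-8859-1 encoding declared in the scraper's results header.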
fjskmdl
Junior Member
Posts: 7
Joined: Jan 2009
Reputation: 0
Post: #27
I currently have something working that gets the movie description from MovieMeter, but I am having a problem with the following expression:


<expression>&lt;div id=&quot;film_info&quot;&gt;(.*)[^&lt;div]</expression>

source string = 'fdqskfdq<div id="film_info">MOVIE CONTENT THAT I NEED<div>fdqsfqds</div></div>fsdjlk';

I am trying to get all the data between <div id="film_info"> and its closing </div>, but I get a lot more (including the comments); see for example http://www.moviemeter.nl/film/365

Does anyone know the right regex for this?
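The catch is that `[^<div]` is a character class: it means "one character that is not <, d, i or v", not "anything except the text <div". Together with the greedy `(.*)`, the group runs on almost to the end of the input. The PHP sketch below runs both patterns on the test string from the question (PCRE is used here for illustration; the XBMC regex engine may differ in details).

```php
<?php
$source = 'fdqskfdq<div id="film_info">MOVIE CONTENT THAT I NEED<div>fdqsfqds</div></div>fsdjlk';

// Original attempt: (.*) grabs everything, then backtracks a single
// character so that [^<div] can match the final 'k'.
preg_match('~<div id="film_info">(.*)[^<div]~', $source, $greedy);

// Fix: only allow non-'<' characters, so the group stops at the first tag.
preg_match('~<div id="film_info">([^<]*)~', $source, $fixed);

echo $greedy[1], "\n"; // MOVIE CONTENT THAT I NEED<div>fdqsfqds</div></div>fsdjl
echo $fixed[1], "\n";  // MOVIE CONTENT THAT I NEED
```

In the scraper XML, the same fix is written with escaped entities, `&lt;div id=&quot;film_info&quot;&gt;([^&lt;]*)`, the style already used in the genre expression below. Note that `[^<]*` also stops at a `<br />` inside the block, so each line between breaks needs its own group.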
fjskmdl
Junior Member
Posts: 7
Joined: Jan 2009
Reputation: 0
Post: #28
I have some working code for MovieMeter:

Code:
<scraper name="Moviemeter" content="movies" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<!-- By fjskmdl 2 jan 2009 -->
    <CreateSearchUrl dest="3">
        <RegExp input="$$1" output="http://www.moviemeter.nl/calls/search.php?hash=3d669ba0d93914426945f6985e135be6&amp;qs=1&amp;search=\1" dest="3">
            <expression noclean="1"/>
        </RegExp>
    </CreateSearchUrl>
    <GetSearchResults dest="8">
        <RegExp input="$$5" output="&lt;?xml version=&quot;1.0&quot; encoding=&quot;iso-8859-1&quot; standalone=&quot;yes&quot;?&gt;&lt;results&gt;\1&lt;/results&gt;" dest="8">
            <RegExp input="$$1" output="&lt;entity&gt;&lt;title&gt;\3&lt;/title&gt;&lt;url&gt;http://www.moviemeter.nl/film/\2&lt;/url&gt;&lt;/entity&gt;" dest="5">
                <expression repeat="yes">({&quot;i&quot;:&quot;([0-9]+)&quot;,&quot;ty&quot;:&quot;[a-z]*&quot;,&quot;t&quot;:&quot;(.[^&quot;]*).[^}])</expression>
            </RegExp>
            <expression noclean="1"/>
        </RegExp>
    </GetSearchResults>
  <GetDetails dest="3">
    <RegExp input="$$8" output="&lt;details&gt;\1&lt;/details&gt;" dest="3">
        <!-- title,year -->
        <RegExp input="$$1" output="&lt;title&gt;\1&lt;/title&gt;&lt;year&gt;\2&lt;/year&gt;" dest="8">
            <expression trim="1" noclean="1">&lt;h1&gt;([^\(]*)\(([^\(]*)</expression>
        </RegExp>
        <!--Director-->
        <RegExp input="$$1" output="&lt;director&gt;\2&lt;/director&gt;" dest="8+">
            <expression repeat="yes">geregisseerd door ([^&gt;]*)&gt;([^&lt;]*)</expression>
        </RegExp>
        <!--Actors -->
        <RegExp input="$$1" output="&lt;actor&gt;&lt;name&gt;\1&lt;/name&gt;&lt;role&gt;&lt;/role&gt;&lt;/actor&gt;" dest="8+">
            <expression>met ([^&lt;]*)</expression>
        </RegExp>

        <!-- Runtime !-->
        <RegExp input="$$1" output="&lt;runtime&gt;\1 minuten&lt;/runtime&gt;" dest="8+">
            <expression repeat="yes">([0-9]+) minuten</expression>
        </RegExp>
        <!-- Thumbnail !-->
        <RegExp input="$$1" output="&lt;thumb&gt;&lt;url spoof=&quot;http://www.moviemeter.nl&quot;&gt;http://www.moviemeter.nl/images/covers/\1/\2.jpg&lt;/url&gt;&lt;/thumb&gt;" dest="8+">
            <expression>http://www.moviemeter.nl/images/covers/([0-9]+)/([0-9]+)\.jpg</expression>
        </RegExp>

        <!--rating -->
        <RegExp input="$$1" output="&lt;rating&gt;\1&lt;/rating&gt;" dest="8+">
            <expression>gemiddelde &lt;b&gt;([0-9,]+)([^&lt;]*)&lt;/b&gt;</expression>
        </RegExp>

        <!-- nr votes -->
        <RegExp input="$$1" output="&lt;votes&gt;\1&lt;/votes&gt;" dest="8+">
            <expression>&lt;b&gt;([0-9]+)&lt;/b&gt; stemmen</expression>
        </RegExp>

        <!-- genre -->
        <RegExp input="$$1" output="&lt;genre&gt;\2&lt;/genre&gt;" dest="8+">
            <expression>film_info&quot;&gt;([^&lt;]*)&lt;br /&gt;([^&lt;]*)</expression>
        </RegExp>
        <!-- Plot -->
        <RegExp input="$$1" output="&lt;plot&gt;\7&lt;/plot&gt;" dest="8+">
            <expression repeat="yes">&lt;div id=&quot;film_info&quot;&gt;([^&lt;]*)&lt;br /&gt;([^&lt;]*)&lt;br /&gt;([^&lt;]*)&lt;br /&gt;&lt;br /&gt;geregisseerd door &lt;a href=&quot;http://www\.moviemeter\.nl/director/([0-9]+)&quot;([^&lt;]*)&lt;/a&gt;&lt;br /&gt;([^&lt;]*)&lt;br /&gt;&lt;br /&gt;([^&lt;]*)</expression>
        </RegExp>
        <expression noclean="1"/>
        </RegExp>
    </GetDetails>
</scraper>

Bugs: when the rating is 2,98, it shows 2.00.
Cast: all persons are shown on one line.
spiff
Grumpy Bastard Developer
Posts: 12,186
Joined: Nov 2003
Reputation: 82
Post: #29
Yay! I see you figured out the scraper syntax :)

The problem with numbers that use a decimal comma is that they are simply not valid floating-point numbers (they are parsed as %f in an sscanf-like function, if that tells you anything). You need to translate them to use a dot.
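PHP's own sscanf() shows the same truncation, so it makes a convenient illustration (XBMC's parser is a different implementation; only the %f behaviour is comparable). The HTML fragment below is modelled on the `gemiddelde <b>...</b>` pattern in the scraper's rating expression and may not match the live page exactly. The second fix mirrors what a scraper expression with two groups and an output of `\1.\2` would do.

```php
<?php
// A rating such as "2,98" is not a valid float: %f stops at the comma,
// so only the integer part is read.
$parsed = sscanf('2,98', '%f');
echo $parsed[0], "\n"; // 2

// Fix 1: translate the decimal comma to a dot before parsing.
$fixed = sscanf(str_replace(',', '.', '2,98'), '%f');
echo $fixed[0], "\n"; // 2.98

// Fix 2, scraper style: capture both halves separately and rejoin them
// with a dot, the way an output of "\1.\2" would.
$html = 'gemiddelde <b>2,98</b>';
preg_match('~gemiddelde <b>([0-9]+),([0-9]+)~', $html, $m);
$rating = $m[1] . '.' . $m[2];
echo $rating, "\n"; // 2.98
```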

The cast problem is just the expression: there is no repeat on it (but I assume you knew that).
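In the scraper, getting one actor per entry means an expression with repeat="yes" that matches a single name on each pass. The PHP sketch below only illustrates the splitting logic. It assumes names are separated by ", " with " en " (Dutch for "and") before the last one, as in "met Sam Neill, Jeff Goldblum en Laura Dern" from the search results. `cast_to_actor_xml` is a hypothetical helper name.

```php
<?php
// Sketch: turn the text after "met " into one <actor> entry per person.
function cast_to_actor_xml(string $castLine): string
{
    $xml = '';
    foreach (preg_split('/, | en /', trim($castLine)) as $name) {
        if ($name === '') {
            continue; // nothing left after splitting
        }
        $xml .= '<actor><name>' . htmlspecialchars($name) . '</name><role></role></actor>';
    }
    return $xml;
}

echo cast_to_actor_xml('Sam Neill, Jeff Goldblum en Laura Dern'), "\n";
```

This yields three separate actor entries instead of one. A name that itself contains a comma or " en " would be split incorrectly.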

Jordy
Junior Member
Posts: 1
Joined: Jan 2009
Reputation: 0
Post: #30
Hi,

This scraper is going to stop working soon, because I'm going to make some changes to the site's HTML. However, I'm creating an XML-RPC API (web service) for accessing MovieMeter.nl film information. Would it be possible for you to change your scripts to use this API instead of scraping the HTML? If anyone wants to test this API, please contact me at info@moviemeter.nl.
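For reference, an XML-RPC call is an HTTP POST whose body is a `<methodCall>` document. A minimal sketch of building such a body in PHP follows. The endpoint and method names of the MovieMeter API are not given here, so `film.search` is a made-up placeholder, and `xmlrpc_method_call` is a hypothetical helper that only handles string parameters.

```php
<?php
// Sketch: build an XML-RPC <methodCall> body with string parameters.
// "film.search" used below is a placeholder, not a real MovieMeter method.
function xmlrpc_method_call(string $method, array $stringParams): string
{
    $xml = '<?xml version="1.0"?>' . "\n"
         . '<methodCall><methodName>' . htmlspecialchars($method, ENT_XML1 | ENT_QUOTES)
         . '</methodName><params>';
    foreach ($stringParams as $param) {
        // each parameter is sent as an XML-RPC <string> value, escaped for XML
        $xml .= '<param><value><string>' . htmlspecialchars($param, ENT_XML1 | ENT_QUOTES)
              . '</string></value></param>';
    }
    return $xml . '</params></methodCall>';
}

echo xmlrpc_method_call('film.search', ['jurassic park']), "\n";
```

The resulting string would be sent with `Content-Type: text/xml`, and the server answers with a `<methodResponse>` document.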