2009-08-25, 16:29
Hi there!
I'm working on a tool that combines XBMC scrapers for a search and then produces an HTML rendering of the result. The idea is to be able to scrape from multiple sources (for example, take the thumb from the IMDb scraper, the fanart from the TMDb scraper, and the plot from the Allociné scraper).
My tool is starting to do the job (it uses my DLL to scrape from the different scrapers) and produces an HTML result. What I want to do now is write an XBMC scraper that reads this HTML.
Here is a small sample.
With the request: http://localhost:52026/CMMServer/Default.aspx?s=Basic
I get the response:
Code:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>
CMM Response
</title></head>
<body>
<form name="form1" method="post" action="Default.aspx?s=Basic+(2003)" id="form1">
<div>
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDwULLTE4MTc3Njc3MzUPZBYCAgMPZBYCAgEPZBYcZg9kFgJmD2QWAmYPDxYCHghJbWFnZVVybAU2Li90ZW1wL2ZhbmFydF8yZWZmNjc1YS05NzRjLTQxZDgtOTlkNy1lYjM3MWY0MjE0ZmYuanBnZGQCAQ9kFgJmD2QWAmYPDxYCHwAFNS4vdGVtcC90aHVtYl8yZWZmNjc1YS05NzRjLTQxZDgtOTlkNy1lYjM3MWY0MjE0ZmYuanBnZGQCAg9kFgJmD2QWAmYPDxYCHgRUZXh0BdQBVW5lIG51aXQsIGxvcnMgZCd1biBleGVyY2ljZSBkJ2VudHJhw65uZW1lbnQsIHVuIG91cmFnYW4gZnJhcHBlIFBhbmFtYS4gU2l4IG1pbGl0YWlyZXMgZG9udCBsJ2F1dG9yaXRhaXJlIHNlcmdlbnQgV2VzdCBkaXNwYXJhaXNzZW50IGV0IGlsIG5lIHJlc3RlIHF1ZSBkZXV4IHTDqW1vaW5zIHBvdXIgcmFjb250ZXIgY2UgcXUnaWwgcydlc3QgcGFzc8OpLg3vgI3rqq3vgI1kZAIDD2QWAmYPZBYCZg8PFgIfAQUOSm9obiBNY1RpZXJuYW5kZAIED2QWAmYPZBYCZg8PFgIfAQURQWN0aW9uIC8gVGhyaWxsZXJkZAIFD2QWAmYPZBYCZg8PFgIfAQXUAVVuZSBudWl0LCBsb3JzIGQndW4gZXhlcmNpY2UgZCdlbnRyYcOubmVtZW50LCB1biBvdXJhZ2FuIGZyYXBwZSBQYW5hbWEuIFNpeCBtaWxpdGFpcmVzIGRvbnQgbCdhdXRvcml0YWlyZSBzZXJnZW50IFdlc3QgZGlzcGFyYWlzc2VudCBldCBpbCBuZSByZXN0ZSBxdWUgZGV1eCB0w6ltb2lucyBwb3VyIHJhY29udGVyIGNlIHF1J2lsIHMnZXN0IHBhc3PDqS4N74CN66qt74CNZGQCBg9kFgJmD2QWAmYPDxYCHwEFCDFoIDM4bWluZGQCBw9kFgJmD2QWAmYPDxYCHwFkZGQCCA9kFgJmD2QWAmYPDxYCHwEFcU9uIHNlbnQgcXVlIGxlcyBkZXV4IGFjdGV1cnMgcHJlbm5lbnQgdW4gcsOpZWwgcGxhaXNpciDDoCBqb3VlciBhdSBjaGF0IGV0IMOgIGxhIHNvdXJpcy4gRXQgbm91cyBhdXNzaSAh66qt74CN66qtZGQCCQ9kFgJmD2QWAmYPDxYCHwEFBUJhc2ljZGQCCg9kFgJmD2QWAmYPDxYCHwEFBjIzLDMyNWRkAgsPZBYCZg9kFgJmDw8WAh8BZGRkAgwPZBYCZg9kFgJmDw8WAh8BBQM2LDNkZAIND2QWAmYPZBYCZg8PFgIfAQUEMjAwM2RkZBzUu/5/2nYny+68afkNsdhQhAhN" />
</div>
<table id="TableResult" border="0" style="width:133px;">
<tr id="FanartRow">
<td id="FanartCell"><img id="FanartImage" src="./temp/fanart_2eff675a-974c-41d8-99d7-eb371f4214ff.jpg" style="border-width:0px;" /></td>
</tr><tr id="TableRow1">
<td id="TableCell1"><img id="ThumbImage" src="./temp/thumb_2eff675a-974c-41d8-99d7-eb371f4214ff.jpg" style="border-width:0px;" /></td>
</tr><tr id="TableRow2">
<td id="TableCell2"><span id="Plot">Une nuit, lors d'un exercice d'entraînement, un ouragan frappe Panama. Six militaires dont l'autoritaire sergent West disparaissent et il ne reste que deux témoins pour raconter ce qu'il s'est passé.</span></td>
</tr><tr id="TableRow3">
<td id="TableCell3"><span id="Director">John McTiernan</span></td>
</tr><tr id="TableRow4">
<td id="TableCell4"><span id="Genre">Action / Thriller</span></td>
</tr><tr id="TableRow5">
<td id="TableCell5"><span id="PlotOutline">Une nuit, lors d'un exercice d'entraînement, un ouragan frappe Panama. Six militaires dont l'autoritaire sergent West disparaissent et il ne reste que deux témoins pour raconter ce qu'il s'est passé.</span></td>
</tr><tr id="TableRow6">
<td id="TableCell6"><span id="Runtime">1h 38min</span></td>
</tr><tr id="TableRow7">
<td id="TableCell7"><span id="Studio"></span></td>
</tr><tr id="TableRow8">
<td id="TableCell8"><span id="Tagline">On sent que les deux acteurs prennent un réel plaisir à jouer au chat et à la souris. Et nous aussi !</span></td>
</tr><tr id="TableRow9">
<td id="TableCell9"><span id="Title">Basic</span></td>
</tr><tr id="TableRow10">
<td id="TableCell10"><span id="Votes">23,325</span></td>
</tr><tr id="TableRow11">
<td id="TableCell11"><span id="WritingCredit"></span></td>
</tr><tr id="TableRow12">
<td id="TableCell12"><span id="Rating">6,3</span></td>
</tr><tr id="TableRow13">
<td id="TableCell13"><span id="Year">2003</span></td>
</tr>
</table>
</form>
</body>
</html>
But I'm not skilled enough to write the scraper (I'm not very fluent with regexps). Can anyone help me?
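As a starting point, here is a minimal sketch in Python of the kind of regular expressions that could pull the fields out of the response above. It only assumes what the sample shows: every text value sits in a `<span id="...">` and every image in an `<img id="..." src="...">`. The names `SPAN_RE`, `IMG_RE` and `parse_response` are made up for this illustration, and the patterns would still have to be adapted to the scraper's own XML format, which is not attempted here.

```python
import re

# Hypothetical patterns, based only on the markup in the sample response:
# text fields are <span id="Name">value</span>, images are <img id="Name" src="...">.
SPAN_RE = re.compile(r'<span id="([^"]+)">(.*?)</span>', re.DOTALL)
IMG_RE = re.compile(r'<img id="([^"]+)" src="([^"]+)"')


def parse_response(html):
    """Return a dict mapping each span/img id to its text or src value."""
    fields = {name: value.strip() for name, value in SPAN_RE.findall(html)}
    fields.update({name: src for name, src in IMG_RE.findall(html)})
    return fields


# A few rows copied from the response above, used as test input.
sample = '''
<td id="FanartCell"><img id="FanartImage" src="./temp/fanart_2eff675a-974c-41d8-99d7-eb371f4214ff.jpg" style="border-width:0px;" /></td>
<td id="TableCell3"><span id="Director">John McTiernan</span></td>
<td id="TableCell7"><span id="Studio"></span></td>
<td id="TableCell9"><span id="Title">Basic</span></td>
<td id="TableCell13"><span id="Year">2003</span></td>
'''

fields = parse_response(sample)
print(fields["Title"], fields["Year"], fields["Director"])
```

Empty fields such as `Studio` come back as empty strings, so the caller can tell "present but blank" apart from "missing". The non-greedy `(.*?)` stops at the first `</span>`, which is enough here because the spans are never nested.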