Lovefilm.se (Swedish) scraper - search uses javascript, can that be bypassed?

XBMC Community Forum (http://forum.xbmc.org)
Forum: Development (/forumdisplay.php?fid=32)
Forum: Scraper Development (/forumdisplay.php?fid=60)
Thread: /showthread.php?tid=67920

- filigran - 2010-01-20 04:10

Hi!

I've been using imdb and thetvdb.com to scrape my movies/tv shows, but being Swedish and all, I thought it would be nice to have a Swedish scraper, and tried to create one for http://www.lovefilm.se .

I found the dummies howto, and I've figured out the basics, but I've stumbled upon a problem: their search engine uses javascript (at least that's what I've come to think) and when parsing the results using scrape.exe I get nothing. I tried to just wget the page, and saw that there are indeed no results.

Can I, somehow, get the search results to be parsed even though they use JS (or whatever)?

I found this post: http://forum.xbmc.org/showpost.php?p=262584&postcount=10 which says:

Quote: I notice that they are currently using a javascript search system which prevents you from simply parsing the HTML sent back after a search request (bastards *shakes fist*) which makes it only slightly harder to scrape.

I read that like it's doable? Too bad he doesn't say how.

- spiff - 2010-01-20 09:16

searching using google is the only workaround i'm aware of.

btw, scrape.exe is utterly outdated, check the scraper editor.

- The_Ghost16 - 2010-01-20 11:06

With the following url you can find a movie: http://www.lovefilm.se/movieSearch.do?query=

Just paste the movie title after query= and you will find the movies. After that is done you can open the movie and you can scrape the result.

This doesn't look that hard.

- filigran - 2010-01-20 12:43

spiff Wrote: searching using google is the only workaround i'm aware of. btw, scrape.exe is utterly outdated, check the scraper editor.

Yeah, I saw that on the page. The scraper editor, would that be http://forum.xbmc.org/showthread.php?tid=52929 ? I tried that one using wine, but I needed to install mono through wine, and scrape.exe seemed to work properly for just testing. Guess I'll have to fix the Mono stuff for wine to be sure.
I'll try using google search. Thanks.

The_Ghost16 Wrote: With the following url you can find a movie:

Yeah, I know. But doing a search for "batman", i.e. http://www.lovefilm.se/movieSearch.do?query=batman gives me a few results. If I select some results, and check the DOM with firefox, I see them:

PHP Code:
<div id="resultAllMovie">

that's the first result. But checking the source, I get this:

PHP Code:
<div id="resultAllMovie"></div>

It's the same if I just wget the page. Am I missing something?

If I search for something that only yields one matching result, like "band of brothers", I get to that movie page directly, and there I can scrape the details. But I need to find the search results too.

Thanks for your replies!

- filigran - 2010-02-02 23:06

Sorry to bring up this forgotten thread again. I gave up on this since I couldn't find a way to work it out, but I just have to ask:

spiff Wrote: searching using google is the only workaround i'm aware of. btw, scrape.exe is utterly outdated, check the scraper editor.

When you say "searching using google", what exactly do you mean? I thought I knew how to google, but I must be missing something obvious.

EDIT: I assume you mean "site:lovefilm.se/film <keyword>"? I guess that's as close as I can come? Or did you have something else in mind?

Could a javascript capable scraper be something for the future? Might be a security risk I suppose ... or is it just not possible?

- vdrfan - 2010-02-03 01:15

Why not just use http://www.lovefilm.se/movieSearch.do?query=<keyword> ?

- filigran - 2010-02-03 15:21

vdrfan Wrote: Why not just use http://www.lovefilm.se/movieSearch.do?query=<keyword> ?

Like I said earlier:

filigran Wrote: Yeah, I know. But doing a search for "batman", i.e. http://www.lovefilm.se/movieSearch.do?query=batman gives me a few results. If I select some results, and check the DOM with firefox, I see them:

Am I just being totally fucking dumb here?

- spiff - 2010-02-03 15:30

nope.
what i mean by search using google is something ala http://www.google.com/search?hl=en&site=&q=batman+site%3Alovefilm.se&btnG=Search

- filigran - 2010-02-05 02:12

spiff Wrote: nope. what i mean by search using google is something ala

Yeah, that's what I meant, just didn't include the url but the search string.

I got a bit further now, and I have a scraper that works inside the editor, but not inside XBMC.

If I use the editor and test the scraper, it asks me for a search string, gets a url, gives me a list of search results, and then fetches info for the one I choose. All is well.

But inside XBMC I get no results when scanning, and no results when adding manually (hitting 'I' on a movie). Using other scrapers works.

The XBMC log says this:

Code:
23:33:01 T:3860 M:450981888 DEBUG: SDLKeyboard: scancode: 23, sym: 105, unicode: 105, modifier: 0

The results, according to the editor, are:

Code:
<results>
  <entity><url>http://www.lovefilm.se/film/48044-The+Dark+Knight.do</url><title>The Dark Knight DVD</title></entity>
  <entity><url>http://www.lovefilm.se/film/52631-The+Dark+Knight+(Blu-ray)+-+Extramaterial.do</url><title>The Dark Knight (Blu-ray) - Extramaterial</title></entity>
  <entity><url>http://www.lovefilm.se/film/51628-The+Dark+Knight+(Blu-ray).do;jsessionid=DDC3B8E739F803541C84096C18C90991</url><title>The Dark Knight (Blu-ray)</title></entity>
</results>

My XML code:

PHP Code:
<?xml version="1.0" encoding="utf-8"?>

My regexes probably suck, but they yield some results in the editor at least. Is there anything missing, some required field? NfoUrl and stuff, do they have to be there?

Thanks for your help so far!
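
Editor's sketch: the Google workaround spiff describes comes down to building a search URL whose query is the movie title followed by `site:lovefilm.se`. The Python below only illustrates that URL construction. The helper name `google_site_search_url` is hypothetical, the `site=` and `btnG=` parameters from spiff's link are left out on the assumption that they are optional, and an actual XBMC scraper would express the same thing in its own XML rather than in Python.

```python
from urllib.parse import urlencode


def google_site_search_url(title, site="lovefilm.se"):
    """Build a Google search URL restricted to a single site.

    Hypothetical helper for illustration only; it is not part of XBMC.
    """
    # urlencode turns spaces into '+' and ':' into '%3A',
    # which matches the shape of the URL posted in the thread.
    params = {"hl": "en", "q": f"{title} site:{site}"}
    return "http://www.google.com/search?" + urlencode(params)


print(google_site_search_url("batman"))
```

The result pages Google returns are plain HTML, so the links to `lovefilm.se/film/...` can be picked out with an ordinary regular expression, unlike the javascript-populated `resultAllMovie` div on the site's own search page.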

- spiff - 2010-02-05 11:18

no reason to escape the /'es.
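
Editor's sketch: spiff's remark about slashes can be checked with a small experiment. A forward slash is not a regex metacharacter, so a pattern matches the same text whether or not the slash is escaped. Python's `re` module is used here purely for illustration, on the assumption that XBMC's regex engine treats `/` the same way. The two patterns and the helper `film_id` are made up for this example and are not filigran's actual expressions, which are not shown in the thread.

```python
import re

# One of the result URLs posted earlier in the thread.
URL = "http://www.lovefilm.se/film/48044-The+Dark+Knight.do"

# Same pattern written twice: with escaped and with plain slashes.
ESCAPED = r"lovefilm\.se\/film\/(\d+)-"
UNESCAPED = r"lovefilm\.se/film/(\d+)-"


def film_id(url, pattern):
    """Return the numeric film id captured by pattern, or None if no match."""
    match = re.search(pattern, url)
    return match.group(1) if match else None


# Both patterns capture the same id, so the backslashes add nothing.
print(film_id(URL, ESCAPED), film_id(URL, UNESCAPED))
```

The dot in `lovefilm\.se` is different: an unescaped `.` matches any character, so that escape is worth keeping.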