German IMDB scraper, please test it and give feedback - Printable Version

XBMC Community Forum (http://forum.xbmc.org)
Thread: German IMDB scraper, please test it and give feedback (/showthread.php?tid=75121)

- Nicezia - 2010-06-06 09:11

olympia Wrote: Yes, it's only useful when you have an xbmc compliant external nfo to import from. Nevertheless you couldn't even scrape this info from anywhere

I don't mean to be correcting you here, but I have written a scraper for AEBN that imports the movie series name as the set, and XBMC parses it from the scraper just fine, so set can be used with scraping as well.

Any tag that can be exported from XBMC (do an export on a small library for an example) can be imported by XBMC from a scraper or an nfo. However, you are quite right that it's best to let XBMC or an external nfo manager handle fileinfo.

- Nicezia - 2010-06-06 09:24

Eisbahn Wrote: - importing up to 6 genres (9 easy possible)

If you used a separating nested RegExp, you could import any number of genres. For instance, say something is formatted like this:

Code:
<info-genre>Genres: Western, Comedy, Whatever, Else</info-genre>

First copy the part that you need to a buffer (8 in this case):

Code:
<RegExp input="$$1" output="\1" dest="8">

Then wrap that in a parent regular expression that repeatedly finds the comma-separated genres (taking its input from the string you copied to $$8), so that the end product is something like this:

Code:
<RegExp input="$$8" output="<genre>\1</genre>" dest="whatever buffer you're collecting details in">

- vdrfan - 2010-06-06 11:39

Thanks for the good information Nicezia!

Note, as of SVN revision r30825 the way we handle the runtime/duration has slightly changed. The following VideoInfoTags (streamdetails) are exported from the actual media file: codec, aspect, width, height and duration (new). To make sure the runtime is shown properly, make sure the scraper returns the runtime in minutes only (numeric). This value is used in case the meta extraction is disabled and/or we somehow fail to extract it.

- Nicezia - 2010-06-06 12:01

vdrfan Wrote: Thanks for the good information Nicezia!
Note, as of SVN revision r30825 the way we handle the runtime/duration has slightly changed.

Noted, and I will adjust the ScraperXML code accordingly.

@vdrfan, I also noticed there are a few other tags not mentioned anywhere else (country, sorttitle, epbookmark, originaltitle), and that premiered, though taken from the nfo/scraper, doesn't seem to be stored in the database at all (at least in the last version I'm basing off of, which is from before the add-on merge), so this info is lost when importing the file, if it's even provided. Are these extra tags deprecated tags that haven't been removed from the code, or added tags? (I'm only just now getting to a point where I can read C++ code as well as CSharp.) And is premiered getting lost from the database an oversight?

- Eisbahn - 2010-06-06 23:59

Hi,

sorry for the late reply, but we had a wonderful day, relaxing with my wife and kids. But I've done a little bit of work and have some new questions:

What is the content of the tag <outline>? More info than the tagline, but not as much as in plot or plotsummary? I couldn't find it... At the moment I put a "short" plot in this tag (the one given on the main overview page of IMDB).

What about <certification>? Is it deprecated and only MPAA is used instead? Because of different DVDs, I've got more than one mpaa tag, e.g. 12 years heavy cut, 16 years cut, 18 years uncut (it's not a single instance) for "The Rock" (IMDB-ID = tt0117500).

Is <originaltitle> a subset of <sorttitle>, e.g.

Code:
<originaltitle>

What about the function GetIMDBThumbs? Does it fetch all pics from IMDB, or only the posters (and maybe product)? What are the constants SX, SY, SX$INFO and SY$INFO (or what is this)? Why is the function not repeated (I think the user wants more than one thumbnail)? I don't know exactly what this function should do. Pointing to <http://www.imdb.de/title/tt0499549/mediaindex?refine=poster>? Any help?

How can I call a site without getting a "&" to "&amp;" cleaned?
Actually I used a function which removes the &amp; and puts an & into the links :=( The "no HTML clean" tag does not work at all...

ok = meaning my scraper gathers the corresponding infos
n/a = not for import use or no infos given on german imdb site
stc = still to come => maybe implemented in a future release (meaning: I think it's a useless feature...)

Code:
<movie>

What format should <premiered> have? String with the month written out, or a date?

Eisbahn

- Nicezia - 2010-06-07 00:44

Eisbahn Wrote: What about <certification>? Is it deprecated and only MPAA is used instead?

Certification is still in there, sorry I left it out of my info.

Eisbahn Wrote: What about the function GetIMDBThumbs? Does it fetch all pics from IMDB, or only the posters (and maybe product)? What are the constants SX, SY, SX$INFO and SY$INFO (or what is this)? Why is the function not repeated (think the users wants more than one thumbnail)? Don't know exactly what this function should do. Pointing to <http://www.imdb.de/title/tt0499549/mediaindex?refine=poster>? Any help?

GetIMDBThumbs only grabs the posters. The actor thumbs are grabbed with the rest of the actor info.

SX$INFO is nothing by itself; the $INFO part has meaning, though. What you left out was [imdbscale]: in its entirety, $INFO[imdbscale] is a placeholder for whatever value the user has selected in the settings for the size of the images to be downloaded (the setting with the id "imdbscale"). $INFO[<settingid>] simply tells the scraper: "Replace this placeholder (in this case $INFO[<settingid>]) with the text selected in the setting with the id <settingid>."

Eisbahn Wrote: How can I call a site without getting a "&" to "&amp;" cleaned? Actually I used a function which removes the &amp; and puts an & into the links :=( The "no HTML clean" tag does not work at all...
Ampersands should be cleaned up by default (if you're looking at the XBMC source code, see ScraperParser::ParseExpression, where it is commented "nasty hack #1"). Double the ampersand, for example http://foo.com/search.php?q=foo&s=foo2, the effect being that &amp; becomes &.

Eisbahn Wrote: What format should <premiered> have? String with month written out, or date?

Premiered is simply imported/exported as a string, so it has no localization and/or globalization format. So it doesn't really matter (but as of now I have no idea IF it's stored in the database, and if it is, no idea WHERE it's stored, because looking in video34.db the premiered value seems to be nowhere).

- spiff - 2010-06-07 10:25

Nicezia Wrote: @vdrfan, also noticed there are a few other tags not mentioned anywhere else (country, sorttitle, epbookmark, originaltitle) and that premiered though taken from the nfo/scraper, doesn't seem to store into database at all (at least in the last version i'm basing off of which is before the add-on merge, and therefore when importing the file this info is lost, if its even provided)

added. premiered is only used in relation to tvshows. country and sorttitle should be self-explanatory, epbookmark is the episode bookmark in multi-episode files (i.e. where does episode 2 start).

- Eisbahn - 2010-06-07 22:38

Nearly everything works now, only the thumbs from IMDB are not working at all...

In the main scraper I use the following RegEx

Code:
<RegExp input="$$2" output="<url cache="$$2-posters.html" function="GetIMDBThumbs">$$3mediaindex?refine=poster</url>" dest="5+">

Code:
http://www.imdb.com/title/tt0499549/mediaindex?refine=poster

The function is

Code:
<GetIMDBThumbs dest="5">

Eisbahn

- Nicezia - 2010-06-10 21:42

Eisbahn Wrote: (why the hell should they be crippled to "square format"), e.g. <http://ia.media-imdb.com/images/M/MV5BMTYxMzg0NzYwOV5BMl5BanBnXkFtZTcwMDc3MzEzMw@@._V1._CR0,0,388,388_SX512_SY512_.jpg>.
It really isn't "crippled" to square; the image is scaled by IMDB in relation to the width.

- Eisbahn - 2010-06-12 12:24

Nicezia Wrote: it really isn't "crippled" to square, the image is scaled by imdb in relation to the width.

Hmmm, the original image is <http://www.imdb.de/media/rm3073674240/tt0499549>, and all thumbs are cut to squares, e.g. <http://ia.media-imdb.com/images/M/MV5BMTYxMzg0NzYwOV5BMl5BanBnXkFtZTcwMDc3MzEzMw@@._V1._CR0,0,388,388_SX512_SY512_.jpg>. But that's not a problem of XBMC or the scraper, it's IMDB.

But the main problem still exists: the images are not shown in XBMC. Is there any way to check which URL is generated by the scraper and used for the pic in XBMC?

However, I think I could release v1.0 this weekend. It gathers nearly all infos in a nice format from IMDB and (on user preference) covers and plot from partner sites.

Eisbahn
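
For anyone who wants to try the nested genre idea from Nicezia's 2010-06-06 post outside of XBMC first, here is a rough Python sketch of the same two steps: copy the genre field into a buffer, then repeatedly match the comma-separated entries in that buffer. This is not scraper XML and not how XBMC does it internally; the function names extract_genres and to_genre_tags are made up for illustration, and the <info-genre> markup is just the example from that post.

```python
import re
from typing import List


def extract_genres(page: str) -> List[str]:
    """Two-step extraction, mirroring the nested RegExp idea (hypothetical helper)."""
    # Step 1: copy the part we need into a "buffer" (like dest="8" in the post).
    field = re.search(r"<info-genre>Genres:\s*(.*?)</info-genre>", page, re.S)
    if field is None:
        return []
    buffer = field.group(1)
    # Step 2: repeatedly match each comma-separated entry inside the buffer.
    return [g.strip() for g in re.findall(r"[^,]+", buffer) if g.strip()]


def to_genre_tags(genres: List[str]) -> str:
    """Wrap every genre in its own <genre> tag, however many there are."""
    return "".join("<genre>{}</genre>".format(g) for g in genres)


if __name__ == "__main__":
    sample = "<info-genre>Genres: Western, Comedy, Whatever, Else</info-genre>"
    print(to_genre_tags(extract_genres(sample)))
```

Because the second step runs over the buffer rather than the whole page, the number of genres is not fixed at 6 or 9; whatever is in the list gets its own tag.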