German IMDB scraper, please test it and give feedback

Nicezia Offline
Fan
Posts: 369
Joined: Nov 2006
Reputation: 0
Location: Montgomery, Alabama
Post: #16
Eisbahn Wrote:What about <certification>? Is it deprecated and only MPAA is used instead?
Because of different DVDs, I've got more than one mpaa tag, e.g. 12 years (heavily cut), 16 years (cut), 18 years (uncut) for "The Rock" (IMDB-ID = tt0117500); it's not a single instance.

Certification is still in there; sorry, I left it out of my info.

Eisbahn Wrote:What about the function GetIMDBThumbs? Does it fetch all pics from IMDB, or only the posters (and maybe product)? What are the constants SX, SY, SX$INFO and SY$INFO (or what are these)? Why is the function not repeated (I think users want more than one thumbnail)? I don't know exactly what this function should do. Is it pointing to <http://www.imdb.de/title/tt0499549/mediaindex?refine=poster>? Any help?

GetIMDBThumbs only grabs the posters. The actor thumbs are grabbed with the rest of the actor info.

SX$INFO is nothing by itself; the $INFO part is what has meaning. What you left out was [imdbscale]. In its entirety, $INFO[imdbscale] is a placeholder for whatever value the user has selected in the settings for the size of the images to be downloaded (the setting with the id "imdbscale"). $INFO[<settingid>] simply tells the scraper: "Replace this placeholder with the text selected in the setting with the id <settingid>."
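The substitution described above can be sketched in a few lines of Python. This is only an illustration of the idea, not XBMC's actual code: the helper name `expand_info`, the settings dictionary, and the choice to leave unknown ids untouched are all assumptions made for the example.

```python
import re

# Hypothetical helper, not XBMC's implementation: expands
# $INFO[<settingid>] placeholders from a dict of user settings.
INFO_PATTERN = re.compile(r"\$INFO\[([^\]]+)\]")


def expand_info(template, settings):
    """Replace each $INFO[id] with settings[id]; unknown ids stay as-is."""
    def substitute(match):
        setting_id = match.group(1)
        # Fall back to the original placeholder text if the id is unknown
        # (an assumption; the real scraper may behave differently).
        return str(settings.get(setting_id, match.group(0)))
    return INFO_PATTERN.sub(substitute, template)


# Assumed example: the user picked 512 for the "imdbscale" setting.
settings = {"imdbscale": "512"}
print(expand_info("_SX$INFO[imdbscale]_SY$INFO[imdbscale]_", settings))
```

With the assumed setting value of 512, the template `_SX$INFO[imdbscale]_SY$INFO[imdbscale]_` expands to `_SX512_SY512_`, which is the fixed form used in the GetIMDBThumbs function posted later in the thread.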

Eisbahn Wrote:How can I call a site without "&" being cleaned to "&amp;"? Currently I use a function which removes the &amp; and puts an & into the links :=( The "no HTML clean" tag does not work at all...

Ampersands should be cleaned up by default (if you're looking at the XBMC source code, see ScraperParser::ParseExpression, where it is commented as "nasty hack #1").

To work around it, double the ampersand.

Example: http://foo.com/search.php?q=foo&amp;s=foo2

The effect is that &amp;amp; becomes &amp;
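To see why doubling helps, here is a minimal Python sketch. It assumes that the scraper's cleaning step behaves like a single HTML-entity unescape pass; `clean_once` is a hypothetical stand-in for that step, not a function from XBMC.

```python
from html import unescape


def clean_once(text):
    """Stand-in for one entity-cleaning pass (assumed to act like a single unescape)."""
    return unescape(text)


single = "http://foo.com/search.php?q=foo&amp;s=foo2"
doubled = "http://foo.com/search.php?q=foo&amp;amp;s=foo2"

# One pass turns &amp; into a bare &, and &amp;amp; into &amp;.
print(clean_once(single))
print(clean_once(doubled))
```

Under that assumption, the doubled form survives one cleaning pass as `&amp;`, while the single form is reduced to a bare `&`.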

Eisbahn Wrote:What format should <premiered> have? String with month written out, or date?

Premiered is simply imported/exported as a string, so it has no localization and/or globalization format, and the format doesn't really matter (but as of now I have no idea whether it's stored in the database, and if it is, no idea where, because looking in video34.db the premiered value seems to be nowhere).

ScraperXML Open Source Web Scraper Library compatible with XBMC XML Scrapers


I Suck, and if you act now by sending only $19.95 and a self addressed stamped envelop, so can you!

[Image: teamumx_sigline.png]
(This post was last modified: 2010-06-07 06:47 by Nicezia.)
spiff Offline
Retired Developer
Posts: 12,386
Joined: Nov 2003
Post: #17
Nicezia Wrote:@vdrfan, I also noticed there are a few other tags not mentioned anywhere else (country, sorttitle, epbookmark, originaltitle), and that premiered, though taken from the nfo/scraper, doesn't seem to be stored in the database at all (at least in the last version I'm basing off of, which is before the add-on merge), so when importing the file this info is lost, if it's even provided.

Are these extra tags deprecated tags that haven't been removed from the code, or added tags (I'm only just now getting to a point where I can read C++ code as well as CSharp)? And is premiered getting lost from the database an oversight?

Added. Premiered is only used in relation to TV shows. Country and sorttitle should be self-explanatory; epbookmark is the episode bookmark in multi-episode files (i.e. where episode 2 starts).
Eisbahn Offline
Junior Member
Posts: 43
Joined: Jun 2010
Reputation: 2
Post: #18
Nearly everything works now; only the thumbs from IMDB are not working at all...
In the main scraper I use the following RegEx:
Code:
        <RegExp input="$$2" output="&lt;url cache=&quot;$$2-posters.html&quot; function=&quot;GetIMDBThumbs&quot;&gt;$$3mediaindex?refine=poster&lt;/url&gt;" dest="5+">
            <expression/>
        </RegExp>
        <RegExp input="$$2" output="&lt;url cache=&quot;$$2-product.html&quot; function=&quot;GetIMDBThumbs&quot;&gt;$$3mediaindex?refine=product&lt;/url&gt;" dest="5+">
            <expression/>
        </RegExp>
Resulting in the URLs:
Code:
http://www.imdb.com/title/tt0499549/mediaindex?refine=poster
http://www.imdb.com/title/tt0499549/mediaindex?refine=product

The function is:
Code:
<GetIMDBThumbs dest="5">
    <RegExp input="$$6" output="&lt;details&gt;\1&lt;/details&gt;" dest="5">
        <!--\1_SX$INFO[imdbscale]_SY$INFO[imdbscale]_\2-->
        <RegExp input="$$1" output="\1_SX512_SY512_\2" dest="4">
            <expression repeat="yes" noclean="1,2">&lt;img alt=&quot;&quot; height=&quot;100&quot; width=&quot;100&quot;  src=(.*?)_S.*?(.jpg)&quot;</expression>
        </RegExp>
        <RegExp input="$$4" output="&lt;thumb&gt;\1&lt;/thumb&gt;" dest="6">
            <expression repeat="yes" noclean="1">(.*?_SX[0-9]+_SY[0-9]+_.jpg)</expression>
        </RegExp>
        <expression noclean="1"/>
    </RegExp>
</GetIMDBThumbs>
If I build the URL by hand, I get nice pics (why the hell should they be crippled to "square format"?), e.g. <http://ia.media-imdb.com/images/M/MV5BMTYxMzg0NzYwOV5BMl5BanBnXkFtZTcwMDc3MzEzMw@@._V1._CR0,0,388,388_SX512_SY512_.jpg>. But when I look in XBMC, I see only placeholders (a white "Polaroid" with a black square). What went wrong?
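For checking by hand which URLs such a rewrite should produce, the two-step transformation can be approximated in Python. This is a rough sketch under stated assumptions: the sample `<img>` tag is modelled on the expression in the post (the real IMDB markup may differ), `EXAMPLE` and `_SS100_` are made-up placeholders, and `build_thumbs` is a hypothetical helper rather than part of the scraper.

```python
import re

# Assumed shape of one thumbnail tag on the mediaindex page; the image id
# "EXAMPLE" and the "_SS100_" size suffix are hypothetical.
SAMPLE_HTML = (
    '<img alt="" height="100" width="100"  '
    'src="http://ia.media-imdb.com/images/M/EXAMPLE._V1._CR0,0,388,388_SS100_.jpg">'
)

# Group 1: URL up to the first "_S"; group 2: the ".jpg" extension.
# The opening quote of src is kept outside group 1.
THUMB_PATTERN = re.compile(
    r'<img alt="" height="100" width="100"  src="(.*?)_S.*?(\.jpg)"'
)


def build_thumbs(html, size=512):
    """Return one <thumb> element per matched image, rescaled to size x size."""
    return [
        "<thumb>{}_SX{}_SY{}_{}</thumb>".format(base, size, size, ext)
        for base, ext in THUMB_PATTERN.findall(html)
    ]


for thumb in build_thumbs(SAMPLE_HTML):
    print(thumb)
```

One difference from the posted expression: here the opening quote of the `src` attribute sits outside the first capture group, so it is not carried into the generated URL. Whether that has anything to do with the placeholder problem is not established in this thread.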

Eisbahn
(This post was last modified: 2010-06-07 22:41 by Eisbahn.)
Nicezia Offline
Fan
Posts: 369
Joined: Nov 2006
Reputation: 0
Location: Montgomery, Alabama
Post: #19
Eisbahn Wrote:(why the hell should they be crippled to "square format"), e.g. <http://ia.media-imdb.com/images/M/MV5BMTYxMzg0NzYwOV5BMl5BanBnXkFtZTcwMDc3MzEzMw@@._V1._CR0,0,388,388_SX512_SY512_.jpg>.

It really isn't "crippled" to square; the image is scaled by IMDB in relation to the width.

Eisbahn Offline
Junior Member
Posts: 43
Joined: Jun 2010
Reputation: 2
Post: #20
Nicezia Wrote:It really isn't "crippled" to square; the image is scaled by IMDB in relation to the width.

Hmmm, the original image is <http://www.imdb.de/media/rm3073674240/tt0499549>, but all thumbs are cropped to squares, e.g. <http://ia.media-imdb.com/images/M/MV5BMT...MzEzMw@@._ V1._CR0,0,388,388_SX512_SY512_.jpg>. But that's not a problem with XBMC or the scraper; it's IMDB.
The main problem still exists, though: the images are not shown in XBMC. Is there any way to check which URL is generated by the scraper and used for the pic in XBMC?
However, I think I could release v1.0 this weekend. It gathers nearly all info from IMDB in a nice format and (depending on user preference) covers and plot from partner sites.

Eisbahn
Eisbahn Offline
Junior Member
Posts: 43
Joined: Jun 2010
Reputation: 2
Post: #21
v1.0.0 available, see first post
Eisbahn Offline
Junior Member
Posts: 43
Joined: Jun 2010
Reputation: 2
Post: #22
v1.0.1
corrected some RegEx to get all tags working again
Eisbahn Offline
Junior Member
Posts: 43
Joined: Jun 2010
Reputation: 2
Post: #23
v1.1.0
- images/thumbs from IMDB are working now (only a typo)
- alternative plot from OFDB still not working
xsidx Offline
Junior Member
Posts: 1
Joined: Jun 2010
Reputation: 0
Post: #24
Is the latest version available for download anywhere? Would love to test it. Thanks!
Eisbahn Offline
Junior Member
Posts: 43
Joined: Jun 2010
Reputation: 2
Post: #25
The latest version, 2.0.0, can be found here:
<http://eisbahn.ohost.de/>

What is _not_ working:
<premiered>premiere date</premiered> not imported/exported to XBMC
<aired>???</aired> only for TV shows/series?
<set>???</set> don't know what this is
<artist>???</artist> difference to actor?
<status>???</status> don't know what this is
<certification>age rating for all countries except Germany</certification> not imported/exported to XBMC
<sorttitle>alternative film titles</sorttitle> only the first title is imported/exported to XBMC
<code>???</code> don't know what this is, I think it's the codec => no sense in importing anything into this field
<trailer>Trailer</trailer> pointless for me because the whole DVD is present in XBMC

Any hints for the broken tags are highly welcome.

Regards,

Eisbahn
krolli Offline
Junior Member
Posts: 1
Joined: Jul 2010
Reputation: 0
Post: #26
Hello,

I can't include the scraper in the addons dir. Can you attach an addon.xml file, please?
Ah... I wrote it myself: I copied another addon.xml and edited it.
Thanks for the scraper.
But I get no covers ;(
(This post was last modified: 2010-07-12 09:53 by krolli.)
vdrfan Offline
Team-XBMC Developer
Posts: 2,837
Joined: Jan 2008
Reputation: 8
Location: Germany
Post: #27
@Eisbahn, mind posting an add-on ready version of the scraper? Otherwise users with newer builds won't be able to test and give feedback. Thanks.

Always read the XBMC online-manual, FAQ and search the forum before posting.
Do not e-mail XBMC-Team members directly asking for support. Read/follow the forum rules
For troubleshooting and bug reporting please make sure you read this first.
Eisbahn Offline
Junior Member
Posts: 43
Joined: Jun 2010
Reputation: 2
Post: #28
Hello,

At the moment the scraper is only ready for v9.11, not the upcoming v10 with the new structure. Sadly, I do not have any info on how v10 should be implemented; the wiki is empty and I couldn't find any info in the forum either... At <http://wiki.xbmc.org/index.php?title=Add..._Extension > there is no info for scrapers => no info, no scraper :=(
For me, with v9.11, unzipping and copying the two files into the video scraper dir works fine; I can test v10 in a VM in a few hours. But I expect it won't work out of the box with v10, as olympia wrote in another thread.
I don't know how often I've asked: which tags are supported by XBMC v9.11 and v10, and what is the meaning of each? Any info about the structure in v10? I think it should be no problem (OK, a little) to get it working with v10.

Eisbahn
vdrfan Offline
Team-XBMC Developer
Posts: 2,837
Joined: Jan 2008
Reputation: 8
Location: Germany
Post: #29
All the tags you've listed are obsolete because they are for TV shows, not needed for German content, or handled internally. I am completely with you on the trailer stuff, but I bet some users will request it as soon as the scraper hits the official repository :P

The only difference for the upcoming Dharma release is how scrapers and settings are handled. I'd recommend having a look at the other scrapers that are already add-ons to get an overview until the wiki is updated.

Eisbahn Offline
Junior Member
Posts: 43
Joined: Jun 2010
Reputation: 2
Post: #30
Hi vdrfan,

I just found <http://xbmc.git.sourceforge.net/git/gitweb.cgi?p=xbmc/scrapers;a=commit;h=5b59dec81b4e5046a3a515bc0cc6fd68ba408201>. I hope these are current and proper XML examples; I will try them this evening at home.
Are any docs out right now? I know this situation from real life: no docs ready, but the client wants an implementation of feature X. No problem, but if the client does not say what he really wants, it won't be a cheap solution and both sides are frustrated in the end... => Normally I do not accept any contracts without clear rules, or I adapt the price a bit ;=)

Eisbahn