German IMDb scraper: please test it and give feedback

Eisbahn
Junior Member
Posts: 43
Joined: Jun 2010
Reputation: 2
Post: #1
New version online: http://github.com/Eisbahn/IMDb_de-Scraper/zipball/3.0.5

v3.0.4 and v2.0.2 are out now: <http://github.com/Eisbahn/IMDb_de-Scraper/>

Hello,

after some work I have just finished a first version of a German-language scraper for IMDb. The current version, 1.0.0, is available for testing at http://ul.to/v5d9j0. It grabs every available tag; only the trailer is not implemented (because for me it's a useless feature). Please feel free to report bugs or issues.
The latest version, 2.0.0, for XBMC v9.11 can be found here:
<http://eisbahn.ohost.de/>

What is _not_ working:
<premiered>premiere date</premiered> not imported into/exported from XBMC
<aired>???</aired> only for TV shows/series?
<set>???</set> don't know what this is
<artist>???</artist> how does it differ from actor?
<status>???</status> don't know what this is
<certification>age rating for all countries except Germany</certification> not imported into/exported from XBMC
<sorttitle>alternative film titles</sorttitle> only the first title is imported into/exported from XBMC
<code>???</code> don't know what this is; I think it's the codec, so it makes no sense to import anything into this field
<trailer>Trailer</trailer> pointless for me because the whole DVD is already in XBMC

Any hints about the broken tags are very welcome.

Code:
<movie>
ok         <id>tt0432337</id>
ok         <title>Who knows</title>
ok         <originaltitle>Who knows for real</originaltitle>
ok         <sorttitle>Who knows 1</sorttitle>
n/a        <set>Who knows trilogy</set>
ok         <rating>6.100000</rating>
ok         <votes>50</votes>
ok         <year>2008</year>
ok         <top250>0</top250>
ok         <certification>MPAA for different countries</certification>
ok         <mpaa>Not available</mpaa>
ok         <studio>my camera</studio>
ok         <outline>A look at the role of the Buckeye State in the 2004 Presidential Election.</outline>
ok         <plot>A look at the role of the Buckeye State in the 2004 Presidential Election.</plot>
ok         <tagline></tagline>
ok         <runtime>90 min</runtime>
ok         <thumb>http://ia.ec.imdb.com/media/imdb/01/I/25/65/31/10f.jpg</thumb>
n/a        <playcount>0</playcount>
n/a        <watched>false</watched>
n/a        <filenameandpath>c:\Dummy_Movie_Files\Movies\...So Goes The Nation.avi</filenameandpath>
stc        <trailer></trailer>
ok         <genre></genre>
ok         <credits></credits>
ok         <premiered>single instance/optional</premiered>
n/a        <fileinfo>
n/a           <streamdetails>
n/a              <video>
n/a                 <codec>h264</codec>
n/a                 <aspect>2.35</aspect>
n/a                 <width>1920</width>
n/a                 <height>816</height>
n/a              </video>
n/a              <audio>
n/a                 <codec>ac3</codec>
n/a                 <language>eng</language>
n/a                 <channels>6</channels>
n/a              </audio>
n/a              <subtitle>
n/a                 <language>spa</language>
n/a              </subtitle>
n/a           </streamdetails>
n/a        </fileinfo>
ok         <director>Adam Del Deo</director>
ok         <actor>
ok            <thumb></thumb>
ok            <name></name>
ok            <role></role>
ok         </actor>
</movie>

Regards, Eisbahn
(This post was last modified: 2010-09-19 17:00 by Eisbahn.)
Spaggi
Senior Member
Posts: 178
Joined: May 2010
Reputation: 0
Post: #2
Great work! Will test it on the weekend :)
Eisbahn
Junior Member
Posts: 43
Joined: Jun 2010
Reputation: 2
Post: #3
Since I'm new to XBMC: which tags are supported by a scraper, or should be provided by one?
Could you give me an overview of the mandatory and optional tags?

Regards,

Eisbahn
vdrfan
Team-XBMC Developer
Posts: 2,787
Joined: Jan 2008
Reputation: 7
Location: Germany
Post: #4
Eisbahn Wrote:Since I'm new to XBMC: which tags are supported by a scraper, or should be provided by one?
Could you give me an overview of the mandatory and optional tags?

Regards,

Eisbahn

Check out the other scrapers. The imdb.com scraper is pretty feature-complete.

donabi
Senior Member
Posts: 295
Joined: Apr 2006
Reputation: 3
Location: germany
Post: #5
Well, that varies a lot.
Some users "need" studio tags to get fancy icons in the skin,
or the narrator.
Others, like me, just need things like runtime, year, actors, FSK (MPAA) rating and ONE genre.
The original IMDb scraper fetches a lot of genre tags,
which makes the genre filter pointless.

P.S.:
we would like to see you at the German xbmc.de community.

http://www.xbmcnerds.com - german xbmc community
Eisbahn
Junior Member
Posts: 43
Joined: Jun 2010
Reputation: 2
Post: #6
@vdrfan:
Hmm, sorry. Is there a spec showing which tags are mandatory or optional? If not, how can I figure out which tags are supported? The imdb.com scraper fetches no info about sound, subtitles or video format (if I looked correctly), yet in several screenshots I could see info about these things... So answering "please do reverse engineering", when everybody can implement tags however he or she likes, is a bit counterproductive and suggests quick-and-dirty hacking without any concept. Is this the XBMC style?
What about:
Code:
<details>
    <title></title>
    <year></year>
    <director></director>
    <top250></top250>
    <mpaa></mpaa>
    <tagline></tagline>
    <runtime></runtime>
    <thumb></thumb>
    <credits></credits>
    <rating></rating>
    <votes></votes>
    <genre></genre>
    <actor>
        <name></name>
        <role></role>
    </actor>
    <outline></outline>
    <plot></plot>
</details>

@donabi: cutting some info away is not a real problem and is done in a few seconds. Gathering everything possible is a bit more complicated, so first I want a scraper that gets all the info.
If you have a description of the allowed tags, please provide it: is the order/sequence relevant, which tags are supported, what format is expected, and so on? If the German board has active members, why not. But to be honest, I think my active work will be over once the scraper is done :-(
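As an illustration of how cheap such trimming is, here is a minimal Python sketch (not XBMC scraper syntax) that post-processes a `<details>` result like the one above and keeps only the first N occurrences of a tag, e.g. just ONE `<genre>`. `limit_tag` is a hypothetical helper name, and it assumes the tags are direct children of `<details>`.

```python
import xml.etree.ElementTree as ET


def limit_tag(details_xml: str, tag: str, keep: int = 1) -> str:
    """Return the <details> document with all but the first `keep` <tag> children removed."""
    root = ET.fromstring(details_xml)
    # findall() with a plain tag name only matches direct children of <details>
    for extra in root.findall(tag)[keep:]:
        root.remove(extra)
    return ET.tostring(root, encoding="unicode")


sample = """<details>
    <title>Insomnia</title>
    <genre>Drama</genre>
    <genre>Crime</genre>
    <genre>Thriller</genre>
</details>"""

print(limit_tag(sample, "genre"))
```

Passing `keep=6` instead would cap the genres the way the later versions of the scraper do.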

@all: Where can I find out which tags XBMC supports? Whether the skins show the info doesn't matter at all; I think a "good scraper" should gather as much as possible. For a scraper's output: is the order/sequence relevant, which tags are supported, what format is expected, and so on? So far all I've done is reverse engineering, but I don't think that's the right way...

Eisbahn
olympia
Team-XBMC Member
Posts: 2,381
Joined: May 2008
Reputation: 30
Post: #7
As for the starting point:
http://lmgtfy.com/?q=xbmc+nfo

I think you are looking for something like the first result?

Other than that, I'm not sure you are seriously calling it "reverse engineering" to just look at which tags other scrapers use.
Eisbahn
Junior Member
Posts: 43
Joined: Jun 2010
Reputation: 2
Post: #8
@olympia:
Great, I found Google. If you know the right words, and don't type "scraper, tags, xbmc" or the like as a newbie, it really works. If my questions are so easy, why am I only getting an answer from you? I think it's a bit frustrating for both of us: for you as an expert and for me as a new user...
- The set tag is useless for a standalone XBMC because you can't edit the tag before importing, so a scraper shouldn't use it. Am I right?
- What about the order? Is it relevant? It seems not to be (looking at your nfo and the imdb.com output).
- Is fileinfo imported by XBMC on its own, by analysing the video file without any interaction?

[edit] new version:
- the year is now imported when it is appended to the title, as in "Insomnia (2002)"
- up to 6 genres are imported (9 would be easy)
- spaces are trimmed
=> to come: all tags as in the nfo
[/edit]
(This post was last modified: 2010-06-05 23:08 by Eisbahn.)
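The changelog items above are simple string clean-up steps. Inside an XBMC scraper they would typically be written as `<RegExp>` expressions in the scraper XML; purely to illustrate the logic, here is a Python sketch that assumes the raw title arrives as e.g. `"Insomnia (2002)"` and the genres as a list of strings. `split_title_year`, `clean_genres` and `MAX_GENRES` are hypothetical names, not taken from the scraper.

```python
import re
from typing import List, Optional, Tuple

MAX_GENRES = 6  # "up to 6 genres are imported"

# Title followed by a four-digit year in parentheses, e.g. "Insomnia (2002)"
TITLE_YEAR = re.compile(r"^(?P<title>.*?)\s*\((?P<year>\d{4})\)$")


def split_title_year(raw: str) -> Tuple[str, Optional[int]]:
    """Trim and collapse whitespace, then split off a trailing "(YYYY)" if present."""
    text = " ".join(raw.split())
    match = TITLE_YEAR.match(text)
    if match is None:
        return text, None
    return match.group("title"), int(match.group("year"))


def clean_genres(raw_genres: List[str]) -> List[str]:
    """Trim each genre, drop empty ones and keep at most MAX_GENRES."""
    genres = [g.strip() for g in raw_genres if g.strip()]
    return genres[:MAX_GENRES]


print(split_title_year("  Insomnia   (2002) "))  # ('Insomnia', 2002)
```

The lazy `.*?` lets titles that contain parentheses themselves, such as "Alien (Director's Cut) (1979)", keep everything except the final year.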
Nicezia
Fan
Posts: 369
Joined: Nov 2006
Reputation: 0
Location: Montgomery, Alabama
Post: #9
Eisbahn Wrote:@vdrfan:
Hmm, sorry. Is there a spec showing which tags are mandatory or optional? If not, how can I figure out which tags are supported? The imdb.com scraper fetches no info about sound, subtitles or video format (if I looked correctly), yet in several screenshots I could see info about these things... So answering "please do reverse engineering", when everybody can implement tags however he or she likes, is a bit counterproductive and suggests quick-and-dirty hacking without any concept. Is this the XBMC style?

All tags are optional, but I would say it's best to supply at least the TITLE.

Code:
<details>
    <title>single instance/Required</title>
    <id>single instance/optional</id>
    <studio>single instance/optional</studio>
    <year>single instance/optional</year>
    <director>multiple instance/optional</director>
    <top250>single instance/optional</top250>
    <mpaa>single instance/optional</mpaa>
    <tagline>single instance/optional</tagline>
    <runtime>single instance/optional</runtime>
    <thumb>multiple instance/optional</thumb>
    <credits></credits>
    <rating>single instance/optional</rating>
    <votes>single instance/optional</votes>
    <genre>multiple instance/optional</genre>
    <actor>
        <name></name>
        <thumb></thumb>
        <role></role>
    </actor>
    <outline>single instance/optional</outline>
    <plot>single instance/optional</plot>
    <premiered>single instance/optional</premiered>
    <set>multiple instance/optional</set>
    <trailer>multiple instance/optional</trailer>
    <streamdetails>
        <audio>
            <codec></codec>
            <channels></channels>
        </audio>
        <video>
            <codec></codec>
            <height></height>
            <width></width>
        </video>
        <subtitle>
            <language></language>
        </subtitle>
    </streamdetails>
</details>

Of course it goes without saying that actor, as well as audio, video and subtitle (all three inside streamdetails), are multiple-instance and optional.
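To make the single/multiple-instance rules concrete, here is a small Python sketch that reads a `<details>` result into a dictionary. The `MULTIPLE` set is copied from the list above (director, thumb, genre, set, trailer, actor); any other tag is treated as single instance, a missing `<title>` is rejected following the "Required" marker (even though strictly all tags are optional), and nested blocks other than `<actor>` are simply skipped. `parse_details` is a hypothetical helper, not part of XBMC.

```python
import xml.etree.ElementTree as ET

# Tags marked "multiple instance" in the list above; everything else is single instance.
MULTIPLE = {"director", "thumb", "genre", "set", "trailer", "actor"}


def parse_details(details_xml: str) -> dict:
    """Read a <details> document into a dict, independent of element order."""
    result = {}
    for child in ET.fromstring(details_xml):
        if child.tag == "actor":
            # <actor> is a small record: name, role and optionally thumb
            value = {sub.tag: (sub.text or "").strip() for sub in child}
        elif len(child):
            continue  # other nested blocks such as <streamdetails> are skipped in this sketch
        else:
            value = (child.text or "").strip()

        if child.tag in MULTIPLE:
            result.setdefault(child.tag, []).append(value)
        elif child.tag in result:
            raise ValueError(f"<{child.tag}> is single instance but appears twice")
        else:
            result[child.tag] = value

    if not result.get("title"):
        raise ValueError("no <title> supplied")
    return result
```

Because values are collected by tag name, two documents that differ only in element order produce the same dictionary; only the relative order of repeated tags (e.g. several genres) is kept.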

(This post was last modified: 2010-06-06 11:58 by Nicezia.)
olympia
Team-XBMC Member
Posts: 2,381
Joined: May 2008
Reputation: 30
Post: #10
Eisbahn Wrote:- The set tag is useless for a standalone XBMC because you can't edit the tag before importing, so a scraper shouldn't use it. Am I right?
Yes, it's only useful when you have an XBMC-compliant external nfo to import from. Besides, you couldn't scrape this info from anywhere anyway.

Eisbahn Wrote:- What about the order? Is it relevant? It seems not to be (looking at your nfo and the imdb.com output).
Order doesn't matter.

Eisbahn Wrote:- Is fileinfo imported by XBMC on its own, by analysing the video file without any interaction?
Yes, if this option is enabled in XBMC. But obviously this is again info you couldn't scrape from a website. These tags exist for the nfo because you might not want XBMC to extract this data from the media file itself: you use an external nfo manager for that purpose, and you want XBMC to import the data it generates.
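For the external-nfo case described here, the nfo manager writes the stream details itself. A minimal Python sketch of that step, assuming the nesting used in the example nfo from the first post (`<fileinfo>` containing `<streamdetails>`, which holds `<video>`, `<audio>` and `<subtitle>` blocks); `add_streams` and `fileinfo_element` are hypothetical names, and the values are the sample values from that example rather than the result of analysing a real file.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

Stream = Dict[str, object]


def add_streams(parent: ET.Element, name: str, streams: List[Stream]) -> None:
    """Append one <name> element per stream, with one child element per field."""
    for stream in streams:
        stream_el = ET.SubElement(parent, name)
        for field, value in stream.items():
            ET.SubElement(stream_el, field).text = str(value)


def fileinfo_element(videos: List[Stream], audios: List[Stream],
                     subtitles: List[Stream]) -> ET.Element:
    """Build <fileinfo><streamdetails>...</streamdetails></fileinfo> for a movie nfo."""
    fileinfo = ET.Element("fileinfo")
    details = ET.SubElement(fileinfo, "streamdetails")
    # video, audio and subtitle are all multiple instance, so each takes a list
    add_streams(details, "video", videos)
    add_streams(details, "audio", audios)
    add_streams(details, "subtitle", subtitles)
    return fileinfo


movie = ET.Element("movie")
ET.SubElement(movie, "title").text = "Who knows"
movie.append(fileinfo_element(
    videos=[{"codec": "h264", "aspect": 2.35, "width": 1920, "height": 816}],
    audios=[{"codec": "ac3", "language": "eng", "channels": 6}],
    subtitles=[{"language": "spa"}],
))
print(ET.tostring(movie, encoding="unicode"))
```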