XBMC Community Forum
[WIP] AniDB.net Anime Video Scraper - Printable Version




- MukiDA - 2010-01-02 08:52

Thanks for the help, Spiff, that helped out substantially.

zosky, feel free to beta test the scraper as-is (no episodes yet, which is why I said pre-pre-alpha, but at the moment the thumb, title, and plot summary work just fine ;) ). I would, of course, suggest testing it on only a few folders at a time (e.g. don't just point it at the anime root folder ;) ).

Here's the latest version. I'm working on episode support, and once again, I have no idea what I'm doing wrong ;) (seems to be a running theme).

Code:
<?xml version="1.0" encoding="utf-8"?><scraper framework="1" date="2009-11-15" name="AniDB.net" content="tvshows" thumb="anidb.jpg" language="en">
    <NfoUrl dest="3">
        <RegExp input="$$1" output="\1" dest="3">
            <expression></expression>
        </RegExp>
    </NfoUrl>
    <CreateSearchUrl dest="3">
        <RegExp input="$$1" output="&lt;url gzip=&quot;yes&quot;&gt;http://anidb.net/perl-bin/animedb.pl\?show=animelist&amp;adb.search=\1&lt;/url&gt;" dest="3">
            <expression></expression>
        </RegExp>
    </CreateSearchUrl>
    <GetSearchResults dest="8">
            <!--     Multiple Results  -->
            <RegExp input="$$5" output="&lt;?xml version=&quot;1.0&quot; encoding=&quot;iso-8859-1&quot; standalone=&quot;yes&quot;?&gt;&lt;results&gt;\1&lt;/results&gt;" dest="8">
            <RegExp input="$$1" output="&lt;entity&gt;&lt;title&gt;\3&lt;/title&gt;&lt;url gzip=&quot;yes&quot;&gt;http://anidb.net/perl-bin/\1&lt;/url&gt;&lt;/entity&gt;" dest="5">
                <expression repeat="yes" noclean="1">&lt;a href="(animedb.pl\?show=anime&amp;amp;aid=([0-9]*))"&gt;([^&lt;]*)&lt;/a&gt;</expression>
            </RegExp>
            <expression noclean="1"></expression>
            
            <!--     Only one Result  -->
            <RegExp input="$$1" output="&lt;entity&gt;&lt;title&gt;\1&lt;/title&gt;&lt;url gzip=&quot;yes&quot;&gt;\2&lt;/url&gt;&lt;/entity&gt;" dest="5+">
                <expression repeat="no" noclean="1">&lt;th class=&quot;field&quot;&gt;Main Title&lt;/th&gt;.....&lt;td class=&quot;value&quot;&gt;(.[^\n]*)....&lt;a class=&quot;shortlink&quot; href=&quot;(http.[^&quot;]*)</expression>
            </RegExp>
        </RegExp>
    </GetSearchResults>
    <GetDetails dest="3">
        <RegExp input="$$8" output="&lt;details&gt;\1&lt;/details&gt;" dest="3">
            <RegExp input="$$1" output="&lt;title&gt;\1&lt;/title&gt;" dest="8">
                <expression repeat="yes">&lt;th class="field"&gt;Main Title&lt;/th&gt;.....&lt;td class="value"&gt;(.[^\n]*)</expression>
            </RegExp>
            <RegExp input="$$1" output="&lt;year&gt;\1&lt;/year&gt;" dest="8+">
                <expression trim="1" noclean="1">&lt;th class="field"&gt;Year&lt;/th&gt;.[^&gt;]*&gt;([^&lt;]*)|$</expression>
            </RegExp>
            <RegExp input="$$1" output="&lt;thumb&gt;\1&lt;/thumb&gt;" dest="8+">
                <expression>&lt;div class="image".[^"]*"(http.[^"]*)</expression>
            </RegExp>
            <RegExp input="$$1" output="&lt;rating&gt;\1&lt;/rating&gt;" dest="8+">
                <expression>animevotes&amp;amp;aid=[0-9]*"&gt;(.[^&lt;]*)</expression>
            </RegExp>
            <RegExp input="$$1" output="&lt;plot&gt;\1&lt;/plot&gt;" dest="8+">
                <expression>class=&quot;desc&quot;&gt;(.*)&lt;/div&gt;</expression>
            </RegExp>
            <RegExp input="$$1" output="&lt;episode&gt;\1&lt;/episode&gt;" dest="8+">
                <expression repeat="no">&lt;td class=&quot;epno lastep&quot;&gt;([0-9]+)&lt;/td&gt;</expression>
            </RegExp>

            <expression noclean="1"></expression>
        </RegExp>
        <RegExp input="$$10" output="&lt;episodeguide&gt;\1&lt;/episodeguide&gt;" dest="3+">
            <RegExp input="$$1" output="&lt;episode&gt;&lt;title&gt;\2&lt;/title&gt;&lt;epnum&gt;\1&lt;/epnum&gt;&lt;/episode&gt;" dest="10">
                <expression repeat="yes">&lt;td class=&quot;id eid&quot;&gt;&lt;a href.[^&gt;]*&gt;([0-9]+).*?label.[^&gt;]*&gt;(.[^&lt;]*)</expression>
            </RegExp>
            <expression noclean="1"></expression>
        </RegExp>
    </GetDetails>
</scraper>

From what I can gather from glancing at the tvdb scraper source (I am SO adding this to the scraper Wiki once I understand it), the episode format is part of the "GetDetails" section (are these "sections" arbitrary?). The format seems to be as follows:
Code:
<episodeguide>
    <episode>
        <title>Title of this ep</title>
        <epnum>XX</epnum>
    </episode>
</episodeguide>

And it seems to be after <details>. At the moment, I think I'm doing this right, but once again, I can only check my regex with ScraperXML. Is there any way to see XBMC's output on a scrape run? Is there a scraper log flag I need to toggle?


- spiff - 2010-01-02 11:05

Debug logging logs the entire process. And that is not the format; it is:

Code:
<episodeguide>
  <episode>
    <title>.</title>
    <url>..</url>
    <season>...</season>
    <epnum>...</epnum>
    <id>..</id>
    <airdate>...</airdate>
  </episode>
</episodeguide>
where season, epnum, and url are mandatory, and title is a big plus.
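The mandatory fields listed above can be checked outside XBMC before a scrape run. The following is a minimal Python sketch, not part of the scraper engine: the helper name `missing_fields` and the sample string are made up for illustration, and it only checks for the three tags named in the post.

```python
import xml.etree.ElementTree as ET

# Mandatory per the post above; "title" is optional but recommended.
REQUIRED = ("season", "epnum", "url")


def missing_fields(guide_xml):
    """Return (episode_index, missing_tag) pairs for an <episodeguide> string."""
    root = ET.fromstring(guide_xml)
    problems = []
    for index, episode in enumerate(root.findall("episode"), start=1):
        for tag in REQUIRED:
            node = episode.find(tag)
            # Treat an absent tag and an empty tag the same way.
            if node is None or not (node.text or "").strip():
                problems.append((index, tag))
    return problems


# Hypothetical sample: the first episode lacks <season> and <url>.
sample = (
    "<episodeguide>"
    "<episode><title>Transmigration</title><epnum>1</epnum></episode>"
    "<episode><url>http://anidb.net/</url><season>1</season>"
    "<title>Yakumo</title><epnum>2</epnum></episode>"
    "</episodeguide>"
)
print(missing_fields(sample))
```

Pasting the `<episodeguide>` fragment from the debug log into `sample` gives a quick sanity check of the scraper output.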


- MukiDA - 2010-01-02 15:59

Thanks again, Spiff. I inadvertently caught a bug I'd almost ignored: the plot regex was wrong and was grabbing WAY too much junk after the actual plot. So now, as far as I can tell, the info I get for the episode guide is just fine, but XBMC keeps telling me I have "0" episodes in my test show. Just to make sure I wasn't doing anything wrong, I renamed one of the episode files to "3x3 Eyes S01E01.mkv" (I know nobody judges here, but for the sake of clarity I own 3x3 Eyes on DVD and can produce a photo of me holding the disc on request ;) ).

Here's my episode guide from the debug log (I added whitespace myself; anyone know an automated way to do this in vim?). It starts immediately before </details>. Does the <url> in it have to be a full episode info file?

Code:
<episodeguide>
    <episode>
        <url>http://anidb.net/</url>
        <season>1</season>
        <title>Transmigration</title>
        <epnum>1</epnum>
    </episode>
    <episode>
        <url>http://anidb.net/</url>
        <season>1</season>
        <title>Yakumo</title>
        <epnum>2</epnum>
    </episode>
    <episode>
        <url>http://anidb.net/</url>
        <season>1</season>
        <title>Sacrifice</title>
        <epnum>3</epnum>
    </episode>
    <episode>
        <url>http://anidb.net/</url>
        <season>1</season>
        <title>Straying</title>
        <epnum>4</epnum>
    </episode>
</episodeguide>
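On the question of adding the whitespace automatically: outside vim, one option is a few lines of Python using the standard library's `xml.dom.minidom`. This is only a sketch; the `raw` string is a shortened copy of the log output, and the four-space indent is an arbitrary choice.

```python
from xml.dom import minidom

# Shortened single-line fragment in the style of the debug log output.
raw = (
    "<episodeguide><episode><url>http://anidb.net/</url><season>1</season>"
    "<title>Transmigration</title><epnum>1</epnum></episode></episodeguide>"
)

# toprettyxml() re-indents the tree; it also prepends an XML declaration line.
pretty = minidom.parseString(raw).toprettyxml(indent="    ")
print(pretty)
```

The input has to be well-formed XML, so a fragment copied from the log needs its enclosing element intact.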

Here's the full scraped info so far on 3x3 eyes (straight from the log, unedited):
Code:
<details><title>3x3 Eyes</title><title>3x3 Eyes</title><year>25.07.1991 till 19.03.1992</year><thumb>http://img7.anidb.net/pics/anime/22311.jpg</thumb><rating>6.85</rating><plot>Pai is the last of the Sanjiyan -- a magical race of 3-eyed creatures, and she comes in search of Tokyo high-school student Yakumo with news of his father's death and hopes of becoming human. After a fatal accident, Pai is forced to absorb Yakumo's soul to keep him from dying, making him an undead creature bound to her. Their journey to make Pai human becomes complicated with dark forces seeking to stop them, especially when Pai's crueler nature emerges...
                                                    </plot><episode>4</episode><episodeguide><episode><url>http://anidb.net/</url><season>1</season><title>Transmigration</title><epnum>1</epnum></episode><episode><url>http://anidb.net/</url><season>1</season><title>Yakumo</title><epnum>2</epnum></episode><episode><url>http://anidb.net/</url><season>1</season><title>Sacrifice</title><epnum>3</epnum></episode><episode><url>http://anidb.net/</url><season>1</season><title>Straying</title><epnum>4</epnum></episode></episodeguide></details>
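Note that the `<year>` element in the log above holds the full date range ("25.07.1991 till 19.03.1992") rather than a single year. Whether XBMC needs a bare four-digit year there is an assumption on my part, but pulling out the first year is easy to test offline. A minimal Python sketch, with a hypothetical helper name:

```python
import re


def first_year(text):
    """Return the first run of four digits in text, or None if there is none."""
    match = re.search(r"([0-9]{4})", text)
    return match.group(1) if match else None


print(first_year("25.07.1991 till 19.03.1992"))
```

The same idea carries over to the scraper expression by capturing `([0-9]{4})` instead of everything up to the next tag.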


Of course, here's the newest version (should I just attach instead?)

Code:
<?xml version="1.0" encoding="utf-8"?><scraper framework="1" date="2009-11-15" name="AniDB.net" content="tvshows" thumb="anidb.jpg" language="en">
    <NfoUrl dest="3">
        <RegExp input="$$1" output="\1" dest="3">
            <expression></expression>
        </RegExp>
    </NfoUrl>
    <CreateSearchUrl dest="3">
        <RegExp input="$$1" output="&lt;url gzip=&quot;yes&quot;&gt;http://anidb.net/perl-bin/animedb.pl\?show=animelist&amp;adb.search=\1&lt;/url&gt;" dest="3">
            <expression></expression>
        </RegExp>
    </CreateSearchUrl>
    <GetSearchResults dest="8">
            <!--     Multiple Results  -->
            <RegExp input="$$5" output="&lt;?xml version=&quot;1.0&quot; encoding=&quot;iso-8859-1&quot; standalone=&quot;yes&quot;?&gt;&lt;results&gt;\1&lt;/results&gt;" dest="8">
            <RegExp input="$$1" output="&lt;entity&gt;&lt;title&gt;\3&lt;/title&gt;&lt;url gzip=&quot;yes&quot;&gt;http://anidb.net/perl-bin/\1&lt;/url&gt;&lt;/entity&gt;" dest="5">
                <expression repeat="yes" noclean="1">&lt;a href="(animedb.pl\?show=anime&amp;amp;aid=([0-9]*))"&gt;([^&lt;]*)&lt;/a&gt;</expression>
            </RegExp>
            <expression noclean="1"></expression>
            
            <!--     Only one Result  -->
            <RegExp input="$$1" output="&lt;entity&gt;&lt;title&gt;\1&lt;/title&gt;&lt;url gzip=&quot;yes&quot;&gt;\2&lt;/url&gt;&lt;/entity&gt;" dest="5+">
                <expression repeat="no" noclean="1">&lt;th class=&quot;field&quot;&gt;Main Title&lt;/th&gt;.....&lt;td class=&quot;value&quot;&gt;(.[^\n]*)....&lt;a class=&quot;shortlink&quot; href=&quot;(http.[^&quot;]*)</expression>
            </RegExp>
        </RegExp>
    </GetSearchResults>
    <GetDetails dest="3">
        <RegExp input="$$8" output="&lt;details&gt;\1&lt;/details&gt;" dest="3">
            <RegExp input="$$1" output="&lt;title&gt;\1&lt;/title&gt;" dest="8">
                <expression repeat="yes">&lt;th class="field"&gt;Main Title&lt;/th&gt;.....&lt;td class="value"&gt;(.[^\n]*)</expression>
            </RegExp>
            <RegExp input="$$1" output="&lt;year&gt;\1&lt;/year&gt;" dest="8+">
                <expression trim="1" noclean="1">&lt;th class="field"&gt;Year&lt;/th&gt;.[^&gt;]*&gt;([^&lt;]*)|$</expression>
            </RegExp>
            <RegExp input="$$1" output="&lt;thumb&gt;\1&lt;/thumb&gt;" dest="8+">
                <expression>&lt;div class="image".[^"]*"(http.[^"]*)</expression>
            </RegExp>
            <RegExp input="$$1" output="&lt;rating&gt;\1&lt;/rating&gt;" dest="8+">
                <expression>animevotes&amp;amp;aid=[0-9]*"&gt;(.[^&lt;]*)</expression>
            </RegExp>
            <RegExp input="$$1" output="&lt;plot&gt;\1&lt;/plot&gt;" dest="8+">
                <expression>class=&quot;desc&quot;&gt;(.[^&lt;]+)</expression>
            </RegExp>
            <RegExp input="$$1" output="&lt;episode&gt;\1&lt;/episode&gt;" dest="8+">
                <expression repeat="no">&lt;td class=&quot;epno lastep&quot;&gt;([0-9]+)&lt;/td&gt;</expression>
            </RegExp>

            <expression noclean="1"></expression>
        </RegExp>
        <RegExp input="$$10" output="&lt;episodeguide&gt;\1&lt;/episodeguide&gt;" dest="3+">
            <RegExp input="$$1" output="&lt;episode&gt;&lt;url&gt;http://anidb.net/&lt;/url&gt;&lt;season&gt;1&lt;/season&gt;&lt;title&gt;\2&lt;/title&gt;&lt;epnum&gt;\1&lt;/epnum&gt;&lt;/episode&gt;" dest="10">
                <expression repeat="yes">&lt;td class=&quot;id eid&quot;&gt;&lt;a href.[^&gt;]*&gt;([0-9]+).*?label.[^&gt;]*&gt;(.[^&lt;]*)</expression>
            </RegExp>
            <expression noclean="1"></expression>
        </RegExp>
    </GetDetails>
</scraper>



- eldon - 2010-01-02 17:29

@MukiDA

I've been trying to make a scraper for AniDB too, based on your first draft.

I think you can safely use regex laziness, as it's widely used in all the other scrapers, although the documentation says it is "not working". So, for example, your plot expression could look like this:
Code:
<expression trim="1">class=&quot;desc&quot;&gt;\s*(.*?)\s*&lt;/div</expression>
It makes sure no whitespace is left around the match. It doesn't make much difference with your code, but it's just an example; laziness is very useful when parsing is complicated.
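The difference between the greedy `(.*)` from the earlier draft and the lazy `(.*?)` suggested here can be seen in a small Python sketch. The HTML snippet is invented for illustration, and `re.DOTALL` is used so that `.` matches newlines; whether the XBMC regex engine behaves the same way on that point is an assumption.

```python
import re

# Hypothetical page fragment: a description div followed by unrelated markup.
html = (
    'class="desc">\n   Pai is the last of the Sanjiyan.\n   </div>'
    '<div class="other">junk</div>'
)

# Greedy: runs to the LAST </div> on the page, dragging the junk along.
greedy = re.search(r'class="desc">(.*)</div>', html, re.DOTALL).group(1)

# Lazy: stops at the FIRST </div, and \s* trims the surrounding whitespace.
lazy = re.search(r'class="desc">\s*(.*?)\s*</div', html, re.DOTALL).group(1)

print(repr(greedy))
print(repr(lazy))
```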

Anyway, I made a few other modifications, but one thing bothers me right now: the year entry in GetDetails. For some reason it is always shown as 65535 (0xFFFF), no matter what I put in the <year></year> field. Any idea what's going on?

I'll also try adding some fanart from thetvdb, as I've seen there's quite a lot there for anime.


- MukiDA - 2010-01-02 18:38

You can scrape from more than one site? o_0

... oh wait, I guess the spec does have quite a bit of room for that. ;)

Okay, here's another shot at finding the episode data. For some odd reason, GetEpisodeDetails never shows up in the log. Of course, I don't have a clear understanding of exactly how that function works. Hopefully someone can point out my folly. Once again, thanks to everyone here for the help, especially spiff, who seems to be the ultimate one-man orchestra on scraping.

Code:
<?xml version="1.0" encoding="utf-8"?><scraper framework="1" date="2009-11-15" name="AniDB.net" content="tvshows" thumb="anidb.jpg" language="en">
    <NfoUrl dest="3">
        <RegExp input="$$1" output="\1" dest="3">
            <expression></expression>
        </RegExp>
    </NfoUrl>
    <CreateSearchUrl dest="3">
        <RegExp input="$$1" output="&lt;url gzip=&quot;yes&quot;&gt;http://anidb.net/perl-bin/animedb.pl\?show=animelist&amp;adb.search=\1&lt;/url&gt;" dest="3">
            <expression></expression>
        </RegExp>
    </CreateSearchUrl>
    <GetSearchResults dest="8">
            <!--     Multiple Results  -->
            <RegExp input="$$5" output="&lt;?xml version=&quot;1.0&quot; encoding=&quot;iso-8859-1&quot; standalone=&quot;yes&quot;?&gt;&lt;results&gt;\1&lt;/results&gt;" dest="8">
            <RegExp input="$$1" output="&lt;entity&gt;&lt;title&gt;\3&lt;/title&gt;&lt;url gzip=&quot;yes&quot;&gt;http://anidb.net/perl-bin/\1&lt;/url&gt;&lt;/entity&gt;" dest="5">
                <expression repeat="yes" noclean="1">&lt;a href="(animedb.pl\?show=anime&amp;amp;aid=([0-9]*))"&gt;([^&lt;]*)&lt;/a&gt;</expression>
            </RegExp>
            <expression noclean="1"></expression>
            
            <!--     Only one Result  -->
            <RegExp input="$$1" output="&lt;entity&gt;&lt;title&gt;\1&lt;/title&gt;&lt;url gzip=&quot;yes&quot;&gt;\2&lt;/url&gt;&lt;/entity&gt;" dest="5+">
                <expression repeat="no" noclean="1">&lt;th class=&quot;field&quot;&gt;Main Title&lt;/th&gt;.....&lt;td class=&quot;value&quot;&gt;(.[^\n]*)....&lt;a class=&quot;shortlink&quot; href=&quot;(http.[^&quot;]*)</expression>
            </RegExp>
        </RegExp>
    </GetSearchResults>
    <GetDetails dest="3">
        <RegExp input="$$8" output="&lt;details&gt;\1" dest="3">
            <RegExp input="$$1" output="&lt;title&gt;\1&lt;/title&gt;" dest="8">
                <expression repeat="yes">&lt;th class="field"&gt;Main Title&lt;/th&gt;.....&lt;td class="value"&gt;(.[^\n]*)</expression>
            </RegExp>
            <RegExp input="$$1" output="&lt;year&gt;\1&lt;/year&gt;" dest="8+">
                <expression trim="1" noclean="1">&lt;th class="field"&gt;Year&lt;/th&gt;.[^&gt;]*&gt;([^&lt;]*)|$</expression>
            </RegExp>
            <RegExp input="$$1" output="&lt;thumb&gt;\1&lt;/thumb&gt;" dest="8+">
                <expression>&lt;div class="image".[^"]*"(http.[^"]*)</expression>
            </RegExp>
            <RegExp input="$$1" output="&lt;rating&gt;\1&lt;/rating&gt;" dest="8+">
                <expression>animevotes&amp;amp;aid=[0-9]*"&gt;(.[^&lt;]*)</expression>
            </RegExp>
            <RegExp input="$$1" output="&lt;plot&gt;\1&lt;/plot&gt;" dest="8+">
                <expression>class=&quot;desc&quot;&gt;(.[^&lt;]*)</expression>
            </RegExp>
            <RegExp input="$$1" output="&lt;episode&gt;\1&lt;/episode&gt;" dest="8+">
                <expression repeat="no">&lt;td class=&quot;epno lastep&quot;&gt;([0-9]+)&lt;/td&gt;</expression>
            </RegExp>

            <expression noclean="1"></expression>
        </RegExp>
        <RegExp input="$$10" output="&lt;episodeguide&gt;\1&lt;/episodeguide&gt;&lt;/details&gt;" dest="3+">
            <RegExp input="$$1" output="&lt;episode&gt;&lt;url gzip=&quot;yes&quot;&gt;http://animedb.net/animedb.pl\?show=ep\&amp;\1&lt;/url&gt;&lt;season&gt;1&lt;/season&gt;&lt;title&gt;\3&lt;/title&gt;&lt;epnum&gt;\2&lt;/epnum&gt;&lt;/episode&gt;" dest="10">
                <expression repeat="yes">&lt;td class=&quot;id eid&quot;&gt;&lt;a href=&quot;animedb.pl\?show=ep\&amp;amp\;(.[^&quot;]*)&quot;&gt;([0-9]+).*?label.[^&gt;]*&gt;(.[^&lt;]*)</expression>
            </RegExp>
            <expression noclean="1"></expression>
        </RegExp>
    </GetDetails>
    <GetEpisodeDetails dest="3">
        <RegExp input="$$7" output="&lt;?xml version=&quot;1.0&quot; encoding=&quot;utf-8&quot; standalone=&quot;yes&quot;?&gt;&lt;details&gt;&lt;title&gt;\1&lt;/title&gt;&lt;season&gt;1&lt;/season&gt;&lt;title&gt;\3&lt;/title&gt;&lt;/details&gt;" dest="3">
            <RegExp input="$$1" output="\1" dest="7">
                <expression noclean="1">Main Title&lt;/th&gt;.[^&lt;]*&lt;td class=&quot;value&quot;&gt;(.[^\(]*)\(</expression>
            </RegExp>
        </RegExp>
    </GetEpisodeDetails>
</scraper>



- zosky - 2010-01-03 10:32

Very cool. I created a temp source & added Excel Saga (Official Title),
aka Heppoko Jikken Animation Excel Saga (Main Title).

The main and official titles don't differ often, but here is another example:
> official = Koukaku Kidoutai S.A.C. 2nd GIG
> main = Ghost in the Shell: Stand Alone Complex 2nd GIG
Could you please grab that main title or make it an option? :)

Also, beyond fanart from thetvdb,
please consider the wide banner too.

More importantly, we need eps :D
My files are enumerated (for the tvdb, which does have this one).

Hope this helps... (BTW, I have no tvshow.nfo in this dir.)

Code:
02:48:30 T:2997828464 M:1976143872   ERROR: CVideoInfoScanner::OnProcessSeriesFolder: Asked to lookup episode /mnt/newHD/anime2/Excel.Saga/Excel_Saga_s01e01.mkv online, but we have no episode guide. Check your tvshow.nfo and make sure the <episodeguide> tag is in place.



- eldon - 2010-01-03 12:23

@zosky
AFAIK the episode guide does not work with that version yet; it is misplaced in the data.

I managed to correct that and fetch the episode list and data, but for some reason I can't get it displayed anywhere in my XBMC.

On top of that, when I switch to library mode, the Anime folder I use for my test is empty, although the files are present and that anime folder has the correct info, so I assume the episodes were not recognized?

That's probably a side effect of the "Episodes: 0 (0 watched - 0 unwatched)" info I get no matter what I do.

Could someone please explain to me how the scraper engine is supposed to handle the episode list, or point me to up-to-date documentation on the subject?
I have mimicked the tvrage and tvdb scrapers but can't get anything to work here with AniDB, although <details>, <episodes> and the GetEpisodeDetails <details> are populated.

thx

- edit -
I'm leaving the original post so anyone can see the problem I had.
Reading zosky's post, I figured out what was wrong in my setup: the mkv files I have were not "enumerated" by XBMC. They all contain the show title and an episode number, but XBMC was not able to identify them clearly because of other (group) info in the filename.
Once I renamed the files, they got enumerated after refreshing the anime folder data, and I now have all the files and their respective data present.
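To illustrate why the renamed files were picked up, here is a rough Python sketch of season/episode detection from a filename. The pattern is a simplified assumption, not XBMC's actual matching rules, and the group-tagged filename is a made-up example of a name that such a pattern would miss.

```python
import re

# Simplified, assumed pattern: "s01e01" / "S01E01" anywhere in the name.
EPISODE_PATTERN = re.compile(r"[Ss](\d+)[Ee](\d+)")


def season_episode(filename):
    """Return (season, episode) as integers, or None if no SxxExx tag is found."""
    match = EPISODE_PATTERN.search(filename)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


for name in (
    "Excel_Saga_s01e01.mkv",
    "3x3 Eyes S01E01.mkv",
    "[Group] Some Show - 01 [1A2B3C4D].mkv",  # hypothetical fansub-style name
):
    print(name, "->", season_episode(name))
```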

Now I'll be able to finish my first draft. I still have to understand how caching works, because I'm fetching unnecessary pages.


- eldon - 2010-01-03 16:34

@spiff (and others)

I'm struggling with a bug in the scraper and need some quick help.

I managed to understand how you fetch additional pages and external pages, so I added some fanart scraping, but I have the following problem:

In order to get the fanart on thetvdb I need to perform a search on their engine, as I don't have any id to pass to the site from AniDB. I use a function as follows:
Code:
<url gzip="yes" function="GetFanart">http://www.thetvdb.com/index.php?seriesname=Michiko to Hatchin&fieldlocation=1&language=7&genre=Animation&year=&order=fanartcount+desc&searching=Search&tab=advancedsearch</url>

using the anime name I get from the AniDB page currently being parsed. I don't know if the anime name used in the selection dialog is still stored somewhere, but it probably makes no difference.
XBMC then returns the following error:
Code:
ERROR: InternalGetDetails: Unable to parse web site [http://www.thetvdb.com/index.php?seriesname=Michiko to Hatchin&fieldlocation=1&language=7&genre=Animation&year=&order=fanartcount+desc&searching=Search&tab=advancedsearch]

It's probably a URL-encoding error caused by the spaces in the anime name.

Is there a way I can replace the spaces (with "+") or do some URL encoding on the pattern match I'll use for the <url>?
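For reference, this is what the encoded search URL would look like when built outside the scraper. The Python sketch below uses the standard library's `urllib.parse`; it only shows the target format and is not something the scraper XML can call. The parameter names are copied from the URL in the post.

```python
from urllib.parse import quote_plus, urlencode

title = "Michiko to Hatchin"

# quote_plus() turns spaces into "+" and percent-encodes other unsafe characters.
encoded_title = quote_plus(title)

# urlencode() applies the same encoding to every value in the query string.
params = {
    "seriesname": title,
    "fieldlocation": "1",
    "language": "7",
    "genre": "Animation",
    "year": "",
    "order": "fanartcount desc",
    "searching": "Search",
    "tab": "advancedsearch",
}
url = "http://www.thetvdb.com/index.php?" + urlencode(params)

print(encoded_title)
print(url)
```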


Then, could you point me to the current XML format for thumb and fanart? Also, there are some posters and banners on that website; can we use them for something in XBMC skins?

thx


- MukiDA - 2010-01-03 18:55

@eldon

The site you're trying to parse isn't g-zipped (I just tried wget-ing it and it cats just fine), so you can remove the gzip="yes" part. In addition, you might need to escape the question mark (e.g. \? instead of ?) and possibly the ampersands (\&).

@zosky

I don't think my per-episode code works AT ALL yet. GetEpisodeDetails doesn't return ANYTHING (it doesn't even get called) in the XBMC debug log. I'm doing something horribly wrong there and I don't yet know what.

The TVDB.com scraping is a pretty brilliant idea, and I'd love to use it as a realtime fallback (e.g. scrape it if AniDB is missing info, and maybe even add its poster thumbs).

As a side note, and I think the lack of this is due more to a lack of momentum than anything else, but Anime could really use its own section, or some way to add custom sections to the Library. If it required adding a few more image text "blocks" for existing skins, so be it. Right now the closest thing we can do is sort by genre, and it feels unusably cumbersome if you have a frag-ton of anime and someone in your abode that's not into it is looking for a random TV show to watch; it puts Ghost in the Shell next to both Firefly 'n Dexter's Lab (as it would be categorized under "Animation", and "Science Fiction"). I guess it's not a remotely easy fix (as its Library section would need to be capable of dealing with both Movies and TV shows simultaneously) but it'd be nice.


- eldon - 2010-01-03 19:53

Okay, I managed to get fanart working, so I'll post my first draft, which is quite complete and is just missing some cast & crew info.

Code:
<?xml version="1.0" encoding="utf-8"?><scraper framework="1" date="2009-11-15" name="AniDB.net" content="tvshows" thumb="anidb.png" language="en">
    <GetSettings dest="3">
        <RegExp input="$$5" output="&lt;settings&gt;\1&lt;/settings&gt;" dest="3">
            <RegExp input="$$1" output="&lt;setting label=&quot;Enable fanart from thetvdb.org&quot; type=&quot;bool&quot; id=&quot;fanart&quot; default=&quot;true&quot;&gt;&lt;/setting&gt;" dest="5">
                <expression/>
            </RegExp>
        </RegExp>
    </GetSettings>
    <CreateSearchUrl dest="3">
        <RegExp input="$$1" output="&lt;url gzip=&quot;yes&quot;&gt;http://anidb.net/perl-bin/animedb.pl?type.web=1&amp;type.unknown=1&amp;type.tvspecial=1&amp;type.tvseries=1&amp;type.ova=1&amp;type.other=1&amp;type.musicvideo=1&amp;type.movie=1&amp;show=animelist&amp;orderby.name=0.1&amp;noalias=1&amp;do.update=update&amp;adb.search=\1&lt;/url&gt;" dest="3">
            <expression>([^\)\(]+)</expression>
        </RegExp>
    </CreateSearchUrl>
    <GetSearchResults dest="8">
        <RegExp input="$$5" output="&lt;?xml version=&quot;1.0&quot; encoding=&quot;iso-8859-1&quot; standalone=&quot;yes&quot;?&gt;&lt;results&gt;\1&lt;/results&gt;" dest="8">
            <!--     Multiple Results  -->
            <RegExp input="$$1" output="&lt;entity&gt;&lt;title&gt;\3 - \4&lt;/title&gt;&lt;year&gt;\5&lt;/year&gt;&lt;url gzip=&quot;yes&quot;&gt;http://anidb.net/perl-bin/\1&lt;/url&gt;&lt;id&gt;\1&lt;/id&gt;&lt;/entity&gt;" dest="5">
                <expression repeat="yes" noclean="1">&lt;a href=&quot;(animedb.pl\?show=anime&amp;amp;aid=([0-9]*))&quot;&gt;([^&lt;]*)&lt;/a&gt;.*?&lt;td class=&quot;type[^&gt;]+&gt;([^&lt;]+)&lt;/td&gt;.*?airdate.*?([0-9]{4})?&lt;/td&gt;</expression>
            </RegExp>
            <expression noclean="1"></expression>            
            <!--     Only one Result  -->
            <RegExp input="$$1" output="&lt;entity&gt;&lt;title&gt;\1 - \3&lt;/title&gt;&lt;year&gt;\4&lt;/year&gt;&lt;url gzip=&quot;yes&quot;&gt;http://anidb.net/\2&lt;/url&gt;&lt;id&gt;\1&lt;/id&gt;&lt;/entity&gt;" dest="5">
                <expression repeat="no" noclean="1">Main Title&lt;/th&gt;.*?&gt;([^\r\n\t]+).*?href=&quot;http://anidb.net/([^&quot;]*).*?Type&lt;/th&gt;[^&gt;]+&gt;([^,&lt;]*).*?Year&lt;/th&gt;.*?([0-9]{4})?(?: till|&lt;/)</expression>
            </RegExp>
            <expression clear="yes" noclean="1"/>
        </RegExp>
    </GetSearchResults>
    <GetDetails dest="3">
        <RegExp input="$$8" output="&lt;details&gt;\1&lt;/details&gt;" dest="3">
            <RegExp input="$$1" output="&lt;title&gt;\1&lt;/title&gt;" dest="8">
                <expression trim="1">Main Title&lt;/th&gt;.*?&gt;([^\r\n\(]+)</expression>
            </RegExp>
            <RegExp input="$$1" output="&lt;year&gt;\1&lt;/year&gt;" dest="8+">
                <expression>Year&lt;/th&gt;.*?([0-9]{4})(?: till|&lt;/)</expression>
            </RegExp>
            <!--<div class="image"> \n <img src="http://img7.anidb.net/pics/anime/13614.jpg" alt="Michiko to Hatchin" />-->
            <RegExp input="$$1" output="&lt;thumb&gt;\1&lt;/thumb&gt;" dest="8+">
                <expression>&lt;div class=&quot;image&quot;.*?(http[^&quot;]*)</expression>
            </RegExp>
            <!--<a href="animedb.pl?show=animevotes&amp;aid=5779">7.74</a>-->
            <RegExp input="$$1" output="&lt;rating&gt;\1&lt;/rating&gt;" dest="8+">
                <expression>animevotes&amp;amp;aid=[0-9]*&quot;&gt;([^&lt;]*)</expression>
            </RegExp>
            <!-- <a href="animedb.pl?show=lexicon&amp;vtype=cat&amp;relid=4" title="search for other anime with this category">Action</a>,-->
            <RegExp input="$$1" output="&lt;genre&gt;\1&lt;/genre&gt;" dest="8+">
                <expression repeat="yes">animedb.pl\?show=lexicon&amp;amp;vtype=cat&amp;amp;relid=[0-9]+[^&gt;]*?&gt;([^&lt;]+)&lt;/a</expression>
            </RegExp>
            <RegExp input="$$1" output="&lt;studio&gt;\1&lt;/studio&gt;" dest="8+">
                <expression>Animation Work[^&gt;]*&gt;([^&lt;]+)&lt;/a</expression>
            </RegExp>
            <RegExp input="$$1" output="&lt;premiered&gt;\1&lt;/premiered&gt;" dest="8+">
                <expression>Year&lt;/th&gt;.*?([0-9]{4})(?: till|&lt;/)</expression>
            </RegExp>
            <RegExp input="$$1" output="&lt;plot&gt;\1&lt;/plot&gt;" dest="8+">
                <expression trim="1">class=&quot;desc&quot;&gt;\s*(.*?)\s*&lt;/div</expression>
            </RegExp>
            <!--<table id="characterlist" class="characterlist"> .. </table>-->
            <RegExp input="$$6" output="&lt;actor&gt;&lt;thumb&gt;&lt;/thumb&gt;&lt;name&gt;\2&lt;/name&gt;&lt;role&gt;\1&lt;/role&gt;&lt;/actor&gt;" dest="8+">
                <RegExp input="$$1" output="\1" dest="6">
                    <expression noclean="1">&lt;table id=&quot;characterlist&quot; class=&quot;characterlist&quot;&gt;(.*?)&lt;/table&gt;</expression>
                </RegExp>    
                <expression repeat="yes">animedb\.pl\?show=character&amp;amp;charid=[0-9]+&quot;&gt;([^&lt;]+)&lt;/a.*?animedb\.pl\?show=creator&amp;amp;creatorid=[0-9]+&quot;&gt;([^&lt;]+)&lt;/a</expression>
            </RegExp>
            <RegExp input="$$3" output="&lt;url function=&quot;GetFanart&quot;&gt;http://www.thetvdb.com/index.php?seriesname=$$3&amp;fieldlocation=1&amp;language=7&amp;genre=Animation&amp;year=&amp;order=fanartcount+desc&amp;searching=Search&amp;tab=advancedsearch&lt;/url&gt;" dest="8+">
                <RegExp input="$$1" output="\1" dest="7">
                    <expression trim="1">Main Title&lt;/th&gt;.*?&gt;([^\r\n\(]+)</expression>
                </RegExp>
                <RegExp input="$$7" output="\1+" dest="3">
                    <expression repeat="yes" trim="1">([^\s]+)</expression>
                </RegExp>
                <RegExp input="$$3" output="\1" dest="3">
                    <expression>([^\s]+)\+</expression>
                </RegExp>
                <expression noclean="1"/>
            </RegExp>            
            <!-- <input type="hidden" name="aid" value="5779" /> OR use cache ? -->
            <RegExp input="$$1" output="&lt;episodeguide&gt;&lt;url gzip=&quot;yes&quot;&gt;http://anidb.net/perl-bin/animedb.pl\?show=anime&amp;aid=\1&lt;/url&gt;&lt;/episodeguide&gt;" dest="8+">
                <expression>&lt;input type=&quot;hidden&quot; name=&quot;aid&quot; value=&quot;([0-9]+)&quot; /&gt;</expression>
            </RegExp>
            <expression noclean="1"/>
        </RegExp>
    </GetDetails>
    <GetFanart dest="5">
        <RegExp input="$$1" output="&lt;details&gt;&lt;url gzip=&quot;yes&quot; function=&quot;GetFanartData&quot;&gt;http://www.thetvdb.com/index.php?tab=series&amp;id=\1&amp;lid=\2&lt;/url&gt;&lt;/details&gt;" dest="5">
            <expression>&lt;a href=&quot;/index\.php\?tab=series&amp;amp;id=([0-9]+)&amp;amp;lid=([0-9]+)&quot;.*?[1-9]+&lt;/td&gt;&lt;/tr&gt;</expression>
        </RegExp>
    </GetFanart>
    <GetFanartData dest="5">
        <RegExp input="$$8" output="&lt;details&gt;&lt;fanart&gt;\1&lt;/fanart&gt;&lt;/details&gt;" dest="5">
            <RegExp input="$$1" output="&lt;thumb preview=&quot;http://www.thetvdb.com\1&quot;&gt;http://www.thetvdb.com/\2&lt;/thumb&gt;" dest="8">
                <expression repeat="yes">&lt;img src=&quot;(/banners/_cache/fanart/original/[^&quot;]+)&quot;.*?&lt;a href=&quot;(banners/fanart/original/[^&quot;]+)&quot;</expression>
            </RegExp>    
            <expression noclean="1"/>
        </RegExp>
    </GetFanartData>
    <GetEpisodeList dest="3">

        <RegExp input="$$8" output="&lt;episodeguide&gt;\1&lt;/episodeguide&gt;" dest="3">
            <RegExp input="$$1" output="&lt;episode&gt;&lt;url gzip=&quot;yes&quot;&gt;http://anidb.net/perl-bin/animedb.pl?show=ep&amp;eid=\1&lt;/url&gt;&lt;season&gt;1&lt;/season&gt;&lt;title&gt;\3&lt;/title&gt;&lt;epnum&gt;\2&lt;/epnum&gt;&lt;/episode&gt;" dest="8+">
                <expression repeat="yes">id=&quot;eid_([0-9]+)&quot;.*?eid=[0-9]+&quot;&gt;([0-9]+)&lt;/a.*?label[^&gt;]*&gt;([^&lt;]+)</expression>
            </RegExp>
            <expression noclean="1"></expression>
        </RegExp>
    </GetEpisodeList>
    <GetEpisodeDetails dest="3">
        <RegExp input="$$5" output="&lt;details&gt;\1&lt;/details&gt;" dest="3">
            <RegExp input="$$1" output="&lt;title&gt;\1&lt;/title&gt;" dest="5">
                <expression>Main Title&lt;/th&gt;.*?&gt;([^\r\n\(]+)</expression>
            </RegExp>                        
            <RegExp input="$$1" output="&lt;plot&gt;&lt;/plot&gt;" dest="5+">
                <expression/>
            </RegExp>
            <!--class="rating ep mid">7.74 <span-->    
            <RegExp input="$$1" output="&lt;rating&gt;\1&lt;/rating&gt;" dest="5+">
                <expression>class=&quot;rating[^&gt;]*&gt;([0-9\.]+)</expression>
            </RegExp>    
            <!--
                <th class="field">Air/Release Date</th>
                <td class="value">16.10.2008</td>
            -->
            <RegExp input="$$1" output="&lt;aired&gt;\1&lt;/aired&gt;" dest="5+">
                <expression>Air/Release.*?&gt;([0-9\.]+)&lt;/td</expression>
            </RegExp>                    
            <expression noclean="1"/>
        </RegExp>        
    </GetEpisodeDetails>
</scraper>

The scraper settings dialog doesn't show up with the fanart option (on/off). I don't know why; something's missing somewhere.

Cast & crew info is fetched from the anime main page, so there are no thumbnails for them at the moment, and no crew is parsed yet, so only the cast is there.

I'll add the posters and banners from thetvdb and see what happens.

I've left some of the raw HTML to be parsed above most of the expressions, so you can see how I parse it and modify it if you find something unnecessary. I had to remove big chunks because the post was exceeding the max post size :\

The GetFanart function call code is quite dirty and is only there to substitute spaces with "+" to URL-encode the search text. It's quite ugly; let me know if there's a more natural way to do that task.
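The two-step trick in the draft (append "+" after every run of non-space characters, then capture everything before the final "+") can be mirrored in Python to check the logic offline. This is only a sketch of my reading of those two expressions; the function name `plus_join` is made up, and the `repeat="yes"` behaviour is imitated by joining all matches.

```python
import re


def plus_join(title):
    """Join the words of title with "+", using the same two regex steps."""
    # Step 1: like repeat="yes" with output "\1+", emit each token followed by "+".
    tokens = re.findall(r"([^\s]+)", title.strip())
    with_trailing = "".join(token + "+" for token in tokens)
    # Step 2: the greedy capture backtracks just enough to leave the final "+" out.
    match = re.search(r"([^\s]+)\+", with_trailing)
    return match.group(1) if match else ""


print(plus_join("Michiko to Hatchin"))
```

This only handles spaces; other characters that need escaping in a URL (such as "&") would pass through unchanged.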

I also took a few minutes to make a scraper icon for AniDB:
[Image: f9262e62261318.gif]

Let me know if it works with your anime library. There are probably quite a few bugs here and there, as I only tested it on two anime (Shigurui and Michiko to Hatchin), plus Cowboy Bebop for a complete cast page.
