[WIP] AniDB.net Anime Video Scraper

GrEn Offline
Junior Member
Posts: 18
Joined: Dec 2004
Reputation: 0
Post: #46
That's definitely what it is. I am still using the 9.11 stable release. Haven't had a reason to update till now. Guess I should have paid more attention to eldon's post. I only tried the non-Google version with the uncommented line once (so I could see where anidb.xml was going), because I know AniDB doesn't like its servers being scraped like that. I will be upgrading today or tomorrow to the svn build to see how the anidb.xml works out.

Thanks for the matching info. Once AOM has renamed them I'll try it out and see how it goes.
gokudo Offline
Member
Posts: 77
Joined: Dec 2009
Reputation: 1
Location: Germany
Post: #47
Wow, I'm just happy there is finally some progress on that subject. I'll try using bambi's version with Google; I hope someday we will finally have a fully working method. It's just sad that XBMC and AniDB are so incompatible.

Update: I have been playing around with it. Since I'm only using stable version 9.11, I switched to Google. Some anime it found, some it didn't. When I search manually, get some proposed results and click one of them, I notice strange behaviour: most times it returns 0 content even though I clicked a result. Is this because the Google search is very buggy? But even then, why does it show actual results when it doesn't follow them and grab the content? Also, although I have, say, Banner of the Stars III, it doesn't find it, not even when I enter Seikai no Senki III as the search term, although that's the official AniDB title.
(This post was last modified: 2010-04-14 18:16 by gokudo.)
bambi73 Offline
Senior Member
Posts: 194
Joined: Jan 2010
Reputation: 0
Location: Czech Republic
Post: #48
As I wrote in my "release" post, I don't use the Google search, so it wasn't tested and tweaked much, so yeah, you can call it buggy (originally I didn't even plan to include it).
I checked your case with Banner of the Stars and I think I see the problem. You can check for yourself (these are the search URLs produced by the scraper):

Banner of the Stars III search:
http://www.google.com/search?q=site:anid...I&filter=0

Hayate no Gotoko 2 search:
http://www.google.com/search?q=site:anid...2&filter=0

I think you can see the difference in the results. The current version of the scraper can correctly process the second search, but not the first. I can look at it when I get back from work in the evening, but even if I'm able to add this link to the search results it will look like a2673 or anidb.net/a2673, because there is no anime title, which is not so helpful imho.
The best option for you would be to update XBMC to some newer build that supports the scraper cache and use anidb.xml, but I understand that can be risky for a working "production" system. I'll see what I can do for you.

P.S. I'm now working on a new version of the scraper which introduces some new features that depend on the scraper cache even more, so you should consider an XBMC update.
bambi73 Offline
Senior Member
Posts: 194
Joined: Jan 2010
Reputation: 0
Location: Czech Republic
Post: #49
So I made some changes to the Google search part, but didn't have much time to test them. If you can try it, replace

Code:
<GetSearchResults .....
..
..
</GetSearchResults>

part with following code:

Code:
<GetSearchResults clearbuffers="no" dest="4">
    <RegExp input="$$4" output="&lt;results&gt;\1&lt;/results&gt;" dest="4">
      <RegExp conditional="Google" input="$$4" output="\1" dest="4">
        <RegExp input="$$1" output="&lt;anidbid&gt;\1&lt;/anidbid&gt;&lt;title&gt;\2&lt;/title&gt;" dest="5">
          <expression clear="yes" repeat="yes">(?i)&lt;a href=&quot;http://anidb\.net/perl-bin/animedb\.pl\?show=anime&amp\;aid=(\d+)&quot;[^&gt;]*&gt;(.*?)&lt;/a&gt;</expression>
        </RegExp>
        <RegExp input="$$1" output="&lt;anidbid&gt;\1&lt;/anidbid&gt;&lt;title&gt;\2&lt;/title&gt;" dest="5+">
          <expression repeat="yes">(?i)&lt;a href=&quot;http://anidb\.net/a(\d+)&quot;[^&gt;]*&gt;(.*?)&lt;/a&gt;</expression>
        </RegExp>
        <RegExp input="$$5" output="&lt;entity&gt;&lt;title&gt;Google Search : A\1 ~ \2&lt;/title&gt;&lt;url gzip=&quot;yes&quot; cache=&quot;\1.xml&quot;&gt;http://api.anidb.net:9001/httpapi?request=anime&amp;client=xbmcscrap&amp;clientver=1&amp;protover=1&amp;aid=\1&lt;/url&gt;&lt;id&gt;\1&lt;/id&gt;&lt;/entity&gt;" dest="4">
           <expression clear="yes" repeat="yes">(?i)&lt;anidbid&gt;(\d+)&lt;/anidbid&gt;&lt;title&gt;(?!anidb\.net)([^&lt;]*)&lt;/title&gt;</expression>
        </RegExp>
        <RegExp input="$$5" output="&lt;url function=&quot;GetSearchResultsExt&quot; gzip=&quot;yes&quot; cache=&quot;\1.xml&quot;&gt;http://api.anidb.net:9001/httpapi?request=anime&amp;client=xbmcscrap&amp;clientver=1&amp;protover=1&amp;aid=\1&lt;/url&gt;" dest="4+">
           <expression>(?i)&lt;anidbid&gt;(\d+)&lt;/anidbid&gt;&lt;title&gt;(?=anidb\.net)[^&lt;]*&lt;/title&gt;</expression>
        </RegExp>
        <RegExp input="$$5" output="\1" dest="20">
           <expression clear="yes" noclean="1">(?i)&lt;anidbid&gt;\d+&lt;/anidbid&gt;&lt;title&gt;(?=anidb\.net)[^&lt;]*&lt;/title&gt;((?:&lt;anidbid&gt;\d+&lt;/anidbid&gt;&lt;title&gt;(?=anidb\.net)[^&lt;]*&lt;/title&gt;)*)</expression>
        </RegExp>
        <RegExp input="" output="\1" dest="19">
          <expression/>
        </RegExp>
        <expression noclean="1"/>
      </RegExp>
      <RegExp conditional="!Google" input="$$4" output="\1" dest="4">
        <RegExp input="$$2" output="\1" dest="6">
          <expression>title=(.+)</expression>
        </RegExp>
        <RegExp input="$$6" output="\1[^&lt;]*" dest="6">
          <expression repeat="yes">(?i)([a-z0-9]+)(?:%[a-f0-9]{2})*</expression>
        </RegExp>
        <RegExp input="$$1" output="&lt;entity&gt;&lt;title&gt;Anidb Search : \2&lt;/title&gt;&lt;url gzip=&quot;yes&quot; cache=&quot;\1.xml&quot;&gt;http://api.anidb.net:9001/httpapi?request=anime&amp;client=xbmcscrap&amp;clientver=1&amp;protover=1&amp;aid=\1&lt;/url&gt;&lt;id&gt;\1&lt;/id&gt;&lt;/entity&gt;" dest="4">
          <expression repeat="yes">&lt;anime aid=&quot;(\d+)&quot;&gt;(?:[^&lt;]+&lt;title[^&lt;]+&lt;/title&gt;){0,}[^&lt;]+&lt;title type=&quot;main&quot;[^&gt;]*&gt;([^&lt;]*$$6[^&lt;]*)&lt;/title&gt;</expression>
        </RegExp>
        <expression noclean="1"/>
      </RegExp>
      <expression noclean="1"/>
    </RegExp>
  </GetSearchResults>

  <GetSearchResultsExt clearbuffers="no" dest="4">
    <RegExp input="$$4" output="&lt;results&gt;\1&lt;/results&gt;" dest="4">
      <RegExp input="$$1" output="&lt;entity&gt;&lt;title&gt;Google Search : A\1 ~ \2&lt;/title&gt;&lt;url gzip=&quot;yes&quot; cache=&quot;\1.xml&quot;&gt;http://api.anidb.net:9001/httpapi?request=anime&amp;client=xbmcscrap&amp;clientver=1&amp;protover=1&amp;aid=\1&lt;/url&gt;&lt;id&gt;\1&lt;/id&gt;&lt;/entity&gt;" dest="19+">
        <expression trim="2">&lt;anime.*?id=&quot;(\d+)&quot;.*?&lt;title.*?type=&quot;main&quot;&gt;([^&lt;]+)</expression>
      </RegExp>
      <RegExp input="$$19" output="\1" dest="4">
        <expression noclean="1"/>
      </RegExp>
      <RegExp input="$$20" output="&lt;url function=&quot;GetSearchResultsExt&quot; gzip=&quot;yes&quot; cache=&quot;\1.xml&quot;&gt;http://api.anidb.net:9001/httpapi?request=anime&amp;client=xbmcscrap&amp;clientver=1&amp;protover=1&amp;aid=\1&lt;/url&gt;" dest="4">
         <expression>(?i)&lt;anidbid&gt;(\d+)&lt;/anidbid&gt;&lt;title&gt;[^&lt;]*&lt;/title&gt;</expression>
      </RegExp>
      <RegExp input="$$20" output="\1" dest="20">
         <expression clear="yes" noclean="1">(?i)&lt;anidbid&gt;\d+&lt;/anidbid&gt;&lt;title&gt;[^&lt;]*&lt;/title&gt;((?:&lt;anidbid&gt;\d+&lt;/anidbid&gt;&lt;title&gt;[^&lt;]*&lt;/title&gt;)*)</expression>
      </RegExp>
      <expression noclean="1"/>
    </RegExp>
  </GetSearchResultsExt>

And please report how it works.
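
For anyone who wants to poke at this outside XBMC: the patched GetSearchResults above looks for two shapes of AniDB link in the Google result page (the long animedb.pl form and the short anidb.net/aNNNN form) and turns each id into an HTTP API request. Below is a rough Python sketch of just that step. The function names are made up for illustration, and this is only my reading of the expressions in the posted XML, not a replacement for the scraper.

```python
import re

# Both link shapes matched by the expressions in the patched <GetSearchResults>.
# "&amp;" is accepted as well as "&" because raw result HTML escapes the ampersand.
LONG_LINK = re.compile(
    r"http://anidb\.net/perl-bin/animedb\.pl\?show=anime&(?:amp;)?aid=(\d+)", re.I)
SHORT_LINK = re.compile(r"http://anidb\.net/a(\d+)", re.I)


def extract_aids(html):
    """Return anime ids found in result links, long form first, without duplicates."""
    aids = []
    for pattern in (LONG_LINK, SHORT_LINK):
        for aid in pattern.findall(html):
            if aid not in aids:
                aids.append(aid)
    return aids


def api_url(aid):
    """Build the HTTP API request URL the posted scraper emits for one anime id."""
    return ("http://api.anidb.net:9001/httpapi?request=anime"
            "&client=xbmcscrap&clientver=1&protover=1&aid=" + aid)


if __name__ == "__main__":
    sample = '<a href="http://anidb.net/a2673">anidb.net/a2673</a>'
    for aid in extract_aids(sample):
        print(api_url(aid))
```

A short link like the a2673 one carries no anime title, which is why the scraper needs the extra GetSearchResultsExt round trip to the API to fetch the main title before it can show a useful result.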
gokudo Offline
Member
Posts: 77
Joined: Dec 2009
Reputation: 1
Location: Germany
Post: #50
Sorry for the late reply, have been away over weekend. I will do that ASAP!

Update: OK, I was able to test it now. The problem that it returns no content after clicking a result is gone. The problem that Google simply doesn't *know* certain shows on AniDB, even when I enter the official AniDB title manually, is still there. But as I understand it, that's nothing you could fix; Google just hasn't indexed those pages or something? Anyway, thanks for your work. I will get back to your scraper once I've updated XBMC so that I don't have to rely on the Google search anymore.
(This post was last modified: 2010-04-21 14:57 by gokudo.)
bambi73 Offline
Senior Member
Posts: 194
Joined: Jan 2010
Reputation: 0
Location: Czech Republic
Post: #51
Sorry, I missed your reply because you only edited your post.
You can try the Google search yourself in your browser, for example (simply replace the spaces in the name with +):

Code:
http://www.google.com/search?q=site:anidb.net+Lucky+Star&filter=0

If it returns nothing then you are out of luck (and from my point of view you are using some obscure names, because Google does a decent job of indexing).
If it returns something and you still get nothing in XBMC, then the best thing is to post the problematic titles here so I can check them directly.
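
If you have a whole list of titles to check, the URL above can be built with a few lines of Python instead of typing each one by hand. This is only a convenience sketch; the function name is made up, and it assumes the scraper's query looks exactly like the Lucky Star example.

```python
from urllib.parse import quote_plus


def google_search_url(title):
    """Build the site-restricted Google query shown above for one anime title."""
    # quote_plus turns spaces into "+" and escapes other special characters.
    return ("http://www.google.com/search?q=site:anidb.net+"
            + quote_plus(title) + "&filter=0")


if __name__ == "__main__":
    for title in ["Lucky Star", "Banner of the Stars III"]:
        print(google_search_url(title))
```

Paste the printed URLs into a browser; if Google shows no AniDB hit for a title there, the scraper cannot find it either.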
bambi73 Offline
Senior Member
Posts: 194
Joined: Jan 2010
Reputation: 0
Location: Czech Republic
Post: #52
There is a new version of the AniDB scraper, now with major parts redone, some new features, optimizations, etc.

Note: Last time I got too deep into details and produced an unreadable wall of text, so this time I'll try to describe only the basics. If you are interested in details, ask me directly in this thread.

New features/major changes:

TheTVDB lookup changes:
There is a major change in the philosophy of how the TheTVDB lookup works. The previous version of the scraper always tried to match AniDB and TheTVDB anime and episodes somehow; sometimes it worked, sometimes not, but it always required a post-scrape check by the user (only if they care, of course), which is a bit inconvenient for an HTPC. In this new version I reversed the process, so now it's more like "check first, don't bother later". There is now anime-list.xml, which works as a mapping list between AniDB and TheTVDB. You can check and fill in this mapping first on your desktop computer, so later you don't need to worry about scraper results. What it contains:
- anidb anime id to thetvdb show id mapping
- default thetvdb season - for 1:1 season relation between anidb and thetvdb
- explicit episode-to-episode, season-to-season mapping - for 1:N season relation between anidb and thetvdb or when numbering on both sites doesn't match
- explicit specials mapping before specific episode
- supplemental info for anime, all episodes and/or single episode (in case anidb info isn't enough or is missing)

If your anime id isn't found in anime-list.xml, the scraper will try to find an appropriate TheTVDB show for the fanart lookup in the same way as the previous version (recursive lookup over anime titles and then prequels), but the extra episode details lookup is skipped completely, because without a correct mapping it's more guessing than anything else.
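
To make the lookup order concrete, here is a small Python model of it. Everything in it is hypothetical: the Mapping class, the field names and the ids are invented for illustration and say nothing about the real layout of anime-list.xml. It only mirrors the rules described above: an explicit episode mapping wins, then the default season, and an unmapped anime gets no episode details at all.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Mapping:
    """Hypothetical in-memory form of one anime-list.xml entry."""
    tvdb_id: str
    default_season: int = 1
    # Explicit overrides: AniDB episode number -> (TheTVDB season, TheTVDB episode).
    episodes: Dict[int, Tuple[int, int]] = field(default_factory=dict)


def resolve_episode(mappings: Dict[str, Mapping], aid: str,
                    episode: int) -> Optional[Tuple[str, int, int]]:
    """Return (tvdb_id, season, episode), or None when the anime is unmapped."""
    entry = mappings.get(aid)
    if entry is None:
        # Unmapped anime: only the fanart lookup is attempted, no episode details.
        return None
    if episode in entry.episodes:
        return (entry.tvdb_id,) + entry.episodes[episode]
    # 1:1 case: same episode number, default season.
    return (entry.tvdb_id, entry.default_season, episode)


if __name__ == "__main__":
    # Placeholder ids, not real AniDB/TheTVDB values.
    mappings = {"aid-1": Mapping("tvdb-1", default_season=2, episodes={13: (3, 1)})}
    print(resolve_episode(mappings, "aid-1", 5))
    print(resolve_episode(mappings, "aid-1", 13))
    print(resolve_episode(mappings, "aid-2", 1))
```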

Specials support:
- the scraper is now able to process AniDB specials. XBMC picks up everything in season 0 as specials; unfortunately there is no way to force season 0, so you must name your specials S00Exx (or anything that fits your tvshow matching regexp). If you are able to compile XBMC yourself, you can check the posted diff.
- specials can be placed either at the start or at the end of the episode list (switchable in settings)
- a special can be placed before a specific episode (useful, for example, for DVD-only episodes/specials which belong somewhere between regular episodes, often marked as e.g. episode 8.5 on AniDB)
-- this placement can be set explicitly in anime-list.xml (there is no such info on AniDB)
-- if the anime-list.xml placement isn't used, the TheTVDB one is used (if it exists)
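
The placement rules above can be sketched in a few lines of Python. This is a toy model under my own assumptions, not the scraper's logic: the function name, the ("E"/"S", number) tuples and the `before` dictionary are all invented for illustration.

```python
def order_episodes(regular, specials, before=None, specials_first=False):
    """Interleave specials with regular episodes.

    regular        -- regular episode numbers in airing order
    specials       -- special numbers
    before         -- optional dict: special number -> regular episode it precedes
    specials_first -- where specials without a placement go (start or end)
    """
    before = before or {}
    known = set(regular)
    # A placement counts only if it points at an episode that actually exists.
    pinned = {s: ep for s, ep in before.items() if s in specials and ep in known}
    floating = [("S", s) for s in specials if s not in pinned]

    ordered = list(floating) if specials_first else []
    for ep in regular:
        ordered.extend(("S", s) for s in specials if pinned.get(s) == ep)
        ordered.append(("E", ep))
    if not specials_first:
        ordered.extend(floating)
    return ordered


if __name__ == "__main__":
    # Special 1 is an "episode 8.5": it belongs between episodes 8 and 9.
    print(order_episodes([8, 9, 10], [1, 2], before={1: 9}))
```

One way to model "anime-list.xml wins, TheTVDB is the fallback" would be to merge two such dictionaries before the call, with the anime-list.xml entries overwriting the TheTVDB ones.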

Other changes:
- some improvements in anime search
-- the Google one should work better now, but as I stated in the previous release post, I don't use it, so it isn't thoroughly tested (post problematic titles if you want some help)
-- the anidb.xml one is now extended to English official and x-jat synonym titles too
-- added the anime id (that Axxxx) to search results to distinguish duplicates with different names
- some additional filtering for genres (only 10kB of regular expressions)
- some additional filtering for plot summary (another 10kB)
- the rest of the changes are not visible to the user


anime-list.xml:
- this file is currently hosted on Google Sites and contains 767 unique (more or less checked) anime mappings
- the XML structure should be understandable (at least for people who have an idea what XML looks like); if anything is unclear, ask me here
- you can edit the file in your cache if you want to add/change something (you need an XBMC which supports cache persistency: not 9.11, but some newer 2010 builds)
- a better solution (sharing it with others) is to post requested changes in this thread and I will add them to the hosted file (post only changes, I won't compare whole files)

anidb.xml:
- this file for the AniDB search (not the Google one) is experimentally hosted on Google Sites for people without XBMC cache persistency
- but you should be aware that this file is ~2.5MB in size and your XBMC will download it for every single scrape, so it's imho more for testing than real usage, but suit yourself

Forced season XBMC patch/diff:
- as you most likely know, XBMC forces season 1 for all files without a season in the name (like Angel Beats E01 - abc.mkv), but unfortunately there is no way to force a different season for other files. So in our case you must rename your specials to look like Angel Beats S00E01 - Special 1.mkv to force season 0 for specials.
- personally I name specials like Angel Beats S01 - Special 1.mkv, so I made a small modification to the XBMC source code which allows me to use the following configuration in advancedsettings.xml:

Code:
    <tvshowmatching>
        <regexp forcedSeason="1">(?i)[/\\].*? ()E(\d{2,3})([^/\\]*)</regexp>
        <regexp forcedSeason="0">(?i)[/\\].*? ()S(\d{2,3})([^/\\]*)</regexp>
    </tvshowmatching>


- if you are able to compile XBMC yourself, you can download the diff (against the last stable svn revision) below. But be warned that the last time I did anything in C/C++ was 15 years ago ....
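
If you want to see what the two forcedSeason rules would pick out of a path before patching anything, the expressions can be tried in Python, whose regex syntax is close enough for these two patterns. This is only a sketch: forcedSeason exists only with the patch above, and treating group 1 as the season and group 2 as the episode is my reading of the empty () in the patterns, not something confirmed against the XBMC source.

```python
import re

# (forced season, pattern) pairs copied from the <tvshowmatching> block above.
RULES = [
    (1, re.compile(r"(?i)[/\\].*? ()E(\d{2,3})([^/\\]*)")),
    (0, re.compile(r"(?i)[/\\].*? ()S(\d{2,3})([^/\\]*)")),
]


def match_episode(path):
    """Return (season, episode) from the first rule that matches, else None."""
    for forced_season, pattern in RULES:
        m = pattern.search(path)
        if m:
            # Group 1 is empty in both rules, so the forced season is used.
            season = int(m.group(1)) if m.group(1) else forced_season
            return season, int(m.group(2))
    return None


if __name__ == "__main__":
    print(match_episode("/anime/Angel Beats/Angel Beats E01 - abc.mkv"))
    print(match_episode("/anime/Angel Beats/Angel Beats S01 - Special 1.mkv"))
```

Note that these two rules alone do not cover S00E01-style names; those are expected to be caught by the regular season/episode expressions.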


Scraper (use Download link in upper right corner):
http://pastebin.com/MxPu21eg

XBMC diff/patch (use Download link in upper right corner):
http://pastebin.com/LHQ2jG7E


Doh, tldr; wall of text again :o
sa10 Offline
Junior Member
Posts: 6
Joined: May 2010
Reputation: 0
Post: #53
Thank you bambi73!

This scraper worked nicely and downloaded the right info for many anime shows. However, it never seems to find the actual episodes. It always shows 0 out of 0 watched. My files have the usual naming convention, e.g. Angel Beats/[Mazui]_Angel_Beats_-_01v2_[B1437C35].mkv. Do I have to rename all of my files?

Also the scraper has trouble when the directory name is not the exact name of the show. Whenever I had something like codec information or underscores in the path, I had to add the show manually. Maybe you could change the scraper so that it interprets underscores as blanks, and also have it ignore phrases in brackets like [gg] or (h264)?
gokudo Offline
Member
Posts: 77
Joined: Dec 2009
Reputation: 1
Location: Germany
Post: #54
sa10 Wrote:Thank you bambi73!

This scraper worked nicely and downloaded the right info for many anime shows. However, it never seems to find the actual episodes. It always shows 0 out of 0 watched. My files have the usual naming convention, e.g. Angel Beats/[Mazui]_Angel_Beats_-_01v2_[B1437C35].mkv. Do I have to rename all of my files?

I think I can answer your first question. Yes, you have to. XBMC only recognizes episodes which have the format SXXEXX somewhere in the filename. E.g. your episode should be named like Angel Beats S01E01.
(This post was last modified: 2010-05-02 15:07 by gokudo.)
bambi73 Offline
Senior Member
Posts: 194
Joined: Jan 2010
Reputation: 0
Location: Czech Republic
Post: #55
sa10 Wrote:Thank you bambi73!

This scraper worked nicely and downloaded the right info for many anime shows. However, it never seems to find the actual episodes. It always shows 0 out of 0 watched. My files have the usual naming convention, e.g. Angel Beats/[Mazui]_Angel_Beats_-_01v2_[B1437C35].mkv. Do I have to rename all of my files?

Also the scraper has trouble when the directory name is not the exact name of the show. Whenever I had something like codec information or underscores in the path, I had to add the show manually. Maybe you could change the scraper so that it interprets underscores as blanks, and also have it ignore phrases in brackets like [gg] or (h264)?
Hello,

there is an XBMC advanced setting which helps with cleaning directory names. By default it should clean everything in (...) or [...] (I never tried it), but be warned that this feature works terribly (at least up to the last stable pre-merge svn revision) because it removes EVERYTHING from the first match onward, so if you have, for example, a directory named [gg] Canaan it will return an empty string. If you have the brackets at the end of directory names then it should work fine; if not, please post some examples here where it works badly.
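
The difference between "cut everything from the first match" and "remove only the tags" is easy to show in Python. Both functions below are illustrations with made-up names: the first imitates the behaviour described above, the second is what sa10 asked for (underscores as blanks, bracketed phrases ignored). Neither is XBMC's actual code.

```python
import re

# A [..] or (..) tag such as [gg] or (h264).
TAG = re.compile(r"\[[^\]]*\]|\([^)]*\)")


def clean_truncating(name):
    """Imitate the described behaviour: drop everything from the first tag on."""
    m = TAG.search(name)
    return (name[:m.start()] if m else name).strip()


def clean_tags_only(name):
    """Remove only the tags themselves and treat underscores as blanks."""
    name = TAG.sub(" ", name.replace("_", " "))
    return " ".join(name.split())


if __name__ == "__main__":
    for name in ["[gg] Canaan", "Canaan [gg]", "[gg]_Canaan_(h264)"]:
        print(repr(clean_truncating(name)), repr(clean_tags_only(name)))
```

With the tag at the end both give Canaan; with the tag in front only the second one leaves anything to search for.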

About episode file names: what gokudo wrote above is not 100% correct. There is another XBMC advanced setting which allows fine-tuning of episode number matching, even without a season number (season 1 is forced in that case, which is what we need). The scraper has nothing to do with it, so I can't help you much.
But I must agree with gokudo that you should rename your files, because naming varies between groups and it'll be a pain in the ass to configure these matching regexps correctly. If I may suggest, try an AniDB client; personally I use WebAOM, started as a Web Start application, so you only need to install Java on your computer.
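
For a one-off rename without an AniDB client, something like the following Python sketch could turn the release-style name from sa10's post into the SxxExx form. It is deliberately naive: the pattern only knows the "[Group]_Title_-_NNvN_[CRC].ext" shape, the function name is invented, and an AniDB client that identifies files by hash will be far more reliable.

```python
import os
import re

# Optional [Group] tag, title, " - " or "_-_" separator, episode number,
# optional version suffix (v2), then a separator, bracket or the extension dot.
RELEASE = re.compile(
    r"^(?:\[[^\]]*\])?[_ ]*(?P<title>.+?)[_ ]+-[_ ]+(?P<ep>\d{1,3})(?:v\d+)?(?=[_ .\[(])")


def xbmc_name(filename, season=1):
    """Return 'Title SxxExx.ext' for a release-style file name, or None."""
    m = RELEASE.match(filename)
    if not m:
        return None
    title = " ".join(m.group("title").replace("_", " ").split())
    ext = os.path.splitext(filename)[1]
    return "%s S%02dE%02d%s" % (title, season, int(m.group("ep")), ext)


if __name__ == "__main__":
    print(xbmc_name("[Mazui]_Angel_Beats_-_01v2_[B1437C35].mkv"))
```

Run it on a copy or in dry-run fashion first (print the names, rename later), since titles that themselves contain " - " will be split in the wrong place.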
gokudo Offline
Member
Posts: 77
Joined: Dec 2009
Reputation: 1
Location: Germany
Post: #56
bambi, I have to say thank you very, very much. I just tested your new version of the scraper together with your provided .xml files and the AniDB search instead of Google. For me that's the perfect choice, if it is possible to maintain. I can only use 9.11 stable on my HTPC, because the official PPA build has shutdown/reboot broken and a build from the newest svn doesn't even boot. Reloading 2.5 MB each time doesn't really matter on my side; I only hope it doesn't get me banned (on anidb.net?), as mentioned somewhere earlier in this thread.
(This post was last modified: 2010-05-02 16:53 by gokudo.)
bambi73 Offline
Senior Member
Posts: 194
Joined: Jan 2010
Reputation: 0
Location: Czech Republic
Post: #57
I mirrored the anidb.xml file on Google Sites, so you needn't be afraid of a ban on anidb.net. In the worst case my Google site will be suspended/banned, but I hope they don't care so much about bandwidth :o.
Of course I plan to maintain anime-list.xml, because I use it personally too. I'll add new titles whenever an anime season ends. If you want to add entries for older anime to that XML, please post them in this thread. I also plan to update anidb.xml occasionally (a few times per year), but it doesn't need to be so often because it contains only a list of titles and corresponding ids. The guys at anidb.net add new titles long before a show airs, so long before you actually need them.
Zarbis Offline
Junior Member
Posts: 17
Joined: Nov 2009
Reputation: 0
Post: #58
First of all I want to thank you, bambi, for this work. And I have a question: is this scraper supposed to be installed the old way, or does it support (or will it support) the new XBMC plugin system?
bambi73 Offline
Senior Member
Posts: 194
Joined: Jan 2010
Reputation: 0
Location: Czech Republic
Post: #59
I tried to compile trunk some time ago but it was a big mess, so I reverted to a safe pre-merge revision; so far I have never had XBMC v10.05 running. Of course I plan to add everything required for the new plugin/addon system when it comes out in some stable form, but right now I have no idea what it needs. It looks like you have some knowledge of the plugin/addon system; can you point me to some source of information?
Zarbis Offline
Junior Member
Posts: 17
Joined: Nov 2009
Reputation: 0
Post: #60
bambi73 Wrote:Looks like you have some knowledge about plugin/addon system, can you point me to some source of information?

Actually nope, I'm nothing more than a latest-svn-build user. Just needed better ASS subs support (e.g. unsorted subtitles).

Edit: Actually I got your script working by doing this:
1) Placed anidb.net.xml in the "addons/net.anidb.scraper" folder
2) Copy-pasted description.xml from another scraper into the same folder and edited it a bit:

Code:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<addoninfo>
  <id>net.anidb.scraper</id>
  <type>scraper</type>
  <title>AniDB.net</title>
  <library>anidb.net.xml</library>
  <version>1.0.0</version>
  <platforms>
    <platform>all</platform>
  </platforms>
  <minversion>
    <xbmc>20000</xbmc>
  </minversion>
  <summary>AniDB.net Scraper Library</summary>
  <description>some description</description>
  <author>bambi73</author>
  <supportedcontent>
    <content>movies</content>
  </supportedcontent>
</addoninfo>

That's all, it actually works. I will give some feedback about usability and search result quality soon.
http://dl.dropbox.com/u/459039/Screens/s...hot001.png
http://dl.dropbox.com/u/459039/Screens/s...hot002.png
http://dl.dropbox.com/u/459039/Screens/s...hot003.png
(This post was last modified: 2010-05-03 13:18 by Zarbis.)