[RELEASE] [MOD] AniDB.net scrapers for TV shows and Movies

scudlee Offline
Team-Kodi Member
Posts: 839
Joined: Jul 2011
Reputation: 51
Post: #1
AniDB.net Scraper Mods for Anime TV shows and Movies

So here finally are my mods of the AniDB.net scraper.

Installation
You can download and install both of them through my new repo:
repository.scudlee
Note: If you only downloaded 2.0.0rc1, you need to download the repo to get the latest version.

AniDB.net Scraper Mod for Anime TV shows
Current version: 2.1.0
There are several major improvements to the TV show scraper:

New anime-list.xml
The scraper now uses my updated anime-list.xml by default. This list is significantly more complete than the old list, and is being actively maintained. (Help very much welcome!)

Movie fanart support
Using the new anime-list.xml, the scraper can now retrieve movie fanart directly from themoviedb.org. Movies linked to TV shows will fetch both movie and TV fanart, prioritizing movie.
(Of course, this is intended more for the movie scraper, but if you want to stick with using one scraper, it also works here.)

Search improvements
Several tweaks to the search:
Shows distinguished by a year in brackets (e.g. Bakuman. (2012)) now match correctly.
Shows distinguished by a final apostrophe (e.g. Gintama' and Dog Days') now match correctly.
Some punctuation marks were preventing titles from being scraped correctly in Frodo builds; this has been fixed.
Google search works again, should you want to use it.

Support for OPs/EDs, trailers, etc.
OPs/EDs, Trailers, Parodies, Other are now treated as high-valued specials:
OPs/EDs "C": Season 0 Episodes 101-199 (e.g. S00E101)
Trailers "T": Season 0 Episodes 201-299
Parodies "P": Season 0 Episodes 301-399
Other "O": Season 0 Episodes 401-499
These "specials" will always be mapped to the end of the episode list, independent of your settings. I can change this if need be.
Note: This is only fully supported in Frodo builds. Eden will be slightly unpredictable.
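The numbering above can be sketched in a few lines of Python. This is only an illustration of the mapping, not the scraper's actual code: the function name special_to_season0 is made up, and the input format (a type letter followed by a number, such as "C1") is an assumption about how the specials are labelled.

```python
import re

# Offsets taken from the list above: each special type gets its own block
# of episode numbers inside Season 0.
SPECIAL_OFFSETS = {"C": 100, "T": 200, "P": 300, "O": 400}

def special_to_season0(epno):
    """Map a label such as 'C1' to (season, episode), e.g. (0, 101).

    The 'letter + number' input format is an assumption for illustration.
    """
    m = re.fullmatch(r"([CTPO])(\d+)", epno.strip().upper())
    if not m:
        raise ValueError("not a C/T/P/O special: %r" % epno)
    kind, number = m.group(1), int(m.group(2))
    if not 1 <= number <= 99:
        raise ValueError("only 99 slots per special type: %r" % epno)
    return 0, SPECIAL_OFFSETS[kind] + number

season, episode = special_to_season0("C1")
print("S%02dE%d" % (season, episode))  # S00E101
```

So the first OP/ED lands on S00E101, as in the example above, and the last available "Other" slot (O99) would land on S00E499.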

Cast handling changes
Cast without a picture in AniDB are now included.
(Note: None of the following currently works because of changes in the xml returned by AniDB (episode data is no longer provided for the cast). The old scraper no longer works either, so both scrapers will appear to behave identically in this regard. I've included the change because if there's a way to get it working, they will be different.)
In XBMC, actors can be added to either TV shows or individual episodes. In the tvdb scraper, main cast are only added to the TV show, and guest stars to the episodes they appear in. In AniDB, the cast is split into three: Main, Secondary, and "appears in" (i.e. guest stars). The old scraper would ignore the "appears in" cast completely, and add the main and secondary cast to both the TV show and the episodes they appear in (with an option to also ignore secondary cast). The Mod scraper follows the tvdb model: Main only to the show, "appears in" only to episodes, with an option to group Secondary with Main or "appears in".
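To make the intended split concrete, here is a rough Python sketch of the model described above. It is not the scraper's code (the scraper is not written in Python), and the function name assign_cast, the role strings and the character names are all placeholders for illustration.

```python
def assign_cast(cast, secondary_as="main"):
    """Split (name, role) pairs into show-level and episode-level cast.

    Roles are 'main', 'secondary' or 'appears in'. secondary_as decides
    which of the other two groups the secondary cast is folded into.
    """
    if secondary_as not in ("main", "appears in"):
        raise ValueError("secondary_as must be 'main' or 'appears in'")
    show_cast, episode_cast = [], []
    for name, role in cast:
        if role == "secondary":
            role = secondary_as          # the user option from the settings
        if role == "main":
            show_cast.append(name)       # Main: TV show only
        elif role == "appears in":
            episode_cast.append(name)    # guest stars: episodes only
        else:
            raise ValueError("unknown role: %r" % role)
    return show_cast, episode_cast

# Placeholder names, purely for illustration.
cast = [("Lead", "main"), ("Sidekick", "secondary"), ("Guest", "appears in")]
print(assign_cast(cast, secondary_as="main"))        # (['Lead', 'Sidekick'], ['Guest'])
print(assign_cast(cast, secondary_as="appears in"))  # (['Lead'], ['Sidekick', 'Guest'])
```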

1.1.0 Features
See Post#10

2.0.0 Features
See Post#270

2.1.0 Features
See Post#292

Planned improvements include: Support for the new thumb aspect feature (added 1.1.0), bugfix for empty plots returning a tag description (I have a fix but it's ugly) (added 1.1.0),... Suggestions welcome.


AniDB.net Scraper Mod for Anime Movies
Current version: 2.1.0
This is a rather crude first pass at a movie scraper. It basically treats any title in AniDB as a movie, ignoring any episode details, so it's not suitable for use with movie series like Break Blade, Kara no Kyoukai, or Mardock Scramble.

There was one slight problem: the movie scraper is essentially the TV scraper with the episode parts chopped off, and without them it's a little fast. Too fast for AniDB's liking: you'd get banned after only a few titles. To combat this, I've added a delay loop to the scraper, in which the scraper idly runs through some increasingly (and then decreasingly) time-consuming regexps. The amount of work it does is controlled by a delay parameter in the settings. The default value (125) produced, for me, a let's say sedate pace. You can decrease it if you want, but if you get banned, you're on your own! That said, I would be interested in finding a reasonable sweet spot, if people want to report their findings with lower values.
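For anyone fetching from AniDB with their own scripts rather than the scraper, the same idea (spacing out requests) can be written with an explicit sleep. To be clear, this is a different technique from the regexp delay loop described above, and the Throttle class and the 2-second interval below are hypothetical, not values taken from the scraper or from AniDB.

```python
import time

class Throttle:
    """Enforce a minimum interval between consecutive calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock      # injectable so the class can be tested
        self._sleep = sleep
        self._last = None        # time of the previous call, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)   # pause until the interval has passed
                now = self._clock()
        self._last = now

# Hypothetical usage: call wait() before each request.
throttle = Throttle(min_interval=2.0)   # 2.0 seconds is an assumed value
for title in ["first title", "second title"]:
    throttle.wait()
    # ... fetch the details for `title` here ...
```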

1.1.0 Features
See Post#10

2.0.0 Features
See Post#270

2.1.0 Features
See Post#292

Planned improvements include: Support for scraping more information (Trailers/certification added 2.0.0)/posters from themoviedb.org (added 1.1.0) (currently only the fanart is), support for movie sets (either scraped from themoviedb.org (added 1.1.0) or by adding them to anime-list.xml, or both (added 2.0.0)),... Suggestions welcome.
Changelogs:

metadata.tvshows.anidb.net.mod
2.1.0
Added: Original Work and Location tagging
Added: Ability to use theTVDB.com Absolute order from mapping list

2.0.1
Version bump to update over 2.0.0rc1

2.0.0
Changed: GetDetails broken into separate shared functions
Changed: Genre handling rewritten
Changed: Simplified artwork handling
Changed: Google search rewritten
Added: Delay parameter to slow scraping
Added: Season artwork support
Fixed: Episode titles with an apostrophe followed by a space would lose the space

1.1.1:
Changed: Dropped the www from thetvdb URLs

1.1.0:
Changed: Simplified genre count code
Changed: Prioritised defaulttvdbseason wide banners
Changed: Now uses displayafterseason/displaybeforeseason
Added: Support for tagging
Added: Movie posters from themoviedb.org
Added: Support for thumb aspects
Fixed: Empty plot description would result in a category description used instead

1.0.0:
(Changes from official anidb.net scraper 2.0.0)
Changed: Default locations of anime lists
Changed: Handling of Main/Secondary/"Appears in" cast altered to better match tvdb scraper
Added: Support for treating OPs/EDs, trailers, etc. as specials (requires Frodo)
Added: Support for retrieving fanart from themoviedb.org
Added: Support for shows distinguished by a year in brackets
Fixed: Cast without pictures were being ignored
Fixed: Shows distinguished by a final apostrophe now match correctly (e.g. Gintama')
Fixed: Some punctuation marks are no longer being percent-encoded
Fixed: Google search results parsed correctly again

metadata.movies.anidb.net.mod
2.1.0
Added: Original Work and Location tagging

2.0.1
Version bump to update over 2.0.0rc1

2.0.0
Changed: GetDetails broken into separate shared functions
Changed: Genre handling rewritten
Changed: Simplified artwork handling
Changed: Google search rewritten
Added: Movie trailers from themoviedb.org
Added: Certification from themoviedb.org
Added: Movie sets from anime-movieset-list.xml

1.1.1:
Changed: Dropped the www from thetvdb URLs

1.1.0:
Changed: Simplified genre count code
Changed: Switched alternate id to imdb/tmdb
Added: Support for movie sets from themoviedb.org
Added: Support for tagging
Added: Movie posters from themoviedb.org
Added: Support for thumb aspects
Fixed: Empty plot description would result in a category description used instead
Removed: No wide banners for movies

1.0.0:
Initial Commit
To complement the new scrapers, I also adapted and expanded the various AniDB Client TagSystem rules that were posted in the previous thread.

These rules will separate movies and one-shot OVAs into a movie folder, to be scraped by the movie scraper. They will also renumber OPs/EDs, trailers, etc. to be picked up by the TV scraper.

AniDB Client TagSystem Rules
Version 1.1.0
(Full version) http://pastebin.com/raw.php?i=MkswMaME
(Unformatted) http://pastebin.com/raw.php?i=9dgzsZKB
The Full version is too large to save in the AniDB client, but contains explanations and alternatives for each step, so only use it to modify the Unformatted version to your preferences.

To use these rules you'll need an account on AniDB. In the AniDB Client, go to the options, press "Go Advanced", and enable Filemoving and Filerenaming, using the Tagging System for both. Then edit the Tagging System, deleting the original content and pasting in the contents of the Unformatted link above.

You'll first need to edit the BaseTVShowPath and BaseMoviePath variables to point to their respective folders, and you can also edit the FileInfo variable to however you like it.

Default file names will look like:
Code:
Z:\Anime\TV Shows\Hyouka\Hyouka - 01v2 - The Esteemed Classics Club Has Been Restored [Mazui][HDTV][1280x720][h264][F2BB20F65].mkv
Z:\Anime\TV Shows\Akazukin Chacha\Akazukin Chacha - S101 - Opening 1 [GCP][640x480][h264][7AA0FBDB].avi
Z:\Anime\Movies\Gekijouban Macross F Itsuwari no Utahime\Gekijouban Macross F Itsuwari no Utahime [Doki][BluRay][1920x1080][h264][4C96D537].mkv

In order to use the default episode numbering, I recommend the following tvshowmatching rules in your advancedsettings.xml:
Code:
<advancedsettings>
  <tvshowmatching action="prepend">
    <regexp> - ()(\d+)((?:-\d+)*)(?:v\d+)? - [^\\/]*$</regexp>
    <regexp defaultseason="0"> - ()s(\d+)((?:-\d+)*)(?:v\d+)? - [^\\/]*$</regexp>
  </tvshowmatching>
</advancedsettings>
The defaultseason regexp will only work in Frodo builds. If you're still on Eden, you'll need to remove it and also un-comment one of the alternate "Special" variables in the rules.
The " - " in the regexps is intended to match the Separator variable in the rules. If you're running into conflicts on non-anime files, you can change these to something unique.
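As a quick sanity check, the two regexps can be tried against the sample file names from this post. This is only a Python sketch, not how XBMC itself evaluates them: it assumes the match is case-insensitive (so that "s" matches the "S" in "S101") and that season 1 is used when the first rule's empty season group captures nothing. The function name match_episode is made up for illustration.

```python
import re

# The two regexps from the advancedsettings.xml snippet, each paired with the
# season used when the (empty) first group captures nothing.
# Season 1 for the first rule is an assumption; 0 mirrors defaultseason="0".
RULES = [
    (re.compile(r" - ()(\d+)((?:-\d+)*)(?:v\d+)? - [^\\/]*$", re.IGNORECASE), 1),
    (re.compile(r" - ()s(\d+)((?:-\d+)*)(?:v\d+)? - [^\\/]*$", re.IGNORECASE), 0),
]

def match_episode(path):
    """Return (season, episode) for the first rule that matches, else None."""
    for pattern, default_season in RULES:
        m = pattern.search(path)
        if m:
            season = int(m.group(1)) if m.group(1) else default_season
            return season, int(m.group(2))
    return None

samples = [
    r"Z:\Anime\TV Shows\Hyouka\Hyouka - 01v2 - The Esteemed Classics Club Has Been Restored [Mazui][HDTV][1280x720][h264][F2BB20F65].mkv",
    r"Z:\Anime\TV Shows\Akazukin Chacha\Akazukin Chacha - S101 - Opening 1 [GCP][640x480][h264][7AA0FBDB].avi",
    r"Z:\Anime\Movies\Gekijouban Macross F Itsuwari no Utahime\Gekijouban Macross F Itsuwari no Utahime [Doki][BluRay][1920x1080][h264][4C96D537].mkv",
]
for path in samples:
    print(match_episode(path))
# (1, 1)
# (0, 101)
# None
```

The movie file name has no " - " separator, so neither rule picks it up, which is the point of choosing a distinctive separator.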

Known issues:
Shows only distinguished by an added question mark in the title will be placed in separate folders (the "?" is replaced by an "_" ), but will still be scraped as the same show. You'll need to manually refresh the second folder.
...It might only be Shinryaku!? Ika Musume this applies to, so if you don't have that, you can probably safely comment out the line that replaces the question marks, if you'd rather them just be removed.

Suzumiya Haruhi no Yuuutsu (2009) episode numbering is... funky. It includes the 2006 episode numbers, and the filenames will need to be manually fixed.
(This post was last modified: 2013-09-02 20:23 by scudlee.)
Vaneska Offline
Member
Posts: 70
Joined: Dec 2011
Reputation: 3
Location: Switzerland
Post: #2
Great work, scudlee! Highly appreciated.

------------------

One thing: Gintama` wasn't being scraped (only the episodes). In the anime-list it's written as "Gintama'" (apostrophe); should it be "Gintama`" (grave accent)?

Everything else, scraped perfectly.
(This post was last modified: 2012-10-15 22:39 by Vaneska.)
scudlee Offline
Team-Kodi Member
Posts: 839
Joined: Jul 2011
Reputation: 51
Post: #3
The ` is used as an apostrophe all over AniDB - the title really is meant to be Gintama' not Gintama`. Most AniDB clients will automatically replace any `s with an apostrophe when renaming (as they should).

Unfortunately XBMC would consider "Gintama" to be a better fit to "Gintama' " than "Gintama` ", so for the mod, I had the `s in the search results replaced.
Vaneska Offline
Member
Posts: 70
Joined: Dec 2011
Reputation: 3
Location: Switzerland
Post: #4
Ah, you nasty AniDB.

Then I will have to use the ` replacement in the TagSystem. Thanks.
(This post was last modified: 2012-10-15 23:02 by Vaneska.)
Vaneska Offline
Member
Posts: 70
Joined: Dec 2011
Reputation: 3
Location: Switzerland
Post: #5
Just some cosmetic stuff. Some specials are being shown in XBMC as Season 01, some as Season 02.

For example "Durarara!!"

Sx01 Extra Episode, 12.5: Heaven's ... Season 01 / Episode S1
Sx02 Extra Episode, 25: Peace Reigns Over the land ... Season 02 / Episode S2

Wolf's Rain, Sx01-04 / Season 02

Is there a way to show those Specials as Season 01 too?
scudlee Offline
Team-Kodi Member
Posts: 839
Joined: Jul 2011
Reputation: 51
Post: #6
(2012-10-16 10:45)Vaneska Wrote:  Just some cosmetic stuff. Some specials are being shown in XBMC as Season 01, some as Season 02.

For example "Durarara!!"

Sx01 Extra Episode, 12.5: Heaven's ... Season 01 / Episode S1
Sx02 Extra Episode, 25: Peace Reigns Over the land ... Season 02 / Episode S2

Wolf's Rain, Sx01-04 / Season 02

Is there a way to show those Specials as Season 01 too?

Might be possible. The reason for this is that in order to get the specials to show at the end of the episode list, the scraper adds <displayseason>2</displayseason> to the episode details (and to get them before, it uses <displayseason>0</displayseason>). It should however be possible to switch them to use <displayafterseason>1</displayafterseason> (and <displaybeforeseason>1</displaybeforeseason> respectively). Not sure if there'll be any knock-on effects from that, though. I'll have a play.
Actually, a simpler method might be just to add a <before>;2-25</before> to the anime-list.xml (for Durarara).

If I push that change, can you test it?
(This post was last modified: 2012-10-16 11:58 by scudlee.)
Vaneska Offline
Member
Posts: 70
Joined: Dec 2011
Reputation: 3
Location: Switzerland
Post: #7
I think most specials are actually where they belong (sort by Episodes). For example, Durarara!! Sx01 is between 12 and 13, which is really great. Sx02 is at the end, but that's of course correct too.
scudlee Offline
Team-Kodi Member
Posts: 839
Joined: Jul 2011
Reputation: 51
Post: #8
It's a minor thing, but I also think it would be better if they were listed as season 1.

I've tested both my suggested fixes and they both work. I'll push the <before>'s to the anime-list.xml. That's the right thing to do anyway, because it will force the specials to their correct place as long as you have the "specials inside" option set, regardless of the "specials at the end" option. The displayafter fix will come later, with some other updates. I might still leave the OPs/EDs etc. as displayseason 2, to see if that keeps them a little more separate...
Ned Scott Offline
Team-Kodi Wiki Guy
Posts: 21,215
Joined: Jan 2011
Reputation: 276
Location: Arizona, USA
Post: #9
Very awesome!

You can make easy links to the XBMC wiki using double brackets around common XBMC words: [[debug log]] = debug log, [[Video library]] = Video library, [[SMB]] = SMB , [[userdata]] = userdata, etc
scudlee Offline
Team-Kodi Member
Posts: 839
Joined: Jul 2011
Reputation: 51
Post: #10
New features in 1.1.0

Thumbnail improvements
The scraper now supports the thumb "aspect" attribute, which allows you to use posters and wide banners simultaneously. Requires a fairly recent Frodo build, and a skin that supports it.
In order to take advantage of this you will need to set "enable banners" to true in the settings. I've switched the default to true, but people who've already downloaded will need to enable it manually ("Posters/Banners order" will be irrelevant). When Frodo is released, I'll likely remove the option entirely.
Posters are now also fetched from themoviedb.org.
Thumb priority is:
Posters
  1. AniDB poster
  2. themoviedb.org poster(s)
  3. theTVDB.com series poster(s)
  4. theTVDB.com season poster(s)

Banners (TV show scraper only)
  1. theTVDB.com season-specific banner(s) (based on defaulttvdbseason in the anime-list)
  2. theTVDB.com series banner(s)
  3. theTVDB.com banners for the remaining seasons

Tag support
It's now possible to add "tags" directly from the scraper (again, requires a very recent Frodo build).
This comes in two flavours, first you can add a specific tag (or set of tags) to every title you scrape, the default is simply "Anime", but you could for example use "Anime;Animation", or even "Anime;Japanese;Animation", whatever tags you like - just separate them with a semi-colon and they'll each be added.

The second kind are direct from AniDB itself - have a look at the top 500 tags to give yourself an idea of the sort of thing you'd be in for if you enable this.
Tags on AniDB can be voted on and the "approval" rating is part of the scraped xml, so I've added an option to set the minimum approval for the tags to be used (1-20, default 10), and also an option for including tags marked as spoilers (off by default).

For example, the movie Akira currently has these (non-spoiler) tags (approval in brackets):
Quote:classic (48), awesome animation (32), social commentary (32), Tokyo destroyed (24), call my name (16), a god am I (13), cult (12), great soundtrack (12), ESP (11), civil war setting (8), high definition (8), visually stunning (8), must see classic (7), epic (6), stand-alone movie (6), battle of wits (5), futuristic motorcycle (4), power corrupts (4), adaptation distillation (3), childhood friend (3), strong male lead (3), rivalry (2).
Meaning the default minimum of 10 would result in the nine tags classic through ESP being added. Including spoilers would increase this to eleven.
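The selection can be reproduced with a few lines of Python, using the Akira numbers quoted above. This is only a sketch of the idea: select_tags and split_fixed_tags are made-up names, and treating the minimum as inclusive (approval >= minimum) is an assumption, since no tag in this example sits exactly on 10.

```python
# Non-spoiler tags and approval values for Akira, as quoted above.
AKIRA_TAGS = [
    ("classic", 48), ("awesome animation", 32), ("social commentary", 32),
    ("Tokyo destroyed", 24), ("call my name", 16), ("a god am I", 13),
    ("cult", 12), ("great soundtrack", 12), ("ESP", 11),
    ("civil war setting", 8), ("high definition", 8), ("visually stunning", 8),
    ("must see classic", 7), ("epic", 6), ("stand-alone movie", 6),
    ("battle of wits", 5), ("futuristic motorcycle", 4), ("power corrupts", 4),
    ("adaptation distillation", 3), ("childhood friend", 3),
    ("strong male lead", 3), ("rivalry", 2),
]

def select_tags(tags, min_approval=10):
    """Keep the tags whose approval reaches the minimum (assumed inclusive)."""
    return [name for name, approval in tags if approval >= min_approval]

def split_fixed_tags(setting):
    """Split a semi-colon separated setting such as 'Anime;Animation'."""
    return [tag.strip() for tag in setting.split(";") if tag.strip()]

selected = select_tags(AKIRA_TAGS)
print(len(selected), selected[0], selected[-1])  # 9 classic ESP
print(split_fixed_tags("Anime;Japanese;Animation") + selected[:2])
# ['Anime', 'Japanese', 'Animation', 'classic', 'awesome animation']
```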

A word of caution, tags in XBMC are allowed to be empty, which means if you just want to test out the tags, you might want to set the minimum up high, because even if you turn off the setting and rescrape, the now-empty tags will still remain and have to be manually deleted one-by-one.

Movie set support
Movie sets are now retrieved from themoviedb.org (Movie scraper only). On by default.
Still deciding on the best way to add missing sets to the anime-list.xml (plus I'll need to go through and add them before I add the feature).

Misc.
You can now store the imdb/tmdb id for movies (may or may not help the extra fanart addon, not sure.)
The settings have been rearranged slightly to accommodate some of these features, but it doesn't affect anything.
There were also a couple of fixes and tweaks that don't affect much of anything.

See the changelogs in the first post for all changes.
Myrddraal Offline
Senior Member
Posts: 101
Joined: Apr 2012
Reputation: 1
Post: #11
Thanks for the scraper mod. It is great.

I am having an issue using the Frodo nightlies. When scraping a TV show, it returns an empty screen, so the show is not added to the library.
scudlee Offline
Team-Kodi Member
Posts: 839
Joined: Jul 2011
Reputation: 51
Post: #12
Based on this comment, I have a feeling this is the same issue. Hopefully it'll be fixed quickly.

Edit: There is a patch for the above issue, and I just did a test build with it applied, and it did indeed fix the problem, so hang tight.
(This post was last modified: 2012-10-26 17:20 by scudlee.)
saitoh183 Online
Posting Freak
Posts: 959
Joined: Jul 2011
Reputation: 15
Location: Canada
Post: #13
I just wanted to know: is there an advantage to using your scraper vs. the original if it's only for TV shows? (I use IMDb for anime movies, since I don't have many.)

scudlee Offline
Team-Kodi Member
Posts: 839
Joined: Jul 2011
Reputation: 51
Post: #14
Even ignoring the movie stuff, I would say by far the biggest advantage over the original is the improvements to the search. Just having the scraper able to correctly match a show distinguished only by a year without manual intervention is a big plus. And if you're running any recent pre-Frodo build, there's an issue with some punctuation marks that will mess up some matches in the old scraper.

After that, the new anime-list.xml would probably be the next big advantage, but since you can technically use that in the old scraper too, the advantage is only really not having to type in the long address in the settings.

As for the new features (being able to add OPs/EDs etc. to the library, being able to add tags, being able to use banners and posters simultaneously), whether you'd consider them pluses really depends on whether you'd use them.


Ironically, the big thing I'm currently working on is just tidying up and reorganizing the internal code, with probably zero external changes, so there might literally be no apparent advantage between the current version and the next!

(...Well, I might sneak in some movie trailer support...)
saitoh183 Online
Posting Freak
Posts: 959
Joined: Jul 2011
Reputation: 15
Location: Canada
Post: #15
Cool, thanks scudlee. I don't use Frodo since I just don't want to be updating constantly (my wife will get on my case if I end up on a buggy build), so I'm sticking with Eden for now. Does the matching of shows without manual intervention still apply to Eden users, and do we need to have the year in the naming convention?
