[WIP] AniDB.net Anime Video Scraper

calico
Post: #181
I just thought I'd share my tips for getting 446 out of 451 series to look up correctly. 3 didn't match correctly (stuff like Hagane no Renkinjutsushi vs Hagane no Renkinjutsushi (2009)), and 2 didn't look up at all and crashed XBMC every time I tried (I have a workaround, however).

1. My folder setup is thus...

/Anime
/series 1
/series 2
etc

I am using the Main Title as given by AniDB, so Hagane no Renkinjutsushi rather than FullMetal Alchemist, and so on. I have almost NO lookup misses, and I can deal with fixing 5 series out of 451.

2. Episodes using Absolute numbering (which is how anime likes it)

I use the AniDB client to rename all files to the default format in AOM.

like so...
Hagane no Renkinjutsushi - 02 - Body of the Sanctioned [Keep-ANBU](32446438)[AniDB].avi

And the key part: adding a regex to advancedsettings.xml, which goes in your userdata directory (c:\users\userid\appdata\roaming\XBMC\userdata on my Windows 7 box).

<advancedsettings>
<tvshowmatching action="prepend">
<regexp>(?:[\ _-]{2,3})(\d{1,3})</regexp>
</tvshowmatching>
</advancedsettings>

It's actually a really simple regex that so far works flawlessly. It matches a run of 2 to 3 characters, each of them a space, a _ or a - (used for matching but not captured), followed by 1 to 3 digits, which are captured as the episode number. The end result is that it picks up the 1 to 3 digit episode number as long as the file follows the AniDB default naming convention.
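
To sanity-check the pattern outside XBMC, here is a minimal sketch that uses Python's `re` module as a stand-in for XBMC's own regex engine. That is an assumption: the two engines may differ in details such as case handling, and XBMC matches against the path rather than a bare file name. The `episode_number` helper is a hypothetical name used only for illustration.

```python
import re

# Same pattern as in the advancedsettings.xml snippet above.
EPISODE_RE = re.compile(r"(?:[\ _-]{2,3})(\d{1,3})")

def episode_number(filename):
    """Return the first captured episode number as an int, or None."""
    match = EPISODE_RE.search(filename)
    return int(match.group(1)) if match else None

name = ("Hagane no Renkinjutsushi - 02 - Body of the Sanctioned "
        "[Keep-ANBU](32446438)[AniDB].avi")
print(episode_number(name))  # 2
```

One thing the sketch makes visible: the first separator run followed by digits wins, so a made-up name like "Show - 2009 - 05.avi" would yield 200 rather than 5. Titles with a number right after " - " may need a folder or file rename.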

Note: This doesn't work for specials, but I gather that's an XBMC thing, so I rename specials the default XBMC way, S00Exx, as specials belong in Season 00.

Now, I don't have any "normal" TV shows (I mainly want XBMC for Movies & Anime), so I haven't tested it with normal stuff...

I have also experienced the dreaded (and sometimes frequent) XBMC crash on lookup. I have a workaround that has worked 100% for me so far, but I've not had the time to test it more than a few times. I'll post after I add the next batch of series and try looking them up.

pathw
Post: #182
Hi, this scraper seems to be working for me. But how do people deal with movies and specials?

With movies, AniDB's episode number data will not be very useful.

And AniDB's XML doesn't seem to contain specials.

salival
Post: #183
pathw Wrote: Hi, this scraper seems to be working for me. But how do people deal with movies and specials?

With movies, AniDB's episode number data will not be very useful.

And AniDB's XML doesn't seem to contain specials.

There are a number of options for the movies.
The first is to use the anidb scraper, in which case you don't get any info from the tvdb, since it's a movie.
The second is to set the content of the folder to "movies". It will not show up in tv shows. You will have fanart from the moviedb.
You can link movies and tv-shows, although I don't know the specifics.

Specials need to be recognized as season 0. The easiest way to accomplish this is to put "S00E##" in the file name, where ## is the special's episode number, e.g. special 1 becomes S00E01 and special 2 becomes S00E02.
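
A minimal sketch of building that tag when renaming in bulk. The helper names and the "Series - S00E##.ext" file name layout are hypothetical; only the S00E## tag itself comes from the advice above.

```python
def special_tag(number):
    """Build the season-0 tag for a special: 1 -> 'S00E01'."""
    if number < 1:
        raise ValueError("special numbers start at 1")
    return "S00E{:02d}".format(number)

def special_filename(series, number, extension):
    """Assumed layout: '<series> - S00E##.<extension>'."""
    return "{} - {}.{}".format(series, special_tag(number), extension)

print(special_tag(1))  # S00E01
print(special_tag(2))  # S00E02
```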

pathw
Post: #184
Hi,
I am using the AniDB scraper, but it fails with movies; it's something to do with the episode numbering, hence my question. Similarly, when I turn debugging on and look at the XML returned by AniDB, I don't see the specials.

salival
Post: #185
For movies, with this scraper, you should put "S01E01" in the filename or, depending on your advancedsettings.xml, something like "ep1". This way the scraper thinks it's episode 1 and fetches the info. This all has to do with how anidb.net is structured and how XBMC handles movies versus TV shows.

The specials should work if you put S00E01 etc. in the file name. Movies aren't considered specials though.

pathw
Post: #186
Ah, I see.

I've been using AniDB clients to name my files for a while, so I changed the scraper to accommodate this (and it's a better naming convention, IMHO). The advantage is that I can be sure my advancedsettings.xml regex is precise.

So under GetEpisodeList I've made 2 changes:

Quote: <RegExp input="$$1" output="&lt;episode&gt;&lt;title&gt;\4&lt;/title&gt;&lt;url cache=&quot;$$20.xml&quot;&gt;\1&lt;/url&gt;&lt;epnum&gt;\2&lt;/epnum&gt;&lt;season&gt;1&lt;/season&gt;&lt;id&gt;\1&lt;/id&gt;&lt;aired&gt;\3&lt;/aired&gt;&lt;/episode&gt;" dest="8">
<expression clear="yes" repeat="yes">(?i)&lt;episode\s+id=&quot;(\d+)&quot;[^&gt;]*&gt;\s*&lt;epno&gt;([A-Z]?\d+)&lt;/epno&gt;\s*(?:&lt;length&gt;[^&lt;]*&lt;/length&gt;\s*)?(?:&lt;airdate&gt;([^&lt;]+)&lt;/airdate&gt;\s*)?(?:&lt;rating[^&gt;]*&gt;[^&lt;]*&lt;/rating&gt;\s*)?(?:&lt;title[^&gt;]*&gt;[^&lt;]*&lt;/title&gt;\s*)*?&lt;title xml:lang=&quot;en&quot;&gt;(?!(?:Complete\sMovie|Part \d+ of \d+)&lt;)([^&lt;]+)&lt;/title&gt;.*?&lt;/episode&gt;</expression>
</RegExp>
<RegExp input="$$1" output="&lt;episode&gt;&lt;title&gt;\3&lt;/title&gt;&lt;url cache=&quot;$$20.xml&quot;&gt;\1&lt;/url&gt;&lt;epnum&gt;\3&lt;/epnum&gt;&lt;season&gt;1&lt;/season&gt;&lt;id&gt;\1&lt;/id&gt;&lt;aired&gt;\2&lt;/aired&gt;&lt;/episode&gt;" dest="8+">
<expression repeat="yes">(?i)&lt;episode\s+id=&quot;(\d+)&quot;[^&gt;]*&gt;\s*&lt;epno&gt;[A-Z]?\d+&lt;/epno&gt;\s*(?:&lt;length&gt;[^&lt;]*&lt;/length&gt;\s*)?(?:&lt;airdate&gt;([^&lt;]+)&lt;/airdate&gt;\s*)?(?:&lt;rating[^&gt;]*&gt;[^&lt;]*&lt;/rating&gt;\s*)?(?:&lt;title[^&gt;]*&gt;[^&lt;]*&lt;/title&gt;\s*)*?&lt;title xml:lang=&quot;en&quot;&gt;(Complete\sMovie|Part \d+ of \d+)&lt;/title&gt;.*?&lt;/episode&gt;</expression>
</RegExp>
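
To see how the two expressions divide the work, here is a minimal sketch that takes only the English-title tail of each one (unescaped from XML) and runs it through Python's `re`. Python is a stand-in for the scraper's regex engine, and the sample title elements in the usage lines are made up for illustration, not real AniDB output.

```python
import re

# Tail of the first expression: an English title that is NOT exactly
# "Complete Movie" or "Part N of M" (negative lookahead up to the '<').
REGULAR_RE = re.compile(
    r'<title xml:lang="en">(?!(?:Complete\sMovie|Part \d+ of \d+)<)([^<]+)</title>',
    re.IGNORECASE,
)
# Tail of the second expression: an English title that IS exactly one of them.
MOVIE_RE = re.compile(
    r'<title xml:lang="en">(Complete\sMovie|Part \d+ of \d+)</title>',
    re.IGNORECASE,
)

def classify(title_xml):
    """Return ('movie' | 'regular', title text), or None if nothing matches."""
    movie = MOVIE_RE.search(title_xml)
    if movie:
        return ("movie", movie.group(1))
    regular = REGULAR_RE.search(title_xml)
    if regular:
        return ("regular", regular.group(1))
    return None

print(classify('<title xml:lang="en">Part 1 of 5</title>'))
print(classify('<title xml:lang="en">Body of the Sanctioned</title>'))
```

Because the lookahead requires the closing `<` right after the movie label, a title that merely starts with "Part 1 of 5" and continues with more text still counts as a regular title.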

I still have some bugs though.

I've got an anime called A.LI.CE http://anidb.net/perl-bin/animedb.pl?sho...e&aid=4448,
but this gets identified as http://anidb.net/perl-bin/animedb.pl?sho...e&aid=4448

similarly akikan (2009) (crappy anime) http://anidb.net/perl-bin/animedb.pl?sho...e&aid=7132 gets identified as http://anidb.net/perl-bin/animedb.pl?sho...e&aid=6182

So I'll look into that part to see if it can be made more accurate. Any pointers would be appreciated :)
(This post was last modified: 2011-02-19 19:14 by pathw.)

salival
Post: #187
When this happens, you should usually try refreshing the show's info (just go to the info screen and choose Refresh). You will get a list of possible matches from which you can choose the correct one.

It can also help to name the containing folder exactly as the series is called on anidb.net.

pathw
Post: #188
Since I have a lot of shows, I'd like to avoid false positives. I'm thinking of using the nfourl functionality. How do I name my nfo files?

pathw
Post: #189
Never mind, I found it: tvshow.nfo

I wrote a script to generate this for my folders, so most of my problems are gone. :)
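
For anyone who wants to try the same idea, here is a minimal sketch. It is not the script mentioned above. It assumes that a tvshow.nfo holding nothing but the show's AniDB page URL is enough for the nfourl lookup, and that you already have a folder-name-to-URL mapping from somewhere. The function names and the example.invalid URL are placeholders.

```python
import os
import tempfile

def write_tvshow_nfo(series_folder, anidb_url):
    """Write a tvshow.nfo that contains only the AniDB page URL."""
    nfo_path = os.path.join(series_folder, "tvshow.nfo")
    with open(nfo_path, "w", encoding="utf-8") as handle:
        handle.write(anidb_url + "\n")
    return nfo_path

def write_all(anime_root, urls_by_folder):
    """Write one tvshow.nfo per series folder that exists under anime_root."""
    written = []
    for folder_name, url in sorted(urls_by_folder.items()):
        folder = os.path.join(anime_root, folder_name)
        if os.path.isdir(folder):
            written.append(write_tvshow_nfo(folder, url))
    return written

# Demo in a throwaway directory; the URLs are placeholders, not real pages.
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "series 1"))
    demo_paths = write_all(root, {
        "series 1": "http://example.invalid/anime-page",
        "not on disk": "http://example.invalid/other-page",
    })
    with open(demo_paths[0], encoding="utf-8") as handle:
        demo_content = handle.read()
```

Folders missing on disk are skipped rather than created, so a stale mapping cannot litter the library with empty directories.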

If this script is useful to people, I can share it. I also made changes to anidb.xml that work with AniDB's naming convention for specials.

I'm left with only one problem at the moment.

I modified GetEpisodeList in anidb.xml to allow non-numeric episode numbers. For movies it generates something like
<epnum>Complete Movie</epnum>
or <epnum>Part 1 of 5</epnum>

My advancedsettings.xml finds the correct name in my file name, but for some reason a movie that is in multiple parts is named Complete Movie. I'll try to get the relevant portion of the debug log and post it here.

Finalspace
Post: #190
I tried the AniDB scraper as well to detect my renamed files.
All files are renamed in the following format:

Code:
anime/[group] anime - ep01 [optional crc].fileext
anime/[group] anime - ep01-02 [optional crc].fileext
anime/[group] anime - ep01-03 [optional crc].fileext
anime/[group] anime - ep01v2 [optional crc].fileext
anime/[group] anime - epS1 [optional crc].fileext

All anime titles are the same as the AniDB romaji anime title.

I used only the following regex mask in my advancedsettings.xml (prepend):

Code:
[/\._ \-]()ep([S0-9]+)(-[0-9]+)?v?V?[1-5]?

After scanning, it detected 80% of all series, but there were several mismatches, which resulted in wrong fanart/banners and wrong episode information, and no specials were detected :-(
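
A minimal sketch of what the mask above captures for each naming variant, using Python's `re` as a stand-in for XBMC's regex engine (an assumption; XBMC may treat case and paths differently). The `parse` helper and the sample file names are illustrative only.

```python
import re

# The mask from the post: empty season group, episode group, optional range end.
MASK = re.compile(r"[/\._ \-]()ep([S0-9]+)(-[0-9]+)?v?V?[1-5]?")

def parse(filename):
    """Return (episode, range_end) as strings, or None if the mask misses."""
    match = MASK.search(filename)
    if not match:
        return None
    range_end = match.group(3)[1:] if match.group(3) else None
    return (match.group(2), range_end)

samples = [
    "[group] anime - ep01 [crc].mkv",
    "[group] anime - ep01-03 [crc].mkv",
    "[group] anime - ep01v2 [crc].mkv",
    "[group] anime - epS1 [crc].mkv",
]
for sample in samples:
    print(sample, "->", parse(sample))
```

For the epS1 name the episode group comes back as "S1", which is not a number. If XBMC only accepts numeric episode numbers, as reported later in this thread, that would be consistent with the specials not showing up.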

What is definitely weird is that some folders are not processed by XBMC, even though they are not hidden and do not contain special characters.
They are simply skipped by the "automatic content detection", but why?

If I manually run "Detect for new content" on the selected folder, the selected anime is definitely processed and the episodes are detected.

Btw: XBMC often crashes after a few hours (a system reset is required).

-----------------

Would it be the better (perfect) way to generate the .nfo files for all anime folders, so that only the banners/fanart are retrieved?

If yes, I will make an application in Python which does this using the AniDB UDP API :-)

pathw
Post: #191
I already wrote a small script to generate the nfo files using the UDP API, so I can share that :).

I have 100% correct results on anime detection at the moment. Right now the only problems I have are specials and movie parts, because XBMC only allows numeric episode numbers, so it doesn't support AniDB's format. I've filed a report for that, but I have no idea if it will be picked up.


The other problem is that fanart comes from TVDB, so it's sometimes wrong. The MediaInfo 2 view seems to use fanart instead of scaling the images from AniDB :/

Finalspace
Post: #192
pathw Wrote: I already wrote a small script to generate the nfo files using the UDP API, so I can share that :).

That would be great if you can share your script =)

I checked the documentation for tvshow.nfo, and it seems that every episode file must also have its own .nfo containing the actual episode info... is that correct?

I thought that all episode info was stored in a single xml/nfo file :-(

pathw
Post: #193
For TV shows there is one nfo for the show (tvshow.nfo) and one nfo for each episode. But this doesn't matter. As long as your files are named correctly, the problem you have with it not recognising episodes has nothing to do with the scraper. It's because of an issue in XBMC.

I have filed a ticket, which will hopefully be fixed. Otherwise the only way to fix it is to number specials as Season 0 Episode (special number). And this only takes into account specials which start with S, not Trailers or Openings and such.

It's possible to change the scraper to take all these files into account without the XBMC bug being fixed, but it will require renaming all specials according to some awkward convention that we will have to come up with. Which will suck, but will at least work, I guess :/.

I hope the bug gets fixed though. I'm not keen on renaming my files.

I'll upload my script to GitHub today :).
Edit: Here you go. https://github.com/pathsny/anidb-ruby
(This post was last modified: 2011-02-26 20:51 by pathw.)

Finalspace
Post: #194
pathw Wrote: For TV shows there is one nfo for the show (tvshow.nfo) and one nfo for each episode. But this doesn't matter. As long as your files are named correctly, the problem you have with it not recognising episodes has nothing to do with the scraper. It's because of an issue in XBMC.

I have filed a ticket, which will hopefully be fixed. Otherwise the only way to fix it is to number specials as Season 0 Episode (special number). And this only takes into account specials which start with S, not Trailers or Openings and such.

It's possible to change the scraper to take all these files into account without the XBMC bug being fixed, but it will require renaming all specials according to some awkward convention that we will have to come up with. Which will suck, but will at least work, I guess :/.

I hope the bug gets fixed though. I'm not keen on renaming my files.

I'll upload my script to GitHub today :).
Edit: Here you go. https://github.com/pathsny/anidb-ruby

I checked out your script and looked over the code... but I don't think this script would help me.

My issue with XBMC is that not all TV series folders are picked up for scraping. Some folders are simply skipped, and if I "detect" these folders manually it works. Almost every folder that is detected, manually or automatically, seems to be correct, with some exceptions like the anime "Burn up", which is not detected as an anime :-( There seems to be a US TV show called Burn Up, and this is what is returned from TVDB.

Of course the specials do not appear, because all special files are renamed to epS1, epS2... You have already filed a bug for this, and we can simply go to the files folder from within XBMC and start the special in non-database mode, so I think I can live with that for now.

As for the issue with movies, it seems I don't have that problem.
All my movies are renamed as ep1-x, and for movies which consist of several parts the parts start with ep2 instead of ep1, because anime movies mostly have a complete version of the movie (ep1) and split parts (ep2-x).

This is the way I got the data from the AniDB UDP API.

Sometimes I get C1 or T1 as episode names, and those are not detected either. I think T stands for trailers, but I don't know what C stands for.
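
A minimal sketch of splitting such episode codes into a type and a number. The prefix table is an assumption based on AniDB's episode types as commonly documented (S special, C credits such as openings and endings, T trailer, P parody, O other); only the S and T cases are mentioned in this thread, so check the UDP API definition before relying on the rest. The `parse_epno` name is hypothetical.

```python
import re

# Assumed prefix meanings; verify against the AniDB UDP API definition.
EPISODE_TYPES = {
    "": "regular",
    "S": "special",
    "C": "credits (opening/ending)",
    "T": "trailer",
    "P": "parody",
    "O": "other",
}
EPNO_RE = re.compile(r"^([A-Z]?)(\d+)$")

def parse_epno(epno):
    """Split an episode code like 'C1' into (type name, number)."""
    match = EPNO_RE.match(epno.strip().upper())
    if not match:
        raise ValueError("unrecognised epno: {!r}".format(epno))
    prefix, number = match.groups()
    return EPISODE_TYPES.get(prefix, "unknown"), int(number)

for code in ("12", "S2", "C1", "T1"):
    print(code, "->", parse_epno(code))
```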

But, thanks for sharing =)


A tip for your Ruby AniDB client: add some local caching to your application. Files which have already been renamed and whose info has been scanned should be saved in a database-like system and skipped unless a rescan is forced.
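
A minimal sketch of the caching idea, written in Python with the standard `sqlite3` module rather than in the client's Ruby. The `ScanCache` class, its table layout and the choice of path plus file size as the key are all assumptions made for illustration.

```python
import sqlite3

class ScanCache:
    """Remember which files were already scanned, keyed by path and size."""

    def __init__(self, db_path=":memory:"):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS scanned ("
            "path TEXT PRIMARY KEY, size INTEGER NOT NULL)"
        )

    def needs_scan(self, path, size, force=False):
        """True if the file is new, has changed size, or a rescan is forced."""
        if force:
            return True
        row = self.conn.execute(
            "SELECT size FROM scanned WHERE path = ?", (path,)
        ).fetchone()
        return row is None or row[0] != size

    def mark_scanned(self, path, size):
        self.conn.execute(
            "INSERT OR REPLACE INTO scanned (path, size) VALUES (?, ?)",
            (path, size),
        )
        self.conn.commit()

cache = ScanCache()
first = cache.needs_scan("anime/show/ep01.mkv", 1000)    # not seen yet
cache.mark_scanned("anime/show/ep01.mkv", 1000)
second = cache.needs_scan("anime/show/ep01.mkv", 1000)   # now cached
forced = cache.needs_scan("anime/show/ep01.mkv", 1000, force=True)
```

Passing a file name instead of ":memory:" would keep the cache between runs.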
(This post was last modified: 2011-02-27 18:35 by Finalspace.)

pathw
Post: #195
Finalspace Wrote: I checked out your script and looked over the code... but I don't think this script would help me.

My issue with XBMC is that not all TV series folders are picked up for scraping. Some folders are simply skipped, and if I "detect" these folders manually it works. Almost every folder that is detected, manually or automatically, seems to be correct, with some exceptions like the anime "Burn up", which is not detected as an anime :-( There seems to be a US TV show called Burn Up, and this is what is returned from TVDB.


If you have the nfo files, it uses the nfo files for lookup, so I'm surprised you have this problem. I don't.

Finalspace Wrote: Of course the specials do not appear, because all special files are renamed to epS1, epS2... You have already filed a bug for this, and we can simply go to the files folder from within XBMC and start the special in non-database mode, so I think I can live with that for now.

As for the issue with movies, it seems I don't have that problem.
All my movies are renamed as ep1-x, and for movies which consist of several parts the parts start with ep2 instead of ep1, because anime movies mostly have a complete version of the movie (ep1) and split parts (ep2-x).

This is the way I got the data from the AniDB UDP API.

Oh, OK. For movies, I used the "name" from the UDP API to rename them.


Finalspace Wrote: A tip for your Ruby AniDB client: add some local caching to your application. Files which have already been renamed and whose info has been scanned should be saved in a database-like system and skipped unless a rescan is forced.

Hehe. The reason I haven't done caching is that my script moves files that are recognised, so I keep all identified files in a separate physical location :).