Anime News Network Scraper (Release?)

volforto Offline
Junior Member
Posts: 17
Joined: Jan 2010
Reputation: 5
Post: #1
I have been working on a scraper for Anime News Network. Initially I was going to use Google for the searching, since ANN already uses it for its own search. However, I learned from this bing thread that Google does not allow scraping, so I am using Bing instead, with the AppID from that same thread. I am not sure what ANN's policy on scraping is, but it seems they don't mind (from this thread 5 years ago).

TV Shows: v0.46 download (xml+jpg)
Movies: v0.12 download (xml+jpg)

Settings for TV Shows scraper:

Enable All Language Casts
Retrieve other language voice actors in addition to Japanese.

Enable Unlisted Specials / 1 Episode OVA Workaround
Allows as many special episodes as there are normal season 1 episodes. So if the series has 26 normal episodes, you can also include 26 specials (season 0). These episodes will not have a title; they will be named "Special Episode" with a "Special" air date. This workaround also allows you to include an OVA with a single episode (e.g. Hoshi no Koe), for which ANN does not provide an episode listing. Just name it 0x01.

Enable TVDB Fanart
Retrieve fanart from TVDB using the main title from ANN. Matches with the same premiere date as the one listed on ANN are preferred.

Include Alternative Titles in Fanart Search
In addition to the main title, also search with all the alternative titles listed on ANN.

Enable TVDB Banner (With ANN Thumbnail Fallback)
Get banners from TVDB using the main title, falling back to the ANN thumbnail when no banner is found.

Enable TVDB Poster (With ANN Thumbnail Fallback)
Or get posters.

Enable TVDB Episode Details (Using Episode Title Matching)
Retrieve episode overviews and other details from TVDB. Matching is done by comparing episode titles rather than episode numbers.
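
To illustrate the title-based episode matching described above, here is a minimal sketch in Python. It is not the scraper's actual XML/regex code: the names `normalize` and `match_by_title` and the sample data are made up for illustration, and the real scraper may compare titles differently.

```python
import re


def normalize(title):
    """Lowercase a title and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]+", "", title.lower())


def match_by_title(ann_episodes, tvdb_episodes):
    """Pair ANN episodes with TVDB episodes by title, not by number.

    ann_episodes:  list of (episode_number, title) tuples from ANN
    tvdb_episodes: list of dicts with "title" and "overview" keys
    Returns {ann_episode_number: overview} for every title that matches.
    """
    by_title = {normalize(ep["title"]): ep for ep in tvdb_episodes}
    matched = {}
    for number, title in ann_episodes:
        hit = by_title.get(normalize(title))
        if hit is not None:
            matched[number] = hit["overview"]
    return matched


# Hypothetical data: the two sites list the episodes in a different order
# and with slightly different capitalization and punctuation.
ann = [(1, "The Beginning"), (2, "A New Friend!")]
tvdb = [
    {"title": "A new friend", "overview": "Second episode overview."},
    {"title": "the beginning", "overview": "First episode overview."},
]
result = match_by_title(ann, tvdb)
```

Matching on titles means the lookup still works when the two sites number episodes differently, at the cost of missing episodes whose titles are translated or spelled differently.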

The Movies scraper has some of the same settings, using TMDB instead of TVDB, but TMDB's search doesn't work as well, so the fanart search will fail more often.
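
For the fanart search, the preference for results whose premiere date matches the one on ANN could look roughly like the following Python sketch. The function name `pick_fanart_match`, the dictionary keys, and the sample results are assumptions for illustration only; the scraper itself does this with XML and regular expressions.

```python
def pick_fanart_match(search_results, ann_premiere):
    """Return the preferred search result, or None if there are none.

    search_results: list of dicts with "title" and "premiere" (YYYY-MM-DD)
                    keys, collected from the main and alternative titles
    ann_premiere:   premiere date listed on ANN, in the same format
    """
    # Prefer the first result whose premiere date matches ANN's.
    for result in search_results:
        if result["premiere"] == ann_premiere:
            return result
    # Otherwise fall back to the first result, if any.
    return search_results[0] if search_results else None


# Hypothetical search results for two series that share a title.
results = [
    {"title": "Example Show (2005)", "premiere": "2005-04-01"},
    {"title": "Example Show", "premiere": "2009-10-03"},
]
best = pick_fanart_match(results, "2009-10-03")
fallback = pick_fanart_match(results, "1999-01-01")
```

Here `best` is the 2009 entry because its date matches, while `fallback` is simply the first result because no date matches.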

UPDATES:
2010/03/26:
TV: Adapted scraper to ANN's new html.
TV: Fixed a bug where the scraper tries to get fanart no matter the setting.
MOVIE: Fixed a small bug introduced in v0.11.
2010/03/17:
Recovering the information missing from the old forum backup.
-----------
2010/02/08 - 2010/03/16:
Bunch of changes during this period.
-----------
2010/02/08:
Changed method for ANN thumbnail fallback (to fix a possible bug)
Changed fanart code to include fanarts from all alternative names, instead of just the first one with fanarts
Fixed some exception cases for the voice actors scraping
2010/02/04:
Updated scraper to work with ANN's new html for the casts section
(This post was last modified: 2010-03-27 05:49 by volforto.)
spiff Offline
Grumpy Bastard Developer
Posts: 12,234
Joined: Nov 2003
Reputation: 82
Post: #2
:)

maybe we can finally shut you anime fans up ;P

as for 2), i don't see how a bunch of functions would be required. this should do it

Code:
<GetSearchResults>
  <url cache="something">..</url>
  <!-- cache="something" avoids fetching the same page more than once -->
</GetSearchResults>

<GetDetails>
  ...
  <url function="gettvdbthumb">searchforthumb</url>
</GetDetails>

<gettvdbthumb>
  <RegExp ..><expression>matchthethumb?</expression></RegExp>
  <!-- conditionally push the following url: -->
  <url function="getannthumb" cache="something">somerandomcrap,cachewilloverride</url>
</gettvdbthumb>

<getannthumb>
  <details><thumb>..</thumb></details>
</getannthumb>

get my drift?

(This post was last modified: 2010-02-03 11:56 by spiff.)
volforto Offline
Junior Member
Posts: 17
Joined: Jan 2010
Reputation: 5
Post: #3
Thanks spiff! I didn't know cache could be used in such a way. I have added the functionality to the scraper and updated the first post. Now there will always be a thumbnail for any anime series.

With eldon's release of his anidb scraper, there should be more options for scraping anime now.
volforto Offline
Junior Member
Posts: 17
Joined: Jan 2010
Reputation: 5
Post: #4
Added the functionality to also search TVDB fanart using the alternative titles listed on ANN. I found I needed this function for some of the series.

I think this is it in terms of adding functionality (as the most important part for me is getting fanart/banner/thumbnail). Maybe others can improve it if they find a bug or want to add something :)
nuclearsunshine Offline
Junior Member
Posts: 2
Joined: Feb 2010
Reputation: 0
Post: #5
I just tried this and it lumped all the episodes for all my anime under the first show the scraper identified.
volforto Offline
Junior Member
Posts: 17
Joined: Jan 2010
Reputation: 5
Post: #6
Really? That's odd. I am not getting that. Did it work with the normal TVDB scraper before this? The only cause I can think of is that your XBMC handles the cache differently from mine (which deletes the cache when searching for each TV series).

P.S. Updated to v0.31 because ANN changed the html for their casts section. I also took the opportunity to include actors with multiple roles (possible now with the new html).
(This post was last modified: 2010-02-05 09:17 by volforto.)
nuclearsunshine Offline
Junior Member
Posts: 2
Joined: Feb 2010
Reputation: 0
Post: #7
I haven't tried the new version, but everything works fine with TheTVDB and the anidb.net scrapers.
volforto Offline
Junior Member
Posts: 17
Joined: Jan 2010
Reputation: 5
Post: #8
This is definitely a cache thing then. In order to fall back to the ANN thumbnail I am caching the ANN page. Since I don't know a way to pass the ID to that sub-sub-sub-function, I am using a generic details.html as the cache name. It sounds like your version of XBMC is keeping that cache across series.

I am not sure how to fix that, unless I remove the thumbnail fallback. Perhaps there's a way to pass parameters to the functions?
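
The suspected failure mode can be shown with a small Python sketch. This is only a toy model, not how XBMC implements its scraper cache: `PageCache`, the file names, and the series IDs 1001 and 1002 are hypothetical. It assumes a cache that survives from one series to the next and is keyed only by the cache name.

```python
class PageCache:
    """Toy stand-in for a page cache that maps a cache name to page content."""

    def __init__(self):
        self._pages = {}

    def fetch(self, cache_name, download):
        # Only download when nothing is stored under this cache name yet.
        if cache_name not in self._pages:
            self._pages[cache_name] = download()
        return self._pages[cache_name]


# Generic cache name: the second series silently gets the first series' page.
shared = PageCache()
first = shared.fetch("details.html", lambda: "ANN page for series A")
second = shared.fetch("details.html", lambda: "ANN page for series B")

# Per-series cache name (assuming the ID could be passed along):
# each series gets its own page.
keyed = PageCache()
page_a = keyed.fetch("details-1001.html", lambda: "ANN page for series A")
page_b = keyed.fetch("details-1002.html", lambda: "ANN page for series B")
```

With the generic name, `second` holds series A's page, which is consistent with every show being lumped under the first one identified. A per-series name avoids the collision, but only if the ID is available inside the chained function.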

EDIT:
I was doing some research and found the clearbuffers="no" option from a previous thread. I can't believe I missed it. Maybe I will be able to do something with it.
(This post was last modified: 2010-02-08 02:31 by volforto.)
TREX6662k5 Offline
Member+
Posts: 214
Joined: Oct 2006
Reputation: 0
Location: London, United Kingdom
Post: #9
Handy thank you!

volforto Offline
Junior Member
Posts: 17
Joined: Jan 2010
Reputation: 5
Post: #10
TREX6662k5 Wrote: Handy thank you!

Thanks :)


I have updated the code, which hopefully fixes the problem of the lumped series. It turns out there's another simple method for the thumbnail fallback, so I don't even need the clearbuffers="no" option. I used it anyway to update the functionality of the fanart grabber.

Also fixed some special cases in getting the cast info.