Anime News Network Scraper (Release?)

jsc315 Offline
Junior Member
Posts: 5
Joined: Oct 2009
Reputation: 0
Post: #11
Thanks :)
volforto Offline
Junior Member
Posts: 17
Joined: Jan 2010
Reputation: 5
Post: #12
Trying to recover some of the first post info from the old forum backup.

Also, the scrapers have been submitted as ticket 8961 about a week ago.
http://trac.xbmc.org/ticket/8961
Zarbis Offline
Junior Member
Posts: 17
Joined: Nov 2009
Reputation: 0
Post: #13
I compared the ANN shows scraper with the TheTVDB.com scraper on my anime archive.

How did I measure scraper quality?
Scraper found title - 1 point
Scraper found fan art - 3 points
One of three:
Scraper found banner - 3 points
Scraper found thumb - 2 points
Scraper only got fallback thumb - 1 point

So a scraper can get a maximum of 7 points per title, or 399 points for all 57 titles.
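
The scoring scheme above can be sketched in a few lines of Python. The function and dictionary names are invented for this illustration; only the point values and the 57-title count come from the post.

```python
# Points for the single best artwork type found (only one of the three counts).
ARTWORK_POINTS = {"banner": 3, "thumb": 2, "fallback_thumb": 1, None: 0}

TITLE_POINTS = 1    # scraper found the title
FANART_POINTS = 3   # scraper found fan art
TITLE_COUNT = 57    # size of the test archive in the post


def score_title(found_title, found_fanart, artwork):
    """Score one title; artwork is 'banner', 'thumb', 'fallback_thumb' or None."""
    score = 0
    if found_title:
        score += TITLE_POINTS
    if found_fanart:
        score += FANART_POINTS
    score += ARTWORK_POINTS[artwork]
    return score


MAX_PER_TITLE = score_title(True, True, "banner")
MAX_TOTAL = MAX_PER_TITLE * TITLE_COUNT

print(MAX_PER_TITLE, MAX_TOTAL)  # 7 399
```

Summing score_title over every title in the archive would give a per-scraper total comparable to the figures reported in this thread.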

The strangest thing about the ANN scraper's behavior was that it always downloaded a thumb, even when I chose banner. Maybe I did something wrong, which would make the test results irrelevant.

I did a full archive scan for each scraper, completely removing the database before each scan.

Here are the results:

Fully automatic scan:
ANN: 302 points
TheTVDB.com: 323 points

With manual corrections:
ANN: 302 points
TheTVDB.com: 373 points

However, the ANN scraper automatically found 55/57 titles, whereas TheTVDB.com found only 47/57. And the main reason ANN got fewer points is that, as I mentioned before, it downloads only banners. So I hope it's a trivial bug or my fault.

Disclaimer:

First of all, the test titles are not ideally representative; it's just my 400 GB of anime.
Second, the points system is subjective and based on my tastes. If someone prefers banners over posters or thinks 3 points for fan art is overrated, the results will change significantly.

P.S. However, I think the ANN scraper is worth including in the official XBMC distribution. It provides the functionality needed to deal with specific media formats rather than generic TV shows, and it helps solve anime-specific problems that never come up when scraping generic TV shows.

P.P.S. Spreadsheet with comparison results: http://dl.dropbox.com/u/459039/comparison.xls
(This post was last modified: 2010-03-18 17:34 by Zarbis.)
volforto Offline
Junior Member
Posts: 17
Joined: Jan 2010
Reputation: 5
Post: #14
Thanks for the comparison, Zarbis :) It certainly looks useful. It's good to see the TV scraper can find most titles automatically (even some movie titles, by the look of it).

I am not too sure I understand the problem you mentioned. Are you saying you have selected the option "Enable TVDB Posters" and the scraper ended up getting banners?
Zarbis Offline
Junior Member
Posts: 17
Joined: Nov 2009
Reputation: 0
Post: #15
volforto Wrote:I am not too sure I understand the problem you mentioned. Are you saying you have selected the option "Enable TVDB Posters" and the scraper ended up getting banners?
Exactly, both "Enable TVDB ***" options are getting banners. I will try to reproduce that behavior, and if I succeed I will try to provide more information.

I confirm that I did something wrong: I rescanned my archive and got posters, not banners. The results are now:

Fully automatic scan:
ANN: 369 points
TheTVDB.com: 323 points

With manual corrections:
ANN: 369 points
TheTVDB.com: 373 points

The ANN scraper is now really close and found all 57 test titles. However, it looks strange that the ANN scraper didn't find posters for 10 titles that TheTVDB.com found.
Anyway, I personally would now use this scraper for anime content, because it gives much more relevant results without the need for manual corrections of silly search requests (e.g. To Aru -> Toaru and some more).

P.S. Updated table: http://dl.dropbox.com/u/459039/comparison.xls
(This post was last modified: 2010-03-19 15:11 by Zarbis.)
volforto Offline
Junior Member
Posts: 17
Joined: Jan 2010
Reputation: 5
Post: #16
Thanks for the update, Zarbis.

I can understand why the ANN scraper does not find some TVDB posters. It is because the "Include Alternative Titles..." option is only for the fanart. I did not include this option for the banners/posters since it might put too much stress on TVDB, as there was no way to cache those search pages.
lolfrk Offline
Junior Member
Posts: 2
Joined: Mar 2010
Reputation: 0
Post: #17
Oh, thank you, thank you, thank you. This just made my life SO much easier; having 600GB of anime was making it a ball buster to get it into XBMC.

THANKS again,

lolfrk
EMK0 Offline
Senior Member
Posts: 207
Joined: Oct 2008
Reputation: 0
Post: #18
I keep getting this for all my episodes:
DEBUG: could not enumerate file smb://192.168.1.121/Anime/Naruto/[ANBU-AonE]_Naruto_02_[F40F1F44].avi
What do I need to do to make this work?
Abnormal1 Offline
Junior Member
Posts: 44
Joined: Jul 2006
Reputation: 0
Post: #19
I am unable to get the Movie version of this scraper to work. In XBMC it comes up with "Could not download information". The TV Shows version DOES work.

Also, if I test both scrapers in ScraperXML Editor, it errors with:

could not download webpage
Error Returned:
Invalid URI: The URI scheme is not valid.
It displays the following as the search URL, and when I click validate XML it comes up with the error "'=' is an unexpected token. The expected token is ';'. Line 1, position 89."
<url>http://api.bing.com/xml.aspx?AppId=16E50AB9947899C41433EB944C60174737855036&Sources=web&xmltype=attributebased&Query=gantz%20+site%3Aanimenewsnetwork.com%2Fencyclopedia</url> 
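
One possible reading of that validation error: position 89 of the line above is the "=" right after "&Sources", which is the kind of complaint an XML parser makes when it meets a bare "&" and expects an entity reference ending in ";". The Python sketch below illustrates this under the assumption that the URL is embedded in XML exactly as shown. The AppId is swapped for a placeholder, and wrap_url and is_well_formed are hypothetical helpers, not part of the scraper or of ScraperXML Editor. It does not address the separate "Invalid URI" error.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Search URL from the post, with the AppId replaced by a placeholder.
SEARCH_URL = (
    "http://api.bing.com/xml.aspx?AppId=APPID&Sources=web"
    "&xmltype=attributebased"
    "&Query=gantz%20+site%3Aanimenewsnetwork.com%2Fencyclopedia"
)


def wrap_url(url, escape_entities):
    """Wrap a URL in a <url> element, optionally escaping &, < and >."""
    body = escape(url) if escape_entities else url
    return "<url>" + body + "</url>"


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(wrap_url(SEARCH_URL, escape_entities=False)))  # False
print(is_well_formed(wrap_url(SEARCH_URL, escape_entities=True)))   # True
```

With each "&" written as "&amp;", the element parses and its text content is the original URL again.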
EMK0 Offline
Senior Member
Posts: 207
Joined: Oct 2008
Reputation: 0
Post: #20
Figured it out. I added:
<advancedsettings>
  <tvshowmatching action="append">
    <regexp> ([0-9]+)</regexp>
    <regexp>_([0-9]+)</regexp>
    <regexp>-([0-9]+)</regexp>
  </tvshowmatching>
</advancedsettings>

Now it picks up everything :)
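
To see why the appended patterns help, here is a rough Python sketch that applies the three expressions to the file name from the earlier debug line. Python's re module is only a stand-in for XBMC's own matcher, so treat it as an approximation, and the helper name first_episode_match is made up for illustration.

```python
import re

# The three patterns appended in the advancedsettings snippet above.
# Each captures a run of digits preceded by a space, underscore, or hyphen.
PATTERNS = [r" ([0-9]+)", r"_([0-9]+)", r"-([0-9]+)"]


def first_episode_match(filename):
    """Return (pattern, captured digits) for the first pattern that matches, else None."""
    for pattern in PATTERNS:
        match = re.search(pattern, filename)
        if match:
            return pattern, match.group(1)
    return None


name = "[ANBU-AonE]_Naruto_02_[F40F1F44].avi"
print(first_episode_match(name))  # ('_([0-9]+)', '02')
```

For this file name only the underscore pattern matches, capturing 02 as the episode number. The space and hyphen patterns find nothing because no space or hyphen in the name is followed by a digit.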