[WIP] AniDB.net Anime Video Scraper

spiff Offline
Grumpy Bastard Developer
Posts: 12,187
Joined: Nov 2003
Reputation: 82
Post: #31
you're telling me this xml api doesn't even have a search?

the, uhm, for lack of another non-offensive term, choices of the anidb guys (slight offense only, ommina) truly amaze me.

Always read the XBMC online-manual, FAQ and search the forum before posting.
Do not e-mail XBMC-Team members directly asking for support. Read/follow the forum rules.
For troubleshooting and bug reporting please make sure you read this first.
(This post was last modified: 2010-01-14 03:42 by spiff.)
eldon Offline
Junior Member
Posts: 14
Joined: Jan 2010
Reputation: 0
Post: #32
well as far as i could tell the http api only provides access to limited per-anime data, provided you already know the anidb ID.

As for the search, they provide a daily copy of the anime titles in their database (around 2.2MB in its current state) and leave the client side to deal with it.

I understand from your answer that google will be the best choice for this scraper to identify the anime. Using the database could be left as an option, and/or further identification could be done by cross-referencing the id found on google with the title in the database matching that id.

i'll try to see if parsing the database is fast enough when you have the anime id and then maybe use it to double check google's result.
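
A minimal sketch of that double check, in Python purely for illustration. It assumes the titles dump is a list of anime elements carrying an "aid" attribute, each with one child element per title; the element names, the sample data and the function names (load_titles, confirm_id) are made up for this example and are not part of any anidb api.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for the daily titles dump; the element names are assumed.
SAMPLE_DUMP = """<animetitles>
  <anime aid="1"><title>Example Show</title><title>Ekuzanpuru</title></anime>
  <anime aid="2"><title>Another Series</title></anime>
</animetitles>"""


def load_titles(xml_text):
    """Map each anime id to the lowercased titles listed for it."""
    titles_by_id = {}
    for anime in ET.fromstring(xml_text):
        titles = [t.text.strip().lower() for t in anime if t.text]
        titles_by_id[anime.get("aid")] = titles
    return titles_by_id


def confirm_id(titles_by_id, candidate_id, query):
    """True if the query appears in any title stored under candidate_id."""
    needle = query.strip().lower()
    return any(needle in title for title in titles_by_id.get(candidate_id, []))


titles_by_id = load_titles(SAMPLE_DUMP)
print(confirm_id(titles_by_id, "1", "example show"))  # id found via google, confirmed
print(confirm_id(titles_by_id, "2", "example show"))  # id does not match the title
```

Building the id-to-titles table once and reusing it keeps each check to a dictionary lookup, so the cost of parsing the dump is paid only when the file is loaded.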

Besides that the scrapper looks fine, anidb api data is used and missing media or episode info are fetched from thetvdb api.

I still have to see how i can access the animes separately from the other tv shows in xbmc, i guess there are some shortcuts you can use.

I'll post the srapper over the weekend, thx again for your answer.
spiff Offline
Grumpy Bastard Developer
Posts: 12,187
Joined: Nov 2003
Reputation: 82
Post: #33
while running through 2.2mb shouldn't really take a minute with a tight expression, it still won't be nice. at least the scraper will operate nicely for those it can find with google, else there are always the url nfo's.
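
to give a rough idea of what a single-expression pass over the titles file involves, here is a sketch in Python. this is only an illustration and not the scraper framework's own format; the element layout, the sample data and the find_ids name are assumptions.

```python
import re

# Stand-in for the 2.2MB dump; layout and names are assumed.
SAMPLE_DUMP = """<animetitles>
<anime aid="1"><title>Example Show</title><title>Ekuzanpuru</title></anime>
<anime aid="2"><title>Another Series</title></anime>
<anime aid="3"><title>Example Show Second Season</title></anime>
</animetitles>"""


def find_ids(dump, searchterm):
    """Return the aid of every <anime> block with a title containing searchterm."""
    pattern = re.compile(
        r'<anime aid="(\d+)">'      # capture the id
        r'(?:(?!</anime>).)*?'      # skip ahead, but never past this block's end
        r'<title[^>]*>[^<]*' + re.escape(searchterm) + r'[^<]*</title>',
        re.IGNORECASE | re.DOTALL,
    )
    return pattern.findall(dump)


print(find_ids(SAMPLE_DUMP, "example"))  # ['1', '3']
```

the lookahead keeps a match from spilling over into the next anime block, which is what would otherwise attach a title to the wrong id.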

Der Idiot Offline
Junior Member
Posts: 1
Joined: Jan 2010
Reputation: 0
Post: #34
you can't be serious... regexp to parse xml... and all that using python?! *facepalm* every time i see people abusing regexp for everything and their dog i die a little bit inside...

there are various very fast xml frameworks for py. so why oh why don't you use one of them?!

cheap example:


Code:
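from lxml import etree

searchterm = "naruto"
result = []
# parse the local copy of the anidb titles dump
tree = etree.parse("animetitles.xml")

# each child of the root is one anime; its children are its titles
for anime in tree.getroot():
    for title in anime:
        # title.text is None for empty elements, so guard before lowercasing
        if title.text and searchterm in title.text.lower():
            result.append(anime.get("aid"))
            break  # one matching title per anime is enough

print(result)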

that's not even an attempt at being efficient or fast, but takes only a few ms for the 2mb file.

also the point is to grab the xml file and process it completely locally (and then re-fetch it every odd blue moon). that has speed advantages and doesn't cause additional load on anidb's server. aside from that, you can decide for yourself how you want to search.
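
a rough sketch of that "keep a local copy, re-fetch rarely" idea, again in python. the one-week interval, the function names (is_stale, refresh_if_stale) and the fetch hook are assumptions made for this example; the download address is left as a parameter because it is not given in this thread.

```python
import os
import time
import urllib.request

MAX_AGE_SECONDS = 7 * 24 * 60 * 60  # assumed refresh interval: one week


def is_stale(path, max_age=MAX_AGE_SECONDS, now=None):
    """True if the local copy is missing or older than max_age seconds."""
    if not os.path.exists(path):
        return True
    now = time.time() if now is None else now
    return now - os.path.getmtime(path) > max_age


def refresh_if_stale(path, url, fetch=None):
    """Download url to path only when the local copy is stale.

    Returns True if a download happened. `fetch` takes a URL and returns
    bytes; it defaults to a plain HTTP GET and exists so the download
    step can be swapped out.
    """
    if not is_stale(path):
        return False
    if fetch is None:
        fetch = lambda u: urllib.request.urlopen(u).read()
    data = fetch(url)
    with open(path, "wb") as handle:
        handle.write(data)
    return True
```

a client would call refresh_if_stale("animetitles.xml", titles_url) before searching, so most runs never touch the network at all.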

considering anidb is run completely free of cost, ads or money donations and is not backed by a multi-million corporation, it should be quite obvious that we have restricted facilities. so don't expect us to provide supreme additional services which the majority of our users will never ever use.
(This post was last modified: 2010-01-15 01:44 by Der Idiot.)
spiff Offline
Grumpy Bastard Developer
Posts: 12,187
Joined: Nov 2003
Reputation: 82
Post: #35
uhm. i don't want or intend to pick a fight. but who gave you the idea python is involved? it is not. we use a custom scraper framework based on regexps+xml.
i agree that using regexp to parse xml is stupid in general, but i made the choice since it allows handling html/xml/whatever text format in one unified framework, and .py's were just too slow for the job on the xbox.

from my point of view you would save bandwidth by offering a search. i would have to do a helluvalot of searches to match the 2mb i now download from your site. this is obviously not valid if cpu is your concern.

of course i do not expect anything from a free service based on volunteer work. i just find the choice of offering data, yet making no search available, hence forcing every client to reinvent the wheel, peculiar. to be a bit cheeky, i would like to point out that both tmdb and thetvdb fall into exactly the same category (free, voluntary) and both survive just fine having a search in their api :P

in either case, your services are inadequate/unfit for use by our system, which is unfortunate for those of our users that care (i am not included). they will have to live with the not-always-functioning google search.

eldon Offline
Junior Member
Posts: 14
Joined: Jan 2010
Reputation: 0
Post: #36
well no matter how the client side handles things, anidb provides it with 95% of the data required to get something out of it and i guess that's good enough.

If google fails, the xml titles database can be used as a failsafe, moreover you can tweak google search by using "manual" inputs in xbmc.
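
As a small sketch of that fallback order, in Python for illustration only: every name here (identify, google_lookup, local_lookup, LOCAL_TITLES) is a hypothetical stand-in, and the stub lookups return canned values instead of doing a real search.

```python
def identify(title, lookups):
    """Try each lookup in order and return the first anime id found, or None."""
    for lookup in lookups:
        anime_id = lookup(title)
        if anime_id is not None:
            return anime_id
    return None


# Hypothetical stand-ins: a web search that finds nothing, and a local
# titles table built from the daily dump.
def google_lookup(title):
    return None


LOCAL_TITLES = {"example show": "1"}


def local_lookup(title):
    return LOCAL_TITLES.get(title.strip().lower())


print(identify("Example Show", [google_lookup, local_lookup]))  # 1
```

A manually entered id could be put first in the list in the same way, so it always wins over both searches.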

If it can be of any help i'm not sure some of the data present in the current http api is really necessary, such as full tags and categories descriptions when the anime cast is missing.

Anyways i'll make sure the scraper can handle both google and the local database.

Anidb is a truly great website and i can only encourage the devs and admins to keep up the good work.
Xeijin Offline
Member
Posts: 62
Joined: Jul 2007
Reputation: 0
Post: #37
How is this going? Any more progress yet or has the lack of features with the API put a stop to this one? TVDB seems unable to scrape my Anime (episodes) so I'd definitely be interested in this.
eldon Offline
Junior Member
Posts: 14
Joined: Jan 2010
Reputation: 0
Post: #38
hi sorry for the late update, i couldn't find the time to make more tests with the anime database so i'm still only using google at the moment.

If i can't find time to update it with additional anidb database processing, i'll post it in its current state this weekend. It's working without the db, but relying only on google can't be considered safe in the long term.
mashles Offline
Junior Member
Posts: 7
Joined: Jan 2010
Reputation: 0
Post: #39
Have you found any way of separating Anime from regular TV Shows?

I tried Anime in my library using thetvdb scraper, it wasn't too bad but I will be sure to give your scraper a try. It did get a bit annoying though, having them so mixed, so I just switched to using library for TV and Movies and file view for Anime.
Xeijin Offline
Member
Posts: 62
Joined: Jul 2007
Reputation: 0
Post: #40
mashles Wrote:Have you found any way of separating Anime from regular TV Shows?

I tried Anime in my library using thetvdb scraper, it wasn't too bad but I will be sure to give your scraper a try. It did get a bit annoying though, having them so mixed, so I just switched to using library for TV and Movies and file view for Anime.

I was also looking for a way to do this as I too find it annoying when the two are mixed. Currently I use the "genre" filter in the Aeon skin (you can get to it by hitting the Down directional key on your remote while TV Shows is highlighted) to show just tv-shows tagged with "Anime" or "Animation". I think a "proper" solution would maybe require some change of code at xbmc's level in order to create a new category for anime (i.e. Movies, TV Shows, Anime) which would have its own associated scrapers? (Or I may just be talking out of my arse).

Edit: Just noticed your reply eldon, looking forward to it, I can understand that the google search method is not ideal but frankly TVDB is failing so severely at recognising any episodes of my Anime that I'm willing to take the risk.