[WIP] AniDB.net Anime Video Scraper - Printable Version
XBMC Community Forum (http://forum.xbmc.org), Forum: Metadata scrapers
Thread: [WIP] AniDB.net Anime Video Scraper (/showthread.php?tid=64587)
[WIP] AniDB.net Anime Video Scraper - MukiDA - 2009-12-20 07:10

I can't, for the life of me, figure out what's wrong with my scraper, though my initial guess is that I'm not using the "<url>" tag properly. With that in mind, I don't suppose I could ask for a smidgen of help? I've been working on it with GVim and ScraperXML Editor, and the following comes up:
Code:
<?xml version="1.0" encoding="utf-8"?>

- Nicezia - 2009-12-20 13:06

as far as i know the only available attributes for the url tag are:
1. spoof="foo" (the referer page)
2. post="yes" (tells the http api to use the POST method; only used when calling a custom function in GetDetails, GetSettings, or custom functions)
3. function="CustomFunctionName"

I could be wrong on this, considering i don't fully understand the http handling in XBMC, but i don't think there is a gzip="yes" option for url (i think xbmc automatically detects gzipped sites and decompresses them).

- spiff - 2009-12-20 21:14

there is indeed a gzip="yes" parameter to enable gzipped content. this is useful in the cases where it's not set explicitly in the http headers. when the headers do set it, it's handled automagically by curl.

Hey spiff! - MukiDA - 2009-12-20 21:50

Is there any other clear mistake in my code, then? I can't figure out what's wrong, and AFAIK there are no testing programs out there (the only two I know about have the above-listed problems that prevent me from performing a complete search test).

Also, is there a way to specify "ignored" portions of my search query? At least for my specific use, I want it to ignore anything in the folder name inside parentheses.

- Nicezia - 2009-12-20 21:56

spiff Wrote: there is indeed a gzip="yes" parameter to enable gzipped content. this is useful in the cases where it's not set explicitly in the http headers. when the headers do set it, it's handled automagically by curl.

good to know, i guess that's something i need to code for

w00t, another dev! =3 - MukiDA - 2009-12-20 22:20

I didn't know my version of the editor was so far behind. (caught the 3.5 link in your sig)

- MukiDA - 2009-12-31 19:25

Okay, now it almost works! This gets the thumbnail, year, title, and rating. Herein lie the two questions I have for any other scraper writers:
1. How do I add episode titles?
2. How do I add "excluded" terms for the search query?
I place format info (codec, audio tracks) in parentheses in the folder name, so at least for the version of the scraper I keep personally, I'd like to remove all of that from the text that goes to the search query.

2.a. I tried adding the expression:

Code:
<CreateSearchUrl dest="3">([^)(]+)\([a-zA-Z0-9]+\)+ ([^)(]+(?=(?:\([a-zA-Z0-9]+\))+))

Edit note: Okay, scrap.exe works (as far as getting a useful search URL goes; obviously the gzip's a no-go) with: ([^)(]+)

What am I doing wrong?! Is there ANY way to figure out WTF XBMC is doing with this expression?! >_<

Code:
<?xml version="1.0" encoding="utf-8"?><scraper framework="1" date="2009-11-15" name="AniDB.net" content="tvshows" thumb="anidb.jpg" language="en">

EDIT: Solved how to deal with a single search result that defaults to the info page. Just add a regex to "getsearchresults" that looks specifically for the info page. Not sure what to do if the info page doesn't have a link to itself like it does on AniDB, tho.

- spiff - 2010-01-01 14:50

$#pages+1 holds the url to the page that is scraped, for this very reason. cleaning filenames has nothing to do with the scraper. see <cleanstrings> or something like that in advancedsettings.xml. no idea why those expressions don't work and way too january 1. in my head atm
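To check the logic of an expression like the ones above outside XBMC, a small standalone script can help. The following is only a sketch in Python: Python's `re` module is not the regex engine XBMC uses, so a pattern that behaves here may still behave differently in a scraper or in advancedsettings.xml. The function names and the sample folder name are made up for illustration.

```python
import re

# Hypothetical helpers for experimenting outside XBMC; not part of XBMC,
# the scraper, or scrap.exe.

# Matches one parenthesised group such as "(x264)" or "(Dual Audio)",
# together with any whitespace directly in front of it.
PAREN_GROUP = re.compile(r"\s*\([^()]*\)")

# The simple expression from the post: everything before the first parenthesis.
LEADING_TEXT = re.compile(r"([^)(]+)")


def strip_parenthesised(name: str) -> str:
    """Remove every "(...)" group and collapse leftover whitespace."""
    cleaned = PAREN_GROUP.sub("", name)
    return " ".join(cleaned.split())


def leading_text(name: str) -> str:
    """Return the text before the first parenthesis, or "" if there is none."""
    match = LEADING_TEXT.match(name)
    return match.group(1).strip() if match else ""


if __name__ == "__main__":
    # Made-up folder name in the style described in the post.
    folder = "Cowboy Bebop (x264)(AAC) (Dual Audio)"
    print(strip_parenthesised(folder))
    print(leading_text(folder))
```

The first helper removes tags anywhere in the name, while the second keeps only what precedes the first parenthesis, which is what the working ([^)(]+) expression does.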
- eldon - 2010-01-01 20:48

hi, i'm trying to improve that anidb scraper a bit and was wondering if the wiki regex info is still correct or not. i'm on linux using xbmc 9.11 and it looks like regex laziness does work, although the wiki states it doesn't. Can i assume it is now working and platform independent, or should i stick to painful laziness-free regex? And do you have hints on the current regex version, and its limitations, running in xbmc, in order to clear things up for my quick shot at this scraper? thx

- zosky - 2010-01-01 23:06

MukiDA, props for your effort. i've been dreaming of a link between aniDB and xbmc for a while. out of ~43 (on my NAS) theTVdb has found all but 1. i begrudgingly added it a few days back, but this makes adding new stuff a MASSIVE effort & now i have another (missing from theTVdb).

once you're ready to beta can i give it a shot? (i'm afraid that's as helpful as i can be in this situation)
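On eldon's question about lazy versus laziness-free expressions: a common way to avoid depending on lazy quantifiers is to use a negated character class instead. The sketch below compares the two styles in Python. It says nothing about what XBMC 9.11 actually supports on any platform, since Python's `re` module is a different engine, and the HTML fragment is made up rather than taken from AniDB.

```python
import re

# Made-up HTML fragment for illustration; not real AniDB markup.
HTML = '<a href="/a1">First</a> <a href="/a2">Second</a>'

# Lazy quantifiers: stop at the first possible closing quote / tag.
LAZY = re.compile(r'<a href="(.*?)">(.*?)</a>')

# Laziness-free equivalent: negated character classes cannot run past
# the closing quote or the next "<", so no lazy quantifier is needed.
NO_LAZY = re.compile(r'<a href="([^"]*)">([^<]*)</a>')

# Plain greedy version, shown for contrast: it swallows both links at once.
GREEDY = re.compile(r'<a href="(.*)">(.*)</a>')


def links(pattern, html):
    """Return a list of (url, text) tuples found by the given pattern."""
    return pattern.findall(html)


if __name__ == "__main__":
    print(links(LAZY, HTML))
    print(links(NO_LAZY, HTML))
    print(links(GREEDY, HTML))
```

For this input the lazy and laziness-free patterns find the same two links, while the greedy pattern collapses them into a single match, which is the failure that either approach is meant to prevent.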