I've started a project similar to ScraperXML, but in Python, and the goal is compatibility with Dharma+ addons.
However, all the information about scraper development is kind of (well, that's an understatement) outdated, or perhaps I've missed something?
I'm trying to reverse engineer the scrapers included in the Dharma release, but I'm getting very confused. Is there -any- information on how the Dharma engine works with scrapers?
Perhaps a flowchart?
the code. see addons/Scraper.cpp and video/VideoInfoDownloader.cpp.
Oh, my C/C++ is very rusty. This will be interesting.
-Z
Ok, I've built an addon class that builds a stack of all the functions from its own addon and from its dependencies.
I have a couple of questions though:
* The buffers have 20 slots. Is there a local set of buffers in every function, or one global set?
* A snippet from tmdb.xml:
Quote:
<CreateSearchUrl dest="3">
<RegExp input="$$1" output="<url>http://api.themoviedb.org/2.1/Movie.search/$INFO[language]/xml/57983e31fb435df4df77afb854740ea9/\1</url>" dest="3">
<RegExp input="$$2" output="+\1" dest="4">
<expression clear="yes">(.+)</expression>
</RegExp>
<expression noclean="1"/>
</RegExp>
</CreateSearchUrl>
The basics are simple:
- Run the regex against buffer 1 (the input), fill the match into output, and put the result in buffer 3.
However, since there is a nested RegExp, should I run its regex on the parent's buffer? And if so, should I do it before or after I've applied the parent's regex?
the buffers are global to the parser. if you dig a bit you'll see the 'clearbuffers=no' tag. that's a way to pass information between functions.
expressions are evaluated in a LIFO/depth-first fashion, i.e. dig into the deepest one and evaluate that first.
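For anyone porting this to Python, here is a minimal sketch of the depth-first order described above. It is not a port of Scraper.cpp: the helper names (`expand`, `fill`, `evaluate`, `run_function`), the example URL, the default pattern for an empty expression and the handling of a trailing `+` in `dest` are all assumptions made for illustration, and Python's `re` is not guaranteed to behave like the regex engine XBMC uses. The snippet is a simplified, well-formed variant of the one quoted earlier, with the inner result referenced through `$$4`.

```python
import re
import xml.etree.ElementTree as ET

NUM_BUFFERS = 20  # slots $$1..$$20; index 0 is left unused


def expand(text, buffers):
    """Replace every $$N reference with the content of buffer N."""
    return re.sub(r"\$\$(\d+)", lambda m: buffers[int(m.group(1))], text)


def fill(template, match, buffers):
    """Substitute capture-group and buffer references in a single pass."""
    def repl(token):
        if token.group(1) is not None:  # backslash-digit: capture group
            n = int(token.group(1))
            return (match.group(n) or "") if n <= match.re.groups else ""
        return buffers[int(token.group(2))]  # $$N: buffer content
    return re.sub(r"\\(\d)|\$\$(\d+)", repl, template)


def evaluate(node, buffers):
    """Evaluate nested <RegExp> children first, then the node itself."""
    for child in node.findall("RegExp"):
        evaluate(child, buffers)
    if node.tag != "RegExp":
        return  # the function element only carries dest
    expr = node.find("expression")
    # Assumption: an empty <expression/> captures the whole input.
    pattern = expr.text if expr is not None and expr.text else "(.*)"
    source = expand(node.get("input", "$$1"), buffers)
    match = re.search(pattern, source, re.DOTALL)
    if match is None:
        return
    dest = node.get("dest", "1")
    index = int(dest.rstrip("+"))
    out = fill(node.get("output", r"\1"), match, buffers)
    # Assumption: a trailing "+" on dest means append instead of overwrite.
    buffers[index] = buffers[index] + out if dest.endswith("+") else out


def run_function(xml_text, *args):
    """Load args into $$1, $$2, ... and return the function's dest buffer."""
    root = ET.fromstring(xml_text)
    buffers = [""] * (NUM_BUFFERS + 1)
    for i, value in enumerate(args, start=1):
        buffers[i] = value
    evaluate(root, buffers)
    return buffers[int(root.get("dest", "1").rstrip("+"))]


# Hypothetical snippet: example.invalid stands in for the real API URL.
SNIPPET = r"""<CreateSearchUrl dest="3">
  <RegExp input="$$1" output="&lt;url&gt;http://example.invalid/search/\1$$4&lt;/url&gt;" dest="3">
    <RegExp input="$$2" output="+\1" dest="4">
      <expression>(.+)</expression>
    </RegExp>
    <expression/>
  </RegExp>
</CreateSearchUrl>"""

print(run_function(SNIPPET, "Alien", "1979"))
```

The inner RegExp runs first and fills buffer 4 from its own input (`$$2`), so by the time the outer RegExp builds its output, `$$4` already holds the value it needs.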
spiff Wrote: the buffers are global to the parser. if you dig a bit you'll see the 'clearbuffers=no' tag. that's a way to pass information between functions.
But if the buffers are global to the scraper, why is 'clearbuffers=no' needed? When do the buffers get cleared?
spiff Wrote: expressions are evaluated in a LIFO/depth-first fashion, i.e. dig into the deepest one and evaluate that first.
Alright, I thought so, thanks.
Thanks for the info
-Z
by default, if that tag isn't set, you clear the buffers at the end of a function call (or well, somewhere before the next function is called, but logic-wise it's easiest to do it at the end of an evaluation).
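A rough Python sketch of that clearing rule, for reference. The `Parser` class, its `call` method and the `evaluate` callback are hypothetical names for this illustration, not anything taken from the XBMC source, and the sketch assumes the setting is read as a `clearbuffers` attribute on the function element.

```python
import xml.etree.ElementTree as ET

NUM_BUFFERS = 20  # slots $$1..$$20; index 0 is left unused


class Parser:
    """Holds the one set of buffers shared by all functions of a scraper."""

    def __init__(self, evaluate):
        self.buffers = [""] * (NUM_BUFFERS + 1)
        # Hypothetical callback (node, buffers) -> None that runs the RegExp tree.
        self.evaluate = evaluate

    def call(self, function_node, *args):
        # Arguments are loaded into $$1, $$2, ... before evaluation.
        for i, value in enumerate(args, start=1):
            self.buffers[i] = value
        self.evaluate(function_node, self.buffers)
        result = self.buffers[int(function_node.get("dest", "1").rstrip("+"))]
        # Default: wipe all buffers once the function has been evaluated.
        # clearbuffers="no" keeps them so the next function can read them.
        if function_node.get("clearbuffers", "yes") != "no":
            self.buffers = [""] * (NUM_BUFFERS + 1)
        return result


def fake_evaluate(node, buffers):
    """Stand-in for real evaluation: writes the result and a side note."""
    buffers[int(node.get("dest"))] = "result of " + node.tag
    buffers[9] = "note from " + node.tag


parser = Parser(fake_evaluate)
keep = ET.fromstring('<GetSearchResults dest="8" clearbuffers="no"/>')
wipe = ET.fromstring('<GetDetails dest="3"/>')

parser.call(keep, "some input")
print(repr(parser.buffers[9]))  # still set, because clearbuffers="no"

parser.call(wipe, "some input")
print(any(parser.buffers))  # False: everything was cleared after the call
```

Reading the result before clearing matters here: the dest buffer is one of the global slots, so it would be wiped along with the rest.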