Kodi Community Forum
Python scraper - Printable Version


Python scraper - ztripez - 2011-04-07

I've started a project similar to ScraperXML, but in Python, and the goal is compatibility with Dharma+ addons.
However, all the information about scraper development is kind of outdated (well, that's an understatement), or perhaps I've missed something?

I'm trying to reverse engineer the scrapers that are included in the Dharma release, but I'm getting very confused. Is there -any- information on how the Dharma engine works with scrapers?

Perhaps a flowchart? :P


- spiff - 2011-04-07

code. see addons/Scraper.cpp, and video/VideoInfoDownloader.cpp


- ztripez - 2011-04-07

Oh, my C/C++ is very rusty. This will be interesting :P

-Z


- ztripez - 2011-04-07

I've put the project up on GitHub. There's not much yet since I started today, but here it is anyway.

https://github.com/ztripez/pyScraper


- ztripez - 2011-04-08

OK, I've built an addon class that builds a stack with all the functions from its addon and from its dependencies.

I have a couple of questions though:

* There are 20 buffer slots. Does every function get its own local set of buffers, or is there one global set?


* A snippet from tmdb.xml:
Quote:
<CreateSearchUrl dest="3">
    <RegExp input="$$1" output="<url>http://api.themoviedb.org/2.1/Movie.search/$INFO[language]/xml/57983e31fb435df4df77afb854740ea9/\1</url>" dest="3">
        <RegExp input="$$2" output="+\1" dest="4">
            <expression clear="yes">(.+)</expression>
        </RegExp>
        <expression noclean="1"/>
    </RegExp>
</CreateSearchUrl>

The basics are simple:
- Use buffer 1 as the source, run the regex on it, fill the matches into output, and put the result in buffer 3.
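
A minimal Python sketch of that single step, under stated assumptions: the helper name apply_regexp, the 21-element list (index 0 unused) and the example.invalid URL are hypothetical, and "no match leaves the destination untouched" is a guess, not something confirmed by the Dharma source.

```python
import re

NUM_BUFFERS = 20  # slots 1..20; index 0 is unused padding


def apply_regexp(buffers, input_ref, expression, output, dest):
    """Run one RegExp step: read input, match, fill output, store in dest."""
    # Replace every $$n in the input attribute with the content of buffer n.
    source = re.sub(r"\$\$(\d+)", lambda m: buffers[int(m.group(1))], input_ref)
    match = re.search(expression, source)
    if match is None:
        return  # assumption: no match leaves the destination untouched
    # Replace \1, \2, ... in the output template with the captured groups.
    buffers[dest] = re.sub(
        r"\\(\d)", lambda m: match.group(int(m.group(1))) or "", output
    )


buffers = [""] * (NUM_BUFFERS + 1)
buffers[1] = "Fight Club"
apply_regexp(buffers, "$$1", "(.+)", r"<url>http://example.invalid/search/\1</url>", 3)
print(buffers[3])  # <url>http://example.invalid/search/Fight Club</url>
```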

However, since there is a nested RegExp, should I run the nested regex on the parent's buffer? And if so, should I do it before or after I've applied the parent's regex?


- spiff - 2011-04-08

the buffers are global to the parser. if you dig a bit you'll see the 'clearbuffers=no' tag. that's a way to pass information between functions.

expressions are evaluated in a LIFO/depth-first fashion, i.e. dig into the deepest one and evaluate that first.
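
A rough Python sketch of that evaluation order, not the actual Scraper.cpp logic. Assumptions: the function names expand and evaluate are made up, an empty expression is treated as "(.*)", the clear/noclean attributes are ignored, and the XML below is a simplified stand-in rather than the real tmdb.xml.

```python
import re
import xml.etree.ElementTree as ET


def expand(text, buffers):
    """Replace every $$n with the content of buffer n."""
    return re.sub(r"\$\$(\d+)", lambda m: buffers[int(m.group(1))], text)


def evaluate(node, buffers):
    """Evaluate nested RegExp children first, then the node itself."""
    for child in node.findall("RegExp"):
        evaluate(child, buffers)  # deepest expressions run first
    if node.tag != "RegExp":
        return  # the function element only holds RegExp children
    # Assumption: an empty <expression/> behaves like "(.*)".
    pattern = (node.findtext("expression") or "").strip() or "(.*)"
    source = expand(node.get("input", "$$1"), buffers)
    match = re.search(pattern, source)
    if match is None:
        return
    # Fill \1, \2, ... in the output template with the captured groups.
    result = re.sub(
        r"\\(\d)", lambda m: match.group(int(m.group(1))) or "", node.get("output", "")
    )
    buffers[int(node.get("dest", "1"))] = result


SNIPPET = r"""
<CreateSearchUrl dest="3">
  <RegExp input="$$4" output="outer(\1)" dest="3">
    <RegExp input="$$1" output="inner(\1)" dest="4">
      <expression>(.+)</expression>
    </RegExp>
    <expression/>
  </RegExp>
</CreateSearchUrl>
"""

buffers = [""] * 21  # slots 1..20; index 0 unused
buffers[1] = "abc"
evaluate(ET.fromstring(SNIPPET), buffers)
print(buffers[3])  # outer(inner(abc))
```

The outer RegExp reads buffer 4, which only has content because the inner one ran first; evaluating the parent first would give "outer()" instead.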


- ztripez - 2011-04-08

spiff Wrote: the buffers are global to the parser. if you dig a bit you'll see the 'clearbuffers=no' tag. that's a way to pass information between functions.
But if the buffers are global to the scraper, why is 'clearbuffers=no' needed? When do they get cleared?

spiff Wrote: expressions are evaluated in a LIFO/depth-first fashion, i.e. dig into the deepest one and evaluate that first.
Alright, I thought so, thanks.


Thanks for the info
-Z


- spiff - 2011-04-08

by default, if that tag isn't set, you clear the buffers at the end of a function call (or well, somewhere before the next function is called, but logic-wise it's easiest to do it at the end of an evaluation).
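
In Python that could look roughly like the sketch below. It assumes clearbuffers is read as an attribute on the function element and that the function's result is whatever sits in its dest buffer; run_function, fake_body and the evaluate callback are hypothetical names, not anything from the XBMC source.

```python
import xml.etree.ElementTree as ET

NUM_BUFFERS = 20  # slots 1..20; index 0 is unused padding


def run_function(func_node, buffers, evaluate):
    """Run one scraper function and clear the shared buffers afterwards."""
    evaluate(func_node, buffers)  # caller-supplied RegExp evaluator
    result = buffers[int(func_node.get("dest", "1"))]
    # Default: wipe every slot so the next function starts clean.
    # clearbuffers="no" keeps the contents for the next function.
    if func_node.get("clearbuffers", "yes") != "no":
        for slot in range(1, NUM_BUFFERS + 1):
            buffers[slot] = ""
    return result


def fake_body(node, buffers):
    """Stand-in for real evaluation: writes a result and a side value."""
    buffers[3] = "result"
    buffers[5] = "side value"


buffers = [""] * (NUM_BUFFERS + 1)
run_function(ET.fromstring('<GetDetails dest="3"/>'), buffers, fake_body)
print(repr(buffers[5]))  # ''

run_function(ET.fromstring('<GetDetails dest="3" clearbuffers="no"/>'), buffers, fake_body)
print(repr(buffers[5]))  # 'side value'
```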


- ztripez - 2011-04-08

Alright, thanks


- fastestcomputer - 2011-04-15

Thanks for the info