Python scraper

ztripez Offline
Junior Member
Posts: 48
Joined: May 2008
Reputation: 0
Post: #1
I've started a project similar to ScraperXML, but in Python, and the goal is compatibility with Dharma+ addons.
However, all the information about scraper development is kind of outdated (well, that's an understatement), or perhaps I've missed something?

I'm trying to reverse engineer the scrapers that are included in the Dharma release, but I'm getting very confused. Is there -any- information on how the Dharma engine works with scrapers?

Perhaps a flowchart?
spiff Online
Grumpy Bastard Developer
Posts: 12,179
Joined: Nov 2003
Reputation: 82
Post: #2
code. see addons/Scraper.cpp, and video/VideoInfoDownloader.cpp

Always read the XBMC online-manual, FAQ and search the forum before posting.
Do not e-mail XBMC-Team members directly asking for support. Read/follow the forum rules.
For troubleshooting and bug reporting please make sure you read this first.
ztripez Offline
Junior Member
Posts: 48
Joined: May 2008
Reputation: 0
Post: #3
Oh, my C/C++ is very rusty. This will be interesting.

-Z
ztripez Offline
Junior Member
Posts: 48
Joined: May 2008
Reputation: 0
Post: #4
I've put the project up in a Git repository on GitHub. There's not much yet since I started today, but here it is anyway.

https://github.com/ztripez/pyScraper
ztripez Offline
Junior Member
Posts: 48
Joined: May 2008
Reputation: 0
Post: #5
OK, I've built an addon class that builds a stack with all the functions from its addon and from its dependencies.

I have a couple of questions though:

* The buffers have 20 slots. Is there a local set of buffers in every function, or is there one global set?


* A snippet from tmdb.xml:
Quote:
<CreateSearchUrl dest="3">
  <RegExp input="$$1" output="<url>http://api.themoviedb.org/2.1/Movie.search/$INFO[language]/xml/57983e31fb435df4df77afb854740ea9/\1</url>" dest="3">
    <RegExp input="$$2" output="+\1" dest="4">
      <expression clear="yes">(.+)</expression>
    </RegExp>
    <expression noclean="1"/>
  </RegExp>
</CreateSearchUrl>

The basics are simple:
- Use buffer 1 as the source, apply the regex to it, substitute the matches into output, and put the result in buffer 3.
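
The single step described above can be sketched in Python roughly as follows. This is only an illustration under assumptions, not the XBMC implementation: `apply_regexp` is a hypothetical helper, the example URL is made up, and `$INFO[...]` substitution, URL encoding and the `clear`/`noclean` attributes are left out.

```python
import re

NUM_BUFFERS = 20  # slot count mentioned in the thread


def apply_regexp(buffers, input_idx, expression, output, dest_idx):
    """Hypothetical helper: run one RegExp step against the buffers."""
    match = re.search(expression, buffers[input_idx], re.DOTALL)
    if match is None:
        return False  # no match: leave the destination untouched

    def fill(ref):
        # Swap \1..\9 in the template for the corresponding capture group.
        n = int(ref.group(1))
        value = match.group(n) if n <= match.re.groups else None
        return value or ""

    buffers[dest_idx] = re.sub(r"\\(\d)", fill, output)
    return True


# Index 0 is unused so that buffer N lives at buffers[N].
buffers = [""] * (NUM_BUFFERS + 1)
buffers[1] = "The Matrix"
apply_regexp(buffers, 1, r"(.+)", r"<url>http://example.invalid/search/\1</url>", 3)
print(buffers[3])
```

Passing a function to `re.sub` keeps the captured text literal, so backslashes in the scraped data are not reinterpreted as group references.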

However, since there is a nested RegExp, should I run its regex on the parent's buffer? And if so, should I do it before or after I've applied the parent's regex?
spiff Online
Grumpy Bastard Developer
Posts: 12,179
Joined: Nov 2003
Reputation: 82
Post: #6
the buffers are global to the parser. if you dig a bit you'll see the 'clearbuffers=no' tag. that's a way to pass information between functions.

expressions are evaluated in a LIFO, depth-first fashion, i.e. dig into the deepest one and evaluate that first.
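
A minimal Python sketch of that evaluation order, using `xml.etree.ElementTree`. Several things here are assumptions made for illustration: the `evaluate`, `resolve` and `group_or_empty` helpers are hypothetical, an empty `<expression/>` is treated as `(.*)`, `$$N` inside `output` is expanded to the contents of buffer N, and the sample XML escapes `<url>` as `&lt;url&gt;` so that it parses as well-formed XML.

```python
import re
import xml.etree.ElementTree as ET

SAMPLE = r"""
<CreateSearchUrl dest="3">
  <RegExp input="$$1" output="&lt;url&gt;search/\1$$4&lt;/url&gt;" dest="3">
    <RegExp input="$$2" output="+\1" dest="4">
      <expression>(.+)</expression>
    </RegExp>
    <expression/>
  </RegExp>
</CreateSearchUrl>
"""


def resolve(text, buffers):
    """Replace $$N references with the contents of buffer N."""
    return re.sub(r"\$\$(\d+)", lambda m: buffers[int(m.group(1))], text)


def group_or_empty(match, n):
    """Return capture group n, or an empty string if it is missing."""
    return (match.group(n) or "") if n <= match.re.groups else ""


def evaluate(node, buffers, order):
    # Depth-first: nested RegExp children run before this node's own expression.
    for child in node.findall("RegExp"):
        evaluate(child, buffers, order)
    if node.tag != "RegExp":
        return
    expr = node.find("expression")
    pattern = expr.text if expr is not None and expr.text else "(.*)"
    source = resolve(node.get("input", "$$1"), buffers)
    match = re.search(pattern, source, re.DOTALL)
    order.append(node.get("dest"))  # record the order for inspection
    if match is None:
        return
    out = resolve(node.get("output", ""), buffers)
    out = re.sub(r"\\(\d)", lambda m: group_or_empty(match, int(m.group(1))), out)
    buffers[int(node.get("dest"))] = out


buffers = [""] * 21  # index 0 unused so buffer N is buffers[N]
buffers[1], buffers[2] = "matrix", "1999"
order = []
evaluate(ET.fromstring(SAMPLE), buffers, order)
print(order, buffers[3])
```

The inner RegExp fills buffer 4 first, so the outer one can already reference `$$4` when it builds its own output.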
ztripez Offline
Junior Member
Posts: 48
Joined: May 2008
Reputation: 0
Post: #7
spiff Wrote: the buffers are global to the parser. if you dig a bit you'll see the 'clearbuffers=no' tag. that's a way to pass information between functions.
But if the buffers are global to the scraper, why is 'clearbuffers=no' needed? When do they get cleared?

spiff Wrote: expressions are evaluated in a LIFO, depth-first fashion, i.e. dig into the deepest one and evaluate that first.
Alright, I thought so, thanks.


Thanks for the info
-Z
spiff Online
Grumpy Bastard Developer
Posts: 12,179
Joined: Nov 2003
Reputation: 82
Post: #8
by default, if that tag isn't set, the buffers are cleared at the end of a function call (or, well, somewhere before the next function is called, but logic-wise it's easiest to do it at the end of an evaluation).
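
As a rough Python sketch of that behaviour: the `Parser` class, its `call` method and the `clear_buffers` parameter are hypothetical names, and a plain Python callable stands in for a scraper function. The only things taken from the thread are that the buffers are shared by the whole parser and that they are wiped after a call unless 'clearbuffers=no' is set.

```python
NUM_BUFFERS = 20


class Parser:
    def __init__(self):
        # One global set of buffers shared by every scraper function.
        self.buffers = [""] * (NUM_BUFFERS + 1)

    def call(self, func, clear_buffers=True):
        """Run one scraper function, then clear the buffers unless told not to."""
        result = func(self.buffers)
        if clear_buffers:
            self.buffers = [""] * (NUM_BUFFERS + 1)
        return result


def remember_id(buffers):
    # Stand-in for a function that leaves a value behind for the next call.
    buffers[5] = "12345"
    return "ok"


parser = Parser()
parser.call(remember_id, clear_buffers=False)  # like clearbuffers="no"
kept = parser.buffers[5]
parser.call(lambda buffers: None)  # default: buffers are wiped afterwards
cleared = parser.buffers[5]
print(repr(kept), repr(cleared))
```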
ztripez Offline
Junior Member
Posts: 48
Joined: May 2008
Reputation: 0
Post: #9
Alright, thanks
fastestcomputer Offline
Junior Member
Posts: 1
Joined: Apr 2011
Reputation: 0
Post: #10
Thanks for the info