Any plans for reformatting the Scraper XML format any time soon?

althekiller
Team-XBMC Developer
Posts: 4,703
Joined: May 2004
Reputation: 12
Post: #11
May I suggest changing the name to "ScrapeMe"? :)

No sense in promoting all the uneducated folk calling them "scrappers" in the forums.
Nicezia
Fan
Posts: 369
Joined: Nov 2006
Reputation: 0
Location: Montgomery, Alabama
Post: #12
The name is ScrapeMe. That was just a typo when I created the project, and I've been too busy coding to go back and fix it; as you'll notice, the name is correct in the actual running window. (I got the name because I was listening to Nirvana's "Rape Me" when I came up with the idea!)
Nicezia
Fan
Posts: 369
Joined: Nov 2006
Reputation: 0
Location: Montgomery, Alabama
Post: #13
Okay, I have a few questions about how the scraper XML communicates with XBMC.

When it executes nested statements, does it start from the most deeply nested RegExp or from the outer RegExp?

I tried reading the C++ source code, but it's a bit complex for me to read, with lots of stuff I still don't understand.

I've already got the tester working at the root-level RegExp and have managed to run an entire scraper through the tester all the way (a really simple one I wrote, with no custom functions and no nested expressions).

Code:
<RegExpA>
    <RegExpB>
        <RegExpC>
            <expression/>
        </RegExpC>
        <expression/>
    </RegExpB>
    <expression/>
</RegExpA>

Would A or C be the first to execute?

And are there 9 buffers in total for saving data, or 9 buffers per expression?
(This post was last modified: 2009-04-26 08:26 by Nicezia.)
spiff
Grumpy Bastard Developer
Posts: 12,181
Joined: Nov 2003
Reputation: 82
Post: #14
It's evaluated as a LIFO (last in, first out), i.e. innermost first: C, then B, then A.
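A minimal Python sketch of that innermost-first order, for anyone writing a tester. `RegExpNode` and `evaluation_order` are hypothetical names for illustration, not XBMC's C++ classes. The recursive post-order walk uses the call stack, which gives the last-in, first-out behaviour; siblings are simply visited in document order here, which is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RegExpNode:
    """Hypothetical stand-in for a <RegExp> element; not XBMC's real class."""
    name: str
    children: List["RegExpNode"] = field(default_factory=list)


def evaluation_order(node: RegExpNode) -> List[str]:
    """Return node names innermost first (post-order traversal)."""
    order: List[str] = []
    for child in node.children:
        # Nested RegExps are fully evaluated before their parent.
        order.extend(evaluation_order(child))
    order.append(node.name)
    return order


# Mirrors the RegExpA > RegExpB > RegExpC nesting from the question.
tree = RegExpNode("A", [RegExpNode("B", [RegExpNode("C")])])
print(evaluation_order(tree))  # ['C', 'B', 'A']
```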

There are 20 buffers in total, and they are global to the scraper parser.
Usually these are cleared after a function executes, unless the clearbuffers="no" param is set.
The reason this parameter is available is that it allows passing info between functions that execute one after another (i.e. <url function="foo"..> chains).
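A rough Python model of the buffer behaviour described above: 20 buffers shared by the whole parser, wiped after each function unless clearbuffers is turned off. The class and function names (`ScraperParser`, `run_function`, `fetch_title`, `use_title`) are hypothetical, and the exact point where XBMC clears the buffers should be checked against its source.

```python
from typing import Callable, List

NUM_BUFFERS = 20  # 20 buffers shared by the whole scraper parser


class ScraperParser:
    """Hypothetical model of the shared buffers; not XBMC's C++ API."""

    def __init__(self) -> None:
        # Slot 0 is unused so buffers can be addressed as 1..20.
        self.buffers: List[str] = [""] * (NUM_BUFFERS + 1)

    def run_function(self, body: Callable[[List[str]], str],
                     clearbuffers: bool = True) -> str:
        result = body(self.buffers)  # the function reads/writes shared buffers
        if clearbuffers:
            # Default: wipe every buffer once the function has finished.
            self.buffers = [""] * (NUM_BUFFERS + 1)
        return result


def fetch_title(buf: List[str]) -> str:
    buf[1] = "Some Movie"  # first function leaves a value in buffer 1
    return ""


def use_title(buf: List[str]) -> str:
    return "Title: " + buf[1]  # a later function in the chain reads it


chained = ScraperParser()
chained.run_function(fetch_title, clearbuffers=False)  # like clearbuffers="no"
with_chain = chained.run_function(use_title)

unchained = ScraperParser()
unchained.run_function(fetch_title)  # default: buffers wiped afterwards
without_chain = unchained.run_function(use_title)

print(with_chain)     # Title: Some Movie
print(without_chain)  # "Title: " because buffer 1 was already cleared
```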

Always read the XBMC online-manual, FAQ and search the forum before posting.
Do not e-mail XBMC-Team members directly asking for support. Read/follow the forum rules.
For troubleshooting and bug reporting please make sure you read this first.
Nicezia
Fan
Posts: 369
Joined: Nov 2006
Reputation: 0
Location: Montgomery, Alabama
Post: #15
So if I understand correctly: there are 20 global buffers (cleared between functions unless specified otherwise), there are 9 buffers available for RegExp captures, and execution of expressions works its way backwards towards the root expression?

The last question I need to ask is about noclean: exactly what HTML is stripped if this is NOT set?
(This post was last modified: 2009-04-26 21:23 by Nicezia.)
spiff Offline
Grumpy Bastard Developer
Posts: 12,181
Joined: Nov 2003
Reputation: 82
Post: #16
Yes, yes, and yes, exactly as I have already explained. The nine buffers available for regexp captures are a property of your regexp parser; 9 is the minimum it must support.
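To illustrate the capture side, here is a minimal Python sketch that expands `\1` through `\9` in an output template from a single match. It assumes the output template refers to captures as a backslash plus one digit, and it uses Python's `re` as the regexp parser; `expand_output` and the sample HTML are hypothetical, not XBMC's implementation.

```python
import re


def expand_output(pattern: str, text: str, template: str) -> str:
    """Replace \\1..\\9 in template with the groups captured by pattern."""
    match = re.search(pattern, text)
    if match is None:
        return ""

    def substitute(ref) -> str:
        index = int(ref.group(1))
        # References past the pattern's group count expand to nothing.
        if index > match.re.groups:
            return ""
        # A group that did not participate in the match gives None.
        return match.group(index) or ""

    # Only single digits 1-9 are recognised, matching the nine-capture minimum.
    return re.sub(r"\\([1-9])", substitute, template)


html = '<h1 class="title">Alien</h1><span class="year">1979</span>'
pattern = r'<h1 class="title">([^<]*)</h1><span class="year">(\d{4})</span>'
print(expand_output(pattern, html, r"<title>\1</title><year>\2</year>"))
# <title>Alien</title><year>1979</year>
```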

I can hardly be more precise here than the code: CHTMLUtil::RemoveTags.
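CHTMLUtil::RemoveTags remains the authority on what is stripped. As a rough approximation only, this Python sketch assumes "cleaning" means deleting anything shaped like an HTML tag from a captured value, and that noclean names the capture numbers to leave untouched; the real function may also handle entities, comments, or whitespace. `clean_captures` and `TAG_RE` are hypothetical names.

```python
import re
from typing import Iterable, List

# Rough stand-in for CHTMLUtil::RemoveTags: drop anything shaped like <...>.
# The real C++ function decides what actually gets stripped.
TAG_RE = re.compile(r"<[^>]*>")


def clean_captures(captures: List[str], noclean: Iterable[int] = ()) -> List[str]:
    """Strip tags from each capture (1-based) unless its index is in noclean."""
    keep_raw = set(noclean)
    return [
        value if index in keep_raw else TAG_RE.sub("", value)
        for index, value in enumerate(captures, start=1)
    ]


caps = ["<b>Alien</b>", "<i>Ridley Scott</i>"]
print(clean_captures(caps))               # ['Alien', 'Ridley Scott']
print(clean_captures(caps, noclean=[2]))  # ['Alien', '<i>Ridley Scott</i>']
```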

Nicezia
Fan
Posts: 369
Joined: Nov 2006
Reputation: 0
Location: Montgomery, Alabama
Post: #17
Sorry to bother you one more time, but I'm just about done with my RegExp engine and there's one more thing I need to know. I wouldn't ask, but I'm not good at C++ at all and the regexp engine looks all Greek to me (other things, like the httputil and xml utils, were easy as pie to read through).

Do you make use of nested parentheses in regular expressions in the scrapers?
spiff
Grumpy Bastard Developer
Posts: 12,181
Joined: Nov 2003
Reputation: 82
Post: #18
They can occur, so yes, you must support them.
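One detail a home-grown engine has to get right with nested parentheses: capture groups are conventionally numbered by the position of their opening parenthesis, left to right, so an outer group gets a lower number than the groups inside it. Python's `re` and Perl-compatible engines follow this convention; the date pattern below is only an illustration.

```python
import re

# Group 1 is the outer "(...)", groups 2 and 3 are nested inside it;
# numbering follows the order of the opening parentheses.
pattern = re.compile(r"((\d{4})-(\d{2}))-(\d{2})")
m = pattern.search("Released 1979-05-25 in the US")
print(m.groups())  # ('1979-05', '1979', '05', '25')
```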

Schenk2302
Senior Member
Posts: 103
Joined: Feb 2009
Reputation: 4
Post: #19
Hi Nicezia,

Are you planning to release the editor in the near future?

Thanks in advance

Schenk
Nicezia
Fan
Posts: 369
Joined: Nov 2006
Reputation: 0
Location: Montgomery, Alabama
Post: #20
The editor will be an extension of my ScraperXML library, so my first goal is to support all scraper types (and a few I've come up with on my own) in that library before moving on to the editor itself. Now that I have (nearly all of) the base methods coded into the library, it's just a matter of accounting for the different functions of the other scraper types, and development of the library is moving pretty fast. So I will say (and don't hold me to this) that in a month's time I should be able to start developing the scraper editor.

Even though it's not really necessary, I may turn the whole scraper editor project into an XML editor, so you can test scrapers as you make changes without having to use an extra program for the process.
(This post was last modified: 2009-05-22 04:57 by Nicezia.)