[RELEASE] Scraper Editor (Based on ScraperXML open source C# Library) - Help wanted!

Daniel Malmgren Offline
Senior Member
Posts: 187
Joined: Jul 2009
Reputation: 0
Location: Sweden
Post: #16
Just joined the forum because I wanted to say that this is a really great tool! Only weak points are the lack of documentation and some functions that aren't implemented yet. Once this is fixed this will be the absolutely ultimate scraper editor.
Nicezia Offline
Fan
Posts: 369
Joined: Nov 2006
Reputation: 0
Location: Montgomery, Alabama
Post: #17
Daniel Malmgren Wrote:Just joined the forum because I wanted to say that this is a really great tool! Only weak points are the lack of documentation and some functions that aren't implemented yet. Once this is fixed this will be the absolutely ultimate scraper editor.

thanks, working on new features & documentation, but it's going slow since my priority is to finish up ScraperXML. the source of the editor and the release are now available at SourceForge

ScraperXML Open Source Web Scraper Library compatible with XBMC XML Scrapers


I Suck, and if you act now by sending only $19.95 and a self addressed stamped envelop, so can you!

[Image: teamumx_sigline.png]
Nicezia Offline
Fan
Posts: 369
Joined: Nov 2006
Reputation: 0
Location: Montgomery, Alabama
Post: #18
Does XBMC follow some kind of standard indentation model? I'd like for the scrapers made or edited with the editor to be able to directly patch with svn

New Release Features:

Changed things around in Tester a bit: you can now set ALL buffers individually, UrlEncode data in buffers, and download webpages or open files into any given buffer.

Buffer Chase Mode: when you are walking through functions, the first watch window shows what's in the current target buffer and the second shows the buffer of the last executed RegExp. When running a function, the third watch window shows what's in the function's dest buffer.

New window that displays the state of all buffers (read-only, and disabled during RunFunction Mode)

Next Version Features:
Complete execution/walk-through of a scraper from CreateSearchUrl to GetDetails (+ custom function execution) with automatic downloading and buffer-fill of webpages

Context Menu with common Regular expressions and capture options for the expression box.

(This post was last modified: 2009-07-27 22:37 by Nicezia.)
Nicezia Offline
Fan
Posts: 369
Joined: Nov 2006
Reputation: 0
Location: Montgomery, Alabama
Post: #19
[Image: Preview4.jpg]

Download

Source code available in svn as well

spiff Offline
Retired Developer
Posts: 12,386
Joined: Nov 2003
Post: #20
bit off topic and probably does not directly relate to your app but thought i'd poke you in any case;

http://forum.xbmc.org/showthread.php?got...st&t=55460
spiff Offline
Retired Developer
Posts: 12,386
Joined: Nov 2003
Post: #21
also, another change that will hit svn shortly which will definitely affect your lib/app; includes.

Code:
<scraper>
  <include>common/something.xml</include>
</scraper>
where common/something.xml looks like
Code:
<scraperfunctions>
  <SomeFunction dest="5">
    ....
  </SomeFunction>
</scraperfunctions>

this is to allow sharing functions between scrapers (tmdb fanart is popular for instance).
basically; on load parse all <include> tags, then inject all nodes under <scraperfunctions>
into the <scraper> node of the parent file
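
The load-time injection described above can be sketched in a few lines. This is a minimal illustration in Python using the standard `xml.etree.ElementTree` module, not XBMC's or ScraperXML's actual code. The `inject_includes` and `load_from_dict` names, and the in-memory dictionary standing in for files on disk, are assumptions made for the example.

```python
import copy
import xml.etree.ElementTree as ET


def inject_includes(scraper_root, load_include):
    """Append the children of every included file's root to scraper_root.

    load_include is a callable (an assumption of this sketch) that takes the
    text of an <include> element and returns the parsed root element of that
    file, or None if the file cannot be loaded.
    """
    for include in scraper_root.findall("include"):
        path = (include.text or "").strip()
        if not path:
            continue
        included_root = load_include(path)
        if included_root is None:
            continue
        for node in included_root:
            # Copy the node so the included document stays untouched.
            scraper_root.append(copy.deepcopy(node))
    return scraper_root


# In-memory stand-ins for the two files shown in the post.
SCRAPER = "<scraper><include>common/something.xml</include></scraper>"
COMMON = {
    "common/something.xml":
        '<scraperfunctions><SomeFunction dest="5"/></scraperfunctions>',
}


def load_from_dict(path):
    """Hypothetical loader: look the include up in COMMON instead of on disk."""
    text = COMMON.get(path)
    return ET.fromstring(text) if text is not None else None


root = inject_includes(ET.fromstring(SCRAPER), load_from_dict)
print([child.tag for child in root])
```

After the call, `SomeFunction` sits directly under `<scraper>`, as if it had been written in the main scraper file.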
Nicezia Offline
Fan
Posts: 369
Joined: Nov 2006
Reputation: 0
Location: Montgomery, Alabama
Post: #22
spiff Wrote:bit off topic and probably does not directly relate to your app but thought i'd poke you in any case;

http://forum.xbmc.org/showthread.php...ewpost&t=55460


thanks for the heads up (and yeah, it doesn't really affect ScraperXML Editor per se, but it's definitely something i needed to know for ScraperXML, as i'm going to have handling in there for reading the actual xml-formatted nfo files).
i now have both taken into account for scraperxml.



spiff Wrote:also, another change that will hit svn shortly which will definitely affect your lib/app; includes.


Code:
<scraper>
  <include>common/something.xml</include>
</scraper>
where common/something.xml looks like
Code:
<scraperfunctions>
  <SomeFunction dest="5">
    ....
  </SomeFunction>
</scraperfunctions>

this to allow sharing functions between scrapers (tmdb fanart is popular for instance).
basically; on load parse all <include> tags, then inject all nodes under <scraperfunctions>
into the <scraper> node of the parent file

ah, another thing that's going to take some creative coding.

and these <include> files, will they run as a custom function?

oh and one more question while i'm at it... how are the multiple urls handled for episodeguide? (are they chain-loaded into buffers the same as standard functions, or are they concatenated into one buffer?)

(This post was last modified: 2009-07-28 02:24 by Nicezia.)
spiff Offline
Retired Developer
Posts: 12,386
Joined: Nov 2003
Post: #23
for some reason episodeguide seems to do a linear processing of several urls. there is no reason why i wrote it like this, old old code. i should change it to follow the same rules.

the point of includes is that after load, it would be just as if the function resided in the main scraper xml itself. that's why i say inject. it's trivial in tinyxml;

Code:
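// inject includes: copy every node of each included file into the scraper root
const TiXmlElement* include = m_pRootElement->FirstChildElement("include");
while (include)
{
  // the text child of <include> holds the file path, relative to strPath
  if (include->FirstChild())
  {
    CStdString strFile = CUtil::AddFileToFolder(strPath,include->FirstChild()->Value());
    TiXmlDocument doc;
    // skip files that fail to load or have no root element
    if (doc.LoadFile(strFile) && doc.RootElement())
    {
      const TiXmlNode* node = doc.RootElement()->FirstChild();
      while (node)
      {
        // InsertEndChild copies the node, so doc can safely go out of scope
        m_pRootElement->InsertEndChild(*node);
        node = node->NextSibling();
      }
    }
  }
  include = include->NextSiblingElement("include");
}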

and it can't be that much harder with your xml parser?
(This post was last modified: 2009-07-28 09:31 by spiff.)
spiff Offline
Retired Developer
Posts: 12,386
Joined: Nov 2003
Post: #24
please see http://forum.xbmc.org/showthread.php?tid=55353&page=4
Nicezia Offline
Fan
Posts: 369
Joined: Nov 2006
Reputation: 0
Location: Montgomery, Alabama
Post: #25
spiff Wrote:for some reason episodeguide seems to do a linear processing of several url's. there is no reason why i wrote it like this, old old code. i should change it to follow the same rules.

the point of include are that after load, it would be just as if the function resided in the same xml main scraper. that's why i say inject. it's trivial in tinyxml;

Code:
// inject includes
const TiXmlElement* include = m_pRootElement->FirstChildElement("include");
while (include)
{
  if (include->FirstChild())
  {
    CStdString strFile = CUtil::AddFileToFolder(strPath,include->FirstChild()->Value());
    TiXmlDocument doc;
    if (doc.LoadFile(strFile))
    {
      const TiXmlNode* node = doc.RootElement()->FirstChild();
      while (node)
      {
        m_pRootElement->InsertEndChild(*node);
        node = node->NextSibling();
      }
    }
  }
  include = include->NextSiblingElement("include");
}

and can't be that much harder with your xml parser?

ah, actually that's going to be pretty easy to work with. The harder problem is that now i either have to make the program using ScraperXML load all the common functions (or keep track of them) and report them to the scraper code, or rewrite the code so that ScraperXML does all the management of scrapers itself.

(This post was last modified: 2009-07-29 03:41 by Nicezia.)
Nicezia Offline
Fan
Posts: 369
Joined: Nov 2006
Reputation: 0
Location: Montgomery, Alabama
Post: #26
Just added a "Settings Wizard" that creates a GetSettings function based on the settings details you provide it and allows you to add or remove settings at any time, integrated ScraperXML directly into the code, and added a "New Scraper Wizard" that will walk inexperienced scraper makers through the process.

Currently in the process of adding the ability to create Include Files (common scraper functions), plus context menu additions that include "Insert Replacement Reference" (to add settings text, labelenum, and integer replacement indicators to the "input", "output" and "expression" boxes). Content selection has been changed to a dropdown list containing all the types of content handled by XBMC and ScraperXML, as has conditional (the values of which are read from the settings created or edited by the "Settings Wizard"). Also adding program settings which allow you to specify two folders for scrapers: one for the XBMC scraper folder, and another for a media manager app's scraper folder (though no media manager apps really completely integrate ScraperXML yet). And working on a help file with all documentation.

Internal changes are being made as well: i now only use Linq to parse xml. Due to the difficulty in formatting the xml with Linq, it is handled internally as a string and saved with StreamWriter rather than Linq. So now the output should be a lot prettier than it used to be.
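
For illustration, here is one way to get consistently indented scraper XML so that edited files diff cleanly. It is a sketch in Python (3.9 or later, for `ET.indent`), not the editor's C# code. The two-space indent only mirrors the examples posted in this thread and is an assumption, since the thread does not state an XBMC standard. The `normalize_indentation` name is hypothetical.

```python
import xml.etree.ElementTree as ET


def normalize_indentation(xml_text, indent="  "):
    """Re-indent an XML document with a fixed indent string per level."""
    root = ET.fromstring(xml_text)
    # ET.indent rewrites whitespace-only text/tail in place (Python 3.9+).
    ET.indent(root, space=indent)
    return ET.tostring(root, encoding="unicode")


pretty = normalize_indentation(
    "<scraper><include>common/something.xml</include></scraper>"
)
print(pretty)
```

Running every saved scraper through the same normalization step keeps whitespace-only changes out of patches.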

(This post was last modified: 2009-08-04 04:28 by Nicezia.)
redtapemedia Offline
UMM Project
Posts: 551
Joined: Mar 2009
Post: #27
Nicezia Wrote:Currently in the process of adding the ability to create Include Files (Common scraper functions), context menu additions that include "Insert Replacement Reference".

I've been tearing my hair out for the last week trying to modify the IMDB scraper to return Rotten Tomatoes ratings, so support for creating include files would greatly benefit me.

Also, just played around briefly with your editor, but how does testing work? Is there a way I can just specify a movie name and then have it step through the scraper? Managed to get it to step through one function by pasting in the HTML of a website, but I can't get it to go through the entire thing.

Another thing that would be useful to a regular expression noob like me is context highlighting of regular expressions, similar to how http://gskinner.com/RegExr/ does it.

Thanks for all your hard work.
Nicezia Offline
Fan
Posts: 369
Joined: Nov 2006
Reputation: 0
Location: Montgomery, Alabama
Post: #28
redtapemedia Wrote:Also, just played around briefly with your editor, but how does testing work? Is there a way I can just specify a movie name and then have it step through the scraper? Managed to get it to step through one function by pasting in the HTML of a website, but I can't get it to go through the entire thing.

ah not til i release the version i'm working on at the moment. currently you can only throw information at it on a per function basis....

actually, if you start with the CreateSearchUrl function you can step through a scraper quite easily (make sure you have version 2.15, which is up for download at my SourceForge ScraperXML site):

1. Click the Set Buffers checkbox in the tester and change one of the buffers to 1 using the numeric up/down control on the left.
2. Type in the name, press Url Encode, then Set.
3. Run or step through the CreateSearchUrl function.
4. Copy and paste the return into the url textbox in the tester, and press Download to $$1.
5. Run the GetSearchResults function.
6. From the details there, copy one of the result urls to the url textbox.
7. Change one of the buffers to $$2 and put in the id (if provided by the search results), and in the $$ put in the url of the page being downloaded.
8. Press Download to $$1, then execute the function or walk through it.

my next release will have the ability to do this automatically.
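
The manual walkthrough above boils down to moving data between numbered buffers. The following is a rough Python sketch of that flow, not the tester's real logic. The URL template, the result expression, the canned page and the `fill_placeholders` helper are all hypothetical stand-ins, and the canned page replaces the actual download.

```python
import re
from urllib.parse import quote


def fill_placeholders(template, buffers):
    """Replace each $$N in template with buffer N (empty string if unset)."""
    return re.sub(r"\$\$(\d+)",
                  lambda m: buffers.get(int(m.group(1)), ""),
                  template)


# Hypothetical stand-ins: a real scraper defines its own URL and expressions.
SEARCH_URL_TEMPLATE = "http://example.com/search?q=$$1"
RESULT_EXPRESSION = r'<a href="([^"]+)" id="(\d+)">'

buffers = {}

# Put the URL-encoded name in buffer 1, then build the search URL from it.
buffers[1] = quote("The Matrix")
search_url = fill_placeholders(SEARCH_URL_TEMPLATE, buffers)

# "Download to $$1": a canned page stands in for the real download here.
buffers[1] = '<a href="http://example.com/title/603" id="603">The Matrix</a>'

# Run the search-results expression against buffer 1 and pick one result.
match = re.search(RESULT_EXPRESSION, buffers[1])
details_url, movie_id = match.group(1), match.group(2)

# The id of the chosen result goes into buffer 2 for the details step.
buffers[2] = movie_id

print(search_url)
print(details_url, buffers[2])
```

The details step would then repeat the same pattern: download `details_url` into buffer 1 and run the next function against it.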


redtapemedia Wrote:Another thing that would be useful to a regular expression noob like me is context highlighting of regular expressions, similiar to how http://gskinner.com/RegExr/ does it.

Thanks for all your hard work.

That's going to be some work there (I am STILL an amateur coder), but i'll work on it.

redtapemedia Wrote:I've been tearing my hair out for the last week trying to modify the IMDB scraper to return rotten tomatoes ratings, so creation of include files support would greatly benefit me.

the include files are a new feature coming to XBMC: they're scraper functions that can be shared amongst scrapers (read the last few posts between me and spiff above).

(This post was last modified: 2009-08-04 05:24 by Nicezia.)
redtapemedia Offline
UMM Project
Posts: 551
Joined: Mar 2009
Post: #29
Nicezia Wrote:That's going to be some work there (I am STILL an amateur coder), but i'll work on it.

No worries, Just a wishlist feature, and I'm sure a lot of people who use your tool are quite familiar with regular expressions. It's not too much of a deal for me to use that website to construct them while I'm learning. I'll probably find I won't need something like that once I've learned regex a bit more.

Thanks for the timely reply.
spiff Offline
Retired Developer
Posts: 12,386
Joined: Nov 2003
Post: #30
the includes won't make it one bit easier for you to add anything. the only difference is that the function's code would be in a different .xml file