Kodi Community Forum
ScraperXML (Open Source XML Web Scraper C# Library) please help verify my work... - Printable Version

+- Kodi Community Forum (https://forum.kodi.tv)
+-- Forum: Development (https://forum.kodi.tv/forumdisplay.php?fid=32)
+--- Forum: Scrapers (https://forum.kodi.tv/forumdisplay.php?fid=60)
+--- Thread: ScraperXML (Open Source XML Web Scraper C# Library) please help verify my work... (/showthread.php?tid=50055)



- spiff - 2009-07-30

it's me that has to do the justification ;)

the problem is that a lot of scrapers need to share code. A lot need to e.g. grab fanart from TMDb; if TMDb changes, all of those scrapers need to change. It's bad for maintainability.

hence I introduced (not yet in effect; I'm giving nicezia some time to adapt) the <include> element so scrapers can share code. nicezia targets this by introducing the manager class
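The include injection mentioned here could be sketched roughly as follows. This is a minimal illustration, not the actual XBMC scraper schema or ScraperXML code: the `<include file="..."/>` element shape and the `IncludeInjector` name are assumptions, and a real implementation would have to match whatever the scraper format finally specifies.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class IncludeInjector
{
    // Replace each <include file="..."/> element in a scraper document with
    // the children of the shared XML document that loadShared returns for
    // that file name, so scrapers can share common code.
    public static void InjectIncludes(XmlDocument scraper, Func<string, XmlDocument> loadShared)
    {
        // Snapshot the include nodes first, since we modify the tree below.
        var includes = new List<XmlNode>();
        foreach (XmlNode node in scraper.SelectNodes("//include"))
            includes.Add(node);

        foreach (XmlNode include in includes)
        {
            XmlDocument shared = loadShared(include.Attributes["file"].Value);
            foreach (XmlNode child in shared.DocumentElement.ChildNodes)
            {
                // Nodes must be imported before they can live in another document.
                XmlNode imported = scraper.ImportNode(child, true);
                include.ParentNode.InsertBefore(imported, include);
            }
            include.ParentNode.RemoveChild(include);
        }
    }
}
```

The `loadShared` delegate keeps file access out of the injector, so the shared documents can come from disk, an embedded resource, or a test fixture.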


- smeehrrr - 2009-07-30

Got it, that makes sense. I'm already instantiating all of the scrapers in my own code, so I forgot that ScraperXML really only knows about one at a time.


- spiff - 2009-07-30

only now caught your question nicezia; input is blank by default


- ThePluginFac - 2009-07-30

Hi nicezia,

This looks good, so much so that I want to make use of it. I develop metadata lookup plugins for the open source 'Open Media Library' plug-in for Vista Media Center:

http://www.openmedialibrary.org/
http://code.google.com/p/open-media-library/

We are looking to make use of ScraperXML to improve the movie metadata lookups of our project. Assuming you have no problem with that, we intend to write an interface DLL which will use the functions provided by your library.

Keep up the good work,

Mike


- Nicezia - 2009-07-31

spiff Wrote:only now caught your question nicezia; input is blank by default

hmmm... well, if there is no input field, does that mean the input should be an empty string? Because the TVDB scraper leaves it off and still expects input from $$1.

By the way, I'm all done preparing for the changes; the ScraperManager class is ready to go now, complete with include injection.

Oh, and could you change the title of this topic to reflect that ScraperXML is now C#?


- Nicezia - 2009-07-31

ThePluginFac Wrote:Hi nicezia,

This looks good, so much so that I want to make use of it. I develop metadata lookup plugins for the open source 'Open Media Library' plug-in for Vista Media Center:

http://www.openmedialibrary.org/
http://code.google.com/p/open-media-library/

We are looking to make use of ScraperXML to improve the movie metadata lookups of our project. Assuming you have no problem with that, we intend to write an interface DLL which will use the functions provided by your library.

Keep up the good work,

Mike

I have no complaints about you using it; that's ScraperXML's whole purpose. All I ask is that you keep the spirit of open source: if you make improvements or changes to the existing code, make them available for others' use. (The same goes for the scrapers, as they are also licensed under the GPL.)


- Nicezia - 2009-08-01

I just did a search for "web-scraping utility" and this link comes up as the second result on Google... I'm so proud!


One more significant change - Nicezia - 2009-08-01

One exception is a change I will be making to some of the scrapers that need an API key: since this DLL will be in use by multiple programs, I will be adding an API Key setting to those scrapers that use APIs requiring one.

I would rather the authors of these sites be able to track usage, which is what the API key was meant to enable, and I wouldn't want to annoy any API managers either, as most of them specify one API key per program. Mind you, the default API key will still be kept in this setting; I just suggest that the first time a program uses ScraperXML it gets its own specific API key. Or not; it's a choice. But this way the heat is off my back from any API managers.

This will not break compatibility with XBMC, as the API key will be a text setting that will be replaced with the $INFO[api-key] reference.
The default one will be the XBMC API key, but I STRONGLY suggest each implementing program acquire its own API keys for each scraper site so as not to foster any ill will.
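The $INFO[api-key] replacement described above could work along these lines. This is a hedged sketch: the token syntax follows the post, but the `SettingExpander` name and the exact replacement rules (e.g. leaving unknown names untouched) are assumptions for illustration.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SettingExpander
{
    // Replace every $INFO[name] token in scraper text with the matching
    // setting value; names with no configured value are left as-is.
    public static string Expand(string text, IDictionary<string, string> settings)
    {
        return Regex.Replace(text, @"\$INFO\[([^\]]+)\]", delegate(Match m)
        {
            string value;
            return settings.TryGetValue(m.Groups[1].Value, out value) ? value : m.Value;
        });
    }
}
```

With a default "api-key" entry shipped in the settings, an implementing program only needs to overwrite that one value to use its own key.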


- spiff - 2009-08-01

The one slight problem with this is that the TVDB scraper queries the server to get the languages to list in its settings :)


- Nicezia - 2009-08-01

Ah yes. Then perhaps just for that part of it I'll leave the API key in the GetSettings function, but I'll use my own API key for that one (seeing as I already have API keys for all the sites that XBMC scrapes, and more...)

BTW, if you're waiting on me for the includes update... I'm good to go,
already set up for injecting and everything; actually I'm waiting on you before I upload the newest version. I want to run some tests on include files before I commit, and work the possibility of bugs out of my code.


- spiff - 2009-08-01

heh, I'm waiting for one of us to get the inspiration to update the scrapers, not on you :)

but good to know in any case


Last update until there is another significant change in XBMC scraper code. - Nicezia - 2009-08-08

Well, I'm up to date with XBMC code (actually even ahead of it, if scraper includes are what they say they're going to be).

Changes:

1). There are no more content handlers, as the code started to get redundant - everything is handled from the ScraperManager.
2). Results are received by subscribing to events, e.g. Manager.ResultsRetrieved += new RecievedResultsHandler(myHandlerFunction) [ResultsRetrieved sends a List<ScraperResultsEntity>] and Manager.RetrievedXxxxxDetails += new XxxxxDetailsRetrievedHandler(myDetailsHandlerFunction) [RetrievedXxxxxDetails, where Xxxxx is the type of item (movie, tvshow, episode, etc.), sends an object which is actually the tag for whichever kind of search is performed]. Since I suck at threading so far, it won't actually run on another thread; it's up to the program to handle that. There is also a NoResultsFound event for the case where there are no results; its delegate takes no input and returns no output.

3). I was bored and added a nice little feature that allows you to get results from all scrapers (for movies, you have to specify whether you want adult scrapers or just regular movie scrapers - this is achieved by adding an [adult="true"] attribute to the scraper root element for adult scrapers)

Of course, you can only send the ONE result to the ScraperManager to retrieve (unless you loop it) ;)

So in the next hour or so I'm going to be uploading to SVN, as well as submitting 5.9 for release (5.9 because I'll actually have to see the includes and test my code against them before I commit to 6.0).
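From a caller's side, the subscription pattern described in point 2 would look roughly like the sketch below. The names loosely follow the post (ScraperResultsEntity, ResultsRetrieved, NoResultsFound), but the exact class and delegate shapes are assumptions, and the stubbed search just fires the matching event to show the flow:

```csharp
using System;
using System.Collections.Generic;

public class ScraperResultsEntity { public string Title; }

public delegate void ReceivedResultsHandler(List<ScraperResultsEntity> results);

public class ScraperManager
{
    public event ReceivedResultsHandler ResultsRetrieved;
    public event Action NoResultsFound;

    public void FindMovie(string title)
    {
        // A real implementation would run the scraper here; this stub just
        // produces a fake result (or none) and fires the matching event.
        var results = string.IsNullOrEmpty(title)
            ? new List<ScraperResultsEntity>()
            : new List<ScraperResultsEntity> { new ScraperResultsEntity { Title = title } };

        if (results.Count == 0)
        {
            if (NoResultsFound != null) NoResultsFound();
        }
        else
        {
            if (ResultsRetrieved != null) ResultsRetrieved(results);
        }
    }
}
```

A caller subscribes before searching, e.g. `manager.ResultsRetrieved += results => Console.WriteLine(results.Count);` followed by `manager.FindMovie("Blade Runner");`. As the post notes, nothing here runs on another thread; the events fire synchronously inside the call.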


- smeehrrr - 2009-08-09

Nicezia Wrote:2). Results are received by subscribing to events, e.g. Manager.ResultsRetrieved += new RecievedResultsHandler(myHandlerFunction) [ResultsRetrieved sends a List<ScraperResultsEntity>] and Manager.RetrievedXxxxxDetails += new XxxxxDetailsRetrievedHandler(myDetailsHandlerFunction) [RetrievedXxxxxDetails, where Xxxxx is the type of item (movie, tvshow, episode, etc.), sends an object which is actually the tag for whichever kind of search is performed]. Since I suck at threading so far, it won't actually run on another thread; it's up to the program to handle that. There is also a NoResultsFound event for the case where there are no results; its delegate takes no input and returns no output.
Can you talk a little bit about why you made this change? I'm finding this eventing model much more cumbersome to program against than the previous synchronous model, and looking at the source it seems like it would be trivial to just return the results from the individual function calls. Is the eventing model a requirement coming from the broader ScraperManager changes, or was there some other reason for it?

In the meantime I'm going to try to add return types to the various GetResults and GetDetails calls and see what happens.


- Nicezia - 2009-08-09

Just something I was trying out, actually. I have been playing with it myself after having done it, and find it more trouble than it's worth.

In the next few days I will be changing that, though I will leave the event model in place for log reporting.

I've already done this in my source, but I don't have my code with me at the moment to upload.

Is there any way to edit code via SVN without having a compiler?


- smeehrrr - 2009-08-10

Nicezia Wrote:Just something I was trying out, actually. I have been playing with it myself after having done it, and find it more trouble than it's worth.

In the next few days I will be changing that, though I will leave the event model in place for log reporting.

I've already done this in my source, but I don't have my code with me at the moment to upload.

Is there any way to edit code via SVN without having a compiler?

I'd recommend leaving the eventing stuff in place throughout, just changing the function calls to return the result synchronously as well. If you ever decide to allow async calls, or if there's someone out there who can benefit from the existing event model, it's in there.
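The hybrid smeehrrr suggests (keep the events, but also return results synchronously) could be sketched like this. The entity and event names are assumptions following the thread's terminology, and the scrape step is a stand-in:

```csharp
using System;
using System.Collections.Generic;

public class ScraperResultsEntity { public string Title; }

public class ScraperManager
{
    public event Action<List<ScraperResultsEntity>> ResultsRetrieved;
    public event Action NoResultsFound;

    public List<ScraperResultsEntity> GetResults(string title)
    {
        // Stand-in for the real scrape step.
        var results = string.IsNullOrEmpty(title)
            ? new List<ScraperResultsEntity>()
            : new List<ScraperResultsEntity> { new ScraperResultsEntity { Title = title } };

        // Keep the event model alive for existing subscribers...
        if (results.Count == 0)
        {
            if (NoResultsFound != null) NoResultsFound();
        }
        else
        {
            if (ResultsRetrieved != null) ResultsRetrieved(results);
        }

        // ...but also hand the results straight back to the caller.
        return results;
    }
}
```

Callers who just want the data can ignore the events entirely and use the return value, while event-based callers keep working unchanged.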

With respect to SVN, what do you normally use to do the updates? I've noticed that every time you update it, the entire code directory appears to get deleted and recreated, and I have to hand-merge any changes I had made.