ScraperXML (Open Source XML Web Scraper C# Library) please help verify my work...
A couple of observations on the latest rev of the code:

1) ScraperManager isn't as tolerant of malformed scrapers (several of which appear to be included with XBMC) as the previous version. Adding a catch on XmlException in ScraperManager() fixes that problem and skips the bogus scrapers.

2) The various Get*Details methods on ScraperManager actually modify the ScrapeResultsEntity passed in in such a way that calling the function twice with the same ScrapeResultsEntity leads to the second call failing, because instead of a single Url it now has a bunch. I'm not sure if this is correct behavior or not, but it was certainly unexpected. To get around this I added a Clone() method to ScrapeResultsEntity and have the Get*Details calls clone the input parameter before use, which works for me but may not be the intended usage. Are those calls supposed to return information in the resultsEntity parameter?
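For point 1, here is a minimal sketch of the kind of catch I mean, assuming the manager calls XDocument.Load on every .xml file in the scraper directory. ScraperLoader and LoadAll are made-up names for illustration, not the actual ScraperXML code:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class ScraperLoader
{
    // Hypothetical helper, not part of ScraperXML: loads every .xml file in
    // a directory and skips the ones that are not well-formed XML.
    public static Dictionary<string, XDocument> LoadAll(string directory, List<string> skipped)
    {
        var loaded = new Dictionary<string, XDocument>();
        foreach (string path in Directory.GetFiles(directory, "*.xml"))
        {
            try
            {
                loaded[Path.GetFileName(path)] = XDocument.Load(path);
            }
            catch (XmlException ex)
            {
                // Malformed scraper: record why it failed and move on.
                skipped.Add(Path.GetFileName(path) + ": " + ex.Message);
            }
        }
        return loaded;
    }
}
```

The skipped list is only there so the caller can report which scrapers were ignored and why.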
Reply
Another weirdness: Calling Get*Details without having done a prior Get*Results causes a crash, because the scrParser field isn't initialized. It's making me wonder why scrParser is a field at all, instead of a local variable, because I think you'll also get unpredictable results if you do something like this:

1) Get*Results from scraper A
2) Get*Results from scraper B
3) Get*Details with the results from A

It looks like you'll have an scrParser from scraper B cached in the scraper manager, when you really want A.

Can you talk a little bit about why scrParser isn't a local? I tried commenting it out to see the scope of the change that would be required and it's used in 68 places, so I don't want to go change it if there's some reason it's set up that way.
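To make the concern concrete, here is a stripped-down sketch of the two designs. Parser, SearchResult and Manager are hypothetical stand-ins, not the real ScraperXML types, and I'm assuming a result can carry enough information (here, the scraper name) to rebuild its parser:

```csharp
using System;

// Hypothetical stand-ins; the real ScraperXML types differ.
public class Parser
{
    public string ScraperName { get; }
    public Parser(string scraperName) { ScraperName = scraperName; }
}

public class SearchResult
{
    public string ScraperName { get; set; }
    public string Url { get; set; }
}

public class Manager
{
    // Shared-field version: remembers only the most recently used scraper.
    private Parser lastParser;

    public SearchResult GetResults(string scraperName, string url)
    {
        lastParser = new Parser(scraperName);
        return new SearchResult { ScraperName = scraperName, Url = url };
    }

    // Uses whatever parser the last GetResults call left behind.
    public string GetDetailsWithField(SearchResult result)
    {
        if (lastParser == null)
            throw new InvalidOperationException("GetResults was never called");
        return lastParser.ScraperName;
    }

    // Local version: builds the parser from the result itself.
    public string GetDetailsWithLocal(SearchResult result)
    {
        var parser = new Parser(result.ScraperName);
        return parser.ScraperName;
    }
}
```

With the field, a search on A followed by a search on B leaves B's parser in place when the details for A's result are requested. With the local, call order stops mattering.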
Reply
smeehrrr Wrote:A couple of observations on the latest rev of the code:

1) ScraperManager isn't as tolerant of malformed scrapers (several of which appear to be included with XBMC) as the previous version. Adding a catch on XmlException in ScraperManager() fixes that problem and skips the bogus scrapers.

2) The various Get*Details methods on ScraperManager actually modify the ScrapeResultsEntity passed in in such a way that calling the function twice with the same ScrapeResultsEntity leads to the second call failing, because instead of a single Url it now has a bunch. I'm not sure if this is correct behavior or not, but it was certainly unexpected. To get around this I added a Clone() method to ScrapeResultsEntity and have the Get*Details calls clone the input parameter before use, which works for me but may not be the intended usage. Are those calls supposed to return information in the resultsEntity parameter?


1) I use the latest SVN scrapers in each release, and it has no problem loading any of them, or skipping them on failure (the catch for this is in ScraperInfo, not ScraperManager). I don't quite understand what you're describing: if the XML is malformed, the scraper manager won't load it. Well-formed XML is the responsibility of the scraper writer, not ScraperXML.

2) This has never happened to me; let me know how you're calling it. ScrapeResultsEntity is supposed to support multiple URLs because that's the way the XBMC code works: a result can have several URLs (for example, tv.com uses three URLs in its GetDetails function).
ScraperXML Open Source Web Scraper Library compatible with XBMC XML Scrapers


I Suck, and if you act now by sending only $19.95 and a self addressed stamped envelop, so can you!

Reply
Nicezia Wrote:1) I use the latest SVN scrapers in each release, and it has no problem loading any of them, or skipping them on failure (the catch for this is in ScraperInfo, not ScraperManager). I don't quite understand what you're describing: if the XML is malformed, the scraper manager won't load it. Well-formed XML is the responsibility of the scraper writer, not ScraperXML.

2) This has never happened to me; let me know how you're calling it. ScrapeResultsEntity is supposed to support multiple URLs because that's the way the XBMC code works: a result can have several URLs (for example, tv.com uses three URLs in its GetDetails function).

1) This may just be an artifact of a bad installation, but the XmlException doesn't get caught. In the ScraperManager constructor it enumerates all the .xml files in the directory I give it, then calls XDocument.Load on each one, and there's no catch between that and my code in the stuff that's in the SVN. My first bomb is on mtime.xml, but I also hit naver.xml and KinoPoisk.xml.

2) My code caches search results for different scrapers and lets me go back and do GetDetails on a UI-selected choice later. The search initially includes one Url. After the call to GetDetails, it contains 7 or so, most of which are URLs to thumbnails. If I turn around and pass the same object back, the GetDetails call fails. This happens with the IMDB scraper, I haven't tried it with any of the others yet.
Reply
smeehrrr Wrote:1) This may just be an artifact of a bad installation, but the XmlException doesn't get caught. In the ScraperManager constructor it enumerates all the .xml files in the directory I give it, then calls XDocument.Load on each one, and there's no catch between that and my code in the stuff that's in the SVN. My first bomb is on mtime.xml, but I also hit naver.xml and KinoPoisk.xml.

2) My code caches search results for different scrapers and lets me go back and do GetDetails on a UI-selected choice later. The search initially includes one Url. After the call to GetDetails, it contains 7 or so, most of which are URLs to thumbnails. If I turn around and pass the same object back, the GetDetails call fails. This happens with the IMDB scraper, I haven't tried it with any of the others yet.

1) Sorry, you were right: that behaviour is in the version currently in SVN. However, it's changed in the code that I'm committing to SVN today.

2) This happens in your code because the ScrapeResultsEntity is passed as a reference, so any modifications made to it change the item you passed in. It's probably a good idea to clone a cached result before sending it, so that your cached copy stays intact. I do it this way to keep the memory used by ScraperXML to an absolute minimum: it uses the ScrapeResultsEntity info directly, modifying it as the process continues (for each custom function). I'm not sure why it would set the thumbnail links into the ScrapeResultsEntity, though, as it doesn't seem to do that on my end, either in my current code or in the version in SVN. I would suggest trying this against the code I'll upload to SVN today to see if it still occurs.

Also, just a warning, though I'm sure you haven't run into this yet: the cache does not clear when running a multi-search. That covers the case where some item uses a cached page from the search results for GetDetails. I haven't seen this happen, but it's possible, as some sites tend to return direct hits given enough info to do so (AllMusic, JadedVideo, etc.), so I leave items in the cache in case someone wants to take advantage of this feature.
Reply
Nicezia Wrote:btw, if you're waiting on me for the includes update, I'm good to go.
I'm already set up for injecting and everything, and am actually waiting on you before I upload the newest version. I want to run some tests on include files before I commit, to work any possible bugs out of my code.

i was inspired - in svn as of r22098
Reply
ah!, inspiration is a wonderful thing
Reply
spiff Wrote:i was inspired - in svn as of r22098

interesting, my imdb include actually had some extra stuff in it (like the ability to get fields that some sites don't have, like Rating, and Top250, and stuff like that)

Note to everyone else: this may delay my update planned for today as i need to play with includes for a little while to check for possible errors.
Reply
i didn't chop it up more than what was currently needed
Reply
spiff Wrote:i didn't chop it up more than what was currently needed

Not complaining at all, I'm just happy it's done!!!

now i have something new to play with!
Reply
got that, you explaining why i didn't split it as much as one could do
Reply
Heck, honestly, my IMDB include file was basically the whole GetDetails (with added conditionals) plus the custom functions, and my actual IMDB scraper was a skeleton with the IMDB include call. =)

Well, I do have a question. TMDB seems to work in XBMC, but every time I update, it has this one error (in GetSearchResults: output="<results>\1</result>").

How does XBMC manage to process this even though it's malformed?
Reply
Just finished testing the include files, and my code works flawlessly with them. I'm a little disappointed that I got it right before it ever hit SVN! I thought I'd have to do some error tracking, but no such luck!
Reply
Nicezia Wrote:2) This happens in your code because the ScrapeResultsEntity is passed as a reference, so any modifications made to it change the item you passed in. It's probably a good idea to clone a cached result before sending it, so that your cached copy stays intact. I do it this way to keep the memory used by ScraperXML to an absolute minimum: it uses the ScrapeResultsEntity info directly, modifying it as the process continues (for each custom function).

I understand the mechanics, I was asking about the API design. I never would have expected that call to modify the ScrapeResultsEntity that I passed in, that's really counter-intuitive and I can't think of anything in the .Net framework that has a similar side-effect without explicitly passing the parameter by reference. If you keep the existing behavior, I'd strongly recommend adding the 'ref' keyword to the ScrapeResultsEntity parameter on those Get*Details methods.

Ideally, though, you should just copy the ScrapeResultsEntity inside your library and operate on the copy. The memory savings you're getting by not doing that is inconsequential.
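A minimal sketch of the difference, assuming ScrapeResultsEntity is a class (a reference type) holding a list of URLs. ResultEntity, Details and both method names are hypothetical, and the example.invalid URLs are placeholders:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical shape; the real ScrapeResultsEntity may differ.
public class ResultEntity
{
    public string Title { get; set; }
    public List<string> Urls { get; } = new List<string>();

    // Copies the list contents so the clone shares no state with the original.
    public ResultEntity Clone()
    {
        var copy = new ResultEntity { Title = Title };
        copy.Urls.AddRange(Urls);
        return copy;
    }
}

public static class Details
{
    // No 'ref', yet the caller's object changes: the parameter is a copy of
    // the reference, and both references point at the same object.
    public static void GetDetailsInPlace(ResultEntity entity)
    {
        entity.Urls.Add("http://example.invalid/thumb.jpg");
    }

    // Works on a private copy, so the caller's object is left untouched.
    public static ResultEntity GetDetailsOnCopy(ResultEntity entity)
    {
        ResultEntity working = entity.Clone();
        working.Urls.Add("http://example.invalid/thumb.jpg");
        return working;
    }
}
```

The cost of the second version is one extra small object per call, which is the memory trade-off I'm calling inconsequential.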
Reply
First of all, the item used inside the function IS a copy of the item passed to it unless the ref keyword is specified. There's no way to modify the actual item passed, as I'm not using any pointers at all and all the code is purely .NET. If you don't have to specify ref when passing the item, then it's not going to modify the original item (unless the ScraperXML code has been modified in some way on your part), and it shouldn't be modifying your cached copy. Secondly, there's absolutely nothing inside the code that should make it copy anything to your cached copy of the item. The ref'd item is only internal, as it has to be passed to another internal function and from there to another function. I don't really see how this could be causing problems in your code, and I can't tell anything without having seen the code you're using to pass it. Granted, I'm only an amateur programmer (very amateur), but unless you've made some modifications to the ScraperXML code that I don't know about, it shouldn't really be doing what you say it's doing.

-message me on yahoo(niceziavincent) or gmail(niceziavincent) or icq (109693377)
Reply