ScraperXML (Open Source XML Web Scraper C# Library) please help verify my work...
Some more feedback - smeehrrr - 2009-07-08 20:11
OK, I've been working with the new version for a while now and have some initial feedback. First of all, thanks again for doing this work; it's saved me a ton of time. Please take this as constructive criticism: virtually everything on this list is a "nice to have" and not a bug, and the bugs that are there are easily fixed or worked around.
1) "Paramater" is misspelled in ScraperSetting, it should be "Parameter".
2) It would be nice for ScraperSettings to have a public default constructor and make the Deserialize method public. I'm not sure why I believe this is true anymore, but I wrote it down so at some point I thought it was interesting. This may just be an artifact of how my application worked with settings in the prior version.
3) Deserialize calls in general could use an overload which takes a string, rather than always having to construct an XElement. This would allow your callers to not have to be LINQ-aware.
4) It would be nice if I could specify a null as the path to the cache directory on the various scraper constructors, and you'd use the Temp directory by default. Easy to work around, but would be nice.
5) There's a small bug in the video scraper (may be in the others, I haven't checked) where if you make a GetDetails call without doing a prior CreateSearch, you get an exception due to WebpageDownload being uninitialized.
6) Polymorphism: This library could really use some. There are lots of common elements between the various types of scrapers and the various types of media tags, and it would be very useful (for me, anyway) if those could be pulled out into base classes so I could work with the data in a more general way. In my particular case, I'm working with a large library of movies and TV shows, and I'd like to avoid special-case code wherever possible. Currently I wrote a wrapper object to abstract out the common elements, but this is really something that should be in the base design, I think.
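To make item 6 a bit more concrete, here is a rough sketch of the kind of shared base class I mean. Every type and member name below (MediaTagBase, MovieTagSketch, TvShowTagSketch, LibraryListing) is invented for illustration and is not taken from the library; the real tag classes have their own fields.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical base type holding the fields movies and TV shows share.
public abstract class MediaTagBase
{
    public string Title { get; set; }
    public string Plot { get; set; }
    public int Year { get; set; }

    // Each media type describes itself, so callers never switch on type.
    public abstract string Describe();
}

// Hypothetical movie tag; only the movie-specific field lives here.
public class MovieTagSketch : MediaTagBase
{
    public string Director { get; set; }

    public override string Describe()
    {
        return string.Format("{0} ({1}), directed by {2}", Title, Year, Director);
    }
}

// Hypothetical TV show tag; only the show-specific field lives here.
public class TvShowTagSketch : MediaTagBase
{
    public int Seasons { get; set; }

    public override string Describe()
    {
        return string.Format("{0} ({1}), {2} season(s)", Title, Year, Seasons);
    }
}

public static class LibraryListing
{
    // Works on any mix of media items without special-case code.
    public static List<string> DescribeAll(IEnumerable<MediaTagBase> items)
    {
        var lines = new List<string>();
        foreach (var item in items) lines.Add(item.Describe());
        return lines;
    }
}
```

With something along these lines, a mixed list of movies and shows can be passed to one method, which is roughly what my wrapper object does today.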
Have you done any performance testing on this version versus the previous? Qualitatively the C# version seems faster, but I wonder if that's just my perception.
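Going back to items 3 and 4 in the list, this is roughly what I had in mind. It is only a sketch: SettingsSketch and CachePathSketch are made-up stand-ins, not the library's classes, and the RootName property exists only so the example has something to return.

```csharp
using System;
using System.IO;
using System.Xml.Linq;

// Hypothetical stand-in for a settings type; not the real ScraperXML class.
public class SettingsSketch
{
    public string RootName { get; private set; }

    // Existing-style entry point: the caller must build an XElement.
    public static SettingsSketch Deserialize(XElement element)
    {
        if (element == null) throw new ArgumentNullException("element");
        return new SettingsSketch { RootName = element.Name.LocalName };
    }

    // Item 3: convenience overload so callers can pass raw XML text
    // and never reference System.Xml.Linq themselves.
    public static SettingsSketch Deserialize(string xml)
    {
        return Deserialize(XElement.Parse(xml));
    }
}

public static class CachePathSketch
{
    // Item 4: treat a null or empty cache path as "use the system temp directory".
    public static string ResolveCacheDir(string cacheDir)
    {
        return string.IsNullOrEmpty(cacheDir) ? Path.GetTempPath() : cacheDir;
    }
}
```

A scraper constructor could call ResolveCacheDir on its path argument before doing anything else, so existing callers that pass a real path would see no change.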
- smeehrrr - 2009-07-08 20:22
Disregard comment #2, I went back and revisited my own code and this is indeed just an artifact of me being lazy. It works fine the way you have it.
- smeehrrr - 2009-07-09 07:53
Hm, are TV show scrapers supposed to be working at all? The only one that seems to work properly for me is IMDB.
- Nicezia - 2009-07-09 16:42
Well, I'm having to tweak it. Apparently my implementation for the Zip files isn't working properly because of the way .NET handles the stream: HttpWebResponse.GetResponseStream() is not seekable, which leads to the tvdb failure, since my function tries to read directly from the stream and its being non-seekable makes that impossible. I'm working on a fix.
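One common way around a non-seekable stream, and not necessarily what will end up in the library, is to copy the response into a MemoryStream first and hand that to the zip reader. A minimal sketch, assuming .NET 4.0 or later for Stream.CopyTo (on 3.5 a manual read/write loop would be needed); StreamBuffering and ToSeekable are names made up for this example.

```csharp
using System;
using System.IO;

public static class StreamBuffering
{
    // Copies any readable stream into a MemoryStream, which supports Seek.
    // The caller still owns the source stream and should dispose it.
    public static MemoryStream ToSeekable(Stream source)
    {
        if (source == null) throw new ArgumentNullException("source");
        var buffer = new MemoryStream();
        source.CopyTo(buffer);   // Stream.CopyTo requires .NET 4.0 or later
        buffer.Position = 0;     // rewind so the reader starts at the first byte
        return buffer;
    }
}
```

The trade-off is that the whole download sits in memory, which should be fine for small archives but would need a temp file for very large ones.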
- Nicezia - 2009-07-09 17:25
xyber Wrote: Nicezia, I've added you on gtalk. Request from plyoung will be me.
VerifyLogfile will create the logfile if it doesn't exist, so there is no need to call a create on the file. Calling a create seems to leave the file open if it is not specifically closed. If you want to create the file yourself, be sure to close the file so it's available for open by other processes.
Or maybe I don't understand exactly what it is you're saying.
I'm working on learning more about events so that programs can use their own logging functions and just retrieve log info from the scraper.
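As a rough sketch of both points (closing the handle after creating the file, and exposing log output through an event), something like the following could work. The names LogEventArgs, ScraperLogSketch, MessageLogged and EnsureLogFile are placeholders for this example and are not the library's current API.

```csharp
using System;
using System.IO;

// Hypothetical event payload; the library's real logging types may differ.
public class LogEventArgs : EventArgs
{
    public string Message { get; private set; }
    public LogEventArgs(string message) { Message = message; }
}

public class ScraperLogSketch
{
    // Host applications subscribe here and route messages to their own logger.
    public event EventHandler<LogEventArgs> MessageLogged;

    public void Log(string message)
    {
        var handler = MessageLogged;   // local copy guards against a concurrent unsubscribe
        if (handler != null) handler(this, new LogEventArgs(message));
    }

    // Creates the file if it is missing and releases the handle immediately,
    // so other processes can open it afterwards.
    public static void EnsureLogFile(string path)
    {
        if (!File.Exists(path))
        {
            using (File.Create(path)) { }
        }
    }
}
```

File.Create returns an open FileStream, which is why an unclosed create call keeps the file locked; wrapping it in a using block disposes the stream straight away.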
- xyber - 2009-07-09 18:03
Was just saying that your verify function might cause problems for your log function. But I did not check how you use it internally. I don't make calls to it, so it does not matter to me.
Great lib btw. Saved me a ton of work so far. I'll announce the media manager I am working on soon. Still deciding if I want to first complete the TV eps section.
- Nicezia - 2009-07-09 22:24
Thanks. It'll be nice to see my work implemented in something other than my test programs.
I have the TV show stuff working now, but I need a little input from spiff to understand something about the GetEpisodeList function, as I don't think I understand it as well as I thought.
@spiff are there values passed to buffers during GetEpisodeList ?
I'm guessing it's the same as GetEpisodeDetails, because I see it (the tvdb scraper) looking for a value for cache, but I want to be sure...
- xyber - 2009-07-10 16:28
I noticed an "Application error (Rails)" for http://www.themoviedb.org/movie today
and that causes ((VideoScraper)scraper).GetDetails(...); to fail when I'm using IMDB with Fanart selected.
For a user this would look like the scraper failed to get info from IMDB, while really it was just the fanart that he needed to turn off. I wonder if there is a way you could allow us to query your lib for more info on errors occurring inside it. So if I ask your lib why it did not return data from GetDetails, I could see it's a failure retrieving a list of fanart and then at least prompt the user to turn it off in settings, or do it in the code and run the query again... or we can just hope that kind of problem with themoviedb doesn't happen too often.
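Just to illustrate the kind of thing I mean, here is a minimal sketch of a per-step error report. ScrapeIssue, ScrapeReport and Try are names I made up; nothing like this exists in the lib as far as I know.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical: one entry per step that failed inside a scrape.
public class ScrapeIssue
{
    public string Step { get; private set; }     // e.g. "fanart"
    public string Detail { get; private set; }   // e.g. the exception message
    public ScrapeIssue(string step, string detail) { Step = step; Detail = detail; }
}

public class ScrapeReport
{
    private readonly List<ScrapeIssue> issues = new List<ScrapeIssue>();

    public IList<ScrapeIssue> Issues { get { return issues; } }
    public bool HasIssues { get { return issues.Count > 0; } }

    // Runs one step; a failure is recorded and the fallback returned,
    // instead of the exception aborting the whole scrape.
    public T Try<T>(string step, Func<T> action, T fallback)
    {
        try
        {
            return action();
        }
        catch (Exception ex)
        {
            issues.Add(new ScrapeIssue(step, ex.Message));
            return fallback;
        }
    }
}
```

With a report like this, the caller could still get the IMDB details back and then check Issues to see that only the "fanart" step failed.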
- Nicezia - 2009-07-10 19:55
Will edit it to log scraper return values
However, what exactly is the error/failure? I haven't seen this error and can't fix it if I don't know the full details of it. It retrieves fanart just fine for me.
- smeehrrr - 2009-07-10 21:12
I had problems with fanart last night, the server was returning HTTP 500 errors. I don't believe it had anything to do with your library - it would have to either be a server side problem or a problem with the scraper itself. And it only happened on some titles. I haven't tried yet today to see if the same thing is happening.