Kodi Community Forum
ScraperXML (Open Source XML Web Scraper C# Library) please help verify my work... - Printable Version



- smeehrrr - 2009-07-18

And I am done. The problem isn't in SharpZipLib, it's in ScraperXML.

The HttpRetrieve.GetPage function is doing very inconsistent disposing of objects, and it's causing memory to not be reclaimed and is making the web client time out. Every HttpWebResponse, FileStream, Stream, WebClient, MemoryStream, ZipFile, or StreamReader in that function should be slapped into a using() statement. Do that, and the timeouts go away and you can see the memory usage stabilize (it still oscillates, but it goes down from time to time, which it didn't do before).
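For readers following along, here is a minimal sketch of the using() pattern described above. It is not the actual ScraperXML code: the class name HttpRetrieveSketch and the ReadAll helper are made up for illustration, and GetPage here is only a stand-in for the real function.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

// Hypothetical stand-in for HttpRetrieve; names are assumptions, not the library's API.
public static class HttpRetrieveSketch
{
    public static string GetPage(string url)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);

        // Each using block calls Dispose() when control leaves it,
        // including when an exception is thrown part-way through.
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (Stream body = response.GetResponseStream())
        {
            return ReadAll(body);
        }
    }

    // Reads a stream to the end as UTF-8 text and disposes the reader,
    // which also closes the underlying stream.
    public static string ReadAll(Stream source)
    {
        using (StreamReader reader = new StreamReader(source, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

A response that is never disposed keeps its connection checked out, and the .NET Framework allows only a small number of concurrent connections per host by default, so that is one plausible explanation for the timeouts going away once everything is wrapped in using().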

Is there someplace I can send a patch file?


- Nicezia - 2009-07-18

smeehrrr Wrote:And I am done. The problem isn't in SharpZipLib, it's in ScraperXML.

The HttpRetrieve.GetPage function is doing very inconsistent disposing of objects, and it's causing memory to not be reclaimed and is making the web client time out. Every HttpWebResponse, FileStream, Stream, WebClient, MemoryStream, ZipFile, or StreamReader in that function should be slapped into a using() statement. Do that, and the timeouts go away and you can see the memory usage stabilize (it still oscillates, but it goes down from time to time, which it didn't do before).

Is there someplace I can send a patch file?

Email for now.

[email protected]


- Nicezia - 2009-07-18

smeehrrr Wrote:OK, System.IO.Packaging is just made of FAIL. It can't open the zip files returned from tvdb because they don't contain a file called Content_Types.

I guess I'll go get the source to SharpZipLib and see if I can fix the problem there.

Yeah, I tried System.IO.Packaging... doesn't work for shite, so that's why I went with SharpZipLib. In any case, I'm trying to switch everything over to WebClient as far as handling web pages goes. I think most of the problem in the web function is that I have to switch to WebClient in order to get a seekable stream from the net (GetResponseStream is not seekable, and therefore I can't read the zip file directly from the net). I'm sure I left something open when switching over to WebClient (I think the HTTP request is still left open).

It's kinda sad, because both .NET methods are sorely lacking in flexibility in one place or another (or are too complicated for me to deal with, given my novice programming skills). I would switch to Curl, but I haven't found a .NET implementation of it, and I'm not big on wrapping code.
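One common way around the non-seekable response stream is to copy it into a MemoryStream first. The sketch below is only an illustration of that idea, not what ScraperXML does; SeekableStreamSketch and ToSeekable are made-up names. It uses a manual read loop so it also works on older .NET versions that lack Stream.CopyTo.

```csharp
using System;
using System.IO;

// Hypothetical helper; the class and method names are assumptions for illustration.
public static class SeekableStreamSketch
{
    // Copies everything from a (possibly non-seekable) stream into memory
    // and returns a seekable stream positioned at the start.
    public static MemoryStream ToSeekable(Stream source)
    {
        MemoryStream buffer = new MemoryStream();
        byte[] chunk = new byte[8192];
        int read;

        // Read returns 0 once the end of the source stream is reached.
        while ((read = source.Read(chunk, 0, chunk.Length)) > 0)
        {
            buffer.Write(chunk, 0, read);
        }

        buffer.Position = 0; // rewind so the caller can read from the beginning
        return buffer;
    }
}
```

The returned MemoryStream could then be handed to whatever zip reader is in use, with both it and the original response wrapped in using() so nothing is left open. The trade-off is that the whole download sits in memory, which should be fine for small archives but may not suit large ones.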


- Nicezia - 2009-07-18

I just realized that if I install the C++ help files in my Visual Studio, it's a ton of help for looking up all the C++ functions I didn't understand before.

So now there's a new update coming soon, consisting of almost completely emulated XBMC functions (encoding is the only thing still to be compensated for).


- Nicezia - 2009-07-18

xyber Wrote:ooh.. sounds like a bad one. Guess it will happen in my app too then :/ Will be scanning my whole TV folder through the weekend. Think I'll run in debug and see what happens.

It's amazing seeing my library at work. To quote Cereal Killer (Hackers, 1995): "Kinda feel like God!"

Great program, xyber, and great implementation. A lot of changes are coming up internally, but only a few will affect implementation (the ones I mentioned in PM).


- Nicezia - 2009-07-19

OK, the HTTP errors are hopefully fixed thanks to smeehrrr's patch, plus I took a few extra steps to ensure memory from the web protocol code is fully reclaimed. It seems to run a lot faster, and a lot of memory is reclaimed after the function ends.

Will upload later today.


- xyber - 2009-07-20

Nicezia Wrote:It's amazing seeing my library at work. To quote Cereal Killer (Hackers, 1995): "Kinda feel like God!"

That was an awesome movie, still listening to its soundtrack when coding.

I'll have a look at your updates in a week or so and make changes to my app where needed to get the benefit of these bug fixes in there.


- smeehrrr - 2009-07-24

Just updated to the latest in SVN and I'm very confused by some of the changes. Can you explain the changes you made to the various MediaTags? It looks like MovieTag no longer has obvious properties like Title and Plot, which has broken my code in many places.


- smeehrrr - 2009-07-24

More confusion: Why is the Backdrops property on MovieTag a Fanart, but the Backdrops property on TvShowTag is a List<Fanart>?


- smeehrrr - 2009-07-24

smeehrrr Wrote:Just updated to the latest in SVN and I'm very confused by some of the changes. Can you explain the changes you made to the various MediaTags? It looks like MovieTag no longer has obvious properties like Title and Plot, which has broken my code in many places.

The fix for this is to mark all those properties as public. Looks like just an oversight. I'm surprised the test program didn't catch that, though.

The fanart change to be Fanart instead of List<Fanart> is still unclear to me. It's not obvious from the XBMC documentation whether you can have more than one Fanart entry coming back from a scraper, and I haven't looked into the source code to figure it out.


- Nicezia - 2009-07-24

smeehrrr Wrote:Just updated to the latest in SVN and I'm very confused by some of the changes. Can you explain the changes you made to the various MediaTags? It looks like MovieTag no longer has obvious properties like Title and Plot, which has broken my code in many places.

A little oversight, as I'm doing my best to limit memory overhead. It should be fixed in the version I'm about to upload, and there are a lot more changes in the code as well: I'm trying to make things more streamlined and use less memory.

As far as the fanart thing goes, I used to have it so that multiple fanart sources were possible, but I haven't seen any scrapers that use that, so in the current version I changed ALL media tags to have only one source.

And as far as the test program goes, I don't actually use it myself. I use the Object Test Bench in Visual Studio to test my library; the test program is simply a demonstration.


- Nicezia - 2009-07-24

spiff, what is the default for input?

It seems a little annoying that people keep leaving it off their scrapers. As I understood it, input could be a string, a buffer reference, or a setting replacement... so what if it's absolutely nothing? I think if you want input you should have to designate it... but that's just me...

(Well, actually it's only the tvdb scraper that I have this problem with.) Of course I can account for it, but it's a little annoying because I have to readjust the parser to deal with the scrapers individually. Since the internal flow of the XBMC scraper process is still a little confusing to me, I have to base my code on what the scrapers do, so now I have something like three different statements for handling the input field. What I'm really wondering is this: what if someone wants to use blank input (no input value) and buffer $$1 has something in it? Does that person have to specify input=""? Would that count as a blank input, or would it pull from $$1?


Check SVN - Nicezia - 2009-07-25

All right, I've updated SVN. I think I've got just about all the quirks out of the media tags, and I updated the test program to reflect the changes. The TV show setup is quite messy, but I've been up all night and it was the last thing I was working on. (The IMDB TV scraper still doesn't work for the episode list. I'm not sure why, but I haven't actually investigated the scraper itself or checked for any updates.)

All I have left to do, as far as I can see, is investigate the IMDB TV scraper and finish up music videos, which is a 20-minute to one-hour job, and then this thing is complete.

I'm sure there are bugs, as I haven't had a chance to FULLY delve into it... you know the drill: let me know and I'll get right on it.

btw: the TV.com scraper returns junk in both ScraperXML and XBMC; I'm sure it's broken. Mind you, I'm saying this before checking for updates in SVN (I'm thinking about writing an SVN scraper check into the code, so that you know when a scraper has been updated).

footnote: Now accepting donations (so hopefully I can get back online)


Due to upcoming changes in XBMC scraper code: - Nicezia - 2009-07-29

I'm going to have to make a major change in the scraper code so that all scrapers will be managed by ScraperXML. The new approach is that you load a ScraperManager object before calling for a scrape, passing it the folder that holds the scrapers, the path to the folder to use for the cache, and the log file path. This object will hold multiple persistent List<ScraperInfo> collections, so instead of creating the lists yourself you can just reference them from the ScraperManager. The ScraperManager itself doesn't have to be persistent, however, since it scans the folder for the available scrapers when it's called.

It'll be a week or so before this change is made in my code. If the XBMC code changes before then, I would suggest sticking to the scrapers currently in the ScraperXML SVN until those changes are reflected in my code.
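To make the description above a bit more concrete, here is a rough guess at what such a manager could look like. None of this is the real ScraperXML API: ScraperManagerSketch, ScraperInfoSketch, and all member names are hypothetical, the sketch assumes scrapers are .xml files in the given folder, and it keeps a single list where the post mentions several.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical placeholder; the real ScraperInfo has its own members.
public class ScraperInfoSketch
{
    public string Name;
    public string FilePath;
}

// Hypothetical manager: takes the three paths mentioned in the post
// and scans the scraper folder when it is constructed.
public class ScraperManagerSketch
{
    public readonly string ScraperFolder;
    public readonly string CacheFolder;
    public readonly string LogFilePath;
    public readonly List<ScraperInfoSketch> Scrapers = new List<ScraperInfoSketch>();

    public ScraperManagerSketch(string scraperFolder, string cacheFolder, string logFilePath)
    {
        ScraperFolder = scraperFolder;
        CacheFolder = cacheFolder;
        LogFilePath = logFilePath;

        // Assumption: every .xml file in the folder is a scraper definition.
        foreach (string file in Directory.GetFiles(scraperFolder, "*.xml"))
        {
            ScraperInfoSketch info = new ScraperInfoSketch();
            info.Name = Path.GetFileNameWithoutExtension(file);
            info.FilePath = file;
            Scrapers.Add(info);
        }
    }
}
```

Because the folder is scanned in the constructor, a caller could create a new manager before each scrape and always see the scrapers currently on disk, which matches the remark that the manager doesn't need to be kept around.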


- smeehrrr - 2009-07-30

Can you talk a bit about the justification for this change? What's broken that this will fix?