Kodi Community Forum
ScraperXML (Open Source XML Web Scraper C# Library) please help verify my work...


- Nicezia - 2009-07-07

xyber Wrote:Thanks. I'll just pop up something in case it delays too long. Living in SA gives me the "benefit" of seeing what users will experience under the worst internet connection conditions.

Guess that is why tvdb.xml was renamed to tvdb.xml.xox :P

Yeah, because I do most of my coding and testing without internet access, so I renamed it to keep it from trying to connect to the net.


- ultrabrutal - 2009-07-07

Nicezia Wrote:What part of your code is crashing it? What exactly is your scraper doing when it has this problem?

This is when opening your application while my scraper is in the folder. :) I never get to press any buttons myself ;)

With the old version I can open it just fine and I can test the functions, except GetDetails, which fails if I use nested regex (posters and fanart with more than one regex). See my other thread about this problem:
http://forum.xbmc.org/showthread.php?tid=50452


- Nicezia - 2009-07-07

ScraperXML shouldn't have any problems with nested RegExp.

It's really hard for me to find what's causing a problem when I don't have a copy of the object the problem is with. That's why I suggest using scraperXMLEditor: it's based on the same code, so it will help you make a scraper fully compatible with both XBMC and ScraperXML.

However, without knowing what your scraper looks like, what it does, or what it is about it that ScraperXML doesn't like, there's not a thing I can do to help. You're just not supplying enough info.

One thing I can see, though, is that the problem happens when trying to retrieve settings from the scraper. There is no exception handling in the ScraperParser when getting settings; it expects the scraper to conform. The only thing that has changed in settings is the added option="hidden". Other than that, the C# code is a direct reflection of the old VB code.

At least let me see your "GetSettings" function so I can see what the problem is.

Also, have you loaded this in XBMC? Does it show your settings OK?


- ultrabrutal - 2009-07-07

Well, all of the scraper seems to work in XBMC. I'm having some problems with fanart in GetDetails, but that's it (debugging XBMC right now to find the cause). Do you use Messenger? Might be easier to talk there.


- Nicezia - 2009-07-07

ultrabrutal Wrote:Well, all of the scraper seems to work in XBMC. I'm having some problems with fanart in GetDetails, but that's it (debugging XBMC right now to find the cause). Do you use Messenger? Might be easier to talk there.

yahoo niceziavincent
googletalk [email protected]


- ultrabrutal - 2009-07-07

I've added: [email protected] in messenger. Should work I think? What about irc?


- Nicezia - 2009-07-07

Nicezia
on freenode.net


- Nicezia - 2009-07-07

For public knowledge: the problem was that there was no default on the text values.

All settings MUST have a default attribute (used as the initial value when adjusting settings). If you don't want anything set initially, use

Code:
default=""

This will be fixed in the next version so that the default is empty if not set.
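
As a rough sketch of the fix described above (treating a missing default attribute as an empty string), something like the following could work. The class and method names are made up for illustration and are not ScraperXML's actual code; the only thing taken from this thread is the default attribute itself.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical helper, not part of ScraperXML.
public static class SettingDefaults
{
    // Returns the value of the "default" attribute,
    // or an empty string when the attribute is missing.
    public static string GetDefault(XElement setting)
    {
        XAttribute attr = setting.Attribute("default");
        return attr != null ? attr.Value : string.Empty;
    }
}
```

With this, a setting element that omits default behaves the same as one that declares default="".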


- ultrabrutal - 2009-07-08

Hehe, I didn't want to supply my account info with the scraper ;)


- xyber - 2009-07-08

Nicezia, I've added you on gtalk. The request from plyoung will be me.

I found some IO errors which seem to be coming from the logging code, but I'm still investigating. It seems that LogFileInfo.Create(); can keep FileStream LogFile = new FileStream(LogFilePath, FileMode.Append); from opening the file in some cases.

[edit] Did more testing and here is what I found.
The error message is "The process cannot access the file 'C:\Work\xMM\xMM\bin\Debug\test.log' because it is being used by another process." which happens on FileStream LogFile = new FileStream(testfile, FileMode.Append);

So this exception may be thrown when you try to write to the log after a call to VerifyLogFile(...)

Code:
private void test1()
{
    string testfile = Application.StartupPath + Path.DirectorySeparatorChar.ToString() + "test.log";
    //try
    {
        FileInfo LogFileInfo = new FileInfo(testfile);
        LogFileInfo.Create();
    }
    //catch { }
}

private void test2()
{
    string testfile = Application.StartupPath + Path.DirectorySeparatorChar.ToString() + "test.log";
    //try
    {
        FileStream LogFile = new FileStream(testfile, FileMode.Append);
        StreamWriter sw = new StreamWriter(LogFile);
        sw.WriteLine("{0} : {1}", DateTime.Now.ToString("yyyy/MM/dd - HH:mm:ss"), "test");
        sw.Close();
        LogFile.Close();
    }
    //catch { }
}

private void MainForm_Load(object sender, EventArgs e)
{
    test1();
    test2();
...

[edit2] Rather than using
Code:
FileInfo LogFileInfo = new FileInfo(testfile);
LogFileInfo.Create();

to create the file, use this. It seems to work. I'd still add the try/catch block just in case.
Code:
FileStream f = File.Create(testfile);
f.Close();

But I don't think this code is even needed since FileStream LogFile = new FileStream(testfile, FileMode.Append); seems to create the file if it does not exist.
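
For reference, FileInfo.Create() returns an open FileStream, which is why the file stays locked until that stream is closed or disposed. Below is a minimal sketch of the two options discussed above: appending with FileMode.Append (which creates the file when it is missing), and, if the file really must be created up front, disposing the stream that Create() hands back. The LogWriter class and its method names are hypothetical and not taken from ScraperXML.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of ScraperXML.
public static class LogWriter
{
    // Appends one timestamped line. FileMode.Append creates the file if needed,
    // and the using blocks release the handle even if WriteLine throws.
    public static void Append(string path, string message)
    {
        using (FileStream stream = new FileStream(path, FileMode.Append, FileAccess.Write))
        using (StreamWriter writer = new StreamWriter(stream))
        {
            writer.WriteLine("{0} : {1}", DateTime.Now.ToString("yyyy/MM/dd - HH:mm:ss"), message);
        }
    }

    // Creates an empty file only when it does not exist yet,
    // and immediately disposes the FileStream that Create() returns.
    public static void EnsureExists(string path)
    {
        FileInfo info = new FileInfo(path);
        if (!info.Exists)
        {
            using (FileStream created = info.Create())
            {
                // Nothing to write; disposing releases the file lock.
            }
        }
    }
}
```

Calling EnsureExists followed by Append on the same path should no longer hit the "being used by another process" error, since no handle is left open between the two calls.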


Some more feedback - smeehrrr - 2009-07-08

OK, I've been working with the new version for a while now and have some initial feedback. First of all, thanks again for doing this work, it's saved me a ton of time. Please take this as constructive criticism, virtually everything on this list is a "nice to have" and not a bug, and the bugs that are there are easily fixed or worked around.

1) "Paramater" is misspelled in ScraperSetting, it should be "Parameter".
2) It would be nice for ScraperSettings to have a public default constructor and make the Deserialize method public. I'm not sure why I believe this is true anymore, but I wrote it down so at some point I thought it was interesting. This may just be an artifact of how my application worked with settings in the prior version.
3) Deserialize calls in general could use an overload which takes a string, rather than always having to construct an XElement. This would allow your callers to not have to be Linq aware.
4) It would be nice if I could specify a null as the path to the cache directory on the various scraper constructors, and you'd use the Temp directory by default. Easy to work around, but would be nice.
5) There's a small bug in the video scraper (may be in the others, I haven't checked) where if you make a GetDetails call without doing a prior CreateSearch, you get an exception due to WebpageDownload being uninitialized.
6) Polymorphism: This library could really use some. There are lots of common elements between the various types of scrapers and the various types of media tags, and it would be very useful (for me, anyway) if those could be pulled out into base classes so I could work with the data in a more general way. In my particular case, I'm working with a large library of movies and TV shows, and I'd like to avoid special-case code wherever possible. Currently I wrote a wrapper object to abstract out the common elements, but this is really something that should be in the base design, I think.
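
To make suggestions 3 and 4 concrete, here is a small sketch. ExampleSettings and CachePaths are hypothetical stand-ins written for this illustration; they are not the real ScraperXML classes, and the "setting" element name is an assumption. The string overload simply parses the text and forwards to the XElement version, and the cache helper falls back to the system temp directory when no path is given.

```csharp
using System;
using System.IO;
using System.Xml.Linq;

// Hypothetical stand-in, not the real ScraperXML ScraperSettings class.
public class ExampleSettings
{
    // Number of <setting> children seen by the last Deserialize call.
    public int Count { get; private set; }

    public void Deserialize(XElement xml)
    {
        int found = 0;
        foreach (XElement setting in xml.Elements("setting"))
        {
            found++;
        }
        Count = found;
    }

    // Convenience overload so callers do not have to build an XElement themselves.
    public void Deserialize(string xml)
    {
        Deserialize(XElement.Parse(xml));
    }
}

// Hypothetical helper for suggestion 4.
public static class CachePaths
{
    // Uses the system temp directory when the caller passes null or "".
    public static string Resolve(string cacheDirectory)
    {
        return string.IsNullOrEmpty(cacheDirectory) ? Path.GetTempPath() : cacheDirectory;
    }
}
```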

Have you done any performance testing on this version versus the previous one? Qualitatively the C# version seems faster, but I wonder if that's just my perception.


- smeehrrr - 2009-07-08

Disregard comment #2, I went back and revisited my own code and this is indeed just an artifact of me being lazy. It works fine the way you have it.


- smeehrrr - 2009-07-09

Hm, are TV show scrapers supposed to be working at all? The only one that seems to work properly for me is IMDB.


- Nicezia - 2009-07-09

Well, I'm having to tweak it. Apparently my implementation for the zip files isn't working properly because of the way .NET handles the stream: HttpWebResponse.GetResponseStream() is not seekable, which leads to the tvdb failure, since my function tries to read directly from the stream and its being non-seekable makes that impossible. I'm working on a fix.
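
One common workaround for a non-seekable stream, sketched here as an assumption and not as the fix that actually went into ScraperXML, is to copy the whole response into a MemoryStream first and hand that to the zip reader. StreamBuffering and ToSeekable are hypothetical names.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of ScraperXML.
public static class StreamBuffering
{
    // Reads the source to its end and returns a seekable copy positioned at 0.
    public static MemoryStream ToSeekable(Stream source)
    {
        MemoryStream buffer = new MemoryStream();
        byte[] chunk = new byte[8192];
        int read;
        while ((read = source.Read(chunk, 0, chunk.Length)) > 0)
        {
            buffer.Write(chunk, 0, read);
        }
        buffer.Position = 0; // rewind so the consumer starts at the beginning
        return buffer;
    }
}
```

The stream returned by GetResponseStream() would be passed in as source. The trade-off is that the entire download is held in memory, which should be acceptable for small zipped XML files but may not be for large ones.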


- Nicezia - 2009-07-09

xyber Wrote:Nicezia, I've added you on gtalk. The request from plyoung will be me.

I found some IO errors which seem to be coming from the logging code, but I'm still investigating. It seems that LogFileInfo.Create(); can keep FileStream LogFile = new FileStream(LogFilePath, FileMode.Append); from opening the file in some cases.

VerifyLogfile will create the log file if it doesn't exist, so there's no need to call Create on the file. Calling Create seems to leave the file open if it's not explicitly closed. If you want to create the file yourself, be sure to close it so it's available to be opened by other processes.

Or maybe I don't understand exactly what you're saying.

I'm working on learning more about events so that programs can use their own logging functions and just retrieve log info from the scraper.
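
A minimal sketch of what event-based logging could look like in C#, assuming the standard EventHandler<T> pattern. ExampleScraper, LogEventArgs and LogMessage are hypothetical names for illustration, not the eventual ScraperXML API.

```csharp
using System;

// Carries one log message to subscribers.
public class LogEventArgs : EventArgs
{
    public string Message { get; private set; }

    public LogEventArgs(string message)
    {
        Message = message;
    }
}

// Hypothetical scraper stand-in that raises an event instead of writing a file.
public class ExampleScraper
{
    public event EventHandler<LogEventArgs> LogMessage;

    protected void OnLogMessage(string message)
    {
        // Copy the delegate first so an unsubscribe on another thread
        // cannot null it out between the check and the call.
        EventHandler<LogEventArgs> handler = LogMessage;
        if (handler != null)
        {
            handler(this, new LogEventArgs(message));
        }
    }

    public void Run()
    {
        OnLogMessage("scrape started");
    }
}
```

The host program subscribes with its own handler (for example scraper.LogMessage += MyLogHandler;) and decides where the text goes, so the library never has to open a log file itself.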