Kodi Community Forum
ScraperXML (Open Source XML Web Scraper C# Library) please help verify my work...


- spiff - 2009-07-01

you found my naughty ;)

the <thumbs> stuff is something i plan to change. it should just do multiple <thumb> tags. when i have the time and all that..


- Nicezia - 2009-07-02

exactly what is it you're going to change about it?

well basically what i was hoping for was the ability to keep and switch between multiple fanart images stored locally, as well as multiple thumbnails... that way i can keep all my images and info with my videos (which are on a 1 terabyte external portable hard drive), and i'd like to be able to have all that info passed along when i take it to my friend's house or when i go on vacation somewhere i don't have internet.

I'm going to look over the source and see if it's something i can edit myself to save you some time.. ;) and submit a patch


ScraperXML Awesome Release Tomorrow!! - Nicezia - 2009-07-05

heheh,

ok i'm really not one to toot my own horn....

but tomorrow's update of ScraperXML is gonna be the bomb, the penultimate achievement of my Programming thus far, (which isn't much, but hey... I'm a newbie)

It's got some kick-ass error handling, along with internal functions that make sure that if any results are returned, they are parseable by any XML handler.

There are two demo programs (Windows-based), one written in Visual Basic and one written in C# (ScraperXML's new native language). The framework is already in place in the library to make scraper editors with scraper debugging features, and cache and logging are fully supported.

the library contains no threading; i suggest that programs implementing it run the library on a separate thread manually (until i delve deeper into threading and event handlers).
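
A minimal sketch of what that could look like on the calling side, assuming nothing about the library beyond the fact that its calls block. `BackgroundRunner`, `blockingCall`, `onDone` and `onError` are hypothetical names made up for illustration, not part of ScraperXML:

```csharp
using System;
using System.Threading;

// Hypothetical helper, not part of ScraperXML.
public static class BackgroundRunner
{
    // Starts blockingCall on a background thread and returns that thread.
    // onDone receives the result; onError receives any exception thrown
    // by blockingCall or by onDone. Both callbacks run on the worker thread.
    public static Thread Run<T>(Func<T> blockingCall, Action<T> onDone, Action<Exception> onError)
    {
        Thread worker = new Thread(() =>
        {
            try
            {
                onDone(blockingCall());
            }
            catch (Exception ex)
            {
                onError(ex);
            }
        });
        worker.IsBackground = true; // do not keep the process alive for a pending scrape
        worker.Start();
        return worker;
    }
}
```

Usage would be something like `BackgroundRunner.Run(() => DoScrape(title), ShowResults, ShowError)`, where `DoScrape`, `ShowResults` and `ShowError` stand in for your own methods. In a Windows Forms program the callbacks would need `Control.Invoke` before touching the UI, since they run on the worker thread.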

I know i've been inactive as far as updates go, but i wanted to completely rewrite the entire library in C# and then make sure it worked as close to perfect as can be expected. I'd say my time has been well spent!


- xyber - 2009-07-05

Awesome. I was about to start with the scraping code for the media manager I'm working on but will wait and check your lib out first. 8)


- Nicezia - 2009-07-05

fekker Wrote: Sidenote: any chance of using a .net 2.0 target framework? (just an idea)

I'm not sure i know what the difference is between .Net 2.0 and .Net 3.5

it may sound odd, but i honestly don't.

All i know is that it works in mono.

if i can figure out what the difference between them is i'll make it completely .Net 2.0 but first i want to complete what i've got.

plus i've just figured out Events and threading so that's another thing i'll be adding post-today's-release


SVN Updated - Nicezia - 2009-07-06

Everything connected and working except TV Shows... I'm tired, and need to get some sleep. I'll probably update again in the morning before heading off to work.

still pretty proud of it even though it's not completely finished; i'm just nodding off at my computer. the TV scraper code is easy to understand anyway and could be easily implemented: it's only creating the TvScraper object (by passing a ScraperInfo object, the cache folder path and a path to a log file; both the cache path and the log file can be null or string.Empty if you don't want to use cache or logging), then calling the 4 public methods as needed.

probably only about another hour's worth of work for me to implement it into the test programs, but i don't have another hour of steam in me.


- xyber - 2009-07-06

Had a quick look at your SVN update. Like what I see from the C# Test app ;)
Gonna start integrating it into my project now. Will let you know how it goes.


- smeehrrr - 2009-07-06

Nicezia Wrote: heheh,
but tomorrow's update of ScraperXML is gonna be the bomb, the penultimate achievement of my Programming thus far, (which isn't much, but hey... I'm a newbie)

What's the ultimate achievement, and how do you know that this one is the penultimate?

Joking aside, your code has saved me a ton of time on the project I'm currently working on, and thanks for it. I'm going to check out the C# update soon, and I should have some feedback for you after that.


- ultrabrutal - 2009-07-06

You need some more exception handling in there :) This error is raised when I open the scraper I'm working on. The old version can open it just fine...

Code:
System.ArgumentNullException: Value cannot be null.
Parameter name: value
   at System.Xml.Linq.XElement.set_Value(String value)
   at TechNuts.ScraperXML.ScraperSetting.Serialize() in E:\Visual Studio 2008\Projects\TechNuts\TechNuts\ScraperXML\ScraperSettings.cs:line 163
   at TechNuts.ScraperXML.ScraperSettings.Serialize() in E:\Visual Studio 2008\Projects\TechNuts\TechNuts\ScraperXML\ScraperSettings.cs:line 24
   at TechNuts.ScraperXML.ScraperParser.GetSettings() in E:\Visual Studio 2008\Projects\TechNuts\TechNuts\ScraperXML\ScraperParser.cs:line 443
   at TechNuts.ScraperXML.ScraperInfo..ctor(String xmlPath) in E:\Visual Studio 2008\Projects\TechNuts\TechNuts\ScraperXML\ScraperInfo.cs:line 122
   at ScraperXML_Test_Program_CSharp.Form1.Form1_Load(Object sender, EventArgs e) in E:\Visual Studio 2008\Projects\TechNuts\ScraperXML Test Program CSharp\Form1.cs:line 46
   at System.Windows.Forms.Form.OnLoad(EventArgs e)
   at System.Windows.Forms.Control.CreateControl(Boolean fIgnoreVisible)
   at System.Windows.Forms.Control.CreateControl()
   at System.Windows.Forms.Control.WmShowWindow(Message& m)
   at System.Windows.Forms.Control.WndProc(Message& m)
   at System.Windows.Forms.Control.ControlNativeWindow.WndProc(Message& m)
   at System.Windows.Forms.NativeWindow.Callback(IntPtr hWnd, Int32 msg, IntPtr wparam, IntPtr lparam)
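
For anyone reading the trace: the top frame is the `XElement.Value` setter, which throws `ArgumentNullException` when it is handed null. A minimal sketch of that behaviour and one possible guard; `SafeXml`, `AssignmentThrows` and `SetValueOrEmpty` are hypothetical names, and this makes no claim about what `ScraperSetting.Serialize()` actually contains:

```csharp
using System;
using System.Xml.Linq;

// Hypothetical helper, not taken from the ScraperXML source.
public static class SafeXml
{
    // True if assigning value to an XElement's Value throws ArgumentNullException.
    public static bool AssignmentThrows(string value)
    {
        XElement element = new XElement("setting");
        try
        {
            element.Value = value;
            return false;
        }
        catch (ArgumentNullException)
        {
            return true;
        }
    }

    // Guarded version: a null value is written as an empty string.
    public static XElement SetValueOrEmpty(XElement element, string value)
    {
        element.Value = value ?? string.Empty;
        return element;
    }
}
```

Whether an empty string is the right fallback for a setting with no value is a design decision for the library; the sketch only shows where the exception comes from.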



- xyber - 2009-07-06

I noticed that my app wants to connect to thetvdb.com as soon as it starts. I think it happens when my code calls your lib to load the scrapers.
Basically this section from your example...
Code:
DirectoryInfo scrapersDir = new DirectoryInfo(scraperPath);
foreach (FileInfo scraper in scrapersDir.GetFiles("*.xml"))
{
    try
    {
        XDocument xScraperXML = XDocument.Load(scraper.FullName);
        // Read the content type once instead of once per comparison.
        string content = xScraperXML.Root.Attribute("content").Value;
        if (content == "movies") movieSource.Add(new ScraperInfo(scraper.FullName));
        if (content == "tvshows") tvShowSource.Add(new ScraperInfo(scraper.FullName));
        if (content == "albums") albumSource.Add(new ScraperInfo(scraper.FullName));
        if (content == "musicvideos") musicVideoSource.Add(new ScraperInfo(scraper.FullName));
    }
    catch { errors = true; }
}

Is there a reason for that?


- spiff - 2009-07-06

it's probably loading the scraper settings, which query tvdb for available languages


- xyber - 2009-07-06

Thanks. I'll just pop up something in case it delays too long. Living in SA gives me the "benefit" of seeing what users will experience under the worst internet connection conditions.

Guess that is why tvdb.xml was renamed to tvdb.xml.xox :P


- smeehrrr - 2009-07-06

Quick bug report here. On my build, boolean settings are not working properly. The problematic code is in the ScraperSetting constructor:

Code:
if (string.IsNullOrEmpty(xmlelement.Value) != true)
{
    if (_type == "bool")
    {
        _param = bool.Parse(xmlelement.Value).ToString();
    }
    else
    {
        _param = xmlelement.Value;
    }
}
else
{
    _param = _default;
}

The problem is that bool.Parse(...).ToString() returns a capitalized "True" (or "False") and the scrapers apparently expect it to be lowercase. I added a .ToLower() at the end of that line and everything started working again.
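
A small sketch of the casing behaviour, for reference. `BoolSetting.Normalize` is a hypothetical name, not the library's; `ToLowerInvariant()` is used here in place of `ToLower()` so the result does not depend on the current culture:

```csharp
using System;

// Hypothetical helper, not part of ScraperXML.
public static class BoolSetting
{
    // Parses a boolean setting and returns it as lowercase "true" or "false".
    public static string Normalize(string raw)
    {
        // bool.Parse ignores case on input, but ToString() always yields
        // "True" or "False", so the result is lowercased afterwards.
        return bool.Parse(raw).ToString().ToLowerInvariant();
    }
}
```

With this, `BoolSetting.Normalize("True")` and `BoolSetting.Normalize("TRUE")` both give `"true"`, while anything that is not a boolean still throws `FormatException` from `bool.Parse`.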

I'll have lots more comments coming. Got my code ported over from the older version to the new one pretty easily modulo this settings problem.


- Nicezia - 2009-07-07

just add .ToLower() after the .ToString() on that bool.Parse(foo) line:

Code:
_param = bool.Parse(xmlelement.Value).ToString().ToLower();



- Nicezia - 2009-07-07

ultrabrutal Wrote: You need some more exception handling in there :) This error is raised when I open the scraper I'm working on. The old version can open it just fine...

Code:
System.ArgumentNullException: Value cannot be null.
Parameter name: value
   at System.Xml.Linq.XElement.set_Value(String value)
   at TechNuts.ScraperXML.ScraperSetting.Serialize() in E:\Visual Studio 2008\Projects\TechNuts\TechNuts\ScraperXML\ScraperSettings.cs:line 163
   at TechNuts.ScraperXML.ScraperSettings.Serialize() in E:\Visual Studio 2008\Projects\TechNuts\TechNuts\ScraperXML\ScraperSettings.cs:line 24
   at TechNuts.ScraperXML.ScraperParser.GetSettings() in E:\Visual Studio 2008\Projects\TechNuts\TechNuts\ScraperXML\ScraperParser.cs:line 443
   at TechNuts.ScraperXML.ScraperInfo..ctor(String xmlPath) in E:\Visual Studio 2008\Projects\TechNuts\TechNuts\ScraperXML\ScraperInfo.cs:line 122
   at ScraperXML_Test_Program_CSharp.Form1.Form1_Load(Object sender, EventArgs e) in E:\Visual Studio 2008\Projects\TechNuts\ScraperXML Test Program CSharp\Form1.cs:line 46
   at System.Windows.Forms.Form.OnLoad(EventArgs e)
   at System.Windows.Forms.Control.CreateControl(Boolean fIgnoreVisible)
   at System.Windows.Forms.Control.CreateControl()
   at System.Windows.Forms.Control.WmShowWindow(Message& m)
   at System.Windows.Forms.Control.WndProc(Message& m)
   at System.Windows.Forms.Control.ControlNativeWindow.WndProc(Message& m)
   at System.Windows.Forms.NativeWindow.Callback(IntPtr hWnd, Int32 msg, IntPtr wparam, IntPtr lparam)


What part of your code is crashing it? What exactly is your scraper doing when it has this problem?