ScraperXML (Open Source XML Web Scraper C# Library) please help verify my work...

ssimon Offline
Junior Member
Posts: 26
Joined: Dec 2009
Reputation: 0
Post: #301
I see, OK, then would it be possible to take the output of the scraper and load it somehow into XBMC or am I far off track here?
Nicezia Offline
Fan
Posts: 369
Joined: Nov 2006
Reputation: 0
Location: Montgomery, Alabama
Post: #302
ssimon Wrote:I see, OK, then would it be possible to take the output of the scraper and load it somehow into XBMC or am I far off track here?

Not far off track, but you'd need a program that implements ScraperXML and saves the results to an ".nfo" file in the folder with the video.
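For illustration, the "save it next to the video" step could look roughly like this. It is only a sketch: `NfoWriter` is a made-up helper, not part of ScraperXML, and it assumes the scraped details are already available as an XML string and that the .nfo should share the video's base name.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper, not part of ScraperXML.
public static class NfoWriter
{
    // Returns the .nfo path that sits beside the given video file,
    // e.g. "Movies/Alien/Alien.avi" -> "Movies/Alien/Alien.nfo".
    public static string GetNfoPath(string videoPath)
    {
        return Path.ChangeExtension(videoPath, ".nfo");
    }

    // Writes the XML text (assumed to come from whatever scraping step
    // the host program runs) to the .nfo file and returns its path.
    public static string Save(string videoPath, string xml)
    {
        string nfoPath = GetNfoPath(videoPath);
        File.WriteAllText(nfoPath, xml, new UTF8Encoding(false)); // UTF-8, no BOM
        return nfoPath;
    }
}
```

A host program would call `NfoWriter.Save(pathToVideo, scrapedXml)` once per video after scraping.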

So far there is only one program that I know of that does that (I'm working on one of my own, but it's a long way from going public), and that's XMM (look it up under Supplemental Tools in the forum).

ScraperXML Open Source Web Scraper Library compatible with XBMC XML Scrapers


I Suck, and if you act now by sending only $19.95 and a self-addressed stamped envelope, so can you!

ssimon Offline
Junior Member
Posts: 26
Joined: Dec 2009
Reputation: 0
Post: #303
My thanks, I will see if I can figure out how to import the data.
spiff Offline
Retired Developer
Posts: 12,386
Joined: Nov 2003
Post: #304
why on earth do you think that running the scrapers through another interpreter (not the original one i might add) would make the slightest difference? if scraperxml does its job correctly, the output would be exactly the same as in xbmc.
Nicezia Offline
Fan
Posts: 369
Joined: Nov 2006
Reputation: 0
Location: Montgomery, Alabama
Post: #305
didn't make sense to me either

smeehrrr Offline
Junior Member
Posts: 48
Joined: Jun 2009
Reputation: 0
Post: #306
Is the current code in SVN supposed to be functional? I'm noticing that when scraping movies the only fields that get set are the ones explicitly set in MovieTag.Deserialize, which doesn't include basic things like Title. The changes I made for the VideoTag class look like they've been removed and replaced with something that I don't understand. How is title supposed to get set in the current code?
Nicezia Offline
Fan
Posts: 369
Joined: Nov 2006
Reputation: 0
Location: Montgomery, Alabama
Post: #307
smeehrrr Wrote:Is the current code in SVN supposed to be functional? I'm noticing that when scraping movies the only fields that get set are the ones explicitly set in MovieTag.Deserialize, which doesn't include basic things like Title. The changes I made for the VideoTag class look like they've been removed and replaced with something that I don't understand. How is title supposed to get set in the current code?

Slight oversight on my part; I'll upload the fix in a few hours (I uploaded the wrong source to the SVN).
I replaced the VideoTag class with an IVideoTag interface. I didn't see any benefit to actually having a class singling out VideoTag; following the hierarchy got to be a pain in the ass when half the class existed in another class. So instead I created MediaTag with all the items that exist in all of the MediaTag types, using two dictionaries to hold any values that individual types might have. This way, even if you are accessing the object as a MediaTag, you can pull any existing value from its dictionaries (whether it be a string value from UserProperties or a string list from StringLists). That is how I'm setting up for a generic scraper that will retrieve any sort of info without having to have properties predefined.
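A rough sketch of that layout, for readers following along. This is not the actual ScraperXML source: only the names MediaTag, MovieTag, UserProperties and StringLists come from the post above, while the `Title` and `Year` properties and the two `Get...` helpers are assumptions made for the example.

```csharp
using System.Collections.Generic;

// Illustrative only; the real ScraperXML classes may differ.
public class MediaTag
{
    // A field assumed to be common to every tag type.
    public string Title { get; set; }

    // Single string values that only some tag types define.
    public Dictionary<string, string> UserProperties { get; private set; }

    // Multi-valued fields (genres, actors, ...) that only some types define.
    public Dictionary<string, List<string>> StringLists { get; private set; }

    public MediaTag()
    {
        UserProperties = new Dictionary<string, string>();
        StringLists = new Dictionary<string, List<string>>();
    }

    // Returns the stored value, or null when the name is unknown.
    public string GetProperty(string name)
    {
        string value;
        return UserProperties.TryGetValue(name, out value) ? value : null;
    }

    // Returns the stored list, or an empty list when the name is unknown.
    public IList<string> GetList(string name)
    {
        List<string> list;
        return StringLists.TryGetValue(name, out list) ? list : new List<string>();
    }
}

// A typed tag can still add its own strongly typed properties.
public class MovieTag : MediaTag
{
    public int Year { get; set; } // hypothetical typed field
}
```

Code that only holds a `MediaTag` reference can then call `tag.GetProperty("tagline")` or `tag.GetList("genre")` without knowing the concrete type.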

(This post was last modified: 2010-01-06 06:16 by Nicezia.)
smeehrrr Offline
Junior Member
Posts: 48
Joined: Jun 2009
Reputation: 0
Post: #308
Part of the point of the refactoring I did was to eliminate duplicated code, which you've discarded when you switched VideoTag from a class to an interface. If you look at the serialization methods for MovieTag and TvShowTag now you'll see big chunks of identical code, that previously I had pulled out into VideoTag. In addition to making your code unnecessarily larger, you've also doubled the possible locations for bugs and made sure that you'll likely have to fix bugs or make feature changes in more than one place.

Moving all the properties up into a dictionary in the base class is an interesting idea, but I'm not sure how much use that will be in practice. If you're going to go that direction, why not make the serialization code fully generic as well? You could end up with a place where all the media tags are in essence just name mapping objects, which would actually be pretty cool.
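To make the "name mapping" idea concrete, here is one possible shape for a fully generic serializer. It is a sketch under assumptions, not code from the project: `GenericTagSerializer` and its parameters are invented for the example, and it assumes every field can be written as a simple string element. The output order is whatever order the caller lists the field names in.

```csharp
using System.Collections.Generic;
using System.Xml.Linq;

// Hypothetical helper; not part of ScraperXML.
public static class GenericTagSerializer
{
    // Emits one child element per name in fieldOrder, in that order,
    // skipping names that have no value in the dictionary.
    public static XElement Serialize(
        string rootName,
        IList<string> fieldOrder,
        IDictionary<string, string> values)
    {
        XElement root = new XElement(rootName);
        foreach (string name in fieldOrder)
        {
            string value;
            if (values.TryGetValue(name, out value) && value != null)
            {
                root.Add(new XElement(name, value));
            }
        }
        return root;
    }
}
```

With this shape each tag type reduces to a root element name plus an ordered list of field names, and the serialization loop exists in one place only.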
Nicezia Offline
Fan
Posts: 369
Joined: Nov 2006
Reputation: 0
Location: Montgomery, Alabama
Post: #309
Because the necessary fields for some media are not completely generic. This keeps a standard setup for XBMC media tags (for property binding, for those who wish to use it) while leaving open the ability to use the code for other purposes as well. For instance, I use the generic tag to scrape RSS-style info from blogs that don't provide an RSS feed, and also to scrape headlines from my local newspaper site. In this way anyone can derive from a generic media tag to create their own tag and still have a way to access it via MediaTag.

Sure, there is duplicate code, but for now the duplicate code is easier to manage while I am debugging, instead of running down a base.base.base.Whatever tree to find values. That was one of the reasons I left some code out of Deserialize while managing it. I may at some point go back to that, but only after I'm sure I don't have to modify the media tags anymore. At this stage I'm expanding MediaTags, since media manager software is not the only aim of the project. If you'd like to branch from a previous version, you're welcome to do so; that is why I restructured the SVN.

I contemplated making the entire thing generic, but couldn't figure out a practical way to do so. Plus I want serialization ordered a certain way for each class. In deserialization the order doesn't matter (and to be honest it truly doesn't matter what order items are added to the XML either), but I think it's easier for those who will still open and edit some fields in the XML if it's ordered in a certain way (or maybe that's just my personal opinion). Using base.Serialize() pops out all the XElements created by the base, in the order base.Serialize() designates, and then everything else has to be serialized around that, either before or after, which makes ordering the XML elements a chore. It's easier for me to be sure I've handled everything when it's all balled up together and not off to the side...


I'm not at all saying the way you had MediaTag was not ideal. I'm just saying that from my perspective (which is inexperienced), I tend to make more mistakes the further an object's properties and methods are extended away from the original code, because my mind works better on a spherical pattern than on a long, extending linear one. I loved what you did with it; I just had problems working with it myself. Besides, undoing the refactoring only adds 8 KB to the code, and makes barely any difference after compilation.

I'm not sure you can understand what I'm saying, but hell... refactoring is something I do when I'm completely happy with my end result, because otherwise I just confuse myself.

(This post was last modified: 2010-01-06 10:54 by Nicezia.)
spiff Offline
Retired Developer
Posts: 12,386
Joined: Nov 2003
Post: #310
added a new property to the scrapers, <scraper name=".." ..cachePersistence="hh:mm">.

this is how long we should keep cached files around before deleting them. i guess it's mostly useful for the editor in your case.
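For anyone consuming this attribute outside XBMC, a minimal sketch of reading the "hh:mm" value and deciding whether a cached file is stale might look like the following. `CachePolicy` and its method names are made up for the example, and the parsing assumes exactly two numeric parts separated by a colon.

```csharp
using System;
using System.Globalization;

// Hypothetical helper; the names are not from XBMC or ScraperXML.
public static class CachePolicy
{
    // Converts an "hh:mm" attribute value into a TimeSpan.
    public static TimeSpan ParsePersistence(string value)
    {
        string[] parts = value.Split(':');
        if (parts.Length != 2)
        {
            throw new FormatException("Expected hh:mm, got: " + value);
        }

        int hours = int.Parse(parts[0], CultureInfo.InvariantCulture);
        int minutes = int.Parse(parts[1], CultureInfo.InvariantCulture);
        return new TimeSpan(hours, minutes, 0);
    }

    // True when the cached item is older than the allowed persistence.
    public static bool IsExpired(DateTime cachedAtUtc, DateTime nowUtc, TimeSpan persistence)
    {
        return nowUtc - cachedAtUtc > persistence;
    }
}
```

An editor could compare each cached file's last-write time (for example from `File.GetLastWriteTimeUtc`) against `DateTime.UtcNow` with `IsExpired` and delete the files that come back true.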
Nicezia Offline
Fan
Posts: 369
Joined: Nov 2006
Reputation: 0
Location: Montgomery, Alabama
Post: #311
thanks

Vincent81 Offline
Fan
Posts: 382
Joined: Aug 2009
Reputation: 3
Location: France
Post: #312
Hello,
Would it be possible to use ScraperXML from the command line or as a DLL in my program (XBNE, which is not developed in C)?
And if so, how?

Thanks

XBNE : XBMC Video DataBase / Nfo Editor
Nicezia Offline
Fan
Posts: 369
Joined: Nov 2006
Reputation: 0
Location: Montgomery, Alabama
Post: #313
I could easily throw together a command-line interface for it that returns an XML-formatted string.

However, the end user would still need .NET 3.5 (or Mono 2.0 or greater on Linux) to use it.

I've been using your project for quite a while to update my own library, and I've been wondering whether you'd be interested in using ScraperXML.

I'd be happy to work alongside you to get it working together with your program.
If you PM me, we can discuss a way to make it work smoothly through a DLL...

(This post was last modified: 2010-02-09 07:09 by Nicezia.)
tehashix Offline
Junior Member
Posts: 2
Joined: Apr 2010
Reputation: 0
Location: Lorraine, France
Post: #314
Hello Nicezia,

First I would like to thank you a lot for the great work you did with ScraperXML. It's an amazing library.

I'm trying to use your library in my little software "Dune Explorer" (an HDI Dune player customization tool). I started from the sources on SourceForge and corrected a small localization issue:
- When you convert values with the ToDouble function, the result depends on the Windows regional settings. Sometimes the dot (".") is not the decimal separator, so you get an error. Here is an example of the correction I've applied to MediaTag.cs (I set a NumberFormatInfo to force the dot (".") as the decimal separator):
Code:
// Requires: using System; using System.Globalization; using System.Xml.Linq;
internal double ProcessRating(XElement ratingElement)
{
    if (ratingElement == null)
    {
        return 0.0;
    }

    // Force "." as the decimal separator so that parsing does not
    // depend on the Windows regional settings.
    NumberFormatInfo provider = new NumberFormatInfo();
    provider.NumberDecimalSeparator = ".";
    provider.NumberGroupSeparator = ",";
    provider.NumberGroupSizes = new int[] { 3 };

    double rating = Convert.ToDouble(ratingElement.Value, provider);

    // Rescale to a 0-10 range when the scraper reports its own maximum.
    XAttribute max = ratingElement.Attribute("max");
    if (max != null)
    {
        return 10.0 / Convert.ToDouble(max.Value, provider) * rating;
    }

    return rating;
}
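A shorter variant of the same idea, offered only as a sketch: .NET's built-in `CultureInfo.InvariantCulture` already uses "." as the decimal separator, so it can stand in for a hand-built NumberFormatInfo. `RatingParser` and its methods are hypothetical names, not part of ScraperXML.

```csharp
using System.Globalization;

// Hypothetical helper; not part of ScraperXML.
public static class RatingParser
{
    // Parses text such as "8.5" with "." as the decimal separator,
    // regardless of the current thread culture.
    public static double ParseInvariant(string text)
    {
        return double.Parse(text, NumberStyles.Float, CultureInfo.InvariantCulture);
    }

    // Rescales a rating to a 0-10 range given the scraper's own maximum.
    public static double ScaleToTen(string value, string max)
    {
        return 10.0 / ParseInvariant(max) * ParseInvariant(value);
    }
}
```

One difference from the version above: `NumberStyles.Float` rejects group separators, so input such as "1,000" throws instead of being read as one thousand. For ratings that is probably the safer behaviour, but it is a change worth knowing about.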

It works great with Movies but I've got some questions about how the TV Shows work. For TV Shows I use the same functions as for Movies :
  • ScraperManager.GetResults to search for the tv show (with the tvshow MediaType and tvdb scraper)
  • ScraperManager.GetDetails to get Details from one ScrapeResultEntity

The GetDetails function returns a TVShowTag which is almost empty, there is only an EpisodeGuide available in the object, nothing else, no actors, no directors, no Episodes, no Fanart, no Thumbnail, etc...

So, my first question: how can I get this information into a TVShowTag object? (I'm a little confused by TV shows.)

My second question is about ScraperSettings. I would like to let the user configure the scraper settings once (and not every time the application is launched). So, is there a way to save the ScraperSettings and load them at application startup?

Again, thank you a lot for your work.

Regards,

TeHashiX
patmtp35 Offline
Junior Member
Posts: 1
Joined: May 2010
Reputation: 0
Post: #315
hi!

I just discovered your scraper command line. It works great on imdb.com, but as I'm French I need to use it with ciné-passion, and I can't get it to work. Could you please help me?

regards