IMDb scraper function in a DLL?
#1
Question 
Hi,
I plan to build an ASP.NET web site to manage my video collection, and then write a scraper that queries that site. That way, all my XBMC boxes will get the same video metadata, scraped from my own application.

I have been thinking of reusing the methods from the IMDB.cpp and IMDB.h files. To do this, I tried to create a Win32 DLL (to be wrapped in .NET) that exposes the methods XBMC uses to scrape data (CIMDB::InternalFindMovie and CIMDB::InternalGetDetails).

The problem: I'm used to calling Win32 DLLs from .NET, but I'm not a C++ expert (a little skill, no more), and I run into a lot of problems when I try to include the IMDB.h file in the main code of my DLL.

Can anyone help me?
Reply
#2
Why not use the open-source scraper library that is already available?
Always read the XBMC online-manual, FAQ and search the forum before posting.
Do not e-mail XBMC-Team members directly asking for support. Read/follow the forum rules.
For troubleshooting and bug reporting please make sure you read this first.


Reply
#3
Thanks for the quick reply. I hadn't seen this lib. I've seen a project named "scrap", but it seems to be dead. Where can I find this lib?
[UPDATE] I found it, thanks for the info!
Reply
#4
http://forum.xbmc.org/showthread.php?tid=50055
Reply
#5
I'd suggest waiting until this weekend, when I upload the most recent version, which handles things a lot better and puts a lot more tools at your disposal.
ScraperXML Open Source Web Scraper Library compatible with XBMC XML Scrapers


I Suck, and if you act now by sending only $19.95 and a self addressed stamped envelop, so can you!

Reply
#6
Hi,
I've just started development, so I can wait for this weekend. I've tested the current function and it works well for what I need. I will let you know how things go.
Reply
#7
I have a small problem. I tried to use the allocine.xml scraper from the SVN source with ScraperXML, and it doesn't work. From ScraperXML's point of view the scraper has bugs (a space missing before "function" and another small one), yet it works well with XBMC. That's why I tried to put the InternalGetDetails function in a DLL: I think reusing the XBMC code is the best option.

I'll keep looking for a way to put that in a DLL.
Reply
#8
I really, really suggest you figure out what's going wrong in the ScraperXML library instead. The code in XBMC is so entangled with internal stuff that I gave up separating it a while back.
Reply
#9
There are a few things that XBMC can do that ScraperXML can't.

For instance, XBMC can parse XML fragments, while ScraperXML needs a fully qualified XML element.

XBMC can parse this:

Code:
<url>http://foo.com/foo.html</url>
<url>http://foo.com/foo2.html</url>

while ScraperXML would need it to be like this

Code:
<sometag>
   <url>http://foo.com/foo.html</url>
   <url>http://foo.com/foo2.html</url>
</sometag>

ScraperXML requires well-formed XML elements, for compatibility with any program that may use a different XML parser. I don't feel this should change, as compatibility with other programs and other XML handlers is one of the main goals of the code.
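
To make the difference concrete, here is a minimal sketch of how a caller could turn a run of sibling elements into a single well-formed element before handing it to a strict parser. The helper name `WrapInRoot` and the root name `results` are made up for this illustration; neither comes from ScraperXML or XBMC.

```cpp
#include <string>

// Hypothetical helper (not part of ScraperXML or XBMC): wraps a run of
// sibling elements in one root element so that a strict XML parser
// accepts the string as a single element.
// The default root name "results" is an arbitrary choice for this sketch.
std::string WrapInRoot(const std::string& fragment,
                       const std::string& rootName = "results")
{
    return "<" + rootName + ">" + fragment + "</" + rootName + ">";
}
```

The fragment itself is passed through untouched, so this only helps when the missing root element is the sole problem; it does not repair other errors in the scraper output.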

Perhaps it would be a good idea to document what about this scraper isn't working with ScraperXML. Let me know and I'll look into fixing the scraper so it works with both.

For instance, a missing space between the function attribute and the previous attribute is easily repaired by changing that in the scraper.

I've had several problems with other scrapers that have small errors like that, and I have been fixing them so they work with both ScraperXML and XBMC.
Reply
#10
I will try to make a DLL. I know it's hard, but I will give it a try this weekend. I think it is possible. If I fail, I will use ScraperXML, and rest assured I will give feedback about it.
Reply
#11
Nicezia Wrote:ScraperXML requires well formed Xml elements (for compatibility with any program that may use a different XML Parser, this is something i don't feel should be changed as compatibility with other programs and other XML handlers is one of the main goals of the code.)
Even if XBMC's current XML scraper can handle XML fragments, maybe it would nonetheless be a good idea to update all scraper XML files in XBMC's SVN to have well-formed XML elements?
http://trac.xbmc.org/browser/branches/li...m/scrapers
http://trac.xbmc.org/browser/branches/li...pers/music
http://trac.xbmc.org/browser/branches/li...pers/video


I am sure that updated scraper files would be more than welcome if submitted as patch tickets on trac:
http://trac.xbmc.org
(log in using your forum login, which is case-sensitive, click submit new ticket, and add a separate ticket for each scraper and each update).

At least then all of XBMC's scrapers in the official SVN would be compatible with the ScraperXML library and other XML parsers out of the box, without any modifications needed.
Reply
#12
there is nothing malformed with having sibling elements
Reply
#13
Here is a little proof of concept. It's not clean and not very useful for now.

I took the XBMC project under VC++ and changed the output to a DLL, then added XBMCDLL.cpp/.h files and an XBMCDLL.def.

I exported one function called "Test" with this definition:

Code:
XBMCDLL_API void Test()
{
    // Set some globals/environment variables that the XBMC code needs
    CProfile* profile = NULL;
    CStdString strExecutablePath = "E:\\xbmc-code";
    CStdString strTempPath = "E:\\temp";
    g_guiSettings.AddBool(0, "filelists.hideextensions", 1, true);
    SetEnvironmentVariable("XBMC_HOME", "E:\\xbmc-code");

    // Set Paths
    CSpecialProtocol::SetXBMCPath(strExecutablePath);
    CSpecialProtocol::SetTempPath(strTempPath);

    // Instantiate CIMDB and give it the scraper parameters
    CIMDB* XBMCScraper = new CIMDB();
    // For this sample: use the IMDb scraper
    SScraperInfo info;
    info.strPath = "imdb.xml";
    info.strContent = "movies";
    info.strTitle = "IMDb";
    info.strLanguage = "en";
    info.strFramework = "1.0";
    info.settings = CScraperSettings(); // temporary instead of a leaked heap allocation
    CStdString strSettings = "<settings><setting id=\"GetThumbnail\" value=\"true\" /><setting id=\"info\" value=\"true\" /><setting id=\"actor\" value=\"falsetrue\" /><setting id=\"fanart\" value=\"true\" /><setting id=\"fullcredits\" value=\"false\" /><setting id=\"impawards\" value=\"true\" /><setting id=\"movieposterdb\" value=\"false\" /><setting id=\"trailer\" value=\"true\" /><setting id=\"imdbscale\" value=\"512\" /><setting id=\"url\" value=\"akas.imdb.com\" /></settings>";
    info.settings.LoadUserXML(strSettings);
    XBMCScraper->SetScraperInfo(info);

    // Call to FindMovie for "Fight Club"
    IMDB_MOVIELIST    movieList;
    CStdString strMovie = "Fight Club";
    XBMCScraper->FindMovie(strMovie, movieList);

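    // Call GetDetails with the first result, if the search returned any
    CVideoInfoTag infoTags;
    if (!movieList.empty())
        XBMCScraper->GetDetails(movieList[0], infoTags);

    delete XBMCScraper; // release the scraper allocated above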
}

Then I created a small C# .NET form with a button that calls the method. In debug mode it works: the CVideoInfoTag is filled in.

I still have work to do on this, but it seems doable. I know I have to clean things up to make it usable (the test just searches for "Fight Club" and takes the first result), throw away unneeded code, and marshal the results to .NET, but it works.
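
For the marshalling step, one common pattern is to export a plain C function that copies its result into a buffer supplied by the caller, so no C++ types cross the DLL boundary. The sketch below only illustrates that pattern: `SCRAPER_API`, `GetMovieTitle` and `LookupTitle` are hypothetical names, and `LookupTitle` is a stand-in for the real CIMDB calls, not XBMC code.

```cpp
#include <cstring>
#include <string>

// Hypothetical export macro for this sketch (the post uses XBMCDLL_API and a .def file).
#ifdef _WIN32
#define SCRAPER_API extern "C" __declspec(dllexport)
#else
#define SCRAPER_API extern "C"
#endif

// Stand-in for the real scraper lookup; an assumption, not XBMC code.
static std::string LookupTitle(const std::string& query)
{
    return "Title for: " + query;
}

// Copies the title into the caller's buffer when it is large enough.
// Returns the number of bytes required (including the terminating NUL),
// or -1 when query is null. The caller can pass a null buffer first to
// learn the size, then call again with a buffer of that size.
SCRAPER_API int GetMovieTitle(const char* query, char* buffer, int bufferSize)
{
    if (query == nullptr)
        return -1;

    const std::string title = LookupTitle(query);
    const int needed = static_cast<int>(title.size()) + 1;

    if (buffer != nullptr && bufferSize >= needed)
        std::memcpy(buffer, title.c_str(), needed);

    return needed;
}
```

On the .NET side, a function shaped like this can usually be declared with P/Invoke and given a caller-allocated buffer. The same approach extends to the other CVideoInfoTag fields, one export per field or one export that fills a flat struct.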
Reply
#14
spiff Wrote:there is nothing malformed with having sibling elements

Well, I wasn't actually saying it was malformed; that was a bad choice of words. Some parsers can't parse a group of elements unless they are under a root element.

I'm returning pure strings so that programs can parse them with whatever they wish. Every parser I know of can parse a string if it has a root element, but not every one of them can parse a group of sibling elements.

I'm trying to be compatible across the board, knowing that not everyone uses the same parser. Myself, for example, I use LINQ most of the time, since my intro to XML was with LINQ; but as I'm quite new to programming, I'm learning millions of ways to do the same task.



If this presents a problem let me know.
Reply
#15
Gamester17 Wrote:Even if XBMC's current XML scraper can handle XML fragments, maybe it would nonetheless be a good idea to update all scraper XML files in XBMC's SVN to have well-formed XML elements?
http://trac.xbmc.org/browser/branches/li...m/scrapers
http://trac.xbmc.org/browser/branches/li...pers/music
http://trac.xbmc.org/browser/branches/li...pers/video

I am sure that updated scraper files would be more than welcome if submitted as patch tickets on trac:
http://trac.xbmc.org
(log in using your forum login, which is case-sensitive, click submit new ticket, and add a separate ticket for each scraper and each update).

At least then all of XBMC's scrapers in the official SVN would be compatible with the ScraperXML library and other XML parsers out of the box, without any modifications needed.


Well, there aren't many that present that problem. There are two, I think, that can create multiple <url> elements in CreateSearchUrl, and I think one that doesn't return a full element during GetDetails, but other than that there aren't really any others that present a problem. (So far I've been handling that by ignoring them, as the next function to run puts them into a <details> tag with the rest of the thumbs.)
Reply
