ScraperXML (Open Source XML Web Scraper C# Library) please help verify my work...

- smeehrrr - 2009-08-11

Nicezia Wrote:First of all, the item used inside the function IS a copy of the item passed to it (unless the ref keyword is specified). There's no way to modify the actual item passed, as I'm not using any pointers at all and all the code is purely .NET. If you don't specify ref when passing the item, then it's not going to modify the original item (unless the ScraperXML code has been modified in some way on your part), and it shouldn't be modifying your cached copy. Secondly, there's absolutely nothing inside the code that should make it copy anything to your cached copy of the item. The ref'd item is only internal, as it has to be passed to another internal function. I don't really see how this could be causing problems in your code, and I can't tell anything without having seen the code you're using to pass it. Granted, I'm only an amateur programmer (very amateur), but unless you've made some modifications to the ScraperXML code that I don't know about, it shouldn't really be doing what you say it's doing.

ScraperResultsItem is a reference type, not a value type. So if you pass it to a function and then modify one of its fields, the field remains modified when you return from the function. Check http://msdn.microsoft.com/en-us/library/0f66670z(VS.71).aspx#vclrfpassingmethodparameters_example4 for a good example of this.

You're right, though, in that adding the ref keyword won't really address this problem. The only way to fix it is to explicitly copy the object instead of modifying the one that's passed in.

You should be able to reproduce this fairly easily. Get a ScraperResultsItem using IMDB, make a call into GetMovieDetails, then examine the contents of the item you passed in. It will be different.
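The behaviour described above can be sketched with a rough C++ analogy. This is only an illustration under stated assumptions: ResultsItem, GetDetailsMutating and GetDetailsCopying are made-up names, not part of ScraperXML. A C# reference type passed without ref behaves much like a C++ pointer passed by value: the pointer is copied, but both copies refer to the same object, so field changes are visible to the caller unless the function copies the object first.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for ScraperResultsItem; the real class has more members.
struct ResultsItem {
    std::string url;
};

// The pointer itself is copied, like a C# reference passed without 'ref',
// but the copy still points at the caller's object.
std::string GetDetailsMutating(ResultsItem* item) {
    assert(item != nullptr);
    item->url = "http://example.com/details";  // visible to the caller
    return item->url;
}

// Works on an explicit copy, so the caller's item is left untouched.
std::string GetDetailsCopying(const ResultsItem* item) {
    assert(item != nullptr);
    ResultsItem local = *item;                  // explicit copy of the object
    local.url = "http://example.com/details";  // only the copy changes
    return local.url;
}
```

Calling GetDetailsCopying on a cached item leaves its url alone, while GetDetailsMutating overwrites it, which is the kind of change a caller reusing the item would notice.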

- Nicezia - 2009-08-11

smeehrrr Wrote:ScraperResultsItem is a reference type, not a value type. So if you pass it to a function and then modify one of its fields, the field remains modified when you return from the function. Check http://msdn.microsoft.com/en-us/library/0f66670z(VS.71).aspx#vclrfpassingmethodparameters_example4 for a good example of this.

You're right, though, in that adding the ref keyword won't really address this problem. The only way to fix it is to explicitly copy the object instead of modifying the one that's passed in.

You should be able to reproduce this fairly easily. Get a ScraperResultsItem using IMDB, make a call into GetMovieDetails, then examine the contents of the item you passed in. It will be different.

What I'm saying is it's literally impossible for that to happen in my code. The only thing that comes out of the Get*Details function is the *Tag object, and the item passed in is copied as per general coding rules. The only thing that should have changed is the copy of the item inside the function; it doesn't make any changes to objects that are external to the function. (General .NET protocol on sending an item to a function, and for that matter every programming language I know, states that an item sent to a function is a copy of that item inside that function unless otherwise specified.) How in God's name it's managing to change something outside of the function is a mystery, and I can't reproduce this behaviour on my end. Yes, the item inside the function is modified, but there's no reason at all for that modified item to go outside the function, as there are no multiple returns, ONLY the tag item. I've tried everything to make this happen, and the only way it could possibly happen is if the ref keyword is used on the public function initially called by the program, which it's not... so you got me, Phil...

Maybe you should try the new code in svn... (in its original form without modification, since through all testing it seems to work as it's supposed to)

ps... also i can be reached as nicezia on freenode (IRC)

- smeehrrr - 2009-08-11

Nicezia Wrote:What I'm saying is it's literally impossible for that to happen in my code. The only thing that comes out of the Get*Details function is the *Tag object, and the item passed in is copied as per general coding rules. The only thing that should have changed is the copy of the item inside the function; it doesn't make any changes to objects that are external to the function. (General .NET protocol on sending an item to a function, and for that matter every programming language I know, states that an item sent to a function is a copy of that item inside that function unless otherwise specified)

Take a look at the link I posted. What you're saying is true for value types in C#, but not true for reference types. For reference types (which ScraperResultsItem is, or which an array would be), modifying the information inside the object or array is reflected in the original object.

If you look at examples 4 and 5 at that link you'll see the effect that the 'ref' keyword has when passing reference types. It's not at all intuitive.

Have you tried running the steps I suggested in a debugger? I suspect your test code never tries to reuse a ScraperResultsItem, so you obviously won't see this in your testing unless you look for it.

- Nicezia - 2009-08-12

Yes I did, and as I said, there's no reason for that changed item to go outside the function. I use the scrape results item URLs for downloading the custom functions, so when it finds custom function calls (of which there are several in IMDB), those custom function calls get set as the URLs and the parser iterates through them as necessary. Yes, again, the item gets changed internally, but that item is/should be a copy of the original item, not the actual item itself, and what's more, that item should not be passed outside the function. In my tests (doing even as you suggested) the item gets changed inside the function, but the item passed to it stays intact.

- smeehrrr - 2009-08-12

For those playing the home game, Nicezia and I hashed this out last night via IM, and the next version shouldn't have this behavior.

- Nicezia - 2009-08-12

Actually, the current version in svn should be without this behaviour. It only took changing one phrase.

btw Welcome to the project

- smeehrrr - 2009-08-16

Out of curiosity, am I the only one building out of the SVN? If you're using ScraperXML in your projects, please drop a note here and let me know how you're getting it and how you're using it.

- spiff - 2009-09-08

nicezia; another heads up.

i'm about to add an encode="1,2,3" thing to the expression tag. this is to urlencode the specified buffers.
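As a rough sketch of what an attribute like encode="1,2,3" implies, the code below parses a comma-separated index list and percent-encodes the selected fields. Everything here is an assumption made for illustration: the function names, the 1-based indexing, and the choice to leave only unreserved URL characters unescaped. It is not the XBMC implementation.

```cpp
#include <cassert>
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Parses a list such as "1,2,3" into integers; tokens not starting with a digit are skipped.
std::vector<int> ParseIndexList(const std::string& attr) {
    std::vector<int> indices;
    std::istringstream stream(attr);
    std::string token;
    while (std::getline(stream, token, ',')) {
        if (!token.empty() && std::isdigit(static_cast<unsigned char>(token[0])))
            indices.push_back(std::stoi(token));
    }
    return indices;
}

// Percent-encodes every byte except letters, digits and "-_.~".
std::string UrlEncode(const std::string& text) {
    static const char kHex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : text) {
        if (std::isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += kHex[c >> 4];
            out += kHex[c & 0x0F];
        }
    }
    return out;
}

// Encodes the fields named in encodeAttr (assumed 1-based); out-of-range indices are ignored.
void ApplyEncode(std::vector<std::string>& fields, const std::string& encodeAttr) {
    for (int index : ParseIndexList(encodeAttr)) {
        if (index >= 1 && static_cast<std::size_t>(index) <= fields.size())
            fields[index - 1] = UrlEncode(fields[index - 1]);
    }
}
```

With fields {"Tom & Jerry", "1940"} and an attribute value of "1", only the first field would be rewritten, to "Tom%20%26%20Jerry".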

- Nicezia - 2009-09-08

All right, got it.

- spiff - 2009-09-09

it's in.

- Nicezia - 2009-09-21

Just to let you know, in scraperXML i added a feature to expression

fixamp="1,2...."

this fixes stray ampersands in match groups in the same way encode, noclean, & trim work

adding a new function to scraperparser

FixAmp

I'm still not sure how this would be done in c++

but i think this is about right (i'm sure you'll be able to spot any errors i've made in it)

Code:
void ScraperParser::FixAmp(CStdString &strXML)
{
   CStdString strFixed = "";
   for (int i = 0; i < (int)strXML.length(); i++)
   {
      if (strXML[i] == '&')
      {
          // assumes CStdString exposes std::string's compare(pos, len, str),
          // which does not read past the end of the string
          if (strXML.compare(i, 5, "&amp;") == 0
              || strXML.compare(i, 4, "&lt;") == 0
              || strXML.compare(i, 4, "&gt;") == 0
              || strXML.compare(i, 6, "&apos;") == 0
              || strXML.compare(i, 6, "&quot;") == 0
              // IsNumericEntity is a placeholder, not an existing function; in the C#
              // version this is RegEx.Match(strXML.substring(i, 7), "&x[0-9]+;").Success
              || IsNumericEntity(strXML, i))
          {
             strFixed += strXML[i];   // already part of a valid entity, keep it
          }
          else
          {
              strFixed += "&amp;";     // stray ampersand, escape it
          }
       }
       else
       {
           strFixed += strXML[i];
       }
   }
   strXML = strFixed;
}

Also, the code in ScraperParser::ParseExpression would have to be modified to call this function, but seeing as that's the one function in ScraperParser that confounds me, I can't actually try to show how it would be done. However, it would follow the same model as noclean, trim and encode. Also, the numeric entity check (the IsNumericEntity placeholder, which stands in for the C# RegEx.Match call) needs to be written using the XBMC regular expression engine to take into account ISO-8859 entities.

Anyway, I did this because there seem to be a lot of problems with things like biographies and descriptions, which throw ampersands in there and make the result unparseable, or require the scraper writer to play games with the output to fix it, when a really simple function can fix it.
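For anyone who wants to try the idea outside XBMC, here is a minimal standalone sketch using only std::string. It is not the ScraperXML or XBMC code. The helper names are made up for this example, and the numeric-entity check is done by hand rather than with a regular expression engine, accepting forms such as &#38; and &#x26;.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// True if a numeric character reference such as "&#38;" or "&#x26;" starts at pos.
static bool IsNumericEntity(const std::string& s, std::size_t pos) {
    assert(pos < s.size() && s[pos] == '&');
    std::size_t i = pos + 1;
    if (i >= s.size() || s[i] != '#') return false;
    ++i;
    const bool hex = i < s.size() && (s[i] == 'x' || s[i] == 'X');
    if (hex) ++i;
    const std::size_t digitsStart = i;
    while (i < s.size() &&
           (hex ? std::isxdigit(static_cast<unsigned char>(s[i]))
                : std::isdigit(static_cast<unsigned char>(s[i])))) {
        ++i;
    }
    return i > digitsStart && i < s.size() && s[i] == ';';
}

// True if one of the five predefined XML entities starts at pos.
static bool IsNamedEntity(const std::string& s, std::size_t pos) {
    static const std::string kEntities[] = {"&amp;", "&lt;", "&gt;", "&apos;", "&quot;"};
    for (const std::string& entity : kEntities) {
        if (s.compare(pos, entity.size(), entity) == 0) return true;
    }
    return false;
}

// Returns a copy of input in which every ampersand that does not start
// a recognised entity is replaced by "&amp;".
std::string FixAmp(const std::string& input) {
    std::string fixed;
    fixed.reserve(input.size());
    for (std::size_t i = 0; i < input.size(); ++i) {
        if (input[i] == '&' && !IsNamedEntity(input, i) && !IsNumericEntity(input, i)) {
            fixed += "&amp;";
        } else {
            fixed += input[i];
        }
    }
    return fixed;
}
```

Under these assumptions, "Tom & Jerry" becomes "Tom &amp; Jerry", while text that is already escaped passes through unchanged, so the function can be applied more than once without double-escaping.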

- seedzero - 2009-10-15

Hey nicezia, just thought you might like to know about a patch for ratings which allows for a max="foo" attribute. It's in SVN as of r23705, I did a write up in the wiki about it.
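As a rough illustration of what a max attribute could mean, the sketch below rescales a rating from a site-specific maximum onto a 10-point scale. The function name ScaleRating, the 10-point target and the clamping are assumptions for this example; the post does not say how the patch actually computes the value.

```cpp
#include <algorithm>
#include <cassert>

// Rescales rating from the range [0, maxRating] to [0, targetMax].
// The default target of 10 is an assumption, not taken from the patch.
double ScaleRating(double rating, double maxRating, double targetMax = 10.0) {
    assert(maxRating > 0.0);
    const double scaled = rating * targetMax / maxRating;
    // Clamp so that out-of-range input cannot produce a rating outside the scale.
    return std::min(std::max(scaled, 0.0), targetMax);
}
```

For instance, a site that rates out of 5 and reports 4 would map to 8 on the 10-point scale.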

New Version Soon! - Nicezia - 2009-12-20

ScraperXml 5.0 beta code will hit svn by Monday

New Features:

100% support for encoding.
Better ampersand handling (after having found something in ScraperParser::ParseExpression labeled "nasty hack #1")
Rearranged some classes in namespaces by their usage
Simplified scraping: no need to call separate functions according to content; only the initial search needs to specify content.
Supports rating scaling.
New generic scraper type for scraping of any structured webpage.

Removed Features:
Multiple Site Scrape removed... simply because it was impractical

WIP Features:
Control library for easier integration.
EpisodeUrl/NfoUrl function handling.
Results Sorting.

how to integrate in XBMC - ssimon - 2009-12-28

Please excuse my ignorance, but how do I integrate your ScraperXml 5.0 into XBMC? I am running a HD install of XBMC Live 9.11 Camelot on an Acer Revo 1600 and my scrapers do not work at all. I get garbled letters and numbers for a movie name when I try to scrape. I am hoping your script will have better luck so I can finally download the artwork and synopsis for my movies...

Thank you kindly.

- Nicezia - 2009-12-29

ssimon Wrote:Please excuse my ignorance, but how do I integrate your ScraperXml 5.0 into XBMC? I am running a HD install of XBMC Live 9.11 Camelot on an Acer Revo 1600 and my scrapers do not work at all. I get garbled letters and numbers for a movie name when I try to scrape. I am hoping your script will have better luck so I can finally download the artwork and synopsis for my movies...

Thank you kindly.

ScraperXML is not a script; it's a .NET library that uses XBMC scrapers to download information for media.

It wouldn't be possible to integrate it into XBMC, as XBMC is not written with .NET support.