Kodi Community Forum
ScraperXML (Open Source XML Web Scraper C# Library) please help verify my work...




- DonJ - 2009-05-21

I think we should have a 'scraper framework' version attribute to identify incompatibilities as the scraper framework progresses + a scraper version attribute as you suggested Nicezia.

Let's simply define the scraper framework as used in Babylon as 1.0

DonJ


- spiff - 2009-05-22

Yeah, I'm with you on that, DonJ.


- Nicezia - 2009-05-22

DonJ Wrote:I think we should have a 'scraper framework' version attribute to identify incompatibilities as the scraper framework progresses + a scraper version attribute as you suggested Nicezia.

Let's simply define the scraper framework as used in Babylon as 1.0

DonJ

spiff Wrote:Yeah, I'm with you on that, DonJ.


Great, when will this go into effect?

On an unrelated note, I've run into an obstacle with the TVDB scraper, as it uses a zip file for its info.

Okay, I suppose this is where "url cache=" comes in handy.

My question is: what exactly do you do after the zip for the show is downloaded? How do you handle accessing the zip file? (I know you have zip support in XBMC, but I'm at a loss as to what happens after the zip is downloaded.)


- spiff - 2009-05-22

We extract all the files in the zip, one after another, into a memory buffer. For example, the zips from TVDB hold 3 XML files, and we concatenate these into one buffer.
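A minimal Python sketch of the approach described above, assuming the archive is already in memory as bytes and that every member is UTF-8 text. The function name `concat_zip_members` and the member file names are hypothetical, and this is not XBMC's actual code: members are read in archive order and joined into one string that the scraper expressions can then run against.

```python
import io
import zipfile


def concat_zip_members(zip_bytes):
    """Extract every file in an in-memory zip, one after another, and
    concatenate their contents into a single text buffer."""
    parts = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for info in zf.infolist():  # members in the order stored in the archive
            if info.is_dir():
                continue  # directory entries carry no data
            # Assumption: every member is UTF-8 encoded XML text.
            parts.append(zf.read(info).decode("utf-8"))
    return "".join(parts)


# Usage: build a small zip in memory holding three XML files (names illustrative).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("en.xml", "<Data><Series/></Data>")
    zf.writestr("banners.xml", "<Banners/>")
    zf.writestr("actors.xml", "<Actors/>")

combined = concat_zip_members(buf.getvalue())
```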

I'll look into adding version="1" to the <scraper> tag ASAP; no ETA.


- DonJ - 2009-05-22

I'd say immediately. To avoid inconsistent versioning by scraper authors we should use a date (ISO 8601) instead of a version attribute.

New attributes: framework, date:
Quote:<scraper name="xxs" content="xx" thumb="xx.jpg" language="en" framework="1.0" date="2009-05-22">
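One practical benefit of an ISO 8601 date attribute is that dates compare directly. The Python sketch below uses hypothetical helper names (not part of XBMC or ScraperXML) to read `framework` and `date` from a `<scraper>` root element and pick the more recent of two copies, assuming both attributes are present.

```python
import xml.etree.ElementTree as ET
from datetime import date


def read_scraper_header(xml_text):
    """Return (framework, date) from a <scraper> root element.
    Assumes both attributes are present."""
    root = ET.fromstring(xml_text)
    return root.get("framework"), date.fromisoformat(root.get("date"))


def newer_scraper(a_xml, b_xml):
    """Return whichever scraper document carries the later ISO 8601 date."""
    _, a_date = read_scraper_header(a_xml)
    _, b_date = read_scraper_header(b_xml)
    return a_xml if a_date >= b_date else b_xml


old = ('<scraper name="xxs" content="xx" thumb="xx.jpg" language="en" '
       'framework="1.0" date="2009-05-22"></scraper>')
new = old.replace("2009-05-22", "2009-06-01")
```

Because the format runs from largest to smallest unit (year, month, day), even a plain string comparison of two `YYYY-MM-DD` values gives the same ordering.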

If you agree, spiff, I'll add it to all scrapers in SVN later.

EDIT: Looks like we posted at the same time, spiff; go ahead.


- Gamester17 - 2009-05-22

DonJ Wrote:I'd say immediately. To avoid inconsistent versioning by scraper authors we should use a date (ISO 8601) instead of a version attribute.
...or just use XBMC's release numbering convention, which is based on a date: for example, 9.04 means April 2009, 9.05 would be May 2009, and 10.01 would be January 2010.
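The date-based convention is simple arithmetic on the year and month. A hypothetical Python helper, assuming years from 2000 onward and a zero-padded two-digit month:

```python
def xbmc_style_version(year, month):
    """Map a calendar year/month to an XBMC-style date-based release number,
    e.g. (2009, 4) -> "9.04". Assumes years from 2000 onward."""
    if not 1 <= month <= 12:
        raise ValueError("month must be 1-12")
    return f"{year - 2000}.{month:02d}"
```

Note that this scheme cannot distinguish two releases made in the same month.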



- DonJ - 2009-05-22

Since the aim of Nicezia's library is to use scrapers in other applications, I wouldn't impose XBMC's versioning format on them. Moreover, ISO 8601 is easier to understand and much more flexible (multiple updates a month, support for times, etc.).


- spiff - 2009-05-22

DonJ, please handle it. And I'm fine with ISO 8601.


- Nicezia - 2009-05-22

While waiting for your reply on the zip process, I've nearly finished support for music scrapers.

It seems I'll be done with official XBMC scraper support sooner than I thought.


- DonJ - 2009-05-22

Added the attributes in r20536.


- Nicezia - 2009-05-22

That will definitely help keep things clear.


- Nicezia - 2009-05-23

OK, simple question:
if you're not using DVD ordering, and you're not using absolute numbering on a TV show,

what ordering is left?

Never mind, I have a different question, as this expression seems to be completely locking up my library.

From the TVDB scraper:
Code:
<RegExp conditional="!absolutenumber" input="$$1" output="&lt;episode&gt;&lt;title&gt;\2&lt;/title&gt;&lt;url cache=&quot;$$10.xml&quot;&gt;$$2&lt;/url&gt;&lt;epnum&gt;\3&lt;/epnum&gt;&lt;season&gt;\4&lt;/season&gt;&lt;id&gt;\1&lt;/id&gt;&lt;/episode&gt;" dest="4">
        <expression repeat="yes">&lt;Episode&gt;.*?&lt;id&gt;([0-9]+).*?&lt;EpisodeName&gt;([^&lt;]*).*?&lt;EpisodeNumber&gt;([0-9]+)[^&lt;]*.*?&lt;SeasonNumber&gt;([0-9]+)[^&lt;]*.*?&lt;/Episode&gt;</expression>
</RegExp>
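To make the intent of this `<RegExp>` concrete: with `repeat="yes"`, the expression is applied to every match in the input, and the `output` template, with `\1` to `\4` replaced by the captured groups, is emitted once per match. The Python sketch below is an illustrative emulation under stated assumptions, not XBMC's implementation: XML entities are unescaped, `.` is assumed to match newlines, only single-digit `\N` references are handled, and the `<url cache="$$10.xml">$$2</url>` part is dropped because `$$` buffer expansion is out of scope here.

```python
import re

# The TVDB expression with XML entities unescaped (&lt; -> <, &gt; -> >).
EXPRESSION = (r"<Episode>.*?<id>([0-9]+).*?<EpisodeName>([^<]*).*?"
              r"<EpisodeNumber>([0-9]+)[^<]*.*?<SeasonNumber>([0-9]+)[^<]*.*?</Episode>")
# Simplified output template: the <url cache=...> element is omitted.
OUTPUT = (r"<episode><title>\2</title><epnum>\3</epnum>"
          r"<season>\4</season><id>\1</id></episode>")


def apply_repeat(expression, output, text):
    """Emulate repeat="yes": fill the template once per match and concatenate."""
    pattern = re.compile(expression, re.DOTALL)  # assumption: "." matches newlines
    results = []
    for m in pattern.finditer(text):
        # Replace each \N (single digit) with capture group N, or "" if it did not match.
        filled = re.sub(r"\\([0-9])", lambda ref: m.group(int(ref.group(1))) or "", output)
        results.append(filled)
    return "".join(results)


sample = """<Data>
<Episode>
<id>101</id>
<EpisodeName>Pilot</EpisodeName>
<EpisodeNumber>1</EpisodeNumber>
<SeasonNumber>1</SeasonNumber>
</Episode>
<Episode>
<id>102</id>
<EpisodeName>Second</EpisodeName>
<EpisodeNumber>2</EpisodeNumber>
<SeasonNumber>1</SeasonNumber>
</Episode>
</Data>"""
episodes = apply_repeat(EXPRESSION, OUTPUT, sample)
```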

I mean it completely freezes. If I break just before this expression runs, I can see that it locks up while matching the expression: it finds an uncountable number of matches.

Does anyone have any suspicions as to why? All I've changed in the code is adding the unzip-and-concatenate ability, and everything else still works fine. If I disable this expression and enable DVD ordering, it works OK as well, which doesn't make sense to me, since that expression is almost identical to this one.

Even when I test this expression against the combined contents of the zip in RegexBuddy, it freezes up.


- Gamester17 - 2009-05-23

@Nicezia, not sure if it is something you want in your ScraperXML library or not but FYI; changeset r20559 commits "Ability to scrape and scan TV Shows into the video library by air-date via TheTVDB.com" to XBMC scraper code and TheTVDB XML scraper, see patch #5143 on trac for more information.



- Nicezia - 2009-05-23

Gamester17 Wrote:@Nicezia, not sure if it is something you want in your ScraperXML library or not but FYI; changeset r20559 commits "Ability to scrape and scan TV Shows into the video library by air-date via TheTVDB.com" to XBMC scraper code and TheTVDB XML scraper, see patch #5143 on trac for more information.


I'll look into that; I've got my hands full with TVDB as it is. Everything else is solid, and I've even finished up the music scrapers.


@spiff Does "." (period) match newlines or not? The wiki says it's supposed to, which I allowed for in my library. Should that be turned off?
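For reference, here is the behavior the question hinges on, shown in Python, where `re.DOTALL` plays the same role as `RegexOptions.Singleline` in .NET. Without it, `.` stops at a newline, so a lazy `.*?` cannot bridge from one line of the TVDB XML to the next. The pattern and sample text are illustrative only.

```python
import re

TEXT = "<EpisodeName>Pilot</EpisodeName>\n<SeasonNumber>1</SeasonNumber>"
PATTERN = r"<EpisodeName>([^<]*).*?<SeasonNumber>([0-9]+)"


def groups_with(flags):
    """Run PATTERN over TEXT with the given flags; return the groups or None."""
    m = re.search(PATTERN, TEXT, flags)
    return m.groups() if m else None


print(groups_with(0))          # "." stops at "\n", so no match: None
print(groups_with(re.DOTALL))  # "." crosses "\n": ('Pilot', '1')
```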


- Nicezia - 2009-05-24

Aha! Found my problem: the conditional evaluation was slightly out of place, so it was running statements that it shouldn't have.
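A hedged Python sketch of checking the `conditional` attribute before anything inside a `<RegExp>` runs. It assumes the semantics suggested by `conditional="!absolutenumber"`: a bare name runs the RegExp when that boolean setting is true, a leading `!` negates it, a missing setting counts as false, and nested RegExps are evaluated before their parent. The helper names and the sample function XML are hypothetical; the point is that a RegExp whose condition fails is skipped together with everything nested in it.

```python
import xml.etree.ElementTree as ET


def should_run(conditional, settings):
    """Evaluate a RegExp conditional attribute against boolean settings.
    Assumed semantics: None/"" -> run; "name" -> run if true; "!name" -> run if false."""
    if not conditional:
        return True
    negate = conditional.startswith("!")
    name = conditional[1:] if negate else conditional
    value = bool(settings.get(name, False))  # assumption: missing setting counts as false
    return not value if negate else value


def runnable_regexps(element, settings):
    """Collect the RegExp elements that would run, in evaluation order.
    A RegExp whose conditional fails is skipped along with everything nested in it;
    nested RegExps are assumed to run before their parent."""
    found = []
    for child in element:
        if child.tag != "RegExp":
            continue
        if not should_run(child.get("conditional"), settings):
            continue  # check the condition *before* descending into children
        found.extend(runnable_regexps(child, settings))
        found.append(child)
    return found


FUNCTION = """<GetEpisodeList>
  <RegExp conditional="!absolutenumber" dest="4"><expression>a</expression></RegExp>
  <RegExp conditional="absolutenumber" dest="4">
    <RegExp dest="5"><expression>c</expression></RegExp>
    <expression>b</expression>
  </RegExp>
</GetEpisodeList>"""


def run_order(settings):
    """Return the expression text of each RegExp that would run, in order."""
    root = ET.fromstring(FUNCTION)
    return [r.find("expression").text for r in runnable_regexps(root, settings)]
```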

ScraperXML now works with all XBMC scrapers. As soon as I do a little housekeeping on the code, I'm going to put up version 4.0 and start working on games, books, and comics!

Does anyone know of a GPL .NET zip library that's easy to use? I tried SharpZipLib and couldn't make heads or tails of it. Before I release this, I need to replace the zip library I'm using (IonicZip) with a GPL-compatible one, since from what I've read I can't tell whether the license it's under (the Microsoft Public License) is compatible with the GPL.

Considering it's the Microsoft Public License, my first guess would be no.

My first choice would be SharpZipLib, so if anyone can help me get it to do what I need, that would be appreciated (and would speed up the release of 4.0, since I don't think I can release it under the GPL as is).

Edit: Never mind again. My friend just dropped in and wrote a beautiful, simple little SharpZipLib routine for me, and made me feel stupid for not figuring it out on my own. Damn those career programmers. In any case, he's taken a copy of the program and is going to convert it to C#, so I should have a Mono library ready by the end of the weekend.