HOW-TO write Media Info Scrapers - Scraper creation for dummies

Gamester17 Offline
Team-XBMC Forum Moderator
Posts: 10,595
Joined: Sep 2003
Reputation: 9
Location: Sweden
Post: #11
pko66 Wrote: BTW, is there something that can be done about the horizontal scroll bar? It makes the article pretty unreadable. :(
Send sho a PM; he is our resident wiki guru. :D

Always read the XBMC online-manual, FAQ and search the forum before posting.
Do not e-mail XBMC-Team members directly asking for support. Read/follow the forum rules.
For troubleshooting and bug reporting please make sure you read this first.
pko66 Offline
Senior Member
Posts: 189
Joined: Dec 2006
Reputation: 0
Post: #12
I've been on vacation for a few days and haven't been able to do anything on the guide... But now I'm back, and I hope to have time this weekend to make some progress on it. Now that the culturalia scraper is finished (I haven't posted it for inclusion in XBMC yet; I will do so soon), I'm planning to extend it to also search IMDB, and I hope I can make the implementation open enough that it can be included in other scrapers (like moviemeter.nl, which has been requested recently).

Then I plan to write a scraper for "generic" movies (for example, to scrape home movies) and an add-on for any scraper that simulates file mode in library mode (sacrificing the genre tags). Both would work in current XBMC, but a more "elegant" solution would need some (I hope easy) modifications to the XBMC code, so that's an after-Atlantis thing.

Maybe I'm being a little too optimistic about my time and knowledge :-D but I hope not; I think I can do it in a 2-3 week timeframe.
Nicezia Offline
Fan
Posts: 369
Joined: Nov 2006
Reputation: 0
Location: Montgomery, Alabama
Post: #13
My search string needs to be modified before it is applied. If I have spaces in the buffer, how would I modify the string to change all spaces to a '+'?
spiff Online
Grumpy Bastard Developer
Posts: 12,176
Joined: Nov 2003
Reputation: 82
Post: #14
Is this a search string fed from the application? If so, which one? Those should be URL-encoded before being passed to the scraper.

In any case, just run a regular expression to do the replacement.
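A minimal sketch of that replacement, written in Python rather than as scraper XML (in an actual scraper the same pattern would presumably go into one of the scraper's regex steps, which is not shown here). The function name `spaces_to_plus` is hypothetical, and the sketch assumes the string may arrive either with literal spaces or already URL-encoded as `%20`:

```python
import re


def spaces_to_plus(search: str) -> str:
    """Turn every run of spaces (literal or %20-encoded) into a single '+'.

    Hypothetical helper for illustration; not part of XBMC or any scraper API.
    """
    # Trim outer whitespace first so it does not become a stray leading/trailing '+',
    # then collapse each run of whitespace and/or "%20" into one '+'.
    return re.sub(r"(?:\s|%20)+", "+", search.strip())


print(spaces_to_plus("Blah blah  blah"))     # Blah+blah+blah
print(spaces_to_plus("Blah%20blah%20blah"))  # Blah+blah+blah
```

Matching `%20` as well as whitespace keeps the same pattern working whether or not the application has already URL-encoded the string.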

Nicezia Offline
Fan
Posts: 369
Joined: Nov 2006
Reputation: 0
Location: Montgomery, Alabama
Post: #15
Okay, I've figured out how to use a regex to replace the spaces now... For the record, when I scrape using scrap.exe the URL comes out as 'Blah+blah+blah', but when I use XBMC the log reports that it's scraping for 'Blah%20blah%20blah'.

Just one small observation: even the guide for dummies on writing scrapers is a bit highbrow... One has to read through it about 30 times while trying things out to figure it out completely. For instance, one thing that's not FULLY addressed is exactly which 'special characters' need to be escaped. I still haven't figured that out, but I guess that's because I still don't completely understand XML...

The only reason I'm getting it right now is that I wrote a little VB program to output my regular expressions as XML-encoded strings.
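For reference, XML predefines five entities: `&amp;`, `&lt;`, `&gt;`, `&quot;` and `&apos;`, and escaping all five is always safe. A rough Python equivalent of such a converter might use the standard library's `xml.sax.saxutils`. The names `regex_to_xml` and `xml_to_regex` are hypothetical, and the sketch assumes the regex ends up either in element text or in a quoted attribute value of the scraper file:

```python
from xml.sax.saxutils import escape, unescape

# escape() always replaces &, < and >; this map adds the two quote characters,
# which only matter inside attribute values but are harmless elsewhere.
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}
QUOTE_UNENTITIES = {v: k for k, v in QUOTE_ENTITIES.items()}


def regex_to_xml(pattern: str) -> str:
    """Escape a regular expression so it can be pasted into an XML file."""
    return escape(pattern, QUOTE_ENTITIES)


def xml_to_regex(text: str) -> str:
    """Reverse of regex_to_xml, handy for checking a pattern copied back out."""
    return unescape(text, QUOTE_UNENTITIES)


pattern = r'<a href="([^"]+)">(.*?)</a>'
encoded = regex_to_xml(pattern)
print(encoded)  # &lt;a href=&quot;([^&quot;]+)&quot;&gt;(.*?)&lt;/a&gt;
assert xml_to_regex(encoded) == pattern
```

The ampersand is escaped first inside `escape()`, so entities produced for the other characters are not escaped a second time.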
spiff Online
Grumpy Bastard Developer
Posts: 12,176
Joined: Nov 2003
Reputation: 82
Post: #16
Well, you can just Google for the XML escape characters; it's general XML.

spiff Online
Grumpy Bastard Developer
Posts: 12,176
Joined: Nov 2003
Reputation: 82
Changes to thumb handling for scrapers and NFO! ALL SCRAPER DEVELOPERS PLEASE READ! Post: #17
Hi guys,

I'm sorry that I have to do this to all of you, but it was necessary, as the current state was just embarrassing. :)

Please mind http://trac.xbmc.org/changeset/21882

I opted not to keep the old loading code, as it would never die. One clean, painful cut. :)

UsagiYojimbo Offline
Member
Posts: 83
Joined: Feb 2010
Reputation: 1
Location: Debrecen, Hungary
Post: #18
Nicezia Wrote: For the record, when I scrape using scrap.exe the URL comes out as 'Blah+blah+blah'; when I use XBMC the log reports that it's scraping for 'Blah%20blah%20blah'.
Well, both of those URIs mean the same thing, as both %20 and the plus sign are decoded to a space character in a query string...
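To see the two spellings side by side, here is a small Python sketch using the standard `urllib.parse` module. The equivalence holds for query strings decoded with form-encoding rules, where `+` means a space; in the path part of a URL a `+` is just a plus sign, and a literal plus in a query has to be sent as `%2B`:

```python
from urllib.parse import quote, quote_plus, unquote_plus

query = "Blah blah blah"

# Percent-encoding: a space becomes %20 (the form XBMC logged).
as_percent = quote(query)
# Form-encoding: a space becomes '+' (the form scrap.exe produced).
as_plus = quote_plus(query)

print(as_percent)  # Blah%20blah%20blah
print(as_plus)     # Blah+blah+blah

# A form decoder maps both spellings back to the same text.
print(unquote_plus(as_percent) == unquote_plus(as_plus) == query)  # True

# A literal plus sign must itself be escaped, or it will be read as a space.
print(quote_plus("C++"))  # C%2B%2B
```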

"Now for something completely different..." :D

Is there a chapter 3 planned about scraping TV shows (TV series in particular)?
All the documentation deals with scraping movies, and I did not find any TV-show-related info. :-/
Nicezia Offline
Fan
Posts: 369
Joined: Nov 2006
Reputation: 0
Location: Montgomery, Alabama
Post: #19
UsagiYojimbo Wrote: Well, both of those URIs mean the same thing, as both %20 and the plus sign are decoded to a space character...

No, some sites don't interpret + and %20 as meaning the same thing. As I found out when dealing with some scrapers I was writing, to some sites a + means that this exact word MUST exist as written, while %20 (a space) allows for a fuzzy search.

ScraperXML Open Source Web Scraper Library compatible with XBMC XML Scrapers


I Suck, and if you act now by sending only $19.95 and a self-addressed stamped envelope, so can you!

UsagiYojimbo Offline
Member
Posts: 83
Joined: Feb 2010
Reputation: 1
Location: Debrecen, Hungary
Post: #20
Nicezia Wrote: No, some sites don't interpret + and %20 as meaning the same thing. As I found out when dealing with some scrapers I was writing, to some sites a + means that this exact word MUST exist as written, while %20 (a space) allows for a fuzzy search.
Well, see for yourself: HTML URL Encoding @ W3Schools
On the other hand, some scripts do not handle it properly... However, the URL/URI is correct.

BTW, about what you mention, the plus sign having an extra meaning: that is possible if the plus sign gets encoded as %2B. If that happens, it means your URL/URI is being encoded more than once... Try removing the extra encoding steps until only one remains. (In the case of a scraper, you could remove the encode attribute.)
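A small Python sketch of that double-encoding case: encoding a space as `+` and then encoding the result again turns the `+` into `%2B`, which the site will then read as a literal plus. The hypothetical `encoding_layers` helper peels off layers with `unquote_plus` until the string stops changing. Treat it as a diagnostic only, since text that legitimately contains `+` or `%` will be over-decoded:

```python
from urllib.parse import quote, quote_plus, unquote_plus


def encoding_layers(value: str, max_rounds: int = 5) -> tuple[str, int]:
    """Decode repeatedly until nothing changes; return (decoded_text, layers).

    Hypothetical diagnostic helper, not part of XBMC.
    """
    layers = 0
    while layers < max_rounds:
        decoded = unquote_plus(value)
        if decoded == value:
            break
        value = decoded
        layers += 1
    return value, layers


once = quote_plus("Blah blah")  # Blah+blah
twice = quote_plus(once)        # Blah%2Bblah: the '+' itself got encoded

print(encoding_layers(once))                 # ('Blah blah', 1)
print(encoding_layers(twice))                # ('Blah blah', 2)
print(encoding_layers(quote(quote("a b"))))  # ('a b', 2)
```

A layer count above 1 for a search URL is the sign that one of the encoding steps should be removed.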