Getting started with scraper development? Discogs and Anime News Network

ztripez Offline
Junior Member
Posts: 48
Joined: May 2008
Reputation: 0
Post: #1
Hello there fellow xbmc:ers,
I've started a small project of my own: writing two scrapers, for

Discogs (music), one of the most complete databases for electronic music, something the allmusic scraper lacks.
http://www.discogs.com/

and
Anime News Network (TV), which has a very big database of anime with episode names etc.
http://www.animenewsnetwork.com/

Never mind, to the point:
I have to restart XBMC every time, and it takes forever just to make a small change in the scraper XML. I've read on the forum about a small program called scrap.exe but I can't find it anywhere. Perhaps it is in the source, but I have no possibility of compiling XBMC here (various reasons).

So how do you do it?
Trazer Offline
Junior Member
Posts: 10
Joined: Feb 2008
Reputation: 0
Location: Netherlands
Post: #2
I was just told (by spiff) in the moviemeter scraper topic that scrap.exe is outdated and as a result can not be used to effectively test a scraper.
ztripez Offline
Junior Member
Posts: 48
Joined: May 2008
Reputation: 0
Post: #3
Oh, so there is no other way to do it than the "hard" way. Well, not much to do about that.
spiff Offline
Retired Developer
Posts: 12,386
Joined: Nov 2003
Post: #4
why do you have to restart xbmc?

and obviously using the linux/osx/windows versions would be easier than xbox
(This post was last modified: 2008-07-25 13:23 by spiff.)
ztripez Offline
Junior Member
Posts: 48
Joined: May 2008
Reputation: 0
Post: #5
well, perhaps i have a bad build or something, but if i don't restart xbmc the scraper changes don't get updated properly.
Yes, I'm writing the scraper on a Windows build atm (will move it to my xbox when it's done).
spiff Offline
Retired Developer
Posts: 12,386
Joined: Nov 2003
Post: #6
ah, darn, did i forget to fix the reload on refresh..

if you can build, here's a quick'n'dirty hack that might make your life a little bit easier.

in GUIWindowVideoBase.cpp, comment out line 633: m_database.SetDetailsForMovie(item->m_strPath, movieDetails);

that way info won't be stored in the db. so you can do the lookup, close the dialog, do the lookup etc and scraper should reload
ztripez Offline
Junior Member
Posts: 48
Joined: May 2008
Reputation: 0
Post: #7
hm.. it seems to me that i will ditch the Windows build and use my Ubuntu machine for scraper development instead, since i can't build on Windows.

I'll try the small hack you gave once I have a good build environment up and running on Ubuntu.

While I'm at it, is there any info about how the music scrapers work? I'm just starting to get the hang of how the buffers and stuff work ;)

Thanks for the help.
spiff Offline
Retired Developer
Posts: 12,386
Joined: Nov 2003
Post: #8
allmusic.xml is the documentation :)

basically it's just like movies, except that we have two search functions: one for albums, one for artists

documenting stuff is not one of my "strong" sides ;)
ztripez Offline
Junior Member
Posts: 48
Joined: May 2008
Reputation: 0
Post: #9
Yeah, I'm trying to reverse-engineer allmusic.xml, but I think I'm not getting the hang of some fundamental things.

From the beginning:
The Discogs search string for releases (albums) looks like this:

http://www.discogs.com/search?type=releases&q=[Album]&btn=Search

If I understand correctly, the XML would look something like this:
Code:
    <CreateAlbumSearchUrl dest="3">
        <RegExp input="$$1" output="http://www.discogs.com/search?type=releases&amp;q=\1&amp;btn=Search" dest="3">
            <expression noclean="1"></expression>
        </RegExp>
    </CreateAlbumSearchUrl>

Where $$1 is the album name and noclean=1 means that $$1 will not be empty after the search, correct?

The result of the search will be stored in $$3, correct?
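
One way to reason about what ends up in buffer 3 is a toy model of a single RegExp element, written outside XBMC. The sketch below is only that: the function name run_regexp is hypothetical, the album name is an example value, and the treatment of an empty expression as "(.*)" is an assumption, not something taken from the XBMC source.

```python
import re


def run_regexp(buffers, input_ref, expression, output, dest):
    """Toy model of one <RegExp> element; not XBMC's actual implementation."""
    # Resolve "$$N" in the input attribute to the content of buffer N.
    text = re.sub(r"\$\$(\d+)", lambda m: buffers.get(int(m.group(1)), ""), input_ref)
    # Assumption: an empty <expression> behaves like "(.*)" and captures everything.
    pattern = expression or "(.*)"
    match = re.search(pattern, text, re.DOTALL)
    if match is None:
        return buffers  # no match: the destination buffer is left untouched
    # Replace \1, \2, ... in the output template with the captured groups.
    buffers[dest] = re.sub(r"\\(\d)", lambda m: match.group(int(m.group(1))) or "", output)
    return buffers


buffers = {1: "Homework"}  # $$1: album name (example value)
run_regexp(
    buffers,
    "$$1",
    "",
    "http://www.discogs.com/search?type=releases&q=\\1&btn=Search",
    3,
)
print(buffers[3])
```

In this model buffer 3 holds the search URL that was built, not the search results themselves; fetching that URL would be a separate step.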

Now, to make things a bit more complicated: I know that you get narrower search results (the Discogs db is huge) if you include the artist name in the search. So a better search URL would be
http://www.discogs.com/search?type=releases&q=[Artist][Album]&btn=Search
Any tips on how I do this?


Yeah, I know how it is with the documentation part; I work as a developer myself ;P
ztripez Offline
Junior Member
Posts: 48
Joined: May 2008
Reputation: 0
Post: #10
Never mind about noclean=1, found info about it on the wiki. ;)
spiff Offline
Retired Developer
Posts: 12,386
Joined: Nov 2003
Post: #11
in createalbumsearchurl you are fed album name in $$1 and artist name in $$2
this should do it;

Code:
    <CreateAlbumSearchUrl dest="3">
        <RegExp input="$$1" output="http://www.discogs.com/search?type=releases&amp;q=[\1][$$2]&amp;btn=Search" dest="3">
            <expression noclean="1"></expression>
        </RegExp>
    </CreateAlbumSearchUrl>
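
Since every test round in XBMC is slow, it may help to check a scraper fragment for well-formedness before loading it. The sketch below uses Python's standard XML parser, which is strict about a bare & inside an attribute value; whether XBMC's own parser is equally strict is not something this thread establishes. The helper name check_well_formed is made up for the example.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return None if xml_text parses as XML, else a short error message."""
    try:
        ET.fromstring(xml_text)
        return None
    except ET.ParseError as err:
        return str(err)


# The same element twice: once with bare ampersands, once with them escaped.
RAW = (
    '<RegExp input="$$1" '
    'output="http://www.discogs.com/search?type=releases&q=\\1&btn=Search" '
    'dest="3"/>'
)
ESCAPED = RAW.replace("&", "&amp;")

print(check_well_formed(RAW))      # an error message from the parser
print(check_well_formed(ESCAPED))  # None
```

After parsing, the escaped form yields the plain & back in the attribute value, so the URL itself is unchanged.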
ztripez Offline
Junior Member
Posts: 48
Joined: May 2008
Reputation: 0
Post: #12
Ah, thanks ;)
Perhaps a bit OT, but I'm trying to make a build on Linux (using latest svn) and I get compile errors.
make[2]: Leaving directory `/home/ztripez/XBMC/xbmc/cores/dvdplayer/Codecs/ffmpeg´
make[1]: Leaving directory `/home/ztripez/XBMC/xbmc/cores/dvdplayer/Codecs´
make *** [dvdcodecs] Error 2
make[1]: Leaving directory `/home/ztripez/XBMC/´

Errors have occured!
View compile log (y/n).
ztripez Offline
Junior Member
Posts: 48
Joined: May 2008
Reputation: 0
Post: #13
Never mind, I did a fresh checkout and now it's working fine ;)
ztripez Offline
Junior Member
Posts: 48
Joined: May 2008
Reputation: 0
Post: #14
I don't know if it's only me being stupid or..
Is there any way to get debug output on what's actually going on? What the XML "file" I'm producing for the scraper looks like, and such.
I've tried setting loglevel in advancedsettings.xml but nothing happens. Am I missing anything?
spiff Offline
Retired Developer
Posts: 12,386
Joined: Nov 2003
Post: #15
not really, but if you speak c++ go to xbmc/utils/MusicInfoScraper.cpp for music or xbmc/utils/IMDB.cpp for videos. in the relevant functions you can printf / log the received xml. just look for .Parse() and print the returned strings there. i'm a bit too intoxicated to cook up a diff rite now, but i can do it tomorrow if u can't figure it out yourself
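
Once a string has been captured from the log that way, it can be inspected outside XBMC. The following is a rough sketch with Python's standard library, not part of XBMC; the helper name inspect_xml is hypothetical and the sample string uses placeholder element names rather than the format XBMC expects.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def inspect_xml(captured):
    """Pretty-print a captured XML string, or report where parsing failed."""
    try:
        return minidom.parseString(captured).toprettyxml(indent="  ")
    except ExpatError as err:
        return "parse error at line %d, column %d" % (err.lineno, err.offset)


# Hypothetical sample; the element names are placeholders, not XBMC's format.
SAMPLE = "<results><item><title>Example</title></item></results>"

print(inspect_xml(SAMPLE))
print(inspect_xml("<results><item></results>"))  # mismatched tag
```

The line and column in the error message point at where the parser gave up, which is usually close to the broken tag.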