Custom Polish music scraper - help and bug reports wanted!

smuto Offline
Senior Member
Posts: 240
Joined: Sep 2004
Reputation: 2
Post: #1
hey,

i made a hybrid english/polish scraper for myself

it mostly reuses code from the existing xbmc scrapers

allmusic - for generic info [english] - thx spiff
merlin.pl - for album review [polish]
lastfm - for artist biography [polish] - thx spiff

http://smuto.w.interia.pl/allmusic_merlin_lastfm.xml

smuto

spiff Offline
Grumpy Bastard Developer
Posts: 12,174
Joined: Nov 2003
Reputation: 82
Post: #2
:)

Always read the XBMC online-manual, FAQ and search the forum before posting.
Do not e-mail XBMC-Team members directly asking for support. Read/follow the forum rules.
For troubleshooting and bug reporting please make sure you read this first.
smuto Offline
Senior Member
Posts: 240
Joined: Sep 2004
Reputation: 2
Post: #3
hey again
here is another custom polish music scraper
http://smuto.w.interia.pl/xbmcpicard.xml

album search - musicbrainz.org
album review - merlin.pl
album generic info - allmusic.com

artist search - lastfm
artist discography - musicbrainz.org
artist biography - lastfm
artist generic info - allmusic.com

diacritics are replaced for the allmusic search by my own php tool

smuto

smuto Offline
Senior Member
Posts: 240
Joined: Sep 2004
Reputation: 2
Post: #4
i need to ask:
does xbmc support MBIDs when scrobbling a song?

maybe we can add the MBID to the buffers passed to scrapers?
CreateAlbumSearchUrl: $$1 = album title, $$2 = artist title, $$3 = album mbid
CreateArtistSearchUrl: $$1 = artist title, $$2 = artist mbid

take a look
http://musicbrainz.org/ws/1/artist/$...#36;$1

and now my problem:
artist = "The The"
with an empty mbid the url is
http://musicbrainz.org/ws/1/artist/?type...me=The+The
or, with an mbid,
http://musicbrainz.org/ws/1/artist/a7409...me=The+The

and there would be no more problems with auto scan
smuto

spiff Offline
Grumpy Bastard Developer
Posts: 12,174
Joined: Nov 2003
Reputation: 82
Post: #5
good idea. we do extract them so they should be passable. trac it please.

smuto Offline
Senior Member
Posts: 240
Joined: Sep 2004
Reputation: 2
Post: #6
more questions

is there a way to fill in the track info
<track><position>\1</position><title>\2</title><duration>\3</duration></track>

from this xml
http://musicbrainz.org/ws/1/release/f2b8...ase-events

position - maybe the repeat index
duration - it is given in milliseconds
for now i open a new html page instead

a general request for the allmusic scraper:
can we move the track info from ParseAMGAlbum to GetAMGReview?

smuto

spiff Offline
Grumpy Bastard Developer
Posts: 12,174
Joined: Nov 2003
Reputation: 82
Post: #7
tricky to get sane info from that page. you can most definitely split parseamgalbum if you find it advantageous.

bambi73 Offline
Senior Member
Posts: 165
Joined: Jan 2010
Reputation: 0
Location: Czech Republic
Post: #8
If you are still looking for a way to load the track info from the xml, you can try something like:

!!! WARNING !!! I wrote this without much knowledge of your scraper (I only checked it briefly) and without any testing in XBMC, so I guess there are some errors, but in theory it should work (I used similar constructions in my scraper).

Code:
<xxxxxxx clearbuffers="no" .....>
    .
    .
    .

    <!-- Variables:
       - $$20 ... buffer for gathering track infos (new info is appended to end of buffer)
       - $$19 ... current track id
       - $$18 ... current track position
     -->  
    <RegExp input="" output="\1" dest="20">
      <expression/>
    </RegExp>
    <RegExp input="$$1" output="\1" dest="19">
      <expression>&lt;track id=&quot;([0-9a-f\-]+)&quot;</expression>
    </RegExp>
    <RegExp input="1" output="\1" dest="18">
      <expression/>
    </RegExp>
    
    <!-- url call to GetXMLTitleList - empty (cleared) in case there are no tracks -->
    <RegExp input="$$19" output="&lt;url function=&quot;GetXMLTitleList&quot; cache=&quot;yourcache.xml&quot;&gt;http://your.xml.source.url/&lt;/url&gt;" dest="9">
      <expression clear="yes">[0-9a-f\-]+</expression>
    </RegExp>

    <!-- url added to your details buffer -->
    <RegExp input="$$9" output="\1" dest="8+">
      <expression noclean="1"/>
    </RegExp>

    .
    .
    .
  </xxxxxxx>

  <GetXMLTitleList clearbuffers="no" dest="4">
    <RegExp input="$$8$$9" output="&lt;details&gt;\1&lt;/details&gt;" dest="4">
    
      <!-- Parsing track info, appended to $$20 -->
      <RegExp input="$$1" output="\1" dest="6">
        <expression clear="yes" noclean="1">(&lt;track\s+id=&quot;$$19&quot;.*?&lt;/track&gt;)</expression>
      </RegExp>
      <RegExp input="$$6" output="&lt;track&gt;&lt;position&gt;$$18&lt;/position&gt;&lt;title&gt;\1&lt;/title&gt;&lt;duration&gt;\2&lt;/duration&gt;&lt;/track&gt;" dest="7">
        <expression clear="yes">&lt;title&gt;(.*?)&lt;/title&gt;.*?&lt;duration&gt;(\d*?)&lt;/duration&gt;</expression>
      </RegExp>
      <RegExp input="$$7" output="\1" dest="20+">
        <expression noclean="1"/>
      </RegExp>
  
      <!-- Parsing next track id -->
      <RegExp input="$$1" output="\1" dest="19">
        <expression clear="yes">\Q$$6\E\s*&lt;track id=&quot;([0-9a-f\-]+)&quot;</expression>
      </RegExp>
        
      <!-- Track number + 1 -->
      <RegExp input="1-2;2-3;3-4;4-5;5-6;6-7;7-8;8-9;9-10;10-11;11-12;12-13;13-14;14-15;" output="\1" dest="18">
        <expression>$$18-(\d+);</expression>
      </RegExp>      
    
      <!-- Only one variable should be filled
         - $$8 in case there are still unprocessed tracks -> another cycle over GetXMLTitleList
         - $$9 in case you are at the end of list -> returns all gathered track infos from $$20
      -->    
      <RegExp input="~$$19~" output="&lt;url function=&quot;GetXMLTitleList&quot; cache=&quot;yourcache.xml&quot;&gt;Should be already cached&lt;/url&gt;" dest="8">
        <expression clear="yes">~[0-9a-f\-]+~</expression>
      </RegExp>
      <RegExp input="~$$19~" output="$$20" dest="9">
        <expression clear="yes">~~</expression>
      </RegExp>
      
      <expression noclean="1"/>
    </RegExp>
  </GetXMLTitleList>

But I'm not sure if you can call it "sane".
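
The "Track number + 1" step is probably the least obvious part, so here is a small Python sketch of the same lookup-table trick, only to show the idea outside the scraper. The function name `next_position` is made up for this illustration, and Python's `re.search` is assumed to behave like the scraper's first-match lookup.

```python
import re

# Same lookup string as in the scraper: each "N-M;" entry maps N to its successor M.
LOOKUP = "1-2;2-3;3-4;4-5;5-6;6-7;7-8;8-9;9-10;10-11;11-12;12-13;13-14;14-15;"


def next_position(current):
    """Return the successor of `current` from the lookup string, or None if it is not listed.

    Hypothetical helper, not part of XBMC. The pattern is unanchored, as in the
    scraper, so it relies on the first match being the right one. That holds
    here because the entries are listed in ascending order.
    """
    match = re.search(re.escape(current) + r"-(\d+);", LOOKUP)
    return match.group(1) if match else None


print(next_position("1"))   # first track -> second track
print(next_position("9"))   # single digit -> two digits
print(next_position("15"))  # not in the table
```

The table only covers positions 1 to 14, so a longer album would need more entries in the lookup string.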
smuto Offline
Senior Member
Posts: 240
Joined: Sep 2004
Reputation: 2
Post: #9
the "sane" part is position & duration

position is just tricky, but i also need to change the format of the duration
from
<duration>246826</duration>
to
<duration>4:07</duration>

thx for your help

spiff Offline
Grumpy Bastard Developer
Posts: 12,174
Joined: Nov 2003
Reputation: 82
Post: #10
it would need code support to allow something ala <duration format=".."></duration>
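
For reference, the conversion itself is simple arithmetic. A minimal Python sketch, assuming the value is in milliseconds and is rounded to the nearest second (which is what turns 246826 into 4:07 rather than 4:06). The function name `format_duration` is invented for this example and is not an existing XBMC function.

```python
def format_duration(milliseconds):
    """Convert a duration in milliseconds to an "m:ss" string.

    Hypothetical helper for illustration only. Rounds to the nearest
    second before splitting into minutes and seconds.
    """
    total_seconds = (milliseconds + 500) // 1000  # round half up to whole seconds
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes}:{seconds:02d}"


print(format_duration(246826))  # 4:07
```

Truncating instead of rounding would give 4:06 for the same input, so whichever rule is chosen should match what the target site displays.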
