Last.fm scraper in development - help wanted

pyro-x
Junior Member
Posts: 4
Joined: Sep 2008
Reputation: 0
Location: Madrid, Spain
Post: #1
http://forum.xbmc.org/showthread.php?tid=38378
spiff Wrote: I have considered it, but I do not feel comfortable scraping a site that provides open APIs. That being said, anyone else is of course free to do it.

Hi spiff!

I've already started on the last.fm scraper, but I'm having trouble understanding how the flow and the interaction between the scraper and XBMC work...

I started by modifying your allmusic scraper, just to have something to begin with.

I would like to fully understand the flow.

First, the scraper creates the album search URL; I managed to get that working. Once a working URL has been created, XBMC requests it and then applies the regexp to parse the results. What I don't fully understand is what comes next: once I have the basic information (album title and URL), how is the request for the album information URL made? I never get a list of albums or anything.

This is my GetAlbumSearchResults:
<GetAlbumSearchResults dest="8">
  <RegExp input="$$5" output="&lt;results&gt;\1&lt;/results&gt;" dest="8">
    <RegExp input="$$1" output="&lt;entity&gt;&lt;year&gt;2000&lt;/year&gt;&lt;genre&gt;test&lt;/genre&gt;&lt;title&gt;\2&lt;/title&gt;&lt;url&gt;http://www.last.fm/music\1&lt;/url&gt;&lt;/entity&gt;" dest="5">
      <expression repeat="yes">&lt;a href=&quot;(.*)&quot;&gt;(.*)&lt;/a&gt; &lt;span</expression>
    </RegExp>
    <expression noclean="1"></expression>
  </RegExp>
</GetAlbumSearchResults>
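For anyone following along, here is a rough Python sketch of what the two nested RegExp blocks above do. It is an approximation, not the scraper engine's actual code, and assumes that: the inner RegExp runs its expression over the downloaded search page in $$1 once per match (because of repeat="yes"), fills \1 and \2 into its output template, and appends each result to buffer 5; the outer RegExp then reads $$5 and, assuming an empty expression captures the whole input, wraps it in <results>. The sample page and the [^"]* / [^<]* patterns are hypothetical. The sketch also shows one way the greedy (.*) can go wrong: when several links share one line, they collapse into a single bogus match.

```python
import re

# Hypothetical stand-in for the search page XBMC downloads into buffer $$1.
SEARCH_PAGE = (
    '<li><a href="/Radiohead/OK+Computer">OK Computer</a> <span>1997</span></li>\n'
    '<li><a href="/Radiohead/Kid+A">Kid A</a> <span>2000</span></li>\n'
)

# Same shape as the posted expression, but the groups cannot run past a quote or tag.
ENTITY_RE = re.compile(r'<a href="([^"]*)">([^<]*)</a> <span')
# The posted expression, unescaped, with greedy groups.
GREEDY_RE = re.compile(r'<a href="(.*)">(.*)</a> <span')

ENTITY_TEMPLATE = (
    "<entity><title>{title}</title>"
    "<url>http://www.last.fm/music{path}</url></entity>"
)


def inner_regexp(page: str) -> str:
    """Mimic the inner RegExp (input $$1, dest 5, repeat="yes"):
    fill the output template once per match and concatenate the results."""
    return "".join(
        ENTITY_TEMPLATE.format(path=m.group(1), title=m.group(2))
        for m in ENTITY_RE.finditer(page)
    )


def outer_regexp(buffer5: str) -> str:
    """Mimic the outer RegExp (input $$5, dest 8), assuming the empty
    expression captures the whole buffer as group 1."""
    return "<results>" + buffer5 + "</results>"


if __name__ == "__main__":
    print(outer_regexp(inner_regexp(SEARCH_PAGE)))
    # With both links on one line, the greedy pattern finds one bogus match
    # while the bounded pattern finds both albums.
    one_line = SEARCH_PAGE.replace("\n", "")
    print(len(GREEDY_RE.findall(one_line)), len(ENTITY_RE.findall(one_line)))
```

In other words, the inner RegExp builds one <entity> per album and the outer one only wraps them; that split is why GetAlbumSearchResults needs two regexps.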

I don't know whether all fields (year, genre, title, etc.) need to exist in the result of the first request. I don't have them available at first, but I would have them from the URL fetched in the regexp...

Perhaps there is something wrong with the regexp; I've tried many simple ones with no result. Is there any way to force a valid result, so I can get to the next step in the scraper?

Well, I don't really know if I'm making any sense here... but thank you very much in advance for your help.

Pyro-X
(This post was last modified: 2008-10-15 18:20 by Gamester17.)
spiff
Retired Developer
Posts: 12,386
Joined: Nov 2003
Post: #2
Okay, I'm in a bit of a hurry, so I'll keep this short.

CreateAlbumSearchUrl - creates the URL.
GetAlbumSearchResults - returns a list of possible matches. Here you only fill in title, artist, year and genre (actually only title and artist are required). XBMC chooses which of these the user wants (either by letting the user choose from a list or by scoring the matches and taking the one with the highest score), fetches that match's URL, then runs the scraper function GetAlbumDetails on the contents of that URL. THIS is where you fill in ALL the details.
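The flow described above can be summarised as a short Python sketch. Every callable here is a hypothetical stand-in passed in as a parameter: in a real scraper, steps 1, 2 and 4 are the XML functions CreateAlbumSearchUrl, GetAlbumSearchResults and GetAlbumDetails, while fetching URLs and choosing a match are done by XBMC itself.

```python
from typing import Callable, Dict, List

Entity = Dict[str, str]  # one search match: at least "title", "artist" and "url"


def scrape_album(
    artist: str,
    album: str,
    create_search_url: Callable[[str, str], str],        # stand-in for CreateAlbumSearchUrl
    get_search_results: Callable[[str], List[Entity]],   # stand-in for GetAlbumSearchResults
    get_details: Callable[[str], Dict[str, str]],        # stand-in for GetAlbumDetails
    fetch: Callable[[str], str],                         # XBMC downloads a URL
    choose: Callable[[List[Entity]], Entity],            # user picks, or highest score wins
) -> Dict[str, str]:
    # 1. Build the search URL from what XBMC already knows.
    search_url = create_search_url(artist, album)
    # 2. Parse the search page into candidates (title/artist/url only).
    matches = get_search_results(fetch(search_url))
    if not matches:
        return {}
    # 3. Pick one candidate.
    best = choose(matches)
    # 4. Fetch the chosen candidate's URL and fill in ALL the details from it.
    return get_details(fetch(best["url"]))
```

The details page is only fetched in step 4, which is why the search results can get away with just title, artist and URL.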
pyro-x
Junior Member
Posts: 4
Joined: Sep 2008
Reputation: 0
Location: Madrid, Spain
Post: #3
spiff Wrote: Okay, I'm in a bit of a hurry, so I'll keep this short.

CreateAlbumSearchUrl - creates the URL.
GetAlbumSearchResults - returns a list of possible matches. Here you only fill in title, artist, year and genre (actually only title and artist are required). XBMC chooses which of these the user wants (either by letting the user choose from a list or by scoring the matches and taking the one with the highest score), fetches that match's URL, then runs the scraper function GetAlbumDetails on the contents of that URL. THIS is where you fill in ALL the details.


Thank you very much for your explanation... I finally got it to work :). What was blocking me was that I didn't understand why two regexps are needed in the GetAlbumSearchResults section. I still don't, but I wrote two regexps, one for the "search title" and another for the actual results, and finally got to the next step :).

Anyway, which fields are mandatory for the album details? There isn't that much information on last.fm. I don't know what to do with the "tags", because a tag there can be a genre, a mood, or sometimes any other crazy thing a last.fm user decides on xD.

Anyway, thank you very much, I'm already making progress with it :). I can even get the cover!! :))

Oh! One more thing I'm wondering about: if MP3s are already tagged with their genre and year of release, and the scraper then gets some of that information again from the web, which one does XBMC use for its database and library mode? The values from the ID3 tags, or the ones from the scraping result?

Thanks,

Pyro-X
(This post was last modified: 2008-10-14 11:34 by pyro-x.)
v0lrath
Member
Posts: 81
Joined: Sep 2008
Reputation: 0
Location: Redmond, WA/Provo, UT
Post: #4
I would love to be able to use tags as genres so I could browse my library by last.fm tags. It would be nice if tags such as "seen live", "awesome", etc. were filtered out, though.

Keep up the good work; Last.fm is much better than allmusic.
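One possible way to turn last.fm tags into genres along those lines, sketched in Python. This only illustrates the filtering idea and is not part of any existing scraper: the blocklist, the three-genre cap and the function name are assumptions, and it assumes the tags arrive ordered most-popular first.

```python
from typing import Iterable, List

# Hypothetical blocklist of tags that describe the listener rather than the music.
NON_GENRE_TAGS = {"seen live", "awesome", "favorites", "favourite", "albums i own"}


def tags_to_genres(tags: Iterable[str], max_genres: int = 3) -> List[str]:
    """Keep the first few tags that are not on the blocklist.

    Assumes tags arrive most-popular first; comparison is case-insensitive and
    duplicates (e.g. "Rock" and "rock") are kept only once.
    """
    genres: List[str] = []
    seen = set()
    for tag in tags:
        cleaned = tag.strip()
        key = cleaned.lower()
        if not key or key in NON_GENRE_TAGS or key in seen:
            continue
        seen.add(key)
        genres.append(cleaned)
        if len(genres) == max_genres:
            break
    return genres
```

For example, the tag list "Seen Live", "alternative", "electronic", "awesome", "experimental", "rock" would yield alternative, electronic and experimental.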
Gamester17
Team-XBMC Forum Moderator
Posts: 10,523
Joined: Sep 2003
Reputation: 10
Location: Sweden
Tips!
Post: #5
See:
http://wiki.xbmc.org/?title=Category:Scraper
and:
http://forum.xbmc.org/showthread.php?tid=38379
and:
http://forum.xbmc.org/showthread.php?tid=38378

;)

Always read the XBMC online-manual, FAQ and search the forum before posting.
Do not e-mail XBMC-Team members directly asking for support. Read/follow the forum rules.
For troubleshooting and bug reporting please make sure you read this first.
DuMbGuM
Fan
Posts: 448
Joined: Sep 2008
Reputation: 1
Location: Ireland
Post: #6
Looking good, pyro-x. Keep up the hard work; last.fm will be a really nice repo to have as a scraper.
spyrojyros_tail
Junior Member
Posts: 9
Joined: Nov 2008
Reputation: 0
Post: #7
Hey guys, a last.fm scraper would be brilliant!! I would love to have my collection updated.

Pyro-x, are you and rwparris2 going to be working on this together? :-/
rwparris2
Team-XBMC Python Developer
Posts: 1,333
Joined: Jan 2008
Reputation: 2
Location: US
Post: #8
spyrojyros_tail Wrote: Hey guys, a last.fm scraper would be brilliant!! I would love to have my collection updated.

Pyro-x, are you and rwparris2 going to be working on this together? :-/

I pretty much gave up on it -- I couldn't wrap my head around what was going on, and decided to waste my time on things that didn't frustrate me.

But I really want it, so if Pyro-x or anyone else wants to email or PM me, feel free.
(Any emails asking for whatever I have so far will be answered with " ", because that's basically all I managed.)

Always read the XBMC online-manual, FAQ and search the forum before posting.
For troubleshooting and bug reporting please read how to submit a proper bug report.

If you're interested in writing addons for xbmc, read docs and how-to for plugins and scripts ||| http://code.google.com/p/xbmc-addons/
TechLife
Donor
Posts: 582
Joined: Aug 2008
Reputation: 20
Location: Aurora, CO
Post: #9
spiff Wrote:...either letting the user choose from a list or scoring the matches and taking the one with the highest score...

How are the returned results scored? If the site supports it, can I return a relevance value so that the proper entry from the returned list is chosen?

*If I helped, please +rep below*
Windows Media Center PVR add-on (pvr.wmc) and Server (ServerWMC)
The XBMC team, plug-in devs, skinners, etc. do this for us for FREE in their spare time because they want to. Think about that for a second before you start bitching...
(This post was last modified: 2008-11-22 01:51 by TechLife.)
spiff
Retired Developer
Posts: 12,386
Joined: Nov 2003
Post: #10
Usually it's an equally weighted fuzzy string match of artist and album.
With revision 16266, add the following to the returned XML:

<relevance scale="yy">x.xx</relevance>

where x.xx is a number between 0 and 1 and yy is an optional scale (if your number uses a different scale), and your wish is granted.
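A sketch of how the two scoring cases described above could fit together, in Python. difflib's SequenceMatcher ratio is only a stand-in for XBMC's own fuzzy matcher, the dictionary keys are hypothetical, and dividing the relevance by the scale is an assumption about what the optional scale attribute means.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional

Result = Dict[str, str]  # hypothetical keys: "artist", "title", optional "relevance"/"scale"


def fuzzy(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1] (stand-in for XBMC's fuzzy match)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def score(result: Result, artist: str, album: str) -> float:
    if "relevance" in result:
        # Scraper-supplied <relevance scale="yy">x.xx</relevance>,
        # assumed to normalise to [0, 1] by dividing by the scale.
        return float(result["relevance"]) / float(result.get("scale", "1"))
    # Default: artist and album similarity weighted equally.
    return (0.5 * fuzzy(result.get("artist", ""), artist)
            + 0.5 * fuzzy(result.get("title", ""), album))


def best_match(results: List[Result], artist: str, album: str) -> Optional[Result]:
    """Return the highest-scoring result, or None if there are no results."""
    if not results:
        return None
    return max(results, key=lambda r: score(r, artist, album))
```

Under this reading, a scraper whose site reports 87 out of 100 would emit scale="100" and 87, which scores the same as 0.87 without a scale.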
TechLife
Donor
Posts: 582
Joined: Aug 2008
Reputation: 20
Location: Aurora, CO
Post: #11
luv u spiff :D

(This post was last modified: 2008-11-22 02:57 by TechLife.)
kriziz
Junior Member
Posts: 4
Joined: Jun 2008
Reputation: 0
Location: UT, Netherlands
Post: #12
Hey guys, it's good to find out somebody is working on more music scrapers. Most of my albums come up blank on allmusic.com, so an alternative scraper to work around this would be great! Last.fm and discogs.com are my favorite sites, and I'd be happy to help out with scraper development for either site.

Is there any public place where I can find the current development version of your scraper? Of course, I'd also like to know where development of the last.fm scraper stands. I have a lot of scripting experience (all sorts of stuff: PHP, Perl, PL/pgSQL, Lua, etc.) and have written a lot of complex regular expressions in the past, so if any help is needed, let me know :)
Aron Parsons
Senior Member
Posts: 153
Joined: Oct 2003
Reputation: 0
Location: Virginia
Post: #13
@pyro-x
Are you still working on this? Do you want any assistance? If so, post your latest revision and I'll see where I can help out.
kastrolis
Junior Member
Posts: 2
Joined: Dec 2008
Reputation: 0
Post: #14
Pyro-x is either so busy with his work on the scraper that he doesn't even have time to check this message board, or he has given up on it altogether. If the latter is true, and given that work on the discogs.com scraper also seems to have been discontinued, there's no doubt: other people should take over development of the alternative scrapers. Personally, I'm ready to start working on this scraper; however, I have serious doubts that last.fm really is the best source of information, as the only real thing it provides is album covers. discogs.com seems slightly better, but I don't like that it lists all the different issues of a record (Canadian vinyl editions and the like).
Aron Parsons
Senior Member
Posts: 153
Joined: Oct 2003
Reputation: 0
Location: Virginia
Post: #15
last.fm is sometimes good for more obscure artists, and that's the main reason I want to scrape from it. Not all of the groups I listen to have entries on the more popular sites (e.g. AllMusic).

It's not currently possible to cascade scrapers, is it? For example: if an artist can't be found with scraper #1, try #2, then #3. Perhaps that is another bit of functionality I can work on, if others would find it useful as well.
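If such cascading were added, the core logic could be as simple as this Python sketch: try each scraper in priority order and stop at the first one that returns anything. The scraper callables and their signature are hypothetical; nothing in this thread says XBMC exposes an interface like this.

```python
from typing import Callable, Dict, Iterable, Optional

AlbumInfo = Dict[str, str]
# Hypothetical scraper interface: (artist, album) -> details, or None/{} if not found.
Scraper = Callable[[str, str], Optional[AlbumInfo]]


def cascade(scrapers: Iterable[Scraper], artist: str, album: str) -> Optional[AlbumInfo]:
    """Return the first non-empty result, trying scrapers in priority order."""
    for scraper in scrapers:
        info = scraper(artist, album)
        if info:  # treats both None and {} as "not found" and falls through
            return info
    return None
```

A variation would fill fields missing from the first result with values from later scrapers; first-match-wins just keeps the behaviour easy to predict.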