Last.fm scraper in development - help wanted
#1
http://forum.xbmc.org/showthread.php?tid=38378
spiff Wrote: I have considered it, but I do not feel comfortable scraping a site that provides open APIs. That being said, anyone else is of course free to do it.

Hi spiff!

I've already made a start on the last.fm scraper, but I have some problems understanding how the flow and interaction between the scraper and XBMC works...

I started by modifying your allmusic scraper, just to have something to begin with.

I would like to fully understand the flow.

First, the scraper creates the album search URL; I managed to get that working. Once a working URL has been created, XBMC makes the request to it and then runs the regexp to parse the results. What follows is what I don't fully understand: once I have the basic information (album title and URL), how is the request for the album information URL made? I never get a list of albums or anything.

This is my GetAlbumSearchResults:
<GetAlbumSearchResults dest="8">
  <RegExp input="$$5" output="&lt;results&gt;\1&lt;/results&gt;" dest="8">
    <RegExp input="$$1" output="&lt;entity&gt;&lt;year&gt;2000&lt;/year&gt;&lt;genre&gt;test&lt;/genre&gt;&lt;title&gt;\2&lt;/title&gt;&lt;url&gt;http://www.last.fm/music\1&lt;/url&gt;&lt;/entity&gt;" dest="5">
      <expression repeat="yes">&lt;a href=&quot;(.*)&quot;&gt;(.*)&lt;/a&gt; &lt;span</expression>
    </RegExp>
    <expression noclean="1"></expression>
  </RegExp>
</GetAlbumSearchResults>
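
A rough Python sketch of how the nested RegExp elements above are presumably evaluated: the inner RegExp runs first on buffer 1 (the fetched page) and writes one entity per match into buffer 5, then the outer RegExp wraps buffer 5 in <results> and writes it to buffer 8. This assumes an empty expression captures the whole input buffer. The function name run_search_results and the sample page are made up for illustration, the year/genre placeholders are left out for brevity, and the pattern uses negated character classes instead of the greedy (.*) purely as a choice for this example.

```python
import re

# Inner expression, written with negated character classes so that each
# match stops at the first closing quote / tag (a choice for this sketch).
LINK_PATTERN = re.compile(r'<a href="([^"]*)">([^<]*)</a> <span')

ENTITY_TEMPLATE = (
    "<entity><title>{title}</title>"
    "<url>http://www.last.fm/music{path}</url></entity>"
)


def run_search_results(page):
    """Mimic the two nested RegExp steps with a dict of numbered buffers."""
    buffers = {1: page}

    # Inner RegExp: input $$1, repeat="yes", dest 5.
    # One <entity> is emitted per match and the outputs are concatenated.
    buffers[5] = "".join(
        ENTITY_TEMPLATE.format(title=m.group(2), path=m.group(1))
        for m in LINK_PATTERN.finditer(buffers[1])
    )

    # Outer RegExp: input $$5, empty expression (assumed to capture the
    # whole buffer as \1), dest 8.
    buffers[8] = "<results>{}</results>".format(buffers[5])
    return buffers[8]


# Hypothetical search page with two hits.
sample_page = (
    '<a href="/Artist+A/Album+One">Album One</a> <span>x</span>'
    '<a href="/Artist+B/Album+Two">Album Two</a> <span>y</span>'
)
results_xml = run_search_results(sample_page)
```

Seen this way, the inner step builds the individual entries and the outer step only adds the wrapper element around them.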

I don't know if all the fields (year, genre, title, etc.) need to exist in the result of the first request. I don't have them available at first, but I would from the URL fetched in the regexp...

Perhaps there is something wrong with the regexp. I've tried many simple ones with no result. Is there any way to force a valid result, so I can get to the next step in the scraper?

Well, I don't really know if I'm actually making any sense here, but thank you very much in advance for your help.

Pyro-X
#2
okay, i'm in a bit of a hurry so i'll just go fast.

createalbumsearchurl - creates the url
getalbumsearchresults - returns a list of possible matches. here you only fill title, artist, year, genre (actually only title & artist are required). xbmc chooses which of these the user wants (either letting the user choose from a list or scoring the matches and taking the one with the highest score), fetches its url, then runs the scraper function GetAlbumDetails on the contents of that url. THIS is where you will fill ALL details.
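
The order of the steps described above can be sketched in Python. This is only an illustration, not XBMC's actual code: fetch, scrape_album, the example.invalid URLs and the canned pages are all hypothetical stand-ins for the HTTP requests XBMC makes and for the three scraper functions.

```python
import re
from urllib.parse import quote_plus

# Hypothetical canned pages standing in for real HTTP responses.
FAKE_WEB = {
    "http://example.invalid/search?q=Some+Album": (
        '<a href="/Some+Artist/Some+Album">Some Album</a> '
        "<span>Some Artist</span>"
    ),
    "http://example.invalid/music/Some+Artist/Some+Album": (
        "<h1>Some Album</h1><p class='year'>2000</p>"
    ),
}


def fetch(url):
    """Stand-in for the request XBMC performs itself."""
    return FAKE_WEB[url]


def create_album_search_url(album):
    """Step 1: only builds the search URL."""
    return "http://example.invalid/search?q=" + quote_plus(album)


def get_album_search_results(page):
    """Step 2: list candidate matches (title, artist and a detail URL)."""
    pattern = r'<a href="([^"]*)">([^<]*)</a> <span>([^<]*)</span>'
    return [
        {"title": title, "artist": artist,
         "url": "http://example.invalid/music" + href}
        for href, title, artist in re.findall(pattern, page)
    ]


def get_album_details(page):
    """Step 3: fill in all details from the chosen album page."""
    title = re.search(r"<h1>([^<]*)</h1>", page).group(1)
    year = re.search(r"<p class='year'>(\d{4})</p>", page).group(1)
    return {"title": title, "year": year}


def scrape_album(album, choose=lambda candidates: candidates[0]):
    """Search, pick one candidate, then fetch and parse its detail page."""
    search_page = fetch(create_album_search_url(album))
    candidates = get_album_search_results(search_page)
    chosen = choose(candidates)  # user pick or best score in the real thing
    return get_album_details(fetch(chosen["url"]))


details = scrape_album("Some Album")
```

The point is that the search step only has to produce enough to identify each candidate plus its URL; everything else is filled in from the second page.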
#3
spiff Wrote:okay, i'm in a bit of a hurry so i'll just go fast.

createalbumsearchurl - creates the url
getalbumsearchresults - returns a list of possible matches. here you only fill title, artist, year, genre (actually only title & artist are required). xbmc chooses which of these the user wants (either letting the user choose from a list or scoring the matches and taking the one with the highest score), fetches its url, then runs the scraper function GetAlbumDetails on the contents of that url. THIS is where you will fill ALL details.


Thank you very much for your explanation.. I finally got it to work :). What blocked me was that I didn't know why two regexps were needed in the GetAlbumSearchResults section. I still don't know, but I wrote two regexps, one for the "search title" and another one for the actual results, and then finally got to the next step :).

Anyway, which fields are mandatory for the album details? There isn't much information on last.fm. I don't know what I should do with the "tags", because a tag there can be a genre, a mood, and sometimes any other crazy thing the last.fm user decides xD.

Anyway, thank you very much, I'm already making progress with it :). I can even get the cover!! :)

Oh, one more thing I'm thinking about... if the mp3s are already tagged with genre and year of release, and the scraper then gets some of that information again from the web, which one is used for the XBMC db and library mode: the values from the ID3 tags, or the ones from the scraping result?

Thanks,

Pyro-X
#4
I would love to be able to use tags as genres so I could go through my library by last.fm tags. It would be nice if tags such as "seen live" and "awesome", etc. were filtered out though.
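
One simple way to do that kind of filtering would be a blocklist, sketched here in Python. The blocklist contents beyond "seen live" and "awesome", the function name tags_to_genres and the sample tags are assumptions for illustration; nothing here reflects how XBMC or last.fm actually handle tags.

```python
# Tags that describe the listener rather than the music.
# "seen live" and "awesome" come from the post; extend as needed.
NON_GENRE_TAGS = {"seen live", "awesome"}


def tags_to_genres(tags, blocklist=NON_GENRE_TAGS):
    """Drop blocklisted tags and case-insensitive duplicates, keep order."""
    genres = []
    seen = set()
    for tag in tags:
        cleaned = tag.strip()
        key = cleaned.lower()
        if not key or key in blocklist or key in seen:
            continue
        seen.add(key)
        genres.append(cleaned)
    return genres


# Hypothetical tag list for one artist.
genres = tags_to_genres(
    ["post-rock", "Seen Live", "ambient", "awesome", "Ambient"]
)
```

A fixed list will never catch every odd tag, so it would probably need to be user-editable.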

Keep up the good work, Last.fm is much better than allmusic.
#5
See:
http://wiki.xbmc.org/?title=Category:Scraper
and:
http://forum.xbmc.org/showthread.php?tid=38379
and:
http://forum.xbmc.org/showthread.php?tid=38378

;)
#6
looking good pyro-x, keep up the hard work, last.fm will be a real nice repo to have as a scraper.
#7
Hey guys, a last.fm scraper would be brilliant!! I would love to have my collection updated.

Pyro-x, are you and rwparris2 going to be working on this together?
#8
spyrojyros_tail Wrote:Hey guys, a last.fm scraper would be brilliant!! I would love to have my collection updated.

Pyro-x, are you and rwparris2 going to be working on this together?

I pretty much gave up on it -- couldn't get my head to wrap around what was going on, and decided to waste my time on things that didn't frustrate me.

But I really want it, so if Pyro-x or anyone else wants to email me / PM me, feel free.
(Any emails asking for whatever I have so far will get a reply of " ", because that's basically all I managed.)
#9
spiff Wrote:...either letting the user choose from a list or scoring the matches and taking the one with the highest score...

How are the returned results scored? If the site supports it, can I return a relevance value so that the proper entry from the returned list is chosen?
#10
usually it's an equally weighted fuzzy string match of artist and album.
as of revision 16266, add the following to the returned xml

<relevance scale="yy">x.xx</relevance>

where x.xx is a number between 0 and 1 and yy is an optional scale (if your number is scaled otherwise) and your wish is granted.
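
To make that concrete, here is a hedged Python sketch of an equally weighted artist/album match and of an entity carrying a relevance value. It uses difflib.SequenceMatcher as the fuzzy matcher, which is a stand-in and not XBMC's actual scoring code; the helper names, the sample strings and the example.invalid URL are made up, and the optional scale attribute is left out.

```python
from difflib import SequenceMatcher
from xml.sax.saxutils import escape


def similarity(a, b):
    """Case-insensitive similarity between 0 and 1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def relevance(wanted_artist, wanted_album, artist, album):
    """Equally weighted mix of the artist and album similarities."""
    return (0.5 * similarity(wanted_artist, artist)
            + 0.5 * similarity(wanted_album, album))


def entity_xml(title, artist, url, score):
    """Build one search-result entity including a relevance element."""
    return (
        "<entity><title>{}</title><artist>{}</artist><url>{}</url>"
        "<relevance>{:.2f}</relevance></entity>"
    ).format(escape(title), escape(artist), escape(url), score)


# Hypothetical query and candidate.
score = relevance("Some Artist", "Some Album", "some artist", "SOME ALBUM")
entity = entity_xml(
    "SOME ALBUM", "some artist",
    "http://example.invalid/music/Some+Artist/Some+Album", score,
)
```

If the site already reports its own match quality, that number could be passed through as the score instead of being computed like this.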
#11
luv u spiff :D
#12
Hey guys, it's good to find out somebody is working on more music scrapers. Most of my albums come up blank on allmusic.com. An alternative scraper to work around this issue would be great! Last.fm and discogs.com are my favorite sites, and I'd be happy to help out with scraper development for either site.

Is there any public place where I can find the current development version of your scraper? Of course I'd like to know where the development of the last.fm scraper is at. I have a lot of scripting experience (all sorts of stuff: PHP, Perl, PL/pgSQL, Lua, etc.) and have constructed a lot of complex regular expressions in the past, so if any help is needed, let me know.. :)
#13
@pyro-x
Are you still working on this? Do you want any assistance? If so, post your latest revision and I'll see where I can help out.
#14
Pyro-x is either so busy with his work on the scraper that he doesn't even have time to check this message board, or he has given up on it altogether. If the latter is true, and taking into account that work on the discogs.com scraper also seems to have been discontinued, there's no doubt that some other people should take over development of the alternative scrapers. Personally I'm ready to start working on this scraper; however, I have serious doubts that last.fm really is the best source of information, as the only real thing it provides is album covers. discogs.com seems slightly better, but I don't like that it lists all the different issues of a record (Canadian vinyl editions and stuff like that).
#15
last.fm is sometimes good for more obscure artists and that's the main reason I want to scrape from it. Not all of the groups I listen to have entries at the more popular sites (e.g. AllMusic).

It's not currently possible to cascade scrapers, is it? For example, it can't find an artist with scraper 1, so it tries #2, then #3. Perhaps that is another bit of functionality that I can work on if others would find it useful as well.
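
Purely to illustrate what such a cascade would mean (this is not an existing XBMC feature, and cascade_lookup and the toy dict-backed lookups are hypothetical), a minimal Python sketch:

```python
def cascade_lookup(scrapers, artist):
    """Try each (name, lookup) pair in order and return the first hit.

    Each lookup is expected to return None when it finds nothing.
    """
    for name, lookup in scrapers:
        result = lookup(artist)
        if result is not None:
            return name, result
    return None, None


# Toy lookups backed by dicts, standing in for real scrapers.
site_one = {"Well Known Band": "bio from site one"}
site_two = {"Obscure Band": "bio from site two"}

scrapers = [("site_one", site_one.get), ("site_two", site_two.get)]

hit = cascade_lookup(scrapers, "Obscure Band")
miss = cascade_lookup(scrapers, "Unknown Band")
```

The order of the list is the priority order, so the preferred source goes first and the fallbacks after it.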