Improved allmusic.com scraper (plus a few questions) - Printable Version
+--- Thread: Improved allmusic.com scraper (plus a few questions) (/showthread.php?tid=57501)
Improved allmusic.com scraper (plus a few questions) - talisto - 2009-09-06 08:03
Longtime user, (almost) first-time poster. Anyhoo, I've been searching for the best music scraper for a while now, and haven't really been happy with any of them. allmusic.com has fantastic artist information/reviews but awful photos (and the scraper never really seemed to get all the proper info), discogs has decent photos but really limited artist information, and last.fm has mediocre everything. So I started mucking around with the existing allmusic scraper (from r22528), fixed a few problems with it, and improved it a bit (for my own needs, anyhow). Here's what I've done:
- The artist information (aside from the bio) wasn't being parsed properly; the ParseAMGArtist function was being passed the value "test" instead of the actual URL. Fixed.
- The album information wasn't being parsed properly either; the ParseAMGAlbum function was being passed the value "placeholder" instead of the actual URL. Fixed.
- The caching was glitchy. It seems the cache file should be unique to each artist, but it was set to use the same cache file for every artist, so subsequent lookups would often pick up duplicate/incorrect information from the previous artist. Fixed.
- The scraper was set to only get thumbs from htbackdrops (same with all the other music scrapers now?), which is fine and may be preferable to some, but htbackdrops barely has any thumbnails for the artists in my library. I noticed that discogs generally has decent photos for its artists, and its coverage is quite extensive, so I've changed the scraper to also check discogs for thumbs (though it will still use the thumb from htbackdrops as the primary if one is available).
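To illustrate the caching fix above: the real scraper is XML/regex, so this is only a rough Python sketch of the idea, not the actual change. The names `cache_name` and `PageCache`, the `amg-artist` prefix, and the example.com URLs are all made up for illustration. The point is simply that the cache key is derived from the artist URL, so two artists can never share a cache file.

```python
import hashlib
from typing import Callable, Dict


def cache_name(artist_url: str, prefix: str = "amg-artist") -> str:
    """Build a cache file name that is unique per artist URL (hypothetical scheme)."""
    digest = hashlib.md5(artist_url.encode("utf-8")).hexdigest()[:12]
    return f"{prefix}-{digest}.html"


class PageCache:
    """In-memory stand-in for cache files on disk, keyed by per-artist name."""

    def __init__(self) -> None:
        self._files: Dict[str, str] = {}

    def get(self, artist_url: str, fetch: Callable[[str], str]) -> str:
        name = cache_name(artist_url)
        if name not in self._files:
            # Only hit the network (here: the injected fetch) on a cache miss.
            self._files[name] = fetch(artist_url)
        return self._files[name]


if __name__ == "__main__":
    cache = PageCache()
    fake_fetch = lambda url: "page for " + url  # stand-in for a real HTTP fetch
    print(cache.get("http://example.com/artist/a", fake_fetch))
    print(cache.get("http://example.com/artist/b", fake_fetch))
```

With one shared file name instead of `cache_name(artist_url)`, the second lookup would return the first artist's page, which matches the symptom described in the list.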
With these changes, pretty much every artist and album in my library has a proper thumb, as well as full bio/reviews/discography/etc. Definitely a HUGE improvement.
I've posted it here: EDIT: my changes are now in the latest SVN builds. Just download that instead!
Hopefully it helps someone else as well. Maybe someone can merge these fixes into SVN.
1st question for the scraper gurus: is it possible to nest URL fetches or functions? I couldn't figure out how to fetch a URL, parse it, and then use the resulting string to fetch another URL and parse that.
2nd question: Is there a way to ensure a variable/buffer is URL encoded properly for use in a GET string?
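For anyone unsure what "URL encoded properly" means in the 2nd question, here is a small Python sketch using the standard library. It is not scraper XML and says nothing about what the scraper engine supports; the `build_search_url` helper, the `q` parameter, and the example.com address are assumptions for illustration only.

```python
from urllib.parse import urlencode


def build_search_url(base: str, artist: str) -> str:
    """Append the artist name as a safely encoded GET parameter (hypothetical 'q')."""
    # urlencode escapes characters such as '/', '&' and spaces so they
    # cannot be mistaken for URL structure.
    return f"{base}?{urlencode({'q': artist})}"


if __name__ == "__main__":
    print(build_search_url("http://example.com/search", "AC/DC & Friends"))
    # prints http://example.com/search?q=AC%2FDC+%26+Friends
```

Without the encoding step, the `&` in the artist name would be read as the start of a second parameter and the search would only see "AC/DC ".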
- spiff - 2009-09-06 10:02
1) you can chain as deep as you want - just see how we call e.g. ParseAMGArtist and monkey that.
2) currently no, but i've been pondering adding a urlencode feature - seems you need it as well, so it will come soonish.
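The chaining in point 1 (fetch a URL, parse it, use the result to fetch another URL, parse that) looks roughly like this in Python. This is only a sketch of the pattern, not how the scraper engine implements it: `scrape_artist_thumb`, the regexes, and the example.com page layout are invented, and the fetch function is passed in so the example runs without a network.

```python
import re
from typing import Callable, Optional
from urllib.parse import quote_plus

BASE = "http://example.com"  # hypothetical site, not a real scraper target


def first_match(pattern: str, text: str) -> Optional[str]:
    """Return the first capture group of pattern in text, or None."""
    m = re.search(pattern, text)
    return m.group(1) if m else None


def scrape_artist_thumb(artist: str, fetch: Callable[[str], str]) -> Optional[str]:
    # Step 1: fetch the search page and parse out the artist page path.
    search_html = fetch(f"{BASE}/search?q={quote_plus(artist)}")
    artist_path = first_match(r'href="(/artist/[^"]+)"', search_html)
    if artist_path is None:
        return None
    # Step 2: use the parsed string to fetch a second URL, then parse that.
    artist_html = fetch(BASE + artist_path)
    return first_match(r'<img src="([^"]+)"', artist_html)


if __name__ == "__main__":
    pages = {
        f"{BASE}/search?q=Some+Band": '<a href="/artist/42">Some Band</a>',
        f"{BASE}/artist/42": '<img src="http://example.com/thumbs/42.jpg">',
    }
    print(scrape_artist_thumb("Some Band", pages.__getitem__))
```

Each step only needs the string produced by the previous one, so the chain can be extended to a third or fourth fetch in the same way.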
- ashlar - 2009-09-06 12:31
spiff, do you think this new and improved scraper could be included in future releases?
Asking just to understand whether it's worth adding it manually or if with a bit of patience I could find it by regularly updating.
- spiff - 2009-09-06 12:42
if a trac is posted it will get considered, if not it will be ignored
- blacklist - 2009-09-07 16:47
Much MUCH improved! Thank you for your work on this talisto!
Now I need to rescrape 26,000 files....
- ashlar - 2009-09-08 08:48
spiff, does the trac need to be posted by the author?
- spiff - 2009-09-08 09:37
preferably yes, but no problem making an exception if the author doesn't pop up..
- talisto - 2009-09-08 10:35
blacklist Wrote: Much MUCH improved! Thank you for your work on this talisto!
Glad to hear it's working for you!
I didn't submit this as a patch to trac because I assumed there's a reason why all the music scrapers have been switched to use htbackdrops exclusively for thumbnails, and my inclusion of discogs is more of a personal preference than a fix. However, the other changes I've made are clearly bugfixes, so perhaps I should submit a trac with only those changes, so that at least those get fixed promptly.
I've never used trac before, though, so I'd better read up on the "HOW-TO submit a patch" guidelines first!
- spiff - 2009-09-08 10:38
reason is; major screwup on my behalf. must have been drunk
the thumbs, however, you are dead wrong on. we still happily parse amg thumbs in GetAMGArtist and those are added first - so they get the priority.
as for including discogs, just make it a setting and default it to false and everybody's happy
edit2: just remember, i haven't gotten around to adding scraper settings to music scrapers yet. will do asap
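A rough Python sketch of the "make it a setting and default it to false" idea, combined with the priority order discussed in this thread. Nothing here is real scraper-settings syntax (music scrapers don't have settings yet, per the note above): the `use_discogs_thumbs` key, the `collect_thumbs` helper, and the exact ordering of htbackdrops after allmusic are assumptions for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical setting name; off by default so existing behaviour is unchanged.
DEFAULT_SETTINGS = {"use_discogs_thumbs": False}


def collect_thumbs(
    sources: Dict[str, List[str]],
    settings: Optional[Dict[str, bool]] = None,
) -> List[str]:
    """Merge thumb URLs in priority order; discogs is only used when enabled."""
    opts = {**DEFAULT_SETTINGS, **(settings or {})}
    order = ["allmusic", "htbackdrops"]  # assumed order: added first = highest priority
    if opts["use_discogs_thumbs"]:
        order.append("discogs")
    thumbs: List[str] = []
    for name in order:
        for url in sources.get(name, []):
            if url not in thumbs:  # skip duplicates across sources
                thumbs.append(url)
    return thumbs


if __name__ == "__main__":
    found = {"htbackdrops": [], "discogs": ["http://example.com/d1.jpg"]}
    print(collect_thumbs(found))                                # discogs ignored by default
    print(collect_thumbs(found, {"use_discogs_thumbs": True}))  # discogs used as fallback
```

Because the first thumb in the list is the one that wins, adding discogs last means it only becomes the primary when the other sources come back empty.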
- stokedfish - 2009-09-08 13:36
spiff Wrote: as for including discogs, just make it a setting and default it to false and everybody's happy
looks like you guys don't like discogs. why is that? just wondering...