[Release] Universal Scraper for Music Albums

scudlee Offline
Team-XBMC Member
Posts: 584
Joined: Jul 2011
Reputation: 45
Post: #11
Can't you just reverse the logic in the scraper? Instead of taking the first 3 digits, capture everything but the last 3 digits?
i.e. rather than <length>(\d{3})\d*?</length> have <length>(\d*?)\d{3}</length>
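A minimal Python sketch of the difference between the two patterns, using the 52506 ms duration reported for this bug. The scraper itself is XML-based, so the helper names here (`capture`, `as_mmss`) are hypothetical and only illustrate how each capture group behaves:

```python
import re

# Sample XML fragment using the duration quoted in this thread (52506 ms).
SAMPLE = "<length>52506</length>"

# Original pattern: captures the FIRST three digits.
OLD_PATTERN = re.compile(r"<length>(\d{3})\d*?</length>")
# Suggested pattern: captures everything EXCEPT the last three digits.
NEW_PATTERN = re.compile(r"<length>(\d*?)\d{3}</length>")


def capture(pattern, xml):
    """Return the captured group, or None if the pattern does not match."""
    match = pattern.search(xml)
    return match.group(1) if match else None


def as_mmss(seconds_text):
    """Format a captured digit string as mm:ss (an empty string counts as 0)."""
    seconds = int(seconds_text or 0)
    return f"{seconds // 60:02d}:{seconds % 60:02d}"


old_seconds = capture(OLD_PATTERN, SAMPLE)
new_seconds = capture(NEW_PATTERN, SAMPLE)
print(old_seconds, as_mmss(old_seconds))  # 525 08:45
print(new_seconds, as_mmss(new_seconds))  # 52 00:52
```

Note that dropping the last three digits truncates rather than rounds, so this gives 00:52 where MusicBrainz displays 00:53. A duration under 1000 ms captures an empty string, and a value with fewer than three digits does not match at all.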
olympia Offline
Team-XBMC Member
Posts: 2,381
Joined: May 2008
Reputation: 30
Post: #12
Whoops, where did I lose my head today?
Hell yeah, this will surely do it. I'll fix this when I get home.

Thanks for refreshing my mind.
night199uk Offline
Team-XBMC Member
Posts: 27
Joined: May 2009
Reputation: 0
Post: #13
hey olympia.

this seems to overrun the musicbrainz queries-per-second limit for me on a semi-regular basis. the problem is, once the qps limit kicks in, musicbrainz starts serving up a really simple, fast reject page, which means the queries go even faster and the rate limit just stays in force. so i have to stop scanning and restart.

i think unfortunately the real solution is a rate-limit per domain, and nothing we can do in the scraper. :-(
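A rough Python sketch of what a per-domain rate limit could look like. This is not how XBMC fetches URLs; the class name `PerDomainRateLimiter`, its parameters, and the interval you pass in are all assumptions for illustration, and the actual allowed rate should be taken from the service's own documentation:

```python
import time
from urllib.parse import urlparse


class PerDomainRateLimiter:
    """Enforce a minimum interval between requests to the same host."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between requests per host
        self._clock = clock               # injectable for testing
        self._sleep = sleep               # injectable for testing
        self._next_allowed = {}           # host -> earliest allowed start time

    def wait(self, url):
        """Block until a request to url's host is allowed; return the delay used."""
        host = urlparse(url).netloc
        now = self._clock()
        ready_at = self._next_allowed.get(host, now)
        delay = max(0.0, ready_at - now)
        if delay > 0:
            self._sleep(delay)
        # Schedule from the moment this request is released, so a fast
        # response (such as a reject page) cannot shorten the gap.
        self._next_allowed[host] = max(now, ready_at) + self.min_interval
        return delay
```

The caller would invoke `limiter.wait(url)` before every fetch. Because the gap is measured from when a request is released rather than from when the response arrives, a quick reject page does not speed up the following queries, which is the feedback loop described above.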
olympia Offline
Team-XBMC Member
Posts: 2,381
Joined: May 2008
Reputation: 30
Post: #14
^^ That's too bad. I haven't experienced this yet, as I haven't run a mass scrape on my side.
olympia Offline
Team-XBMC Member
Posts: 2,381
Joined: May 2008
Reputation: 30
Post: #15
(2012-06-10 10:58)Zippy79 Wrote:  I have found another problem and I've done a little digging around and think I know the cause. The track duration of the 6th track on this album is downloaded by the scraper as 08:45. Whereas the MusicBrainz page actually lists the duration as 00:53. The problem seems to be that the scraper retrieves the track duration in milliseconds and then takes the first 3 digits and assumes that they equate to seconds. Using this track as an example, the time in milliseconds retrieved by the scraper is 52506 (which equals 00:53). It then takes the first 3 digits, 525, and wrongly treats them as whole seconds. Which gives the result 525 / 60 = 8.75, or 08:45.

Fixed in v1.0.2
Credits to scudlee (see changelog), cheers!
Zippy79 Offline
Junior Member
Posts: 6
Joined: Jun 2012
Reputation: 0
Post: #16
Thanks olympia and scudlee! :)
saladasalad Offline
Junior Member
Posts: 6
Joined: Jun 2012
Reputation: 0
Post: #17
Thanks for your work on this, Olympia. Much appreciated!
brettawesome Offline
Junior Member
Posts: 17
Joined: Mar 2012
Reputation: 0
Post: #18
Hi, awesome work here. How do you enable the scraper to fetch album reviews from allmusic? I haven't really got a clue about coding, so I don't know which part of settings.xml I need to change.
olympia Offline
Team-XBMC Member
Posts: 2,381
Joined: May 2008
Reputation: 30
Post: #19
Did you look at settings.xml? It's not coding at all, it only involves some common sense and logic.

You see this for albumreviewsource:
Code:
values="last.fm|None" id="albumreviewsource"

Then you see this for example for albumratingsource:
Code:
values="MusicBrainz|allmusic.com|None" id="albumratingsource"

I suspect the riddle isn't very difficult to solve, is it? :)
brettawesome Offline
Junior Member
Posts: 17
Joined: Mar 2012
Reputation: 0
Post: #20
I tried that before I asked the first time, and the reviews weren't getting scraped. Fair enough, I'll do it manually.
(This post was last modified: 2012-06-18 12:11 by brettawesome.)