Can't you just reverse the logic in the scraper? Instead of taking the first 3 digits, capture everything but the last 3 digits?
i.e. rather than <length>(\d{3})\d*?</length> have <length>(\d*?)\d{3}</length>
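For illustration, the two patterns can be compared outside the scraper. This is a minimal sketch in Python rather than the scraper's own XML, and it assumes Python's `re` treats these patterns the same way the scraper's regex engine does. The sample value 52506 is the duration quoted later in the thread, and the helper name `capture` is hypothetical.

```python
import re

# Sample element modelled on the duration quoted later in the thread (52506 ms).
SAMPLE = "<length>52506</length>"

# Original pattern: captures the first three digits.
FIRST_THREE = re.compile(r"<length>(\d{3})\d*?</length>")
# Suggested pattern: captures everything except the last three digits.
ALL_BUT_LAST_THREE = re.compile(r"<length>(\d*?)\d{3}</length>")


def capture(pattern, text):
    """Return the first capture group, or None when the pattern does not match."""
    match = pattern.search(text)
    return match.group(1) if match else None


old_value = capture(FIRST_THREE, SAMPLE)         # "525", later read as seconds
new_value = capture(ALL_BUT_LAST_THREE, SAMPLE)  # "52", whole seconds
print(old_value, new_value)
```

Dropping the last three digits truncates rather than rounds, so this sample gives 52 seconds. For a value of exactly three digits the capture is an empty string, which a caller would have to treat as zero seconds.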
[Release] Universal Scraper for Music Albums
scudlee
Team-XBMC Member Posts: 584 Joined: Jul 2011 Reputation: 45 |
2012-06-10 15:30
Post: #11
olympia
Team-XBMC Member Joined: May 2008 Reputation: 30 |
2012-06-10 15:41
Post: #12
Whoops, where did I lose my head today?
Hell yeah, this will surely do it. I will fix this when I get home. Thanks for refreshing my mind.
ASUS P5N7A-VM - Intel C2D E7300 - 2GB RAM - Silverstone GD02 Black - MS MCE Remote - Patriot Warp 32GB SSD Drive
night199uk
Team-XBMC Member Posts: 27 Joined: May 2009 Reputation: 0 |
2012-06-10 16:24
Post: #13
Hey olympia.
This seems to overrun the MusicBrainz queries-per-second limit for me on a semi-regular basis. The problem is that once the QPS limit kicks in, MusicBrainz starts serving a really simple, fast reject page, which means the queries go even faster and the rate limit just stays in force. So I have to stop scanning and restart. Unfortunately, I think the real solution is a per-domain rate limit, and there is nothing we can do in the scraper. :-(
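A per-domain rate limit of the kind suggested here could look roughly like the sketch below. It is not XBMC code: the class name `DomainRateLimiter`, its `wait` method, and the `min_interval` parameter are hypothetical, and the one-second default is a placeholder rather than a figure from this thread.

```python
import time
from urllib.parse import urlsplit


class DomainRateLimiter:
    """Enforce a minimum interval between requests to the same host."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between requests per host (placeholder)
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = {}  # host -> earliest time the next request may start

    def wait(self, url):
        """Block until a request to the URL's host is allowed; return the delay used."""
        host = urlsplit(url).hostname or ""
        now = self._clock()
        start = max(now, self._next_allowed.get(host, now))
        delay = start - now
        if delay > 0:
            self._sleep(delay)
        # Reserve the next slot for this host only; other hosts are unaffected.
        self._next_allowed[host] = start + self.min_interval
        return delay
```

A caller would invoke `wait(url)` before every HTTP request. Because the delay is applied regardless of how quickly the server answered, a fast reject page would no longer speed up the query rate.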
olympia
Team-XBMC Member Joined: May 2008 Reputation: 30 |
2012-06-10 22:18
Post: #14
^^ That's too bad. I haven't experienced this yet, as I haven't run a mass scrape on my side.
olympia
Team-XBMC Member Joined: May 2008 Reputation: 30 |
2012-06-10 22:19
Post: #15
(2012-06-10 10:58)Zippy79 Wrote: I have found another problem and, after a little digging around, I think I know the cause. The track duration of the 6th track on this album is downloaded by the scraper as 08:45, whereas the MusicBrainz page actually lists the duration as 00:53. The problem seems to be that the scraper retrieves the track duration in milliseconds, then takes the first 3 digits and assumes that they equate to seconds. Using this track as an example, the time in milliseconds retrieved by the scraper is 52506 (which equals 00:53). It then takes the first 3 digits, 525, and wrongly treats them as whole seconds, which gives 525 / 60 = 8.75 minutes, or 08:45.
Fixed in v1.0.2. Credits to scudlee (see changelog), cheers!
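The arithmetic in the quoted report can be checked with a few lines of Python. This is only a sketch of the calculation, not the scraper's code. The function names are hypothetical, and rounding to the nearest second is an assumption made so that 52506 ms comes out as the 00:53 shown on the MusicBrainz page.

```python
def format_seconds(total_seconds):
    """Format a whole number of seconds as MM:SS."""
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes:02d}:{seconds:02d}"


def format_milliseconds(milliseconds):
    """Round to the nearest whole second (assumed), then format as MM:SS."""
    return format_seconds((milliseconds + 500) // 1000)


duration_ms = 52506

# The reported bug: the first three digits (525) are treated as seconds.
buggy = format_seconds(int(str(duration_ms)[:3]))
# The intended reading: the whole value is a count of milliseconds.
correct = format_milliseconds(duration_ms)

print(buggy, correct)  # 08:45 00:53
```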
Zippy79
Junior Member Posts: 6 Joined: Jun 2012 Reputation: 0 |
2012-06-11 08:16
Post: #16
Thanks olympia and scudlee!
saladasalad
Junior Member Posts: 6 Joined: Jun 2012 Reputation: 0 |
2012-06-12 05:42
Post: #17
Thanks for your work on this, Olympia. Much appreciated!
brettawesome
Junior Member Posts: 17 Joined: Mar 2012 Reputation: 0 |
2012-06-15 07:11
Post: #18
Hi, awesome work here. How do you enable the scraper to fetch album reviews from allmusic? I haven't really got a clue about coding, so I don't know what part of settings.xml I need to change.
olympia
Team-XBMC Member Joined: May 2008 Reputation: 30 |
2012-06-15 10:09
Post: #19
Did you look at settings.xml? It's not coding at all; it only involves some common sense and logic.
You see this for albumreviewsource:
Code:
values="last.fm|None" id="albumreviewsource"
Then you see this, for example, for albumratingsource:
Code:
values="MusicBrainz|allmusic.com|None" id="albumratingsource"
I suspect the riddle is not very difficult to guess, is it?
brettawesome
Junior Member Posts: 17 Joined: Mar 2012 Reputation: 0 |
2012-06-18 12:09
Post: #20
I tried that before I asked the first time, and the reviews weren't getting scraped. Fair enough, I'll do it manually.
(This post was last modified: 2012-06-18 12:11 by brettawesome.)