Improved allmusic.com scraper (plus a few questions)

azido Offline
Posting Freak
Posts: 1,881
Joined: Nov 2008
Reputation: 1
Location: Stuttgart, Germany
Post: #51
talisto Wrote:Hey Azido,

What is it about showmix that requires skin-specific fanart/thumbs?
If/when my other changes get added to the SVN, I'll consider making you a scraper that you could at least offer as a download on your site; I doubt those options would make it into the official XBMC SVN.

Thanks for your reply.
The fanart is just an extra, since we already have a place to store images, so ideally the scraper should target only the artist thumbs. They are not required by Showmix, but they are prepared to look good in it. Have a look at my site and you'll see what I mean ;)

Yes, that was my goal. Since you seem to have the skills to write a scraper (while I'm a total noob at that), I would like to offer Showmix users a way to get the artist thumbs from my site without downloading them manually.
The way they are stored will not change, so queries should always work the same (at least until I learn how to create an API for that; I have experience building websites with PHP and a database, but not with building an API yet).

This is certainly not something to put in SVN for the masses (although those thumbs will surely look good in other skins, too), so whenever you have the time and the will to do me a favour, that would be great.

cheers, azido ;)

-=[ NOTE: The official Aeon Showmix Project is dead due to a hack of the website ]=-
But some cool guys keep coding stuff to it and made it dharma-compatible, see here:
http://forum.xbmc.org/showthread.php?tid=82899
redtapemedia Offline
UMM Project
Posts: 544
Joined: Mar 2009
Post: #52
azido Wrote:The way they are stored will not change, so queries should always work the same (at least until I learn how to create an API for that; I have experience building websites with PHP and a database, but not with building an API yet).

All a scraper *really* requires is easily acquired "hooks" within the HTML/XML: a way to easily identify exactly where the fields are on the page.
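
As a rough illustration of what such "hooks" can look like, here is a minimal Python sketch that pulls two fields out of a page with regular expressions keyed on stable markup. The HTML fragment, the class names and the `extract_fields` helper are all made up for this example; they are not taken from any real site or from XBMC's own scraper format.

```python
import re

# Hypothetical page fragment; a real site will use different markup.
SAMPLE_HTML = """
<div class="artist">
  <h1 class="artist-name">Example Artist</h1>
  <img class="artist-thumb" src="http://example.com/thumbs/example-artist.jpg">
</div>
"""

# Each "hook" is a pattern anchored on markup that is unlikely to change.
HOOKS = {
    "name": re.compile(r'<h1 class="artist-name">([^<]+)</h1>'),
    "thumb": re.compile(r'<img class="artist-thumb" src="([^"]+)"'),
}


def extract_fields(html):
    """Return field name -> first match, or None where a hook is missing."""
    fields = {}
    for key, pattern in HOOKS.items():
        match = pattern.search(html)
        fields[key] = match.group(1).strip() if match else None
    return fields


print(extract_fields(SAMPLE_HTML))
```

The more distinctive and stable the surrounding markup (a unique class or id), the less likely a hook is to break when the page layout changes.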

But if you're interested, the TVDB has some information on its API: http://thetvdb.com/wiki/index.php?title=Programmers_API

I think I recall seeing the actual source for the TVDB website too; I'm pretty sure it's freely available somewhere. You could always modify it to suit.

( yep it's here: http://sourceforge.net/projects/tvdb/ )
chumaj001 Offline
Junior Member
Posts: 44
Joined: Oct 2009
Reputation: 0
Location: Czech Republic
Post: #53
Hello guys,
I have had a problem with scraping lately. I am running the latest SVN build (24059) on Windows 7 x64 and I can scrape almost no info for music. Discogs, Freebase and Last.fm are not working, and Allmusic, even with the updated script from here, downloads almost no artist thumbs at all, let alone any fanart. Since I have a huge library (1200+ artists) of mostly underground music, this is a huge problem for me.

Another problem with the latest build is that when I try to scrape every artist, XBMC crashes.

Where is the problem? In the build? In the scrapers? Or did Discogs change its layout?

What would be the best automated way to collect artist thumbs and fanart outside of XBMC? I tried media info, but it is also buggy and gives me very strange results.
spiff Offline
Grumpy Bastard Developer
Posts: 12,187
Joined: Nov 2003
Reputation: 82
Post: #54
nobody has yet sponsored us with crystal balls - it would be most welcome, 'cause every day there's at least 10 guys like you who think we can give you an answer based on absolutely no information at all ;)

see my signature

Always read the XBMC online-manual, FAQ and search the forum before posting.
Do not e-mail XBMC-Team members directly asking for support. Read/follow the forum rules.
For troubleshooting and bug reporting please make sure you read this first.
chumaj001 Offline
Junior Member
Posts: 44
Joined: Oct 2009
Reputation: 0
Location: Czech Republic
Post: #55
Sorry, but this question was aimed at talisto and the others in this thread who are trying to tweak scrapers. Since the scrapers stopped working this week, I am asking if they know what happened.
SleepyP Offline
Posting Freak
Posts: 2,282
Joined: Nov 2005
Reputation: 4
Location: Portland, Oregon
Post: #56
I already posted some bug reports on Trac about the scraper issues.

Catchy Signature Here
Ronald Pagan Offline
Junior Member
Posts: 40
Joined: Jul 2009
Reputation: 0
Post: #57
chumaj001 Wrote:Hello guys,
I have had a problem with scraping lately. I am running the latest SVN build (24059) on Windows 7 x64 and I can scrape almost no info for music. Discogs, Freebase and Last.fm are not working, and Allmusic, even with the updated script from here, downloads almost no artist thumbs at all, let alone any fanart. Since I have a huge library (1200+ artists) of mostly underground music, this is a huge problem for me.

Another problem with the latest build is that when I try to scrape every artist, XBMC crashes.

Where is the problem? In the build? In the scrapers? Or did Discogs change its layout?

What would be the best automated way to collect artist thumbs and fanart outside of XBMC? I tried media info, but it is also buggy and gives me very strange results.

I am experiencing the exact same scraper problems, except that I am using XBMC Live. Have all of the music portals changed their access?
chumaj001 Offline
Junior Member
Posts: 44
Joined: Oct 2009
Reputation: 0
Location: Czech Republic
Post: #58
Also, it seems that freebase.org is down or gone. The webpage is empty, without any explanation. It just seems strange that (in my opinion) such an important part of XBMC stops working and there is no mention of it on the forums.
talisto Offline
Junior Member
Posts: 40
Joined: Jun 2005
Reputation: 0
Post: #59
chumaj001 Wrote:Sorry, but this question was aimed at talisto and the others in this thread who are trying to tweak scrapers. Since the scrapers stopped working this week, I am asking if they know what happened.

What spiff was hinting at was that when you're having any sort of problem like this, you really should collect a debug log so that we can actually see what your system is doing. Read his sig for the link on how to submit a bug report.

Discogs support does seem to be broken at the moment; it seems they're now requiring the client to request compressed data, and it looks like XBMC's curl library isn't doing that by default. (edit: actually, my scraper is set to ask for compression for Discogs, so I'm not sure why it isn't working. I'll see what I can debug.) Anyhow, the other scrapers (including mine, minus the Discogs scraping) should still be working fine; I tested mine as well as the last.fm scraper on a fresh SVN build, and both worked great.
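
For anyone unsure what "requesting compressed data" means at the HTTP level, here is a minimal Python sketch using only the standard library. It is not XBMC's curl code and not the scraper's actual configuration; the helper names are hypothetical. The client sends an `Accept-Encoding: gzip` header and decompresses the body only if the server's `Content-Encoding` says it is gzip.

```python
import gzip
import urllib.request


def build_gzip_request(url):
    """Build a request that tells the server a gzip-compressed body is acceptable."""
    return urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})


def decode_body(raw, content_encoding):
    """Decompress the body only when the server reports gzip encoding."""
    if content_encoding and "gzip" in content_encoding.lower():
        raw = gzip.decompress(raw)
    return raw.decode("utf-8", errors="replace")


def fetch(url):
    """Send the request and return the decoded page (needs network access)."""
    with urllib.request.urlopen(build_gzip_request(url), timeout=10) as resp:
        return decode_body(resp.read(), resp.headers.get("Content-Encoding"))
```

If a server insists on compression, a client that never sends the header (or never decompresses the reply) will see an error or an unreadable body, which would look like a "broken" scraper from the outside.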

Just a guess, but do you have the "Download additional info on library updates" option disabled in Settings > Music > Library, which was recently added to XBMC? That needs to be turned on for the music scrapers to work. I think it's off by default.
(This post was last modified: 2009-10-28 05:44 by talisto.)
talisto Offline
Junior Member
Posts: 40
Joined: Jun 2005
Reputation: 0
Post: #60
Ok, so it seems that Discogs is checking the user agent of the request, and is rejecting XBMC's user agent. It seems that they're specifically targeting XBMC, since if I enter a random user agent, it works fine, yet any user agent starting with "XBMC" is rejected. This is fairly troubling since Discogs is the default scraper for XBMC, so probably 99% of the userbase now has a failing music scraper.

Clearly they don't want XBMC scraping their pages anymore. So I suppose the question now is, do we start spoofing a browser's user agent to get around the problem, or do we merely skip Discogs and move to other sources allowing more legitimate scraping (e.g. the last.fm API)?

(edit: the really unfortunate thing is that they're even blocking XBMC's usage of the API, so we can't even use that as an alternative to the HTML scraping.)
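
The user-agent check described above can be reproduced with a small diagnostic along these lines. This is only a sketch: the helper names are hypothetical, and the URL and user-agent strings you pass in are your own choice, not values confirmed anywhere in this thread. It requests the same URL once per user agent and records the HTTP status code for each.

```python
import urllib.error
import urllib.request


def build_request(url, user_agent):
    """Build a request that identifies itself with the given User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def status_for(url, user_agent):
    """Return the HTTP status code for one request (needs network access)."""
    try:
        with urllib.request.urlopen(build_request(url, user_agent), timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Rejections such as 403 arrive as HTTPError; report the code instead.
        return err.code


def compare_user_agents(url, user_agents, fetch_status=status_for):
    """Map each user agent to the status code the server returned for it."""
    return {ua: fetch_status(url, ua) for ua in user_agents}
```

If every user agent starting with "XBMC" comes back with an error status while arbitrary other strings succeed, that supports the theory that the block is keyed on the user agent rather than on compression.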
(This post was last modified: 2009-10-28 07:14 by talisto.)