Improved allmusic.com scraper (plus a few questions)

Roborob Offline
Senior Member
Posts: 143
Joined: Jan 2009
Reputation: 0
Location: The Netherlands
Post: #46
talisto Wrote:Are you changing the scraper settings from the library with the context menu (pressing "C" from the music library screen), or are you changing the settings from (Main Menu) > Settings > Music > Library > Scraper Settings? I've noticed that when you change the settings from the context menu, it only remembers the settings for one lookup and then reverts back to your previous settings. Whether this is "by design" or a bug, I'm not sure. Going through the full settings menu should work fine though.

Thanks, I did it in the context menu. I'll change them in the main menu instead.
azido Offline
Posting Freak
Posts: 1,880
Joined: Nov 2008
Reputation: 1
Location: Stuttgart, Germany
Post: #47
this one sounds very promising, cheers for that.

one (personal) question:

as you modified an existing scraper with good results, any chance you could do me a favor and use some of your time to add the functionality to look up artist thumbs on my site?

i started a resource for specially prepared artist thumbs for use in aeon (showmix), and so far 777 thumbs are present, with a bunch of users willing to add more in the future. we also started collecting fanart, but that's maybe another topic.

the gallery is organised in a pretty basic way: thumbs are categorised in folders by artist name, so it should be pretty easy for people who have the skills to write a lookup to get them downloaded. every artist thumb has a thumbnail and a full picture. unfortunately there is no api that can be used, so it would be simple html scraping; but as there is an easy structure and we don't hold additional info in general, once again it should be easy to get them scraped. we also have a search feature that returns matches for given keywords across the whole gallery (thumbs AND fanart).

i would be glad if you'd consider trying that.

cheers, azido ;)
fnwc Offline
Member
Posts: 99
Joined: Sep 2009
Reputation: 1
Post: #48
talisto Wrote:3) The "Get Fanart" is a separate setting, so as long as you keep that enabled, the scraper will still fetch fanart from htbackdrops regardless of what you set the other settings to. But FYI, I'd recommend keeping "Get artist thumbs from HTBackdrops" enabled AS WELL as getting artist thumbs from another source, because if it can't find a thumb from one source, it will automatically look to the other source. I've actually given HTBackdrops a higher priority so that it will look there first, as all the artist thumbs on that site are optimized for XBMC. But they just don't have a very large database, so it's best to have another source to fall back on.

Do you have prioritization built in for the artist thumbs and album thumbs such that we can just enable everything in the options but it will still try to get art in your preferred order?
talisto Offline
Junior Member
Posts: 40
Joined: Jun 2005
Reputation: 0
Post: #49
Hey Azido,

azido Wrote:as you modified an existing scraper with good results, any chance you could do me a favor and use some of your time to add the functionality to look up artist thumbs on my site?

Well, I'm a bit reluctant to do any more work on this scraper until my existing changes have been added to SVN (*if* they're ever going to be added; it's been over 3 weeks since I submitted the patch, and I'm wondering if I've already added more options than the dev team would prefer). But I'm also reluctant to work on features for a specific mod of a specific skin when htbackdrops is already a fairly competent resource for general fanart; they have almost 4000 backdrops now. What is it about showmix that requires skin-specific fanart/thumbs?

Aside from that, your site would need (or should have) some tweaks before it would be scraper-friendly; artist thumbs and fanart are handled separately in XBMC so your search should have the option of returning results from one or the other. You really should consider having some sort of basic API as well; scraping HTML is somewhat sketchy, as slight changes in the design of the website can often easily break the scraper.
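To make the "one or the other" point concrete, here is a minimal Python sketch of a search that takes the kind of image as a parameter and returns machine-readable JSON. Everything in it is an assumption for illustration: the `GALLERY` records, the `search` function, and the `kind` values are hypothetical, and a real site would query its own database instead.

```python
import json

# Hypothetical gallery records; a real site would read these from its database.
GALLERY = [
    {"artist": "Example Artist", "kind": "thumb", "url": "/thumbs/example-artist.jpg"},
    {"artist": "Example Artist", "kind": "fanart", "url": "/fanart/example-artist.jpg"},
    {"artist": "Another Band", "kind": "thumb", "url": "/thumbs/another-band.jpg"},
]


def search(keyword: str, kind: str) -> str:
    """Return matches of a single kind ('thumb' or 'fanart') as a JSON string."""
    hits = [
        item for item in GALLERY
        if item["kind"] == kind and keyword.lower() in item["artist"].lower()
    ]
    return json.dumps(hits)


# A scraper asking only for artist thumbs never sees the fanart entries.
print(search("example", "thumb"))
```

Returning structured data like this is what makes an API sturdier than HTML scraping: the page design can change without changing the fields a scraper reads.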

If/when my other changes get added to the SVN, I'll consider making you a scraper that you could at least offer as a download on your site; I doubt those options would make it into the official XBMC SVN.
talisto Offline
Junior Member
Posts: 40
Joined: Jun 2005
Reputation: 0
Post: #50
fnwc Wrote:Do you have prioritization built in for the artist thumbs and album thumbs such that we can just enable everything in the options but it will still try to get art in your preferred order?

Mostly. Since the script is first and foremost an Allmusic scraper, I opted to give Allmusic the top priority for both if they are enabled, but I'd recommend turning the allmusic thumb options off to give priority to the others. If everything is enabled, the priority order for artist thumbs is (highest priority to lowest): Allmusic, HTBackdrops, Last.fm, Discogs. The priority order for album thumbs is: Allmusic, Last.fm, Discogs. There's currently no way to re-order the priority as an option.
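As an illustration only, the fallback behaves roughly like the Python sketch below. This is not the scraper's actual code; `first_thumb`, the `enabled` settings dictionary, and the lookup callables are hypothetical names, and the example URLs are made up. Only the two priority lists come from the description above.

```python
from typing import Callable, Dict, List, Optional

# Priority orders as described in the post (highest to lowest).
ARTIST_THUMB_PRIORITY = ["Allmusic", "HTBackdrops", "Last.fm", "Discogs"]
ALBUM_THUMB_PRIORITY = ["Allmusic", "Last.fm", "Discogs"]

# Hypothetical type: a lookup takes a name and returns a thumb URL or None.
Lookup = Callable[[str], Optional[str]]


def first_thumb(name: str, priority: List[str], enabled: Dict[str, bool],
                lookups: Dict[str, Lookup]) -> Optional[str]:
    """Return the first thumb found, trying enabled sources in priority order."""
    for source in priority:
        if not enabled.get(source, False):
            continue  # source is switched off in the settings
        url = lookups[source](name)
        if url:
            return url  # stop at the highest-priority source that has a thumb
    return None


# Stand-in lookups: Allmusic and HTBackdrops find nothing, the others do.
lookups = {
    "Allmusic": lambda name: None,
    "HTBackdrops": lambda name: None,
    "Last.fm": lambda name: "http://example.invalid/lastfm/" + name + ".jpg",
    "Discogs": lambda name: "http://example.invalid/discogs/" + name + ".jpg",
}
# Allmusic thumbs turned off, as recommended, so the other sources get priority.
enabled = {"Allmusic": False, "HTBackdrops": True, "Last.fm": True, "Discogs": True}
print(first_thumb("some-artist", ARTIST_THUMB_PRIORITY, enabled, lookups))
```

With these stand-ins, HTBackdrops is tried first and comes up empty, so the Last.fm result is used and Discogs is never queried.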
(This post was last modified: 2009-10-26 03:54 by talisto.)
azido Offline
Posting Freak
Posts: 1,880
Joined: Nov 2008
Reputation: 1
Location: Stuttgart, Germany
Post: #51
talisto Wrote:Hey Azido,

What is it about showmix that requires skin-specific fanart/thumbs?
If/when my other changes get added to the SVN, I'll consider making you a scraper that you could at least offer as a download on your site; I doubt those options would make it into the official XBMC SVN.

thanks for your statement.
the fanart is just an addition, as we already have a place to store images. so the scraper ideally should target the artist thumbs only. they are not needed in showmix, but are prepared to look good in it. go have a look at my site and you'll see what i mean ;)

yeah, that was my goal. as you seem to have the skills to write a scraper (while i'm a total noob at that), i would like to offer showmix users the possibility to get the artist thumbs from my site without downloading them manually.
the way of storing them will not change, so queries should always work the same (until i know how to create an api for that. i have skills in creating websites in php and a database, but not for building an api yet).

for sure that is nothing to have in svn for the masses (although those thumbs will surely look good in other skins, too), so whenever you have the time and the will to do me a favour, that would be great.

cheers, azido ;)
redtapemedia Offline
UMM Project
Posts: 551
Joined: Mar 2009
Post: #52
azido Wrote:the way of storing them will not change, so queries should always work the same (until i know how to create an api for that. i have skills in creating websites in php and a database, but not for building an api yet).

All a scraper *really* requires is easily acquired "hooks" within the HTML/XML: a way to identify exactly where the fields are on the page.
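A rough Python sketch of what such a hook could look like follows. The markup is an assumption: it supposes every artist thumb is an `<img>` carrying `class="artist-thumb"` before its `src` attribute, which may not match the real gallery. `THUMB_HOOK`, `find_thumbs`, and the sample page are hypothetical.

```python
import re
from typing import List

# Hypothetical hook: thumbs are <img> tags with class="artist-thumb",
# and the class attribute appears before src.
THUMB_HOOK = re.compile(r'<img[^>]*class="artist-thumb"[^>]*src="([^"]+)"')


def find_thumbs(html: str) -> List[str]:
    """Return the src of every image marked with the artist-thumb hook."""
    return THUMB_HOOK.findall(html)


# Made-up page fragment: one thumb, one fanart image that should be ignored.
sample_page = """
<div class="gallery">
  <img class="artist-thumb" src="/thumbs/example-artist.jpg">
  <img class="fanart" src="/fanart/example-artist.jpg">
</div>
"""
print(find_thumbs(sample_page))
```

The point is that a stable, unique marker (a class name, an id, a predictable URL pattern) lets the pattern keep working even if the rest of the page layout changes.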

But if you're interested, the TVDB has some information on their API: http://thetvdb.com/wiki/index.php?title=Programmers_API

I think I recall seeing the actual source for the TVDB website somewhere too; I'm pretty sure it's freely available. You could always modify it to suit.

( yep it's here: http://sourceforge.net/projects/tvdb/ )
chumaj001 Offline
Junior Member
Posts: 44
Joined: Oct 2009
Reputation: 0
Location: Czech Republic
Post: #53
Hello guys,
I have had a problem with scraping lately. I am running the latest SVN build (24059) on Windows 7 x64 and I can scrape almost no info for music. Discogs, Freebase and Last.fm are not working, and Allmusic, even with the updated script from here, downloads almost no artist thumbs at all, not to mention any fanart. Since I have a huge library (1200+ artists) of mostly underground music, it is a huge problem for me.

Another problem with the latest build is that when I try to scrape every artist, XBMC crashes.

Where is the problem? In the build? In the scrapers? Or did Discogs change its layout?

What would be the best automated way to collect artist thumbs and fanart outside of XBMC? I tried media info, but it is also buggy and gives me very strange results.
spiff Offline
Retired Developer
Posts: 12,386
Joined: Nov 2003
Post: #54
nobody has yet sponsored us with crystal balls - it would be most welcome, because every day there are at least 10 guys like you who think we can give you an answer based on absolutely no information at all ;)

see my signature
chumaj001 Offline
Junior Member
Posts: 44
Joined: Oct 2009
Reputation: 0
Location: Czech Republic
Post: #55
Sorry, but this question was aimed at talisto and others in this thread who are trying to tweak scrapers. Since the scrapers stopped working this week, I am asking if they know what happened.
SleepyP Offline
Posting Freak
Posts: 2,276
Joined: Nov 2005
Reputation: 4
Location: Portland, Oregon
Post: #56
I already posted some bug reports on Trac about the scraper issues.

Catchy Signature Here
Ronald Pagan Offline
Junior Member
Posts: 42
Joined: Jul 2009
Reputation: 0
Post: #57
chumaj001 Wrote:Hello guys,
I have had a problem with scraping lately. I am running the latest SVN build (24059) on Windows 7 x64 and I can scrape almost no info for music. Discogs, Freebase and Last.fm are not working, and Allmusic, even with the updated script from here, downloads almost no artist thumbs at all, not to mention any fanart. Since I have a huge library (1200+ artists) of mostly underground music, it is a huge problem for me.

Another problem with the latest build is that when I try to scrape every artist, XBMC crashes.

Where is the problem? In the build? In the scrapers? Or did Discogs change its layout?

What would be the best automated way to collect artist thumbs and fanart outside of XBMC? I tried media info, but it is also buggy and gives me very strange results.

I am experiencing the exact same scraper problems, except I am using XBMC Live. Have all of the music portals changed their access?
chumaj001 Offline
Junior Member
Posts: 44
Joined: Oct 2009
Reputation: 0
Location: Czech Republic
Post: #58
Also, it seems that freebase.org is down or gone. The webpage is empty without any explanation. It just seems strange that (in my opinion) such an important part of XBMC stops working and there is no mention of it on the forums.
talisto Offline
Junior Member
Posts: 40
Joined: Jun 2005
Reputation: 0
Post: #59
chumaj001 Wrote:Sorry, but this question was aimed at talisto and others in this thread who are trying to tweak scrapers. Since the scrapers stopped working this week, I am asking if they know what happened.

What spiff was hinting at was that when you're having any sort of problem like this, you really should collect a debug log so that we can actually see what your system is doing. Read his sig for the link on how to submit a bug report.

Discogs support does seem to be broken at the moment; it seems they're now requiring the client to request compressed data, and it looks like XBMC's curl library isn't doing that by default. (edit: actually, my scraper is set to ask for compression for Discogs, so I'm not sure why it isn't working. I'll see what I can debug.) Anyhow, the other scrapers (including mine, minus the Discogs scraping) should still be working fine; I tested mine as well as the last.fm scraper on a fresh SVN build, and both worked great.
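For anyone unsure what "requesting compressed data" means in practice, here is a small Python sketch. It is not how XBMC or its curl library does it; `build_request` and `decode_body` are hypothetical helpers, and the URL is a placeholder. The client sends an `Accept-Encoding: gzip` header and then decompresses the body if the server says it used gzip.

```python
import gzip
import urllib.request


def build_request(url: str) -> urllib.request.Request:
    """Build a request that tells the server gzip-compressed data is accepted."""
    return urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})


def decode_body(raw: bytes, content_encoding: str) -> str:
    """Decompress the body if the Content-Encoding header says gzip."""
    if content_encoding.lower() == "gzip":
        raw = gzip.decompress(raw)
    return raw.decode("utf-8")


# Offline demonstration: simulate a gzip-compressed response body.
request = build_request("http://example.invalid/release/1")
compressed = gzip.compress(b"<resp>ok</resp>")
print(decode_body(compressed, "gzip"))
```

In a real client, `raw` would come from `urllib.request.urlopen(request).read()` and `content_encoding` from the response's `Content-Encoding` header.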

Just a guess, but do you have the "Download additional info on library updates" option disabled in Settings > Music > Library, which was recently added to XBMC? That needs to be turned on for the music scrapers to work. I think it's off by default.
(This post was last modified: 2009-10-28 05:44 by talisto.)
talisto Offline
Junior Member
Posts: 40
Joined: Jun 2005
Reputation: 0
Post: #60
Ok, so it seems that Discogs is checking the user agent of the request, and is rejecting XBMC's user agent. It seems that they're specifically targeting XBMC, since if I enter a random user agent, it works fine, yet any user agent starting with "XBMC" is rejected. This is fairly troubling since Discogs is the default scraper for XBMC, so probably 99% of the userbase now has a failing music scraper.
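The test described above can be sketched in Python as follows. This runs against a stand-in function, not against Discogs: `probe_user_agents` and `fake_server` are hypothetical, the agent strings are made up, and the 403 status code is an assumption about how a rejection might look.

```python
from typing import Callable, Dict, Iterable


def probe_user_agents(fetch_status: Callable[[str], int],
                      agents: Iterable[str]) -> Dict[str, bool]:
    """Map each user agent to True if the server answered with HTTP 200."""
    return {agent: fetch_status(agent) == 200 for agent in agents}


def fake_server(agent: str) -> int:
    # Stand-in for the observed behaviour: anything starting with "XBMC"
    # is rejected. The 403 status is an assumption for illustration.
    return 403 if agent.startswith("XBMC") else 200


results = probe_user_agents(fake_server, ["XBMC/test", "random-agent-123"])
for agent, accepted in results.items():
    print(agent, "accepted" if accepted else "rejected")
```

To try it against a real server, `fetch_status` would be replaced by a function that sends the request with the given `User-Agent` header and returns the HTTP status code.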

Clearly they don't want XBMC scraping their pages anymore. So I suppose the question now is, do we start spoofing a browser's user agent to get around the problem, or do we merely skip Discogs and move to other sources allowing more legitimate scraping (e.g. the last.fm API)?

(edit: the really unfortunate thing is that they're even blocking XBMC's usage of the API, so we can't even use that as an alternative to the HTML scraping.)
(This post was last modified: 2009-10-28 07:14 by talisto.)