Kodi Community Forum
Scraping music data - Printable Version




Scraping music data - DaveBlake - 2015-11-18

It is generally agreed that scraping music data from both online sources and local NFO files, in addition to the tag data in the music files themselves, needs some attention. Some work is already underway, but I would like to have a broad discussion about what users currently do, what they would like to do and why, so as to inform the dev team's future design work.

I am posting here, rather than in Feature Requests or Application Development, so as to reach the widest audience (I hope).

In my view there are 3 levels at which music data comes into Kodi, each building on the one before, but there are also reasons for stopping at each. It starts when you add a music source:

a) view it as files seeing all file properties, that means tag data if the file is tagged.
b) scan that tag data into a library (giving artists, albums, search facilities, smart playlists etc., flexible music browsing and play selection)
c) scrape extra data for artists and albums from external sources or local NFO files - fan art, biogs, ratings etc.

Here I would like to focus on c).

In the future some kind of music recognition lookup could be available to enable a dream automatic and accurate music identification, but for now I think we should look at how to make the best of external databases, looking up using the artist names etc. provided in the tag data. There is also the simpler concept of providing an easy way for users to load their own images, text etc. in bulk.

The data is out there, and we want Kodi to enrich the user's music with images and information. But we also want to target the acquisition of that data, not repeatedly look in the wrong place for that genre of music etc. Speed of acquisition is also an issue, especially for large music collections, so initial scraping needs to be asynchronous, running in the background so the user can get on and play some music in the meantime.
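To illustrate the idea of scraping in the background while the user carries on, here is a minimal sketch in Python. It is not how Kodi implements anything; `scrape_artist` is a hypothetical stand-in for a slow online lookup, and the names are invented.

```python
from concurrent.futures import ThreadPoolExecutor


def scrape_artist(name):
    """Hypothetical placeholder for a slow online lookup of one artist."""
    return {"artist": name, "biography": f"(biography for {name})"}


def start_background_scrape(artist_names, max_workers=4):
    """Queue one lookup per artist and return immediately.

    The caller gets a dict of artist name -> Future and can collect
    results later, once each lookup has finished.
    """
    executor = ThreadPoolExecutor(max_workers=max_workers)
    futures = {name: executor.submit(scrape_artist, name) for name in artist_names}
    # Do not block the caller; already-queued lookups still run to completion.
    executor.shutdown(wait=False)
    return futures


futures = start_background_scrape(["Artist A", "Artist B"])
# ... the user could be browsing and playing music here ...
print(futures["Artist A"].result(timeout=5)["biography"])
```

The point of the design is only that the scan returns control straight away and results are picked up as they arrive.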

But what are we scraping? In video, where files are so far not often tagged, you need to scrape to create a video library. Scraping is based on the source files, and you can easily associate the scraper you want to use with the location of the files, e.g. all TV series in one place, movies in another. You set the content of a source folder/drive and set scraper parameters.

With music we create the library from the tag data, then we scrape the artists and albums (using name and album title), that is the derived data, not the files themselves. No matter how you organise your files you can end up with artists in your library that do not have a source folder - featured artists, composer or conductor; there are many reasons why an album (or song) could have many artists.

While you could trace back through the database from artist to songs to files and hence source folders, there is nothing to ensure that the files all have a single common source folder or even drive. It seems to me that unlike video, we need to set scraper parameters for a collection of artists. Not artist by artist, that would be tedious, but maybe by genre?
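The point about source folders can be checked mechanically. A minimal sketch in Python, assuming a hypothetical mapping from artist names to the file paths of their songs (the artists and paths are invented for illustration, and this is not Kodi's database layout):

```python
import posixpath

# Hypothetical data: artist name -> paths of the songs they appear on.
songs_by_artist = {
    "Artist A": [
        "/music/rock/Artist A/Album 1/01.mp3",
        "/music/rock/Artist A/Album 2/01.mp3",
    ],
    # A featured artist: one track on someone else's album, one on a compilation.
    "Artist B": [
        "/music/rock/Artist A/Album 1/03.mp3",
        "/mnt/nas2/compilations/Hits/07.mp3",
    ],
}


def common_folder(paths):
    """Return the deepest folder shared by all files ('/' if only the root is shared)."""
    folders = [posixpath.dirname(p) for p in paths]
    return posixpath.commonpath(folders)


for artist, paths in songs_by_artist.items():
    print(artist, "->", common_folder(paths))
```

For "Artist A" this gives a usable folder, while for "Artist B" the only shared ancestor is the filesystem root, so there is nowhere sensible to attach per-folder scraper settings.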

Then do we separate scraping artists from scraping albums?

I think there are reasons for doing that, for example language. You may be happy to get the best album art from a US site, but prefer artist biog data in your own language. In my case the artist art is a bonus worth scraping, but I have sufficient album data and cover art in my tags, and since artist+title alone seems to lead to misidentification of my collection I wouldn't bother with scraping albums.

But users out there - do you scrape both artists and albums, just one, neither and why?

And those that use NFO - how, what and why?

I know what we all want is an enriched user experience just to appear by magic, but we need to get there a step at a time.


RE: Scraping music data - zag - 2015-11-18

I am interested in music libraries so c)

My music is tagged correctly with Mp3tag (and more often Picard these days) so I have no problem scraping online sources.

IMO there should be an option to separate feat or contributing artists completely into a new node. It should have nothing to do with my normal Artists view. Things like composer, conductor should also have their own nodes.

Basically I think we need the concept of "primary" artist.


RE: Scraping music data - DaveBlake - 2015-11-18

Zag I think it is a bit of a separate topic, but we have "primary" artist support in albumartists. I agree with the separate nodes idea - (primary/album) artists, (featured/other) artists, composers, DJ etc. - and it is something easy to add.

But back to scraping :)

Primary artist can be multiple - conductor & orchestra, so still no single folder for each of them.

From your external DB knowledge, they provide data based on a name or name & title lookup, is that right?
I presume that you scrape both artist and album, but from the same external DB? Do you ever mix it up, and if so why?


RE: Scraping music data - ronie - 2015-11-18

i'm using foobar2000 to tag my mp3's.
i'll let it fill these id3 tags:
  • Artist Title
  • Album Title
  • Song Title
  • Track Number
  • Date
  • Genre
  • Disc Number

most tagging software will use the CDDB to fetch data for those tags.
afaik CDDB doesn't provide any other info, besides the tags listed above, so i just keep it at that.
i can't be arsed to manually fill in any tags myself. nor am i interested in bloating my mp3's by inserting artwork.

personally i have no use for conductor/performer/composer or artist1 feat. artist2 vs. artist3 stuff as none of that applies to my collection.
i don't have compilation albums either, so i have no opinion on all of that.

i scrape both additional artist & album info, using the universal scrapers.
why? because i like to see all the available eye-candy in kodi.

i can't remember how long scraping took and not really bothered by the speed of it either.
it's a one time process, and in my case, that was years ago.

after the scan was completed, i used the export library option in kodi
to export everything to separate .nfo files. exported thumbs/fanart as well.
the reason for this is quite obvious, if i ever happen to need to rescan my music,
kodi will simply use the local info instead of needing to do the online lookup again.
which speeds things up a whole lot.

in specific cases i manually create/edit .nfo files, if the scraper was unable to find some obscure artist or album.
i'll manually download artwork for those cases as well.


that's about it basically i think...
thanx for your time and effort in improving the kodi music side of things! :-)


RE: Scraping music data - zag - 2015-11-18

(2015-11-18, 14:02)DaveBlake Wrote: From your external DB knowledge, they provide data based on a name or name & title lookup, is that right?
I presume that you scrape both artist and album, but from the same external DB? Do you ever mix it up, and if so why?

Just having a quick look at the logs, most lookups from xbmc on TADB are done from the musicbrainz id and most are correct. A few wrong lookups but not many.

I've never seen artist and album mixed up no, but not quite sure what you mean there.


RE: Scraping music data - DarkHelmet - 2015-11-18

I scrape artists and albums. I use MP3tag for my files. Mostly these tag fields are filled:

artist
album
title
track
year
genre
albumartist
composer (if I have the info)
conductor (for the very few classical albums I have)

Files I download from amazon also have the "amazon.com song id" in the commentary field. So I leave that too. I also embed the album cover.

For artists that are not only part of a compilation I create a folder on my NAS where my files are. I manually download a thumbnail (folder.jpg), fanart and logo and put them in that folder. I do the same for cdart, which I save into the album folder.

I frequently update zag's audiodb with German artist information, album information, upload artwork etc. So thanks zag for the amazing work you did there.

I know many users use the addon cdart manager for managing music artwork.

http://forum.kodi.tv/showthread.php?tid=77031

I don't know how that works. In the German kodi forum it is the recommended tool to use for the music section so I guess many users will use it. I'm a bitch when it comes to my music so I do it all manually.

I also manually changed the artist with a "feat." or a "vs." or something like that to "artist a" / "artist b". Sometimes I moved the feat. artist from the artist field to the song name field "song a (feat. artist b)".
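The two manual retagging conventions described above could be sketched like this in Python. The list of separator words is an assumption, and the function names are made up for illustration; this is not what Mp3tag or Kodi does internally.

```python
import re

# Words treated as joining several artists; this list is an assumption.
_JOINERS = re.compile(r"\s+(?:feat\.|ft\.|featuring|vs\.)\s+", re.IGNORECASE)
_FEATURE = re.compile(r"\s+(?:feat\.|ft\.|featuring)\s+", re.IGNORECASE)


def split_artists(artist_field):
    """Rewrite 'A feat. B' or 'A vs. B' as 'A / B'."""
    parts = [p.strip() for p in _JOINERS.split(artist_field)]
    return " / ".join(p for p in parts if p)


def move_feature_to_title(artist_field, title):
    """Move the featured artist from the artist field into the song title."""
    parts = _FEATURE.split(artist_field, maxsplit=1)
    if len(parts) == 1:
        return artist_field, title
    return parts[0].strip(), f"{title} (feat. {parts[1].strip()})"


print(split_artists("artist a feat. artist b"))
print(move_feature_to_title("artist a feat. artist b", "song a"))
```

An artist field without any of the joiner words is returned unchanged by both functions.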

I have "override tags with online information" disabled. I think when the musicbrainz integration was introduced some years back it was enabled by default and I had some funny results when my library was updated (duplicate songs, albums, cyrillic artist names and other foreign letters). Maybe these things are better now but I never bothered to use the musicbrainz feature again. Also I'm a live bootleg collector and many of these bootlegs naturally do not have a musicbrainz id.

This is my current behaviour.

If it's okay here are some thoughts about the whole musicbrainz thing: I have a feeling that this feature was somehow never completed. It has a "half completed" feeling to me. If I'm informed correctly the dev who was responsible for the musicbrainz integration left after the feature was merged. Please don't get this wrong but I fail to see the benefit of it (from a user's perspective, could totally be possible that it helps for the inner workings of kodi).

"Starting in v13 "Gotham", Kodi has "MusicBrainzID" integration. This allows your music metadata to be updated automatically when that data is updated on the MusicBrainz database if the music has MusicBrainz ID tags."

Does that even happen? If so, in what intervals? If there's some undiscovered greatness in using musicbrainz id's I might change my mind to use it. So please convince me.


RE: Scraping music data - zag - 2015-11-18

Yep mbid's are used to lookup most artwork and details for artists and albums now.

It basically gives Kodi a unique ID (like IMDB id and TheTVDB id) to lookup data from external sites.


RE: Scraping music data - WelshPaul - 2015-11-18

Hi Dave, thanks as ever for all the thought and effort you're putting into this.

The 2 main things I can't get using mp3tag are album description and artist biog - things like birthplace, etc, may be nice for completeness but I rarely if ever use them. I don't care about site ratings, just my own for playlists, so I don't need nfos for this other than for quick updating of the library (for which I use MediaElch). Currently I copy the info from AllMusic when I rip the album. The reasons I don't use the scrapers are:

Album description - particularly for classical music I want to read about the composition not the album, i.e. I want to read about Beethoven and his symphony, not some Russian pianist I've never heard of and his playing style (sorry - I'm a pleb).
Artist description - updating when there is a new album, breakup, death (yikes), etc, is a pain.

Best solution from my point of view is the auto-updating url to the all-music (or similar) artist bio page and the ability to manually alter the album description lookup path for composition rather than album. I know this was (is?) possible and it's still in the wiki but I could never get it to work properly. Ideally, I'd like this done without nfos. I don't know any other program that uses nfo's. It would be much more portable if the urls were just stored in the db and then output to an available tag if people wanted. Admittedly, this could get completely buggered by changes in the scraper website's set-up, which I guess is the main reason most of us keep nfo files as a fall-back.

I'd also like to point out that one key reason for not using Kodi to scrape, rate, etc, is that if you're using Apple or Amazon TV with a 3 or 6 button remote it's nigh on impossible to quickly edit/correct. Therefore much easier to do off-system and then run a library update when you're done.

Hope that helps.


RE: Scraping music data - beesmyer - 2015-11-18

I enjoy as feature-rich an experience as possible. I scrape artists and albums.
A person's Home Theatre should be very customizable to create a personal experience for anyone. Music presents more of a challenge than videos do.
These are just a few of the things that present challenges...

A lack of widely accepted standards,
polluted online data sources
lack of information online
personal preference influencing genre/styles/moods
Multiple artists, albums, and songs all sharing similar names.

Because of these things I think it will always be necessary to have a unique ID and a local scraping feature.
Currently MusicBrainz is the best solution for uniquely identifying everything.

I build my library locally with all NFOs and artwork using MediaElch for music. In the past I have let Kodi scrape online sources and exported the library. One problem with this is that Kodi does not export the library in the same way that it scrapes sources. Having an exported library is only half useful. I've used CDart manager as well but this was tedious, slow, and lacked customization.

I use Picard to tag all music but I don't embed artwork. I like keeping my music as lightweight as possible and easy to update and improve, so nothing but the basic tags in my music.

I think music is too complex and personal to require one common library structure. Although for the sake of import and export consistency there would need to be one agreed-upon format, i.e. /album artist/album (year)/track. Artist - title.ext. Besides that, it's currently irrelevant really: Kodi uses tags and scrapes online sources for matches.
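As a small illustration of that agreed-upon layout, here is a Python sketch that builds the relative path from tag values. The zero-padded track number is an assumption, since the format above only says "track", and the function name is hypothetical.

```python
import posixpath


def library_path(album_artist, album, year, track, artist, title, ext):
    """Build '<album artist>/<album> (<year>)/<track>. <artist> - <title>.<ext>'."""
    folder = posixpath.join(album_artist, f"{album} ({year})")
    # Zero-padding the track number is an assumption, not part of the stated format.
    filename = f"{track:02d}. {artist} - {title}.{ext}"
    return posixpath.join(folder, filename)


print(library_path("Album Artist", "Album", 2015, 3, "Artist", "Title", "flac"))
```

A real implementation would also need to deal with characters that are not allowed in file names, which this sketch ignores.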

I don't know why it can't be as simple as this...
Tell Kodi to use online sources or local sources. Start with music that is tagged with MB tags. Kodi scans the music and local NFOs for matching tags or online sources and stores it accordingly. I don't know why it can't go as far as to specify what to get online and what to get from local NFOs as well, i.e. genre, moods, style, bio, etc. This could even easily allow for custom tags in music instead of waiting for online sources and industry standards to get straightened out. Then if I have a certain field in my NFOs under a certain artist or album it would use that.

Maybe even just a MusicBrainz scraper could be written easily. I don't mean to undermine all the hours and hard work that go into creating all this software. Please don't take that the wrong way. It seems reasonable in this day and age to require certain standards for music tagging to get the most out of Kodi. Even if the devs don't want to do that....

1- Have a recommended folder structure so importing and exporting is consistent if desired
2- Read MusicBrainz tags from music files
3- Scan local NFOs looking for matching MusicBrainz tags, or online sources, depending on what the user selected.
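Steps 2 and 3 above could be sketched roughly as follows in Python. Everything here is an assumption made for illustration: the NFO element names (`musicbrainzid`, `biography`) are hypothetical rather than Kodi's actual schema, and `fetch_online` stands in for whatever online scraper the user has selected.

```python
import xml.etree.ElementTree as ET


def index_local_nfos(nfo_documents):
    """Map MusicBrainz ID -> parsed NFO root, for every NFO that carries an ID."""
    index = {}
    for text in nfo_documents:
        root = ET.fromstring(text)
        mbid = root.findtext("musicbrainzid")  # hypothetical element name
        if mbid:
            index[mbid.strip()] = root
    return index


def lookup_artist(mbid, local_index, prefer_local, fetch_online):
    """Use the local NFO when the user prefers it and one matches, else go online."""
    if prefer_local and mbid in local_index:
        root = local_index[mbid]
        return {"source": "local", "biography": root.findtext("biography", default="")}
    return {"source": "online", **fetch_online(mbid)}


nfos = [
    "<artist><name>Artist A</name>"
    "<musicbrainzid>00000000-0000-0000-0000-000000000001</musicbrainzid>"
    "<biography>My own notes.</biography></artist>"
]
index = index_local_nfos(nfos)


def fake_online(mbid):
    # Stand-in for a real online lookup.
    return {"biography": "Fetched online."}


print(lookup_artist("00000000-0000-0000-0000-000000000001", index, True, fake_online))
print(lookup_artist("00000000-0000-0000-0000-000000000002", index, True, fake_online))
```

The first lookup is answered from the local NFO, the second falls through to the online source because no local NFO carries that ID.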

This is not definitive. I know other people will have plenty of good things to add and come up with better ideas than me. But I'd even pay something to have a scraper that will specifically use MB tags from my music and that I can tell to scan and save local information or online information.


RE: Scraping music data - beesmyer - 2015-11-19

Here are some reasons I like the idea of XML files, in this case NFOs.

I want things simple and easy. But not at the expense of flexibility, consistency, durability and options. I think an NFO is a great link between keeping things simple yet allowing for cross platform consistency, durability of files, growth for changes and a hub to connect any sort of source.

Data integrity and ease of handling: I think music, videos, and digital photos should have the minimum tags and unique identifiers added to them. Then other details and things can be added to a buddy NFO file. This way they are kept lightweight, as easy as possible for any software to handle, get opened and modified as little as possible, and when changes are made it's much easier to back up an NFO of 10 KB than it is an entire FLAC album of 300 MB.

Currently local fetching and scraping is likely to be done by software that uses NFOs rather than reading the tags in music files, like MediaElch. An NFO affords them more flexibility. It would be possible for them to write sources, URLs, identifiers, history, or anything else to the NFO to make them function better. Local/specific artwork is a big thing for me. The NFO is sort of a hub for me to keep things straight on that right now.

Online sources don't use the same methods to identify artists, albums, etc. More are starting to use MusicBrainz IDs, but not all do now nor will they in the future. An NFO, if nothing else, can serve as a hub to bring all this together and link an artist with any number of online sources even if there are 10 other artists with the same name. Even if nothing more than to initially set up Kodi or any other MC.
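To make the "hub" idea concrete, here is a minimal Python sketch that writes an artist NFO linking one artist to several online sources. The element and attribute names (`musicbrainzid`, `link`, `source`) are hypothetical, not Kodi's actual NFO schema, and the ID and URLs are placeholders.

```python
import xml.etree.ElementTree as ET


def build_artist_nfo(name, mbid, source_urls):
    """Return NFO-style XML tying one artist to a unique ID and any number of sources."""
    root = ET.Element("artist")
    ET.SubElement(root, "name").text = name
    ET.SubElement(root, "musicbrainzid").text = mbid  # hypothetical element name
    for source, url in source_urls.items():
        link = ET.SubElement(root, "link", {"source": source})  # hypothetical element
        link.text = url
    return ET.tostring(root, encoding="unicode")


nfo = build_artist_nfo(
    "Artist A",
    "00000000-0000-0000-0000-000000000001",  # placeholder ID
    {
        "site-one": "https://example.org/artist/123",
        "site-two": "https://example.com/a/artist-a",
    },
)
print(nfo)
```

Because the links sit next to the unique ID, two artists that share a name can each keep their own set of sources.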

They provide for easier and faster scraping.

They allow me to enter data for unpublished recordings like songs my wife, friends and children have recorded. Allowing me to display it on the big screen with all the professionals which is kind of neat.

Videos are also handled with NFOs, so it's consistent.

Allows me to customize details or notes about any artist or album.

Would be easy to add custom tags to this and gain more detail and flexibility if someone were to build a system for it.

Allows me to have the information available offline. Cataloging, displaying, rummaging, etc…

Any one of these benefits can be picked apart and a better solution found. But as of right now I know of nothing as simple as an NFO to help with all this.


RE: Scraping music data - beesmyer - 2015-11-19

(2015-11-18, 22:31)WelshPaul Wrote: Album description – particularly for classical music I want to read about the composition not the album, i.e. want to read about Beethoven and his symphony not some Russian pianist I've never heard of and his playing style (sorry – I'm a pleb).

I can view my classical music under Beethoven, Bach, whoever it was. I use Picard to tag and separate the artist into multiple fields. I think the composer is listed in the composer tag and as an artist along with the performer. I think this is what allows even the online scrapers to pull up information for Bach on the pieces he composed but someone else is performing.

Not in front of my PC right now so I can't look at Picard or my music to see for sure how the tags are. I just know I can view this like you are describing.


RE: Scraping music data - scott967 - 2015-11-20

I use album.nfo because about 70% of my albums don't "scrape". My environment is a central file server and each Kodi client holds its own database, so doing this allows me to update each client database via the nfo. I don't use artist.nfo because as implemented it doesn't work (no way to handle multiple artists per album). Instead I create a single file export from my master Kodi and do library import for the artists on the clients.

Currently Kodi doesn't support classical well. There is no way to scrape composers or conductors, and not much data is stored in the library for these entities (really, none). There is also a need for handling unique work serials (I don't know the proper name for this but I mean Op., K., etc. as assigned by musicologists).

There has also been expressed interest in dealing with podcasts, audio books and the like. Might add vocaloid to this list though I doubt many users have an interest. (Suppose vocaloid is more in the music video domain.)

There have been some attempts at writing music nfo generators but nothing has really surfaced. There was a WIP named Symphony but seems to have gone to ground.

scott s.
.


RE: Scraping music data - DaveBlake - 2015-11-20

(2015-11-18, 14:38)zag Wrote:
(2015-11-18, 14:02)DaveBlake Wrote: From your external DB knowledge, they provide data based on a name or name & title lookup, is that right?
I presume that you scrape both artist and album, but from the same external DB? Do you ever mix it up, and if so why?

Just having a quick look at the logs, most lookups from xbmc on TADB are done from the musicbrainz id and most are correct. A few wrong lookups but not many.

I've never seen artist and album mixed up no, but not quite sure what you mean there.

Ah yes musicbrainz ID, of course we use that not just names if we have them. And with DarkHelmet's comment in mind we really need to explain in wiki why MBIDs are so useful, and worth tagging.

I was rather unclear with the rest! Zag I was asking if you always get both artist and album data from the same external database, or if you could see situations where you would look in a different place for artist than for album. Can you tell from TADB whether people do both scrapes (album and artist) or more of one than the other?

But all of you so far, thanks for the feedback, it does help to understand what people do and why. I must point out that I am not doing all the music work either!! It is Razzee that is looking at adding a "set content" dialog for music, that would set scraper parameters when you add a music source. This idea has merit - making scraping more obvious (as a prelude to making it automatic once it is up to it), and also enabling users to target a suitable external database for their kind of music (and language). But I also foresee some issues that need resolving - there is not always an immediate relationship between an artist in the Kodi music library and the user's music file/folder hierarchy on their hard drives etc.

I would like to continue to gather information, so other music users out there let us know.


RE: Scraping music data - blutstein - 2015-11-23

i've divided my collection in 2 parts, one part with albums (~100 GB) and the big rest (~1 TB) that contains DJ mixsets, podcasts and so on. Since these files are individually recorded and not officially released, scraping is not that relevant, so I would clearly say, i would scrape from different databases, although it's difficult to find databases for the not so mainstream DJs - in this case an nfo is the only option left.

The music file/folder hierarchy is completely different from my sorting in Kodi.


RE: Scraping music data - zag - 2015-11-23

(2015-11-20, 20:26)DaveBlake Wrote: I was rather unclear with the rest! Zag I was asking if you always get both artist and album data from the same external database, or if you could see situations where you would look in a different place for artist than for album. Can you tell from TADB whether people do both scrapes (album and artist) or more of one than the other?

I always use the TADB Artist scraper and the TADB album scraper. So they come from the same source.

I have no idea why anyone would want different sources myself.