[WIP] Audio-Matic Automated Music Downloader/Organizer
#61
compcentral Wrote:good suggestion, but the issue is not just saying "the beatles" = "beatles", but determining which name is the "right" one to use when post-processing and renaming downloaded files. I actually already ignore things like "the, &, and, etc." when searching for downloads and doing scraping and API calls, but that's another sort of issue. I like to have my music named "perfectly" if possible... hence this application was born.

I agree with being correct, and would rather everything be correct. But if it comes to it, you could just make it an option whether to include "the" or not. So if you look up a song and the artist is listed as both "the beatles" and "beatles", the user option will determine whether it should have "the" or not.

I think a lot of programs ignore "the" anyhow when listing music, so it's less of a big deal.

I'm sure you've thought of this, but if it's available: for the songs where you can't be 100% sure what the correct information is, could you use a song fingerprinting service? Basically, run something like Shazam on the file to get information for it. Bad explanation, but I think you get the idea. It would take a while, but it would only be run on files that need it, and I personally don't care if it takes 48 hrs to add a library if it is going to be as correct as you intend it to be.
Reply
#62
The reason this is a big deal is how it is stored in the database. If some songs used "the beatles" as an artist name, those songs would get linked to the artist id that corresponds to "the beatles". If other songs use "beatles" as the artist name, they would get linked to a different artist id in the database.

I was going to just link each of these artists to each other using an "other_aliases" field, but that becomes a problem if the artist is either completely wrong or something vague like "Various Artists".

And it's not so much the obvious artist names (filtering out "the" is no big deal), but maybe one like this:

Thirty Seconds to Mars
vs.
30 Seconds to Mars

I just used "the" as an example because it was the first thing that popped into my head.
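To make the alias problem concrete, here is a minimal Python sketch of a comparison key that collapses the easy variants before the artist id lookup. The function name `artist_key` and its rules are assumptions for illustration, not the application's actual code. Note that it deliberately does not solve the "Thirty" vs. "30" case, which is why something like an online lookup is still needed.

```python
import re
import unicodedata

# Hypothetical helper: matches a leading "the " or a trailing ", the".
_ARTICLE = re.compile(r"^(the)\s+|,\s*(the)$", re.IGNORECASE)

def artist_key(name: str) -> str:
    """Build a key so simple spelling variants map to the same artist row."""
    # Fold accents and case so accented and plain spellings compare equal.
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.casefold().strip()
    # Drop the article, wherever the tag put it.
    text = _ARTICLE.sub("", text)
    # Treat "&" and "and" as the same word.
    text = re.sub(r"\s*&\s*", " and ", text)
    # Remove remaining punctuation and collapse whitespace.
    text = re.sub(r"[^\w\s]", "", text)
    return re.sub(r"\s+", " ", text).strip()
```

With this key, "The Beatles", "beatles" and "Beatles, The" all end up on one artist id, while "Thirty Seconds to Mars" and "30 Seconds to Mars" still produce two different keys.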

So far, I think the best solution will be to just do an online lookup, determine the most likely artist using all info available, and only store that artist in the database. Then I will link all of the related songs to that artist id, log the changes, and give the user the option to override the changes after the import is complete.
Reply
#63
I would say flag the entries that have a problem, skip them, and then let the user decide what to do. It sounds like you are worried about how much time/effort it will take for the end user, but really it only is an issue on the very first import. As a user, I would understand if it took a long time since it's looking up every song. I can then go through at my leisure and correct the ones that the software is "unsure" of.
Reply
#64
compcentral Wrote:So far, I think the best solution will be to just do an online lookup, determine the most likely artist using all info available, and only store that artist in the database. Then I will link all of the related songs to that artist id, log the changes, and give the user the option to override the changes after the import is complete.

rockstrongo Wrote:I would say flag the entries that have a problem, skip them, and then let the user decide what to do. It sounds like you are worried about how much time/effort it will take for the end user, but really it only is an issue on the very first import. As a user, I would understand if it took a long time since it's looking up every song. I can then go through at my leisure and correct the ones that the software is "unsure" of.

I guess I'll do both... just have an option in settings for the user to choose their preference.
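A rough Python sketch of how the two behaviours could sit behind one setting. Everything here is hypothetical: the `ImportResult` class, the `auto_fix` and `min_score` names, and the assumption that the online lookup returns a confidence score per candidate artist.

```python
from dataclasses import dataclass, field

@dataclass
class ImportResult:
    linked: dict = field(default_factory=dict)      # song path -> chosen artist
    flagged: list = field(default_factory=list)     # songs left for the user
    change_log: list = field(default_factory=list)  # (song, old, new) tuples

def resolve(song, tagged_artist, candidates, result, auto_fix=True, min_score=0.8):
    """candidates maps artist name -> confidence in [0, 1] (assumed lookup output)."""
    best, score = max(candidates.items(), key=lambda kv: kv[1], default=(None, 0.0))
    # With auto_fix off, anything the lookup is unsure about is skipped and flagged.
    if best is None or (score < min_score and not auto_fix):
        result.flagged.append(song)
        return None
    # Otherwise take the most likely artist and log the change for later override.
    if best != tagged_artist:
        result.change_log.append((song, tagged_artist, best))
    result.linked[song] = best
    return best
```

With `auto_fix=True` the import never stops and the user reviews `change_log` afterwards; with `auto_fix=False` the unsure entries land in `flagged` instead.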
Reply
#65
compcentral Wrote:Thanks for your input. It is greatly appreciated.



You are correct. Currently I'm not doing any threading. Here is the basic process that accounts for most of the import function. I admit my experience creating multi-threaded apps is limited. How would you recommend threading this?

The concept of threads is pretty simple, but implementing them can be difficult if you aren't careful about how you structure your code. It's been about 10 years since I've worked with VB (I'm a Java developer by trade), but this app would be served pretty well by a simple FIFO queue producer-consumer model. Basically, you'd want two functions in your class: one that reads the files and one that processes the information. You'd start the read thread, which adds file information to the queue, and a process thread, which pulls that information from the queue and processes it. Here is some basic pseudocode that gives you an idea of how it would be implemented. You may need to add sleep commands, as I'm not sure how thread scheduling is done in VB.

Code:
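function startThreads(){
     producerActive = true;
     // pass the functions themselves; calling producer() here would run it inline
     Thread t1 = new Thread(producer);
     Thread t2 = new Thread(consumer);
     t1.start();
     t2.start();
}

function producer(){
     while(hasMoreFiles){
            read file information;
            lock queue;
            write to queue;
            release queue;
     }
     // tell the consumer that nothing more will arrive
     producerActive = false;
}

function consumer(){
     while(producerActive || queueNotEmpty){
            lock queue;
            // the queue may be empty while the producer is still reading
            if(queueNotEmpty){
                   retrieve and delete first item from queue;
            }
            release queue;
            if(item was retrieved){
                   process queue item;
            }
     }
}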
Another thing you'd need to add is a watch on the queue length. If it gets too big, add a sleep command to the producer (it means your file reading is going faster than the consumer can process), and likewise, if it gets too small, perhaps slow down the consumer. It's not an exact science; see how your code operates with smaller datasets and adjust accordingly. Sorry I can't help with exact syntax, but I believe VB.NET even has a built-in Queue object, which would make implementing this pretty trivial.
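For anyone who wants something runnable, here is a minimal sketch of the same producer-consumer idea in Python rather than VB. It uses the standard library's bounded `queue.Queue`, which does its own locking and blocks the producer when the queue is full, so the manual lock and sleep steps are not needed. The names `run_import`, `process` and `max_pending` are made up for this example.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the work stream

def producer(paths, work):
    for path in paths:
        # put() blocks while the queue is full, which throttles the reader
        # without any manual sleep calls.
        work.put(path)
    work.put(SENTINEL)

def consumer(work, process, results):
    while True:
        item = work.get()  # blocks while the queue is empty
        if item is SENTINEL:
            break
        results.append(process(item))

def run_import(paths, process, max_pending=100):
    """Read on one thread, process on another, with at most max_pending queued."""
    work = queue.Queue(maxsize=max_pending)
    results = []
    threads = [
        threading.Thread(target=producer, args=(paths, work)),
        threading.Thread(target=consumer, args=(work, process, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

The sentinel replaces the `producerActive` flag: the consumer stops when it sees the marker, so it never spins on an empty queue.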


compcentral Wrote:Once I find an Artist path, I generate a list of all music files within that folder and all subfolders. I then scan each of these files for tagging metadata and write everything discovered for the file to the database. After all files have been scanned, I examine the combined tagging info to look for inconsistencies. Then I write the information obtained for this artist into the artist table and if naming conflicts are found, write that information to the database as well before moving on to the next artist folder and repeating the process.
The logic of the threads can be adjusted to suit. For example, you could put a custom object that stores all the files of one artist on the queue, and the consumer processes that object while the file reader is busy reading the next artist. There are plenty of ways to approach it. You could even add a third thread that writes the data to the database. Depending on how your code is written, that would let you keep one persistent connection open until the operation is done, instead of opening a new DB connection for each write.
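A small Python sketch of the "third thread" idea, assuming SQLite as the database and a made-up `songs` table. Each queue item is one artist's batch, and the writer thread opens a single connection that it keeps for the whole run. The function and table names are illustrative only.

```python
import queue
import sqlite3
import threading

DONE = object()  # tells the writer that no more batches are coming

def db_writer(db_path, batches):
    # The connection is opened once, inside the writer thread, and reused
    # for every batch instead of reconnecting per write.
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS songs (artist TEXT, title TEXT)")
    while True:
        batch = batches.get()
        if batch is DONE:
            break
        artist, titles = batch
        with conn:  # one transaction per artist batch
            conn.executemany(
                "INSERT INTO songs VALUES (?, ?)",
                [(artist, title) for title in titles],
            )
    conn.close()

def write_library(db_path, library):
    """library maps artist name -> list of song titles (hypothetical shape)."""
    batches = queue.Queue(maxsize=10)
    writer = threading.Thread(target=db_writer, args=(db_path, batches))
    writer.start()
    for artist, titles in library.items():
        batches.put((artist, titles))
    batches.put(DONE)
    writer.join()
```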



compcentral Wrote:Rather than simply use regex, I wrote a custom function to parse the filename and compare it to the folder structure definition. It seemed to be nearly as fast and offered more flexibility for inexact matches (items that did not conform to the folder structure pattern defined before the import). I'll have to take another look at this and see how much of a performance hit/gain I get revising it. Up until this point, my main concern was functionality, but soon enough optimization will take a more important role.
I'd still use compiled regex where you can; it is unlikely that your code can come close to its speed. You don't need one blanket regex definition: have multiple regexes that implement what you've done logically in code, and test them in turn until you get a match. It's quite versatile and you'll see some pretty big gains if you implement it correctly. Things like split, replace, etc. without regex make for some pretty big performance hits.
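As an illustration of "several compiled patterns tried in turn", here is a short Python sketch. The three filename layouts are invented for the example and are not the patterns the application uses. Each regex is compiled once at import time and reused for every file.

```python
import re

# Hypothetical filename layouts, most specific first.
FILENAME_PATTERNS = [
    # "01 - Artist - Title.mp3" or "01-artist-title.mp3"
    re.compile(r"^(?P<num>\d{1,3})\s*-\s*(?P<artist>.+?)\s*-\s*(?P<song>.+)\.\w+$"),
    # "01 Title.mp3", "01. Title.mp3", "01_Title.mp3"
    re.compile(r"^(?P<num>\d{1,3})[\s._-]+(?P<song>.+)\.\w+$"),
    # "Title.mp3"
    re.compile(r"^(?P<song>.+)\.\w+$"),
]

def parse_filename(name):
    """Return the named groups of the first pattern that matches, else None."""
    for pattern in FILENAME_PATTERNS:
        match = pattern.match(name)
        if match:
            return match.groupdict()
    return None
```

Ordering matters here: the loose "Title.mp3" pattern would match almost anything, so it goes last as a fallback.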

compcentral Wrote:Yeah. I agree. I think all the disk I/O is causing most of the slow down.

Unfortunately that's the one area you have no power to change. How are you reading the ID3 tags? Are you reading in the entire file, or just the last 128 bytes of the file for ID3v1.1? (I can't remember the length of ID3v2.)
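For reference, a minimal Python sketch of reading only the ID3v1/ID3v1.1 block, which is a fixed 128 bytes at the very end of the file (ID3v2 is different: it sits at the start of the file and has a variable length given in its header). The function name `read_id3v1` is made up for this example, and the decoding assumes Latin-1 text, which is common for these tags but not guaranteed.

```python
import os

ID3V1_SIZE = 128  # fixed size of an ID3v1 / ID3v1.1 tag

def read_id3v1(path):
    """Return a dict of ID3v1 fields, or None if the file has no such tag."""
    if os.path.getsize(path) < ID3V1_SIZE:
        return None
    with open(path, "rb") as f:
        # Seek straight to the tag instead of reading the whole file.
        f.seek(-ID3V1_SIZE, os.SEEK_END)
        tag = f.read(ID3V1_SIZE)
    if tag[:3] != b"TAG":
        return None

    def text(raw):
        # Fields are fixed width and padded with zero bytes or spaces.
        return raw.split(b"\x00", 1)[0].decode("latin-1").strip()

    info = {
        "title": text(tag[3:33]),
        "artist": text(tag[33:63]),
        "album": text(tag[63:93]),
        "year": text(tag[93:97]),
        "genre": tag[127],
    }
    # ID3v1.1: a zero byte at offset 125 followed by a non-zero track number.
    if tag[125] == 0 and tag[126] != 0:
        info["track"] = tag[126]
    return info
```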
Reply
#66
ProphetVX Wrote:The concept of threads is pretty simple, but implementing them can be difficult if you aren't careful about how you structure your code. It's been about 10 years since I've worked with VB (I'm a Java developer by trade), but this app would be served pretty well by a simple FIFO queue producer-consumer model. Basically, you'd want two functions in your class: one that reads the files and one that processes the information. You'd start the read thread, which adds file information to the queue, and a process thread, which pulls that information from the queue and processes it. Here is some basic pseudocode that gives you an idea of how it would be implemented. You may need to add sleep commands, as I'm not sure how thread scheduling is done in VB.

Yes, creating and executing the threads is easy enough. I'm aware of how to do that. It's just the logic behind doing so without getting out of synch that was throwing me off, but your example helps. I honestly hadn't put a lot of thought into it (too focused on just getting the core functions to work), but I'll definitely devote some time to this and get back to you.

ProphetVX Wrote:I'd still use compiled regex where you can; it is unlikely that your code can come close to its speed. You don't need one blanket regex definition: have multiple regexes that implement what you've done logically in code, and test them in turn until you get a match. It's quite versatile and you'll see some pretty big gains if you implement it correctly. Things like split, replace, etc. without regex make for some pretty big performance hits.

I was never very good at creating regex strings, but if you think it will have a significant impact on processing time, I'll give it another look.

ProphetVX Wrote:Unfortunately that's the one area you have no power to change. How are you reading the ID3 tags? Are you reading in the entire file, or just the last 128 bytes of the file for ID3v1.1? (I can't remember the length of ID3v2.)

I'm only reading in the ID3 tag area of the file, but I'm currently processing every tag (because I just use one general-purpose function for all ID3 reading operations right now). I could gain some performance by optimizing this to read only the tags that are stored in the database and used to identify the file (right now that is just artist, album, year, track name, track number, and genre, though this may expand later). Initially, I planned on storing all tags in the database, but later I decided it would be more efficient to just read the other tags directly from the file if and when needed. This saved I/O when writing to the DB, but I never got around to changing the tag reader function.
Reply
#67
I just spent the last few hours rewriting my "get song info from file path, and folder structure pattern matching" function to use regex. It seems to work fine, but there is no noticeable speed gain. :( I can share my code for this with you if you think you could help. Maybe I'm not implementing it in the most efficient way? Definitely possible. This is not my specialty at all.

Edit: Actually, here's a link to the source code. It's just a text file, but if you paste the code into VB it should be fine. Call the function using something like this:

Code:
Dim pathInfo() As String
pathInfo = getInfoFromPath2("R:\Media Storage\Music\10 Years\Division (2008)\01-10_years-actions_and_motives.mp3", "R:\Media Storage\Music", "<artist>\<album> (<year>)\<num> <song>", "R:\Media Storage\Music")
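The linked source is not reproduced here, so the following is only a guess at the general technique, sketched in Python: turn the folder structure pattern into one compiled regex with a named group per `<token>`, then match the path relative to the music root. The names `compile_template`, `info_from_path` and the per-token rules in `TOKEN_PATTERNS` are all assumptions.

```python
import re

# Hypothetical mapping from template tokens to regex fragments.
TOKEN_PATTERNS = {
    "artist": r"[^\\]+?",
    "album": r"[^\\]+?",
    "year": r"\d{4}",
    "num": r"\d{1,3}",
    "song": r"[^\\]+?",
}

def compile_template(template):
    """Turn e.g. '<artist>\\<album> (<year>)\\<num> <song>' into a compiled regex."""
    parts = []
    for piece in re.split(r"(<\w+>)", template):
        if piece.startswith("<") and piece.endswith(">"):
            name = piece[1:-1]
            parts.append(f"(?P<{name}>{TOKEN_PATTERNS[name]})")
        else:
            # Literal text between tokens (backslashes, spaces, brackets).
            parts.append(re.escape(piece))
    # The file extension is not part of the template, so allow any.
    return re.compile("^" + "".join(parts) + r"\.\w+$")

def info_from_path(path, root, template_regex):
    relative = path[len(root):].lstrip("\\")
    match = template_regex.match(relative)
    return match.groupdict() if match else None

PATTERN = compile_template(r"<artist>\<album> (<year>)\<num> <song>")
```

A conforming path such as `R:\Media Storage\Music\10 Years\Division (2008)\01 Actions and Motives.mp3` yields all five fields. The sample file name in the call above uses dashes instead of a space after the track number, so this strict sketch returns `None` for it; handling such inexact matches would need extra fallback patterns.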

After doing some more testing, it seems that writing to the SQLite database is the slow down. If I comment that out, it flies through the import process in no time. This is nearly all disk I/O and only a few lines of code, so sadly it looks like attempting to optimize code to speed up the import is not going to help much.
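One thing that may be worth checking before giving up, though the post does not say how the inserts are issued: if every row is committed on its own, SQLite has to finish a full disk write per row. Grouping many inserts into a single transaction is the usual remedy. A minimal Python sketch, with a made-up `songs` table:

```python
import sqlite3

def insert_songs(conn, rows):
    """Insert all rows inside one transaction instead of one commit per row."""
    # "with conn" commits once at the end, or rolls back if an insert fails.
    with conn:
        conn.executemany(
            "INSERT INTO songs (artist, album, title) VALUES (?, ?, ?)",
            rows,
        )

# Example usage against an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE songs (artist TEXT, album TEXT, title TEXT)")
insert_songs(conn, [
    ("10 Years", "Division", "Actions and Motives"),
    ("10 Years", "Division", "Beautiful"),
])
```

The same grouping is available from VB through whatever SQLite wrapper is in use, by beginning a transaction before the batch and committing after it.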

Oh well.. just a few hours wasted. On the plus side, my regex skills improved significantly. :) It's usually a ton of trial and error for me to get the regex strings just right, but I discovered a nice regex validator for future use that helps significantly.
Reply
#68
Hi compcentral

I'm really looking forward to using this program, and I also appreciate that you are able to find the time to make it!!

cheers
cbb
Reply
#69
I posted a "request" for something like this only a short while ago.
This will make my HTPC life heaven on earth. I can't wait to give it a try.
Your work is already very much appreciated!
Reply
#70
Thank you both for the encouragement. I really appreciate it.
Reply
#71
Just finished updating the regex matching code a bit more today and revised some of what I posted yesterday. I updated the text file I posted to reflect these changes. It can still be viewed here: link to the Regex source code

Any programmers out there that would like to offer some feedback on this please? Thanks.
Reply
#72
Hi compcentral,
kudos to you & your project.
it's a mammoth undertaking, but ultimately imperative to strive towards
universal & efficient standards, for the love of music & the overwhelming
allure of dewey digital catalogue feng shui...

anyhow, you asked for some feedback/suggestions.

and as music is one thing i've spent far too many hours on..
here's a couple thoughts to chew over.

It's sad but true that naming conventions for some artists are inconsistent,
(Tupac, 2Pac, 2pac)(Rolling Stones, The)
and some may intentionally defy logic
(AC/DC,$wingin' Utter$).
I don't believe it prudent to replace any characters with similar looking ones,
it can only lead to more alias/folder structure horror,
sadly I use the humble underscore _ in place of illegal characters
(AC_DC, Boney M_, B_Witched)
until a universally complete/compatible codeset/font is here.

It occurred to me that no 'mere' regex formulae will absolve
every deviate discrepancy when dealing with massive collections.
My idea is the 'Essentially Holy Trinity Library Division'
or more precisely 'The Good, The Bad & The Ugly'
All 'complete','favourable bitrate' & 'correctly tagged & named'-
Artist/Albums are locked read-only tight in 'The Good' directory,
Any incomplete, poorer quality, poorly tagged/named, missing art etc
you guessed it 'The Bad' for further work needed.
And ALL illegal, immoral and plain system-breaking miscreants,
can join the unfavourably formatted & obscure bootleggers etc
in the 'Gulag' where the threat of painless deletion looms
til someone figures out if they're safe to merge again.

The upside is instead of churning 50000+
60%-compliant files through a library update,
with unsatisfactory results.
'The Good' say (60%) or 30000-tracks @ 100%
Top-of-the-Pops are flying for Eternal Paradise.
The Bad & The Ugly well they probably shouldn't be allowed library access
on account of their ill-litter-rate.

I'm also interested in finding the Holy-Grail of
fabled file/folder hierarchy that seamlessly includes
releases, singles, collaborations, bootlegs, re-masters,
with info, artwork, lyrics, music videos, press-releases
tour-dates, maybe even tablature/music notation etc.

There's a fine-line between fan & fanatic.
I look forward to more Audio-matic mania.
Hats off to the fantastic XBMC-team for the finest
multi-media mega-mayhem in the business.
Reply
#73
You bring up a good point... how to handle illegal file/folder name characters (*, ?, \, /, etc.)? Obviously in these instances, the file/folder name will not match the actual name.

I think the underscore substitution method is probably the best option. I will add an option to treat an underscore as an unknown character during import. Upper ASCII character substitution will also be optional when renaming, but it will be taken into account when doing a "lookup" to either find downloads or identify music.
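Both halves of that idea can be sketched in a few lines of Python. The set of illegal characters below is the Windows one, and the helper names `safe_name` and `name_pattern` are invented for the example; the real option may well behave differently.

```python
import re

# Characters Windows does not allow in file or folder names.
ILLEGAL = re.compile(r'[<>:"/\\|?*]')

def safe_name(name):
    """Replace characters that cannot appear in a file/folder name with '_'."""
    cleaned = ILLEGAL.sub("_", name)
    # Windows also rejects names that end in a dot or a space.
    return re.sub(r"[. ]+$", lambda m: "_" * len(m.group()), cleaned)

def name_pattern(folder_name):
    """Build a regex in which every '_' stands for one unknown character."""
    parts = ["." if ch == "_" else re.escape(ch) for ch in folder_name]
    return re.compile("^" + "".join(parts) + "$", re.IGNORECASE)
```

So "AC/DC" is stored as "AC_DC", and on import `name_pattern("AC_DC")` matches the looked-up name "AC/DC" but not "ACDC". The obvious cost is that a real underscore in a name also becomes a wildcard.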
Reply
#74
I can't wait to try the beta.
Any release date planned ?
Reply
#75
yannl Wrote:I can't wait to try the beta.
Any release date planned ?

Not yet. I'll probably actually release an alpha (still in development/preview) version before the beta is ready to go, but I'm still not sure when it will be ready. I'm making great strides, but these things take time. Thanks for your patience. I'll let everyone know when I have a date in mind. It's getting close for a preview, but a fully functional/polished release will be a ways off yet.
Reply