Req Scan music library based on date
#1
Currently, rescanning the whole music library is inefficient because files that have not been modified since they were last scanned are processed again unnecessarily. On large libraries this can be very time-consuming (in my case, upwards of 2 days). For efficiency's sake, one is therefore often forced to scan items in one file at a time, which is also time-consuming and requires unnecessary manual labor.

I'm wondering if it's possible to instead have XBMC use "indexes" for faster comparison. For instance, I propose this algorithm (assuming XBMC already keeps a list of file URLs with their modification times):

1) search the data sources for the music library to compile a list of file URLs with their corresponding modification times
2) compare the list obtained in 1) with the list XBMC already has
3) identify files that no longer exist and mark them to be cleaned out of the library at the end
4) identify files that are new or changed with respect to the original list and mark them to be scanned into the library, prioritizing new files, then modified files
5) apply changes
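
As a rough sketch of steps 1) to 4), here is how the comparison could look in Python. This is only an illustration, not how XBMC does it: the names build_index and plan_rescan are made up, and it assumes the stored list is available as a plain mapping of path to modification time.

```python
import os
from typing import Dict, List, Tuple


def build_index(root: str) -> Dict[str, float]:
    """Step 1: walk a source directory and map each file path to its mtime."""
    index: Dict[str, float] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            index[path] = os.stat(path).st_mtime
    return index


def plan_rescan(
    stored: Dict[str, float], current: Dict[str, float]
) -> Tuple[List[str], List[str]]:
    """Steps 2-4: compare the stored index with the freshly built one.

    Returns (to_remove, to_scan). to_scan lists new files first,
    then files whose modification time differs from the stored one.
    """
    to_remove = sorted(p for p in stored if p not in current)
    new_files = sorted(p for p in current if p not in stored)
    modified = sorted(
        p for p in current if p in stored and current[p] != stored[p]
    )
    return to_remove, new_files + modified
```

Step 5 would then hand to_scan to the normal per-file scanner and drop the to_remove entries from the library. Only file metadata is read here, never the file contents.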

This algorithm has the following gains:

1) avoids having to rescan potentially thousands of files that did not change and thus require no attention
2) allows simplified re-scanning of the library without having to track every single modification for later addition to XBMC (i.e. via the Files... "scan item into library" option)
3) accelerates rescanning of larger collections since only those files that actually require action will be acted upon

Cheers!
#2
AFAIK, XBMC uses a hashing system to check whether things have changed. My music library has ~23,000 tracks, but an "update library" with nothing changed only takes around a minute to check the entire thing. Adding an album takes a little longer because the scrapers have to look up and download information, but that's to be expected. Even so, adding an album only takes around 2-3 minutes using "update library", and that includes checking every album.

The last time I accidentally trashed the DB, it took about 12 hours to rebuild from scratch. I can't see why an update would take you 2 days unless something is going wrong, or your hardware/internet is incredibly slow.

Oh, I should maybe mention that my DB is a shared MySQL database, so it's probably taking a bit of a performance hit compared to the standard SQLite version.
Learning Linux the hard way !!
#3
My library is on a NAS device. Even on a gigabit network, it's much slower than a local library. Thus, hash-based scanning isn't as efficient, since it requires reading the whole source file to calculate its hash and then comparing that to what's stored in the library. Over the network, this can take a LOOOONG time (as you've seen).

I think a saner approach would be to allow the user to select which "scanning" mechanism to favor - checksum, or URL+date+size only. The latter would be much faster but evidently could miss subtle modifications that the former would definitely catch. As with anything, it's a speed-vs-reliability trade-off.
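
To make the trade-off concrete, here is a minimal Python sketch of the two modes. It is purely illustrative: the function names and the "quick"/"checksum" mode strings are invented for this example, SHA-256 is just a stand-in for whatever hash would really be used, and none of it reflects XBMC's actual code.

```python
import hashlib
import os


def quick_fingerprint(path: str) -> str:
    """URL+date+size: one stat call, no file contents are read."""
    st = os.stat(path)
    return f"{path}|{st.st_mtime_ns}|{st.st_size}"


def checksum_fingerprint(path: str, chunk_size: int = 1 << 20) -> str:
    """Checksum: reads the entire file, so cost grows with file size."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fingerprint(path: str, mode: str = "quick") -> str:
    """Pick the scanning mechanism according to the user's preference."""
    if mode == "quick":
        return quick_fingerprint(path)
    if mode == "checksum":
        return checksum_fingerprint(path)
    raise ValueError(f"unknown mode: {mode!r}")
```

A file counts as changed when its fingerprint differs from the one stored at the last scan. The quick mode will miss an edit that keeps both the size and the modification time intact, which is exactly the reliability being traded away.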

In my case, since I have a good control of the collection in question, I can do without the extra reliability, and will benefit greatly from the speed boost that the proposed alternative would offer.