Faster library scanning
#1
There is no official headless solution, so for standalone users, or for people using a shared library (MySQL) setup, scanning for new items is slow.
Last night on my iPad I had to sit back and wait for over a minute just to see one newly added episode in my library. I have the feeling there is room for improvement.

First off, why is there no clear indication of the actual progress of scanning for new movies/episodes? I see the progress bar bouncing back and forth several times.

Could someone lay out the current procedure for identifying new files and querying the addons for information?

Have you ever thought of threading the scanner for new movies and shows separately?

-------------------- Update ----------------------

Possible improvements that have been outlined in this thread so far:
  • Scan directories and calculate hashes simultaneously for each library (movies, TV shows, ...)
  • Send concurrent requests to metadata providers (define API limits for each service)
  • Update the UI for each newly found item (after its metadata is received)

(2014-06-23, 17:46)popcornmix Wrote: I remember when YAMJ added multithreading to scanning. It made it massively faster (like ten times faster).

They did hit the issue of hammering sites, and implemented per site limits (e.g. " 2 threads from TheTVDB.com, 2 from TheMovieDB.com, 5 from IMDB and 1 from google")
https://code.google.com/p/moviejukebox/w...iThreading

Obviously if you can overlap the movie, TV and music updates you get more benefit and spread the load across the sites.
Platforms: macOS - iOS - OSMC
co-author: Red Bull TV add-on
#2
i can do better than an outline:

https://github.com/xbmc/xbmc/blob/master...canner.cpp

now, to answer the specific question:

you can scan movies and shows separately. In fact, that's how the scanner operates: first it scans shows, then movies, then music videos (the order may not be exactly that), unless a given type is specified.

you can specify what to scan through all the API surfaces: JSON-RPC, Python, or the builtins. E.g. for the built-in interface (used for keymaps, skin buttons etc.):

UpdateLibrary(movies) # will only update movies.
UpdateLibrary(movies, path) # will only update movies in the given path
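For reference, the same targeted update can be triggered over the JSON-RPC interface via VideoLibrary.Scan. A minimal sketch that only builds the request payload (the host, port and media path below are illustrative assumptions, not values from this thread):

```python
import json

def scan_request(directory=None, request_id=1):
    """Build a JSON-RPC payload roughly equivalent to
    UpdateLibrary(movies, path): VideoLibrary.Scan with a "directory"
    param limits the scan to that path; without it, the whole video
    library is scanned."""
    params = {}
    if directory is not None:
        params["directory"] = directory
    return json.dumps({
        "jsonrpc": "2.0",
        "method": "VideoLibrary.Scan",
        "params": params,
        "id": request_id,
    })

# POST this body to http://<kodi-host>:8080/jsonrpc
# (Kodi's webserver must be enabled in settings).
payload = scan_request("/media/movies/")
```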

The progress dialog is on a per-folder basis, hence it jumps back and forth. There used to be a global bar as well, but it was removed, since spinning off the counter thread impaired the main scanner's performance too much, in particular on I/O-constrained devices (e.g. the RPi).
#3
I just wonder why it takes so long to go over all the unchanged files.
Would it be a good idea to create a hash of every TV show's folder structure and store that in the library?
On the next scan, compare the hashes; if they match, you know nothing has changed.
#4
that's exactly what's done; it's even a fast hash when doable (based on mtime, for folders on filesystems that support it). The problem is that a lot of filesystems do not provide it, and then a slow hash has to be done (i.e. stat every file and look at its mtime).
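The fast-vs-slow distinction can be sketched like this (a rough illustration, not the actual scanner code, which lives in the C++ file linked above):

```python
import hashlib
import os

def fast_hash(path):
    """Fast hash: one stat per directory, mixing in only folder mtimes.
    Only trustworthy on filesystems that bump a folder's mtime when its
    contents change; otherwise the caller must fall back to slow_hash."""
    h = hashlib.md5()
    for root, _dirs, _files in os.walk(path):
        st = os.stat(root)
        h.update(f"{root}:{int(st.st_mtime)}".encode())
    return h.hexdigest()

def slow_hash(path):
    """Slow hash: stat every single file and mix in its name and mtime.
    Correct everywhere, but costs one stat per file."""
    h = hashlib.md5()
    for root, _dirs, files in os.walk(path):
        for name in sorted(files):
            st = os.stat(os.path.join(root, name))
            h.update(f"{name}:{int(st.st_mtime)}".encode())
    return h.hexdigest()
```

If the stored hash matches the freshly computed one, the whole folder can be skipped without scraping anything.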
#5
FYI, I've added a recursive fast hash for series a few days ago. It's still not as fast as it could be, but with the right filesystem it's a lot faster when using current master.
Always read the online manual (wiki), FAQ (wiki) and search the forum before posting.
Do not PM or e-mail Team-Kodi members directly asking for support. Read/follow the forum rules (wiki).
Please read the pages on troubleshooting (wiki) and bug reporting (wiki) before reporting issues.
#6
(2014-06-23, 16:59)vdrfan Wrote: FYI, I've added a recursive fast hash for series a few days ago. It's still not as fast as it could be, but with the right filesystem it's a lot faster when using current master.

Cool, thanks. Looking forward to it.

Is multi-threading also an option for processing movie and TV show folders at the same time (including requesting information from the scrapers concurrently)?
#7
The filesystem (hash) stuff should be single-threaded IMO. Once that's done we could fire off jobs that do the actual scraping and art handling (in theory). Will have to play around with it.
#8
be careful. it will be a very efficient way to be banned from metadata providers.
#9
(2014-06-23, 17:20)ironic_monkey Wrote: be careful. it will be a very efficient way to be banned from metadata providers.

Hm, right. While we're on the topic: do you think we should cache the TVDB zips (longer than 15 minutes) in order to speed things up and reduce the provider hammering? AFAIK the API offers updates, right?

EDIT: Btw, what's your IRC nick again? ;)
#10
sure, if you can pull it off that would be much better. You'd have to store the last update time.
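A minimal sketch of such a cache (all names here are illustrative, not Kodi's): keep each zip with a fetch timestamp, serve it while fresh, and record when the provider's updates feed was last polled.

```python
import time

class MetadataCache:
    """Cache scraped payloads with a TTL and track the last time the
    provider's updates feed was checked."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.entries = {}        # key -> (payload, fetched_at)
        self.last_update_check = 0.0

    def get(self, key, now=None):
        now = time.time() if now is None else now
        hit = self.entries.get(key)
        if hit and now - hit[1] < self.ttl:
            return hit[0]
        return None              # stale or missing: caller refetches

    def put(self, key, payload, now=None):
        now = time.time() if now is None else now
        self.entries[key] = (payload, now)

    def needs_update_poll(self, interval, now=None):
        """True once per interval, so only the updates feed is hit
        instead of refetching every series zip."""
        now = time.time() if now is None else now
        if now - self.last_update_check >= interval:
            self.last_update_check = now
            return True
        return False
```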
#11
(2014-06-23, 17:10)vdrfan Wrote: The filesystem (hash) stuff should be single threaded IMO. Once done we could fire jobs that do the actual scraping and art handling (in theory). Will have to play around with it.

I remember when YAMJ added multithreading to scanning. It made it massively faster (like ten times faster).

They did hit the issue of hammering sites, and implemented per site limits (e.g. " 2 threads from TheTVDB.com, 2 from TheMovieDB.com, 5 from IMDB and 1 from google")
https://code.google.com/p/moviejukebox/w...iThreading

Obviously if you can overlap the movie, TV and music updates you get more benefit and spread the load across the sites.
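Those per-site caps can be sketched with one semaphore per provider over a shared worker pool. The limits below mirror the YAMJ example quoted above; everything else (domain keys, function names) is illustrative:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# One semaphore per provider caps in-flight requests to that site.
SITE_LIMITS = {
    "thetvdb.com": threading.BoundedSemaphore(2),
    "themoviedb.org": threading.BoundedSemaphore(2),
    "imdb.com": threading.BoundedSemaphore(5),
    "google.com": threading.BoundedSemaphore(1),
}

def fetch(site, title, do_request):
    """Acquire the site's slot before issuing the request.
    do_request is a stand-in for the actual HTTP call."""
    with SITE_LIMITS[site]:
        return do_request(site, title)

def scrape_all(jobs, do_request, workers=8):
    """Run all (site, title) jobs on a shared pool; the semaphores keep
    any single provider from being hammered."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(fetch, s, t, do_request) for s, t in jobs]
        return [f.result() for f in futures]
```

Because the semaphores are per site, overlapping movie, TV and music jobs naturally spreads the load across providers instead of stacking it on one.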
#12
(2014-06-23, 17:20)ironic_monkey Wrote: be careful. it will be a very efficient way to be banned from metadata providers.

Understood, but at least we should scan for movies and shows/episodes at the same time and request their metadata concurrently (the provider is always(?) different between TV and movies ...).

Another possible big improvement would be to update the skin/GUI as soon as a new item is found. At the moment it appears to refresh the UI only once, when the library update is complete.
#13
@vdrfan currently not on irc. still too pissed off at osx nonsense to guarantee a cool head. when i get back i'm either confused_hamster or karsk, depending on the box i'm using..
#14
(2014-06-23, 17:52)ironic_monkey Wrote: @vdrfan currently not on irc. still too pissed off at osx nonsense to guarantee a cool head. when i get back i'm either confused_hamster or karsk, depending on the box i'm using..

OK ;)
#15
(2014-06-23, 14:22)tripkip Wrote: Possible improvements that have been outlined in this thread so far:
  • Scan directories and calculate hashes simultaneously for each library (movies, TV shows, ...)
  • Send concurrent requests to metadata providers (define API limits for each service)
  • Update the UI for each newly found item (after its metadata is received)

Yeah... here's why it's slow: bloat. The video scanner is 2100 lines and does everything, from filesystem access to database work, hashing, GUI updates and metadata fetching, all intertwined. It's impossible to say anything about which parts are slow. The place to start is removing stuff, not adding more hashing and threading based on guesses.
