Solved Updating Movies sources takes *forever*
#1
I hope I am doing something wrong here, but whenever I update the Movies data sources (using the button above the main movie list) to detect new movies to scrape, it takes absolutely forever to finish. I understand the part of the process where it looks through the source folders for something new; what I don't understand is the step that follows, "updating MediaInfo & cleanup". It is clearly working hard at something, because my CPU utilization stays high throughout that step.

Now, I understand I have a lot of movies (almost 2000), and once the update is done everything works great. It just seems like a long time to update the list every time I want to add a new movie to the database. It looks like it updates MediaInfo (whatever that entails) for every single movie, rather than just the new ones, every time the sources are updated.

When I say it takes forever I mean around 15-20 minutes total, which certainly seems like forever! I've looked through the settings and I don't think I have anything unnecessary checked but I want to make sure this is normal behavior for TMM.
Client: XBMCuntu Frodo - ASRock ION 330 Pro - Logitech Harmony One
Server: 4U NORCO RPC-4220 20-drive case - UnRAID 5.0 - 38 TB parity-protected storage
#2
I agree that updating the sources takes a long time because of the media info gathering. I'm guessing it reads the NFO and image files on every update to see if there are changes. Maybe add an option to disable that part of the source update (only read NFO/image files for newly added items)? For me, the only time there will be a change is when I re-scrape the movie from tMM.
#3
At least I am not the only one. For reference, I started the update process before I even wrote my earlier post, and it is still going!
#4
The update datasource process should only read MediaInfo for files where nothing has been gathered before.
So when you re-scan your movie sources, everything is checked for availability, but only new files are scanned with MediaInfo (at least that is the way it should be done ;) ).

I just did a re-scan of my NAS (about 500 movies over a "slow" SMB connection) and it finished within 2 minutes. MediaInfo was only gathered for new movies.
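The rule described here (check every file for availability, but run MediaInfo only on files that have not been analyzed yet) can be sketched roughly as follows. This is a minimal illustration, not tMM's actual importer: it assumes a hypothetical cache mapping each file path to the size and modification time recorded at the last MediaInfo run, and it additionally treats a file whose size or mtime changed as needing a new MediaInfo pass (that change check is an assumption, not something stated in this thread).

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class FileStat:
    """Hypothetical per-file fingerprint remembered between scans."""
    size: int
    mtime: float


def plan_update(found: Dict[str, FileStat],
                known: Dict[str, FileStat]) -> Tuple[List[str], List[str]]:
    """Split a datasource re-scan into MediaInfo work and cleanup work.

    found: files seen on disk during this scan (every file is checked for availability)
    known: files already in the database, with the fingerprint stored at the last MediaInfo run
    Returns (paths that need MediaInfo, paths that vanished and should be cleaned up).
    """
    # New files, or files whose size/mtime differ from the stored fingerprint (assumed rule)
    needs_mediainfo = sorted(p for p, stat in found.items() if known.get(p) != stat)
    # Files still in the database but no longer present on disk
    vanished = sorted(p for p in known if p not in found)
    return needs_mediainfo, vanished
```

In this sketch the cache has to be written back after each MediaInfo run; otherwise the same files would land in `needs_mediainfo` on every pass.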

Could you send a bug report after a re-scan? I may find a hint in the logs (maybe the external MediaInfo lib gets stuck on one or more malformed files).
tinyMediaManager - THE media manager of your choice - available for Windows, macOS and Linux
Help us translate tinyMediaManager at Weblate | Translations at 66%
Found a bug or want to submit a feature request? Contact us at GitLab
#5
Well... it's not as "easy" as you might think.
It's not just "scan the movie datasource for a new folder and import it".
Unlike XBMC, we detect new files even IF we already have the movie in our database ;)

We detect
- movies in their own folder
- more than one movie in a single folder, creating an independent movie for each
- movies that are split across multiple folders (like CD1/CD2/subs/sample folders), grouping them into a single movie
- all of the above AT THE SAME TIME, regardless of how deep your subdirectories go
- the right movie name out of this mess, so you can scrape without changing the movie title
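To make the split-folder case concrete, here is a minimal sketch of grouping files from CD1/CD2-style and subs/sample subfolders under one movie folder. The marker regex and the list of helper-folder names are assumptions for illustration, not tMM's actual detection rules, and the sketch ignores the multiple-movies-per-folder and title-detection cases entirely.

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, List

# Assumed markers for split-movie folders (CD1, Disc 2, part3, ...)
STACK_DIR = re.compile(r"^(cd|dvd|disc|disk|part)\s*\d+$", re.IGNORECASE)
# Assumed names of helper folders that belong to the movie above them
EXTRA_DIRS = {"subs", "subtitles", "sample", "extras"}


def movie_folder(file_path: PurePosixPath) -> PurePosixPath:
    """Walk up past stacking/helper folders to the folder that represents the movie."""
    folder = file_path.parent
    while folder.name and (STACK_DIR.match(folder.name)
                           or folder.name.lower() in EXTRA_DIRS):
        folder = folder.parent
    return folder


def group_files(paths: List[str]) -> Dict[str, List[str]]:
    """Group file paths into one bucket per detected movie folder."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for p in paths:
        groups[str(movie_folder(PurePosixPath(p)))].append(p)
    return dict(groups)
```

With this sketch, `/movies/Alien (1979)/CD1/alien.cd1.avi`, `/movies/Alien (1979)/CD2/alien.cd2.avi` and `/movies/Alien (1979)/Subs/alien.srt` all end up under `/movies/Alien (1979)`. Windows paths would need `PureWindowsPath` instead of `PurePosixPath`.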

This is kind of unique, I guess :)

Nevertheless, I implemented a small change: if you do not scan for "multiple movies in one folder" (which is off by default), the update should be a little bit faster again.

So what is better?
Waiting 20 minutes while we detect and clean up your weird folder structure,
or detecting only half of the movies, or detecting them "wrongly" so you can't scrape out of the box (because you got a CD1 movie)?

I have already prepared a testbed with a weird movie structure.
We can certainly compete with any other media manager around ;)
tinyMediaManager - THE media manager of your choice :)
Wanna help translate TMM ?
#6
I just added a single movie to my library and the total process to update the sources in TMM took 56 minutes. A bit longer than even I thought it would take. I even measured how long each part of the process takes.

I have 2 different sources that are scanned at the same time. Source 1 (more movies) took 14 minutes to scan and 28 minutes on "updating MediaInfo & cleanup". Source 2 (fewer movies) took about 2 minutes to scan and 12 minutes on "updating MediaInfo & cleanup".

I'm also running off a NAS with "slow" SMB shares but that doesn't seem to be the limiting factor here. During the "scan" process there is plenty of drive activity on my NAS, during the MediaInfo and cleanup phase there is almost no activity.

Here is a copy of the logs that TMM produced during this update.

By comparison, EMM took 2 minutes and 20 seconds to update its entire library, including scanning all my TV shows which I didn't do in TMM. I'm not sure where the slowdown is but something certainly seems to be wrong and any help getting it sped up would be much appreciated. I am really enjoying using TMM for my movies, but an almost hour update time is a bit of an issue that I can't overlook.

Also, I don't use any strange structures for my movies. It is as simple as it gets, with one movie (and associated files) per folder with no multiple CDs or anything like that.
#7
Thanks for the logs.
There are some strange things in them. First of all, each movie directory needs about 600 ms to be scanned; on my NAS that takes 50 ms, and on my HDD about 20-30 ms. Do you really have such a slow NAS?
If you calculate that for all movies: 2000 × 600 ms = 1200 s ≈ 20 minutes. I will investigate whether we can speed up our (mighty) import logic.
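The estimate of 2000 × 600 ms is plain multiplication; written out as a tiny hypothetical helper, it also shows how the other per-directory times quoted in this post would scale to the same 2000-movie library. The 25 ms value for the local HDD is simply the midpoint of the quoted 20-30 ms.

```python
def total_scan_minutes(movie_dirs: int, ms_per_dir: float) -> float:
    """Estimated minutes to scan `movie_dirs` folders at a fixed per-folder cost."""
    return movie_dirs * ms_per_dir / 1000 / 60


# Per-directory costs quoted in this post (25 ms = midpoint of 20-30 ms)
for label, ms in [("reporter's NAS", 600), ("developer's NAS", 50), ("local HDD", 25)]:
    print(f"{label}: {total_scan_minutes(2000, ms):.1f} min")
```

This linear model assumes a constant cost per folder; if per-movie scan times grow as the library gets larger, the real total will be higher than this estimate.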

Second thing: on your last import, MediaInfo was fetched for only a few files:
Datasource 1 (V:\Movies) - 12 files
The Artist (2011) 720p.srt => 56sec
30 Minutes or Less (2011) 480p-fanart.jpg => 2min28sec
The Dark Knight Rises (2012) 1080p.srt => 1min42sec
... (didn't copy out all the files ;) )

Datasource 2 (V:\Alex Movie) - 3 files
Monsters.University.2013.1080p.BluRay.x264.YIFY.mp4 => 4min30sec
Alien Planet (2005) 480p-poster.jpg => 1min24sec
John Carter (2012) 1080p.srt => 2min28sec

which is awfully slow. But since we know the slowness can come either from MediaInfo (which is really fast on non-"damaged" files) or from the network/NAS itself, I would say it's not tMM's fault ;)
Were you doing anything else on the NAS? Was it busy in some way (streaming or copying content)?

all in all your update datasource task took 52min according to my logs.

my NAS (~500 movies) took 1min12sec (using the really slow GVFS - SMB connection in linux)
#8
Another thing you could try:
I see you have all your movies on a mapped drive, V:\ (NAS).
Can you add your NAS share directly to TMM? \\nas\movies

Or:
Can you check the processes running on your NAS?
Some media indexing in the background?
Either in the GUI... or with the 'top' command over ssh?
3 secs for MediaInfo on an image? C'mon....
#9
The NAS is plenty fast on other functions, it is an i3 machine running UnRAID with nothing else going on at the time. Because of the way UnRAID works, it basically acts like a Linux box running Samba sharing from drives using ReiserFS. I get about 60 MB/s read speeds over my gigabit LAN. Like I mentioned before, during the scan process there is plenty of disk activity on the NAS but almost none during the MediaInfo and cleanup stage, so I don't think the NAS is the bottleneck. Also, if the NAS were the problem I would expect EMM to take much longer than it does. I have been using EMM on the same NAS with the same sources for more than a year and it has never taken more than 3 minutes to do any updating operation, even when adding entire TV shows etc.

I will try adding the sources using the actual network path rather than the mapped drives on the Windows box TMM is running on, and see if that makes a difference. While doing that, I noticed it takes a pretty long time to remove the existing source from TMM and have it clean up the database: easily over 5 minutes to remove the source with more movies in it. Should removing a source take that long?

EDIT:

Well, I think I have pretty much ruled out the NAS as the issue. I installed TMM on an Apple laptop I have around, set up the same settings, added the same sources, and it is *MASSIVELY* faster than on my Windows machine, even though the laptop is much less powerful and running over wireless. Not sure what is going on yet, but it seems to be isolated to my Windows machine, which is really where I want to handle all the media. I'm not sure exactly what would need to be fixed on the Windows machine; maybe Java? Any help you guys can give me would be appreciated.

EDIT 2:

I completely uninstalled TMM and reinstalled fresh and it seems much better for some reason. It is re-adding my sources much faster now. I will still need to test the update sources function when I add a new movie but right now it seems to be much better than it was. Sorry for the non-issue but I really appreciate your responses!

EDIT 3:

Seems I may have spoken too soon. It is certainly faster than before after the reinstall but still takes quite a long time (20+ minutes) to run through the source update process, even with no new content added. If the rate at which the movie names go by on the progress indicator is anywhere near accurate, the scanning process seems to slow down significantly as the library gets larger. Per-movie scan times get longer and longer as TMM scans in more than 1000 movies, so maybe it has something to do with the library size, which would explain why I see it on my 2000 movie library and you don't on your 500 movie library.
#10
We use ObjectDB as the database engine, which is a highly optimized, industry-standard database, so I can't imagine the DB is the problem here.

Could you do the initial import and update on your Apple machine? Does that also take so long? I have to find out where I can get such a big test database :D
Meanwhile we have implemented two optimizations, but 20 minutes is simply way too long...
#11
I just finished another round of testing, the times between the Windows machine and the Apple one seem about the same. For some reason the total update time (without adding any new content) is down to around 12 minutes on both. Not sure what changed, and 12 minutes isn't really that bad but I think, at least in my personal case, it could be faster.

The total time to scan my two sources, not counting the "MediaInfo and cleanup" phase, is only about 5.5 minutes, meaning the MediaInfo phase takes another 6.5 minutes. Looking at the log, on my 'big' source it only handles 15 files, with times ranging from 1 second to 1 minute 20 seconds. That's strange, because I know these same files have been scanned before, and some of them were even created by TMM itself (some fanart images). Weirder still, most of the longest scans are on .srt subtitle files. The 1 minute 20 second scan was on the 145 KB subtitle file for The Dark Knight Rises. The file itself is fine: I can read it perfectly in a text editor, and TMM picks it up as the associated subtitle file for that movie just fine. But 1m20s to scan a 145 KB file tells me something is busted. What exactly, I don't know yet, but it seems that if I could clear up the MediaInfo scanning delays, TMM would run nice and quick for me.

UPDATE:

I just did a quick re-scan of my 'small' movie share and looked at the logs again. TMM is updating MediaInfo for exactly the same 3 files, even though I had just updated them through a full scan. I didn't even shut down TMM in between scans. Not sure why, but it seems to either forget that it scanned those files or think they are different every time. One of the files is an .srt that was there before I started using TMM; the other two are image files previously created by TMM itself.
#12
Okay, we're getting closer to the problem. MediaInfo should definitely not take this long to scan a text file...

Could you try it with a newer version of MediaInfo?
32bit download:
http://sourceforge.net/projects/mediainf...z/download

64bit download:
http://sourceforge.net/projects/mediainf...z/download

Extract the DLL(s) you need from the download and copy them into the corresponding folder under native in the tmm install dir,
e.g.
tmm\native\windows-x64\MediaInfo.dll

Do not forget to back up the bundled lib first ;)
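The manual swap (back up the bundled library, then drop in the newer DLL) could be scripted along these lines. It is only a sketch: the install directory and the downloaded DLL location are placeholders you have to supply, and only the `windows-x64` folder name comes from this post, so check the actual folder name under native for a 32-bit install before passing a different `arch`.

```python
import shutil
from pathlib import Path
from typing import Tuple


def mediainfo_paths(tmm_dir: Path, arch: str = "windows-x64") -> Tuple[Path, Path]:
    """Return (bundled DLL path, backup path) inside the tmm install dir."""
    target = tmm_dir / "native" / arch / "MediaInfo.dll"
    return target, target.with_name("MediaInfo.dll.bak")


def swap_mediainfo(tmm_dir: Path, new_dll: Path, arch: str = "windows-x64") -> Path:
    """Back up the bundled MediaInfo.dll once, then overwrite it with new_dll.

    Returns the backup path so the original can be restored later.
    """
    target, backup = mediainfo_paths(tmm_dir, arch)
    if not target.exists():
        raise FileNotFoundError(f"no bundled MediaInfo.dll at {target}")
    if not backup.exists():          # never overwrite an earlier backup
        shutil.copy2(target, backup)
    shutil.copy2(new_dll, target)    # replace the bundled library
    return backup


def restore_mediainfo(tmm_dir: Path, arch: str = "windows-x64") -> None:
    """Copy the backed-up DLL back over the replaced one."""
    target, backup = mediainfo_paths(tmm_dir, arch)
    shutil.copy2(backup, target)
```

Close tmm before running `swap_mediainfo`, since Windows generally will not let you overwrite a DLL that a running program has loaded.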

Meanwhile, I will investigate why these files are scanned multiple times.
#13
A little update:

I ran a few hours of performance tests against my NAS from two different computers (a netbook on 100 Mbit LAN and my PC on 1 Gbit LAN).

Here are the results (based on the current nightly build, ~500 movies, at least 2 images per movie, some with more images and subtitles):
netbook:
initial import ~18 min (~1 min scanning all files, ~2 min cleanup, ~14 min getting MediaInfo for the files)
update without changes ~50 sec

PC:
initial import ~3 min 30 sec (~15 sec scanning all files, ~6 sec cleanup, ~3 min getting MediaInfo)
update without changes ~30 sec


I also watched the task manager while the update ran... getting MediaInfo needs some CPU time; that's why the netbook is so much slower than the PC.
You will also see that getting MediaInfo on the first import is a huge task, but when you look at the media information tab (or media files tab) you see the benefit: you always see all relevant information (resolutions, codecs, ...) for every movie file (not only video files, but also images, subtitles, extra audio streams...).
And it is no wonder that tmm takes longer than EMM: we do even more checks/cleanups on every update. We also gather media information from every possible file to create the best possible experience for our users! Such operations take time.

Coming back to your example: about 600 ms for scanning a file on your datasource is really slow! Even my netbook on 100 Mbit is much faster here (1 min / 500 ≈ 120 ms per movie). Getting MediaInfo the first time is a long-running task, but it is only needed once (with our fixes, subtitles are only parsed once).

I hope the next version will improve your updating speeds!

hth
Manuel
#14
2.4.3 is out and fixes some problematic pieces of code in the importer.

Worked flawlessly here (a rescan of 500 movies took less than a minute).
#15
(2013-10-13, 09:25)mlaggner Wrote: okay - we're getting closer to the problem. mediainfo should definitely not taking this long to scan a text file...



Off-topic question: can tMM run on 64-bit Java?
