Kodi Community Forum

Identifying duplicate movies
Hello all,

To be clear, my post is about people who actually have multiple copies of the same content, under different filenames or in different formats across their filesystem. It is not about people who have only one copy of a file that gets listed multiple times (I've had that problem too).
I'm curious what the best way to find such files within XBMC would be.
Example:
S:\Movies\Ghostbusters (1984).mkv
and
H:\Movies\Ghostbusters.AVI

Yes, you'll see the movie listed twice in the movie view, but that means scrolling through your entire collection alphabetically. If you've got a large collection, this is not ideal.
Presumably, the best way to identify duplicates would be based on what each file matches to in the database; for example, both copies above would match to http://www.imdb.com/title/tt0087332/

I know this has been discussed a few times; is there a definitive solution yet? (I realise it's "the user's fault", of course.) But it would be nice if there were an easy fix, a plugin or something.
There's an add-on that does this (although I can't recall its name), or there's the script in my sig: use the "duplicates" option. The script matches movies based on imdbnumber.
I can't seem to find one, and I did search. I'm also not entirely sure how to get yours running; I looked at it once and it seemed awfully complex to install.
@AbRASiON

I prefer to use separate tools for this kind of thing.

I've written a powerful library management tool called VideoExplorer, that you can download free here:

http://forum.xbmc.org/showthread.php?tid=181785

That will allow you to rapidly identify duplicates in your collection because you can navigate very fast and group and sort on the main grid.

You've actually got me thinking about adding an explicit feature to find duplicates for you. I will think about that and may well add it.
@AbRASiON

I'm releasing v3.0.0.31 of VideoExplorer now and I've built in a feature to isolate duplicates.

If you load your entire library into VE now and then right click the main grid, you will find an option on the menu called "Show Duplicates Only".

That will do what you are looking for instantly.

From my release notes:

2. Added new "Show Duplicates Only" option to the right click menu on the main grid.

This will filter the items currently on the grid to show only items that have duplicates.

This uses the metadata TITLE tag from each movie's nfo file to determine duplication, and it ignores trailers and any movie whose title is NULL or an empty string.
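For anyone keeping nfo files alongside their media, here is a minimal Python sketch of title-based duplicate detection in the spirit of the release note above. It is not VideoExplorer's code: it assumes each movie has an XML .nfo file with a `<title>` element, the function names are hypothetical, and trailer handling is left out.

```python
import os
import xml.etree.ElementTree as ET
from collections import defaultdict


def read_nfo_title(nfo_path):
    """Return the stripped <title> text of an XML .nfo file, or None."""
    try:
        root = ET.parse(nfo_path).getroot()
    except (ET.ParseError, OSError):
        return None  # unreadable, or a plain-text .nfo containing only a URL
    title = root.findtext("title")
    return title.strip() if title else None


def find_duplicates_by_nfo_title(library_root):
    """Walk library_root and return {title: [nfo paths]} for titles seen more than once."""
    groups = defaultdict(list)
    for folder, _subdirs, files in os.walk(library_root):
        for name in files:
            if not name.lower().endswith(".nfo"):
                continue
            path = os.path.join(folder, name)
            title = read_nfo_title(path)
            if title:  # skip missing or empty titles
                groups[title].append(path)
    return {title: paths for title, paths in groups.items() if len(paths) > 1}
```

Usage would be something like `find_duplicates_by_nfo_title(r"S:\Movies")`, which returns only the titles that appear in more than one nfo file.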
Hi thanks,

So it identifies duplicates based on the TITLE. I don't use NFO files; I use a MySQL database for my library (2,100 movies). I take it that if "Lord of the Rings: Return of the King" is listed in the database twice

C:\Return of the King.AVI
and
X:\Movies\Lord of the Rings - Return of the King.MPG

it will identify that as a dupe? (We're going to assume, for the sake of this thread, that my movies are actually matched correctly, of course.)
Also, the logical thing to do is to present the information in a format that includes the movie's filename somehow, even if that requires pulling up more information. I mean, I'm not going to be stupid enough to blindly delete based on just your program; I'm going to double-check :) but it'd be nice if I didn't have to alt-tab to confirm, right?
@AbRASiON

1. My app does not look at the MySQL database at all, it works with the source material.

I recommend that you use the XBMC export option (to separate files) and export nfo files for all your movies to the movie folders, which is good practice anyway, and then do your maintenance using VideoExplorer, outside of XBMC.

XBMC is a window onto your media and a playback system; it is not a video, file or library maintenance system. That is why apps like VideoExplorer, Ember Media Manager, etc. are necessary and so useful.

Always work with nfo files in the folders and artwork on disk if you want full control of your library and if you want the maintenance and tidying up of your media to persist regardless of the playback platform you choose (XBMC in this case).

After you have deleted the duplicates as per the information VideoExplorer provides you with, you can "Clean" your library from within XBMC and XBMC will discard any movies in the library that are no longer present on disk.

2. The primary identifier for any video in VideoExplorer is the full path to the file, including the file name. Right click the headers on the main grid to choose columns and activate grouping.

I'm therefore confused by your comment about "the logical thing to do". VideoExplorer gives you every possible piece of information about your individual video files that you could ever possibly need to make a decision.

The right click menu on the main grid in VideoExplorer also offers an "Open Host Folder" option, which will take you directly to the folder containing the actual files if you want, but that should not be necessary because everything you need to know is already on the main grid.

3. I chose to use the TITLE tag because it's the best way to find duplicates in a library where some nfo files have IMDB information embedded and others have TheMovieDB information embedded, so things like the movie URL and ID may differ for two copies of the exact same movie in two different folders with two different nfo files.
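The point about IDs can be shown with a toy example. The two records below are hypothetical nfo contents for the same film scraped from different sites (the TheMovieDB-style ID is illustrative), and `group_by` is a hypothetical helper: grouping by ID finds nothing, while grouping by title pairs them up.

```python
from collections import defaultdict


def group_by(records, key):
    """Return {key_value: [paths]} for key values shared by more than one record."""
    groups = defaultdict(list)
    for rec in records:
        value = rec.get(key)
        if value:  # ignore records where the key is missing or empty
            groups[value].append(rec["path"])
    return {k: v for k, v in groups.items() if len(v) > 1}


# Hypothetical nfo contents: the same film, scraped from two different sites.
records = [
    {"path": "/movies/a/movie.nfo", "title": "Ghostbusters", "id": "tt0087332"},  # IMDB-style ID
    {"path": "/movies/b/movie.nfo", "title": "Ghostbusters", "id": "620"},        # TheMovieDB-style ID
]

print(group_by(records, "id"))     # → {}  (the IDs come from different sources)
print(group_by(records, "title"))  # both paths grouped under "Ghostbusters"
```

The trade-off is that title matching can also flag different films that share a title, such as a remake, so checking the year or the file paths before deleting anything is still worthwhile.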

Bottom line: Don't try and manage your library in XBMC when it comes to things like this, where you are managing the physical on-disk media, finding duplicates, deleting items, etc.

It's far better, and easier, to use a 3rd party application dedicated to this.

The other obvious advantage of using VideoExplorer in this case is that it does a full MediaInfo interrogation of every movie it loads, up front, and shows all that information in one big list on the grid, making it incredibly easy to compare movies.

In the case of duplicates, it will therefore be very easy for you to see, on the grid, the full path names of every movie file found, and all metadata for each movie, including bitrates, file sizes, video formats, audio formats etc., all right there on the grid, one row below the other. Which means comparison of the different video files is never going to get easier than VideoExplorer makes it, or more comprehensive.
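Once a duplicate group is found, something has to decide which copy to keep. A crude first pass, sketched below on the assumption that a bigger file is usually the better encode, is to sort the copies by file size. It is no substitute for the full MediaInfo comparison described above (resolution, codecs, audio streams), and `rank_copies_by_size` is a hypothetical helper.

```python
import os


def rank_copies_by_size(paths):
    """Return (path, size_in_bytes) pairs, largest file first.

    File size is only a rough proxy for quality; resolution, codecs and
    audio streams (e.g. from MediaInfo) should decide the final call.
    """
    sized = [(path, os.path.getsize(path)) for path in paths]
    return sorted(sized, key=lambda item: item[1], reverse=True)
```

Feeding it the paths from one duplicate group lists the largest copy first, which is the one a size-only heuristic would keep.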
Is there one which looks at the XBMC database and/or the MySQL database?
Why is having an NFO file for all my movies good practice?

You're definitely correct that maintenance and corrections belong on the filesystem itself; otherwise, if you lose your DB, all the corrections you made are gone and it will misidentify things again.
@AbRASiON

There are really multiple reasons why you don't want to use the XBMC database for this kind of thing, but key issues are:

1. The concept of "Source of Truth". If you are doing file-level maintenance and cleanup, you really want to be working with the files themselves, because they are the absolute source of truth. You don't want to be working via an abstraction layer like the XBMC database, because that database could actually be lying to you about what's on your disks at any given point in time, for various reasons.

Furthermore, if you are doing duplication analysis, isolating duplicates and deleting the poorer-quality ones, you definitely don't want to rely on the XBMC database, because it might have old, out-of-date info in it. You NEED to see 100% correct audio and video stream information for what's physically on disk in order to make a proper decision about what to keep and what to delete.

2. Following on from point 1 above, the XBMC database can very easily lie to you. Right now I know for a fact that several movies in my XBMC database have incorrect metadata stored in that database, because I've upgraded those movies from 720p versions to 1080p versions with better-quality audio streams, but I haven't had the time to force XBMC to re-import my entire library. Unfortunately, at the moment you can only force-refresh one movie at a time in XBMC; you can't refresh in batches. I'm hoping Gotham improves that situation.

3. As you've already stated, if you lose your database, or if you want to use your media with some other app or playback system, you want all your metadata and your custom updates and cleanup to be retained.

NFO files are the accepted standard way of storing media metadata on disk with the media, and many applications and systems will honor NFO files and know how to read and interpret them, particularly the type created by XBMC.

It is therefore an excellent idea to let XBMC import your library and scrape as much info as it can from the internet about your media, and then immediately use the built-in Export functionality in XBMC (Settings) to export all that metadata and the artwork into the folders, with the media, so that you have permanent hard copies of everything in the folders.

Once you have that, you don't have to worry about losing the XBMC database, or if you need to do a fresh XBMC install for some reason and start from scratch you can just tell XBMC to use the local metadata and not re-scrape everything from the internet. And, 3rd party apps can use that on disk metadata too, without needing any knowledge of the XBMC database at all.
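Point 1 above (the disk is the source of truth) can be illustrated with a small check that asks the filesystem, rather than the database, whether each library entry still exists. The input is assumed to be a list of file paths exported from a library, and `find_missing_files` is a hypothetical name.

```python
import os


def find_missing_files(library_paths):
    """Return the library paths that no longer exist on disk.

    library_paths is assumed to be a list of file paths taken from a media
    database; checking each one against the filesystem treats the disk,
    not the database, as the source of truth.
    """
    return [path for path in library_paths if not os.path.isfile(path)]
```

Any path it returns is an entry the database still believes in but the disk no longer has, which is exactly what a library "Clean" is meant to remove.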

I personally like to maintain a strict separation of concerns, especially with a big library like yours.

You want to do all your on disk file maintenance and cleanup in a nice 3rd party app built for that. Then you go into XBMC and re-add your source and tell it to scrape the local metadata if it can, and it will import:

a. Much faster.

b. Exactly what you have put in the folders on disk. So you can explicitly control what artwork it uses and what metadata it uses.

You have absolute control that way.

And I like absolute control.

The source of truth is what is on the disk. The XBMC database belongs to XBMC and in my opinion should never be used by anything else to determine anything. An application like VideoExplorer MUST go directly to the source to guarantee you that it's showing you accurate and current information.

On that note, if you want to make absolutely sure that VideoExplorer is showing you 100% accurate information on every file it loads, you should untick the "Use Info Cache" tickbox in VE too; that will force VE to re-interrogate every single file and not use its own cache.

But that's only necessary if you are replacing files with files that have different audio and video properties and are not changing the name and location of the files.
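How VideoExplorer's info cache is keyed isn't described here, but the caveat above is consistent with a cache keyed on the file path alone: replace a file in place and the unchanged path still hits the old entry. A hypothetical alternative, sketched below, folds the file size and modification time into the cache key so an in-place replacement forces a fresh probe; `probe` stands in for whatever actually reads the stream info (MediaInfo, for example).

```python
import os


def file_fingerprint(path):
    """Cache key that changes when a file is replaced in place.

    A path-only key would keep serving old stream info after, say, a 720p
    file is swapped for a 1080p one under the same name; size and
    modification time catch that case.
    """
    st = os.stat(path)
    return (os.path.abspath(path), st.st_size, st.st_mtime_ns)


class InfoCache:
    """Hypothetical media-info cache; `probe` is any callable taking a path."""

    def __init__(self, probe):
        self._probe = probe
        self._entries = {}  # old fingerprints are never evicted in this sketch

    def get(self, path):
        key = file_fingerprint(path)
        if key not in self._entries:
            self._entries[key] = self._probe(path)
        return self._entries[key]
```

The cost of this design is one `stat` call per file on every load, which is far cheaper than re-reading the streams of an unchanged file.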
Hi all,

I'm still running into duplicate movies and I'd love a way to list them, EVEN IF IT'S A MYSQL COMMAND (I do use XBMC/Kodi via MySQL).
I realise that, as discussed earlier in this thread, Kodi's matching of movies may lead to incorrect results, where it has misidentified a movie and I do NOT in fact have a duplicate.

That being said, with 2,300 movies it'd be nice to see a list regardless; at least then I'd know what to search for.
What's the unique identifier for a movie in the Kodi MySQL database (IMDB code? File location on the filesystem?), and is there a MySQL command to identify identical entries?
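The general shape of such a query is a GROUP BY on the identifier with HAVING COUNT(*) > 1. The sketch below demonstrates it with Python's built-in sqlite3 module against a deliberately simplified, hypothetical `movie` table. Kodi's real video database splits titles, files and paths across several tables and its column names vary between versions, so the table and column names here would need adapting.

```python
import sqlite3

# Hypothetical, simplified schema for illustration only.
SCHEMA = """
CREATE TABLE movie (id INTEGER PRIMARY KEY, title TEXT, imdb_id TEXT, path TEXT);
"""

# One row per IMDB ID that appears more than once, with all of its paths.
DUPLICATES_SQL = """
SELECT imdb_id, COUNT(*) AS copies, GROUP_CONCAT(path, ' | ') AS paths
FROM movie
WHERE imdb_id IS NOT NULL AND imdb_id <> ''
GROUP BY imdb_id
HAVING COUNT(*) > 1
ORDER BY imdb_id;
"""


def list_duplicates(conn):
    return conn.execute(DUPLICATES_SQL).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO movie (title, imdb_id, path) VALUES (?, ?, ?)",
        [
            ("Ghostbusters", "tt0087332", r"S:\Movies\Ghostbusters (1984).mkv"),
            ("Ghostbusters", "tt0087332", r"H:\Movies\Ghostbusters.AVI"),
            ("Alien", "tt0078748", r"S:\Movies\Alien (1979).mkv"),
        ],
    )
    for row in list_duplicates(conn):
        print(row)
```

In MySQL the same query needs one change: the separator is written as `GROUP_CONCAT(path SEPARATOR ' | ')`.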

Anyone?
Here's a command that works and will list all your duplicate movies:

Code:
texturecache.py duplicates

Much easier than a SQL query. I gave you this solution 6 months ago.
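For anyone who would rather not install a tool at all, imdbnumber-based matching can also be sketched against Kodi's JSON-RPC interface, which reads the library (MySQL-backed or not) through Kodi itself. This assumes Kodi's web server/remote control option is enabled and reachable at the URL shown, with no password. `VideoLibrary.GetMovies` and the `title`, `imdbnumber` and `file` properties are part of Kodi's JSON-RPC API; the rest is a hypothetical sketch, not how texturecache.py is implemented.

```python
import json
import urllib.request
from collections import defaultdict

# Assumption: Kodi's web server is enabled at this address (8080 is the usual
# default port); adjust host and port, and add authentication if one is set.
KODI_URL = "http://localhost:8080/jsonrpc"


def fetch_movies(url=KODI_URL):
    """Ask Kodi for every movie's title, IMDB number and file path."""
    payload = {
        "jsonrpc": "2.0",
        "method": "VideoLibrary.GetMovies",
        "params": {"properties": ["title", "imdbnumber", "file"]},
        "id": 1,
    }
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def duplicates_from_response(response):
    """Group a GetMovies result by imdbnumber, keeping IDs with several files."""
    groups = defaultdict(list)
    for movie in response.get("result", {}).get("movies", []):
        if movie.get("imdbnumber"):  # unscraped movies have no ID to match on
            groups[movie["imdbnumber"]].append(movie["file"])
    return {imdb_id: files for imdb_id, files in groups.items() if len(files) > 1}
```

Usage would be `duplicates_from_response(fetch_movies())`, which returns each duplicated ID with the file paths behind it, so the copies can be checked by hand before anything is deleted.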
Use the Find Duplicates option on the right click menu in VideoExplorer, as explained in detail above.

Get it here: www.code4effect.com

As already explained in detail above, there are numerous reasons you do not want to use the MySQL database for this.

Load the entire lib into VideoExplorer and let it find them for you. It couldn't be easier.

Assuming of course that you are on a Windows PC that can access your library.
(2015-10-22, 03:14) Milhouse wrote: Here's a command that works and will list all your duplicate movies:

Code:
texturecache.py duplicates

Much easier than a SQL query. I gave you this solution 6 months ago.

The texture cache maintenance tool is extremely complicated and confusing :/ hence me looking for a simpler tool to identify them (although the texture cache tool does identify them correctly, by IMDB reference number).
EDIT: I do appreciate the offer of help, of course, but that tool feels like building a rocket ship to hammer in a nail.
(2015-10-22, 07:27) SiliconKid wrote: Use the Find Duplicates option on the right click menu in VideoExplorer, as explained in detail above.

Get it here: www.code4effect.com

As already explained in detail above, there are numerous reasons you do not want to use the MySQL database for this.

Load the entire lib into VideoExplorer and let it find them for you. It couldn't be easier.

Assuming of course that you are on a Windows PC that can access your library.

VideoExplorer needs to match all my movies too, and it may also make mistakes in identifying things.
A MySQL query would let me find the dupes and then investigate them manually.
Yes, because a SQL query is so much less complicated or confusing...