TMDb to introduce movie hashing search
#16
phash.org looks like a more reasonable approach.
#17
phash.org seems really nice. Has anybody tried it in the real world? I don't know about the implementation, but it should work.

For the "opensubtitles hash", I suggest storing the submitter's IP address in the database only as a hash, e.g. MD5(MD5(IP)), so the address itself is not kept in the database. That way you can still get rid of duplicate posts from the same user.
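
A minimal sketch of that idea in Python, in case it helps. The names `anonymize_ip` and `SubmissionLog` are made up for illustration, and hex-encoding the inner digest before the second MD5 pass is my assumption about what MD5(MD5(IP)) means here.

```python
import hashlib


def anonymize_ip(ip: str) -> str:
    """Return MD5(MD5(ip)) as a 32-character hex string.

    Assumption: the inner digest is hex-encoded before being hashed again.
    """
    inner = hashlib.md5(ip.encode("ascii")).hexdigest()
    return hashlib.md5(inner.encode("ascii")).hexdigest()


class SubmissionLog:
    """Hypothetical in-memory stand-in for the database table."""

    def __init__(self):
        self._seen = set()

    def accept(self, ip: str, movie_hash: str) -> bool:
        """Return True the first time an IP submits a hash, False on repeats."""
        key = (anonymize_ip(ip), movie_hash)
        if key in self._seen:
            return False
        self._seen.add(key)
        return True
```

One caveat: the IPv4 address space is small, so an unsalted hash like this can be brute-forced by hashing every possible address. Mixing in a secret salt would make that much harder.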
#18
As a note, TheTVDB is planning on doing this during our rewrite as well. A few points to make about it:

1. It's completely optional, meaning that projects or individuals that don't want to make use of it don't have to.
2. It's 100% accurate. The only other method that can claim this is the NFO one, which requires that users hang on to the NFO files for their downloaded files.
3. Not all downloaded files are going to be copyright infringements. Shows downloaded from iTunes and similar will have the same hash and are completely legitimate.
4. It's fast and automatic. One of our admins ran a test with his users (he has an MCE plugin) and found the algorithm was able to process over 10 files per second (IIRC). It requires no user intervention, which is important.
5. It can't lead to copyright lawsuits. Remember, in almost every country (including the US) it's not illegal to possess a "scene" version of a show or movie. It's why people downloading from Usenet feel safe... unlike torrents, they're not uploading part of the file during their download. So, the information that someone has 500 scene movies doesn't really matter unless they're sharing them with others.

We'll also be implementing better cross-reference lookups, but those still depend on an NFO file. Until one of the sites is up and running using hashes and at least one of the big projects implements it, there's no way to tell how much it's being used. However, I will say that it solves a number of sticky issues on our end internally and should result in better overall results.
#19
szsori - my words exactly. I started the same way: I asked Gabest, the coder of the MPC player, if he could give me his database (of subtitles; I am not sure whether he had IMDb numbers in there, so I would have to write a script for that...). He said yes; there were around 3000 hashes. I checked his algorithm: it is far from the best, but it is pretty stable for now.

What I suggest is to create another, much better hashing method. The "old" crc64 one (which I use, and now you, and later somebody else...) would stay the default, and if a client implements the "new" hashing, both hashes would be sent together. Later we could switch to the new method only. This needs a long discussion.
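
For reference, a rough sketch of the "old" hash as OpenSubtitles documents it, to the best of my knowledge: the file size plus the sums of the first and last 64 KiB read as little-endian 64-bit words, wrapped to 64 bits. Despite the "crc64" label it is a plain checksum, not a CRC. The function `hashes_for_lookup` and the keys `hash_v1` / `hash_v2` are hypothetical, only there to show both hashes travelling together. I also assume files of at least 128 KiB.

```python
import os
import struct

CHUNK = 64 * 1024  # bytes hashed at each end of the file


def _sum_words(data: bytes) -> int:
    """Sum the data as little-endian unsigned 64-bit words."""
    count = len(data) // 8
    return sum(struct.unpack("<%dQ" % count, data[: count * 8]))


def movie_hash(path: str) -> str:
    """Size + checksum of first and last 64 KiB, as 16 hex digits."""
    size = os.path.getsize(path)
    if size < 2 * CHUNK:
        raise ValueError("file is too small to hash this way")
    with open(path, "rb") as f:
        head = f.read(CHUNK)
        f.seek(size - CHUNK)
        tail = f.read(CHUNK)
    # Masking once at the end is equivalent to 64-bit wrap-around addition.
    total = (size + _sum_words(head) + _sum_words(tail)) & 0xFFFFFFFFFFFFFFFF
    return "%016x" % total


def hashes_for_lookup(path: str, new_hash_fn=None) -> dict:
    """Always send the old hash; add the new one if the client supports it."""
    payload = {"hash_v1": movie_hash(path)}
    if new_hash_fn is not None:
        payload["hash_v2"] = new_hash_fn(path)
    return payload
```

A server that only knows the old hash can ignore `hash_v2`, which is what would make a gradual switch possible.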

Anyway, the important thing is to share these hashes between sites, so we can all build much better services.
#20
We've started adding this into UMM as part of the searching options.

For subtitles it's the default method used (with opensubtitles.org) and it seems to be very accurate.

It'll be added to the TMDb options and later to the TVDB options. The TVDB option, when available, will be a great way to automate moving episodes into the correct folder for the show, since file names follow far too many naming conventions (and plenty of people don't follow any of them). One note on that: please do return the proper show name (and/or ID) and the season and episode numbers with those results.
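
Roughly what the sorting step could look like once a hash lookup returns that data. This is only a sketch: the result keys `show`, `season` and `episode`, the folder layout, and both function names are my own assumptions, not anything the TVDB API defines.

```python
import shutil
from pathlib import Path


def destination_for(result: dict, source: Path, library: Path) -> Path:
    """Build the target path from a (hypothetical) hash lookup result."""
    show = result["show"]
    season = int(result["season"])
    episode = int(result["episode"])
    name = f"{show} - S{season:02d}E{episode:02d}{source.suffix}"
    return library / show / f"Season {season:02d}" / name


def move_episode(result: dict, source: Path, library: Path) -> Path:
    """Move the file into place, creating the show/season folders if needed."""
    target = destination_for(result, source, library)
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(source), str(target))
    return target
```

The original file name is never parsed; only its extension is kept, which is the whole point of going through the hash.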

- fekker
#21
fekker Wrote: one note on those, please do return the proper show name (and/or id) and season and episode numbers with those results.

Sure thing. :)
