FanArt and Thumbnails Naming-Standard and File-Structure Convension Rationalisation? - Printable Version
+--- Thread: FanArt and Thumbnails Naming-Standard and File-Structure Convension Rationalisation? (/showthread.php?tid=49801)
- ccMatrix - 2009-06-24 07:40
I'm not sure if this has been considered:
On Windows there are some characters which are not allowed in filenames e.g. : and /
I think / is not allowed on Linux either.
So when you have a show like "Star Trek: The Next Generation" or a studio like "The Kennedy/Marshall Company" this makes it impossible to create files for them independent of naming scheme.
Therefore it might be useful to restrict all prefixes for the naming scheme to [a-zA-Z0-9], or maybe even [a-z0-9], since this would eliminate the case-sensitivity issue. XBMC would either have a function which does the conversion or separate listitem attributes that are "cleaned". This will also cause some issues with non-Latin characters, e.g. Chinese. In those cases there might be alternatives in the database, e.g. localized movie/tvshow names.
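A minimal sketch of the kind of "cleaning" function described above, written in Python. The name `clean_name` and the rule of simply dropping every disallowed character are assumptions made for illustration; nothing here is an existing XBMC function.

```python
import re


def clean_name(title: str) -> str:
    """Reduce a title to the [a-z0-9] character set (hypothetical helper)."""
    # Lowercase first so the result no longer depends on case,
    # then drop every character outside a-z and 0-9.
    return re.sub(r"[^a-z0-9]", "", title.lower())


print(clean_name("Star Trek: The Next Generation"))  # startrekthenextgeneration
print(clean_name("The Kennedy/Marshall Company"))    # thekennedymarshallcompany
print(repr(clean_name("Путешествие к центру Земли")))  # ''
```

The last call shows the non-Latin problem: the whole title is stripped and an empty string is left, so some alternative such as a localized name from the database would be needed in that case.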
- xexe - 2009-06-24 09:38
AnalogKid Wrote: Filenaming Schema
I haven't had time to fully study this long thread, but I have some concerns over this naming scheme.
Firstly, it is built around the premise that <moviename> and <tvshow> etc. are unique. Unfortunately they are not. There is absolutely nothing stopping two completely different movies or tvshows having the same name.
Scraper sources normally cover this by adding uniqueness to the end of the name with tags such as (TV) (YEAR) (V) etc., but we cannot rely on that. A scraper source would be perfectly within its rights to have many entries with the same name. Also, two different scraper sources will often call one show or movie completely different names.
This is because the scraper sources use a unique id as their unique identifier and not the show or movie name.
To this end I suggest <moviename> be replaced with a scraper id such as:
This is less useful to humans but is far more scalable.
This also brings me to a concern I have had for a long time. There should be no reason to download an image from a scraper source more than once. Currently we have to do this, as the on-disk name has no reference to the original scraper source name, with the uniqueness being discarded in a rename, meaning that without the XBMC database there is no way to tie the file back to the source.
For this reason I suggest that the original scraper source filename be added to the on-disk tag. So what we would have now is something like
Notice there is no need for the movie tag anymore, as the imdb id is unique. We no longer have problems with filesystem unprintable characters either. Also notice there needs to be a tag for where the art came from, to cater for multi-source scraping.
I also considerably dislike the delimiting hack using "^". It will work most of the time but not all the time, and as such I suggest it is not suitable. Better would be XML-type entries, i.e. <tag>blah</tag>
We would have to be very careful with the length of the strings, but this or a variation on this method is scalable. It also does not rely on the order of entries, which allows for future logical expansion.
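To illustrate the point about order independence, here is a rough Python sketch that reads <tag>value</tag> style entries out of a name into a dictionary. The tag names `source` and `id`, the sample values, and the function `parse_entries` are all hypothetical; the sketch treats the name as a plain string and does not address which characters a given filesystem would accept.

```python
import re
from typing import Dict

# Matches one <tag>value</tag> entry; \1 requires the closing tag
# to carry the same name as the opening tag.
_ENTRY = re.compile(r"<(\w+)>(.*?)</\1>")


def parse_entries(name: str) -> Dict[str, str]:
    """Collect every tagged entry in a name into a dict (hypothetical helper)."""
    return {tag: value for tag, value in _ENTRY.findall(name)}


first = parse_entries("<source>examplesite</source><id>12345</id>")
second = parse_entries("<id>12345</id><source>examplesite</source>")
print(first == second)  # True: the order of the entries does not matter
```

Because lookups go through the tag name rather than a position, a new tag could be added later without breaking readers that only know the older ones.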
The end goal should be a completely stand alone filesystem that can be decoded with no need to have access to the XBMC database. This will allow for other tools to manage and more importantly use the raw data.
It would also allow XBMC users to share their image caches directly, reducing the burden on scraping sites even further.
Obviously the naming scheme I suggest is nowhere near ratified, but you get the general idea.
- bidossessi - 2009-06-24 09:47
Two major qualms with xexe's proposal:
Remember we're talking about actual filenaming schemes.
1- what do you do about movies that don't exist on the imdb?
In as much as we've come to rely heavily on IMDB for movie info scraping, it's by no means the end-all reference, and we could decide to move to a new schema tomorrow. What good would a whole library with only imdb-specific names be then?
2- how human manageable is that? tt8347362 is meaningless to a non-imdb-aware app, and to any human for that matter.
If for some reason I want to remove, say, "Final Fantasy VII" from my movie HDD, I'd be hard pressed to find it without going through a special application.
Unless you propose XBMC be given the right to touch files (something I believe devs are firmly against, and I would second it).
If for some reason I want to dump XBMC and move to another media center, it'd be a massive mess.
You have to remember that while this proposal is based around XBMC, it still works with other media centers.
This could be useful in an xml file stored with the movie, but nowhere else.
Now, to deal with the uniqueness factor: I believe this is something that can be easily solved (and is being solved easily) with the addition of date tags.
Notorious (1965) != Notorious (2009).
Keeping media human-readable/manageable should always come first. I know it doesn't solve the special characters issue. But what you're proposing creates far more problems than it solves, imho.
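As a rough illustration of the date-tag idea, the following Python sketch splits a name such as "Notorious (1965)" into a title and a year, so that two same-titled movies compare as different. The function name `split_title_year` and the "Title (YYYY)" pattern are assumptions made for this example, not an XBMC API.

```python
import re
from typing import Optional, Tuple

# A title followed by a four-digit year in parentheses at the end of the name.
_YEAR_TAG = re.compile(r"^(?P<title>.+?)\s*\((?P<year>\d{4})\)$")


def split_title_year(name: str) -> Tuple[str, Optional[int]]:
    """Return (title, year); year is None when no date tag is present."""
    name = name.strip()
    match = _YEAR_TAG.match(name)
    if match is None:
        return name, None
    return match.group("title"), int(match.group("year"))


old = split_title_year("Notorious (1965)")
new = split_title_year("Notorious (2009)")
print(old, new, old != new)  # ('Notorious', 1965) ('Notorious', 2009) True
```

A name without a date tag, such as "Notorious" on its own, comes back with a year of None and stays ambiguous.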
- TerranQ - 2009-06-24 09:52
Agreed. That is one of the things that drove me buggy about MeediOS: all fanart had to use the imdb id. It makes it a nightmare.
- bidossessi - 2009-06-24 10:16
if you'd read this thread through, you would have seen how many times devs have objected to making filenaming xbmc-specific, and their reasons why.
- xexe - 2009-06-24 10:23
Agreed, IMDB is not the source of all knowledge, but that ID is only an example and could easily be any other scraper source id.
If you are not using library view at all, and therefore not using the database, then moviename etc. is essentially foldername, which would be unique (so that solves the on-disk uniqueness problem). However, it does not solve the uniqueness problem as a means of knowing what the image relates to on the internet.
There is not, and never will be, a 100% reliable means of making a descriptive name for a movie, tvshow etc.
using your example:
Notorious (1965) != Notorious (2009).
I totally accept that but even on IMDB it is also called
Notorious B.I.G. Germany
Untitled Notorious B.I.G. Project USA (working title)
Then we get to the other 12 odd scraper sources + multilanguage + other sources being added in the future.
For this cache to be of any use to other applications, it needs to be defined in such a way that there is a 100% correlation from the image name to what the tvshow/movie actually is. Since the name is different all over the internet and all over the world, therein lies the problem that needs solving.
This naming would not be XBMC specific however it does need other tools to know how to lookup things. I totally accept that.
Journey to the Center of the Earth (2008)
is also known as:
Viaje al centro de la tierra Argentina / Peru / Venezuela
Resan till jordens medelpunkt 3D Finland (Swedish title) / Sweden
Путешествие к центру Земли Russia
Center of the Earth Japan (English title)
Dünyanin merkezine yolculuk Turkey (Turkish title)
Die Reise zum Mittelpunkt der Erde Germany
Journey to the Center of the Earth 3D USA (3-D version)
Journey to the Centre of the Earth Australia
Matka maan keskipisteeseen 3D Finland
Matka maan uumeniin Finland (pre-release title)
Putovanje u srediste Zemlje Croatia
Reis maailma südamesse 3D Estonia
Rejsen til jordens indre Denmark
Sentâ obu ji âsu Japan
Taxidi sto kentro tis Gis Greece
Viagem ao Centro da Terra Portugal
Viagem ao Centro da Terra - O Filme Brazil
Viaggio al centro della terra Italy
Voyage au centre de la terre Canada (French title)
Voyage au centre de la terre 3D France
But we also have:
Journey to the Center of the Earth (2008) (V)
also known as:
Journey to Middle Earth UK (DVD title)
Voyage au centre de la terre France (TV title)
which is a completely different movie
and then we also have
Journey to the Center of the Earth (2008) (TV)
which is yet again a completely different movie.
On top of that, these movies are named slightly differently on many internet sites, and a user would be perfectly within their rights to use and know one of these alternative titles.
You soon end up in a right old mess using the name as a unique identifier.
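The collision can be sketched in a few lines of Python. The titles below are taken from the lists above (with the year and suffix tags left off); the ids are made-up placeholders standing in for whatever unique id a scraper source would assign, and `ids_by_title` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Placeholder ids, not real scraper ids.
AKA: Dict[str, List[str]] = {
    "placeholder-id-1": [  # Journey to the Center of the Earth (2008)
        "Journey to the Center of the Earth",
        "Voyage au centre de la terre",
        "Viaggio al centro della terra",
    ],
    "placeholder-id-2": [  # Journey to the Center of the Earth (2008) (V)
        "Journey to the Center of the Earth",
        "Journey to Middle Earth",
        "Voyage au centre de la terre",
    ],
    "placeholder-id-3": [  # Journey to the Center of the Earth (2008) (TV)
        "Journey to the Center of the Earth",
    ],
}


def ids_by_title(aka: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Invert the id -> titles mapping into title -> set of ids."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for movie_id, titles in aka.items():
        for title in titles:
            index[title].add(movie_id)
    return index


index = ids_by_title(AKA)
ambiguous = sorted(title for title, ids in index.items() if len(ids) > 1)
print(ambiguous)
# ['Journey to the Center of the Earth', 'Voyage au centre de la terre']
```

Each id maps to exactly one movie, while two of the titles map to more than one movie, which is the ambiguity described above.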
- bidossessi - 2009-06-24 15:13
OK. From what I gather, your real worry is the false positives you get on a large collection scan. I've also had that problem. But what percentage of false positives do you really get?
I've got 40 movies on my laptop right now, since I use it as a test bed for XBMC before I finally put up my HTPC. Out of those 40 I've had 2 false positives (including Notorious). Now if the date had been used on that specific one, I would have gotten the right movie.
Long filenames are an issue, granted. Adding more chars to a lengthy filename will break detection on some OSes, granted as well. But at least I can use the filename to correct the false positive.
Following your reasoning, unless you've got a mental IMDB database mapper, you will 1. scrape your movies using a third party app to get their imdb id, 2. check for false positives, 3. rename them all using their imdb id, 4. scan them into XBMC or another media center (assuming your other media center can read nfo files).
By the end of step 2 the hard work is done. Why bother with step 3 at all?
Using conventional names (strings) is as generic as you can get. It's locale-aware and it's portable. The indexing conventions of imdb (or any other website, for that matter) will be specific to that site alone: not portable.
If you've got an algorithm hidden up your sleeve, now's the time to pull it out.
- xexe - 2009-06-24 16:00
If it is a trade-off between human readable and 100% accuracy, I know which one I would choose.
Off the top of my head I showed an example that would cause all sorts of grief; you yourself needed to add the year to 5% of your collection to make it match. Multiply that by a planet full of XBMC users speaking dozens of languages and using a myriad of data sources and you have an unsupportable situation. I don't like that that is the case, but unfortunately it is.
To my eye the end filename should be done in such a way that it can be 100% accurately (and I mean 100%, not 99.9999%) attributed to the movie/tvshow it represents and also be tied back to the actual filename it was scraped from at the source. This should be possible in isolation from all other things other than access to the scraper sources.
I should be able to hand my entire image cache to any other XBMC user and they should be able to use them.
As for steps one through 4 that you cite, they would all be handled by XBMC's normal add-to-library process. If you don't want to use the XBMC library then the helper apps would have to accommodate that, but that's their problem to deal with. XBMC first, everyone else can follow the leader.
- AnalogKid - 2009-06-24 16:22
xexe Wrote: I haven't had time to fully study this long thread but I have some concerns over this naming scheme.
Absolutely isn't... this is totally covered by the namespace protection detailed in my scheme, i.e. the "movie" / "tvshow" designation.
Quote: Scraper sources normally cover this by adding uniqueness to the end of the name with tags such as (TV) (YEAR) (V) etc but we cannot rely on that.
My schema works the same way. This is how folks have to name their movies today in order for any scraper to work. No scrapers today are reading meta info, only filename info.
Quote: A scraper source would be perfectly within their rights to have many entries with the same name. Also two different scraper sources will often call one show or movie completely different names.
Not a problem: if one movie goes by multiple names, the file only needs to match any one of those names, exactly as it works today.
My schema works EXACTLY the same as today's solution, BUT adds extra detail about what the art is FOR. Resolving the media name / scraping is not affected in any way.
Quote: This is because the scraper sources use a unique id as their unique identifier and not the show or movie name.
I believe this to be flawed for the following reasons.
1) tt0844708 is impossible to resolve without an internet connection. There is no way for a media center to know what this belongs to without going to the internet first
2) it is not universal across all scraping sources
3) it's harder to visually read if the file isn't stored side by side with the media
Quote: This is less useful to humans but is far more scalable.
Have to totally refute this. It's a string, it's no more or less scalable.
I think what you're trying to say is... "ttxxxxxx" is easier to 'lookup'. However, the whole point of the art is NOT to scrape it. It's already been scraped. This scheme is to let XBMC rapidly know which art to find related to a given movie. Since the movie will NOT be named "ttxxxxx.avi", XBMC now has to maintain an extra table to figure Starwars = ttxxxxxx.
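The difference between the two lookups can be sketched as follows in Python. Everything here is hypothetical: the "-fanart.jpg" suffix is only a stand-in and is not the schema discussed in this thread, "ttxxxxxx" is the placeholder id used above, and `art_by_name` / `art_by_id` are illustrative helpers.

```python
from pathlib import PurePath
from typing import Dict, List

ART_SUFFIX = "-fanart.jpg"  # hypothetical stand-in for an art naming suffix


def art_by_name(media_file: str, art_files: List[str]) -> List[str]:
    """Find art named after the media file itself; no extra data needed."""
    wanted = PurePath(media_file).stem + ART_SUFFIX
    return [art for art in art_files if art == wanted]


def art_by_id(media_file: str, art_files: List[str],
              id_table: Dict[str, str]) -> List[str]:
    """Find art named after a scraper id; needs a name -> id cross-reference."""
    movie_id = id_table.get(PurePath(media_file).stem)
    if movie_id is None:
        return []  # without the cross-reference the art cannot be matched
    wanted = movie_id + ART_SUFFIX
    return [art for art in art_files if art == wanted]


ART = ["Starwars-fanart.jpg", "ttxxxxxx-fanart.jpg"]
print(art_by_name("Starwars.avi", ART))                          # ['Starwars-fanart.jpg']
print(art_by_id("Starwars.avi", ART, {"Starwars": "ttxxxxxx"}))  # ['ttxxxxxx-fanart.jpg']
print(art_by_id("Starwars.avi", ART, {}))                        # []
```

The id-based lookup only succeeds when the extra table is available, which is the additional bookkeeping referred to above.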
Quote: This also brings me to a concern I have had for a long time. There should be no reason to redownload an image from a scraper source more than once. Currently we have to do this as the on disk name has no reference to the original scraper source name with the uniqueness being discarded in a rename. meaning without the XBMC database there is no way to tie back the file to the source.
Erm, this is the whole point of my schema, see above. This is for already downloaded artwork, not for scraper lookups. Once a cover is downloaded, it's named according to my schema and need never change (unless the user wishes to change the art).
Quote: For this reason I suggest that the original scraper source filename be added to the on disk tag. So what we would have now is something like
This isn't scraper source agnostic. The filenames cannot be predicted, since every scraper source can use its own scheme, and some sources might not even have a scheme (some sources offer 'image.jpg' for every search you do).
Quote: notice there is no need for the movie tag anymore as the imdb id is unique. We no longer have problems with filesystem unprintable characters as well. Also notice there needs to be a tag for where the art came from to cater for multi source scraping.
The movie tag was there to resolve the namespacing issue you've raised. IMDB is movie only, not music, or tv etc.
We need a scheme that is totally agnostic across all media types.
Quote: I also considerably dislike the delimiting hack using "^". It will work most of the time but not all the time and as such I suggest it is not suitable. Better would be XML type entries i.e. <tag>blah</tag>
I'm not 100% ok with it either, but it can work all of the time with some escapes... but you're forgetting that the actual media itself has to have a name... as long as the art matches the media, everything is fine. So weird international characters are not affected... the true media name is in the NFO.
Putting all this in an XML file was discussed, and has merits... but also has some efficiency issues and usability issues.
The number of support requests is already a nightmare for fanart... it's even worse with existing NFO hacks for mediaflags (now resolved by abandoning that idea). I think the same issue would arise using XML for artwork.
Architecturally, the XML is a nicer bet I'd say, but in practice is messier.
Quote: We would have to be very careful with the length of the strings but this or a variation on this method is scalable. It also does not rely on order of entries which allows for future logical expansion.
Agreed
Quote: The end goal should be a completely stand alone filesystem that can be decoded with no need to have access to the XBMC database. This will allow for other tools to manage and more importantly use the raw data.
I agree, hence the schema in the first place
Quote: It would also allow for XBMC users to share their image caches directly reducing the burden on scraping sites even further.
Basically we agree on everything, except you'd prefer the database index (IMDB) and I'd prefer the name that matches the actual media file.
My way is more optimal, since it doesn't require an extra IMDB-to-moviename cross-reference and it's portable across all media types and scrapers.
Your way reduces the filename size quite a lot.
We're not a million miles apart really, that's a good thing!
- AnalogKid - 2009-06-24 16:36
xexe Wrote: If it is a trade-off between human readable and 100% accuracy I know which one I would choose
100% accuracy is guaranteed. An exact 1 to 1 match with the physical media. Portable across ALL media types (including future ones not supported yet).
The issue with matching two movies of the same name already exists and is resolved. The filenaming scheme neither improves nor worsens it. It remains as is. The IMDB numbering solution neither improves nor worsens it either.
The Wicker Man.avi is impossible to resolve without human intervention (since there are two movies of that name). Any scraper HAS to ask the user which version to get art for.
But this issue isn't about scraping... it's about matching a given (already downloaded) piece of art to a media file... in this instance TT203405 is impossible to match with The Wicker Man.avi without an internet lookup, or a locally stored cross-reference. Plus, it's flawed if no IMDB entry exists.
In addition, the TTxxxx scheme places demands on the scraper source to name files this way. Many sources do not do so.